| Commit message (Collapse) | Author | Age |
| | |
|
| |
|
|
|
|
|
| |
Also:
* Double the number of audio device slots
* Fetch cuDNN from NVIDIA at runtime instead of vendoring it
|
| | |
|
| |
|
|
| |
Also disable flash-attention when CPU mode is selected
|
| |
|
|
|
| |
cuDNN is now fetched from Dropbox instead of Google Drive. This has the added
benefit of being about 10-20x faster (assuming you have fast internet).
|
| |
|
|
|
| |
Google Drive intentionally broke CLI downloads ("don't be evil") and
UwwwuPP went away. Begin work on rehosting both files.
|
| |
|
|
| |
Remove unused proxy code, curl, and images.
|
| |
|
|
|
|
|
|
| |
FuzzyRepeatCommitter was approximating this behavior in the
best-performing configuration, so switch to it in earnest.
This committer simply commits audio once we detect a long enough gap in
speech. That's it!
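A minimal sketch of the gap-based commit rule described above. The class name `GapCommitter`, the `min_gap_s` threshold, and the per-frame `is_speech` flag are assumptions made for illustration; the project's actual committer and its VAD interface may look different.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GapCommitter:
    """Hypothetical committer: emit buffered audio after enough silence."""

    min_gap_s: float = 1.0  # assumed silence threshold, in seconds
    _frames: List[bytes] = field(default_factory=list)
    _silence_s: float = 0.0
    _heard_speech: bool = False

    def push(self, frame: bytes, duration_s: float,
             is_speech: bool) -> Optional[bytes]:
        """Feed one frame; return the committed audio, or None if not ready."""
        if not is_speech and not self._heard_speech:
            return None  # ignore silence that precedes any speech
        self._frames.append(frame)
        if is_speech:
            self._heard_speech = True
            self._silence_s = 0.0
            return None
        # Silent frame after speech: extend the current gap.
        self._silence_s += duration_s
        if self._silence_s < self.min_gap_s:
            return None
        # The gap is long enough, so commit everything buffered so far.
        committed = b"".join(self._frames)
        self._frames.clear()
        self._silence_s = 0.0
        self._heard_speech = False
        return committed
```

The caller would pass each committed chunk to transcription and keep feeding frames; nothing is committed until speech has been heard at least once.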
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Create a simple server with 3 endpoints:
* /create_session: Create a session and return its identifier.
* /set_transcript: Update a session's transcript.
* /get_transcript: Fetch a session's transcript.
Right now the session ID provides authentication *and* authorization.
There is no public/private ID so you have to trust whoever you share
your ID with.
IDs are long and generated by the server, so it should be somewhat
secure against low-effort hacking.
Other updates:
* Drop whisper_requirements.txt - no longer needed.
* Vendor curl to make it easier to interact with the server.
TODO:
* Fuzz test the server.
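The session model above can be sketched as an in-memory store, leaving out the HTTP layer. The class name `SessionStore`, the method signatures, and the use of `secrets.token_urlsafe` are assumptions for illustration, not the server's actual code.

```python
import secrets
from typing import Dict, Optional


class SessionStore:
    """Hypothetical in-memory backing store for the three endpoints."""

    def __init__(self) -> None:
        self._transcripts: Dict[str, str] = {}

    def create_session(self) -> str:
        # The ID doubles as the credential, so make it long and random.
        session_id = secrets.token_urlsafe(32)
        self._transcripts[session_id] = ""
        return session_id

    def set_transcript(self, session_id: str, transcript: str) -> bool:
        # Unknown IDs are rejected rather than silently created.
        if session_id not in self._transcripts:
            return False
        self._transcripts[session_id] = transcript
        return True

    def get_transcript(self, session_id: str) -> Optional[str]:
        # Returns None for unknown IDs.
        return self._transcripts.get(session_id)
```

Because a single ID grants both read and write access, anyone holding it can overwrite the transcript, which matches the trust caveat in the message.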
|
| |
|
|
|
|
|
| |
Add toggle to UI to enable a profanity filter. It replaces vowels in bad
words with asterisks.
Bugfix: filters now apply to OBS
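A minimal sketch of the vowel-masking rule, assuming a caller-supplied word list and whole-word, case-insensitive matching. The function name `censor` and the matching policy are assumptions; the real filter may match differently.

```python
import re
from typing import Iterable


def censor(text: str, bad_words: Iterable[str]) -> str:
    """Replace the vowels of each listed word with asterisks."""
    escaped = [re.escape(word) for word in bad_words]
    if not escaped:
        return text
    # Match listed words only when they stand alone as whole words.
    pattern = re.compile(r"\b(" + "|".join(escaped) + r")\b", re.IGNORECASE)

    def mask(match: "re.Match[str]") -> str:
        return re.sub(r"[aeiouAEIOU]", "*", match.group(0))

    return pattern.sub(mask, text)
```

For example, `censor("Well, darn it", ["darn"])` leaves the consonants readable and masks only the vowel.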
|
| |
|
|
| |
No longer used.
|
| |
|
|
|
|
| |
Use UwwwuPP to translate your boring old speech into an uwu-ified version.
Still need to add a UI toggle for this.
|
| |
|
|
|
|
| |
Useful on devices with multiple GPUs, such as gaming laptops.
* Update GUI/README.md.
|
| |
|
|
|
|
|
| |
Depth was being calculated incorrectly, causing the text box to render
behind objects it's in front of.
* Fix package.ps1 compression. 7z was increasing file size, somehow.
|
| |
|
|
| |
I'm able to use the new code to show text in game. Not yet play-tested.
|
| |
|
|
| |
Intended to avoid accidentally releasing dirty environments.
|
| |
|
|
|
|
|
|
| |
This dependency fails to install with the embedded python, so now it's
vendored.
Installing pip after wheel would result in wheel reinstalling, so we
also vendor pip.
|
| |
|
|
|
| |
Need python310._pth, specifically the 'import site' line, for
embedded python + pip to get along.
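As a sketch of what enabling that line could look like, the helper below rewrites the text of a `._pth` file so that `import site` is active. The function name `enable_import_site` is hypothetical, and the sample layout in the usage note is assumed to resemble the stock embedded distribution rather than taken from this repository.

```python
def enable_import_site(pth_text: str) -> str:
    """Return ._pth contents with an uncommented 'import site' line."""
    out = []
    found = False
    for line in pth_text.splitlines():
        # Treat '#import site' and 'import site' as the same directive.
        if line.strip().lstrip("#").strip() == "import site":
            out.append("import site")
            found = True
        else:
            out.append(line)
    if not found:
        # No such line at all: append one.
        out.append("import site")
    return "\n".join(out) + "\n"
```

Given a file whose last line is `#import site`, the result differs only in that line; every other path entry is preserved.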
|
| |
|
|
|
|
|
| |
Use a forked Whisper implementation which has tweaks to reduce dropped
words around the beginning of VAD segments.
* Retain audio after VAD segmentation events
|
| |
|
|
|
| |
Browser source queries /api/transcript at 10 Hz via jQuery and renders
the response.
|
| |
|
|
|
|
| |
Documented in BrowserSource::Run().
* Parameterize Release/Debug in build scripts
|
| |
|
|
|
|
|
|
| |
Use Const-me/Whisper to perform transcription. This implementation is
vastly more efficient: CPU usage, memory usage, and VRAM usage are all
dramatically reduced. It's slightly less accurate when comparing the
same model (due to the lack of beam search decoding), but since you can
use larger models, the impact is largely a wash.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When you generate Unity assets, you have to configure rows, cols, chars
per sync, and bytes per char. When you switch over to the transcription
panel, these choices will be automatically populated.
This should reduce accidental mismatches between the two panels.
* Merge Config classes. Now just use one big AppConfig class instead of
one class per panel.
* Factor out (most) input field initialization into a function. Call it
when switching panels so input fields synchronize.
* Wrap a lot of lines at 80 columns.
* Add -skip_zip switch to package.ps1.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Done:
* Users can add images to Fonts/Emotes/
* The basename of that image ('clueless.png' becomes 'clueless') is the
keyword to make the image show up in game.
* Fix a bug in the shader where letters on the 2nd texture and later
would have UV outside of [0.0, 1.0]
Not yet implemented:
* transcribed words are encoded using emotes mapping
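The basename-to-keyword rule above can be sketched as follows. The function name `emote_keywords` and the returned mapping shape are assumptions for illustration only.

```python
from pathlib import Path
from typing import Dict, Iterable


def emote_keywords(filenames: Iterable[str]) -> Dict[str, str]:
    """Map each emote keyword (file name without extension) to its file."""
    # Path.stem drops both the directory part and the final extension.
    return {Path(name).stem: name for name in filenames}
```

With this rule, an image saved as `clueless.png` is triggered by the keyword `clueless`.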
|
| |
|
|
|
|
|
|
|
|
|
|
| |
I was using this file to constrain the set of paths that Python can see,
but since `future` doesn't have a wheel, it will fail to install on a
fresh system.
If you set pip's --cache-dir to some new directory, you'll see it fail
to install.
The _pth doesn't really seem to matter, since without it, packages are
still installed under the virtual environment.
|
| |
|
|
| |
Whisper doesn't like 0.18.3, so downgrade to the previous version.
|
| |
|
|
|
|
|
| |
Don't literally check in Python since it looks dodgy (rightfully so).
Instead the build script just fetches it.
* Update README, simplifying language and documenting other projects
|
| |
|
|
|
|
|
| |
package.ps1 fetches PortableGit and embeds it in the package. This
eliminates all but one runtime dependency (MSVC++ Redistributable).
* Move Python into a new FOSS folder.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Users can now control how many characters they send per sync event, as
well as the number of bytes used to represent each character.
This gives them the power to pick between faster paging and fewer sync
params.
International users must use 2 bytes per char (at least for now).
* package.ps1: don't distribute the gigantic TTF files, just the bitmaps
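The trade-off between paging speed and sync parameters can be made concrete with a small cost model. This is an assumed model, not the project's own accounting: it takes the bytes synced per event to be chars per sync times bytes per char, and the number of events needed to fill the board to be the cell count divided by chars per sync, rounded up. The function name `sync_budget` is hypothetical.

```python
import math
from typing import Tuple


def sync_budget(rows: int, cols: int, chars_per_sync: int,
                bytes_per_char: int) -> Tuple[int, int]:
    """Return (bytes synced per event, sync events to fill the board)."""
    # Assumed: each character in a sync event costs bytes_per_char bytes.
    param_bytes = chars_per_sync * bytes_per_char
    # Assumed: a full page needs every cell sent once.
    sync_events = math.ceil(rows * cols / chars_per_sync)
    return param_bytes, sync_events
```

Under this model, doubling chars per sync halves the number of events per page while doubling the bytes sent in each one, and moving from 1 to 2 bytes per char doubles the per-event cost without changing the event count.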
|
| |
|
|
|
|
|
|
|
|
|
| |
GUI now generates parameters & menu.
Still need to handle write defaults.
* Add capability to append to avatar parameters & menu
* Install canned Unity assets, shaders, and fonts in avatar folder
* Check in materials for ease of use
* Bugfix: correctly label menu/parameters file pickers
|
| |
|
|
| |
Still need to generate params & merge menus. Getting close....
|
| |
|
|
| |
No more WSL dependencies!
|
| |
|
|
|
|
|
| |
* icon now works when pinned to taskbar
* add model selection
* add script to dump mic devices
* whisper models now download into the virtual environment
|
| |
|
|
|
|
|
|
| |
Users can now select their mic & spoken language in the GUI.
* pyaudio now samples at the mic rate, fixing an issue where frames
would drop. We downsample in the callback by dropping frames.
* add Sounds folder to package
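Downsampling by dropping frames, as described above, amounts to keeping every Nth sample. The sketch below assumes mono samples in a list and an integer ratio between the mic rate and the target rate; the function name and the rates in the usage note are illustrative, not taken from the project.

```python
from typing import List


def downsample_by_dropping(samples: List[int], mic_rate_hz: int,
                           target_rate_hz: int) -> List[int]:
    """Keep every Nth sample, where N = mic_rate_hz / target_rate_hz."""
    if target_rate_hz <= 0 or mic_rate_hz % target_rate_hz != 0:
        raise ValueError("mic rate must be an integer multiple of target rate")
    step = mic_rate_hz // target_rate_hz
    # Plain decimation: no low-pass filter, so aliasing is possible.
    return samples[::step]
```

Going from a 48000 Hz mic to a 16000 Hz target keeps one sample in three. A rate such as 44100 Hz does not divide evenly by 16000 and is rejected by this sketch.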
|
| |
|
|
|
|
|
|
|
| |
GUI can now download all TaSTT dependencies and install them into a
virtual environment.
* Add buttons to check embedded python version & install dependencies
* Add class to wrap interacting with embedded Python
* Put all TaSTT python scripts into a folder
|
| |
|
|
| |
License is included in source & distributable package.
|
|
|
* GUI now shows logo
* Add package.ps1 to generate distributable application bundle
* Rename ~GUI to GUI
* Add ScopeGuard class
|