Commit message (Author, Age)
* Finish translation for Western European language speakers [v0.12.0] (yum, 2023-05-30)
|   NLLB needs its input to be split up into sentences. I use the sentence_splitter Python package to do this. It supports ~20 Western European languages, but notably, no Asian languages.
|   * Sort spoken language list. English is still at the top.
|   * Remove 'Translation source' dropdown. Infer this from the spoken language.
|   * Add lang_compat.py to map language codes between the various libraries (whisper, nllb, sentence_splitter).
|   * Fix bug where old text would appear in textbox when you first bring it up.
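A minimal sketch of the kind of code mapping lang_compat.py is described as providing. The table layout and the name `to_library_code` are hypothetical, not taken from the repository; the codes themselves follow each library's usual convention (ISO 639-1 for Whisper and sentence_splitter, FLORES-200 style for NLLB).

```python
# Hypothetical compatibility table, keyed by Whisper-style language code.
# None means the library has no support for that language.
LANG_COMPAT = {
    "en": {"nllb": "eng_Latn", "sentence_splitter": "en"},
    "fr": {"nllb": "fra_Latn", "sentence_splitter": "fr"},
    "de": {"nllb": "deu_Latn", "sentence_splitter": "de"},
    "ja": {"nllb": "jpn_Jpan", "sentence_splitter": None},
}


def to_library_code(whisper_code, library):
    """Return the code `library` expects, or None if unsupported/unknown."""
    entry = LANG_COMPAT.get(whisper_code)
    if entry is None:
        return None
    return entry.get(library)
```

A caller could treat a `None` result for `sentence_splitter` as "this spoken language cannot be used as a translation source yet", which matches the limitation noted above for Asian languages.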
* Add ability to translate into 200 languages (yum, 2023-05-25)
|   Use Meta's No Language Left Behind (NLLB) algorithm to provide translation capabilities into 200 languages. Obviously most are very untested. This requires either 4.1 or 7.1 GB of RAM and significantly increases transcription latency.
* Add more text filters (yum, 2023-05-24)
|   Add 3 filters:
|   * Remove trailing period
|   * Convert to uppercase
|   * Convert to lowercase
|   All may be composed. Upper/lower just overwrite each other, so just use one.
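A minimal sketch of how composable text filters like these could work, assuming each filter is a plain string-to-string function applied in order. The function names are hypothetical and not taken from the repository.

```python
def remove_trailing_period(text):
    """Drop a single trailing period, if present."""
    return text[:-1] if text.endswith(".") else text


def apply_filters(text, filters):
    """Apply each filter in order; later filters see earlier results."""
    for f in filters:
        text = f(text)
    return text
```

Because the filters run in sequence, enabling both case filters means the last one wins, e.g. `apply_filters("Hi.", [str.upper, str.lower])` ends up lowercase.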
* All transcription panel fields now persist across app restart (yum, 2023-05-24)
|   I forgot to put them into ApplyConfigToInputFields. The reason this is necessary: we need to create the text field where we log things before we can deserialize the config. To keep the code structure "clean" I just wrote another function to apply the config (ApplyConfigToInputFields). However, I have to remember to update it when I add new fields.
* Add UI toggle for uwu filter (yum, 2023-05-24)
|   UI now has a checkbox for the uwu filter. Does not materially affect resource usage or latency when enabled.
* Begin work on uwu filter (yum, 2023-05-24)
|   Use UwwwuPP to translate your boring old speech into an uwu-ified version. Still need to add a UI toggle for this.
* Add ability to type using STT (yum, 2023-05-23)
|   To use it, do a medium hold + long hold. Keep the long hold depressed until you're done speaking. The transcription will be typed into the currently selected input field.
|   * Add more audio feedback
|   * Make audio feedback play asynchronously so it doesn't slow down the controller input state machine as much.
* Automatically set up virtual env (yum, 2023-05-23)
|   Remove the button. This is a big source of confusion for new users. Now it happens automatically upon starting any task that needs it.
|   * Begin removing CPP implementation of Whisper. faster-whisper is a much easier/better solution.
|   * Flip default of `clear OSC configs` from false to true.
* Add ability to update textbox in place (yum, 2023-05-22)
|   By holding the button while talking for at least 1.5 seconds, you can update the contents of the textbox without unlocking it from worldspace. So now you can carefully position your textbox once, then continually speak into it without having to reposition it every time.
* Shader improvements (yum, 2023-05-22)
|   * Fix thin outline in transparent region of rounded corners
|   * Remove anti-aliasing. Now that VRC supports it natively, this is no longer necessary.
|   * Use more efficient noise function for dithering.
* Restore mipmap sampling in custom chatbox (yum, 2023-05-22)
|
* Add keyboard toggle [v0.11.4] (yum, 2023-05-22)
|   Users can now configure a keybind to start/stop/dismiss the STT when in desktop mode. The default keybind is ctrl+x, since by default VRC doesn't use 'x' for anything.
* Merge pull request #1 from faker2048/patch-1 (yum-food, 2023-05-22)
|\
| |   Fix accidental semicolon typo
| * Fix accidental semicolon typo (faker, 2023-05-22)
|/
* Enable selecting specific GPU when transcribing (yum, 2023-05-21)
|   Useful on devices with multiple GPUs, such as gaming laptops.
|   * Update GUI/README.md.
* Fix noop animations on current creator companion build [v0.11.3] (yum, 2023-05-09)
|   See comment for details.
|   * Update README
* Drop torch from requirements.txt [v0.11.2] (yum, 2023-05-01)
|   faster-whisper doesn't need it. This reduces install size from 6.00GB with base.en model to 1.70GB.
|   * Use a single sampler in shader (enables using more than 16 textures)
|   * Minor legibility regression - need to improve AA.
|   * Enable backface culling in shader (minor performance win)
* Restore string matching, remove affinity mask [v0.11.1] (yum, 2023-04-25)
|   Affinity mask no longer affects performance. String matching is still needed for temporal stability in fast-paced long-form transcription tasks.
* Fix custom chatbox zwrite/depth (yum, 2023-04-25)
|   Depth was being calculated wrong, causing text box to render behind objects it's in front of.
|   * Fix package.ps1 compression. 7z was increasing file size, somehow.
* ~Finish integrating faster-whisper (yum, 2023-04-24)
|   I'm able to use the new code to show text in game. Not yet play-tested.
* Begin integrating faster-whisper [v0.11.0] (yum, 2023-04-23)
|   This is a much faster, lower-VRAM reimplementation of Whisper in Python. Early testing is extremely promising: fast transcription speed, extremely low resource usage (CPU/RAM/VRAM), high accuracy.
* package.ps1 always regenerates Python/ [v0.10.1] (yum, 2023-03-28)
|   Intended to avoid accidentally releasing dirty environments.
* Fix virtual env reset (yum, 2023-03-28)
|   Use `pip freeze` and `pip uninstall` to reset the venv to a near-default state. Filter out `future` since we need to vendor it. If it ever gets removed, the installation is borked.
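A minimal sketch of the filtering step described above, assuming the output of `pip freeze` is already available as a string. The function name and the `keep` parameter are hypothetical; only the idea of protecting `future` comes from the commit message.

```python
import re

# Packages that must survive the reset (assumption: only `future`).
KEEP = {"future"}


def packages_to_uninstall(freeze_output, keep=KEEP):
    """Parse `pip freeze` output and return package names safe to remove."""
    names = []
    for line in freeze_output.splitlines():
        line = line.strip()
        # Skip blanks, comments, and editable installs.
        if not line or line.startswith(("#", "-e ")):
            continue
        # Handles both "name==1.2.3" and "name @ file:///path" forms.
        name = re.split(r"==|\s@\s", line, maxsplit=1)[0].strip()
        if name.lower() not in keep:
            names.append(name)
    return names
```

The resulting names could then be passed to `python -m pip uninstall -y`, leaving the vendored package in place.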
* Vendor pip and future (yum, 2023-03-28)
|   The `future` dependency fails to install with the embedded python, so now it's vendored. Installing pip after wheel would result in wheel reinstalling, so we also vendor pip.
* Custom chatbox shader writes depth (yum, 2023-03-23)
|   This fixes issues where the transparent corners of the textbox render in front of other materials, causing those other materials to skip rendering.
|   * Update README.md with roadmap and avatar resource usage.
* Reduce texture memory usage for English speakers [v0.10.0] (yum, 2023-03-21)
|   We used to populate 7 4k textures + 1 2k texture for all users. Now if the user has configured `bytes_per_char=1` in the Unity panel, we just populate a single 512x512 texture containing the first 128 ASCII characters. This reduces texture memory usage by 99.74%, from 134.67 MB to 340 KB.
* Fix _socket module not found issue (yum, 2023-03-21)
|   Need python310._pth, specifically 'import site' line, for embedded python + pip to get along.
* Begin work fixing venv setup (yum, 2023-03-09)
|   If you don't have Python installed, venv setup will fail. Begin work fixing environment config so `pip install` uses vendored Python.
* Set PYTHONPATH in synchronous multiprocessing layer (yum, 2023-03-08)
|   A user saw an error like `ModuleNotFoundError: No module named _socket`. StackOverflow blames this on PYTHONPATH, so let's try setting it.
|   * Fix latent bug in Scripts/transcribe.py. PyAudio.open() positional parameters must be specified in correct order, even when telling it which parameter is which. *shrug*
* Expose more C++ whisper parameters in GUI (yum, 2023-03-08)
|   Expose decode method, beam search parameters, and voice activity detection parameters in GUI.
|   * Remove WhisperCPP::Init(), do it on launch instead.
|   * Add float support to ConfigMarshal
* Silence virtual env setup PATH warnings (yum, 2023-03-06)
|   Twofold approach:
|   * All spawned processes have the desired path (new codepath)
|   * Setup command silences the warning (old codepath)
* Animator generation and dumping mics no longer hang GUI [v0.9.0] (yum, 2023-03-05)
|   Do these in a std::future.
|   * SetAffinityMask() now returns a value on all control paths
* Update README.txt (yum, 2023-03-02)
|
* Implement thread affinity optimization for Python transcription engine (yum, 2023-02-28)
|   A user pointed out that constraining the Python implementation to a single core does not affect visible latency. This seems true on my PC as well.
|   * Reimplement Python transcription wxProcess as a std::async. App shutdown is much faster now.
* Bugfix: fix use-after-free in GetMicsImpl (yum, 2023-02-28)
|   * Plumb beam search params into whisper cpp implementation (currently broken)
* Filter out more transcription noise [v0.8.2] (yum, 2023-02-26)
|   Things like " (static)" and " *explosions*" were showing up a lot with ggml-medium.bin. Filter them out.
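A minimal sketch of one way to drop annotations like these, assuming the noise always appears as a parenthesized or asterisk-wrapped span. The regular expression and the name `strip_noise` are illustrative assumptions, not the filter actually used in the repository.

```python
import re

# Assumption: noise is any "(...)" or "*...*" span, plus leading whitespace.
_NOISE = re.compile(r"\s*(\([^)]*\)|\*[^*]*\*)")


def strip_noise(text):
    """Remove parenthesized and asterisk-wrapped annotations from a segment."""
    return _NOISE.sub("", text).strip()
```

A pattern this broad would also remove legitimate parenthesized speech, so a real filter might prefer a fixed list of known noise strings.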
* Improve behavior around VAD segmentation events (yum, 2023-02-26)
|   Use forked Whisper implementation which has tweaks to reduce dropped words around the beginning of VAD segments.
|   * Retain audio after VAD segmentation events
* CPP implementation refinements (yum, 2023-02-26)
|   * Pip install, dependency install, and model download can be gracefully interrupted and resume later.
|   * Mic list was pointing at freed memory. Fix this by copying into the heap with std::unique_ptr()s. Mic list in CPP panel is much more reliable now.
* Bugfix: C++ transcription engine should not launch OSC layer [v0.8.1] (yum, 2023-02-26)
|   Not ready yet.
* Bugfix: add vendored git to PATH [v0.8.0] (yum, 2023-02-26)
|
* Begin work on C++ custom chatbox (yum, 2023-02-26)
|   Sort of a misnomer. The idea is to use C++ for transcription and Python for steamvr and OSC. Having issues getting output from multithreaded Python code. Not in the mood to figure this out today.
|   * Hide unimplemented parts of C++ panel.
* Convert most PythonWrapper wxLogError() to Log() (yum, 2023-02-25)
|   Simplifies debugging process.
* Drop ryml (yum, 2023-02-25)
|   Rapidyaml started refusing to parse config files so I dropped it.
|   * Add ConfigMarshal class to support very simple config marshalling
|     * No versioning, no type indicators, nothing.
|     * Supports int, bool, and string.
|     * Bool are serialized as int.
|   * Log no longer segfaults if given nullptr wxTextCtrl*.
|   * Fix how whisper CPP GUI fields restore from config
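A minimal Python sketch of the marshalling idea described above, written in Python purely for illustration (ConfigMarshal itself is part of the C++ GUI). The `key=value` line format and the function names are assumptions; the source only states that there is no versioning, no type indicators, and that bools are stored as ints. Since the file carries no type information, the reader here recovers types from a dictionary of defaults.

```python
def serialize(config):
    """Write one key=value line per field; bools are stored as 0/1."""
    lines = []
    for key, value in config.items():
        if isinstance(value, bool):
            value = int(value)
        lines.append(f"{key}={value}")
    return "\n".join(lines)


def deserialize(text, defaults):
    """Read key=value lines, using the type of each default to parse."""
    config = dict(defaults)
    for line in text.splitlines():
        key, sep, raw = line.partition("=")
        if not sep or key not in defaults:
            continue  # Ignore malformed lines and unknown keys.
        kind = type(defaults[key])
        if kind is bool:
            config[key] = bool(int(raw))
        elif kind is int:
            config[key] = int(raw)
        else:
            config[key] = raw
    return config
```

One consequence of this design is that string values containing newlines cannot round-trip, and a field missing from the file silently keeps its default.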
* Complete OBS browser source (yum, 2023-02-25)
|   * Implement HTTPMapper classes
|   * Browser source respects user-configured source port
* Add HTTP parser (yum, 2023-02-25)
|   Server needs to parse incoming HTTP.
|   * Server spawns a thread for each incoming connection
* Begin work on custom webserver (yum, 2023-02-25)
|   oatpp was a crashy mess. Begin making a simple web server from scratch.
|   * Add Designs/ folder to document nontrivial things like the webserver design
* Finish browser source proof-of-concept (yum, 2023-02-24)
|   It's a crashy mess, but it sort of works.
|   * Add Transcript class to send transcription segments between layers
* Add HTML for BrowserSource (yum, 2023-02-24)
|   Browser source queries /api/transcript at 10Hz via jquery and renders the response.
* Add hack to prevent browser source crash on shutdown (yum, 2023-02-24)
|   Documented in BrowserSource::Run().
|   * Parameterize Release/Debug in build scripts
* Wire up browser source (yum, 2023-02-23)
|   Browser source can be started and stopped via the UI. It still serves a hello world json blob. Observing occasional crashes when stopping the C++ transcription engine.