path: root/GUI
Commit message (Author, Age)
* Bugfix: app no longer hangs if closed while transcribing (yum, 2023-09-01)
| Fix how the OnExit callback is wired into the GUI. Also make it exit the Unity process, if one is running.
* Various cleanup (yum, 2023-09-01)
|
* Add `Enable phonemes` toggle to radial menu (yum, 2023-09-01)
| Also:
| * Fully scrub AudioSource references from prefab when not using phonemes.
| * Disable net sync on phoneme params when not using them. When not synced, they don't count against the total memory limit.
| * Use config file in generate_params.py
* Add Unity panel toggle for phonemes (in-game audio indicator) (yum, 2023-09-01)
| If not set, the prefab will have its audio sources removed.
* libtastt.py now uses config file where appropriate (yum, 2023-08-31)
|
* transcribe.py now just reads from config file (yum, 2023-08-31)
| Duplicating config between args and config is a huge pain in the ass to maintain. Now we just launch using the config generated by the UI. ezpz.
* Clean up UI stoi pattern (yum, 2023-08-31)
| wxWidgets encodes text inputs & multiple-choice inputs as strings. I frequently have to convert these into ints & apply a range check. Encapsulate that in a function and use a shitty little ASSIGN_OR_RETURN macro to make the parsing as concise as possible.
|
| Also delete unused WhisperCPP config settings.
* Bugfixes and tweaks (yum, 2023-08-31)
| * Temporarily restore normal process priority. Working on adding a UI option to set STT prio.
| * Give audio indicator phonemes a 1/3 chance to do nothing. Makes result sound a little better imo.
| * Quiet down steamVR thread when steamVR isn't running
| * Fix use of `button_id` and `hand_id` in steamvr.py
| * Increase amount of silence allowed before transcript from 1 to 5 seconds. You want enough buffer to allow for a few full transcripts, else you risk spuriously dropping audio.
| * Enable background loading in audio metadata (required by vrc sdk)
* Deprecate commit similarity threshold (yum, 2023-08-30)
| This is now dynamically set inside transcribe.py. As the buffer grows long, the threshold grows exponentially, keeping the buffer short. The threshold starts small so that transcription starts strict (accurate, slow) and gets looser (inaccurate, fast) as needed.
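A minimal sketch of what an exponentially growing threshold could look like. The function name, the base value, and the doubling interval are all assumptions for illustration; the commit message does not give the actual formula used in transcribe.py.

```python
def commit_threshold(buffer_seconds: float,
                     base: float = 1.0,
                     doubling_seconds: float = 5.0) -> float:
    """Hypothetical edit-distance threshold that grows with buffer length.

    Starts at `base` (strict) and doubles every `doubling_seconds` of
    buffered audio, so a long buffer makes commits progressively easier.
    """
    return base * 2.0 ** (buffer_seconds / doubling_seconds)


if __name__ == "__main__":
    for seconds in (0, 5, 10, 20):
        print(seconds, commit_threshold(seconds))
```

With these assumed defaults the threshold stays small for short buffers and grows quickly once the buffer gets long, which is the behavior the message describes.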
* Fix in-game audio indicator (yum, 2023-08-29)
| Also fix prefab default size (no longer colossal).
|
| TODO:
| * Add runtime & unity-time toggles
* Finish pyopenvr -> pyopenxr migration (yum, 2023-08-25)
| pyopenvr is both deprecated and buggy, so switch to pyopenxr.
* Add animated ellipsis to shader (yum, 2023-08-11)
| Not yet done:
| * Animator toggle
| * OSC integration
* Bugfix: shader no longer shows up as pink [v0.14.0] (yum, 2023-08-10)
| Fix up .mat to point to correct textures/shader. Also delete templates after copying shaders.
* Fix user-reported bug in generate_shader.py (yum, 2023-08-10)
| Specify file encoding when generating shaders.
* Add show/hide animation for ray-marched custom chatbox (yum, 2023-08-10)
| * Fix mirror behavior for ray-marched chatbox
* Begin work on show/hide animations (yum, 2023-08-10)
|
* Add ray-marched custom chatbox (yum, 2023-08-09)
| * Refactor shader code to make development easier. Templates are now as small as possible.
| * Update scaling code. Use Unity scaling instead of a blendshape.
| * Check in a fuckton of shader FOSS. Mostly unused.
| * Update TaSTT.fbx. Now has 6 faces instead of 2.
* Fix issue where white boxes appear on custom chatbox [v0.13.3] (yum, 2023-08-09)
| GUI was not correctly managing .meta files, causing two textures to use the same GUID. Unity would notice and regenerate GUIDs, breaking the custom chatbox material's texture references.
* Add ability to auto-regen unity assets (yum, 2023-07-25)
| Add two buttons: start auto re-generation of Unity assets, and stop. These start/stop a thread which periodically (every 3 seconds) hashes the user-provided animator, menu and parameters. When any one of these changes, it invokes the function to generate Unity assets.
|
| The hash is non-cryptographic, so it's light. The only hit is that we have to read the entire file contents every few seconds, and compute a sum across that entire memory region. This is extremely light unless you're on a spinning platter hard drive with a small cache.
|
| Still seeing the bug where the material drops ref to the font bitmaps. Probably need to update the .mat using the guids in the bitmap .meta files.
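The polling scheme described above can be sketched roughly as follows. This is an illustration in Python, not the GUI's actual code: the class name `AssetWatcher`, the callback, and the choice of Adler-32 as the non-cryptographic checksum are all assumptions.

```python
import threading
import zlib
from pathlib import Path
from typing import Callable, Iterable, Optional, Tuple


def snapshot(paths: Iterable[str]) -> Tuple[int, ...]:
    """Read each file fully and return one cheap checksum per file."""
    return tuple(zlib.adler32(Path(p).read_bytes()) for p in paths)


class AssetWatcher:
    """Hypothetical watcher: re-checksums files on a timer, fires on change."""

    def __init__(self, paths: Iterable[str], on_change: Callable[[], None],
                 interval_s: float = 3.0) -> None:
        self._paths = list(paths)
        self._on_change = on_change
        self._interval_s = interval_s
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        self._stop.clear()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
            self._thread = None

    def _run(self) -> None:
        last = snapshot(self._paths)
        # Event.wait() doubles as an interruptible sleep: it returns True
        # as soon as stop() is called, which ends the loop.
        while not self._stop.wait(self._interval_s):
            current = snapshot(self._paths)
            if current != last:
                last = current
                self._on_change()
```

Using an event for the sleep means the stop button takes effect immediately instead of waiting out the remaining interval.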
* Subsequent calls to `Generate unity assets` don't break textures (yum, 2023-07-25)
| Avoid deleting bitmap .meta files so that once the user sets up their shader, it doesn't break.
* Unity assets can be generated at a configurable path [v0.13.2] (yum, 2023-07-24)
| Useful for projects with multiple avatars with different animators.
* Bugfix: unity panel now shows saved paths (yum, 2023-07-24)
| The paths you enter in the Unity panel (animator, menu, params, and assets folder) are saved in the app config, but were not populated correctly on app restart or pane redraw. Now they are.
* Begin work on proxy server (yum, 2023-07-03)
| Create a simple server with 3 endpoints:
| * /create_session: Create a session and return its identifier.
| * /set_transcript: Update a session's transcript.
| * /get_transcript: Fetch a session's transcript.
|
| Right now the session ID provides authentication *and* authorization. There is no public/private ID so you have to trust whoever you share your ID with. IDs are long and generated by the server, so it should be somewhat secure against low-effort hacking.
|
| Other updates:
| * Drop whisper_requirements.txt - no longer needed.
| * Vendor curl to make it easier to interact with the server.
|
| TODO:
| * Fuzz test the server.
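A rough sketch of the session model behind those three endpoints, with the HTTP layer left out. `SessionStore` and its method signatures are hypothetical; only the endpoint names and the "long, server-generated ID acts as the only credential" idea come from the message.

```python
import secrets
from typing import Dict, Optional


class SessionStore:
    """Hypothetical in-memory store; the session ID is the only credential."""

    def __init__(self) -> None:
        self._transcripts: Dict[str, str] = {}

    def create_session(self) -> str:
        # 32 random bytes, URL-safe encoded: long enough to be impractical
        # to guess, and always chosen by the server, never by the client.
        session_id = secrets.token_urlsafe(32)
        self._transcripts[session_id] = ""
        return session_id

    def set_transcript(self, session_id: str, transcript: str) -> bool:
        if session_id not in self._transcripts:
            return False  # unknown ID: refuse instead of creating a session
        self._transcripts[session_id] = transcript
        return True

    def get_transcript(self, session_id: str) -> Optional[str]:
        return self._transcripts.get(session_id)
```

Because the same ID allows both reading and writing, anyone it is shared with can also overwrite the transcript, which is the trust trade-off the message points out.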
* Add visual commit indicator to OBS browser source (yum, 2023-06-30)
| Circle goes red when speaking, grey when done. Ideally it would be in the top right portion of the browser source, but this is a good start.
|
| Also, hard-cap transcripts to 4096 chars. This prevents the STT from lagging during long sessions.
* Bugfix: commit no longer wipes out audio buffer (yum, 2023-06-28)
| Audio data is stored in chunks of frames, not in individual frames. When I commit a transcript, I want to get rid of the portion of the audio data responsible for that particular transcript. I have code that does this, but it was dropping a slice of the list assuming that each sample is stored individually.
|
| Extra fun: Because we have to decimate mic frames, we have to convert between whisper frames and mic frames to drop the correct amount of audio data.
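One way to drop the right amount of audio from a chunked buffer, sketched under assumptions: chunks are modeled as Python lists of frames, and the 48 kHz mic rate is a placeholder (only the 16 kHz Whisper rate appears elsewhere in this log). The function names are hypothetical.

```python
from typing import List, Sequence


def whisper_to_mic_frames(whisper_frames: int,
                          mic_rate_hz: int = 48000,
                          whisper_rate_hz: int = 16000) -> int:
    """Convert a frame count at Whisper's rate to the mic's native rate."""
    return whisper_frames * mic_rate_hz // whisper_rate_hz


def drop_leading_frames(chunks: Sequence[Sequence[int]],
                        n_frames: int) -> List[Sequence[int]]:
    """Drop the first n_frames frames, counting frames rather than chunks."""
    remaining = n_frames
    kept: List[Sequence[int]] = []
    for chunk in chunks:
        if remaining >= len(chunk):
            remaining -= len(chunk)        # whole chunk was committed
        elif remaining > 0:
            kept.append(chunk[remaining:])  # chunk straddles the boundary
            remaining = 0
        else:
            kept.append(chunk)
    return kept
```

The bug described above corresponds to slicing `chunks[n_frames:]` directly, which removes far too much audio when each chunk holds many frames.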
* Add profanity filter (yum, 2023-06-28)
| Add toggle to UI to enable a profanity filter. It replaces vowels in bad words with asterisks.
|
| Bugfix: filters now apply to OBS
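A small sketch of a vowel-masking filter like the one described. The word list, the function names, and the whole-word matching rule are assumptions; the message only says that vowels in bad words become asterisks.

```python
import re
from typing import Iterable


def censor_word(word: str) -> str:
    """Replace every ASCII vowel in the word with an asterisk."""
    return re.sub(r"[aeiouAEIOU]", "*", word)


def filter_profanity(text: str, bad_words: Iterable[str]) -> str:
    """Mask vowels in whole-word, case-insensitive matches of bad_words."""
    words = sorted(bad_words, key=len, reverse=True)
    if not words:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: censor_word(m.group(0)), text)
```

For example, with the placeholder list `{"darn"}`, `filter_profanity("Well, darn it", {"darn"})` yields `"Well, d*rn it"`, while a longer word that merely contains "darn" is left alone.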
* Add toggle for debug mode (yum, 2023-06-28)
| Most transcription output is now gone by default. Users can enable a more verbose output by toggling `Enable debug mode`.
|
| Bugfix: Toggling off transcription would reset audio state, frequently resulting in the loss of the last few words spoken.
* Add UI for fuzzy commit threshold (yum, 2023-06-27)
| Recap: In the STT there's an algorithm that tries to determine when a transcript is "stable" enough to commit. If that is too loose, then accuracy suffers; if too strict, then the audio buffer eventually fills.
|
| To mitigate the problem, I check whether the last N transcripts are within some edit distance (Levenshtein edit distance) of each other. The fuzzy matching lets us forgive small instabilities, like differences in uppercase/lowercase or punctuation, while rejecting large instabilities.
|
| The default value of 8 seems to be in the sweet spot of accuracy & performance, but it will likely be tuned in the future.
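The stability check can be sketched like this. The threshold default of 8 comes from the message; the window size `n`, the function names, and the choice to compare every transcript in the window against the most recent one are assumptions.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def is_stable(recent: Sequence[str], n: int = 3, threshold: int = 8) -> bool:
    """True if the last n transcripts are all within `threshold` edits
    of the newest one."""
    if len(recent) < n:
        return False
    window = recent[-n:]
    latest = window[-1]
    return all(levenshtein(latest, t) <= threshold for t in window[:-1])
```

Small differences such as `"Hello world"` versus `"hello world."` cost only a couple of edits and pass, while two unrelated transcripts exceed the threshold and block the commit.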
* Adjust commit logic to use fuzzy string match threshold (yum, 2023-06-27)
| ... instead of simple equality. TODO: add UI for threshold.
|
| Bugfix: Frame::onAppStop() joins the OBS app thread.
* Add ability to preserve transcript while using push to talk (yum, 2023-06-27)
| This is useful when streaming. Occasionally the STT can get into a bad state, and manually segmenting clears it up. However doing so would clear your accumulated transcript, which isn't always desired. Add ability to preserve the transcript.
|
| A small wrinkle: the new commit logic requires N consecutive identical windows before committing. To make this feature play nicely with it, I had to forcibly commit any preview text that hasn't yet been committed. Failing to do this would usually cause short utterances / the most recently said stuff to get wiped out.
* Limit priority of transcription process (yum, 2023-06-27)
| Seems to help reduce impact on time-sensitive apps like OBS.
* Scrub out old C++-based Whisper code (yum, 2023-06-26)
| No longer used.
* Add UI for browser src (yum, 2023-06-26)
| Add ability to toggle on/off browser src & configure port.
* Add browser source, hardcoded to port 8097 (yum, 2023-06-26)
| Transcription output now streams to localhost:8097.
|
| In OBS:
| * Create a browser source.
| * url: localhost:8097
| * width: 2200
| * height: 400
|
| TODO:
| * Put behind toggle.
| * Create input field for port.
|
| Misc cleanup:
| * transcribe.py: Drop frames from audio capture thread instead of the transcription thread. Doing it the other way would result in occasional data loss.
* Remove window duration field (yum, 2023-06-24)
| No longer needed with new commit logic (8d0add86f66db532). Assign it to 5 minutes. Assuming 4 bytes per sample @ 16 kHz, this buffer maxes out at 19.2 megabytes of memory usage.
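The 19.2 megabyte figure follows from the numbers in the message (5 minutes, 16 kHz, 4 bytes per sample), assuming a single audio channel. A quick check, with a hypothetical helper name:

```python
def buffer_bytes(minutes: int,
                 sample_rate_hz: int = 16000,
                 bytes_per_sample: int = 4) -> int:
    """Worst-case size of a mono audio buffer holding `minutes` of audio."""
    return minutes * 60 * sample_rate_hz * bytes_per_sample


if __name__ == "__main__":
    size = buffer_bytes(5)
    print(size, "bytes =", size / 1_000_000, "MB")
```

That is 300 seconds times 16,000 samples per second times 4 bytes, or 19,200,000 bytes.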
* Remove time-based venv setup (yum, 2023-06-24)
| This was slowing down app startup to an unacceptable degree. Now it just runs once ever.
|
| Add a button to the debug panel to manually re-setup venv if needed.
* Finish translation for Western European language speakers [v0.12.0] (yum, 2023-05-30)
| NLLB needs its input to be split up into sentences. I use the sentence_splitter Python package to do this. It supports ~20 Western European languages, but notably, no Asian languages.
|
| * Sort spoken language list. English is still at the top.
| * Remove 'Translation source' dropdown. Infer this from the spoken language.
| * Add lang_compat.py to map language codes between the various libraries (whisper, nllb, sentence_splitter).
| * Fix bug where old text would appear in textbox when you first bring it up.
* Add ability to translate into 200 languages (yum, 2023-05-25)
| Use Meta's No Language Left Behind (NLLB) algorithm to provide translation capabilities into 200 languages. Obviously most are very untested.
|
| This requires either 4.1 or 7.1 GB of RAM and significantly increases transcription latency.
* Add more text filters (yum, 2023-05-24)
| Add 3 filters:
| * Remove trailing period
| * Convert to uppercase
| * Convert to lowercase
|
| All may be composed. Upper/lower just overwrite each other so just use one.
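A sketch of how the three filters might compose. The function name, the flag names, and the order in which the filters run are assumptions; in this sketch lowercase is applied last, so it wins if both case filters are enabled.

```python
def apply_filters(text: str,
                  remove_trailing_period: bool = False,
                  uppercase: bool = False,
                  lowercase: bool = False) -> str:
    """Apply the enabled text filters in a fixed (assumed) order."""
    if remove_trailing_period and text.endswith("."):
        text = text[:-1]
    if uppercase:
        text = text.upper()
    if lowercase:
        text = text.lower()  # overwrites the uppercase result if both are set
    return text
```

For instance, `apply_filters("Hello there.", remove_trailing_period=True, uppercase=True)` gives `"HELLO THERE"`.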
* All transcription panel fields now persist across app restart (yum, 2023-05-24)
| I forgor to put them into ApplyConfigToInputFields.
|
| The reason this is necessary: we need to create the text field where we log things before we can deserialize the config. To keep the code structure "clean" I just wrote another function to apply the config (ApplyConfigToInputFields). However I have to remember to update it when I add new fields.
* Add UI toggle for uwu filter (yum, 2023-05-24)
| UI now has a checkbox for the uwu filter. Does not materially affect resource usage or latency when enabled.
* Begin work on uwu filter (yum, 2023-05-24)
| Use UwwwuPP to translate your boring old speech into an uwu-ified version. Still need to add a UI toggle for this.
* Automatically set up virtual env (yum, 2023-05-23)
| Remove the button. This is a big source of confusion for new users. Now it happens automatically upon starting any task that needs it.
|
| * Begin removing CPP implementation of Whisper. faster-whisper is a much easier/better solution.
| * Flip default of `clear OSC configs` from false to true.
* Add keyboard toggle [v0.11.4] (yum, 2023-05-22)
| Users can now configure a keybind to start/stop/dismiss the STT when in desktop mode. The default keybind is ctrl+x, since by default VRC doesn't use 'x' for anything.
* Enable selecting specific GPU when transcribing (yum, 2023-05-21)
| Useful on devices with multiple GPUs, such as gaming laptops.
|
| * Update GUI/README.md.
* Restore string matching, remove affinity mask [v0.11.1] (yum, 2023-04-25)
| Affinity mask no longer affects performance. String matching is still needed for temporal stability in fast-paced long-form transcription tasks.
* Fix custom chatbox zwrite/depth (yum, 2023-04-25)
| Depth was being calculated incorrectly, causing the text box to render behind objects it's in front of.
|
| * Fix package.ps1 compression. 7z was increasing file size, somehow.
* ~Finish integrating faster-whisper (yum, 2023-04-24)
| I'm able to use the new code to show text in game. Not yet play-tested.
* package.ps1 always regenerates Python/ [v0.10.1] (yum, 2023-03-28)
| Intended to avoid accidentally releasing dirty environments.
* Fix virtual env reset (yum, 2023-03-28)
| Use `pip freeze` and `pip uninstall` to reset the venv to a near-default state. Filter out `future` since we need to vendor it. If it ever gets removed, the installation is borked.
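A sketch of the freeze-then-uninstall reset, written in Python for illustration. The function names and the parsing rules are assumptions; `pip freeze` and `pip uninstall -y` are the real pip commands, and `future` is kept as the message describes. Editable (`-e`) entries are skipped here for simplicity.

```python
import re
import subprocess
from typing import Iterable, List


def packages_to_uninstall(freeze_output: str,
                          keep: Iterable[str] = ("future",)) -> List[str]:
    """Extract package names from `pip freeze` output, minus the kept ones."""
    keep_lower = {k.lower() for k in keep}
    names: List[str] = []
    for line in freeze_output.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("-e "):
            continue
        # Handles both "name==1.2.3" and "name @ file:///..." entries.
        name = re.split(r"==|\s*@\s*", line, maxsplit=1)[0]
        if name.lower() not in keep_lower:
            names.append(name)
    return names


def reset_venv(python_exe: str) -> None:
    """Uninstall everything reported by `pip freeze` except kept packages."""
    frozen = subprocess.run([python_exe, "-m", "pip", "freeze"],
                            check=True, capture_output=True, text=True).stdout
    names = packages_to_uninstall(frozen)
    if names:
        subprocess.run([python_exe, "-m", "pip", "uninstall", "-y", *names],
                       check=True)
```

Here `python_exe` would be the interpreter inside the venv being reset; running it against a system interpreter would remove that interpreter's packages instead.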