path: root/GUI
Commit message  Author  Age
* Delete unused files (tag: v1.0.0-beta00)  yum  2025-07-23
|
* App sends text to built-in chatbox by default (tag: v0.21.0)  yum  2024-11-16
| This is overwhelmingly more common than custom chatbox.
* Remove flash_attention toggle  yum  2024-11-16
| Deprecated in the Python release of CTranslate2 as of 4.4.0:
| https://github.com/OpenNMT/CTranslate2/blob/master/CHANGELOG.md#v440-2024-09-09
* Add support for whisper large v3 turbo  yum  2024-11-16
| Also:
| * Double # of audio device slots
| * Fetch CuDNN from NVIDIA at runtime instead of vendoring
* Support as few as 1 char per sync in custom chatbox  yum  2024-07-30
|
* Translation shows original language by default  yum  2024-07-12
| * Add checkbox to disable this feature if so desired.
| * Delete old optimization code; can get it back from git if needed.
| * Enforce that there's at least one space character ' ' between committed segments.
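The space-enforcement rule in the last bullet could look roughly like the sketch below. The function name `join_segments` is hypothetical and is not taken from the repository; it only illustrates the idea of guaranteeing at least one space between committed segments.

```python
def join_segments(committed: str, segment: str) -> str:
    """Append `segment` to `committed`, guaranteeing at least one space between them.

    Hypothetical helper for illustration; the real code may differ.
    """
    if not committed:
        return segment
    if not segment:
        return committed
    # A space already separates the two pieces: concatenate as-is.
    if committed.endswith(" ") or segment.startswith(" "):
        return committed + segment
    # Otherwise insert exactly one space.
    return committed + " " + segment
```

This sketch never removes whitespace that is already present; it only adds a space where none exists.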
* `use_flash_attention` checkbox now persists across sessions  yum  2024-07-12
|
* Upgrade vendored CUDA to 12.5 (tag: v0.19.2)  yum  2024-06-09
|
* Bump CUDNN to v8.9.7 (tag: v0.19.1)  yum  2024-06-09
| Also disable flash-attention when CPU mode is selected
* Add checkbox for flash-attention  yum  2024-06-09
| Pre-3000 series GPUs don't support it. Oops!
* Update defaults to work with modular prefab (tag: v0.19.0)  yum  2024-06-06
| There's a modular avatar prefab for the custom chatbox on my gumroad.
| Update the default settings to work with that prefab.
* Upgrade faster-whisper with flash-attention2  yum  2024-06-05
| This should be significantly more efficient than prior versions.
| * add large-v3 & distilled variant
| * simplify model acquisition code now that distilled models are part of faster-whisper.
* Fix distilled models  yum  2024-03-14
| These were broken due to some logic errors in the codepath which acquires models from huggingface.
| Distilled large-v2 seems promising as a new default model.
* Finish fixing build break  yum  2024-03-04
| CUDNN now pulls from dropbox instead of google drive. This has the added
| benefit of being about 10-20x faster (assuming you have fast internet).
* Begin fixing build on new hosts  yum  2024-03-04
| Google drive intentionally broke CLI downloads ("don't be evil") and
| UwwwuPP went away. Begin work rehosting both files.
* Add dropdown for GPU compute type (tag: v0.18.0)  yum  2024-02-09
| Should enable compatibility with older GPUs.
* Add another threshold to filter out common hallucinations  yum  2024-02-05
| The paper recommends filtering out segments with no_speech_prob > 0.6 and
| avg_logprob < -1. This is too loose of a bound for short-form audio which
| is not guaranteed to contain speech.
|
| I already have a tighter bound:
|   no_speech > 0.6 and avg_logprob < -0.5
|
| While listening to instrumental music I find that a lot of hallucinations
| sneak past that bound. So I added a second bound:
|   no_speech > 0.15 and avg_logprob < -0.7
|
| Basically we filter out things that look like speech but have a worse
| avg_logprob. Seems to not have false negatives. Requires testing.
|
| Also: dial back the default max segment length from 15 seconds to 10
| seconds. This is done based on performance observations in desktop.
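The two bounds described in this commit can be expressed as a small predicate. This is a minimal sketch: the threshold values come from the commit message, while the function name `looks_like_hallucination` and the way the bounds are stored are assumptions made for illustration.

```python
# (no_speech_prob lower bound, avg_logprob upper bound) pairs from the commit message.
HALLUCINATION_BOUNDS = [
    (0.6, -0.5),   # original, tighter-than-paper bound
    (0.15, -0.7),  # second bound added in this commit
]


def looks_like_hallucination(no_speech_prob: float, avg_logprob: float) -> bool:
    """Return True if a segment trips either bound and should be dropped.

    Hypothetical helper; the field names mirror Whisper-style segment metadata.
    """
    return any(
        no_speech_prob > min_no_speech and avg_logprob < max_logprob
        for min_no_speech, max_logprob in HALLUCINATION_BOUNDS
    )
```

A segment is dropped only when both conditions of at least one pair hold, so a segment with a low `avg_logprob` but a very low `no_speech_prob` still passes.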
* Verify that audio is clean after VAD segmentation  yum  2024-02-05
| Indeed it is. Bumped up the default max segment length to decrease error.
|
| Also add mic presets for beyond (the vr headset) and motu (my mic interface).
* Add distilled whisper large-v2 model  yum  2023-12-08
|
* Add distilled whisper-medium model  yum  2023-11-07
| I converted distil-whisper-medium.en to CTranslate2 format and uploaded it
| to huggingface. This model is exceptionally fast and light compared to the
| non-distilled version, at the cost of some accuracy.
* Transcripts preceding long pauses now drop (tag: v0.16.0)  yum  2023-10-05
| When hot-miking into the built-in chatbox, there are sometimes long pauses
| in conversation. After these pauses, it's undesirable to show the
| transcript generated before the pause. This feature makes it so that those
| transcripts can be dropped.
|
| Also:
| * Limit number of segments sent to browser source to 10. Allow this to
|   grow up to 10 segments before dropping the first 5 segments.
| * Silence warnings generated by `install_in_venv`, used by e.g.
|   translation codepath.
| * Enable audio normalization to improve accuracy when speaking softly, at
|   the cost of some accuracy when speaking normally. Credit: user endo0269
|   on Discord suggested this feature.
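The browser-source segment limit in the first "Also" bullet might be sketched as follows. The limits (10 and 5) are from the commit message; the function name `push_segment` and the choice to trim only once the list exceeds 10 entries are assumptions.

```python
MAX_SEGMENTS = 10  # from the commit message
DROP_COUNT = 5     # from the commit message


def push_segment(segments: list[str], new_segment: str) -> list[str]:
    """Return a new list with `new_segment` appended, trimmed in batches.

    Assumption: trimming happens when the list would exceed MAX_SEGMENTS,
    at which point the oldest DROP_COUNT segments are dropped at once.
    """
    updated = segments + [new_segment]
    if len(updated) > MAX_SEGMENTS:
        updated = updated[DROP_COUNT:]
    return updated
```

Dropping in batches rather than one segment at a time means the visible text changes abruptly less often, which is one plausible reason for this design.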
* Reimplement BrowserSource as a StreamingPlugin  yum  2023-09-18
| BrowserSource now fades text out continuously over time.
|
| TODO
| * Delete C++ webserver, browsersource, transcript code
| * Add UI for text age fading
* Add UI for process priority  yum  2023-09-17
| Default is normal prio.
* Bugfixes (tag: v0.15.4)  yum  2023-09-16
| * uwu filter no longer adds extra whitespace before/after segments. This
|   would defeat commit logic.
| * disabling phonemes works again - path to prefab was being quoted twice,
|   breaking the codepath.
* General cleanup (tag: v0.15.3)  yum  2023-09-13
| Remove unused proxy code, curl, and images.
* Pin huggingface_hub to 0.16.4 (tag: v0.15.2)  yum  2023-09-11
| 0.17.x are breaking faster_whisper's ability to download models.
|
| Also:
| * Start using frozen requirements.txt.
| * Conditionally install torch & legacy whisper only when doing mechanical
|   optimization.
* Users can now choose custom chatbox texture size in UI  yum  2023-09-10
|
* Bugfix: only cap display of transcript at 4K chars  yum  2023-09-10
| Actually retain the whole transcript to avoid breaking the OSC pager.
|
| Also constrain the UI buffer size by characters instead of lines. Since
| some lines can be massive and others short, characters are a better way of
| consistently keeping the UI memory in check.
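The distinction between capping the display and capping the stored transcript can be illustrated with a short sketch. The name `display_view` and the exact cap of 4000 characters (rather than, say, 4096) are assumptions; the commit only says "4K chars".

```python
DISPLAY_CAP_CHARS = 4000  # "4K chars"; the exact value is an assumption


def display_view(transcript: str, cap: int = DISPLAY_CAP_CHARS) -> str:
    """Return the most recent `cap` characters for display.

    The full transcript is left untouched so that other consumers
    (such as a pager) still see all of it.
    """
    if cap <= 0:
        return ""
    return transcript[-cap:]
```

Because the cap is counted in characters, one very long line cannot blow past the limit the way a line-count cap would allow.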
* Add UI for transcription loop delay  yum  2023-09-10
| Allows users to directly modulate the performance-latency tradeoff.
|
| Also:
| * Bump up UI buffer to 1k lines.
| * Fix browser source reset. It now also resets preview text.
* Browser source now shows preview text as slightly transparent  yum  2023-09-09
| Improves viewer experience.
* Add UI for max speech duration  yum  2023-09-09
| Also fix bug when not using previews. Audio buffer no longer grows without
| bound while there's no speech.
* Constrain log file, UI text field, and transcript sizes  yum  2023-09-09
| Log file is constrained to 1 MB and UI to 100-200 lines. 1k lines is too
| high to keep the UI from lagging. Transcript is constrained to 4k
| characters.
|
| Also put a 5 ms sleep in the transcription hot path.
* Constrain UI text buffers to 1000 lines  yum  2023-09-09
| This keeps memory usage from growing without bound.
* Make min silence duration configurable in UI  yum  2023-09-09
|
* Add `lock at spawn` option  yum  2023-09-09
| I find it kind of annoying when people wave around a big chatbox so I
| added the option to have the chatbox be locked in worldspace whenever it's
| visible. This defaults to on and can be disabled.
* Bugfix: fix process leak in PythonWrapper::InvokeCommandWithArgs  yum  2023-09-09
| It now waits up to 10 seconds for a graceful exit and falls back on the
| equivalent of a SIGKILL. The caller is assumed to have signaled to the
| process through `in_cb` that an exit is desired.
|
| Also:
| * Fix graceful exit path of transcribe_v2.py.
| * Add toggle to enable/disable preview text. It is enabled by default.
| * Constrain transcription temperature to 0.0. This keeps latency more
|   predictable at the cost of some accuracy.
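The wait-then-kill pattern described above can be sketched in Python with the standard `subprocess` module. This is an analogue for illustration only: the actual fix lives in the C++ `PythonWrapper::InvokeCommandWithArgs`, and the helper name `stop_process` is hypothetical. As in the commit, the caller is assumed to have already asked the child to exit.

```python
import subprocess


def stop_process(proc: subprocess.Popen, timeout_s: float = 10.0) -> int:
    """Wait up to `timeout_s` for a graceful exit, then force-kill.

    Returns the child's exit code. Always reaps the child, so no
    zombie or leaked process is left behind.
    """
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()         # SIGKILL on POSIX, TerminateProcess on Windows
        return proc.wait()  # reap the killed child
```

The second `wait()` after `kill()` matters: without it the process is terminated but never reaped, which is one way a process leak can occur.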
* Switch to VadCommitter  yum  2023-09-07
| FuzzyRepeatCommitter was approximating this behavior in the
| best-performing configuration, so switch to it in earnest.
|
| This committer simply commits audio once we detect a long enough gap in
| speech. That's it!
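The "commit on a long enough gap" idea can be sketched over a list of per-frame voice-activity flags. This is not the repository's `VadCommitter`; the function name, the frame-based representation, and the `min_silence_frames` parameter are all assumptions made for illustration.

```python
def split_on_silence(
    is_speech: list[bool], min_silence_frames: int
) -> list[tuple[int, int]]:
    """Return [start, end) frame ranges committed at each long-enough gap.

    Speech still pending at the end of the input is not committed,
    since no closing gap has been observed yet.
    """
    commits = []
    start = None        # first speech frame of the pending utterance
    last_speech = None  # most recent speech frame seen
    silence = 0         # consecutive non-speech frames since last speech
    for i, speech in enumerate(is_speech):
        if speech:
            if start is None:
                start = i
            last_speech = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_silence_frames:
                commits.append((start, last_speech + 1))
                start = None
                silence = 0
    return commits
```

Short pauses inside an utterance (fewer than `min_silence_frames` frames) do not trigger a commit, so a brief hesitation stays within one segment.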
* Wire transcribe_v2.py into GUI  yum  2023-09-03
| Also:
| * Enable SO_REUSEADDR on browser src socket
| * Temporarily add evaluation dependencies to requirements.txt
| * Fix browser src. It's now looking for a prefix that the python app
|   actually uses.
* Bugfix: app no longer hangs if closed while transcribing  yum  2023-09-01
| Fix how OnExit callback is wired into GUI. Also make it exit Unity
| process, if that's going on.
* Various cleanup  yum  2023-09-01
|
* Add `Enable phonemes` toggle to radial menu  yum  2023-09-01
| Also:
| * Fully scrub AudioSource references from prefab when not using phonemes.
| * Disable net sync on phoneme params when not using them. When not synced,
|   they don't count against the total memory limit.
| * Use config file in generate_params.py
* Add Unity panel toggle for phonemes (in-game audio indicator)  yum  2023-09-01
| If not set, the prefab will have its audio sources removed.
* libtastt.py now uses config file where appropriate  yum  2023-08-31
|
* transcribe.py now just reads from config file  yum  2023-08-31
| Duplicating config between args and config is a huge pain in the ass to
| maintain. Now we just launch using the config generated by the UI. ezpz.
* Clean up UI stoi pattern  yum  2023-08-31
| wxWidgets encodes text inputs & multiple-choice inputs as strings. I
| frequently have to convert these into ints & apply a range check.
| Encapsulate that in a function and use a shitty little ASSIGN_OR_RETURN
| macro to make the parsing as concise as possible.
|
| Also delete unused WhisperCPP config settings.
* Bugfixes and tweaks  yum  2023-08-31
| * Temporarily restore normal process priority. Working on adding a UI
|   option to set STT prio.
| * Give audio indicator phonemes a 1/3 chance to do nothing. Makes result
|   sound a little better imo.
| * Quiet down steamVR thread when steamVR isn't running
| * Fix use of `button_id` and `hand_id` in steamvr.py
| * Increase amount of silence allowed before transcript from 1 to 5
|   seconds. You want enough buffer to allow for a few full transcripts,
|   else you risk spuriously dropping audio.
| * Enable background loading in audio metadata (required by vrc sdk)
* Deprecate commit similarity threshold  yum  2023-08-30
| This is now dynamically set inside transcribe.py. As the buffer grows
| long, the threshold grows exponentially, keeping the buffer short. The
| threshold starts small so that transcription starts strict (accurate,
| slow) and gets looser (inaccurate, fast) as needed.
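One way to picture a threshold that grows exponentially with buffer length is the sketch below. Everything numeric here is hypothetical: the commit does not give the base value, growth rate, or cap, and the function name `commit_threshold` is invented for illustration.

```python
def commit_threshold(
    buffer_seconds: float,
    base: float = 0.01,             # hypothetical starting (strict) threshold
    doubling_seconds: float = 5.0,  # hypothetical: threshold doubles every 5 s
    cap: float = 1.0,               # hypothetical upper limit
) -> float:
    """Return a threshold that starts small and grows exponentially.

    A small threshold means strict commits (accurate, slow); a large one
    means loose commits (inaccurate, fast), which drains a long buffer.
    """
    return min(cap, base * 2.0 ** (buffer_seconds / doubling_seconds))
```

With these made-up defaults, the threshold doubles for every additional 5 seconds of buffered audio until it reaches the cap.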
* Fix in-game audio indicator  yum  2023-08-29
| Also fix prefab default size (no longer colossal).
|
| TODO
| * Add runtime & unity-time toggles
* Finish pyopenvr -> pyopenxr migration  yum  2023-08-25
| pyopenvr is both deprecated and buggy, so switch to pyopenxr.
* Add animated ellipsis to shader  yum  2023-08-11
| Not yet done:
| * Animator toggle
| * OSC integration