TaSTT.git/GUI, branch v0.13.0

Free self-hosted STT for VRChat.
Feed: https://git.yummers.dev/TaSTT.git/atom?h=v0.13.0
Updated: 2023-06-29T05:18:18+00:00

Bugfix: commit no longer wipes out audio buffer
Author: yum <yum.food.vr@gmail.com>
Published: 2023-06-29T05:11:46+00:00
Updated: 2023-06-29T05:18:18+00:00
Id: urn:sha1:b1efbf5ce1ebd584796d4a57cf9c7b6517f91fac

Audio data is stored in chunks of frames, not in individual frames. When I commit a transcript, I want to get rid of the portion of the audio data responsible for that particular transcript. I have code that does this, but it was dropping a slice of the list assuming that each sample is stored individually.

Extra fun: because we have to decimate mic frames, we have to convert between whisper frames and mic frames to drop the correct amount of audio data.

Add profanity filter
Author: yum <yum.food.vr@gmail.com>
Published: 2023-06-29T04:24:56+00:00
Updated: 2023-06-29T04:24:56+00:00
Id: urn:sha1:bdaeb1911297d7901a12e3ac51b38c3463789279

Add toggle to UI to enable a profanity filter. It replaces vowels in bad words with asterisks.

Bugfix: filters now apply to OBS.

Add toggle for debug mode
Author: yum <yum.food.vr@gmail.com>
Published: 2023-06-29T03:35:10+00:00
Updated: 2023-06-29T03:35:10+00:00
Id: urn:sha1:ff7eb3c212195af71cd0ce4a3cd0c9a081d6ebda

Most transcription output is now gone by default. Users can enable a more verbose output by toggling `Enable debug mode`.

Bugfix: Toggling off transcription would reset audio state, frequently resulting in the loss of the last few words spoken.

Add UI for fuzzy commit threshold
Author: yum <yum.food.vr@gmail.com>
Published: 2023-06-27T23:01:16+00:00
Updated: 2023-06-27T23:01:16+00:00
Id: urn:sha1:6638993e313773ba6ca8bdb6d7690b798d41f0d4

Recap: In the STT there's an algorithm that tries to determine when a transcript is "stable" enough to commit. If that is too loose, then accuracy suffers; if too strict, then the audio buffer eventually fills. To mitigate the problem, I check whether the last N transcripts are within some edit distance (Levenshtein edit distance) of each other.
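The stability check described in this commit can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in TaSTT: `levenshtein` is the textbook dynamic-programming edit distance, while `StabilityChecker`, its `update` method, and the window size `n=3` are hypothetical. The sketch also assumes that "within some edit distance of each other" means every pair of transcripts in the window is within the threshold; the real implementation may compare only consecutive transcripts.

```python
from collections import deque


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


class StabilityChecker:
    """Hypothetical helper: a transcript counts as "stable" once the last
    n transcripts are all within `threshold` edits of each other."""

    def __init__(self, n: int = 3, threshold: int = 8):
        self.threshold = threshold
        self.history = deque(maxlen=n)  # oldest transcripts fall off

    def update(self, transcript: str) -> bool:
        """Record the newest transcript; return True if it is safe to commit."""
        self.history.append(transcript)
        if len(self.history) < self.history.maxlen:
            return False  # not enough history yet
        window = list(self.history)
        return all(
            levenshtein(x, y) <= self.threshold
            for i, x in enumerate(window)
            for y in window[i + 1:]
        )
```

With a threshold of 8, a window like `"hello there"`, `"Hello there."`, `"hello there"` passes (case and punctuation cost only a couple of edits), while transcripts that diverge by more words keep the check returning False until they settle.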
The fuzzy matching lets us forgive small instabilities, like differences in uppercase/lowercase or punctuation, while rejecting large instabilities. The default value of 8 seems to be in the sweet spot of accuracy & performance, but it will likely be tuned in the future.

Adjust commit logic to use fuzzy string match threshold
Author: yum <yum.food.vr@gmail.com>
Published: 2023-06-27T22:35:30+00:00
Updated: 2023-06-27T22:38:42+00:00
Id: urn:sha1:241813a5af11093c6b86e70ada729788c1f0dee6

... instead of simple equality.

TODO: add UI for threshold.

Bugfix: Frame::onAppStop() joins the OBS app thread.

Add ability to preserve transcript while using push to talk
Author: yum <yum.food.vr@gmail.com>
Published: 2023-06-27T22:16:06+00:00
Updated: 2023-06-27T22:16:06+00:00
Id: urn:sha1:cf75998dab6db1b1d21ca06bde18a56b5e896937

This is useful when streaming. Occasionally the STT can get into a bad state, and manually segmenting clears it up. However, doing so would clear your accumulated transcript, which isn't always desired. Add ability to preserve the transcript.

A small wrinkle: the new commit logic requires N consecutive identical windows before committing. To make this feature play nicely with it, I had to forcibly commit any preview text that hasn't yet been committed. Failing to do this would usually cause short utterances or the most recently spoken words to get wiped out.

Limit priority of transcription process
Author: yum <yum.food.vr@gmail.com>
Published: 2023-06-27T08:04:44+00:00
Updated: 2023-06-27T08:04:44+00:00
Id: urn:sha1:40e0202f5954c475c9c48155b95bc4dc67433242

Seems to help reduce impact on time-sensitive apps like OBS.

Scrub out old C++-based Whisper code
Author: yum <yum.food.vr@gmail.com>
Published: 2023-06-27T00:21:59+00:00
Updated: 2023-06-27T00:21:59+00:00
Id: urn:sha1:694756a96a6109cd79a77221dd4e40638ff55b82

No longer used.

Add UI for browser src
Author: yum <yum.food.vr@gmail.com>
Published: 2023-06-27T00:12:41+00:00
Updated: 2023-06-27T00:13:31+00:00
Id: urn:sha1:011cfdd4bab866a64b06406ceaa7563294af9225

Add ability to toggle on/off browser src & configure port.
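A toggleable browser source on a configurable port can be approximated with Python's standard-library `http.server`. This is a hypothetical sketch, not TaSTT's implementation: `TranscriptState`, `make_handler`, `start_browser_source`, and its `enabled`/`port` parameters are illustrative stand-ins for the UI toggle and port field, and the default of 8097 is simply the port the browser source was first hardcoded to. It answers each GET with a plain-text snapshot of the latest transcript; a real OBS overlay would more likely serve an HTML page that polls or streams updates.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class TranscriptState:
    """Hypothetical thread-safe holder for the latest transcript text."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._text = ""

    def set(self, text: str) -> None:
        with self._lock:
            self._text = text

    def get(self) -> str:
        with self._lock:
            return self._text


def make_handler(state: TranscriptState):
    """Build a request handler class bound to `state`."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            # Return a plain-text snapshot of the current transcript.
            body = state.get().encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args) -> None:
            pass  # silence per-request logging

    return Handler


def start_browser_source(state: TranscriptState, enabled: bool = True,
                         port: int = 8097):
    """Serve `state` on localhost:`port` from a daemon thread.

    Returns the server so the caller can shut it down, or None when the
    (hypothetical) UI toggle is off.
    """
    if not enabled:
        return None
    server = ThreadingHTTPServer(("localhost", port), make_handler(state))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

To stop the server, call `server.shutdown()` and then `server.server_close()`. Passing `port=0` lets the OS pick a free port, which is handy for testing.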
Add browser source, hardcoded to port 8097
Author: yum <yum.food.vr@gmail.com>
Published: 2023-06-26T07:58:58+00:00
Updated: 2023-06-26T08:46:42+00:00
Id: urn:sha1:0ed379f2c99ac5c126a6f101965ef1eaa58c017b

Transcription output now streams to localhost:8097. In OBS:
* Create a browser source.
* url: localhost:8097
* width: 2200
* height: 400

TODO:
* Put behind toggle.
* Create input field for port.

Misc cleanup:
* transcribe.py: Drop frames from audio capture thread instead of the transcription thread. Doing it the other way would result in occasional data loss.
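The idea of dropping frames in the capture thread can be sketched as a bounded buffer in which only the producer ever discards audio. This is a hypothetical illustration, not the code in transcribe.py: `AudioBuffer`, `push`, `drain`, and the evict-oldest policy are assumptions. The capture thread calls `push`, which evicts the oldest chunk when the buffer is full; the transcription thread calls `drain`, which takes everything queued and discards nothing.

```python
import queue


class AudioBuffer:
    """Hypothetical bounded buffer between a mic-capture thread (producer)
    and the transcription thread (consumer). Only the producer drops data."""

    def __init__(self, max_chunks: int):
        self._q = queue.Queue(maxsize=max_chunks)
        self.dropped = 0  # only touched by the producer thread

    def push(self, chunk) -> None:
        """Called from the capture thread. If full, evict the oldest chunk first."""
        while True:
            try:
                self._q.put_nowait(chunk)
                return
            except queue.Full:
                try:
                    self._q.get_nowait()  # drop the oldest chunk
                    self.dropped += 1
                except queue.Empty:
                    pass  # consumer drained it meanwhile; just retry the put

    def drain(self) -> list:
        """Called from the transcription thread: take everything, drop nothing."""
        chunks = []
        while True:
            try:
                chunks.append(self._q.get_nowait())
            except queue.Empty:
                return chunks
```

Keeping every discard decision on the producer side means the consumer never throws away audio it has already dequeued; it only ever sees chunks the capture thread chose to keep.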