TaSTT.git/Scripts/transcribe.py, branch master

Free self-hosted STT for VRChat.
https://git.yummers.dev/TaSTT.git/atom?h=master
Updated: 2023-09-10T21:52:05+00:00

Check in vad.py and delete transcribe.py
Updated: 2023-09-10T21:52:05+00:00
Author: yum <yum.food.vr@gmail.com>
Published: 2023-09-10T21:52:05+00:00
ID: urn:sha1:1681ac276da46ea61a04f6db916522778ac964e7

Oops, I meant to check this in a while back. Since transcribe_v2.py now has feature parity with transcribe.py, delete the old code.

Switch to VadCommitter
Updated: 2023-09-08T05:04:16+00:00
Author: yum <yum.food.vr@gmail.com>
Published: 2023-09-08T05:04:16+00:00
ID: urn:sha1:a82e43c16ff097a7c57ee87e67fa67e7f007b977

FuzzyRepeatCommitter was approximating this behavior in the best-performing configuration, so switch to it in earnest. This committer simply commits audio once we detect a long enough gap in speech. That's it!

Fix reference to deprecated symbol
Updated: 2023-09-01T20:41:34+00:00
Author: yum <yum.food.vr@gmail.com>
Published: 2023-09-01T20:41:34+00:00
ID: urn:sha1:c0c53fc3f0aeb762d44ce43f123385b2c87869ca

Add Unity panel toggle for phonemes (in-game audio indicator)
Updated: 2023-09-01T07:07:06+00:00
Author: yum <yum.food.vr@gmail.com>
Published: 2023-09-01T07:06:20+00:00
ID: urn:sha1:cb44e4744ac82d1d35547d12254cfea09dc63fae

If not set, the prefab will have its audio sources removed.

transcribe.py now just reads from config file
Updated: 2023-09-01T06:06:33+00:00
Author: yum <yum.food.vr@gmail.com>
Published: 2023-09-01T06:06:33+00:00
ID: urn:sha1:45f983b2f2670a79cbb83d2c8944d922015291be

Duplicating config between args and config is a huge pain in the ass to maintain. Now we just launch using the config generated by the UI. ezpz.

Bugfixes and tweaks
Updated: 2023-09-01T00:17:01+00:00
Author: yum <yum.food.vr@gmail.com>
Published: 2023-09-01T00:11:11+00:00
ID: urn:sha1:3db4f81573d89f6ebefb5ec119c7d66affc1a4a0

* Temporarily restore normal process priority. Working on adding a UI option to set STT prio.
* Give audio indicator phonemes a 1/3 chance to do nothing. Makes result sound a little better imo.
* Quiet down SteamVR thread when SteamVR isn't running
* Fix use of `button_id` and `hand_id` in steamvr.py
* Increase amount of silence allowed before transcript from 1 to 5 seconds. You want enough buffer to allow for a few full transcripts, else you risk spuriously dropping audio.
* Enable background loading in audio metadata (required by vrc sdk)

Deprecate commit similarity threshold
Updated: 2023-08-31T00:45:53+00:00
Author: yum <yum.food.vr@gmail.com>
Published: 2023-08-31T00:45:53+00:00
ID: urn:sha1:4fcf3e1e3ac8dcf510be96a84b81a688b1092869

This is now dynamically set inside transcribe.py. As the buffer grows long, the threshold grows exponentially, keeping the buffer short. The threshold starts small so that transcription starts strict (accurate, slow) and gets looser (inaccurate, fast) as needed.

Switch back to openvr
Updated: 2023-08-29T03:09:35+00:00
Author: yum <yum.food.vr@gmail.com>
Published: 2023-08-29T03:09:35+00:00
ID: urn:sha1:2daa2c8057cf036357a64e09925487e6f5c0025e

openxr doesn't have any notion of background process, making it unusable trash :)

Put audio feedback into its own thread
Updated: 2023-08-25T19:50:59+00:00
Author: yum <yum.food.vr@gmail.com>
Published: 2023-08-25T19:50:59+00:00
ID: urn:sha1:302f7ba09f2ee115d0ee4b8f0841f6ffcd50ec57

I think this improves the code structure of the controller input thread and leads to some deduplication, so I'm going to keep it. However, the intended purpose was to decrease lag when pressing buttons, and in that regard it failed. The lag goes all the way down to the input layer, implying that the input thread is not able to consistently run at its intended 100 Hz sample rate. I suspect that the Python global interpreter lock (GIL) is at fault.

Since we can't realistically move all our functionality into one thread in a non-blocking model, I think multiprocessing is the logical choice going forward. Each thread in transcribe.py would become its own process, and the processes would pub/sub through some intermediary process sitting in the middle.
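
The multiprocessing layout proposed in the commit message above could look roughly like the sketch below. This is an illustration only, not code from the TaSTT repository: the names `broker`, `publisher`, `run_demo`, the `STOP` sentinel and the "controller" topic are all hypothetical, and the sketch assumes each subscriber queue listens to exactly one topic.

```python
import multiprocessing as mp

STOP = None  # hypothetical sentinel: tells the broker and subscribers to shut down


def broker(inbox, subscribers):
    """Intermediary process: forward (topic, payload) messages to subscribers.

    `subscribers` maps a topic name to a list of queues. Any object with
    blocking get() and put() methods works, not only multiprocessing queues.
    """
    while True:
        msg = inbox.get()
        if msg is STOP:
            break
        topic, payload = msg
        for q in subscribers.get(topic, []):
            q.put(payload)
    # Pass the shutdown signal on so subscribers can exit their loops.
    for queues in subscribers.values():
        for q in queues:
            q.put(STOP)


def publisher(inbox):
    """Stand-in for a controller input process publishing button events."""
    for i in range(3):
        inbox.put(("controller", f"press {i}"))
    inbox.put(STOP)


def run_demo():
    """Start a broker and a publisher, and subscribe from the calling process."""
    inbox = mp.Queue()
    feedback_q = mp.Queue()  # e.g. consumed by an audio feedback process
    subscribers = {"controller": [feedback_q]}

    procs = [
        mp.Process(target=broker, args=(inbox, subscribers)),
        mp.Process(target=publisher, args=(inbox,)),
    ]
    for p in procs:
        p.start()

    received = []
    while True:
        item = feedback_q.get()
        if item is STOP:
            break
        received.append(item)

    for p in procs:
        p.join()
    return received


if __name__ == "__main__":
    # The guard is required on platforms that spawn child processes (Windows).
    print(run_demo())
```

Because each stage runs in its own process with its own interpreter, a slow subscriber no longer competes with the input loop for the GIL; the cost is that every message is pickled and copied between processes.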

Finish pyopenvr -> pyopenxr migration
Updated: 2023-08-25T19:08:07+00:00
Author: yum <yum.food.vr@gmail.com>
Published: 2023-08-25T15:21:56+00:00
ID: urn:sha1:9e43487c1bf62402e96cb6139b24cd8446515673

pyopenvr is both deprecated and buggy, so switch to pyopenxr.