TaSTT.git - Free self-hosted STT for VRChat.

	Commit message	Author	Age
*	Delete unused files [v1.0.0-beta00]	yum	2025-07-23
*	Add support for whisper large v3 turbo	yum	2024-11-16
	Also:
	* Double # of audio device slots
	* Fetch CuDNN from NVIDIA at runtime instead of vendoring
*	Upgrade faster-whisper with flash-attention2	yum	2024-06-05
	This should be significantly more efficient than prior versions.
	* Add large-v3 & distilled variant
	* Simplify model acquisition code now that distilled models are part of faster-whisper.
*	Revert "Begin experimenting with flash-attention"	yum	2024-01-08
	This reverts commit 921b92a69f36502dc5eefd14ba3487c1bb49bb9d.
*	Begin experimenting with flash-attention	yum	2023-12-13
	Seems much faster than faster-whisper. There are two issues:
	* Requires NVIDIA 3000 series or higher.
	* Incompatible with faster-whisper dependencies.
	So it seems like we'll either need to toggle between two sets of dependencies at runtime or have two environments.
*	Pin huggingface_hub to 0.16.4 [v0.15.2]	yum	2023-09-11
	huggingface_hub 0.17.x releases are breaking faster_whisper's ability to download models.
	Also:
	* Start using frozen requirements.txt.
	* Conditionally install torch & legacy whisper only when doing mechanical optimization.
*	Wire transcribe_v2.py into GUI	yum	2023-09-03
	Also:
	* Enable SO_REUSEADDR on browser src socket
	* Temporarily add evaluation dependencies to requirements.txt
	* Fix browser src. It's now looking for a prefix that the python app actually uses.
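For readers unfamiliar with the SO_REUSEADDR option mentioned in this commit, here is a minimal sketch of enabling it on a listening socket. The helper name `make_listener`, the loopback host, and the port handling are assumptions for illustration only; this is not the project's actual browser-source code.

```python
import socket


def make_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Return a listening TCP socket with SO_REUSEADDR enabled (hypothetical helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be set before bind(). On POSIX systems it lets a
    # restarted server rebind a port that still has connections in TIME_WAIT;
    # the exact semantics differ on Windows.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))  # port 0 asks the OS to pick any free port
    sock.listen()
    return sock
```

Calling `make_listener()` and then `getsockname()` on the result shows which port the OS assigned.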
*	Begin rewriting transcribe.py	yum	2023-09-02
	A set of proper interfaces is called for. See #dev-update-spam in discord for drawing of design. Also add code to mechanically optimize committer parameters using an audio file. Not perfectly repeatable since it depends on the performance characteristics of the machine, but prob better than what we had before (nothing).
*	Switch back to openvr	yum	2023-08-28
	openxr doesn't have any notion of a background process, making it unusable trash :)
*	Put audio feedback into its own thread	yum	2023-08-25
	I think this improves the code structure of the controller input thread and leads to some deduplication, so I'm going to keep it. However, the intended purpose was to decrease lag when pressing buttons, and in that regard it failed. The lag goes all the way down to the input layer, implying that the input thread is not able to consistently run at its intended 100 Hz sample rate. I suspect that the Python global interpreter lock (GIL) is at fault.
	Since we can't realistically move all our functionality into one thread in a non-blocking model, I think multiprocessing is the logical choice going forward. Each thread in transcribe.py would become its own process, communicating via pub/sub through an intermediary process sitting in the middle.
*	Finish pyopenvr -> pyopenxr migration	yum	2023-08-25
	pyopenvr is both deprecated and buggy, so switch to pyopenxr.
*	Enforce a stricter avg_logprob than default [v0.13.1]	yum	2023-07-07
	Common hallucinations sneak in around -0.9 avg_logprob.
	Also:
	* Limit temperatures to just 0.0. Multiple values cause latency to occasionally spike.
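To illustrate the idea of a stricter avg_logprob cutoff, here is a minimal sketch that drops segments whose average log-probability falls below a threshold. The `Segment` class, the function name, and the cutoff of -0.8 are all assumptions for illustration; the commit does not state the value it actually uses, and the real change may simply pass a threshold to the transcription library.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Segment:
    """Hypothetical stand-in for a transcription segment."""
    text: str
    avg_logprob: float


# Assumed cutoff, picked only because the commit reports hallucinations near -0.9.
MIN_AVG_LOGPROB = -0.8


def drop_low_confidence(segments: Iterable[Segment],
                        min_avg_logprob: float = MIN_AVG_LOGPROB) -> List[Segment]:
    """Keep only segments whose average log-probability meets the cutoff."""
    return [s for s in segments if s.avg_logprob >= min_avg_logprob]


segments = [Segment("hello world", -0.3), Segment("Thanks for watching!", -0.92)]
kept = drop_low_confidence(segments)  # only the first segment survives
```

A stricter (higher) cutoff trades a few dropped legitimate segments for fewer hallucinated ones, so the value likely needs tuning per model.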
*	Finish translation for Western European language speakers [v0.12.0]	yum	2023-05-30
	NLLB needs its input to be split up into sentences. I use the sentence_splitter Python package to do this. It supports ~20 Western European languages, but notably, no Asian languages.
	* Sort spoken language list. English is still at the top.
	* Remove 'Translation source' dropdown. Infer this from the spoken language.
	* Add lang_compat.py to map language codes between the various libraries (whisper, nllb, sentence_splitter).
	* Fix bug where old text would appear in textbox when you first bring it up.
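As a rough illustration of what mapping language codes between libraries can look like, here is a small sketch. The table contents, the names `WHISPER_TO_NLLB`, `SPLITTABLE`, `to_nllb`, and `can_translate_from` are hypothetical and cover only four languages; the real lang_compat.py may be structured quite differently.

```python
from typing import Dict, Optional

# Hypothetical table: Whisper-style two-letter codes -> NLLB (FLORES-200) codes.
WHISPER_TO_NLLB: Dict[str, str] = {
    "en": "eng_Latn",
    "fr": "fra_Latn",
    "de": "deu_Latn",
    "es": "spa_Latn",
}

# Assumed subset of languages the sentence splitter can handle.
SPLITTABLE = {"en", "fr", "de", "es"}


def to_nllb(whisper_code: str) -> Optional[str]:
    """Return the NLLB code for a Whisper language code, or None if unmapped."""
    return WHISPER_TO_NLLB.get(whisper_code)


def can_translate_from(whisper_code: str) -> bool:
    """A source language is usable only if it can be both split and mapped."""
    return whisper_code in SPLITTABLE and whisper_code in WHISPER_TO_NLLB
```

Keeping the mapping in one module means the GUI can infer the translation source from the spoken language, as the commit describes, without each caller knowing every library's code scheme.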
*	Add ability to translate into 200 languages	yum	2023-05-25
	Use Meta's No Language Left Behind (NLLB) model to provide translation capabilities into 200 languages. Obviously most are very untested. This requires either 4.1 or 7.1 GB of RAM and significantly increases transcription latency.
*	Add keyboard toggle [v0.11.4]	yum	2023-05-22
	Users can now configure a keybind to start/stop/dismiss the STT when in desktop mode. The default keybind is ctrl+x, since by default VRC doesn't use 'x' for anything.
*	Drop torch from requirements.txt [v0.11.2]	yum	2023-05-01
	faster-whisper doesn't need it. This reduces install size from 6.00 GB with the base.en model to 1.70 GB.
	* Use a single sampler in shader (enables using more than 16 textures)
	* Minor legibility regression - need to improve AA.
	* Enable backface culling in shader (minor performance win)
*	~Finish integrating faster-whisper	yum	2023-04-24
	I'm able to use the new code to show text in game. Not yet play-tested.
*	Begin integrating faster-whisper [v0.11.0]	yum	2023-04-23
	This is a much faster, lower-VRAM reimplementation of Whisper in Python. Early testing is extremely promising: fast transcription speed, extremely low resource usage (CPU/RAM/VRAM), high accuracy.
*	Fix _socket module not found issue	yum	2023-03-21
	Need python310._pth, specifically the 'import site' line, for embedded python + pip to get along.
*	Specify exact version for torch	yum	2023-01-31
	Ruling out possibilities for a user-reported bug.
*	Bugfix: requirements.txt installs correct version of pytorch	yum	2023-01-25
	The --extra-index-url must appear before the dependency in this file.
*	Use requirements.txt for Scripts/	yum	2023-01-25
	This seems to be the canonical way of listing a Python app's dependencies.
	* Installing dependencies no longer hangs the GUI