TaSTT.git - Free self-hosted STT for VRChat.

	Commit message	Author	Age
*	Archive project (HEAD, master)	yum	2025-09-18
\|
*	Drop turbo; use old logic when no_speech ts available	yum	2025-09-03
\|
*	Work more on hallucination filtering (tag: v1.0.0-beta03)	yum	2025-07-25
\|
*	Experiment with hallucination reduction	yum	2025-07-25
\|	- update cursorignore
\|	- add hallucination filter training & inference code
\|	- put logging into a central module
\|	- segment metadata logging occurs before filtering
\|	- segment metadata logging is on by default
\|	- check in embedded python setup script
\|	- include trained hallucination filter model
*	Sanitize transcript before putting in filename	yum	2025-07-24
\|
*	Switch to embedded python	yum	2025-07-24
\|
*	Enforce clean venv on build (tags: v1.0.0-beta02, v1.0.0-beta01)	yum	2025-07-24
\|
*	Redact usernames from console output	yum	2025-07-24
\|
*	Clean up python process environment	yum	2025-07-24
\|
*	Clean up package	yum	2025-07-23
\|
*	Delete unused files (tag: v1.0.0-beta00)	yum	2025-07-23
\|
*	Import FastTextPager repo	yum	2025-07-23
\|\
\| *	Set target loudness to -16, and enable segment metadata logging by default	yum	2025-07-23
\| \|
\| *	Update avg_logprob cutoff, fix sounds, fix electron build	yum	2025-07-23
\| \|
\| *	add segment metadata logging feature	yum	2025-07-23
\| \|	Segment metadata can now be logged to a json as the app runs. The goal is
\| \|	to identify the params that heavily correlate with hallucinations.
\| \|
\| \|	Also:
\| \|	* use 7zip for compression in build, speeding things up
\| \|	* log dll download progress every few seconds
\| \|	* shrink package
\| *	bugfixes	yum	2025-07-23
\| \|	* fix model acquisition
\| \|	* fix local beepsnd
\| \|	* fix volume control
\| *	More stuff	yum	2025-05-30
\| \|	- add desktop and vr input threads
\| \|	- add audio feedback for input
\| \|	- add volume control for audio feedback
\| \|	- add UI for custom chatbox/built in chatbox
\| \|	- add ability to dismiss built in chatbox (sync empty messages)
\| \|	- limit lines in python console
\| \|	- limit length of each transcript
\| *	More stuff	yum	2025-05-30
\| \|	- fix unicode output from python terminal
\| \|	- fix cpu inference
\| \|	- add filters
\| \|	- add beam search params to UI
\| \|	- DRY up config definition in UI
\| *	More polish	yum	2025-05-30
\| \|	- Filters actually get applied now, huge accuracy boost
\| \|	- Use silero-vad python library instead of rolling our own
\| \|	- Expose prompt parameter
\| \|	- Auto setup venv on launch
\| \|	- Clean up python output
\| \|	- Auto acquire all dependencies on launch
\| \|	- Add icon
\| *	More UI work	yum	2025-05-29
\| \|	1. main STT app works in new project structure
\| \|	2. UI dumps mics on startup to populate mic list
\| \|	3. add missing deps (hf-xet, wave)
\| \|	4. normalize audio volume when transcribing. Probably still wrong tbqh.
\| \|	5. add checkbox to save audio segments & improve logic so only segments
\| \|	   with speech get saved.
\| \|	6. add default config settings
\| *	Begin roughing out STT UI	yum	2025-05-29
\| \|	HEAVILY VIBE CODED!
\| *	Add basic electron+tailwind hello world	yum	2025-05-29
\| \|
\| *	Move core app logic into folder	yum	2025-05-29
\| \|
\| *	Add STT code	yum	2025-05-17
\| \|
\| *	code bomb	yum	2025-05-11
\|
*	App sends text to built-in chatbox by default (tag: v0.21.0)	yum	2024-11-16
\|	This is overwhelmingly more common than custom chatbox.
*	Remove flash_attention toggle	yum	2024-11-16
\|	Deprecated in the Python release of CTranslate2 as of 4.4.0:
\|	https://github.com/OpenNMT/CTranslate2/blob/master/CHANGELOG.md#v440-2024-09-09
*	Add support for whisper large v3 turbo	yum	2024-11-16
\|	Also:
\|	* Double # of audio device slots
\|	* Fetch CuDNN from NVIDIA at runtime instead of vendoring
*	Support as few as 1 char per sync in custom chatbox	yum	2024-07-30
\|
*	Update README	yum	2024-07-30
\|
*	Replace hardcoded localhost with js magic (tag: v0.20.0)	yum	2024-07-12
\|	Use some js magic to deduce the hostname instead of hardcoding localhost.
\|	If you used the browser source under 127.0.0.1, then you'd get XSS blocked
\|	from making the ajax calls. This fixes that.
*	Another edge case: first commit should not get a leading space	yum	2024-07-12
\|
*	Edge case: initial preview should not have a space added in front of it	yum	2024-07-12
\|	God this code is a fucking nightmare
*	Fix spacing in browser source	yum	2024-07-12
\|
*	Translation shows original language by default	yum	2024-07-12
\|	* Add checkbox to disable this feature if so desired.
\|	* Delete old optimization code; can get it back from git if needed.
\|	* Enforce that there's at least one space character ' ' between committed
\|	  segments.
*	Fix translation plugin	yum	2024-07-12
\|	Translation needs torch to convert the nllb model, but the latest version
\|	(2.3.1) has an embedded OMP dll which clashes with ctranslate2's dll. Using
\|	the last minor version instead (2.2.2) doesn't clash.
\|
\|	Also propagate the device, quantization, and flash attention settings to
\|	the translator. If you're using GPU, this is a HUUUUGE performance uplift.
\|	Translation is basically instant. The bigger models are now feasible to use.
*	`use_flash_attention` checkbox now persists across sessions	yum	2024-07-12
\|
*	Upgrade vendored CUDA to 12.5 (tag: v0.19.2)	yum	2024-06-09
\|
*	Bump CUDNN to v8.9.7 (tag: v0.19.1)	yum	2024-06-09
\|	Also disable flash-attention when CPU mode is selected
*	Add checkbox for flash-attention	yum	2024-06-09
\|	Pre-3000 series GPUs don't support it. Oops!
*	Update defaults to work with modular prefab (tag: v0.19.0)	yum	2024-06-06
\|	There's a modular avatar prefab for the custom chatbox on my gumroad.
\|	Update the default settings to work with that prefab.
*	Upgrade faster-whisper with flash-attention2	yum	2024-06-05
\|	This should be significantly more efficient than prior versions.
\|
\|	* add large-v3 & distilled variant
\|	* simplify model acquisition code now that distilled models are part of
\|	  faster-whisper.
*	Fix distilled models	yum	2024-03-14
\|	These were broken due to some logic errors in the codepath which acquires
\|	models from huggingface.
\|
\|	Distilled large-v2 seems promising as a new default model.
*	Add "simple" text-to-text demo for the modular avatar chatbox	yum	2024-03-08
\|	To use it:
\|
\|	$ python3 -m pip install python-osc pillow
\|	$ cd Scripts
\|	$ python3 ./text_to_text_demo.py
*	Finish fixing build break	yum	2024-03-04
\|	CUDNN now pulls from dropbox instead of google drive. This has the added
\|	benefit of being about 10-20x faster (assuming you have fast internet).
*	Begin fixing build on new hosts	yum	2024-03-04
\|	Google drive intentionally broke CLI downloads ("don't be evil") and
\|	UwwwuPP went away. Begin work rehosting both files.
*	Update LICENSE year	yum	2024-03-02
\|
*	Finish plumbing GPU compute type (tag: v0.18.1)	yum	2024-02-09
\|
*	Add dropdown for GPU compute type (tag: v0.18.0)	yum	2024-02-09
\|	Should enable compatibility with older GPUs.
*	Add another threshold to filter out common hallucinations	yum	2024-02-05
\|	The paper recommends filtering out segments with no_speech_prob > 0.6 and
\|	avg_logprob < -1. This is too loose of a bound for short-form audio which
\|	is not guaranteed to contain speech.
\|
\|	I already have a tighter bound:
\|	  no_speech > 0.6 and avg_logprob < -0.5
\|
\|	While listening to instrumental music I find that a lot of hallucinations
\|	sneak past that bound. So I added a second bound:
\|	  no_speech > 0.15 and avg_logprob < -0.7
\|
\|	Basically we filter out things that look like speech but have a worse
\|	avg_logprob. Seems to not have false negatives. Requires testing.
\|
\|	Also: dial back the default max segment length from 15 seconds to 10
\|	seconds. This is done based on performance observations in desktop.
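
The two bounds quoted in the 2024-02-05 commit message can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's actual code: `Segment`, `BOUNDS`, `is_probable_hallucination`, and `filter_segments` are hypothetical names, and the segment is assumed to carry `no_speech_prob` and `avg_logprob` fields. Only the threshold values come from the message.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a transcribed segment and its metadata."""
    text: str
    no_speech_prob: float
    avg_logprob: float


# (no_speech_prob lower bound, avg_logprob upper bound) pairs.
# Values are the two bounds quoted in the commit message.
BOUNDS = [
    (0.6, -0.5),   # the existing, tighter-than-the-paper bound
    (0.15, -0.7),  # the second bound added by this commit
]


def is_probable_hallucination(seg: Segment) -> bool:
    """Return True if the segment falls inside either rejection bound."""
    return any(
        seg.no_speech_prob > min_no_speech and seg.avg_logprob < max_logprob
        for min_no_speech, max_logprob in BOUNDS
    )


def filter_segments(segments: list[Segment]) -> list[Segment]:
    """Keep only the segments that pass both bounds."""
    return [s for s in segments if not is_probable_hallucination(s)]
```

A segment is dropped as soon as either bound matches, so the second bound only adds rejections on top of the first; it never rescues a segment that the first bound already filtered out.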