TaSTT.git - Free self-hosted STT for VRChat.

	Commit message	Author	Age
*	Delete unused files (v1.0.0-beta00)	yum	2025-07-23
\|
*	Add support for whisper large v3 turbo	yum	2024-11-16
\|	Also:
\|	* Double # of audio device slots
\|	* Fetch CuDNN from NVIDIA at runtime instead of vendoring
*	Upgrade vendored CUDA to 12.5 (v0.19.2)	yum	2024-06-09
\|
*	Bump CUDNN to v8.9.7 (v0.19.1)	yum	2024-06-09
\| \| \| \|	Also disable flash-attention when CPU mode is selected
*	Finish fixing build break	yum	2024-03-04
\| \| \| \| \|	CUDNN now pulls from dropbox instead of google drive. This has the added benefit of being about 10-20x faster (assuming you have fast internet).
*	Begin fixing build on new hosts	yum	2024-03-04
\| \| \| \| \|	Google drive intentionally broke CLI downloads ("don't be evil") and UwwwuPP went away. Begin work rehosting both files.
*	General cleanup (v0.15.3)	yum	2023-09-13
\| \| \| \|	Remove unused proxy code, curl, and images.
*	Switch to VadCommitter	yum	2023-09-07
\| \| \| \| \| \| \| \|	FuzzyRepeatCommitter was approximating this behavior in the best-performing configuration, so switch to it in earnest. This committer simply commits audio once we detect a long enough gap in speech. That's it!
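The gap-based commit rule described above can be sketched as follows. This is a minimal illustration, not the project's VadCommitter: the function name `commit_points`, the per-frame boolean speech flags, and the 20 ms / 500 ms defaults are all assumptions made for the example.

```python
from typing import List


def commit_points(speech_flags: List[bool],
                  frame_ms: int = 20,
                  min_gap_ms: int = 500) -> List[int]:
    """Return the frame indices at which buffered audio would be committed.

    speech_flags holds one hypothetical VAD decision per audio frame.
    A commit fires once silence has lasted at least min_gap_ms, and only
    if some speech was heard since the previous commit.
    """
    commits: List[int] = []
    silence_ms = 0
    pending_speech = False  # True once speech is heard after the last commit
    for i, is_speech in enumerate(speech_flags):
        if is_speech:
            pending_speech = True
            silence_ms = 0
        else:
            silence_ms += frame_ms
            if pending_speech and silence_ms >= min_gap_ms:
                commits.append(i)
                pending_speech = False
    return commits
```

With these defaults, a short pause (under 25 silent frames) never triggers a commit, so brief hesitations stay inside one utterance.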
*	Begin work on proxy server	yum	2023-07-03
\|	Create a simple server with 3 endpoints:
\|	* /create_session: Create a session and return its identifier.
\|	* /set_transcript: Update a session's transcript.
\|	* /get_transcript: Fetch a session's transcript.
\|	Right now the session ID provides authentication and authorization. There is no public/private ID so you have to trust whoever you share your ID with. IDs are long and generated by the server, so it should be somewhat secure against low-effort hacking.
\|	Other updates:
\|	* Drop whisper_requirements.txt - no longer needed.
\|	* Vendor curl to make it easier to interact with the server.
\|	TODO:
\|	* Fuzz test the server.
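The session model behind the three endpoints might look roughly like the sketch below. It covers only the in-memory bookkeeping, with no HTTP layer, and it is an assumption-laden illustration rather than the server's actual code: the class name `SessionStore`, the 32-byte ID length, and the return conventions are invented for the example.

```python
import secrets
from typing import Dict, Optional


class SessionStore:
    """Hypothetical in-memory backing store for the three endpoints."""

    def __init__(self) -> None:
        self._transcripts: Dict[str, str] = {}

    def create_session(self) -> str:
        # The server generates a long, unguessable ID. Because the ID is
        # the only credential, anyone holding it can read and write.
        session_id = secrets.token_urlsafe(32)
        self._transcripts[session_id] = ""
        return session_id

    def set_transcript(self, session_id: str, transcript: str) -> bool:
        # Unknown IDs are rejected instead of silently creating sessions.
        if session_id not in self._transcripts:
            return False
        self._transcripts[session_id] = transcript
        return True

    def get_transcript(self, session_id: str) -> Optional[str]:
        # Returns None for unknown IDs.
        return self._transcripts.get(session_id)
```

Using `secrets` rather than `random` matters here, since the ID doubles as the authentication token.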
*	Add profanity filter	yum	2023-06-28
\| \| \| \| \| \| \|	Add toggle to UI to enable a profanity filter. It replaces vowels in bad words with asterisks. Bugfix: filters now apply to OBS
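The vowel-masking behavior described above could be approximated like this. The function name `censor`, the whole-word matching rule, and the caller-supplied word list are assumptions for illustration; the commit does not say how the real filter matches words.

```python
import re
from typing import Iterable


def censor(text: str, bad_words: Iterable[str]) -> str:
    """Replace the vowels of each listed word with asterisks.

    Matching is case-insensitive and restricted to whole words, so a
    listed word embedded inside a longer word is left untouched.
    """
    words = [w for w in bad_words if w]
    if not words:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    # Mask only the vowels of each match, keeping consonants and case.
    return pattern.sub(
        lambda m: re.sub(r"[aeiouAEIOU]", "*", m.group(0)), text
    )
```

For example, `censor("what the heck", ["heck"])` yields `"what the h*ck"`.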
*	Scrub out old C++-based Whisper code	yum	2023-06-26
\| \| \| \|	No longer used.
*	Begin work on uwu filter	yum	2023-05-24
\| \| \| \| \| \|	Use UwwwuPP to translate your boring old speech into an uwu-ified version. Still need to add a UI toggle for this.
*	Enable selecting specific GPU when transcribing	yum	2023-05-21
\| \| \| \| \| \|	Useful on devices with multiple GPUs, such as gaming laptops. * Update GUI/README.md.
*	Fix custom chatbox zwrite/depth	yum	2023-04-25
\| \| \| \| \| \| \|	Depth was being calculated wrong, causing text box to render behind objects it's in front of. * Fix package.ps1 compression. 7z was increasing file size, somehow.
*	~Finish integrating faster-whisper	yum	2023-04-24
\| \| \| \|	I'm able to use the new code to show text in game. Not yet play-tested.
*	package.ps1 always regenerates Python/ (v0.10.1)	yum	2023-03-28
\| \| \| \|	Intended to avoid accidentally releasing dirty environments.
*	Vendor pip and future	yum	2023-03-28
\| \| \| \| \| \| \| \|	This dependency fails to install with the embedded python, so now it's vendored. Installing pip after wheel would result in wheel reinstalling, so we also vendor pip.
*	Fix _socket module not found issue	yum	2023-03-21
\| \| \| \| \|	Need python310._pth, specifically 'import site' line, for embedded python + pip to get along.
*	Improve behavior around VAD segmentation events	yum	2023-02-26
\| \| \| \| \| \| \|	Use forked Whisper implementation which has tweaks to reduce dropped words around the beginning of VAD segments. * Retain audio after VAD segmentation events
*	Add HTML for BrowserSource	yum	2023-02-24
\| \| \| \| \|	Browser source queries /api/transcript at 10Hz via jquery and renders the response.
*	Add hack to prevent browser source crash on shutdown	yum	2023-02-24
\| \| \| \| \| \|	Documented in BrowserSource::Run(). * Parameterize Release/Debug in build scripts
*	Begin work on C++ implementation	yum	2023-02-22
\| \| \| \| \| \| \| \|	Use Const-me/Whisper to perform transcription. This implementation is vastly more efficient: CPU usage, memory usage, and VRAM usage are all dramatically reduced. It's slightly less accurate when comparing the same model (due to the lack of beam search decoding), but since you can use larger models, the impact is largely a wash.
*	Transcription and Unity input fields now auto-synchronize	yum	2023-02-19
\|	When you generate Unity assets, you have to configure rows/cols/chars per sync/bytes per char. When you switch over to the transcription panel, these choices will be automatically populated. This should reduce accidental mismatch between the two panels.
\|	* Merge Config classes. Now just use one big AppConfig class instead of one class per panel.
\|	* Factor out (most) input field initialization into a function. Call it when switching panels so input fields synchronize.
\|	* Wrap a lot of lines at 80 columns.
\|	* Add -skip_zip switch to package.ps1.
*	Begin work adding emotes	yum	2023-02-13
\|	Done:
\|	* Users can add images to Fonts/Emotes/
\|	* The basename of that image ('clueless.png' becomes 'clueless') is the keyword to make the image show up in game.
\|	* Fix a bug in the shader where letters on the 2nd texture and later would have UV outside of [0.0, 1.0]
\|	Not yet implemented:
\|	* transcribed words are encoded using emotes mapping
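The filename-to-keyword rule above can be illustrated with a short sketch. The function name `emote_keywords` and the idea of returning a dictionary are assumptions; only the 'clueless.png' becomes 'clueless' rule comes from the commit message.

```python
from pathlib import Path
from typing import Dict, Iterable


def emote_keywords(image_paths: Iterable[str]) -> Dict[str, str]:
    """Map each emote keyword to the image path it was derived from.

    The keyword is the file's basename with its extension removed.
    If two files share a basename, the later one wins.
    """
    return {Path(p).stem: p for p in image_paths}
```

Calling `emote_keywords(["Fonts/Emotes/clueless.png"])` gives `{"clueless": "Fonts/Emotes/clueless.png"}`.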
*	Delete python310._pth	yum	2023-01-28
\| \| \| \| \| \| \| \| \| \| \| \|	I was using this file to constrain the set of paths that Python can see, but since `future` doesn't have a wheel, it will fail to install on a fresh system. If you set pip's --cache-dir to some new directory, you'll see it fail to install. The _pth doesn't really seem to matter, since without it, packages are still installed under the virtual environment.
*	Bugfix: Use future 0.18.2 instead of 0.18.3	yum	2023-01-23
\| \| \| \|	Whisper doesn't like 0.18.3, so downgrade to the last version.
*	package.ps1 now fetches all dependencies	yum	2023-01-23
\| \| \| \| \| \| \|	Don't literally check in Python since it looks dodgy (rightfully so). Instead the build script just fetches it. * Update README, simplifying language and documenting other projects
*	Embed git in package	yum	2023-01-01
\| \| \| \| \| \| \|	package.ps1 fetches PortableGit and embeds it in the package. This eliminates all but one runtime dependency (MSVC++ Redistributable). * Move Python into a new FOSS folder.
*	GUI: expose chars per sync, bytes per char	yum	2022-12-24
\| \| \| \| \| \| \| \| \| \| \| \|	Users can now control how many characters they send per sync event, as well as the number of bytes used to represent each character. This gives them the power to pick between faster paging and fewer sync params. International users must use 2 bytes per char (at least for now). * package.ps1: don't distribute the gigantic TTF files, just the bitmaps
*	GUI: "Finish" avatar generation workflow	yum	2022-12-20
\|	GUI now generates parameters & menu. Still need to handle write defaults.
\|	* Add capability to append to avatar parameters & menu
\|	* Install canned Unity assets, shaders, and fonts in avatar folder
\|	* Check in materials for ease of use
\|	* Bugfix: correctly label menu/parameters file pickers
*	GUI can now generate animator	yum	2022-12-20
\| \| \| \|	Still need to generate params & merge menus. Getting close....
*	Now it's possible to build the app from Powershell	yum	2022-12-18
\| \| \| \|	No more WSL dependencies!
*	Add ability to select model	yum	2022-12-18
\|	* icon now works when pinned to taskbar
\|	* add model selection
\|	* add script to dump mic devices
\|	* whisper models now download into the virtual environment
*	GUI: Add mic, language selection	yum	2022-12-18
\|	Users can now select their mic & spoken language in the GUI.
\|	* pyaudio now samples at the mic rate, fixing an issue where frames would drop. We downsample in the callback by dropping frames.
\|	* add Sounds folder to package
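Downsampling by dropping frames, as mentioned above, can be sketched as plain decimation. This is a guess at the general idea, not the code in the pyaudio callback: the function name `downsample_by_dropping` and the list-of-samples interface are assumptions. Note that dropping samples without a low-pass filter can alias high frequencies.

```python
from typing import List, Sequence


def downsample_by_dropping(samples: Sequence[int],
                           src_rate: int,
                           dst_rate: int) -> List[int]:
    """Keep roughly one sample out of every src_rate / dst_rate.

    No filtering is applied; samples that are not kept are discarded.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if dst_rate > src_rate:
        raise ValueError("dst_rate must not exceed src_rate")
    step = src_rate / dst_rate  # may be fractional, e.g. 44100 / 16000
    kept: List[int] = []
    next_pick = 0.0  # position of the next sample to keep
    for i, sample in enumerate(samples):
        if i >= next_pick:
            kept.append(sample)
            next_pick += step
    return kept
```

Going from a 48000 Hz mic to 16000 Hz, for instance, keeps every third sample.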
*	Finish python virtual env	yum	2022-12-17
\|	GUI can now download all TaSTT dependencies and install them into a virtual environment.
\|	* Add buttons to check embedded python version & install dependencies
\|	* Add class to wrap interacting with embedded Python
\|	* Put all TaSTT python scripts into a folder
*	Check in python 3.11	yum	2022-12-16
\| \| \| \|	License is included in source & distributable package.
*	Add logo	yum	2022-12-16
	* GUI now shows logo
	* Add package.ps1 to generate distributable application bundle
	* Rename ~GUI to GUI
	* Add ScopeGuard class