TaSTT.git - Free self-hosted STT for VRChat.

	Commit message	Author	Age
*	Add keyboard toggle v0.11.4	yum	2023-05-22
\| \| \| \| \| \|	Users can now configure a keybind to start/stop/dismiss the STT when in desktop mode. The default keybind is ctrl+x, since by default VRC doesn't use 'x' for anything.
*	Merge pull request #1 from faker2048/patch-1	yum-food	2023-05-22
\|\ \| \| \| \|	Fix accidental semicolon typo
\| *	Fix accidental semicolon typo	faker	2023-05-22
\|/
*	Enable selecting specific GPU when transcribing	yum	2023-05-21
\| \| \| \| \| \|	Useful on devices with multiple GPUs, such as gaming laptops. * Update GUI/README.md.
*	Fix noop animations on current creator companion build v0.11.3	yum	2023-05-09
\| \| \| \| \| \|	See comment for details. * Update README
*	Drop torch from requirements.txt v0.11.2	yum	2023-05-01
\| \| \| \| \| \| \| \| \|	faster-whisper doesn't need it. This reduces install size from 6.00GB with base.en model to 1.70GB. * Use a single sampler in shader (enables using more than 16 textures) * Minor legibility regression - need to improve AA. * Enable backface culling in shader (minor performance win)
*	Restore string matching, remove affinity mask v0.11.1	yum	2023-04-25
\| \| \| \| \| \|	Affinity mask no longer affects performance. String matching is still needed for temporal stability in fast-paced long-form transcription tasks.
*	Fix custom chatbox zwrite/depth	yum	2023-04-25
\| \| \| \| \| \| \|	Depth was being calculated wrong, causing text box to render behind objects it's in front of. * Fix package.ps1 compression. 7z was increasing file size, somehow.
*	~Finish integrating faster-whisper	yum	2023-04-24
\| \| \| \|	I'm able to use the new code to show text in game. Not yet play-tested.
*	Begin integrating faster-whisper v0.11.0	yum	2023-04-23
\| \| \| \| \| \|	This is a much faster, lower-VRAM reimplementation of Whisper in Python. Early testing is extremely promising: fast transcription speed, extremely low resource usage (CPU/RAM/VRAM), high accuracy.
*	package.ps1 always regenerates Python/ v0.10.1	yum	2023-03-28
\| \| \| \|	Intended to avoid accidentally releasing dirty environments.
*	Fix virtual env reset	yum	2023-03-28
\| \| \| \| \| \|	Use `pip freeze` and `pip uninstall` to reset the venv to a near-default state. Filter out `future` since we need to vendor it. If it ever gets removed, the installation is borked.
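A minimal sketch of the reset described in this commit, written in Python. It assumes `pip freeze` output in the usual `name==version` or `name @ url` form. The helper name `packages_to_uninstall` and its `keep` parameter are hypothetical and are not taken from the repository.

```python
def packages_to_uninstall(freeze_output, keep=("future",)):
    """Return package names from `pip freeze` output, minus the ones to keep."""
    keep_lower = {name.lower() for name in keep}
    names = []
    for line in freeze_output.splitlines():
        line = line.strip()
        # Skip blanks, comments, and editable installs (assumption: not relevant here).
        if not line or line.startswith("#") or line.startswith("-e"):
            continue
        # Handle both "name==1.2.3" and "name @ file:///..." forms.
        name = line.split("==")[0].split(" @ ")[0].strip()
        if name.lower() not in keep_lower:
            names.append(name)
    return names
```

The returned names could then be handed to `python -m pip uninstall -y`, leaving the vendored `future` package in place.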
*	Vendor pip and future	yum	2023-03-28
\| \| \| \| \| \| \| \|	This dependency fails to install with the embedded python, so now it's vendored. Installing pip after wheel would result in wheel reinstalling, so we also vendor pip.
*	Custom chatbox shader writes depth	yum	2023-03-23
\| \| \| \| \| \| \| \|	This fixes issues where the transparent corners of the textbox render in front of other materials, causing those other materials to skip rendering. * Update README.md with roadmap and avatar resource usage.
*	Reduce texture memory usage for English speakers v0.10.0	yum	2023-03-21
\| \| \| \| \| \| \| \| \| \|	We used to populate 7 4k textures + 1 2k texture for all users. Now if the user has configured `bytes_per_char=1` in the Unity panel, we just populate a single 512x512 texture containing the first 128 ASCII characters. This reduces texture memory usage by 99.74%, from 134.67 MB to 340 KB.
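The quoted saving can be checked from the two sizes in the message. This is only an arithmetic sketch: the function name is hypothetical, and it assumes 1 MB = 1000 KB (the commit does not say which convention it uses).

```python
def percent_reduction(before_mb, after_kb, kb_per_mb=1000):
    """Percent saved going from `before_mb` megabytes to `after_kb` kilobytes."""
    after_mb = after_kb / kb_per_mb
    return (1 - after_mb / before_mb) * 100

# Figures quoted in the commit message: 134.67 MB before, 340 KB after.
reduction = percent_reduction(134.67, 340)
print(f"{reduction:.2f}%")
```

Under either the 1000 or 1024 convention the result is roughly 99.75%, which agrees with the quoted 99.74% to within rounding.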
*	Fix _socket module not found issue	yum	2023-03-21
\| \| \| \| \|	Need python310._pth, specifically 'import site' line, for embedded python + pip to get along.
*	Begin work fixing venv setup	yum	2023-03-09
\| \| \| \| \|	If you don't have Python installed, venv setup will fail. Begin work fixing environment config so `pip install` uses vendored Python.
*	Set PYTHONPATH in synchronous multiprocessing layer	yum	2023-03-08
\| \| \| \| \| \| \| \| \|	A user saw an error like `ModuleNotFoundError: No module named _socket`. StackOverflow blames this on PYTHONPATH, so let's try setting it. * Fix latent bug in Scripts/transcribe.py. PyAudio.open() positional parameters must be specified in correct order, even when telling it which parameter is which. shrug
*	Expose more C++ whisper parameters in GUI	yum	2023-03-08
\| \| \| \| \| \| \| \|	Expose decode method, beam search parameters, and voice activity detection parameters in GUI. * Remove WhisperCPP::Init(), do it on launch instead. * Add float support to ConfigMarshal
*	Silence virtual env setup PATH warnings	yum	2023-03-06
\| \| \| \| \| \|	Twofold approach: * All spawned processes have the desired path (new codepath) * Setup command silences the warning (old codepath)
*	Animator generation and dumping mics no longer hang GUI v0.9.0	yum	2023-03-05
\| \| \| \| \| \|	Do these in a std::future. * SetAffinityMask() now returns a value on all control paths
*	Update README.txt	yum	2023-03-02
\|
*	Implement thread affinity optimization for Python transcription engine	yum	2023-02-28
\| \| \| \| \| \| \| \| \|	A user pointed out that constraining the Python implementation to a single core does not affect visible latency. This seems true on my PC as well. * Reimplement Python transcription wxProcess as a std::async. App shutdown is much faster now.
*	Bugfix: fix use-after-free in GetMicsImpl	yum	2023-02-28
\| \| \| \| \|	* Plumb beam search params into whisper cpp implementation (currently broken)
*	Filter out more transcription noise v0.8.2	yum	2023-02-26
\| \| \| \| \|	Things like " (static)" and " explosions" were showing up a lot with ggml-medium.bin. Filter them out.
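A minimal sketch of this kind of filter, in Python. The `NOISE` set holds only the two strings named in this commit, and `drop_noise` is a hypothetical helper; the project's real filter list and matching rules may differ.

```python
# Spurious outputs named in the commit message, stored without the leading space.
NOISE = {"(static)", "explosions"}

def drop_noise(segments, noise=NOISE):
    """Remove segments whose text, ignoring case and surrounding spaces, is known noise."""
    return [s for s in segments if s.strip().lower() not in noise]
```

Matching whole segments rather than substrings keeps a real sentence that happens to contain the word "explosions" from being dropped.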
*	Improve behavior around VAD segmentation events	yum	2023-02-26
\| \| \| \| \| \| \|	Use forked Whisper implementation which has tweaks to reduce dropped words around the beginning of VAD segments. * Retain audio after VAD segmentation events
*	CPP implementation refinements	yum	2023-02-26
\| \| \| \| \| \| \| \|	* Pip install, dependency install, and model download can be gracefully interrupted and resume later. * Mic list was pointing at freed memory. Fix this by copying into the heap with std::unique_ptr()s. Mic list in CPP panel is much more reliable now.
*	Bugfix: C++ transcription engine should not launch OSC layer v0.8.1	yum	2023-02-26
\| \| \| \|	Not ready yet.
*	Bugfix: add vendored git to PATH v0.8.0	yum	2023-02-26
\|
*	Begin work on C++ custom chatbox	yum	2023-02-26
\| \| \| \| \| \| \| \| \| \|	Sort of a misnomer. The idea is to use C++ for transcription and Python for steamvr and OSC. Having issues getting output from multithreaded Python code. Not in the mood to figure this out today. * Hide unimplemented parts of C++ panel.
*	Convert most PythonWrapper wxLogError() to Log()	yum	2023-02-25
\| \| \| \|	Simplifies debugging process.
*	Drop ryml	yum	2023-02-25
\| \| \| \| \| \| \| \| \| \| \|	Rapidyaml started refusing to parse config files so I dropped it. * Add ConfigMarshal class to support very simple config marshalling * No versioning, no type indicators, nothing. * Supports int, bool, and string. * Bools are serialized as int. * Log no longer segfaults if given nullptr wxTextCtrl. Fix how whisper CPP GUI fields restore from config
*	Complete OBS browser source	yum	2023-02-25
\| \| \| \| \|	* Implement HTTPMapper classes * Browser source respects user-configured source port
*	Add HTTP parser	yum	2023-02-25
\| \| \| \| \| \|	Server needs to parse incoming HTTP. * Server spawns a thread for each incoming connection
*	Begin work on custom webserver	yum	2023-02-25
\| \| \| \| \| \| \|	oatpp was a crashy mess. Begin making a simple web server from scratch. * Add Designs/ folder to document nontrivial things like the webserver design
*	Finish browser source proof-of-concept	yum	2023-02-24
\| \| \| \| \| \|	It's a crashy mess, but it sort of works. * Add Transcript class to send transcription segments between layers
*	Add HTML for BrowserSource	yum	2023-02-24
\| \| \| \| \|	Browser source queries /api/transcript at 10Hz via jquery and renders the response.
*	Add hack to prevent browser source crash on shutdown	yum	2023-02-24
\| \| \| \| \| \|	Documented in BrowserSource::Run(). * Parameterize Release/Debug in build scripts
*	Wire up browser source	yum	2023-02-23
\| \| \| \| \| \| \|	Browser source can be started and stopped via the UI. It still serves a hello world json blob. Observing occasional crashes when stopping the C++ transcription engine.
*	Implement streaming output for synchronous multiprocessing layer	yum	2023-02-23
\| \| \| \| \|	Synchronous multiprocessing layer now accepts a callback, which the caller can use to stream output to the UI.
*	Add input fields for browser source	yum	2023-02-22
\| \| \| \| \| \|	Not wired up yet. * Add browser source fields to persistent config
*	Begin sketching out browser source	yum	2023-02-22
\| \| \| \|	* Fix oatpp fetch and build
*	Revert "Apply previous window conditioning to decoding layer"	yum	2023-02-22
\| \| \| \| \| \| \| \|	This reverts commit cece1ee8f1b985c2a89adb661dd02c6d44787f67. This does not in fact result in improved temporal stability. It makes things so unstable that even single-sentence messages fail to ever stabilize.
*	Finish reimplementing synchronous process layer	yum	2023-02-22
\| \| \| \| \| \| \| \| \| \| \|	Use raw WIN32 APIs to launch processes instead of wxProcess. This enables spawning processes from arbitrary thread contexts, such as std::async or std::thread. In the future, this layer should be redone to support streaming output. * TODO: update setting path. This is almost certainly broken for users without git installed. Test in VM!
*	Checkpoint: begin work reimplementing processes	yum	2023-02-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It appears that you cannot spawn a wxProcess from an independent thread of execution. I imagine they're supposed to be spawned from the main thread. Fuck that, I'm going to try to use the raw WIN32 API to spawn helper processes, and do it from arbitrary thread context. * Log() now delegates to a queue which the main thread periodically drains. * Log() now writes to a file. * WhisperCPP thread is now done with std::async. * Default chars per sync is now 8 * oatpp: Promising web framework.
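The "Log() delegates to a queue which the main thread periodically drains" pattern can be sketched as follows. This is a Python analogue, not the project's C++ code; `log`, `drain_log`, and `sink` are hypothetical names.

```python
import queue
import threading

# Thread-safe FIFO shared by all producers.
_log_queue = queue.Queue()

def log(message):
    """Safe to call from any thread: only enqueues the message."""
    _log_queue.put(message)

def drain_log(sink):
    """Called periodically on the main thread; passes queued messages to `sink`.

    Returns the number of messages drained.
    """
    drained = 0
    while True:
        try:
            message = _log_queue.get_nowait()
        except queue.Empty:
            return drained
        sink(message)
        drained += 1
```

With this split, worker threads never touch UI objects directly. Only the draining side, running on the main thread, writes to the text control or file.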
*	Various refinements	yum	2023-02-22
\| \| \| \| \| \| \| \| \|	* Filter out transcriptions like " (music)" * Whisper mic choice auto-populates with queried values * No more manually lining up numbers! * Persist whisper mic in config * Remove setup and dump mics button from Whisper page * Redesign makes these unnecessary
*	Begin work on C++ implementation	yum	2023-02-22
\| \| \| \| \| \| \| \|	Use Const-me/Whisper to perform transcription. This implementation is vastly more efficient: CPU usage, memory usage, and VRAM usage are all dramatically reduced. It's slightly less accurate when comparing the same model (due to the lack of beam search decoding), but since you can use larger models, the impact is largely a wash.
*	Apply previous window conditioning to decoding layer	yum	2023-02-22
\| \| \| \| \|	Per the Whisper source code, this should result in better temporal stability.
*	Transcription and Unity input fields now auto-synchronize	yum	2023-02-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When you generate Unity assets, you have to configure rows/cols/chars per sync/ bytes per char. When you switch over to the transcription panel, these choices will be automatically populated. This should reduce accidental mismatch between the two panels. * Merge Config classes. Now just use one big AppConfig class instead of one class per panel. * Factor out (most) input field initialization into a function. Call it when switching panels so input fields synchronize. * Wrap a lot of lines at 80 columns. * Add -skip_zip switch to package.ps1.
*	Remove exponential backoff cap v0.7.0	yum	2023-02-19
\| \| \| \| \| \| \| \| \| \|	Allows sustained exponential backoff when not transcribing. Used to cap out at 1s. * Add more items to README TODO list * Adjust emote metadata * Emotes bugfix: Non-existent emote map doesn't cause transcription engine to bail out.
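The difference between capped and uncapped backoff can be illustrated with a short Python sketch. The initial delay and growth factor are assumptions chosen for illustration; only the former 1s cap comes from the commit message.

```python
import itertools

def backoff_delays(initial=0.01, factor=2.0, cap=None):
    """Yield sleep durations that grow geometrically; `cap=None` means no upper bound."""
    delay = initial
    while True:
        yield delay if cap is None else min(delay, cap)
        delay *= factor

# Old behavior (capped at 1s) versus new behavior (no cap), first five delays each.
capped = list(itertools.islice(backoff_delays(initial=0.25, cap=1.0), 5))
uncapped = list(itertools.islice(backoff_delays(initial=0.25), 5))
```

A caller would presumably restart the generator once speech is detected, so long idle periods cost little CPU while active transcription stays responsive.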