TaSTT.git - Free self-hosted STT for VRChat.

	Commit message	Author	Age
*	Revert "Apply previous window conditioning to decoding layer"	yum	2023-02-22
	This reverts commit cece1ee8f1b985c2a89adb661dd02c6d44787f67. This does not in fact improve temporal stability. It makes things so unstable that even single-sentence messages never stabilize.
*	Begin work on C++ implementation	yum	2023-02-22
	Use Const-me/Whisper to perform transcription. This implementation is vastly more efficient: CPU usage, memory usage, and VRAM usage are all dramatically reduced. It's slightly less accurate than the existing implementation for the same model (due to the lack of beam search decoding), but since you can use larger models, the impact is largely a wash.
*	Apply previous window conditioning to decoding layer	yum	2023-02-22
	Per the Whisper source code, this should result in better temporal stability.
*	Remove exponential backoff cap [v0.7.0]	yum	2023-02-19
	Allows exponential backoff to keep growing when not transcribing; it used to cap out at 1s.
	* Add more items to README TODO list
	* Adjust emote metadata
	* Emotes bugfix: a non-existent emote map no longer causes the transcription engine to bail out.
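	A minimal sketch of capped versus uncapped exponential backoff. The helper name `backoff_delays()`, the initial delay, and the growth factor are assumptions for illustration; only the old 1s cap comes from the commit message.

```python
from typing import Optional


def backoff_delays(initial_s: float, factor: float, n: int, cap_s: Optional[float] = None):
    """Return the first n sleep durations of an exponential backoff.

    Hypothetical helper. With cap_s set, delays stop growing at cap_s
    (the old behavior capped at 1s). With cap_s=None, delays keep
    growing for as long as the engine stays idle.
    """
    delays = []
    delay = initial_s
    for _ in range(n):
        delays.append(delay if cap_s is None else min(delay, cap_s))
        delay *= factor
    return delays


print(backoff_delays(0.25, 2, 4, cap_s=1.0))  # old: growth stops at the cap
print(backoff_delays(0.25, 2, 4))             # new: growth continues
```

	Uncapped growth means an idle engine wakes up less and less often; the trade-off is a longer worst-case delay before it notices new speech, which is why interruptible sleeps matter.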
*	Add hack to reduce outlines around emotes	yum	2023-02-13
	Don't render any part of an emote with alpha < 0.5. Improves visual clarity in the common case at the cost of generality.
	* Emotes now use physically-based shading.
	* Use round() to denoise shader parameters instead of floor()
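	The shader itself is not shown here, so this is a Python sketch of the two ideas, assuming a parameter that should hold an integer arrives with a small floating-point error. floor() turns any negative error into the previous integer, while round() tolerates errors up to 0.5 in either direction. The alpha test mirrors the "alpha < 0.5" hack; the function names are hypothetical.

```python
import math


def decode_floor(x: float) -> int:
    # Old approach: 2.9999 (meant to be 3) decodes as 2.
    return math.floor(x)


def decode_round(x: float) -> int:
    # New approach: any value within +/-0.5 of an integer decodes to it.
    return round(x)


def emote_pixel_visible(alpha: float, cutoff: float = 0.5) -> bool:
    # Pixels below the cutoff are discarded, removing soft outlines.
    return alpha >= cutoff


for noisy in (2.9999, 3.0001):
    print(noisy, decode_floor(noisy), decode_round(noisy))
```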
*	Finish emotes	yum	2023-02-13
	Emotes require 2 bytes per char. They're encoded into the region [0xE000, infinity). The texture is 4k and uses 1k vertical pixels per emote segment, for a maximum of 32 segments.
	* Reduce volume of noise indicator by 90%. Quiet is probably better. Might want to add a volume slider idk.
	* Bugfix: emotes without a transparency channel now work
	* Address a couple of Unity performance complaints about the shader
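	A sketch of assigning emotes to character codes starting at 0xE000 (the start of Unicode's Private Use Area) and checking that they fit in 2 bytes per char. Assigning codes in sorted-name order and using big-endian byte order are assumptions for illustration, not necessarily what TaSTT does.

```python
EMOTE_BASE = 0xE000  # emotes are encoded into [0xE000, ...)


def assign_emote_codepoints(names):
    """Hypothetical: give each emote name, in sorted order, its own code."""
    return {name: EMOTE_BASE + i for i, name in enumerate(sorted(names))}


def encode_2_bytes(code: int) -> bytes:
    """Pack a character code into 2 bytes (byte order is an assumption)."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("code does not fit in 2 bytes per char")
    return code.to_bytes(2, "big")


mapping = assign_emote_codepoints(["pog", "clueless"])
print({name: hex(code) for name, code in mapping.items()})
```

	With 2 bytes per char, codes above 0xFFFF cannot be represented, so the "infinity" upper bound is in practice limited by the char width.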
*	Begin work adding emotes	yum	2023-02-13
	Done:
	* Users can add images to Fonts/Emotes/
	* The basename of that image ('clueless.png' becomes 'clueless') is the keyword that makes the image show up in game.
	* Fix a bug in the shader where letters on the 2nd texture and later would have UVs outside of [0.0, 1.0]
	Not yet implemented:
	* Encoding transcribed words using the emotes mapping
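	A sketch of the basename-to-keyword rule using pathlib's `stem`. The set of accepted image suffixes and the whitespace-based word matching are assumptions; in practice the paths would come from something like `Path("Fonts/Emotes").iterdir()`.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}  # assumption: accepted formats


def emote_keywords(paths):
    """Map keyword -> image path; 'clueless.png' becomes 'clueless'."""
    keywords = {}
    for p in map(Path, paths):
        if p.suffix.lower() in IMAGE_SUFFIXES:
            keywords[p.stem] = p
    return keywords


def find_emote_words(text, keywords):
    """Return the words in `text` that would be drawn as emotes."""
    return [word for word in text.split() if word in keywords]


kw = emote_keywords(["Fonts/Emotes/clueless.png", "Fonts/Emotes/README.txt"])
print(find_emote_words("that was clueless lol", kw))
```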
*	Built-in chatbox no longer shows empty messages	yum	2023-02-04
	* Reduce noise on/off indicator volume by 50%
*	Use bold font for English	yum	2023-01-31
	Looks more legible. Thanks Noppers for the feedback!
*	Specify exact version for torch	yum	2023-01-31
	Ruling out possibilities for a user-reported bug.
*	Bugfixes [v0.4.0]	yum	2023-01-27
	* Fix prefab: bounding box & position are now set to 0
	* Fix shader: text is no longer upside down
	* Update README
*	Finish basic PBR shading	yum	2023-01-25
	The TaSTT shader now uses physically based rendering (PBR). Users can pick smoothness, metallic, and emissive. This implementation borrows heavily from catlikecoding.com's excellent tutorials, which are released under MIT No Attribution (MIT-0): https://catlikecoding.com/unity/tutorials/license/
	To retain what little clarity remains in the shader, I have chosen not to attribute the code in the source itself.
*	Bugfix: requirements.txt installs correct version of pytorch	yum	2023-01-25
	The --extra-index-url line must appear before the dependency in this file.
*	GUI: Add ability to choose button	yum	2023-01-25
	We use a button to start/stop transcription. Previously this was hardcoded to the left joystick. Now users can pick from {left, right} x {joystick, a, b}.
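	The choice set is the Cartesian product of hands and inputs, six options in total. A sketch with itertools; the string format of the option names is hypothetical.

```python
from itertools import product

HANDS = ("left", "right")
INPUTS = ("joystick", "a", "b")

# Hypothetical option names: every (hand, input) pair.
BUTTON_CHOICES = [f"{hand}_{inp}" for hand, inp in product(HANDS, INPUTS)]
DEFAULT_BUTTON = "left_joystick"  # the previously hardcoded choice

print(BUTTON_CHOICES)
```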
*	Use requirements.txt for Scripts/	yum	2023-01-25
	This seems to be the canonical way of listing a Python app's dependencies.
	* Installing dependencies no longer hangs the GUI
*	Enable using built-in chatbox [v0.3]	yum	2023-01-22
	VRChat exposes a built-in chatbox which can be seen by anyone who has it enabled. This was not the case when I started this project: the chatbox would only be visible to friends. Since this is clearly useful (it makes the STT usable on public models), let's enable sending data to it.
	Caveats:
	* The built-in chatbox has anti-spam tech which limits us to updating about once every 2 seconds. The custom chatbox has no such limitation and is thus typically much faster.
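	One way to respect a roughly 2-second update limit is a client-side throttle that drops updates arriving too soon after the last one sent. This is a sketch, not TaSTT's code; it assumes each update carries the full current text, so a dropped update only costs latency. The injectable clock exists to make it testable.

```python
import time


class Throttle:
    """Allow at most one send per min_interval_s seconds."""

    def __init__(self, min_interval_s=2.0, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self.clock = clock
        self.last_sent = None

    def should_send(self):
        now = self.clock()
        if self.last_sent is None or now - self.last_sent >= self.min_interval_s:
            self.last_sent = now
            return True
        return False  # too soon: skip this update


throttle = Throttle(2.0)
if throttle.should_send():
    print("send the latest transcript to the built-in chatbox")
```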
*	Bugfix: user-provided paths may now contain spaces	yum	2023-01-04
	Previously, paths containing spaces would be interpreted by Python's argument parser as multiple separate arguments, causing it to fail. Now we escape paths inside PythonWrapper using std::quoted().
	* Improve PII filtering. Python output would contain multiple path separators (like C:\\Users\\foo\\), defeating the PII regex.
	* Silence compiler warning in PII filter.
	* Document usability improvements.
	* Transcription layer exponential backoff goes to ~infinity when paused. This is a hack, since we really don't need to transcribe at all when paused, but it lets us keep the code simple. Good enough until the next rewrite.
	* Shader only samples background when necessary.
	* Limit matchStrings() print()s to DEBUG mode
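	A sketch of a PII filter that tolerates doubled separators by matching one or more backslashes (`\\+`) between path components. The pattern and the `<user>` placeholder are hypothetical; the project's actual regex is not shown here.

```python
import re

# Matches "<drive>:\Users\<name>" with single or repeated backslashes,
# e.g. C:\Users\foo\ as well as C:\\Users\\foo\\ from Python reprs.
USER_PATH_RE = re.compile(r"([A-Za-z]:\\+Users\\+)[^\\]+")


def scrub_user_paths(text: str) -> str:
    """Replace the user-name component of Windows profile paths."""
    return USER_PATH_RE.sub(r"\1<user>", text)


print(scrub_user_paths(r"loading C:\\Users\\foo\\venv"))
```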
*	Portability bugfixes	yum	2023-01-01
	* Expose option to run transcription engine on CPU instead of GPU
	* Use embedded git when setting up the Python virtual environment
*	Tweak paging logic [v0.1]	yum	2022-12-31
	Re-paging anything on screen N now causes screens N+1 onward to be completely re-paged. This fixes cases where we go back and draw something at the bottom of the board and it never gets overwritten.
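	The rule reduces to "re-page from the earliest changed screen to the end". A sketch, assuming screens are numbered from 0 and the board has a finite number of them; the function name is hypothetical.

```python
def pages_to_repage(changed_pages, num_pages):
    """Once screen N changes, screens N, N+1, ... are all re-paged."""
    if not changed_pages:
        return []
    first = min(changed_pages)
    return list(range(first, num_pages))


print(pages_to_repage({2}, 5))
```

	Re-paging everything after the earliest change costs extra sync traffic, but it guarantees that no stale text survives further down the board.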
*	Bugfix: regions truncate correctly at page boundaries	yum	2022-12-30
	Boards whose size was an exact multiple of CHARS_PER_SYNC would lose the entire last region.
	* Attempt to fix runaway memory usage of GUI text frames, but this needs more work
*	GUI: Expose transcription window duration	yum	2022-12-30
	Users can pick longer transcription durations for accuracy-critical tasks, or shorter durations for latency-critical tasks.
*	Bugfix: regenerated FX layers now work on uploaded avatars	yum	2022-12-30
	VRChat won't update the FX layer associated with an avatar unless its GUID changes. Delete the GUID file when overwriting our generated FX layer to work around this.
	* Change paging behavior: when a region is updated, we re-page everything that comes after it. This fixes the issue where we go back to update something, then jump back to the current screen, leaving some random chunk of text somewhere on the board.
	* Reduce transcription window from 28s to 10s. I'm going to expose this to the user since there's a fundamental latency/stability tradeoff here.
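	A sketch of the GUID workaround, assuming the "GUID file" is Unity's `.meta` sidecar that sits next to each asset (Unity assigns a new GUID when it imports an asset whose `.meta` is missing). The file name in the example and both helper names are hypothetical.

```python
from pathlib import Path


def meta_path_for(asset_path) -> Path:
    """Unity keeps an asset's GUID in '<asset>.meta' beside it (assumption)."""
    path = Path(asset_path)
    return path.with_name(path.name + ".meta")


def overwrite_fx_layer(asset_path, contents: str) -> None:
    """Write a regenerated FX layer and drop its .meta so it gets a fresh GUID."""
    meta_path_for(asset_path).unlink(missing_ok=True)  # Python 3.8+
    Path(asset_path).write_text(contents, encoding="utf-8")


print(meta_path_for("Animators/TaSTT_fx.controller"))
```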
*	Fine-tune transcription	yum	2022-12-30
	Bump up the recording window to 28 seconds. This helps a lot with long-form transcription tasks, such as transcribing an audiobook. We should expose this as a parameter, since at 10s the transcription delay is typically 300ms, while at 28s it's typically 1.1-1.2s.
*	GUI: Users can now control board dimensions	yum	2022-12-29
	Users can now control how many letters wide and tall the board is. Tested at 4x48, 5x60, 10x120, and 20x240. At 20x240, Unity freezes and does not make forward progress. Perhaps creating 4800 float parameters isn't a truly scalable interface.
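	The parameter count grows with the board area (rows x cols), which is where the 4800 figure for 20x240 comes from. A quick check of the tested sizes, assuming one float parameter per letter cell:

```python
def float_params(rows: int, cols: int) -> int:
    """One float parameter per letter cell (assumption based on the 20x240 note)."""
    return rows * cols


for rows, cols in [(4, 48), (5, 60), (10, 120), (20, 240)]:
    print(f"{rows}x{cols}: {float_params(rows, cols)} float parameters")
```

	Doubling both dimensions quadruples the count, so going from 10x120 (1200) to 20x240 (4800) is a 4x jump.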
*	Add Scripts/generate_shader.py	yum	2022-12-29
	Now it's possible to generate shaders with a custom number of rows, columns, and bytes per character. All edits to the shader should go through TaSTT_template.shader. To generate a new shader from the template:
	$ ./Scripts/generate_shader.py \
	    --bytes_per_char 2 \
	    --rows 1 \
	    --cols 12 \
	    --shader_template $(pwd)/Shaders/TaSTT_template.shader \
	    --shader_path $(pwd)/Shaders/TaSTT.shader
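	For reference, the flags in the generate_shader.py command above could be declared with argparse roughly as follows. This is a hypothetical parser that only mirrors the documented flags; the real script's defaults and help text may differ.

```python
import argparse


def parse_args(argv=None):
    """Hypothetical parser mirroring the generate_shader.py flags."""
    parser = argparse.ArgumentParser(description="Generate TaSTT.shader from a template.")
    parser.add_argument("--bytes_per_char", type=int, required=True)
    parser.add_argument("--rows", type=int, required=True)
    parser.add_argument("--cols", type=int, required=True)
    parser.add_argument("--shader_template", required=True,
                        help="path to TaSTT_template.shader")
    parser.add_argument("--shader_path", required=True,
                        help="where to write the generated shader")
    return parser.parse_args(argv)
```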
*	GUI: preview number of parameter bits the config will use	yum	2022-12-29
	Users can now see the number of avatar parameter bits they'll use prior to committing.
*	First letter no longer disappears	yum	2022-12-29
	An off-by-one issue in numRegions() would result in one extra layer trying to drive a letter in the last region, which would wrap back around to the 0th character slot (cell).
	* GUI explicitly logs when it's done generating avatar stuff
	* OSC layer no longer tries to update cells which don't exist
*	Users can disable local beep	yum	2022-12-29
	The transcription engine beeps when you start/stop transcribing so you know that it's listening. Users can now disable this.
	* Add help text to all input fields in GUI
	* Make TaSTT generated file textctrls read-only, since I haven't tested them being reassigned
	* Document idea to configure Unity & transcription apps with config files
	* Controller input thread no longer crashes if SteamVR isn't running; it just slowly spins and waits
	* When you stop transcribing, the transcription engine re-transcribes a few times. I think this should improve end-of-transcription tail latencies
	* transcribe.py now prints out its args
*	Encapsulate paging & text wrapping logic	yum	2022-12-27
	Define proper interfaces for these things. Simplify osc_ctrl, temporarily dropping support for emotes (they were broken anyway).
	* Bugfix: Japanese no longer crashes transcribe.py, but it still doesn't show up in the wxTextCtrl
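	A sketch of separating text wrapping from paging: wrap to the board width first, then cut the wrapped lines into screens of `rows` lines. It uses the standard-library textwrap module, which measures width in code points and so does not account for double-width CJK glyphs; the function name is hypothetical.

```python
import textwrap


def paginate(text: str, rows: int, cols: int):
    """Wrap text to `cols` letters per line, then group lines into screens."""
    lines = textwrap.wrap(text, width=cols) or [""]
    return [lines[i:i + rows] for i in range(0, len(lines), rows)]


for screen in paginate("the quick brown fox jumps over the lazy dog", rows=2, cols=10):
    print(screen)
```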
*	Bugfix: don't use last region if it's partial	yum	2022-12-25
	Because we allow users to customize the number of sync params, the board is no longer divided into regions of uniform size. When the last region is a different size than the rest, we simply omit it from paging. This is a hack, but it's easy to reason about. Of course the entire paging stack should be rewritten, but not today.
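	Omitting a partial last region amounts to using floor division for the region count. A sketch, assuming regions are contiguous runs of chars_per_sync cells starting at cell 0; the function name is hypothetical.

```python
def paging_regions(board_cells: int, chars_per_sync: int):
    """Return (start, end) cell ranges, dropping a partial last region."""
    full = board_cells // chars_per_sync
    return [(i * chars_per_sync, (i + 1) * chars_per_sync) for i in range(full)]


# A 4x48 board has 192 cells; at 10 chars per sync, 19 full regions fit
# and the last 192 % 10 = 2 cells are never paged.
print(len(paging_regions(4 * 48, 10)))
```

	When the board size is an exact multiple of chars_per_sync, every region is full and none should be dropped.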
*	Make transcription sleeps interruptible	yum	2022-12-24
	This reduces the expected delay to wake up the board & start transcribing from 750 milliseconds to 2.5 milliseconds.
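	A common way to make a sleep interruptible in Python is `threading.Event.wait(timeout)`, which returns as soon as another thread sets the event. This is a sketch of that technique; the class name is hypothetical and TaSTT's own implementation may differ.

```python
import threading


class InterruptibleSleeper:
    def __init__(self):
        self._wake = threading.Event()

    def sleep(self, seconds: float) -> bool:
        """Sleep up to `seconds`; return True if woken early by wake()."""
        woken = self._wake.wait(timeout=seconds)
        # Note: a wake() landing between wait() returning and clear() is lost.
        self._wake.clear()
        return woken

    def wake(self) -> None:
        """Called, e.g., from the controller thread when the user unpauses."""
        self._wake.set()
```

	Because the transcription loop blocks on the event instead of time.sleep(), the wake-up delay depends on thread scheduling rather than on the current backoff interval.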
*	GUI: expose chars per sync, bytes per char	yum	2022-12-24
	Users can now control how many characters they send per sync event, as well as the number of bytes used to represent each character. This lets them trade faster paging against fewer sync params. International users must use 2 bytes per char (at least for now).
	* package.ps1: don't distribute the gigantic TTF files, just the bitmaps
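	A sketch of the trade-off, assuming each byte of a synced character travels in one 8-bit avatar parameter and that a character is an index into the font's glyph table. Both are assumptions for illustration, not confirmed details of TaSTT's encoding.

```python
def sync_bits(chars_per_sync: int, bytes_per_char: int) -> int:
    """Parameter bits for the letter payload, assuming 8 bits per byte-parameter."""
    return chars_per_sync * bytes_per_char * 8


def fits(glyph_index: int, bytes_per_char: int) -> bool:
    """1 byte covers 256 glyphs; 2 bytes cover 65536 (enough for CJK)."""
    return 0 <= glyph_index < 256 ** bytes_per_char


print(sync_bits(4, 1), sync_bits(4, 2))
```

	Doubling bytes_per_char doubles the bits per sync event, so international users pay for the larger glyph range with either more parameters or fewer characters per sync.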
*	Quick hack: don't exponentially back off when unpaused	yum	2022-12-22
	This fixed some slowness I was seeing when waking up the STT. The right fix is to add interruptible sleeps. Let's fix this soon.
*	Control tweak: introduce long/short hold behavior	yum	2022-12-20
	The typical use pattern is now possible without entering the radial menu. Leaving the board mounted to the world for a long time is no longer possible. Maybe I need an override param?
	Left joystick controls:
	* Short press toggle 1: show board, lock to hand, start transcribing
	* Short press toggle 2: lock to world, stop transcribing
	* Long press: hide board, stop transcribing
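	The control scheme can be modeled as a small state machine keyed on how long the button was held. A sketch: the 0.5s threshold, classifying the press on release, and resetting to toggle 1 after a long press are all assumptions.

```python
LONG_PRESS_S = 0.5  # hypothetical threshold; the real value isn't given


class BoardControl:
    def __init__(self):
        self.visible = False
        self.locked_to_hand = False
        self.transcribing = False
        self._next_is_toggle_2 = False

    def on_release(self, held_s: float) -> None:
        if held_s >= LONG_PRESS_S:
            # Long press: hide board, stop transcribing.
            self.visible = False
            self.transcribing = False
            self._next_is_toggle_2 = False
        elif not self._next_is_toggle_2:
            # Short press toggle 1: show board, lock to hand, start transcribing.
            self.visible = self.locked_to_hand = self.transcribing = True
            self._next_is_toggle_2 = True
        else:
            # Short press toggle 2: lock to world, stop transcribing.
            self.locked_to_hand = False
            self.transcribing = False
            self._next_is_toggle_2 = False
```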
*	Bugfix: animators may now include Unicode characters	yum	2022-12-20
	Completed first end-to-end test on a third party avatar :)
*	GUI: "Finish" avatar generation workflow	yum	2022-12-20
	GUI now generates parameters & menu. Still need to handle write defaults.
	* Add capability to append to avatar parameters & menu
	* Install canned Unity assets, shaders, and fonts in avatar folder
	* Check in materials for ease of use
	* Bugfix: correctly label menu/parameters file pickers
*	GUI can now generate animator	yum	2022-12-20
	Still need to generate params & merge menus. Getting close...
*	GUI: Begin work generating animator	yum	2022-12-20
	The GUI can now generate guid.map and animations.
*	Add ability to select model	yum	2022-12-18
	* Icon now works when pinned to taskbar
	* Add model selection
	* Add script to dump mic devices
	* Whisper models now download into the virtual environment
*	GUI: Add mic, language selection	yum	2022-12-18
	Users can now select their mic & spoken language in the GUI.
	* pyaudio now samples at the mic rate, fixing an issue where frames would drop. We downsample in the callback by dropping frames.
	* Add Sounds folder to package
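	Downsampling by dropping frames means keeping every Nth sample. Whisper models expect 16 kHz mono audio, so a 48 kHz mic keeps every 3rd sample. A sketch that only handles integer rate ratios; the function name is hypothetical.

```python
WHISPER_RATE = 16000  # Whisper models expect 16 kHz mono audio


def decimate(samples, mic_rate: int, target_rate: int = WHISPER_RATE):
    """Downsample by keeping every Nth sample and dropping the rest."""
    if mic_rate % target_rate:
        raise ValueError("sketch only handles integer rate ratios")
    step = mic_rate // target_rate
    return samples[::step]


print(decimate(list(range(12)), 48000))
```

	Dropping samples without a low-pass filter first can alias high frequencies into the speech band; it's cheap enough for a real-time callback, which is presumably why it's acceptable here.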
*	Finish python virtual env	yum	2022-12-17
	GUI can now download all TaSTT dependencies and install them into a virtual environment.
	* Add buttons to check embedded Python version & install dependencies
	* Add class to wrap interacting with embedded Python
	* Put all TaSTT Python scripts into a folder