TaSTT.git - Free self-hosted STT for VRChat.

	Commit message	Author	Age
*	Update README.md [v0.0]	yum	2022-12-22
|
*	Quick hack: don't exponentially back off when unpaused	yum	2022-12-22
|	This fixed some slowness I was seeing when waking up the STT. The right fix is to add interruptible sleeps. Let's fix this soon.
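The "interruptible sleep" mentioned above could look roughly like the sketch below. It is not taken from the TaSTT source: `InterruptibleSleeper` and `backoff_delays` are hypothetical names, and the approach (waiting on a `threading.Event` with a timeout instead of calling `time.sleep`) is one common way to do this, assumed here for illustration.

```python
import threading
import time


class InterruptibleSleeper:
    """Hypothetical helper: a sleep that another thread can cut short."""

    def __init__(self):
        self._wake = threading.Event()

    def sleep(self, seconds: float) -> bool:
        """Block for up to `seconds`. Return True if woken early."""
        woken = self._wake.wait(timeout=seconds)
        self._wake.clear()
        return woken

    def wake(self) -> None:
        """Called from another thread, e.g. when the user unpauses."""
        self._wake.set()


def backoff_delays(initial: float, factor: float, maximum: float):
    """Yield exponentially growing delays, capped at `maximum`."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * factor, maximum)
```

With something like this, the polling loop can keep its exponential backoff while paused, because `wake()` ends the current delay immediately instead of waiting for it to expire.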
*	Don't delete TaSTT_Generated	yum	2022-12-21
|	This makes incremental workflows much more efficient, since you don't have to reassign the FX controller, params, and menu.
*	Add shader toggles	yum	2022-12-21
|	* Fix shader background rendering
|	* Add ability to control margin size
|	* Add ability to disable speech indicator
*	GUI: Add better logging interface	yum	2022-12-21
|	Create printf-like interface for writing to wxTextCtrl objects. Also mask out PII. I wanted a way to not dox myself when recording demos, but I wound up making a second user on my PC to serve the same purpose. Maybe I'll delete the code later idk.
*	Control tweak: introduce long/short hold behavior	yum	2022-12-20
|	The typical use pattern is now possible without entering radial. Leaving mounted to the world for a long time is no longer possible. Maybe I need an override param?
|
|	Left joystick controls:
|	* Short press toggle 1: show board, lock to hand, start transcribing
|	* Short press toggle 2: lock to world, stop transcribing
|	* Long press: hide board, stop transcribing
*	Bugfix: animators may now include Unicode characters	yum	2022-12-20
|	Completed first end-to-end test on a third party avatar :)
*	Check in `World Constraint.prefab`	yum	2022-12-20
|	Can simply drag this into hierarchy & update reset target.
*	GUI: "Finish" avatar generation workflow	yum	2022-12-20
|	GUI now generates parameters & menu. Still need to handle write defaults.
|
|	* Add capability to append to avatar parameters & menu
|	* Install canned Unity assets, shaders, and fonts in avatar folder
|	* Check in materials for ease of use
|	* Bugfix: correctly label menu/parameters file pickers
*	GUI can now generate animator	yum	2022-12-20
|	Still need to generate params & merge menus. Getting close....
*	GUI: Begin work generating animator	yum	2022-12-20
|	The GUI can now generate guid.map and animations.
*	GUI: Fix transcription output	yum	2022-12-19
|	Output now shows up in the textbox in ~real time. We do this by disabling Python's output buffering. This has a performance impact, but it should be negligible.
|
|	* Fix crash when setting up python environment
|	* UI tweak: text displays now expand with window
|	* Fix how we merge transcribe.py; usually don't have to resort to SIGKILL, which loses stdout/stderr.
*	GUI: Improve error logging	yum	2022-12-19
|	PythonWrapper correctly captures wxProcess stdout & stderr in sync and async execution modes.
*	GUI: Sketch out Unity panel	yum	2022-12-19
|	Now there are two panels: one to run transcription, one to generate avatar assets. Also, getting mics & python version can no longer crash the app.
*	Now it's possible to build the app from Powershell	yum	2022-12-18
|	No more WSL dependencies!
*	Add resource file header	yum	2022-12-18
|
*	Add ability to select model	yum	2022-12-18
|	* icon now works when pinned to taskbar
|	* add model selection
|	* add script to dump mic devices
|	* whisper models now download into the virtual environment
*	GUI: Add mic, language selection	yum	2022-12-18
|	Users can now select their mic & spoken language in the GUI.
|
|	* pyaudio now samples at the mic rate, fixing an issue where frames would drop. We downsample in the callback by dropping frames.
|	* add Sounds folder to package
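Downsampling by dropping frames, as described above, can be sketched as follows. This is an illustration only: `downsample_by_dropping` is a hypothetical name, the samples are shown as a plain Python list rather than the raw bytes a pyaudio callback receives, and the 16000 Hz target in the usage note is an assumption based on the rate Whisper models expect.

```python
def downsample_by_dropping(samples, mic_rate: int, target_rate: int):
    """Keep roughly every (mic_rate / target_rate)-th sample and drop the rest.

    No low-pass filter is applied, so content above target_rate / 2 will
    alias. This trades quality for simplicity and speed.
    """
    if target_rate > mic_rate:
        raise ValueError("target_rate must not exceed mic_rate")
    n_out = len(samples) * target_rate // mic_rate
    # Integer arithmetic keeps the picked indices evenly spaced even when
    # the ratio is not a whole number (e.g. 44100 -> 16000).
    return [samples[i * mic_rate // target_rate] for i in range(n_out)]
```

For example, `downsample_by_dropping(list(range(6)), 48000, 16000)` keeps samples 0 and 3.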
*	GUI: Add ability to start & stop transcription engine	yum	2022-12-17
|
*	Finish python virtual env	yum	2022-12-17
|	GUI can now download all TaSTT dependencies and install them into a virtual environment.
|
|	* Add buttons to check embedded python version & install dependencies
|	* Add class to wrap interacting with embedded Python
|	* Put all TaSTT python scripts into a folder
*	Check in `future` package	yum	2022-12-17
|	I hit some issues installing Whisper and had to embed this package. I haven't taken the time to deeply understand what's going on. I think that embedded Python follows different rules about resolving module paths than regular system Python.
|
|	Basically, `future`'s setup.py has a line like `import src`, where `src` is a module inside future (like `future/src/__init__.py`). This doesn't work unless we put that directory on the search path.
*	Downgrade to Python 3.10.9	yum	2022-12-17
|	Whisper needs Python < 3.11.
*	Document embedded venv hack	yum	2022-12-16
|	Check in pip & modify embedded python to install to Lib and Lib/site-packages. Experimentally, packages may be installed with pip and do reside in Lib/site-packages. Hard to tell if this is also touching files outside the venv.
*	Check in python 3.11	yum	2022-12-16
|	License is included in source & distributable package.
*	Refactor app	yum	2022-12-16
|	Create headers & implementation files for App and Frame.
*	Add logo	yum	2022-12-16
|	* GUI now shows logo
|	* Add package.ps1 to generate distributable application bundle
|	* Rename ~GUI to GUI
|	* Add ScopeGuard class
*	Add GUI hello world	yum	2022-12-15
|	Literally just the wxWidgets hello world. ~GUI is named that way to prevent Unity from generating .meta files. Build instructions in ~GUI/README.md.
*	Optimize transcription latency	yum	2022-12-14
|	Shave off ~500ms due to locking. Acquiring a threading.Lock takes hundreds of milliseconds and the global interpreter lock already takes care of most crashy race conditions, so just remove the locks.
|
|	Avoid writing audio to disk, saving more time (and disk wear / IOPS).
|
|	Add basic profiling to transcribe().
|
|	Omit timestamps, since we don't use them (maybe we should!)
|
|	Shorten noise indicators to 350ms
|
|	The whisper behavior where it repeats tokens causes certain transcriptions to take many seconds. I haven't thought about how to fix this, yet.
*	Update README.md	yum	2022-12-01
|	Also decrease sync params & add a few more emotes.
*	Add emotes	yum	2022-11-26
|	Add emotes.py. It accepts a list of images and creates a texture with 64 total embedded images. The shader knows how to draw these into fixed 6-character-wide slots. Each slot must be aligned to a 6-character boundary. osc_ctrl has to pad with spaces to make this work.
|
|	This whole patch is a little more complicated than it has any right to be, but my brain feels fuzzy and I don't know where to start fixing it, so I'm going to leave it shitty-but-functional for now.
|
|	There's also some bug where writing a character into the 11th slot causes it to show up at the end of the board. I'll figure that out later, idk.
|
|	I didn't include any of the emotes I use since I couldn't find any info on their licenses. I'm just banking on having a good workflow later on so people can add their own.
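The space padding described above (aligning each emote to a 6-character boundary) might look like this minimal sketch. `pad_to_slot_boundary` and `SLOT_WIDTH` are hypothetical names. How osc_ctrl actually encodes the emote itself is not shown in the commit message and is not modeled here.

```python
SLOT_WIDTH = 6  # emote slots are 6 characters wide, per the commit message


def pad_to_slot_boundary(text: str, slot_width: int = SLOT_WIDTH) -> str:
    """Pad `text` with spaces so the next character lands on a slot boundary."""
    remainder = len(text) % slot_width
    if remainder == 0:
        return text  # already aligned
    return text + " " * (slot_width - remainder)
```

For example, `pad_to_slot_boundary("hello")` returns `"hello "`, so an emote written next would start at cell 6.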
*	Add on/off sound indicator (local)	yum	2022-11-25
|	Now we have a visual and auditory indicator for transcription. The auditory indicator is only heard by the user, and can be used to reset the state of the board prior to displaying.
*	Add scaling capability	yum	2022-11-25
|	Text box may be scaled up and down now.
*	Code cleanup	yum	2022-11-25
|	Reorganize locations, remove a couple unused parameters.
*	Tweak speech indicator	yum	2022-11-23
|	Use a single indicator with 3 states:
|	1. green: actively speaking
|	2. orange: waiting for paging
|	3. red: up-to-date
|
|	Use slightly nicer colors.
*	Shorten audio window to 10 seconds	yum	2022-11-22
|	This helps with temporal stability in long-running transcriptions, and lets us get rid of that hack where we refuse to update old pages.
*	Update STT demo	yum	2022-11-22
|	Zero mistranscribed words. One minor hiccup caused by instability in the (very long) transcription. I think the paging indicator is also buggy.
*	Fix audio bug	yum	2022-11-22
|	Coarse locking was causing audio frames to drop, severely degrading transcription quality. We really need a spoken word integration test.
*	Rework input controls	yum	2022-11-22
|	Press joystick once to start recording, again to stop. When you start recording, any previous text on the board is cleared.
|
|	Add 2 visual indicators: one to indicate speech, another to indicate that audio is paging.
*	Begin work on obfuscation	yum	2022-11-17
|	The basic idea is that we can raise the barrier to entry for potential data miners by encrypting traffic with a pre-shared key. Any data miner would probably have access to both the compiled shader and network data, which is obviously sufficient to decrypt that data. But they would have to spend a little time figuring it out, which should defeat most casual miners.
*	Tweak transcription again	yum	2022-11-16
|	Works a little better on longer transcriptions while maintaining the same improved performance on short transcriptions. We really need a benchmark to evaluate performance mechanically.
*	Another transcription rework	yum	2022-11-14
|	After re-reading the paper, I noticed that they apply a couple optimizations I wasn't using. Use the top-level `whisper.transcribe` method, which is a little slower, but more accurate than the one I was using.
|
|	Although this method is slower, it has better temporal stability due to the increased quality, which I think should make for an overall more responsive UX. Lower transcription quality means the paging layer has to waste time updating earlier cells.
|
|	Also, drop the auto-commit stuff and go back to string stitching. I think it's better to let the user manually commit. A rework of the hand controls is probably coming soon.
|
|	Finally, update README.
*	Fix reset button	yum	2022-11-12
|	Board would lock up if you reset after the first page. osc_ctrl.clear() was assigning the wrong member :)
|
|	Tweak continuous transcription logic: now we only commit if the transcription remains identical for N seconds.
*	Clicking the left joystick resets the board.	yum	2022-11-12
|	* Increase no speech probability threshold. This is what was preventing short transcriptions from working. We rely more on the avg logprob filter now.
|	* Remove string matching logic from transcribe. Now when we get 2 consecutive identical transcriptions, we commit the transcription. This could cause words to get cut off but in practice it doesn't seem to happen.
|	* Fix steamvr joystick click detection. Moving the joystick would also fire the event, which is not correct.
|	* Combine locks in transcribe.py.
|	* Remove "clear" vocal control.
|	* osc_ctrl.clear() resets last_message_encoded
|	* Remove osc_ctrl.sendMessage (unused)
*	Add capability to listen for controller inputs	yum	2022-11-12
|	Add steamvr.py, which listens for the left-hand joystick being clicked. Simply call pollButtonPress() and check if it returns RISING_EDGE or FALLING_EDGE. Does not block if there are no events.
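The rising/falling edge reporting described above can be illustrated without any SteamVR dependency. In the sketch below, `Edge` and `ButtonEdgeDetector` are hypothetical names, and the pressed/released state is passed in as a boolean. How steamvr.py actually reads the controller is not shown in the commit message.

```python
from enum import Enum


class Edge(Enum):
    NONE = 0
    RISING_EDGE = 1   # button went from released to pressed
    FALLING_EDGE = 2  # button went from pressed to released


class ButtonEdgeDetector:
    """Hypothetical helper: turn successive button states into edge events."""

    def __init__(self):
        self._was_pressed = False

    def poll(self, is_pressed: bool) -> Edge:
        """Compare the new state with the previous one. Never blocks."""
        if is_pressed and not self._was_pressed:
            edge = Edge.RISING_EDGE
        elif self._was_pressed and not is_pressed:
            edge = Edge.FALLING_EDGE
        else:
            edge = Edge.NONE
        self._was_pressed = is_pressed
        return edge
```

Because only state changes produce an event, holding the button down yields a single `RISING_EDGE` followed by `NONE` until release.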
*	License scrub	yum	2022-11-10
|	Begin auditing dependencies' licenses.
*	Update fonts	yum	2022-11-08
|	English, Japanese, Chinese, and Korean should look much better now. French, German, and Spanish look like shit now, because I haven't figured out how to best make Noto Sans stay within its bounding box.
|
|	* Use Noto Sans for most things
|	* Simplify how we enable unicode blocks & assign fonts to them
|	* Increase string matching window to 300. Works better in real-world test.
*	Fix matchStrings O(n^2) loop	yum	2022-11-07
|	This slides 2 windows across input strings, looking for a region where they are most similar. It then uses that region to stitch the strings together. Since transcribe.py passes in a continuous transcription as the `old_text` argument, we can wind up spending a lot of time here.
|
|	Constrain the area of the `old_text` argument that we look at to the most recent 50 characters. This should be good enough.
|
|	Also fix how we calculate levenshtein_distance. Uh... yeah, let's not talk about how it was before.
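The window-sliding stitch described above might be sketched as follows. This is not the project's `matchStrings`: the function name `stitch`, the 8-character window, and the tie-breaking rule (first best match wins) are assumptions. Only the 50-character lookback on `old_text` and the use of Levenshtein distance come from the commit message.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def stitch(old_text: str, new_text: str, window: int = 8, lookback: int = 50) -> str:
    """Join two overlapping transcriptions at their most similar region."""
    window = min(window, len(old_text), len(new_text))
    if window == 0:
        return old_text + new_text
    # Only the tail of old_text is searched, which bounds the outer loop
    # no matter how long the running transcription gets.
    start = max(0, len(old_text) - lookback)
    best = None  # (distance, index in old_text, index in new_text)
    for i in range(start, len(old_text) - window + 1):
        for j in range(len(new_text) - window + 1):
            d = levenshtein_distance(old_text[i:i + window], new_text[j:j + window])
            if best is None or d < best[0]:
                best = (d, i, j)
    _, i, j = best
    return old_text[:i] + new_text[j:]
```

For example, stitching `"the quick brown fox jumps"` with `"brown fox jumps over the lazy dog"` gives `"the quick brown fox jumps over the lazy dog"`.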
*	Fix font clipping bug	yum	2022-11-07
|	When fonts completely fill a slot, any pixel touching a perimeter border gets stretched due to clamping. To avoid this, add a 2% margin around each slot.
*	Add generate.py	yum	2022-11-07
|	Generates a string with every character starting from a minimum. Useful for testing paging and font issues.
*	Fix osc_ctrl diffing	yum	2022-11-07
|	Now we actually maintain an ongoing buffer with what we think is on the display. When we send a new cell, we update only that cell instead of overwriting the entire prefix up to that cell.
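The per-cell diffing described above can be sketched like this. `BoardDiffer` and the `send_cell` callback are hypothetical. In the real osc_ctrl the send step goes out over OSC, which is not modeled here, and the board is simplified to a single page of fixed size.

```python
class BoardDiffer:
    """Hypothetical helper: track what the board shows, send only changes."""

    def __init__(self, num_cells: int, send_cell):
        self._shown = [" "] * num_cells  # our belief about the display
        self._send_cell = send_cell      # assumed callback: (index, char) -> None

    def update(self, text: str) -> int:
        """Send only the cells that differ from the buffer. Return the count sent."""
        n = len(self._shown)
        padded = text[:n].ljust(n)  # truncate or pad with spaces to board size
        sent = 0
        for idx, ch in enumerate(padded):
            if self._shown[idx] != ch:
                self._send_cell(idx, ch)
                self._shown[idx] = ch
                sent += 1
        return sent
```

Changing the board from `"hello"` to `"help"` then sends two cells (index 3 and index 4) rather than rewriting the whole prefix.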