TaSTT.git - Free self-hosted STT for VRChat.

	Commit message	Author	Age
*	Add speech-to-text demo	yum	2022-11-04
\|
*	Reduce sync rate from 10 Hz to 5 Hz	yum	2022-11-03
\| \| \| \| \| \| \| \|	Also reduce the number of syncs per cell from 3 to 2. Thus the effective sync rate went from (10 / 3 == 3.33 Hz) to (5 / 2 == 2.5 Hz). This comes at the cost of a degraded UX: updating a cell temporarily shows the contents of the previous cell.
*	Improve transcription quality	yum	2022-11-01
\| \| \| \| \| \| \| \| \| \|	Apply heuristics described in whisper paper. Dramatically improve silence detection as well as overall transcription quality. I was able to read the entire demo script at speed without any serious transcription inaccuracies. Field testing is TODO.
*	Combine 4 boolean select parameters into one	yum	2022-11-01
\| \| \| \| \|	Should further improve reliability, especially in laggy environments. We'll see!
*	Fix bug where some text would show up after saying 'Clear'	yum	2022-11-01
\|
*	Update README	yum	2022-10-30
\|
*	Reduce total # of select bits from 44 to 4	yum	2022-10-30
\| \| \| \| \| \| \| \| \|	The board is divided into 16 regions. We select the region to be updated by updating 4 boolean parameters. We used to define 4 parameters per layer. Now we just have 4 params total, which affect every layer.
	Total param memory: 142 bits -> 102 bits
	Params updated per region update: 56 -> 16
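A minimal sketch of how 4 boolean parameters can address 16 regions: each boolean is treated as one bit of the region index. The function names and the least-significant-bit-first ordering are assumptions for illustration; the commit does not say how the bits are ordered.

```python
def region_to_select_bits(region: int, num_bits: int = 4) -> list[bool]:
    """Encode a region index as booleans, least significant bit first (assumed order)."""
    if not 0 <= region < (1 << num_bits):
        raise ValueError(f"region must be in [0, {1 << num_bits})")
    return [bool((region >> i) & 1) for i in range(num_bits)]


def select_bits_to_region(bits: list[bool]) -> int:
    """Decode the booleans back into a region index."""
    return sum(1 << i for i, bit in enumerate(bits) if bit)


if __name__ == "__main__":
    for region in range(16):
        print(region, region_to_select_bits(region))
```

With 4 bits shared across all layers, the select cost no longer scales with the number of layers, which matches the 44 -> 4 reduction in the commit title.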
*	Disable mipmaps on board texture	yum	2022-10-27
\| \| \| \| \|	This fixes the faint outline issue at close range (!) at the cost of making it less legible from far away.
*	Flip text in mirror	yum	2022-10-27
\| \| \| \| \| \| \| \| \|	Use some of pema99's tricks described in their 'shader-knowledge' repo (MIT license).
	* Text is now readable in mirrors
	* GetLetterParameter() now uses a jump table instead of a ton of `if` statements
*	Change board size	yum	2022-10-27
\| \| \| \| \| \| \|	It's now twice as wide and half as tall.
	* Add small margin to board
	* Add simple backplate shader
*	Tweak continuous transcription	yum	2022-10-27
\| \| \| \| \|	Stitching now uses a 6-word sliding window instead of a 4-word one. Seems to dramatically improve transcription quality.
*	Add 'over' keyword	yum	2022-10-27
\| \| \| \| \| \| \|	When the user says 'over', the board will stop displaying new transcriptions until the user says 'clear'.
	* Remove the control thread from transcribe.py
*	Add fast clear animation	yum	2022-10-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The old clear mechanism would write an empty cell in every layer, which would take (0.3 seconds) * (11 layers) == about 3 seconds. The new mechanism drives an animation which overwrites every character slot simultaneously, taking only 0.1 seconds. A nice ~30x speedup.
	* Fix the transcription exponential backoff logic. Saying new things will reset the delay to the minimum again.
	* Clearing the board will also reset the transcription delay back to the minimum.
	* Tune the noise detection minimum to 0.2 instead of 0.1. Speaking softly into the mic seems to fail to exceed the 0.1 threshold pretty often.
*	De-scuff continuous transcription	yum	2022-10-25
\| \| \| \| \| \| \| \| \| \|	Transcription stitching now occurs in word space, rather than in text space. This avoids problems where we accidentally duplicate or delete letters in the middle of words. Factor out stitching into its own module and add a small handful of test cases. Hopefully if we hit problems in production, we can just grow this list and avoid regressions if we reimplement.
*	Tweak transcription heuristics	yum	2022-10-25
\| \| \| \| \|	The heuristics now occur in the filtered word space, so punctuation and casing changes won't confound them.
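A minimal sketch of what a "filtered word space" could look like, assuming filtering means lowercasing each word and stripping surrounding punctuation. `filter_words` is a hypothetical name, not a function taken from transcribe.py.

```python
import string


def filter_words(text: str) -> list[str]:
    """Split text into words, dropping case and surrounding punctuation."""
    words = []
    for token in text.split():
        word = token.strip(string.punctuation).lower()
        if word:  # skip tokens that were punctuation only
            words.append(word)
    return words


if __name__ == "__main__":
    # Both spellings map to the same word list, so a comparison in this
    # space ignores punctuation and casing differences.
    print(filter_words("Hello, world!"))
    print(filter_words("hello world"))
```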
*	Add exponentially longer sleeps to transcribe loop	yum	2022-10-25
\| \| \| \| \| \| \| \| \| \| \|	When the user pauses their speech for an extended period of time, the transcription engine will sleep for progressively longer intervals, up to 1.5 seconds between transcriptions. This allows us to reduce idle resource consumption. To enable responsive transcription while the user is speaking actively, we reset the sleep duration to the minimum whenever a change is detected.
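The backoff described above might be sketched as follows. Only the 1.5 second ceiling comes from the commit message; the class name, the 0.1 second minimum, and the doubling factor are assumptions chosen for illustration.

```python
class Backoff:
    """Sleep interval that grows while nothing changes and resets on change."""

    def __init__(self, min_s: float = 0.1, max_s: float = 1.5, factor: float = 2.0):
        self.min_s = min_s    # assumed minimum delay
        self.max_s = max_s    # ceiling stated in the commit message
        self.factor = factor  # assumed growth factor
        self.delay = min_s

    def next_delay(self, changed: bool) -> float:
        """Return the next sleep duration given whether the transcription changed."""
        if changed:
            self.delay = self.min_s
        else:
            self.delay = min(self.delay * self.factor, self.max_s)
        return self.delay


if __name__ == "__main__":
    backoff = Backoff()
    for changed in [False, False, False, False, False, True]:
        print(changed, backoff.next_delay(changed))
```

A transcribe loop would call `time.sleep(backoff.next_delay(changed))` once per iteration, so idle periods cost progressively fewer transcription passes.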
*	Add TaSTT menu	yum	2022-10-25
\| \| \| \|	Use this as a submenu
*	Add toggle to disable beeping	yum	2022-10-25
\|
*	Saying the word "clear" clears the board	yum	2022-10-24
\| \| \| \| \| \| \|	While the board is clearing, you can keep talking, and it will be rendered when the board finishes clearing.
	* bugfix: STT only beeps when it's out
*	STT now beeps when it shows text, and can be locked to world	yum	2022-10-24
\| \| \| \| \| \| \| \| \| \| \|	Empty cells are excluded from the beeping behavior. Note: I have not checked in the prefab with the audio source yet.
	* libtastt gen_fx now adds 3 toggles to FX layer: toggle board, toggle world lock, toggle beep sound
	* libunity guid_map can now append instead of replacing
	* TaSTT_Toggle_{On,Off}.anim now use the prefab path, as do generated animations
*	Add speech noise on/off animations	yum	2022-10-24
\|
*	redo speech noise	yum	2022-10-24
\|
*	add speech noise	yum	2022-10-24
\|
*	Rewrite FX and animation generators	yum	2022-10-23
\| \| \| \| \| \| \| \| \| \|	* Fix bug where facial animations cause already-written letters to change (!!!)
	* Add libtastt.py to hold abstractions layered over libunity
	* Fix
	* libunity: Fix bug where integer equality state transition conditions ignored the threshold
	* libunity: Support placing animator states at different positions
*	Fix fixWriteDefaults duplicate ID error	yum	2022-10-23
\| \| \| \| \|	fixWriteDefaults would assign two documents the same anchor. Unsure why but this fixes it.
*	Update example animator	yum	2022-10-22
\| \| \| \|	Add LH/RH weighted animations.
*	Fix UnityAnimator.merge	yum	2022-10-22
\| \| \| \| \| \| \|	A few changes:
	* we never infer class ID from object ID
	* merged object IDs are allocated in a flat namespace, not in per-class namespaces
*	Reimplement "do nothing" animation	yum	2022-10-20
\| \| \| \|	Enable host armature instead of relying on a TaSTT parameter.
*	Add example FX layer with write defaults off	yum	2022-10-20
\| \| \| \| \|	Simple hands animator that doesn't rely on write defaults. Use libunity gen_off_anims utility to generate per-animation reset animations.
*	Add "off" animation generator	yum	2022-10-20
\| \| \| \| \| \| \| \|	Add utility to programmatically generate "off" animations. Scans every animation in the project, checks if it sets anything nonzero, and if so, generates a copy of it which sets everything to zero. This is useful for transitioning away from write defaults.
*	Add preliminary support for negative anchors	yum	2022-10-20
\| \| \| \| \| \| \| \|	Some animators generate negative anchors. Casting to u64 doesn't produce an anchor with a valid prefix, so idk what it is. Use the class ID from the little !u! bit instead of deriving it from the anchor. Some things probably don't work yet.
*	Quiet down transcribe.py	yum	2022-10-20
\| \| \| \| \| \|	Also adjust continuous transcription algorithm to use leftmost minimum instead of rightmost. This prevents some cases where we generate longer and longer text.
*	Add continuous transcription mode	yum	2022-10-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Algorithm:
	* look at last 20 chars of last committed transcription
	* scan new transcription using 10-char sliding window
	* find spot where distance is minimized
	* stitch two messages together
	Thus we're able to maintain a continuously growing transcription without having to feed the AI more than 30 seconds of data at a time. Seems to work reasonably well in bench tests.
	Also fix silence detection. AI exposes a probability that nothing was said. Hand-pick a probability of 0.1. Sometimes the AI still goes sicko mode with this setting but going higher occasionally results in no transcription.
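A rough sketch of the stitching idea, under several assumptions: the commit does not name the distance metric or say how the 20-char tail and the 10-char window interact, so this version simply uses the last 10 characters of the committed text as the pattern and Hamming distance as the metric. Ties resolve to the leftmost match, which is the behavior a later commit in this log switches to. `stitch` and `hamming` are hypothetical names.

```python
def hamming(a: str, b: str) -> int:
    """Count positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def stitch(committed: str, new: str, window: int = 10) -> str:
    """Append the part of `new` that follows its best match with the tail of `committed`."""
    if len(committed) < window or len(new) < window:
        # Too little text to align; assume the new transcription covers everything.
        return new
    pattern = committed[-window:]
    best_index, best_distance = 0, window + 1
    for i in range(len(new) - window + 1):
        distance = hamming(pattern, new[i:i + window])
        if distance < best_distance:  # strict '<' keeps the leftmost minimum
            best_index, best_distance = i, distance
    return committed + new[best_index + window:]


if __name__ == "__main__":
    print(stitch("the quick brown fox jumps", "brown fox jumps over the lazy dog"))
```

Because only the overlap is used to align the two texts, the committed transcription can keep growing while each new transcription covers just the most recent audio.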
*	Update backlog	yum	2022-10-16
\|
*	Add dev cheatsheet	yum	2022-10-16
\| \| \| \|	Scratch doc containing commands I've been using a lot
*	Programmatically set noop animation	yum	2022-10-16
\| \| \| \| \|	Overwrite any animation containing an unknown GUID to the tastt noop animation. This seems to help the reset layer function properly.
*	Semi-fix gesture reset layer	yum	2022-10-16
\| \| \| \| \| \| \| \| \| \| \| \|	Now we only overwrite gesture parameters if there's no active gesture. This makes gesturing smoother, since we're not overwriting gesture params twice on every frame. Gestures don't reliably reset. I think I need to add the noop animation across the entire animator. No idea what's really causing it. Also factor out code for generating transitions that have parameter conditions. Support exists for boolean and integer equality conditions.
*	Fix a couple unity/YAML bugs	yum	2022-10-16
\| \| \| \| \| \| \| \| \|	* Unity needs empty Mappings to be indicated with {} or it will assume they're a Sequence
	* Unity doesn't like it when we reassign the default animation layer's MonoBehaviour ID, so hack around this by simply reusing the existing MonoBehaviour's ID
	* Use MulticoreUnityParser everywhere
*	Add multicore YAML parser	yum	2022-10-16
\| \| \| \| \| \| \| \|	Divide YAML stream into `nproc` chunks and parse each sub-stream in a process. We can't use threads because of the python global interpreter lock, but processes work pretty well. Parsing my 1.2M line / 43k document YAML goes from 65 seconds to 13.
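The chunk-and-parse approach could be sketched like this. It is not the libunity implementation: the per-chunk "parser" below only extracts the class ID and anchor from each `--- !u!<class> &<anchor>` document header as a stand-in for real parsing, and all function names are hypothetical.

```python
import multiprocessing
import os
import re

# Unity document headers look like "--- !u!91 &9100000"; anchors may be negative.
HEADER = re.compile(r"^--- !u!(\d+) &(-?\d+)")


def split_documents(text: str) -> list[str]:
    """Split a YAML stream at lines that start a new document."""
    docs, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith("--- ") and current:
            docs.append("".join(current))
            current = []
        current.append(line)
    if current:
        docs.append("".join(current))
    return docs


def chunk(docs: list[str], n: int) -> list[list[str]]:
    """Group documents into at most n contiguous chunks of similar size."""
    if not docs:
        return []
    n = max(1, min(n, len(docs)))
    size = -(-len(docs) // n)  # ceiling division
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def parse_chunk(docs: list[str]) -> list[tuple[int, int]]:
    """Stand-in parser: return (class_id, anchor) for each document with a header."""
    parsed = []
    for doc in docs:
        match = HEADER.match(doc)
        if match:  # the leading %YAML/%TAG preamble has no header and is skipped
            parsed.append((int(match.group(1)), int(match.group(2))))
    return parsed


def parse_parallel(text: str, nproc: int = 0) -> list[tuple[int, int]]:
    """Parse chunks in separate processes, sidestepping the GIL."""
    chunks = chunk(split_documents(text), nproc or os.cpu_count() or 1)
    if not chunks:
        return []
    with multiprocessing.Pool(len(chunks)) as pool:
        results = pool.map(parse_chunk, chunks)
    return [item for part in results for item in part]


if __name__ == "__main__":
    sample = "%YAML 1.1\n--- !u!91 &9100000\nAnimatorController:\n--- !u!1102 &-123\nAnimatorState:\n"
    print(parse_parallel(sample, 2))
```

Splitting at document boundaries keeps each chunk independently parseable, and `pool.map` preserves chunk order so the results can be concatenated directly.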
*	Add libunity.addTransition	yum	2022-10-15
\| \| \| \| \|	* Implement basic board toggle using new transition logic
	* Metadata can now restore from file
*	Transcribe.py now pages	yum	2022-10-15
\| \| \| \| \| \| \| \| \|	Messages longer than a board will automatically write over the top.
	TODO
	* Real cell-based message diffing
	* Cumulative transcription
	  * this would completely mitigate the effects of trim events
*	Further improve transcribe.py responsiveness	yum	2022-10-15
\| \| \| \| \| \| \|	Add a third heuristic. If the transcription is relatively long and the first bit differs from the previous transcription, immediately overwrite. Because the transcription is long, it's a bit less likely to be a complete mistranscription.
*	Tweak transcribe.py	yum	2022-10-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Slightly improve temporal stability and responsiveness at the cost of limiting to a 30 second recording. Before committing to a transcription, wait for two consecutive transcriptions such that they are identical, or the former is a prefix of the latter. This helps with temporal stability by eliminating most one-off wildly inaccurate transcriptions. Also make osc_ctrl.sendMessageLazy a little lazier, limiting it to 2 consecutive non-empty cells per call. This allows us to recover from mistranscriptions faster.
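The commit rule described above (wait until two consecutive transcriptions are identical, or the former is a prefix of the latter) can be sketched as a small predicate. `should_commit` is a hypothetical name, and treating an empty previous transcription as "nothing to confirm" is an assumption.

```python
from typing import Optional


def should_commit(previous: Optional[str], current: str) -> bool:
    """True if `current` confirms `previous`: equal, or extends it as a prefix."""
    if not previous:
        # No earlier transcription (or an empty one) cannot confirm anything.
        return False
    return current.startswith(previous)


if __name__ == "__main__":
    print(should_commit("hello wor", "hello world"))
    print(should_commit("jello", "hello world"))
```

A one-off wildly inaccurate transcription rarely shares a prefix with the next one, so it fails this check and is never shown.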
*	Fix animations: renamed prefab from CustomSTT to TaSTT	yum	2022-10-15
\| \| \| \| \| \| \| \| \|	Also:
	* Check in toggle on/off animations
	* Add toggle parameter
	* libunity bug: getUniqueId() was calling allocateId() incorrectly
	* Remove osc_ctrl `client` global
	* Fix transcribe.py text encoding
*	libunity: can now add layers, params, and animations	yum	2022-10-15
\| \| \| \|	Write defaults fix is now complete
*	Begin fixWriteDefaults logic	yum	2022-10-15
\| \| \| \| \| \| \| \| \|	* Generate an animation which zeroes out everything which uses write defaults
	* Disable write defaults on every animation for which we do this
	* Add copy() method to Mapping and Sequence
	  * Because of the `parent` pointer, copy.deepcopy() doesn't really work on this data structure.
*	Add libunity CLI	yum	2022-10-15
\| \| \| \| \| \|	* Add guid scanning method
	* Generate mapping from guid to filename
	* Mapping may be saved & restored from disk
*	reimplement animator merging in yaml parser	yum	2022-10-14
\| \| \| \| \| \| \|	Object IDs are allocated optimally now, but it's a bit slower due to long parse times. Also fix minor bug in generate_fx.py.
*	Add simple yaml parser (WIP)	yum	2022-10-13
\| \| \| \| \| \| \| \|	Add parser for Unity's malformed YAML. This should make it easier to manipulate animators. It probably doesn't quite work yet, and certainly needs some usability features.
*	Add ability to merge FX controllers	yum	2022-10-12
\| \| \| \| \| \|	TODO
	* write default detection/correction
	* real cmdline interface