TaSTT.git - Free self-hosted STT for VRChat.

	Commit message	Author	Age
*	Improve font alignment	yum	2022-11-06
\|
*	Add language flag to transcription CLI	yum	2022-11-06
\|
*	String matching no longer relies on spaces	yum	2022-11-06
\| \| \| \| \| \| \| \| \| \| \|	Add a `matchStrings` which does basically the same thing as `matchStringList` except it doesn't split the input at space boundaries. I think this should work better for Japanese and Chinese, since they don't use spaces. Doesn't seem to cause any accuracy regressions for English. Also update the README.
*	Fix clear animation	yum	2022-11-05
\| \| \| \|	Also adjust letter positioning to avoid clipping.
*	Expand character set from 80 to 64K characters	yum	2022-11-05
\|	Each character is now addressed with 2 bytes instead of 1. The number of bytes per character is configured in (I think) exactly one spot, so increasing or decreasing this is trivial. English speakers can just set it to 1.
\|	The animator seems a little unstable; if I leave my character in a public for a while, the board becomes unresponsive. Oh well.
\|	* Check in fonts. Did this so users don't have to remember to set the resolution or to disable mipmaps.
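A minimal sketch of the 2-byte addressing, assuming each byte travels as its own 0..255 value. `BYTES_PER_CHAR`, `encode_char` and `decode_char` are hypothetical names for illustration, not identifiers from the repo.

```python
from __future__ import annotations

BYTES_PER_CHAR = 2  # hypothetical name for the single configuration point


def encode_char(index: int, n_bytes: int = BYTES_PER_CHAR) -> list[int]:
    """Split a character index into n_bytes values in 0..255, low byte first."""
    if not 0 <= index < 256 ** n_bytes:
        raise ValueError(f"index {index} does not fit in {n_bytes} byte(s)")
    return [(index >> (8 * i)) & 0xFF for i in range(n_bytes)]


def decode_char(parts: list[int]) -> int:
    """Inverse of encode_char: reassemble the index from low-byte-first parts."""
    return sum(b << (8 * i) for i, b in enumerate(parts))
```

With 2 bytes the address space is 256 ** 2 == 65536 slots, which is where the 64K in the title comes from; with 1 byte it is 256.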
*	Update shader to use new font files	yum	2022-11-05
\| \| \| \|	So far only the first file is used.
*	Add generate_fonts.py	yum	2022-11-05
\| \| \| \| \| \| \| \|	Add code to generate 4k textures holding a bunch of unicode characters. Add unicode blocks for English, Japanese, Chinese, and Korean. Embed GNU's excellent Unifont ttf, which I use to generate these textures. The license is included under $filename.LICENSE.
*	OSC controller uses one sync per region instead of 2	yum	2022-11-05
\|	My theory as to why this seems to work: VRChat batches parameter updates together and applies them simultaneously. Thus we don't run into the expected failure mode where we update the prior region before paging over to the next region.
\|	* Fix beep feature
\|	* Update the STT demo
*	Reduce dimensionality of animator by factor of 80	yum	2022-11-05
\| \| \| \| \| \| \| \| \| \| \|	Instead of generating one animation for every single character in our character set, we just generate 2: the lowest and the highest. We use blend trees to interpolate between these two extremes. This reduces the number of animations we have to generate by a factor of 80. It also clears the way for multi-language support (coming soon). It also means we don't have to reopen unity every time we generate a new animator.
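A rough sketch of the interpolation idea, assuming the blend tree mixes the "lowest" and "highest" animations linearly by a weight in 0..1. The function names and the exact mapping from character index to weight are assumptions; the commit does not spell them out.

```python
NUM_CHARS = 80  # size of the character set at the time of this commit


def blend_weight(char_index: int, num_chars: int = NUM_CHARS) -> float:
    """Map a character index to a 0..1 weight between the two extreme animations."""
    if not 0 <= char_index < num_chars:
        raise ValueError("character index out of range")
    if num_chars == 1:
        return 0.0
    return char_index / (num_chars - 1)


def blended_value(weight: float, lowest: float, highest: float) -> float:
    """Linear interpolation, standing in for what a 1D blend tree computes."""
    return lowest + weight * (highest - lowest)
```

Under this assumption, only the two endpoint animations need to exist on disk; every other character falls out of the weight.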
*	Add speech-to-text demo	yum	2022-11-04
\|
*	Reduce sync rate from 10 Hz to 5 Hz	yum	2022-11-03
\| \| \| \| \| \| \| \|	Also reduce the number of syncs per cell from 3 to 2. Thus the effective sync rate went from (10 / 3 == 3.33 Hz) to (5 / 2 == 2.5 Hz). This comes at the cost of a degraded UX: updating a cell temporarily shows the contents of the previous cell.
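The arithmetic above as a tiny sketch; `effective_rate_hz` is a hypothetical helper name.

```python
def effective_rate_hz(sync_rate_hz: float, syncs_per_cell: int) -> float:
    """Cells updated per second, given the raw sync rate and syncs needed per cell."""
    return sync_rate_hz / syncs_per_cell


old_rate = effective_rate_hz(10, 3)  # before this commit
new_rate = effective_rate_hz(5, 2)   # after this commit
```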
*	Improve transcription quality	yum	2022-11-01
\| \| \| \| \| \| \| \| \| \|	Apply heuristics described in whisper paper. Dramatically improve silence detection as well as overall transcription quality. I was able to read the entire demo script at speed without any serious transcription inaccuracies. Field testing is TODO.
*	Combine 4 boolean select parameters into one	yum	2022-11-01
\| \| \| \| \|	Should further improve reliability, especially in laggy environments. We'll see!
*	Fix bug where some text would show up after saying 'Clear'	yum	2022-11-01
\|
*	Update README	yum	2022-10-30
\|
*	Reduce total # of select bits from 44 to 4	yum	2022-10-30
\| \| \| \| \| \| \| \| \|	The board is divided into 16 regions. We select the region to be updated by updating 4 boolean parameters. We used to define 4 parameters per layer. Now we just have 4 params total, which affect every layer. Total param memory: 142 bits -> 102 bits Params updated per region update: 56 -> 16
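A minimal sketch of selecting one of 16 regions with 4 boolean parameters, assuming a plain binary encoding with the low bit first. The names are hypothetical and the real bit order may differ.

```python
from __future__ import annotations

NUM_REGIONS = 16
SELECT_BITS = 4  # 2 ** 4 == 16 regions


def region_to_select_bits(region: int) -> list[bool]:
    """Encode a region index as SELECT_BITS booleans, low bit first."""
    if not 0 <= region < NUM_REGIONS:
        raise ValueError("region out of range")
    return [bool((region >> i) & 1) for i in range(SELECT_BITS)]


def select_bits_to_region(bits: list[bool]) -> int:
    """Decode the booleans back into a region index."""
    return sum(int(b) << i for i, b in enumerate(bits))
```

Dropping from 44 select bits to 4 saves 40 bits, which matches both totals in the message: 142 - 40 == 102 and 56 - 40 == 16.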
*	Disable mipmaps on board texture	yum	2022-10-27
\| \| \| \| \|	This fixes the faint outline issue at close range (!) at the cost of making it less legible from far away.
*	Flip text in mirror	yum	2022-10-27
\|	Use some of pema99's tricks described in their 'shader-knowledge' repo (MIT license).
\|	* Text is now readable in mirrors
\|	* GetLetterParameter() now uses a jump table instead of a ton of `if` statements
*	Change board size	yum	2022-10-27
\|	It's now twice as wide and half as tall.
\|	* Add small margin to board
\|	* Add simple backplate shader
*	Tweak continuous transcription	yum	2022-10-27
\| \| \| \| \|	Stitching now uses a 6-word sliding window instead of a 4-word one. Seems to dramatically improve transcription quality.
*	Add 'over' keyword	yum	2022-10-27
\|	When the user says 'over', the board will stop displaying new transcriptions until the user says 'clear'.
\|	* Remove the control thread from transcribe.py
*	Add fast clear animation	yum	2022-10-27
\|	The old clear mechanism would write an empty cell in every layer, which would take (0.3 seconds) * (11 layers) == about 3 seconds. The new mechanism drives an animation which overwrites every character slot simultaneously, taking only 0.1 seconds. A nice ~30x speedup.
\|	* Fix the transcription exponential backoff logic. Saying new things will reset the delay to the minimum again.
\|	* Clearing the board will also reset the transcription delay back to the minimum.
\|	* Tune the noise detection minimum to 0.2 instead of 0.1. Speaking softly into the mic seems to fail to exceed the 0.1 threshold pretty often.
*	De-scuff continuous transcription	yum	2022-10-25
\| \| \| \| \| \| \| \| \| \|	Transcription stitching now occurs in word space, rather than in text space. This avoids problems where we accidentally duplicate or delete letters in the middle of words. Factor out stitching into its own module and add a small handful of test cases. Hopefully if we hit problems in production, we can just grow this list and avoid regressions if we reimplement.
*	Tweak transcription heuristics	yum	2022-10-25
\| \| \| \| \|	The heuristics now occur in the filtered word space, so punctuation and casing changes won't confound them.
*	Add exponentially longer sleeps to transcribe loop	yum	2022-10-25
\| \| \| \| \| \| \| \| \| \| \|	When the user pauses their speech for an extended period of time, the transcription engine will sleep for progressively longer intervals, up to 1.5 seconds between transcriptions. This allows us to reduce idle resource consumption. To enable responsive transcription while the user is speaking actively, we reset the sleep duration to the minimum whenever a change is detected.
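A sketch of the backoff rule described above. Only the 1.5 second cap comes from the commit message; the minimum sleep, the doubling factor, and the name `next_sleep` are assumptions.

```python
MIN_SLEEP_S = 0.05  # assumed minimum; the commit does not state it
MAX_SLEEP_S = 1.5   # cap stated in the commit message
GROWTH = 2.0        # assumed growth factor


def next_sleep(current_s: float, changed: bool) -> float:
    """Return the next sleep interval for the transcribe loop.

    Any detected change resets to the minimum so active speech stays responsive;
    otherwise the interval grows geometrically up to the cap.
    """
    if changed:
        return MIN_SLEEP_S
    return min(current_s * GROWTH, MAX_SLEEP_S)
```

In a loop this would look like `sleep_s = next_sleep(sleep_s, changed)` followed by `time.sleep(sleep_s)`, where `changed` says whether the latest transcription differed from the previous one.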
*	Add TaSTT menu	yum	2022-10-25
\| \| \| \|	Use this as a submenu
*	Add toggle to disable beeping	yum	2022-10-25
\|
*	Saying the word "clear" clears the board	yum	2022-10-24
\|	While the board is clearing, you can keep talking, and it will be rendered when the board finishes clearing.
\|	* bugfix: STT only beeps when it's out
*	STT now beeps when it shows text, and can be locked to world	yum	2022-10-24
\|	Empty cells are excluded from the beeping behavior. Note: I have not checked in the prefab with the audio source yet.
\|	* libtastt gen_fx now adds 3 toggles to FX layer: toggle board, toggle world lock, toggle beep sound
\|	* libunity guid_map can now append instead of replacing
\|	* TaSTT_Toggle_{On,Off}.anim now use the prefab path, as do generated animations
*	Add speech noise on/off animations	yum	2022-10-24
\|
*	redo speech noise	yum	2022-10-24
\|
*	add speech noise	yum	2022-10-24
\|
*	Rewrite FX and animation generators	yum	2022-10-23
\|	* Fix bug where facial animations cause already-written letters to change (!!!)
\|	* Add libtastt.py to hold abstractions layered over libunity
\|	* Fix
\|	* libunity: Fix bug where integer equality state transition conditions ignored the threshold
\|	* libunity: Support placing animator states at different positions
*	Fix fixWriteDefaults duplicate ID error	yum	2022-10-23
\| \| \| \| \|	fixWriteDefaults would assign two documents the same anchor. Unsure why but this fixes it.
*	Update example animator	yum	2022-10-22
\| \| \| \|	Add LH/RH weighted animations.
*	Fix UnityAnimator.merge	yum	2022-10-22
\|	A few changes:
\|	* we never infer class ID from object ID
\|	* merged object IDs are allocated in a flat namespace, not in per-class namespaces
*	Reimplement "do nothing" animation	yum	2022-10-20
\| \| \| \|	Enable host armature instead of relying on a TaSTT parameter.
*	Add example FX layer with write defaults off	yum	2022-10-20
\| \| \| \| \|	Simple hands animator that doesn't rely on write defaults. Use libunity gen_off_anims utility to generate per-animation reset animations.
*	Add "off" animation generator	yum	2022-10-20
\| \| \| \| \| \| \| \|	Add utility to programmatically generate "off" animations. Scans every animation in the project, checks if it sets anything nonzero, and if so, generates a copy of it which sets everything to zero. This is useful for transitioning away from write defaults.
*	Add preliminary support for negative anchors	yum	2022-10-20
\| \| \| \| \| \| \| \|	Some animators generate negative anchors. Casting to u64 doesn't produce an anchor with a valid prefix, so idk what it is. Use the class ID from the little !u! bit instead of deriving it from the anchor. Some things probably don't work yet.
*	Quiet down transcribe.py	yum	2022-10-20
\| \| \| \| \| \|	Also adjust continuous transcription algorithm to use leftmost minimum instead of rightmost. This prevents some cases where we generate longer and longer text.
*	Add continuous transcription mode	yum	2022-10-17
\|	Algorithm:
\|	* look at last 20 chars of last committed transcription
\|	* scan new transcription using 10-char sliding window
\|	* find spot where distance is minimized
\|	* stitch two messages together
\|	Thus we're able to maintain a continuously growing transcription without having to feed the AI more than 30 seconds of data at a time. Seems to work reasonably well in bench tests.
\|	Also fix silence detection. AI exposes a probability that nothing was said. Hand-pick a probability of 0.1. Sometimes the AI still goes sicko mode with this setting but going higher occasionally results in no transcription.
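A simplified sketch of the stitching steps listed above, not the code from transcribe.py. It assumes Levenshtein edit distance as the "distance", uses a single 10-char window for both the tail of the committed text and the scan (rather than the 20/10 split), and keeps the leftmost minimum. `levenshtein` and `stitch` are hypothetical names.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def stitch(committed: str, new: str, window: int = 10) -> str:
    """Append the part of `new` that follows the best match for the tail of `committed`."""
    if not committed:
        return new
    tail = committed[-window:]
    if len(new) < len(tail):
        return committed  # naive edge case: nothing long enough to align
    best_offset, best_dist = 0, None
    for offset in range(len(new) - len(tail) + 1):
        dist = levenshtein(tail, new[offset:offset + len(tail)])
        if best_dist is None or dist < best_dist:  # strict '<' keeps the leftmost minimum
            best_offset, best_dist = offset, dist
    return committed + new[best_offset + len(tail):]
```

For example, stitching "the quick brown fox jumps" with "brown fox jumps over the lazy dog" aligns on " fox jumps" and appends " over the lazy dog".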
*	Update backlog	yum	2022-10-16
\|
*	Add dev cheatsheet	yum	2022-10-16
\| \| \| \|	Scratch doc containing commands I've been using a lot
*	Programmatically set noop animation	yum	2022-10-16
\| \| \| \| \|	Overwrite any animation containing an unknown GUID to the tastt noop animation. This seems to help the reset layer function properly.
*	Semi-fix gesture reset layer	yum	2022-10-16
\| \| \| \| \| \| \| \| \| \| \| \|	Now we only overwrite gesture parameters if there's no active gesture. This makes gesturing smoother, since we're not overwriting gesture params twice on every frame. Gestures don't reliably reset. I think I need to add the noop animation across the entire animator. No idea what's really causing it. Also factor out code for generating transitions that have parameter conditions. Support exists for boolean and integer equality conditions.
*	Fix a couple unity/YAML bugs	yum	2022-10-16
\|	* Unity needs empty Mappings to be indicated with {} or it will assume they're a Sequence
\|	* Unity doesn't like it when we reassign the default animation layer's MonoBehaviour ID, so hack around this by simply reusing the existing MonoBehaviour's ID
\|	* Use MulticoreUnityParser everywhere
*	Add multicore YAML parser	yum	2022-10-16
\| \| \| \| \| \| \| \|	Divide YAML stream into `nproc` chunks and parse each sub-stream in a process. We can't use threads because of the python global interpreter lock, but processes work pretty well. Parsing my 1.2M line / 43k document YAML goes from 65 seconds to 13.
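A sketch of the chunk-and-fan-out idea, assuming documents start at lines beginning with `--- ` (Unity headers look like `--- !u!<classID> &<anchor>`). To stay dependency-free, the per-chunk work here only extracts header fields instead of running a real YAML parser. All function names are hypothetical, not the MulticoreUnityParser API.

```python
from __future__ import annotations

import multiprocessing
import os
import re

HEADER_RE = re.compile(r"^--- !u!(\d+) &(-?\d+)")


def split_documents(stream: str) -> list[str]:
    """Split a YAML stream into documents at lines starting with '--- '."""
    docs, cur = [], []
    for line in stream.splitlines(keepends=True):
        if line.startswith("--- ") and cur:
            docs.append("".join(cur))
            cur = []
        cur.append(line)
    if cur:
        docs.append("".join(cur))
    return docs


def chunk(docs: list[str], n: int) -> list[list[str]]:
    """Group documents into at most n contiguous chunks of near-equal size."""
    if not docs:
        return []
    n = max(1, min(n, len(docs)))
    size = -(-len(docs) // n)  # ceiling division
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def parse_chunk(docs: list[str]) -> list[tuple[int, int]]:
    """Stand-in for real parsing: return (class ID, anchor) per document header."""
    out = []
    for doc in docs:
        m = HEADER_RE.match(doc)
        if m:
            out.append((int(m.group(1)), int(m.group(2))))
    return out


def parse_parallel(stream: str, nproc: int | None = None) -> list[tuple[int, int]]:
    """Parse each chunk in its own process; processes sidestep the GIL."""
    chunks = chunk(split_documents(stream), nproc or os.cpu_count() or 1)
    with multiprocessing.Pool(len(chunks) or 1) as pool:
        results = pool.map(parse_chunk, chunks)
    return [item for part in results for item in part]
```

On platforms that spawn rather than fork, call `parse_parallel` from under an `if __name__ == "__main__":` guard.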
*	Add libunity.addTransition	yum	2022-10-15
\|	* Implement basic board toggle using new transition logic
\|	* Metadata can now restore from file
*	Transcribe.py now pages	yum	2022-10-15
\|	Messages longer than a board will automatically write over the top.
\|	TODO
\|	* Real cell-based message diffing
\|	* Cumulative transcription
\|	  * this would completely mitigate the effects of trim events