<feed xmlns='http://www.w3.org/2005/Atom'>
<title>TaSTT.git/GUI, branch v0.13.2</title>
<subtitle>Free self-hosted STT for VRChat.</subtitle>
<id>https://git.yummers.dev/TaSTT.git/atom?h=v0.13.2</id>
<link rel='self' href='https://git.yummers.dev/TaSTT.git/atom?h=v0.13.2'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/'/>
<updated>2023-07-25T00:52:19+00:00</updated>
<entry>
<title>Unity assets can be generated at a configurable path</title>
<updated>2023-07-25T00:52:19+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-25T00:52:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=2074749d4263991d835b19257bf1510dcaf55211'/>
<id>urn:sha1:2074749d4263991d835b19257bf1510dcaf55211</id>
<content type='text'>
Useful for projects with multiple avatars with different animators.
</content>
</entry>
<entry>
<title>Bugfix: Unity panel now shows saved paths</title>
<updated>2023-07-25T00:27:21+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-25T00:27:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=e64c71199c39fb1637bb2fd410cf27b9f4575b12'/>
<id>urn:sha1:e64c71199c39fb1637bb2fd410cf27b9f4575b12</id>
<content type='text'>
The paths you enter in the Unity panel (animator, menu, params, and
assets folder) are saved in the app config, but were not populated
correctly on app restart or pane redraw. Now they are.
</content>
</entry>
<entry>
<title>Begin work on proxy server</title>
<updated>2023-07-04T02:36:13+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-04T01:44:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=76ae7c28ea6224b2c919122d5dc71bcc00a0ecaa'/>
<id>urn:sha1:76ae7c28ea6224b2c919122d5dc71bcc00a0ecaa</id>
<content type='text'>
Create a simple server with 3 endpoints:
* /create_session: Create a session and return its identifier.
* /set_transcript: Update a session's transcript.
* /get_transcript: Fetch a session's transcript.

Right now the session ID provides authentication *and* authorization.
There is no public/private ID so you have to trust whoever you share
your ID with.

IDs are long and generated by the server, so it should be somewhat
secure against low-effort hacking.
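
A rough sketch of the session model described above, as an in-memory
store rather than a real HTTP server. The class name, the method
signatures, and the use of `secrets.token_urlsafe` are assumptions made
for illustration; the actual server may differ.

```python
import secrets


class SessionStore:
    """In-memory stand-in for the three endpoints (hypothetical names)."""

    def __init__(self):
        self._transcripts = {}

    def create_session(self) -> str:
        # 32 random bytes, URL-safe encoded: long and generated server-side.
        session_id = secrets.token_urlsafe(32)
        self._transcripts[session_id] = ""
        return session_id

    def set_transcript(self, session_id: str, transcript: str) -> bool:
        # Knowing the ID is the only credential: anyone holding it can write.
        if session_id not in self._transcripts:
            return False
        self._transcripts[session_id] = transcript
        return True

    def get_transcript(self, session_id: str):
        # Returns None for unknown IDs.
        return self._transcripts.get(session_id)
```

Because one ID grants both read and write access, sharing it with a
viewer also lets that viewer overwrite the transcript.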

Other updates:
* Drop whisper_requirements.txt - no longer needed.
* Vendor curl to make it easier to interact with the server.

TODO:
* Fuzz test the server.
</content>
</entry>
<entry>
<title>Add visual commit indicator to OBS browser source</title>
<updated>2023-07-01T02:46:17+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-01T02:44:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=4f3131b4a36d8e1557edb31d3754a431717dab7b'/>
<id>urn:sha1:4f3131b4a36d8e1557edb31d3754a431717dab7b</id>
<content type='text'>
Circle goes red when speaking, grey when done. Ideally it would be in
the top right portion of the browser source, but this is a good start.

Also, hard-cap transcripts to 4096 chars. This prevents the STT from
lagging during long sessions.
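
A minimal sketch of such a cap. It assumes the newest text is the part
worth keeping; the function name and that choice are illustrative, not
taken from the code.

```python
MAX_TRANSCRIPT_CHARS = 4096


def cap_transcript(transcript, limit=MAX_TRANSCRIPT_CHARS):
    # Assumption: keep the most recent `limit` characters and drop the oldest.
    # `limit` must be positive.
    return transcript[-limit:]
```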
</content>
</entry>
<entry>
<title>Bugfix: commit no longer wipes out audio buffer</title>
<updated>2023-06-29T05:18:18+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-06-29T05:11:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=b1efbf5ce1ebd584796d4a57cf9c7b6517f91fac'/>
<id>urn:sha1:b1efbf5ce1ebd584796d4a57cf9c7b6517f91fac</id>
<content type='text'>
Audio data is stored in chunks of frames, not in individual frames.
When I commit a transcript, I want to get rid of the portion of the
audio data responsible for that particular transcript. I have code that
does this, but it sliced the list of chunks as if each frame were stored
individually, so it dropped too much audio.

Extra fun: Because we have to decimate mic frames, we have to convert
between whisper frames and mic frames to drop the correct amount of
audio data.
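
A sketch of the chunk-aware drop, under stated assumptions: `chunks` is
a list of lists of mic frames, and `decimation` is the ratio of mic
frames to whisper frames (3 would correspond to 48 kHz down to 16 kHz).
The names and the default ratio are hypothetical.

```python
def drop_committed_audio(chunks, whisper_frames, decimation=3):
    """Drop the oldest `whisper_frames` worth of audio from a list of chunks."""
    # Convert the committed length from whisper frames to mic frames.
    to_drop = whisper_frames * decimation
    remaining = []
    for chunk in chunks:
        if to_drop >= len(chunk):
            # The whole chunk belongs to the committed audio.
            to_drop -= len(chunk)
            continue
        # Keep only the uncommitted tail of a partially consumed chunk.
        remaining.append(chunk[to_drop:])
        to_drop = 0
    return remaining
```

Slicing the outer list by the frame count instead would discard that
many chunks rather than that many frames.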
</content>
</entry>
<entry>
<title>Add profanity filter</title>
<updated>2023-06-29T04:24:56+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-06-29T04:24:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=bdaeb1911297d7901a12e3ac51b38c3463789279'/>
<id>urn:sha1:bdaeb1911297d7901a12e3ac51b38c3463789279</id>
<content type='text'>
Add toggle to UI to enable a profanity filter. It replaces vowels in bad
words with asterisks.
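
Roughly, the masking works like the sketch below. The word list is a
placeholder and the regex-based tokenizing is an assumption, not the
filter's actual implementation.

```python
import re

BAD_WORDS = {"darn", "heck"}  # placeholder list, for illustration only


def censor(text):
    def mask(match):
        word = match.group(0)
        if word.lower() not in BAD_WORDS:
            return word
        # Replace every vowel in the matched word with an asterisk.
        return re.sub(r"[aeiouAEIOU]", "*", word)

    # Visit each run of letters and mask only the listed words.
    return re.sub(r"[A-Za-z']+", mask, text)
```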

Bugfix: filters now apply to OBS
</content>
</entry>
<entry>
<title>Add toggle for debug mode</title>
<updated>2023-06-29T03:35:10+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-06-29T03:35:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=ff7eb3c212195af71cd0ce4a3cd0c9a081d6ebda'/>
<id>urn:sha1:ff7eb3c212195af71cd0ce4a3cd0c9a081d6ebda</id>
<content type='text'>
Most transcription output is now hidden by default. Users can enable
more verbose output by toggling `Enable debug mode`.

Bugfix: Toggling off transcription would reset audio state, frequently
resulting in the loss of the last few words spoken.
</content>
</entry>
<entry>
<title>Add UI for fuzzy commit threshold</title>
<updated>2023-06-27T23:01:16+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-06-27T23:01:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=6638993e313773ba6ca8bdb6d7690b798d41f0d4'/>
<id>urn:sha1:6638993e313773ba6ca8bdb6d7690b798d41f0d4</id>
<content type='text'>
Recap: In the STT there's an algorithm that tries to determine when a
transcript is "stable" enough to commit. If that is too loose, then
accuracy suffers; if too strict, then the audio buffer eventually fills.

To mitigate the problem, I check whether the last N transcripts are
within some Levenshtein edit distance of each other. The
fuzzy matching lets us forgive small instabilities, like differences in
uppercase/lowercase or punctuation, while rejecting large instabilities.

The default value of 8 seems to be in the sweet spot of accuracy &amp;
performance, but it will likely be tuned in the future.
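
A minimal sketch of the check. For simplicity it compares each recent
transcript against the latest one instead of every pair, and it treats
the threshold as inclusive. The function names and both of those
choices are assumptions.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def is_stable(recent, threshold=8):
    """True if every transcript in `recent` is within `threshold` edits of the latest."""
    if not recent:
        return False
    latest = recent[-1]
    return all(threshold >= levenshtein(t, latest) for t in recent)
```

With the default of 8, small case or punctuation changes between
windows still count as stable, while a substantially different
transcript does not.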
</content>
</entry>
<entry>
<title>Adjust commit logic to use fuzzy string match threshold</title>
<updated>2023-06-27T22:38:42+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-06-27T22:35:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=241813a5af11093c6b86e70ada729788c1f0dee6'/>
<id>urn:sha1:241813a5af11093c6b86e70ada729788c1f0dee6</id>
<content type='text'>
... instead of simple equality.

TODO: add UI for threshold.

Bugfix: Frame::onAppStop() joins the OBS app thread.
</content>
</entry>
<entry>
<title>Add ability to preserve transcript while using push to talk</title>
<updated>2023-06-27T22:16:06+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-06-27T22:16:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=cf75998dab6db1b1d21ca06bde18a56b5e896937'/>
<id>urn:sha1:cf75998dab6db1b1d21ca06bde18a56b5e896937</id>
<content type='text'>
This is useful when streaming. Occasionally the STT can get into
a bad state, and manually segmenting clears it up. However, doing so
would clear your accumulated transcript, which isn't always desired. Add
the ability to preserve the transcript.

A small wrinkle: the new commit logic requires N consecutive identical
windows before committing. To make this feature play nicely with it, I
had to forcibly commit any preview text that hasn't yet been committed.
Failing to do this would usually wipe out short utterances and the most
recently spoken words.
</content>
</entry>
</feed>
