<feed xmlns='http://www.w3.org/2005/Atom'>
<title>TaSTT.git/GUI, branch v0.13.0</title>
<subtitle>Free self-hosted STT for VRChat.</subtitle>
<id>https://git.yummers.dev/TaSTT.git/atom?h=v0.13.0</id>
<link rel='self' href='https://git.yummers.dev/TaSTT.git/atom?h=v0.13.0'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/'/>
<updated>2023-06-29T05:18:18+00:00</updated>
<entry>
<title>Bugfix: commit no longer wipes out audio buffer</title>
<updated>2023-06-29T05:18:18+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-06-29T05:11:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=b1efbf5ce1ebd584796d4a57cf9c7b6517f91fac'/>
<id>urn:sha1:b1efbf5ce1ebd584796d4a57cf9c7b6517f91fac</id>
<content type='text'>
Audio data is stored in chunks of frames, not in individual frames.
When I commit a transcript, I want to get rid of the portion of the
audio data responsible for that particular transcript. I have code that
does this, but it was dropping a slice of the list assuming that each
sample is stored individually.

Extra fun: Because we have to decimate mic frames, we have to convert
between whisper frames and mic frames to drop the correct amount of
audio data.
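
A minimal sketch of the chunk-aware drop described above. The function
name, the 48 kHz mic rate and the list-of-chunks layout are assumptions
for illustration and are not taken from transcribe.py:

```python
# Hypothetical sketch: names and the mic rate are assumptions.
MIC_RATE = 48000      # assumed mic sample rate
WHISPER_RATE = 16000  # Whisper models consume 16 kHz audio
DECIMATION = MIC_RATE // WHISPER_RATE


def drop_committed_audio(chunks, whisper_frames):
    """Drop the oldest audio covering `whisper_frames` decimated frames.

    `chunks` is a list of sequences of mic frames, oldest first.
    Returns a new list; a partially consumed chunk is trimmed.
    """
    to_drop = whisper_frames * DECIMATION  # convert to mic frames
    kept = []
    for chunk in chunks:
        if to_drop >= len(chunk):
            to_drop -= len(chunk)          # whole chunk was committed
        elif to_drop > 0:
            kept.append(chunk[to_drop:])   # keep the uncommitted tail
            to_drop = 0
        else:
            kept.append(chunk)
    return kept
```

Slicing the outer list by a frame count, as the old code did, would
remove that many chunks rather than that many frames.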
</content>
</entry>
<entry>
<title>Add profanity filter</title>
<updated>2023-06-29T04:24:56+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-06-29T04:24:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=bdaeb1911297d7901a12e3ac51b38c3463789279'/>
<id>urn:sha1:bdaeb1911297d7901a12e3ac51b38c3463789279</id>
<content type='text'>
Add toggle to UI to enable a profanity filter. It replaces vowels in bad
words with asterisks.

Bugfix: filters now apply to OBS
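
A minimal sketch of the vowel-masking idea, assuming a plain set of
lowercase words; the word list and the `censor` name are hypothetical
and do not reflect the filter's real contents:

```python
import re

# Hypothetical word list; the real filter's list is not shown here.
BAD_WORDS = {"darn", "heck"}


def censor(text, bad_words=BAD_WORDS):
    """Replace vowels with asterisks in any word found in `bad_words`."""
    def mask(match):
        word = match.group(0)
        if word.lower() in bad_words:
            return re.sub(r"[aeiouAEIOU]", "*", word)
        return word
    # Visit each run of letters and mask only the listed words.
    return re.sub(r"[A-Za-z']+", mask, text)


print(censor("Well heck, that is Darn odd"))
# prints: Well h*ck, that is D*rn odd
```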
</content>
</entry>
<entry>
<title>Add toggle for debug mode</title>
<updated>2023-06-29T03:35:10+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-06-29T03:35:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=ff7eb3c212195af71cd0ce4a3cd0c9a081d6ebda'/>
<id>urn:sha1:ff7eb3c212195af71cd0ce4a3cd0c9a081d6ebda</id>
<content type='text'>
Most transcription output is now hidden by default. Users can enable
more verbose output by toggling `Enable debug mode`.

Bugfix: Toggling off transcription would reset audio state, frequently
resulting in the loss of the last few words spoken.
</content>
</entry>
<entry>
<title>Add UI for fuzzy commit threshold</title>
<updated>2023-06-27T23:01:16+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-06-27T23:01:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=6638993e313773ba6ca8bdb6d7690b798d41f0d4'/>
<id>urn:sha1:6638993e313773ba6ca8bdb6d7690b798d41f0d4</id>
<content type='text'>
Recap: In the STT there's an algorithm that tries to determine when a
transcript is "stable" enough to commit. If that is too loose, then
accuracy suffers; if too strict, then the audio buffer eventually fills.

To mitigate the problem, I check whether the last N transcripts are
within some Levenshtein edit distance of each other. The
fuzzy matching lets us forgive small instabilities, like differences in
uppercase/lowercase or punctuation, while rejecting large instabilities.

The default value of 8 seems to be in the sweet spot of accuracy &amp;
performance, but it will likely be tuned in the future.
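
A minimal sketch of such a stability check. The names and `n=3` are
assumptions, and comparing each recent transcript against the newest one
is a simplification of "within some edit distance of each other":

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]


def is_stable(transcripts, n=3, threshold=8):
    """True if the last `n` transcripts are all within `threshold` edits
    of the newest one. `n=3` is an assumption; 8 is the stated default."""
    if n > len(transcripts):
        return False
    recent = transcripts[-n:]
    return all(threshold >= levenshtein(t, recent[-1]) for t in recent)
```

With a threshold of 8, "Hello world", "hello world." and "Hello, world"
count as stable, while a window that swings to an unrelated sentence
does not.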
</content>
</entry>
<entry>
<title>Adjust commit logic to use fuzzy string match threshold</title>
<updated>2023-06-27T22:38:42+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-06-27T22:35:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=241813a5af11093c6b86e70ada729788c1f0dee6'/>
<id>urn:sha1:241813a5af11093c6b86e70ada729788c1f0dee6</id>
<content type='text'>
... instead of simple equality.

TODO: add UI for threshold.

Bugfix: Frame::onAppStop() joins the OBS app thread.
</content>
</entry>
<entry>
<title>Add ability to preserve transcript while using push to talk</title>
<updated>2023-06-27T22:16:06+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-06-27T22:16:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=cf75998dab6db1b1d21ca06bde18a56b5e896937'/>
<id>urn:sha1:cf75998dab6db1b1d21ca06bde18a56b5e896937</id>
<content type='text'>
This is useful when streaming. Occasionally the STT can get into
a bad state, and manually segmenting clears it up. However, doing so
would clear your accumulated transcript, which isn't always desired. Add
the ability to preserve the transcript.

A small wrinkle: the new commit logic requires N consecutive identical
windows before committing. To make this feature play nicely with it, I
had to forcibly commit any preview text that hasn't yet been committed.
Failing to do this would usually cause short utterances and the most
recently spoken words to get wiped out.
</content>
</entry>
<entry>
<title>Limit priority of transcription process</title>
<updated>2023-06-27T08:04:44+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-06-27T08:04:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=40e0202f5954c475c9c48155b95bc4dc67433242'/>
<id>urn:sha1:40e0202f5954c475c9c48155b95bc4dc67433242</id>
<content type='text'>
Seems to help reduce impact on time-sensitive apps like OBS.
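
One way to launch a child process at reduced priority from Python is
sketched below. This is an assumption about the general approach, not
the mechanism this commit uses, and the helper names are hypothetical:

```python
import os
import subprocess
import sys


def low_priority_kwargs():
    """Extra subprocess arguments that lower the child's priority."""
    if sys.platform == "win32":
        # BELOW_NORMAL_PRIORITY_CLASS exists only on Windows (Python 3.7+).
        return {"creationflags": subprocess.BELOW_NORMAL_PRIORITY_CLASS}
    # On POSIX, raise the niceness in the child just before exec.
    return {"preexec_fn": lambda: os.nice(10)}


def run_low_priority(args):
    """Run `args` to completion at reduced priority and capture output."""
    return subprocess.run(args, capture_output=True, text=True,
                          **low_priority_kwargs())
```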
</content>
</entry>
<entry>
<title>Scrub out old C++-based Whisper code</title>
<updated>2023-06-27T00:21:59+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-06-27T00:21:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=694756a96a6109cd79a77221dd4e40638ff55b82'/>
<id>urn:sha1:694756a96a6109cd79a77221dd4e40638ff55b82</id>
<content type='text'>
No longer used.
</content>
</entry>
<entry>
<title>Add UI for browser src</title>
<updated>2023-06-27T00:13:31+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-06-27T00:12:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=011cfdd4bab866a64b06406ceaa7563294af9225'/>
<id>urn:sha1:011cfdd4bab866a64b06406ceaa7563294af9225</id>
<content type='text'>
Add ability to toggle on/off browser src &amp; configure port.
</content>
</entry>
<entry>
<title>Add browser source, hardcoded to port 8097</title>
<updated>2023-06-26T08:46:42+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-06-26T07:58:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=0ed379f2c99ac5c126a6f101965ef1eaa58c017b'/>
<id>urn:sha1:0ed379f2c99ac5c126a6f101965ef1eaa58c017b</id>
<content type='text'>
Transcription output now streams to localhost:8097.

In OBS:
* Create a browser source.
* url: localhost:8097
* width: 2200
* height: 400

TODO:
* Put behind toggle.
* Create input field for port.

Misc cleanup:
* transcribe.py: Drop frames from audio capture thread instead of the
  transcription thread. Doing it the other way would result in
  occasional data loss.
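
A minimal sketch of serving the latest transcript over HTTP on
localhost. It returns plain text to stay short; the class and function
names are hypothetical and the real page served to OBS may differ:

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class TranscriptState:
    """Thread-safe holder for the latest transcript text."""

    def __init__(self):
        self._lock = threading.Lock()
        self._text = ""

    def set(self, text):
        with self._lock:
            self._text = text

    def get(self):
        with self._lock:
            return self._text


def make_server(state, port=8097):
    """Build an HTTP server that answers every GET with the transcript."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = state.get().encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the console quiet

    return ThreadingHTTPServer(("localhost", port), Handler)
```

Calling `make_server(state).serve_forever()` on a background thread
lets the transcription thread keep updating `state` while requests are
served.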
</content>
</entry>
</feed>
