<feed xmlns='http://www.w3.org/2005/Atom'>
<title>TaSTT.git/Scripts/transcribe.py, branch v0.14.0</title>
<subtitle>Free self-hosted STT for VRChat.</subtitle>
<id>https://git.yummers.dev/TaSTT.git/atom?h=v0.14.0</id>
<link rel='self' href='https://git.yummers.dev/TaSTT.git/atom?h=v0.14.0'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/'/>
<updated>2023-08-02T06:42:24+00:00</updated>
<entry>
<title>Fix race condition in commit logic</title>
<updated>2023-08-02T06:42:24+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-08-02T06:42:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=7b5cbfd76ede7522555dcc87b014239b4f6fbe8c'/>
<id>urn:sha1:7b5cbfd76ede7522555dcc87b014239b4f6fbe8c</id>
<content type='text'>
Transcription thread now blocks until microphone thread deletes samples
as requested.

(This is a hacky design; it should use a work queue or something, but I
don't feel like doing that right now.)
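
A minimal sketch of the blocking handshake described above, not the actual
code in transcribe.py. SharedAudio, request_drop, service_drop and
microphone_loop are hypothetical names, and the 1 ms polling interval is an
assumption made for illustration.

```python
import threading


class SharedAudio:
    """Hypothetical state shared by the microphone and transcription threads."""

    def __init__(self, samples):
        self.lock = threading.Lock()
        self.samples = list(samples)
        self.pending_drop = 0               # samples the transcriber wants removed
        self.drop_done = threading.Event()  # set by the microphone thread

    def request_drop(self, count, timeout=1.0):
        """Called by the transcription thread; blocks until the drop is applied."""
        with self.lock:
            self.pending_drop = count
            self.drop_done.clear()
        # Returns False if the microphone thread did not respond in time.
        return self.drop_done.wait(timeout)

    def service_drop(self):
        """Called by the microphone thread on each pass of its capture loop."""
        with self.lock:
            if self.pending_drop:
                del self.samples[:self.pending_drop]
                self.pending_drop = 0
                self.drop_done.set()


def microphone_loop(audio, stop):
    # Poll for drop requests until told to stop (polling interval is assumed).
    while not stop.is_set():
        audio.service_drop()
        stop.wait(0.001)
```

Because request_drop only returns once the microphone thread has applied
the deletion, the transcription thread cannot read the buffer while a drop
is still outstanding. A work queue would avoid the polling.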
</content>
</entry>
<entry>
<title>Only back off transcription loop when not transcribing</title>
<updated>2023-08-02T06:30:42+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-08-02T06:25:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=fa7cb7220029fcc506476bf7b32aab90a0077a14'/>
<id>urn:sha1:fa7cb7220029fcc506476bf7b32aab90a0077a14</id>
<content type='text'>
It's possible that the user has toggled off transcription while the
algorithm is still working. In this case we should *not* begin
exponential backoff since there's still work to do.

Also:
* Shorten the hot-path sleep from 50ms to 5ms.
* Remove unused variable in SleepInterruptible
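
A rough sketch of the sleep policy described above, under stated
assumptions: next_sleep and its parameters are hypothetical names, the
1.0 s ceiling and the doubling factor are illustrative, and only the 5 ms
hot-path value comes from this change.

```python
def next_sleep(current, transcribing, work_pending, hot=0.005, cap=1.0):
    """Return how long the loop should sleep before its next pass (seconds)."""
    if transcribing or work_pending:
        # Still work to do, even if the user toggled transcription off:
        # stay on the hot path and reset any backoff.
        return hot
    # Idle: double the delay, up to an assumed ceiling.
    return min(max(current, hot) * 2, cap)
```

For example, next_sleep(0.2, False, True) returns the hot-path value
because audio is still pending, while repeated idle calls grow the delay
until it reaches the ceiling.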
</content>
</entry>
<entry>
<title>Preserve audio chunk length when dropping samples</title>
<updated>2023-07-09T02:06:45+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-09T01:55:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=a602bfb95665697b15a2de58694c6ac064af2916'/>
<id>urn:sha1:a602bfb95665697b15a2de58694c6ac064af2916</id>
<content type='text'>
When we commit a transcription, we drop the corresponding audio data.
Audio data is represented as a list of chunks. Each chunk contains a few
hundred samples of audio data, representing O(10ms) of audio.

If we want to drop a few seconds of data, this mostly means deleting
many whole chunks of audio. There's usually one chunk at the boundary
where we want to drop only a portion of the audio data.

Instead of slicing away that part of the chunk, which would change its
length, this change zeroes it out. This preserves the assumption that
each chunk has the same temporal length.
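
A minimal sketch of the zeroing approach, assuming chunks are equal-length
byte strings of 16-bit samples. drop_samples and BYTES_PER_SAMPLE are
hypothetical names, not identifiers taken from transcribe.py.

```python
BYTES_PER_SAMPLE = 2  # 16-bit audio


def drop_samples(chunks, n_samples):
    """Drop the first n_samples from a list of equal-length byte chunks."""
    if not chunks:
        return []
    samples_per_chunk = len(chunks[0]) // BYTES_PER_SAMPLE
    whole, partial = divmod(n_samples, samples_per_chunk)
    # Whole chunks are simply deleted.
    kept = list(chunks[whole:])
    if partial and kept:
        cut = partial * BYTES_PER_SAMPLE
        # Overwrite the dropped samples with silence instead of slicing,
        # so the boundary chunk keeps its original length.
        kept[0] = bytes(cut) + kept[0][cut:]
    return kept
```

Every chunk in the result has the same length as the input chunks, so any
code that converts a chunk count into a duration keeps working.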
</content>
</entry>
<entry>
<title>Commit logic now drops parts of frames</title>
<updated>2023-07-08T22:57:39+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-08T22:57:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=80f46a7a346e73c94a3bb8ae01099743020ef2a4'/>
<id>urn:sha1:80f46a7a346e73c94a3bb8ae01099743020ef2a4</id>
<content type='text'>
We used to drop entire frames only, leading to situations where more
audio is dropped than desired. Now we drop frames down to the precision
of the individual audio sample requested.
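
To illustrate the difference, here is a small sketch that models frames as
lists of samples. Both function names are hypothetical, and the rounding-up
in drop_whole_frames is an assumption about how the older behaviour
over-dropped.

```python
def drop_whole_frames(frames, n_samples):
    """Assumed older behaviour: round the drop up to whole frames."""
    if not frames:
        return []
    per_frame = len(frames[0])
    n_frames = -(-n_samples // per_frame)  # ceiling division
    return frames[n_frames:]


def drop_exact(frames, n_samples):
    """Drop exactly n_samples, slicing the frame that straddles the cut."""
    if not frames:
        return []
    per_frame = len(frames[0])
    whole, partial = divmod(n_samples, per_frame)
    kept = frames[whole:]
    if partial and kept:
        kept = [kept[0][partial:]] + kept[1:]
    return kept
```

With three frames of three samples each and a request to drop four
samples, the whole-frame version discards six samples, while the exact
version discards four and leaves a shortened first frame.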
</content>
</entry>
<entry>
<title>Update README</title>
<updated>2023-07-08T00:54:40+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-08T00:54:40+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=5db7426bb14b7e51275c14d8173bd67e8addc4ce'/>
<id>urn:sha1:5db7426bb14b7e51275c14d8173bd67e8addc4ce</id>
<content type='text'>
Mostly updating roadmap stuff. Non-VRC use cases are "complete" since I
was mostly targeting streaming. The ability to type into arbitrary text
fields is still somewhat nascent &amp; could be improved.

Also update some other random stuff to be more up to date. KillFrenzy
Avatar Text is now MIT, pog!
</content>
</entry>
<entry>
<title>Enforce a stricter avg_logprob than default</title>
<updated>2023-07-07T09:35:51+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-07T09:30:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=7a576bcac1c37c3c5a59fadf172aa70b15ff83c8'/>
<id>urn:sha1:7a576bcac1c37c3c5a59fadf172aa70b15ff83c8</id>
<content type='text'>
Common hallucinations sneak in around -0.9 avg_logprob.

Also:
* Limit temperatures to just 0.0. Multiple values cause latency to
  occasionally spike.
</content>
</entry>
<entry>
<title>Filter out segments based on avg_logprob &amp; no_speech_prob</title>
<updated>2023-07-07T08:58:45+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-07T08:57:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=2793ac9dd31059f2fc29f7978bcb688a7de664ed'/>
<id>urn:sha1:2793ac9dd31059f2fc29f7978bcb688a7de664ed</id>
<content type='text'>
Surprisingly, passing these thresholds as args does not cause
transcribe() to omit the offending segments from the result, so we have
to filter them out manually. Hallucinated phrases generally have a low
avg_logprob, a high no_speech_prob, or both.
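
A minimal sketch of such a manual filter. Segment is a stand-in dataclass
rather than the library's own type, keep_confident is a hypothetical name,
and both threshold values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Stand-in for a transcription segment; only the fields used here."""
    text: str
    avg_logprob: float
    no_speech_prob: float


def keep_confident(segments, min_avg_logprob=-0.8, max_no_speech_prob=0.6):
    """Keep segments that are confident and unlikely to be silence."""
    return [s for s in segments
            if s.avg_logprob >= min_avg_logprob
            and max_no_speech_prob >= s.no_speech_prob]
```

A segment is kept only if it passes both checks, so a phrase with a
plausible avg_logprob but a high no_speech_prob is still discarded.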
</content>
</entry>
<entry>
<title>Use 16-bit ints with generated silence</title>
<updated>2023-07-07T08:44:28+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-07T08:44:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=742eb86d652d7689bbf3ae8b286bf0a6b1c2380d'/>
<id>urn:sha1:742eb86d652d7689bbf3ae8b286bf0a6b1c2380d</id>
<content type='text'>
Each sample of audio data is a 16-bit int, not an 8-bit int.
</content>
</entry>
<entry>
<title>Fix performance regression</title>
<updated>2023-07-07T08:27:02+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-07T08:27:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=cdc4889cb5e752d00f7f8933a5486f4f3441f6e9'/>
<id>urn:sha1:cdc4889cb5e752d00f7f8933a5486f4f3441f6e9</id>
<content type='text'>
Each chunk of audio samples should be encoded as a binary string, not as
a list.
</content>
</entry>
<entry>
<title>Enforce minimum 5.0 second duration on audio buffer</title>
<updated>2023-07-07T00:36:14+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-07T00:36:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=d0d3b18ad0a859e5e7a1cc5b8a569349b505c924'/>
<id>urn:sha1:d0d3b18ad0a859e5e7a1cc5b8a569349b505c924</id>
<content type='text'>
The new commit logic could shrink the buffer below this duration,
causing the model to hallucinate things like:

* "See you next time!"
* "Thanks for watching!"
* "Bye!"

The hope is that by keeping the buffer at least 5.0 seconds long, as
described in the paper, this will cut down on these events.
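
One way to enforce such a floor is to pad a short buffer with silence.
This is a sketch under stated assumptions, not necessarily what this commit
does: pad_to_minimum is a hypothetical name, the 16 kHz sample rate and
16-bit sample width are assumed, and appending the silence at the end is an
arbitrary choice.

```python
SAMPLE_RATE_HZ = 16000  # assumed capture rate
BYTES_PER_SAMPLE = 2    # assumed 16-bit samples
MIN_SECONDS = 5.0


def pad_to_minimum(audio, min_seconds=MIN_SECONDS):
    """Return audio padded with trailing silence to at least min_seconds."""
    min_bytes = int(min_seconds * SAMPLE_RATE_HZ) * BYTES_PER_SAMPLE
    missing = max(0, min_bytes - len(audio))
    # bytes(n) is n zero bytes, i.e. silence for signed 16-bit samples.
    return audio + bytes(missing)
```

Buffers that are already long enough pass through unchanged.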
</content>
</entry>
</feed>
