<feed xmlns='http://www.w3.org/2005/Atom'>
<title>TaSTT.git, branch v0.18.1</title>
<subtitle>Free self-hosted STT for VRChat.</subtitle>
<id>https://git.yummers.dev/TaSTT.git/atom?h=v0.18.1</id>
<link rel='self' href='https://git.yummers.dev/TaSTT.git/atom?h=v0.18.1'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/'/>
<updated>2024-02-10T01:51:53+00:00</updated>
<entry>
<title>Finish plumbing GPU compute type</title>
<updated>2024-02-10T01:51:53+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-02-10T01:51:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=5ef207d28f2a9d943384b9ec6872aedae2917ac0'/>
<id>urn:sha1:5ef207d28f2a9d943384b9ec6872aedae2917ac0</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Add dropdown for GPU compute type</title>
<updated>2024-02-10T01:21:46+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-02-10T01:21:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=3b84b185d1286e1b954f5ad636b26188efa141e4'/>
<id>urn:sha1:3b84b185d1286e1b954f5ad636b26188efa141e4</id>
<content type='text'>
Should enable compatibility with older GPUs.
</content>
</entry>
<entry>
<title>Add another threshold to filter out common hallucinations</title>
<updated>2024-02-06T01:40:37+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-02-06T01:40:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=e58c718cb115c44ef3a546bea245e05e50d24c55'/>
<id>urn:sha1:e58c718cb115c44ef3a546bea245e05e50d24c55</id>
<content type='text'>
The paper recommends filtering out segments with no_speech_prob &gt; 0.6
and avg_logprob &lt; -1. This is too loose a bound for short-form audio,
which is not guaranteed to contain speech.

I already have a tighter bound:

  no_speech_prob &gt; 0.6 and avg_logprob &lt; -0.5

While listening to instrumental music, I find that a lot of
hallucinations sneak past that bound. So I added a second bound:

  no_speech_prob &gt; 0.15 and avg_logprob &lt; -0.7

In other words, we also filter out segments that look more like speech
but have a worse avg_logprob. This does not seem to produce false
negatives, but it requires testing.
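
For illustration, the two bounds above could be combined as in the
minimal Python sketch below. This is not the code from this commit:
Bound, BOUNDS, and is_hallucination are hypothetical names, and only the
threshold values come from this message. Each comparison is written with
the larger side first, so "avg_logprob below the limit" appears as
"limit > avg_logprob".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    # Hypothetical container: a segment trips this bound when its
    # no_speech_prob is above min_no_speech AND its avg_logprob is
    # below max_avg_logprob.
    min_no_speech: float
    max_avg_logprob: float


# Threshold values taken from the commit message; the names are assumptions.
BOUNDS = (
    Bound(min_no_speech=0.6, max_avg_logprob=-0.5),
    Bound(min_no_speech=0.15, max_avg_logprob=-0.7),
)


def is_hallucination(no_speech_prob: float, avg_logprob: float) -> bool:
    """Return True if the segment trips at least one bound."""
    return any(
        no_speech_prob > b.min_no_speech and b.max_avg_logprob > avg_logprob
        for b in BOUNDS
    )
```

With these values, a segment with no_speech_prob 0.2 and avg_logprob -0.8
is dropped by the second bound only, while one with the same
no_speech_prob and avg_logprob -0.6 is kept.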

Also: dial back the default max segment length from 15 seconds to 10
seconds. This is done based on performance observations in desktop.
</content>
</entry>
<entry>
<title>Verify that audio is clean after VAD segmentation</title>
<updated>2024-02-06T01:02:23+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-02-06T01:01:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=acccf8ebcff0f7cc2b26e45e497f8b12ab73d8e1'/>
<id>urn:sha1:acccf8ebcff0f7cc2b26e45e497f8b12ab73d8e1</id>
<content type='text'>
Indeed it is. Bumped up the default max segment length to decrease
error.

Also add mic presets for beyond (the VR headset) and motu (my mic
interface).
</content>
</entry>
<entry>
<title>Revert "Begin experimenting with flash-attention"</title>
<updated>2024-01-09T02:59:27+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-01-09T02:59:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=33db3dcc23a45cae611bcf839c33d6615ccbf59e'/>
<id>urn:sha1:33db3dcc23a45cae611bcf839c33d6615ccbf59e</id>
<content type='text'>
This reverts commit 921b92a69f36502dc5eefd14ba3487c1bb49bb9d.
</content>
</entry>
<entry>
<title>Fix font rendering ddx/ddy logic</title>
<updated>2024-01-09T02:59:21+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-01-09T02:59:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=f7e1cf963efc6e4e564b41445cfd328c3baa0f0a'/>
<id>urn:sha1:f7e1cf963efc6e4e564b41445cfd328c3baa0f0a</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Begin experimenting with flash-attention</title>
<updated>2023-12-13T21:54:57+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-12-13T21:54:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=921b92a69f36502dc5eefd14ba3487c1bb49bb9d'/>
<id>urn:sha1:921b92a69f36502dc5eefd14ba3487c1bb49bb9d</id>
<content type='text'>
Seems much faster than faster-whisper.

There are two issues:
* Requires an NVIDIA 3000-series GPU or newer.
* Incompatible with faster-whisper dependencies.

So it seems like we'll either need to toggle between two sets of
dependencies at runtime or have two environments.
</content>
</entry>
<entry>
<title>Decrease OSC sync rate from 5 Hz to 3 Hz</title>
<updated>2023-12-09T02:15:03+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-12-09T02:15:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=859caec3d5c1b6aa9eee98571af3324b6ed1bd21'/>
<id>urn:sha1:859caec3d5c1b6aa9eee98571af3324b6ed1bd21</id>
<content type='text'>
Paging is now slower but more reliable.
</content>
</entry>
<entry>
<title>Add distilled whisper large-v2 model</title>
<updated>2023-12-09T02:13:56+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-12-09T02:13:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=15368618a109eeec69209a6693839eb359ecd190'/>
<id>urn:sha1:15368618a109eeec69209a6693839eb359ecd190</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Add distilled whisper-medium model</title>
<updated>2023-11-07T23:05:29+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-11-07T23:05:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=dbb2f72792e2af3ff220313f84bf76a9a1ddbeb4'/>
<id>urn:sha1:dbb2f72792e2af3ff220313f84bf76a9a1ddbeb4</id>
<content type='text'>
I converted distil-whisper-medium.en to CTranslate2 format and uploaded
it to Hugging Face. This model is exceptionally fast and light compared
to the non-distilled version, at the cost of some accuracy.
</content>
</entry>
</feed>
