<feed xmlns='http://www.w3.org/2005/Atom'>
<title>TaSTT.git/GUI, branch v0.19.1</title>
<subtitle>Free self-hosted STT for VRChat.</subtitle>
<id>https://git.yummers.dev/TaSTT.git/atom?h=v0.19.1</id>
<link rel='self' href='https://git.yummers.dev/TaSTT.git/atom?h=v0.19.1'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/'/>
<updated>2024-06-09T23:43:34+00:00</updated>
<entry>
<title>Bump CUDNN to v8.9.7</title>
<updated>2024-06-09T23:43:34+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-06-09T23:43:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=4fec36c3cc00bd649dfb3c9d7e9079b5c8685a0e'/>
<id>urn:sha1:4fec36c3cc00bd649dfb3c9d7e9079b5c8685a0e</id>
<content type='text'>
Also disable flash-attention when CPU mode is selected
</content>
</entry>
<entry>
<title>Add checkbox for flash-attention</title>
<updated>2024-06-09T22:54:30+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-06-09T22:54:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=f2b21dd5afebd6b76b5835168f7d1bd3bec21f5d'/>
<id>urn:sha1:f2b21dd5afebd6b76b5835168f7d1bd3bec21f5d</id>
<content type='text'>
Pre-3000 series GPUs don't support it. Oops!
</content>
</entry>
<entry>
<title>Update defaults to work with modular prefab</title>
<updated>2024-06-06T07:45:27+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-06-06T01:30:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=72b9fb8337cfb7bddc58f74b8977e4a2283e6728'/>
<id>urn:sha1:72b9fb8337cfb7bddc58f74b8977e4a2283e6728</id>
<content type='text'>
There's a modular avatar prefab for the custom chatbox on my Gumroad.
Update the default settings to work with that prefab.
</content>
</entry>
<entry>
<title>Upgrade faster-whisper with flash-attention2</title>
<updated>2024-06-06T01:15:47+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-06-06T01:15:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=4f0fb5b17de990517e3c1de7ffee5d0f3c9a8961'/>
<id>urn:sha1:4f0fb5b17de990517e3c1de7ffee5d0f3c9a8961</id>
<content type='text'>
This should be significantly more efficient than prior versions.

* add large-v3 &amp; distilled variant
* simplify model acquisition code now that distilled models are part of
  faster-whisper.
</content>
</entry>
<entry>
<title>Fix distilled models</title>
<updated>2024-03-15T01:03:54+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-03-15T01:03:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=5638d86c97041de31217e058e411034143e9c882'/>
<id>urn:sha1:5638d86c97041de31217e058e411034143e9c882</id>
<content type='text'>
These were broken by logic errors in the code path that acquires
models from Hugging Face.

Distilled large-v2 seems promising as a new default model.
</content>
</entry>
<entry>
<title>Finish fixing build break</title>
<updated>2024-03-04T23:50:16+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-03-04T23:50:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=b3983d3274b92b6f96efca894a56b1cb5422b621'/>
<id>urn:sha1:b3983d3274b92b6f96efca894a56b1cb5422b621</id>
<content type='text'>
CUDNN is now pulled from Dropbox instead of Google Drive. This has the
added benefit of being about 10-20x faster (assuming you have fast internet).
</content>
</entry>
<entry>
<title>Begin fixing build on new hosts</title>
<updated>2024-03-04T23:36:48+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-03-04T23:36:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=db8618577175a5f6031b0002d269a2535a71a818'/>
<id>urn:sha1:db8618577175a5f6031b0002d269a2535a71a818</id>
<content type='text'>
Google Drive intentionally broke CLI downloads ("don't be evil") and
UwwwuPP went away. Begin work rehosting both files.
</content>
</entry>
<entry>
<title>Add dropdown for GPU compute type</title>
<updated>2024-02-10T01:21:46+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-02-10T01:21:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=3b84b185d1286e1b954f5ad636b26188efa141e4'/>
<id>urn:sha1:3b84b185d1286e1b954f5ad636b26188efa141e4</id>
<content type='text'>
Should enable compatibility with older GPUs.
</content>
</entry>
<entry>
<title>Add another threshold to filter out common hallucinations</title>
<updated>2024-02-06T01:40:37+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-02-06T01:40:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=e58c718cb115c44ef3a546bea245e05e50d24c55'/>
<id>urn:sha1:e58c718cb115c44ef3a546bea245e05e50d24c55</id>
<content type='text'>
The paper recommends filtering out segments with no_speech_prob &gt; 0.6
and avg_logprob &lt; -1. That bound is too loose for short-form audio,
which is not guaranteed to contain speech.

I already have a tighter bound:

  no_speech &gt; 0.6 and avg_logprob &lt; -0.5

While listening to instrumental music I find that a lot of
hallucinations sneak past that bound. So I added a second bound:

  no_speech &gt; 0.15 and avg_logprob &lt; -0.7

Basically we filter out things that look like speech but have a worse
avg_logprob. Seems to not have false negatives. Requires testing.
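
<![CDATA[
A minimal sketch of the two-bound filter described above, not the actual
TaSTT code. The Segment class, BOUNDS table, and function names are
hypothetical stand-ins; only the threshold values come from this message.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for one transcribed segment."""
    text: str
    no_speech_prob: float
    avg_logprob: float


# (no_speech_prob floor, avg_logprob ceiling) pairs taken from this message.
# A segment is dropped if it exceeds the floor AND falls below the ceiling
# for at least one pair.
BOUNDS = [
    (0.6, -0.5),   # original, tighter-than-paper bound
    (0.15, -0.7),  # second bound added for instrumental music
]


def is_probable_hallucination(seg: Segment, bounds=BOUNDS) -> bool:
    """Return True if the segment trips any of the bounds."""
    return any(
        seg.no_speech_prob > min_no_speech and seg.avg_logprob < max_logprob
        for min_no_speech, max_logprob in bounds
    )


def filter_segments(segments):
    """Keep only segments that pass every bound."""
    return [s for s in segments if not is_probable_hallucination(s)]
```

The second pair fires at a much lower no_speech_prob, so it only catches
segments whose avg_logprob is also noticeably worse.
]]>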

Also: dial back the default max segment length from 15 seconds to 10
seconds. This is based on performance observations on desktop.
</content>
</entry>
<entry>
<title>Verify that audio is clean after VAD segmentation</title>
<updated>2024-02-06T01:02:23+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-02-06T01:01:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=acccf8ebcff0f7cc2b26e45e497f8b12ab73d8e1'/>
<id>urn:sha1:acccf8ebcff0f7cc2b26e45e497f8b12ab73d8e1</id>
<content type='text'>
Indeed it is. Bumped up the default max segment length to decrease
error.

Also add mic presets for Beyond (the VR headset) and MOTU (my mic
interface).
</content>
</entry>
</feed>
