<feed xmlns='http://www.w3.org/2005/Atom'>
<title>TaSTT.git, branch v0.18.0</title>
<subtitle>Free self-hosted STT for VRChat.</subtitle>
<id>https://git.yummers.dev/TaSTT.git/atom?h=v0.18.0</id>
<link rel='self' href='https://git.yummers.dev/TaSTT.git/atom?h=v0.18.0'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/'/>
<updated>2024-02-10T01:21:46+00:00</updated>
<entry>
<title>Add dropdown for GPU compute type</title>
<updated>2024-02-10T01:21:46+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-02-10T01:21:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=3b84b185d1286e1b954f5ad636b26188efa141e4'/>
<id>urn:sha1:3b84b185d1286e1b954f5ad636b26188efa141e4</id>
<content type='text'>
Should enable compatibility with older GPUs.
</content>
</entry>
<entry>
<title>Add another threshold to filter out common hallucinations</title>
<updated>2024-02-06T01:40:37+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-02-06T01:40:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=e58c718cb115c44ef3a546bea245e05e50d24c55'/>
<id>urn:sha1:e58c718cb115c44ef3a546bea245e05e50d24c55</id>
<content type='text'>
The Whisper paper recommends filtering out segments with
no_speech_prob &gt; 0.6 and avg_logprob &lt; -1. That bound is too loose for
short-form audio, which is not guaranteed to contain speech.

I already have a tighter bound:

  no_speech &gt; 0.6 and avg_logprob &lt; -0.5

While listening to instrumental music I find that a lot of
hallucinations sneak past that bound. So I added a second bound:

  no_speech &gt; 0.15 and avg_logprob &lt; -0.7

In effect, this filters out segments that look more like speech but have
a worse avg_logprob. It does not seem to produce false negatives, but it
still needs testing.

Also: dial back the default max segment length from 15 seconds to 10
seconds, based on performance observations on desktop.
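
The two bounds described above can be sketched roughly as follows. This
is a minimal illustration, not the code from this commit: the Segment
class, BOUNDS, is_hallucination and filter_segments are hypothetical
names, and only the threshold values come from the message.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical container; the field names mirror the commit message.
    text: str
    no_speech_prob: float
    avg_logprob: float


# Each pair is (no_speech_prob floor, avg_logprob ceiling).
BOUNDS = [
    (0.6, -0.5),   # the existing bound
    (0.15, -0.7),  # the second bound added here
]


def is_hallucination(seg: Segment) -> bool:
    # A segment is dropped when it trips either bound: no_speech_prob is
    # above the floor and avg_logprob is below the ceiling.
    return any(
        seg.no_speech_prob > floor and ceiling > seg.avg_logprob
        for floor, ceiling in BOUNDS
    )


def filter_segments(segments):
    # Keep only the segments that pass both bounds.
    return [s for s in segments if not is_hallucination(s)]
```

A segment with no_speech_prob 0.2 and avg_logprob -0.8 passes the first
bound but is dropped by the second.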
</content>
</entry>
<entry>
<title>Verify that audio is clean after VAD segmentation</title>
<updated>2024-02-06T01:02:23+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-02-06T01:01:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=acccf8ebcff0f7cc2b26e45e497f8b12ab73d8e1'/>
<id>urn:sha1:acccf8ebcff0f7cc2b26e45e497f8b12ab73d8e1</id>
<content type='text'>
Indeed it is. Bumped up the default max segment length to decrease
error.

Also add mic presets for beyond (the VR headset) and motu (my mic
interface).
</content>
</entry>
<entry>
<title>Revert "Begin experimenting with flash-attention"</title>
<updated>2024-01-09T02:59:27+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-01-09T02:59:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=33db3dcc23a45cae611bcf839c33d6615ccbf59e'/>
<id>urn:sha1:33db3dcc23a45cae611bcf839c33d6615ccbf59e</id>
<content type='text'>
This reverts commit 921b92a69f36502dc5eefd14ba3487c1bb49bb9d.
</content>
</entry>
<entry>
<title>Fix font rendering ddx/ddy logic</title>
<updated>2024-01-09T02:59:21+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2024-01-09T02:59:21+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=f7e1cf963efc6e4e564b41445cfd328c3baa0f0a'/>
<id>urn:sha1:f7e1cf963efc6e4e564b41445cfd328c3baa0f0a</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Begin experimenting with flash-attention</title>
<updated>2023-12-13T21:54:57+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-12-13T21:54:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=921b92a69f36502dc5eefd14ba3487c1bb49bb9d'/>
<id>urn:sha1:921b92a69f36502dc5eefd14ba3487c1bb49bb9d</id>
<content type='text'>
Seems much faster than faster-whisper.

There are two issues:
* Requires an NVIDIA 3000-series GPU or newer.
* Incompatible with faster-whisper dependencies.

So it seems like we'll either need to toggle between two sets of
dependencies at runtime or have two environments.
</content>
</entry>
<entry>
<title>Decrease OSC sync rate from 5 Hz to 3 Hz</title>
<updated>2023-12-09T02:15:03+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-12-09T02:15:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=859caec3d5c1b6aa9eee98571af3324b6ed1bd21'/>
<id>urn:sha1:859caec3d5c1b6aa9eee98571af3324b6ed1bd21</id>
<content type='text'>
Paging is now slower but more reliable.
</content>
</entry>
<entry>
<title>Add distilled whisper large-v2 model</title>
<updated>2023-12-09T02:13:56+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-12-09T02:13:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=15368618a109eeec69209a6693839eb359ecd190'/>
<id>urn:sha1:15368618a109eeec69209a6693839eb359ecd190</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Add distilled whisper-medium model</title>
<updated>2023-11-07T23:05:29+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-11-07T23:05:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=dbb2f72792e2af3ff220313f84bf76a9a1ddbeb4'/>
<id>urn:sha1:dbb2f72792e2af3ff220313f84bf76a9a1ddbeb4</id>
<content type='text'>
I converted distil-whisper-medium.en to CTranslate2 format and uploaded
it to huggingface. This model is exceptionally fast and light compared
to the non-distilled version, at the cost of some accuracy.
</content>
</entry>
<entry>
<title>Transcripts preceding long pauses now drop</title>
<updated>2023-10-06T01:28:42+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-10-06T01:22:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=add7bd8ef86ec21cd1327eb45bcb739aa54f7db8'/>
<id>urn:sha1:add7bd8ef86ec21cd1327eb45bcb739aa54f7db8</id>
<content type='text'>
When hot-miking into the built-in chatbox, there are sometimes long
pauses in conversation. After such a pause, it's undesirable to show the
transcript generated before the pause. This feature allows those
transcripts to be dropped.

Also:

* Limit the number of segments sent to the browser source to 10. The
  list may grow to 10 segments before the first 5 segments are dropped.
* Silence warnings generated by `install_in_venv`, used by e.g.
  translation codepath.
* Enable audio normalization to improve accuracy when speaking softly,
  at the cost of some accuracy when speaking normally.
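
The segment cap in the first bullet might look something like the sketch
below. This is an assumption about the behavior, not the committed code:
MAX_SEGMENTS, DROP_COUNT and append_segment are hypothetical names, and
the sketch assumes the drop happens when an append pushes the list past
10 segments.

```python
MAX_SEGMENTS = 10  # cap stated in the commit message
DROP_COUNT = 5     # number of leading segments dropped at once


def append_segment(segments, new_segment):
    # Append the new segment, then trim the oldest entries if the list
    # has grown past the cap (assumed trigger point).
    segments.append(new_segment)
    if len(segments) > MAX_SEGMENTS:
        del segments[:DROP_COUNT]
    return segments
```

Dropping 5 segments at a time, instead of 1 per append, means the browser
source text shifts less often.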

Credit: user endo0269 on Discord suggested this feature.
</content>
</entry>
</feed>
