<feed xmlns='http://www.w3.org/2005/Atom'>
<title>TaSTT.git, branch v0.13.1</title>
<subtitle>Free self-hosted STT for VRChat.</subtitle>
<id>https://git.yummers.dev/TaSTT.git/atom?h=v0.13.1</id>
<link rel='self' href='https://git.yummers.dev/TaSTT.git/atom?h=v0.13.1'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/'/>
<updated>2023-07-07T09:35:51+00:00</updated>
<entry>
<title>Enforce a stricter avg_logprob than default</title>
<updated>2023-07-07T09:35:51+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-07T09:30:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=7a576bcac1c37c3c5a59fadf172aa70b15ff83c8'/>
<id>urn:sha1:7a576bcac1c37c3c5a59fadf172aa70b15ff83c8</id>
<content type='text'>
Common hallucinations sneak in around -0.9 avg_logprob.

Also:
* Limit temperatures to just 0.0. Multiple values cause latency to
  occasionally spike.
</content>
</entry>
<entry>
<title>Filter out segments based on avg_logprob &amp; no_speech_prob</title>
<updated>2023-07-07T08:58:45+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-07T08:57:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=2793ac9dd31059f2fc29f7978bcb688a7de664ed'/>
<id>urn:sha1:2793ac9dd31059f2fc29f7978bcb688a7de664ed</id>
<content type='text'>
Surprisingly, these args do not cause transcribe() to omit those
segments from the result, so we have to manually filter them out.
Hallucinated phrases generally have a low avg_logprob, a high
no_speech_prob, or both.
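
A minimal sketch of the manual filtering described above. The Segment
class is a hypothetical stand-in for whatever transcribe() returns, and
both thresholds are illustrative assumptions rather than the values used
in this repo.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical stand-in for a transcription segment; only the
    # fields used by the filter are modeled.
    text: str
    avg_logprob: float
    no_speech_prob: float


def keep_confident(segments, min_avg_logprob=-0.8, max_no_speech_prob=0.6):
    """Drop segments that look like hallucinations.

    Both thresholds are assumptions chosen for illustration.
    """
    kept = []
    for seg in segments:
        confident = seg.avg_logprob >= min_avg_logprob
        has_speech = max_no_speech_prob >= seg.no_speech_prob
        if confident and has_speech:
            kept.append(seg)
    return kept
```

A segment is kept only if it passes both checks, so a phrase with either
a poor avg_logprob or a high no_speech_prob is discarded.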
</content>
</entry>
<entry>
<title>Use 16-bit ints with generated silence</title>
<updated>2023-07-07T08:44:28+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-07T08:44:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=742eb86d652d7689bbf3ae8b286bf0a6b1c2380d'/>
<id>urn:sha1:742eb86d652d7689bbf3ae8b286bf0a6b1c2380d</id>
<content type='text'>
Each sample of audio data is a 16-bit int, not an 8-bit int.
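
A rough sketch of generating silence at the right width, assuming mono
PCM at 16 kHz (the sample rate and the helper name are assumptions, not
taken from this commit).

```python
SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM: two bytes per sample


def make_silence(seconds, sample_rate=16000):
    """Return `seconds` of mono silence as raw 16-bit PCM bytes."""
    n_samples = int(seconds * sample_rate)
    # Using one byte per sample here would yield half the intended duration.
    return b"\x00" * (n_samples * SAMPLE_WIDTH_BYTES)
```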
</content>
</entry>
<entry>
<title>Fix performance regression</title>
<updated>2023-07-07T08:27:02+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-07T08:27:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=cdc4889cb5e752d00f7f8933a5486f4f3441f6e9'/>
<id>urn:sha1:cdc4889cb5e752d00f7f8933a5486f4f3441f6e9</id>
<content type='text'>
Each chunk of audio samples should be encoded as a binary string, not as
a list.
</content>
</entry>
<entry>
<title>Enforce minimum 5.0 second duration on audio buffer</title>
<updated>2023-07-07T00:36:14+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-07T00:36:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=d0d3b18ad0a859e5e7a1cc5b8a569349b505c924'/>
<id>urn:sha1:d0d3b18ad0a859e5e7a1cc5b8a569349b505c924</id>
<content type='text'>
The new commit logic could shrink the buffer below this length,
causing the model to hallucinate phrases like:

* "See you next time!"
* "Thanks for watching!"
* "Bye!"

The hope is that by keeping the buffer at least 5.0 seconds long, as
described in the paper, this will cut down on these events.
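
One way the minimum could be enforced when trimming committed audio
from the front of the buffer, sketched under assumptions: 16-bit mono
PCM at 16 kHz, and a hypothetical trim_buffer() helper that is not the
function used in this repo.

```python
BYTES_PER_SAMPLE = 2  # 16-bit PCM


def trim_buffer(buf, drop_bytes, sample_rate=16000, min_seconds=5.0):
    """Drop up to `drop_bytes` from the front of `buf`, but never leave
    less than `min_seconds` of audio behind."""
    min_bytes = int(min_seconds * sample_rate) * BYTES_PER_SAMPLE
    # Largest amount that can be dropped without going under the minimum.
    max_drop = max(0, len(buf) - min_bytes)
    drop = min(drop_bytes, max_drop)
    # Stay aligned to whole samples.
    drop -= drop % BYTES_PER_SAMPLE
    return buf[drop:]
```

A buffer that is already shorter than the minimum is returned unchanged.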
</content>
</entry>
<entry>
<title>Begin work on proxy server</title>
<updated>2023-07-04T02:36:13+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-04T01:44:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=76ae7c28ea6224b2c919122d5dc71bcc00a0ecaa'/>
<id>urn:sha1:76ae7c28ea6224b2c919122d5dc71bcc00a0ecaa</id>
<content type='text'>
Create a simple server with 3 endpoints:
* /create_session: Create a session and return its identifier.
* /set_transcript: Update a session's transcript.
* /get_transcript: Fetch a session's transcript.

Right now the session ID provides authentication *and* authorization.
There is no public/private ID so you have to trust whoever you share
your ID with.

IDs are long and generated by the server, so it should be somewhat
secure against low-effort hacking.

Other updates:
* Drop whisper_requirements.txt - no longer needed.
* Vendor curl to make it easier to interact with the server.

TODO:
* Fuzz test the server.
</content>
</entry>
<entry>
<title>Add profanity filter</title>
<updated>2023-07-02T23:45:07+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-02T23:45:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=7888ccc96d001512dd3bdc01f299856e86c876f5'/>
<id>urn:sha1:7888ccc96d001512dd3bdc01f299856e86c876f5</id>
<content type='text'>
Forgot to check this in, oops!
</content>
</entry>
<entry>
<title>Add visual commit indicator to OBS browser source</title>
<updated>2023-07-01T02:46:17+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-01T02:44:27+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=4f3131b4a36d8e1557edb31d3754a431717dab7b'/>
<id>urn:sha1:4f3131b4a36d8e1557edb31d3754a431717dab7b</id>
<content type='text'>
Circle goes red when speaking, grey when done. Ideally it would be in
the top right portion of the browser source, but this is a good start.

Also, hard-cap transcripts to 4096 chars. This prevents the STT from
lagging during long sessions.
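
A small sketch of the cap, assuming the most recent text is kept and
the oldest is dropped (the commit does not say which end is trimmed,
and cap_transcript() is a hypothetical name).

```python
MAX_TRANSCRIPT_CHARS = 4096


def cap_transcript(transcript, limit=MAX_TRANSCRIPT_CHARS):
    # Assumption: keep the newest `limit` characters. `limit` must be
    # positive, since a slice starting at -0 returns the whole string.
    return transcript[-limit:]
```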
</content>
</entry>
<entry>
<title>Bugfix: trailing period filter ignores ellipses</title>
<updated>2023-07-01T01:55:12+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-07-01T01:55:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=9ab500036bdfa87215e9a05fc167c4d9dea8e437'/>
<id>urn:sha1:9ab500036bdfa87215e9a05fc167c4d9dea8e437</id>
<content type='text'>
... also print out "Ready!" when the STT is done loading.
</content>
</entry>
<entry>
<title>Merge pull request #3 from jsopn/fix-gpu-device-index</title>
<updated>2023-06-30T07:15:50+00:00</updated>
<author>
<name>yum-food</name>
<email>114886918+yum-food@users.noreply.github.com</email>
</author>
<published>2023-06-30T07:15:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=ab1d4499e1b3198b387a1e2d7333a93f694cdfae'/>
<id>urn:sha1:ab1d4499e1b3198b387a1e2d7333a93f694cdfae</id>
<content type='text'>
Set GPU device index in whisper model</content>
</entry>
</feed>
