<feed xmlns='http://www.w3.org/2005/Atom'>
<title>TaSTT.git/transcribe.py, branch master</title>
<subtitle>Free self-hosted STT for VRChat.</subtitle>
<id>https://git.yummers.dev/TaSTT.git/atom?h=master</id>
<link rel='self' href='https://git.yummers.dev/TaSTT.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/'/>
<updated>2022-12-18T01:51:12+00:00</updated>
<entry>
<title>Finish python virtual env</title>
<updated>2022-12-18T01:51:12+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2022-12-18T01:51:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=ee8213d1d2c2008d2d996929500c9e87dac325a3'/>
<id>urn:sha1:ee8213d1d2c2008d2d996929500c9e87dac325a3</id>
<content type='text'>
GUI can now download all TaSTT dependencies and install them into a
virtual environment.

* Add buttons to check embedded python version &amp; install dependencies
* Add class to wrap interacting with embedded Python
* Put all TaSTT python scripts into a folder
</content>
</entry>
<entry>
<title>Optimize transcription latency</title>
<updated>2022-12-15T07:04:29+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2022-12-15T05:45:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=8326dee0bf01956b450858212cbdba3403b32b0d'/>
<id>urn:sha1:8326dee0bf01956b450858212cbdba3403b32b0d</id>
<content type='text'>
Shave off ~500ms of locking overhead. Acquiring a threading.Lock was
taking hundreds of milliseconds, and the global interpreter lock already
prevents most crash-inducing race conditions, so just remove the locks.

Avoid writing audio to disk, saving more time (and disk wear / IOPS).

Add basic profiling to transcribe().

Omit timestamps, since we don't use them (maybe we should!).

Shorten noise indicators to 350ms.

Whisper's tendency to repeat tokens causes certain transcriptions to
take many seconds. I haven't thought about how to fix this yet.
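
A minimal sketch of the kind of basic profiling mentioned above, assuming
a context manager wrapped around each stage. The names `timed`, `timings`,
and `transcribe_stub` are hypothetical and not taken from the TaSTT source:

```python
import time
from contextlib import contextmanager

# Hypothetical helper; not the actual TaSTT profiling code.
timings = {}

@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start

def transcribe_stub(audio):
    # Stand-in for the real transcription step.
    with timed("decode"):
        text = "".join(chr(b) for b in audio)
    return text

result = transcribe_stub(b"hello")
for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.2f} ms")
```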
</content>
</entry>
<entry>
<title>Add on/off sound indicator (local)</title>
<updated>2022-11-26T01:57:42+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2022-11-26T01:57:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=89a929fe76e4dbd56436288055366d9416c13e74'/>
<id>urn:sha1:89a929fe76e4dbd56436288055366d9416c13e74</id>
<content type='text'>
Now we have a visual and auditory indicator for transcription. The
auditory indicator is only heard by the user, and can be used to reset
the state of the board prior to displaying.
</content>
</entry>
<entry>
<title>Tweak speech indicator</title>
<updated>2022-11-23T20:25:05+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2022-11-23T20:24:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=ae9ac5ba5942447f47d8d996d2d340381e730c33'/>
<id>urn:sha1:ae9ac5ba5942447f47d8d996d2d340381e730c33</id>
<content type='text'>
Use a single indicator with 3 states:
  1. green: actively speaking
  2. orange: waiting for paging
  3. red: up-to-date

Use slightly nicer colors.
</content>
</entry>
<entry>
<title>Shorten audio window to 10 seconds</title>
<updated>2022-11-23T03:01:01+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2022-11-23T03:01:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=9f87674d1b484a2e61e87ad53d8ebcf9985dce6b'/>
<id>urn:sha1:9f87674d1b484a2e61e87ad53d8ebcf9985dce6b</id>
<content type='text'>
This helps with temporal stability in long-running transcriptions, and
lets us get rid of that hack where we refuse to update old pages.
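
One way to keep a fixed-length audio window, sketched under assumptions:
a bounded deque that discards the oldest samples. `AudioWindow` and its
methods are hypothetical names, and the 16 kHz rate is assumed because
that is the rate Whisper models consume:

```python
from collections import deque

SAMPLE_RATE_HZ = 16000  # assumed; Whisper models take 16 kHz audio
WINDOW_S = 10           # window length named in this commit

class AudioWindow:
    """Keep only the most recent window_s seconds of samples."""

    def __init__(self, sample_rate=SAMPLE_RATE_HZ, window_s=WINDOW_S):
        # A deque with maxlen silently drops the oldest samples when full.
        self.samples = deque(maxlen=sample_rate * window_s)

    def push(self, frame):
        """Append an iterable of samples to the window."""
        self.samples.extend(frame)

    def snapshot(self):
        """Return the current window contents, oldest sample first."""
        return list(self.samples)
```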
</content>
</entry>
<entry>
<title>Fix audio bug</title>
<updated>2022-11-23T02:21:49+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2022-11-23T02:21:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=0f95401b7f7388b7ce3b6bf4d4aa1694f484db11'/>
<id>urn:sha1:0f95401b7f7388b7ce3b6bf4d4aa1694f484db11</id>
<content type='text'>
Coarse locking was causing audio frames to drop, severely degrading
transcription quality.

We really need a spoken word integration test.
</content>
</entry>
<entry>
<title>Rework input controls</title>
<updated>2022-11-23T02:13:18+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2022-11-22T23:36:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=bd8b63a357bb374f5875f0fedf2d677589419810'/>
<id>urn:sha1:bd8b63a357bb374f5875f0fedf2d677589419810</id>
<content type='text'>
Press joystick once to start recording, again to stop. When you start
recording, any previous text on the board is cleared.

Add 2 visual indicators: one to indicate speech, another to indicate
that audio is paging.
</content>
</entry>
<entry>
<title>Tweak transcription again</title>
<updated>2022-11-16T08:45:09+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2022-11-16T08:45:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=d2e06445c42b22d2b75f5da1980b7a8d833a9c5b'/>
<id>urn:sha1:d2e06445c42b22d2b75f5da1980b7a8d833a9c5b</id>
<content type='text'>
Works a little better on longer transcriptions while maintaining the
same improved performance on short transcriptions.

We really need a benchmark to evaluate performance mechanically.
</content>
</entry>
<entry>
<title>Another transcription rework</title>
<updated>2022-11-15T05:36:13+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2022-11-15T05:30:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=2505a5cc486cd913db50a475e45c3701b9710282'/>
<id>urn:sha1:2505a5cc486cd913db50a475e45c3701b9710282</id>
<content type='text'>
After re-reading the paper, I noticed that they apply a couple of
optimizations I wasn't using. Use the top-level `whisper.transcribe`
function, which is a little slower but more accurate than the one I was
using.

Although this method is slower, it has better temporal stability due to
the increased quality, which I think should make for an overall more
responsive UX. Lower transcription quality means the paging layer has to
waste time updating earlier cells.

Also, drop the auto-commit stuff and go back to string stitching. I
think it's better to let the user manually commit. A rework of the hand
controls is probably coming soon.

Finally, update README.
</content>
</entry>
<entry>
<title>Fix reset button</title>
<updated>2022-11-12T23:02:34+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2022-11-12T23:02:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT.git/commit/?id=9921697816c9f9473bac54444793f702e54d24a6'/>
<id>urn:sha1:9921697816c9f9473bac54444793f702e54d24a6</id>
<content type='text'>
Board would lock up if you reset after the first page. osc_ctrl.clear()
was assigning the wrong member :)

Tweak continuous transcription logic: now we only commit if the
transcription remains identical for N seconds.
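
A rough sketch of the commit rule described above, not the actual
implementation. `StableCommitter`, `hold_s` (the "N seconds"), and the
injectable `clock` are hypothetical names introduced for illustration:

```python
import time

class StableCommitter:
    """Commit a transcription once it has stayed identical for hold_s seconds."""

    def __init__(self, hold_s, clock=time.monotonic):
        self.hold_s = hold_s
        self.clock = clock
        self.candidate = None
        self.since = None
        self.committed = []

    def observe(self, text):
        """Feed the latest transcription; return True if it was committed."""
        now = self.clock()
        if text != self.candidate:
            # Transcription changed: restart the stability timer.
            self.candidate = text
            self.since = now
            return False
        if text and now - self.since >= self.hold_s:
            # Unchanged for long enough: commit and wait for new text.
            self.committed.append(text)
            self.candidate = None
            self.since = None
            return True
        return False
```

Calling `observe()` on every transcription update is enough; any change
in the text resets the timer, so only a stable result is committed.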
</content>
</entry>
</feed>
