<feed xmlns='http://www.w3.org/2005/Atom'>
<title>TaSTT-Whisper.git/Whisper/MF, branch master</title>
<subtitle>High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model</subtitle>
<id>https://git.yummers.dev/TaSTT-Whisper.git/atom?h=master</id>
<link rel='self' href='https://git.yummers.dev/TaSTT-Whisper.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/'/>
<updated>2023-04-05T00:41:30+00:00</updated>
<entry>
<title>Fix audio normalization</title>
<updated>2023-04-05T00:41:30+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-04-05T00:40:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=59297502afb8f61c1216c6d57d6cc18ab5b9f467'/>
<id>urn:sha1:59297502afb8f61c1216c6d57d6cc18ab5b9f467</id>
<content type='text'>
Normalization was scaling audio to the range [0, 255], when it should
have scaled it to the range [0, 1].

* Add AudioBuffer::save() to enable debugging audio issues.
</content>
</entry>
<entry>
<title>Begin work on evaluation framework</title>
<updated>2023-03-03T23:43:24+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-03-03T23:43:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=f7d5741e5c069d759f8412bd40b279e1d7abac4c'/>
<id>urn:sha1:f7d5741e5c069d759f8412bd40b279e1d7abac4c</id>
<content type='text'>
Need a way to verify that beam search is actually working better than
greedy.
</content>
</entry>
<entry>
<title>Add retainDuration option to CaptureParams</title>
<updated>2023-02-27T04:09:15+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-02-27T03:42:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=1136acfc365f357d2df13a263714e8ae0614c4f9'/>
<id>urn:sha1:1136acfc365f357d2df13a263714e8ae0614c4f9</id>
<content type='text'>
This allows users to retain a suffix of the PCM buffer after a VAD
segmentation event, reducing some instances of words being lost at
the start of the next VAD window.
</content>
</entry>
<entry>
<title>Normalize audio before sending to transcription layer</title>
<updated>2023-02-27T04:08:45+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-02-27T03:12:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=02c2605454288f7c86023ae700366acf08cd2206'/>
<id>urn:sha1:02c2605454288f7c86023ae700366acf08cd2206</id>
<content type='text'>
Helps in cases where the speaker is speaking softly, or their mic gain
is set low.
</content>
</entry>
<entry>
<title>Frames with no VAD are shortened, not dropped</title>
<updated>2023-02-27T03:49:40+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-02-27T02:57:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=00a0350a0218cf4b03d14acac84110bc1e882bee'/>
<id>urn:sha1:00a0350a0218cf4b03d14acac84110bc1e882bee</id>
<content type='text'>
Previously, on PCM buffers of length &gt;= captureParams.dropStartSilence,
a "no voice" VAD verdict cleared the PCM buffer entirely. As a result,
when VAD segmented speech, words right after the segmentation window
were frequently dropped.

Now only a prefix is removed from the PCM buffer and the VAD buffers are
cleared, so the transcription algorithm has access to the "leading"
frames before the frames which triggered VAD. This reduces cases where
words are omitted in the middle of long statements.
</content>
</entry>
<entry>
<title>Bugfix, stereo PCM handling</title>
<updated>2023-01-28T15:15:39+00:00</updated>
<author>
<name>Konstantin</name>
<email>const@const.me</email>
</author>
<published>2023-01-28T15:15:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=cfd20a0f796ab6cc046b080bb7af8967cb7c361b'/>
<id>urn:sha1:cfd20a0f796ab6cc046b080bb7af8967cb7c361b</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Minor, error handling</title>
<updated>2023-01-20T12:31:49+00:00</updated>
<author>
<name>Konstantin</name>
<email>const@const.me</email>
</author>
<published>2023-01-20T12:31:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=8880867a950f78292f8cf4b37771b08dd0376074'/>
<id>urn:sha1:8880867a950f78292f8cf4b37771b08dd0376074</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Workaround for Microsoft’s bug in their MP3 decoder MFT</title>
<updated>2023-01-19T16:10:24+00:00</updated>
<author>
<name>Konstantin</name>
<email>const@const.me</email>
</author>
<published>2023-01-19T16:10:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=9df2ee2ead4ce23d06351a6cdb4fea588f79e429'/>
<id>urn:sha1:9df2ee2ead4ce23d06351a6cdb4fea588f79e429</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Minor, optimized away memcpy() when running audio capture</title>
<updated>2023-01-17T12:12:14+00:00</updated>
<author>
<name>Konstantin</name>
<email>const@const.me</email>
</author>
<published>2023-01-17T12:12:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=aae8860d0e7b2bf68a1c20c6f30999fff531f03c'/>
<id>urn:sha1:aae8860d0e7b2bf68a1c20c6f30999fff531f03c</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Source code</title>
<updated>2023-01-16T13:52:43+00:00</updated>
<author>
<name>Konstantin</name>
<email>const@const.me</email>
</author>
<published>2023-01-16T13:52:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=8c4603c73675958efc960fbd4bb599a2909d106a'/>
<id>urn:sha1:8c4603c73675958efc960fbd4bb599a2909d106a</id>
<content type='text'>
</content>
</entry>
</feed>
