<feed xmlns='http://www.w3.org/2005/Atom'>
<title>TaSTT-Whisper.git, branch master</title>
<subtitle>High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model</subtitle>
<id>https://git.yummers.dev/TaSTT-Whisper.git/atom?h=master</id>
<link rel='self' href='https://git.yummers.dev/TaSTT-Whisper.git/atom?h=master'/>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/'/>
<updated>2023-04-05T00:41:30+00:00</updated>
<entry>
<title>Fix audio normalization</title>
<updated>2023-04-05T00:41:30+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-04-05T00:40:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=59297502afb8f61c1216c6d57d6cc18ab5b9f467'/>
<id>urn:sha1:59297502afb8f61c1216c6d57d6cc18ab5b9f467</id>
<content type='text'>
Normalization was scaling audio onto the range [0, 255] when it should
have been scaling it onto [0, 1].

* Add AudioBuffer::save() to help debug audio issues.
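
A minimal sketch of the corrected scaling. It assumes float PCM samples and
peak normalization, and reads "range [0, 1]" as sample magnitude (signed
samples then land in [-1, 1]). peak_normalize is a hypothetical name, not
the project's API.

```python
def peak_normalize(samples: list[float]) -> list[float]:
    """Scale samples so the largest absolute amplitude becomes 1.0.

    Assumes float PCM input. An all-zero (or empty) buffer is returned
    unchanged, since there is no peak to scale by.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    # Dividing by the peak maps magnitudes onto [0, 1]. Scaling the peak to
    # 255 instead of 1.0 would produce the [0, 255] range described above.
    return [s / peak for s in samples]


quiet = [0.25, -0.5, 0.125]
print(peak_normalize(quiet))  # [0.5, -1.0, 0.25]
```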
</content>
</entry>
<entry>
<title>Begin work on disabling VAD</title>
<updated>2023-03-17T11:11:18+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-03-17T11:11:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=aaa0188da81056748ef8ffcd5ad86d6f4bffa6bd'/>
<id>urn:sha1:aaa0188da81056748ef8ffcd5ad86d6f4bffa6bd</id>
<content type='text'>
</content>
</entry>
<entry>
<title>Fix beam search previous window conditioning</title>
<updated>2023-03-08T04:22:56+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-03-08T04:16:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=5e30b2366a4a320f59ed7e0bfcfe72f5f8c9d108'/>
<id>urn:sha1:5e30b2366a4a320f59ed7e0bfcfe72f5f8c9d108</id>
<content type='text'>
Not all contexts had `prev_prompt`, causing most beams to misbehave.
</content>
</entry>
<entry>
<title>Use logprobs, fix beam candidate selection</title>
<updated>2023-03-04T04:42:10+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-03-04T04:10:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=a74ee78dbb79c551851dc182090e7d4292b1e80c'/>
<id>urn:sha1:a74ee78dbb79c551851dc182090e7d4292b1e80c</id>
<content type='text'>
An incorrect sort condition caused the worst 5 beams to be picked
instead of the best 5.

Use log probabilities instead of linear probabilities for the joint
probability calculation. The joint probability of a long beam shrank
exponentially towards zero (and could underflow); its log probability
instead decreases roughly linearly towards -INFINITY.

On both transcripts in Evaluation/setup.ps1, I see a small (~5%) edit
distance regression with beam search vs. greedy decoding.
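
A small sketch of both fixes, with hypothetical names: candidates are ranked
by descending joint log probability, and summing logs avoids the underflow
that a plain product hits on long beams. The beam count of 5 matches the
text; the token probabilities are made up for illustration.

```python
import math


def joint_logprob(token_probs: list[float]) -> float:
    # Sum of log probabilities: decreases roughly linearly with beam length
    # instead of shrinking exponentially and underflowing to 0.0.
    return sum(math.log(p) for p in token_probs)


def best_beams(scores: dict[str, float], k: int = 5) -> list[str]:
    # Highest joint log probability first. Sorting ascending here would
    # return the k worst beams instead of the k best.
    return sorted(scores, key=scores.get, reverse=True)[:k]


long_beam = [0.01] * 200
print(math.prod(long_beam))      # 0.0: the linear product underflows
print(joint_logprob(long_beam))  # roughly -921.03, still usable for ranking
```

Because log is monotonic, ranking by summed logs picks the same beams that
ranking by the exact product would, without losing precision.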
</content>
</entry>
<entry>
<title>Begin work on evaluation framework</title>
<updated>2023-03-03T23:43:24+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-03-03T23:43:24+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=f7d5741e5c069d759f8412bd40b279e1d7abac4c'/>
<id>urn:sha1:f7d5741e5c069d759f8412bd40b279e1d7abac4c</id>
<content type='text'>
Need a way to verify that beam search is actually working better than
greedy.
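
One way to score a transcript against a reference is word-level Levenshtein
distance. This is a sketch assuming whole words are compared; the edit
distance the evaluation framework actually computes may differ (for example,
character-level), and edit_distance is a hypothetical name.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev holds the DP row for the reference words processed so far;
    # cur is the row being filled for the current reference word.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute (free if words match)
            )
        prev = cur
    return prev[-1]


print(edit_distance("the cat sat", "a cat sit"))  # 2
```

Running greedy and beam search over the same audio and comparing their
distances to the reference gives a single number per strategy.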
</content>
</entry>
<entry>
<title>Finish beam search rough draft</title>
<updated>2023-03-03T04:33:22+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-03-03T04:32:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=d743645ba27cc85d36fe6820cd9d21f0fc4a11f2'/>
<id>urn:sha1:d743645ba27cc85d36fe6820cd9d21f0fc4a11f2</id>
<content type='text'>
Seems to work. Doesn't crash. Lots of room for optimization and cleanup.
</content>
</entry>
<entry>
<title>Continue work on beam search</title>
<updated>2023-03-03T01:43:26+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-03-03T01:43:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=dcd7f3b60e3b9ad8df83d444f8bc67091b411529'/>
<id>urn:sha1:dcd7f3b60e3b9ad8df83d444f8bc67091b411529</id>
<content type='text'>
Define ContextImpl::Context, wrapping all the data used in decoding.
Using a vector of these is much simpler than using N separate vectors,
one for each piece of per-beam state we need.
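
A sketch of the array-of-structs layout in Python. It assumes the per-beam
fields listed for ContextImpl.h (prompt, previous prompt, probabilities,
probability IDs) plus a running joint log probability; the field names and
token IDs are illustrative, not the real C++ members.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    # Illustrative per-beam decoding state; names are assumptions.
    prompt: list[int] = field(default_factory=list)
    prev_prompt: list[int] = field(default_factory=list)
    probs: list[float] = field(default_factory=list)
    prob_ids: list[int] = field(default_factory=list)
    logprob: float = 0.0


N_BEAMS = 5
prev_window = [101, 102, 103]  # hypothetical token IDs from the previous window

# One list of Context objects instead of one N_BEAMS-long list per field.
# Each beam gets its own copy of the previous prompt, so no beam is left
# without conditioning and appending to one never affects another.
contexts = [Context(prev_prompt=list(prev_window)) for _ in range(N_BEAMS)]
contexts[0].prompt.append(7)  # only beam 0 changes
```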
</content>
</entry>
<entry>
<title>Begin work on beam search decoding</title>
<updated>2023-02-28T02:03:02+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-02-28T02:03:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=4f967dbdb7972ec52039bd3e3ce3e1e4cbcf6545'/>
<id>urn:sha1:4f967dbdb7972ec52039bd3e3ce3e1e4cbcf6545</id>
<content type='text'>
* ContextImpl.h puts prompts, previous prompts, probabilities, and
  probability IDs into vectors of size 1 or N_BEAMS, depending on
  the decoding strategy.
* Extend sampleBest and friends to return the top N tokens instead of
  just the top 1 token.
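
A sketch of a top-N variant of greedy sampling, assuming a flat list of
per-token probabilities. sample_best_n is a hypothetical stand-in for the
extended sampleBest; with n = 1 it reduces to ordinary greedy selection,
matching the size-1 vectors used for greedy decoding.

```python
import heapq

N_BEAMS = 5


def sample_best_n(probs: list[float], n: int) -> list[tuple[int, float]]:
    # Return the n most probable (token_id, probability) pairs, best first.
    # heapq.nlargest avoids fully sorting the vocabulary-sized list.
    return heapq.nlargest(n, enumerate(probs), key=lambda pair: pair[1])


probs = [0.1, 0.6, 0.3]
greedy = sample_best_n(probs, 1)      # [(1, 0.6)]
beam = sample_best_n(probs, N_BEAMS)  # all 3 tokens, best first
```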
</content>
</entry>
<entry>
<title>Add retainDuration option to CaptureParams</title>
<updated>2023-02-27T04:09:15+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-02-27T03:42:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=1136acfc365f357d2df13a263714e8ae0614c4f9'/>
<id>urn:sha1:1136acfc365f357d2df13a263714e8ae0614c4f9</id>
<content type='text'>
This allows users to retain a suffix of the PCM buffer after a VAD
segmentation event, which reduces how often words at the start of the
next VAD window are lost.
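
A sketch of the carry-over, assuming retainDuration is expressed in seconds
and the buffer is mono PCM at Whisper's 16 kHz input rate. split_at_segment
is a hypothetical helper, not the CaptureParams API.

```python
def split_at_segment(pcm: list[float], sample_rate: int,
                     retain_duration: float) -> tuple[list[float], list[float]]:
    """Return (segment_to_transcribe, suffix_to_carry_into_next_window)."""
    n_keep = min(len(pcm), max(0, int(retain_duration * sample_rate)))
    # Use an explicit start index: pcm[-0:] would copy the whole buffer
    # rather than return an empty suffix when n_keep is 0.
    suffix = pcm[len(pcm) - n_keep:]
    return pcm, suffix


SAMPLE_RATE = 16000
window = [0.0] * SAMPLE_RATE * 3  # 3 s of (silent) audio
segment, carry = split_at_segment(window, SAMPLE_RATE, 0.5)
next_window = carry + [0.0] * SAMPLE_RATE  # next window starts with 0.5 s of old audio
```

A word that straddles the segmentation point then appears whole at the start
of the next window, at the cost of transcribing the retained audio twice.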
</content>
</entry>
<entry>
<title>Normalize audio before sending to transcription layer</title>
<updated>2023-02-27T04:08:45+00:00</updated>
<author>
<name>yum</name>
<email>yum.food.vr@gmail.com</email>
</author>
<published>2023-02-27T03:12:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.yummers.dev/TaSTT-Whisper.git/commit/?id=02c2605454288f7c86023ae700366acf08cd2206'/>
<id>urn:sha1:02c2605454288f7c86023ae700366acf08cd2206</id>
<content type='text'>
Helps when the speaker is talking softly or their mic gain is set
low.
</content>
</entry>
</feed>
