path: root/Whisper/MF
Commit message | Author | Age
* Fix audio normalization (HEAD, master) | yum | 2023-04-04
  Normalization was mapping audio onto the range [0, 255], when it should have been mapping onto [0, 1].
  * Add AudioBuffer::save() to enable debugging audio issues.
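  The log does not show the corrected code, so the following is only a minimal sketch of one way to keep sample magnitudes within [0, 1]: peak normalization. The function name `normalizePeak` and the choice of peak scaling are assumptions for illustration, not the repository's implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper, not taken from the repository.
// Scales samples in place so the largest magnitude becomes 1.0.
// Empty or all-zero buffers are left unchanged to avoid dividing by zero.
inline void normalizePeak( std::vector<float>& pcm )
{
    float peak = 0.0f;
    for( float s : pcm )
        peak = std::max( peak, std::fabs( s ) );
    if( peak <= 0.0f )
        return;
    const float scale = 1.0f / peak;
    for( float& s : pcm )
        s *= scale;
}
```

  A scale factor of 255 / peak in place of 1 / peak would produce the kind of out-of-range result the commit message describes.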
* Begin work on evaluation framework | yum | 2023-03-03
  Need a way to verify that beam search is actually working better than greedy.
* Add retainDuration option to CaptureParams | yum | 2023-02-26
  This allows users to retain a suffix of the PCM buffer after a VAD segmentation event, reducing some instances of words being lost at the start of the next VAD window.
* Normalize audio before sending to transcription layer | yum | 2023-02-26
  Helps in cases where the speaker is speaking softly, or their mic gain is set low.
* Frames with no VAD are shortened, not dropped | yum | 2023-02-26
  On PCM buffers of length >= captureParams.dropStartSilence, a "no voice" VAD verdict would result in the PCM buffer being entirely cleared. The emergent behavior was that when VAD segments speech, words right after the segmentation window were frequently dropped. By removing only a prefix from the PCM buffer and clearing the VAD buffers, the transcription algorithm keeps access to the "leading" frames before the frames which triggered VAD. This reduces cases where words are omitted in the middle of long statements.
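  The difference between clearing the buffer and removing only a prefix can be sketched as follows. This is an illustration under stated assumptions: the helper name `keepSuffix` and the `retainSamples` parameter are hypothetical, and the repository's actual buffer type and bookkeeping are not shown in this log.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from the repository.
// Drops the oldest samples so that at most `retainSamples` of the newest
// ones remain. Buffers that are already short enough are left untouched.
inline void keepSuffix( std::vector<float>& pcm, std::size_t retainSamples )
{
    if( pcm.size() <= retainSamples )
        return;
    pcm.erase( pcm.begin(), pcm.end() - static_cast<std::ptrdiff_t>( retainSamples ) );
}
```

  Calling `keepSuffix( pcm, 0 )` reproduces the old "clear everything" behavior, while a non-zero count leaves the most recent frames available to the next transcription pass.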
* Bugfix, stereo PCM handling | Konstantin | 2023-01-28
* Minor, error handling | Konstantin | 2023-01-20
* Workaround for Microsoft's bug in their MP3 decoder MFT | Konstantin | 2023-01-19
* Minor, optimized away memcpy() when running audio capture | Konstantin | 2023-01-17
* Source codes | Konstantin | 2023-01-16