path: root/Whisper
Commit message | Author | Date
* Fix audio normalization (HEAD, master) | yum | 2023-04-04
  Normalization was putting audio onto the range [0, 255] when it should have been on the range [0, 1].
  * Add AudioBuffer::save() to enable debugging audio issues.
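A minimal sketch of peak normalization for 32-bit float PCM, assuming "range [0, 1]" above refers to peak magnitude, so the loudest sample ends up at magnitude 1.0 and each sample keeps its sign. `normalizePeak` is a hypothetical name, not the library's API.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper: scale float PCM so the loudest sample has magnitude 1.0.
// Silent buffers are left untouched to avoid dividing by zero.
void normalizePeak(std::vector<float>& pcm)
{
    float peak = 0.0f;
    for (float s : pcm)
        peak = std::max(peak, std::fabs(s));
    if (peak <= 0.0f)
        return;
    const float scale = 1.0f / peak;  // target peak is 1.0, not 255
    for (float& s : pcm)
        s *= scale;
}
```

Scaling towards 255 instead of 1.0 would push samples far outside the [-1, 1] range that float PCM consumers normally expect.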
* Begin work on disabling VAD | yum | 2023-03-17
* Fix beam search previous window conditioning | yum | 2023-03-07
  Not all contexts had `prev_prompt`, causing most beams to misbehave.
* Use logprobs, fix beam candidate selection | yum | 2023-03-03
  An incorrect sort condition resulted in the worst 5 beams being picked instead of the best 5.
  Use log probabilities instead of linear probabilities for the joint probability calculation. With linear probabilities, long beams had probabilities converging exponentially towards zero; now they converge linearly towards -INFINITY.
  Using both transcripts in Evaluation/setup.ps1, I see a small edit distance regression (~5%) using beam search vs. greedy.
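The two fixes above can be sketched as follows, assuming each beam candidate carries its token list and a joint log probability; `Candidate`, `extend` and `selectBest` are hypothetical names. Summing logarithms replaces multiplying probabilities, and the comparator sorts in descending order so the best candidates survive.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical beam candidate: token sequence plus its joint log probability.
struct Candidate
{
    std::vector<int> tokens;
    double logProb = 0.0;  // sum of log(p) over tokens; 0.0 means probability 1
};

// Extend a beam by one token: add log(p) instead of multiplying by p,
// so long beams drift towards -infinity linearly instead of underflowing to 0.
Candidate extend(const Candidate& beam, int token, double p)
{
    Candidate c = beam;
    c.tokens.push_back(token);
    c.logProb += std::log(p);
    return c;
}

// Keep the best `nBeams` candidates. The comparator orders by DESCENDING log
// probability; an ascending order would keep the worst candidates instead.
std::vector<Candidate> selectBest(std::vector<Candidate> candidates, std::size_t nBeams)
{
    const std::size_t n = std::min(nBeams, candidates.size());
    std::partial_sort(candidates.begin(),
                      candidates.begin() + static_cast<std::ptrdiff_t>(n),
                      candidates.end(),
                      [](const Candidate& a, const Candidate& b) { return a.logProb > b.logProb; });
    candidates.resize(n);
    return candidates;
}
```

Multiplying 0.5 two thousand times underflows to exactly 0.0 in double precision, while the summed log probability stays finite at about -1386, so long beams can still be ranked.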
* Begin work on evaluation framework | yum | 2023-03-03
  Need a way to verify that beam search is actually working better than greedy.
* Finish beam search rough draft | yum | 2023-03-02
  Seems to work. Doesn't crash. Lots of room for optimization and cleanup.
* Continue work on beam search | yum | 2023-03-02
  Define ContextImpl::Context, wrapping all the data used in decoding. Using a vector of these is much simpler than using N separate vectors for every piece of per-beam state we need.
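A sketch of the idea, with hypothetical field names standing in for the real ContextImpl::Context members: one struct per decoding context, held in a vector of size 1 for greedy decoding or N_BEAMS for beam search. Copying the previous-window prompt into every context also avoids the problem fixed in the `prev_prompt` entry above in this log, where not all contexts had it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-context decoding state; member names are assumptions,
// not the library's actual ContextImpl::Context fields.
struct DecodeContext
{
    std::vector<int> prompt;      // tokens decoded so far in this window
    std::vector<int> prevPrompt;  // conditioning tokens from the previous window
    std::vector<float> probs;     // token probabilities from the last step
    std::vector<int> probIds;     // token IDs matching `probs`
};

// One context for greedy decoding, nBeams contexts for beam search.
// Every context receives its own copy of the previous-window prompt.
std::vector<DecodeContext> makeContexts(bool beamSearch, std::size_t nBeams,
                                        const std::vector<int>& prevPrompt)
{
    std::vector<DecodeContext> contexts(beamSearch ? nBeams : 1);
    for (DecodeContext& ctx : contexts)
        ctx.prevPrompt = prevPrompt;
    return contexts;
}
```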
* Begin work on beam search decoding | yum | 2023-02-27
  * ContextImpl.h puts prompts, previous prompts, probabilities, and probability IDs into vectors of size 1 or N_BEAMS, depending on the decoding strategy.
  * Extend sampleBest and friends to return the top N tokens instead of just the top 1 token.
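Returning the top N tokens instead of only the best one can be sketched with `std::partial_sort` over token indices. `sampleTopN` is a hypothetical name, not the library's `sampleBest`; with `n == 1` it reduces to the greedy choice.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical top-N selection: IDs of the `n` most probable tokens, best first.
std::vector<int> sampleTopN(const std::vector<float>& probs, std::size_t n)
{
    std::vector<int> ids(probs.size());
    std::iota(ids.begin(), ids.end(), 0);  // 0, 1, 2, ... token IDs
    n = std::min(n, ids.size());
    // Only the first n positions need to be sorted; the rest stay unordered.
    std::partial_sort(ids.begin(), ids.begin() + static_cast<std::ptrdiff_t>(n), ids.end(),
                      [&probs](int a, int b) { return probs[a] > probs[b]; });
    ids.resize(n);
    return ids;
}
```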
* Add retainDuration option to CaptureParams | yum | 2023-02-26
  This allows users to retain a suffix of the PCM buffer after a VAD segmentation event, reducing some instances of words being lost at the start of the next VAD window.
* Normalize audio before sending to transcription layer | yum | 2023-02-26
  Helps in cases where the speaker is speaking softly, or their mic gain is set low.
* Frames with no VAD are shortened, not dropped | yum | 2023-02-26
  On PCM buffers of length >= captureParams.dropStartSilence, a "no voice" VAD verdict would result in the PCM buffer being entirely cleared. The emergent behavior was that when VAD segmented speech, words right after the segmentation window were frequently dropped.
  By removing only a prefix from the PCM buffer and clearing the VAD buffers, the transcription algorithm has access to "leading" frames before the frames which triggered VAD. This reduces cases where words are omitted in the middle of long statements.
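A minimal sketch of the "shorten, don't drop" behavior, assuming the buffer length and `dropStartSilence` are both measured in samples; `onNoVoice`, `keepSamples` and `vadHistory` are hypothetical names. Only the oldest samples are discarded, so frames just before the next voice detection stay available to the transcriber.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical handling of a "no voice" VAD verdict. Rather than clearing the
// whole PCM buffer once it holds at least `dropStartSilence` samples, drop only
// the oldest samples and keep the newest `keepSamples` as leading context.
// `vadHistory` stands in for whatever per-frame VAD state the real code keeps.
void onNoVoice(std::vector<float>& pcm, std::vector<int>& vadHistory,
               std::size_t dropStartSilence, std::size_t keepSamples)
{
    if (pcm.size() < dropStartSilence)
        return;  // buffer still short: keep accumulating
    if (keepSamples < pcm.size())
        pcm.erase(pcm.begin(), pcm.end() - static_cast<std::ptrdiff_t>(keepSamples));
    vadHistory.clear();  // VAD buffers start over; the PCM tail survives
}
```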
* Restored the missing experimental feature: token-level timestamps | Konstantin | 2023-02-14
* Version 1.7 | Konstantin | 2023-02-07
* Comments | Konstantin | 2023-02-03
* Bugfix, incorrect output of command-line examples when launched with multiple input files | Konstantin | 2023-02-03
* Version 1.6 | Konstantin | 2023-01-29
* Diarize feature for buffered audio | Konstantin | 2023-01-28
* Minor, micro-optimization | Konstantin | 2023-01-28
* Diarize feature, initial version | Konstantin | 2023-01-28
* Bugfix, stereo PCM handling | Konstantin | 2023-01-28
* DLL API for diarize feature | Konstantin | 2023-01-28
* Version 1.5 | Konstantin | 2023-01-24
* Performance tuning on AMD iGPU | Konstantin | 2023-01-24
* Minor, micro-optimization | Konstantin | 2023-01-23
* Performance improvement, no longer destroying temporary buffers in `encode()` method | Konstantin | 2023-01-23
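The buffer-reuse pattern described above, as a minimal CPU-side sketch: `std::vector<float>` stands in for the GPU buffers the library actually manages, and `Encoder` and `scratch()` are hypothetical names. The scratch storage lives as long as the encoder and only grows, so repeated `encode()`-style calls avoid reallocating.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical encoder that keeps its temporary buffer alive between calls.
// A std::vector stands in for the GPU buffers the real library uses.
class Encoder
{
public:
    // Returns storage for at least `n` floats. The buffer only grows, so calls
    // with the same or a smaller size reuse the existing allocation.
    float* scratch(std::size_t n)
    {
        if (temp.size() < n)
            temp.resize(n);
        return temp.data();
    }

    std::size_t capacity() const { return temp.size(); }

private:
    std::vector<float> temp;  // destroyed with the Encoder, not after each call
};
```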
* Improved VRAM management, both speed and memory usage | Konstantin | 2023-01-23
* Minor, performance and VRAM use | Konstantin | 2023-01-23
* Minor, micro-optimization | Konstantin | 2023-01-23
* Performance improvement, `softMax` shader | Konstantin | 2023-01-23
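The entry does not say what changed in the `softMax` shader. For reference, this is the computation a softmax performs, written as a CPU function in C++ rather than a shader; `softmaxReference` is a hypothetical name. Subtracting the maximum before `exp()` is the standard way to keep large inputs from overflowing.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference for a numerically stable softmax (not the library's shader).
// exp(x - max) never exceeds 1, so large logits cannot overflow.
std::vector<float> softmaxReference(const std::vector<float>& x)
{
    if (x.empty())
        return {};
    const float maxVal = *std::max_element(x.begin(), x.end());
    std::vector<float> y(x.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); i++)
    {
        y[i] = std::exp(x[i] - maxVal);
        sum += y[i];
    }
    for (float& v : y)
        v /= sum;  // outputs are positive and sum to 1
    return y;
}
```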
* Minor, profiler tags | Konstantin | 2023-01-23
* VAD CPU performance, slightly better code generation | Konstantin | 2023-01-23
* GPU performance, optimized away a few shader dispatches | Konstantin | 2023-01-22
* Experimental, alternative busy wait implementation | Konstantin | 2023-01-21
  Disabled with a `constexpr` flag because on a desktop with a discrete GPU it was about 20% slower, even though CPU load dropped to about zero. Needs testing on iGPUs, where thermal effects might make a difference.
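The entry does not describe the alternative wait, so this sketch only shows the shape of a `constexpr` switch between two strategies, assuming the code polls a completion flag; `useAlternativeWait`, `waitForCompletion` and the 100 µs sleep are hypothetical stand-ins.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical compile-time switch, off by default like the flag described above.
// false: spin on the flag (lowest latency, keeps one CPU core fully busy).
// true:  sleep briefly between checks (CPU load near zero, but the sleep
//        granularity can delay the wake-up).
constexpr bool useAlternativeWait = false;

void waitForCompletion(const std::atomic<bool>& done)
{
    while (!done.load(std::memory_order_acquire))
    {
        if constexpr (useAlternativeWait)
            std::this_thread::sleep_for(std::chrono::microseconds(100));
    }
}
```

Because the branch is `if constexpr`, the disabled strategy is compiled out entirely rather than tested at run time.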
* Minor, CPU performance | Konstantin | 2023-01-21
* CPU performance, SSE vectorization for mel spectrogram | Konstantin | 2023-01-21
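The log gives no details of the vectorized code. As an illustration, the sketch below vectorizes one likely hot loop of a mel spectrogram, the dot product of a mel filter row with the power spectrum, using SSE intrinsics with a scalar fallback; the function name `melFilterDot` and the flat-array layout are assumptions.

```cpp
#include <cstddef>
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define MEL_SKETCH_HAVE_SSE 1
#endif

// Hypothetical inner loop: dot product of one mel filter row with the power
// spectrum. SSE handles four floats per instruction; a scalar loop does the tail.
float melFilterDot(const float* filter, const float* power, std::size_t n)
{
    std::size_t i = 0;
    float sum = 0.0f;
#ifdef MEL_SKETCH_HAVE_SSE
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(filter + i), _mm_loadu_ps(power + i)));
    // Horizontal sum of the four accumulator lanes.
    alignas(16) float lanes[4];
    _mm_store_ps(lanes, acc);
    sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    for (; i < n; i++)
        sum += filter[i] * power[i];
    return sum;
}
```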
* Version 1.4 | Konstantin | 2023-01-20
* Minor, error handling | Konstantin | 2023-01-20
* Version 1.3 | Konstantin | 2023-01-19
* Workaround for a bug in Microsoft's MP3 decoder MFT | Konstantin | 2023-01-19
* Version 1.2 | Konstantin | 2023-01-18
* Minor, logging and UX | Konstantin | 2023-01-18
* Optional startup flags to override performance-related defaults for the compute shaders | Konstantin | 2023-01-18
* Consistent cancellation API across the library: S_OK = continue, S_FALSE = stop | Konstantin | 2023-01-18
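A sketch of that convention from the caller's side, assuming a C-style progress callback; `ProgressCallback` and `processSteps` are hypothetical names, and the HRESULT constants are local stand-ins with their standard Windows values (S_OK = 0, S_FALSE = 1, E_FAIL = 0x80004005) so the code compiles without `<windows.h>`. S_FALSE stops the loop without being treated as an error, while a negative HRESULT is propagated as a failure.

```cpp
#include <cstdint>

// Local stand-ins for the Windows definitions (do not combine with <windows.h>).
using HRESULT = std::int32_t;
constexpr HRESULT S_OK = 0;
constexpr HRESULT S_FALSE = 1;
constexpr HRESULT E_FAIL = static_cast<HRESULT>(0x80004005u);

// Hypothetical callback: S_OK = continue, S_FALSE = stop, negative = error.
using ProgressCallback = HRESULT (*)(int step, void* userData);

HRESULT processSteps(int count, ProgressCallback callback, void* userData)
{
    for (int i = 0; i < count; i++)
    {
        const HRESULT hr = callback(i, userData);
        if (hr < 0)
            return hr;       // FAILED(hr): propagate the error code
        if (hr == S_FALSE)
            return S_FALSE;  // caller asked to stop; not an error
        // ... the actual work for step i would go here ...
    }
    return S_OK;
}
```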
* Minor, optimized away memcpy() when running audio capture | Konstantin | 2023-01-17
* Comment | Konstantin | 2023-01-16
* Comments | Konstantin | 2023-01-16
* DLL version 1.1 | Konstantin | 2023-01-16
* Bugfix: when processing files, the "Run" CPU block was erroneously measured twice | Konstantin | 2023-01-16
* Bugfix, C++ failure due to the missing move constructor in the CPU profiler RAII class | Konstantin | 2023-01-16
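The log does not show the profiler class. This hypothetical sketch (`ProfileSlot`, `CpuProfileScope` and `measure` are assumed names) illustrates why an RAII timer needs a move constructor: declaring a deleted copy constructor suppresses the implicit move constructor, so returning such an object by value fails to compile before C++17, and a naive copy would make two objects report the same block.

```cpp
#include <chrono>
#include <utility>

// Hypothetical profiler slot that measured blocks report into.
struct ProfileSlot
{
    double seconds = 0.0;  // accumulated wall-clock time
    int samples = 0;       // number of completed measurements
};

// Hypothetical RAII timer: reports elapsed time to its slot when destroyed.
// The move constructor nulls the source's pointer, so a moved-from object
// reports nothing and each measured block is counted exactly once.
class CpuProfileScope
{
public:
    explicit CpuProfileScope(ProfileSlot& target)
        : slot(&target), start(std::chrono::steady_clock::now()) {}

    CpuProfileScope(CpuProfileScope&& other) noexcept
        : slot(std::exchange(other.slot, nullptr)), start(other.start) {}

    CpuProfileScope(const CpuProfileScope&) = delete;
    CpuProfileScope& operator=(const CpuProfileScope&) = delete;
    CpuProfileScope& operator=(CpuProfileScope&&) = delete;

    ~CpuProfileScope()
    {
        if (slot == nullptr)
            return;  // moved-from: the new owner reports instead
        const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
        slot->seconds += elapsed.count();
        slot->samples++;
    }

private:
    ProfileSlot* slot;
    std::chrono::steady_clock::time_point start;
};

// Before C++17, returning by value like this requires an accessible move (or copy)
// constructor even when the compiler elides the copy.
CpuProfileScope measure(ProfileSlot& slot) { return CpuProfileScope(slot); }
```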
* Comment | Konstantin | 2023-01-16