Commit message (Author, Age)
* Fix audio normalization [HEAD, master] (yum, 2023-04-04)
|   Normalization was putting audio onto range [0, 255], while it should have been on range [0, 1].
|   * Add AudioBuffer::save() to enable debugging audio issues.
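To illustrate the range fix described in this commit, here is a minimal C++ sketch of peak normalization. It is not the repository's code: `normalizePeak` is a hypothetical helper, and it assumes floating-point PCM samples that are scaled so the largest magnitude becomes 1.0.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical helper (not from the repository): scale the samples in place
// so that the largest absolute value becomes 1.0.
inline void normalizePeak(std::vector<float>& pcm)
{
    float peak = 0.0f;
    for (float s : pcm)
        peak = std::max(peak, std::fabs(s));

    // Empty or fully silent buffers are left untouched to avoid dividing by zero.
    if (peak <= 0.0f)
        return;

    const float scale = 1.0f / peak;
    for (float& s : pcm)
        s *= scale;
}
```

A quiet recording whose loudest sample is 0.25 would be multiplied by 4, so soft speech reaches the transcription layer at full scale.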
* begin work disabling vad (yum, 2023-03-17)
|
* Fix beam search previous window conditioning (yum, 2023-03-07)
|   Not all contexts had `prev_prompt`, causing most beams to misbehave.
* Use logprobs, fix beam candidate selection (yum, 2023-03-03)
|   Incorrect sort condition resulted in worst 5 beams being picked instead of best 5.
|   Use log probabilities for joint probability calculation instead of linear probabilities. Long beams would have probabilities converge exponentially towards zero; now they converge linearly towards -INFINITY.
|   Using both transcripts in Evaluation/setup.ps1, I see a small edit distance regression (~5%) using beam search vs. greedy.
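The two changes in this commit can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: `BeamCandidate`, `extendLogProb`, and `selectBest` are hypothetical names. The joint score of a beam is the sum of per-token log probabilities, and candidates are ordered with a descending comparison so the best ones come first.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical candidate record: a token id and the joint log probability
// of the beam that ends with this token.
struct BeamCandidate
{
    int token;
    double logProb;
};

// Extend a beam's joint score by one token. Adding logs replaces multiplying
// linear probabilities, so long beams do not underflow to zero.
inline double extendLogProb(double beamLogProb, double tokenProb)
{
    return beamLogProb + std::log(tokenProb);
}

// Keep the n best candidates, highest log probability first.
// An ascending comparison here would select the worst n instead.
inline std::vector<BeamCandidate> selectBest(std::vector<BeamCandidate> candidates, std::size_t n)
{
    n = std::min(n, candidates.size());
    const auto mid = candidates.begin() + static_cast<std::ptrdiff_t>(n);
    std::partial_sort(candidates.begin(), mid, candidates.end(),
        [](const BeamCandidate& a, const BeamCandidate& b) { return a.logProb > b.logProb; });
    candidates.erase(mid, candidates.end());
    return candidates;
}
```

For scale: a product of 400 token probabilities of 0.1 underflows to zero in double precision, while the corresponding sum of logs is roughly -921 and remains comparable across beams.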
* Begin work on evaluation framework (yum, 2023-03-03)
|   Need a way to verify that beam search is actually working better than greedy.
* Finish beam search rough draft (yum, 2023-03-02)
|   Seems to work. Doesn't crash. Lots of room for optimization and cleanup.
* Continue work on beam search (yum, 2023-03-02)
|   Define ContextImpl::Context, wrapping all the data used in decoding. Using a vector of these is much simpler than using N vectors of all the random stuff we need.
* Begin work on beam search decoding (yum, 2023-02-27)
|   * ContextImpl.h puts prompts, previous prompts, probabilities, and probability IDs into vectors of size 1 or N_BEAMS, depending on the decoding strategy.
|   * Extend sampleBest and friends to return top N tokens, instead of just the top 1 token.
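Returning the top N tokens rather than a single best token can be sketched as below. The function `topTokens` is hypothetical and is not the repository's `sampleBest`; it assumes one score per vocabulary entry and returns token indices ordered from highest to lowest score.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: return the indices of the n highest scores,
// best first. With n == 1 this reduces to ordinary greedy selection.
inline std::vector<std::size_t> topTokens(const std::vector<float>& scores, std::size_t n)
{
    // Start with the identity mapping 0, 1, 2, ... over the vocabulary.
    std::vector<std::size_t> ids(scores.size());
    std::iota(ids.begin(), ids.end(), std::size_t{0});

    n = std::min(n, ids.size());
    const auto mid = ids.begin() + static_cast<std::ptrdiff_t>(n);

    // Only the first n positions need to be fully ordered.
    std::partial_sort(ids.begin(), mid, ids.end(),
        [&scores](std::size_t a, std::size_t b) { return scores[a] > scores[b]; });

    ids.erase(mid, ids.end());
    return ids;
}
```

Sorting indices instead of the scores themselves keeps the token ids available to the caller, which a beam search needs in order to extend each beam.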
* Add retainDuration option to CaptureParams (yum, 2023-02-26)
|   This allows users to retain a suffix of the PCM buffer after a VAD segmentation event, reducing some instances of words being lost at the start of the next VAD window.
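Retaining a suffix of the PCM buffer might look like the sketch below. Everything here is an assumption for illustration: `retainSuffix` is a hypothetical helper, the duration is taken in seconds, and the buffer is a mono `std::vector<float>`. The actual units and types of `retainDuration` in `CaptureParams` are not stated in the commit message.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: after a segmentation event, discard everything except
// the last `retainSeconds` of audio so the next window starts with context.
inline void retainSuffix(std::vector<float>& pcm, double retainSeconds, int sampleRate)
{
    // Convert the duration to a sample count; negative durations keep nothing.
    const auto keep = static_cast<std::size_t>(std::max(0.0, retainSeconds) * sampleRate);

    // The buffer is already short enough: keep all of it.
    if (keep >= pcm.size())
        return;

    // Erase the prefix, leaving only the trailing `keep` samples.
    pcm.erase(pcm.begin(), pcm.end() - static_cast<std::ptrdiff_t>(keep));
}
```

With a retain duration of zero the buffer is cleared entirely, which corresponds to the behavior before this option existed.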
* Normalize audio before sending to transcription layer (yum, 2023-02-26)
|   Helps in cases where the speaker is speaking softly, or their mic gain is set low.
* Frames with no VAD are shortened, not dropped (yum, 2023-02-26)
|   On PCM buffers of length >= captureParams.dropStartSilence, a "no voice" VAD verdict would result in the PCM buffer being entirely cleared. The emergent behavior is that when VAD segments speech, words right after the segmentation window can frequently be dropped.
|   By removing a prefix from the PCM buffer and clearing the VAD buffers, the transcription algorithm has access to "leading" frames before the frames which triggered VAD. This reduces cases where words are omitted in the middle of long statements.
* Readme (Konstantin, 2023-02-18)
|
* When token timestamps are requested, disabled streaming in C++ CLI example (Konstantin, 2023-02-14)
|
* Restored missing token-level timestamps experimental feature (Konstantin, 2023-02-14)
|
* Build instructions (Konstantin, 2023-02-14)
|
* Readme (Konstantin, 2023-02-08)
|
* API documentation (Konstantin, 2023-02-08)
|
* Release automation (Konstantin, 2023-02-07)
|
* Version 1.7 (Konstantin, 2023-02-07)
|
* Cleaned up unused dialog template (Konstantin, 2023-02-07)
|
* Comments (Konstantin, 2023-02-06)
|
* Comments (Konstantin, 2023-02-03)
|
* Minor, code generation (Konstantin, 2023-02-03)
|
* Comments (Konstantin, 2023-02-03)
|
* Refactor, removed a redundant function (Konstantin, 2023-02-03)
|
* Bugfix, incorrect output of command-line examples when launched with multiple input files (Konstantin, 2023-02-03)
|
* C++ console example, static link to C++ runtime (Konstantin, 2023-02-03)
|
* C++ console example now outputs text files when asked (Konstantin, 2023-02-03)
|
* Minor, C++ example (Konstantin, 2023-02-03)
|
* UX bugfix, microphone C# example (Konstantin, 2023-02-03)
|
* Bugfix, addRepeatEx compute shader (Konstantin, 2023-02-03)
|
* Minor, .NET wrapper (Konstantin, 2023-02-03)
|
* Minor, API documentation (Konstantin, 2023-02-03)
|
* Comments (Konstantin, 2023-02-03)
|
* Desktop app version 1.6.1 (Konstantin, 2023-01-30)
|
* “Text with timestamps” output format option (Konstantin, 2023-01-30)
|
* Added `*.m4a` file extension to the browse dialog (Konstantin, 2023-01-30)
|
* Better performance of C++ samples on laptops with two graphics cards (Konstantin, 2023-01-30)
|   Untested
* Readme (Konstantin, 2023-01-29)
|
* Comments (Konstantin, 2023-01-29)
|
* Version 1.6 (Konstantin, 2023-01-29)
|
* C# microphone example, diarize integration (Konstantin, 2023-01-29)
|
* C# console example, diarize feature (Konstantin, 2023-01-29)
|
* Minor, C++ console example (Konstantin, 2023-01-29)
|
* Diarize feature for buffered audio (Konstantin, 2023-01-28)
|
* Minor, micro-optimization (Konstantin, 2023-01-28)
|
* Diarize feature, initial version (Konstantin, 2023-01-28)
|
* Bugfix, stereo PCM handling (Konstantin, 2023-01-28)
|
* DLL API for diarize feature (Konstantin, 2023-01-28)
|
* Bugfix, compilation error in C++ console example (Konstantin, 2023-01-28)
|