| author | yum <yum.food.vr@gmail.com> | 2024-02-05 17:40:37 -0800 |
|---|---|---|
| committer | yum <yum.food.vr@gmail.com> | 2024-02-05 17:40:37 -0800 |
| commit | e58c718cb115c44ef3a546bea245e05e50d24c55 | |
| tree | 228f5cfddfd974a0567dde29bc199fde0f169f22 /GUI | |
| parent | acccf8ebcff0f7cc2b26e45e497f8b12ab73d8e1 | |
Add another threshold to filter out common hallucinations
The paper recommends filtering out segments with no_speech_prob > 0.6
and avg_logprob < -1. This bound is too loose for short-form audio,
which is not guaranteed to contain speech.
I already have a tighter bound:
no_speech > 0.6 and avg_logprob < -0.5
While listening to instrumental music, I find that many
hallucinations sneak past that bound, so I added a second bound:
no_speech > 0.15 and avg_logprob < -0.7
In effect, this also filters out segments that look more like speech
(lower no_speech) but have a worse avg_logprob. So far it does not seem
to cause false negatives, but it still needs testing.
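The two bounds above can be combined into a single drop predicate. This is a minimal C++ sketch, not the code this commit changes (the diff below only touches Config.cpp). It assumes each segment exposes Whisper-style `no_speech_prob` and `avg_logprob` scores; the `Segment` struct and `IsLikelyHallucination` function are hypothetical names. The paper's bound is left out because any segment it catches (no_speech_prob > 0.6, avg_logprob < -1) is already caught by the tighter existing bound.

```cpp
// Hypothetical per-segment scores, named after the fields Whisper reports.
struct Segment {
    double no_speech_prob;  // model's estimate that the segment has no speech
    double avg_logprob;     // average log-probability of the decoded tokens
};

// Hypothetical filter: returns true if a segment should be dropped as a
// likely hallucination. A segment is dropped if it matches either bound;
// each bound requires both of its conditions to hold.
constexpr bool IsLikelyHallucination(const Segment& s) {
    return (s.no_speech_prob > 0.6 && s.avg_logprob < -0.5) ||  // existing bound
           (s.no_speech_prob > 0.15 && s.avg_logprob < -0.7);   // new bound
}

// Compile-time usage examples.
static_assert(IsLikelyHallucination(Segment{0.70, -0.60}), "existing bound");
static_assert(IsLikelyHallucination(Segment{0.20, -0.80}), "new bound only");
static_assert(!IsLikelyHallucination(Segment{0.10, -0.80}), "below 0.15: kept");
static_assert(!IsLikelyHallucination(Segment{0.50, -0.60}), "matches neither");
```

The `static_assert` lines double as usage examples: a segment scored (0.20, -0.80) is caught only by the new bound, while one scored (0.10, -0.80) falls under the new 0.15 no_speech threshold and is kept. Neither bound subsumes the other, so both are checked.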
Also: dial back the default max segment length from 15 seconds to 10
seconds, based on performance observations on desktop.
Diffstat (limited to 'GUI')
| -rw-r--r-- | GUI/GUI/GUI/Config.cpp | 4 |
1 files changed, 2 insertions, 2 deletions
diff --git a/GUI/GUI/GUI/Config.cpp b/GUI/GUI/GUI/Config.cpp
index a92502d..52f6f30 100644
--- a/GUI/GUI/GUI/Config.cpp
+++ b/GUI/GUI/GUI/Config.cpp
@@ -87,8 +87,8 @@ AppConfig::AppConfig(wxTextCtrl* out)
enable_lock_at_spawn(true),
gpu_idx(0),
min_silence_duration_ms(250),
- max_speech_duration_s(15),
- reset_after_silence_s(10),
+ max_speech_duration_s(10),
+ reset_after_silence_s(15),
transcription_loop_delay_ms(100),
keybind("ctrl+x"),
