From e58c718cb115c44ef3a546bea245e05e50d24c55 Mon Sep 17 00:00:00 2001
From: yum
Date: Mon, 5 Feb 2024 17:40:37 -0800
Subject: Add another threshold to filter out common hallucinations

The paper recommends filtering out segments with no_speech_prob > 0.6 and
avg_logprob < -1. This is too loose a bound for short-form audio, which is
not guaranteed to contain speech.

I already have a tighter bound:

    no_speech > 0.6 and avg_logprob < -0.5

While listening to instrumental music, I find that a lot of hallucinations
sneak past that bound. So I added a second bound:

    no_speech > 0.15 and avg_logprob < -0.7

Basically, we filter out segments that look like speech but have a worse
avg_logprob. It does not seem to produce false negatives, but this still
requires testing.

Also: dial back the default max segment length from 15 seconds to 10
seconds. This is based on performance observations on desktop.
---
 GUI/GUI/GUI/Config.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'GUI')

diff --git a/GUI/GUI/GUI/Config.cpp b/GUI/GUI/GUI/Config.cpp
index a92502d..52f6f30 100644
--- a/GUI/GUI/GUI/Config.cpp
+++ b/GUI/GUI/GUI/Config.cpp
@@ -87,8 +87,8 @@ AppConfig::AppConfig(wxTextCtrl* out)
     enable_lock_at_spawn(true),
     gpu_idx(0),
     min_silence_duration_ms(250),
-    max_speech_duration_s(15),
-    reset_after_silence_s(10),
+    max_speech_duration_s(10),
+    reset_after_silence_s(15),
     transcription_loop_delay_ms(100),
     keybind("ctrl+x"),
-- 
cgit v1.2.3