From e58c718cb115c44ef3a546bea245e05e50d24c55 Mon Sep 17 00:00:00 2001
From: yum <yum.food.vr@gmail.com>
Date: Mon, 5 Feb 2024 17:40:37 -0800
Subject: Add another threshold to filter out common hallucinations

The paper recommends filtering out segments with no_speech_prob > 0.6
and avg_logprob < -1. That bound is too loose for short-form audio,
which is not guaranteed to contain speech.

I already have a tighter bound:

  no_speech > 0.6 and avg_logprob < -0.5

While listening to instrumental music I find that a lot of
hallucinations sneak past that bound. So I added a second bound:

  no_speech > 0.15 and avg_logprob < -0.7

In other words, we also filter out segments that look more like speech
(lower no_speech_prob) but have a worse avg_logprob. So far this does
not seem to produce false negatives, but it still requires testing.
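
For illustration, the two bounds combine roughly as in the sketch below.
SegmentStats and is_likely_hallucination are hypothetical names chosen
for this example and are not the project's actual code. Only the
threshold values come from this message.

```cpp
// Hypothetical per-segment statistics; the field names mirror the
// quantities discussed in this commit message.
struct SegmentStats {
    double no_speech_prob;  // model's probability that the segment has no speech
    double avg_logprob;     // average token log-probability of the segment
};

// Returns true if the segment should be dropped as a likely hallucination.
// A segment is dropped when it trips either bound.
bool is_likely_hallucination(const SegmentStats& s) {
    // Existing bound: probably not speech, mediocre confidence.
    const bool first_bound = s.no_speech_prob > 0.6 && s.avg_logprob < -0.5;
    // New bound: looks more like speech, but confidence is worse.
    const bool second_bound = s.no_speech_prob > 0.15 && s.avg_logprob < -0.7;
    return first_bound || second_bound;
}
```

Under these assumptions, a segment with no_speech_prob = 0.2 and
avg_logprob = -0.8 passes the first bound but is caught by the second.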

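Also: dial back the default max segment length (max_speech_duration_s)
from 15 seconds to 10 seconds, based on performance observations on
desktop. This commit also raises the default reset_after_silence_s from
10 seconds to 15 seconds.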
---
 GUI/GUI/GUI/Config.cpp | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'GUI')

diff --git a/GUI/GUI/GUI/Config.cpp b/GUI/GUI/GUI/Config.cpp
index a92502d..52f6f30 100644
--- a/GUI/GUI/GUI/Config.cpp
+++ b/GUI/GUI/GUI/Config.cpp
@@ -87,8 +87,8 @@ AppConfig::AppConfig(wxTextCtrl* out)
 	enable_lock_at_spawn(true),
 	gpu_idx(0),
 	min_silence_duration_ms(250),
-	max_speech_duration_s(15),
-	reset_after_silence_s(10),
+	max_speech_duration_s(10),
+	reset_after_silence_s(15),
 	transcription_loop_delay_ms(100),
 	keybind("ctrl+x"),
 
-- 
cgit v1.2.3