path: root/Scripts/osc_ctrl.py
author: yum <yum.food.vr@gmail.com> 2024-02-05 17:40:37 -0800
committer: yum <yum.food.vr@gmail.com> 2024-02-05 17:40:37 -0800
commit: e58c718cb115c44ef3a546bea245e05e50d24c55
tree: 228f5cfddfd974a0567dde29bc199fde0f169f22 /Scripts/osc_ctrl.py
parent: acccf8ebcff0f7cc2b26e45e497f8b12ab73d8e1
Add another threshold to filter out common hallucinations
The paper recommends filtering out segments with no_speech_prob > 0.6 and avg_logprob < -1. That bound is too loose for short-form audio, which is not guaranteed to contain speech. I already have a tighter bound:

    no_speech > 0.6 and avg_logprob < -0.5

While listening to instrumental music I find that a lot of hallucinations sneak past that bound. So I added a second bound:

    no_speech > 0.15 and avg_logprob < -0.7

Basically, we filter out segments that look like speech (lower no_speech) but have a worse avg_logprob. Seems to not have false negatives. Requires testing.

Also: dial back the default max segment length from 15 seconds to 10 seconds. This is based on performance observations on desktop.
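A minimal sketch of how the two bounds in this message could be combined, assuming each segment exposes `no_speech_prob` and `avg_logprob` values. The names `Segment`, `HALLUCINATION_BOUNDS`, `is_probable_hallucination`, and `filter_segments` are hypothetical and are not taken from osc_ctrl.py; the actual change may be structured differently.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass
class Segment:
    """Hypothetical stand-in for one transcribed segment."""
    text: str
    no_speech_prob: float
    avg_logprob: float


# (no_speech_prob lower bound, avg_logprob upper bound) pairs.
# Values come from the commit message; the structure is an assumption.
HALLUCINATION_BOUNDS: Sequence[Tuple[float, float]] = (
    (0.6, -0.5),   # existing, tighter-than-paper bound
    (0.15, -0.7),  # second bound added by this commit
)


def is_probable_hallucination(
    seg: Segment,
    bounds: Sequence[Tuple[float, float]] = HALLUCINATION_BOUNDS,
) -> bool:
    """Return True if the segment trips any (no_speech, avg_logprob) bound."""
    return any(
        seg.no_speech_prob > min_no_speech and seg.avg_logprob < max_logprob
        for min_no_speech, max_logprob in bounds
    )


def filter_segments(segments: Iterable[Segment]) -> List[Segment]:
    """Keep only segments that pass every bound."""
    return [s for s in segments if not is_probable_hallucination(s)]
```

A segment is dropped as soon as either bound matches, so the second bound only widens what gets filtered. It never readmits a segment that the first bound rejects.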