Experiment with Collector filters - TaSTT.git - Free self-hosted STT for VRChat.

author	yum <yum.food.vr@gmail.com>	2023-09-03 13:23:50 -0700
committer	yum <yum.food.vr@gmail.com>	2023-09-03 13:23:50 -0700
commit	606d223f8ba9174a2984d7cb15e6e94ef6e48228 (patch)
tree	afd1b19fe801d9aac54b4e5bbe4a671e5df2217c /GUI/Libraries
parent	e9b5b4f1da2a8ff07b2d13e5e63dae491325251d (diff)

Experiment with Collector filters

Try adding two filters on top of the usual AudioCollector:
* Minimum length preservation: never report fewer than N seconds worth of audio data. Pad with silence as needed.
* Volume normalizing: normalize audio volume.

Using my benchmark of 30-second audio clips from 3 speakers (lower is better):
  length enf + norm = 87.118
  nothing           = 90.917
  norm              = 94.538
  length            = 111.402

Both together are a slight improvement, but independently degrade the result by a lot. I also observed more hallucinations in a conversational pattern when using them vs. not. So I'll phase them out.

I'm still curious about *compression* as opposed to normalization.
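
For readers without the diff at hand, here is a rough sketch of what the two filters could look like. It is not the code from this commit: the function names, the 16 kHz sample rate, the choice to pad at the end of the buffer, and the use of peak (rather than RMS) normalization are all assumptions made for illustration.

```python
from typing import List

# Assumed sample rate; the commit message does not state one.
SAMPLE_RATE_HZ = 16000


def pad_to_min_length(samples: List[float], min_seconds: float,
                      sample_rate: int = SAMPLE_RATE_HZ) -> List[float]:
    """Return a copy of samples holding at least min_seconds of audio.

    Shorter input is padded with silence (zeros). Padding at the end is an
    assumption; the commit only says "pad with silence as needed".
    """
    min_samples = int(min_seconds * sample_rate)
    missing = max(0, min_samples - len(samples))
    return list(samples) + [0.0] * missing


def normalize_peak(samples: List[float],
                   target_peak: float = 0.9) -> List[float]:
    """Scale samples so the loudest one has magnitude target_peak.

    Peak normalization is one possible reading of "normalize audio volume".
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def apply_filters(samples: List[float], min_seconds: float,
                  sample_rate: int = SAMPLE_RATE_HZ) -> List[float]:
    """Apply both filters, corresponding to "length enf + norm" above."""
    return pad_to_min_length(normalize_peak(samples), min_seconds, sample_rate)
```

Note that normalization applies one gain to the whole buffer, so the ratio between loud and quiet passages is unchanged. Compression would instead reduce that ratio, which is the difference the last sentence of the message points at.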

Diffstat (limited to 'GUI/Libraries')

0 files changed, 0 insertions, 0 deletions

