| author | yum <yum.food.vr@gmail.com> | 2023-09-03 13:23:50 -0700 |
|---|---|---|
| committer | yum <yum.food.vr@gmail.com> | 2023-09-03 13:23:50 -0700 |
| commit | 606d223f8ba9174a2984d7cb15e6e94ef6e48228 | |
| tree | afd1b19fe801d9aac54b4e5bbe4a671e5df2217c /Scripts/osc_ctrl.py | |
| parent | e9b5b4f1da2a8ff07b2d13e5e63dae491325251d | |
Experiment with Collector filters
Try adding two filters on top of the usual AudioCollector:
* Minimum length preservation: never report fewer than N seconds' worth
of audio data. Pad with silence as needed.
* Volume normalizing: normalize audio volume.
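
The two filters above could be sketched roughly as follows. This is only an illustration, not the code from this commit: the function names, the 16 kHz sample rate, trailing (rather than leading) silence, and peak-based (rather than RMS-based) normalization are all assumptions.

```python
from typing import List, Sequence


def pad_to_min_length(samples: Sequence[float], min_seconds: float,
                      sample_rate: int = 16000) -> List[float]:
    """Return at least min_seconds of audio, padding with silence.

    Assumption: silence is appended at the end of the buffer.
    """
    min_samples = int(min_seconds * sample_rate)
    missing = max(0, min_samples - len(samples))
    return list(samples) + [0.0] * missing


def normalize_peak(samples: Sequence[float],
                   target_peak: float = 0.9) -> List[float]:
    """Scale the buffer so its loudest sample has magnitude target_peak.

    Assumption: peak normalization; an all-silent buffer is left unchanged.
    """
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]
```

Applying both would then look like `normalize_peak(pad_to_min_length(buf, n_seconds))`, where `buf` and `n_seconds` are placeholders for the collected audio and the minimum length N.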
Using my benchmark of 30-second audio clips from 3 speakers (lower is
better):
    length enf + norm =  87.118
    nothing           =  90.917
    norm              =  94.538
    length            = 111.402
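Together, the two filters are a slight improvement over no filtering
(87.118 vs. 90.917), but each one on its own degrades the result
(94.538 for normalization, 111.402 for minimum length). I also observed
more hallucinations in a conversational pattern with the filters enabled
than without them. So I'll phase them out.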
I'm still curious about *compression* as opposed to normalization.
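
For contrast with normalization, which applies one gain to the whole buffer, a compressor attenuates only the portion of each sample above a threshold. The sketch below is a hypothetical, simplified static compressor (no attack/release smoothing); it is not implemented in this commit, and the `threshold` and `ratio` defaults are arbitrary assumptions.

```python
from typing import List, Sequence


def compress(samples: Sequence[float], threshold: float = 0.5,
             ratio: float = 4.0) -> List[float]:
    """Reduce the amount by which each sample exceeds threshold by ratio.

    Samples at or below the threshold pass through unchanged, so quiet
    audio is untouched while loud peaks are pulled down.
    """
    out = []
    for s in samples:
        magnitude = abs(s)
        if magnitude > threshold:
            magnitude = threshold + (magnitude - threshold) / ratio
        out.append(magnitude if s >= 0 else -magnitude)
    return out
```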
Diffstat (limited to 'Scripts/osc_ctrl.py')
0 files changed, 0 insertions, 0 deletions
