TaSTT.git/GUI, branch v0.19.1

Bump CUDNN to v8.9.7

2024-06-09T23:43:34+00:00

Also disable flash-attention when CPU mode is selected

Add checkbox for flash-attention

2024-06-09T22:54:30+00:00

Pre-3000 series GPUs don't support it. Oops!
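
The gating this checkbox implies can be sketched roughly as below. This is only an illustration, not the code in this repo: the function name, its parameters, and the (8, 0) compute-capability cutoff are assumptions. The cutoff reflects that RTX 3000-series cards are Ampere (compute capability 8.x), while older consumer cards report 7.5 or lower.

```python
from typing import Optional, Tuple

# Assumption: flash-attention needs an Ampere-or-newer GPU, i.e. CUDA
# compute capability >= 8.0. RTX 3000-series cards report 8.6.
MIN_FLASH_ATTENTION_CAPABILITY = (8, 0)


def flash_attention_enabled(
    checkbox_checked: bool,
    device: str,
    compute_capability: Optional[Tuple[int, int]] = None,
) -> bool:
    """Decide whether flash-attention should be turned on (hypothetical helper).

    checkbox_checked:   state of the GUI checkbox.
    device:             "cuda" or "cpu".
    compute_capability: (major, minor) of the GPU, or None if unknown.
    """
    if not checkbox_checked:
        return False
    # Flash-attention is a GPU kernel, so it is meaningless in CPU mode.
    if device != "cuda":
        return False
    # Without a known capability, stay on the safe side and leave it off.
    if compute_capability is None:
        return False
    return tuple(compute_capability) >= MIN_FLASH_ATTENTION_CAPABILITY
```

With this sketch, a checked box on an RTX 2080 (capability 7.5) or in CPU mode still resolves to off, so the checkbox acts as an opt-in rather than a hard switch.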

Update defaults to work with modular prefab

2024-06-06T07:45:27+00:00

There's a modular avatar prefab for the custom chatbox on my gumroad. Update the default settings to work with that prefab.

Upgrade faster-whisper with flash-attention2

2024-06-06T01:15:47+00:00

This should be significantly more efficient than prior versions.
* Add large-v3 & distilled variant.
* Simplify model acquisition code now that distilled models are part of faster-whisper.

Fix distilled models

2024-03-15T01:03:54+00:00

These were broken due to some logic errors in the codepath which acquires models from huggingface. Distilled large-v2 seems promising as a new default model.

Finish fixing build break

2024-03-04T23:50:16+00:00

CUDNN is now pulled from Dropbox instead of Google Drive. This has the added benefit of being about 10-20x faster (assuming you have fast internet).

Begin fixing build on new hosts

2024-03-04T23:36:48+00:00

Google Drive intentionally broke CLI downloads ("don't be evil") and UwwwuPP went away. Begin work rehosting both files.

Add dropdown for GPU compute type

2024-02-10T01:21:46+00:00

Should enable compatibility with older GPUs.

Add another threshold to filter out common hallucinations

2024-02-06T01:40:37+00:00

The paper recommends filtering out segments with no_speech_prob > 0.6 and avg_logprob < -1. This is too loose a bound for short-form audio which is not guaranteed to contain speech.
I already have a tighter bound: no_speech > 0.6 and avg_logprob < -0.5. While listening to instrumental music I find that a lot of hallucinations sneak past that bound.
So I added a second bound: no_speech > 0.15 and avg_logprob < -0.7. Basically we filter out things that look like speech but have a worse avg_logprob. Seems to not have false negatives. Requires testing.
Also: dial back the default max segment length from 15 seconds to 10 seconds. This is done based on performance observations on desktop.
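
A minimal sketch of the two-bound filter described above, assuming each segment carries `no_speech_prob` and `avg_logprob` values. The `Segment` class and the function names here are hypothetical stand-ins for illustration, not the code in this repo; only the threshold values come from the commit message.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass
class Segment:
    """Hypothetical stand-in for one transcribed segment."""
    text: str
    no_speech_prob: float
    avg_logprob: float


# Each pair is (no_speech_prob lower bound, avg_logprob upper bound).
# A segment is dropped if it exceeds BOTH limits of ANY pair.
HALLUCINATION_BOUNDS: Sequence[Tuple[float, float]] = (
    (0.6, -0.5),   # the existing, tighter-than-paper bound
    (0.15, -0.7),  # the second bound added in this commit
)


def is_probable_hallucination(
    seg: Segment,
    bounds: Sequence[Tuple[float, float]] = HALLUCINATION_BOUNDS,
) -> bool:
    """True if the segment falls inside any of the rejection regions."""
    return any(
        seg.no_speech_prob > min_no_speech and seg.avg_logprob < max_logprob
        for min_no_speech, max_logprob in bounds
    )


def filter_segments(segments: Iterable[Segment]) -> List[Segment]:
    """Keep only the segments that pass every bound."""
    return [s for s in segments if not is_probable_hallucination(s)]
```

For example, a segment with no_speech_prob 0.2 and avg_logprob -0.8 passes the first bound but is rejected by the second, while one with no_speech_prob 0.2 and avg_logprob -0.6 is kept.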

Verify that audio is clean after VAD segmentation

2024-02-06T01:02:23+00:00

Indeed it is. Bumped up the default max segment length to decrease error. Also add mic presets for Beyond (the VR headset) and MOTU (my mic interface).