From 2505a5cc486cd913db50a475e45c3701b9710282 Mon Sep 17 00:00:00 2001
From: yum
Date: Mon, 14 Nov 2022 21:30:50 -0800
Subject: Another transcription rework

After re-reading the paper, I noticed that they apply a couple
optimizations I wasn't using. Use the top-level `whisper.transcribe`
method, which is a little slower, but more accurate than the one I was
using.

Although this method is slower, it has better temporal stability due to
the increased quality, which I think should make for an overall more
responsive UX. Lower transcription quality means the paging layer has to
waste time updating earlier cells.

Also, drop the auto-commit stuff and go back to string stitching. I
think it's better to let the user manually commit. A rework of the hand
controls is probably coming soon.

Finally, update README.
---
 README.md | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'README.md')

diff --git a/README.md b/README.md
index a5e3ff8..885e424 100644
--- a/README.md
+++ b/README.md
@@ -157,6 +157,9 @@ To use the STT:
    3. ~~Speech-to-text interface. Speak out loud, show in game.~~ DONE
    4. Translation into non-English. Whisper natively supports translating N
       languages into English, but not the other way around.
+   5. Display text in overlay. Enables (1) lower latency view of TaSTT's
+      transcription state; (2) checking transcriptions ahead of time; (3)
+      checking transcriptions without having to see the board in game.
 4. Optimization
    1. ~~Utilize the avatar 3.0 SDK's ability to drive parameters to reduce the
       total # of parameters (and therefore OSC messages & sync events). Note
-- 
cgit v1.2.3