author: yum <yum.food.vr@gmail.com>, 2022-11-14 21:30:50 -0800
committer: yum <yum.food.vr@gmail.com>, 2022-11-14 21:36:13 -0800
commit: 2505a5cc486cd913db50a475e45c3701b9710282
tree: 86855b5772cc6400205926ed8d935227a574a7e6 /README.md
parent: 9921697816c9f9473bac54444793f702e54d24a6
Another transcription rework
After re-reading the paper, I noticed that they apply a couple of optimizations I wasn't using. Switch to the top-level `whisper.transcribe` method, which is a little slower but more accurate than the one I was using. Despite being slower, it has better temporal stability thanks to the higher quality, which I think should make for a more responsive UX overall: lower transcription quality means the paging layer has to waste time updating earlier cells.

Also, drop the auto-commit stuff and go back to string stitching. I think it's better to let the user commit manually. A rework of the hand controls is probably coming soon.

Finally, update README.
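The stitching code itself is not part of this README-only diff, so the following is only a rough sketch of what "string stitching" can look like, not TaSTT's actual implementation. It assumes that successive transcripts cover overlapping stretches of audio and that they can be merged at their longest shared run of words; the function name `stitch` and the `min_overlap` parameter are hypothetical.

```python
def stitch(previous: str, current: str, min_overlap: int = 1) -> str:
    """Merge two transcripts at their longest word-level overlap.

    Hypothetical helper: looks for the longest suffix of `previous` that is
    also a prefix of `current`, and joins the two strings there.
    """
    prev_words = previous.split()
    cur_words = current.split()

    # An overlap can be no longer than the shorter of the two transcripts.
    max_len = min(len(prev_words), len(cur_words))
    # Never test an overlap of zero words: a slice like [-0:] is the whole list.
    lowest = max(min_overlap, 1)

    # Try the longest candidate overlap first, then shrink.
    for size in range(max_len, lowest - 1, -1):
        if prev_words[-size:] == cur_words[:size]:
            return " ".join(prev_words + cur_words[size:])

    # No shared words at the boundary: fall back to plain concatenation.
    return " ".join(prev_words + cur_words)


if __name__ == "__main__":
    print(stitch("hello world this is", "this is a test"))
```

Exact word matching is the simplest possible choice here. A real pipeline would likely need to tolerate small differences between successive transcriptions (casing, punctuation, a revised word), which this sketch does not attempt.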
Diffstat (limited to 'README.md')
-rw-r--r-- README.md | 3
1 file changed, 3 insertions, 0 deletions
diff --git a/README.md b/README.md
index a5e3ff8..885e424 100644
--- a/README.md
+++ b/README.md
@@ -157,6 +157,9 @@ To use the STT:
3. ~~Speech-to-text interface. Speak out loud, show in game.~~ DONE
4. Translation into non-English. Whisper natively supports translating N
languages into English, but not the other way around.
+ 5. Display text in overlay. Enables (1) lower latency view of TaSTT's
+ transcription state; (2) checking transcriptions ahead of time; (3)
+ checking transcriptions without having to see the board in game.
4. Optimization
1. ~~Utilize the avatar 3.0 SDK's ability to drive parameters to reduce the
total # of parameters (and therefore OSC messages & sync events). Note