From 6cf2a048e79afb886739dd66ea4c94fe191780e7 Mon Sep 17 00:00:00 2001
From: yum
Date: Mon, 23 Jan 2023 15:17:20 -0800
Subject: Update README

* Point to a more up-to-date demo.
* Improve wordsmithing/flow
---
 README.md | 41 +++++++++++++++++++++++++++++++++--------
 1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index bc6f6b0..786bebe 100644
--- a/README.md
+++ b/README.md
@@ -1,10 +1,12 @@
 ## TaSTT: A deliciously free STT
 
 TaSTT (pronounced "tasty") is a free speech-to-text tool for VRChat. It uses
-local machine transcription to turn your voice into text, then sends it into
-VRChat via OSC.
+a GPU-based transcription algorithm to turn your voice into text, then sends it
+into VRChat via OSC.
 
-![Speech-to-text demo](Images/speech_to_text_demo.gif)
+To get started, download the latest .zip from [the releases page](https://github.com/yum-food/TaSTT/releases/latest).
+
+[![Speech-to-text demo](https://img.youtube.com/vi/u5h-ivkwS0M/0.jpg)](https://youtube.com/watch?v=u5h-ivkwS0M)
 
 Contents:
 
@@ -35,12 +37,14 @@ Basic controls:
 * Customizable board resolution, [up to ridiculous
   sizes](https://www.youtube.com/watch?v=u5h-ivkwS0M).
 * 8-bit and 16-bit character encodings.
-* Japanese, Korean, and Chinese glyphs included.
-* Multiple language support.
+* Multi-language support.
+  * Japanese, Korean, and Chinese glyphs included.
 * Resizable.
-* Audio feedback: hear distinct beeps when transcription starts and stops.
-  * May also enable in-game noise indicator, to grab others' attention.
-* Visual transcription indicator.
+* Audio feedback: hear distinct beeps when transcription starts and stops
+  (optional).
+  * May also enable in-game noise indicator, to grab others' attention
+    (optional).
+* Visual transcription indicator (optional).
 * Locks to world space when done speaking.
 * Can use built-in chatbox (usable with public avatars!)
 * Privacy-respecting: transcription is done on your GPU, not in the cloud.
@@ -50,6 +54,27 @@ Basic controls:
 * Free as in freedom.
 * MIT license.
 
+## Requirements
+
+* ~5GB disk space
+  * I apologize that this is so big. The libraries used to perform
+    GPU-accelerated transcription (pytorch and whisper) are really,
+    really big. There is no performant implementation of Whisper or a
+    any other comparable algorithm available in a systems programming
+    language, so for now we're stuck with this. You only need to
+    download this stuff once!
+* NVIDIA GPU with at least 2GB of spare VRAM.
+  * You *can* run it in CPU mode, but it's really slow and lags you a
+    lot more, so I wouldn't recommend it.
+  * I've tested on a 1080 Ti and a 3090 and saw comparable performance.
+* SteamVR.
+  * No Oculus support, yet.
+* Left joystick click must not be bound to anything else.
+* No write defaults on your avatar if you're using the custom text box.
+
+For the last 3 bullets: please let me know in the Discord if these are
+deal breakers. I'd be happy to fix them!
+
 ### Motivation
 
 Many VRChat players choose not to use their mics, but as a practical matter,
-- 
cgit v1.2.3