From b5ba8345cf5ceafbb24b73cf4bf7dd38510f6c22 Mon Sep 17 00:00:00 2001
From: yum
Date: Thu, 2 Mar 2023 16:21:30 -0800
Subject: Update README.txt

---
 README.md | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

(limited to 'README.md')

diff --git a/README.md b/README.md
index ff4781e..5ec9a6e 100644
--- a/README.md
+++ b/README.md
@@ -96,9 +96,9 @@ reliable as possible.
 There are existing tools which help here, but they are all imperfect for one
 reason or another:
 
-1. RabidCrab's STT costs money and relies on cloud-based transcription. I have
-   struggled with latency, quality, and reliability issues. It's also
-   closed-source.
+1. RabidCrab's STT costs money and relies on cloud-based transcription.
+   Because of the reliance on cloud-based transcription services, it's
+   typically slower and less reliable than local transcription.
 2. The in-game text box is not visible in streamer mode, and limits you to one
    update every ~2 seconds, making it a poor choice for latency-sensitive
    communication.
@@ -109,6 +109,18 @@ reason or another:
    KillFrenzy's AvatarText and Whisper kiss. It's the closest spiritual cousin
    to this repository. There are two crucial differences: it's GPL not MIT, and
    it doesn't abstract away the command line.
+5. [VRCWizard's TTS-Voice-Wizard](https://github.com/VRCWizard/TTS-Voice-Wizard)
+   also uses Whisper, but they rely on the C# interface to Const-Me's
+   CUDA-enabled Whisper implementation. This implementation does not support
+   beam search decoding and waits for pauses to segment your voice. Thus it's
+   less accurate and higher latency than this project's Python-based
+   transcription engine, but it's more performant. It supports more feature
+   (like cloud-based TTS), so you might want to check it out.
+
+Why should you pick this project over the alternatives? This project has
+the lowest latency (measured <500ms end-to-end on mid-range hardware), most
+reliable transcriptions of any STT in VRChat, period. There is no network hop
+to worry about and no subscription to manage. Just download and go.
 
 ## Design overview
 
@@ -228,6 +240,8 @@ Ping the discord if you need help getting set up.
       The other encoding scheme is thus ~2.5 times more efficient. This could
       be used to significantly speed up sync times. (Thanks, Noppers for the
       idea!)
+   6. Use Const-Me/Whisper for transcription.
+   7. Implement beam search in Const-Me/Whisper.
 5. Bugfixes
    1. ~~The whisper STT says "Thank you." when there's no audio?~~ DONE
    2. JP and CN transcription does not work in the GUI due to encoding issues.
-- 
cgit v1.2.3