author yum <yum.food.vr@gmail.com> 2023-03-02 16:21:30 -0800
committer yum <yum.food.vr@gmail.com> 2023-03-02 17:07:28 -0800
commit b5ba8345cf5ceafbb24b73cf4bf7dd38510f6c22
tree 57ee481d2a38372b04bbba7b85c47fc0a889fb95
parent 64c158c549f6f5136846a0f546e8a204843e1ef8
Update README.txt
-rw-r--r-- README.md 20
1 files changed, 17 insertions, 3 deletions
diff --git a/README.md b/README.md
index ff4781e..5ec9a6e 100644
--- a/README.md
+++ b/README.md
@@ -96,9 +96,9 @@ reliable as possible.
There are existing tools which help here, but they are all imperfect for one
reason or another:
-1. RabidCrab's STT costs money and relies on cloud-based transcription. I have
- struggled with latency, quality, and reliability issues. It's also
- closed-source.
+1. RabidCrab's STT costs money and relies on cloud-based transcription.
+ Because of the reliance on cloud-based transcription services, it's
+ typically slower and less reliable than local transcription.
2. The in-game text box is not visible in streamer mode, and limits you to one
update every ~2 seconds, making it a poor choice for latency-sensitive
communication.
@@ -109,6 +109,18 @@ reason or another:
KillFrenzy's AvatarText and Whisper kiss. It's the closest spiritual cousin
to this repository. There are two crucial differences: it's GPL not MIT, and
it doesn't abstract away the command line.
+5. [VRCWizard's TTS-Voice-Wizard](https://github.com/VRCWizard/TTS-Voice-Wizard)
+ also uses Whisper, but they rely on the C# interface to Const-Me's
+ CUDA-enabled Whisper implementation. This implementation does not support
+ beam search decoding and waits for pauses to segment your voice. Thus it's
+ less accurate and higher latency than this project's Python-based
+ transcription engine, but it's more performant. It supports more feature
+ (like cloud-based TTS), so you might want to check it out.
+
+Why should you pick this project over the alternatives? This project has
+the lowest latency (measured <500ms end-to-end on mid-range hardware), most
+reliable transcriptions of any STT in VRChat, period. There is no network hop
+to worry about and no subscription to manage. Just download and go.
## Design overview
@@ -228,6 +240,8 @@ Ping the discord if you need help getting set up.
The other encoding scheme is thus ~2.5 times more efficient. This could
be used to significantly speed up sync times. (Thanks, Noppers for the
idea!)
+ 6. Use Const-Me/Whisper for transcription.
+ 7. Implement beam search in Const-Me/Whisper.
5. Bugfixes
1. ~~The whisper STT says "Thank you." when there's no audio?~~ DONE
2. JP and CN transcription does not work in the GUI due to encoding issues.