author: yum <yum.food.vr@gmail.com> 2022-11-06 21:22:54 -0800
committer: yum <yum.food.vr@gmail.com> 2022-11-06 21:22:54 -0800
commit: 629c0f611de1622131bb0fa364c170219f6252ed
tree: 605ef13186df9a21d9e6fc48ef02a2a391317c0e
parent: fe7e51db4c341f9510351e9b3942430f6d44edf2
Update README
-rw-r--r-- README.md 13
-rw-r--r-- osc_ctrl.py 3
2 files changed, 11 insertions, 5 deletions
diff --git a/README.md b/README.md
index f0cce3d..a5e3ff8 100644
--- a/README.md
+++ b/README.md
@@ -1,16 +1,23 @@
## TaSTT: A deliciously free STT
TaSTT (pronounced "tasty") is a free speech-to-text tool for VRChat. It uses
-local machine translation to turn your voice into text, then sends it into
+local machine transcription to turn your voice into text, then sends it into
VRChat via OSC. A few parameters, a machine-generated FX layer, and a
custom shader display the text in game.
![Speech-to-text demo](Images/speech_to_text_demo.gif)
Features:
+
* 4x44 grid, 256 or 65536 characters per slot.
* Text-to-text interface.
* Speech-to-text interface.
+* Multiple language support.
+ * Transcription within the same language works for many languages.
+ * Translation from N languages to English is supported.
+ * Translation from English into other languages is added case by case. This
+ is a limitation of the state of the art in machine translation: fine-tuned
+ English->other language models far outperform English->many language models.
* Free as in beer.
* Free as in freedom.
* Privacy-respecting: transcription is done on your GPU, not in the cloud.
@@ -35,7 +42,7 @@ reliable as possible.
There are existing tools which help here, but they are all imperfect for one
reason or another:
-1. RabidCrab's STT costs money and relies on cloud-based translation. I have
+1. RabidCrab's STT costs money and relies on cloud-based transcription. I have
struggled with latency, quality, and reliability issues. It's also
closed-source.
2. The in-game text box is only visible to your friends list, making it
@@ -148,6 +155,8 @@ To use the STT:
1. Error detection & correction.
2. ~~Text-to-text interface. Type in terminal, show in game.~~ DONE
3. ~~Speech-to-text interface. Speak out loud, show in game.~~ DONE
+ 4. Translation into non-English. Whisper natively supports translating N
+ languages into English, but not the other way around.
4. Optimization
1. ~~Utilize the avatar 3.0 SDK's ability to drive parameters to reduce the
total # of parameters (and therefore OSC messages & sync events). Note
diff --git a/osc_ctrl.py b/osc_ctrl.py
index bb6dd87..5ab65de 100644
--- a/osc_ctrl.py
+++ b/osc_ctrl.py
@@ -77,9 +77,6 @@ def disable(client):
# `which_cell` is an integer in the range [0,2**INDEX_BITS).
def sendMessageCellDiscrete(client, msg_cell, which_cell):
    empty_cell = [state.encoding[' ']] * NUM_LAYERS
-    if msg_cell != state.encoding[' '] * BOARD_COLS:
-        addr="/avatar/parameters/" + generate_utils.getSpeechNoiseToggleParam()
-        client.send_message(addr, False)
    if msg_cell != empty_cell:
        addr="/avatar/parameters/" + generate_utils.getSpeechNoiseToggleParam()
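
Note on the osc_ctrl.py hunk: the commit message does not say why the first check was removed. One plausible reading, sketched below, is that the removed condition compared a list against a non-list value and was therefore always true. The sketch assumes `state.encoding` maps characters to integers and that `msg_cell` is a list of `NUM_LAYERS` codes. The constant values and the helper names `is_empty_cell` and `removed_check_fires` are hypothetical stand-ins, not code from the repository.

```python
# Hypothetical stand-ins for names used in osc_ctrl.py; real values may differ.
NUM_LAYERS = 14
BOARD_COLS = 44
encoding = {' ': 0, 'a': 1}  # assumed: character -> integer code


def is_empty_cell(msg_cell):
    """Mirror of the check the commit keeps: list compared with list."""
    empty_cell = [encoding[' ']] * NUM_LAYERS
    return msg_cell == empty_cell


def removed_check_fires(msg_cell):
    """Mirror of the check the commit deletes.

    Under the assumption above, encoding[' '] * BOARD_COLS is a plain
    integer, so comparing it with a list is always "not equal".
    """
    return msg_cell != encoding[' '] * BOARD_COLS


blank = [encoding[' ']] * NUM_LAYERS
text = [encoding['a']] + [encoding[' ']] * (NUM_LAYERS - 1)

print(removed_check_fires(blank), removed_check_fires(text))
print(is_empty_cell(blank), is_empty_cell(text))
```

If the assumption holds, the deleted branch ran for every cell, blank or not, while the surviving `msg_cell != empty_cell` check only runs for cells that contain text.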