From 9fff496394dcd94c4084694ca96a5e07ab836274 Mon Sep 17 00:00:00 2001
From: yum <yum.food.vr@gmail.com>
Date: Mon, 23 Jan 2023 14:28:53 -0800
Subject: package.ps1 now fetches all dependencies

Don't check Python itself into the repository, since that looks dodgy
(rightfully so). Instead, the build script fetches it.

* Update README, simplifying language and documenting other projects
---
 README.md | 96 ++++++++++++++++++++++++++-------------------------------------
 1 file changed, 39 insertions(+), 57 deletions(-)

(limited to 'README.md')

diff --git a/README.md b/README.md
index 4a48d2e..bc6f6b0 100644
--- a/README.md
+++ b/README.md
@@ -2,8 +2,7 @@
 
 TaSTT (pronounced "tasty") is a free speech-to-text tool for VRChat. It uses
 local machine transcription to turn your voice into text, then sends it into
-VRChat via OSC. A few parameters, a machine-generated FX layer, and a
-custom shader display the text in game.
+VRChat via OSC.
 
 ![Speech-to-text demo](Images/speech_to_text_demo.gif)
 
@@ -20,9 +19,7 @@ Made with love by yum\_food.
 
 ## Usage and setup
 
-To use a prebuilt package, go to the releases tab and download the latest
-release. Follow the guide associated with that release. To give you a taste,
-[here's the v0.0 setup guide](https://www.youtube.com/watch?v=0qjxkdVTqcs).
+Get the latest package from [the releases page](https://github.com/yum-food/TaSTT/releases/latest).
 
 Please [join the discord](https://discord.gg/YWmCvbCRyn) to share feedback and
 get technical help.
@@ -30,35 +27,28 @@ get technical help.
 To build your own package from source, see GUI/README.md.
 
 Basic controls:
-* Short click the left joystick to make it show up & start transcribing.
-* Short click the left joystick to make it lock in place & stop transcribing.
-* Long click the left joystick to make it go away & stop transcribing.
+* Short click the left joystick to toggle transcription.
+* Long click the left joystick to hide the text box.
 * Scale it up/down in the radial menu.
 
 ## Features
 
-* 4x48 grid, 256 or 65536 characters per slot.
-* Text-to-text interface.
-* Speech-to-text interface.
+* Customizable board resolution, [up to ridiculous sizes](https://www.youtube.com/watch?v=u5h-ivkwS0M).
+* 8-bit and 16-bit character encodings.
+* Japanese, Korean, and Chinese glyphs included.
 * Multiple language support.
-  * Transcription within the same language works for many languages.
-  * Translation from N languages to English is supported.
-  * Translation from English into other languages is added case by case. This
-    is a limitation of the state of the art in machine translation: fine-tuned
-    English->other language models far outperform English->many language models.
-* Start/stop transcription by clicking left joystick.
-* Resizable: talk to friends close up or far away.
+* Resizable.
 * Audio feedback: hear distinct beeps when transcription starts and stops.
   * May also enable in-game noise indicator, to grab others' attention.
-* Visual transcription indicator. Green == talking, orange == waiting for sync,
-  red == done talking.
-* May be attached to hand or left in world space.
-* Free as in beer.
-* Free as in freedom.
+* Visual transcription indicator.
+* Locks to world space when done speaking.
+* Can use the built-in chatbox (usable with public avatars!).
 * Privacy-respecting: transcription is done on your GPU, not in the cloud.
 * Hackable.
-* 100% from-scratch implementation.
-* Permissive MIT license.
+* From-scratch implementation.
+* Free as in beer.
+* Free as in freedom.
+* MIT license.
 
 ### Motivation
 
@@ -72,32 +62,37 @@ reason or another:
 1. RabidCrab's STT costs money and relies on cloud-based transcription. I have
    struggled with latency, quality, and reliability issues. It's also
    closed-source.
-2. The in-game text box is only visible to your friends list, making it
-   useless for those who like to make new friends.
-
-Thus I believe that a free alternative is both needed and justified.
-
-I hope that this codebase aids and motivates the creation of better, more
-expressive communication tools for mutes.
+2. The in-game text box is not visible in streamer mode, and limits you to one
+   update every ~2 seconds, making it a poor choice for latency-sensitive
+   communication.
+3. [KillFrenzy's AvatarText](https://github.com/killfrenzy96/KillFrenzyAvatarText)
+   only supports text-to-text, and is GPL, making it legally risky for people
+   who want to sell closed-source software.
+4. [I5UCC's VRCTextboxSTT](https://github.com/I5UCC/VRCTextboxSTT) makes
+   KillFrenzy's AvatarText and Whisper kiss. It's the closest spiritual cousin
+   to this repository. There are two crucial differences: it's GPL not MIT, and
+   it doesn't abstract away the command line.
 
 ### Design overview
 
-There are currently 5 important pieces:
+These are the important bits:
 
-1. `TaSTT.shader`. A simple unlit shader. Has one parameter per cell in the
-   display.
-2. `libunity.py`. Contains the logic required to generate and manipulate Unity
+1. `TaSTT_template.shader`. A simple unlit shader template. Contains the
+   business logic for the shader that shows text in game.
+2. `generate_shader.py`. Adds parameters and an accessor function to the
+   shader template.
+3. `libunity.py`. Contains the logic required to generate and manipulate Unity
    YAML files. Works well enough on YAMLs up to ~40k documents, 1M lines.
-3. `libtastt.py`. Contains the logic to generate TaSTT-specific Unity files,
+4. `libtastt.py`. Contains the logic to generate TaSTT-specific Unity files,
    namely the animations and the animator.
-4. `osc_ctrl.py`. Sends OSC messages to VRChat, which it dutifully passes along
+5. `osc_ctrl.py`. Sends OSC messages to VRChat, which it dutifully passes along
    to the generated FX layer.
-5. `transcribe.py`. Uses OpenAI's whisper neural network to transcribe audio
+6. `transcribe.py`. Uses OpenAI's Whisper neural network to transcribe audio
    and sends it to the board using osc_ctrl.
 
 #### Parameters & board indexing
 
-I divide the board into 16 regions and use a single int parameter,
+I divide the board into several regions and use a single int parameter,
 `TaSTT_Select`, to select the active region. For each byte of data
 in the active region, I use a float parameter to blend between two
 animations: one with value 0, and one with value 255.
@@ -105,24 +100,11 @@ animations: one with value 0, and one with value 255.
 To support wide character sets, I support 2 bytes per character. This
 can be configured down to 1 byte per character to save parameter bits.
 
-The the total amount of parameter memory used is dictated by this equation:
-
-```
-ROWS = 4
-COLS = 44
-CELLS = 16
-MEMORY = ROWS * COLS * (N bits per character) / CELLS + 1 + log2(CELLS)
-```
-
-This is currently 93 bits for 1-byte characters and 181 bits for 2-byte
-characters.
-
 #### FX controller design
 
 The FX controller (AKA animator) is pretty simple. There is one layer for each
-character in a cell. The layer has to work out which cell it's in, then
-work out which letter we want to write in that cell, then run an animation for
-that letter.
+sync parameter (i.e. each character byte). The layer has to work out which
+region it's in, then write a byte to the correct shader parameter.
 
 ![One FX layer with 16 cells](Images/tastt_anim.png)
 
@@ -172,8 +154,8 @@ Contributions welcome. Send a pull request to this repository.
       checking transcriptions without having to see the board in game.
    6. TTS. Multiple people have requested this. See if there are open source
       algorithms available; or, figure out how to integrate with
-   7. Save UI input fields to config file. Persist across process exit. It's
-      annoying having to re-enter the config every time I use the STT.
+   7. ~~Save UI input fields to config file. Persist across process exit. It's
+      annoying having to re-enter the config every time I use the STT.~~ DONE
    8. Customizable controller bindings. Someone mentioned they use left click
       to unmute. Let's work around users, not make them change their existing
       keybinds.
-- 
cgit v1.2.3