diff options
| author | yum <yum.food.vr@gmail.com> | 2022-10-02 17:24:19 -0700 |
|---|---|---|
| committer | yum <yum.food.vr@gmail.com> | 2022-10-02 17:24:19 -0700 |
| commit | d4556af258ae3911c83ece4b817335e8c5a2a2d2 (patch) | |
| tree | 7336f68ef7f3b249849c7f18d783ea73caad2c24 /README.md | |
| parent | 21c17fcb5698ed238e5397a0c2b0530034804d34 (diff) | |
Add LICENSE
* Update README with contribution instructions & design details.
* Add text-to-text demo gif
* Document known Unity landmines in generate.sh.
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 131 |
1 files changed, 103 insertions, 28 deletions
@@ -5,20 +5,26 @@ local machine translation to turn your voice into text, then sends it into VRChat via OSC. A few parameters, a machine-generated FX layer, and a custom shader display the text in game. + + Features: +* 8x22 display grid, 80 characters per slot. +* Text-to-text interface. +* Speech-to-text interface (planned) * Free as in beer. * Free as in freedom. -* Privacy respecting. Speech-to-text done locally using an open source language - model. -* Low-latency. -* Stable. -* Configurable. -* 6x14 display grid, 80 characters per slot. - * Each parameter - grid size, characters per slot, may be dialed up or down - as desired. +* Hackable. * 100% from-scratch implementation. * Permissive MIT license. +Contents: +1. [Motivation](#motivation) +2. [Design overview](#design-overview) +3. [Contributing](#contributing) +4. [Backlog](#backlog) + +Made with love by yum\_food. + ### Motivation Many VRChat players choose not to use their mics, but as a practical matter, @@ -30,7 +36,7 @@ reason or another: 1. RabidCrab's STT costs money and relies on cloud-based translation. I have struggled with latency, quality, and reliability issues. It's also - closed-source, and uses quite a few parameters. + closed-source. 2. The in-game text box is only visible to your friends list, making it useless for those who like to make new friends. @@ -41,37 +47,106 @@ expressive communication tools for mutes. ### Design overview -There are roughly 4 important pieces here: +There are currently 4 important pieces: -1. TaSTT.shader. A simple CG shader. Has one parameter per cell in the display. -2. generate\_animations.sh. Generates one animation per (row, column, letter). +1. `TaSTT.shader`. A simple unlit shader. Has one parameter per cell in the + display. +2. `generate\_animations.sh`. Generates one animation per (row, column, letter). These animations allow us to write the shader's parameters from an FX layer. -3. generate\_fx.py. Generates a colossal FX layer which maps (row, column, +3. `generate\_fx.py`. Generates a colossal FX layer which maps (row, column, letter, active) to exactly one of TaSTT.shader's parameters. -4. osc\_ctrl.py. Sends OSC messages to VRChat, which it dutifully passes along +4. `osc\_ctrl.py`. Sends OSC messages to VRChat, which it dutifully passes along to the generated FX layer. +#### Parameters & board indexing + +There are 2 obvious ways to tell the board how to display a message: + +1. Independently parameterize every character slot. If we want to display + a 140-character tweet, this means using (140 characters) * (8 bits + per character) == 1120 bits of parameter memory. VRChat only gives us 256! +2. Parameterize one character slot. We could have an 8-bit letter, an 8-bit row + select, and an 8-bit column select. To avoid overwriting cells while we seek, + we could include a 1-bit enable. This approach works, and uses very few + parameter bits, but it requires us to update the same parameter very quickly. + Experimental results with this were not promising; remote viewers would see + the wrong letters pretty often. + +Thus I settled on a hybrid approach: we divide the board into `cells`, +inside of which we can independently address each character slot. There +are currently 16 cells. + +Since the board has (22 columns) * (8 rows) == 176 character slots, each cell +contains (176 characters) / (16 cells) = 11 characters. + +To update a cell, we do this for every single character: +1. Select the cell. Since there are 16 cells, this requires 4 bits. +2. Select the letter. Since we support 256 letters per cell, this requires 8 bits. + +To avoid overwriting cells while we seek around, we also have a single boolean +which enables/disables updating any cells. + +Thus the total amount of parameter memory used is dictated by this equation: + +``` +ROWS * COLS * (log2(CELLS) + 8) / CELLS + 1 +``` + +This is currently 133 bits. + +#### FX controller design + +The FX controller (AKA animator) is pretty simple. There is one layer for each +character in a cell. Thus the layer has to work out which cell it's in, then +work out which letter we want to write in that cell, then run an animation for +that letter. + +Here's a layer where I manually moved things around to show the structure of +the decision tree: + + + +### Contributing + +Contributions welcome. Send a pull request to this repository. + +To use the STT: + +1. Enable Windows Subsystem for Linux. This is a lightweight Linux virtual + machine that runs on your Windows host. You can access the Windows + filesystem at /mnt/c/.... +2. $ cd /mnt/c/path/to/your/unity/project +2. $ cd Assets +3. $ git clone https://github.com/yum\_food/TaSTT +4. $ cd TaSTT +5. $ ./generate.sh +6. Put TaSTT\_fx.controller and TaSTT\_params.asset on your avatar. +7. Upload (or build & test). +8. Open powershell. +9. Navigate to TaSTT. +10. $ python3 ./osc\_ctrl.py +11. Start typing. Your messages should show display in-game. + ### Backlog 1. Better Unity integrations -1.1. Port all scripts to Unity-native C# scripts. -1.2. Support appending to existing FX layers. -1.3. Use VRCSDK to generate FX layer instead of generating the serialized files. -1.4. Optimize FX layer. Unity takes quite a while to load in the current one. + 1. Port all scripts to Unity-native C# scripts. + 2. Support appending to existing FX layers. + 3. Use VRCSDK to generate FX layer instead of generating the serialized files. + 4. Optimize FX layer. Unity takes quite a while to load in the current one. Some redesign is likely needed. 2. In-game usability features. -2.1. Resizing (talk to friends far away). -2.2. Basic toggles (hide it when not needed). -2.3. World mounting (leave it in a fixed position in world space). -2.4. Avatar mounting (attach it to your hand). -2.5. Controller triggers (avoid having to use the radial menu every time you + 1. Resizing (talk to friends far away). + 2. Basic toggles (hide it when not needed). + 3. World mounting (leave it in a fixed position in world space). + 4. Avatar mounting (attach it to your hand). + 5. Controller triggers (avoid having to use the radial menu every time you want to speak). 3. General usability features. -3.1. Error detection & correction. -3.2. Text-to-text interface. Type in terminal, show in game. + 1. Error detection & correction. + 2. Text-to-text interface. Type in terminal, show in game. 4. Optimization -4.1. Utilize the avatar 3.0 SDK's ability to drive parameters to reduce the + 1. Utilize the avatar 3.0 SDK's ability to drive parameters to reduce the total # of parameters (and therefore OSC messages & sync events). Note that the parameter memory usage may not decrease. - -Made with love by yum\_food. +5. Bugfixes |
