author: yum <yum.food.vr@gmail.com> 2022-11-06 12:50:38 -0800
committer: yum <yum.food.vr@gmail.com> 2022-11-06 12:50:38 -0800
commit: 7146acb9d4ad751fc5ced411a2990d0aad17d08f
tree: 30d5f9f9a7f47bc4272fa9e9fff5c0226c376686 /README.md
parent: 3a123fb5cabdbdef4f1b98031ec90c42e1d6e911
String matching no longer relies on spaces
Add a `matchStrings` which does basically the same thing as `matchStringList` except it doesn't split the input at space boundaries. I think this should work better for Japanese and Chinese, since they don't use spaces. Doesn't seem to cause any accuracy regressions for English. Also update the README.
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 79
1 files changed, 35 insertions, 44 deletions
diff --git a/README.md b/README.md
index 9ee090e..f0cce3d 100644
--- a/README.md
+++ b/README.md
@@ -8,7 +8,7 @@ custom shader display the text in game.
![Speech-to-text demo](Images/speech_to_text_demo.gif)
Features:
-* 8x22 display grid, 80 characters per slot.
+* 4x44 grid, 256 or 65536 characters per slot.
* Text-to-text interface.
* Speech-to-text interface.
* Free as in beer.
@@ -52,10 +52,10 @@ There are currently 5 important pieces:
1. `TaSTT.shader`. A simple unlit shader. Has one parameter per cell in the
display.
-2. `generate_animations.sh`. Generates one animation per (row, column, letter).
- These animations allow us to write the shader's parameters from an FX layer.
-3. `generate_fx.py`. Generates a colossal FX layer which maps (row, column,
- letter, active) to exactly one of TaSTT.shader's parameters.
+2. `libunity.py`. Contains the logic required to generate and manipulate Unity
+ YAML files. Works well enough on YAMLs up to ~40k documents, 1M lines.
+3. `libtastt.py`. Contains the logic to generate TaSTT-specific Unity files,
+ namely the animations and the animator.
4. `osc_ctrl.py`. Sends OSC messages to VRChat, which it dutifully passes along
to the generated FX layer.
5. `transcribe.py`. Uses OpenAI's whisper neural network to transcribe audio
@@ -63,61 +63,52 @@ There are currently 5 important pieces:
#### Parameters & board indexing
-There are 2 obvious ways to tell the board how to display a message:
+I divide the board into 16 regions and use a single int parameter,
+`TaSTT_Select`, to select the active region. For each byte of data
+in the active region, I use a float parameter to blend between two
+animations: one with value 0, and one with value 255.
-1. Independently parameterize every character slot. If we want to display
- a 140-character tweet, this means using (140 characters) * (8 bits
- per character) == 1120 bits of parameter memory. VRChat only gives us 256!
-2. Parameterize one character slot. We could have an 8-bit letter, an 8-bit row
- select, and an 8-bit column select. To avoid overwriting cells while we seek,
- we could include a 1-bit enable. This approach works, and uses very few
- parameter bits, but it requires us to update the same parameter very quickly.
- Experimental results with this were not promising; remote viewers would see
- the wrong letters pretty often.
+To support wide character sets, I support 2 bytes per character. This
+can be configured down to 1 byte per character to save parameter bits.
-Thus I settled on a hybrid approach: we divide the board into `cells`,
-inside of which we can independently address each character slot. There
-are currently 16 cells.
-
-Since the board has (22 columns) * (8 rows) == 176 character slots, each cell
-contains (176 characters) / (16 cells) = 11 characters.
-
-To update a cell, we do this:
-
-1. Select the cell. Since there are 16 cells, this requires 4 bits.
-2. For each letter in the cell, select the letter. Since we support 256 letters
- per cell, this requires 8 bits.
-
-To avoid overwriting cells while we seek around, we also have a single boolean
-which enables/disables updating any cells.
-
-Thus the total amount of parameter memory used is dictated by this equation:
+The total amount of parameter memory used is dictated by this equation:
```
-ROWS * COLS * 8 / CELLS + 1 + log2(CELLS)
+ROWS = 4
+COLS = 44
+CELLS = 16
+MEMORY = ROWS * COLS * (N bits per character) / CELLS + 1 + log2(CELLS)
```
-This is currently 93 bits.
+This is currently 93 bits for 1-byte characters and 181 bits for 2-byte
+characters.
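
The memory equation can be checked with a few lines of Python. This is a minimal sketch, not code from the repository: the function name `parameter_bits` is hypothetical, and the `+ 1` term is read here as the single enable flag that gates board updates.

```python
import math


def parameter_bits(rows: int, cols: int, bits_per_char: int, cells: int) -> int:
    """Evaluate ROWS * COLS * bits / CELLS + 1 + log2(CELLS).

    Hypothetical helper for illustration; not part of TaSTT.
    """
    slots = rows * cols
    if slots % cells != 0:
        raise ValueError("cells must evenly divide the number of character slots")
    chars_per_cell = slots // cells                # 176 / 16 = 11 slots per cell
    select_bits = math.ceil(math.log2(cells))      # bits needed to pick a cell
    enable_bits = 1                                # assumed: the update-enable flag
    return chars_per_cell * bits_per_char + enable_bits + select_bits


print(parameter_bits(4, 44, 8, 16))    # 1-byte characters
print(parameter_bits(4, 44, 16, 16))   # 2-byte characters
```

With the 4x44 grid and 16 cells, this gives 93 bits for 1-byte characters and 181 bits for 2-byte characters, matching the figures above.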
#### FX controller design
The FX controller (AKA animator) is pretty simple. There is one layer for each
-character in a cell. Thus the layer has to work out which cell it's in, then
+character in a cell. The layer has to work out which cell it's in, then
work out which letter we want to write in that cell, then run an animation for
that letter.
-Here's a layer where I manually moved things around to show the structure of
-the decision tree:
-
-![One FX layer with 4-bit indexing](Images/four_bit_indexing.png)
+![One FX layer with 16 cells](Images/tastt_anim.png)
From top down, we first check if updating the board is enabled. If no, we stay
-in the first state. Then we check which cell we're in. This is divided into 4
-binary checks, each looking at a boolean parameter. Finally, we fire one of 80
-animations based on the value of the current layer's Letter parameter.
+in the first state. Then we check which cell we're in. Finally, we drive a
+shader parameter to one of 256 possible values using a blendtree.
+
+![An 8-bit blendtree](Images/tastt_blend.png)
+
+The blendtree trick lets us represent wide character sets efficiently. The
+number of animations required increases logarithmically with the size of the
+character set:
-In the pictured FX layer, there are 16 cells each controlling 80 animations,
-for a total of 1280 animations. There are 11 such layers.
+```
+(N bytes per character) = ceil(log2(size of character set) / 8)
+(total animations) =
+ (2 animations per byte) *
+ (N bytes per character) *
+ (M chars per cell)
+```
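
As a rough illustration of the animation count, the sketch below evaluates the formula for the two supported character set sizes. The helper names are hypothetical, and it assumes that bytes per character means the bit width of the character set rounded up to whole bytes.

```python
import math


def bytes_per_char(charset_size: int) -> int:
    """Whole bytes needed to index a character set (assumed interpretation)."""
    if charset_size < 1:
        raise ValueError("charset_size must be at least 1")
    bits = max(1, math.ceil(math.log2(charset_size)))
    return math.ceil(bits / 8)


def total_animations(charset_size: int, chars_per_cell: int) -> int:
    """(2 animations per byte) * (N bytes per character) * (M chars per cell)."""
    return 2 * bytes_per_char(charset_size) * chars_per_cell


# 4x44 grid split into 16 cells gives 11 characters per cell.
print(total_animations(256, 11))     # 1 byte per character
print(total_animations(65536, 11))   # 2 bytes per character
```

Under these assumptions, going from 256 to 65536 characters doubles the count from 22 to 44 animations, which is the logarithmic growth described above.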
### Contributing