summaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authoryum <yum.food.vr@gmail.com>2022-10-02 17:24:19 -0700
committeryum <yum.food.vr@gmail.com>2022-10-02 17:24:19 -0700
commitd4556af258ae3911c83ece4b817335e8c5a2a2d2 (patch)
tree7336f68ef7f3b249849c7f18d783ea73caad2c24
parent21c17fcb5698ed238e5397a0c2b0530034804d34 (diff)
Add LICENSE
* Update README with contribution instructions & design details. * Add text-to-text demo gif * Document known Unity landmines in generate.sh.
-rw-r--r--.gitignore6
-rw-r--r--Images/four_bit_indexing.pngbin0 -> 114753 bytes
-rw-r--r--Images/text_to_text_demo.gifbin0 -> 5126195 bytes
-rw-r--r--LICENSE7
-rw-r--r--README.md131
-rw-r--r--TaSTT_menu.asset15
-rw-r--r--generate.sh9
-rw-r--r--text_to_text_demo_message.txt9
8 files changed, 133 insertions, 44 deletions
diff --git a/.gitignore b/.gitignore
index 7806de3..4113422 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,7 @@
+# Ignore vim swap files.
+*.sw[po]
+# Ignore generated Unity assets.
*.meta
generated
-*.swp
+*.controller
+*.asset
diff --git a/Images/four_bit_indexing.png b/Images/four_bit_indexing.png
new file mode 100644
index 0000000..11eb135
--- /dev/null
+++ b/Images/four_bit_indexing.png
Binary files differ
diff --git a/Images/text_to_text_demo.gif b/Images/text_to_text_demo.gif
new file mode 100644
index 0000000..7d86ed1
--- /dev/null
+++ b/Images/text_to_text_demo.gif
Binary files differ
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..b3d217c
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,7 @@
+Copyright 2022 yum_food
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. \ No newline at end of file
diff --git a/README.md b/README.md
index d89a5de..ddc3475 100644
--- a/README.md
+++ b/README.md
@@ -5,20 +5,26 @@ local machine translation to turn your voice into text, then sends it into
VRChat via OSC. A few parameters, a machine-generated FX layer, and a
custom shader display the text in game.
+![Text-to-text demo](Images/text_to_text_demo.gif)
+
Features:
+* 8x22 display grid, 80 characters per slot.
+* Text-to-text interface.
+* Speech-to-text interface (planned)
* Free as in beer.
* Free as in freedom.
-* Privacy respecting. Speech-to-text done locally using an open source language
- model.
-* Low-latency.
-* Stable.
-* Configurable.
-* 6x14 display grid, 80 characters per slot.
- * Each parameter - grid size, characters per slot, may be dialed up or down
- as desired.
+* Hackable.
* 100% from-scratch implementation.
* Permissive MIT license.
+Contents:
+1. [Motivation](#motivation)
+2. [Design overview](#design-overview)
+3. [Contributing](#contributing)
+4. [Backlog](#backlog)
+
+Made with love by yum\_food.
+
### Motivation
Many VRChat players choose not to use their mics, but as a practical matter,
@@ -30,7 +36,7 @@ reason or another:
1. RabidCrab's STT costs money and relies on cloud-based translation. I have
struggled with latency, quality, and reliability issues. It's also
- closed-source, and uses quite a few parameters.
+ closed-source.
2. The in-game text box is only visible to your friends list, making it
useless for those who like to make new friends.
@@ -41,37 +47,106 @@ expressive communication tools for mutes.
### Design overview
-There are roughly 4 important pieces here:
+There are currently 4 important pieces:
-1. TaSTT.shader. A simple CG shader. Has one parameter per cell in the display.
-2. generate\_animations.sh. Generates one animation per (row, column, letter).
+1. `TaSTT.shader`. A simple unlit shader. Has one parameter per cell in the
+ display.
+2. `generate\_animations.sh`. Generates one animation per (row, column, letter).
These animations allow us to write the shader's parameters from an FX layer.
-3. generate\_fx.py. Generates a colossal FX layer which maps (row, column,
+3. `generate\_fx.py`. Generates a colossal FX layer which maps (row, column,
letter, active) to exactly one of TaSTT.shader's parameters.
-4. osc\_ctrl.py. Sends OSC messages to VRChat, which it dutifully passes along
+4. `osc\_ctrl.py`. Sends OSC messages to VRChat, which it dutifully passes along
to the generated FX layer.
+#### Parameters & board indexing
+
+There are 2 obvious ways to tell the board how to display a message:
+
+1. Independently parameterize every character slot. If we want to display
+ a 140-character tweet, this means using (140 characters) * (8 bits
+ per character) == 1120 bits of parameter memory. VRChat only gives us 256!
+2. Parameterize one character slot. We could have an 8-bit letter, an 8-bit row
+ select, and an 8-bit column select. To avoid overwriting cells while we seek,
+ we could include a 1-bit enable. This approach works, and uses very few
+ parameter bits, but it requires us to update the same parameter very quickly.
+ Experimental results with this were not promising; remote viewers would see
+ the wrong letters pretty often.
+
+Thus I settled on a hybrid approach: we divide the board into `cells`,
+inside of which we can independently address each character slot. There
+are currently 16 cells.
+
+Since the board has (22 columns) * (8 rows) == 176 character slots, each cell
+contains (176 characters) / (16 cells) = 11 characters.
+
+To update a cell, we do this for every single character:
+1. Select the cell. Since there are 16 cells, this requires 4 bits.
+2. Select the letter. Since we support 256 letters per cell, this requires 8 bits.
+
+To avoid overwriting cells while we seek around, we also have a single boolean
+which enables/disables updating any cells.
+
+Thus the total amount of parameter memory used is dictated by this equation:
+
+```
+ROWS * COLS * (log2(CELLS) + 8) / CELLS + 1
+```
+
+This is currently 133 bits.
+
+#### FX controller design
+
+The FX controller (AKA animator) is pretty simple. There is one layer for each
+character in a cell. Thus the layer has to work out which cell it's in, then
+work out which letter we want to write in that cell, then run an animation for
+that letter.
+
+Here's a layer where I manually moved things around to show the structure of
+the decision tree:
+
+![One FX layer with 4-bit indexing](Images/four_bit_indexing.png)
+
+### Contributing
+
+Contributions welcome. Send a pull request to this repository.
+
+To use the STT:
+
+1. Enable Windows Subsystem for Linux. This is a lightweight Linux virtual
+ machine that runs on your Windows host. You can access the Windows
+ filesystem at /mnt/c/....
+2. $ cd /mnt/c/path/to/your/unity/project
+2. $ cd Assets
+3. $ git clone https://github.com/yum\_food/TaSTT
+4. $ cd TaSTT
+5. $ ./generate.sh
+6. Put TaSTT\_fx.controller and TaSTT\_params.asset on your avatar.
+7. Upload (or build & test).
+8. Open powershell.
+9. Navigate to TaSTT.
+10. $ python3 ./osc\_ctrl.py
+11. Start typing. Your messages should show display in-game.
+
### Backlog
1. Better Unity integrations
-1.1. Port all scripts to Unity-native C# scripts.
-1.2. Support appending to existing FX layers.
-1.3. Use VRCSDK to generate FX layer instead of generating the serialized files.
-1.4. Optimize FX layer. Unity takes quite a while to load in the current one.
+ 1. Port all scripts to Unity-native C# scripts.
+ 2. Support appending to existing FX layers.
+ 3. Use VRCSDK to generate FX layer instead of generating the serialized files.
+ 4. Optimize FX layer. Unity takes quite a while to load in the current one.
Some redesign is likely needed.
2. In-game usability features.
-2.1. Resizing (talk to friends far away).
-2.2. Basic toggles (hide it when not needed).
-2.3. World mounting (leave it in a fixed position in world space).
-2.4. Avatar mounting (attach it to your hand).
-2.5. Controller triggers (avoid having to use the radial menu every time you
+ 1. Resizing (talk to friends far away).
+ 2. Basic toggles (hide it when not needed).
+ 3. World mounting (leave it in a fixed position in world space).
+ 4. Avatar mounting (attach it to your hand).
+ 5. Controller triggers (avoid having to use the radial menu every time you
want to speak).
3. General usability features.
-3.1. Error detection & correction.
-3.2. Text-to-text interface. Type in terminal, show in game.
+ 1. Error detection & correction.
+ 2. Text-to-text interface. Type in terminal, show in game.
4. Optimization
-4.1. Utilize the avatar 3.0 SDK's ability to drive parameters to reduce the
+ 1. Utilize the avatar 3.0 SDK's ability to drive parameters to reduce the
total # of parameters (and therefore OSC messages & sync events). Note
that the parameter memory usage may not decrease.
-
-Made with love by yum\_food.
+5. Bugfixes
diff --git a/TaSTT_menu.asset b/TaSTT_menu.asset
deleted file mode 100644
index 91e0de6..0000000
--- a/TaSTT_menu.asset
+++ /dev/null
@@ -1,15 +0,0 @@
-%YAML 1.1
-%TAG !u! tag:unity3d.com,2011:
---- !u!114 &11400000
-MonoBehaviour:
- m_ObjectHideFlags: 0
- m_CorrespondingSourceObject: {fileID: 0}
- m_PrefabInstance: {fileID: 0}
- m_PrefabAsset: {fileID: 0}
- m_GameObject: {fileID: 0}
- m_Enabled: 1
- m_EditorHideFlags: 0
- m_Script: {fileID: -340790334, guid: 67cc4cb7839cd3741b63733d5adf0442, type: 3}
- m_Name: TaSTT_menu
- m_EditorClassIdentifier:
- controls: []
diff --git a/generate.sh b/generate.sh
index 2d668b0..e1a718c 100644
--- a/generate.sh
+++ b/generate.sh
@@ -16,6 +16,15 @@ echo 'their Unity GUID, which is generated during import.'
set -o xtrace
read -r line
+set +o xtrace
+echo 'Now that animations have been generated, close Unity again.'
+echo 'Unity can only really handle importing the generated FX layer at startup.'
+echo 'If it reimports it at runtime, it hangs for a long time. No idea why!'
+echo
+echo 'Press enter once Unity is closed again.'
+set -o xtrace
+read -r line
+
echo 'Generating FX layer'
./generate_fx.py > TaSTT_fx.controller
diff --git a/text_to_text_demo_message.txt b/text_to_text_demo_message.txt
new file mode 100644
index 0000000..956db83
--- /dev/null
+++ b/text_to_text_demo_message.txt
@@ -0,0 +1,9 @@
+
+Hello, my name is yum_food.
+I'm a VRChat mute: a player that chooses not to use their microphone.
+This repository contains the code for an in-game chat bubble.
+It is designed with the core goals of being low latency, reliable, and free - free as in freedom, and as in beer.
+All code is written from scratch and permissively licensed.
+This project is in the exploratory phase. Once the core set of features are implemented, I will focus on productizing.
+Until then, use with caution.
+Contributions welcome!