| Commit message | Author | Age |
|
* Fix prefab: bounding box & position are now set to 0
* Fix shader: text is no longer upside down
* Update README
|
Users can now use PBR textures on their custom backplate!
* Update TaSTT.fbx: UV map aspect ratio matches board
|
Document recent features, better explain basis of transcription.
|
Metallics now reflect the map's cubemap.
* Remove SpecularTint (did nothing)
* Adjust mipBias to be sharper
|
Need to calculate this in the space of letter UVs, not the overall text
box UV space, in order for the correct mip maps to be chosen.
* Expose dithering as a toggle in the shader
* Actually generate mipmaps
* Fine-tune mipmapBias for legibility
|
* Enable streaming mipmaps on glyph bitmaps
* Sample glyph bitmaps using mipmaps
* Add temporal noise to letter UVs (dithering)
|
* Render at 3k render queue to avoid clashing with avatar meshes
* Set reasonable shader defaults
* Remove unused material
|
TaSTT shader now uses physically based rendering (PBR). Users can pick
smoothness, metallic, and emissive.
This implementation borrows heavily from catlikecoding.com's excellent
tutorials, which are released under MIT No Attribution (MIT-0).
https://catlikecoding.com/unity/tutorials/license/
To retain what little clarity remains in the shader, I have chosen not
to attribute the code in the source itself.
|
Light color themes revealed a need for a lit shader, since an unlit
shader would be blindingly bright.
This implementation doesn't really work in game. I suspect that I need
to support more than just one global light.
|
The --extra-index-url must appear *before* the dependency in this file.
|
We use a button to start/stop transcription. Previously this was
hardcoded to left joystick. Now users can pick from {left, right} x
{joystick, a, b}.
|
This seems to be the canonical way of listing a Python app's
dependencies.
* Installing dependencies no longer hangs the GUI
|
* Text color, background color, and margin color are all customizable
now
* Better organize shader parameters. User-facing params are exposed
Like_This; internal params are exposed _Like_This.
* Update README. More wordsmithing.
|
* Point to a more up-to-date demo.
* Improve wordsmithing/flow
|
Whisper doesn't like 0.18.3, so downgrade to the previous version.
|
Don't literally check in Python since it looks dodgy (rightfully so).
Instead the build script just fetches it.
* Update README, simplifying language and documenting other projects
|
VRChat exposes a built-in chatbox which can be seen by anyone who has
it enabled. This was not the case when I started this project: the
chatbox would only be visible to friends. Sending data to it is clearly
useful, since it makes the STT work on public models, so let's support it.
Caveats:
* The built-in chatbox has anti-spam tech which limits us to updating
about once every 2 seconds. The custom chatbox has no such limitation
and is thus typically much faster.
|
I found that I tend to regenerate the animator on the same avatar a lot,
requiring me to re-enter the same paths and parameters over and over
again. Persist them across restarts.
* Refactor Config classes
* Use safe `get_if` instead of the exception-throwing `operator>>` when
deserializing from YAML
* Begin sketching out Log singleton
* Put Quote() and Unquote() into their own little lib; they shouldn't
hide inside PythonApp
|
The configuration of the transcription app, such as the number of rows
and columns in the text box, now persists across app restarts. I found
that I would have to change from the defaults to my preferred config
every time I started up in VR, which was annoying. Now we just start
with the config that was set last time.
* Add dependency on rapidyaml (MIT)
* Serialize transcription config to file under Resources/
* Add Config class to wrap serializing/deserializing
* Update build instructions
* Simplify StartApp() API, taking Config struct instead of a ton of
arguments
|
Previously, paths containing spaces would be interpreted by Python's argument
parser as multiple separate arguments, causing it to fail. Now we quote paths
inside PythonWrapper using std::quoted().
* Improve PII filtering. Python output would contain multiple path separators
(like C:\\Users\\foo\\), defeating the PII regex.
* Silence compiler warning in PII filter.
* Document usability improvements.
* Transcription layer exponential backoff goes to ~infinity when paused.
This is a hack, since we really don't need to transcribe at all when paused,
but it lets us keep the code simple. Good enough until the next rewrite.
* Shader only samples background when necessary.
* Limit matchStrings() print()s to DEBUG mode
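The PII filtering fix above can be sketched as follows. This is a minimal
illustration, not the project's actual filter: the function name, the regex,
and the `*****` placeholder are all assumptions. The point is only that the
pattern accepts one or more separators in a row, so doubled backslashes in
Python output no longer defeat it.

```python
import re

# One or more path separators in a row, so both C:\Users\foo and
# C:\\Users\\foo are matched. (Hypothetical pattern, for illustration.)
_SEP = r"[\\/]+"
_USER_DIR = re.compile(
    r"([a-z]:" + _SEP + r"users" + _SEP + r")[^\\/]+",
    re.IGNORECASE,
)


def mask_user_dir(text: str) -> str:
    """Replace the user name in Windows home-directory paths."""
    # A function is used as the replacement so that backslashes in the
    # matched prefix are copied literally instead of parsed as escapes.
    return _USER_DIR.sub(lambda m: m.group(1) + "*****", text)
```

Text with no home-directory path passes through unchanged.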
|
* Expose option to run transcription engine on CPU instead of GPU
* Use embedded git when setting up the Python virtual environment
|
package.ps1 fetches PortableGit and embeds it in the package. This
eliminates all but one runtime dependency (MSVC++ Redistributable).
* Move Python into a new FOSS folder.
|
Update build instructions.
|
This seems like the best way to support users.
|
Re-paging anything on screen N causes screens N+1...infinity to
completely re-page. This fixes cases where we go back and draw something
at the bottom of the board, and it never gets overwritten.
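A rough sketch of this invalidation rule, assuming the pager remembers which
(screen, region) pairs it has already sent. The class and method names are
hypothetical and do not come from the project's paging code.

```python
class PagingState:
    """Tracks which (screen, region) pairs have already been sent."""

    def __init__(self) -> None:
        # screen index -> set of region indices already sent
        self._sent = {}

    def mark_sent(self, screen: int, region: int) -> None:
        self._sent.setdefault(screen, set()).add(region)

    def mark_changed(self, screen: int, region: int) -> None:
        # The changed region itself must be sent again...
        self._sent.setdefault(screen, set()).discard(region)
        # ...and every later screen is re-paged from scratch.
        for later in self._sent:
            if later > screen:
                self._sent[later].clear()

    def needs_paging(self, screen: int, region: int) -> bool:
        return region not in self._sent.get(screen, set())
```

Clearing the later screens wholesale costs extra sync traffic, but it
guarantees that nothing stale survives below the edit.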
|
Boards whose size is an exact multiple of CHARS_PER_SYNC would lose the
entire last region.
* Attempt to fix runaway memory usage of GUI text frames, but this needs
more work
|
The defaults now reflect what I typically use.
|
Users can pick longer transcription durations for accuracy-critical
tasks, or shorter durations for latency-critical tasks.
|
VRChat won't update the FX layer associated with an avatar unless its
GUID changes. Delete the GUID file when overwriting our generated FX
layer to work around this.
* Change paging behavior: when a region is updated, we re-page everything
that comes after it. This fixes the issue where we go back to update
something, then jump back to the current screen, leaving some random
chunk of text somewhere on the board.
* Reduce transcription time from 28s to 10s. I'm going to expose this to
the user since there's a fundamental latency/stability tradeoff here.
|
Bump up recording window to 28 seconds. This helps a lot with long-form
transcription tasks, such as transcribing an audiobook.
We should expose this as a parameter, since at 10s the transcription delay is
typically 300ms, while at 28s it's typically 1.1-1.2s.
|
Users can now control how many letters wide and tall the board is.
Tested at 4x48, 5x60, 10x120, and 20x240. At 20x240, Unity freezes and
does not make forward progress. Perhaps creating 4800 float parameters
isn't a truly scalable interface.
|
Now it's possible to generate shaders with a custom number of rows, columns,
and bytes per character.
All edits to the shader should go through TaSTT_template.shader. To generate
a new shader from the template:
$ ./Scripts/generate_shader.py \
--bytes_per_char 2 \
--rows 1 \
--cols 12 \
--shader_template $(pwd)/Shaders/TaSTT_template.shader \
--shader_path $(pwd)/Shaders/TaSTT.shader
|
Users can now see the number of avatar parameter bits they'll use
prior to committing.
|
An off-by-one issue in numRegions() would result in one extra layer
trying to drive a letter in the last region, which would wrap back
around to the 0th character slot (cell).
* GUI explicitly logs when it's done generating avatar stuff
* OSC layer no longer tries to update cells which don't exist
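For illustration, here is one way to count regions without the off-by-one,
using ceiling division and clamping the last region to the board size. The
signatures are assumptions; the real numRegions() may take different
arguments.

```python
def num_regions(rows: int, cols: int, chars_per_sync: int) -> int:
    """Number of regions needed to cover every cell (ceiling division)."""
    cells = rows * cols
    return (cells + chars_per_sync - 1) // chars_per_sync


def cells_in_region(region: int, rows: int, cols: int,
                    chars_per_sync: int) -> range:
    """Cell indices driven by `region`, clamped so none exceed the board."""
    start = region * chars_per_sync
    end = min(start + chars_per_sync, rows * cols)
    return range(start, end)
```

With a 4x48 board and 10 characters per sync, the last region covers only
cells 190 and 191 instead of wrapping around to cell 0.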
|
The transcription engine beeps when you start/stop transcribing so you know
that it's listening. Users can now disable this.
* Add help text to all input fields in GUI
* Make TaSTT generated file textctrls read-only, since I haven't tested
them being reassigned
* Document idea to configure Unity & transcription apps with config files
* Controller input thread no longer crashes if SteamVR isn't running; it just
slowly spins and waits
* When you stop transcribing, the transcription engine re-transcribes a few
times. I think this should improve end-of-transcription tail latencies
* transcribe.py now prints out its args
|
Define proper interfaces for these things. Simplify osc_ctrl,
temporarily dropping support for emotes (they were broken anyway).
* Bugfix: Japanese no longer crashes transcribe.py, but it still doesn't
show up in the wxTextCtrl
|
The transcribe panel was grabbing data from the Unity panel, causing the
bytes per char / chars per sync parameters to be ignored.
|
Because we allow users to customize the number of sync params, the board is
no longer divided into regions of uniform size. When the last region is
a different size than the rest, we simply omit it from paging.
This is a hack but it's easy to reason about.
Of course the entire paging stack should be rewritten, but not today.
|
| |
|
|
|
|
|
| |
Add a new shader to make the box a little prettier.
* Reduce material slots required from 2 to 1
* Add rounding to edge of box
|
This reduces the expected delay to wake up the board & start
transcribing from 750 milliseconds to 2.5 milliseconds.
|
Users can now control how many characters they send per sync event, as
well as the number of bytes used to represent each character.
This gives them the power to pick between faster paging and fewer sync
params.
International users must use 2 bytes per char (at least for now).
* package.ps1: don't distribute the gigantic TTF files, just the bitmaps
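The tradeoff between paging speed and sync params can be estimated with a
short sketch. It counts only the character payload; the real avatar
presumably needs additional bookkeeping parameters that are not modeled
here, and both function names are hypothetical.

```python
import math


def payload_bits_per_sync(chars_per_sync: int, bytes_per_char: int) -> int:
    """Synced bits spent on character data in one sync event."""
    return chars_per_sync * bytes_per_char * 8


def syncs_to_fill_board(rows: int, cols: int, chars_per_sync: int) -> int:
    """Sync events needed to write every cell of the board once."""
    return math.ceil(rows * cols / chars_per_sync)
```

For a 4x48 board, sending 8 two-byte characters per sync costs 128 payload
bits and takes 24 syncs to fill the board. Doubling to 16 characters per
sync halves the sync count but doubles the payload bits.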
|
By sending encoded words rather than letters, we could speed up
English paging rate by 2.5x over an optimized char-based implementation.
Word-encoded implementation: 16 bits per word
(capped at 64k possible words).
Optimized char-based implementation:
(5.7 chars per word) * (7 bits per char) == 39.9 bits per word
2.5x slower than word encoding.
Today's char-based implementation:
(5.7 chars per word) * (16 bits per char) == 91.2 bits per word
5.7x slower than word encoding.
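The arithmetic above can be reproduced with a few lines. The 5.7 chars per
word and 16 bits per word figures come from this message; the function names
are just for illustration.

```python
AVG_CHARS_PER_WORD = 5.7  # average word length quoted in this message
WORD_ENCODED_BITS = 16    # one 16-bit index, so at most 2**16 words


def char_encoded_bits_per_word(bits_per_char: int) -> float:
    """Average bits needed to send one word letter by letter."""
    return AVG_CHARS_PER_WORD * bits_per_char


def slowdown_vs_word_encoding(bits_per_char: int) -> float:
    """How many times more bits char encoding needs than word encoding."""
    return char_encoded_bits_per_word(bits_per_char) / WORD_ENCODED_BITS
```

Calling slowdown_vs_word_encoding(7) gives roughly 2.5 and
slowdown_vs_word_encoding(16) gives 5.7, matching the figures above.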
|
This fixed some slowness I was seeing when waking up the STT. The right
fix is to add interruptible sleeps. Let's fix this soon.
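One possible shape for an interruptible sleep, sketched with
threading.Event from the Python standard library. The class name is
hypothetical and this is not code from the project.

```python
import threading


class InterruptibleSleeper:
    """A sleep that another thread can cut short."""

    def __init__(self) -> None:
        self._wake = threading.Event()

    def sleep(self, seconds: float) -> bool:
        """Block for up to `seconds`; return True if woken early."""
        woken = self._wake.wait(timeout=seconds)
        # Reset so the next sleep() blocks again.
        self._wake.clear()
        return woken

    def wake(self) -> None:
        """Wake the sleeping thread, or make the next sleep() return at once."""
        self._wake.set()
```

A polling loop can call sleeper.sleep(interval) instead of time.sleep(), and
the controller thread calls sleeper.wake() when the user presses the button,
so wake-up latency no longer depends on the polling interval.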
|
This makes incremental workflows much more efficient, since you don't
have to reassign the FX controller, params, and menu.
|
* Fix shader background rendering
* Add ability to control margin size
* Add ability to disable speech indicator
|
Create printf-like interface for writing to wxTextCtrl objects.
Also mask out PII. I wanted a way to not dox myself when recording
demos, but I wound up making a second user on my PC to serve the same
purpose. Maybe I'll delete the code later idk.
|