| Commit message | Author | Age |
| |
|
|
| |
Paging is now slower but more reliable.
|
| |
|
|
|
| |
OSC was paging using an incorrect board resolution. Use cfg to provide
this data.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Four threads:
* Main thread
* Transcription (mic -> collector -> whisper -> committer -> pager)
* VR input
* Keyboard input
Also:
* add OscPager class to encapsulate all OSC interactions.
* bump `last_n_must_match` from 2 to 3 to reduce hallucinations
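
A minimal sketch of what a `last_n_must_match` gate could look like in the committer stage. The commit does not show the real logic, so this assumes the parameter means "only commit a transcript once the last N whisper outputs agree"; the class and method names are hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class StabilityFilter:
    """Hypothetical committer gate: accept a transcript only after it has
    been produced `last_n_must_match` times in a row."""

    def __init__(self, last_n_must_match: int = 3) -> None:
        if last_n_must_match < 1:
            raise ValueError("last_n_must_match must be >= 1")
        # Fixed-size window holding the most recent transcripts.
        self.recent: Deque[str] = deque(maxlen=last_n_must_match)

    def offer(self, transcript: str) -> Optional[str]:
        """Record a transcript; return it if the window is full and every
        entry in the window is identical, else return None."""
        self.recent.append(transcript)
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and len(set(self.recent)) == 1:
            return transcript
        return None
```

Under this reading, raising the value from 2 to 3 means a one-off hallucinated transcript has to repeat twice more before it is committed, at the cost of one more transcription pass of latency.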
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
* Temporarily restore normal process priority. Working on adding a UI
option to set STT prio.
* Give audio indicator phonemes a 1/3 chance to do nothing. Makes result
sound a little better imo.
* Quiet down the SteamVR thread when SteamVR isn't running
* Fix use of `button_id` and `hand_id` in steamvr.py
* Increase amount of silence allowed before transcript from 1 to 5
seconds. You want enough buffer to allow for a few full transcripts,
else you risk spuriously dropping audio.
* Enable background loading in audio metadata (required by the VRChat SDK)
|
| |
|
|
|
| |
We now play arpeggiated *chords* of vowels instead of a single vowel,
allowing for a denser audio feedback mechanism.
|
| |
|
|
|
|
|
| |
Also fix prefab default size (no longer colossal).
TODO:
* Add runtime & unity-time toggles
|
| |
|
|
| |
Text box now shows an animated ellipsis prior to first speech.
|
| |
|
|
|
| |
Deprecate the visual and auditory speech indicators, saving 4 bits
across the board. Fixed overhead is now 21 bits.
|
| |
|
|
|
|
|
|
|
|
|
| |
Emotes require 2 bytes per char. They're encoded into the region
[0xE000, infinity). The texture is 4k, and uses 1k vertical pixels
per emote segment, for a maximum of 32 segments.
* Reduce volume of noise indicator by 90%. Quiet is probably better.
Might want to add a volume slider idk.
* Bugfix: emotes without a transparency channel now work
* Address a couple of Unity performance complaints about the shader
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Done:
* Users can add images to Fonts/Emotes/
* The basename of that image ('clueless.png' becomes 'clueless') is the
keyword to make the image show up in game.
* Fix a bug in the shader where letters on the 2nd texture and later
would have UV outside of [0.0, 1.0]
Not yet implemented:
* Transcribed words are encoded using the emote mapping
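
A rough sketch of how the keyword mapping could work, including the not-yet-implemented encoding step. Assumptions: one code point per emote, assigned in sorted order starting at 0xE000 (the start of the emote region mentioned elsewhere in this log); `build_emote_map` and `encode_word` are hypothetical names, not functions from the repo.

```python
from pathlib import Path
from typing import Dict, Iterable, List

EMOTE_BASE = 0xE000  # assumed first code point reserved for emotes


def build_emote_map(filenames: Iterable[str]) -> Dict[str, int]:
    """Map each image's basename (extension stripped) to a code point.

    'clueless.png' becomes the keyword 'clueless'. Keywords are sorted so
    the assignment is stable between runs.
    """
    keywords = sorted({Path(name).stem for name in filenames})
    return {kw: EMOTE_BASE + i for i, kw in enumerate(keywords)}


def encode_word(word: str, emote_map: Dict[str, int]) -> List[int]:
    """Encode one transcribed word: a single emote code point if the word
    is a keyword, otherwise one code point per character."""
    if word in emote_map:
        return [emote_map[word]]
    return [ord(c) for c in word]
```

In practice the filenames would come from listing Fonts/Emotes/; they are passed in here so the sketch does not touch the filesystem.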
|
| |
|
|
| |
* Reduce noise on/off indicator volume by 50%
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
VRChat exposes a built-in chatbox which can be seen by anyone who has
it enabled. This was not the case when I started this project: the
chatbox would only be visible to friends. Since this is clearly useful
(it enables STT on public models), let's enable sending data to it.
Caveats:
* The built-in chatbox has anti-spam tech which limits us to updating
about once every 2 seconds. The custom chatbox has no such limitation
and is thus typically much faster.
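
One way to live with the rate limit is to keep only the newest text and send it once the interval has elapsed. This is a sketch, not the repo's implementation: `ChatboxThrottle` is a hypothetical name, `send` stands in for whatever actually emits the OSC message, and the 2-second default is the approximate figure from above.

```python
import time
from typing import Callable, Optional


class ChatboxThrottle:
    """Hypothetical rate limiter: forwards at most one message per
    `min_interval_s`, always preferring the most recent text."""

    def __init__(
        self,
        send: Callable[[str], None],
        min_interval_s: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._send = send
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_sent: Optional[float] = None
        self._pending: Optional[str] = None

    def update(self, text: str) -> bool:
        """Remember the latest text and try to send it right away."""
        self._pending = text
        return self.flush()

    def flush(self) -> bool:
        """Send the pending text if allowed; return True if it was sent."""
        if self._pending is None:
            return False
        now = self._clock()
        too_soon = (
            self._last_sent is not None
            and now - self._last_sent < self._min_interval_s
        )
        if too_soon:
            return False
        self._send(self._pending)
        self._last_sent = now
        self._pending = None
        return True
```

Intermediate transcripts that arrive inside the window are overwritten rather than queued, so the chatbox lags but never falls further and further behind.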
|
| |
|
|
|
|
|
|
| |
Boards whose size is an exact multiple of CHARS_PER_SYNC would lose the
entire last region.
* Attempt to fix runaway memory usage of GUI text frames, but this needs
more work
|
| |
|
|
|
|
|
|
| |
Bump up recording window to 28 seconds. This helps a lot with long-form
transcription tasks, such as transcribing an audiobook.
We should expose this as a parameter, since at 10s the transcription delay is
typically 300ms, while at 28s it's typically 1.1-1.2s.
|
| |
|
|
|
|
|
|
|
| |
An off-by-one issue in numRegions() would result in one extra layer
trying to drive a letter in the last region, which would wrap back
around to the 0th character slot (cell).
* GUI explicitly logs when it's done generating avatar stuff
* OSC layer no longer tries to update cells which don't exist
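
For reference, a sketch of region counting that avoids this class of off-by-one. The real numRegions() is not shown in this log; the functions below are hypothetical Python equivalents that assume a region is a run of `chars_per_sync` consecutive cells.

```python
def num_regions(board_cells: int, chars_per_sync: int) -> int:
    """Number of regions needed to cover the board (ceiling division).

    A naive `board_cells // chars_per_sync + 1` yields one region too many
    whenever the board size is an exact multiple of chars_per_sync.
    """
    if board_cells < 0 or chars_per_sync <= 0:
        raise ValueError("board_cells must be >= 0 and chars_per_sync > 0")
    return (board_cells + chars_per_sync - 1) // chars_per_sync


def region_cells(region: int, board_cells: int, chars_per_sync: int) -> range:
    """Cells covered by `region`, clamped so a short last region never
    addresses cells that don't exist (and so never wraps to cell 0)."""
    start = region * chars_per_sync
    end = min(start + chars_per_sync, board_cells)
    return range(start, end)
```

For a 48-cell board with 8 chars per sync this gives 6 regions; a 50-cell board gives 7, with the last region covering only cells 48 and 49.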
|
| |
|
|
|
|
|
|
| |
Define proper interfaces for these things. Simplify osc_ctrl,
temporarily dropping support for emotes (they were broken anyway).
* Bugfix: Japanese no longer crashes transcribe.py, but it still doesn't
show up in the wxTextCtrl
|
| |
|
|
|
|
|
|
|
|
| |
Because we allow users to customize the # of sync params, the board is
no longer divided into regions of uniform size. When the last region is
a different size than the rest, we simply omit it from paging.
This is a hack but it's easy to reason about.
Of course the entire paging stack should be rewritten, but not today.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Users can now control how many characters they send per sync event, as
well as the number of bytes used to represent each character.
This gives them the power to pick between faster paging and fewer sync
params.
International users must use 2 bytes per char (at least for now).
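
The trade-off can be put in numbers with a small helper. This is illustrative arithmetic only: `sync_cost` is a hypothetical function, it ignores any fixed parameter overhead, and the board size in the example is made up.

```python
from typing import NamedTuple


class SyncCost(NamedTuple):
    bits_per_sync: int    # sync parameter bits spent on character data
    syncs_per_board: int  # sync events needed to page the whole board


def sync_cost(board_cells: int, chars_per_sync: int, bytes_per_char: int) -> SyncCost:
    """Estimate parameter bits used and sync events needed per full page."""
    if board_cells < 0 or chars_per_sync <= 0 or bytes_per_char <= 0:
        raise ValueError("arguments must be positive (board_cells >= 0)")
    bits = chars_per_sync * bytes_per_char * 8
    syncs = -(-board_cells // chars_per_sync)  # ceiling division
    return SyncCost(bits, syncs)
```

For a hypothetical 128-cell board, 8 chars per sync at 2 bytes per char and 16 chars per sync at 1 byte per char both cost 128 bits, but the second pages the board in 8 sync events instead of 16.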
* package.ps1: don't distribute the gigantic TTF files, just the bitmaps
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The typical use pattern is now possible without opening the radial menu.
Leaving the board mounted to the world for a long time is no longer
possible. Maybe I need an override param?
Left joystick controls:
* Short press toggle 1: show board, lock to hand, start transcribing
* Short press toggle 2: lock to world, stop transcribing
* Long press: hide board, stop transcribing
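
The controls above amount to a small state machine. A sketch, assuming the short press alternates based on whether transcription is currently running; the 1-second long-press threshold and all names are hypothetical.

```python
from dataclasses import dataclass

LONG_PRESS_S = 1.0  # hypothetical long-press threshold, in seconds


@dataclass(frozen=True)
class BoardState:
    visible: bool = False
    locked_to: str = "world"  # "hand" or "world"
    transcribing: bool = False


def on_release(state: BoardState, held_s: float) -> BoardState:
    """Return the next state when the left joystick is released after
    being held for `held_s` seconds."""
    if held_s >= LONG_PRESS_S:
        # Long press: hide board, stop transcribing.
        return BoardState(False, state.locked_to, False)
    if state.transcribing:
        # Short press toggle 2: lock to world, stop transcribing.
        return BoardState(True, "world", False)
    # Short press toggle 1: show board, lock to hand, start transcribing.
    return BoardState(True, "hand", True)
```

Usage: start from `BoardState()` and feed each release through `on_release`; two short presses followed by a long press walk through hand, world, and hidden in turn.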
|
|
|
GUI can now download all TaSTT dependencies and install them into a
virtual environment.
* Add buttons to check embedded python version & install dependencies
* Add class to wrap interacting with embedded Python
* Put all TaSTT python scripts into a folder
|