Commit message | Author | Age
* Tweak paging logic (tag: v0.1) | yum | 2022-12-31
    Re-paging anything on screen N causes screens N+1...infinity to completely
    re-page. This fixes cases where we go back and draw something at the bottom
    of the board, and it never gets overwritten.
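The re-paging rule can be pictured with a minimal Python sketch. `Pager`, `mark_sent`, `update`, and `pending` are hypothetical names, not TaSTT's actual interfaces, and the board is assumed to have a fixed number of screens. The sketch only shows that touching screen N invalidates N and everything after it.

```python
class Pager:
    """Tracks which screens must be re-sent (hypothetical model, not TaSTT code)."""

    def __init__(self, num_screens):
        # sent[i] is True when screen i is already up to date on the board.
        self.sent = [False] * num_screens

    def mark_sent(self, screen):
        self.sent[screen] = True

    def update(self, screen):
        # Re-paging screen N invalidates N and every later screen, so stale
        # text further down the board is always overwritten eventually.
        for i in range(screen, len(self.sent)):
            self.sent[i] = False

    def pending(self):
        # Screens that still need to be paged out, in board order.
        return [i for i, done in enumerate(self.sent) if not done]
```

With four screens all sent, calling `update(1)` leaves screens 1, 2, and 3 pending while screen 0 stays untouched.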
* Bugfix: regions truncate correctly at page boundaries | yum | 2022-12-30
    Boards whose size is an even multiple of CHARS_PER_SYNC would lose the
    entire last region.

    * Attempt to fix runaway memory usage of GUI text frames, but this needs
      more work
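The boundary case in this fix can be illustrated with a short Python sketch. `region_bounds` is a hypothetical helper, not the project's code. `board_size` is assumed to be the total number of character cells, and `chars_per_sync` stands in for CHARS_PER_SYNC.

```python
def region_bounds(board_size, chars_per_sync):
    """Return (start, end) cell ranges that cover the whole board.

    The last range is shorter when board_size is not a multiple of
    chars_per_sync, and is a full region when it is an exact multiple.
    """
    return [
        (start, min(start + chars_per_sync, board_size))
        for start in range(0, board_size, chars_per_sync)
    ]
```

For a 48-cell board with 8 chars per sync this yields six full regions ending at cell 48, so the final region is kept rather than dropped.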
* GUI: Update chars per sync default | yum | 2022-12-30
    The defaults now reflect what I typically use.
* GUI: Expose transcription window duration | yum | 2022-12-30
    Users can pick longer transcription durations for accuracy-critical tasks,
    or shorter durations for latency-critical tasks.
* Bugfix: regenerated FX layers now work on uploaded avatars | yum | 2022-12-30
    VRChat won't update the FX layer associated with an avatar unless its GUID
    changes. Delete the GUID file when overwriting our generated FX layer to
    work around this.

    * Change paging behavior: when a region is updated, we re-page everything
      that comes after it. This fixes the issue where we go back to update
      something, then jump back to the current screen, leaving some random
      chunk of text somewhere on the board.
    * Reduce transcription time from 28s to 10s. I'm going to expose this to
      the user since there's a fundamental latency/stability tradeoff here.
* Fine-tune transcription | yum | 2022-12-30
    Bump up recording window to 28 seconds. This helps a lot with long-form
    transcription tasks, such as transcribing an audiobook. We should expose
    this as a parameter, since at 10s the transcription delay is typically
    300ms, while at 28s it's typically 1.1-1.2s.
* GUI: Users can now control board dimensions | yum | 2022-12-29
    Users can now control how many letters wide and tall the board is.

    Tested at 4x48, 5x60, 10x120, and 20x240. At 20x240, Unity freezes and
    does not make forward progress. Perhaps creating 4800 float parameters
    isn't a truly scalable interface.
* Add Scripts/generate_shader.py | yum | 2022-12-29
    Now it's possible to generate shaders with a custom number of rows,
    columns, and bytes per character.

    All edits to the shader should go through TaSTT_template.shader.

    To generate a new shader from the template:

        $ ./Scripts/generate_shader.py \
            --bytes_per_char 2 \
            --rows 1 \
            --cols 12 \
            --shader_template $(pwd)/Shaders/TaSTT_template.shader \
            --shader_path $(pwd)/Shaders/TaSTT.shader
* GUI: preview number of parameter bits the config will use | yum | 2022-12-29
    Users can now see the number of avatar parameter bits they'll use prior to
    committing.
* First letter no longer disappears | yum | 2022-12-29
    An off-by-one issue in numRegions() would result in one extra layer trying
    to drive a letter in the last region, which would wrap back around to the
    0th character slot (cell).

    * GUI explicitly logs when it's done generating avatar stuff
    * OSC layer no longer tries to update cells which don't exist
* Users can disable local beep | yum | 2022-12-29
    The transcription engine beeps when you start/stop transcribing so you
    know that it's listening. Users can now disable this.

    * add help text to all input fields in GUI
    * make TaSTT generated file textctrls readonly, since I haven't tested
      them being reassigned
    * document idea to configure unity & transcription apps with config files
    * controller input thread no longer crashes if steamvr isn't running, it
      just slowly spins and waits
    * when you stop transcribing, the transcription engine re-transcribes a
      few times. I think this should improve end-of-transcription tail
      latencies
    * transcribe.py now prints out its args
* Encapsulate paging & text wrapping logic | yum | 2022-12-27
    Define proper interfaces for these things. Simplify osc_ctrl, temporarily
    dropping support for emotes (they were broken anyway).

    * Bugfix: Japanese no longer crashes transcribe.py, but it still doesn't
      show up in the wxTextCtrl
* Bugfix: transcribe panel respects chars per sync etc. | yum | 2022-12-26
    The transcribe panel was grabbing data from the unity panel, causing the
    bytes per char / chars per sync parameters to be ignored.
* Bugfix: don't use last region if it's partial | yum | 2022-12-25
    Because we allow users to customize the number of sync params, the board
    is no longer divided into regions of uniform size. When the last region is
    a different size than the rest, we simply omit it from paging.

    This is a hack but it's easy to reason about. Of course the entire paging
    stack should be rewritten, but not today.
* Touch up TaSTT.shader | yum | 2022-12-25
    Add a new shader to make the box a little prettier.

    * Reduce material slots required from 2 to 1
    * Add rounding to edge of box
* Make transcription sleeps interruptible | yum | 2022-12-24
    This reduces the expected delay to wake up the board & start transcribing
    from 750 milliseconds to 2.5 milliseconds.
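One common way to make a sleep interruptible in Python is to wait on a `threading.Event` with a timeout instead of calling `time.sleep`. The sketch below assumes that approach. `InterruptibleSleeper` is a hypothetical name, and this log does not say how transcribe.py actually implements it.

```python
import threading
import time


class InterruptibleSleeper:
    """Sleep that another thread can cut short (illustrative sketch only)."""

    def __init__(self):
        self._wake = threading.Event()

    def sleep(self, seconds):
        """Block for up to `seconds`; return True if woken early."""
        woken = self._wake.wait(timeout=seconds)
        # Reset so the next sleep() blocks again.
        self._wake.clear()
        return woken

    def wake(self):
        """Called from another thread, e.g. when the user unpauses."""
        self._wake.set()
```

A thread parked in `sleep(0.75)` returns almost immediately once `wake()` is called, instead of waiting out the full interval.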
* GUI: expose chars per sync, bytes per char | yum | 2022-12-24
    Users can now control how many characters they send per sync event, as
    well as the number of bytes used to represent each character. This gives
    them the power to pick between faster paging and fewer sync params.

    International users must use 2 bytes per char (at least for now).

    * package.ps1: don't distribute the gigantic TTF files, just the bitmaps
* Document encoding optimization | yum | 2022-12-22
    By sending encoded words rather than letters, we could speed up English
    paging rate by 2.5x over an optimized implementation.

    Word-encoded implementation:
      16 bits per word (capped at 64k possible words).
    Optimized char-based implementation:
      (5.7 chars per word) * (7 bits per char) == 39.9 bits per word
      2.5x slower than word encoding.
    Today's char-based implementation:
      (5.7 chars per word) * (16 bits per char) == 91.2 bits per word
      5.7x slower than word encoding.
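The arithmetic in this commit message can be checked with a few lines of Python. The constants are the ones quoted in the message; the function names are made up for illustration.

```python
CHARS_PER_WORD = 5.7  # average English word length used in the commit message
WORD_BITS = 16        # one 16-bit index per word (up to 65,536 words)


def bits_per_word(bits_per_char, chars_per_word=CHARS_PER_WORD):
    """Bits needed to send one average word, one character at a time."""
    return chars_per_word * bits_per_char


def slowdown_vs_word_encoding(bits_per_char):
    """How many times more bits a char-based scheme sends per word."""
    return bits_per_word(bits_per_char) / WORD_BITS
```

With 7 bits per char the ratio is 39.9 / 16, about 2.5; with 16 bits per char it is 91.2 / 16 = 5.7.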
* Update README.md (tag: v0.0) | yum | 2022-12-22
* Quick hack: don't exponentially back off when unpaused | yum | 2022-12-22
    This fixed some slowness I was seeing when waking up the STT. The right
    fix is to add interruptible sleeps. Let's fix this soon.
* Don't delete TaSTT_Generated | yum | 2022-12-21
    This makes incremental workflows much more efficient, since you don't have
    to reassign the FX controller, params, and menu.
* Add shader toggles | yum | 2022-12-21
    * Fix shader background rendering
    * Add ability to control margin size
    * Add ability to disable speech indicator
* GUI: Add better logging interface | yum | 2022-12-21
    Create printf-like interface for writing to wxTextCtrl objects.

    Also mask out PII. I wanted a way to not dox myself when recording demos,
    but I wound up making a second user on my PC to serve the same purpose.
    Maybe I'll delete the code later idk.
* Control tweak: introduce long/short hold behavior | yum | 2022-12-20
    The typical use pattern is now possible without entering radial. Leaving
    mounted to the world for a long time is no longer possible. Maybe I need
    an override param?

    Left joystick controls:
    * Short press toggle 1: show board, lock to hand, start transcribing
    * Short press toggle 2: lock to world, stop transcribing
    * Long press: hide board, stop transcribing
* Bugfix: animators may now include Unicode characters | yum | 2022-12-20
    Completed first end-to-end test on a third party avatar :)
* Check in `World Constraint.prefab` | yum | 2022-12-20
    Can simply drag this into hierarchy & update reset target.
* GUI: "Finish" avatar generation workflow | yum | 2022-12-20
    GUI now generates parameters & menu. Still need to handle write defaults.

    * Add capability to append to avatar parameters & menu
    * Install canned Unity assets, shaders, and fonts in avatar folder
    * Check in materials for ease of use
    * Bugfix: correctly label menu/parameters file pickers
* GUI can now generate animator | yum | 2022-12-20
    Still need to generate params & merge menus. Getting close....
* GUI: Begin work generating animator | yum | 2022-12-20
    The GUI can now generate guid.map and animations.
* GUI: Fix transcription output | yum | 2022-12-19
    Output now shows up in the textbox in ~real time. We do this by disabling
    Python's output buffering. This has a performance impact, but it should be
    negligible.

    * Fix crash when setting up python environment
    * UI tweak: text displays now expand with window
    * Fix how we merge transcribe.py; usually don't have to resort to SIGKILL,
      which loses stdout/stderr.
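The effect of disabling output buffering can be sketched as follows. The GUI itself is C++ and launches Python through wxProcess, so this Python parent process is only an illustration. Whether TaSTT uses the `-u` interpreter flag, the `PYTHONUNBUFFERED` environment variable, or explicit flushes is not stated here, so `-u` is an assumption, and `run_unbuffered` is a hypothetical helper.

```python
import subprocess
import sys


def run_unbuffered(script_args):
    """Run a Python child with unbuffered stdout and yield its lines as they arrive."""
    with subprocess.Popen(
        # -u forces the child's stdout and stderr to be unbuffered.
        [sys.executable, "-u", *script_args],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
    ) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")
```

Without `-u`, a child whose stdout is a pipe typically block-buffers its output, so a reader would see text only in large bursts or at exit.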
* GUI: Improve error logging | yum | 2022-12-19
    PythonWrapper correctly captures wxProcess stdout & stderr in sync and
    async execution modes.
* GUI: Sketch out Unity panel | yum | 2022-12-19
    Now there are two panels: one to run transcription, one to generate avatar
    assets.

    Also, getting mics & python version can no longer crash the app.
* Now it's possible to build the app from PowerShell | yum | 2022-12-18
    No more WSL dependencies!
* Add resource file header | yum | 2022-12-18
* Add ability to select model | yum | 2022-12-18
    * icon now works when pinned to taskbar
    * add model selection
    * add script to dump mic devices
    * whisper models now download into the virtual environment
* GUI: Add mic, language selection | yum | 2022-12-18
    Users can now select their mic & spoken language in the GUI.

    * pyaudio now samples at the mic rate, fixing an issue where frames would
      drop. We downsample in the callback by dropping frames.
    * add Sounds folder to package
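Downsampling by dropping frames amounts to keeping every k-th sample. The sketch below is a hypothetical illustration on a plain Python list, not the project's pyaudio callback. It assumes the mic rate is an integer multiple of the target rate; the 48000 and 16000 Hz figures in the usage note are example values only.

```python
def downsample_by_dropping(samples, mic_rate, target_rate):
    """Keep every (mic_rate // target_rate)-th sample and drop the rest."""
    if mic_rate % target_rate != 0:
        # Non-integer ratios need real resampling, which this sketch omits.
        raise ValueError("sketch only handles integer rate ratios")
    step = mic_rate // target_rate
    return samples[::step]
```

For example, going from 48000 Hz to 16000 Hz keeps one sample in three. Dropping samples without a low-pass filter can alias high frequencies, which is the usual trade-off of this shortcut.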
* GUI: Add ability to start & stop transcription engine | yum | 2022-12-17
* Finish python virtual env | yum | 2022-12-17
    GUI can now download all TaSTT dependencies and install them into a
    virtual environment.

    * Add buttons to check embedded python version & install dependencies
    * Add class to wrap interacting with embedded Python
    * Put all TaSTT python scripts into a folder
* Check in `future` package | yum | 2022-12-17
    I hit some issues installing Whisper and had to embed this package.

    I haven't taken the time to deeply understand what's going on. I think
    that embedded Python follows different rules about resolving module paths
    than regular system Python.

    Basically, `future`'s setup.py has a line like `import src`, where `src`
    is a module inside future (like `future/src/__init__.py`). This doesn't
    work unless we put that directory on the search path.
* Downgrade to Python 3.10.9 | yum | 2022-12-17
    Whisper needs Python < 3.11.
* Document embedded venv hack | yum | 2022-12-16
    Check in pip & modify embedded python to install to Lib and
    Lib/site-packages.

    Experimentally, packages may be installed with pip and do reside in
    Lib/site-packages. Hard to tell if this is also touching files outside the
    venv.
* Check in python 3.11 | yum | 2022-12-16
    License is included in source & distributable package.
* Refactor app | yum | 2022-12-16
    Create headers & implementation files for App and Frame.
* Add logo | yum | 2022-12-16
    * GUI now shows logo
    * Add package.ps1 to generate distributable application bundle
    * Rename ~GUI to GUI
    * Add ScopeGuard class
* Add GUI hello world | yum | 2022-12-15
    Literally just the wxWidgets hello world.

    ~GUI is named that way to prevent Unity from generating .meta files.

    Build instructions in ~GUI/README.md.
* Optimize transcription latency | yum | 2022-12-14
    Shave off ~500ms due to locking. Acquiring a threading.Lock takes hundreds
    of milliseconds and the global interpreter lock already takes care of most
    crashy race conditions, so just remove the locks.

    Avoid writing audio to disk, saving more time (and disk wear / IOPS).

    Add basic profiling to transcribe().

    Omit timestamps, since we don't use them (maybe we should!)

    Shorten noise indicators to 350ms.

    The whisper behavior where it repeats tokens causes certain transcriptions
    to take many seconds. I haven't thought about how to fix this, yet.
* Update README.md | yum | 2022-12-01
    Also decrease sync params & add a few more emotes.
* Add emotes | yum | 2022-11-26
    Add emotes.py. It accepts a list of images and creates a texture with 64
    total embedded images.

    The shader knows how to draw these into fixed 6-character-wide slots. Each
    slot must be aligned to a 6-character boundary. osc_ctrl has to pad with
    spaces to make this work.

    This whole patch is a little more complicated than it has any right to be,
    but my brain feels fuzzy and I don't know where to start fixing it, so I'm
    going to leave it shitty-but-functional for now.

    There's also some bug where writing a character into the 11th slot causes
    it to show up at the end of the board. I'll figure that out later, idk.

    I didn't include any of the emotes I use since I couldn't find any info on
    their licenses. I'm just banking on having a good workflow later on so
    people can add their own.
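The space padding described for osc_ctrl can be sketched as below. `pad_to_emote_slot` is a hypothetical helper, not the function in osc_ctrl; only the 6-character slot width comes from the commit message.

```python
EMOTE_WIDTH = 6  # characters per emote slot, per the commit message


def pad_to_emote_slot(text, emote_width=EMOTE_WIDTH):
    """Pad `text` with spaces so the next character starts on a slot boundary."""
    remainder = len(text) % emote_width
    if remainder == 0:
        return text
    return text + " " * (emote_width - remainder)
```

After padding, an emote appended to the text begins at an index that is a multiple of 6, which is the alignment the shader expects.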
* Add on/off sound indicator (local) | yum | 2022-11-25
    Now we have a visual and auditory indicator for transcription. The
    auditory indicator is only heard by the user, and can be used to reset the
    state of the board prior to displaying.
* Add scaling capability | yum | 2022-11-25
    Text box may be scaled up and down now.