| Commit message (Collapse) | Author | Age |
| ... | |
| |
|
|
|
|
|
|
| |
* Pip install, dependency install, and model download can be gracefully
interrupted and resumed later.
* Mic list was pointing at freed memory. Fix this by copying the entries
onto the heap and holding them with std::unique_ptr. Mic list in CPP
panel is much more reliable now.
|
| |
|
|
| |
Not ready yet.
|
| | |
|
| |
|
|
|
|
|
|
|
|
| |
Sort of a misnomer. The idea is to use C++ for transcription and Python
for SteamVR and OSC.
Having issues getting output from multithreaded Python code. Not in the
mood to figure this out today.
* Hide unimplemented parts of C++ panel.
|
| |
|
|
| |
Simplifies debugging process.
|
| |
|
|
|
|
|
|
|
|
|
| |
Rapidyaml started refusing to parse config files so I dropped it.
* Add ConfigMarshal class to support very simple config marshalling
* No versioning, no type indicators, nothing.
* Supports int, bool, and string.
* Bools are serialized as ints.
* Log no longer segfaults if given nullptr wxTextCtrl*.
* Fix how whisper CPP GUI fields restore from config
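A minimal sketch of the marshalling scheme described above, in Python for brevity rather than the project's C++. The `key=value` line format and the names `dump_config` and `load_config` are assumptions; the commit only says that int, bool, and string are supported, that bools are stored as ints, and that the file carries no type indicators.

```python
def dump_config(values):
    """Serialize a flat dict of int/bool/str values, one `key=value` per line."""
    lines = []
    for key, value in values.items():
        if isinstance(value, bool):
            value = int(value)  # bools are written as 0 or 1
        elif not isinstance(value, (int, str)):
            raise TypeError(f"unsupported type for {key!r}: {type(value).__name__}")
        lines.append(f"{key}={value}")
    return "\n".join(lines)


def load_config(text, defaults):
    """Parse text produced by dump_config.

    The file has no type indicators, so the type of each entry in
    `defaults` decides how the raw string is interpreted.
    """
    result = dict(defaults)
    for line in text.splitlines():
        key, sep, raw = line.partition("=")
        if not sep or key not in defaults:
            continue  # skip malformed lines and unknown keys
        default = defaults[key]
        if isinstance(default, bool):  # check bool first: bool is a subclass of int
            result[key] = bool(int(raw))
        elif isinstance(default, int):
            result[key] = int(raw)
        else:
            result[key] = raw
    return result
```

Because the reader supplies the types, a renamed or retyped field silently falls back to its default. That is the cost of having no versioning. Strings containing newlines would also break this line-based format.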
|
| |
|
|
|
| |
* Implement HTTPMapper classes
* Browser source respects user-configured source port
|
| |
|
|
|
|
| |
Server needs to parse incoming HTTP.
* Server spawns a thread for each incoming connection
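A rough sketch of the parsing step, in Python rather than the project's C++. The function name `parse_request_head` is hypothetical, and this only handles the request line and headers of a well-formed HTTP/1.x request; it is not the server's actual parser.

```python
def parse_request_head(raw):
    """Split raw request bytes into (method, target, version, headers)."""
    # Everything before the first blank line is the request head.
    head, _, _ = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    parts = lines[0].split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {lines[0]!r}")
    method, target, version = parts

    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        # Header names are case-insensitive, so normalize to lowercase.
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers
```

In a thread-per-connection design, each connection thread would read from its socket until it sees the blank line, then hand the bytes to a function like this.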
|
| |
|
|
|
|
|
| |
oatpp was a crashy mess. Begin making a simple web server from scratch.
* Add Designs/ folder to document nontrivial things like the webserver
design
|
| |
|
|
|
|
| |
It's a crashy mess, but it sort of works.
* Add Transcript class to send transcription segments between layers
|
| |
|
|
|
| |
Browser source queries /api/transcript at 10 Hz via jQuery and renders
the response.
|
| |
|
|
|
|
| |
Documented in BrowserSource::Run().
* Parameterize Release/Debug in build scripts
|
| |
|
|
|
|
|
| |
Browser source can be started and stopped via the UI. It still serves a
hello-world JSON blob.
Observing occasional crashes when stopping the C++ transcription engine.
|
| |
|
|
|
| |
Synchronous multiprocessing layer now accepts a callback, which the
caller can use to stream output to the UI.
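The callback pattern can be sketched in Python with the standard `subprocess` module. This is only an analogue of the idea, not the project's C++ process layer, and `run_streaming` is a hypothetical name.

```python
import subprocess


def run_streaming(args, on_line):
    """Run a command to completion, calling on_line for each line of output.

    The call is still synchronous (it returns only when the child exits),
    but the caller sees output as it is produced instead of all at the end.
    """
    with subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave stderr with stdout
        text=True,
    ) as proc:
        for line in proc.stdout:
            on_line(line.rstrip("\n"))
    return proc.returncode
```

A UI would typically pass a callback that forwards each line to its log, for example `run_streaming(cmd, log)`.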
|
| |
|
|
|
|
| |
Not wired up yet.
* Add browser source fields to persistent config
|
| |
|
|
| |
* Fix oatpp fetch and build
|
| |
|
|
|
|
|
|
|
|
|
| |
Use raw WIN32 APIs to launch processes instead of wxProcess. This
enables spawning processes from arbitrary thread contexts, such as
std::async or std::thread.
In the future, this layer should be redone to support streaming output.
* TODO: update setting path. This is almost certainly broken for users
without git installed. Test in VM!
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It appears that you cannot spawn a wxProcess from an independent thread
of execution. I imagine they're supposed to be spawned from the main
thread.
Fuck that, I'm going to try to use the raw WIN32 API to spawn helper
processes, and do it from arbitrary thread context.
* Log() now delegates to a queue which the main thread periodically
drains.
* Log() now writes to a file.
* WhisperCPP thread is now done with std::async.
* Default chars per sync is now 8
* oatpp: Promising web framework.
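The queue-based Log() can be sketched as follows, in Python rather than the project's C++. The names `log` and `drain_log` and the `sink` callback are assumptions for illustration; in the real app the sink would be whatever appends text to the GUI.

```python
import queue

_log_queue = queue.Queue()


def log(message):
    """Safe to call from any thread: it only enqueues the message."""
    _log_queue.put(message)


def drain_log(sink, log_file=None):
    """Call periodically from the main (UI) thread.

    Pops every pending message, passes it to `sink`, and optionally
    appends it to an open file. Returns the number of messages drained.
    """
    drained = 0
    while True:
        try:
            message = _log_queue.get_nowait()
        except queue.Empty:
            return drained
        sink(message)
        if log_file is not None:
            log_file.write(message)
        drained += 1
```

Worker threads never touch the UI directly, which is the point of the design: only the thread that owns the widgets writes to them.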
|
| |
|
|
|
|
|
|
|
| |
* Filter out transcriptions like " (music)"
* Whisper mic choice auto-populates with queried values
* No more manually lining up numbers!
* Persist whisper mic in config
* Remove setup and dump mics button from Whisper page
* Redesign makes these unnecessary
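One way such a filter could look, sketched in Python. The exact rule is an assumption: the commit only gives " (music)" as an example, so this drops any segment that consists solely of one parenthesized or bracketed annotation and keeps everything else.

```python
import re

# Assumption: a segment that is nothing but a single "(...)" or "[...]"
# annotation, optionally surrounded by whitespace, contains no speech.
_NON_SPEECH = re.compile(r"\s*[\(\[][^\)\]]*[\)\]]\s*")


def is_non_speech(segment):
    """True if the whole segment is a single annotation like ' (music)'."""
    return _NON_SPEECH.fullmatch(segment) is not None


def filter_segments(segments):
    """Return only the segments that appear to contain speech."""
    return [s for s in segments if not is_non_speech(s)]
```

Annotations embedded in real speech, such as " so (laughs) anyway", are left alone because the pattern must match the entire segment.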
|
| |
|
|
|
|
|
|
| |
Use Const-me/Whisper to perform transcription. This implementation is
vastly more efficient: CPU usage, memory usage, and VRAM usage are all
dramatically reduced. It's slightly less accurate when comparing the
same model (due to the lack of beam search decoding), but since you can
use larger models, the impact is largely a wash.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When you generate Unity assets, you have to configure rows, cols,
chars per sync, and bytes per char. When you switch over to the
transcription panel, these choices will be automatically populated.
This should reduce accidental mismatch between the two panels.
* Merge Config classes. Now just use one big AppConfig class instead of
one class per panel.
* Factor out (most) input field initialization into a function. Call it
when switching panels so input fields synchronize.
* Wrap a lot of lines at 80 columns.
* Add -skip_zip switch to package.ps1.
|
| |
|
|
|
| |
It's much faster (and friendlier to upstream providers) to back up and
restore the venv instead of re-downloading every time.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Done:
* Users can add images to Fonts/Emotes/
* The basename of that image ('clueless.png' becomes 'clueless') is the
keyword to make the image show up in game.
* Fix a bug in the shader where letters on the 2nd texture and later
would have UV outside of [0.0, 1.0]
Not yet implemented:
* transcribed words are encoded using emotes mapping
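The filename-to-keyword rule can be sketched in Python with `pathlib`. The function name `emote_keywords` and the restriction to `.png` files are assumptions; the commit only shows the 'clueless.png' example.

```python
from pathlib import Path

# Assumption: only .png is shown in the commit message; other image
# formats may or may not be accepted by the real tool.
IMAGE_SUFFIXES = {".png"}


def emote_keywords(emotes_dir):
    """Map keyword -> image path for each image directly inside emotes_dir.

    The keyword is the file's basename without its extension, so
    'clueless.png' yields the keyword 'clueless'.
    """
    return {
        path.stem: path
        for path in sorted(Path(emotes_dir).iterdir())
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    }
```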
|
| |
|
|
|
|
|
|
|
|
|
|
| |
VRC SDK does not correctly regenerate OSC configs when adding
parameters to an avatar, causing the custom text box to be
non-functional for new users. This checkbox clears configs,
forcing the SDK to fully regenerate them on upload.
* Make start/stop transcription buttons bigger so they're easier to
click in VR.
* Fix a couple tooltip messages.
* Tooltips take much longer to disappear.
|
| |
|
|
|
| |
Based on screenshots seen in Discord. This filter is just here to
maximize user privacy while debugging.
|
| |
|
|
|
|
|
|
| |
Add debug panel with options to show installed packages, clear the pip
cache, reset venv, and clear OSC configs.
* Refactor synchronous command execution + logging pattern inside
PythonWrapper
|
| |
|
|
|
|
|
|
|
|
|
|
| |
I was using this file to constrain the set of paths that Python can see,
but since `future` doesn't have a wheel, it will fail to install on a
fresh system.
If you set pip's --cache-dir to some new directory, you'll see it fail
to install.
The _pth doesn't really seem to matter, since without it, packages are
still installed under the virtual environment.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
TaSTT shader now uses physically based rendering (PBR). Users can pick
smoothness, metallic, and emissive.
This implementation borrows heavily from catlikecoding.com's excellent
tutorials, which are released under MIT No Attribution (MIT-0).
https://catlikecoding.com/unity/tutorials/license/
To retain what little clarity remains in the shader, I have chosen not
to attribute the code in the source itself.
|
| |
|
|
|
|
| |
We use a button to start/stop transcription. Previously this was
hardcoded to left joystick. Now users can pick from {left, right} x
{joystick, a, b}.
|
| |
|
|
|
|
|
| |
This seems to be the canonical way of listing a Python app's
dependencies.
* Installing dependencies no longer hangs the GUI
|
| |
|
|
| |
Whisper doesn't like 0.18.3, so downgrade to the last version.
|
| |
|
|
|
|
|
| |
Don't literally check in Python since it looks dodgy (rightfully so).
Instead the build script just fetches it.
* Update README, simplifying language and documenting other projects
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
VRChat exposes a built-in chatbox which can be seen by anyone who has
it enabled. This was not the case when I started this project: the
chatbox would only be visible to friends. Since the built-in chatbox
makes STT usable on public models, let's support sending data to it.
Caveats:
* The built-in chatbox has anti-spam tech which limits us to updating
about once every 2 seconds. The custom chatbox has no such limitation
and is thus typically much faster.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
I found that I tend to regenerate the animator on the same avatar a lot,
requiring me to re-enter the same paths and parameters over and over
again. Persist them across restarts.
* Refactor Config classes
* Use safe `get_if` instead of the exception-throwing `operator>>` when
deserializing from YAML
* Begin sketching out Log singleton
* Put Quote() and Unquote() into their own little lib; they shouldn't
hide inside PythonApp
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The configuration of the transcription app, such as the number of rows
and columns in the text box, now persists across app restarts. I found
that I would have to change from the defaults to my preferred config
every time I started up in VR, which was annoying. Now we just start
with the config that was set last time.
* Add dependency on rapidyaml (MIT)
* Serialize transcription config to file under Resources/
* Add Config class to wrap serializing/deserializing
* Update build instructions
* Simplify StartApp() API, taking Config struct instead of a ton of
arguments
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, paths containing spaces would be interpreted by Python's argument
parser as multiple separate arguments, causing it to fail. Now we escape paths
inside PythonWrapper using std::quoted().
* Improve PII filtering. Python output would contain multiple path separators
(like C:\\Users\\foo\\), defeating the PII regex.
* Silence compiler warning in PII filter.
* Document usability improvements.
* Transcription layer exponential backoff goes to ~infinity when paused.
This is a hack, since we really don't need to transcribe at all when paused,
but it lets us keep the code simple. Good enough until the next rewrite.
* Shader only samples background when necessary.
* Limit matchStrings() print()s to DEBUG mode
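To illustrate the PII filtering point, here is a Python sketch of a redaction regex that tolerates repeated path separators. It assumes the PII in question is the Windows user name inside `C:\Users\...` paths; the pattern and the name `redact_user` are illustrative, not the project's actual filter.

```python
import re

# Match a drive letter, then "Users", then the user name, allowing one or
# more separators ("\", "\\", or "/") between the path components.
# Limitation of this sketch: user names containing spaces are only
# partially redacted.
_USER_DIR = re.compile(r"(?i)([a-z]:[\\/]+users[\\/]+)[^\\/\s]+")


def redact_user(text, placeholder="*****"):
    """Replace the user name in Windows user-directory paths."""
    # A function is used for the replacement so the backslashes in the
    # captured prefix are not treated as escape sequences.
    return _USER_DIR.sub(lambda m: m.group(1) + placeholder, text)
```

A pattern that expects exactly one backslash between components would miss the doubled separators that appear in Python's escaped output, which is the failure the commit describes.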
|
| |
|
|
|
| |
* Expose option to run transcription engine on CPU instead of GPU
* Use embedded git when setting up the Python virtual environment
|
| |
|
|
|
|
|
| |
package.ps1 fetches PortableGit and embeds it in the package. This
eliminates all but one runtime dependency (MSVC++ Redistributable).
* Move Python into a new FOSS folder.
|
| |
|
|
| |
Update build instructions.
|
| |
|
|
|
|
|
|
| |
Boards whose size is an exact multiple of CHARS_PER_SYNC would lose the
entire last region.
* Attempt to fix runaway memory usage of GUI text frames, but this needs
more work
|
| |
|
|
| |
The defaults now reflect what I typically use.
|
| |
|
|
|
| |
Users can pick longer transcription durations for accuracy-critical
tasks, or shorter durations for latency-critical tasks.
|
| |
|
|
|
|
|
|
| |
Users can now control how many letters wide and tall the board is.
Tested at 4x48, 5x60, 10x120, and 20x240. At 20x240, Unity freezes and
does not make forward progress. Perhaps creating 4800 float parameters
isn't a truly scalable interface.
|
| |
|
|
|
| |
Users can now see the number of avatar parameter bits they'll use
prior to committing.
|
| |
|
|
|
|
|
|
|
| |
An off-by-one issue in numRegions() would result in one extra layer
trying to drive a letter in the last region, which would wrap back
around to the 0th character slot (cell).
* GUI explicitly logs when it's done generating avatar stuff
* OSC layer no longer tries to update cells which don't exist
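The region arithmetic can be sketched in Python as below. This assumes a region is a run of `chars_per_sync` consecutive cells and that the board has `rows * cols` cells. It is an illustration of the rounding and clamping involved, not the project's numRegions().

```python
def num_regions(rows, cols, chars_per_sync):
    """Number of regions needed to cover every cell, rounding up."""
    cells = rows * cols
    # Integer ceiling division: exact multiples get no extra region, and
    # a partial final region still counts as one.
    return (cells + chars_per_sync - 1) // chars_per_sync


def cells_in_region(rows, cols, chars_per_sync, region):
    """Cell indices driven by one sync of `region`.

    The end is clamped to the board size, so a partial last region never
    addresses a cell that does not exist (and never wraps to cell 0).
    """
    cells = rows * cols
    start = region * chars_per_sync
    return list(range(start, min(start + chars_per_sync, cells)))
```

For a 4x48 board (192 cells) with 10 chars per sync, this gives 20 regions, and the last region covers only cells 190 and 191.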
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The transcription engine beeps when you start/stop transcribing so you know
that it's listening. Users can now disable this.
* add help text to all input fields in GUI
* make TaSTT generated file textctrls readonly, since I haven't tested
them being reassigned
* document idea to configure unity & transcription apps with config files
* controller input thread no longer crashes if SteamVR isn't running; it just
slowly spins and waits
* when you stop transcribing, the transcription engine re-transcribes a few
times. I think this should improve end-of-transcription tail latencies
* transcribe.py now prints out its args
|
| |
|
|
|
| |
The transcribe panel was grabbing data from the unity panel, causing the
bytes per char / chars per sync parameters to be ignored.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Users can now control how many characters they send per sync event, as
well as the number of bytes used to represent each character.
This gives them the power to pick between faster paging and fewer sync
params.
International users must use 2 bytes per char (at least for now).
* package.ps1: don't distribute the gigantic TTF files, just the bitmaps
|
| |
|
|
|
| |
This makes incremental workflows much more efficient, since you don't
have to reassign the FX controller, params, and menu.
|