Users can now configure a keybind to start/stop/dismiss the STT when in
desktop mode. The default keybind is ctrl+x, since by default VRC
doesn't use 'x' for anything.
Fix accidental semicolon typo
Useful on devices with multiple GPUs, such as gaming laptops.
* Update GUI/README.md.
See comment for details.
* Update README
faster-whisper doesn't need it. This reduces the install size (with the
base.en model) from 6.00 GB to 1.70 GB.
* Use a single sampler in shader (enables using more than 16 textures)
* Minor legibility regression - need to improve AA.
* Enable backface culling in shader (minor performance win)
Affinity mask no longer affects performance. String matching is still
needed for temporal stability in fast-paced long-form transcription
tasks.
Depth was being calculated incorrectly, causing the text box to render
behind objects it is actually in front of.
* Fix package.ps1 compression. 7z was increasing file size, somehow.
I'm able to use the new code to show text in game. Not yet play-tested.
This is a much faster, lower-VRAM reimplementation of Whisper in Python.
Early testing is extremely promising: fast transcription speed,
extremely low resource usage (CPU/RAM/VRAM), high accuracy.
Intended to avoid accidentally releasing dirty environments.
Use `pip freeze` and `pip uninstall` to reset the venv to a near-default
state. Filter out `future` since we need to vendor it. If it ever gets
removed, the installation is borked.
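A minimal sketch of the filtering step, assuming the uninstall list is built from `pip freeze` output and then handed to `python -m pip uninstall -y`. The helper names and the `KEEP` set are hypothetical; only `future` is named above.

```python
import re

# Hypothetical: vendored distributions that must survive the reset.
# The commit message only names `future`.
KEEP = {"future"}


def normalize(name: str) -> str:
    # PEP 503-style normalization: lowercase, runs of "-", "_", "." become "-".
    return re.sub(r"[-_.]+", "-", name).lower()


def packages_to_uninstall(freeze_output: str, keep=KEEP) -> list:
    """Return names from `pip freeze` output that are not in `keep`."""
    keep_norm = {normalize(k) for k in keep}
    names = []
    for line in freeze_output.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and option lines such as `-e ...`
        # (editable installs are left alone in this sketch).
        if not line or line.startswith(("#", "-")):
            continue
        # Lines look like `name==version` or `name @ url`; keep only the name.
        name = re.split(r"==| @ ", line, maxsplit=1)[0].strip()
        if normalize(name) not in keep_norm:
            names.append(name)
    return names
```

Normalizing names before comparing means `Future==1.0` is still recognized as the vendored `future`.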
This dependency fails to install with the embedded python, so now it's
vendored.
Installing pip after wheel would result in wheel reinstalling, so we
also vendor pip.
This fixes issues where the transparent corners of the textbox render
in front of other materials, causing those other materials to skip
rendering.
* Update README.md with roadmap and avatar resource usage.
We used to populate 7 4k textures + 1 2k texture for all users.
Now if the user has configured `bytes_per_char=1` in the Unity
panel, we just populate a single 512x512 texture containing the
first 128 ASCII characters.
This reduces texture memory usage by 99.74%, from 134.67 MB to
340 KB.
We need python310._pth, specifically its `import site` line, for the
embedded Python and pip to get along.
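The embeddable Python package typically ships its `._pth` file with that line commented out as `#import site`. A hypothetical helper (not from this repo) that makes sure the line is active could look like this:

```python
def enable_import_site(pth_text: str) -> str:
    """Return `pth_text` with an active `import site` line (hypothetical helper).

    If the line exists but is commented out, it is uncommented; if it is
    missing entirely, it is appended.
    """
    out = []
    found = False
    for line in pth_text.splitlines():
        # Match `import site` whether or not it is commented out.
        if line.strip().lstrip("#").strip() == "import site":
            out.append("import site")
            found = True
        else:
            out.append(line)
    if not found:
        out.append("import site")
    return "\n".join(out) + "\n"
```

The function is idempotent, so it is safe to run on every setup pass before writing the file back next to `python.exe`.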
If you don't have Python installed, venv setup will fail. Begin work
fixing environment config so `pip install` uses vendored Python.
A user saw an error like `ModuleNotFoundError: No module named _socket`.
StackOverflow blames this on PYTHONPATH, so let's try setting it.
* Fix latent bug in Scripts/transcribe.py. PyAudio.open() arguments
must be passed in the correct positional order, even when they are
passed by name. *shrug*
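A sketch of giving spawned processes an explicit `PYTHONPATH`, assuming a list of vendored directories. The `child_env` helper and the directory list are hypothetical, and whether this cures the `_socket` error is exactly what the commit is trying out.

```python
import os


def child_env(vendored_dirs, base=None) -> dict:
    """Copy `base` (default: os.environ) and prepend `vendored_dirs` to PYTHONPATH.

    `vendored_dirs` is a hypothetical list of directories shipped with the app.
    The input mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    parts = [str(d) for d in vendored_dirs]
    if env.get("PYTHONPATH"):
        parts.append(env["PYTHONPATH"])  # keep any existing entries after ours
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env
```

The result would be passed as `env=` to `subprocess.Popen`, so only the child sees the modified path.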
Expose decode method, beam search parameters, and voice activity
detection parameters in GUI.
* Remove WhisperCPP::Init(), do it on launch instead.
* Add float support to ConfigMarshal
Twofold approach:
* All spawned processes have the desired path (new codepath)
* Setup command silences the warning (old codepath)
Do these in a std::future.
* SetAffinityMask() now returns a value on all control paths
A user pointed out that constraining the Python implementation to a
single core does not affect visible latency. This seems true on my
PC as well.
* Reimplement Python transcription wxProcess as a std::async.
App shutdown is much faster now.
* Plumb beam search params into whisper cpp implementation
(currently broken)
Things like " (static)" and " *explosions*" were showing up a lot with
ggml-medium.bin. Filter them out.
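One way to do this filtering, assuming a segment counts as noise when its whole stripped text is a single parenthesized, bracketed, or asterisk-wrapped annotation. The regex is an illustrative guess, not the project's actual filter.

```python
import re

# Assumption: a segment is noise if its entire stripped text is one
# (...), [...] or *...* annotation, e.g. " (static)" or " *explosions*".
_ANNOTATION = re.compile(r"\([^()]*\)|\[[^\[\]]*\]|\*[^*]*\*")


def is_annotation(segment: str) -> bool:
    return _ANNOTATION.fullmatch(segment.strip()) is not None


def filter_segments(segments):
    """Drop segments that consist only of a non-speech annotation."""
    return [s for s in segments if not is_annotation(s)]
```

Because `fullmatch` must cover the whole segment, mixed speech such as " (a) and (b)" is kept.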
Use a forked Whisper implementation with tweaks that reduce dropped
words around the beginning of VAD segments.
* Retain audio after VAD segmentation events
* Pip install, dependency install, and model download can be gracefully
interrupted and resumed later.
* Mic list was pointing at freed memory. Fix this by copying it onto the
heap with std::unique_ptrs. Mic list in CPP panel is much more
reliable now.
Not ready yet.
Sort of a misnomer. The idea is to use C++ for transcription and Python
for steamvr and OSC.
Having issues getting output from multithreaded Python code. Not in the
mood to figure this out today.
* Hide unimplemented parts of C++ panel.
Simplifies debugging process.
Rapidyaml started refusing to parse config files so I dropped it.
* Add ConfigMarshal class to support very simple config marshalling
  * No versioning, no type indicators, nothing.
  * Supports int, bool, and string.
  * Bools are serialized as ints.
* Log no longer segfaults if given nullptr wxTextCtrl*.
* Fix how whisper CPP GUI fields restore from config
|
* Implement HTTPMapper classes
* Browser source respects user-configured source port
Server needs to parse incoming HTTP.
* Server spawns a thread for each incoming connection
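A minimal sketch of the parsing step in Python, assuming the server has already read bytes from the socket up to the blank line that ends the request head. The function name and return shape are illustrative, not the project's C++ API.

```python
def parse_request(data: bytes):
    """Split an HTTP/1.x request head into (method, target, version, headers).

    Raises ValueError if the head is incomplete or malformed.
    """
    head, sep, _body = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("request head not terminated by a blank line yet")
    # ISO-8859-1 maps every byte to a character, so decoding cannot fail.
    request_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = request_line.split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {request_line!r}")
    method, target, version = parts
    headers = {}
    for line in header_lines:
        name, colon, value = line.partition(":")
        if not colon:
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return method, target, version, headers
```

A real server would keep reading until `\r\n\r\n` arrives and then honor `Content-Length` for any body. In Python, `socketserver.ThreadingTCPServer` gives the same thread-per-connection model.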
oatpp was a crashy mess. Begin making a simple web server from scratch.
* Add Designs/ folder to document nontrivial things like the webserver
design
It's a crashy mess, but it sort of works.
* Add Transcript class to send transcription segments between layers
Browser source queries /api/transcript at 10 Hz via jQuery and renders
the response.
Documented in BrowserSource::Run().
* Parameterize Release/Debug in build scripts
Browser source can be started and stopped via the UI. It still serves a
hello world json blob.
Observing occasional crashes when stopping the C++ transcription engine.
Synchronous multiprocessing layer now accepts a callback, which the
caller can use to stream output to the UI.
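The callback pattern can be sketched in Python with `subprocess`. The function name is hypothetical, and the project's own layer is C++ on raw WIN32 APIs. The child still runs synchronously, but each line reaches the caller as soon as it is read.

```python
import subprocess
import sys


def run_with_callback(args, on_line) -> int:
    """Run `args` to completion, calling `on_line` with each output line as it
    arrives (stdout and stderr merged). Returns the exit code."""
    with subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout
        text=True,
        bufsize=1,  # line-buffered on our side (text mode only)
    ) as proc:
        for line in proc.stdout:
            on_line(line.rstrip("\n"))
    # Leaving the `with` block waits for the process, so returncode is set.
    return proc.returncode


if __name__ == "__main__":
    # Example: stream a child Python's output to print(); -u disables its buffering.
    run_with_callback([sys.executable, "-u", "-c", "print('hello')"], print)
```

The child may still buffer its own output, so Python children are best started with `-u` if the UI should update promptly.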
Not wired up yet.
* Add browser source fields to persistent config
* Fix oatpp fetch and build
This reverts commit cece1ee8f1b985c2a89adb661dd02c6d44787f67.
This does *not* in fact result in improved temporal stability. It
makes things so unstable that even single-sentence messages never
stabilize.
Use raw WIN32 APIs to launch processes instead of wxProcess. This
enables spawning processes from arbitrary thread contexts, such as
std::async or std::thread.
In the future, this layer should be redone to support streaming output.
* TODO: update setting path. This is almost certainly broken for users
without git installed. Test in VM!
It appears that you cannot spawn a wxProcess from an independent thread
of execution. I imagine they're supposed to be spawned from the main
thread.
Fuck that, I'm going to try to use the raw WIN32 API to spawn helper
processes, and do it from arbitrary thread context.
* Log() now delegates to a queue which the main thread periodically
drains.
* Log() now writes to a file.
* WhisperCPP thread is now done with std::async.
* Default chars per sync is now 8
* oatpp: Promising web framework.
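The queue-based Log() described above can be sketched in Python. Names are hypothetical; in the app the drain would run from a wx timer on the main thread, and the sink would append to the log file and the text control.

```python
import queue
import threading

# Messages posted from any thread; only the main thread touches the sink/UI.
_log_queue = queue.Queue()


def log(message: str) -> None:
    """Safe to call from worker threads: only enqueues the message."""
    _log_queue.put(message)


def drain_log(sink) -> int:
    """Call periodically on the main thread (e.g. from a GUI timer).

    Passes every pending message to `sink` and returns how many were drained.
    """
    drained = 0
    while True:
        try:
            message = _log_queue.get_nowait()
        except queue.Empty:
            return drained
        sink(message)
        drained += 1


if __name__ == "__main__":
    workers = [threading.Thread(target=log, args=(f"worker {i}",)) for i in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    drain_log(print)
```

Worker threads never block on the UI. The cost is that messages appear only at the next drain tick.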
* Filter out transcriptions like " (music)"
* Whisper mic choice auto-populates with queried values
  * No more manually lining up numbers!
* Persist whisper mic in config
* Remove setup and dump mics buttons from Whisper page
  * Redesign makes these unnecessary
Use Const-me/Whisper to perform transcription. This implementation is
vastly more efficient: CPU usage, memory usage, and VRAM usage are all
dramatically reduced. It's slightly less accurate when comparing the
same model (due to the lack of beam search decoding), but since you can
use larger models, the impact is largely a wash.
Per the Whisper source code, this should result in better temporal
stability.
When you generate Unity assets, you have to configure
rows/cols/chars per sync/bytes per char. When you switch over to the
transcription panel, these choices will be automatically populated.
This should reduce accidental mismatch between the two panels.
* Merge Config classes. Now just use one big AppConfig class instead of
one class per panel.
* Factor out (most) input field initialization into a function. Call it
when switching panels so input fields synchronize.
* Wrap a lot of lines at 80 columns.
* Add -skip_zip switch to package.ps1.
Allows sustained exponential backoff when not transcribing. Used to cap
out at 1s.
* Add more items to README TODO list
* Adjust emote metadata
* Emotes bugfix: Non-existent emote map doesn't cause transcription
engine to bail out.
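A sketch of the backoff idea, with hypothetical names and numbers. Each idle poll doubles the sleep interval, and the cap (previously 1s, per this commit) is what bounds how far the backoff can grow.

```python
class IdleBackoff:
    """Exponential backoff for an idle polling loop (sketch; values hypothetical)."""

    def __init__(self, base: float = 0.1, cap: float = 10.0):
        self.base = base
        self.cap = cap
        self.delay = base

    def next_idle_delay(self) -> float:
        """Return how long to sleep after an idle poll, then double it (up to cap)."""
        delay = self.delay
        self.delay = min(self.delay * 2, self.cap)
        return delay

    def reset(self) -> None:
        """Call when transcription produces output so polling is fast again."""
        self.delay = self.base
```

Raising `cap` lets the idle loop back off for longer, while `reset()` keeps latency low once speech resumes.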