path: root/GUI
Commit message / Author / Age
* Reduce texture memory usage for English speakers [v0.10.0] (yum, 2023-03-21)
| We used to populate 7 4k textures + 1 2k texture for all users. Now if the user has configured `bytes_per_char=1` in the Unity panel, we just populate a single 512x512 texture containing the first 128 ASCII characters. This reduces texture memory usage by 99.74%, from 134.67 MB to 340 KB.
* Fix _socket module not found issue (yum, 2023-03-21)
| Need python310._pth, specifically the 'import site' line, for embedded python + pip to get along.
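The `import site` requirement above can be sketched as a small patch step. This is a minimal illustration in Python, not code from the repository: the helper name `enable_import_site` is hypothetical, and the sample contents are assumed to resemble the `python310._pth` shipped with the stock embeddable distribution, where the line is commented out.

```python
def enable_import_site(pth_text: str) -> str:
    """Return _pth contents with the 'import site' line uncommented.

    Hypothetical helper for illustration; not taken from the repository.
    """
    out = []
    found = False
    for line in pth_text.splitlines():
        stripped = line.strip()
        # Matches both "#import site" and an already-enabled "import site".
        if stripped.lstrip("#").strip() == "import site":
            out.append("import site")
            found = True
        else:
            out.append(line)
    if not found:
        out.append("import site")
    return "\n".join(out) + "\n"


# Sample contents resembling a stock embeddable python310._pth (assumed).
STOCK_PTH = (
    "python310.zip\n"
    ".\n"
    "\n"
    "# Uncomment to run site.main() automatically\n"
    "#import site\n"
)
PATCHED_PTH = enable_import_site(STOCK_PTH)
```

With `import site` active, the embedded interpreter processes site-packages at startup, which is what lets pip-installed packages be found.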
* Begin work fixing venv setup (yum, 2023-03-09)
| If you don't have Python installed, venv setup will fail. Begin work fixing environment config so `pip install` uses vendored Python.
* Set PYTHONPATH in synchronous multiprocessing layer (yum, 2023-03-08)
| A user saw an error like `ModuleNotFoundError: No module named _socket`. StackOverflow blames this on PYTHONPATH, so let's try setting it.
| * Fix latent bug in Scripts/transcribe.py. PyAudio.open() positional parameters must be specified in correct order, even when telling it which parameter is which. *shrug*
* Expose more C++ whisper parameters in GUI (yum, 2023-03-08)
| Expose decode method, beam search parameters, and voice activity detection parameters in GUI.
| * Remove WhisperCPP::Init(), do it on launch instead.
| * Add float support to ConfigMarshal
* Silence virtual env setup PATH warnings (yum, 2023-03-06)
| Twofold approach:
| * All spawned processes have the desired path (new codepath)
| * Setup command silences the warning (old codepath)
* Animator generation and dumping mics no longer hang GUI [v0.9.0] (yum, 2023-03-05)
| Do these in a std::future.
| * SetAffinityMask() now returns a value on all control paths
* Implement thread affinity optimization for Python transcription engine (yum, 2023-02-28)
| A user pointed out that constraining the Python implementation to a single core does not affect visible latency. This seems true on my PC as well.
| * Reimplement Python transcription wxProcess as a std::async. App shutdown is much faster now.
* Bugfix: fix use-after-free in GetMicsImpl (yum, 2023-02-28)
| * Plumb beam search params into whisper cpp implementation (currently broken)
* Filter out more transcription noise [v0.8.2] (yum, 2023-02-26)
| Things like " (static)" and " *explosions*" were showing up a lot with ggml-medium.bin. Filter them out.
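A filter of this kind could look roughly like the sketch below. It is written in Python for illustration and is not the project's implementation; the regular expression and the function names are assumptions. It drops segments that consist solely of a parenthesized or asterisk-wrapped annotation.

```python
import re

# Matches a segment that is only an annotation, e.g. " (static)" or
# " *explosions*". The exact pattern is an assumption for illustration.
_NOISE = re.compile(r"^\s*(\([^()]*\)|\*[^*]*\*)\s*$")


def is_noise(segment: str) -> bool:
    """True if the whole segment is a bracketed or starred annotation."""
    return bool(_NOISE.match(segment))


def filter_segments(segments):
    """Keep only segments that contain actual speech."""
    return [s for s in segments if not is_noise(s)]
```

Anchoring the pattern at both ends keeps ordinary speech that merely contains parentheses, such as " hello (world)".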
* Improve behavior around VAD segmentation events (yum, 2023-02-26)
| Use forked Whisper implementation which has tweaks to reduce dropped words around the beginning of VAD segments.
| * Retain audio after VAD segmentation events
* CPP implementation refinements (yum, 2023-02-26)
| * Pip install, dependency install, and model download can be gracefully interrupted and resume later.
| * Mic list was pointing at freed memory. Fix this by copying into the heap with std::unique_ptr()s. Mic list in CPP panel is much more reliable now.
* Bugfix: C++ transcription engine should not launch OSC layer [v0.8.1] (yum, 2023-02-26)
| Not ready yet.
* Bugfix: add vendored git to PATH [v0.8.0] (yum, 2023-02-26)
|
* Begin work on C++ custom chatbox (yum, 2023-02-26)
| Sort of a misnomer. The idea is to use C++ for transcription and Python for steamvr and OSC. Having issues getting output from multithreaded Python code. Not in the mood to figure this out today.
| * Hide unimplemented parts of C++ panel.
* Convert most PythonWrapper wxLogError() to Log() (yum, 2023-02-25)
| Simplifies debugging process.
* Drop ryml (yum, 2023-02-25)
| Rapidyaml started refusing to parse config files so I dropped it.
| * Add ConfigMarshal class to support very simple config marshalling
|   * No versioning, no type indicators, nothing.
|   * Supports int, bool, and string.
|   * Bools are serialized as int.
| * Log no longer segfaults if given nullptr wxTextCtrl*.
| * Fix how whisper CPP GUI fields restore from config
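To make the "no type indicators" point concrete, here is a minimal sketch of a marshaller with the properties listed above. It is in Python rather than the project's C++, and the `key=value` line format, the function names, and the field names are all assumptions. Because the file carries no type information, the loader takes each field's type from a dictionary of defaults.

```python
def dumps(config: dict) -> str:
    """Serialize int, bool, and str fields as key=value lines."""
    lines = []
    for key, value in config.items():
        if isinstance(value, bool):
            value = int(value)  # bools are written as 0 or 1
        elif not isinstance(value, (int, str)):
            raise TypeError(f"unsupported type for {key!r}: {type(value).__name__}")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def loads(text: str, defaults: dict) -> dict:
    """Parse key=value lines, using the type of each default to decode."""
    config = dict(defaults)
    for line in text.splitlines():
        key, sep, raw = line.partition("=")
        if not sep or key not in defaults:
            continue  # skip blank, malformed, or unknown lines
        kind = type(defaults[key])
        if kind is bool:
            config[key] = bool(int(raw))
        elif kind is int:
            config[key] = int(raw)
        else:
            config[key] = raw
    return config
```

One limitation of this sketch: string values containing newlines would not round-trip, since each field occupies exactly one line.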
* Complete OBS browser source (yum, 2023-02-25)
| * Implement HTTPMapper classes
| * Browser source respects user-configured source port
* Add HTTP parser (yum, 2023-02-25)
| Server needs to parse incoming HTTP.
| * Server spawns a thread for each incoming connection
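The parsing step can be illustrated with a short sketch that splits an HTTP/1.x request into request line, headers, and body. It is written in Python and is not the project's C++ parser; the function name and the sample request (including the port) are assumptions.

```python
def parse_request(raw: bytes):
    """Parse the request line and headers of an HTTP/1.x request.

    Returns (method, target, version, headers, body). Header names are
    lowercased because HTTP header names are case-insensitive.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Request line: METHOD SP request-target SP HTTP-version
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header line: {line!r}")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers, body
```

For a request such as `GET /api/transcript HTTP/1.1` followed by a `Host` header, this yields the method, the target path the server would route on, and a header dictionary. A production parser would also need to handle partial reads and size limits, which this sketch omits.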
* Begin work on custom webserver (yum, 2023-02-25)
| oatpp was a crashy mess. Begin making a simple web server from scratch.
| * Add Designs/ folder to document nontrivial things like the webserver design
* Finish browser source proof-of-concept (yum, 2023-02-24)
| It's a crashy mess, but it sort of works.
| * Add Transcript class to send transcription segments between layers
* Add HTML for BrowserSource (yum, 2023-02-24)
| Browser source queries /api/transcript at 10Hz via jquery and renders the response.
* Add hack to prevent browser source crash on shutdown (yum, 2023-02-24)
| Documented in BrowserSource::Run().
| * Parameterize Release/Debug in build scripts
* Wire up browser source (yum, 2023-02-23)
| Browser source can be started and stopped via the UI. It still serves a hello world json blob. Observing occasional crashes when stopping the C++ transcription engine.
* Implement streaming output for synchronous multiprocessing layer (yum, 2023-02-23)
| Synchronous multiprocessing layer now accepts a callback, which the caller can use to stream output to the UI.
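The callback pattern described here can be sketched as follows. This is a Python stand-in using `subprocess`, not the project's C++ process layer; `run_streaming` and the sample child command are hypothetical. The call still blocks until the child exits, but each line of output is handed to the callback as soon as it arrives.

```python
import subprocess
import sys


def run_streaming(args, on_line):
    """Run a child process to completion, passing each output line to on_line.

    Returns the child's exit code. stderr is merged into stdout so the
    callback sees everything the child prints.
    """
    proc = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
    )
    for line in proc.stdout:
        on_line(line.rstrip("\n"))
    return proc.wait()


# Usage: collect output line by line; a GUI would append to a text control.
collected = []
exit_code = run_streaming(
    [sys.executable, "-c", "print('one'); print('two')"],
    collected.append,
)
```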
* Add input fields for browser source (yum, 2023-02-22)
| Not wired up yet.
| * Add browser source fields to persistent config
* Begin sketching out browser source (yum, 2023-02-22)
| * Fix oatpp fetch and build
* Finish reimplementing synchronous process layer (yum, 2023-02-22)
| Use raw WIN32 APIs to launch processes instead of wxProcess. This enables spawning processes from arbitrary thread contexts, such as std::async or std::thread. In the future, this layer should be redone to support streaming output.
| * TODO: update setting path. This is almost certainly broken for users without git installed. Test in VM!
* Checkpoint: begin work reimplementing processes (yum, 2023-02-22)
| It appears that you cannot spawn a wxProcess from an independent thread of execution. I imagine they're supposed to be spawned from the main thread. Fuck that, I'm going to try to use the raw WIN32 API to spawn helper processes, and do it from arbitrary thread context.
| * Log() now delegates to a queue which the main thread periodically drains.
| * Log() now writes to a file.
| * WhisperCPP thread is now done with std::async.
| * Default chars per sync is now 8
| * oatpp: Promising web framework.
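The queue-based logging change can be illustrated with a minimal sketch: worker threads only enqueue messages, and the main (UI) thread drains the queue on a timer, so UI widgets are only ever touched from one thread. This is Python for illustration, not the project's C++ `Log()`; the names `log` and `drain` are assumptions.

```python
import queue
import threading

_pending = queue.Queue()


def log(message: str) -> None:
    """Safe to call from any thread: only enqueues the message."""
    _pending.put(message)


def drain(sink) -> int:
    """Call periodically on the main thread; returns messages written."""
    count = 0
    while True:
        try:
            message = _pending.get_nowait()
        except queue.Empty:
            return count
        sink(message)
        count += 1


# Usage: a worker logs from its own thread, the main thread drains later.
worker = threading.Thread(target=lambda: [log(f"step {i}") for i in range(3)])
worker.start()
worker.join()
shown = []
drained = drain(shown.append)
```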
* Various refinements (yum, 2023-02-22)
| * Filter out transcriptions like " (music)"
| * Whisper mic choice auto-populates with queried values
|   * No more manually lining up numbers!
| * Persist whisper mic in config
| * Remove setup and dump mics button from Whisper page
|   * Redesign makes these unnecessary
* Begin work on C++ implementation (yum, 2023-02-22)
| Use Const-me/Whisper to perform transcription. This implementation is vastly more efficient: CPU usage, memory usage, and VRAM usage are all dramatically reduced. It's slightly less accurate when comparing the same model (due to the lack of beam search decoding), but since you can use larger models, the impact is largely a wash.
* Transcription and Unity input fields now auto-synchronize (yum, 2023-02-19)
| When you generate Unity assets, you have to configure rows/cols/chars per sync/bytes per char. When you switch over to the transcription panel, these choices will be automatically populated. This should reduce accidental mismatch between the two panels.
| * Merge Config classes. Now just use one big AppConfig class instead of one class per panel.
| * Factor out (most) input field initialization into a function. Call it when switching panels so input fields synchronize.
| * Wrap a lot of lines at 80 columns.
| * Add -skip_zip switch to package.ps1.
* Add venv backup/restore function (yum, 2023-02-13)
| It's much faster (and friendlier to upstream providers) to back up and restore the venv instead of re-downloading every time.
* Begin work adding emotes (yum, 2023-02-13)
| Done:
| * Users can add images to Fonts/Emotes/
| * The basename of that image ('clueless.png' becomes 'clueless') is the keyword to make the image show up in game.
| * Fix a bug in the shader where letters on the 2nd texture and later would have UV outside of [0.0, 1.0]
| Not yet implemented:
| * Transcribed words are encoded using emotes mapping
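The filename-to-keyword rule above amounts to stripping the directory and extension. A minimal Python sketch, assuming a hypothetical helper name and that keywords are used exactly as spelled in the filename:

```python
from pathlib import Path


def emote_keywords(filenames):
    """Map each emote keyword to its image file.

    The keyword is the file's basename without its extension, so
    'clueless.png' is triggered by the word 'clueless'.
    """
    return {Path(name).stem: name for name in filenames}
```

For example, `emote_keywords(["Fonts/Emotes/clueless.png"])` maps the keyword `clueless` to that path.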
* Add checkbox to clear OSC configs [v0.6.0] (yum, 2023-02-12)
| VRC SDK does not correctly regenerate OSC configs when adding parameters to an avatar, causing the custom text box to be non-functional for new users. This checkbox clears configs, forcing the SDK to fully regenerate them on upload.
| * Make start/stop transcription buttons bigger so they're easier to click in VR.
| * Fix a couple tooltip messages.
| * Tooltips take much longer to disappear.
* Improve PII filter (yum, 2023-02-06)
| Based on screenshots seen in Discord. This filter is just here to maximize user privacy while debugging.
* GUI: Add debug panel [v0.5.0] (yum, 2023-02-04)
| Add debug panel with options to show installed packages, clear the pip cache, reset venv, and clear OSC configs.
| * Refactor synchronous command execution + logging pattern inside PythonWrapper
* Delete python310._pth (yum, 2023-01-28)
| I was using this file to constrain the set of paths that Python can see, but since `future` doesn't have a wheel, it will fail to install on a fresh system. If you set pip's --cache-dir to some new directory, you'll see it fail to install. The _pth doesn't really seem to matter, since without it, packages are still installed under the virtual environment.
* Finish basic PBR shading (yum, 2023-01-25)
| TaSTT shader now uses physically based rendering (PBR). Users can pick smoothness, metallic, and emissive.
| This implementation borrows heavily from catlikecoding.com's excellent tutorials, which are released under MIT No Attribution (MIT-0).
| https://catlikecoding.com/unity/tutorials/license/
| To retain what little clarity remains in the shader, I have chosen not to attribute the code in the source itself.
* GUI: Add ability to choose button (yum, 2023-01-25)
| We use a button to start/stop transcription. Previously this was hardcoded to left joystick. Now users can pick from {left, right} x {joystick, a, b}.
* Use requirements.txt for Scripts/ (yum, 2023-01-25)
| This seems to be the canonical way of listing a Python app's dependencies.
| * Installing dependencies no longer hangs the GUI
* Bugfix: Use future 0.18.2 instead of 0.18.3 (yum, 2023-01-23)
| Whisper doesn't like 0.18.3, so downgrade to the previous version.
* package.ps1 now fetches all dependencies (yum, 2023-01-23)
| Don't literally check in Python since it looks dodgy (rightfully so). Instead the build script just fetches it.
| * Update README, simplifying language and documenting other projects
* Bugfix: shader now respects bytes per char [v0.3.1] (yum, 2023-01-22)
|
* Enable using built-in chatbox [v0.3] (yum, 2023-01-22)
| VRChat exposes a built-in chatbox which can be seen by anyone who has it enabled. This was not the case when I started this project: the chatbox would only be visible to friends. Since this is clearly useful, enabling the STT on public models, let's enable sending data to it.
| Caveats:
| * The built-in chatbox has anti-spam tech which limits us to updating about once every 2 seconds. The custom chatbox has no such limitation and is thus typically much faster.
* GUI: Save Unity input fields across app restarts (yum, 2023-01-22)
| I found that I tend to regenerate the animator on the same avatar a lot, requiring me to re-enter the same paths and parameters over and over again. Persist them across restarts.
| * Refactor Config classes
| * Use safe `get_if` instead of the exception-throwing `operator>>` when deserializing from YAML
| * Begin sketching out Log singleton
| * Put Quote() and Unquote() into their own little lib; they shouldn't hide inside PythonApp
* GUI: Persist transcription app config [v0.2] (yum, 2023-01-06)
| The configuration of the transcription app, such as the number of rows and columns in the text box, now persists across app restarts. I found that I would have to change from the defaults to my preferred config every time I started up in VR, which was annoying. Now we just start with the config that was set last time.
| * Add dependency on rapidyaml (MIT)
| * Serialize transcription config to file under Resources/
| * Add Config class to wrap serializing/deserializing
| * Update build instructions
| * Simplify StartApp() API, taking Config struct instead of a ton of arguments
* Bugfix: user-provided paths may now contain spaces (yum, 2023-01-04)
| Previously, paths containing spaces would be interpreted by python's argument parser as multiple separate arguments, causing it to fail. Now we escape paths inside PythonWrapper using std::quoted().
| * Improve PII filtering. Python output would contain multiple path separators (like C:\\Users\\foo\\), defeating the PII regex.
| * Silence compiler warning in PII filter.
| * Document usability improvements.
| * Transcription layer exponential backoff goes to ~infinity when paused. This is a hack, since we really don't need to transcribe at all when paused, but it lets us keep the code simple. Good enough until the next rewrite.
| * Shader only samples background when necessary.
| * Limit matchStrings() print()s to DEBUG mode
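The doubled-separator problem in the PII bullet can be illustrated with a regex that accepts one or more separators between path components. This is a Python sketch, not the project's C++ filter; the pattern, the `<user>` placeholder, and the function name are assumptions.

```python
import re

# One or more path separators, so both C:\Users\foo and C:\\Users\\foo match.
_SEP = r"[\\/]+"
# Drive letter, "Users" directory, then the user name to redact.
# Assumes the user name contains no whitespace or separators.
_USER_DIR = re.compile(rf"(?i)([a-z]:{_SEP}users{_SEP})[^\\/\s]+")


def redact_user(text: str) -> str:
    """Replace the user name in Windows profile paths with a placeholder."""
    # A function replacement avoids backslash-escape handling in the
    # replacement string, since the captured prefix contains backslashes.
    return _USER_DIR.sub(lambda m: m.group(1) + "<user>", text)
```

A pattern that expects exactly one backslash between components would miss escaped output such as `C:\\Users\\foo\\`, which is the failure the commit describes.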
* Portability bugfixes (yum, 2023-01-01)
| * Expose option to run transcription engine on CPU instead of GPU
| * Use embedded git when setting up the Python virtual environment
* Embed git in package (yum, 2023-01-01)
| package.ps1 fetches PortableGit and embeds it in the package. This eliminates all but one runtime dependency (MSVC++ Redistributable).
| * Move Python into a new FOSS folder.