TaSTT.git - Free self-hosted STT for VRChat.

	Commit message	Author	Age
*	Filter out more transcription noise (tag: v0.8.2)	yum	2023-02-26
	Things like " (static)" and " explosions" were showing up a lot with ggml-medium.bin. Filter them out.
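A minimal sketch of this kind of filter, assuming it works by deleting known noise phrases from the transcribed text. The names `NOISE_PHRASES` and `filter_noise` are hypothetical and the phrase list only contains examples quoted in this log; the real filter may work differently.

```python
# Hypothetical noise filter. The phrase list only holds the examples quoted
# in the commit messages; the real list and matching rules are not shown here.
NOISE_PHRASES = (" (static)", " explosions", " (music)")


def filter_noise(text: str) -> str:
    """Remove every occurrence of a known noise phrase from `text`."""
    for phrase in NOISE_PHRASES:
        text = text.replace(phrase, "")
    return text


print(repr(filter_noise("hello (static) world")))
```

Plain substring removal is blunt: it would also strip " explosions" from a sentence that legitimately contains the word, so a real filter might only match whole segments.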
*	Improve behavior around VAD segmentation events	yum	2023-02-26
	Use forked Whisper implementation, which has tweaks to reduce dropped words around the beginning of VAD segments.
	* Retain audio after VAD segmentation events
*	CPP implementation refinements	yum	2023-02-26
	* Pip install, dependency install, and model download can be gracefully interrupted and resumed later.
	* Mic list was pointing at freed memory. Fix this by copying into the heap with std::unique_ptr. Mic list in CPP panel is much more reliable now.
*	Bugfix: C++ transcription engine should not launch OSC layer (tag: v0.8.1)	yum	2023-02-26
	Not ready yet.
*	Bugfix: add vendored git to PATH (tag: v0.8.0)	yum	2023-02-26
*	Begin work on C++ custom chatbox	yum	2023-02-26
	Sort of a misnomer. The idea is to use C++ for transcription and Python for SteamVR and OSC. Having issues getting output from multithreaded Python code. Not in the mood to figure this out today.
	* Hide unimplemented parts of C++ panel.
*	Convert most PythonWrapper wxLogError() to Log()	yum	2023-02-25
	Simplifies the debugging process.
*	Drop ryml	yum	2023-02-25
	Rapidyaml started refusing to parse config files, so I dropped it.
	* Add ConfigMarshal class to support very simple config marshalling
	  * No versioning, no type indicators, nothing.
	  * Supports int, bool, and string.
	  * Bools are serialized as int.
	* Log no longer segfaults if given nullptr wxTextCtrl.
	* Fix how whisper CPP GUI fields restore from config
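The marshalling rules listed above (int, bool, and string only; bools written as int; no versioning or type indicators) can be sketched roughly as follows. This is an illustration in Python, not the C++ ConfigMarshal class: the `key=value` line format and the names `marshal`, `unmarshal`, and `schema` are all assumptions.

```python
# Hypothetical sketch of a very simple config marshaller. The on-disk format
# (one `key=value` per line) is an assumption, not the format TaSTT uses.

def marshal(config: dict) -> str:
    """Serialize int, bool, and str values; bools are written as 0 or 1."""
    lines = []
    for key, value in config.items():
        if isinstance(value, bool):  # check bool first: bool subclasses int
            value = int(value)
        elif not isinstance(value, (int, str)):
            raise TypeError(f"unsupported type for {key!r}: {type(value).__name__}")
        lines.append(f"{key}={value}")
    return "\n".join(lines)


def unmarshal(text: str, schema: dict) -> dict:
    """Parse `text`; `schema` maps each key to int, bool, or str.

    With no type indicators in the file, the reader has to know each type.
    """
    config = {}
    for line in text.splitlines():
        if not line:
            continue
        key, _, raw = line.partition("=")
        kind = schema[key]
        if kind is bool:
            config[key] = bool(int(raw))
        elif kind is int:
            config[key] = int(raw)
        else:
            config[key] = raw
    return config


schema = {"rows": int, "use_cpu": bool, "mic": str}
text = marshal({"rows": 4, "use_cpu": True, "mic": "index"})
print(text)
print(unmarshal(text, schema))
```

A format this simple cannot store strings that contain newlines, which is the kind of limitation implied by "no versioning, no type indicators, nothing".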
*	Complete OBS browser source	yum	2023-02-25
	* Implement HTTPMapper classes
	* Browser source respects user-configured source port
*	Add HTTP parser	yum	2023-02-25
	Server needs to parse incoming HTTP.
	* Server spawns a thread for each incoming connection
*	Begin work on custom webserver	yum	2023-02-25
	oatpp was a crashy mess. Begin making a simple web server from scratch.
	* Add Designs/ folder to document nontrivial things like the webserver design
*	Finish browser source proof-of-concept	yum	2023-02-24
	It's a crashy mess, but it sort of works.
	* Add Transcript class to send transcription segments between layers
*	Add HTML for BrowserSource	yum	2023-02-24
	Browser source queries /api/transcript at 10 Hz via jQuery and renders the response.
*	Add hack to prevent browser source crash on shutdown	yum	2023-02-24
	Documented in BrowserSource::Run().
	* Parameterize Release/Debug in build scripts
*	Wire up browser source	yum	2023-02-23
	Browser source can be started and stopped via the UI. It still serves a hello world JSON blob. Observing occasional crashes when stopping the C++ transcription engine.
*	Implement streaming output for synchronous multiprocessing layer	yum	2023-02-23
	Synchronous multiprocessing layer now accepts a callback, which the caller can use to stream output to the UI.
*	Add input fields for browser source	yum	2023-02-22
	Not wired up yet.
	* Add browser source fields to persistent config
*	Begin sketching out browser source	yum	2023-02-22
	* Fix oatpp fetch and build
*	Revert "Apply previous window conditioning to decoding layer"	yum	2023-02-22
	This reverts commit cece1ee8f1b985c2a89adb661dd02c6d44787f67. This does not in fact result in improved temporal stability. It makes things so unstable that even single-sentence messages fail to ever stabilize.
*	Finish reimplementing synchronous process layer	yum	2023-02-22
	Use raw WIN32 APIs to launch processes instead of wxProcess. This enables spawning processes from arbitrary thread contexts, such as std::async or std::thread. In the future, this layer should be redone to support streaming output.
	* TODO: update setting path. This is almost certainly broken for users without git installed. Test in VM!
*	Checkpoint: begin work reimplementing processes	yum	2023-02-22
	It appears that you cannot spawn a wxProcess from an independent thread of execution. I imagine they're supposed to be spawned from the main thread. Fuck that, I'm going to try to use the raw WIN32 API to spawn helper processes, and do it from arbitrary thread context.
	* Log() now delegates to a queue which the main thread periodically drains.
	* Log() now writes to a file.
	* WhisperCPP thread is now done with std::async.
	* Default chars per sync is now 8
	* oatpp: Promising web framework.
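The "Log() delegates to a queue which the main thread periodically drains" pattern can be sketched as below. This is an illustration in Python rather than the C++/wxWidgets code; `log`, `drain_log`, and `sink` are hypothetical names, and the sink stands in for whatever the main thread writes to (a text control, a file).

```python
import queue
import threading

# Thread-safe queue shared by all threads. Worker threads only enqueue; they
# never touch the UI or the log file directly.
_log_queue: "queue.Queue[str]" = queue.Queue()


def log(message: str) -> None:
    """Safe to call from any thread."""
    _log_queue.put(message)


def drain_log(sink) -> int:
    """Call periodically from the main thread; returns the number drained."""
    drained = 0
    while True:
        try:
            message = _log_queue.get_nowait()
        except queue.Empty:
            return drained
        sink(message)
        drained += 1


# Usage: workers log concurrently, then the main thread drains in one pass.
workers = [threading.Thread(target=log, args=(f"worker {i}",)) for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
drain_log(print)
```

Keeping all output on the main thread avoids touching UI objects from worker threads, at the cost of log lines appearing only when the next drain runs.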
*	Various refinements	yum	2023-02-22
	* Filter out transcriptions like " (music)"
	* Whisper mic choice auto-populates with queried values
	  * No more manually lining up numbers!
	* Persist whisper mic in config
	* Remove setup and dump mics button from Whisper page
	  * Redesign makes these unnecessary
*	Begin work on C++ implementation	yum	2023-02-22
	Use Const-me/Whisper to perform transcription. This implementation is vastly more efficient: CPU usage, memory usage, and VRAM usage are all dramatically reduced. It's slightly less accurate when comparing the same model (due to the lack of beam search decoding), but since you can use larger models, the impact is largely a wash.
*	Apply previous window conditioning to decoding layer	yum	2023-02-22
	Per the Whisper source code, this should result in better temporal stability.
*	Transcription and Unity input fields now auto-synchronize	yum	2023-02-19
	When you generate Unity assets, you have to configure rows/cols/chars per sync/bytes per char. When you switch over to the transcription panel, these choices will be automatically populated. This should reduce accidental mismatch between the two panels.
	* Merge Config classes. Now just use one big AppConfig class instead of one class per panel.
	* Factor out (most) input field initialization into a function. Call it when switching panels so input fields synchronize.
	* Wrap a lot of lines at 80 columns.
	* Add -skip_zip switch to package.ps1.
*	Remove exponential backoff cap (tag: v0.7.0)	yum	2023-02-19
	Allows sustained exponential backoff when not transcribing. Used to cap out at 1s.
	* Add more items to README TODO list
	* Adjust emote metadata
	* Emotes bugfix: Non-existent emote map doesn't cause transcription engine to bail out.
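A rough sketch of what removing the cap means for the delay schedule. The function name, the base delay, and the doubling factor are assumptions for illustration; only the old 1s cap comes from the commit message.

```python
# Hypothetical backoff helper. `base` and `factor` are illustrative values;
# only the former 1 s cap is taken from the commit message.

def next_delay(current: float, transcribing: bool,
               base: float = 0.05, factor: float = 2.0,
               cap: "float | None" = None) -> float:
    """Return the next sleep interval in seconds."""
    if transcribing:
        return base  # reset to the shortest delay while audio is active
    delay = current * factor
    return delay if cap is None else min(delay, cap)


# Old behavior (capped at 1 s) versus new behavior (no cap) while idle.
print(next_delay(1.0, transcribing=False, cap=1.0))
print(next_delay(1.0, transcribing=False))
```

Without a cap, the idle delay keeps growing, which reduces wasted work while nobody is speaking.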
*	Update .gitignore files	yum	2023-02-13
*	Add venv backup/restore function	yum	2023-02-13
	It's much faster (and friendlier to upstream providers) to back up and restore the venv instead of re-downloading every time.
*	Add hack to reduce outlines around emotes	yum	2023-02-13
	Don't render any part of an emote with alpha < 0.5. Improves visual clarity in the common case at the cost of generality.
	* Emotes now use physically-based shading.
	* Use round() to denoise shader parameters instead of floor()
*	Finish emotes	yum	2023-02-13
	Emotes require 2 bytes per char. They're encoded into the region [0xE000, infinity). The texture is 4k, and uses 1k vertical pixels per emote segment, for a maximum of 32 segments.
	* Reduce volume of noise indicator by 90%. Quiet is probably better. Might want to add a volume slider idk.
	* Bugfix: emotes without a transparency channel now work
	* Address a couple Unity performance complaints about the shader
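One way the emote encoding could look, as a sketch under stated assumptions. The keyword-from-basename rule and the 0xE000 starting point come from the commit messages; assigning one code point per emote in sorted order, and the names `build_emote_map` and `encode_words`, are hypothetical.

```python
import os

EMOTE_BASE = 0xE000  # start of the region used for emotes, per the commit message


def build_emote_map(image_paths):
    """Map each image basename ('clueless.png' -> 'clueless') to a code point.

    Assumption: one code point per emote, assigned in sorted keyword order.
    """
    keywords = sorted(os.path.splitext(os.path.basename(p))[0] for p in image_paths)
    return {keyword: EMOTE_BASE + i for i, keyword in enumerate(keywords)}


def encode_words(text, emote_map):
    """Replace any word that is an emote keyword with its code point."""
    encoded = []
    for word in text.split(" "):
        code = emote_map.get(word)
        encoded.append(chr(code) if code is not None else word)
    return " ".join(encoded)


emotes = build_emote_map(["Fonts/Emotes/clueless.png", "Fonts/Emotes/aware.png"])
print({k: hex(v) for k, v in emotes.items()})
print(ascii(encode_words("so clueless", emotes)))
```

Code points at or above 0xE000 do not fit in one byte, which is consistent with the note that emotes require 2 bytes per char.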
*	Begin work adding emotes	yum	2023-02-13
	Done:
	* Users can add images to Fonts/Emotes/
	* The basename of that image ('clueless.png' becomes 'clueless') is the keyword to make the image show up in game.
	* Fix a bug in the shader where letters on the 2nd texture and later would have UV outside of [0.0, 1.0]
	Not yet implemented:
	* Transcribed words are encoded using emotes mapping
*	Add checkbox to clear OSC configs (tag: v0.6.0)	yum	2023-02-12
	VRC SDK does not correctly regenerate OSC configs when adding parameters to an avatar, causing the custom text box to be non-functional for new users. This checkbox clears configs, forcing the SDK to fully regenerate them on upload.
	* Make start/stop transcription buttons bigger so they're easier to click in VR.
	* Fix a couple tooltip messages.
	* Tooltips take much longer to disappear.
*	Shader now supports custom cubemap	yum	2023-02-06
	Applied to both PBS and TaSTT shaders.
*	Improve PII filter	yum	2023-02-06
	Based on screenshots seen in Discord. This filter is just here to maximize user privacy while debugging.
*	GUI: Add debug panel (tag: v0.5.0)	yum	2023-02-04
	Add debug panel with options to show installed packages, clear the pip cache, reset venv, and clear OSC configs.
	* Refactor synchronous command execution + logging pattern inside PythonWrapper
*	Built-in chatbox no longer shows empty messages	yum	2023-02-04
	* Reduce noise on/off indicator volume by 50%
*	Use bold font for English	yum	2023-01-31
	Looks more legible. Thanks Noppers for the feedback!
*	Do not use PBR shading on curve transparency	yum	2023-01-31
	Diffuse reflections can show up on this part.
*	Specify exact version for torch	yum	2023-01-31
	Ruling out possibilities for a user-reported bug.
*	Implement simple anti-aliasing	yum	2023-01-31
	Sample the texture up to 5 times using the algorithm shown in `aa_sample_algorithm.py`. Results are averaged together.
	* Redo dithering PRNG
*	Rework dithering	yum	2023-01-31
	I realized that ddx(i.uv.x) tells us how wide the current pixel is w/r/t UV coordinates. We can use this to implement a better form of dithering, which gets weaker as the viewer gets closer and stronger as they get farther.
	* Fine-tune mip map filtering based on play testing
*	Check in PBS, a very minimal physically-based shader	yum	2023-01-29
	Strip out everything except the PBS bits from the TaSTT shader and put them into a standalone shader.
*	Update hardware requirements (tag: v0.4.1)	yum	2023-01-28
	Deleting python310._pth causes a few more things to be installed in the venv.
*	Delete python310._pth	yum	2023-01-28
	I was using this file to constrain the set of paths that Python can see, but since `future` doesn't have a wheel, it will fail to install on a fresh system. If you set pip's --cache-dir to some new directory, you'll see it fail to install. The _pth doesn't really seem to matter, since without it, packages are still installed under the virtual environment.
*	Bugfixes (tag: v0.4.0)	yum	2023-01-27
	* Fix prefab: bounding box & position are now set to 0
	* Fix shader: text is no longer upside down
	* Update README
*	Enable texture-based PBR rendering of backplate	yum	2023-01-27
	Users can now use PBR textures on their custom backplate!
	* Update TaSTT.fbx: UV map aspect ratio matches board
*	Update README.md	yum	2023-01-26
	Document recent features, better explain basis of transcription.
*	Fix PBR metallics	yum	2023-01-26
	Metallics now reflect the map's cubemap.
	* Remove SpecularTint (did nothing)
	* Adjust mipBias to be sharper
*	Correct ddx/ddy calculation	yum	2023-01-26
	Need to calculate this in the space of letter UVs, not the overall text box UV space, in order for the correct mip maps to be chosen.
	* Expose dithering as a toggle in the shader
	* Actually generate mipmaps
	* Fine-tune mipmapBias for legibility
*	Improve font rendering	yum	2023-01-26
	* Enable streaming mipmaps on glyph bitmaps
	* Sample glyph bitmaps using mipmaps
	* Add temporal noise to letter UVs (dithering)