path: root/transcribe.py
Commit message (author, date)
* Quiet down transcribe.py (yum, 2022-10-20)
  Also adjust continuous transcription algorithm to use leftmost minimum instead of rightmost. This prevents some cases where we generate longer and longer text.
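A minimal sketch of what "leftmost minimum instead of rightmost" means when several window positions tie for the lowest distance. The function names and the sample distances are hypothetical; the commit does not show the actual selection code.

```python
def leftmost_argmin(values):
    """Index of the smallest value; ties resolve to the earliest index."""
    if not values:
        raise ValueError("values must be non-empty")
    best = 0
    for i, v in enumerate(values):
        if v < values[best]:  # strict comparison keeps the earlier index on ties
            best = i
    return best


def rightmost_argmin(values):
    """Index of the smallest value; ties resolve to the latest index."""
    if not values:
        raise ValueError("values must be non-empty")
    best = 0
    for i, v in enumerate(values):
        if v <= values[best]:  # non-strict comparison moves to the later index on ties
            best = i
    return best


# Hypothetical per-position distances with two equally good matches.
distances = [3, 1, 2, 1, 4]
left = leftmost_argmin(distances)
right = rightmost_argmin(distances)
```

The only difference is the comparison operator, so the two variants disagree only when the minimum distance occurs more than once.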
* Add continuous transcription mode (yum, 2022-10-17)
  Algorithm:
  * look at last 20 chars of last committed transcription
  * scan new transcription using 10-char sliding window
  * find spot where distance is minimized
  * stitch two messages together

  Thus we're able to maintain a continuously growing transcription without having to feed the AI more than 30 seconds of data at a time. Seems to work reasonably well in bench tests.

  Also fix silence detection. AI exposes a probability that nothing was said. Hand-pick a probability of 0.1. Sometimes the AI still goes sicko mode with this setting but going higher occasionally results in no transcription.
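The stitching steps above could be sketched roughly as follows. This is an illustration under assumptions, not the code in transcribe.py: the commit does not name the distance metric (Levenshtein edit distance is assumed here), it does not say how the 20-char lookback and the 10-char window interact (a single `window` length is used for both), and `stitch` and `levenshtein` are hypothetical names.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard row-by-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def stitch(committed: str, new: str, window: int = 10) -> str:
    """Append to `committed` the part of `new` that follows the best match."""
    tail = committed[-window:]
    if not tail or len(new) < len(tail):
        # Nothing to align against; plain concatenation is an arbitrary fallback.
        return committed + new
    best_pos, best_dist = 0, None
    for pos in range(len(new) - len(tail) + 1):
        dist = levenshtein(tail, new[pos:pos + len(tail)])
        if best_dist is None or dist < best_dist:  # keeps the first minimum found
            best_pos, best_dist = pos, dist
    return committed + new[best_pos + len(tail):]


merged = stitch("the quick brown fox jumps",
                "brown fox jumps over the lazy dog")
```

Because only the tail of the committed text is compared, the cost per update depends on the length of the new transcription rather than on the full history.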
* Add libunity.addTransition (yum, 2022-10-15)
  * Implement basic board toggle using new transition logic
  * Metadata can now restore from file
* Transcribe.py now pages (yum, 2022-10-15)
  Messages longer than a board will automatically write over the top.

  TODO
  * Real cell-based message diffing
  * Cumulative transcription
    * this would completely mitigate the effects of trim events
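One way to picture paging is to let the character index wrap back to the first cell once the board is full. The sketch below assumes a fixed grid of single-character cells; the board dimensions and both function names are hypothetical and not taken from transcribe.py.

```python
def cell_position(index: int, rows: int, cols: int):
    """Map a character index to (page, row, col) on a rows x cols board."""
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    page, offset = divmod(index, rows * cols)
    row, col = divmod(offset, cols)
    return page, row, col


def paginate(message: str, rows: int, cols: int):
    """Split a message into board-sized chunks; each chunk starts at the top."""
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    size = rows * cols
    return [message[i:i + size] for i in range(0, len(message), size)]


# A 2 x 3 board holds 6 characters, so a 10-character message needs 2 pages.
pages = paginate("abcdefghij", rows=2, cols=3)
position = cell_position(7, rows=2, cols=3)
```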
* Further improve transcribe.py responsiveness (yum, 2022-10-15)
  Add a third heuristic. If the transcription is relatively long and the first bit differs from the previous transcription, immediately overwrite. Because the transcription is long, it's a bit less likely to be a complete mistranscription.
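A rough sketch of the third heuristic. The commit gives no numbers for "relatively long" or "the first bit", so `min_length` and `head_length` are made-up thresholds, and `should_overwrite` is a hypothetical name.

```python
def should_overwrite(previous: str, current: str,
                     min_length: int = 40, head_length: int = 10) -> bool:
    """True when `current` is long and starts differently from `previous`."""
    is_long = len(current) >= min_length
    head_differs = current[:head_length] != previous[:head_length]
    return is_long and head_differs


long_and_different = should_overwrite(
    "hello there general",
    "a completely different and fairly long transcription",
    min_length=40,
)
short_and_different = should_overwrite("hello there", "goodbye", min_length=40)
```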
* Tweak transcribe.py (yum, 2022-10-15)
  Slightly improve temporal stability and responsiveness at the cost of limiting to a 30 second recording.

  Before committing to a transcription, wait for two consecutive transcriptions such that they are identical, or the former is a prefix of the latter. This helps with temporal stability by eliminating most one-off wildly inaccurate transcriptions.

  Also make osc_ctrl.sendMessageLazy a little lazier, limiting it to 2 consecutive non-empty cells per call. This allows us to recover from mistranscriptions faster.
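The commit rule (two consecutive transcriptions that are identical, or where the former is a prefix of the latter) could be sketched as a small gate. `StabilityGate` is a hypothetical name. Returning the latter transcription and ignoring an empty former transcription are assumptions of this sketch.

```python
class StabilityGate:
    """Releases a transcription only after the previous one agrees with it."""

    def __init__(self):
        self._previous = None

    def offer(self, text: str):
        """Return `text` if the previous offer is a non-empty prefix of it, else None."""
        previous, self._previous = self._previous, text
        # str.startswith covers both cases: identical strings and strict prefixes.
        if previous and text.startswith(previous):
            return text
        return None


gate = StabilityGate()
results = [gate.offer(t) for t in
           ["hello", "hello world", "yellow word", "yellow word"]]
```

A one-off bad transcription is dropped because it has neither a predecessor that it extends nor a successor that extends it.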
* Fix animations: renamed prefab from CustomSTT to TaSTT (yum, 2022-10-15)
  Also:
  * Check in toggle on/off animations
  * Add toggle parameter
  * libunity bug: getUniqueId() was calling allocateId() incorrectly
  * Remove osc_ctrl `client` global
  * Fix transcribe.py text encoding
* Add ability to leave board in world (yum, 2022-10-11)
  * Add VRLabs' World Constraint as a submodule
  * Add animations for world constraint
  * Add toggles for board
  * Add libunity.py (no content yet)
  * Support >30s transcription
  * Add board FBX
* Introduce STT proof-of-concept (yum, 2022-10-03)
  Using OpenAI's whisper neural network, we can do local STT. Translation quality is good, system resource usage is minimal (1 GB VRAM), latency is much lower than cloud-based translation.

  * Add transcribe.py
    * Creates 3 threads:
      * One saves mic audio to a buffer
      * One passes mic audio to the STT
      * One sends the transcribed text to the board
    * Main thread listens for input. Press enter to start a new message.
  * Add osc_ctrl.sendMessageLazy, a simple diff-based message sending utility.
    * A little complexity: it only sends 1 empty cell per call, allowing us to quickly say new things without having to wait for the whole buffer to clear.
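A diff-based sender with a per-call cap on empty cells might look roughly like this. It is a sketch under assumptions, not the real osc_ctrl.sendMessageLazy: the board is modelled as a list of single characters, a space stands for an empty cell, and the `send` callback is a hypothetical stand-in for whatever actually transmits a cell update.

```python
EMPTY = " "


def send_message_lazy(board, message, send, max_empty=1):
    """Send only cells that differ from `board`, clearing at most `max_empty` per call.

    Mutates `board` to reflect what was sent and returns True once the board
    matches the (padded, truncated) message.
    """
    target = message.ljust(len(board), EMPTY)[:len(board)]
    empties_sent = 0
    for i, want in enumerate(target):
        if board[i] == want:
            continue  # the diff: unchanged cells cost nothing
        if want == EMPTY:
            if empties_sent >= max_empty:
                continue  # leave stale text for a later call
            empties_sent += 1
        send(i, want)
        board[i] = want
    return all(board[i] == target[i] for i in range(len(board)))


board = list("hello world")
sent = []
done = send_message_lazy(board, "hi", lambda i, c: sent.append((i, c)))
first_call = list(sent)

calls = 1
while not done:
    done = send_message_lazy(board, "hi", lambda i, c: sent.append((i, c)))
    calls += 1
```

In this example the new text reaches the board on the first call, while the leftover characters of the old message are cleared one cell at a time over the following calls.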