Commit message (author, date)
...
* Add GUI hello world (yum, 2022-12-15)
| Literally just the wxWidgets hello world.
|
| ~GUI is named that way to prevent Unity from generating .meta files.
|
| Build instructions in ~GUI/README.md.
* Optimize transcription latency (yum, 2022-12-14)
| Shave off ~500ms due to locking. Acquiring a threading.Lock was taking hundreds of milliseconds, and the global interpreter lock already takes care of most crashy race conditions, so just remove the locks.
|
| Avoid writing audio to disk, saving more time (and disk wear / IOPS).
|
| Add basic profiling to transcribe().
|
| Omit timestamps, since we don't use them (maybe we should!)
|
| Shorten noise indicators to 350ms.
|
| The whisper behavior where it repeats tokens causes certain transcriptions to take many seconds. I haven't thought about how to fix this yet.
* Update README.md (yum, 2022-12-01)
| Also decrease sync params & add a few more emotes.
* Add emotes (yum, 2022-11-26)
| Add emotes.py. It accepts a list of images and creates a texture with 64 total embedded images.
|
| The shader knows how to draw these into fixed 6-character-wide slots. Each slot must be aligned to a 6-character boundary. osc_ctrl has to pad with spaces to make this work.
|
| This whole patch is a little more complicated than it has any right to be, but my brain feels fuzzy and I don't know where to start fixing it, so I'm going to leave it shitty-but-functional for now.
|
| There's also some bug where writing a character into the 11th slot causes it to show up at the end of the board. I'll figure that out later, idk.
|
| I didn't include any of the emotes I use since I couldn't find any info on their licenses. I'm just banking on having a good workflow later on so people can add their own.
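The slot alignment described above could look roughly like the following. This is a minimal sketch, not the code in osc_ctrl: `pad_to_slot` and `append_emote` are hypothetical names, and the emote is represented by a 6-character stand-in string.

```python
EMOTE_WIDTH = 6  # slot width in characters, taken from the commit message


def pad_to_slot(text: str, width: int = EMOTE_WIDTH) -> str:
    """Pad text with spaces so its length is a multiple of `width`."""
    remainder = len(text) % width
    if remainder == 0:
        return text
    return text + " " * (width - remainder)


def append_emote(text: str, emote_cells: str) -> str:
    """Append an emote so that it starts on a slot boundary.

    `emote_cells` is a hypothetical 6-character stand-in for whatever
    the real controller sends to select an emote.
    """
    if len(emote_cells) != EMOTE_WIDTH:
        raise ValueError("emote must occupy exactly one slot")
    return pad_to_slot(text) + emote_cells
```

For example, `append_emote("hi there", "E" * 6)` pads the 8-character text to 12 characters, so the emote starts at index 12.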
* Add on/off sound indicator (local) (yum, 2022-11-25)
| Now we have a visual and auditory indicator for transcription.
|
| The auditory indicator is only heard by the user, and can be used to reset the state of the board prior to displaying.
* Add scaling capability (yum, 2022-11-25)
| Text box may be scaled up and down now.
* Code cleanup (yum, 2022-11-25)
| Reorganize locations, remove a couple unused parameters.
* Tweak speech indicator (yum, 2022-11-23)
| Use a single indicator with 3 states:
|   1. green: actively speaking
|   2. orange: waiting for paging
|   3. red: up-to-date
|
| Use slightly nicer colors.
* Shorten audio window to 10 seconds (yum, 2022-11-22)
| This helps with temporal stability in long-running transcriptions, and lets us get rid of that hack where we refuse to update old pages.
* Update STT demo (yum, 2022-11-22)
| Zero mistranscribed words. One minor hiccup caused by instability in the (very long) transcription. I think the paging indicator is also buggy.
* Fix audio bug (yum, 2022-11-22)
| Coarse locking was causing audio frames to drop, severely degrading transcription quality.
|
| We really need a spoken word integration test.
* Rework input controls (yum, 2022-11-22)
| Press joystick once to start recording, again to stop. When you start recording, any previous text on the board is cleared.
|
| Add 2 visual indicators: one to indicate speech, another to indicate that audio is paging.
* Begin work on obfuscation (yum, 2022-11-17)
| The basic idea is that we can raise the barrier to entry for potential data miners by encrypting traffic with a pre-shared key.
|
| Any data miner would probably have access to both the compiled shader and network data, which is obviously sufficient to decrypt that data. But they would have to spend a little time figuring it out, which should defeat most casual miners.
* Tweak transcription again (yum, 2022-11-16)
| Works a little better on longer transcriptions while maintaining the same improved performance on short transcriptions.
|
| We really need a benchmark to evaluate performance mechanically.
* Another transcription rework (yum, 2022-11-14)
| After re-reading the paper, I noticed that they apply a couple optimizations I wasn't using.
|
| Use the top-level `whisper.transcribe` method, which is a little slower, but more accurate than the one I was using.
|
| Although this method is slower, it has better temporal stability due to the increased quality, which I think should make for an overall more responsive UX. Lower transcription quality means the paging layer has to waste time updating earlier cells.
|
| Also, drop the auto-commit stuff and go back to string stitching. I think it's better to let the user manually commit. A rework of the hand controls is probably coming soon.
|
| Finally, update README.
* Fix reset button (yum, 2022-11-12)
| Board would lock up if you reset after the first page. osc_ctrl.clear() was assigning the wrong member :)
|
| Tweak continuous transcription logic: now we only commit if the transcription remains identical for N seconds.
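The "commit only if identical for N seconds" rule can be sketched as a small state machine. This is an illustration under assumptions, not the logic in transcribe.py: `StableCommitter` and its `hold_seconds` parameter are hypothetical names, and the clock is injectable so the behavior can be checked without sleeping.

```python
import time


class StableCommitter:
    """Signals a commit once the text has stayed identical long enough."""

    def __init__(self, hold_seconds: float, clock=time.monotonic):
        self.hold_seconds = hold_seconds
        self.clock = clock
        self.last_text = None
        self.since = None  # time at which last_text was first seen

    def update(self, text: str) -> bool:
        """Feed the latest transcription; return True when it should commit."""
        now = self.clock()
        if text != self.last_text:
            # Text changed: restart the stability timer.
            self.last_text = text
            self.since = now
            return False
        return now - self.since >= self.hold_seconds
```

Calling `update()` after each transcription pass keeps the timer running only while the text is unchanged; any edit restarts it.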
* Clicking the left joystick resets the board. (yum, 2022-11-12)
|   * Increase no speech probability threshold. This is what was preventing short transcriptions from working. We rely more on the avg logprob filter now.
|   * Remove string matching logic from transcribe. Now when we get 2 consecutive identical transcriptions, we commit the transcription. This *could* cause words to get cut off but in practice it doesn't seem to happen.
|   * Fix steamvr joystick click detection. Moving the joystick would also fire the event, which is not correct.
|   * Combine locks in transcribe.py.
|   * Remove "clear" vocal control.
|   * osc_ctrl.clear() resets last_message_encoded
|   * Remove osc_ctrl.sendMessage (unused)
* Add capability to listen for controller inputs (yum, 2022-11-12)
| Add steamvr.py, which listens for the left-hand joystick being clicked.
|
| Simply call pollButtonPress() and check if it returns RISING_EDGE or FALLING_EDGE. Does not block if there are no events.
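The rising/falling edge idea can be illustrated without any SteamVR calls. The sketch below is hypothetical: `Edge`, `ButtonEdgeDetector`, and the `NONE` value are assumptions for illustration, and the caller is expected to supply the current pressed state from whatever input API is in use.

```python
from enum import Enum


class Edge(Enum):
    NONE = 0          # assumed "no event" value; not named in the commit
    RISING_EDGE = 1   # button went from released to pressed
    FALLING_EDGE = 2  # button went from pressed to released


class ButtonEdgeDetector:
    """Turns a polled pressed/released state into edge events."""

    def __init__(self):
        self.was_pressed = False

    def poll(self, is_pressed: bool) -> Edge:
        """Compare against the previous state; never blocks."""
        edge = Edge.NONE
        if is_pressed and not self.was_pressed:
            edge = Edge.RISING_EDGE
        elif not is_pressed and self.was_pressed:
            edge = Edge.FALLING_EDGE
        self.was_pressed = is_pressed
        return edge
```

Holding the button down produces a single `RISING_EDGE` followed by `NONE` on later polls, which is what makes a "press once to toggle" control possible.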
* License scrub (yum, 2022-11-10)
| Begin auditing dependencies' licenses.
* Update fonts (yum, 2022-11-08)
| English, Japanese, Chinese, and Korean should look much better now. French, German, and Spanish look like shit now, because I haven't figured out how to best make Noto Sans stay within its bounding box.
|
|   * Use Noto Sans for most things
|   * Simplify how we enable unicode blocks & assign fonts to them
|   * Increase string matching window to 300. Works better in real-world test.
* Fix matchStrings O(n^2) loop (yum, 2022-11-07)
| This slides 2 windows across input strings, looking for a region where they are most similar. It then uses that region to stitch the strings together.
|
| Since transcribe.py passes in a continuous transcription as the `old_text` argument, we can wind up spending a lot of time here.
|
| Constrain the area of the `old_text` argument that we look at to the most recent 50 characters. This should be good enough.
|
| Also fix how we calculate levenshtein_distance. Uh... yeah, let's not talk about how it was before.
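A minimal sketch of the windowed matching described above, assuming a fixed window size and the standard dynamic-programming Levenshtein distance. The function names mirror the commit message, but the bodies, the `window` and `lookback` parameters, and the tie-breaking rule (first best match wins) are assumptions rather than the repository's code.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic edit distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def match_strings(old_text: str, new_text: str,
                  window: int = 6, lookback: int = 50) -> str:
    """Stitch new_text onto old_text at the most similar pair of windows.

    Only the last `lookback` characters of old_text are searched, which
    bounds the work no matter how long the running transcription gets.
    """
    start = max(0, len(old_text) - lookback)
    best = None  # (distance, old_index, new_index)
    for i in range(start, len(old_text) - window + 1):
        for j in range(len(new_text) - window + 1):
            d = levenshtein_distance(old_text[i:i + window],
                                     new_text[j:j + window])
            if best is None or d < best[0]:
                best = (d, i, j)
    if best is None:
        # One of the inputs is shorter than the window: just append.
        return old_text + new_text
    _, i, j = best
    return old_text[:i] + new_text[j:]
```

With the lookback fixed, the number of window pairs compared grows with the length of `new_text` only, not with the full history.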
* Fix font clipping bug (yum, 2022-11-07)
| When fonts completely fill a slot, any pixel touching a perimeter border gets stretched due to clamping.
|
| To avoid this, add a 2% margin around each slot.
* Add generate.py (yum, 2022-11-07)
| Generates a string with every character starting from a minimum. Useful for testing paging and font issues.
* Fix osc_ctrl diffing (yum, 2022-11-07)
| Now we actually maintain an ongoing buffer with what we think is on the display. When we send a new cell, we update only that cell instead of overwriting the entire prefix up to that cell.
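The buffer-and-diff approach might be sketched like this. It is an illustration only: `BoardBuffer`, the cell size, and the `(index, contents)` return shape are hypothetical, and the real osc_ctrl would send OSC messages where this sketch returns a list.

```python
class BoardBuffer:
    """Tracks what we believe the display shows, one fixed-size cell at a time."""

    def __init__(self, num_cells: int, cell_size: int):
        self.cell_size = cell_size
        self.cells = [" " * cell_size] * num_cells

    def update(self, text: str):
        """Return (index, contents) for each cell that must be re-sent."""
        capacity = self.cell_size * len(self.cells)
        # Truncate to the board size and pad the tail with spaces.
        padded = text[:capacity].ljust(capacity)
        changed = []
        for i in range(len(self.cells)):
            cell = padded[i * self.cell_size:(i + 1) * self.cell_size]
            if cell != self.cells[i]:
                self.cells[i] = cell
                changed.append((i, cell))
        return changed
```

Extending "hello" to "hello wo" on a 4-character-cell board re-sends only cell 1, since cell 0 ("hell") is unchanged.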
* Update README (yum, 2022-11-06)
* Fix set_noop_anim (yum, 2022-11-06)
| The fileID check intended to avoid overwriting blendtree animations was too broad and excluded regular unassigned animations.
|
| This was causing an issue where the display would flicker or fail to show any new text at all.
* Improve font alignment (yum, 2022-11-06)
* Add language flag to transcription CLI (yum, 2022-11-06)
* String matching no longer relies on spaces (yum, 2022-11-06)
| Add a `matchStrings` which does basically the same thing as `matchStringList` except it doesn't split the input at space boundaries.
|
| I think this should work better for Japanese and Chinese, since they don't use spaces. Doesn't seem to cause any accuracy regressions for English.
|
| Also update the README.
* Fix clear animation (yum, 2022-11-05)
| Also adjust letter positioning to avoid clipping.
* Expand character set from 80 to 64K characters (yum, 2022-11-05)
| Each character is now addressed with 2 bytes instead of 1. The number of bytes per character is configured in (I think) exactly one spot, so increasing or decreasing this is trivial. English speakers can just set it to 1.
|
| The animator seems a little unstable; if I leave my character in a public for a while, the board becomes unresponsive. Oh well.
|
|   * Check in fonts. Did this so users don't have to remember to set the resolution or to disable mipmaps.
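Addressing a character with a configurable number of bytes can be sketched as below. This assumes the character index is simply the Unicode code point and that bytes are sent low byte first; both are assumptions for illustration, and `BYTES_PER_CHAR`, `encode_char`, and `decode_char` are hypothetical names.

```python
BYTES_PER_CHAR = 2  # the single knob the commit describes; 1 suffices for ASCII-only use


def encode_char(ch: str, bytes_per_char: int = BYTES_PER_CHAR) -> list:
    """Split a character's code point into bytes, low byte first (assumed order)."""
    code = ord(ch)
    if code >= 256 ** bytes_per_char:
        raise ValueError(f"{ch!r} does not fit in {bytes_per_char} byte(s)")
    return [(code >> (8 * i)) & 0xFF for i in range(bytes_per_char)]


def decode_char(data: list) -> str:
    """Inverse of encode_char."""
    code = sum(byte << (8 * i) for i, byte in enumerate(data))
    return chr(code)
```

Two bytes cover 65,536 code points, which matches the "64K characters" in the commit title.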
* Update shader to use new font files (yum, 2022-11-05)
| So far only the first file is used.
* Add generate_fonts.py (yum, 2022-11-05)
| Add code to generate 4k textures holding a bunch of unicode characters.
|
| Add unicode blocks for English, Japanese, Chinese, and Korean.
|
| Embed GNU's excellent Unifont ttf, which I use to generate these textures. The license is included under $filename.LICENSE.
* OSC controller uses one sync per region instead of 2 (yum, 2022-11-05)
| My theory as to why this seems to work: VRChat batches parameter updates together and applies them simultaneously. Thus we don't run into the expected failure mode where we update the prior region before paging over to the next region.
|
|   * Fix beep feature
|   * Update the STT demo
* Reduce dimensionality of animator by factor of 80 (yum, 2022-11-05)
| Instead of generating one animation for every single character in our character set, we just generate 2: the lowest and the highest. We use blend trees to interpolate between these two extremes.
|
| This reduces the number of animations we have to generate by a factor of 80.
|
| It also clears the way for multi-language support (coming soon).
|
| It also means we don't have to reopen unity every time we generate a new animator.
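The interpolation idea can be shown with plain arithmetic. This sketch assumes a blend parameter normalized to [0, 1] and an 80-character set indexed 0 through 79; the real animator may use a different parameter range, and `to_blend_param` and `blend` are hypothetical names.

```python
MAX_INDEX = 79  # highest index in an 80-character set (assumed 0-based)


def to_blend_param(index: int, max_index: int = MAX_INDEX) -> float:
    """Map a character index onto the assumed [0, 1] blend parameter."""
    return index / max_index


def blend(low: float, high: float, t: float) -> float:
    """Linear interpolation between the two extreme animations."""
    return low + (high - low) * t


def shown_index(index: int, max_index: int = MAX_INDEX) -> int:
    """What the shader would read back after blending, rounded to an index."""
    return round(blend(0, max_index, to_blend_param(index, max_index)))
```

Every index from 0 to 79 round-trips through `shown_index`, so two endpoint animations are enough to represent the whole set.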
* Add speech-to-text demo (yum, 2022-11-04)
* Reduce sync rate from 10 Hz to 5 Hz (yum, 2022-11-03)
| Also reduce the number of syncs per cell from 3 to 2. Thus the effective sync rate went from (10 / 3 == 3.33 Hz) to (5 / 2 == 2.5 Hz).
|
| This comes at the cost of a degraded UX: updating a cell temporarily shows the contents of the previous cell.
* Improve transcription quality (yum, 2022-11-01)
| Apply heuristics described in whisper paper. Dramatically improve silence detection as well as overall transcription quality.
|
| I was able to read the entire demo script at speed without any serious transcription inaccuracies. Field testing is TODO.
* Combine 4 boolean select parameters into one (yum, 2022-11-01)
| Should further improve reliability, especially in laggy environments. We'll see!
* Fix bug where some text would show up after saying 'Clear' (yum, 2022-11-01)
* Update README (yum, 2022-10-30)
* Reduce total # of select bits from 44 to 4 (yum, 2022-10-30)
| The board is divided into 16 regions. We select the region to be updated by updating 4 boolean parameters. We *used* to define 4 parameters per layer. Now we just have 4 params total, which affect every layer.
|
| Total param memory: 142 bits -> 102 bits
| Params updated per region update: 56 -> 16
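Selecting one of 16 regions with 4 booleans is just a binary encoding of the region index. The sketch below is illustrative; `region_select_bits` and the least-significant-bit-first ordering are assumptions, not the parameter layout the avatar actually uses.

```python
NUM_SELECT_BITS = 4  # 2**4 == 16 regions


def region_select_bits(region: int) -> list:
    """Encode a region index (0..15) as 4 booleans, least significant bit first."""
    if not 0 <= region < 2 ** NUM_SELECT_BITS:
        raise ValueError("region must be in 0..15")
    return [bool((region >> bit) & 1) for bit in range(NUM_SELECT_BITS)]


def region_from_bits(bits: list) -> int:
    """Decode the booleans back into a region index."""
    return sum(1 << i for i, bit in enumerate(bits) if bit)
```

Sharing the same 4 bits across all layers is what removes 40 bits from both totals quoted above (142 -> 102 and 56 -> 16).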
* Disable mipmaps on board texture (yum, 2022-10-27)
| This fixes the faint outline issue at close range (!) at the cost of making it less legible from far away.
* Flip text in mirror (yum, 2022-10-27)
| Use some of pema99's tricks described in their 'shader-knowledge' repo (MIT license).
|
|   * Text is now readable in mirrors
|   * GetLetterParameter() now uses a jump table instead of a ton of `if` statements
* Change board size (yum, 2022-10-27)
| It's now twice as wide and half as tall.
|
|   * Add small margin to board
|   * Add simple backplate shader
* Tweak continuous transcription (yum, 2022-10-27)
| Stitching now uses a 6-word sliding window instead of a 4-word one. Seems to dramatically improve transcription quality.
* Add 'over' keyword (yum, 2022-10-27)
| When the user says 'over', the board will stop displaying new transcriptions until the user says 'clear'.
|
|   * Remove the control thread from transcribe.py
* Add fast clear animation (yum, 2022-10-27)
| The old clear mechanism would write an empty cell in every layer, which would take (0.3 seconds) * (11 layers) == about 3 seconds.
|
| The new mechanism drives an animation which overwrites every character slot simultaneously, taking only 0.1 seconds. A nice ~30x speedup.
|
|   * Fix the transcription exponential backoff logic. Saying new things will reset the delay to the minimum again.
|   * Clearing the board will also reset the transcription delay back to the minimum.
|   * Tune the noise detection minimum to 0.2 instead of 0.1. Speaking softly into the mic seems to fail to exceed the 0.1 threshold pretty often.
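The backoff behavior in the bullets above (grow the delay while nothing changes, reset it on new speech or a clear) could be sketched like this. The class name, the doubling factor, and the delay bounds are all hypothetical; the commit does not state the actual values.

```python
class TranscriptionBackoff:
    """Exponential backoff for the delay between transcription passes."""

    def __init__(self, min_delay: float = 0.5, max_delay: float = 4.0,
                 factor: float = 2.0):
        # All three values are illustrative assumptions.
        self.min_delay = min_delay
        self.max_delay = max_delay
        self.factor = factor
        self.delay = min_delay

    def on_unchanged(self) -> float:
        """Nothing new was said: back off, up to max_delay."""
        self.delay = min(self.delay * self.factor, self.max_delay)
        return self.delay

    def reset(self) -> float:
        """New speech or a board clear: return to the minimum delay."""
        self.delay = self.min_delay
        return self.delay
```

Both "saying new things" and "clearing the board" map onto the same `reset()` call in this sketch.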
* De-scuff continuous transcription (yum, 2022-10-25)
| Transcription stitching now occurs in word space, rather than in text space. This avoids problems where we accidentally duplicate or delete letters in the middle of words.
|
| Factor out stitching into its own module and add a small handful of test cases. Hopefully if we hit problems in production, we can just grow this list and avoid regressions if we reimplement.
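One simple way to stitch in word space is to look for the longest run of words that ends the old text and begins the new text. The sketch below uses that exact suffix/prefix overlap technique as a stand-in; it is not necessarily the algorithm in the stitching module, and `stitch_words` is a hypothetical name.

```python
def stitch_words(old_text: str, new_text: str) -> str:
    """Join two transcriptions on their longest exact word overlap.

    Operating on whole words means a match can never start or end in
    the middle of a word, so letters are not duplicated or dropped.
    """
    old = old_text.split()
    new = new_text.split()
    # Try the longest possible overlap first, then shrink.
    for k in range(min(len(old), len(new)), 0, -1):
        if old[-k:] == new[:k]:
            return " ".join(old + new[k:])
    # No overlap found: append the new words as-is.
    return " ".join(old + new)
```

Cases like these are the kind of thing a growing regression list could hold: each production failure becomes one more input/expected-output pair.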
* Tweak transcription heuristics (yum, 2022-10-25)
| The heuristics now occur in the filtered word space, so punctuation and casing changes won't confound them.
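A "filtered word space" can be approximated by lowercasing and stripping punctuation before splitting into words. This is a minimal sketch under that assumption; `filter_words` is a hypothetical name, and `string.punctuation` only covers ASCII punctuation, so other scripts would need a broader filter.

```python
import string

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def filter_words(text: str) -> list:
    """Lowercase, drop ASCII punctuation, and split into words."""
    return text.lower().translate(_STRIP_PUNCTUATION).split()
```

Two transcriptions that differ only in casing or punctuation, such as "Hello, world!" and "hello world", compare equal after filtering.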