path: root/string_matcher.py
Commit message, Author, Age
* String matching no longer relies on spaces (yum, 2022-11-06)
  Add `matchStrings`, which does essentially the same thing as `matchStringList` except that it does not split the input at space boundaries. This should work better for Japanese and Chinese, which do not separate words with spaces. It does not appear to cause any accuracy regressions for English. Also update the README.
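  To illustrate why splitting at spaces fails for unspaced scripts, here is a minimal sketch. It is not the code from this commit: `longest_common_run`, `common_words`, and `common_chars` are hypothetical names, and the standard library's `difflib.SequenceMatcher` stands in for whatever matching the module actually performs.

```python
from difflib import SequenceMatcher


def longest_common_run(a, b):
    """Return the longest contiguous run of items shared by sequences a and b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[match.a:match.a + match.size]


def common_words(old, new):
    """Hypothetical space-based variant: compare lists of space-separated tokens."""
    return longest_common_run(old.split(" "), new.split(" "))


def common_chars(old, new):
    """Hypothetical space-free variant: compare the strings character by character."""
    return longest_common_run(old, new)


if __name__ == "__main__":
    # English: splitting at spaces finds the shared words.
    print(common_words("the quick brown fox", "quick brown fox jumps"))
    # Japanese: each sentence is a single "word", so nothing matches...
    print(common_words("今日は天気がいい", "天気がいいですね"))
    # ...whereas comparing characters finds the shared run.
    print(common_chars("今日は天気がいい", "天気がいいですね"))
```

  With space-based splitting, each Japanese sentence becomes a single token and no overlap is found; comparing characters directly recovers the shared text.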
* Tweak continuous transcription (yum, 2022-10-27)
  Stitching now uses a 6-word sliding window instead of a 4-word one. This seems to dramatically improve transcription quality.
* De-scuff continuous transcription (yum, 2022-10-25)
  Transcription stitching now occurs in word space rather than in text space. This avoids problems where we accidentally duplicate or delete letters in the middle of words. Factor out stitching into its own module and add a small handful of test cases. If we hit problems in production, we can grow this list of test cases, which should also help avoid regressions if we reimplement the stitching.
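
  The idea of stitching in word space can be sketched as follows. This is a simplified illustration under stated assumptions, not the stitching module from these commits: the function name `stitch` and its overlap search are hypothetical, and the default `window=6` only mirrors the window size mentioned in the 2022-10-27 commit.

```python
def stitch(prev_text, new_text, window=6):
    """Append new_text to prev_text, dropping words that overlap.

    Hypothetical sketch: look for the last `size` words of the previous
    transcript inside the new one, trying the largest window first, and
    keep only the words that follow the overlap.
    """
    prev = prev_text.split()
    new = new_text.split()
    if not prev:
        return " ".join(new)

    # Shrink the window until the tail of `prev` is found inside `new`.
    for size in range(min(window, len(prev), len(new)), 0, -1):
        tail = prev[-size:]
        for start in range(len(new) - size + 1):
            if new[start:start + size] == tail:
                # Join at a word boundary, so no letters are duplicated
                # or lost in the middle of a word.
                return " ".join(prev + new[start + size:])

    # No overlap found: assume the new text simply continues the old.
    return " ".join(prev + new)


if __name__ == "__main__":
    print(stitch("the quick brown fox", "brown fox jumps over"))
```

  Because the comparison and the join both operate on whole words, a partial overlap can never split a word, which is the failure mode the text-space approach was prone to.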