I was wondering, how do you capture handwritten notes using Xournal and Xournal++? Using some sort of tablet maybe?
Great to hear that you found out about Xournal++! That's the real power of open source: you own your handwritten notes and are not locked in.
"HTR is an oasis in this global desert of manual digitization" - ah, this is such a great phrasing! <3
Xournal++ is a great project that features a bunch of really great developers who dedicate a lot of time to it.
I am not much involved in Xournal++ development itself; instead, I try to use my machine learning skills to build an HTR system for Xournal++ in the form of a plugin.
Yes, you're right, stroke-order-aware HWR systems are hard to find. One reason for that is the lack of good datasets for training machine learning models!
As such, my stroke-order-aware attempt over at https://github.com/PellelNitram/OnlineHTR/ uses a dataset from 2000 with around 12,000 samples. In contrast, the internal Google dataset is reported to feature around 16,000,000 samples :-D.
Currently, the machine learning model only supports offline HTR (i.e. using images) but online HTR (i.e. using pen time series data) is in the making, see here:
Yes, that's totally correct! The current version of the plugin supports only so-called "offline" HTR, which operates on images. This is ultimately determined by the underlying machine learning model.
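To make the offline/online distinction concrete, here is a minimal Python sketch. It is not code from the plugin; the `Point`/`Stroke` types and the `rasterize` helper are hypothetical names chosen for illustration. It shows that two different stroke orders collapse into the same image once the pen data is rasterized, which is exactly the information an offline model never sees.

```python
from typing import List, Tuple

# One pen sample: (x, y, t) with t in seconds. A stroke is the list of
# samples between pen-down and pen-up. These types are illustrative only.
Point = Tuple[float, float, float]
Stroke = List[Point]


def rasterize(strokes: List[Stroke], width: int, height: int) -> List[List[int]]:
    """Turn online strokes into an offline binary image (1 = ink).

    Timing and stroke order are discarded; only the sampled pen positions
    are marked (no line interpolation, to keep the sketch short).
    """
    image = [[0] * width for _ in range(height)]
    for stroke in strokes:
        for x, y, _t in stroke:
            col, row = int(round(x)), int(round(y))
            if 0 <= col < width and 0 <= row < height:
                image[row][col] = 1
    return image


# The same letter "L" written in two different stroke orders.
down_then_right = [[(0, 0, 0.0), (0, 1, 0.1), (0, 2, 0.2)], [(1, 2, 0.5), (2, 2, 0.6)]]
right_then_down = [[(1, 2, 0.0), (2, 2, 0.1)], [(0, 0, 0.5), (0, 1, 0.6), (0, 2, 0.7)]]

# Offline view: identical images. Online view: different time series.
same_image = rasterize(down_then_right, 3, 3) == rasterize(right_then_down, 3, 3)
same_series = down_then_right == right_then_down
```

Here `same_image` is true while `same_series` is false: the image is the same, but the pen trajectory that produced it is not.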
However, I have developed another model (based on a fairly recent Google paper, Carbune et al. 2020) that operates on pen dynamics and thereby implements online HTR, see here:
This model is open-source as well and will be part of the HTR system for Xournal++ in the future. Feel free to give it a try yourself locally.
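For readers wondering what "pen dynamics" looks like as model input, here is a rough sketch. It is an assumption for illustration, not the preprocessing used in the model above or in the paper: it encodes each pen sample as the change in position and time since the previous sample, plus a flag marking the start of a new stroke. The function name `to_delta_features` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t), illustrative type
Stroke = List[Point]


def to_delta_features(strokes: List[Stroke]) -> List[Tuple[float, float, float, int]]:
    """Encode strokes as (dx, dy, dt, stroke_start) per sample.

    stroke_start is 1 for the first sample of each stroke, else 0.
    The very first sample has zero deltas.
    """
    features = []
    previous = None
    for stroke in strokes:
        for index, (x, y, t) in enumerate(stroke):
            if previous is None:
                dx = dy = dt = 0.0
            else:
                dx, dy, dt = x - previous[0], y - previous[1], t - previous[2]
            features.append((dx, dy, dt, 1 if index == 0 else 0))
            previous = (x, y, t)
    return features


strokes = [[(0.0, 0.0, 0.0), (0.0, 2.0, 0.5)], [(1.0, 2.0, 1.0)]]
features = to_delta_features(strokes)
```

A sequence like `features` is the kind of time series a recurrent model can consume step by step, in contrast to the fixed image grid used for offline HTR.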
One question that has been bothering me for a long time, and that has so far kept me from doing online HTR, is how to find text on a page in the temporal domain (i.e. in the online domain rather than the offline domain). If you have any ideas on that, please do let me know, as I would greatly appreciate it! One possible way is a transformer model, but that feels like overkill and introduces a context length limit.
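To illustrate what the problem looks like (not to claim a solution), here is a deliberately naive baseline sketch: group consecutive strokes greedily, and start a new group whenever the pen pauses too long or jumps too far. The function `group_strokes` and the thresholds `max_pause` and `max_jump` are hypothetical and would need tuning on real data.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]  # (x, y, t), illustrative type
Stroke = List[Point]


def group_strokes(strokes: List[Stroke], max_pause: float, max_jump: float) -> List[List[Stroke]]:
    """Greedily group consecutive strokes into candidate text segments.

    A new group starts when the pen pause since the previous stroke exceeds
    max_pause (seconds) or the pen moved further than max_jump (page units)
    between the end of one stroke and the start of the next.
    """
    groups: List[List[Stroke]] = []
    for stroke in strokes:
        if not stroke:
            continue  # ignore empty strokes
        if groups:
            last_x, last_y, last_t = groups[-1][-1][-1]
            x, y, t = stroke[0]
            pause = t - last_t
            jump = ((x - last_x) ** 2 + (y - last_y) ** 2) ** 0.5
            if pause <= max_pause and jump <= max_jump:
                groups[-1].append(stroke)
                continue
        groups.append([stroke])
    return groups


strokes = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1)],
    [(2.0, 0.0, 0.3), (3.0, 0.0, 0.4)],    # short pause, nearby: same group
    [(0.0, 50.0, 0.6), (1.0, 50.0, 0.7)],  # large jump: new group
    [(2.0, 50.0, 5.0), (3.0, 50.0, 5.1)],  # long pause: new group
]
groups = group_strokes(strokes, max_pause=1.0, max_jump=10.0)
```

A heuristic like this breaks down as soon as the writer goes back to an earlier line, for example to add a correction or annotate a drawing, which is part of why the problem is hard in the temporal domain.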
The HTR feature in Xournal++ does not require you to scan anything though. You just write handwritten notes in Xournal++ as always and, upon saving with the plugin, the resulting PDF is searchable (the handwritten text, at least). So there's no scanning or retyping involved :-).
Oh that's so exciting to hear! Please do let me know if you need help when trying to use the plugin! I am happy to help you until the plugin works for you.