Why is OCR for handwritten content still that bad?

hinterlufer@lemmy.world · edited 2 months ago

Why is OCR for handwritten content still that bad?

CarbonatedPastaSauce@lemmy.world · 2 months ago

I’ve read that the USPS has amazing OCR for mail sorting. It is, of course, highly tuned for one particular data format.

zenharbinger@lemmy.world · 2 months ago

Also, banks and mobile check deposit. I’ve only ever seen that get it wrong once.

BigMikeInAustin@lemmy.world · 2 months ago

You took the time to spell your post correctly and use correct grammar.

I used to have very sloppy handwriting. I’ve come to realize that if you want other people to understand you, you do need to make an effort to be understandable.

Shortcuts in communication do not show superiority. Too many shortcuts devalue your communication, just like poor spelling and grammar would devalue your post.

hinterlufer@lemmy.world · 2 months ago

I’m writing notes for myself and I can read them. When I’m writing for someone else (which rarely happens for handwritten notes) I take the time and effort to write nicer.

Also, I specifically didn’t write the example carefully because the use case for me would specifically be handwritten notes I made for myself.

BigMikeInAustin@lemmy.world · 2 months ago

So ideally there would be a way to train an AI on one’s own particular handwriting? (Not meant sarcastically or rudely.)

cooljimy84@lemmy.world · 2 months ago

Try again on plain paper, or on lined/ruled paper. That dotted graph paper hurts my eyes and I’m pretty sure I’m mostly human…

hinterlufer@lemmy.world · 2 months ago

I like dotted paper: the dots are less distracting than a grid, lined paper sucks for sketches etc., and with plain paper I miss having guides. But I agree that on this particular one, the dots are a bit too prominent.

snooggums@lemmy.world · 2 months ago

Are you trying to scan the text from paper with the dots? That is most likely making it even harder for the OCR to pick out the text.
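
If the dots are the problem, one thing that might help is stripping them out before the OCR ever sees the page. Here is a minimal sketch of that idea, assuming the scan has already been thresholded to a binary image (1 = ink, 0 = paper). The function name `remove_small_specks` and the `min_size` threshold are made up for illustration and are not part of any OCR tool.

```python
from collections import deque


def remove_small_specks(image, min_size):
    """Return a copy of a binary image (list of lists of 0/1) in which every
    4-connected ink component with fewer than min_size pixels is erased."""
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill to collect one connected ink component.
            component = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Components below the threshold are treated as grid dots.
            if len(component) < min_size:
                for cy, cx in component:
                    cleaned[cy][cx] = 0
    return cleaned


# Toy page: four isolated "dots" in the corners and one 4-pixel pen stroke.
page = [
    [1, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1],
]
cleaned = remove_small_specks(page, min_size=3)
for row in cleaned:
    print("".join("#" if px else "." for px in row))
```

This is a blunt filter: a dot that touches a pen stroke survives, and real small marks (periods, the dots on i and j) can get erased too, so the threshold would need tuning per scan.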

snooggums@lemmy.world · 2 months ago

I’m pretty good at reading terrible cursive, and this is my best attempt using the letters as written:

Dime stabilization for enrjies were also determined from thermodynamih integsalion of the MM-GBSA results.

I think the first one in italics should be energies, but I wouldn’t assume OCR would know the context to fill in the missing letters. Not sure what word that starts with thermo ends in an h or maybe a k. No idea on the one that starts with inte. I might have been able to determine those words if I were familiar with the context, but OCR doesn’t work that way.
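
For what it's worth, a crude stand-in for "knowing the context" is to snap each recognized word to the closest entry in a word list for the subject area. A rough sketch using Python's standard `difflib` is below. The vocabulary is a hand-picked assumption for this one sentence, and `correct_word` is a hypothetical helper, not something a particular OCR engine provides.

```python
import difflib

# Assumed domain word list, hand-picked for this example. A real one would
# have to come from a dictionary or from text in the same field.
VOCABULARY = [
    "stabilization",
    "energies",
    "determined",
    "thermodynamic",
    "integration",
    "results",
]


def correct_word(word, vocabulary=VOCABULARY, cutoff=0.6):
    """Return the closest vocabulary word, or the input unchanged if nothing
    in the vocabulary is at least `cutoff` similar."""
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


# The three doubtful words from the letter-by-letter reading above.
raw_words = ["enrjies", "thermodynamih", "integsalion"]
corrected = [correct_word(w) for w in raw_words]
print(corrected)
```

It only works when the right word is already in the list and the misreading is close to it. A word that was misread as another plausible word, or one missing from the list, passes through unchanged.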

crimsoncobalt@lemmy.world · 2 months ago

Here’s what I got with Google Lens. Certainly some mistakes, but not “jumbled mess of nonsense.”

Dimes stabilization fire einiges were also delirmed. from thermodinamik integration (I), see methods), which provide a dimict, validation of the MM. GBSA results

J. Phys. Chem. B 2018, 122 7038-2048

CarbonatedPastaSauce@lemmy.world · 2 months ago

That’s completely incomprehensible.

snooggums@lemmy.world · 2 months ago

That IS a “jumbled mess of nonsense”!