Remove non-latin characters

Posted: Tue Nov 09, 2021 6:32 pm
by Wirsing84
Hi there,
my Fujitsu ScanSnap tries to generate accurate filenames from a PDF's contents. However, this often leads to gibberish filenames like these:

Mainzer_Айев_17.19 -> should be Mainzer_Allee
aNلمηοη_ -> should be Union
2021-11-05_سم__ -> whatever this should be, it tried to OCR a drawing of my 3 year old daughter ;)

I would very much like Hazel to remove these non-latin characters.
Is there any way to match them?

Thanks a bunch!
Chris

Re: Remove non-latin characters

Posted: Wed Nov 10, 2021 2:47 pm
by Mr_Noodle
No good way at the moment. Probably the best approach would be a shell script that calls a scripting language with regular expression support. Let me know if any of this doesn't make sense.
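
For anyone who wants to try the scripting route, here is a rough sketch in Python. It is only one possible approach, not something built into Hazel. It assumes the script receives the matched file's path as its first argument, and the helper names (is_latin, clean_name, main) are made up for this example. It keeps ASCII characters and Latin letters such as ä or ß, drops everything else, and then tidies up leftover underscores.

```python
import re
import sys
import unicodedata
from pathlib import Path


def is_latin(ch: str) -> bool:
    """Keep plain ASCII plus any character Unicode names as LATIN (ä, ß, é, ...)."""
    if ch.isascii():
        return True
    return unicodedata.name(ch, "").startswith("LATIN")


def clean_name(stem: str) -> str:
    """Strip non-Latin characters from a filename stem (hypothetical helper)."""
    # macOS often stores names decomposed (u + combining diaeresis);
    # NFC recombines them so accented Latin letters are not lost.
    stem = unicodedata.normalize("NFC", stem)
    kept = "".join(ch for ch in stem if is_latin(ch))
    # Collapse runs of underscores left behind by removed characters.
    kept = re.sub(r"_{2,}", "_", kept)
    return kept.strip("_ ")


def main(argv: list[str]) -> None:
    # Assumption: the file path arrives as the first argument.
    if not argv:
        return
    path = Path(argv[0])
    if not path.is_file():
        return
    new_stem = clean_name(path.stem)
    # Skip if nothing would be left or nothing changed.
    if not new_stem or new_stem == path.stem:
        return
    target = path.with_name(new_stem + path.suffix)
    # Never overwrite an existing file.
    if not target.exists():
        path.rename(target)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Note that this can only delete the misread characters. It cannot recover what the OCR should have produced, so a name like Mainzer_Айев_17.19 would become Mainzer_17.19 rather than Mainzer_Allee.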