Remove non-latin characters

Remove non-latin characters Tue Nov 09, 2021 6:32 pm • by Wirsing84
Hi there,
my Fujitsu ScanSnap tries to generate accurate filenames from a PDF's contents. However, this often leads to gibberish filenames like these:

Mainzer_Айев_17.19 -> should be Mainzer_Allee
aNلمηοη_ -> should be Union
2021-11-05_سم__ -> whatever this should be; it tried to OCR a drawing by my 3-year-old daughter ;)

I would very much like Hazel to remove these non-latin characters.
Is there any way to match them?

Thanks a bunch!
Chris

Re: Remove non-latin characters Wed Nov 10, 2021 2:47 pm • by Mr_Noodle
There's no good built-in way to do this at the moment. Probably the best approach would be a shell script that calls a scripting language with regular expression support. Let me know if that doesn't make sense to you.
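As a rough illustration of the script approach mentioned above, here is a minimal Python sketch. It is not a Hazel feature or an official recipe: the function names (`is_latin_or_neutral`, `clean_stem`, `rename_file`) are made up for this example, and it assumes the script is invoked with the file's path as its first argument (which is how a "Run shell script" action in a Hazel rule would typically hand over the matched file). It keeps ASCII and accented Latin letters such as ä or ß, drops everything else, and tidies up leftover underscores.

```python
import re
import sys
import unicodedata
from pathlib import Path


def is_latin_or_neutral(ch: str) -> bool:
    """Keep ASCII, plus any character whose Unicode name marks it as Latin."""
    if ch.isascii():
        return True
    return unicodedata.name(ch, "").startswith("LATIN")


def clean_stem(stem: str) -> str:
    """Strip non-Latin characters from a filename stem (hypothetical helper)."""
    # Compose first, so "a" + combining diaeresis becomes a single Latin "ä"
    stem = unicodedata.normalize("NFC", stem)
    kept = "".join(ch for ch in stem if is_latin_or_neutral(ch))
    # Collapse underscore runs left behind by removed characters, trim the ends
    kept = re.sub(r"_{2,}", "_", kept).strip("_ ")
    # Fallback name if nothing survives (an assumption, pick whatever suits you)
    return kept or "untitled"


def rename_file(path: Path) -> Path:
    """Rename the file in place; do nothing if it is missing or the target exists."""
    if not path.is_file():
        return path
    new_path = path.with_name(clean_stem(path.stem) + path.suffix)
    if new_path != path and not new_path.exists():
        path.rename(new_path)
        return new_path
    return path


if __name__ == "__main__" and len(sys.argv) > 1:
    rename_file(Path(sys.argv[1]))
```

Note that this can only delete the gibberish; it cannot recover what the OCR should have read. A name like Mainzer_Айев_17.19 would lose the Cyrillic run rather than turn into Mainzer_Allee. The check is based on Unicode character names instead of a hand-written regex range, so umlauts and ß in German filenames survive.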

