Flexible date matching (optional spaces) for scanned docs?

Moderator: Mr_Noodle

Hi all --

I have been scanning a large number of paper financial statements that I'd like Hazel to rename so that the file name includes the statement date or date range. The PDFs have been OCRed, but the process is imperfect, leaving me with documents whose dates don't match a simple date pattern cleanly. By far the most common error is that the OCR drops spaces, giving text like:

January 1, 2007 - January 31, 2007
January 1,2007 - January 31, 2007
January 1, 2007 - January 31,2007
January 1, 2007 -January 31, 2007
January 1,2007- January 31, 2007
January 1,2007-January 31,2007

I know that Hazel's space-matching will match any number of spaces. However, it looks like I need the equivalent of the regex ? operator, i.e. "match zero or one spaces". Is there a way to do this in Hazel? If not, can someone recommend a way to handle this parsing situation without painfully enumerating every possible spacing configuration as a separate rule?

Thanks in advance!
felciano
 
Posts: 22
Joined: Sat Feb 23, 2013 5:44 pm

No good way to do that now. Does the auto detection work in this case? Might be worth trying.
Mr_Noodle
Site Admin
 
Posts: 11255
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Unfortunately, no. I appreciate how the Hazel UI simplifies pattern matching, but is there a way to actually plug in a real regular expression? I can write one that resolves the issue, but it would need full regex syntax.

Thanks,

Ramon
felciano

Use the "Run shell script" action. You can then use regexes in the language of your choice.
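For illustration, here is a minimal Python sketch of the kind of regex such a script could use. It is an example under assumptions, not anything Hazel provides: the function name `find_statement_range` is made up, the text is assumed to have already been extracted from the OCRed PDF by some other tool, and how the script receives the file from Hazel is not shown. `\s*` makes each space optional (zero or more), which covers the zero-or-one case asked about above.

```python
import re
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
MONTH_ALT = "|".join(MONTHS)

# One date, with optional whitespace (\s*) wherever OCR may have dropped a space.
DATE = rf"({MONTH_ALT})\s*(\d{{1,2}})\s*,\s*(\d{{4}})"
# Two dates separated by a hyphen, again with optional whitespace around it.
RANGE_RE = re.compile(rf"{DATE}\s*-\s*{DATE}")


def find_statement_range(text):
    """Return (start, end) as datetime.date objects, or None if no range is found.

    Hypothetical helper for illustration only.
    """
    m = RANGE_RE.search(text)
    if m is None:
        return None
    month1, day1, year1, month2, day2, year2 = m.groups()
    start = date(int(year1), MONTHS.index(month1) + 1, int(day1))
    end = date(int(year2), MONTHS.index(month2) + 1, int(day2))
    return start, end


# The six spacing variants from the first post.
samples = [
    "January 1, 2007 - January 31, 2007",
    "January 1,2007 - January 31, 2007",
    "January 1, 2007 - January 31,2007",
    "January 1, 2007 -January 31, 2007",
    "January 1,2007- January 31, 2007",
    "January 1,2007-January 31,2007",
]
for s in samples:
    start, end = find_statement_range(s)
    # Every variant prints: 2007-01-01 to 2007-01-31
    print(start.isoformat(), "to", end.isoformat())
```

From the two dates the script could then build a new file name; that step depends on the naming scheme and is left out.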
Mr_Noodle

I think I have a similar problem. My customer number was suddenly changed on my phone provider's bill. A space was simply inserted in a different place. Otherwise, the number has remained the same.
Is there a rule in Hazel that can flexibly work around such spaces?
For me it would also be OK to enter a regex, but that doesn't seem to be supported.
domstep
 
Posts: 1
Joined: Thu Oct 12, 2023 4:33 am

If you want to use regex, you can use a "Run shell script" action with the language and regex dialect of your choice.
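As a rough Python sketch of how such a script might tolerate a space that moves around inside a customer number: strip the spaces from the known number, then allow optional whitespace between every pair of characters when searching. The function name `contains_customer_number` and the number `1234 567 890` are invented for the example, and the bill text is assumed to be available as a string already.

```python
import re


def contains_customer_number(text, customer_number):
    """Return True if customer_number occurs in text, ignoring where spaces fall.

    Hypothetical helper for illustration only.
    """
    # Drop any whitespace from the reference number.
    compact = re.sub(r"\s+", "", customer_number)
    # Allow optional whitespace between every pair of characters.
    body = r"\s*".join(re.escape(ch) for ch in compact)
    # Refuse matches that are embedded in a longer run of digits.
    pattern = rf"(?<!\d){body}(?!\d)"
    return re.search(pattern, text) is not None


known = "1234 567 890"  # made-up customer number
print(contains_customer_number("Customer no. 1234 567 890", known))  # True
print(contains_customer_number("Customer no. 12 34567 890", known))  # True
print(contains_customer_number("Customer no. 1234 567 891", known))  # False
```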
Mr_Noodle

