
Flexible date matching (optional spaces) for scanned docs?

Posted: Sun Dec 27, 2020 9:42 pm
by felciano
Hi all --

I have been scanning a large number of paper financial statements, and I'd like Hazel to rename them so that each file name includes the statement date or date range. The PDFs have been OCRed, but OCR is an imperfect process, so the dates in the documents don't always match a simple date pattern cleanly. By far the most common error is that the OCR drops spaces, giving text like:

January 1, 2007 - January 31, 2007
January 1,2007 - January 31, 2007
January 1, 2007 - January 31,2007
January 1, 2007 -January 31, 2007
January 1,2007- January 31, 2007
January 1,2007-January 31,2007

I know that Hazel's space-matching will match any number of spaces. However it looks like I need the equivalent of the regex ? operator, i.e. "match zero-or-one spaces". Is there a way to do this in Hazel? If not, can someone recommend a way to handle this parsing situation without painfully enumerating all possible spacing configurations as separate rules?

Thanks in advance!
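For reference, the "zero-or-one spaces" matching asked about here is what the ? quantifier expresses in a regular expression. The sketch below uses Python's standard re module and assumes the "Month D, YYYY - Month D, YYYY" layout from the examples above; normalize_range is a hypothetical helper name. It matches all six OCR variants and rewrites each one with canonical spacing.

```python
import re

# " ?" means "zero or one space": it covers the spaces that OCR sometimes
# drops after the comma and around the hyphen.
DATE_RANGE = re.compile(
    r"([A-Z][a-z]+ \d{1,2}), ?(\d{4})"  # start date, e.g. "January 1, 2007"
    r" ?- ?"                            # hyphen with optional spaces on either side
    r"([A-Z][a-z]+ \d{1,2}), ?(\d{4})"  # end date, e.g. "January 31, 2007"
)


def normalize_range(text):
    """Return the first date range in text with canonical spacing, or None."""
    m = DATE_RANGE.search(text)
    if m is None:
        return None
    start_md, start_y, end_md, end_y = m.groups()
    return f"{start_md}, {start_y} - {end_md}, {end_y}"


samples = [
    "January 1, 2007 - January 31, 2007",
    "January 1,2007 - January 31, 2007",
    "January 1, 2007 - January 31,2007",
    "January 1, 2007 -January 31, 2007",
    "January 1,2007- January 31, 2007",
    "January 1,2007-January 31,2007",
]

for s in samples:
    # each line prints: January 1, 2007 - January 31, 2007
    print(normalize_range(s))
```

Using \s* instead of " ?" would also tolerate OCR inserting extra spaces, at the cost of being slightly looser.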

Re: Flexible date matching (optional spaces) for scanned doc

Posted: Mon Dec 28, 2020 10:13 am
by Mr_Noodle
No good way to do that now. Does the auto detection work in this case? Might be worth trying.

Re: Flexible date matching (optional spaces) for scanned doc

Posted: Thu Dec 31, 2020 10:22 pm
by felciano
Unfortunately, no. I appreciate how the Hazel UI simplifies pattern matching, but is there a way to plug in an actual regular expression? I can write one that resolves the issue, but it needs full regex syntax.

Thanks,

Ramon

Re: Flexible date matching (optional spaces) for scanned doc

Posted: Mon Jan 04, 2021 12:22 pm
by Mr_Noodle
Use the "Run shell script" action. You can then use regexes in the language of your choice.
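A minimal sketch of the script approach in Python. Several assumptions here are not confirmed anywhere in this thread: Hazel passes the matched file's path to the script as its first argument; Poppler's pdftotext command (not part of macOS, installable e.g. via Homebrew) is used to read the PDF's OCR text layer; and the script renames the file itself to a hypothetical "Statement YYYY-MM-DD to YYYY-MM-DD.pdf" format. The function names are illustrative only.

```python
#!/usr/bin/env python3
# Hypothetical helper for a Hazel "Run shell script" action (a sketch).
# Assumes: file path arrives as argv[1]; Poppler's pdftotext is installed.
import calendar
import os
import re
import subprocess
import sys
from datetime import datetime

# English month names (default C locale), so only real dates are matched.
MONTHS = "|".join(calendar.month_name[1:])
# Month, optional space, day, comma, optional space, year.
DATE = "(" + MONTHS + r") ?(\d{1,2}), ?(\d{4})"
RANGE = re.compile(DATE + r" ?- ?" + DATE)


def iso(month, day, year):
    """Convert ("January", "1", "2007") to "2007-01-01"."""
    return datetime.strptime(f"{month} {day} {year}", "%B %d %Y").strftime("%Y-%m-%d")


def range_label(text):
    """Return "start to end" in ISO dates for the first range found, or None."""
    m = RANGE.search(text)
    if m is None:
        return None
    g = m.groups()
    return f"{iso(*g[:3])} to {iso(*g[3:])}"


def extract_text(pdf_path):
    # An output file of "-" makes pdftotext write the text to stdout.
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"], capture_output=True, text=True, check=True
    )
    return result.stdout


def main(path):
    label = range_label(extract_text(path))
    if label is None:
        sys.exit("no statement date range found")  # exit status 1
    target = os.path.join(os.path.dirname(path), f"Statement {label}.pdf")
    if os.path.exists(target):
        sys.exit(f"refusing to overwrite {target}")
    os.rename(path, target)


# Only act when a file path was actually passed in.
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Converting the matched dates to ISO form has a side benefit: the renamed files sort chronologically by name.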

Re: Flexible date matching (optional spaces) for scanned doc

Posted: Thu Oct 12, 2023 4:42 am
by domstep
I think I have a similar problem. My phone provider's bill suddenly started showing my customer number with the space inserted in a different place; otherwise the number is unchanged.
Is there a rule in Hazel that can flexibly work around such spaces?
I would also be fine entering a regex, but that doesn't seem to be supported.
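One way to tolerate a space that moves around is to allow optional whitespace between every character of the number. The sketch below builds such a pattern with Python's re module; flexible_pattern and the customer number 1234567890 are hypothetical placeholders. The lookarounds stop it from matching inside a longer run of digits.

```python
import re


def flexible_pattern(number):
    """Regex matching `number` with zero or one whitespace char between any two characters."""
    chars = [re.escape(c) for c in number if not c.isspace()]
    # (?<!\d) and (?!\d): the match must not be part of a longer digit string.
    return re.compile(r"(?<!\d)" + r"\s?".join(chars) + r"(?!\d)")


# Hypothetical customer number; the bill may print it with a space anywhere.
CUSTOMER = flexible_pattern("1234567890")

print(bool(CUSTOMER.search("Customer no.: 12345 67890")))   # True
print(bool(CUSTOMER.search("Customer no.: 123 4567890")))   # True
print(bool(CUSTOMER.search("Customer no.: 1234567891")))    # False
print(bool(CUSTOMER.search("Customer no.: 91234567890")))   # False
```

A simpler alternative is to delete all whitespace from the extracted text and do a plain substring search, at the cost of also gluing together digits that belong to neighbouring numbers.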

Re: Flexible date matching (optional spaces) for scanned doc

Posted: Thu Oct 12, 2023 9:00 am
by Mr_Noodle
If you want to use a regex, you can use a shell script action and the language/regex dialect of your choice.