Fuzzy / less strict matching on poorly-formed PDFs

Hi everyone,

I'm trying to use Hazel to batch process ~6000 PDFs into folders. These PDFs were created from email newsletters (using a clunky batch processing tool in Windows).

The main things I need to do are:

- Sort emails into a subfolder based on topic e.g. 'Engineering and Maintenance', 'ABC Project Update'
- Rename the PDF using a date match

The problem is that the text in the PDFs is badly formatted, probably because of the creation tool, and Hazel is having trouble matching patterns. For example, if I use the condition:

Contents contain "This message has been sent to everyone in Engineering and Maintenance"


It doesn't work because the characters in the PDF are actually:

This m essage has been sent to everyone in Engineering and M aintenance.


The same problem is breaking date matches. For example, Hazel thinks an email was sent in January 2020 because the first date in the PDF's contents reads "January 20 23". I haven't tested many documents, but I suspect the spacing errors are unpredictable.
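To make the date problem concrete, here is a minimal Python sketch of the kind of repair involved. It runs on text that has already been extracted from the PDF by some other tool, not inside Hazel's own matcher. It assumes the stray spaces fall only inside the four-digit year and that the month name itself is intact; `MONTHS`, `SPLIT_YEAR`, and `first_month_year` are hypothetical names used for illustration.

```python
import re

# Full English month names; abbreviated or split month names are not handled.
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# A month name followed by exactly four digits, where stray whitespace
# may appear between the digits (e.g. "20 23" or "2 023").
SPLIT_YEAR = re.compile(rf"\b({MONTHS})\s+((?:\d\s*){{3}}\d)\b")

def first_month_year(text):
    """Return (month, year) for the first month/year pair found, else None."""
    match = SPLIT_YEAR.search(text)
    if match is None:
        return None
    month = match.group(1)
    # Remove the stray whitespace so "20 23" becomes "2023".
    year = int(re.sub(r"\s+", "", match.group(2)))
    return month, year
```

With this sketch, `first_month_year("January 20 23")` gives `("January", 2023)` rather than stopping at "20". A string such as "January 20, 2023" is deliberately not matched, because the comma suggests that "20" is a day rather than half of a year.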

Is there a way to do a fuzzier match in Hazel, something like "match this string, even if there are spaces between some of the characters"?

Thanks in advance for your help.
agbear
 

There is no way to fuzzy match it, but you could have Hazel do its own OCR and see whether you get better results. You can access that setting in the rule options (the circle with three dots in the top right).
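If OCR does not help, one possible workaround is to do the loose matching in a script instead of in Hazel's built-in conditions. The following is a minimal Python sketch, assuming the PDF text has already been extracted to a string by some other tool. It builds a regular expression that allows optional whitespace between every character of the phrase. `spaced_pattern` and `contains_loosely` are hypothetical helper names, not part of Hazel.

```python
import re

def spaced_pattern(phrase):
    """Compile a regex that matches `phrase` with optional whitespace
    between any two of its non-space characters."""
    # Drop the phrase's own spaces, escape each remaining character,
    # then allow any amount of whitespace (including none) between them.
    chars = [re.escape(c) for c in phrase if not c.isspace()]
    return re.compile(r"\s*".join(chars), re.IGNORECASE)

def contains_loosely(text, phrase):
    """True if `phrase` occurs in `text`, ignoring where spaces fall."""
    return spaced_pattern(phrase).search(text) is not None

garbled = ("This m essage has been sent to everyone in "
           "Engineering and M aintenance.")
wanted = "This message has been sent to everyone in Engineering and Maintenance"
print(contains_loosely(garbled, wanted))
```

Because whitespace is ignored entirely, this also matches when the spaces between words are missing, so short phrases could produce false positives. Longer, distinctive phrases like the one above should be safer.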
Mr_Noodle