Contain Match no hits, Contain does


Moderator: Mr_Noodle

Contain Match no hits, Contain does Tue Jul 19, 2022 8:26 am • by etienne74
Hello,

I have some PDF invoices from Pinterest.
If I use contents - contain, it does find e.g. 'Date:',
but when I use contents - contain match, it can't find 'Date:'.

When I preview the file, all the text seems to be in reverse order, so 'Date' becomes 'etaD'.

Also the invoice date of 06/30/2022 is shown as 2202/03/60 in the preview.
I can extract the individual digits, but I don't think I can reassemble them into the correct date and then determine which quarter it falls in. Right?
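
For reference, reversing the captured text outside of Hazel is straightforward. The following is only a minimal Python sketch, assuming the matched text arrives as the reversed string shown in the preview and that the invoice uses MM/DD/YYYY. The function names `fix_reversed_date` and `quarter_of` are made up for illustration, and how the result would be fed back into a Hazel rule is left open.

```python
from datetime import datetime


def fix_reversed_date(reversed_text: str) -> datetime:
    """Reverse the characters, then parse the result as MM/DD/YYYY."""
    # Assumption: the whole date string is reversed character by character,
    # as in the preview ('2202/03/60' for 06/30/2022).
    return datetime.strptime(reversed_text[::-1], "%m/%d/%Y")


def quarter_of(date: datetime) -> int:
    """Return the calendar quarter (1 to 4) that the date falls in."""
    return (date.month - 1) // 3 + 1


invoice_date = fix_reversed_date("2202/03/60")
print(invoice_date.date(), "Q%d" % quarter_of(invoice_date))  # 2022-06-30 Q2
```

The quarter here is the plain calendar quarter; a fiscal year that does not start in January would need a different offset.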

So in short: the preview shows all characters in reverse. The contain function works on the correct words, but contain match does not.

Any ideas are highly appreciated.
etienne74
 
Posts: 7
Joined: Tue Dec 24, 2019 9:14 am

Re: Contain Match no hits, Contain does Tue Jul 19, 2022 9:06 am • by Mr_Noodle
Not sure if much can be done on Hazel's end here. There are some screwed up PDFs that do that. Not sure how possible this is, but you may want to look into getting this fixed at the source (whoever is generating the PDF).
Mr_Noodle
Site Admin
 
Posts: 11195
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: Contain Match no hits, Contain does Sat Jul 23, 2022 8:49 am • by etienne74
Mr_Noodle wrote:Not sure if much can be done on Hazel's end here. There are some screwed up PDFs that do that. Not sure how possible this is, but you may want to look into getting this fixed at the source (whoever is generating the PDF).


Thanks for the reply.
So Contain uses a different method than Contain Match in looking for hits?

Pinterest generates these PDFs. I'll ask them, but we probably already know the answer ;-)

Re: Contain Match no hits, Contain does Mon Jul 25, 2022 9:40 am • by Mr_Noodle
You can try opening them up in Preview, then doing Print->Save as PDF and see if that fixes it.

Re: Contain Match no hits, Contain does Thu Nov 24, 2022 4:58 am • by harricool
Did you come to a resolution on this? I have the same issue with certain PDF invoices.
harricool
 
Posts: 3
Joined: Thu Nov 24, 2022 4:46 am

Re: Contain Match no hits, Contain does Fri Nov 25, 2022 12:21 pm • by Mr_Noodle
Did you try the suggestions above and what were the results?

Re: Contain Match no hits, Contain does Thu Dec 01, 2022 2:26 am • by harricool
No success with re-saving as PDF/printing to PDF. Same issue as OP, the 'contain' condition successfully finds data without it being reversed, but the 'contain match' condition shows reversed character order within a line.

I'm going to experiment with a workflow using a script and ocrmypdf (https://github.com/ocrmypdf/OCRmyPDF)
- force conversion to image at [x] DPI
- OCR image
- save result as PDF
- run through Hazel for filing as normal
CMIIW, but I see no way around needing two rules to handle this: one for the pre-processing OCR and one for the extraction/filing.
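
The pre-processing steps above might be sketched roughly as follows in Python. This is an untested outline, not a confirmed fix: it assumes the `ocrmypdf` command-line tool is installed and on the PATH, uses its `--force-ocr` option to rasterize each page and OCR it, and uses `--oversample` for the DPI. The 300 DPI default, the `-ocr.pdf` suffix, and the helper names `build_ocr_command` and `ocr_output_path` are all placeholders chosen for illustration.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_ocr_command(src: Path, dst: Path, dpi: int = 300) -> List[str]:
    """Build the ocrmypdf call that rasterizes every page and re-OCRs it."""
    return [
        "ocrmypdf",
        "--force-ocr",             # discard the existing (reversed) text layer
        "--oversample", str(dpi),  # assumed DPI; stands in for [x] above
        str(src),
        str(dst),
    ]


def ocr_output_path(src: Path) -> Path:
    """Hypothetical naming scheme: invoice.pdf becomes invoice-ocr.pdf."""
    return src.with_name(src.stem + "-ocr.pdf")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Expects the path of the PDF to process as the only argument.
    source = Path(sys.argv[1])
    subprocess.run(build_ocr_command(source, ocr_output_path(source)), check=True)
```

Writing to a separate output name is one way to let the second rule (extraction/filing) tell processed files from unprocessed ones.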

