Tagging PDFs using OCR



Tagging PDFs using OCR Sat Mar 21, 2020 6:10 pm • by markp
Hi there

Hope you are all keeping well and looking after yourselves and others during this global pandemic.

Needless to say, I have a lot of time on my hands at home, and I am trying to sort and tag my thousands of Mac files. I now need to tag PDF files based on keywords contained in the documents (using OCR). I know that I need to use some kind of script, but I know nothing about scripts (how to write them, how to insert them, etc.), so I was hoping that someone might be able to point me in the right direction, i.e. explain the steps in an idiot's guide kind of way.

So basically, say I have a batch of PDFs and I want to tag all the ones with the words "Lloyds Bank" in the document.

Anyway, thanks in advance and take care.

Re: Tagging PDFs using OCR Mon Mar 23, 2020 11:15 am • by Mr_Noodle
You can use "Contents contain" or "Contents contain match" to match the file containing specific words. What do you want to tag the files with? The words that matched or something else?
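For anyone curious what a "contents contain" style check amounts to outside Hazel, here is a minimal Python sketch. It is not Hazel's implementation, only an illustration under stated assumptions: the text has already been extracted from an OCR'd PDF by some other tool, and the names `normalize`, `tags_for_text` and `RULES` are made up for this example.

```python
import re


def normalize(text):
    """Lowercase the text and collapse whitespace runs into single spaces,
    so a phrase split across OCR line breaks still matches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def tags_for_text(text, keyword_to_tag):
    """Return the sorted tag names whose keyword occurs in the text."""
    haystack = normalize(text)
    return sorted(
        {tag for keyword, tag in keyword_to_tag.items()
         if normalize(keyword) in haystack}
    )


# Hypothetical rule table: keyword to look for -> tag to apply.
RULES = {"Lloyds Bank": "Lloyds Bank"}

# Stand-in for text pulled from an OCR'd statement.
sample = "LLOYDS\nBANK plc\nStatement of account"
print(tags_for_text(sample, RULES))
```

The sketch only decides which tags would apply; actually writing the tag to the file is left to Hazel's own tagging action.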

Re: Tagging PDFs using OCR Thu Mar 26, 2020 10:37 am • by Bane
I'm sure Mr_Noodle is quite busy, so I'll try to help by sharing some of the pitfalls I ran into. If this is your first time doing this, you'll probably run into a lot of little "support" things that make it seem like Hazel isn't working right. To be honest, though, I haven't had to script much for OCR processing in most renaming/filing type of work.

* First, make sure you have good OCR software. This was my #1 issue. If Hazel can't read it, it can't act on it. I migrated to downloading most of my receipts/PDFs where possible for OCR'ing, and this boosted Hazel's performance by a TON.

* Check that it works: make a rule with "Contents - contain match - pick a letter" and "preview" it on one of your OCR'd documents. Click the check mark that appears next to the rule, then click the three dots ("...") that Hazel displays; it will show you what it matched and what else it sees (this should be a more visible field in Hazel, IMO). It's a really quick way to verify the text you're trying to match. If you're filing by account number 12345 but the OCR shows 1245 or 12845 due to bad scans, this will let you see why Hazel isn't working.

* As software updates have come out, I've had to re-OCR old files because the newer versions do a better job of scanning (just a tip).
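The account number example above (12345 read as 1245 or 12845) can also be checked outside Hazel. Here is a rough Python sketch using the standard library's `difflib`. It assumes you have already copied the OCR text out of the document somehow; the function name `find_ocr_near_misses` and the 0.75 cutoff are my own choices for illustration, not anything Hazel provides.

```python
import difflib
import re


def find_ocr_near_misses(expected, ocr_text, cutoff=0.75):
    """Return tokens in ocr_text that look like `expected` but are not
    identical. An empty list means either the exact value is present
    or nothing similar was found."""
    tokens = set(re.findall(r"\w+", ocr_text))
    if expected in tokens:
        return []  # exact match present, nothing to diagnose
    # get_close_matches ranks candidates by similarity, best first.
    return difflib.get_close_matches(expected, sorted(tokens), n=5, cutoff=cutoff)


# A bad scan that dropped one digit of the account number.
print(find_ocr_near_misses("12345", "Account 1245 statement"))
```

If this returns a near miss, the scan probably needs to be re-OCR'd rather than the rule rewritten.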

A LOT of my issues and headaches with Hazel came down to simply needing to feed it legible files. For example, I have to manually save an Excel file at work as a CSV for Hazel to read it. But after that, Hazel saves me 10 minutes per file, since it can scan and process each one and simply rename it back to the proper extension when it's finished.
OCR is no different, and the "before" process is where you'll spend most of your time.

Hope any part of this helps you!

Re: Tagging PDFs using OCR Tue Mar 31, 2020 12:26 pm • by Yak_Forger
Great, that's the explanation I needed, it'll make my job much faster! And since I'm overdue on a report about these Cannes property prices, I'll put that knowledge to good use... right now!
