Can Hazel save the OCR'd version of a PDF file?


Moderator: Mr_Noodle

Can Hazel save the OCR'd version of a PDF file? Mon Nov 18, 2024 4:19 pm • by smm
I know that Hazel 6 can now OCR a file on the fly for use with "Contents contain" or "Contents contain match", but can the OCR'd text itself be saved to a file? Or better, can Hazel create a PDF with the on-the-fly OCR result embedded back into the file? If not, can someone provide a simple AppleScript to "Export" with the "Embed Text" option set?
smm
 
Posts: 9
Joined: Sun Oct 25, 2015 10:28 am

Hazel cannot do this at this time. It's a bit tricky because usually there's the expectation that the text is also placed where the original text is so you can visually select it and such. Also, I'm using Apple's PDF engine and it doesn't support some things and it's unclear to me if I re-save a PDF whether certain things will get stripped out as a result, which some users may not appreciate. That's something I'll need to do a bit of research on.
Mr_Noodle
Site Admin
 
Posts: 11865
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Mr_Noodle wrote: Hazel cannot do this at this time. It's a bit tricky because usually there's the expectation that the text is also placed where the original text is so you can visually select it and such. Also, I'm using Apple's PDF engine and it doesn't support some things and it's unclear to me if I re-save a PDF whether certain things will get stripped out as a result, which some users may not appreciate. That's something I'll need to do a bit of research on.



Or maybe it's possible to integrate this? https://github.com/ocrmypdf/OCRmyPDF?tab=readme-ov-file

Or could it be called from Hazel? If so, how?
nicolasbulb
 
Posts: 3
Joined: Sat Nov 10, 2007 1:58 pm

Possibly. I'd have to look into it, but since it does its own OCR, its results may differ from the OCR I'm already using. Alternatively, I could use it for all OCR operations, but then I'd need to see whether it performs better or worse than Apple's engine.

Lastly, there's the issue of changing a file while evaluating it. Checking a file's contents would result in it being modified, which may be unexpected depending on how the rules are set up.
Mr_Noodle
Site Admin
 
Posts: 11865
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City
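
On the question above of calling OCRmyPDF from Hazel: one possible approach, not something confirmed in this thread, is to run it from a rule's script action rather than from a condition, so the file is only rewritten when the rule fires and not while it is being evaluated. The sketch below is a hypothetical helper, not a Hazel feature. It assumes OCRmyPDF is installed, that `ocrmypdf` is on the PATH the script runs with, and that the script receives the matched file's path as its first argument. The function names `build_command` and `ocr_in_place` are made up for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical helper: embed OCR text in a PDF using the ocrmypdf CLI.

Assumptions (not from the thread): ocrmypdf is installed and on PATH,
and the PDF's path arrives as the first command-line argument.
"""
import os
import shutil
import subprocess
import sys
import tempfile


def build_command(src, dst, sidecar=None):
    """Return the ocrmypdf argument list for OCR'ing src into dst."""
    cmd = ["ocrmypdf", "--skip-text"]  # skip pages that already contain text
    if sidecar:
        cmd += ["--sidecar", sidecar]  # also save the recognized text as plain text
    cmd += [src, dst]
    return cmd


def ocr_in_place(pdf_path, keep_text=True, run=subprocess.run):
    """OCR into a scratch file, then replace the original only on success."""
    # The sidecar lands next to the PDF, e.g. scan.pdf -> scan.txt.
    sidecar = os.path.splitext(pdf_path)[0] + ".txt" if keep_text else None
    with tempfile.TemporaryDirectory() as workdir:
        tmp = os.path.join(workdir, "ocr.pdf")
        # check=True raises if ocrmypdf exits with an error,
        # so the original file is left untouched in that case.
        run(build_command(pdf_path, tmp, sidecar), check=True)
        shutil.move(tmp, pdf_path)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        ocr_in_place(sys.argv[1])
```

Because the output is written to a scratch directory first, a failed OCR run should leave the original PDF as it was. The sidecar `.txt` file is written into the same folder as the PDF, so a rule watching that folder would probably need a condition that ignores it, and another that keeps the rule from firing again on the rewritten PDF.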

