
Can Hazel save the OCR'd version of a PDF file?

Posted: Mon Nov 18, 2024 4:19 pm
by smm
I know that Hazel 6 can now OCR a file on the fly for use with "Contents contain" or "Contents contain match", but can the OCR'd text itself be saved to a file? Or better - can Hazel create a PDF with the on-the-fly OCR result embedded back into the file? If not, can someone provide a simple AppleScript to "Export" with the "Embed Text" option set?

Re: Can Hazel save the OCR'd version of a PDF file?

Posted: Tue Nov 19, 2024 9:46 am
by Mr_Noodle
Hazel cannot do this at this time. It's a bit tricky because usually there's the expectation that the text is also placed where the original text is so you can visually select it and such. Also, I'm using Apple's PDF engine and it doesn't support some things and it's unclear to me if I re-save a PDF whether certain things will get stripped out as a result, which some users may not appreciate. That's something I'll need to do a bit of research on.

Re: Can Hazel save the OCR'd version of a PDF file?

Posted: Thu Dec 05, 2024 3:51 pm
by nicolasbulb
Mr_Noodle wrote:Hazel cannot do this at this time. It's a bit tricky because usually there's the expectation that the text is also placed where the original text is so you can visually select it and such. Also, I'm using Apple's PDF engine and it doesn't support some things and it's unclear to me if I re-save a PDF whether certain things will get stripped out as a result, which some users may not appreciate. That's something I'll need to do a bit of research on.



Or maybe it's possible to integrate this? https://github.com/ocrmypdf/OCRmyPDF?tab=readme-ov-file

Or could it be called from Hazel? If so, how?

Re: Can Hazel save the OCR'd version of a PDF file?

Posted: Fri Dec 06, 2024 10:29 am
by Mr_Noodle
Possibly. I'd have to look into it but since it does its own OCR, it may differ from the OCR I'm already using. Or I can use it for all OCR operations but then I'll need to see if it performs worse or better than Apple's engine.

Lastly, there's the issue of changing a file while evaluating it. Checking a file's contents would result in it being modified, which may be unexpected depending on how the rules are set up.
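
For anyone who wants to try the external route in the meantime, here is a rough sketch of calling OCRmyPDF from a script that a Hazel rule could run. It is not a built-in Hazel feature and makes several assumptions: that `ocrmypdf` is installed and on the PATH Hazel uses, that the rule's "Run shell script" action passes the matched file as the first argument, and that the `.ocr.pdf` / `.ocr.txt` names (made up for this example) suit your setup. It writes a new file rather than modifying the original, which sidesteps the changed-while-evaluating concern above.

```python
import shutil
import subprocess
import sys
from pathlib import Path


def output_paths(src: Path) -> tuple[Path, Path]:
    """Hypothetical naming scheme: scan.pdf -> scan.ocr.pdf and scan.ocr.txt."""
    return (
        src.with_name(src.stem + ".ocr.pdf"),
        src.with_name(src.stem + ".ocr.txt"),
    )


def build_command(src: Path, dest: Path, sidecar: Path) -> list[str]:
    """Build the ocrmypdf command line for one PDF."""
    return [
        "ocrmypdf",
        "--skip-text",              # leave pages that already have text alone
        "--sidecar", str(sidecar),  # also save the recognized text as plain text
        str(src),                   # input PDF (never modified)
        str(dest),                  # output PDF with the text layer embedded
    ]


def ocr_copy(src: Path) -> int:
    """Run ocrmypdf on src, writing the result next to it. Returns an exit code."""
    if shutil.which("ocrmypdf") is None:
        # Hazel's environment may not include the directory ocrmypdf lives in.
        print("ocrmypdf not found on PATH", file=sys.stderr)
        return 1
    dest, sidecar = output_paths(src)
    return subprocess.run(build_command(src, dest, sidecar)).returncode


if __name__ == "__main__" and len(sys.argv) == 2:
    # Assumption: the matched file's path arrives as the first argument.
    sys.exit(ocr_copy(Path(sys.argv[1])))
```

If the output lands in the same monitored folder, the rule would probably need a condition excluding names ending in `.ocr.pdf` (or the script could write to a different folder), otherwise Hazel may pick up the new file and process it again.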