Suggested Feature: Built in OCR
Posted:
Sat Jun 18, 2022 8:18 am
by jakesm
A lot of time and effort goes into scripts to call applications like PDFpen and now Nitro to do OCR on PDFs. I find that mine stops working from time to time until I restart Hazel.
It would be great to have built-in OCR available as a Hazel feature. I'd pay extra for the convenience.
Re: Suggested Feature: Built in OCR
Posted:
Mon Jun 20, 2022 9:14 am
by Mr_Noodle
Thanks for the suggestion. I do not have any expertise in OCR tech so I would have to rely on third party libraries (I know about Tesseract, which is an open source/free one). Keep in mind that should I integrate that, I am responsible for supporting it. I don't know how well any of these third party libraries would stack up against their commercial equivalents but if they fall significantly short, then that might be problematic.
Re: Suggested Feature: Built in OCR
Posted:
Mon Aug 15, 2022 1:17 pm
by SmplNerd
Mr_Noodle wrote: Thanks for the suggestion. I do not have any expertise in OCR tech so I would have to rely on third party libraries (I know about Tesseract, which is an open source/free one). Keep in mind that should I integrate that, I am responsible for supporting it. I don't know how well any of these third party libraries would stack up against their commercial equivalents but if they fall significantly short, then that might be problematic.
Maybe this would be enough?
Re: Suggested Feature: Built in OCR
Posted:
Tue Aug 16, 2022 9:21 am
by Mr_Noodle
It's unclear how well the Vision APIs would work for larger and more complex documents. All the cases where I've seen it used seem to be for simple images. Dealing with something like text with multiple spans and columns may be more complex and would require some sort of extra logic that an actual OCR solution has baked in.
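To illustrate the kind of extra logic meant here, below is a minimal sketch of column-aware reading order. It is not how the Vision APIs or any particular OCR engine work; the `Box` type, the `reading_order` function, the `column_gap` threshold, and the coordinate convention (x grows rightward, y grows downward) are all assumptions made for the example. It assumes some recognizer has already returned text fragments with positions.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """A recognized text fragment with the position of its top-left corner (hypothetical type)."""
    text: str
    x: float  # left edge
    y: float  # top edge, growing downward


def reading_order(boxes, column_gap=20.0):
    """Group boxes into columns by left edge, then read each column top to bottom."""
    columns = []  # each entry is a list of boxes sharing roughly the same left edge
    for box in sorted(boxes, key=lambda b: b.x):
        # Stay in the current column while the left edge is within column_gap
        # of that column's leftmost box; otherwise start a new column.
        if columns and box.x - columns[-1][0].x <= column_gap:
            columns[-1].append(box)
        else:
            columns.append([box])

    ordered = []
    for column in columns:
        ordered.extend(sorted(column, key=lambda b: b.y))
    return [b.text for b in ordered]
```

A plain top-to-bottom sort would interleave lines from side-by-side columns; grouping by left edge first keeps each column together. Real layouts (headings that span columns, tables, rotated text) would likely need considerably more than this.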
Re: Suggested Feature: Built in OCR
Posted:
Mon Dec 11, 2023 10:28 pm
by BreakAndMoldMe
Reviving an old thread. The Vision API has proven both fast and accurate even on large documents, all hardware-accelerated by Apple Silicon. There are two apps in the macOS App Store that use it and charge money: OwlOCR and Textify. I have yet to purchase either (just testing). But OwlOCR seems to be able to add the OCR text layer without recompressing the image layers.
I too would prefer OCR built into Hazel…proven app dev process and trustworthy developer.
Re: Suggested Feature: Built in OCR
Posted:
Tue Dec 12, 2023 9:43 am
by Mr_Noodle
Thanks for the feedback. I'm definitely considering this for the next major release. Note that it would be "read only" in that it will do OCR on the fly, but not actually write out the text layer to the PDF.
Re: Suggested Feature: Built in OCR
Posted:
Tue Dec 19, 2023 11:45 am
by hal
I've been using ABBYY FineReader (for ScanSnap) on PDFs for a while, and recently noticed that Apple's built-in OCR seems to do a better job. It doesn't automatically write the text layer to the PDF though. You have to export from Preview.app to a new PDF with the option selected to include the text layer. I haven't found a way to automatically (AppleScript, Automator, Shortcuts) save the text layer to the source PDF, but it would be nice.
Re: Suggested Feature: Built in OCR
Posted:
Tue Dec 19, 2023 12:32 pm
by Mr_Noodle
I don't think Hazel will save the text layer as it would be a weird side effect to a condition on a rule. As it's planned now, it will be on-the-fly OCR. Nonetheless, it is under consideration.
Re: Suggested Feature: Built in OCR
Posted:
Thu Dec 28, 2023 11:12 am
by BreakAndMoldMe
Hmm that is a tough one... changing files as a condition to a rule.
From a CPU resources perspective...if Hazel is going to do the processing anyway, it might as well offer the option to save it to the file.
Otherwise, depending on one's workflow, the same document could be OCR'd up to 3 times: once on-the-fly by ScanSnap Home for file naming purposes (can be turned off), once by one's preferred OCR app, and once on-the-fly by Hazel.
It could be more efficient if Hazel added the option to save the text to file.
With that said, I did email the ScanSnap Home folks and asked them to consider changing their OCR software to Apple's native (more accurate and faster). They passed the request on to the developers. Might be good for any other ScanSnap Home users to do the same!
Finally, I hadn't noticed Preview's ability to export with "Embed Text"...nice find. I'll also start digging around to see if there is a way to automate that export option.
Re: Suggested Feature: Built in OCR
Posted:
Sat Apr 06, 2024 6:19 pm
by Steveu75
It would be great if there was a standard way to test whether a document has already been OCR'd.
Right now I check whether the document contains an a, an e, or an I.
I have provided feedback to Apple about the possibility of including OCR status in the metadata.
I have been struggling to get this working reliably for over five years. I would pay extra to have that feature built into Hazel V6.
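The letter check described above can be sketched roughly as follows. This is only an illustration of the heuristic, not a Hazel feature or an Apple API: the function name `looks_ocred` and the `min_hits` parameter are made up for the example, and it assumes the PDF's text has already been extracted into a string by some other tool.

```python
import re


def looks_ocred(extracted_text, min_hits=1):
    """Return True if the text contains at least min_hits of the letters a, e, or I.

    extracted_text is whatever text an external tool pulled out of the PDF;
    an image-only scan with no text layer would typically yield an empty string.
    """
    hits = re.findall(r"[aeI]", extracted_text)
    return len(hits) >= min_hits
```

Raising `min_hits` makes the check stricter, which may help with scans that carry only a stray character or two of embedded text. Like the original check, it can still misjudge documents that legitimately contain only digits or symbols.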
Re: Suggested Feature: Built in OCR
Posted:
Mon Apr 08, 2024 8:57 am
by Mr_Noodle
What do you plan to do based on whether the file is OCRed or not? Will consider adding some sort of attribute for this but not sure how reliable it will be.
Re: Suggested Feature: Built in OCR
Posted:
Thu Apr 18, 2024 4:44 pm
by aljjspam
Mr_Noodle wrote: What do you plan to do based on whether the file is OCRed or not? Will consider adding some sort of attribute for this but not sure how reliable it will be.
I'd love this.
I have many PDFs that I'm not sure are OCRed.
The number of apps using Apple's API has exploded. It's much better than many other engines.
I'm using OwlOCR, but it lacks many things, like automation.