Determine if a PDF needs to be OCR'd & Automate FineReader

Please help !

Please do not keep posting to bump the thread.

I am not sure what the poster is asking for here. It seems to me that there is a misunderstanding of what the original script is doing. It is meant to determine if a PDF needs to be OCR'd, not to serve as a general means of searching for text. Hazel already has built-in support for that, so no script is needed. I suggest you go that route and, if you can't get it to work, post your own question with specifics in the Support forum.

Thanks, Mr_Noodle, for your answer, and sorry for bumping this thread. I did so because my request seemed simple and I felt I was close to a solution without quite being able to find it!

Nevertheless, I did my best to find out how Hazel can do this automatically without a script, but with no success!
I am a newbie with Hazel, and searching other threads led me to use the script mentioned in Cassady's post: http://www.noodlesoft.com/forums/viewtopic.php?f=4&t=2035&p=8618&hilit=ocr#p8618.

Here is my goal:
Find, in a folder containing PDF documents, those (and only those) that require OCR, and label them yellow.

Here is what I do (using a script):

Code: Select all
if (all) of the following conditions are met:
    kind is pdf
    passes shell script (embedded script)
    color label is not yellow
Do the following to the matched file or folder:
    set color label yellow

Where the embedded script is :

Code: Select all
a=$(grep -ci "encoding" "$1")
if [ x$a = x ]; then exit 1; fi
if [ $a -lt 2 ]; then
    exit 0
else
    exit 1
fi
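For readers who want to reuse this check, here is a minimal sketch of the same heuristic wrapped in a function with quoted variables. The function name needs_ocr is hypothetical (it is not part of Hazel or of the script above), and the "fewer than 2 lines mentioning encoding" threshold is simply carried over from the script, not a guaranteed test for OCR.

```shell
#!/bin/bash

# Hypothetical helper (the name is an assumption, not a Hazel feature).
# Returns 0 when the file looks like it still needs OCR, 1 otherwise.
needs_ocr() {
    local count
    # grep -c prints the number of lines that match; -i ignores case.
    count=$(grep -ci "encoding" "$1" 2>/dev/null)
    # An empty count means grep could not read the file: treat as "no match".
    [ -n "$count" ] || return 1
    # Same threshold as the script above: fewer than 2 matching lines.
    [ "$count" -lt 2 ]
}
```

In an embedded Hazel script, the last line would then be needs_ocr "$1", so that the function's status becomes the script's exit status.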

It seems to be working now, but my question is how to do this without a script, as you mentioned in your last post.
I also have another question, not related to this topic: where in Hazel is the "for (the file or folder being processed)" pop-up in "if (all) of the following conditions are met for (the file or folder being processed)"? I am using Hazel 3.2.1.

Thanks in advance

I was referring more to the part about coloring the file. That part can be done using Hazel's built-in actions so I wasn't sure what the point was for the post you were responding to.

At this point, I guess I'm unclear as to what the question is since it seems you have it working.

As for your other question, that pop-up only shows up in nested conditions, which you can get by holding down the option key while clicking the + button to create a new condition.

Thanks !

Everything is OK for me now!

fuank wrote:Nevermind, I just found out how:

Code: Select all
#! /bin/bash
if ! grep Font "$1"
then
    exit 0
else
    exit 1
fi

Wow, easy setup! I put the above script in the Conditions field, and ran the OP's script for FineReader on matches.

Hazel 3.2.3
FineReader 4.1

Works like a freaking charm! Here's a screenshot: https://www.dropbox.com/s/wivb2mcnzohwj ... %20126.png

I found that the way of detecting OCRed PDF files posted in the OP doesn't really work for a lot of PDFs.

I use this approach, which seems to work atm. You need poppler (http://poppler.freedesktop.org/), which can also be installed via Homebrew.

Code: Select all
#! /bin/bash
if [ `pdffonts "$1" | grep Type | sed -n '$='` ] # FAIL when the file is OCRed
then
    exit 0
else
    exit 1
fi

Depending on the kind of rule, you just need to switch the exit statements. This one would be for a rule which doesn't proceed if the PDF file already has OCR information.
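The exit-status switch can be sketched as two small functions, one for each kind of rule. This is only an illustration: the function names are hypothetical, and it assumes (as in the earlier scripts in this thread) that exit status 0 makes the Hazel condition pass and that pdffonts from poppler is on the PATH.

```shell
#!/bin/bash

# Succeeds (status 0) when pdffonts-style output on stdin lists at least one font.
# Font rows carry a type such as "Type 1" or "TrueType"; the header row spells
# "type" in lowercase, so a case-sensitive grep for "Type" skips it.
lists_fonts() {
    grep -q Type
}

# Hypothetical wrapper: status 0 when the PDF already has fonts (likely OCRed).
pdf_has_text() {
    pdffonts "$1" 2>/dev/null | lists_fonts
}

# The switched version: status 0 when no fonts are found (likely needs OCR).
# Caveat: a missing pdffonts binary or an unreadable file also ends up here.
pdf_needs_ocr() {
    ! pdf_has_text "$1"
}
```

Ending an embedded script with pdf_needs_ocr "$1" would give a rule that matches files still needing OCR, while pdf_has_text "$1" gives the opposite rule.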

dredhorse wrote: I use this approach, which seems to work atm. You need poppler (http://poppler.freedesktop.org/), which can also be installed via Homebrew.

Don,

Can you translate a bit and further explain why you would need to install Poppler? Thanks!

Bryan

pdffonts is part of poppler.

I haven't had a lot of time to check more PDFs atm; the PDFs I created myself seem to work better with pdffonts than with the other approaches that were posted here.

I have been searching for a way to do largely what the original poster described. I think that with the current version of ABBYY FineReader for ScanSnap (which, unfortunately, I don't think you can upgrade to without purchasing a new scanner) it can be made a little simpler.

My version of Scan to Searchable PDF (4.1) does not append anything to the file's name. Additionally it has a preference to close after scanning and delete converted documents. Because of this the bash script can become just a few lines:

Code: Select all
#! /bin/bash
# set ABBYY to close after scanning and to delete converted documents
if ! grep Font "$1"
then
    # Open the file in ABBYY's FineReader
    open -W -a '/Applications/ABBYY FineReader for ScanSnap/Scan to Searchable PDF.app' "$1"
fi

The -W flag in the open command tells it to wait until the program exits, so there is no need to try to figure out when the conversion is done. I then use this AppleScript to send it to DEVONthink:

Code: Select all
tell application id "DNtp"
    set theRecord to ocr file theFile to incoming group
end tell

hey, Thanks for the workflow.

dredhorse wrote: I found that the way of detecting OCRed PDF files posted in the OP doesn't really work for a lot of PDFs.

I use this approach, which seems to work atm. You need poppler (http://poppler.freedesktop.org/), which can also be installed via Homebrew.

Code: Select all
#! /bin/bash
if [ `pdffonts "$1" | grep Type | sed -n '$='` ] # FAIL when the file is OCRed
then
    exit 0
else
    exit 1
fi

Depending on the kind of rule, you just need to switch the exit statements. This one would be for a rule which doesn't proceed if the PDF file already has OCR information.

Thank you. This solved my problem for good.