Determine if a PDF needs to be OCR'd & Automate FineReader

Please help!
Loyd
 
Posts: 9
Joined: Fri Nov 15, 2013 5:42 pm

Please do not keep posting to bump the thread.

I am not sure what the poster is asking for here. It seems to me that there is a misunderstanding of what the original script does. It is meant to determine whether a PDF needs to be OCRed, not to serve as a general means of searching for text. Hazel already has built-in support for text searching, so no script is needed for that. I suggest you go that route and, if you can't get it to work, post your own question with specifics in the Support forum.
Mr_Noodle
Site Admin
 
Posts: 12051
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Thanks for your answer, Mr_Noodle, and sorry for bumping this thread. I felt that my request was simple and that I was close to the solution without quite finding it!

Nevertheless, I did my best to find out how Hazel can do this automatically without a script, but with no success!
I am a newbie with Hazel, and searching other threads led me to the script mentioned in Cassady's post: http://www.noodlesoft.com/forums/viewtopic.php?f=4&t=2035&p=8618&hilit=ocr#p8618

Here is my purpose:
In a folder containing PDF documents, find those (and only those) that require OCR, and label them yellow.

Here is what I do (using a script):

Code: Select all
if (all) of the following conditions are met :
     kind is pdf
     passes shell script (embedded script)
     color label is not yellow

Do the following to the matched file or folder:
     set color label yellow
     


Where the embedded script is :
Code: Select all
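# Count the lines that mention "encoding" (case-insensitive)
a=$(grep -ci "encoding" "$1")
# grep printed nothing (e.g. the file could not be read): do not match
if [ -z "$a" ]; then
    exit 1
fi
# Fewer than two hits: treat the PDF as needing OCR, so the rule matches
if [ "$a" -lt 2 ]; then
    exit 0
else
    exit 1
fi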
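
To preview which files a rule like this would label before switching it on, the same heuristic can be run over a whole folder from Terminal. This is only a sketch: the function names `needs_ocr_by_encoding` and `list_pdfs_needing_ocr` are made up for illustration, the threshold of 2 is taken from the script above, and the `-a` flag (treat binary data as text) is an addition that the original script does not use.

```shell
#!/bin/bash
# Hypothetical helper: same heuristic as the embedded script above.
# Returns 0 (needs OCR) when fewer than 2 lines mention "encoding".
needs_ocr_by_encoding() {
    local count
    # grep exits 1 on zero matches; that is not an error here
    count=$(grep -aci "encoding" "$1" 2>/dev/null) || true
    # An empty count means grep could not read the file: report "no match"
    [ -n "$count" ] && [ "$count" -lt 2 ]
}

# Hypothetical helper: print every *.pdf in the given folder that is flagged.
list_pdfs_needing_ocr() {
    local f
    for f in "$1"/*.pdf; do
        [ -f "$f" ] || continue   # skip the literal pattern if nothing matched
        if needs_ocr_by_encoding "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

Calling `list_pdfs_needing_ocr ~/Scans` (the folder name is just an example) prints one path per line and changes nothing, so it is safe to run before letting Hazel apply labels.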


It seems to work now, but my question is: how can I do it without a script, as you mentioned in your last post?
I also have another question, not related to this topic: where in Hazel is the "for (the file or folder being processed)" part of "if (all) of the following conditions are met for (the file or folder being processed)"? I am using Hazel 3.2.1.

Thanks in advance
Loyd
 
Posts: 9
Joined: Fri Nov 15, 2013 5:42 pm

I was referring more to the part about coloring the file. That part can be done using Hazel's built-in actions so I wasn't sure what the point was for the post you were responding to.

At this point, I guess I'm unclear as to what the question is since it seems you have it working.

As for your other question, that pop-up only shows up in nested conditions, which you can get by holding down the option key while clicking the + button to create a new condition.
Mr_Noodle
Site Admin
 
Posts: 12051
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Thanks!

Everything is OK for me now!
Loyd
 
Posts: 9
Joined: Fri Nov 15, 2013 5:42 pm

fuank wrote: Never mind, I just found out how:

Code: Select all
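#!/bin/bash
# Match (exit 0) when the PDF contains no "Font" entry,
# which suggests it has no text layer yet.
# -q keeps grep from printing the matching lines.
if ! grep -q Font "$1"
then
     exit 0
else
     exit 1
fi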


Wow, easy setup! I put the above script in the Conditions field, and ran the OP's script for FineReader on matches.

Hazel 3.2.3
FineReader 4.1

Works like a freaking charm! Here's a screenshot: https://www.dropbox.com/s/wivb2mcnzohwj ... %20126.png
andrewgl
 
Posts: 3
Joined: Wed Aug 21, 2013 1:31 pm
Location: New Orleans, LA

I found that the method posted in the OP for detecting OCRed PDF files doesn't really work for a lot of PDFs.

I use the approach below, which seems to work so far. You need poppler (http://poppler.freedesktop.org/), which can also be installed via Homebrew.

Code: Select all
#!/bin/bash
# pdffonts (part of poppler) lists the fonts used in the PDF, one per line.
# Exit 0 when at least one font line is found, exit 1 when none is.
if pdffonts "$1" | grep -q Type
then
   exit 0
else
   exit 1
fi


Depending on the kind of rule, you just need to swap the exit statements. This one is for a rule that should not proceed if the PDF already has OCR information.
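
To make the "swap the exit statements" idea concrete, here is a sketch that separates counting the fonts from the decision. The names `count_font_rows` and `needs_ocr` are hypothetical. The sketch assumes that pdffonts prints two header lines (column names and a dashed rule) before the font rows, and that a PDF listing no fonts still needs OCR; both assumptions are worth checking against your own files.

```shell
#!/bin/bash
# Hypothetical helper: count font rows in pdffonts output read from stdin.
# Assumes the first two lines are the column header and the dashed rule.
count_font_rows() {
    awk 'NR > 2 && NF { n++ } END { print n + 0 }'
}

# Hypothetical helper: return 0 when the PDF lists no fonts at all.
# Returns 2 if pdffonts is not installed, so that case is not mistaken
# for "needs OCR".
needs_ocr() {
    local rows
    command -v pdffonts >/dev/null 2>&1 || return 2
    rows=$(pdffonts "$1" 2>/dev/null | count_font_rows)
    [ "$rows" -eq 0 ]
}
```

In a Hazel "passes shell script" condition, ending the script with `needs_ocr "$1"` matches files without fonts. Changing the final test to `[ "$rows" -gt 0 ]` gives the opposite rule, which is the same as swapping the exit statements in the script above.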
Regards

Don
dredhorse
 
Posts: 20
Joined: Mon Aug 11, 2014 4:39 pm

dredhorse wrote: I use the approach below, which seems to work so far. You need poppler (http://poppler.freedesktop.org/), which can also be installed via Homebrew.


Don,

Can you translate a bit and further explain why you would need to install Poppler? Thanks!

Bryan
Bryan
 
Posts: 25
Joined: Wed Jan 11, 2012 4:34 pm
Location: Maryland

pdffonts is part of poppler.

I haven't had much time to check more PDFs so far, but the PDFs I created myself seem to work better with pdffonts than with the other methods that were posted here.
Regards

Don
dredhorse
 
Posts: 20
Joined: Mon Aug 11, 2014 4:39 pm

I have been searching for a way to do largely what the original poster described. I think that with the current version of ABBYY FineReader for ScanSnap (which, unfortunately, I don't think you can upgrade to without purchasing a new scanner) it can be made a little simpler.

My version of Scan to Searchable PDF (4.1) does not append anything to the file's name. Additionally, it has a preference to close after scanning and delete converted documents. Because of this, the bash script can shrink to just a few lines:

Code: Select all
#!/bin/bash
# set ABBYY to close after scanning and to delete converted documents
# -q keeps grep quiet; no "Font" entry suggests the PDF has not been OCRed yet
if ! grep -q Font "$1"
then
    # Open the file in ABBYY's FineReader
    open -W -a '/Applications/ABBYY FineReader for ScanSnap/Scan to Searchable PDF.app' "$1"
fi


The -W flag in the open command tells it to wait until the program exits, so there is no need to figure out when the conversion is done. I then use this AppleScript to send it to DEVONthink:

Code: Select all
tell application id "DNtp"
   set theRecord to ocr file theFile to incoming group
end tell
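
The check-then-open step in the bash script above can be tried without launching FineReader by adding a dry-run switch. This is a sketch only: `ocr_if_needed` and the `OCR_OPEN_CMD` variable are invented for illustration and are not part of Hazel, FineReader, or macOS; the application path is the one quoted in the post.

```shell
#!/bin/bash
# Path as given in the post above; adjust it if your install differs.
APP='/Applications/ABBYY FineReader for ScanSnap/Scan to Searchable PDF.app'

# Hypothetical wrapper around the check-then-open step.
# OCR_OPEN_CMD is an assumed override hook: set it to "echo" to print
# the command instead of running it.
ocr_if_needed() {
    # A "Font" entry suggests the PDF already has a text layer: skip it.
    # -a treats the binary PDF as text, -q suppresses output.
    if grep -aq Font "$1"; then
        return 0
    fi
    # -W makes open block until the app quits; -a selects the application.
    ${OCR_OPEN_CMD:-} open -W -a "$APP" "$1"
}
```

Running `OCR_OPEN_CMD=echo ocr_if_needed scan.pdf` prints the `open` command that would be used for a file without fonts and prints nothing for a file that already has them.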
rwa
 
Posts: 1
Joined: Thu Nov 06, 2014 12:06 am

Hey, thanks for the workflow.
aliciakr
 
Posts: 1
Joined: Mon Feb 02, 2015 8:09 am

dredhorse wrote: I found that the method posted in the OP for detecting OCRed PDF files doesn't really work for a lot of PDFs.

I use the approach below, which seems to work so far. You need poppler (http://poppler.freedesktop.org/), which can also be installed via Homebrew.

Code: Select all
#!/bin/bash
# pdffonts (part of poppler) lists the fonts used in the PDF, one per line.
# Exit 0 when at least one font line is found, exit 1 when none is.
if pdffonts "$1" | grep -q Type
then
   exit 0
else
   exit 1
fi


Depending on the kind of rule, you just need to swap the exit statements. This one is for a rule that should not proceed if the PDF already has OCR information.



Thank you. This solved my problem for good.
Dellu
 
Posts: 20
Joined: Thu Dec 12, 2013 6:26 am
