Re: Determine if a PDF needs to be OCR'd & Automate FineRead

PostPosted: Sat Nov 30, 2013 4:22 pm
by Loyd
Please help !

Re: Determine if a PDF needs to be OCR'd & Automate FineRead

PostPosted: Mon Dec 02, 2013 2:55 pm
by Mr_Noodle
Please do not keep posting to bump the thread.

I am not sure what the poster is asking for here. It seems to me that there is a misunderstanding of what the original script is doing. It is meant to determine whether a PDF needs to be OCR'ed, not to serve as a general means of searching for text. Hazel already has built-in support for that, so no script is needed. I suggest you go that route and, if you can't get it to work, post your own question with specifics in the Support forum.

Re: Determine if a PDF needs to be OCR'd & Automate FineRead

PostPosted: Tue Dec 03, 2013 1:44 pm
by Loyd
Thanks, Mr_Noodle, for your answer, and sorry for bumping this thread. I felt that my request was simple and that I was close to the solution but could not quite find it!

Nevertheless, I did my best to find out how Hazel can do this automatically without a script, but with no success!
I am a newbie in Hazel, and searching other threads led me to use the script mentioned in Cassady's post http://www.noodlesoft.com/forums/viewtopic.php?f=4&t=2035&p=8618&hilit=ocr#p8618.

Here is my purpose:
In a folder containing PDF documents, find those (and only those) that require OCR, and label them yellow.

Here is what I do (using a script):

Code: Select all
if (all) of the following conditions are met :
     kind is pdf
     passes shell script (embedded script)
     color label is not yellow

Do the following to the matched file or folder:
     set color label yellow
     


Where the embedded script is :
Code: Select all
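# Count the lines that mention "encoding" (case-insensitive)
a=$(grep -ci "encoding" "$1")
# No count at all (e.g. the file could not be read): do not match
if [ -z "$a" ]; then
    exit 1
fi
# Fewer than 2 matching lines: match, so the file gets the yellow label
if [ "$a" -lt 2 ]; then
    exit 0
else
    exit 1
fi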
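
The check above can be tried from Terminal before it goes into a rule. A minimal sketch of the same count-based heuristic as a shell function; the function name `looks_unocred` and the loop over a folder are illustrative assumptions, not part of the original script:

```shell
#!/bin/bash
# looks_unocred (hypothetical name): same heuristic as the embedded script.
# Succeeds (returns 0) when fewer than 2 lines of the file mention "encoding".
looks_unocred() {
    local count
    # grep -c prints the number of matching lines; -i ignores case
    count=$(grep -ci "encoding" "$1" 2>/dev/null)
    # No count at all (e.g. unreadable file): report "no match"
    [ -n "$count" ] || return 1
    [ "$count" -lt 2 ]
}

# Usage: pass a folder to list the PDFs the rule would label
if [ -d "${1:-}" ]; then
    for pdf in "$1"/*.pdf; do
        [ -e "$pdf" ] || continue
        if looks_unocred "$pdf"; then
            echo "needs OCR: $pdf"
        fi
    done
fi
```

Saved under a name of your choice and run with a folder path as its only argument, it prints one line per PDF that the Hazel rule would label.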


Now it seems to be working, but my question is how to do it without a script, as you mentioned in your last post?
I also have another question not related to this topic: where in Hazel is the "for (the file or folder being processed)" part of "if (all) of the following conditions are met for (the file or folder being processed)"? I am using Hazel 3.2.1.

Thanks in advance

Re: Determine if a PDF needs to be OCR'd & Automate FineRead

PostPosted: Wed Dec 04, 2013 1:32 pm
by Mr_Noodle
I was referring more to the part about coloring the file. That part can be done using Hazel's built-in actions so I wasn't sure what the point was for the post you were responding to.

At this point, I guess I'm unclear as to what the question is since it seems you have it working.

As for your other question, that pop-up only shows up in nested conditions, which you can get by holding down the option key while clicking the + button to create a new condition.

Re: Determine if a PDF needs to be OCR'd & Automate FineRead

PostPosted: Wed Dec 04, 2013 3:00 pm
by Loyd
Thanks !

Everything is OK for me now!

Re: Determine if a PDF needs to be OCR'd & Automate FineRead

PostPosted: Mon Dec 23, 2013 6:37 pm
by andrewgl
fuank wrote:Nevermind, I just found out how:

Code: Select all
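#!/bin/bash
# Match (exit 0) when the PDF contains no "Font" entry, i.e. it needs OCR.
# -q keeps grep from printing the matching lines.
if ! grep -q Font "$1"
then
     exit 0
else
     exit 1
fi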


Wow, easy setup! I put the above script in the Conditions field, and ran the OP's script for FineReader on matches.

Hazel 3.2.3
FineReader 4.1

Works like a freaking charm! Here's a screenshot: https://www.dropbox.com/s/wivb2mcnzohwj ... %20126.png

Re: Determine if a PDF needs to be OCR'd & Automate FineRead

PostPosted: Fri Aug 15, 2014 7:05 am
by dredhorse
I found that the way of detecting OCRed PDF files posted in the OP doesn't really work for a lot of PDFs.

I use this approach, which seems to work at the moment. You need poppler http://poppler.freedesktop.org/, which can also be installed via Homebrew.

Code: Select all
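#!/bin/bash
# pdffonts (part of poppler) lists the fonts used in the PDF.
# Font rows contain "Type" (e.g. "TrueType"); the header row does not.
if pdffonts "$1" | grep -q Type
then
   # exit 0 when pdffonts lists at least one font
   exit 0
else
   # exit 1 when it lists none
   exit 1
fi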


Depending on the kind of rule, you just need to swap the exit statuses. This one would be for a rule which doesn't proceed if the PDF file already has OCR information.
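
For a rule that should match only files that still need OCR, the test can be inverted. A minimal sketch, assuming poppler is installed; the function name `needs_ocr` is made up for this example, and the extra PATH entries are an assumption about where Homebrew puts `pdffonts`:

```shell
#!/bin/bash
# Assumption: pdffonts may be installed outside the PATH the script runs with
PATH="$PATH:/usr/local/bin:/opt/homebrew/bin"

# needs_ocr (hypothetical name): succeeds when pdffonts lists no font rows
needs_ocr() {
    # Without poppler there is nothing to check: return a non-zero status
    command -v pdffonts >/dev/null || return 2
    # "!" negates the whole pipeline: success means grep found no "Type"
    ! pdffonts "$1" 2>/dev/null | grep -q Type
}

# In Hazel, "$1" is the file being processed
if [ -n "${1:-}" ]; then
    needs_ocr "$1"
    exit $?
fi
```

Because a missing `pdffonts` gives a non-zero status, the rule simply does not match in that case instead of sending every file to OCR.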

Re: Determine if a PDF needs to be OCR'd & Automate FineRead

PostPosted: Fri Aug 22, 2014 4:48 pm
by Bryan
dredhorse wrote: I use this approach, which seems to work at the moment. You need poppler http://poppler.freedesktop.org/, which can also be installed via Homebrew.


Don,

Can you translate a bit and further explain why you would need to install Poppler? Thanks!

Bryan

Re: Determine if a PDF needs to be OCR'd & Automate FineRead

PostPosted: Sat Aug 23, 2014 11:38 am
by dredhorse
pdffonts is part of poppler.

I haven't had much time to check more PDFs at the moment, but the PDFs I created myself seem to work better with pdffonts than with the other methods that were posted here.

Re: Determine if a PDF needs to be OCR'd & Automate FineRead

PostPosted: Thu Nov 06, 2014 12:14 am
by rwa
I have been searching for a way to do largely what the original poster described. I think with the current version of ABBYY FineReader for ScanSnap (which, unfortunately, I don't think you can upgrade to without purchasing a new scanner) it can be made a little simpler.

My version of Scan to Searchable PDF (4.1) does not append anything to the file's name. Additionally it has a preference to close after scanning and delete converted documents. Because of this the bash script can become just a few lines:

Code: Select all
#!/bin/bash
# Set ABBYY to close after scanning and to delete converted documents
# -q: only the exit status matters, so suppress grep's output
if ! grep -q Font "$1"
then
    # Open the file in ABBYY's FineReader and wait (-W) until it quits
    open -W -a '/Applications/ABBYY FineReader for ScanSnap/Scan to Searchable PDF.app' "$1"
fi


The -W flag in the open command tells it to wait until the program exits, so there is no need to try to figure out when the conversion is done. I then use this AppleScript to send it to DEVONthink:

Code: Select all
tell application id "DNtp"
   set theRecord to ocr file theFile to incoming group
end tell
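
The Font check and the `open -W` call from the bash script above can also be wrapped so that the script reports whether the file actually gained a text layer. A rough sketch, assuming the application rewrites the PDF in place as described; `ocr_if_needed` and `OCR_APP` are names made up for this example:

```shell
#!/bin/bash
# Path taken from the script above; adjust to your installation
OCR_APP='/Applications/ABBYY FineReader for ScanSnap/Scan to Searchable PDF.app'

# ocr_if_needed (hypothetical name): OCR the file only when it has no fonts
ocr_if_needed() {
    local pdf="$1"
    # Already contains a Font entry: nothing to do
    if grep -q Font "$pdf"; then
        return 0
    fi
    # -W makes open block until the application quits
    open -W -a "$OCR_APP" "$pdf" || return 1
    # Assumption: the app replaced the file in place; check that it has fonts now
    grep -q Font "$pdf"
}

# In Hazel, "$1" is the file being processed
if [ -n "${1:-}" ]; then
    ocr_if_needed "$1"
    exit $?
fi
```

A non-zero exit status then signals that the conversion did not produce a searchable file, which can keep later actions in the rule from running on it.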

Re: Determine if a PDF needs to be OCR'd & Automate FineRead

PostPosted: Mon Feb 02, 2015 8:16 am
by aliciakr
hey, Thanks for the workflow.

Re: Determine if a PDF needs to be OCR'd & Automate FineRead

PostPosted: Fri Jan 06, 2017 12:21 am
by Dellu
dredhorse wrote: I found that the way of detecting OCRed PDF files posted in the OP doesn't really work for a lot of PDFs.

I use this approach, which seems to work at the moment. You need poppler http://poppler.freedesktop.org/, which can also be installed via Homebrew.

Code: Select all
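#!/bin/bash
# pdffonts (part of poppler) lists the fonts used in the PDF.
# Font rows contain "Type" (e.g. "TrueType"); the header row does not.
if pdffonts "$1" | grep -q Type
then
   # exit 0 when pdffonts lists at least one font
   exit 0
else
   # exit 1 when it lists none
   exit 1
fi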


Depending on the kind of rule, you just need to swap the exit statuses. This one would be for a rule which doesn't proceed if the PDF file already has OCR information.



Thank you. This solved my problem for good.