Re: Determine if a PDF needs to be OCR'd & Automate FineRead
Posted: Sat Nov 30, 2013 4:22 pm
Please help!
https://www.noodlesoft.com/forums/
If (all) of the following conditions are met:
kind is pdf
passes shell script (embedded script)
color label is not yellow
Do the following to the matched file or folder:
set color label yellow
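# Count the lines that mention "encoding" (case-insensitive).
a=$(grep -ci "encoding" "$1")
# An empty result means grep could not read the file: the rule does not match.
if [ -z "$a" ]; then
    exit 1
fi
# Fewer than two hits: exit 0 so the rule matches; otherwise it does not.
if [ "$a" -lt 2 ]; then
    exit 0
else
    exit 1
fi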
fuank wrote: Never mind, I just found out how:
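#! /bin/bash
# Exit 0 (rule matches) when the string "Font" does not occur in the PDF.
# -q keeps grep quiet; only its exit status is needed.
if ! grep -q Font "$1"
then
    exit 0
else
    exit 1
fi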
#! /bin/bash
# FAIL when the file is OCRed
# sed -n '$=' prints the number of matching lines, or nothing if there are none.
if [ -n "$(pdffonts "$1" | grep Type | sed -n '$=')" ]
then
    exit 0
else
    exit 1
fi
dredhorse wrote: I use this approach, which seems to work at the moment. You need poppler (http://poppler.freedesktop.org/), which can also be installed via Homebrew.
#! /bin/bash
# set ABBYY to close after scanning and to delete converted documents
# -q keeps grep quiet; only its exit status is needed.
if ! grep -q Font "$1"
then
    # Open the file in ABBYY's FineReader and wait (-W) until the app quits
    open -W -a '/Applications/ABBYY FineReader for ScanSnap/Scan to Searchable PDF.app' "$1"
fi
tell application id "DNtp"
set theRecord to ocr file theFile to incoming group
end tell
dredhorse wrote: I found that the way of detecting OCRed PDF files posted in the OP doesn't really work for a lot of PDFs.
I use this approach, which seems to work at the moment. You need poppler (http://poppler.freedesktop.org/), which can also be installed via Homebrew.
#! /bin/bash
# FAIL when the file is OCRed
# sed -n '$=' prints the number of matching lines, or nothing if there are none.
if [ -n "$(pdffonts "$1" | grep Type | sed -n '$=')" ]
then
    exit 0
else
    exit 1
fi
Depending on the kind of rule, you just need to swap the exit codes. This one is for a rule that doesn't proceed if the PDF file already has OCR information.
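For anyone who wants the exit codes spelled out, here is a rough sketch of the same pdffonts idea. The function names count_fonts and needs_ocr are made up for this post, and it assumes pdffonts prints its usual two header lines (column names, then dashes) before one row per font. Treat it as a starting point, not a tested Hazel recipe.

```shell
#!/bin/bash
# Sketch only: count_fonts and needs_ocr are hypothetical helper names.

# Read pdffonts output on stdin and print the number of font rows.
# Assumes the first two lines are the header (column names, dashes).
count_fonts() {
    awk 'NR > 2 && NF > 0 { n++ } END { print n + 0 }'
}

# needs_ocr FILE: return 0 when pdffonts lists no fonts for FILE.
# Caveat: if pdffonts is missing or fails, the count is also 0.
needs_ocr() {
    local n
    n=$(pdffonts "$1" 2>/dev/null | count_fonts)
    [ "$n" -eq 0 ]
}

# Only act when a file argument is passed in as $1.
if [ "$#" -gt 0 ]; then
    if needs_ocr "$1"; then
        exit 0   # no fonts listed: the rule matches
    else
        exit 1   # fonts listed: the rule does not match
    fi
fi
```

Swapping the two exit lines gives the opposite kind of rule, one that matches only PDFs that already list fonts.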