I understand this is not specifically a Hazel issue, but it's all part of the workflow I have been working on for the last week.
I feel like I am 99.9% of the way to solving the issues.
I am using Hazel to automatically OCR files with PDFpenPro as they appear in a "Watch Folder".
The AppleScript I am using is this:
try
    tell application "PDFpenPro"
        open theFile as alias
        tell document 1
            ocr
            repeat while performing ocr
                delay 1
            end repeat
            delay 1
            close with saving
        end tell
    end tell
on error
    -- This captures the error so that a document isn't OCR'd ad infinitum.
    -- "document 1" only means something inside the PDFpenPro tell block.
    tell application "PDFpenPro"
        try
            tell document 1 to close
        end try
        quit
    end tell
end try
I have found that PDFpenPro's OCR is not consistent. On some PDFs it allows a pattern match; on others it misses.
However, I tried ABBYY FineReader Express, and so far it is working 100%. It seems to OCR the documents as "block text", in a far more logical manner.
So on to my question: can someone please provide an AppleScript for Hazel that converts PDF files to searchable PDFs with ABBYY FineReader Express? My AppleScript ability is about 5%.
Regards
Hans
NB: I did try the script below, but it automates a folder (as a Folder Action) rather than using Hazel. It also did not work on OS X 10.8.4, and I think I would need to amend it to work with ABBYY FineReader Express.
http://paperjammed.com/2010/01/04/automate-scansnap-ocr-process-on-your-mac-with-applescript-snow-leopard-edition/
(*
    NOTE: This script was written for Snow Leopard. It may work
    on Leopard, but I never tried it.

    This is a folder listener script that will act as a queue, receiving
    PDF files from the ScanSnap scanner and feeding them, one by one, to
    the Abbyy FineReader OCR software.

    This allows you to keep scanning while the OCR job runs in the background
    on all of the unprocessed files.

    Why do we want to do this?

    The ScanSnap Manager software does not support this by default, so
    when you scan in a file, it sends it to FineReader for OCR. You then
    must wait until FineReader finishes its work before scanning in another
    document.

    This script allows you to keep scanning without waiting for OCR.

    Installation:

    o Copy this script to:
        <home>/Library/Scripts/Folder Action Scripts
      You may have to create the "Folder Action Scripts" folder.
    o Open a Finder window and navigate to the parent folder
      of the scanned documents folder.
    o Right click (control-click) the scanned documents folder and
      choose:
        Folder Actions Setup...
    o At this point if folder actions are not enabled, you will
      likely have to enable them and add the script manually.
        - check "Enable Folder Actions"
        - Use the "+" buttons on the left and right sides to add the
          scan folder and then this script.
    o Otherwise, a list of scripts will come up. Choose this script
      from the "Choose a Script to Attach" dialog.
    o Close all windows.

    Copyright (C) 2010 Tad Harrison
*)
property ocrFileSuffix : " processed by Abbyy"
property ocrApplicationName : "Scan to Searchable PDF"
property ocrApplicationWindow : "Converting the document"
property ocrLockFileName : "OCR in Progress"
on adding folder items to this_folder after receiving added_items
    set lockFilePath to (POSIX path of (path to desktop folder as text)) & ocrLockFileName
    try
        logEvent("=== Run OCR on New Folder Items ===")
        -- Test for lockfile; exit if lockfile exists
        tell application "System Events" to set lockFileExists to exists file lockFilePath
        if lockFileExists then
            logEvent("Other script running. Exiting...")
            return
        else
            do shell script "/usr/bin/touch \"" & lockFilePath & "\""
        end if
        -- Main loop
        set moreWorkToDo to true
        repeat while moreWorkToDo
            set aFile to getNextFile(this_folder)
            if not aFile = "" then
                ocrFile(aFile)
            else
                set moreWorkToDo to false
            end if
        end repeat
        logEvent("No more work.")
        exitApp(ocrApplicationName)
    on error errorStr number errNum
        -- (The original also set an "isRunning" property here, but the
        -- script never declares one, so that line has been dropped.)
        display dialog "Error " & errNum & " while running OCR: " & errorStr
    end try
    -- Get rid of the lockfile, ignoring any errors
    try
        do shell script "/bin/rm \"" & lockFilePath & "\""
    end try
end adding folder items to
(*
Name: ocrFile
Description: Runs OCR on the next un-OCR'd file
Parameters:
aFile - the file to be OCR'd
*)
on ocrFile(aFile)
    set posixFilePath to POSIX path of aFile
    set posixOcrFilePath to getPosixOcrFilePath(posixFilePath)
    logEvent("OCR: " & posixFilePath)
    tell application ocrApplicationName to open aFile
    --
    -- Now sit in a loop checking once per second for the OCR file.
    -- Give up after five minutes. ("with timeout" only limits how long
    -- AppleScript waits for an application's reply, so the loop counts
    -- the seconds itself.)
    --
    set secondsWaited to 0
    set ocrFileExists to false
    repeat until ocrFileExists
        set ocrFileExists to posixFileExists(posixOcrFilePath)
        if ocrFileExists then
            logEvent("OCR file generated.")
            -- Wait 5 even if the file was found, to let things settle
            delay 5
        else if secondsWaited >= 300 then
            -- Raise an error so the caller's "on error" handler reports it
            error "Timed out waiting for OCR output: " & posixOcrFilePath
        else
            -- Wait a second before checking again
            delay 1
            set secondsWaited to secondsWaited + 1
        end if
    end repeat
end ocrFile
(*
    Name: appIsRunning
    Description: Determines if a particular application is running.
    Parameters:
        appName - the name of the application to be tested
    Returns: True if the application is running; otherwise False
*)
on appIsRunning(appName)
    tell application "System Events" to (name of processes) contains appName
end appIsRunning
(*
    Name: posixFileExists
    Description: Determines if a particular file exists.
    Parameters:
        posixFilePath - the POSIX path to the file
    Returns: True if the file exists; otherwise False
*)
on posixFileExists(posixFilePath)
    tell application "System Events" to exists file posixFilePath
end posixFileExists
(*
    Name: exitApp
    Description: Exits the specified app if it is running.
    Parameters:
        appName - the application name
*)
on exitApp(appName)
    if appIsRunning(appName) then
        tell application appName to quit
    end if
end exitApp
(*
    Name: getPosixOcrFilePath
    Description: Gets the OCR output filename for a given input filename.
    Parameters:
        posixFilePath - the full path to the source file
    Return: the POSIX path of the OCR output file
*)
on getPosixOcrFilePath(posixFilePath)
    set posixBaseName to do shell script ¬
        "filename=" & quoted form of posixFilePath & "; echo ${filename%\\.*}"
    set posixOcrFilePath to posixBaseName & ocrFileSuffix
    return posixOcrFilePath
end getPosixOcrFilePath
(*
Name: getNextFile
Description: Finds the next unprocessed ScanSnap PDF
Return: the file or ""
*)
on getNextFile(aFolder)
    logEvent("Getting next file...")
    set masterFileList to list folder aFolder ¬
        without invisibles
    set posixPath to POSIX path of aFolder
    repeat with i from 1 to count masterFileList
        set fileName to item i of masterFileList
        set posixFilePath to posixPath & fileName
        log posixFilePath
        --
        -- Construct a FineReader file name from our file
        --
        set posixOcrFilePath to getPosixOcrFilePath(posixFilePath)
        --
        -- See if the FineReader file we constructed exists
        --
        set ocrFileExists to posixFileExists(posixOcrFilePath)
        tell me to set fileCreator to getSpotlightInfo for "kMDItemCreator" from posixFilePath
        log ("Creator: " & fileCreator)
        if not ocrFileExists and fileCreator = "ScanSnap Manager" then
            return POSIX file posixFilePath
        end if
    end repeat
    return ""
end getNextFile
(*
Name: getSpotlightInfo
Description: Gets a named attribute from metadata for a specific file.
Parameters:
for myattribute - the name of the attribute
from myfile - the name of the file
Returns: the attribute value or "" if none found
*)
on getSpotlightInfo for myattribute from myfile
    try
        set this_kMDItemResult to ""
        tell application "Finder"
            set this_item to myfile as string
            set this_item to POSIX path of this_item
            set this_kMDItem to myattribute
            set theResult to words of (do shell script "/usr/bin/mdls -name " & this_kMDItem & " -raw -nullMarker None " & quoted form of this_item)
            log "Result: " & theResult as string
            repeat with j from 1 to number of items in theResult
                set this_kMDItemResult to this_kMDItemResult & item j of theResult as string
                if j < number of items in theResult then
                    set this_kMDItemResult to this_kMDItemResult & " "
                end if
            end repeat
        end tell
    on error
        set this_kMDItemResult to ""
    end try
    return this_kMDItemResult
end getSpotlightInfo
(*
Name: logEvent
Description: Write an event to an event log
Parameters:
themessage - the message to write to the log
*)
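on logEvent(themessage)
    set theLine to (do shell script ¬
        "date +'%Y-%m-%d %H:%M:%S'" as string) ¬
        & " " & themessage
    -- "quoted form of" keeps spaces, quotes and parentheses in the
    -- message from being interpreted by the shell
    do shell script "echo " & quoted form of theLine & ¬
        " >> ~/Library/Logs/AppleScript-events.log"
end logEvent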
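
One part of the script above that would probably need amending for FineReader Express is the output-name check in getPosixOcrFilePath, since it decides which file the script waits for. The shell step it runs can be tried on its own in Terminal. This is only a sketch under assumptions: `ocr_output_path` is a made-up helper name, the sample path is hypothetical, and the suffix is simply copied from the script's `ocrFileSuffix` property, so whether FineReader Express names its output this way would still need checking.

```shell
#!/bin/sh
# Suffix copied from the script's ocrFileSuffix property (an assumption
# for FineReader Express; adjust to whatever name the app really writes).
OCR_SUFFIX=" processed by Abbyy"

# Hypothetical helper mirroring getPosixOcrFilePath: strip the extension
# from the input path, then append the suffix.
ocr_output_path() {
    filename=$1
    # ${filename%.*} removes the shortest trailing ".something",
    # which for "scan.pdf" is the ".pdf" extension.
    printf '%s%s\n' "${filename%.*}" "$OCR_SUFFIX"
}

# Hypothetical sample path, with a space to show that quoting holds up.
ocr_output_path "/tmp/Scans/invoice 2013.pdf"
# prints: /tmp/Scans/invoice 2013 processed by Abbyy
```

Note that the result has no ".pdf" on the end, exactly as in the script, so the suffix would have to include the extension if the OCR application adds one to its output file.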