Here's the AppleScript I use. It opens each matched file in PDFpen, brings PDFpen to the front, runs the "remove OCR layer" command, waits for it to take effect, then runs OCR again if the document needs it.
The best bit, from my perspective, is that it removes the OCR layer from PDFs that are already text (a print-to-PDF of an email, for example) where some dodgy software has added one, and then only adds an OCR layer to files that actually need it (scanned documents, not "text" PDFs).
I borrowed the bulk of the script (minus the stripping of the OCR) from: https://katiefloyd.com/blog/automatically-ocr-pdfs-with-hazel-and-pdfpen-2017-edition
tell application "PDFpen"
    open theFile as alias
    -- remove OCR layer from the document
    -- this only strips the OCR, doesn't impact "real text" PDFs.
    activate
    delay 2
    tell application "System Events"
        -- This is the keyboard shortcut to remove the OCR layer
        keystroke "o" using {command down, option down, control down}
    end tell
    -- delay required for the "remove OCR layer" step to take effect;
    -- without it, testing the document will claim it doesn't need OCR
    delay 2
    -- does the document need to be OCR'd?
    if needs ocr of document 1 then
        tell document 1
            ocr
            repeat while performing ocr
                delay 1
            end repeat
            delay 1
            close with saving
        end tell
    else
        -- Scan Doc was previously OCR'd or is already a text type PDF.
        tell document 1
            close without saving
        end tell
    end if
    -- In PDFpen, when no documents are open, window 1 is "Preferences"
    -- If other documents are open, do not close the App.
    if name of window 1 is "Preferences" then
        quit
    end if
end tell
-- without this, sometimes it seems to kick off this same script with multiple matches at once
delay 2
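
If you'd rather keep this as an external script file than paste it into the rule, here's a rough sketch of how it might look. Assumptions on my part: Hazel calls a hazelProcessFile handler for external scripts, the PDFpen terms (needs ocr, performing ocr) behave the same as in the script above, and quitPDFpenIfIdle is just a name I made up for the "quit if only Preferences is left" check. Treat it as a starting point rather than something known to work on your setup.

```applescript
-- Sketch only: external-script variant of the script above.
-- Assumes Hazel's hazelProcessFile handler convention for script files.
on hazelProcessFile(theFile, inputAttributes)
	tell application "PDFpen"
		open theFile as alias
		activate
		delay 2
		tell application "System Events"
			-- keyboard shortcut to remove the OCR layer (same as above)
			keystroke "o" using {command down, option down, control down}
		end tell
		-- give the "remove OCR layer" step time to take effect
		delay 2
		if needs ocr of document 1 then
			tell document 1
				ocr
				repeat while performing ocr
					delay 1
				end repeat
				delay 1
				close with saving
			end tell
		else
			tell document 1
				close without saving
			end tell
		end if
	end tell
	-- hypothetical helper, defined below
	my quitPDFpenIfIdle()
	-- same trailing pause as the original script
	delay 2
end hazelProcessFile

-- Hypothetical helper (not in the original script): quits PDFpen only
-- when window 1 is "Preferences", i.e. no other documents are open.
on quitPDFpenIfIdle()
	tell application "PDFpen"
		if name of window 1 is "Preferences" then quit
	end tell
end quitPDFpenIfIdle
```

Pulling the quit check into its own handler means it only has to be written once, and it's easy to comment out the call if you'd rather PDFpen stayed open.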