Automate OCR and Compression with PDFpen Pro

Moderator: Mr_Noodle

Mon Apr 28, 2025 12:18 pm • by ApapAJP
I have the following script. It seems to complete only the OCR, not the compression. Can someone help?
[code]
tell application "PDFpenPro"
   -- Open the incoming file
   open theFile as alias

   -- Check if the document needs OCR
   set needsOCR to needs ocr of document 1
   if needsOCR then
      tell document 1
         -- Perform OCR
         ocr
         repeat while performing ocr
            delay 1
         end repeat
         delay 1

         -- Compress the PDF after OCR
         create optimized PDF
         delay 1

         -- Save and close
         close with saving
      end tell

      -- Quit PDFpenPro if no documents remain
      if name of window 1 is "Preferences" then
         tell application "PDFpenPro" to quit
      end if
   else
      -- Document already has text layer or OCR not needed
      tell document 1 to close without saving

      -- Quit PDFpenPro if no documents remain
      if name of window 1 is "Preferences" then
         tell application "PDFpenPro" to quit
      end if
   end if
end tell
[/code]
ApapAJP
 
Posts: 5
Joined: Tue Mar 04, 2025 9:19 pm

I found the issue. The downloaded PDFs are secured bank statements. When they are OCR'd, the account number is not picked up either. If I manually print to PDF (to remove the security) and then manually OCR the result with Foxit Pro on my Windows PC, I get the OCR text and a small file size, and Hazel picks up the account number.

Can this be done in a script (print to PDF to remove the security, then OCR, then compress)?
ApapAJP

You might want to look into Shortcuts, Automator or AppleScript for that.
Mr_Noodle
Site Admin
 
Posts: 11998
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Mr_Noodle wrote: You might want to look into Shortcuts, Automator or AppleScript for that.

It took me a while to crack this one, but I got it working. See the script below.

This removes the security from protected files with qpdf, runs OCR in PDFpenPro, and then saves the result under the original file name so Hazel can run its rules.

Code:
-- Hazel provides "theFile"
set originalFile to POSIX path of theFile

-- Skip if already processed
if originalFile ends with ".OCR.pdf" then return

-- Set temporary paths
set tempUnlockedFile to "/tmp/unlocked_" & (do shell script "uuidgen") & ".pdf"
set tempOCRFile to "/tmp/ocr_output_" & (do shell script "uuidgen") & ".pdf"

set qpdfSuccess to false

-- Try to unlock using qpdf
try
   do shell script "/opt/homebrew/bin/qpdf --decrypt " & quoted form of originalFile & " " & quoted form of tempUnlockedFile
   set qpdfSuccess to true
on error
   -- If unlock fails, proceed with original
end try

-- Choose which file to OCR
if qpdfSuccess then
   set fileToProcess to POSIX file tempUnlockedFile
else
   set fileToProcess to theFile
end if

-- Perform OCR in PDFpenPro
tell application "PDFpenPro"
   activate
   open fileToProcess
   delay 1
   
   if (count of documents) = 0 then error "Failed to open document"
   
   tell document 1
      try
         ocr
      on error errMsg
         error "OCR failed: " & errMsg
      end try
      
      -- Wait for OCR to finish
      set timeoutSeconds to 0
      repeat while performing ocr
         delay 2
         set timeoutSeconds to timeoutSeconds + 2
         if timeoutSeconds > 900 then error "OCR timed out"
      end repeat
      
      -- Save the OCR'd document to the temp path
      save in (POSIX file tempOCRFile)
      close saving no
   end tell
   
   if (count of documents) = 0 then quit
end tell

-- Replace the original file with the OCR'd version
do shell script "mv -f " & quoted form of tempOCRFile & " " & quoted form of originalFile

-- Clean up
if qpdfSuccess then
   do shell script "rm -f " & quoted form of tempUnlockedFile
end if
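
The script above unlocks and OCRs the file but leaves out the compression step from the original question. One possible way to add it, sketched here as a shell function and not as anything PDFpen Pro provides, is to let qpdf recompress the file losslessly and keep the result only if it is actually smaller. The function name `compress_pdf` and the `QPDF_BIN` override are made up for this illustration. The default path is the Homebrew location already used above. Expect modest gains on scanned pages, since these options do not downsample or lossily re-encode images.

```shell
#!/bin/sh
# Sketch only: lossless recompression with qpdf, kept only if it shrinks the file.
# compress_pdf and QPDF_BIN are hypothetical names introduced for this example.

compress_pdf() {
    src=$1
    qpdf_bin=${QPDF_BIN:-/opt/homebrew/bin/qpdf}
    tmp=$(mktemp "${TMPDIR:-/tmp}/compressed_XXXXXX") || return 1

    # qpdf exits 0 on success and 3 on success with warnings;
    # any other status is treated as a failure here.
    status=0
    "$qpdf_bin" --object-streams=generate --recompress-flate --compression-level=9 \
        "$src" "$tmp" || status=$?
    if [ "$status" -ne 0 ] && [ "$status" -ne 3 ]; then
        rm -f "$tmp"
        return 1
    fi

    # Compare byte counts; arithmetic expansion strips the padding wc adds on macOS.
    old_size=$(( $(wc -c < "$src") ))
    new_size=$(( $(wc -c < "$tmp") ))

    if [ "$new_size" -gt 0 ] && [ "$new_size" -lt "$old_size" ]; then
        # Smaller result: replace the original in place.
        mv -f "$tmp" "$src"
    else
        # No gain: leave the original untouched.
        rm -f "$tmp"
    fi
}
```

From the AppleScript, something like this could run after the final `mv`, for example by saving the function in a script file and calling it through `do shell script` with `quoted form of originalFile` as the argument.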
ApapAJP

