Problem with rule to OCR a PDF using PDFPen?


Problem with rule to OCR a PDF using PDFPen? Wed Nov 10, 2021 6:29 am • by MacOCD
Is anyone successfully using a Hazel rule to perform OCR on PDFs using the latest versions of Hazel & PDFPenPro?

I've been using the following script on my old Mac to OCR PDFs using PDFPenPro for years:

Code:
tell application "PDFpenPro"
   open theFile as alias
   tell document 1
      ocr
      repeat while performing ocr
         delay 1
      end repeat
      delay 1
      close with saving
   end tell
end tell

Old Mac
OS: 10.12.6
PDFPenPro v10.2.4
Hazel v4.4.5

On my new Mac I have:

New Mac
OS 11.5.2
PDFPenPro v13.1
Hazel v5.1

The script now fails, although it still works on my old Mac.

PDFPen has been sold to Nitro, and support there is now virtually non-existent.

Thanks,
Mark.
MacOCD
 
Posts: 44
Joined: Fri Sep 26, 2014 11:02 am

I suggest trying the script outside of Hazel to see if it works there. I'm not sure what the state of things is with PDFpen, but you might want to check whether they have active forums.
Mr_Noodle
Site Admin
 
Posts: 11193
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Hello,

I'm having virtually the same issue and the same script with PDFPenPro. In my case I have:

Hazel 5.1.2
macOS 12.4
PDFPenPro 12.2.3


The script has been working fine since my update to Monterey back on 12.0 or 12.1. Using Hazel 5.x the whole time. I haven't been able to identify any change prior to Hazel failing to launch PDFPenPro to run OCR.

I can run the script directly through Script Editor and it works precisely as intended (except that I have to specify what the "theFile" alias points to). I have uninstalled and reinstalled Hazel, and used tccutil to reset the PPPC permissions for Hazel and the Automation (Apple Events) permissions category, but the issue persists.

When I attempt to run the script through Hazel on the same PDF I get OSAScriptErrorNumberKey = "-1708" in this output in the log while in debug mode:

Code:

2022-06-25 21:38:58.471 86Z3GCJ4MF.com.noodlesoft.HazelHelper[94721] DEBUG: Thread 0x600003e60c00: Run worker for folder: /Users/[RedactedToShare]/Library/Mobile Documents/com~apple~CloudDocs/Archive/Scans
2022-06-25 21:38:58.497 hazelworker[97749] Running worker (v5.1.2) for folder with identifier: 16777220-2972009.
2022-06-25 21:38:58.498 hazelworker[97749] ###main load address: 0x107464000
2022-06-25 21:38:58.498 hazelworker[97749] ###Hazel Core load address: 0x10773f000
2022-06-25 21:38:58.498 hazelworker[97749] ###Noodle load address: 0x107a60000
2022-06-25 21:38:58.498 hazelworker[97749] ###CK load address: 0x10761f000
2022-06-25 21:38:58.510 hazelworker[97749] DEBUG: Program is licensed.
2022-06-25 21:38:58.533 hazelworker[97749] DEBUG: Error reading file /Users/[RedactedToShare]/Library/Application Support/Firefox/prefs.js: Error Domain=NSCocoaErrorDomain Code=260 "The file “prefs.js” couldn’t be opened because there is no such file." UserInfo={NSFilePath=/Users/[RedactedToShare]/Library/Application Support/Firefox/prefs.js, NSUnderlyingError=0x60000220bae0 {Error Domain=NSPOSIXErrorDomain Code=2 "No such file or directory"}}
2022-06-25 21:38:58.535 hazelworker[97749] DEBUG: Could not find entry for default_directory in Chrome preference file.
2022-06-25 21:38:58.538 hazelworker[97749] Processing folder Scans (forced)
2022-06-25 21:38:58.538 hazelworker[97749] DEBUG: Pausing to wait for things to settle down.
2022-06-25 21:39:00.538 hazelworker[97749] DEBUG: Processing directories: (
    "/Users/[RedactedToShare]/Library/Mobile Documents/com~apple~CloudDocs/Archive/Scans"
)
2022-06-25 21:39:00.572 86Z3GCJ4MF.com.noodlesoft.HazelHelper[94721] DEBUG: Checking events for path /Users/[RedactedToShare]/Library/Mobile Documents/com~apple~CloudDocs/Archive/Scans, folder Scans
2022-06-25 21:39:00.572 hazelworker[97749] DEBUG: About to process directory /Users/[RedactedToShare]/Library/Mobile Documents/com~apple~CloudDocs/Archive/Scans
2022-06-25 21:39:00.574 hazelworker[97749] DEBUG: .DS_Store: File is hidden/invisible. Skipping.
2022-06-25 21:39:00.574 hazelworker[97749][PREDICTION] DEBUG: Calculating fire time - predicate: labelColor ==[cd] 7 result: 0
2022-06-25 21:39:00.574 hazelworker[97749][PREDICTION] DEBUG: Next fire time: 4000-12-31 19:00:00.000
2022-06-25 21:39:00.574 hazelworker[97749][PREDICTION] DEBUG: Bail out: AND predicate
2022-06-25 21:39:00.574 hazelworker[97749][PREDICTION] DEBUG: Predicted fire time for file: /Users/[RedactedToShare]/Library/Mobile Documents/com~apple~CloudDocs/Archive/Scans/Screen Shot 2022-06-25 at 1.23.30 AM.pdf and rule Do Not Run on Gray Label: 4000-12-31 19:00:00.000 Should poll: 0
2022-06-25 21:39:00.613 hazelworker[97749][PREDICTION] DEBUG: Calculating fire time - predicate: typeObject isType: "com.adobe.pdf" result: 1
2022-06-25 21:39:00.613 hazelworker[97749][PREDICTION] DEBUG: Next fire time: 4000-12-31 19:00:00.000
2022-06-25 21:39:00.614 hazelworker[97749][PREDICTION] DEBUG: Calculating fire time - predicate: tags hazelDoesNotContainObjects: {"OCR"} result: 1
2022-06-25 21:39:00.614 hazelworker[97749][PREDICTION] DEBUG: Next fire time: 4000-12-31 19:00:00.000
2022-06-25 21:39:00.614 hazelworker[97749][PREDICTION] DEBUG: Predicted fire time for file: /Users/[RedactedToShare]/Library/Mobile Documents/com~apple~CloudDocs/Archive/Scans/Screen Shot 2022-06-25 at 1.23.30 AM.pdf and rule Apply OCR to PDFs: 4000-12-31 19:00:00.000 Should poll: 0
2022-06-25 21:39:00.614 hazelworker[97749] Screen Shot 2022-06-25 at 1.23.30 AM.pdf: Rule Apply OCR to PDFs matched.
2022-06-25 21:39:00.614 hazelworker[97749] DEBUG: New rule signature. Executing actions.
Old signatures: (
)
New Signature:{typeObject isType: "com.adobe.pdf" AND tags hazelDoesNotContainObjects: {"OCR"}}:{(applescript:,{
})(addtag:(
    OCR,
    Scans
),{
})(continue:,{
})}
2022-06-25 21:39:00.666 hazelworker[97749] [Error] AppleScript failed: Error executing AppleScript on file /Users/[RedactedToShare]/Library/Mobile Documents/com~apple~CloudDocs/Archive/Scans/Screen Shot 2022-06-25 at 1.23.30 AM.pdf.
2022-06-25 21:39:00.666 hazelworker[97749] OSAScript error: {
    OSAScriptErrorNumberKey = "-1708";
}
2022-06-25 21:39:00.666 hazelworker[97749] DEBUG: Tapping error retry sequence
2022-06-25 21:39:00.667 hazelworker[97749] DEBUG: Writing out DB file for /Users/[RedactedToShare]/Library/Mobile Documents/com~apple~CloudDocs/Archive/Scans to path: /Users/[RedactedToShare]/Library/Application Support/Hazel/16777220-2972009.hazeldb
2022-06-25 21:39:00.668 hazelworker[97749] DEBUG: Directory /Users/[RedactedToShare]/Library/Mobile Documents/com~apple~CloudDocs/Archive/Scans processed in 0.095322 seconds
2022-06-25 21:39:00.668 86Z3GCJ4MF.com.noodlesoft.HazelHelper[94721] DEBUG: Checking events for path /Users/[RedactedToShare]/Library/Mobile Documents/com~apple~CloudDocs/Archive/Scans, folder Scans
2022-06-25 21:39:00.668 hazelworker[97749] Received abort event.
2022-06-25 21:39:00.668 hazelworker[97749] DEBUG: Sleeping
2022-06-25 21:39:02.679 hazelworker[97749] DEBUG: About to process directory /Users/[RedactedToShare]/Library/Mobile Documents/com~apple~CloudDocs/Archive/Scans
2022-06-25 21:39:02.679 hazelworker[97749] DEBUG: Directory /Users/[RedactedToShare]/Library/Mobile Documents/com~apple~CloudDocs/Archive/Scans processed in 0.000502 seconds
2022-06-25 21:39:02.680 86Z3GCJ4MF.com.noodlesoft.HazelHelper[94721] DEBUG: Checking events for path /Users/[RedactedToShare]/Library/Mobile Documents/com~apple~CloudDocs/Archive/Scans, folder Scans
2022-06-25 21:39:02.680 hazelworker[97749] Received abort event.
2022-06-25 21:39:02.680 hazelworker[97749] DEBUG: Sending metrics to scheduler. Next scheduled run: 4000-12-31 19:00:00.000
2022-06-25 21:39:02.680 hazelworker[97749] Done processing folder Scans
2022-06-25 21:39:02.680 86Z3GCJ4MF.com.noodlesoft.HazelHelper[94721] DEBUG: Received metrics for folder /Users/[RedactedToShare]/Library/Mobile Documents/com~apple~CloudDocs/Archive/Scans: {
    directoryDepth = 0;
    requestedSchedulingTime = "4001-01-01 00:00:00 +0000";
    triggerPaths = "<NoodlePathSet: 0x600002b6e2a0>\n";
    unavailablePaths = "{(\n)}";
}
2022-06-25 21:39:02.680 86Z3GCJ4MF.com.noodlesoft.HazelHelper[94721] DEBUG: Timer scheduled for folder /Users/[RedactedToShare]/Library/Mobile Documents/com~apple~CloudDocs/Archive/Scans at 4001-01-01 00:00:00 +0000
2022-06-25 21:39:02.682 86Z3GCJ4MF.com.noodlesoft.HazelHelper[94721] DEBUG: Thread 0x600003e60c00: Task removed: [97749]



Tested with different PDFs and still the same results. PDFPenPro 12 opens and OCRs them without issue if done manually, or by running the AppleScript with the alias manually specified like so: set theFile to alias "Users:[RedactedToShare]:Desktop:Screen Shot 2022-06-25 at 7.03.13 PM.pdf"

Any ideas on what could be the issue? There have been no updates to PDFPenPro on my Mac, and I don't recall this issue starting right after the macOS 12.4 update or a recent Hazel update.
MasterRee
 
Posts: 19
Joined: Mon Jan 01, 2018 8:21 pm

Is the script embedded or external?
Mr_Noodle
Site Admin
 
Posts: 11193
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

The script is external. I have been running it that way for years, but I think I also tested it as an embedded script during my diagnostics. I’ll test as embedded later today and let you know for certain.
MasterRee
 
Posts: 19
Joined: Mon Jan 01, 2018 8:21 pm

The script runs properly when run as an embedded script. I was prompted to allow Hazel to control PDFPenPro as expected, and then it ran smoothly. Could there be some issue with the TCC framework not granting proper control to Hazel, or perhaps I missed a prompt to allow Hazel to run the script externally? There are no unchecked items in the Privacy prefs for anything related to Hazel or the Script Editor. I've also reset Hazel, the Script Editor, and PDFPenPro with tccutil and granted their permissions again when prompted, to try to rule out that issue.

Any thoughts on what could be the issue? I can run it as embedded if that is the only way I can use AppleScript moving forward, but I would much rather have a centralized, external file I can edit to update multiple instances of the script in Hazel simultaneously.


EDIT:

I also just tried creating the script from scratch as a new file (I copied and pasted the script text into a new file rather than duplicating the file, to avoid any file permissions issues), then deleted the rule and recreated it manually. Running the new rule gives the same error from Hazel when it tries to execute the external script.
MasterRee
 
Posts: 19
Joined: Mon Jan 01, 2018 8:21 pm

Check the manual. External scripts require that you use a specific handler. It's implicit for embedded scripts.
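
For anyone finding this later, here is a minimal sketch of what that wrapper looks like in an external script file. The handler name and its single parameter are the ones used in the working script further down this thread; the body is only a placeholder, and the manual remains the reference for the exact signature your Hazel version expects.

```applescript
-- Sketch of an external script for a Hazel "Run AppleScript" action.
-- Hazel calls this handler and hands over the matched file as theFile.
on hazelProcessFile(theFile)
	-- Coerce to an alias, as the scripts in this thread do.
	set targetFile to theFile as alias
	-- ... the actual work on targetFile goes here ...
end hazelProcessFile
```

In an embedded script the same body can be written without the `on ... end` lines, since the handler is implicit there.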
Mr_Noodle
Site Admin
 
Posts: 11193
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Mr_Noodle wrote:Check the manual. External scripts require that you use a specific handler. It's implicit for embedded scripts.


Thanks! Is that a recent change in Hazel? I am 95% certain that I only had the external script in place as described above, without the additional handlers since Hazel 4. Either way, I will try that "radical" idea of reading and following the instructions before bothering anyone further. 8)
MasterRee
 
Posts: 19
Joined: Mon Jan 01, 2018 8:21 pm

It's been like that ever since AppleScript support was introduced years ago.
Mr_Noodle
Site Admin
 
Posts: 11193
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Well, then I guess I'm just a little nuttier (or at least more forgetful) than I thought. :oops: Anyhow, thanks for your help. Everything is working properly now that I have the correct AppleScript handlers in place.

Here is the script I am using (handlers included) in case this will be helpful for anyone else. This script checks if a PDF needs to be OCR'd, OCR's the PDF if needed, then closes PDFPenPro if the only document open in PDFPenPro is the target PDF. I believe I pulled this script without the handlers from this forum post: https://talk.automators.fm/t/automatica ... pen-pro/23

Code:
on hazelProcessFile(theFile)
   tell application "PDFpenPro"
      open theFile as alias
      -- does the document need to be OCR'd?
      get the needs ocr of document 1
      if result is true then
         tell document 1
            ocr
            repeat while performing ocr
               delay 1
            end repeat
            delay 1
            close with saving
         end tell
         --In PDFpen, when no documents are open, window 1 is "Preferences"
         --If other documents are open, do not close the App.
         if name of window 1 is "Preferences" then
            tell application "PDFpenPro"
               quit
            end tell
         end if
      else
         -- Scan Doc was previously OCR'd or is already a text type PDF.
         tell document 1
            close without saving
         end tell
         --In PDFpen, when no documents are open, window 1 is "Preferences"
         --If other documents are open, do not close the App.
         if name of window 1 is "Preferences" then
            tell application "PDFpenPro"
               quit
            end tell
         end if
      end if
   end tell
   delay 2
end hazelProcessFile
MasterRee
 
Posts: 19
Joined: Mon Jan 01, 2018 8:21 pm

Hazel 5.1 OSX: Big Sur 11.6.8

Being the OP of this thread, and having OCR'd files very happily since, I'm now finding that after upgrading to Big Sur the scripts, including MasterRee's alternatives, aren't working for me.

I'm getting "-1708 - the AppleEvent was not handled by any handler"

I Googled a modified script elsewhere but get the same error:

Code:
tell application "PDFpenPro"
   open theFile as alias
   -- does the document need to be OCR'd?
   get the needs ocr of document 1
   if result is true then
      tell document 1
         ocr
         repeat while performing ocr
            delay 1
         end repeat
         delay 1
         close with saving
      end tell
      --In PDFpen, when no documents are open, window 1 is "Preferences"
      --If other documents are open, do not close the App.
      if name of window 1 is "Preferences" then
         tell application "PDFpenPro"
            quit
         end tell
      end if
   else
      -- Scan Doc was previously OCR'd or is already a text type PDF.
      tell document 1
         close without saving
      end tell
      --In PDFpen, when no documents are open, window 1 is "Preferences"
      --If other documents are open, do not close the App.
      if name of window 1 is "Preferences" then
         tell application "PDFpenPro"
            quit
         end tell
      end if
   end if
end tell


Apparently it works for others, but not for me.

Could this be a security permissions issue on my Mac?
MacOCD
 
Posts: 44
Joined: Fri Sep 26, 2014 11:02 am

Did you include the handler lines that are present in my script example above? Those lines are not in your own example. The script must start with
Code:
on hazelProcessFile(theFile)
and end with
Code:
end hazelProcessFile
to tell Hazel how to handle the AppleScript. Check out my conversation with Mr. Noodle above and the link he includes to the user guide where it is explained in more detail, but adding those lines should suffice.
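
If it helps to see where -1708 comes from, here is a small self-contained sketch you can run in Script Editor. It does not involve Hazel or PDFpenPro at all; `bareScript` is a made-up name, and treating this as the same mechanism Hazel hits with a handler-less external script is my assumption, not something confirmed by the Hazel docs.

```applescript
-- A script object that, like a bare external script, defines no
-- hazelProcessFile handler. The name bareScript is hypothetical.
script bareScript
	-- intentionally empty
end script

try
	-- Ask the script object to handle a file, roughly as a caller would.
	tell bareScript to hazelProcessFile("dummy.pdf")
on error errMsg number errNum
	-- Show the AppleScript error number and message for the failed call.
	display dialog "Error " & errNum & ": " & errMsg
end try
```

Adding an `on hazelProcessFile(theFile)` handler inside the script object makes the call succeed, which is the same fix as wrapping the external script.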
MasterRee
 
Posts: 19
Joined: Mon Jan 01, 2018 8:21 pm

Re: Problem with rule to OCR a PDF using PDFPen? Mon Sep 05, 2022 12:27 pm • by MacOCD
I'm using your updated handler version of the script...

I download a PDF of my phone bill. I have a Hazel rule that should spot certain words in the PDF file to know it is a Phone Bill & run a rule accordingly.

When the bill is downloaded from my supplier, Hazel cannot detect the 'Contains' & 'Contains Match' details in the PDF (account number, provider, bill date & amount).

If I manually OCR the file with PDFPen (v13.1) the information is then detected by Hazel and the file is processed correctly. The Supplier, account number, date and amount are all detected.

If I use this PDFPen applescript to preprocess the file I can see PDFPen opening, but Hazel still can't detect any of the information I require once the script has completed.

It's all very odd.

Thanks for your help so far.
MacOCD
 
Posts: 44
Joined: Fri Sep 26, 2014 11:02 am

Are you certain that the PDF doesn’t already have a native text layer? This script will only run OCR if PDFPenPro detects that the PDF does not already have text. PDFs with text will open in PDFPen, and then it will close them without making any changes.

You can try removing the handler code lines from the script and then manually running the script on a PDF that you expect needs to be OCR’d (make sure it doesn’t already have editable text or it will behave as I described).
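
As an alternative to stripping the handler lines, here is a rough diagnostic sketch that keeps the handler and adds a test-only `on run` block for Script Editor. It only reports what PDFPenPro says about the file and saves nothing. The `needs ocr` property is the one used in my script above; the variable names and the dialog are just for illustration, and I'd keep this as a separate test file rather than pointing a Hazel rule at it.

```applescript
-- Diagnostic version: report whether PDFpenPro thinks the file needs OCR.
on hazelProcessFile(theFile)
	tell application "PDFpenPro"
		open theFile as alias
		set ocrNeeded to needs ocr of document 1
		-- Close again without changing the file.
		tell document 1
			close without saving
		end tell
	end tell
	return ocrNeeded
end hazelProcessFile

-- Test-only entry point: runs when the script is run in Script Editor.
on run
	set testFile to choose file of type {"com.adobe.pdf"} with prompt "Pick a PDF to check"
	set ocrNeeded to hazelProcessFile(testFile)
	display dialog "PDFpenPro reports needs ocr: " & (ocrNeeded as text)
end run
```

If the dialog reports false for the phone bill, the script above is skipping OCR on that file by design.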
MasterRee
 
Posts: 19
Joined: Mon Jan 01, 2018 8:21 pm

MasterRee wrote: Are you certain that the PDF doesn’t already have a native text layer? This script will only run OCR if PDFPenPro detects that the PDF does not already have text. PDFs with text will open in PDFPen, and then it will close them without making any changes.

You can try removing the handler code lines from the script and then manually running the script on a PDF that you expect needs to be OCR’d (make sure it doesn’t already have editable text or it will behave as I described).


If it does have a text layer it’s one that Hazel can’t access. I’ll have a good try later. Everything has worked perfectly for years before my (enforced) Big Sur update.

I guess I need a script that’ll replace an existing OCR layer if it exists.
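
Something like the following might be a starting point: it is simply the original script from the first post wrapped in the handler, so it requests OCR every time instead of checking `needs ocr` first. Whether PDFPenPro actually replaces an existing text layer when asked to OCR such a file is an assumption that still needs testing.

```applescript
-- Sketch: always request OCR, regardless of what "needs ocr" reports.
on hazelProcessFile(theFile)
	tell application "PDFpenPro"
		open theFile as alias
		tell document 1
			-- No "needs ocr" check here, unlike the conditional script.
			ocr
			-- Wait until PDFpenPro reports that OCR has finished.
			repeat while performing ocr
				delay 1
			end repeat
			delay 1
			close with saving
		end tell
	end tell
end hazelProcessFile
```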
MacOCD
 
Posts: 44
Joined: Fri Sep 26, 2014 11:02 am
