PDFPen Pro v9 OCR Remove


Moderator: Mr_Noodle

PDFPen Pro v9 OCR Remove Wed May 03, 2017 10:28 pm • by Bane
Hello,
For a while I have used Hazel to OCR my documents and move them into other folders, etc. Over time I used some not-so-great tools to OCR files, and I finally found and settled on PDFpen Pro. Unfortunately, that left me with older files that have horrible OCR layers. The new PDFpen Pro will remove those old layers and let me do a clean OCR, but I'd like to do this with an embedded AppleScript in Hazel if possible. Unfortunately, I can't figure out the command to delete the existing OCR layer from a file.

Can any of you super smart people help with a script that simply calls PDFpen Pro and removes the OCR layer?
Sorry, I'm trying to learn, but I've hit a wall and can't find the command in the script tool, or the right syntax if it's a variant of the regular "ocr" command.

Thanks!
Bane
 
Posts: 8
Joined: Thu Jul 16, 2015 1:56 pm

Re: PDFPen Pro v9 OCR Remove Thu May 04, 2017 4:02 pm • by Mr_Noodle
You might want to try Smile's support, as this seems specific to PDFpen Pro. I don't know if they have forums, but I'm sure their support can get back to you quickly.
Mr_Noodle
Site Admin
 
Posts: 11195
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: PDFPen Pro v9 OCR Remove Fri May 05, 2017 4:25 pm • by Bane
Thanks Mr Noodle,
You're right. Since I was using the "OCR and move" scripts people share here, I thought this was the place to start, but I'll ask Smile and report back in case anyone else finds this useful. Perhaps I can contribute for once ;)

Thanks!
Bane
 
Posts: 8
Joined: Thu Jul 16, 2015 1:56 pm

Re: PDFPen Pro v9 OCR Remove Fri May 26, 2017 8:32 am • by Bane
Sorry for the delay. Smile responded a while ago, and I'm adding their answer here in the hope that it helps someone else in the meantime. Unfortunately, I'm not sure how to do this in the AppleScript part of Hazel (I'm still pretty new to all of this).



"There isn't an AppleScript command to remove the OCR, but it isn't too bad to use System Events to press the keyboard shortcut. That is, this should accomplish what you're after:

tell application "System Events" to keystroke "o" using {control down, command down, option down}

as long as the document is on top and PDFpen is the active application. It's not a perfect solution but I'll forward a request onto our engineers to include an AppleScript method for this in a future version. Let me know if you have any other questions, comments or concerns."
Bane
 
Posts: 8
Joined: Thu Jul 16, 2015 1:56 pm

Re: PDFPen Pro v9 OCR Remove Fri May 26, 2017 10:47 am • by Mr_Noodle
If you use that script in Hazel, you should add a step before it that opens the document in PDFpen and brings it to the front.

Here's one thread on how to do that part: https://forums.macrumors.com/threads/us ... am.882578/ (or search around as there are a lot of AppleScript resources out there).
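A minimal sketch of that "open and bring to the front" step combined with Smile's keystroke. It assumes the script runs as a Hazel embedded AppleScript, where Hazel supplies the matched file as theFile, and that the app is named "PDFpen" (change it to "PDFpenPro" for the Pro edition). The 2-second delay is a guess, not a tested value. This only removes the OCR layer; it does not save or close the document.

```applescript
-- Assumption: running as a Hazel embedded AppleScript, where Hazel
-- supplies the matched file as theFile. To try it in Script Editor,
-- uncomment the next line:
-- set theFile to choose file of type {"com.adobe.pdf"}

-- Open the PDF and make PDFpen the frontmost application
-- (use "PDFpenPro" instead of "PDFpen" for the Pro edition)
tell application "PDFpen"
	open theFile
	activate
end tell

-- Give the document window time to appear before sending keys
-- (2 seconds is an arbitrary guess; large files may need longer)
delay 2

-- Send PDFpen's Remove OCR shortcut: Control-Option-Command-O
tell application "System Events"
	keystroke "o" using {control down, option down, command down}
end tell
```

Because the keystroke goes to whatever app is frontmost, the app running the script (Hazel, or Script Editor while testing) may need Accessibility access in System Preferences > Security & Privacy > Privacy before System Events is allowed to send it.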
Mr_Noodle
Site Admin
 
Posts: 11195
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: PDFPen Pro v9 OCR Remove Mon May 29, 2017 1:34 am • by sandcastle
Here's the AppleScript I use. It opens each file it matches in PDFpen, ensures the window is active, runs the "remove OCR layer" command, waits for it to take effect, then runs OCR again.

The best bit, from my perspective, is that it removes the OCR layer from PDFs that already contain real text (a print-to-PDF of an email, for example) in cases where some dodgy software has added one, but then only re-adds an OCR layer to files that need it (scanned documents, not "text" PDFs).

I borrowed the bulk of the script (minus the OCR stripping) from: https://katiefloyd.com/blog/automatically-ocr-pdfs-with-hazel-and-pdfpen-2017-edition

Code:
tell application "PDFpen"
   -- For PDFpen Pro, change the app name to "PDFpenPro"
   open theFile as alias
   
   -- Remove the OCR layer from the document.
   -- This only strips the OCR layer; it doesn't affect "real text" PDFs.
   activate
   delay 2
   
   tell application "System Events"
      -- Keyboard shortcut for PDFpen's Remove OCR command
      keystroke "o" using {command down, option down, control down}
   end tell
   
   -- Give the "remove OCR layer" step time to take effect; without this
   -- delay, the check below claims the document doesn't need OCR
   delay 2
   
   -- Does the document need to be OCR'd?
   get the needs ocr of document 1
   if result is true then
      tell document 1
         ocr
         -- wait until PDFpen reports that OCR has finished
         repeat while performing ocr
            delay 1
         end repeat
         delay 1
         close with saving
      end tell
   else
      -- Scanned doc was previously OCR'd or is already a text-type PDF
      tell document 1
         close without saving
      end tell
   end if
   
   -- In PDFpen, when no documents are open, window 1 is "Preferences".
   -- If other documents are still open, don't quit the app.
   if name of window 1 is "Preferences" then quit
end tell
-- Without this, Hazel sometimes seems to kick off this same script
-- for multiple matches at once
delay 2
sandcastle
 
Posts: 3
Joined: Wed Jan 14, 2015 3:46 am

Re: PDFPen Pro v9 OCR Remove Thu Jun 01, 2017 9:09 pm • by Bane
Wow, you guys are awesome.
Sorry for the delayed response (I have a 9-month-old running around), but thanks for the script. This will really help clean up old, bad OCR layers.
Thank you very much sandcastle and Mr Noodle!
Bane
 
Posts: 8
Joined: Thu Jul 16, 2015 1:56 pm

Re: PDFPen Pro v9 OCR Remove Sun Jul 09, 2017 12:39 am • by PhilM
I'm with Bane. You guys are awesome. :D
Thanks for the AppleScript, sandcastle. I just changed the application name to PDFpenPro and all was sweet.
PhilM
 
Posts: 14
Joined: Tue Jul 26, 2016 11:38 pm

Re: PDFPen Pro v9 OCR Remove Fri Jul 21, 2017 7:28 pm • by tanaquil
This is genius!! I came here looking for exactly this script and had to register on the forums just so that I could thank you. Works perfectly!
tanaquil
 
Posts: 1
Joined: Fri Jul 21, 2017 7:26 pm

Re: PDFPen Pro v9 OCR Remove Wed Aug 09, 2017 2:48 pm • by bronson
Hey Guys. Awesome script.

Every time I run it, the Hazel log shows:
2017-08-09 11:43:05.579 hazelworker[27607] ###main load address: 0x10549b000
2017-08-09 11:43:05.581 hazelworker[27607] ###Noodle load address: 0x1055bd000
2017-08-09 11:43:05.581 hazelworker[27607] ###CK load address: 0x105580000
2017-08-09 11:43:05.620 hazelworker[27607] Processing folder Test Folder (forced)
2017-08-09 11:43:07.629 hazelworker[27607] Fifth Wheel.pdf: Rule OCR Newly Added PDF matched.
2017-08-09 11:43:07.631 hazelworker[27607] Test.pdf: Rule OCR Newly Added PDF matched.
2017-08-09 11:43:07.633 hazelworker[27607] Received abort event.
2017-08-09 11:43:07.634 hazelworker[27607] Done processing folder Test Folder


I believe the "Received abort event" line is what's making it fail. Any advice?

Thanks!
bronson
 
Posts: 1
Joined: Wed Aug 09, 2017 2:44 pm

Re: PDFPen Pro v9 OCR Remove Tue Apr 16, 2019 3:06 pm • by mmandell
This worked perfectly, and it's so, so appreciated!

Thanks,

Matt
mmandell
 
Posts: 21
Joined: Fri Jun 15, 2018 10:36 am

