Hazel "beginner", OCR, Finereader and the like


Moderator: Mr_Noodle

Hazel "beginner", OCR, Finereader and the like Thu Apr 18, 2019 4:04 am • by luoto
Hi...

Well, I am very slowly getting Hazel to do basic file manipulation for me. One of my challenges, which I then thought Hazel could help with, is the Abbyy Finereader OCR program.

Essentially I want to feed my beefy desktop machine with a lot of PDFs and have it make nice OCR versions and spit them out into a given directory. The Abbyy Automator scripts don't work if you dump 20 files into a directory -- unless you want one super-file of 20 PDFs together. I tried a "dispense" workflow routine but struggle to get it to work.

I therefore wondered about Hazel. I know I am not yet ready to figure out deeper conditions, so I had even wondered about a "what's the latest file out of 20?" approach: process that file, remove it from the directory, and loop back (what's the latest of 19, what's the latest of 18, and so on).

I've looked at these threads but am struggling to see how to put everything together.

viewtopic.php?f=4&t=7865&p=22371&hilit=finereader#p22371
viewtopic.php?f=4&t=4704&p=24942&hilit=finereader#p24942

The TL;DR, if anyone is kind enough to give some specific guidance:

I would have a specific "to process" directory
I would have a specific "processed" directory
The original files can be deleted. I would have copied them from another location anyway and would manually replace them for various reasons with the OCRed version anyway.
Standard Abbyy features (English, make into OCR).

All non-standard stuff (rotate, other languages, etc) I would run manually.

The reason for making everything OCRed is to make reading easier (I am visually handicapped), but to make my life easier I first have to use my limited visual resources to struggle with an unfamiliar way of thinking (the app/programming). The bits I have learned have been great and much appreciated, but it is a very steep curve for me.

Thanks in advance for any pointers.
luoto
 
Posts: 6
Joined: Thu Apr 18, 2019 3:54 am

To get Finereader to OCR, you will need an AppleScript or Automator action to do that. You can try searching the forums here or on their site.

Is it imperative that you process the files in a certain order? If not, then you shouldn't worry about that part at all as it complicates things.
Mr_Noodle
Site Admin
 
Posts: 8217
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Mr_Noodle wrote: To get Finereader to OCR, you will need an AppleScript or Automator action to do that. You can try searching the forums here or on their site.


Thank you for approving the topic and for the prompt response. In the threads I quoted, I saw some script examples, but to me they seemed to contradict each other or confuse matters. No particular processing order is needed. Essentially, I want to be able to dump files in one directory and have them OCRed and output elsewhere, with the originals deleted (to avoid reprocessing).

I could not see a way for Hazel to look at one file in the "import directory" and send just that to Finereader, since my understanding from the script was that you had to select the file (by name) to send it away. When I tried with Automator, it wanted to do all the files at once, and it tried to merge them in the OCR application even though I had not selected "merge".

The goal would be to move over, say, 50 files and come back sometime later to "refeed" the monster.

Is it correct that Hazel works on a file-by-file basis? That is, even if it suddenly discovers five files added since its last check (sorry, I don't know how it "alerts itself" to new files), would it still run its rule once per file, rather than make five different OCR attempts or pass five files to the OCR application at once? If this assumption is correct, I am guessing the approach is to make a "watch folder" and, on detection of a file (file type PDF), run the script that hopefully exports the OCR elsewhere. Hazel can then move or delete the "source" file and automatically loop to the next "alert", and so on.
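A minimal sketch of the "export elsewhere" part of that flow, assuming the script is handed one file at a time and needs a destination path inside the "processed" directory. The handler name outputPathFor and the folder locations are hypothetical; they are not part of Hazel or Finereader. The returned path would be passed to whatever OCR export command is used.

```applescript
-- Hypothetical helper: map a file in the "to process" folder to the same
-- file name inside the "processed" folder.
on outputPathFor(theFile, processedFolder)
   set sourcePath to POSIX path of theFile
   -- Split the path on "/" to pick out the file name
   set oldDelims to AppleScript's text item delimiters
   set AppleScript's text item delimiters to "/"
   set fileName to last text item of sourcePath
   set AppleScript's text item delimiters to oldDelims
   -- Make sure the folder path ends with a slash before appending
   if not (processedFolder ends with "/") then
      set processedFolder to processedFolder & "/"
   end if
   return processedFolder & fileName
end outputPathFor

-- Example call with assumed folder locations
outputPathFor(POSIX file "/Users/me/ToProcess/scan.pdf", "/Users/me/Processed")
```

Writing the result to a different folder keeps the source file untouched, so a later step can move or delete the original without affecting the OCRed copy.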

All part of the learning experience, which no doubt feels more daunting than its reality!

EDIT: Well, that went a LOT smoother than I thought, or at least the files processed so far have. Thank you once more.

On with the refinements and the steep (for me) learning curve.
luoto
 

luoto wrote: EDIT: Well that went a LOT smoother than I thought, or at least the files processing so far are. Thank you, once more.
On with the refinements and the steep (for me) learning curve.


So did it work out for you?
I would recommend handling the file moves with tags: after the ABBYY FineReader script has processed the file, add a tag "OCRed", and then have a separate rule in Hazel that says: if a file has this tag, move the file.

Here is the script I am using with ABBYY FineReader, which works perfectly well:

on hazelProcessFile(theFile)
   
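   -- Export settings, written with FineReader's own terminology.
   -- As written, these variables are not referenced by the export command below.
   using terms from application "FineReader"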
      set langList to {German, English}
      set saveType to single file
      set exportmodepdflayout to "text over the page image"
      set keepPageNumberHeadersAndFootersBoolean to yes
      set keepImageBoolean to yes
      set imageOptionsImageQualityEnum to balanced quality
      set usemrcboolean to no
      set makepdfaboolean to yes
      set pageSizePageSizeEnum to automatic
      set increasePaperSizeToFitContentBoolean to yes
   end using terms from
   
   tell application "FineReader"
      -- Source and destination are both theFile, so the export targets the original file's path
      export to pdf theFile from file theFile
   end tell
   
   WaitWhileBusy()
   
   tell application "FineReader"
      quit
   end tell
   
end hazelProcessFile

on WaitWhileBusy()
   -- Poll once per second instead of spinning in a tight loop
   repeat while IsMainApplicationBusy()
      delay 1
   end repeat
end WaitWhileBusy

on IsMainApplicationBusy()
   tell application "FineReader"
      set resultBoolean to is busy
   end tell
   return resultBoolean
end IsMainApplicationBusy
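
A possible variant of the wait loop, offered as a sketch rather than a tested fix: it stops waiting after a time limit so that a stalled OCR run does not hold up the rule indefinitely. The handler name WaitWhileBusyWithTimeout and the maxSeconds parameter are assumptions; the "is busy" check is taken unchanged from the script above.

```applescript
-- Variant of WaitWhileBusy that gives up after maxSeconds.
-- Returns true if FineReader became idle, false if the time limit was reached.
on WaitWhileBusyWithTimeout(maxSeconds)
   set elapsed to 0
   repeat while IsMainApplicationBusy()
      if elapsed >= maxSeconds then return false
      delay 1
      set elapsed to elapsed + 1
   end repeat
   return true
end WaitWhileBusyWithTimeout

-- Same check as in the script above, repeated so this sketch stands alone
on IsMainApplicationBusy()
   tell application "FineReader"
      set resultBoolean to is busy
   end tell
   return resultBoolean
end IsMainApplicationBusy
```

The caller can test the returned boolean and decide whether to quit FineReader or leave the file for a later attempt.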
Robert
 
Posts: 52
Joined: Sun Dec 16, 2018 8:05 am

Robert wrote: So did it work out for you?

Thank you for the follow-up.
I got it working for me by running an Automator script I found online (i.e. run the match, run the Automator script, repeat).

I have looked at your AppleScript. I made a directory-monitoring condition, but when I pasted the script into Hazel it did not like it, complaining: "Expected "end" but found "on"". After compiling it in Apple's Script Editor and pointing Hazel to that file, it *seems* to be happy, but I did not get it to write the tags. Is this a problem with Hazel or AppleScript? In any case, the same text was copied into Script Editor and compiled without an issue.

EDIT2: I don't know if Hazel somehow queued a lot of "to-do" tasks the "old" way, despite my deselecting that rule and adding the new one. I had to stop and start it properly (rather than pause). Testing will go on, as I've had a few oddities now, but no more updates tonight!
Last edited by luoto on Fri Apr 19, 2019 10:50 am, edited 1 time in total.
luoto
 

To answer your previous question, yes, Hazel processes files one-by-one.

His script is meant to be an external script. I forget if Script Editor is installed on non-developer systems but if it is, copy that script into a document there, compile and save it as a .scpt file which you can then reference from Hazel via the "Run AppleScript" action.

If you don't have Script Editor, you can paste it into a regular text document in TextEdit and save it as a .applescript file. Hazel would then have to compile the script on the fly every time.
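
For reference, a minimal sketch of what such an external script file can look like. It assumes the hazelProcessFile(theFile) handler used earlier in the thread is what Hazel calls for each matched file; the logLine helper and the log path are made up for illustration. The second "on ... end" handler is the part that appears to need the external form: the "Expected "end" but found "on"" error reported above is consistent with pasted text being treated as the body of a handler already, though that is an inference and not something confirmed in this thread.

```applescript
-- External script file: Hazel calls this handler once per matched file
on hazelProcessFile(theFile)
   set sourcePath to POSIX path of theFile
   logLine("processing " & sourcePath)
   -- The OCR step for this one file would go here
end hazelProcessFile

-- Hypothetical helper handler, defined beside the main handler.
-- Appends one line to a log file (assumed path) for later inspection.
on logLine(msg)
   do shell script "echo " & quoted form of msg & " >> /tmp/hazel-ocr.log"
end logLine
```

Saved from Script Editor as a .scpt file, this can be selected in the "Run AppleScript" action as described above.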
Mr_Noodle

Hi-

I have been trying to solve this for a while, and unfortunately have not been able to do so.

I notice references within the thread to an automator script found online, but I don't know what or where this is.

Would you be so kind, if you made this work to your liking, as to lay out the steps you have taken? What script are you using, and how do you set Hazel to use the script on one file at a time?

Thanks!
sawbones
 
Posts: 14
Joined: Wed Jul 03, 2013 9:29 pm

Sorry for the delay. I've been in hospital for nearly six weeks (organ transplant) and am slowly catching up. If the request was aimed at me, this is my solution so far. The only issue is Finereader (latest) still likes to crash randomly and I've found no way to restart and (auto) ignore the recovery suggestion. Sometimes it also tries to grab several files at once and make one big one.

Image
luoto
 

Hi, luoto-

I certainly hope everything is going great with your health, and greatly appreciate the fact you took time out to respond. Thanks!

If I might ask, could you lay out how you do this in Hazel? I have a similar Automator action, but the issue for me is how to get the workflow going through Hazel.

Cheers!
sawbones
 

Thanks for the kind thoughts. I am far from being an expert, but this workflow worked for me.

Image

The bigger issue is the bugfest that is FineReader :)
luoto
 

Unfortunately, I cannot see the workflow you reference in your Hazel process. Is it possible to outline that Automator workflow?

Cheers!
sawbones
 

