
Hazel "beginner", OCR, Finereader and the like

Posted: Thu Apr 18, 2019 4:04 am
by luoto
Hi...

Well, I am very slowly getting Hazel to do basic file manipulation for me. One of my challenges, which I then thought Hazel could help with, involves the Abbyy Finereader OCR program.

Essentially I want to feed my beefy desktop machine a lot of PDFs and have it make nice OCR versions and spit them out into a given directory. The Abbyy Automator scripts don't work if you dump 20 files into a directory, unless you want one super-file of 20 PDFs merged together. I tried a "dispense" workflow routine but struggled to get it to work.

So I wondered about Hazel. I know I am not yet ready to figure out deeper conditions, so I had even wondered about a loop: find the latest file out of 20, process it, remove it from the directory and loop back (the latest of the remaining 19, then of 18, and so on).

I've looked at these threads but am struggling to see how to put everything together.

viewtopic.php?f=4&t=7865&p=22371&hilit=finereader#p22371
viewtopic.php?f=4&t=4704&p=24942&hilit=finereader#p24942

The TL;DR, if anyone is kind enough to give some specific guidance:

I would have a specific "to process" directory
I would have a specific "processed" directory
The original files can be deleted. I would have copied them from another location, and for various reasons I would manually replace those with the OCRed versions anyway.
Standard Abbyy features (English, make into OCR).

All non-standard stuff (rotation, other languages, etc.) I would run manually.

The reason for making everything OCRed is to make reading easier (I am visually handicapped), but to make my life easier I first have to spend my limited visual resources struggling with an unfamiliar way of thinking (the app/programming). What I have learned so far has been great and much appreciated, but it is a very steep curve for me.

Thanks in advance for any pointers.

Re: Hazel "beginner", OCR, Finereader and the like

Posted: Thu Apr 18, 2019 10:26 am
by Mr_Noodle
To get Finereader to OCR, you will need an AppleScript or Automator action to do that. You can try searching the forums here or on their site.

Is it imperative that you process the files in a certain order? If not, then you shouldn't worry about that part at all as it complicates things.

Re: Hazel "beginner", OCR, Finereader and the like

Posted: Thu Apr 18, 2019 10:52 am
by luoto
Mr_Noodle wrote: To get Finereader to OCR, you will need an AppleScript or Automator action to do that. You can try searching the forums here or on their site.


Thank you for approving the topic and for the prompt response. In the threads I quoted I saw some script examples, but to me they seemed contradictory or confusing. No particular processing order is needed. Essentially I want to be able to dump files in one directory and have them OCRed and output elsewhere, with the originals deleted (to avoid reprocessing).

I could not see how to get Hazel to look at one file in the "import directory" and send just that file to Finereader, since my understanding of the script was that you had to select the file (by name) to send it off. When I tried with Automator, it wanted to do all the files at once, and it tried to merge them in the OCR application even though I had not selected "merge".

The goal would be to move over say 50 files and come back sometime later to "refeed" the monster.

Is it correct that Hazel works on a file-by-file basis? That is, even if it suddenly discovers five files added since its last check (sorry, I don't know how it "alerts itself" to new files), it would still run its rule on each file separately, rather than making five OCR attempts at once (or passing five files to the OCR application)? If this assumption is correct, I am guessing the approach is to make a "watch folder": on detection of a file (file type PDF), run the script that hopefully exports the OCR version elsewhere. Hazel can then move/delete the "source" file and automatically loop to the next "alert", and so on.

All part of the learning experience, which no doubt feels more daunting than it really is!

EDIT: Well, that went a LOT smoother than I thought, or at least the files processed so far have. Thank you, once more.

On with the refinements and the steep (for me) learning curve.

Re: Hazel "beginner", OCR, Finereader and the like

Posted: Fri Apr 19, 2019 6:50 am
by Robert
luoto wrote: EDIT: Well that went a LOT smoother than I thought, or at least the files processing so far are. Thank you, once more.
On with the refinements and the steep (for me) learning curve.


So did it work out for you?
I would recommend handling the moving of the files with tags: after the ABBYY FineReader script has processed a file, add a tag "OCRed", and then have a separate rule in Hazel that says: if a file has this tag, move the file.

Just to add, here is the script I am using with ABBYY FineReader; it works perfectly well:

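-- External script for Hazel's "Run AppleScript" action:
-- Hazel calls hazelProcessFile once for each matching file.
on hazelProcessFile(theFile)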
   
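   -- FineReader export settings. As posted, these variables are only
   -- assigned here; they are not passed to the export command below.
   using terms from application "FineReader"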
      set langList to {German, English}
      set saveType to single file
      set exportmodepdflayout to "text over the page image"
      set keepPageNumberHeadersAndFootersBoolean to yes
      set keepImageBoolean to yes
      set imageOptionsImageQualityEnum to balanced quality
      set usemrcboolean to no
      set makepdfaboolean to yes
      set pageSizePageSizeEnum to automatic
      set increasePaperSizeToFitContentBoolean to yes
   end using terms from
   
   -- OCR the file and export it as a PDF (the export target given here
   -- is theFile itself).
   tell application "FineReader"
      export to pdf theFile from file theFile
   end tell
   
   -- Wait until FineReader has finished before quitting it.
   WaitWhileBusy()
   
   tell application "FineReader"
      quit
   end tell
   
end hazelProcessFile

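on WaitWhileBusy()
   -- Poll FineReader until it is no longer busy; the one-second delay
   -- stops the loop from spinning the CPU between checks.
   repeat while IsMainApplicationBusy()
      delay 1
   end repeat
end WaitWhileBusy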

on IsMainApplicationBusy()
   tell application "FineReader"
      set resultBoolean to is busy
   end tell
   return resultBoolean
end IsMainApplicationBusy

Re: Hazel "beginner", OCR, Finereader and the like

Posted: Fri Apr 19, 2019 8:27 am
by luoto
Robert wrote: So did it work out for you?

Thank you for the follow-up.
I got it working by running an Automator script I found online (i.e. run the match, run the Automator script, repeat).

I have looked at your AppleScript. I made a directory-monitoring condition, but within Hazel it did not like the script, complaining "Expected "end" but found "on"". When I compiled it in Apple's Script Editor instead and pointed Hazel to that file, it *seems* to be happy, but I could not get it to write the tags. Is this a problem with Hazel or with AppleScript? In any case, the same text compiled without an issue in Script Editor.

EDIT2: I don't know whether Hazel somehow queued a lot of "to-do" tasks from the "old" rule, despite my deselecting it and adding the new one. I had to stop and start Hazel properly (rather than pause it). Testing will continue, as I've had a few oddities, but no more updates tonight!

Re: Hazel "beginner", OCR, Finereader and the like

Posted: Fri Apr 19, 2019 10:48 am
by Mr_Noodle
To answer your previous question, yes, Hazel processes files one-by-one.

His script is meant to be an external script. I forget whether Script Editor is installed on non-developer systems, but if it is, copy that script into a document there, compile it, and save it as a .scpt file, which you can then reference from Hazel via the "Run AppleScript" action.

If you don't have Script Editor, you can paste it into a regular text document in TextEdit and save it as a .applescript file. Hazel would then have to compile the script on the fly every time.
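To make the external-script distinction concrete, here is a minimal sketch of the shape Hazel expects, modeled on Robert's script above: a file containing an `on hazelProcessFile(theFile)` handler, which Hazel calls once per matching file. The `log` line is only a placeholder for the FineReader step, and the closing remark about embedded scripts is an assumption about the "Expected "end" but found "on"" error, not something confirmed in this thread. Depending on the Hazel version, the handler may also be passed a second argument, so check Hazel's help for the exact signature.

```applescript
-- Minimal external script for Hazel's "Run AppleScript" action.
-- Save it as a .scpt (compiled in Script Editor) or as a .applescript file,
-- then point the action at that file. Hazel calls the handler once per file.
on hazelProcessFile(theFile)
	-- theFile refers to the single file the rule matched.
	set thePath to POSIX path of theFile
	-- Placeholder: replace this line with the FineReader export step.
	log "Would OCR: " & thePath
end hazelProcessFile

-- Extra handlers (such as WaitWhileBusy in Robert's script) can sit beside it.
-- Assumption: a script pasted into Hazel's embedded editor cannot declare
-- handlers like these, which would explain the "Expected "end" but found "on"" error.
```

Because Hazel hands the script one file per call, there is no need for the "pick the latest file and loop" logic from the first post; the rule itself provides the loop.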

Re: Hazel "beginner", OCR, Finereader and the like

Posted: Thu May 23, 2019 10:12 pm
by sawbones
Hi-

I have been trying to solve this for a while, and unfortunately have not been able to do so.

I notice references within the thread to an Automator script found online, but I don't know what or where this is.

Would you be so kind, if you made this work to your liking, as to lay out the steps you have taken? What script are you using, and how do you set Hazel to use the script on one file at a time?

Thanks!

Re: Hazel "beginner", OCR, Finereader and the like

Posted: Wed Jun 05, 2019 1:15 am
by luoto
Sorry for the delay. I've been in hospital for nearly six weeks (organ transplant) and am slowly catching up. If the request was aimed at me, this is my solution so far. The only issue is that Finereader (latest version) still likes to crash randomly, and I've found no way to restart it and automatically ignore the recovery suggestion. Sometimes it also tries to grab several files at once and make one big one.

[Screenshot not preserved]

Re: Hazel "beginner", OCR, Finereader and the like

Posted: Wed Jun 05, 2019 11:41 am
by sawbones
Hi, luoto-

I certainly hope everything is going great with your health, and greatly appreciate the fact you took time out to respond. Thanks!

If I might ask, could you lay out how you do this in Hazel? I have a similar Automator action, but the issue for me is how to get the workflow going through Hazel.

Cheers!

Re: Hazel "beginner", OCR, Finereader and the like

Posted: Thu Jun 06, 2019 3:30 am
by luoto
Thanks for the kind thoughts. I am far from being an expert, but this workflow worked for me.

[Screenshot not preserved]

The bigger issue is the bugfest that is FineReader :)

Re: Hazel "beginner", OCR, Finereader and the like

Posted: Tue Aug 13, 2019 8:26 pm
by sawbones
Unfortunately, I cannot see the workflow you reference in your Hazel process. Is it possible to outline that Automator workflow?

Cheers!