How to get Hazel to recognize OCR'd files

I batch process PDFs with Acrobat, and Acrobat dumps them back into the same folder with the same file names. I hate that, because sometimes the process hits an error, so some files get processed and some don't. That could be solved if Hazel could pick out the OCR'd files and either move them, color them, or change the file name. I've tried many times using "encoding software contain Acrobat" with "kind is pdf" and "where from contain Acrobat", among other things, and just can't get it to work. Is it possible?

I'm using a droplet I found at DocumentSnap to batch OCR the files. I don't know anything about scripting so fixing the script to change the folder or file name isn't possible for me.

I think the problem here is that there is no metadata that I know of that distinguishes an OCR'ed file from one that hasn't been OCR'ed. It's up to the program (in this case Acrobat) to add that information.

Can you have Acrobat add keywords when doing the OCR? Otherwise, can you have Hazel trigger the OCR instead of the droplet, like via Automator? If you do that, then since Hazel has a handle on the file through the whole workflow, it should work.

Otherwise, I don't know much about DocumentSnap. If someone else is more familiar with it, they should probably chime in here.

Hi,

I know this thread is old, but I ran into the same problem. I wanted to get Hazel to look for the file-information attribute "Encoding Software" containing "Acrobat" (which is the program I use for OCRing). I tried the condition "contains" "Acrobat", and the rule did not match. Then I changed "Acrobat" to "Adobe Acrobat 8.31 Paper Capture Plug-in" (the exact name shown in the file information) and the rule matched.

So I think "contain" on "Encoding Software" needs an exact match of the whole string rather than a single word within it.

Hope this helps other people with the same problem for future reference.

K.

edit: Updated info to be more accurate.
OS X 10.8.0 Mountain Lion, Hazel 3.0.11 (build 841)

K.Meier wrote:

So I think the "contain" in the "Encoding Software" does need an exact match instead of a single containing word.

Hope this helps other people with the same problem for future reference.

I had the same issue. I had forgotten about it, but a recent Adobe Acrobat upgrade changed the version number and my rules failed. It would be a lot easier if the "contain" condition matched on part of the string.

Currently I need to check the Encoding Software for "Acrobat 11.0.3 Paper Capture Plug-in" to match. I would prefer a match on "Acrobat" alone.

Is this a bug or intended?

Thanks.

Last edited by a_freyer on Mon May 20, 2013 3:14 pm, edited 1 time in total.

Likely this is a bug. Contain is only supposed to match [full word]** substrings.

EDIT - ** edited to reflect Mr_Noodle's comment below.

"Contents contain" matches full words. It's based on Spotlight, so you need to test it there first: search for that word in Spotlight, and if the file doesn't appear in the results there, then Hazel won't match it either.

Spotlight is finding the files, even if I search on part of the string (like 'Acrobat' or 'Capture Plug-in'). If I look at the output of "mdimport -d2 <filename> 2>&1", this is what is shown as the encoding application:

kMDItemEncodingApplications = (
    "Acrobat 11.0.3 Paper Capture Plug-in"
);

I am trying to filter using the Encoding Software -> Contain -> <part of the string, but always full word>

This only returns results when I enter the full contents of the string above. It leaves me puzzled.
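
For anyone who would rather check this attribute from a script than read the importer output by eye, here is a minimal Python sketch. It assumes the macOS "mdls" command with its "-name" option, and it assumes the list items are printed as double-quoted strings inside parentheses, as in the output above. The function names and the sample text are made up for illustration.

```python
import re
import subprocess

ATTRIBUTE = "kMDItemEncodingApplications"


def parse_encoding_apps(metadata_text):
    """Pull the quoted list items for ATTRIBUTE out of mdls/mdimport-style text."""
    # Find the parenthesised list that follows the attribute name.
    # Limitation of this sketch: an item that itself contains ")" would be cut short.
    block = re.search(
        re.escape(ATTRIBUTE) + r"\s*=\s*\((.*?)\)", metadata_text, re.DOTALL
    )
    if block is None:
        return []  # attribute not present in the text
    # "(null)" has no quoted items, so it also yields an empty list.
    return re.findall(r'"([^"]*)"', block.group(1))


def encoding_apps_for(path):
    """Ask Spotlight (via mdls) for the file's encoding applications. macOS only."""
    result = subprocess.run(
        ["mdls", "-name", ATTRIBUTE, path],
        capture_output=True, text=True, check=True,
    )
    return parse_encoding_apps(result.stdout)


# Sample text shaped like the output quoted above (not read from a real file).
sample = '''kMDItemEncodingApplications = (
    "Acrobat 11.0.3 Paper Capture Plug-in"
)'''
print(parse_encoding_apps(sample))
```

On a Mac, calling encoding_apps_for() with the path of one of the processed PDFs should give the same list that the Hazel condition is compared against, which makes it easier to see exactly which string a rule has to match.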

Ah, you are matching on Encoding software. That appears to be a list and when you do "contain", you have to specify one of the items in that list in full (like "Acrobat 11.0.3 Paper Capture Plug-in").

You can try using "match" instead which will do partial matches.
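
To picture the difference, here is a rough Python model of the two behaviours described above. It only illustrates list membership versus partial matching; it is not Hazel's actual implementation, and the function names are invented for the example.

```python
def list_contains(items, needle):
    """'contain' on a list attribute: needle must equal one whole item."""
    return needle in items


def list_matches(items, needle):
    """'match'-style partial test: needle may be any part of an item."""
    return any(needle in item for item in items)


# The list reported for the file earlier in this thread.
encoding_apps = ["Acrobat 11.0.3 Paper Capture Plug-in"]

print(list_contains(encoding_apps, "Acrobat"))  # False
print(list_contains(encoding_apps, "Acrobat 11.0.3 Paper Capture Plug-in"))  # True
print(list_matches(encoding_apps, "Acrobat"))  # True
```

In this model a partial test on "Acrobat" also keeps working when an Acrobat upgrade changes the version number inside the string, which is the failure reported earlier in the thread.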

Why didn't I think of that ;-)

Works as advertised! Thanks.