How to get Hazel to recognize OCR'd files


Moderator: Mr_Noodle

How to get Hazel to recognize OCR'd files Sat May 22, 2010 2:40 pm • by detroy
I batch process PDFs with Acrobat, and Acrobat saves them back into the same folder with the same file name. That's a problem because the process sometimes hits an error, so some files get processed and some don't. It could be solved if Hazel could pick out the OCR'd files and either move, color-label, or rename them. I've tried many times using "encoding software contain Acrobat", "kind is pdf", and "where from contain Acrobat", among other things, and just can't get it to work. Is it possible?

I'm using a droplet I found at DocumentSnap to batch OCR the files. I don't know anything about scripting, so changing the script to use a different folder or file name isn't an option for me.
detroy
 
Posts: 1
Joined: Sat May 22, 2010 2:27 pm

Re: How to get Hazel to recognize OCR'd files Mon May 24, 2010 11:39 am • by Mr_Noodle
I think the problem here is that there is no metadata I know of that distinguishes an OCR'ed file from a non-OCR'ed one. It's up to the program (in this case Acrobat) to add that information.

Can you have Acrobat add keywords when doing the OCR? Otherwise, can you have Hazel trigger the OCR instead of the droplet, for example via Automator? If you do that, Hazel has a handle on the file through the whole workflow, so it should work.

Otherwise, I don't know much about DocumentSnap. If someone else is more familiar with it, they should probably chime in here.
Mr_Noodle
Site Admin
 
Posts: 11255
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: How to get Hazel to recognize OCR'd files Tue Aug 14, 2012 4:43 am • by K.Meier
Hi,

I know this thread is old, but I ran into the same problem. I wanted Hazel to look at the file information "Encoding Software" and check whether it contains "Acrobat" (the program I use for OCR). I tried "contains" "Acrobat", and the rule matched nothing. Then I changed "Acrobat" to "Adobe Acrobat 8.31 Paper Capture Plug-in" (the exact name shown in the file information), and the rule matched.

So I think "contain" on "Encoding Software" needs an exact match rather than a single word from the value.

Hope this helps other people with the same problem for future reference.

K.

Edit: Updated the info to be more accurate.
OS X 10.8.0 Mountain Lion, Hazel 3.0.11 (build 841)
K.Meier
 
Posts: 4
Joined: Fri Jan 21, 2011 5:12 am

Re: How to get Hazel to recognize OCR'd files Sun May 19, 2013 5:45 pm • by cornel
K.Meier wrote:
Hi,

So I think "contain" on "Encoding Software" needs an exact match rather than a single word from the value.

Hope this helps other people with the same problem for future reference.

I had the same issue. I had forgotten about it, but a recent Adobe Acrobat upgrade changed the version number and my rules failed. It would be a lot easier if the "contain" condition matched part of the string.

Currently I need to check the Encoding Software for "Acrobat 11.0.3 Paper Capture Plug-in" to match. I would prefer a match on "Acrobat" alone.

Is this a bug or intended?

Thanks.
cornel
 
Posts: 3
Joined: Sun May 19, 2013 5:40 pm

Re: How to get Hazel to recognize OCR'd files Mon May 20, 2013 10:54 am • by a_freyer
This is likely a bug. "Contain" is only supposed to match [full word]** substrings.

EDIT - ** edited to reflect Mr_Noodle's comment below.
Last edited by a_freyer on Mon May 20, 2013 3:14 pm, edited 1 time in total.
a_freyer
 
Posts: 631
Joined: Tue Sep 30, 2008 9:21 am
Location: Colorado

"Contents contain" matches full words. It's based on Spotlight, so test it there first: search for that word in Spotlight, and if the file doesn't appear in the results, Hazel won't match it either.
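If you'd rather check from Terminal, Spotlight's "mdfind" tool runs a query directly. This Python sketch (macOS only; the function and script names are made up for illustration) builds a query of the form attribute == "*text*"c, where the asterisks are wildcards and the trailing c makes it case-insensitive, and limits the search to one folder with -onlyin. Note that this wildcard form is a substring search, not the word-based search behind "Contents contain", so treat it as a rough cross-check rather than a prediction of what Hazel will match.

```python
import subprocess
import sys


def spotlight_query(attribute: str, fragment: str) -> str:
    """Build a Spotlight query matching any value of `attribute` that contains `fragment`.

    Uses the wildcard form  attr == "*text*"c  (c = case-insensitive).
    Backslashes and double quotes in `fragment` are escaped.
    """
    escaped = fragment.replace("\\", "\\\\").replace('"', '\\"')
    return f'{attribute} == "*{escaped}*"c'


def spotlight_find(folder: str, attribute: str, fragment: str) -> list[str]:
    """macOS only: run mdfind restricted to `folder` and return the matching paths."""
    result = subprocess.run(
        ["mdfind", "-onlyin", folder, spotlight_query(attribute, fragment)],
        capture_output=True, text=True, check=True,
    )
    return [line for line in result.stdout.splitlines() if line]


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Example: python3 spotlight_check.py ~/Scans Acrobat   (hypothetical script name)
        folder, word = sys.argv[1], sys.argv[2]
        for path in spotlight_find(folder, "kMDItemEncodingApplications", word):
            print(path)
    else:
        print("usage: spotlight_check.py FOLDER WORD")
```

If the files show up here but a Hazel rule still doesn't match, the difference is in how the rule's condition is applied rather than in what Spotlight has indexed.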
Mr_Noodle

Re: How to get Hazel to recognize OCR'd files Tue May 21, 2013 4:56 pm • by cornel
Spotlight does find the files, even if I search on part of the string (like 'Acrobat' or 'Capture Plug-in'). In the output of "mdimport -d2 <filename> 2>&1", this is what is shown as the encoding application:

};
kMDItemEncodingApplications = (
"Acrobat 11.0.3 Paper Capture Plug-in"
);

I am trying to filter using the Encoding Software -> Contain -> <part of the string, but always full word>

This only returns results when I enter the full string shown above. It leaves me puzzled.
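For a quicker look at just this one attribute, macOS's "mdls" tool can print it on its own ("mdls -raw -name kMDItemEncodingApplications <filename>"). Here is a rough Python sketch that wraps that call and parses the list it prints. The parser assumes the usual layout of one item per line and ignores escaped quotes, the function names are made up, and treating "Paper Capture" in the name as a sign of OCR is an assumption based only on the strings reported in this thread.

```python
import subprocess
import sys


def parse_mdls_list(raw: str) -> list[str]:
    """Parse the '-raw' text mdls prints for an array attribute.

    Assumes '(' and ')' around one item per line, each item optionally
    quoted and comma-terminated. Escaped quotes inside items are not handled.
    """
    text = raw.strip()
    if text in ("", "(null)"):  # attribute not set on this file
        return []
    if text.startswith("(") and text.endswith(")"):
        text = text[1:-1]
    items = []
    for line in text.splitlines():
        item = line.strip().rstrip(",").strip()
        if len(item) >= 2 and item[0] == item[-1] == '"':
            item = item[1:-1]  # drop surrounding quotes
        if item:
            items.append(item)
    return items


def encoding_applications(path: str) -> list[str]:
    """macOS only: read kMDItemEncodingApplications for one file via mdls."""
    result = subprocess.run(
        ["mdls", "-raw", "-name", "kMDItemEncodingApplications", path],
        capture_output=True, text=True, check=True,
    )
    return parse_mdls_list(result.stdout)


if __name__ == "__main__":
    for pdf in sys.argv[1:]:
        apps = encoding_applications(pdf)
        # Assumption: Acrobat's OCR step shows up as a "Paper Capture" encoder.
        ocrd = any("Paper Capture" in app for app in apps)
        print(f"{pdf}: {'OCR' if ocrd else 'no OCR'} {apps}")
```

Run it as, for example, "python3 ocr_check.py ~/Scans/*.pdf" (the script name is hypothetical) to see which files carry the Paper Capture entry.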
cornel
 

Re: How to get Hazel to recognize OCR'd files Wed May 22, 2013 12:38 pm • by Mr_Noodle
Ah, you are matching on Encoding Software. That appears to be a list, and when you use "contain", you have to specify one of the items in that list in full (like "Acrobat 11.0.3 Paper Capture Plug-in").

You can try using "match" instead, which will do partial matches.
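To make the difference concrete, here is a toy Python model (not Hazel's actual implementation) of the two conditions on a list-valued attribute like Encoding Software: "contain" needs one list item equal to the search text, while "match" is modelled here as a plain substring test on each item. It uses the two strings reported earlier in this thread, which also shows why an exact "contain" rule broke after the Acrobat upgrade.

```python
# Toy model of the two conditions on a list-valued attribute
# such as kMDItemEncodingApplications. Not Hazel's real code.


def list_contains(items: list[str], value: str) -> bool:
    """'contain' as described above: some list item must equal the value in full."""
    return value in items


def list_matches(items: list[str], fragment: str) -> bool:
    """'match' modelled as a simple substring test against each item."""
    return any(fragment in item for item in items)


if __name__ == "__main__":
    # Encoding Software values reported in this thread for two Acrobat versions.
    old = ["Adobe Acrobat 8.31 Paper Capture Plug-in"]
    new = ["Acrobat 11.0.3 Paper Capture Plug-in"]

    print(list_contains(new, "Acrobat"))                               # False
    print(list_contains(new, "Acrobat 11.0.3 Paper Capture Plug-in"))  # True
    print(list_contains(old, "Acrobat 11.0.3 Paper Capture Plug-in"))  # False
    print(list_matches(old, "Paper Capture"), list_matches(new, "Paper Capture"))  # True True
```

In this model, a partial match on something stable like "Paper Capture" survives version changes, while an exact "contain" value has to be updated after every Acrobat upgrade.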
Mr_Noodle

Re: How to get Hazel to recognize OCR'd files Thu May 23, 2013 4:47 pm • by cornel
Why didn't I think of that ;-) Works as advertised! Thanks.
cornel
 

