3.1.1 Content Matching... superb! One problem example...

Moderator: Mr_Noodle

First off, thanks for the AWESOME feature addition to support content matching. My ability to detect a date in a document and then use that date within the renamed file name is nothing short of incredible. Super feature... Thanks for the continued product enhancements! :D

I am having a bit of difficulty with one of my scanned documents (the rule works for many others). I cannot get Hazel to detect the date after the STATEMENT DATE box (see example below). I've set up the rule as Contents > contain match > STATEMENT DATE <date match>

Hazel is not finding the date. Does it have something to do with the shaded and/or bordered box? thanks for any tips -- jay

Image
jayelevy
 
Posts: 11
Joined: Sun Dec 02, 2012 6:25 pm

Hazel content matching only works on PDFs that have been OCR'd or that have a text layer. From the screenshot, this appears to be an image-only PDF.

When you zoom, does the text pixelate?
a_freyer
 
Posts: 631
Joined: Tue Sep 30, 2008 9:21 am
Location: Colorado

It can be OCR'd but still have that look (the text is stored separately from the image, if I understand it correctly). The problem is that sometimes the text in the document is not actually in the order you think it is, or it may not be on the same line as it appears visually.

If you are command-line savvy, you can try running the PDF through 'mdimport -d2' and looking at the kMDItemTextContent field in the output to see what the text looks like.
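
For anyone who would rather check the dump programmatically than by eye, here is a minimal Python sketch. It assumes you have already copied the kMDItemTextContent text into a string; the function name `text_after_label` and the sample text are hypothetical and only illustrate how a label and its value can end up in a different order than they appear on the page.

```python
import re


def text_after_label(extracted_text, label, width=40):
    """Return the text that immediately follows each occurrence of label.

    Whitespace (including line breaks) is collapsed first, so the result
    shows what follows the label in the stored text order.
    """
    flat = re.sub(r"\s+", " ", extracted_text)
    results = []
    for match in re.finditer(re.escape(label), flat):
        results.append(flat[match.end():match.end() + width].strip())
    return results


# Hypothetical dump: the date is stored before the label, not after it.
sample = "ACCOUNT 1234\n12/31/1999\nSTATEMENT DATE\nPAYMENT DUE\n"
print(text_after_label(sample, "STATEMENT DATE"))
```

If the text printed after the label is not the date, a pattern of the form "label followed by date" has nothing to match, even though the page looks right visually.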
Mr_Noodle
Site Admin
 
Posts: 11255
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

It is OCR'd text. Searching for text in Preview works just fine on this PDF (as it does for the others that do work well with the Hazel match capability).

I'm not command line savvy, but will play with mdimport and see what I can figure out! :shock:

thanks for the tip
jayelevy
 

Ok, I mdimported it! See screen capture of dump below with relevant portions unmasked. Interestingly, when I inadvertently saved the mdimport data as a pdf to the folder Hazel was watching, the file was immediately recognized and renamed. Hazel can clean up using the mdimport info, but not the actual file itself. Hmmm???

Here's the rule that is not working, along with the mdimport data:

Image

Image

Image
jayelevy
 

I meant to include the date token used in the <Date match> field in the second screen shot. It is: 12/31/1999
jayelevy
 

Thanks for that. Unfortunately, what I get from reading the PDF may differ from what mdimport grabs. The text may be in a different order, which is my guess in this case. I am looking to add the ability to specify which match (1st, 2nd, 3rd, etc.) to use in a future release to help with situations like this.
Mr_Noodle

ok. Thanks for taking the time to review this and more importantly, thanks for a splendid product! Keep up the GREAT work! again thanks -- jay
jayelevy
 

Hi all, I just rediscovered Hazel thanks to the "content matching" feature and it's great!
I'm also having problems with some PDF files that contain a lot of graphics, exactly the same issue as jayelevy. I'd like to extract two dates that have this format (from kMDItemTextContent):

DATA PARTENZA 14 FEB 2014
some other text
DATA DI ACQUISTO 13 FEB 2014


Content matching doesn't see the dates at all. I'd like to try this shell script (http://www.noodlesoft.com/forums/viewtopic.php?f=4&t=1530&#p6183), but I don't know how to make the shell recognize this date format and rename the file using these two variables.
Help appreciated :oops:
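
As a rough illustration of the parsing step only, here is a Python sketch that pulls both dates out of text in the format quoted above and builds a file name from them. The month table, the function names, and the file-name layout are all assumptions made for the example (the sample only shows FEB), and how the result would be handed back to Hazel is not covered here.

```python
import re
from datetime import date

# Italian three-letter month abbreviations (assumed; the sample only shows FEB).
MONTHS = {"GEN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAG": 5, "GIU": 6,
          "LUG": 7, "AGO": 8, "SET": 9, "OTT": 10, "NOV": 11, "DIC": 12}

# Day, three-letter month, four-digit year, e.g. "14 FEB 2014".
DATE_RE = r"(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4})"


def find_labeled_date(text, label):
    """Return the date that directly follows label, or None if not found."""
    match = re.search(re.escape(label) + r"\s+" + DATE_RE, text)
    if match is None:
        return None
    day, month_name, year = match.groups()
    month = MONTHS.get(month_name.upper())
    if month is None:
        return None
    return date(int(year), month, int(day))


def build_name(text):
    """Build a hypothetical file name from both dates, or None if either is missing."""
    departure = find_labeled_date(text, "DATA PARTENZA")
    purchase = find_labeled_date(text, "DATA DI ACQUISTO")
    if departure is None or purchase is None:
        return None
    return f"{departure.isoformat()}_acquisto-{purchase.isoformat()}.pdf"


sample = "DATA PARTENZA 14 FEB 2014\nsome other text\nDATA DI ACQUISTO 13 FEB 2014\n"
print(build_name(sample))  # prints 2014-02-14_acquisto-2014-02-13.pdf
```

This only works when each label is followed directly by its date in the extracted text; if the stored order differs, the function returns None rather than guessing.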
Nestorito
 
Posts: 6
Joined: Thu Sep 26, 2013 10:31 am

Can you post the date pattern in your rule?
Mr_Noodle

