3.1.1 Content Matching... superb! One problem example...

First off, thanks for the AWESOME feature addition to support content matching. My ability to detect a date in a document and then use that date within the renamed file name is nothing short of incredible. Super feature... Thanks for the continued product enhancements!

I am having a bit of difficulty with one of my scanned documents (the rule works for many others). I cannot get Hazel to detect the date after the STATEMENT DATE box (see example below). I've set up the rule as Contents > contain match > STATEMENT DATE <date match>

Hazel is not finding the date. Does it have something to do with the shaded and/or bordered box? thanks for any tips -- jay

Hazel content matching only works on OCR'd or text-layer PDFs. From the screenshot, this appears to be an image-only PDF.

When you zoom, does the text pixelate?

It can be OCRed and still look like that (the text is stored separately from the image, if I understand it correctly). The problem is that the text in the document is sometimes not in the order you expect, or not on the same line it appears to be on visually.

If you are comfortable with the command line, you can try running the PDF through 'mdimport -d2' and looking at the kMDItemTextContent field in the output to see what the text looks like.
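
To illustrate why the stored text order matters, here is a minimal Python sketch. It is not how Hazel works internally; the helper name date_follows_label, the date pattern, and both sample strings are made up for illustration. It checks whether a date sits directly after the STATEMENT DATE label in text such as what you might copy out of the kMDItemTextContent field.

```python
import re

# Assumed date shape for this sketch: numeric dates like 12/31/1999.
DATE = r"\d{1,2}/\d{1,2}/\d{4}"

def date_follows_label(text, label="STATEMENT DATE"):
    """Return the date directly after the label, or None if other text sits between them."""
    match = re.search(re.escape(label) + r"\s+(" + DATE + r")", text)
    return match.group(1) if match else None

# Made-up examples of the same page extracted in two different orders.
visual_order = "STATEMENT DATE 12/31/1999 ACCOUNT NUMBER 12345"
stored_order = "STATEMENT DATE ACCOUNT NUMBER 12345 12/31/1999"

print(date_follows_label(visual_order))  # 12/31/1999
print(date_follows_label(stored_order))  # None
```

In the second string the date is still present, but a "label followed by date" pattern cannot find it because other text was stored in between.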

It is OCR'd text. Searching for text in Preview works just fine on this PDF (as it does for the others that do work well with the Hazel match capability).

I'm not command line savvy, but will play with mdimport and see what I can figure out! :shock:

thanks for the tip

Ok, I mdimported it! See screen capture of dump below with relevant portions unmasked. Interestingly, when I inadvertently saved the mdimport data as a pdf to the folder Hazel was watching, the file was immediately recognized and renamed. Hazel can clean up using the mdimport info, but not the actual file itself. Hmmm???

Here's the rule that is not working and the mdimport data

I meant to include the date token used in the <Date match> field in the second screen shot. It is: 12/31/1999

Thanks for that. Unfortunately, what I get from reading the PDF may differ from what mdimport grabs. The text may be in a different order, which is my guess in this case. I am looking to add the ability to specify which match (1st, 2nd, 3rd, etc.) to use in a future release to help with situations like this.
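
The idea of choosing a particular match can be sketched in Python. This is purely an illustration of the concept, not a preview of how Hazel would implement it; nth_date, the date pattern, and the sample text are all hypothetical.

```python
import re

# Assumed date shape for this sketch: numeric dates like 12/31/1999.
DATE_PATTERN = re.compile(r"\d{1,2}/\d{1,2}/\d{4}")

def nth_date(text, n):
    """Return the n-th date found in the text (1-based), or None if there are fewer than n."""
    dates = DATE_PATTERN.findall(text)
    return dates[n - 1] if 1 <= n <= len(dates) else None

# Made-up text where the wanted date is the second one, not the first.
sample = "DUE 01/15/2000 STATEMENT DATE ACCOUNT 12345 12/31/1999"

print(nth_date(sample, 1))  # 01/15/2000
print(nth_date(sample, 2))  # 12/31/1999
```

Picking by position sidesteps the ordering problem as long as the date lands in the same position in every document of that type.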

ok. Thanks for taking the time to review this and more importantly, thanks for a splendid product! Keep up the GREAT work! again thanks -- jay

Hi all, just rediscovered Hazel thanks to the "content matching" feature and it's great!
I'm also having problems with some PDF files that contain a lot of graphics, exactly the same problem as jayelevy. I'd like to extract two dates that have this format (from kMDItemTextContent):

DATA PARTENZA 14 FEB 2014 some other text DATA DI ACQUISTO 13 FEB 2014

Content match doesn't see the dates at all. I'd like to try this shell script (http://www.noodlesoft.com/forums/viewtopic.php?f=4&t=1530&#p6183), but I don't know how to make the shell recognize this date format or how to rename the file using these two variables.
Help appreciated :oops:
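
For what it's worth, here is a minimal sketch of pulling both dates out of that sample text, written in Python rather than shell. Several things are assumptions: the month abbreviations are taken to be Italian (as the labels suggest), date_after is a made-up helper, and the file name layout is only an example. It builds the new name but does not rename anything.

```python
import re
from datetime import date

# Assumption: month abbreviations are Italian, as the labels suggest.
MONTHS = {"GEN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAG": 5, "GIU": 6,
          "LUG": 7, "AGO": 8, "SET": 9, "OTT": 10, "NOV": 11, "DIC": 12}

def date_after(label, text):
    """Find 'DD MON YYYY' directly after the label and return it as a date, or None."""
    m = re.search(re.escape(label) + r"\s+(\d{1,2})\s+([A-Z]{3})\s+(\d{4})", text)
    if not m or m.group(2) not in MONTHS:
        return None
    return date(int(m.group(3)), MONTHS[m.group(2)], int(m.group(1)))

text = "DATA PARTENZA 14 FEB 2014 some other text DATA DI ACQUISTO 13 FEB 2014"
departure = date_after("DATA PARTENZA", text)
purchase = date_after("DATA DI ACQUISTO", text)

# Hypothetical file name layout; adjust to taste.
new_name = f"{departure.isoformat()}_purchased_{purchase.isoformat()}.pdf"
print(new_name)  # 2014-02-14_purchased_2014-02-13.pdf
```

This only works if each date directly follows its label in the extracted text, which is the same ordering caveat discussed earlier in the thread.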

Can you post the date pattern in your rule?