Extracting the 1st date in a scanned document


Moderator: Mr_Noodle

Sun May 15, 2016 9:33 am • by BlueDoc
Hi
I've just bought a ScanSnap in an effort to go paperless at home, and I'm using Hazel to automate the filing process once each document has been OCR'd.
I want to extract the first occurrence of a date in a document, whatever the format of the date, and then use that date as part of the document's name. I can create a number of rules (16 in total, to cover all the custom date permutations), but if one of the earlier rules in the list finds a date that is not the first date in the document, that is the date that gets passed on to rename the document.
I apologise if this is a simple task and I am missing something obvious, or if it has already been answered elsewhere.
Thanks in advance

John
BlueDoc
Posts: 4
Joined: Sun May 15, 2016 9:23 am

Instead of 16 rules, do it in one rule with 16 conditions. When you create a custom attribute, you can re-use it in the subsequent rules, changing its format each time. If it is used within an "any" condition, the first one to match is the one that gets used. I suggest giving that a shot to see if it clears things up.
Mr_Noodle
Site Admin
Posts: 11872
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Thanks for your quick reply, though I'm not sure I follow the suggestion. I think what you've suggested may be what I've already done, but I called a condition a rule. Sorry!
To give an example of what I'm trying, using just two conditions: I have a document which contains two dates, the "document" date, 6 May 2016, at the start of the document, and another date, 31 Mar 2014, halfway through. It is the 6 May 2016 date I want to use. I have a rule with two "any" conditions:
1) Contents=>Contains Match=>Custom Date (31 December 1999)
2) Contents=>Contains Match=>Custom Date (31 December 1999)
The first condition picks up 31 March 2014 and the second condition picks up 6 May 2016, and it is the first condition's match that is passed on to the renaming.
I know I could just swap the conditions in this example but I want to write a rule that will pick out the first date in the document, whatever the format and whatever the document.
Sorry for the ramble, and thank you for your help.
John
BlueDoc
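To make the ordering problem concrete, here is a small Python model of it. It is only an analogy, not how Hazel works internally: the two regular expressions stand in for two custom-date conditions, and the assumption (taken from the moderator's description of "any" conditions earlier in the thread) is that the first condition to match wins. The sample text and function names are hypothetical. Trying the patterns in list order returns whichever pattern is listed first, while comparing match positions returns the date that actually appears first in the document.

```python
import re

# Two hypothetical date patterns, standing in for two Hazel custom-date conditions.
PATTERNS = [
    # e.g. "31 March 2014" (the two raw strings are joined into one pattern)
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August"
    r"|September|October|November|December) \d{4}\b",
    # e.g. "06/05/2016" (day/month/year)
    r"\b\d{1,2}/\d{1,2}/\d{4}\b",
]

# Hypothetical OCR text: the document date comes first, an older date later.
TEXT = "Letter dated 06/05/2016. Your account was opened on 31 March 2014."


def first_by_pattern_order(text, patterns):
    """Model of 'first condition to match wins': try the patterns in list order."""
    for pattern in patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(0)
    return None


def earliest_in_document(text, patterns):
    """Return the match that starts earliest in the text, whichever pattern found it."""
    best = None
    for pattern in patterns:
        match = re.search(pattern, text)
        if match and (best is None or match.start() < best.start()):
            best = match
    return best.group(0) if best else None


print(first_by_pattern_order(TEXT, PATTERNS))  # 31 March 2014
print(earliest_in_document(TEXT, PATTERNS))    # 06/05/2016
```

Swapping the two entries in PATTERNS changes the first result but not the second, which is the behaviour John is after.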

Ah, I see. Unfortunately, that can't really be done as things are now, unless you use a script to do it for you.
Mr_Noodle
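A script along the lines the moderator mentions might look something like this Python sketch. It rests on several assumptions: the OCR'd text has already been pulled out of the PDF as a plain string (how to do that, and how to hand the result back to Hazel, is not covered here); numeric dates are day-first, as in John's examples; month names are English under the default locale; and the "YYYY-MM-DD original-name" scheme is only an example. `first_date`, `new_name` and `FORMATS` are illustrative names, not part of Hazel.

```python
import re
from datetime import date, datetime
from typing import Optional

MONTHS_FULL = ("January|February|March|April|May|June|July|August"
               "|September|October|November|December")
MONTHS_ABBR = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# (regex, strptime format) pairs; extend with whatever formats your documents use.
# Numeric dates are assumed to be day-first (UK style).
FORMATS = [
    (r"\b\d{1,2} (?:" + MONTHS_FULL + r") \d{4}\b", "%d %B %Y"),  # 6 May 2016
    (r"\b\d{1,2} (?:" + MONTHS_ABBR + r") \d{4}\b", "%d %b %Y"),  # 31 Mar 2014
    (r"\b\d{1,2}/\d{1,2}/\d{4}\b", "%d/%m/%Y"),                    # 06/05/2016
    (r"\b\d{4}-\d{2}-\d{2}\b", "%Y-%m-%d"),                        # 2016-05-06
]


def first_date(text: str) -> Optional[date]:
    """Return the date that appears earliest in text, whatever its format, or None."""
    best = None  # (position in text, parsed date)
    for pattern, fmt in FORMATS:
        for match in re.finditer(pattern, text):
            try:
                parsed = datetime.strptime(match.group(0), fmt).date()
            except ValueError:
                continue  # looks like a date but isn't valid, e.g. 31/02/2016
            if best is None or match.start() < best[0]:
                best = (match.start(), parsed)
            break  # later matches of this pattern can't start earlier in the text
    return best[1] if best else None


def new_name(text: str, original_name: str) -> str:
    """Hypothetical naming scheme: prefix the file name with the first date found."""
    found = first_date(text)
    return f"{found.isoformat()} {original_name}" if found else original_name


sample = "Letter dated 6 May 2016. Your account was opened on 31 Mar 2014."
print(first_date(sample))            # 2016-05-06
print(new_name(sample, "scan.pdf"))  # 2016-05-06 scan.pdf
```

Because every format is checked and the earliest position wins, the order of the entries in FORMATS no longer matters, which is the property the "any" condition cannot provide.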

Try this:
2) Contents=>Contains Match=>Custom Date (ANYTHING31 December 1999)
The ANYTHING token placed before the day should absorb the extra space(s) in front of your single-digit days.
Good luck,
JV
jmvenable
Posts: 22
Joined: Thu Apr 28, 2016 2:04 pm

