
Extracting the 1st date in a scanned document

Posted: Sun May 15, 2016 9:33 am
by BlueDoc
Hi
I've just bought a ScanSnap in an effort to go paperless at home, and I'm using Hazel to automate the filing process once a document has been OCR'd.
I want to extract the first occurrence of a date in a document, whatever its format, and then use that date as part of the document's name. I can create a number of rules (16 in total, to cover all the custom date permutations), but if one of the earlier rules in the list finds a date that is not the first date in the document, that is the date passed forward to rename the document.
I apologise if this is a simple task and I am missing something obvious, or if it has already been answered elsewhere.
Thanks in advance

John

Re: Extracting the 1st date in a scanned document

Posted: Mon May 16, 2016 10:37 am
by Mr_Noodle
Instead of 16 rules, do it in one, with 16 conditions. When you create a custom attribute, you can re-use it in the subsequent rules, changing its format each time. If used within an "any" condition, the first one to match is the one that gets used. I suggest giving that a shot and seeing if that clears things up.

Re: Extracting the 1st date in a scanned document

Posted: Mon May 16, 2016 4:47 pm
by BlueDoc
Thanks for your quick reply, though I'm not sure I follow the suggestion. I think what you've suggested may be what I've already done, but I called a condition a rule. Sorry.
To give an example of what I'm trying, using just two conditions: I have a document which contains 2 dates, the "document" date 6 May 2016 at the start of the document and another date, 31 Mar 2014, halfway through. It is the 6 May 2016 date I want to use. I have a rule with two "any" conditions:
1) Contents=>Contains Match=>Custom Date (31 December 1999)
2) Contents=>Contains Match=>Custom Date (31 December 1999)
The first condition picks up 31 Mar 2014 and the second condition picks up 6 May 2016, and it is the date from the first condition that is passed on to the renaming.
I know I could just swap the conditions in this example, but I want to write a rule that will pick out the first date in the document, whatever the format and whatever the document.
Sorry for the ramble, and thank you for your help.
John

Re: Extracting the 1st date in a scanned document

Posted: Tue May 17, 2016 11:51 am
by Mr_Noodle
Ah, I see. Unfortunately, can't really be done as things are now unless you use a script to do it for you.
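For anyone going the script route, here is a rough sketch in Python of the "first date by position" idea. It is not Hazel's own mechanism, only an illustration: it assumes the OCR text is already available as a string, the names PATTERNS and first_date are made up for this example, the handful of formats listed is nowhere near the full 16, and slash dates are assumed to be day-first. How the result gets handed back to Hazel for renaming is not shown.

```python
import re
from datetime import datetime

# Month names, long or abbreviated (English only; an assumption of this sketch).
MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|July?|"
         r"Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)")

# Each entry: (regex that spots a candidate, strptime formats to try on it).
# The list order does NOT decide which date wins; position in the text does.
PATTERNS = [
    (re.compile(rf"\b\d{{1,2}}\s+{MONTH}\s+\d{{4}}\b", re.IGNORECASE),
     ("%d %B %Y", "%d %b %Y")),                      # 31 December 1999, 6 May 2016
    (re.compile(rf"\b{MONTH}\s+\d{{1,2}},\s+\d{{4}}\b", re.IGNORECASE),
     ("%B %d, %Y", "%b %d, %Y")),                    # December 31, 1999
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
     ("%d/%m/%Y",)),                                 # 31/12/1999 (day-first assumed)
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
     ("%Y-%m-%d",)),                                 # 1999-12-31
]


def _parse(candidate, formats):
    """Return a datetime for the candidate string, or None if no format fits."""
    cleaned = " ".join(candidate.split())  # collapse OCR line breaks and extra spaces
    for fmt in formats:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            continue
    return None


def first_date(text):
    """Return the date that appears earliest in text, whatever its format."""
    best = None  # (character offset, datetime)
    for regex, formats in PATTERNS:
        for match in regex.finditer(text):
            parsed = _parse(match.group(), formats)
            if parsed is None:
                continue  # looked like a date but is not valid, e.g. 31/02/2016
            if best is None or match.start() < best[0]:
                best = (match.start(), parsed)
            break  # later matches of this pattern cannot be earlier in the text
    return best[1] if best else None
```

With John's example, first_date("Statement date:  6 May 2016 ... period ending 31 Mar 2014") gives the 6 May 2016 date, and first_date(...).strftime("%Y-%m-%d") would produce a string suitable for a file name. The difference from the "any" condition behaviour is that every format is tried and the match with the smallest offset wins, rather than the first condition that happens to match.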

Re: Extracting the 1st date in a scanned document

Posted: Tue May 17, 2016 4:05 pm
by jmvenable
Try this:
2) Contents=>Contains Match=>Custom Date (ANYTHING31 December 1999)
The ANYTHING token placed before the day should absorb the extra space(s) in front of your single-digit days.
Good luck,
JV