Date Matching - 31 vs 31 in OCR contents

Moderator: Mr_Noodle

Date Matching - 31 vs 31 in OCR contents Thu Oct 17, 2013 12:05 pm • by philrob
One of the nice bits of Hazel is being able to get the date from an OCR'd PDF and use it to rename the file, etc.

My matching is unreliable :cry: Sometimes it works, sometimes it doesn't.

On my system the day element dropdown of the match gives me a choice of 31 or underlined 31.

Which should I be using to match the following:

04 Oct 13

In the next release, would it be worth changing the labelling to 1 and ?1 to indicate the difference (and similarly for the months, 12 and 12)?

If there is normally a space between the day and month, do I need to put a space in the token? After OCR, a Spotlight search finds 04 Oct 13 and lists the file, but I can't get Hazel to find it.

I couldn't find anything in the FAQ/forums for this. Is there a manual, or was I looking in the wrong place?
philrob
 
Posts: 8
Joined: Sun Jun 24, 2012 3:39 am

Re: Date Matching - 31 vs 31 in OCR contents Thu Oct 17, 2013 12:46 pm • by Mr_Noodle
Not sure if ?1 makes much sense to people. If you want to have a leading zero (04), then you want to use 31. And yes, if there is a space between elements, you should add one to your pattern, though if the pattern is for matching the contents of a file, you only need to add one space and it will match any number of spaces and even newlines.
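
For anyone who wants to see those two rules spelled out, here is a minimal Python sketch. It is not Hazel's implementation, just a regular-expression analogy: the helper name day_month_year_pattern and the exact regex are my own assumptions. A padded day is treated as exactly two digits, an unpadded day as one or two digits with no leading zero, and a single space in the pattern is treated as "any run of whitespace, including newlines".

```python
import re


def day_month_year_pattern(padded_day: bool) -> re.Pattern:
    """Build a regex for dates shaped like '04 Oct 13' (hypothetical helper)."""
    # Padded day: always two digits (04). Unpadded day: 4 or 17, never 04.
    day = r"\d{2}" if padded_day else r"[1-9]\d?"
    # \s+ stands in for "one space in the pattern matches any amount of
    # whitespace", so line breaks introduced by OCR are tolerated.
    # The lookarounds stop the match from starting or ending inside a longer number.
    return re.compile(rf"(?<!\d)({day})\s+([A-Za-z]{{3}})\s+(\d{{2}})(?!\d)")


ocr_text = "Invoice date: 04   Oct\n13 Total due"
padded_match = day_month_year_pattern(padded_day=True).search(ocr_text)
unpadded_match = day_month_year_pattern(padded_day=False).search(ocr_text)
```

In this sketch the padded pattern finds 04 Oct 13 despite the extra spaces and the line break, while the unpadded pattern finds nothing in the same text, which is one way a wrong day token can make matching look unreliable.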
Mr_Noodle
Site Admin
 
Posts: 11872
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: Date Matching - 31 vs 31 in OCR contents Sat Oct 19, 2013 9:31 am • by GeekNeck
philrob wrote: In the next release, would it be worth changing the labelling to 1 and ?1 to indicate the difference (and similarly for the months, 12 and 12)?

I agree about a UI change. I've been confused for the last 15 minutes on which one is padded and which one is not. Even just a note in the Help would be a good reminder!
GeekNeck
 
Posts: 27
Joined: Sat Aug 11, 2012 7:26 am

The other issue with ?1 is that in the pattern, it is not apparent what it is representing, whereas 31 gives an indication that it's day of the month. The help does mention this in passing though it doesn't specifically point out the underline thing (which also changed in a recent release).
Mr_Noodle
Site Admin
 

Re: Date Matching - 31 vs 31 in OCR contents Sat Oct 26, 2013 11:31 am • by philrob
Thanks for the responses. I think I understand now.

Following up on intuitive labelling (particularly for newbies): would using 1 and 01 (or 12) for the months and 1 and 01 (or 31) for the days be more intuitive than 12 and 12?
philrob
 

Re: Date Matching - 31 vs 31 in OCR contents Mon Oct 28, 2013 11:18 am • by Mr_Noodle
Again, though, if both months and days were shown as 01 or 1, you wouldn't be able to tell them apart in the pattern.
Mr_Noodle
Site Admin
 

Re: Date Matching - 31 vs 31 in OCR contents Thu Mar 02, 2017 10:50 am • by awenro
I was actually rather surprised that Hazel doesn't already do this. If you are trying to go paperless, this is one of the most common use cases in my opinion.

The example is simple: invoices from various suppliers. They don't use the same format and may or may not add spaces. European examples include 31. Dez. 1999, 31.12.1999, 31.12.99, 31-12-99, 31. Dezember 1999. Some pad the numbers (01. Dez.), some don't (1. Dez.).
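
To make the variety concrete, here is a rough Python sketch that normalizes the formats listed above. Everything in it is an assumption for illustration: the name find_dates, the German month table keyed by three-letter prefix, and the rule that two-digit years below 70 mean 20xx. It has nothing to do with how Hazel matches dates.

```python
import re
from datetime import date

# German month names and abbreviations, keyed by their first three letters.
MONTHS = {"jan": 1, "feb": 2, "mär": 3, "mrz": 3, "apr": 4, "mai": 5, "jun": 6,
          "jul": 7, "aug": 8, "sep": 9, "okt": 10, "nov": 11, "dez": 12}

# 31.12.1999, 31.12.99, 31-12-99
NUMERIC = re.compile(r"(?<!\d)(\d{1,2})[.\-](\d{1,2})[.\-](\d{4}|\d{2})(?!\d)")
# 31. Dez. 1999, 31. Dezember 1999, 1. Dez. 1999
NAMED = re.compile(r"(?<!\d)(\d{1,2})\.\s*([A-Za-zäÄ]+)\.?\s+(\d{4}|\d{2})(?!\d)")


def find_dates(text: str) -> list[date]:
    """Return every recognizable date in text, in order of appearance."""
    found = []
    for pattern in (NUMERIC, NAMED):
        for m in pattern.finditer(text):
            day, month_raw, year_raw = m.groups()
            if month_raw.isdigit():
                month = int(month_raw)
            else:
                month = MONTHS.get(month_raw[:3].lower())
            if month is None:
                continue  # a word that is not a month name
            year = int(year_raw)
            if len(year_raw) == 2:
                # Assumed pivot: 00-69 -> 20xx, 70-99 -> 19xx.
                year += 2000 if year < 70 else 1900
            try:
                found.append((m.start(), date(year, month, int(day))))
            except ValueError:
                continue  # e.g. 31.02.1999 is not a real date
    return [d for _, d in sorted(found)]
```

With this sketch all five example strings come out as the same date, and the padded and unpadded day variants are treated alike.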

And even though, in the invoice case, the first date is usually the right match, for anything that covers a time frame (e.g. an electricity bill) you might want to use the latest date in the document for the file name instead of the first match.
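
A small Python sketch of the "latest date instead of first match" idea, assuming for simplicity that every date in the document is written as DD.MM.YYYY. The function name latest_date is made up for this example.

```python
import re
from datetime import date, datetime
from typing import Optional

# Only fully padded DD.MM.YYYY dates are considered in this sketch.
DDMMYYYY = re.compile(r"(?<!\d)\d{2}\.\d{2}\.\d{4}(?!\d)")


def latest_date(text: str) -> Optional[date]:
    """Return the most recent valid date found in text, or None."""
    dates = []
    for raw in DDMMYYYY.findall(text):
        try:
            dates.append(datetime.strptime(raw, "%d.%m.%Y").date())
        except ValueError:
            pass  # looks like a date but is not one, e.g. 45.13.2016
    return max(dates, default=None)


bill = "Billing period 01.01.2016 - 31.12.2016, issued 15.01.2017"
newest = latest_date(bill)
```

Here the first match in the bill would be 01.01.2016, while the sketch picks 15.01.2017, the most recent of the three dates.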

The Anti-OCR-Misinterpreter feature is something I see as a sprinkle. The date match is a crucial AAA feature for me. As I see it invoice and document handling is one of the most common usage scenarios. So I would strongly consider adding this.

I know that you can add multiple date attributes by hand, but it is still a PITA to maintain this. So something like "date in content" should become a standard match.

Also, using 31. and 31. to show padded vs. non-padded is bad UI design in my opinion (I had to google it). I'd rather use a different number for the example, e.g. 1, so the example would be 01 and 1, which is far more self-explanatory.
awenro
 
Posts: 1
Joined: Thu Mar 02, 2017 10:38 am

Re: Date Matching - 31 vs 31 in OCR contents Thu Mar 02, 2017 12:10 pm • by Mr_Noodle
Thanks for the feedback. The reason "1"/"01" isn't used is that when it is in the pattern, it becomes indistinguishable from any other element. Is it the day? The month? The year? The whole point of the design was that it is readily apparent what the date format is without having to do extra digging. The number padding is a secondary feature and therefore is designated in a secondary way.

"Smart" matching on dates in general is a bit tricky and is very likely to be wrong. Aside from ambiguities with various date formats, you will probably end up with a lot of non-dates being interpreted as such. Documents contain a lot of numbers, many of which can be misinterpreted. For now, I'd prefer that the outcome is more predictable and controlled by the user.
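
As a rough illustration of the false-positive problem, here is a Python sketch of a deliberately naive "anything that looks like D.M.Y" matcher. The regex, the function names, and the sample text are all invented for this example, and two-digit years are assumed to mean 20xx.

```python
import re
from datetime import date

# Deliberately naive: one or two digits, dot, one or two digits, dot, 2-4 digits.
NAIVE = re.compile(r"\d{1,2}\.\d{1,2}\.\d{2,4}")


def naive_candidates(text: str) -> list[str]:
    """Everything the naive pattern considers a date."""
    return NAIVE.findall(text)


def is_valid_date(candidate: str) -> bool:
    """Check whether a D.M.Y candidate is at least a real calendar date."""
    day, month, year = (int(part) for part in candidate.split("."))
    if year < 100:
        year += 2000  # assumption: two-digit years are 20xx
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False


text = "Firmware 3.10.12 on host 10.0.12.254, paid 31.12.1999"
candidates = naive_candidates(text)
survivors = [c for c in candidates if is_valid_date(c)]
```

In this sample the naive pattern picks up a version number and part of an IP address alongside the real date. Calendar validation throws out the IP fragment (month 0), but the version number 3.10.12 still passes as 3 October 2012, so some non-dates survive even after a sanity check.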
Mr_Noodle
Site Admin
 

Thinking about it a bit more, the OS does have built-in support for "data detectors". An example of them in use is in Mail where it identifies certain items, like links, phone numbers, and even dates. I'll have to play with it but if it provides a reasonable baseline for detecting dates, I can add that as an option (and possibly even the default) so that you don't have to specify a pattern and hope that it figures it out.
Mr_Noodle
Site Admin
 

Mr_Noodle wrote:I'll have to play with it but if it provides a reasonable baseline for detecting dates, I can add that as an option (and possibly even the default) so that you don't have to specify a pattern and hope that it figures it out.


I think this would be super helpful to have as an option.
CordellRa
 
Posts: 1
Joined: Thu Jan 04, 2018 5:52 am

Re: Date Matching - 31 vs 31 in OCR contents Mon Jan 08, 2018 11:43 am • by Mr_Noodle
It's there now. Are you not seeing it?
Mr_Noodle
Site Admin
 

