Slightly more flexible date formats


Moderator: Mr_Noodle

Slightly more flexible date formats Sat May 05, 2018 1:57 pm • by Sandro
I am scanning receipts, running OCR on them, and want to categorize them automatically.

For one type of receipt (from one vendor), the same problem keeps recurring. The actual date format is dd.mm.yyyy,
but sometimes the OCR gives me dd,mm.yyyy, dd.mm,yyyy or dd,mm,yyyy instead.

All of these are close enough, and I'd like them to be recognized as the receipt's date.
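Expressed as a plain regular expression, this tolerance fits in one pattern instead of several nested conditions. The following is a minimal Python sketch, assuming the OCR text is already available as a string; the pattern and the `parse_receipt_date` helper are hypothetical illustrations, not features of the rule editor. It accepts either a period or a comma as each separator, so all four variants above map to the same date, and it rejects digit runs that do not form a valid calendar date.

```python
import re
from datetime import date

# Hypothetical pattern: two-digit day, two-digit month, four-digit year,
# with either '.' or ',' allowed as each separator (the OCR variants above).
DATE_RE = re.compile(r"(\d{2})[.,](\d{2})[.,](\d{4})")


def parse_receipt_date(text):
    """Return the first dd.mm.yyyy-style date found in text, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    day, month, year = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # The digits matched but do not form a real date (e.g. month 13).
        return None


for sample in ["04.12.2017", "04,12.2017", "04.12,2017", "04,12,2017"]:
    print(sample, "->", parse_receipt_date(sample).strftime("%d.%m.%Y"))
```

The character class `[.,]` only widens the separator, so the day, month and year positions stay as strict as in the original dd.mm.yyyy format.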

Right now I have solved it with nested conditions that reuse the same date token, each with one of the slightly different formats listed above.

That is tedious, though, especially if the same thing pops up for another type of receipt. Is there a more flexible approach to handling this?

Detecting the format automatically doesn't seem to work because of the commas.
Putting the "anything" placeholder between dd, mm and yyyy gives a bit too much flexibility, and all kinds of stuff might be recognized as a date.

Thanks!
Sandro
 
Posts: 20
Joined: Mon Jul 31, 2017 7:08 am

Re: Slightly more flexible date formats Mon May 07, 2018 11:08 am • by Mr_Noodle
How about using the "symbol" token to match any symbols/punctuation between the numbers?

Also, if you can specify surrounding text, that would minimize the chances of the "anything" token matching too much.
Mr_Noodle
Site Admin
 
Posts: 11255
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: Slightly more flexible date formats Wed May 23, 2018 2:27 pm • by Sandro
The "symbol" token sounds good, but in the custom date format I only see the "anything" token. The "symbol" token is only available outside the custom date token, right? I want to use the matched date when renaming the file.

As for using surrounding text as an anchor: that text isn't constant. I know I could match it with more flexible patterns as well, but I'm afraid that would convolute the rule and make it less predictable.
Sandro
 

Re: Slightly more flexible date formats Thu May 24, 2018 11:09 am • by Mr_Noodle
Ah right. I'll have to look into adding that.

Can you show me examples where using "anything" doesn't work?
Mr_Noodle

Re: Slightly more flexible date formats Tue Jun 05, 2018 4:39 am • by Sandro
Sure.

My date format is:

DD[anything]MM[anything]YYYY


Here is the text that was matched:
11373 KOI C008582 04,12.2017


It was matched as:
11.08.2017, 00:00
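To see how a wildcard between the fields can produce 11.08.2017 from this line, here is a Python sketch that models [anything] as a lazy regular-expression wildcard. The field patterns (day 01 to 31, month 01 to 12, year 19xx or 20xx) are assumptions about the kind of range checks a date token might apply; the real matching logic is not described in this thread. Under those assumptions, the first successful match takes the day from "11373", the month from "C008582" and only the year from the intended date.

```python
import re

# Assumed field patterns: day 01-31, month 01-12, year 19xx or 20xx.
# These are a guess at the range checks a date token might apply,
# not the product's actual implementation.
DD = r"(0[1-9]|[12]\d|3[01])"
MM = r"(0[1-9]|1[0-2])"
YYYY = r"((?:19|20)\d\d)"

# [anything] modeled as a lazy wildcard that may swallow digits and letters.
ANYTHING = r".*?"

pattern = re.compile(DD + ANYTHING + MM + ANYTHING + YYYY)

line = "11373 KOI C008582 04,12.2017"
day, month, year = pattern.search(line).groups()
print(f"{day}.{month}.{year}")  # 11.08.2017
```

Because the wildcard can cross digits, the earliest valid day/month/year combination in the line wins, even when it spans three unrelated numbers.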


What I wanted was the last part:
04,12.2017


which, with the pattern

DD[symbol]MM[symbol]YYYY


should have been matched as
04.12.2017
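A Python sketch of the proposed DD[symbol]MM[symbol]YYYY pattern, modeling [symbol] as exactly one character that is neither a word character nor whitespace. That definition of [symbol], and the day/month/year ranges, are assumptions for illustration. Because each separator must be a single punctuation mark, the pattern cannot skip over other numbers, and the same line yields 04.12.2017.

```python
import re

# Assumed field patterns: day 01-31, month 01-12, year 19xx or 20xx.
DD = r"(0[1-9]|[12]\d|3[01])"
MM = r"(0[1-9]|1[0-2])"
YYYY = r"((?:19|20)\d\d)"

# [symbol] modeled as one character that is not a letter, digit,
# underscore or whitespace (so '.', ',', '/', '-' all qualify).
SYMBOL = r"[^\w\s]"

pattern = re.compile(DD + SYMBOL + MM + SYMBOL + YYYY)

line = "11373 KOI C008582 04,12.2017"
day, month, year = pattern.search(line).groups()
print(f"{day}.{month}.{year}")  # 04.12.2017
```

The same pattern also accepts 04.12,2017 and 04,12,2017, so one rule would cover all the OCR variants from the first post.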
Sandro
 

Re: Slightly more flexible date formats Tue Jun 05, 2018 10:54 am • by Mr_Noodle
Thanks for that. I'll look into a solution: either adding the "symbol" token or making "anything" smarter so that it stops at any numbers.
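The second idea, an "anything" that stops at numbers, can be approximated with `\D*` (any run of non-digit characters). This Python sketch is a hypothetical model of that behavior, not a description of how it would actually be implemented; the day 01 to 31, month 01 to 12 and year 19xx/20xx ranges are likewise assumptions.

```python
import re

# Assumed field patterns: day 01-31, month 01-12, year 19xx or 20xx.
DD = r"(0[1-9]|[12]\d|3[01])"
MM = r"(0[1-9]|1[0-2])"
YYYY = r"((?:19|20)\d\d)"

# Hypothetical "smarter anything": any run of non-digit characters, so the
# wildcard can never skip over digits that belong to another number.
NON_DIGITS = r"\D*"

pattern = re.compile(DD + NON_DIGITS + MM + NON_DIGITS + YYYY)

line = "11373 KOI C008582 04,12.2017"
day, month, year = pattern.search(line).groups()
print(f"{day}.{month}.{year}")  # 04.12.2017
```

The trade-off: `\D*` still allows letters and spaces between the fields (for example, `04 x 12 y 2017` would match), so it is looser than a single [symbol], but it can no longer reach across unrelated numbers such as the C008582 reference.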
Mr_Noodle

Re: Slightly more flexible date formats Wed Jun 06, 2018 3:20 pm • by Sandro
Thanks!

I have another, similar issue, but I think it merits a thread of its own. I'll post it there.
Sandro
 

