Slightly more flexible date formats

I scan receipts, OCR them, and want to categorize them automatically.

For one type of receipt (from one vendor) the following keeps happening. The actual date format is dd.mm.yyyy.
Sometimes the OCR gives me dd,mm.yyyy or dd.mm,yyyy or dd,mm,yyyy.

All of that is close enough, and I'd like to recognize it as the receipt's date.

Right now I have solved it with a nested condition, which reuses the same date token with each of the slightly different formats listed above.

That is tedious though, especially if it pops up for another type of receipt. Is there a more flexible approach to handling this?

Detecting the format automatically doesn't seem to work because of the commas.
Putting the "anything" placeholder between the dd, mm and yyyy gives a bit too much flexibility, and all kinds of stuff might be recognized.

Thanks!

How about using the symbol token to match any symbols/punctuation between the numbers?

Also, if you can specify surrounding text, that would minimize the chances of the "anything" token matching too much.

"symbol" token sounds good, but for the date format I only see the "anything" token. The "symbol" token is only available outside of the custom date token, right? I want to use the matched date when renaming the file

Surrounding text as an anchor: the text isn't constant. I know I might match it with some more flexible patterns as well, but I am afraid it'll complicate the rule and make it less predictable.

Ah right. I'll have to look into adding that.

Can you show me examples where using "anything" doesn't work?

Sure

My date format is

DD[anything]MM[anything]YYYY

Here is what is matched

11373 KOI C008582 04,12.2017

It is matched as

11.08.2017, 00:00

What I would have wanted is the last part

04,12.2017

which, with

DD[symbol]MM[symbol]YYYY

should have been matched to

04.12.2017
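
For anyone who wants to reproduce the difference outside the app, here is a rough sketch using Python regular expressions. The patterns are only stand-ins for the two token layouts: how the app's matcher really works internally is an assumption on my part, and the names LOOSE, STRICT and find_date are made up for this example.

```python
import re

# Hypothetical regex stand-ins for the date tokens; the app's real matcher
# may behave differently.
DAY = r"(0[1-9]|[12]\d|3[01])"   # 01-31
MONTH = r"(0[1-9]|1[0-2])"       # 01-12
YEAR = r"(\d{4})"

# DD[anything]MM[anything]YYYY: "anything" may swallow letters and digits.
LOOSE = re.compile(DAY + r".*?" + MONTH + r".*?" + YEAR)

# DD[symbol]MM[symbol]YYYY: exactly one '.' or ',' between the parts,
# and no extra digits directly before or after the date.
STRICT = re.compile(r"(?<!\d)" + DAY + r"[.,]" + MONTH + r"[.,]" + YEAR + r"(?!\d)")


def find_date(pattern, text):
    """Return the first match as 'dd.mm.yyyy', or None if nothing matches."""
    m = pattern.search(text)
    return ".".join(m.groups()) if m else None


line = "11373 KOI C008582 04,12.2017"
print(find_date(LOOSE, line))
print(find_date(STRICT, line))
```

On the line above, the loose pattern returns 11.08.2017 (the 11 at the start, then the 08 inside C008582, then 2017), while the strict one returns 04.12.2017.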

Thanks for that. I'll look into a solution: either adding the symbol token or making "anything" smarter in date patterns so that it stops at any numbers.
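
To illustrate the second idea, an "anything" that stops at numbers could look roughly like the sketch below. This is a Python regex model, not the app's implementation; DIGIT_SAFE and extract_date are hypothetical names, and validating the result with datetime.date is my own addition.

```python
import re
from datetime import date

# Hypothetical model of an "anything" token that refuses to cross digits:
# \D*? matches a run of non-digit characters, as short as possible.
DIGIT_SAFE = re.compile(r"(?<!\d)(\d{2})\D*?(\d{2})\D*?(\d{4})(?!\d)")


def extract_date(text):
    """Return the first valid dd/mm/yyyy date found in text, or None."""
    for m in DIGIT_SAFE.finditer(text):
        day, month, year = (int(g) for g in m.groups())
        try:
            return date(year, month, day)  # raises ValueError for e.g. month 37
        except ValueError:
            continue  # not a real calendar date, try the next candidate
    return None


found = extract_date("11373 KOI C008582 04,12.2017")
if found is not None:
    print(found.strftime("%d.%m.%Y"))
```

Because the filler between the parts cannot contain digits, the 11 and 08 from the earlier numbers can no longer be stitched together with 2017, and only 04,12.2017 qualifies. The resulting date object can then be formatted however the renamed file needs it.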

Thanks!

I have another similar issue, but I think it merits a thread of its own. I'll post it there.