I am scanning receipts, running OCR on them, and want to categorize them automatically.
For one type of receipt (from one vendor), the same problem keeps recurring. The actual date format is dd.mm.yyyy.
Sometimes the OCR gives me dd,mm.yyyy or dd.mm,yyyy or dd,mm,yyyy instead.
All of these are close enough, and I'd like to recognize each of them as the receipt's date.
Right now I have solved it with a nested condition that reuses the same date token with each of the slightly different formats described above.
That is tedious, though, especially if the same problem pops up for another type of receipt. Is there a more flexible approach to handling this?
Detecting the format automatically doesn't seem to work because of the commas.
Putting the "anything" placeholder between dd, mm and yyyy is too permissive, and all kinds of unrelated text might be recognized as a date.
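
To make clear what I mean by "close enough", here is a minimal sketch in Python. It is not the tool I am actually using, and the names `DATE_PATTERN` and `find_receipt_date` are made up for illustration. The idea is to allow only a period or a comma between the parts, rather than anything at all, and then to check that the digits form a real calendar date:

```python
import re
from datetime import date

# Two digits, two digits, four digits, each separated by "." or ","
# (the two characters the OCR confuses). The lookarounds keep the
# pattern from matching inside a longer run of digits.
DATE_PATTERN = re.compile(r"(?<!\d)(\d{2})[.,](\d{2})[.,](\d{4})(?!\d)")


def find_receipt_date(text):
    """Return the first valid dd.mm.yyyy-style date in text, or None."""
    for match in DATE_PATTERN.finditer(text):
        day, month, year = (int(part) for part in match.groups())
        try:
            return date(year, month, day)
        except ValueError:
            # Digits matched the shape but are not a real date, e.g. 31,02,2020.
            continue
    return None


if __name__ == "__main__":
    for sample in ["03.07.2021", "03,07.2021", "03.07,2021", "03,07,2021"]:
        print(sample, "->", find_receipt_date("Datum " + sample + " Summe"))
```

All four variants should resolve to the same date, while something like 03-07-2021 or 31,02,2020 is rejected. What I am looking for is an equivalent of this inside the categorization rules.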
Thanks!