I was setting up a rule for some 401k statements in searchable PDFs. I wanted to grab the ending statement date to use in the file name.
The contents of the statement include "Retirement Savings Account" followed by the statement period in a format such as "May 1, 2013 - June 30, 2013".
I have the rule set up with a condition of "Contents contain match". The match is "(open date token)(anything)-(anything)(close date token)". The dates are in the format "June 20, 2013". The (anything) tokens handle OCR variability in whether spaces were recognized before and after the dash.
I also have a couple of other conditions that check the contents for key phrases (plain "contain" conditions, not matches) just to verify that the file is the 401k statement in general.
I got it to successfully match one file, but it stumbled on another scan of the same institution's statement. The problem came down to the fact that the OCR text in one date did not have a space after the comma, i.e. "June 20,2013".
If I take the space out of the date token, then it matches that file, but it won't match other files where the date was OCR'd with the space.
Any ideas whether there's a date match rule that can cover such variances or a better way to do what I'm doing? There is no (anything) token in the date match itself, so I can't insert that instead of a space after the comma.
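To make the variance concrete, here is a rough sketch in Python of the kind of tolerant match I'm after. It is not a feature of the rule editor, just an illustration. The names (MONTH_NAMES, PERIOD, ending_date) are made up for this example, and it assumes English month names spelled out in full, as in my statements.

```python
import re
from datetime import date

# Hypothetical helper names; assumes full English month names in the OCR text.
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

MONTH = "(" + "|".join(MONTH_NAMES) + ")"
# \s* tolerates zero or more spaces around the comma ("June 20,2013" or "June 20, 2013").
DATE = MONTH + r"\s*(\d{1,2})\s*,\s*(\d{4})"
# Hyphen or en dash between the two dates, with or without surrounding spaces.
PERIOD = re.compile(DATE + r"\s*[-\u2013]\s*" + DATE)


def ending_date(text):
    """Return the closing date of the first statement period found, or None."""
    m = PERIOD.search(text)
    if m is None:
        return None
    # Groups 1-3 are the opening date; groups 4-6 are the closing date.
    month = MONTH_NAMES.index(m.group(4)) + 1
    return date(int(m.group(6)), month, int(m.group(5)))


if __name__ == "__main__":
    for sample in (
        "Retirement Savings Account May 1, 2013 - June 30, 2013",
        "Retirement Savings Account May 1, 2013-June 30,2013",
    ):
        print(ending_date(sample))
```

Both sample strings should give the same ending date, which is the behavior I'd like from the date token: optional whitespace after the comma rather than a required space.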