Date Match does not always get the correct date

Moderator: Mr_Noodle

I'm hoping I can get some help with this.

I have some rules set up to do date matching on my various bills. These bills have not changed format in quite a while. Each of these rules is set to do a date match and use that date in renaming the file.

However, it seems that Hazel sometimes matches the correct date and sometimes does not. So, if I have a bill from "Visa" that has a closing date and a due date and I am trying to match the closing date, sometimes it seems to pick the closing date and sometimes the due date. I have chosen "1st occurrence" or "2nd occurrence" (whichever is appropriate), but it does not always seem to get the correct one.

Is there a way to determine which of the dates Hazel will match?

Any idea why it is not consistent?

Any help would be appreciated.
jormsby
 
Posts: 27
Joined: Mon Oct 13, 2014 7:04 pm

It may be an issue with how the file is generated. Note that in PDF, the order you think things are in visually may not correspond to how they appear in the file. Are these OCRed or downloaded in digital form? If the former, that also adds a random element. If possible, it's better to also include surrounding text in your pattern.
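To illustrate the suggestion about surrounding text, here is a minimal Python sketch. It is not Hazel's matching engine; the label "Closing Date", the MM/DD/YYYY date format, and both sample text layers are assumptions made up for the example. It shows that an "nth occurrence" match depends on the order of the text layer, while a match anchored on a label does not.

```python
import re

# Assumed date format for this sketch: MM/DD/YYYY.
DATE = r"\d{2}/\d{2}/\d{4}"

def nth_date(text, n):
    """Return the n-th date (1-based) in text-layer order, or None."""
    dates = re.findall(DATE, text)
    return dates[n - 1] if len(dates) >= n else None

def date_after_label(text, label):
    """Return the first date that directly follows the label, or None."""
    match = re.search(re.escape(label) + r"\s*:?\s*(" + DATE + ")", text)
    return match.group(1) if match else None

# Hypothetical text layers for the same bill, extracted in two different orders.
layer_a = "Closing Date 03/20/2014 Payment Due Date 04/15/2014"
layer_b = "Payment Due Date 04/15/2014 Closing Date 03/20/2014"

first_a = nth_date(layer_a, 1)
first_b = nth_date(layer_b, 1)
closing_a = date_after_label(layer_a, "Closing Date")
closing_b = date_after_label(layer_b, "Closing Date")
```

In this sketch the first occurrence differs between the two layers, while the label-anchored lookup returns the same date for both.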
Mr_Noodle
Site Admin
 
Posts: 11255
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Hi,

Thanks for the response. These documents are scanned with a ScanSnap scanner and OCR'd in the process using the included FineReader software v4.1.

I could understand that what I 'thought' was the first occurrence of a date may actually be seen by the computer as the second occurrence; however, I would expect it to be consistent since the bill's format has not changed and I am using the same software.

On occasion, I have tried adding surrounding words, but with mixed results. Often the spacing seems to be interpreted differently (how many spaces), or the additional wording even ends up on a different row.

Is there a way to look at a document to 'see' how Hazel will 'see' it with respect to date matching?

Regards,

John
jormsby
 

Regarding spaces, a space in your pattern translates to any number of spaces or newlines so that's something to consider.
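The space behaviour described above can be approximated with a regular expression. This is only a sketch of the idea in Python, not how Hazel is implemented; `flexible_pattern` is a hypothetical helper, and it assumes "any number" means one or more whitespace characters.

```python
import re

def flexible_pattern(literal):
    """Build a regex in which each space in `literal` matches any run of whitespace."""
    words = literal.split()
    # Escape each word, then join the words with "one or more whitespace characters".
    return re.compile(r"\s+".join(re.escape(word) for word in words))

pattern = flexible_pattern("Closing Date")

# The same label with one space, several spaces, and a line break in between.
samples = ["Closing Date", "Closing    Date", "Closing\nDate"]
results = [bool(pattern.search(sample)) for sample in samples]
```

Under that assumption, a label that OCR splits across two rows would still match a pattern written with a single space.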

As for seeing the document content, see this post: viewtopic.php?f=4&t=3206&p=11554&hilit=hazelimporter#p11554
Mr_Noodle

OK, I used the HazelImporter utility on some of my scans. It appears that Hazel "IS" picking the first occurrence of a date; however, that first date may not have been the actual first date in the printed document. It seems like sometimes the scan is jumping around. Here is an example. I scanned two different months' statements from the same store. In one, while looking at the results of the HazelImporter, I see the following:

"Minimum Payment Warning: If you make only the minimum payment each period, you will pay more in interest and it will take you longer to pay off your balance."

While the other month's statement results in:

"Minimum Payment Warning: If you make only the minimum 03/20/2014 payment each period, you will pay more in interest and it will
31
take you longer to pay off your balance.
"

Notice the date now stuck in the middle of the sentence, and also the number "31" later in the sentence.

This particular company's statement has two rectangular boxes next to each other, one containing "Summary of Account Activity" and the other containing "Payment Information". The minimum payment statement above comes from the "Payment Information" box, and the date and the number 31 both come from the Summary box to the left.

Do you have any idea if this inconsistent output is due to the scanner, scanning software, OCR software or Hazel?
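For anyone checking their own scans the same way, a small Python sketch like the following can list every date in a block of extracted text together with its position and nearby words. It assumes the text has already been copied out of the importer output into a string and that dates are MM/DD/YYYY; `dates_with_context` is a hypothetical helper, not part of Hazel.

```python
import re

# Assumed date format for this sketch: MM/DD/YYYY.
DATE = re.compile(r"\d{2}/\d{2}/\d{4}")

def dates_with_context(text, width=20):
    """Return (occurrence, date, context) tuples in text-layer order."""
    hits = []
    for occurrence, match in enumerate(DATE.finditer(text), start=1):
        start = max(0, match.start() - width)
        snippet = text[start:match.end() + width]
        # Collapse line breaks and repeated spaces so the context fits on one line.
        hits.append((occurrence, match.group(), " ".join(snippet.split())))
    return hits

# The second statement's extracted text, as quoted above.
extracted = (
    "Minimum Payment Warning: If you make only the minimum 03/20/2014 "
    "payment each period, you will pay more in interest and it will\n"
    "31\n"
    "take you longer to pay off your balance.\n"
)

hits = dates_with_context(extracted)
for occurrence, date, context in hits:
    print(occurrence, date, "|", context)
```

Comparing this listing for two months of the same bill shows whether the "1st occurrence" is really the same field in both.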

Regards,
jormsby
 

It probably isn't Hazel since it just reads what's given to it. It probably is the OCR software. The order in which it lays out the text is a bit arbitrary, especially when there are columns and callouts. You can try slightly tilting the document when scanning to see how that affects things. Also, if the company has a digital download of the statement, use that since it is likely digitally (and hopefully more consistently) generated.
Mr_Noodle
