Contains Match Occurrence Sequence



Contains Match Occurrence Sequence Sat Mar 15, 2014 9:27 am • by Drumer
I am using a custom match to pull my credit card balance from an OCR'd PDF into the file name, using the format '£(number)' and matching on the 4th occurrence (the previous balance, payments received and new activity amounts come first on the page, reading top to bottom).
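As a rough illustration of what "match the 4th occurrence" amounts to, here is a minimal Python sketch that emulates it with a regular expression over already-extracted text. It assumes an amount format of '£' followed by digits with optional thousands separators and pence; the name `nth_amount` and the sample figures are hypothetical, and this is not how Hazel implements custom matches internally.

```python
import re
from typing import Optional

# Assumed amount format: "£" followed by digits, optional thousands
# separators and optional pence, e.g. £1,234.56 or £85.00.
AMOUNT = re.compile(r"£(\d+(?:,\d{3})*(?:\.\d{2})?)")

def nth_amount(text: str, n: int) -> Optional[str]:
    """Return the n-th (1-based) '£' amount in the order it appears in text."""
    matches = AMOUNT.findall(text)  # one capture group -> list of strings
    if 1 <= n <= len(matches):
        return matches[n - 1]
    return None

# Invented sample text, in the order the page is read visually.
statement = (
    "Previous balance £100.00\n"
    "Payments received £40.00\n"
    "New activity £25.00\n"
    "New balance £85.00\n"
)
print(nth_amount(statement, 4))  # 85.00
```

In this sketch the occurrence number is counted purely in the order of the text it is given, so the result is only as reliable as that order.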

However, Hazel appears to be putting the 2nd value on the page into the file name, not the 4th. I have manually searched the PDF for '£' and there are no matches higher up on the 1st page, but there are other matches further down the page and on subsequent pages.

Does anyone know in what order pages are OCR'd? I had assumed top to bottom and left to right, but I may be wrong on that. The credit card statement has 3 columns with text in multiple sizes, so defining an order for the text may be a much more complex subject than I assumed.
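To see how a multi-column layout can shift an occurrence number, the sketch below serializes the same hypothetical page two ways: down each column in turn, and across each row. The layout (a summary column on the left, a "New balance" box on the right level with the first summary line) and the figures are invented; this is one possible explanation of a 4th-vs-2nd mismatch, not a diagnosis of this statement, and real OCR output may follow neither order.

```python
import re
from typing import List, Optional, Tuple

AMOUNT = re.compile(r"£\d+(?:,\d{3})*(?:\.\d{2})?")

# Hypothetical layout: each row is (left cell, right cell); None = empty cell.
ROWS: List[Tuple[str, Optional[str]]] = [
    ("Previous balance £100.00", "New balance £85.00"),
    ("Payments received £40.00", None),
    ("New activity £25.00", None),
]

def column_major(rows) -> str:
    """Read down the left column fully, then down the right column."""
    return "\n".join(row[c] for c in range(2) for row in rows if row[c])

def row_major(rows) -> str:
    """Read across each row (left cell, then right cell), then move down."""
    return "\n".join(cell for row in rows for cell in row if cell)

def position_of(text: str, label: str) -> Optional[int]:
    """1-based occurrence number of the amount on the line starting with label."""
    n = 0
    for line in text.splitlines():
        if AMOUNT.search(line):
            n += 1
            if line.startswith(label):
                return n
    return None

print(position_of(column_major(ROWS), "New balance"))  # 4
print(position_of(row_major(ROWS), "New balance"))     # 2
```

The page looks identical either way; only the order in which the text is written out changes which amount counts as "the 4th".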

(BTW, I have worked around this problem by including the text 'new balance' in the custom match token and then replacing that text when adding it to the file name. Nevertheless, I am still interested in understanding what I am doing wrong.)
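The label-anchored workaround can be sketched as a regular expression that requires the "New balance" label but captures only the amount, so the label never needs stripping afterwards. This is a Python emulation under assumed wording (case-insensitive "new balance", optional colon and whitespace, including line breaks); it is not Hazel's custom-match syntax, and the name `new_balance` is hypothetical.

```python
import re
from typing import Optional

# Assumed label wording; adjust to what the statement actually prints.
NEW_BALANCE = re.compile(
    r"new\s+balance\s*:?\s*£(\d+(?:,\d{3})*(?:\.\d{2})?)",
    re.IGNORECASE,
)

def new_balance(text: str) -> Optional[str]:
    """Return the amount that follows the 'New balance' label, if present."""
    m = NEW_BALANCE.search(text)
    return m.group(1) if m else None

# Reading order of the other amounts no longer matters.
page = "Previous balance £100.00\nNew balance £85.00\nPayments received £40.00"
print(new_balance(page))  # 85.00
```

This only helps as long as the OCR output keeps the label next to its amount; if the two end up in different parts of the text stream, the anchor will miss.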
Drumer
 

Re: Contains Match Occurrence Sequence Thu Mar 20, 2014 12:38 pm • by Mr_Noodle
Unfortunately, the ordering of text in PDF files can be hard to predict. If OCR is used, it's up to the OCR program to write out the text in whatever order it chooses, and sometimes that doesn't quite match what you see visually. This is especially true when there are multiple columns or sections instead of a single stream of text. If you are really curious, reply back and I can give more detailed instructions (requiring use of Terminal) for dumping the raw text of the file.
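Once the raw text has been dumped to a plain-text file (the Terminal instructions mentioned above are not reproduced here), a small script can number every '£' amount in the order it appears, which shows directly which value a given occurrence number would pick. The script and the file name `statement.txt` are hypothetical.

```python
import re
import sys
from typing import List, Tuple

AMOUNT = re.compile(r"£\d+(?:,\d{3})*(?:\.\d{2})?")

def number_amounts(text: str) -> List[Tuple[int, int, str]]:
    """List (occurrence number, line number, amount), all 1-based, in text order."""
    found: List[Tuple[int, int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in AMOUNT.finditer(line):
            found.append((len(found) + 1, lineno, m.group(0)))
    return found

# Only runs when a path to a text dump is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as f:
        for occurrence, lineno, amount in number_amounts(f.read()):
            print(f"#{occurrence}  line {lineno}: {amount}")
```

Run it as, for example, `python number_amounts.py statement.txt` and compare the `#4` entry with the value that ended up in the file name.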
Mr_Noodle
Site Admin
 

