Selection for Attribute going past newline

Posted: Tue Mar 29, 2016 10:41 am
by xyzzy
New user here, already finding Hazel invaluable. One problem I can't seem to resolve:

I need to scan and OCR medical insurance reimbursement statements, and I would like to extract the provider's name for inclusion in the file name. In the document (verified by copying and pasting from the OCR'd PDF into TextWrangler), the information I need is in the form of an address, which looks like this in TextWrangler, with a <newline> symbol at the end of each line:

PROVIDER: Name of Provider or Facility
First Line of Address
Next Line of Address
etc.

The provider's name line is, of course, essentially random, with a varying number of letters and words.

The selection I'm using is "Contents Contain Match" with the pattern and attribute shown below. (The attribute contains only the "Anything" code: no spaces and no other attributes.) The problem is that Hazel puts the first two lines (the name plus the first line of the address) into the custom token. I was under the impression that the match should stop at the end of the line, but it doesn't. What do I need to do to pull out only the first line (up to, but not including, the <newline>)?

[Image: screenshot of the pattern and attribute described above; not preserved]

Re: Selection for Attribute going past newline

Posted: Tue Mar 29, 2016 1:31 pm
by Mr_Noodle
It definitely shouldn't go past a newline. If it did, it would go until the end of the file. Can you copy and paste the text into a plain text file and email it to support? I'm curious to see what character is actually there.
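When text is pasted out of an OCR'd PDF, a plain space, a no-break space, or a Unicode line separator can look just like a line break. Here is a minimal Python sketch for checking which character is really there, in the spirit of the request above. The function name `describe_separators` and the sample text are hypothetical, and the script only inspects a string you paste into it; it does not read the PDF itself.

```python
import unicodedata
from typing import List, Tuple


def describe_separators(text: str) -> List[Tuple[int, str, str]]:
    """Return (index, repr, Unicode name) for each whitespace character or
    Unicode "Other" (C*) category character, such as controls and format
    characters, found in text."""
    found = []
    for i, ch in enumerate(text):
        if ch.isspace() or unicodedata.category(ch).startswith("C"):
            # Control characters such as "\n" have no Unicode name, so a
            # placeholder default is supplied instead of raising ValueError.
            found.append((i, repr(ch), unicodedata.name(ch, "<unnamed>")))
    return found


if __name__ == "__main__":
    # Hypothetical sample: paste the suspicious text from the OCR'd PDF here.
    sample = "PROVIDER: Name of Provider\u00a0First Line of Address\n"
    for index, shown, name in describe_separators(sample):
        print(index, shown, name)
```

A real newline shows up as `'\n'`, while a character that only looks like a line break shows up under its own name, for example `SPACE` or `NO-BREAK SPACE`.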

Resolution

Posted: Thu Mar 31, 2016 10:06 am
by xyzzy
After a very helpful email exchange, Mr_Noodle determined that what appeared to be a newline, both on the original PDF and in the underlying OCR text viewed in TextWrangler, was actually a space, so Hazel was performing as designed.
Not sure why, but ABBYY FineReader for ScanSnap was misinterpreting the character. (On the original documents and the scanned PDFs, of which there were over 100, it appeared unmistakably to be a newline.) This is the first time I've noticed a repeated error of this magnitude from FineReader.
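The behaviour in this thread can be reproduced outside Hazel. This is only an analogy: it assumes Hazel's "Anything" token acts like the regular expression `.+`, which stops at the first newline (as Mr_Noodle describes), and the helper name `provider_name` is made up for illustration.

```python
import re
from typing import Optional

# Assumed analogy: Hazel's "Anything" token modelled as ".+", which
# (without re.DOTALL) never matches across a newline.
PATTERN = re.compile(r"PROVIDER: (.+)")


def provider_name(text: str) -> Optional[str]:
    """Return everything after 'PROVIDER: ' up to the next real newline."""
    m = PATTERN.search(text)
    return m.group(1) if m else None


real_newline = "PROVIDER: Name of Provider or Facility\nFirst Line of Address\nNext Line of Address"
space_instead = "PROVIDER: Name of Provider or Facility First Line of Address\nNext Line of Address"

print(provider_name(real_newline))   # Name of Provider or Facility
print(provider_name(space_instead))  # Name of Provider or Facility First Line of Address
```

With a real newline the capture ends after the provider's name. With a space in its place, the capture runs on to the next real newline, which matches the "name plus first line of address" symptom from the original post.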

Found a workaround by capturing only the first three words of the address, which is good enough for my purpose.
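The three-word workaround can be expressed the same way. This is a hedged sketch that again uses Python regular expressions rather than Hazel's own pattern tokens; the pattern and the `provider_prefix` name are hypothetical.

```python
import re
from typing import Optional

# Hypothetical regex version of the workaround: keep at most the first three
# space- or tab-separated words after "PROVIDER:", never crossing a newline.
FIRST_THREE_WORDS = re.compile(r"PROVIDER:[ \t]+(\S+(?:[ \t]+\S+){0,2})")


def provider_prefix(text: str) -> Optional[str]:
    """Return up to three words following 'PROVIDER:' on the same line."""
    m = FIRST_THREE_WORDS.search(text)
    return m.group(1) if m else None


print(provider_prefix("PROVIDER: Name of Provider or Facility First Line of Address"))
# Name of Provider
```

The trade-off is that a one- or two-word provider name will still pick up the first word or two of the address when the separator is a space. The result is a stable prefix for the file name rather than the exact provider name.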

Nice to have a product with such involved support: Thanks, Paul.