Selection for Attribute going past newline

Moderator: Mr_Noodle

Selection for Attribute going past newline Tue Mar 29, 2016 10:41 am • by xyzzy
New user here, already finding Hazel invaluable. One problem I can't seem to resolve:

I need to scan and OCR medical insurance reimbursement statements, and would like to extract the provider's name for inclusion in the file name. In the document (verified by a cut and paste from the OCR'd PDF into TextWrangler), the information I need is in the form of an address, which looks like this in TextWrangler, with a <newline> symbol at the end of each line:

PROVIDER: Name of Provider or Facility
First Line of Address
Next Line of Address
etc.

The provider's name line is of course essentially random, with a varying number of letters and words.

The selection I'm using is "Contents contain match" with the pattern and attribute below. (The attribute contains no spaces or other tokens, only the "Anything" code.) The problem is that Hazel is extracting the first two lines (name plus first line of address) into the custom token. I was under the impression that the selection should stop at the end of the line, but it doesn't. What do I need to do to pull out only the first line (up to but not including the <newline>)?

Image

It definitely shouldn't go past a newline. If it did, it would go until the end of the file. Can you copy and paste the text into a plain text file and email it to support? I'm curious to see what character is actually there.
Mr_Noodle

Resolution Thu Mar 31, 2016 10:06 am • by xyzzy
After a very helpful email exchange with Mr_Noodle, he was able to determine that what appeared to be a newline, both on the original PDF and viewing the underlying OCR in TextWrangler, was actually a space, and therefore Hazel was performing as designed.
Not sure why, but ABBYY FineReader for ScanSnap was misinterpreting the character. (On the original documents and scanned PDFs, of which there were over 100, this appeared unmistakably to be a newline.) This is the first time I've noticed a repeated error of this magnitude from FineReader.
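For anyone who runs into the same symptom, one way to check which character is really sitting at the apparent line break is to list the code points of the whitespace in the pasted text. The following is only a rough Python sketch, not anything Hazel or FineReader provides; the function name `describe_whitespace` and the two sample strings are made up for illustration.

```python
import unicodedata

def describe_whitespace(text):
    """Return (index, code point, name) for each whitespace or control character."""
    found = []
    for i, ch in enumerate(text):
        if ch.isspace() or unicodedata.category(ch).startswith("C"):
            # Control characters such as "\n" have no Unicode name, so supply a fallback.
            name = unicodedata.name(ch, "<unnamed control>")
            found.append((i, f"U+{ord(ch):04X}", name))
    return found

# Hypothetical samples: one where the "line break" is really a space,
# one where it is a true newline (U+000A).
sample_space = "PROVIDER: Name of Provider First Line of Address"
sample_newline = "PROVIDER: Name of Provider\nFirst Line of Address"

for label, sample in (("space", sample_space), ("newline", sample_newline)):
    print(label, describe_whitespace(sample))
```

If the output for the real document shows only U+0020 (SPACE) where a line break was expected, the OCR text layer contains a space there, and a match that stops at the end of the line will run on past it.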

Found a work-around by capturing the first three words of the address, which is satisfactory for my purpose.
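The idea of capturing a fixed number of words can be sketched as a regular expression. This is Python, not Hazel's own pattern tokens, and it assumes the words wanted are the first three following the "PROVIDER:" label; the names `PATTERN` and `provider_prefix` are hypothetical.

```python
import re

# Capture up to three space-separated words after "PROVIDER:".
# [ \t]+ is used instead of \s+ so the match never crosses a real newline.
PATTERN = re.compile(r"PROVIDER:[ \t]+(\S+(?:[ \t]+\S+){0,2})")

def provider_prefix(text):
    """Return the first (up to) three words after 'PROVIDER:', or None if absent."""
    match = PATTERN.search(text)
    return match.group(1) if match else None

# The "line break" here is a space, as in the misread OCR text.
print(provider_prefix("PROVIDER: Name of Provider or Facility First Line of Address"))
```

The trade-off is that longer provider names get truncated and shorter ones pick up the start of the address, so this only works when the first few words are enough to identify the provider.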

Nice to have a product with such involved support: Thanks, Paul.