Matching rules across line breaks


Moderator: Mr_Noodle

Matching rules across line breaks Thu Jul 11, 2019 7:08 am • by zaphod
Hi there,

I have a Condition that should match the appearance of two words in a PDF/A file generated and OCRed by Finereader.

The Condition is as simple as: Contents > Contain Match > word1 [anything] word2

Apparently Hazel cannot match the Condition across line breaks: it only delivers a match as expected when everything is on the same line. And yes, there are several line breaks in the document.

From what I've read here, Hazel (in my case 4.3.5 build 1568) should in general be able to match conditions across line breaks, although this depends to some extent on how the OCRed text is structured within the PDF.

Is there any way to deal with this problem?

Thank you very much in advance and take care
Stefan
zaphod
 
Posts: 3
Joined: Thu Jul 11, 2019 5:57 am
Location: Germany

Re: Matching rules across line breaks Thu Jul 11, 2019 10:10 am • by Mr_Noodle
(anything) won't match over line breaks, as that would create a ton of problems. A single space ( ), though, will match any amount of whitespace, including line breaks. You may want to try using that.
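
For readers who think in regular expressions, the behaviour described above resembles the default regex rules. The sketch below is only an analogy in Python, not a description of how Hazel implements its tokens: `.` stands in for "anything", `\s+` stands in for the single space, and the sample text is made up.

```python
import re

# Made-up OCR text: word1 and word2 are separated by text AND line breaks.
text = "word1 milk 1.99\nbread 2.49\nword2"

# "." stops at a newline by default, much like "anything" is described above.
same_line_only = re.search(r"word1.*word2", text)

# "\s+" matches any run of whitespace, newlines included, like the single space.
whitespace_only = re.search(r"word1\s+word2", "word1 \n\n  word2")

# With re.DOTALL, "." also matches newlines, so the pattern spans lines.
across_lines = re.search(r"word1.*word2", text, flags=re.DOTALL)

print(same_line_only is None)       # no match when "." cannot cross lines
print(whitespace_only is not None)  # whitespace-only gap matches
print(across_lines is not None)     # DOTALL lets the gap span lines
```

The first search fails because the gap contains line breaks, which is the situation from the original question.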
Mr_Noodle
Site Admin
 
Posts: 8110
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: Matching rules across line breaks Thu Jul 11, 2019 1:51 pm • by zaphod
Unfortunately a single space ( ) won't help, since there is a bunch of text (together with the line breaks) between word1 and word2. BTW, the problem appears especially with supermarket receipts, as Finereader seems to interpret each item followed by whitespace and a number as a new line, which really isn't such a bad idea considering that such a line represents something like a logical entity...

Re: Matching rules across line breaks Fri Jul 12, 2019 10:35 am • by Mr_Noodle
Can you use a combination of spaces and "anything" tokens? Use a space for each linebreak and "anything" for the text.

Re: Matching rules across line breaks Sat Jul 13, 2019 5:12 am • by zaphod
Yes, but this only works for documents that have exactly the same number of line breaks as specified in the condition. The supermarket receipts differ a lot: the more items a receipt has, the more line breaks it has. Anyway, it looks as if I will have to work around this with a number of different rules. Not elegant, but it will do.
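
For comparison, outside of Hazel's own tokens the "two words in order, with any number of line breaks in between" check can be written in a few lines. This is a minimal sketch in Python, assuming the OCR text has already been extracted from the PDF by some other tool. The function name `words_in_order` and the sample receipts are hypothetical, and the thread does not cover whether or how such a script would be hooked into a rule.

```python
import re


def words_in_order(text: str, first: str, second: str) -> bool:
    """Return True if `first` appears before `second`, across any line breaks."""
    # re.escape treats both words literally; re.DOTALL lets ".*?" cross newlines.
    pattern = re.escape(first) + r".*?" + re.escape(second)
    return re.search(pattern, text, flags=re.DOTALL) is not None


# Made-up receipts with a different number of items (and line breaks).
short_receipt = "SuperMart\nMilk 1.99\nTotal 1.99"
long_receipt = "SuperMart\nMilk 1.99\nBread 2.49\nApples 3.10\nTotal 7.58"

print(words_in_order(short_receipt, "SuperMart", "Total"))
print(words_in_order(long_receipt, "SuperMart", "Total"))
print(words_in_order(long_receipt, "Total", "SuperMart"))
```

Because the gap pattern is not tied to a fixed number of lines, the same check covers receipts of any length, which is the part that a fixed sequence of space and "anything" tokens cannot express.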

