
Matching rules across line breaks

Posted: Thu Jul 11, 2019 7:08 am
by zaphod
Hi there,

I have a Condition that should match the appearance of two words in a PDF/A file generated and OCRed by Finereader.

The Condition is as simple as: Contents > Contain Match > word1 [anything] word2

Apparently Hazel is not able to match the Condition across line breaks: it only delivers a match as expected when everything is on the same line. And yes, there are several line breaks in the document.

From what I've read here, Hazel (in my case 4.3.5 build 1568) should in general be able to match conditions across line breaks, although this depends somewhat on how the OCRed text is structured within the PDF.

Is there any way to deal with this problem?

Thank you very much in advance and take care
Stefan

Re: Matching rules across line breaks

Posted: Thu Jul 11, 2019 10:10 am
by Mr_Noodle
(anything) won't match over line breaks, as that would create a ton of problems. A single space ( ), though, will match any amount of whitespace, including line breaks. You may want to try using that.
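
As a rough analogy only (Hazel's tokens are not regular expressions, and this is not a description of how Hazel is implemented), the behavior described above resembles the regex distinction between `.`, which by default stops at a newline, and `\s+`, which spans any whitespace including newlines. A minimal Python sketch with a made-up sample text:

```python
import re

# Hypothetical OCR output: the two words end up on different lines.
text = "word1\nword2"

# '.' does not match a newline by default, so this fails across the break.
same_line_only = re.search(r"word1.*word2", text)

# '\s+' matches any run of whitespace, including the line break.
across_whitespace = re.search(r"word1\s+word2", text)

print(same_line_only is not None)     # False
print(across_whitespace is not None)  # True
```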

Re: Matching rules across line breaks

Posted: Thu Jul 11, 2019 1:51 pm
by zaphod
Unfortunately a single space ( ) won't help, since there is a bunch of text (together with the line breaks) between word1 and word2. BTW the problem appears especially with supermarket receipts, as Finereader seems to interpret each item followed by whitespace and a number as a new line - which really isn't such a bad idea, considering that such a line represents something like a logical entity...

Re: Matching rules across line breaks

Posted: Fri Jul 12, 2019 10:35 am
by Mr_Noodle
Can you use a combination of spaces and "anything" tokens? Use a space for each linebreak and "anything" for the text.

Re: Matching rules across line breaks

Posted: Sat Jul 13, 2019 5:12 am
by zaphod
Yes, but this only works for documents that have exactly the same number of line breaks as specified in the condition. The supermarket receipts differ a lot: the more items a receipt has, the more line breaks it has. Anyway, it looks as if I will have to work around this with a number of different rules - not elegant, but it will do.
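
One possible alternative to a set of separate rules, offered as an assumption rather than something confirmed in this thread: do the check in a script on already-extracted text, where a regex with the DOTALL flag lets the wildcard span any number of line breaks. The function name `contains_in_order` and the sample receipts are hypothetical; extracting the text from the PDF and wiring the script into a Hazel rule are left out.

```python
import re

def contains_in_order(text: str, first: str, second: str) -> bool:
    """Return True if `first` appears before `second`, with any text
    (including any number of line breaks) in between."""
    pattern = re.escape(first) + r".*?" + re.escape(second)
    # re.DOTALL makes '.' match newlines as well.
    return re.search(pattern, text, flags=re.DOTALL) is not None

# Made-up receipts with different numbers of item lines.
short_receipt = "word1\nMilk 1.29\nword2"
long_receipt = "word1\nMilk 1.29\nBread 2.49\nEggs 3.10\nword2"
reversed_order = "word2\nMilk 1.29\nword1"

print(contains_in_order(short_receipt, "word1", "word2"))
print(contains_in_order(long_receipt, "word1", "word2"))
print(contains_in_order(reversed_order, "word1", "word2"))
```

Because the pattern does not count line breaks, the same check covers receipts with few or many items, which is the case the fixed space-plus-"anything" combination cannot handle.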