Content matching inconsistent

Moderator: Mr_Noodle

Content matching inconsistent Mon Jan 01, 2024 9:18 am • by vco1
As of Hazel 5.3.1, content matching on PDF files is inconsistent.

macOS Sonoma 14.2.1

Even if I copy the string from the contents preview in Hazel, the string is not found. This seems to occur when the string contains spaces. The PDF is generated (a bank statement), not scanned and OCR'ed. When I copy the string from the PDF and paste it into an editor, there is just one space.

The problem is that when I recreated the rule, it suddenly worked for one rule, but the same trick didn't work for another rule. No matter what I tried, on different strings, the moment I entered a space in the pattern it resulted in a non-match.
vco1
 
Posts: 9
Joined: Mon Feb 15, 2021 3:04 pm

Re: Content matching inconsistent Tue Jan 02, 2024 10:49 am • by Mr_Noodle
Can you post specifics of your case? The specific text, the rule/pattern you set up, etc?
Mr_Noodle
Site Admin
 
Posts: 11255
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: Content matching inconsistent Tue Jan 16, 2024 6:48 am • by someone
I have a very similar issue with bank statements.
I'm on Hazel 5.3.1 (2371) and macOS 14.1.2.

I have a "Contents" "contain match" condition with an embedded table (bankAccount). I'm using the bank account number to find a match: "(bankAccount.findBankAccountNo)". It will not match the column "findBankAccountNo", even if I copy the value from the text preview within Hazel.

But when I set up a new condition using "Contents" "contain match" with the text string itself, it works. The text string is an IBAN and looks like this: "DE00 0000 0000 0000 0000 00".
someone
 
Posts: 5
Joined: Tue Jan 16, 2024 6:41 am

Re: Content matching inconsistent Tue Jan 16, 2024 9:55 am • by Mr_Noodle
Can you email in to support with a sample document and rule (with table) demonstrating this problem?

Re: Content matching inconsistent Tue Mar 05, 2024 6:32 am • by someone
It's a bit difficult to share a sample, because these are bank statements.

But when I remove pages from the PDF and remove rows from my embedded table, the match works.

Re: Content matching inconsistent Tue Mar 05, 2024 7:49 am • by someone
Ok, I think I know what this is. It's a UX problem I guess.

I have similar bank documents. It seems to almost match another row in my table. These rows are fairly similar:

- findBankName
- findAccountNo
- findBankStatement (String with "Bank Statement", "Message", "Costs")
...

Sometimes just one cell differs between two rows. Maybe the matcher just looks at the closest match?
So the red indicator for a failed match is not helping.

Maybe it would be a good idea to be able to select a row when debugging against it.

Re: Content matching inconsistent Tue Mar 05, 2024 9:50 am • by Mr_Noodle
I'm not clear on the issue here. Hazel matches the text exactly. Can you post specific screens/details?

Re: Content matching inconsistent Wed Mar 06, 2024 10:50 am • by someone
Ok, here it is. I have a rule:

If [all] of the following conditions are met
* Extension is pdf
* [Contents] [contain match] [bankAccount.findBank]
* [Contents] [contain match] [bankAccount.findAccountNo]
* [Contents] [contain match] [bankAccount.findBankStatement]
* [Contents] [contain match] [bankAccount.matchDatePre][date]

And I have an embedded table bankAccount that looks like this:
findBank,findAccountNo,findBankStatement,matchDatePre,rowId(just-for-reference)
Bank A,000001,Kontoauszug, Seite ,1
Bank A,000001,Mitteilung, Seite ,2
Bank A,000002,Kontoauszug, Seite ,3
Bank A,000002,Mitteilung, Seite ,4
Bank B,000011,Kontoauszug, Seite ,5
Bank B,000011,Mitteilung, Seite ,6
Bank B,000011,Entgelte, Seite ,7
...

Now some files don't get processed, because Hazel tries to match against earlier rows that are very similar. For instance, it tries to match rowId 3 when the proper match for the document is rowId 4. I can see that it validates against the wrong row from the "Custom Attributes" section, which appears when I click on the rule match indicator (green or red). It shows the wrong row number.

When I move the row up, it matches properly. But then the other file will not be sorted.

That's also why I said it would be helpful to debug against a specific row of a table (UX problem). The red indicator does not help, because it's for the wrong row.

Re: Content matching inconsistent Thu Mar 07, 2024 10:26 am • by Mr_Noodle
I see. How it works now is once a custom attribute matches a row, subsequent matches must match that row. At the moment, you can't do multi-column matches as you are expecting now. You would probably need to combine the columns to give them unique values.
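
To make the behavior described here concrete, the following is a toy Python model, not Hazel's actual implementation. It assumes that the first condition picks the first table row whose findBank value appears in the text and that every later condition is then checked against that one row only. It contrasts this with the "any row where every column matches" behavior the rule above was written for. The function names and the sample text are made up for illustration, and which row Hazel really locks onto may differ (the earlier post reports rowId 3).

```python
# Toy model of "once a row matches, later conditions must match that row".
# This is an assumption-based illustration, not Hazel's real matching code.

# Columns: findBank, findAccountNo, findBankStatement, matchDatePre, rowId
TABLE = [
    ("Bank A", "000001", "Kontoauszug", " Seite ", 1),
    ("Bank A", "000001", "Mitteilung", " Seite ", 2),
    ("Bank A", "000002", "Kontoauszug", " Seite ", 3),
    ("Bank A", "000002", "Mitteilung", " Seite ", 4),
    ("Bank B", "000011", "Kontoauszug", " Seite ", 5),
    ("Bank B", "000011", "Mitteilung", " Seite ", 6),
    ("Bank B", "000011", "Entgelte", " Seite ", 7),
]

N_MATCH_COLS = 4  # the last column (rowId) is only for reference


def match_with_row_lock(content, table):
    """Lock onto the first row whose first column occurs in the content,
    then require the remaining columns of that same row to occur too.
    Returns (row_id, matched); row_id is None if no row could be locked."""
    for row in table:
        if row[0] in content:
            matched = all(row[i] in content for i in range(1, N_MATCH_COLS))
            return row[-1], matched
    return None, False


def match_any_row(content, table):
    """Return the rowId of the first row whose columns ALL occur in the
    content, or None. This is the multi-column match the rule expected."""
    for row in table:
        if all(row[i] in content for i in range(N_MATCH_COLS)):
            return row[-1]
    return None


# Hypothetical extracted text of a document that belongs to rowId 4.
doc = "Bank A Konto 000002 Mitteilung Seite 1 von 2"

print(match_with_row_lock(doc, TABLE))  # (1, False): locked on the wrong row
print(match_any_row(doc, TABLE))        # 4
```

Under this model, the document that belongs to rowId 4 fails because the lock lands on an earlier "Bank A" row. That is why a matched column whose value is unique per row avoids the problem.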

Re: Content matching inconsistent Fri Mar 08, 2024 12:07 pm • by someone
I see. Thank you for clarifying!

