Feature Request: "Throw away if a duplicate" using contents

I've looked through the online references and forum and did not see a solution to this. My apologies if I missed it.

What I would like is for the "Throw away if a duplicate" option to match on extracted file contents instead of requiring exact file matches. I expect that there would then be "Throw away if an exact duplicate" and "Throw away if extracted text matches" options and/or a global option to change the "duplicate" behavior.

Use case for this feature

I download my bank statements from the online banking site and use Hazel to rename each file and then file it away. Sometimes I go a few months between downloads and then download 3-5 of them at once when I remember.

Sample rules:

Match contents based on keywords and account number to identify the rule to apply
Match and capture statement date to use for the renaming of the file
Rename the file to "Bank_Name-YYYY-MM-DD" based on the captured date
Move to my folder for these records, with the "Throw away if a duplicate" option selected

Some banks store the PDF online, so I get the exact same "blob" downloaded each time and the duplicate gets thrown away. For others, the PDF seems to be generated at the time of request: the contents are all the same, but the files differ in some way, so the checksum is different. (E.g., maybe it has some sort of embedded comment with the date of generation or similar.) For those, I end up with renamed duplicates even though I can see no difference when I open the files, and in Hazel's preview of matching contents all of the text shown is identical.
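
To illustrate the difference between the two kinds of comparison, here is a minimal Python sketch. It is not how Hazel detects duplicates internally (that is not documented in this thread); the byte strings, the "generated" prefix, and the function names are all made up for the example. It only shows why a checksum over the raw bytes can differ while a fingerprint over the whitespace-normalized text stays the same.

```python
import hashlib
import re


def byte_checksum(data: bytes) -> str:
    """Checksum over the raw file bytes: any embedded difference changes it."""
    return hashlib.sha256(data).hexdigest()


def text_fingerprint(text: str) -> str:
    """Checksum over the extracted text, with runs of whitespace collapsed."""
    normalized = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical stand-ins for two downloads of the same statement. The prefix
# mimics an embedded generation date; real PDFs are structured differently.
statement_text = "Example Bank\nStatement date: 2024-01-31\nBalance: 100.00"
download_a = b"generated 2024-02-01\n" + statement_text.encode("utf-8")
download_b = b"generated 2024-05-17\n" + statement_text.encode("utf-8")

# The raw bytes differ, so an exact-match check sees two different files.
print(byte_checksum(download_a) == byte_checksum(download_b))  # False

# The visible text is the same apart from trailing whitespace, so it matches.
print(text_fingerprint(statement_text) == text_fingerprint(statement_text + "\n"))  # True
```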

Request

I would like an option to base the file comparison on the extracted text contents of the PDF, so that duplicate detection is triggered by identical contents and not only by 100% identical files.

I'll think about it, but it does seem to be a niche case. For now, you can try using a nested condition to match a file against other files and see if they meet the criteria. Either that or a custom script.

Thanks for following up with me on this. Can you point me to an example of how I can use a nested condition or custom script to match against the destination file? I'm pretty comfortable with programming, so I'm sure I can figure out the logic needed, but am having trouble figuring out where to put this in my Hazel rules. Thanks.

You can set the target of a nested condition to match against another file in the same folder. To compare the text output, though, would probably require a script.
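
For anyone going the script route, here is a rough Python sketch of the text comparison. It rests on several assumptions that are not confirmed in this thread: that the `pdftotext` command-line tool (from Poppler) is installed, that the script is given the incoming file and the destination folder as its two arguments, and that whatever calls it treats exit status 0 as "a duplicate exists". The function names are hypothetical, and how you wire it into a Hazel rule (for example through a shell script condition that calls it) depends on your setup.

```python
import hashlib
import re
import subprocess
import sys
from pathlib import Path
from typing import Callable


def pdf_text(path: Path) -> str:
    """Extract text with the pdftotext CLI (assumed to be installed)."""
    result = subprocess.run(
        ["pdftotext", "-layout", str(path), "-"],  # "-" sends text to stdout
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace, so layout noise is ignored."""
    normalized = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_text_duplicate(
    candidate: Path,
    destination: Path,
    extract: Callable[[Path], str] = pdf_text,
) -> bool:
    """Return True if any PDF in destination has the same extracted text."""
    target = fingerprint(extract(candidate))
    for existing in sorted(destination.glob("*.pdf")):
        if existing.resolve() == candidate.resolve():
            continue  # do not compare the file with itself
        try:
            if fingerprint(extract(existing)) == target:
                return True
        except subprocess.CalledProcessError:
            continue  # skip files that pdftotext cannot read
    return False


# Runs only when called as: script.py <incoming.pdf> <destination folder>
if __name__ == "__main__" and len(sys.argv) == 3:
    found = has_text_duplicate(Path(sys.argv[1]), Path(sys.argv[2]))
    sys.exit(0 if found else 1)  # exit 0 = a text-identical file exists
```

A rule could then throw the file away when the script reports a match and move it otherwise. Comparing the full text is strict: if a bank prints a "generated on" line in the visible text, the fingerprints will differ, and you would need to strip that line before hashing.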