All Hazel rules stopped recognizing text in PDFs

Moderator: Mr_Noodle

A few weeks ago I noticed that certain Hazel rules stopped reading the text contents of the PDFs particular to each rule.

The conditions that stopped matching are "contents contain XXXX".
I repopulated these conditions by copying and pasting the applicable text from the latest PDF back into the rule conditions, but it still gives the "rules do not match" error.

Upon further investigation, I found that ALL of the rules, and there are dozens, stopped matching any text in the PDFs relevant to them.

These rules have not changed in the last 18 to 24 months. They also kept working after I upgraded to V5.0, up until recently.

Please help with a suggestion on where to start looking for the issue.

Thanks
Hardehout
Posts: 1
Joined: Sun Dec 15, 2019 3:05 am

"Contents contain" uses Spotlight, so if Spotlight is not indexing your files, they won't match in Hazel. You can either fix Spotlight on your system or use "Contents contain match". The latter does a direct scan of the file, bypassing Spotlight. It's more reliable but also more resource-intensive to run.
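One way to check whether Spotlight has indexed a file's text is to run a similar query yourself with macOS's `mdfind` tool. The Python sketch below is an illustration, not part of Hazel: the helper names are hypothetical, and the query string assumes Spotlight's metadata query syntax, where `kMDItemTextContent` is the indexed text and the `cd` suffix makes the comparison case- and diacritic-insensitive. Because this query looks only at indexed contents, a hit cannot come from the file name alone.

```python
import subprocess
from pathlib import Path
from typing import List


def mdfind_command(folder: str, term: str) -> List[str]:
    """Build an mdfind call that searches only indexed text content under `folder`.

    Assumes `term` contains no double quotes, since it is embedded in a quoted query.
    """
    query = f'kMDItemTextContent == "*{term}*"cd'
    return ["mdfind", "-onlyin", str(Path(folder).expanduser()), query]


def spotlight_hits(folder: str, term: str) -> List[str]:
    """Run the query (macOS only) and return the matching file paths."""
    result = subprocess.run(
        mdfind_command(folder, term), capture_output=True, text=True, check=True
    )
    return [line for line in result.stdout.splitlines() if line.strip()]
```

For example, `spotlight_hits("~/Downloads", "AutoPay")` should list a bill PDF if Spotlight has indexed its contents. An empty list for text you can plainly see in the PDF suggests the problem is on the Spotlight side rather than in the Hazel rule.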
Mr_Noodle
Site Admin
Posts: 11193
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

I am also noticing this on Monterey 12.6 with Hazel 5.1.4.

I set up "Contents contain match" rules on a single folder of PDFs and XLS files. The files are balancing reports, and I am pulling the PO# and Part# out of each file in order to sort them into folders. The rules evaluated a handful of times and then stopped.

I originally had the files in OneDrive, so I thought I had a syncing issue. I moved them to a local folder and applied the same rule. Checking the status of the current rules and previewing the rule both indicate a match, but nothing has happened in a day or two.

Even when I stop Hazel, clear all the logs and manually run the rules on that one folder, nothing appears in the log.

I am now seeing this error:

2022-10-19 22:33:55.247 hazelworker[29865] File type not supported: {(
"public.item",
"com.microsoft.excel.xls",
"public.data",
"public.composite-content",
"public.spreadsheet",
"public.content"
jfisher
Posts: 55
Joined: Sat Feb 25, 2017 7:47 pm

Hazel cannot read Excel files as they are not text files.
Mr_Noodle

How does the "Contents contain match" feature work? Does a direct scan behave differently from extracting text from an image with Shortcuts? Hazel can process some of the files, likely the true PDFs, but some of the documents I receive from the vendor are PDFs created by scanning a physical document. Apple Shortcuts can usually extract the text from the image using regex criteria I set up, but I can't seem to pass the data back to Hazel to do anything with it.
jfisher

It reads the files directly. Since every file type has its own format, Hazel can only read certain text-based files. Extracting text from an image is a totally different mechanism (OCR). Hazel can read some PDFs that were originally scans because some other program along the way performed OCR on them.

That said, I am considering adding text extraction to Hazel in the future. It would probably be seamless, in that "Contents contain match" would just work on non-OCRed files.
Mr_Noodle

Hardehout wrote: I noticed a few weeks ago certain Hazel rules stopped reading the text contents for the pdf's particular to that rule.

I have just noticed the same issue. Dozens of rules that look for text content in PDFs, and that had worked for years, no longer work. Whether they use "Contents contain" or "Contents contain match", the rules as I originally wrote them are not working now.

Looking at some of the rules I wrote long ago (which worked until recently), I think I used "Contents contain match" when looking for a pattern (such as the date of a financial statement or utility bill) that I then wanted to use in renaming the PDF, and "Contents contain" when just looking for text that would identify the statement as a certain kind (such as the name of a utility company).

So, for example, my rules would say "Contents contain": XYZCompany and then "Contents contain match" on a date pattern (such as June 30, 2010). The action would then rename the file as "2010-06-30 XYZCompany monthly statement". I needed the date pattern to be found so I could reformat the date the way I wanted in the file name. In the rename action, I would insert the date pattern (reformatted from how it appears in the PDF) and then just type in the rest.
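Hazel does this through its pattern and rename UI rather than code, but the logic of the rule described above can be sketched in Python to make the two steps explicit: a plain substring test plays the role of "Contents contain", and a regular expression plus date parsing plays the role of "Contents contain match" with the date reformatted for the file name. This is not how Hazel implements it; the function and pattern names are hypothetical, and the sketch assumes English month names written like "June 30, 2010".

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical pattern for dates written like "June 30, 2010" (English month names).
DATE_PATTERN = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August|September|"
    r"October|November|December) \d{1,2}, \d{4}\b"
)


def statement_name(text: str, company: str) -> Optional[str]:
    """Return a name like '2010-06-30 XYZCompany monthly statement', or None."""
    # Step 1, like "Contents contain": the company name must appear in the text.
    if company not in text:
        return None
    # Step 2, like "Contents contain match": find the first date in the text.
    match = DATE_PATTERN.search(text)
    if match is None:
        return None
    # Reorder the date as YYYY-MM-DD so the file names sort chronologically.
    date = datetime.strptime(match.group(0), "%B %d, %Y")
    return f"{date:%Y-%m-%d} {company} monthly statement"
```

For example, `statement_name("XYZCompany statement for the period ending June 30, 2010", "XYZCompany")` returns "2010-06-30 XYZCompany monthly statement", while text lacking either the company name or a date returns `None`, just as the Hazel rule would not match.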
corylusman
Posts: 2
Joined: Sun Sep 30, 2012 2:57 pm

If "Contents contain" does not match, then it's an issue with Spotlight on your system. If it is "Contents contain match" that fails, open the file in Preview and see if you can find the text there.
Mr_Noodle

I have the same problem, and I think it has to do with changes Dropbox made to network drives. All my rules are in Dropbox folders, and the ones that use the pattern "Content - contains - XXX" have stopped working. If I search with Spotlight, I get no results. However, if I move the PDF out of Dropbox, Spotlight does find the text inside it.
josmen
Posts: 1
Joined: Wed Jan 04, 2023 1:31 am

You should contact Dropbox about that as it's a larger issue between them and Spotlight.
Mr_Noodle

I'm having the same issue with "Content contains". All of my rules using it have stopped working. My files are stored locally (on my computer).
When I search using Spotlight, it finds the file easily. However, "Content contains" does not work.
Are there any fixes for this issue?
winzelerm
Posts: 5
Joined: Tue Jan 24, 2023 6:36 pm

Are the terms you are using to search for it in Spotlight also in the name of the file?
Mr_Noodle

Mr_Noodle wrote: Are the terms you are using to search for it in Spotlight also in the name of the file?

No, they are not in the name.
None of my rules that use "contains" are working.
More info: I'm running Hazel 5.2 on a new M1 iMac.

I saw an earlier comment from you that said to "fix" Spotlight. I'm not sure what you meant or how to do that.
winzelerm

Can you post screenshots of this with a specific file? Use imgur.com if you do not have your own file/image hosting service.
Mr_Noodle

I have the same problem: none of my rules that use "contain" are working. They were working fine before I moved to a new M2 Pro Mac mini yesterday. I'm using macOS 13.2.1 and Hazel 5.2.

I have reindexed Spotlight for the Downloads folder (where my files end up) as described in this link (https://support.apple.com/en-us/HT201716), and that made no difference. There was still no match. But searching with Spotlight located the proper PDF file (whose name is numbers only and does not contain the desired search text) easily, as you can see from this screenshot: https://i.imgur.com/hcIr02B

As an example, I have three "contain" conditions searching for "AT&T," "Update" and "AutoPay," and none of them match. Hopefully this screenshot will come through clearly to illustrate: https://i.imgur.com/ntuRWGR

On the other hand, using a "contain match" condition appears to work fine, but changing my numerous "contain" conditions to "contain match" would be a long and tedious process.

If the screenshots don't come through, let me know what else I can do besides an IMGUR link.

Thanks!

James
brudderman
Posts: 8
Joined: Tue Jan 19, 2016 5:02 pm
