PDF Text Issues Since macOS Sequoia

Posted: Mon Oct 07, 2024 4:00 pm
by briantoth
Hi,

I don't think this is a Hazel problem, since it also affects Preview, but since upgrading to Sequoia (15.0) a lot of my rules are broken and require work-arounds. I figured out what's going on. Say I have a rule that matches an account number, "1234567890". Preview is interpreting the text as "1 2 3 4 5 6 7 8 9 0", with a space between each character. It seems more common with numbers than letters. If I open the same document in Adobe Acrobat, it's fine. I'm assuming that internally Hazel is using Apple's PDF frameworks for parsing? If I add a rule that matches the version with the spaces, the rules work again.
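
For anyone who wants to test the same work-around outside Hazel, here is a rough Python sketch of the idea. It is not Hazel's rule syntax, and the function names are made up for illustration. It shows two options: build a pattern that tolerates whitespace between every character, or collapse runs of space-separated single digits before matching.

```python
import re

def spaced_pattern(literal: str) -> re.Pattern:
    """Hypothetical helper: match `literal` with optional whitespace
    between each of its characters ("123" also matches "1 2 3")."""
    return re.compile(r"\s*".join(re.escape(ch) for ch in literal))

def collapse_spaced_digits(text: str) -> str:
    """Hypothetical helper: join runs of single digits separated by one
    space ("1 2 3" -> "123"). Multi-digit groups such as "12 34" are
    left alone so separate numbers are not merged."""
    return re.sub(
        r"\b\d(?: \d)+\b",
        lambda m: m.group().replace(" ", ""),
        text,
    )

if __name__ == "__main__":
    extracted = "Account: 1 2 3 4 5 6 7 8 9 0"
    print(bool(spaced_pattern("1234567890").search(extracted)))
    print(collapse_spaced_digits(extracted))
```

The first approach leaves the extracted text untouched, which is closer to adding a second rule. The second is simpler to match against afterwards, but it would also join genuinely separate single digits, so it only makes sense where that can't happen.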

I'm running Hazel 4 at the moment. I was curious if anyone else is experiencing similar problems and, if so, whether it's been fixed in later macOS betas or whether later versions of Hazel are also affected.

Thanks! (Love Hazel!)

Re: PDF Text Issues Since macOS Sequoia

Posted: Tue Oct 08, 2024 10:19 am
by Mr_Noodle
Yes, Hazel is using Apple's PDF frameworks. PDF is a bit of a freeform format so unfortunately, you may get things like this. Do you have access to a pre-Sequoia machine? I'm curious to see if the same file is interpreted differently there.

Re: PDF Text Issues Since macOS Sequoia

Posted: Mon Jan 13, 2025 4:34 pm
by briantoth
I just wanted to follow up…

After trying many combinations of scan quality settings and different OCR tools (built-in Fujitsu, OCRmyPDF, and others), I found that the text recognition was inconsistent at best. Ironically, increasing scan quality did not always lead to better text recognition. It did seem to work better before, but I don't know if it was a Fujitsu update or something else that changed.

However, what did work was purchasing Hazel 6 and forcing the use of its built-in text recognition for all of my rules. :) It has been 100% consistent on every document I throw at it. Even forms with weird fonts, and even when things are upside down or skewed. Best software purchase I've made in a long time! I was able to use the Preview function and look at the text Hazel parsed to figure out the best way to process certain documents… things like dates were not always in the order I expected, but the text recognition always gave the same results, so I was able to greatly simplify my rules. Absolutely amazing. I even had it go back and process some old files that I had given up on because I ran out of new files to scan.

(Now I just wish I could save the text recognition results as the text layer in the PDFs so that other PDF readers and search tools could benefit as well. Maybe for Hazel 7. :D )

Re: PDF Text Issues Since macOS Sequoia

Posted: Tue Jan 14, 2025 9:57 am
by Mr_Noodle
Yeah, saving the results is on my radar but there are some issues with it that I need to research.