
Hazel 6 OCR on-the-fly contents

Posted: Wed Oct 23, 2024 11:06 am
by martinewski
Awesome new features on Hazel 6!

I wonder, however, how I can view what the engine recognizes as text on a non-OCRed PDF file, when creating a condition? I tried the preview and the contents text is empty. It's important to see it so the rules can be correctly created.

https://share.cleanshot.com/CMVg7ZFdZ9QPG3KMqfQG

Re: Hazel 6 OCR on-the-fly contents

Posted: Wed Oct 23, 2024 3:24 pm
by Mr_Noodle
It should work. Are you able to get it to work at all (preview or not)?

Re: Hazel 6 OCR on-the-fly contents

Posted: Wed Oct 23, 2024 7:09 pm
by noodlehard
I have a bank statement with embedded text, but the account number is redacted. I'm struggling to get Hazel 6 to recognize the account number using on-the-fly text recognition. Is this something H6 OTF recognition should find?

Details
- "Use Text Recognition" is checked for this rule
- Hazel Preview finds embedded text fields (but not the redacted account number)
- Tried matching with "Contents > contain" and "Contents > contain match". Neither worked.

Any guidance?

Re: Hazel 6 OCR on-the-fly contents

Posted: Thu Oct 24, 2024 9:46 am
by Mr_Noodle
If it's redacted, it's unreadable. Hazel shouldn't be able to do text recognition on it in that case.

Re: Hazel 6 OCR on-the-fly contents

Posted: Thu Oct 24, 2024 10:13 am
by noodlehard
I didn't use 'redacted' correctly.

The account number is visible when I open the .pdf, but is missing when I copy and paste the text from the .pdf. My hope was that Hazel 6 would pick up the visible account number even though it's not in the embedded text. Does that make sense?

Re: Hazel 6 OCR on-the-fly contents

Posted: Thu Oct 24, 2024 10:18 am
by Mr_Noodle
Is the document already OCRed? If so, Hazel's text recognition won't kick in.

Re: Hazel 6 OCR on-the-fly contents

Posted: Thu Oct 24, 2024 10:27 am
by noodlehard
Yes. My bank embedded text in the .pdf. But they left out the account number. Is there a way to tell Hazel 6 to use its own OCR instead?

Re: Hazel 6 OCR on-the-fly contents

Posted: Thu Oct 24, 2024 11:02 am
by martinewski
Mr_Noodle wrote: It should work. Are you able to get it to work at all (preview or not)?

No, it's not detecting and matching any content on files missing the OCR layer.

Re: Hazel 6 OCR on-the-fly contents

Posted: Thu Oct 24, 2024 11:07 am
by Mr_Noodle
Can you email the file in to support?

Re: Hazel 6 OCR on-the-fly contents

Posted: Thu Oct 24, 2024 12:06 pm
by noodlehard
If it weren't a bank statement, I would be glad to send it. I'm concerned about sending this particular content.

I can probably look for other tags and ignore the bank account number. But if other options come to mind, I'm eager to try them.

Re: Hazel 6 OCR on-the-fly contents

Posted: Thu Oct 24, 2024 2:13 pm
by Mr_Noodle
Actually, @martinewski sent in a file and then I discovered an embarrassing mistake. Some test code crept in that disabled the text recognition. I'll be putting out a patch (probably tomorrow) which will fix things.

Re: Hazel 6 OCR on-the-fly contents

Posted: Thu Oct 24, 2024 4:24 pm
by noodlehard
Great. Mystery solved! I'll wait for the update.

Re: Hazel 6 OCR on-the-fly contents

Posted: Thu Oct 24, 2024 7:46 pm
by marcusnoodle
I have a related issue with my bank's poor OCR on my credit card statement.
I can no longer trigger Hazel rules I used to rely on because of the poor quality of the OCR text embedded in the PDF file. So many stray spaces break up the fields that I cannot match attributes such as account numbers or dates.

If there were an option to ignore the PDF's embedded OCR and rely on Hazel's own text recognition instead, that might well improve the situation.

Is that possible?

Re: Hazel 6 OCR on-the-fly contents

Posted: Fri Oct 25, 2024 9:19 am
by Mr_Noodle
6.0.1 is out now, which should fix things.

@marcusnoodle: it's something I've been considering. Just waiting for reports of people actually needing this. Thanks.

Re: Hazel 6 OCR on-the-fly contents

Posted: Fri Oct 25, 2024 12:01 pm
by sascha
Unfortunately, I don't understand how this is supposed to work.
For years now, I have used Hazel to move every document from an "input" folder to the Devonthink inbox, while saving a copy to a rescue folder.
Now I was so happy that with Hazel 6 the documents would automatically be OCRed so that they would be searchable in Devonthink.
How can this work if the recognized text isn't saved in the document? What is the purpose of OCR without saving?
Thank you