Hazel 6 OCR on-the-fly contents


Moderator: Mr_Noodle

Hazel 6 OCR on-the-fly contents Wed Oct 23, 2024 11:06 am • by martinewski
Awesome new features on Hazel 6!

I wonder, however, how I can view what the engine recognizes as text on a non-OCRed PDF file, when creating a condition? I tried the preview and the contents text is empty. It's important to see it so the rules can be correctly created.

https://share.cleanshot.com/CMVg7ZFdZ9QPG3KMqfQG
martinewski
 
Posts: 30
Joined: Sat Jun 29, 2013 6:41 pm

Re: Hazel 6 OCR on-the-fly contents Wed Oct 23, 2024 3:24 pm • by Mr_Noodle
It should work. Are you able to get it to work at all (preview or not)?
Mr_Noodle
Site Admin
 
Posts: 11865
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: Hazel 6 OCR on-the-fly contents Wed Oct 23, 2024 7:09 pm • by noodlehard
I have a bank statement with embedded text, but the account number is redacted. I'm struggling to get Hazel 6 to recognize the account number using on-the-fly text recognition. Is this something H6 OTF recognition should find?

Details
- "Use Text Recognition" is checked for this rule
- Hazel Preview finds embedded text fields (but not the redacted account number)
- Tried match using "Contents > contain" and "Contents > contain match". Neither worked.

Any guidance?
noodlehard
 
Posts: 8
Joined: Sat Dec 12, 2020 1:30 pm

Re: Hazel 6 OCR on-the-fly contents Thu Oct 24, 2024 9:46 am • by Mr_Noodle
If it's redacted, it's unreadable. Hazel shouldn't be able to do text recognition on it in that case.

Re: Hazel 6 OCR on-the-fly contents Thu Oct 24, 2024 10:13 am • by noodlehard
I didn't use 'redacted' correctly.

The account number is visible when I open the .pdf, but is missing when I copy and paste the text from the .pdf. My hope was that Hazel 6 would pick up the visible account number even though it's not in the embedded text. Does that make sense?

Re: Hazel 6 OCR on-the-fly contents Thu Oct 24, 2024 10:18 am • by Mr_Noodle
Is the document already OCRed? If so, Hazel's text recognition won't kick in.

Re: Hazel 6 OCR on-the-fly contents Thu Oct 24, 2024 10:27 am • by noodlehard
Yes. My bank embedded text in the .pdf. But they left out the account number. Is there a way to tell Hazel 6 to use its own OCR instead?

Re: Hazel 6 OCR on-the-fly contents Thu Oct 24, 2024 11:02 am • by martinewski
Mr_Noodle wrote: It should work. Are you able to get it to work at all (preview or not)?

No, it's not detecting or matching any content in files that lack an OCR layer.

Re: Hazel 6 OCR on-the-fly contents Thu Oct 24, 2024 11:07 am • by Mr_Noodle
Can you email the file in to support?

Re: Hazel 6 OCR on-the-fly contents Thu Oct 24, 2024 12:06 pm • by noodlehard
If it weren't a bank statement, I would be glad to send it. I'm concerned about sending this particular content.

I can probably look for other tags and ignore the bank account number. But if other options come to mind, I'm eager to try them.

Re: Hazel 6 OCR on-the-fly contents Thu Oct 24, 2024 2:13 pm • by Mr_Noodle
Actually, @martinewski sent in a file and then I discovered an embarrassing mistake. Some test code crept in that disabled the text recognition. I'll be putting out a patch (probably tomorrow) which will fix things.

Re: Hazel 6 OCR on-the-fly contents Thu Oct 24, 2024 4:24 pm • by noodlehard
Great. Mystery solved! I'll wait for the update.

Re: Hazel 6 OCR on-the-fly contents Thu Oct 24, 2024 7:46 pm • by marcusnoodle
I have a related issue with my bank's poor OCR on my credit card statement.
I can no longer trigger Hazel rules I used to rely on because of the poor quality of the existing OCR embedded in the PDF file. There are so many intermittent spaces breaking up the fields that I cannot match any attributes like account numbers or dates.

If there was an option to ignore the PDF embedded OCR and instead rely on Hazel's version, that may well improve the situation.

Is that possible?
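
In the meantime, one possible workaround for embedded text that is broken up by stray spaces is a pattern that tolerates whitespace between every character of the value being matched. The following is only a minimal Python sketch of that idea, not Hazel's own matching syntax; the function names and the sample account number are hypothetical, and the approach would need adapting to whatever pattern or script condition a rule actually uses.

```python
import re


def spaced_pattern(token: str) -> re.Pattern:
    """Build a regex that matches `token` even when whitespace
    has been inserted between any of its characters."""
    # Escape each character so punctuation is matched literally,
    # then allow optional whitespace between consecutive characters.
    return re.compile(r"\s*".join(re.escape(ch) for ch in token))


def matches_spaced(text: str, token: str) -> bool:
    """Return True if `token` occurs in `text`, ignoring stray whitespace
    inside the occurrence."""
    return spaced_pattern(token).search(text) is not None


if __name__ == "__main__":
    # Hypothetical text as it might come out of a poor embedded OCR layer.
    extracted = "Account No: 44 1 7- 12 34   Statement date 01/10/2024"
    print(matches_spaced(extracted, "4417-1234"))
```

This only helps when the expected value is known in advance (for example, a fixed account number); it does not repair the embedded text itself.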
marcusnoodle
 
Posts: 1
Joined: Thu Oct 24, 2024 7:40 pm

Re: Hazel 6 OCR on-the-fly contents Fri Oct 25, 2024 9:19 am • by Mr_Noodle
6.0.1 is out now, which should fix things.

@marcusnoodle: it's something I've been considering. Just waiting for reports of people actually needing this. Thanks.

Re: Hazel 6 OCR on-the-fly contents Fri Oct 25, 2024 12:01 pm • by sascha
Unfortunately, I don't understand how this should work.
For years now, I have used Hazel to move any document from an "input" folder to the Devonthink inbox, while saving a copy to a rescue folder.
Now I was so happy that with Hazel 6, the documents would automatically be OCRed in order to be searchable in Devonthink.
How could this work if you don't save the text in the document? What is the purpose of OCR without saving?
Thank you
sascha
 
Posts: 1
Joined: Fri Oct 25, 2024 11:51 am
