Matching content in an HTML file

Posted: Sun Sep 15, 2019 9:37 am
by Dave61
I want to pull some information out of a saved HTML file, starting with the page's title.

The title is enclosed in <title> and </title> tags, so I tried a "Contents contain match" condition with the [anything] token between them, but it doesn't work.
I have tried this with the file saved as plain text, HTML, and PDF.

Am I right in thinking that Hazel ignores anything in < > on the grounds that it isn't content?

Re: Matching content in an HTML file

Posted: Mon Sep 16, 2019 10:06 am
by Mr_Noodle
That is correct. Markup is not considered the content so you can't really use that to match in this case. If you preview the file and click on the badge next to the "Contents" condition, you can view the contents as Hazel sees them.
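To illustrate why a pattern containing <title> can't match, here is a minimal Python sketch that approximates "content" by keeping only the text between tags. This is an assumption for illustration: it is not how Hazel actually extracts text (the thread doesn't say), and `visible_text` and `TextOnly` are hypothetical names.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects only the character data between tags, discarding the markup."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def visible_text(raw_html: str) -> str:
    parser = TextOnly()
    parser.feed(raw_html)
    parser.close()  # flush any buffered text
    # Join chunks with spaces, then collapse runs of whitespace.
    return " ".join(" ".join(parser.chunks).split())


raw = "<html><head><title>My Page</title></head><body><p>Hello</p></body></html>"
text = visible_text(raw)
print(text)               # My Page Hello
print("<title>" in raw)   # True: the raw file contains the tag
print("<title>" in text)  # False: the tag-stripped text does not
```

The title words survive, but the tags around them are gone, so anything anchored on the literal tags has nothing to match against.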

Re: Matching content in an HTML file

Posted: Mon Feb 22, 2021 7:10 pm
by BenW
Is there a way for Hazel to match inside HTML code?

I regularly download a page from which I want Hazel to extract the date shown on the page and some other bits of info.

If I copy the page and paste it into TextEdit, I can see what I need just fine. If I open the HTML file directly in TextEdit, I can find the text by searching through the dense HTML code. It's buried deep inside various nested tags and other markup.

If I preview the file in Hazel and click the (i) next to the match condition, only a small portion of the page shows up, and it doesn't include what I need.

How can I get at the info I need?
Thx much

Re: Matching content in an HTML file

Posted: Tue Feb 23, 2021 11:32 am
by Mr_Noodle
In the preview, click the "..." button next to Contents. That expands the field to show all the text.

Re: Matching content in an HTML file

Posted: Fri Feb 26, 2021 6:27 pm
by BenW
"Contains" finds the text, "Contains match" does not.

The "..." preview expansion does not show the text I'm looking for. Selecting all and copying in the browser while viewing the HTML page, then pasting into TextEdit, does show the text.

Some screen shots...
HTML page (file on computer archived from web)
https://drive.google.com/file/d/1DfRzwX ... iVMcU9rwyP

Rule showing which conditions match and which don't, plus the "..." expansion (pages 1 and 2)
https://drive.google.com/file/d/1FsLHJH ... omXt3mEOOR
https://drive.google.com/file/d/1IwR9l5 ... i-4PthJMzT

TextEdit view
https://drive.google.com/file/d/1JnQG08 ... EJcFfNbtlw

Thanks for any help matching some of these field names and text. It would be useful for this example and for a few other pages I archive.

P.S. How do I include images in my post? I couldn't get it to work, hence the links to Google Drive.

Re: Matching content in an HTML file

Posted: Mon Mar 01, 2021 11:28 am
by Mr_Noodle
For images on the forums, you have to link to a separate site.

Are you set, or do you still need help with this? As for why "Contents contain" matches, it relies on Spotlight, so it's possible Spotlight is searching for the terms separately or it's some other quirk.

Re: Matching content in an HTML file

Posted: Mon Mar 01, 2021 4:26 pm
by BenW
Mr_Noodle wrote:
For images on the forums, you have to link to a separate site.

Are you set or did you still require help with this? As for why the "Contents contain" is matching, it's dependent on Spotlight so it's possible Spotlight is either searching for the terms separately or it's some other quirk.


I didn't get it working, so I'm still doing it by hand. Can you recommend a way for a rule to look deeper into the HTML (displayed text or source) when matching? Or is this a limitation of Hazel?

Since other methods can see the text I need, it seems there ought to be a way to get at it with a rule. The text just isn't showing up in the "..." preview, so however Hazel reads the file apparently has some constraints.

Thx for any ideas/help.

Re: Matching content in an HTML file

Posted: Tue Mar 02, 2021 11:47 am
by Mr_Noodle
Hazel is looking at the text "content", which excludes tags. You will either need to use a script to read the file yourself, or change the file extension to something like .txt so Hazel reads the raw text.
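For the script route, here is a minimal Python sketch (the language is an assumption; the thread doesn't name one) that reads the file as raw text, tags included, and pulls out whatever sits between two literal strings. That is roughly what a pattern of literal text, an [anything] token, and more literal text would do if it could see the markup. `extract_between` and `page_title` are hypothetical names, and how you hook this into a rule (for example from a shell script that receives the file path) is left out.

```python
import re
from pathlib import Path
from typing import Optional


def extract_between(raw: str, start: str, end: str) -> Optional[str]:
    """Return the text between the first `start` and the next `end`, or None."""
    # Non-greedy (.*?) stops at the first `end`; DOTALL lets it span lines.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, raw, flags=re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else None


def page_title(path: Path) -> Optional[str]:
    # Read the raw markup regardless of the file's extension.
    raw = path.read_text(encoding="utf-8", errors="replace")
    return extract_between(raw, "<title>", "</title>")


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        print(page_title(Path(sys.argv[1])) or "")
```

The same `extract_between` call works for other fields buried in the markup, such as a date, as long as you can name literal text that reliably appears just before and after it.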

Re: Matching content in an HTML file

Posted: Tue Mar 02, 2021 1:38 pm
by BenW
Mr_Noodle wrote:
... change the file extension to something like txt for Hazel to read in the raw text.


That was the tip I needed. Change the extension to .txt, do my matching, then change it back to .html. Works like a charm now. Thx