Matching content in an HTML file

Moderator: Mr_Noodle

Matching content in an HTML file Sun Sep 15, 2019 9:37 am • by Dave61
I want to pull some information out of a saved HTML file, starting with the page's title.

Obviously this is surrounded by <title> and </title> tags, so I tried a "Contents contain match" condition using the [anything] token, but it doesn't match.
I have tried this with the file in plain text, HTML, and PDF.

Am I right in thinking that Hazel ignores anything in < > on the grounds that it isn't content?
Dave61
 
Posts: 113
Joined: Tue Jul 10, 2012 4:56 pm

Re: Matching content in an HTML file Mon Sep 16, 2019 10:06 am • by Mr_Noodle
That is correct. Markup is not considered part of the content, so you can't match against the tags in this case. If you preview the file and click on the badge next to the "Contents" condition, you can view the contents as Hazel sees them.
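
To illustrate the difference between markup and content, here is a minimal Python sketch that drops every tag and keeps only the text nodes. It is not Hazel's actual extraction (which may keep or drop different parts of the page); the TextOnly class, the visible_text helper, and the sample string are all hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collect only the text nodes of a document, discarding every tag."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for text between tags; skip whitespace-only runs.
        stripped = data.strip()
        if stripped:
            self.chunks.append(stripped)


def visible_text(html):
    """Return the text of an HTML string with all tags removed."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample page, for illustration only.
sample = (
    "<html><head><title>Monthly Report</title></head>"
    "<body><p>Hello</p></body></html>"
)
text = visible_text(sample)
print(text)
print("<title>" in text)
```

Once the tags are gone, a pattern anchored on <title> and </title> has nothing left to match against, which is consistent with the behaviour described above.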
Mr_Noodle
Site Admin
 
Posts: 11235
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: Matching content in an HTML file Mon Feb 22, 2021 7:10 pm • by BenW
Is there a way for Hazel to match inside HTML code?

I regularly download a page, and I want Hazel to extract the date shown on the page along with some other bits of info.

If I copy the page and paste it into TextEdit, I can see what I need just fine. If I open the HTML file directly in TextEdit, I can find the text by searching through the dense HTML source. It's buried pretty deep inside nested tags and other markup.

If I preview the file in Hazel and click the (i) next to the match function, only a small portion of the page shows up there, which doesn't include what I need.

How can I get at the info I need?
Thx much
BenW
 
Posts: 6
Joined: Tue Mar 10, 2020 7:52 pm

Re: Matching content in an HTML file Tue Feb 23, 2021 11:32 am • by Mr_Noodle
If using the preview, for Contents, click the ... button, which will expand the field to show all the text.

Re: Matching content in an HTML file Fri Feb 26, 2021 6:27 pm • by BenW
"Contains" finds the text, "Contains match" does not.

The "..." preview expansion does not show the text I'm looking for. Doing Select All and Copy in the browser while viewing the HTML page, then pasting into TextEdit, does show the text.

Some screen shots...
HTML page (file on computer archived from web)
https://drive.google.com/file/d/1DfRzwX ... iVMcU9rwyP

Rule showing what matches and what doesn't, plus the "..." expansion (pages 1 and 2)
https://drive.google.com/file/d/1FsLHJH ... omXt3mEOOR
https://drive.google.com/file/d/1IwR9l5 ... i-4PthJMzT

TextEdit view
https://drive.google.com/file/d/1JnQG08 ... EJcFfNbtlw

Thanks for any help getting some of the field names and text to match. It would be useful for this example and for a few other pages I archive.

p.s. How do I include images in my post? Couldn't get it to work hence the links to google drive.

Re: Matching content in an HTML file Mon Mar 01, 2021 11:28 am • by Mr_Noodle
For images on the forums, you have to link to a separate site.

Are you set or did you still require help with this? As for why the "Contents contain" is matching, it's dependent on Spotlight so it's possible Spotlight is either searching for the terms separately or it's some other quirk.

Re: Matching content in an HTML file Mon Mar 01, 2021 4:26 pm • by BenW
Mr_Noodle wrote:For images on the forums, you have to link to a separate site.

Are you set or did you still require help with this? As for why the "Contents contain" is matching, it's dependent on Spotlight so it's possible Spotlight is either searching for the terms separately or it's some other quirk.


Did not get it working. Still doing it by hand. Can you recommend a way for a rule to look deeper into HTML (displayed text or source) for rule matching? Or is this a limitation of Hazel?

Since those other methods can see the text I need, it seems there ought to be a way to get at it with a rule. The text just isn't showing up in the "..." preview, so however Hazel reads the file apparently has some constraints.

Thx for any ideas/help.

Re: Matching content in an HTML file Tue Mar 02, 2021 11:47 am • by Mr_Noodle
Hazel is looking at the text "content", which excludes tags. You will either need to use a script to read the file in yourself or change the file extension to something like txt for Hazel to read in the raw text.
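
For the script route, a rough Python sketch might look like the one below. It reads the raw file itself and pulls the title out of the source with a regular expression. It assumes the script is handed the file's path as its first argument and that a zero exit status counts as a match; check both against the Hazel documentation before relying on them. The names extract_title and main are hypothetical, and a regex is only a quick stand-in for a real HTML parser.

```python
import re
import sys
from pathlib import Path

# Matches <title>...</title> in raw HTML, ignoring case and line breaks.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(raw_html):
    """Return the page title with whitespace collapsed, or None if absent."""
    match = TITLE_RE.search(raw_html)
    if match is None:
        return None
    return " ".join(match.group(1).split())


def main(argv):
    """Print the title of the file named in argv[1].

    Returns 0 if a title was found, 1 if not, 2 if no path was given.
    Treating 0 as "matched" is an assumption about the calling rule.
    """
    if len(argv) < 2:
        return 2
    raw = Path(argv[1]).read_text(encoding="utf-8", errors="replace")
    title = extract_title(raw)
    if title is None:
        return 1
    print(title)
    return 0


# Only run as a command-line tool when a path was actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

The same pattern-on-raw-source idea extends to other fields such as a date, as long as the surrounding markup is stable enough to anchor on.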

Re: Matching content in an HTML file Tue Mar 02, 2021 1:38 pm • by BenW
Mr_Noodle wrote:... change the file extension to something like txt for Hazel to read in the raw text.


That was the tip I needed. Change extension to .txt, do my matching things, change back to .html. Works like a charm now. Thx

