Detect a PDF without extension

Moderator: Mr_Noodle

Detect a PDF without extension Mon Apr 23, 2018 3:54 am • by halloleo
I have one folder where I drop all my regular bills (Internet, mobile phone, etc.) and let Hazel file them to the appropriate directories on my system. These come either from emails or from my browser.

These files are essentially all PDFs. Each rule starts with a condition checking that the file kind is PDF and then has further conditions on the content or on one of the Spotlight metadata attributes (like Where From or Description). So far so good. One provider, though, supplies the PDF without an extension. A file name might look like this:

Doc20180422201701


Preview has no problem with this file, but Hazel unfortunately does. When I double-click the downloaded file, Preview shows it as what it is: a PDF. Hazel, on the other hand, says its kind is not PDF, and the content and metadata conditions that I would expect to match do not match either.

What to do to identify the document and then move it accordingly?

PS: If I add the .pdf extension by hand, so that the file name reads

Doc20180422201701.pdf


then Hazel recognises it and all conditions magically become true as expected.
halloleo
 
Posts: 59
Joined: Thu Apr 27, 2017 10:10 pm

Re: Detect a PDF without extension Mon Apr 23, 2018 12:20 pm • by Mr_Noodle
Select the file in Finder and do "Get Info". What is the "Kind" there? Also, is the extension set as hidden?
Mr_Noodle
Site Admin
 
Posts: 11872
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: Detect a PDF without extension Tue Apr 24, 2018 7:40 am • by halloleo
Mr_Noodle wrote: Select the file in Finder and do "Get Info". What is the "Kind" there? Also, is the extension set as hidden?


Get Info says "TextEdit.app Document". Extensions are not hidden. If I make a copy of the document and add ".pdf" to the name, Get Info says "PDF document".

But of course I do not want to manually add extensions to downloaded files...

Re: Detect a PDF without extension Tue Apr 24, 2018 12:18 pm • by Mr_Noodle
If Finder thinks it's a text doc then Hazel will as well. You'll need to figure out some other way to identify the file. Maybe go by the Source URL, if that is available, and add the extension.

Re: Detect a PDF without extension Tue Apr 24, 2018 9:31 pm • by halloleo
Mr_Noodle wrote: If Finder thinks it's a text doc then Hazel will as well. You'll need to figure out some other way to identify the file. Maybe go by the Source URL, if that is available, and add the extension.


Trouble is, the main way to identify the document is by file name and some content text, but Hazel seems to recognise the content only when the file has the correct extension.

So does this mean I somehow have to set up a rule which does the following?

1. Add the .pdf extension to all files matching the naming condition
2. Then check for some text in the content and, if the content is correct, move the file as usual
3. But if the content is not correct, remove the extension again

Is this the best way to go about it? How do I do this in a Hazel rule? Or do I need multiple rules?

Thanks for any pointers!

Re: Detect a PDF without extension Tue Apr 24, 2018 10:06 pm • by NaOH
I find using Hazel to sort bills one of the easier tasks for it simply because each bill contains distinct information that can be used as rule criteria. Examples include

    my account number,
    the remittance address, or
    service type (e.g., "High-speed Internet Service").

Whether each bill is identifiable as a PDF isn't important for Hazel to work. It's just an option for us users.

So I would not include Kind is PDF as one of the criteria. But you could then try using the Rename action set to Name [no change or whatever you'd like] and Extension [blank for first field and .pdf for the second field].
NaOH
 
Posts: 9
Joined: Thu Mar 22, 2018 4:47 pm

Re: Detect a PDF without extension Tue Apr 24, 2018 10:54 pm • by halloleo
NaOH wrote: I find using Hazel to sort bills one of the easier tasks for it simply because each bill contains distinct information that can be used as rule criteria.


The problem is Hazel does not recognise the content of the document until the document has the extension ".pdf". :-(

Re: Detect a PDF without extension Wed Apr 25, 2018 12:05 am • by NaOH
Testing over here, I see better what you're experiencing. I think I've succeeded using the following conditions and corresponding actions. For the conditions, I have ALL of these being required:

    Date Added is in the last hour (you could change that to much less I'd think), and
    Extension is blank.

Once those conditions are met, I used Rename, notably setting the Extension option to Set Default > .pdf to add that to the file name.

Certainly, other actions could be taken on your files, but I just tested those parts. Hopefully, this resolves the key issue you've been facing. Depending upon how many files you have running into this, you may need additional rules (or nested rules) for processing the files once they've been properly identified as PDFs.

Re: Detect a PDF without extension Wed Apr 25, 2018 11:55 am • by Mr_Noodle
Another option is to write a shell script which uses the "file" command, which tries to identify the type of file based on content, instead of its extension. Not sure if it will work in your case but worth a try.
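
A minimal sketch of such a check, under a few assumptions: the function name `is_pdf` is made up for illustration, `file` is called with `--mime-type` so that it prints only the type (for a PDF, `application/pdf`), and the `head` fallback exists only for systems where `file` is not installed.

```shell
#!/bin/sh
# Succeeds (exit status 0) when the file's content identifies it as a PDF,
# regardless of its name. is_pdf is a hypothetical helper name.
is_pdf() {
    if command -v file >/dev/null 2>&1; then
        # -b drops the file name from the output; --mime-type prints
        # only the MIME type, e.g. "application/pdf"
        [ "$(file -b --mime-type "$1")" = "application/pdf" ]
    else
        # Fallback when file is unavailable: PDFs start with "%PDF-"
        [ "$(head -c 5 "$1")" = "%PDF-" ]
    fi
}
```

Assuming Hazel hands the matched file's path to the script as `$1` and treats exit status 0 as a match, the last line of the script would be `is_pdf "$1"`, so that the function's status becomes the script's exit status.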

Re: Detect a PDF without extension Thu Apr 26, 2018 2:39 am • by halloleo
Mr_Noodle wrote: Another option is to write a shell script which uses the "file" command, which tries to identify the type of file based on content, instead of its extension.


Great idea. Thanks a lot. Works a treat: I have a rule "Virgin - Add extension" which checks for the file type via file -b -I and then adds the extension. This rule does its job perfectly.
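
The "add extension" step can be approximated by the sketch below. It is not the script used in this thread, only an illustration under stated assumptions: the function name `add_pdf_extension` is invented, "no extension" is taken to mean "no dot in the base name", and `file -b --mime-type` is used because it prints just the type, where `file -b -I` also appends the charset.

```shell
#!/bin/sh
# Appends ".pdf" when a file has no extension but its content is a PDF.
# add_pdf_extension is a hypothetical name; it prints the resulting path.
# Note: plain sh has no local variables, so path/name/mime are global.
add_pdf_extension() {
    path=$1
    name=$(basename "$path")
    case "$name" in
        # Base name contains a dot: treat it as already having an extension
        *.*) printf '%s\n' "$path"; return 0 ;;
    esac
    mime=$(file -b --mime-type "$path" 2>/dev/null)
    # Rename only real PDFs, and never overwrite an existing file
    if [ "$mime" = "application/pdf" ] && [ ! -e "$path.pdf" ]; then
        mv "$path" "$path.pdf" && path="$path.pdf"
    fi
    printf '%s\n' "$path"
}
```

Called as `add_pdf_extension Doc20180422201701`, this would rename the file to `Doc20180422201701.pdf` only if its content is a PDF; anything else is left untouched. Inside Hazel the rename itself can equally be done with the built-in Rename action, leaving only the content check to the script.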

The next rule, "Virgin - Move document", checks the PDF for some Virgin-specific content text (which can now be done with normal Hazel methods, because the document is now a PDF for Hazel's purposes) and moves the document as desired. However, this rule never fires, even though previewing the rule on the now-renamed document shows all green ticks: it matches.

Why does this rule never kick in? Is it something to do with the order of the rules?

Re: Detect a PDF without extension Thu Apr 26, 2018 11:25 am • by Mr_Noodle
Does the first rule check for no extension? If not, then the renamed file may be matching that one instead. Note that the first rule to match a file is the one that gets run. If you want Hazel to continue matching rules against a file after a successful match, then you'll need to use the "Continue" action.

Re: Detect a PDF without extension Thu Apr 26, 2018 8:19 pm • by halloleo
Thx Paul, now it works like a charm!

Great idea to check in the first rule for "blank extension" and, yes, a "Continue" action was the thing to add. Sorry, should have found this myself in the help...

Thanks again for your patient help.
