Detect a PDF without extension

I have one folder I drop all my regular bills in (Internet, mobile phone, etc) and let Hazel file them to the appropriate directories in my system. These come either from emails or from my browser.

These files are essentially all PDFs. Each rule starts with a condition checking that the file kind is PDF, and then has further conditions on the content or on one of the Spotlight metadata attributes (like Where From or Description). So far so good. One provider, though, supplies the PDF without an extension. A file name might look like this:

Doc20180422201701

This is no problem for Preview, but unfortunately it is for Hazel. When I double-click the downloaded file, Preview shows it as what it is: a PDF. Hazel, on the other hand, says its kind is not PDF, and the content and metadata conditions do not match as expected either.

What to do to identify the document and then move it accordingly?

PS: If I add the PDF extension by hand, so that the file name reads

Doc20180422201701.pdf

then Hazel recognises it and all conditions magically become true as expected.

Select the file in Finder and do "Get Info". What is the "Kind" there? Also, is the extension set as hidden?

Mr_Noodle wrote: Select the file in Finder and do "Get Info". What is the "Kind" there? Also, is the extension set as hidden?

Get Info says "TextEdit.app Document". Extensions are not hidden. If I make a copy of the document and add ".pdf" to the name, Get Info says "PDF document".

But of course I do not want to add extensions to downloaded files manually...

If Finder thinks it's a text doc then Hazel will as well. You'll need to figure out some other way to identify the file. Maybe go by the Source URL, if that is available, and add the extension.

Mr_Noodle wrote: If Finder thinks it's a text doc then Hazel will as well. You'll need to figure out some other way to identify the file. Maybe go by the Source URL, if that is available, and add the extension.

Trouble is, the main way to identify the document is by file name and some content text, but Hazel seems to recognise the content only when the file has the correct extension.

So does this mean I somehow have to set up a rule which does the following:

1. Add the .pdf extension to all files matching the naming condition
2. Then check for some text in the content and, if the content is correct, move the file as usual
3. But if the content is not correct, remove the extension again

Is this the best way to go about it? How do I do this in a Hazel rule? Or do I need multiple rules?

Thanks for any pointers!

I find using Hazel to sort bills one of the easier tasks for it simply because each bill contains distinct information that can be used as rule criteria. Examples include

Whether each bill is identifiable as a PDF isn't important for Hazel to work. It's just an option for us users.

So I would not include Kind is PDF as one of the criteria. But you could then try using the Rename action set to Name [no change or whatever you'd like] and Extension [blank for first field and .pdf for the second field].

NaOH wrote: I find using Hazel to sort bills one of the easier tasks for it simply because each bill contains distinct information that can be used as rule criteria.

The problem is Hazel does not recognise the content of the document until the document has the extension ".pdf". :-(

Testing over here I see better what you're experiencing. I think I've succeeded using the following conditions and corresponding actions. For the conditions, I have ALL of these being required

Once those conditions are met, I used Rename, notably setting the Extension option to Set Default > .pdf to add that to the file name.

Certainly, other actions could be taken on your files, but I just tested those parts. Hopefully, this resolves the key issue you've been facing. Depending upon how many files you have running into this, you may need additional rules (or nested rules) for processing the files once they've been properly identified as PDFs.

Another option is to write a shell script which uses the "file" command, which tries to identify the type of file based on content, instead of its extension. Not sure if it will work in your case but worth a try.

Mr_Noodle wrote: Another option is to write a shell script which uses the "file" command, which tries to identify the type of file based on content, instead of its extension.

Great idea. Thanks a lot. Works a treat: I have a rule "Virgin - Add extension" which checks for the file type via file -b -I and then adds the extension. This rule does its job perfectly.
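For anyone wanting to try the same thing, here is a minimal sketch of what such a check might look like. It is not the exact script from the rule above. It assumes the script runs in a "Passes shell script" style condition where the file path arrives as the first argument and an exit status of 0 counts as a match. The function name is_pdf is made up for illustration. It uses the long option --mime-type rather than -I, because that prints only the MIME type with no charset suffix to strip off.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) only if the file's content looks like a PDF.
# Assumption: the caller passes the file path as the first argument and
# treats exit status 0 as "condition matched".

# is_pdf is a hypothetical helper name, not something Hazel provides.
is_pdf() {
    # -b omits the file name from the output; --mime-type prints only the
    # MIME type (for example "application/pdf") without a charset suffix.
    mime=$(file -b --mime-type "$1") || return 1
    [ "$mime" = "application/pdf" ]
}

# Run the check only when a path was actually supplied; the script's exit
# status is then the result of is_pdf.
if [ -n "${1:-}" ]; then
    is_pdf "$1"
fi
```

Because the check looks at the file's content, it should give the same answer whether or not the name ends in .pdf, which is why the rule using it also needs a condition on the extension.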

The next rule, "Virgin - Move document", checks the PDF for some Virgin-specific content text (which can now be done with normal Hazel methods, because the document is now a PDF for Hazel's purposes) and moves the document as desired. However, this rule never fires, even though previewing the rule on the now-renamed document shows all green ticks: it matches.

Why does this rule never kick in? Is it something to do with the order of the rules?

Does the first rule check for no extension? If not, then the renamed file may be matching that one instead. Note that the first rule to match a file is the one that gets run. If you want Hazel to continue matching rules against a file after a successful match, then you'll need to use the "Continue" action.
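The simplest way to check for a missing extension is Hazel's own extension condition. If the check had to live inside a shell script instead, a rough sketch could look like the following. The function name has_no_extension is made up for illustration, and the script again assumes the file path arrives as the first argument with exit status 0 meaning "matched".

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) only if the file name has no extension.
# Assumption: the caller passes the file path as the first argument.

# has_no_extension is a hypothetical helper name.
has_no_extension() {
    # Strip any leading directories so dots in folder names are ignored.
    name=${1##*/}
    case "$name" in
        *.*) return 1 ;;  # a dot in the name: treat it as having an extension
        *)   return 0 ;;  # no dot at all: no extension
    esac
}

# Run the check only when a path was actually supplied.
if [ -n "${1:-}" ]; then
    has_no_extension "$1"
fi
```

With a check like this in the first rule, a file that has already been renamed to end in .pdf no longer matches that rule, so the next rule gets its turn.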

Thx Paul, now it works like a charm!

Great idea to check in the first rule for "blank extension" and, yes, a "Continue" action was the thing to add. Sorry, should have found this myself in the help...

Thanks again for your patient help.