More options for handling duplicates


Moderator: Mr_Noodle

More options for handling duplicates Sun Nov 28, 2021 7:34 pm • by Lachlan Williams
I would like to know if (or how) it's possible to do more custom things when a duplicate is detected at the time Hazel moves a file into a folder or sorts it into a subfolder. At the moment, I have a rule which renames PDFs to a "Payslip - YYYYMMDD" pattern according to attributes in the file, then moves them and sorts them into a subfolder based on the year. Ordinarily, I'd set the rule to discard duplicates which have the same name. However, there are occasions where two payslips have the same date attributes (and would therefore receive the same filename from the rule I have made), but the second is in fact a different, supplementary payslip which I also want to keep. This means I don't want to discard the "duplicate" (which seems to be determined just by name), but to rename it using the option in Hazel.

Are there more sophisticated options available for what to do with the file if it's considered a "duplicate" when moved or sorted into a subfolder? It would be nice if there were a way of checking other metadata about the two files to determine whether it's really a duplicate file (in which case, discard) or just happens to have the same name (in which case, rename to something I choose, such as "Payslip - YYYYMMDD (Supplementary)", rather than just appending a numeric increment like "-1"). I realise that I can probably handle a custom rename in the destination folder with its own rule (and would have to create one for every subfolder), but I'm not sure how to do the comparison with the original file.

Is there a better way to do this at the time of moving the original file? If not, maybe a feature suggestion: add more options for handling file duplicates, such as determining whether a file is a duplicate based on certain parameters (not just name) and allowing more rules to run based on that (e.g. rename again, flag, or move/sort again).

Also - I've always wondered what's the difference between:
If file exists:
    ( ) rename the file
    ( ) replace the existing file
    ( ) throw the file away
[ ] Throw away if duplicate

...?

Thanks
Lachlan Williams
 
Posts: 13
Joined: Fri Jun 11, 2021 4:06 am

Re: More options for handling duplicates Mon Nov 29, 2021 12:10 pm • by Mr_Noodle
Can you be more specific about how you would differentiate the files?

As for the difference between the two, a duplicate file is a file that has the same content. If a file exists, it just shares the same name.
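The distinction between "same name" and "same content" can be sketched in Python. This is only an illustration, not Hazel's actual implementation: the function name `classify` and its return labels are hypothetical, and it assumes the incoming file is compared only against an existing file of the same name in the destination folder.

```python
import filecmp
from pathlib import Path


def classify(incoming: Path, dest_dir: Path) -> str:
    """Classify an incoming file against dest_dir (hypothetical helper).

    Returns:
        "new"        - no file with this name exists in dest_dir
        "duplicate"  - a file with this name exists and has identical content
        "name_clash" - a file with this name exists but its content differs
    """
    existing = dest_dir / incoming.name
    if not existing.exists():
        return "new"
    # shallow=False forces a comparison of the file contents rather than
    # relying only on size and modification time.
    if filecmp.cmp(incoming, existing, shallow=False):
        return "duplicate"
    return "name_clash"
```

Under this reading, the supplementary payslip from the first post would come out as "name_clash" (same name, different dollar amount), so it would be renamed rather than thrown away.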
Mr_Noodle
Site Admin
 
Posts: 11193
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Mr_Noodle wrote:Can you be more specific about how you would differentiate the files?

As for the difference between the two, a duplicate file is a file that has the same content. If a file exists, it just shares the same name.


I didn't quite understand your reply, but I'll try to clarify. My assumption was that Hazel determines whether a file is a duplicate based on filename alone, but you've said it's based on content, which changes things, I guess. My use case is a weekly payslip which arrives with a filename of "Payslip.pdf". I use Hazel to determine the date of the payslip based on its content and name it accordingly: "Payslip - YYYYMMDD". I then move it to a folder and sort it into a subfolder. All this works nicely.

Occasionally, a 2nd payslip will arrive...it's for a different $ amount, but the same pay date. Hazel renames this exactly as it would the first file that it processed because date is the only variable in the name...but I want to keep both. I just want to rename the 2nd file as "Payslip - YYYYMMDD (Extra)" once it gets moved and sorted to the final folder and what I'd like to be able to do is base that "(Extra)" suffix on the fact that there's already a file named "Payslip - YYYYMMDD" there. In other words...the second file to arrive with the same name is considered a "duplicate" (just based on the equivalent name) and that triggers an option of how to handle the "duplicate" - in this case, rename as "[current_filename] (Extra).[extension]". This is what sparked the question about More options for handling duplicates. In this case, it could be to run another rule local to that folder (if file is a duplicate).

I may have this all wrong...let's say that Hazel's determination of a file as "duplicate" is based on content as well as filename (as you've said) - I still think these extra options to handle file duplication would be quite powerful. More options than to just "rename" with an increment. There may be clever ways to do this already. I just can't think what they are.

I can see other use cases for this...if, for example, you had apparent duplicates (based on filename) in photos, you might want to file the duplicates to a subfolder called "Dupes" so they don't clutter the main folder, but you don't throw them away. Or, you might want to keep them as "[The_filename]-n" (using default rename behaviour), but then add a Tag called "Check".

And sorry, I still don't understand the difference between "If file exists (throw away)" and "Throw away if duplicate". Does the former just compare filename, but the latter compares content?

And just so I'm really clear, are you saying that if I set my rule to "(if file exists) rename the file" AND "throw away if duplicate", that it would still keep a file with the same name but with different content (and rename as "[name]-n") and only throw away if all content was exactly the same? I still think the extra options on how to treat duplicates would be useful.

Absolutely loving Hazel though...it's changed my life, man! :D
Lachlan Williams
 
Posts: 13
Joined: Fri Jun 11, 2021 4:06 am

Re: More options for handling duplicates Tue Dec 07, 2021 9:56 am • by Mr_Noodle
Yes, duplicates mean same content.

The problem with more options is that everything you suggested is very open-ended. It does not lend itself to a small set of checkable options. Instead, you may consider having another rule check for files with the "-n" numbering and deal with those in a separate step.
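As a rough sketch of that separate step, the Python below looks for payslips that picked up a "-n" suffix and renames them with an "(Extra)" suffix instead. Several things here are assumptions and not confirmed in this thread: that the increment is appended as "-1", "-2", ... directly after the date, that the files are PDFs, and the helper names `extra_name` and `rename_numbered`. How you would run it (for example from a rule on the destination folder) is left open.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumed pattern: "Payslip - YYYYMMDD" followed by the "-n" increment,
# e.g. "Payslip - 20211128-1" (file extension excluded).
NUMBERED = re.compile(r"^(?P<base>Payslip - \d{8})-(?P<n>\d+)$")


def extra_name(path: Path) -> Path | None:
    """Return the "(Extra)" name for a "-n" file, or None if it is not numbered."""
    match = NUMBERED.match(path.stem)
    if match is None:
        return None
    n = int(match.group("n"))
    # The first extra gets a plain "(Extra)"; later ones keep their number.
    suffix = " (Extra)" if n == 1 else f" (Extra {n})"
    return path.with_name(f"{match.group('base')}{suffix}{path.suffix}")


def rename_numbered(folder: Path) -> list[Path]:
    """Rename every "-n" payslip in folder and return the new paths."""
    renamed = []
    for path in sorted(folder.glob("Payslip - *.pdf")):
        target = extra_name(path)
        # Skip files without a "-n" suffix and never overwrite an existing file.
        if target is not None and not target.exists():
            path.rename(target)
            renamed.append(target)
    return renamed
```

Because the rename only fires on files that already carry the "-n" suffix, the original "Payslip - YYYYMMDD.pdf" is left untouched, and the same function could be pointed at each year subfolder in turn.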
Mr_Noodle
Site Admin
 
Posts: 11193
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

