Duplicate files - not removed

Moderator: Mr_Noodle

Duplicate files - not removed Mon Jul 20, 2020 2:03 pm • by DavidB
Hello,

Running Hazel 4.4.5 on macOS 10.15.6, and having an issue trying to weed out duplicate PDFs:

I have a folder where I've moved all of my references; many duplicates exist, each with "-1" appended to the filename. I set up the folder in Hazel, checked "Throw away: Duplicate files", then ran the rule. It removed a few duplicates but left the vast majority. Comparing the original and duplicate PDFs, they appear identical and have identical file sizes.

In debugging, the log shows many instances where it appears a duplicate is found:

-----------
2020-07-20 12:57:38.891 hazelworker[25509] DEBUG: Original names: (
"Am J Respir Crit Care Med 2010 Klok.pdf"
)
2020-07-20 12:57:38.891 hazelworker[25509] DEBUG: Candidate dupe: Am J Respir Crit Care Med 2010 Klok-1 extension: pdf
2020-07-20 12:57:38.891 hazelworker[25509] DEBUG: Candidate dupe found: Am J Respir Crit Care Med 2010 Klok.pdf for file: Am J Respir Crit Care Med 2010 Klok-1.pdf
2020-07-20 12:57:38.891 hazelworker[25509] DEBUG: File was previously diffed and file has not changed. No diff check needed.
------------

Not sure what "File was previously diffed and file has not changed" means here. Any advice on how to remove these duplicates?

Many thanks!
DB

Re: Duplicate files - not removed Tue Jul 21, 2020 10:11 am • by Mr_Noodle
That means it already checked and determined it was not a duplicate. By chance, are you creating files with the same name and number over and over again?
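
To double-check that verdict independently of Hazel, a byte-for-byte comparison can confirm whether two PDFs that look alike and share a file size really have identical contents. The following is a minimal sketch using Python's standard library; it is not how Hazel compares files (the thread does not say), and the helper name `same_bytes` is made up for illustration.

```python
import filecmp
from pathlib import Path


def same_bytes(a: Path, b: Path) -> bool:
    """Return True only if both files have exactly the same contents."""
    # shallow=False makes filecmp read and compare the file contents
    # instead of trusting matching size and modification time.
    return filecmp.cmp(a, b, shallow=False)


# Example call (file names taken from the log earlier in the thread):
# same_bytes(Path("Am J Respir Crit Care Med 2010 Klok.pdf"),
#            Path("Am J Respir Crit Care Med 2010 Klok-1.pdf"))
```

If this returns False for a pair, the files differ somewhere in their bytes (embedded metadata, for instance) even though the sizes match.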

Re: Duplicate files - not removed Tue Jul 21, 2020 11:44 am • by DavidB
Aha... no, not creating duplicates intentionally, but it appears that my reference manager does create duplicates, perhaps with some metadata adjusted that I can't see when I look at the PDF.

So now what I'd like to do is keep the version ending in "-1" wherever a pair exists, without losing the files that have no duplicate. Put another way: if a file has a counterpart with the same name plus "-1", I'd like to trash the file without the "-1".

Example:
"Crit Care Med 2013 Hallet.pdf"
"Crit Care Med 2013 Walsh.pdf"
"Crit Care Med 2013 Walsh-1.pdf"

All three files are in the folder. I'd like to keep "Crit Care Med 2013 Hallet.pdf" and "Crit Care Med 2013 Walsh-1.pdf" while removing "Crit Care Med 2013 Walsh.pdf".

Any guidance on how to set up a rule to do this would be appreciated!

Re: Duplicate files - not removed Wed Jul 22, 2020 10:41 am • by Mr_Noodle
Look up nested conditions in the manual. You can set up a rule to match a file based on other files in the same folder. Give it a try and report back if you have problems with it.
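
Before building the rule, it may help to preview which files it should match. The sketch below is a plain Python dry run, not a Hazel rule and not Hazel's nested-condition syntax: it lists every PDF that has a sibling with the same name plus "-1" in the same folder. The function name `originals_with_suffixed_copy` and the `suffix` parameter are assumptions made for illustration. It only reports paths and deletes nothing.

```python
from pathlib import Path
from typing import List


def originals_with_suffixed_copy(folder: Path, suffix: str = "-1") -> List[Path]:
    """List PDFs for which '<name><suffix>.pdf' also exists in the folder."""
    matches = []
    for pdf in sorted(folder.glob("*.pdf")):
        # e.g. "Crit Care Med 2013 Walsh.pdf" -> "Crit Care Med 2013 Walsh-1.pdf"
        sibling = pdf.with_name(f"{pdf.stem}{suffix}{pdf.suffix}")
        if sibling.exists():
            matches.append(pdf)
    return matches


# Dry run: print the candidates, then review the list before trashing anything.
# for path in originals_with_suffixed_copy(Path("/path/to/references")):
#     print(path)
```

With the three example files from the earlier post, only "Crit Care Med 2013 Walsh.pdf" is listed. Note that a chain such as "X.pdf", "X-1.pdf", "X-1-1.pdf" would list both "X.pdf" and "X-1.pdf", so the output is worth reviewing by hand.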
