Basics - Need Help

Moderator: Mr_Noodle

Basics - Need Help Sat Dec 29, 2012 1:42 pm • by rappahannock
OK, let me confess right now, I'm no genius. Now that that's out of the way, how do I create a rule that instructs Hazel to eliminate duplicate files from my external hard drive? That is, not to move anything anywhere except to the trash. The "create rule" dialog doesn't seem to include attributes that would stipulate that if two or more files contain identical content, then all but one should be moved to the trash. Can someone help? I'm really stuck here.
Thanks
S
rappahannock
 
Posts: 4
Joined: Sat Dec 29, 2012 1:36 pm

Re: Basics - Need Help Sat Dec 29, 2012 5:31 pm • by a_freyer
What kinds of files are you looking to search through? Images? Text? PDFs?

Brass tacks, it will not be simple to accomplish this with Hazel. There are many duplicate file removal applications on the App Store, but if you'd like to use Hazel, we will have to know the file types.
a_freyer
 
Posts: 631
Joined: Tue Sep 30, 2008 9:21 am
Location: Colorado

Re: Basics - Need Help Sat Dec 29, 2012 5:37 pm • by rappahannock
Thanks for your swift reply. I'd like to go through all these categories. I was assuming it was possible because there's a box to check/uncheck for duplicates. Anything you can tell me about how to accomplish this would be welcome. I got Hazel (mistakenly, clearly) to do this (among other things).
Also, if there is an app that does this specific thing safely and effectively, I am open to recommendations. But if I can do it with Hazel, all the better, since I've already got it installed. Thanks S

Re: Basics - Need Help Sat Dec 29, 2012 5:44 pm • by a_freyer
The "duplicate" option you see removes duplicately named files only. There isn't a real capacity for searching the content of files.

EDIT - The above is only partially correct; see Mr_Noodle's answer below. My apologies for the misinformation.

I really recommend going with different software for this problem. Finding duplicate text content is relatively straightforward, but still requires an external script. Finding duplicate images is very difficult in a script because images can vary in size, brightness, contrast, etc. Most duplicate image applications use a histogram comparison algorithm, which is not at all simple to code in AppleScript or shell scripts. PDFs have the same problem as images, unless they are OCR'd.

Because of the complexity, free options are generally not available; however, there might be some inexpensive ones. A friend of mine has recommended Gemini, although I have no experience with it.

Think of Hazel as a maid - she's only there to keep things organized, but she'll get uncomfortable throwing your stuff away unless you tell her very specifically that it's ok.
Last edited by a_freyer on Wed Jan 02, 2013 6:42 pm, edited 2 times in total.

Re: Basics - Need Help Sat Dec 29, 2012 5:48 pm • by rappahannock
OK -- I've just downloaded Gemini and I'll see how well it does. Thanks
S

Re: Basics - Need Help Sat Dec 29, 2012 5:53 pm • by a_freyer
Sure thing!

As a new Hazel user, I suggest looking through the help and the Tips and Tricks forums for the types of things that you can do with Hazel. Although it isn't designed to do what you'd like to do, it really is some of the most essential software you can have on your Mac.

Re: Basics - Need Help Wed Jan 02, 2013 2:51 pm • by Mr_Noodle
To clarify, the duplicate checkbox will check duplicate contents, but only for files which follow a pattern of being downloaded or copied multiple times. In such cases, a number is usually tacked on making it easy for Hazel to determine which file to keep (i.e. the original one without the number).
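
As a rough illustration of that pattern (not Hazel's actual logic), the shell sketch below lists numbered copies whose bytes match an un-numbered original in the same folder. The function name list_numbered_duplicates and the suffix patterns it recognizes (" 2", "-2", " (2)") are assumptions made for this example. It only prints candidates and moves nothing to the trash.

```shell
# Print numbered copies (e.g. "report 2.pdf", "report (1).pdf") that are
# byte-for-byte identical to an un-numbered original in the same folder.
# The suffix patterns are assumptions for illustration, not Hazel's rules.
list_numbered_duplicates() (
  dir=$1
  for file in "$dir"/*; do
    [ -f "$file" ] || continue
    name=${file##*/}
    # Split "report 2.pdf" into base "report 2" and extension ".pdf".
    case $name in
      *.*) base=${name%.*}; ext=.${name##*.} ;;
      *)   base=$name; ext= ;;
    esac
    # Strip a trailing " 2", "-2" or " (2)" to recover the original's name.
    stem=$(printf '%s\n' "$base" | sed -E 's/( [0-9]+|-[0-9]+| \([0-9]+\))$//')
    [ -n "$stem" ] && [ "$stem" != "$base" ] || continue
    original=$dir/$stem$ext
    # cmp -s exits 0 only when both files have identical content.
    if [ -f "$original" ] && cmp -s "$original" "$file"; then
      printf '%s\n' "$file"
    fi
  done
)
```

Running list_numbered_duplicates ~/Downloads would print one path per line, so the list can be reviewed before anything is deleted.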

Checking for duplicates in general is much broader. For one, it's not clear which one should be kept since people have different notions of which copy is the real one. It could be based on name, date or which folder it happens to be in or some other arbitrary thing. Also, it's a very time-consuming operation and you don't want this happening to every single file every time Hazel does a scan.
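
To make the "which copy is the real one" question concrete, here is a minimal shell sketch of just one possible policy: keep the file with the oldest modification time. The helper name pick_oldest and the policy itself are assumptions for illustration; a rule based on name or folder would be equally defensible.

```shell
# Print the file with the oldest modification time among the arguments.
# "Oldest wins" is only one possible policy, chosen here for illustration.
pick_oldest() (
  [ "$#" -gt 0 ] || return 1
  keeper=$1
  shift
  for candidate in "$@"; do
    # -ot is true when the left-hand file is older than the right-hand one.
    if [ "$candidate" -ot "$keeper" ]; then
      keeper=$candidate
    fi
  done
  printf '%s\n' "$keeper"
)
```

Given a set of files already known to be identical, pick_oldest prints the one to keep; the others would be the candidates for the trash.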

In short, as already mentioned, a dedicated tool for this is probably better, as it takes time as well as requiring user input for every case.
Mr_Noodle
Site Admin
 
Posts: 11195
Joined: Sun Sep 03, 2006 1:30 am
Location: New York City

Re: Basics - Need Help Wed Jan 02, 2013 3:02 pm • by rappahannock
I get it... it's complicated. Haven't had a chance to run Gemini yet. I'd have thought that dupes would be determined by some bit-for-bit comparison. Seems to me w/fast processors this would be a snap, no?
S

Re: Basics - Need Help Thu Jan 03, 2013 1:21 pm • by Mr_Noodle
It's not the comparing part; it's the part where you read both files from disk. Even with SSDs, disk access is extremely slow in relation to CPU speeds. And multiply this by every file that Hazel might see.
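
One common way to limit that disk access, sketched here as an assumption rather than anything Hazel is known to do, is to compare file sizes first and only read the contents when the sizes match. The function name same_content is made up for this example. wc -c reports the byte count, and many implementations take it from file metadata for regular files, though that is an implementation detail.

```shell
# Exit 0 if the two files have identical content, 1 if they differ,
# 2 if either file cannot be read.
same_content() (
  size_a=$(wc -c < "$1") || return 2
  size_b=$(wc -c < "$2") || return 2
  # Files of different sizes can never be duplicates, so skip the full read.
  [ "$((size_a))" -eq "$((size_b))" ] || return 1
  # Sizes match: fall back to a byte-for-byte comparison.
  cmp -s "$1" "$2"
)
```

For example, same_content a.jpg b.jpg && echo "identical" only reads both files in full when their sizes are equal.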

Re: Basics - Need Help Wed Dec 01, 2021 3:07 am • by StanleySWells
Terminal can help you locate duplicate files so that you can delete them (see https://www.imymac.com/duplicate-finder/find-duplicate-file.html).
Open Terminal and use the cd command to move into the folder where you want to find duplicate files.
For example, to check the Downloads folder, enter cd ~/Downloads and press Enter.
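Then enter this command: find . \! -type d -exec cksum {} \; | sort | tee /tmp/f.tmp | cut -f 1,2 -d ' ' | uniq -d | grep -hif - /tmp/f.tmp > duplicates.txt
It checksums every file under the current folder and writes the entries that share a checksum and size to duplicates.txt.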
Hit Enter once again.
Then, open Finder and go to the folder where you ran the command (Downloads in this example). Open the duplicates.txt file.
That is where you can view those duplicate files.
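
A more readable variant of the idea behind that command, offered as a sketch rather than a drop-in replacement, groups files by checksum and size and prints each group of likely duplicates together. The function name list_duplicate_groups is made up for this example. cksum is a CRC, so matching entries are candidates only; confirming with cmp before deleting anything is a sensible precaution. File names containing newlines are not handled.

```shell
# Print groups of files under a folder that share a cksum checksum and size.
# Nothing is deleted; the output is meant for manual review.
list_duplicate_groups() {
  find "$1" -type f -exec cksum {} + | sort | awk '
    {
      key = $1 " " $2                     # checksum and size in bytes
      name = $0
      sub(/^[0-9]+ [0-9]+ /, "", name)    # keep spaces inside the file name
      count[key]++
      names[key] = names[key] "  " name "\n"
    }
    END {
      for (key in count)
        if (count[key] > 1)
          printf "checksum %s:\n%s", key, names[key]
    }'
}
```

For example, list_duplicate_groups ~/Downloads > duplicates.txt writes the groups to a file in the current folder.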
StanleySWells
 
Posts: 1
Joined: Wed Dec 01, 2021 2:53 am

