How to remove duplicated files in a directory?
fdupes is the tool of your choice. To find all duplicate files (by content, not by name) in the current directory:
fdupes -r .
To manually confirm deletion of duplicated files:
fdupes -r -d .
To automatically delete all copies but the first of each duplicated file (be warned: this really deletes files, as requested):
fdupes -r -f . | grep -v '^$' | xargs rm -v
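Note that plain xargs splits its input on all whitespace, so this pipeline breaks on file names containing spaces. GNU xargs can split on newlines only; the snippet below demonstrates the mechanism with a stand-in list of paths (hypothetical, since fdupes output depends on your directory), and echoes rm rather than running it:

```shell
# Hypothetical fdupes-style output: one duplicate path per line.
# `xargs -d '\n'` (GNU) splits on newlines only, so spaces survive;
# drop the `echo` once you trust the list (names with embedded
# newlines would still break this).
printf '%s\n' 'dir/copy one.txt' 'dir/copy two.txt' \
    | xargs -d '\n' echo rm -v
```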
I'd recommend manually checking files before deletion:
fdupes -rf . | grep -v '^$' > files
... # check files
xargs -a files rm -v
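If fdupes isn't available, the same idea can be sketched with standard tools: hash every file and report each path whose checksum has already been seen. The function name and demo files below are illustrative, not part of fdupes:

```shell
#!/bin/bash
# Print each file under $1 whose content duplicates an earlier file.
# NUL-delimited find output keeps arbitrary file names safe.
find_dups() {
    local dir=$1 file sum
    declare -A seen=()                   # checksum -> first path seen
    while IFS= read -r -d '' file; do
        sum=$(md5sum "$file" | cut -d' ' -f1)
        if [[ -n "${seen[$sum]:-}" ]]; then
            printf '%s\n' "$file"        # duplicate: candidate for deletion
        else
            seen[$sum]=$file             # first copy: keep
        fi
    done < <(find "$dir" -type f -print0 | sort -z)
}

# Demo in a throwaway directory: two identical files, one distinct.
demo=$(mktemp -d)
printf 'hello\n' > "$demo/a.txt"
printf 'hello\n' > "$demo/b.txt"         # same content as a.txt
printf 'world\n' > "$demo/c.txt"
find_dups "$demo"                        # prints only .../b.txt
rm -rf "$demo"
```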
bash 4.x
#!/bin/bash
declare -A arr
shopt -s globstar

for file in **; do
    [[ -f "$file" ]] || continue
    read -r cksm _ < <(md5sum "$file")
    if ((arr[$cksm]++)); then
        echo "rm $file"
    fi
done
This is both recursive and handles any file name. The downside is that it requires bash 4.x for associative arrays and globstar recursive matching. Remove the echo if you like the results.
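To sanity-check the loop before pointing it at real data, you can run it in a throwaway directory (the demo files are made up; nothing is deleted because rm is only echoed):

```shell
#!/bin/bash
shopt -s globstar

# Hypothetical scratch tree: two identical files, one distinct.
tmp=$(mktemp -d)
mkdir -p "$tmp/sub"
printf 'same\n' > "$tmp/first.txt"
printf 'same\n' > "$tmp/sub/copy.txt"    # byte-for-byte copy of first.txt
printf 'diff\n' > "$tmp/unique.txt"

cd "$tmp"
declare -A arr=()
dupes=()
for file in **; do
    [[ -f "$file" ]] || continue
    read -r cksm _ < <(md5sum "$file")
    if ((arr[$cksm]++)); then
        echo "rm $file"                  # echo only; nothing is deleted
        dupes+=("$file")
    fi
done
cd - >/dev/null
rm -rf "$tmp"
```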
gawk version
gawk '
{
    cmd="md5sum " q FILENAME q
    cmd | getline cksm
    close(cmd)
    sub(/ .*$/,"",cksm)
    if(a[cksm]++){
        cmd="echo rm " q FILENAME q
        system(cmd)
        close(cmd)
    }
    nextfile
}' q='"' *
Note that this will still break on files that have double quotes in their name; there's no real way to get around that with awk. Remove the echo if you like the results.
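A pipeline built on find avoids that quoting problem entirely, since file names are never re-parsed by a shell. Hash everything, sort by digest, and let GNU uniq print groups that share the first 32 characters (the md5 digest). The demo directory below is hypothetical; point find at your own tree instead:

```shell
# Quote-safe duplicate listing with GNU coreutils only.
demo=$(mktemp -d)
printf 'same\n'  > "$demo/plain.txt"
printf 'same\n'  > "$demo/has \"quotes\".txt"   # tricky name, handled fine
printf 'other\n' > "$demo/unique.txt"

# -w32 compares only the digest; --all-repeated=separate prints each
# group of identical files, blank-line separated.
dup_groups=$(find "$demo" -type f -exec md5sum {} + \
    | sort \
    | uniq -w32 --all-repeated=separate)
echo "$dup_groups"
rm -rf "$demo"
```

Names containing a backslash or newline are still escaped in md5sum's output, but nothing breaks; the listing stays usable for manual review.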
You can try FSLint. It has both a command-line and a GUI interface.