Linux: How To Locate Duplicated Files Quickly
Sat, 25 Feb 2012 13:34:11 +0000
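Locate duplicates quickly: "quickly" here means checking only file size and file name, not computing an expensive MD5 sum for every file. Matches are therefore likely duplicates, not verified ones: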
find . -type f -printf "%f:%s:%p\n" | awk -F: '
  {
    key = $1 " " $2
    occur[key]++
    loc[key] = loc[key] $3 " "
  }
  END {
    for (key in occur)
      if (occur[key] > 1)
        print key ": " loc[key]
  }
' | sort
A bit of explanation of the above magic:
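- -printf: tells find to output file metadata (file name, size, path) instead of only the file path (the default); the name and size are used later to build the key. Note that -printf is a GNU find extension
- -F: (colon as the awk field separator): we want to handle paths with spaces properly, which is why a special separator is used instead of whitespace. Names containing a colon will still be split incorrectly
- key=$1 " " $2: the file name (without directory) and the file size together form the ID for this file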
- occur: table (key -> number of file occurrences)
- loc: maps file ID to list of locations found
- occur[key]>1: we want to show only files that have duplicates
- sort: results are sorted alphabetically for easier navigation
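
To see the pipeline in action, here is a minimal sketch that builds a throwaway directory tree and runs the command on it, with -type f placed ahead of -printf so that only regular files are printed. The function name find_dupes, the variable demo and all file names are made up for this illustration, and GNU find is assumed because of -printf.

```shell
#!/bin/sh
# Demo sketch: create a temporary tree with one duplicated file name.
# All directory and file names below are hypothetical.
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/my docs"
printf 'hello\n' > "$demo/a/note.txt"
printf 'hello\n' > "$demo/my docs/note.txt"   # same name, same size (6 bytes)
printf 'hi\n'    > "$demo/a/other.txt"         # unique, should not be reported

find_dupes() {
  # $1: directory to scan. Runs in a subshell so the caller's cwd is unchanged.
  (cd "$1" && find . -type f -printf "%f:%s:%p\n" |
    awk -F: '
      { key = $1 " " $2; occur[key]++; loc[key] = loc[key] $3 " " }
      END { for (key in occur) if (occur[key] > 1) print key ": " loc[key] }
    ' | sort)
}

result=$(find_dupes "$demo")
echo "$result"
rm -rf "$demo"
```

The single line printed starts with the key "note.txt 6: " followed by both locations, including the path with a space in it. The order of the two locations depends on the order in which find walks the directories, so it may differ between runs and filesystems.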