Dariusz on Software: Linux: How To Locate Duplicated Files Quickly

Sat, 25 Feb 2012 13:34:11 +0000

Locate duplicates quickly: by "quickly" I mean checking only file name and size, not computing an expensive MD5 sum, so the results are candidate duplicates rather than verified ones:

find . -type f -printf "%f:%s:%p\n" | \
    awk -F: '
        {
            key=$1 " " $2;
            occur[key]++;
            loc[key]=loc[key] $3 " "
        }
        END {
            for(key in occur) {
                if(occur[key]>1) {
                    print key ": " loc[key]
                }
            }
        }
    ' | sort

A bit of explanation of the above magic:

-printf: tells the find command to output file metadata (file name and size) along with the path, instead of only the path (the default); this metadata is used later
-F: : a colon is used as the field separator so that paths containing spaces are handled properly (file names or paths containing a colon will still be split incorrectly)
key=$1 " " $2: we use the file name (without directory) and the file size to create an ID for this file
occur: table (key -> number of file occurrences)
loc: maps file ID to list of locations found
occur[key]>1: we want to show only files that have duplicates
sort: results are sorted alphabetically for easier navigation
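
To see the name+size key in action, here is a minimal sketch that wraps the same pipeline in a function and runs it on a throwaway directory. The function name find_dupes and the demo file names are made up for this example, and it assumes GNU find (for -printf) and mktemp are available.

```shell
#!/bin/sh
# find_dupes DIR: list files under DIR that share both name and size.
# Hypothetical helper for this example; requires GNU find (-printf).
find_dupes() {
    find "$1" -type f -printf "%f:%s:%p\n" | \
        awk -F: '
            {
                key=$1 " " $2;
                occur[key]++;
                loc[key]=loc[key] $3 " "
            }
            END {
                for(key in occur) {
                    if(occur[key]>1) {
                        print key ": " loc[key]
                    }
                }
            }
        ' | sort
}

# Build a small test tree: two files with the same name and size,
# plus one file that matches nothing.
demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b"
printf 'hello' > "$demo/a/notes.txt"
printf 'world' > "$demo/b/notes.txt"        # same name and size, different content
printf 'longer text' > "$demo/b/other.txt"  # unique name and size

find_dupes "$demo"
rm -rf "$demo"
```

This should print a single line for notes.txt with size 5 and both of its locations, while other.txt is not reported. Note that the two notes.txt files differ in content: that is the price of skipping the checksum, so treat each reported line as a candidate to double-check (for example with cmp) before deleting anything.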

Tags: linux.
