Man page of rdfind
Section: rdfind (1)
rdfind - finds duplicate files
rdfind [ options ]
directory1 | file1
directory2 | file2
rdfind finds duplicate files across and/or within several directories. It
calculates checksums only if necessary.
rdfind runs in O(N log(N)) time, with N being the number of files.
If two (or more) equal files are found, the program decides which of
them is the original, and the rest are considered duplicates. This
is done by ranking the files against each other and deciding which has
the highest rank. See section RANKING for details.
If you need better control over the ranking than given, you can use
a preprocessor which sorts the file names in the desired order and then
run the program using xargs. See examples below for how to use find
and xargs in conjunction with rdfind.
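One possible sketch of such a preprocessor is given here. It lists the
files below the given directories, sorts the names, and passes them to
rdfind as separate arguments, so that the file sorting first is scanned
first. The function name ranked_rdfind and the RDFIND variable are made up
for this illustration; -print0, sort -z and xargs -0 assume GNU find,
coreutils and findutils.

```sh
# Hypothetical helper: run rdfind on files in name-sorted order.
# RDFIND can be overridden (for example with "echo") to preview the call.
RDFIND=${RDFIND:-rdfind}

ranked_rdfind() {
    # List regular files NUL-separated so that unusual names survive,
    # sort them by name, and pass them as arguments in that order.
    find "$@" -type f -print0 | sort -z | xargs -0 "$RDFIND"
}
```

Replace sort -z with any other ordering step to express a different
preference. Note that xargs may split a very long list into several
rdfind invocations, in which case each invocation only sees its own
part of the list.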
To include files or directories that have names starting with -, use
rdfind ./- so that they are not confused with options.
Given two or more equal files, the one with the highest rank is
selected to be the original and the rest are duplicates. The rules of
ranking are given below, where the rules are executed from start until
an original has been found. Given two files A and B which have equal
content, the ranking is as follows:
If A was found while scanning an input argument earlier than B, A
is higher ranked.
If A was found at a depth lower than B, A is higher ranked (A closer
to the root).
If A was found earlier than B, A is higher ranked.
The last rule is needed when two files are found in the same directory
(obviously not given in separate arguments, otherwise the first rule
applies). It orders the files in the same way as the operating system
delivers them while listing the directory. This is operating
system specific behaviour.
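Taken together, the three rules behave like a lexicographic comparison on
(input argument index, depth, discovery order). The sketch below only
illustrates that ordering with sort; the tab-separated record layout and
the function name pick_original are invented for this example and say
nothing about how rdfind stores files internally.

```sh
# Hypothetical record layout, one file per line, tab-separated:
#   <argument index> <depth> <discovery order> <file name>
# Lower numbers mean "earlier" or "closer to the root".
pick_original() {
    # Compare numerically on field 1, then field 2, then field 3;
    # the first line after sorting is the highest ranked file.
    sort -t "$(printf '\t')" -k1,1n -k2,2n -k3,3n | head -n 1 | cut -f 4
}
```

Given the records (2, 0, 1, B/file), (1, 3, 2, A/deep/file) and
(1, 1, 3, A/file), the function prints A/file: the first rule rules out
B/file, and the second rule prefers the shallower of the two remaining
files. File names containing tabs or newlines would break this toy format.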
Searching options etc:
- -ignoreempty true|false
Ignore empty files. (default)
- -followsymlinks true|false
Follow symlinks. Default is false.
- -removeidentinode true|false
removes items found which have identical inode and device ID. Default
- -checksum md5|sha1
what type of checksum to be used: md5 or sha1. Default is md5.
- -makesymlinks true|false
Replace duplicate files with symbolic links
- -makehardlinks true|false
Replace duplicate files with hard links
- -makeresultsfile true|false
Make a results file results.txt (default) in the current directory.
- -outputname name
Make the results file name to be "name" instead of the default results.txt.
- -deleteduplicates true|false
Delete (unlink) files.
- -sleep Xms
sleeps X milliseconds between reading each file, to reduce
load. Default is 0 (no sleep). Note that only a few values are
supported at present: 0,1-5,10,25,50,100 milliseconds.
- -n, -dryrun
displays what would have been done; doesn't actually delete or link anything.
- -h, -help, --help
displays brief help message.
- -v, -version, --version
displays version number.
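Because -sleep accepts only the few values listed above, a wrapper script
may want to check a value before passing it on. The following is a minimal
sketch, assuming the value is written as a number directly followed by ms
(as the Xms notation suggests); the function name is_supported_sleep is
hypothetical.

```sh
# Hypothetical check for the values documented for -sleep:
# 0, 1-5, 10, 25, 50 and 100 milliseconds, written as "<number>ms".
is_supported_sleep() {
    case "$1" in
        0ms|[1-5]ms|10ms|25ms|50ms|100ms) return 0 ;;
        *) return 1 ;;
    esac
}
```

A script could then run, for example,
is_supported_sleep "$delay" && rdfind -sleep "$delay" /mnt/backup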
- Search for duplicate files in home directory and a backup directory:
rdfind ~ /mnt/backup
- Delete duplicates in a backup directory:
rdfind -deleteduplicates true /mnt/backup
- Search for duplicate files in directories called foo:
find . -type d -name foo -print0 |xargs -0 rdfind
The results file (named results.txt by default; the name can be changed
with the option outputname, see above) will contain one row per duplicate
file found, along with a header row explaining the columns.
A text describes why the file is considered a duplicate:
DUPTYPE_UNKNOWN some internal error
DUPTYPE_FIRST_OCCURRENCE the file that is considered to be the original.
DUPTYPE_WITHIN_SAME_TREE files in the same tree (found when processing
the directory in the same input argument as the original)
DUPTYPE_OUTSIDE_TREE the file is found during processing another input
argument than the original.
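As a rough illustration of how a script might summarise the results file,
the sketch below counts rows per duplicate type. It assumes that the
DUPTYPE_ text is the first whitespace-separated field of each data row and
skips every other line, such as the header; check this against the header
row of your own results file. The function name count_duptypes is
hypothetical.

```sh
# Hypothetical helper: count rows per duplicate type in a results file.
# Assumes the DUPTYPE_ text is the first whitespace-separated field of
# each data row; any other line (such as the header) is skipped.
count_duptypes() {
    awk '$1 ~ /^DUPTYPE_/ { n[$1]++ }
         END { for (t in n) print t, n[t] }' "${1:-results.txt}" | sort
}
```

Called without an argument it reads results.txt in the current directory;
pass another file name if the option outputname was used.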
rdfind exits with 0 on success and a nonzero value otherwise.
When the same directory is specified twice, rdfind keeps the first
encountered as the most important (original), and the rest as
duplicates. This might not be what you want.
The makesymlinks option creates absolute links. This might not be what
you want. To create relative links instead, you may use the symlinks (2)
command, which is able to convert absolute links to relative links.
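If the symlinks command is not at hand, a single absolute link can also be
rewritten with ln. This is only a sketch of an alternative, not what rdfind
or symlinks do internally: it relies on ln -r (--relative), which is a GNU
coreutils extension, and the function name relativize_link is hypothetical.

```sh
# Hypothetical helper: rewrite one absolute symlink as a relative one.
# Relies on "ln -r" (--relative), a GNU coreutils extension.
relativize_link() {
    link=$1
    target=$(readlink "$link") || return 1   # fails if $link is not a symlink
    case "$target" in
        /*) ln -sfnr "$target" "$link" ;;    # absolute: replace in place
        *)  return 0 ;;                      # already relative: leave alone
    esac
}
```

It could be applied to every link below a directory with find, for example
by looping over the output of find /mnt/backup -type l.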
Older versions unfortunately contained a misspelling on the word
occurrence. This is now corrected (since 1.3), which might affect
user scripts parsing the output file written by rdfind.
There are lots of enhancements left to do. Please contribute!
Avoid manipulating the directories while rdfind is reading them.
rdfind is quite brittle in that case. In particular, when deleting
or making links, rdfind can be subject to a symlink attack.
Use with care!
Paul Dreik 2006, reachable at email@example.com
Rdfind can be found at http://rdfind.pauldreik.se/
Do you find rdfind useful? Drop me a line! It is always fun to
hear from people who actually use it and what data collections
they run it on.
Several persons have helped with suggestions and improvements:
Niels Möller, Carl Payne and Salvatore Ansani. Thanks also to you
who tested the program and sent me feedback.
1.3.3 (release date 2013-06-18)
svn id: $Id: rdfind.1.html 299 2013-06-18 11:21:37Z paul $
This program is distributed under GPLv2 or later, at your option.
Time: 11:18:37 GMT, June 18, 2013