Rdfind – redundant data find

Rdfind is a program that finds duplicate files. It is useful for compressing backup directories or just finding duplicate files. It compares files based on their content, NOT on their file names.
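To make "based on their content" concrete, here is a minimal Python sketch of content-based duplicate detection. It is an illustration only and does not claim to be rdfind's actual algorithm: the function names find_duplicates and file_digest are made up for this example, and the choice of SHA-256 is an assumption. Files are first grouped by size, since files of different sizes cannot be equal, and only then compared by a digest of their content. File names play no role.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return sorted lists of paths under root that have identical content."""
    # Step 1: bucket by size; files of different size cannot be equal.
    by_size = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Step 2: within each size bucket, bucket again by content digest.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size means a unique file
        by_digest = defaultdict(list)
        for path in paths:
            by_digest[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_digest.values() if len(g) > 1)
    return groups
```

Calling find_duplicates("some/directory") returns one list per set of identical files. The sketch only reports duplicates and never modifies anything.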

As of 20180326, the repository has moved to GitHub. Signed releases are served here.

Install

There are official packages for newer versions of Debian, Ubuntu and Fedora. Installation is as easy as

$ apt-get install rdfind   # Debian/Ubuntu
$ dnf install rdfind       # Fedora

for those distributions. If you are on Mac, you can install through MacPorts. You can also compile the source yourself: rdfind is written in C++ and should compile on any *nix. Rdfind currently runs on Linux, Mac OS X, Solaris and Windows (using Cygwin).

Releases

Each package is signed with the key listed in the "Signing key" column.

| Version | Release date | File | Signature | Signing key | Checksum |
| 1.4.1 (00ebedd2) | 2018-11-12 | rdfind-1.4.1.tar.gz | rdfind-1.4.1.tar.gz.asc | 0x4CC8C397 | 5334f6d807d85be5f6a1c039b9eabd4d2bf91656 (SHA1), 30c613ec26eba48b188d2520cfbe64244f3b1a541e60909ce9ed2efb381f5e8c (SHA256) |
| 1.4.0 (83de27c8) | 2018-11-09 | rdfind-1.4.0.tar.gz | rdfind-1.4.0.tar.gz.asc | 0x4CC8C397 | 8ad9eb99d3dc192d391a4a4959a181d703b19db2 (SHA1), 08a3b9c115c3644d92aed3ee06078d534aa232db864667682963a55e1af04c20 (SHA256) |
| 1.4.0-alpha0 (5a9c8ddd) | 2018-10-28 | rdfind-1.4.0-alpha0.tar.gz | rdfind-1.4.0-alpha0.tar.gz.asc | 0x4CC8C397 | 467b569c5d700871793cd1368b04fc00856e1076 (SHA1), 25845eea15e3353125cda47db9d864e7a3dba9e11d5cbc23fdc32f9c5912147c (SHA256) |
| 1.3.5 | 2017-01-05 | rdfind-1.3.5.tar.gz | rdfind-1.3.5.tar.gz.asc | 0x4CC8C397 | b860b96c156f6dde5c6e3ff52047a20defe21b9d (SHA1), c36e0a1ea35b06ddf1d3d499de4c2e4287984ae47c44a8512d384ecea970c344 (SHA256) |
| 1.3.4 | | rdfind-1.3.4.tar.gz | rdfind-1.3.4.tar.gz.asc | 0x533B6030 | c01bd2910cdec885b6c24164a389457e4f01ef61 (SHA1) |
| 1.3.3 | | rdfind-1.3.3.tar.gz | rdfind-1.3.3.tar.gz.asc | 0x533B6030 | 70ce33c6c393ba309dc4791c73489a73652a0be6 (SHA1) |
| 1.3.2 | | rdfind-1.3.2.tar.gz | rdfind-1.3.2.tar.gz.asc | 0x533B6030 | 4893904f895400faa9ca0ea042a97eb4536a820e (SHA1) |
| 1.3.1 | | rdfind-1.3.1.tar.gz | rdfind-1.3.1.tar.gz.asc | | c596e9e0d059e37135c9db62904426e37c879885 (SHA1) |
| 1.3.0 | | rdfind-1.3.0.tar.gz | rdfind-1.3.0.tar.gz.asc | | 18a0fab3bd6951aa342d9385c3bc13bf615e1253 (SHA1) |


Note to self: export pkg=rdfind-x.x.x.tar.gz; sha256sum "$pkg"; sha1sum "$pkg"; gpg -u 0x4CC8C397 -a -b "$pkg"
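For readers who want to check a downloaded tarball against a published SHA256 value, the following Python sketch shows one way to do it. The helper names sha256_of and matches_published are hypothetical and exist only in this example. A matching checksum only shows that the download is intact; checking the .asc signature with gpg is a separate step that this sketch does not cover.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest (lowercase) of the file at path."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path, published_hex):
    """Compare a file's digest with a published value, ignoring case and
    surrounding whitespace in the published string."""
    return sha256_of(path) == published_hex.strip().lower()
```

For example, matches_published("rdfind-1.4.1.tar.gz", "30c613ec26eba48b188d2520cfbe64244f3b1a541e60909ce9ed2efb381f5e8c") should return True for an intact copy of the 1.4.1 release.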

Documentation

Please see the man page (latest or 1.4.0) or the built-in help (use the --help flag).

Without options, a results file will be created in the current directory and nothing will be removed.

Feature requests

(Note – these should be filed as GitHub issues. The ones here have not been moved yet.)

From time to time, I receive suggestions and learn about interesting use cases. I collect them here, which might lead to even better suggestions.

Handle massive amounts of hardlinks correctly

Users with files that already have a very large number of hardlinks (approximately more than 65000 on ext4) will get into trouble when more files are hard linked to such files. Rdfind could survive this situation by creating a new group of hard-linked files once the limit is reached. Suggested by J 20121008.
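The idea can be sketched in Python as follows. This is not rdfind code: the function link_into_group and the temporary suffix ".rdfind-tmp" are invented for this illustration, and it assumes the operating system reports an exhausted link count as EMLINK, as Linux does. When every existing master refuses another link, the duplicate is left untouched and becomes the master of a new group.

```python
import errno
import os


def link_into_group(masters, duplicate):
    """Replace `duplicate` with a hard link to one of `masters`.

    `masters` is a list of paths known to have the same content as
    `duplicate`. Returns the path that `duplicate` shares an inode with
    afterwards. If all masters are at the link limit, `duplicate` is kept
    as a regular file and appended to `masters` as a new group leader.
    """
    tmp = duplicate + ".rdfind-tmp"  # hypothetical temporary name
    for master in reversed(masters):  # try the newest group first
        if os.path.samefile(master, duplicate):
            return master  # already linked, nothing to do
        try:
            os.link(master, tmp)
        except OSError as e:
            if e.errno == errno.EMLINK:
                continue  # this master is full, try another one
            raise
        os.replace(tmp, duplicate)  # atomically swap in the new link
        return master
    masters.append(duplicate)  # start a new group of hard-linked files
    return duplicate
```

Linking to a temporary name first and then renaming means the duplicate is never missing, even if the process is interrupted between the two steps.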

Use a database instead of RAM for the file list

Having lots of files may exhaust system memory. Letting rdfind keep the file list in a database instead of in memory would reduce memory use. This will of course be slower, but could be made optional. An additional benefit is that the results can be put in the database. Suggested by Andy Smith 20121010.
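A rough Python sketch of what a database-backed file list could look like is given below. Everything in it is an assumption made for illustration: the use of SQLite, the table name files, and the functions build_file_index and candidate_sizes. It stores only path and size and lets the database find sizes shared by several files, so the full list never has to sit in memory.

```python
import os
import sqlite3


def build_file_index(root, db_path):
    """Record path and size of every regular file under root in SQLite."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, size INTEGER NOT NULL)"
    )
    con.execute("CREATE INDEX IF NOT EXISTS files_by_size ON files (size)")
    with con:  # commits on success, rolls back on error
        for dirpath, _dirs, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                if os.path.isfile(path) and not os.path.islink(path):
                    # Note: paths with undecodable bytes would need BLOB
                    # storage; this sketch assumes decodable names.
                    con.execute(
                        "INSERT OR REPLACE INTO files VALUES (?, ?)",
                        (path, os.path.getsize(path)),
                    )
    return con


def candidate_sizes(con):
    """Return sizes shared by at least two files, i.e. possible duplicates."""
    rows = con.execute(
        "SELECT size FROM files GROUP BY size HAVING COUNT(*) > 1 ORDER BY size"
    )
    return [size for (size,) in rows]
```

Passing ":memory:" as db_path keeps the index in RAM for experiments, while a file path puts it on disk, which is the case the suggestion is about.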

Optionally require user and permissions to match

If two different users have equal files, hard linking causes the files to have the same user and permissions afterwards. Adding options -matchuser and -matchperms would allow such files to be excluded from the deduplication process. Suggested by Andy Smith 20121010.
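The check behind such options could look like the Python sketch below. The function same_owner_and_mode and its parameters are hypothetical; they merely mirror the proposed -matchuser and -matchperms options, which do not exist in rdfind. Two files with identical content would only be deduplicated if this check passes.

```python
import os
import stat


def same_owner_and_mode(path_a, path_b, match_user=True, match_perms=True):
    """Return True if two files agree on owner and/or permission bits.

    match_user and match_perms select which attributes must be equal.
    """
    st_a = os.stat(path_a)
    st_b = os.stat(path_b)
    if match_user and st_a.st_uid != st_b.st_uid:
        return False
    # S_IMODE keeps only the permission bits and drops the file type bits.
    if match_perms and stat.S_IMODE(st_a.st_mode) != stat.S_IMODE(st_b.st_mode):
        return False
    return True
```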

Cooperate with btrfs deduplication

btrfs has a tool (under development) called bedup for internal deduplication on btrfs filesystems. Letting it operate on the groups of files found by rdfind would let btrfs save some space. Suggested by XX 20130206.

Control minimum size of files

This is implemented – see GitHub issue #1.

Suggested by Andrew Buehler 20131130.

Two step action

Instead of running with dry-run first and a second invocation without the dry-run, the following was suggested:

This would be useful, but it introduces other problems. The format of the results file must be able to handle file names containing newlines and similar. A parser must be written that handles syntax errors, missing files and so on. This makes me reluctant to implement such a feature. If it is implemented, it should probably be coordinated with the database suggestion. Suggested by VB 20140123.

Author

Rdfind is written by Paul Dreik (previously Sundvall). If you find this software useful, please drop me an email! The address is x@y.z where x=rdfind, y=pauldreik, z=se.

Suggestions and comments are very welcome.