Here is some software which I have written. It is published under the GPL.
This searches specified directory trees for duplicate files. When duplicates are found, it removes the duplicate and replaces it with a hard link to the original file. This is a great way to save significant storage space.
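How the program decides that two files are duplicates is not described here. As a rough sketch of one common approach, the code below groups files by size and then by a SHA-256 hash of their contents. The function names `file_digest` and `find_duplicates` are hypothetical and are not taken from the program's source.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return sorted lists of paths under root that share size and hash."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_size[os.path.getsize(path)].append(path)

    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Grouping by size first means most files are never hashed at all. A cautious tool might also compare the candidate files byte for byte before linking them.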
The program is generally run from a command line. It can run silently (the
default), report duplicate files, and report the total bytes saved. It is very
safe in operation: it renames the duplicate file, creates the hard link, then
removes the renamed file. If anything goes wrong, an error message is printed
and the file is not deleted. The program is written in Python and should work
anywhere there is a Python interpreter and a file system that supports hard
links.
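The rename, link, remove sequence can be sketched roughly as follows. This is an illustration under assumptions, not the program's actual code: the function name `replace_with_hard_link` and the `.dedup-tmp` suffix are hypothetical, and moving the renamed file back after a failed link is one possible way to make sure nothing is lost.

```python
import os


def replace_with_hard_link(original, duplicate):
    """Replace duplicate with a hard link to original; return True on success."""
    backup = duplicate + ".dedup-tmp"  # hypothetical temporary name
    if os.path.lexists(backup):
        print(f"error: temporary name already exists: {backup}")
        return False
    try:
        os.rename(duplicate, backup)  # step 1: move the duplicate aside
    except OSError as exc:
        print(f"error: could not rename {duplicate}: {exc}")
        return False
    try:
        os.link(original, duplicate)  # step 2: create the hard link
    except OSError as exc:
        # Linking failed: put the renamed file back and delete nothing.
        os.rename(backup, duplicate)
        print(f"error: could not link {duplicate}: {exc}")
        return False
    os.remove(backup)  # step 3: remove the renamed copy
    return True
```

Because the duplicate's data is only removed in the last step, a failure at any earlier point leaves a complete copy of the file on disk.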
View readme
View source
Download
This generates a file in each directory of the specified directory tree which has an md5sum for each file in the tree. When re-run, it reads the md5sum file and compares each recorded md5sum with the file's current md5sum. It reports any differences, which can be caused by renaming files, adding files, removing files, or a file getting corrupted. I use it on an entire disk partition to notify me if any files get corrupted. Some special system directories are skipped.
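A minimal sketch of the idea, with several assumptions: the manifest name `md5sums.txt`, its "digest, two spaces, file name" line format, and all function names are hypothetical and not taken from the program. The sketch does not skip system directories and does not rewrite an existing manifest after reporting.

```python
import hashlib
import os

MANIFEST = "md5sums.txt"  # hypothetical manifest name, not the program's own


def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def current_sums(directory):
    """Map file name -> MD5 for regular files directly inside directory."""
    sums = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if name != MANIFEST and os.path.isfile(path) and not os.path.islink(path):
            sums[name] = md5_of(path)
    return sums


def read_manifest(directory):
    """Load the recorded file name -> MD5 mapping for one directory."""
    sums = {}
    with open(os.path.join(directory, MANIFEST), encoding="utf-8") as handle:
        for line in handle:
            digest, name = line.rstrip("\n").split("  ", 1)
            sums[name] = digest
    return sums


def write_manifest(directory, sums):
    """Record one 'digest  name' line per file."""
    with open(os.path.join(directory, MANIFEST), "w", encoding="utf-8") as handle:
        for name, digest in sums.items():
            handle.write(f"{digest}  {name}\n")


def check_tree(root):
    """Return (status, path) differences; create manifests where missing."""
    changes = []
    for dirpath, _dirnames, _filenames in os.walk(root):
        now = current_sums(dirpath)
        if not os.path.exists(os.path.join(dirpath, MANIFEST)):
            write_manifest(dirpath, now)  # first run: record, report nothing
            continue
        before = read_manifest(dirpath)
        for name in sorted(before.keys() | now.keys()):
            path = os.path.join(dirpath, name)
            if name not in now:
                changes.append(("removed", path))
            elif name not in before:
                changes.append(("added", path))
            elif before[name] != now[name]:
                changes.append(("changed", path))
    return changes
```

In this sketch a renamed file shows up as one "removed" entry plus one "added" entry, and a corrupted file shows up as "changed".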
The program is generally run from a command line. It reports changes on
the command line, and the output can be redirected.
The program is written in Python and should work anywhere there is a Python
interpreter.
View readme
View source
Download
If you have comments or suggestions, email me at turbo-www@jdeifik.com