Here is some software which I have written. It is published under the GPL.

Disk deduplicator

Intro

This searches specified directory trees for duplicate files. When duplicates are found, it removes each duplicate and replaces it with a hard link to the original file. This is a great way to save significant storage space.
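The intro does not say how duplicates are detected. One common approach, shown here as a minimal Python sketch rather than the program's actual code, groups regular files by device and size (hard links cannot cross file systems, and files of different sizes cannot match) and then by MD5 digest. The function names `file_digest` and `find_duplicates` are hypothetical.

```python
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(roots):
    """Return groups of paths on the same device with equal size and MD5 digest."""
    by_key = defaultdict(list)
    seen = set()  # (device, inode) pairs already recorded
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                if not stat.S_ISREG(st.st_mode):
                    continue  # skip symlinks, devices, sockets, etc.
                inode = (st.st_dev, st.st_ino)
                if inode in seen:
                    continue  # already a hard link to a file we have recorded
                seen.add(inode)
                # Hard links cannot span file systems, so group by device too.
                by_key[(st.st_dev, st.st_size)].append(path)
    groups = []
    for paths in by_key.values():
        if len(paths) < 2:
            continue  # a unique (device, size) pair cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Equal MD5 digests are very strong evidence of identical content, but a cautious tool might also confirm byte-for-byte equality with `filecmp.cmp(a, b, shallow=False)` before replacing anything.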

Features

The program is generally run from the command line. It is silent by default, but can report duplicate files and the total bytes saved. It is very safe in operation: it renames the duplicate file, creates the hard link, and only then removes the renamed file. If anything goes wrong, an error message is printed and the file is not deleted. The program is written in Python and should work anywhere there is a Python interpreter, on file systems that support hard links.
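The rename, link, then delete order described above can be sketched in Python as follows. The helper name `replace_with_hard_link` and the `.dedup-tmp` suffix are assumptions for illustration, not the program's actual names.

```python
import os
import sys


def replace_with_hard_link(original, duplicate):
    """Replace `duplicate` with a hard link to `original` (hypothetical helper).

    Order: rename the duplicate aside, create the hard link, then delete the
    renamed copy. If linking fails, the duplicate is renamed back.
    Returns True on success, False otherwise.
    """
    backup = duplicate + ".dedup-tmp"  # assumed temporary name
    if os.path.lexists(backup):
        # os.rename would silently overwrite an existing file on POSIX.
        print(f"refusing to overwrite {backup}", file=sys.stderr)
        return False
    try:
        os.rename(duplicate, backup)
    except OSError as e:
        print(f"cannot rename {duplicate}: {e}", file=sys.stderr)
        return False
    try:
        os.link(original, duplicate)  # creates `duplicate` pointing at `original`
    except OSError as e:
        print(f"cannot link {duplicate} to {original}: {e}", file=sys.stderr)
        os.rename(backup, duplicate)  # restore so no data is lost
        return False
    os.remove(backup)  # only now is the old copy deleted
    return True
```

Because the old copy is only deleted after the link exists, a failure at any step leaves either the original duplicate or the new link in place.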

md5sum a directory tree

Intro

This generates a file in each directory of the specified directory tree containing the md5sum of each file in the tree. When re-run, it reads each md5sum file and compares the stored md5sums with the files' current md5sums. It reports any differences, which can be caused by renamed, added, or removed files, or by a corrupted file. I use it on an entire disk partition to be notified if any files become corrupted. Some special system directories are skipped.
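A minimal Python sketch of the generate-then-compare cycle, assuming one checksum file per directory that lists the files directly in that directory, written in the `DIGEST  NAME` layout used by `md5sum`. The file name `MD5SUMS`, the `SKIP_DIRS` set, and all function names are hypothetical; the program's real file name, format, and list of skipped directories may differ.

```python
import hashlib
import os

SUMS_NAME = "MD5SUMS"  # hypothetical name for the per-directory checksum file
SKIP_DIRS = {"/proc", "/sys", "/dev"}  # assumed examples of special system directories


def md5_of(path, chunk_size=1 << 20):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def current_sums(dirpath, filenames):
    """Digest every regular file in one directory, except the checksum file itself."""
    sums = {}
    for name in filenames:
        path = os.path.join(dirpath, name)
        if name != SUMS_NAME and os.path.isfile(path) and not os.path.islink(path):
            sums[name] = md5_of(path)
    return sums


def read_sums(path):
    """Parse lines in md5sum's 'DIGEST  NAME' format into {name: digest}."""
    sums = {}
    with open(path, encoding="utf-8", errors="surrogateescape") as f:
        for line in f:
            digest, name = line.rstrip("\n").split("  ", 1)
            sums[name] = digest
    return sums


def write_sums(path, sums):
    with open(path, "w", encoding="utf-8", errors="surrogateescape") as f:
        for name in sorted(sums):
            f.write(f"{sums[name]}  {name}\n")


def check_tree(root):
    """Write missing checksum files and compare existing ones; return report lines."""
    report = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune special directories so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if os.path.join(dirpath, d) not in SKIP_DIRS]
        sums_path = os.path.join(dirpath, SUMS_NAME)
        now = current_sums(dirpath, filenames)
        if not os.path.exists(sums_path):
            write_sums(sums_path, now)  # first run: record current state
            continue
        old = read_sums(sums_path)
        for name in sorted(old.keys() - now.keys()):
            report.append(f"missing: {os.path.join(dirpath, name)}")
        for name in sorted(now.keys() - old.keys()):
            report.append(f"new: {os.path.join(dirpath, name)}")
        for name in sorted(old.keys() & now.keys()):
            if old[name] != now[name]:
                report.append(f"changed: {os.path.join(dirpath, name)}")
    return report
```

In this sketch a renamed file shows up as one `missing` and one `new` entry, and a corrupted file as `changed`. It leaves an existing checksum file untouched after comparing; whether to rewrite it after reporting is a design choice the description above does not settle.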

Features

The program is generally run from the command line. It reports changes to the terminal, and the output can be redirected to a file. The program is written in Python and should work anywhere there is a Python interpreter.

If you have comments or suggestions, email me at turbo-www@jdeifik.com