Assume you have a directory or directories of files. Some of them are duplicated. You want to remove the duplicates to conserve space. This is the program in Python:

# - Sat, 23 Apr 2011 00:03:30 -0400
# As command line arguments are files of md5sum's. The md5sums are read one
# by one in their order and all duplicated files are removed, only the
# files of their first appearance are kept in their old path.

import sys;
import os.path;

f = {}	# File hashes

for file in sys.argv:
	md5file = open(file)
	for line in
		if f.has_key(line[0:32]):
			print "Remove "+line[34:]
			print "Keep "+line[34:]
			f[line[0:32]] = line[34:]

The code assumes the files to search for duplicates are filtered through md5sum, i.e.

find dir/ -type f -exec md5sum \{\} \; > md5sum.file

The files of md5sum are provided as command line arguments. The order is significant: in case of duplicate according to MD5 hash, only the first appeared file is kept.