I have been using AntiDupl.NET for quite some time now.
It builds and stores databases of your image collection(s), finds similarities & differences between images, remembers false positives, and is very fast.
http://antidupl.sourceforge.net/english/index.html
e.d.
P.S.: It is free & open source, too.