I noticed some duplicate code in a program that I help maintain and thought "wouldn't it be nice to have a tool that found duplicate lines of code?"
After a quick chat with a co-worker we decided that we could make something that read in every source line, trimmed it, hashed it, and appended that hash to a List for that file. Each hash would also go into a Map whose value is a List of every file and line where that hash occurs. Once all files were read, we would go through each group of lines sharing a hash (from the Map) and compare the hashes of the lines that follow to see how far the match extends. It wasn't a perfect design but it would probably work well enough.
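For the curious, here is roughly what that design might look like in Java. This is only a minimal sketch of the idea we talked through, not something we actually built: the class name `DuplicateLineFinder`, the `Location` helper, the `minLines` threshold, and the report format are all made up for illustration. It uses `String.hashCode()` as the hash, so two different lines can occasionally collide, and it reports every matching start position, so a 3-line duplicate also shows up as a 2-line duplicate starting one line later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DuplicateLineFinder {
    /** Hypothetical helper: a file name plus a zero-based line index. */
    static final class Location {
        final String file;
        final int line;
        Location(String file, int line) { this.file = file; this.line = line; }
    }

    // File name -> hashes of its trimmed lines, in line order.
    private final Map<String, List<Integer>> hashesByFile = new HashMap<>();
    // Hash -> every place a line with that hash occurs.
    private final Map<Integer, List<Location>> locationsByHash = new HashMap<>();

    /** Reads one file's lines: trim, hash, and record the hash in both structures. */
    void addFile(String name, List<String> lines) {
        List<Integer> hashes = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String trimmed = lines.get(i).trim();
            int hash = trimmed.hashCode();
            hashes.add(hash);
            if (!trimmed.isEmpty()) { // blank lines are not useful starting points
                locationsByHash.computeIfAbsent(hash, k -> new ArrayList<>())
                               .add(new Location(name, i));
            }
        }
        hashesByFile.put(name, hashes);
    }

    /** Counts how many consecutive line hashes match, starting at a and b. */
    private int matchLength(Location a, Location b) {
        List<Integer> ha = hashesByFile.get(a.file);
        List<Integer> hb = hashesByFile.get(b.file);
        int n = 0;
        while (a.line + n < ha.size() && b.line + n < hb.size()
                && ha.get(a.line + n).equals(hb.get(b.line + n))) {
            n++;
        }
        return n;
    }

    /** Reports every pair of locations that start a run of at least minLines matching lines. */
    List<String> findDuplicates(int minLines) {
        List<String> report = new ArrayList<>();
        for (List<Location> locs : locationsByHash.values()) {
            for (int i = 0; i < locs.size(); i++) {
                for (int j = i + 1; j < locs.size(); j++) {
                    Location a = locs.get(i);
                    Location b = locs.get(j);
                    int len = matchLength(a, b);
                    if (len >= minLines) {
                        // Line numbers in the report are one-based.
                        report.add(a.file + ":" + (a.line + 1) + " <-> "
                                + b.file + ":" + (b.line + 1) + " (" + len + " lines)");
                    }
                }
            }
        }
        return report;
    }
}
```

To use it you would call `addFile` once per source file (for example with the result of `Files.readAllLines`) and then `findDuplicates(2)` or some larger threshold. The pairwise comparison inside each hash group is quadratic, so very common lines such as a lone `}` would make this slow on a big code base, which is one of the reasons the design "wasn't perfect."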
Before I went any further I decided to check Google and, lo and behold, I found PMD's copy/paste detector, CPD (http://pmd.sourceforge.net/cpd.html). It is on its 3rd algorithm and "now it can process the JDK 1.4 java.* packages in about 4 seconds" and "works with Java, C, C++, and PHP code." I guess there isn't much point in writing my own version. :-) Remember to always do a quick search before coding something new.