Clone Code Detection Tools and Algorithms
For my own research, I have compiled a list of publicly available tools for "clone detection", that is, the detection of identical or similar code fragments in given source programs.
| Name | Supported languages | Approach | License and implementation | Usage |
|---|---|---|---|---|
| Duploc | C, C++, Java | line-by-line string matching | GPL, written in Smalltalk | |
| PHP:Duploc | PHP | ??? | GPL, written in PHP | |
| pmd (cpd) | Java | Karp-Rabin string matching algorithm | BSD-style | Use the Ant task shown at the link (to avoid enumerating file names) |
| Simian | Java, C#, C, C++, COBOL, Ruby, JSP, ASP, HTML, XML, Visual Basic, or any text files | ??? | Free for non-commercial and open source use, written in .NET 1.1 or Java 1.4, source code unavailable | java -jar .../simian-2.2.4.jar -recurse *.java |
| SimScan | Java | works on the parsed source tree (using ANTLR parser) | Free for non-commercial and open source use, written in Java, source code unavailable | |
| dupwatch (dupwatch.jar, dupwatch.tgz) | Java | Finding duplicated code via "metric fingerprints" | ??? | |
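
As a rough illustration of the Karp-Rabin approach listed for pmd (cpd) above, the following is a minimal sketch of rolling-hash duplicate detection over lines. It is not CPD's actual implementation: the function names `normalize` and `find_duplicate_windows`, the fixed `window` size, and the hash parameters `base` and `mod` are all assumptions made for this example, and real tools typically hash tokens rather than whole lines.

```python
from collections import defaultdict


def normalize(line: str) -> str:
    """Collapse whitespace so formatting differences do not hide clones."""
    return " ".join(line.split())


def find_duplicate_windows(lines, window=3, base=257, mod=(1 << 61) - 1):
    """Return sorted groups of start indices whose `window` lines match.

    A Karp-Rabin style rolling hash finds candidate windows; candidates are
    then compared directly to rule out hash collisions.
    """
    norm = [normalize(line) for line in lines]
    n = len(norm)
    if n < window:
        return []

    # One integer code per line (hypothetical choice: Python's built-in hash).
    codes = [hash(text) % mod for text in norm]
    high = pow(base, window - 1, mod)  # weight of the line leaving the window

    # Hash of the first window.
    h = 0
    for code in codes[:window]:
        h = (h * base + code) % mod

    buckets = defaultdict(list)
    buckets[h].append(0)

    # Slide the window one line at a time, updating the hash in O(1).
    for start in range(1, n - window + 1):
        h = (h - codes[start - 1] * high) % mod
        h = (h * base + codes[start + window - 1]) % mod
        buckets[h].append(start)

    # Verify candidates by comparing the actual normalized text.
    groups = []
    for starts in buckets.values():
        if len(starts) < 2:
            continue
        by_text = defaultdict(list)
        for s in starts:
            by_text[tuple(norm[s:s + window])].append(s)
        groups.extend(g for g in by_text.values() if len(g) > 1)
    return sorted(groups)
```

For example, calling `find_duplicate_windows(source.splitlines(), window=3)` on a file's text returns lists of line indices at which the same three-line run begins. Overlapping windows are reported separately; merging them into maximal clones is left out of this sketch.
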
- Code clone related tools summarized by Osaka University, the developer of CloneWarrior/CCFinder/Gemini (currently available only upon request)
- JPlag: a tool for detecting software plagiarism. An account is available on request.
- Moss: a tool for detecting cheating in university programming classes. It is a free Internet service available only to instructors of programming courses.
- CloneDR (Clone Doctor): a famous but commercial tool. Their paper says it is based on ASTs, ignoring identifier names.
- Dup and Pdiff by Brenda S. Baker at Bell Labs. Only the papers are available.
- Dotplot: similarity pattern visualization tool
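
The dotplot idea mentioned in the last item can be sketched in a few lines: cell (i, j) of a matrix is marked when line i matches line j, so duplicated runs show up as diagonals off the main diagonal. This is only a minimal sketch under simple assumptions (exact matching after whitespace normalization, blank lines ignored); the names `dotplot` and `render` are hypothetical and do not come from any of the tools listed above.

```python
def dotplot(lines):
    """Build a boolean matrix where cell [i][j] is True if line i matches line j.

    Lines are compared after collapsing whitespace; blank lines never match.
    """
    norm = [" ".join(line.split()) for line in lines]
    return [[a == b and a != "" for b in norm] for a in norm]


def render(matrix):
    """Render the matrix as text, using '#' for a match and '.' otherwise."""
    return "\n".join(
        "".join("#" if cell else "." for cell in row) for row in matrix
    )


if __name__ == "__main__":
    sample = ["x = 1", "y = 2", "x = 1", "y = 2"]
    print(render(dotplot(sample)))
```

The matrix is quadratic in the number of lines, so this form is only practical for small inputs or for plotting one file against another.
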