Duplicate code

Duplicate code detection finds code that was produced by copy-and-paste programming. Duplicate code typically leads to higher maintenance costs because each bug must be fixed in every copy, more code needs to be tested, and so on.

There are many trade-offs when writing a duplicate code detection tool. Some of the conflicting goals are:

  • Speed
  • Low memory usage
  • Avoiding false alarms
  • Support for arbitrary programming languages (Java, JSP, C++, ...)
  • Support for fuzzy matches (comments, whitespace, linebreaks, variable renaming, etc.)

The check provided here, StrictDuplicateCode, is fast enough to check very large code bases in acceptable time (minutes). It consumes very little memory, and false alarms are impossible. While it supports multiple languages, it does not support fuzzy matches, which is why it is called Strict.

Note that there are brilliant commercial implementations of duplicate code detection tools. One that is particularly noteworthy is Simian by Simon Harris. Simian strikes a very good balance among the trade-offs listed above and is superior to the checks in this package in many respects. It is reasonably priced (free for noncommercial projects) and includes a Checkstyle plugin. We encourage all users of Checkstyle to evaluate Simian as an alternative to the checks we offer in our distribution.

The following table summarizes the characteristics of the available Checkstyle plugins for duplicate code detection:

Name                 Speed      Memory Usage  False Alarms                Supported Languages                          Fuzzy Matches
StrictDuplicateCode  High       Very low      Impossible                  Any language                                 No
Simian               Very high  Low           Possible but very unlikely  Many languages, including Java and C/C++/C#  Limited support


StrictDuplicateCode performs a line-by-line comparison of all code lines and reports duplicate code if a sequence of lines differs only in indentation. All import statements in Java code are ignored; every other line (including Javadoc, whitespace lines between methods, etc.) is considered, which is why the check is called strict.
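
The comparison described above can be illustrated with a small Java sketch. It is not the Checkstyle implementation; it is a naive quadratic comparison of two files written only to show the matching rules. The class name StrictDuplicateSketch, the nested Match type, and the rule that an import line is any line starting with "import " after trimming are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these names do not exist in Checkstyle.
final class StrictDuplicateSketch {

    /** One reported duplicate: 0-based start line in each file and the block length. */
    static final class Match {
        final int startA;
        final int startB;
        final int length;

        Match(int startA, int startB, int length) {
            this.startA = startA;
            this.startB = startB;
            this.length = length;
        }
    }

    /**
     * Removes indentation and trailing whitespace so that lines differing
     * only in indentation compare equal. Import lines become null so that
     * they never take part in a match (assumed detection rule).
     */
    static String normalize(String line) {
        String trimmed = line.trim();
        return trimmed.startsWith("import ") ? null : trimmed;
    }

    /** Returns every maximal run of at least min consecutive matching lines. */
    static List<Match> findDuplicates(List<String> a, List<String> b, int min) {
        // run[i][j] = length of the matching run ending at a[i-1] and b[j-1].
        // The extra row and column stay zero and simplify the look-ahead below.
        int[][] run = new int[a.size() + 2][b.size() + 2];
        for (int i = 1; i <= a.size(); i++) {
            String lineA = normalize(a.get(i - 1));
            for (int j = 1; j <= b.size(); j++) {
                String lineB = normalize(b.get(j - 1));
                if (lineA != null && lineA.equals(lineB)) {
                    run[i][j] = run[i - 1][j - 1] + 1;
                }
            }
        }

        List<Match> matches = new ArrayList<>();
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                int len = run[i][j];
                // Report a run only where it ends, so each block appears once.
                if (len >= min && run[i + 1][j + 1] == 0) {
                    matches.add(new Match(i - len, j - len, len));
                }
            }
        }
        return matches;
    }
}
```

Blank lines are trimmed to empty strings and therefore count toward a match, which mirrors the strict behaviour described above. A real check over a large code base would need something faster than this pairwise table, such as comparing per-line checksums.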


name            description                                                type        default value
min             how many lines must be equal to be considered a duplicate  int         12
fileExtensions  file type extension of files to process                    String Set  {}


To configure the check:

 <module name="StrictDuplicateCode"/>

To configure the check so that it allows larger equivalent blocks:

 <module name="StrictDuplicateCode">
   <property name="min" value="15"/>
 </module>



Parent Module