Few controlled experiments have been run in large, commercial software projects. Most are run in the laboratory with only graduate students as subjects. With the help of Lawrence Votta of Lucent Technologies, I was allowed to run these experiments within a live, commercial software development project. This project was developing the compiler and run-time support for the Lucent 5ESS telephone switching system. The finished system contains about 75K lines of C++ code. To manage the daily operations, Harvey Siy joined the development team. The data collection phase ran for 18 months, during which we observed 88 inspections.

Along with Drs. Votta and Siy, I studied the effect of process structure on inspection effort, interval, and effectiveness. Prior to this study, we reviewed several inspection methods and identified key differences in their structure. These methods were Fagan Inspections [2], Active Design Reviews [3], N-Fold Inspections [4], Phased Inspections [5], and Two-Person Inspections [6]. The main “structural” differences are the size of the review team, the number of teams, and the strategy used to coordinate multiple teams.

When a code unit was ready to be inspected, Dr. Siy assigned the method with which it would be inspected (the treatment). He did this by randomly manipulating three independent variables: the team size (1, 2, or 4 reviewers), the number of inspection teams (1 or 2), and, for 2-team inspections, the coordination strategy (either independent inspections, or two sequential inspections with repair in between). The reviewers for each inspection were randomly selected without replacement from a pool of 17 experienced software developers. The dependent variables for each inspection included inspection interval, total person-hours of effort, and the observed defect density. We captured many other data, including repair statistics for every defect. Our null hypothesis was that none of the independent variables affected inspection effort, interval, or effectiveness.
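The assignment procedure described above can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the study. The function name `assign_treatment`, the reviewer labels, and the strategy labels are hypothetical. The sketch also assumes that every combination of team size and team count is allowed and equally likely, and that "without replacement" applies within a single inspection. The text states neither of these.

```python
import random

# Levels of the three independent variables named in the text.
TEAM_SIZES = (1, 2, 4)
TEAM_COUNTS = (1, 2)
STRATEGIES = ("independent", "sequential_with_repair")  # hypothetical labels


def assign_treatment(reviewer_pool, rng):
    """Draw one treatment and its reviewers at random (uniform draws are an assumption)."""
    team_size = rng.choice(TEAM_SIZES)
    num_teams = rng.choice(TEAM_COUNTS)
    # The coordination strategy only exists for 2-team inspections.
    strategy = rng.choice(STRATEGIES) if num_teams == 2 else None
    # Sample without replacement so no reviewer appears twice in one inspection.
    chosen = rng.sample(reviewer_pool, team_size * num_teams)
    teams = [chosen[i * team_size:(i + 1) * team_size] for i in range(num_teams)]
    return {
        "team_size": team_size,
        "num_teams": num_teams,
        "strategy": strategy,
        "teams": teams,
    }


# Hypothetical pool of 17 developers, matching the pool size given in the text.
pool = [f"dev{i:02d}" for i in range(1, 18)]
treatment = assign_treatment(pool, random.Random(0))
print(treatment)
```

Passing a seeded `random.Random` instance keeps each assignment reproducible, which helps when the randomization has to be audited later.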

Surprisingly, none of the independent variables had a significant effect on effectiveness. Consequently, we suspect that simply restructuring the inspection process (the approach taken by most research in this area) will not significantly increase effectiveness. We also found that many of the assumptions underlying contemporary methods did not hold in practice. While inspections with one reviewer were less effective than those with two, inspections with two reviewers were no less effective than those with four (an argument for smaller teams). Also, teams that reviewed the same code unit found few common defects (an argument against multiple-team, sequential inspections).

Not surprisingly, effort appears to be driven solely by the total number of reviewers participating in the inspection. None of the independent variables had a significant effect on interval. However, the pre-meeting interval (the time from the start of an inspection to the beginning of team analysis) of 2-team, 2-person inspections with repair in between was about twice as long (4 weeks versus 2 weeks) as all other pre-meeting intervals. Further investigation showed that the time needed to schedule the second inspection was responsible for the delay. These findings suggest that even simple types of coordination may substantially increase a process’s interval.

See Porter et al. [8][9] for a complete description of the experiment and its results.