Evolution

Software maintenance is a significant problem whose solution will save a great deal of money throughout the software industry. In pursuit of these savings much research is being done. However, as with other software engineering areas, there is a concern that our efforts lack hard evidence and critical evaluation, and that without these, we can't develop a deep understanding of what tools and processes work, when they work, or why. Consequently, many people believe that rigorous empirical methods must be one of the cornerstones of research in this field.

The Code Decay Project

We are conducting a long-term, multidisciplinary project to discover the fundamental causes of, symptoms of, and remedies for code decay. The project team includes researchers in Statistics, Experimentation, Organizational Theory, Programming Languages, Software Engineering, and Visualization.

Our primary object of study is the AT&T 5ESS™ switching system. It is composed of more than 50 subsystems and contains over 18 million lines of code. Along with the source, we have the system's change control history for the past 15 years, covering 3.6 million code changes that implement 672,000 change requests. We also have data on its planned and actual development milestones, effort and testing data, organizational history, development policies, and coding standards.

Our goals are to (1) define response variables and document the existence of code decay, (2) develop code decay indices, (3) identify factors causing it, and (4) create and evaluate prevention strategies.

Responses. Code decay may be very complex or occur only in certain circumstances, and there may be several ways to measure it (effort, total changes, fixes on fixes). Therefore, we will examine a variety of definitions for code decay, document its presence in our data, and develop precise measures (response variables) for it.
Indices. An index is a surrogate measure used when a response variable is difficult to calculate. It can also serve as a predictor function when the response variable isn't well understood.
Factors. After developing responses and indices, we will attempt to determine which factors inside the code or organization cause code decay.
Strategies. Finally, we will attempt to identify and then evaluate strategies for preventing or retarding code decay.

My specific interests lie in analyzing the version control systems of large development efforts. These systems contain significant amounts of data that could be exploited in the study of system evolution but currently are not. Together with my student Jung-Min Kim and Drs. Siy and Thomas Ball, I am exploring novel uses of this information. So far, we have derived VCS-related metrics such as connection strength, which is based on the probability that two classes are modified together. We are exploring several extensions to this work, including time-series analyses, improved visualization techniques, and automated restructuring.
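
To make the idea of connection strength concrete, here is a minimal Python sketch. It is an illustration only, not the metric as defined in the project's publications. It assumes the version control history has already been reduced to a list of change sets, each naming the classes modified by one change. It also assumes one particular normalization: the fraction of changes touching either class that touch both. The function name, the class names, and the sample history are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def connection_strengths(change_sets):
    """Estimate how often each pair of classes is modified together.

    Assumed definition (not taken from the source): for classes A and B,
    strength = (#changes touching both) / (#changes touching A or B).
    """
    touched = Counter()   # class -> number of changes that modified it
    together = Counter()  # (class_a, class_b) -> changes that modified both

    for classes in change_sets:
        unique = sorted(set(classes))  # sort so each pair has one canonical key
        touched.update(unique)
        together.update(combinations(unique, 2))

    strengths = {}
    for (a, b), both in together.items():
        either = touched[a] + touched[b] - both  # inclusion-exclusion
        strengths[(a, b)] = both / either
    return strengths


# Hypothetical change history: each set is the classes touched by one change.
history = [
    {"Parser", "Lexer"},
    {"Parser", "Lexer", "Symtab"},
    {"Parser"},
    {"Symtab"},
]

strengths = connection_strengths(history)
for pair, value in sorted(strengths.items()):
    print(pair, round(value, 2))
```

Pairs that never change together do not appear in the result. Other normalizations, such as dividing by the total number of changes, would give a joint rather than a conditional estimate and may suit different analyses.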

This project is sponsored by the National Science Foundation and the project team contains researchers in programming languages, software engineering, statistics, and scientific visualization. Our industrial partner is Lucent Technologies.

See Karr et al. [1], Porter [2], and Ball [3] for more information.