In most disciplines, the evolution of knowledge involves learning by observing, formulating theories, and experimenting. Theory formulation represents the encapsulation of knowledge and experience; it is used to create and communicate our basic understanding of the discipline. Checking that our understanding is correct involves testing our theories, i.e., experimentation in some form. Analyzing the results of experimental studies promotes learning and the ability to change and refine our theories. These steps take time, which is why the understanding of a discipline, and of its research methods, evolves over time.

The paradigm of encapsulating knowledge into theories, and of validating and verifying those theories based upon experimentation, empirical evidence, and experience, is used in many fields, e.g., physics, medicine, and manufacturing.

What do these fields have in common? They evolved as disciplines when they began learning by applying the cycle of observation, theory formulation, and experimentation. In most cases, they began with observation and the recording of what was observed in theories or specific models. They then evolved to manipulating the variables and studying the effects of change in the variables.

How does the paradigm differ for these fields? The differences lie in the objects they study, the properties of those objects, the properties of the system that contains them, and the relationship of the objects to the system. So there are differences in how theories are formulated, how models are built, and how studies are performed, and these differences often affect the details of the research methods.

Software engineering has things in common with each of these disciplines, as well as several differences from them.

In physics, there are theorists and experimentalists, and the discipline has progressed because of the interplay between the two groups. Theorists build models to explain the universe, and these models predict the results of events that can be measured. The models may be based on theory, drawn from an understanding of the essential variables and their interactions, or on data from prior experiments, or, better yet, on both. Experimentalists observe and measure, i.e., they carry out studies to test or disprove a theory or to explore a new domain. But at whatever point the cycle is entered, there is a pattern of modeling, experimenting, learning, and remodeling.

The early Greek model of science was that observation, followed by logical thought, was sufficient for understanding. It took Galileo, and his dropping of balls off the tower at Pisa, to demonstrate the value of experimentation. Eddington's study of the 1919 eclipse differentiated the domain of applicability of Einstein's theories from that of Newton's.

In medicine, we have researchers and practitioners. The researcher aims at understanding the workings of the human body and the effects of various variables, e.g., procedures and drugs. The practitioner aims at applying that knowledge by manipulating those variables for some purpose, e.g., curing an illness. There is a clear relationship between the two; knowledge is often built by feedback from the practitioner to the researcher.

Medicine began as an art form. It evolved as a field when its practitioners began observing and formulating theories. For example, Harvey's controversial theory about the circulation of blood through the body was the result of many careful experiments, performed while he practiced medicine in London. Experimentation varies from controlled experiments to qualitative analysis. Depending on the area of interest, data may be hard to acquire, and human variance causes problems in interpreting results. However, our knowledge of the human body has evolved over time.

The focus in manufacturing is to better understand and control the relationship between process and product for quality control. The nature of the discipline is that the same product is generated over and over, based upon a set of processes, which allows the building of models with small tolerances. Manufacturing made tremendous strides in improving productivity and quality when it began to focus on observing, building models, and experimenting with variations in the process, measuring their effects on the revised product, and building models of what was learned.

This journal is dedicated to the position that, like other disciplines, software engineering requires the cycle of model building, experimentation, and learning, and to the belief that software engineering requires empirical study as one of its components. There are researchers and practitioners. Research has an analytic and an experimental component. The role of the researcher is to build models of, and understand the nature of, processes, products, and the relationship between the two in the context of the system in which they live. The practitioner's role is to build "improved" systems using the knowledge available, and to provide feedback. But, as in medicine (e.g., Harvey), the distinction between researcher and practitioner is not absolute; some people do both, either at the same time or at different times in their careers. This mix is especially important when planning empirical studies and when formulating models and theories.

As in manufacturing, these roles are symbiotic. The researcher needs laboratories, and these exist only where practitioners build software systems. The practitioner needs to better understand how to build systems more productively and profitably, and the researcher can provide the models to help this happen.

Just as the early model of science evolved from learning based purely on logical thought to learning via experimentation, so must software engineering evolve. It has a similar need to move from simple assertions about the effects of a technique to a scientific discipline based upon observation, theory formulation, and experimentation.

To understand how model building and empirical studies need to be tailored to the discipline, we first need to understand the nature of the discipline. What characterizes the software engineering discipline? Software is developed, not produced; here it is unlike manufacturing. The technologies of the discipline are human based. As in medicine, it is hard to build models and verify them via experiments. As with the other disciplines, there is a large number of variables that cause differences, and their effects need to be studied and understood. Currently, there is a lack of models that allow us to reason about the discipline, a lack of recognition of the limits of technologies in certain contexts, and a lack of analysis and experimentation.

There has been empirical analysis and model building in software engineering, but the studies are often isolated events. For example, in one of the earliest empirical studies, Belady and Lehman ('72, '76) observed the behavior of OS/360 across releases. They posed several theories, based upon their observations, concerning the entropy of systems. The idea of entropy, that you might redesign a system rather than continue to change it, was a revelation. On the other hand, Basili and Turner ('75) observed that a compiler system developed using an incremental development approach gained structure over time. This appears contradictory. But under what conditions is each phenomenon true? What were the variables that caused the different effects? What were the different variables in the second case? Where are the studies that provide some insight into the effect of such variables as size, methods, and the nature of the changes? We can hypothesize, but what evidence do we have to support those hypotheses?

In another area, Walston and Felix ('77) identified 29 variables that had an effect on software productivity in the IBM FSD environment. Boehm ('81) observed that 15 variables seemed sufficient to explain and predict the cost of a project across several environments. Bailey and Basili ('81) identified two composite variables that, when combined with size, were a good predictor of effort in the SEL environment. There were many other cost models at the time. Why were the variables different? What did the data tell us about the relationships among the variables?

Clearly, answering these questions requires more empirical studies that will allow us to evolve our knowledge of the variables of the discipline and the effects of their interactions.

In our discipline, there is little consensus on terminology; usage often depends on whether the researcher's background lies in the physical sciences, the social sciences, medicine, etc. One of the roles of this journal is to begin to focus on a standard set of definitions.

We tend to use the word experiment broadly, i.e., as a research strategy in which the researcher has control over some of the conditions in which the study takes place and control over the independent variables being studied; an operation carried out under controlled conditions in order to discover an unknown effect or law, to test or establish a hypothesis, or to illustrate a known law. This term thus includes quasi-experiments and pre-experimental designs. We use the term study to mean an act or operation for the purpose of discovering something unknown or of testing a hypothesis. This covers various forms of research strategies, including all forms of experiments, qualitative studies, surveys, and archival analyses. We reserve the term controlled experiment to mean an experiment in which the subjects are randomly assigned to experimental conditions, the researcher manipulates an independent variable, and the subjects in different experimental conditions are treated similarly with regard to all variables except the independent variable.
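To make the randomization requirement of a controlled experiment concrete, here is a minimal Python sketch, assuming a simple design with two experimental conditions. The function name `assign_randomly`, the subject labels, and the condition names are hypothetical and only illustrative. The sketch covers the assignment step alone; manipulating the independent variable and treating the groups alike in every other respect are matters of experimental procedure, not code.

```python
import random


def assign_randomly(subjects, conditions, seed=None):
    """Randomly assign subjects to experimental conditions (hypothetical helper).

    Shuffles the subjects, then deals them round-robin across the conditions,
    so group sizes differ by at most one.
    """
    rng = random.Random(seed)  # seeded generator so the assignment can be recorded and reproduced
    shuffled = list(subjects)  # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, subject in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(subject)
    return groups


# Hypothetical example: 10 subjects, two techniques as levels of the independent variable.
subjects = [f"S{i:02d}" for i in range(1, 11)]
groups = assign_randomly(subjects, ["technique_A", "technique_B"], seed=1996)
for condition, members in groups.items():
    print(condition, members)
```

Seeding the generator makes the assignment itself something that can be reported and replicated. Dealing subjects round-robin after the shuffle is only one of several possible schemes, chosen here to keep the groups balanced in size.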

As a discipline, software engineering, and more particularly its empirical component, is at a very primitive stage in its development. We are learning how to build models, how to design experiments, how to extract useful knowledge from experiments, and how to extrapolate that knowledge. We believe there is a need for all kinds of studies: descriptive, correlational, and cause-effect studies; studies on novices and experts; studies performed in a laboratory environment or in real projects; quantitative and qualitative studies; and replicated studies.

We expect that, over time, we will see a maturing of the empirical component of software engineering. The sophistication of the goals of our experiments, and our ability to understand interesting things about the discipline, will evolve over time. We would like to see a pattern of knowledge building from series of experiments: researchers building on each other's work, combining experimental results, and replicating studies under similar and differing conditions.

This journal is a forum for that learning process. In some cases our experiments, like those in the early stages of other disciplines, will be primitive. They will have both internal and external validity problems. Some of these problems will stem from the nature of the discipline, affecting our ability to generate effective models or effective laboratory environments. These problems will always be with us, as they are with any discipline as it evolves and learns about itself. Other problems will stem from our immaturity in understanding experimentation as a discipline, e.g., not choosing the best possible experimental design or not choosing the best way to analyze the data. But we can learn from weakly designed experiments how to design them better, and we can learn how to better analyze the data. This journal encourages people to discuss the weaknesses in their experiments. We encourage authors to provide their data to the journal so that other researchers may re-analyze them.

The journal supports the publication of artifacts and laboratory manuals. For example, in this issue, the paper "The Empirical Investigation of Perspective-based Reading" has an associated laboratory manual that will be made available on the Kluwer ftp site. It contains everything needed to replicate the experiment, including both the artifacts used and the procedures for analysis. It is hoped that the papers in this journal will reflect both successes and failures in experimentation, and that they will display the problems encountered and the attempts to learn how to do things better. At this stage we hope to be open and to support the evolution of the experimental discipline in software engineering.

We ask researchers to critique their own experiments, and we ask reviewers to evaluate experiments in the context of the current state of the discipline. Remember that, because of the youth of the experimental side of our discipline, our expectations cannot yet be the same as those of more mature disciplines, such as physics and medicine.

The goal of this journal is to contribute to a better scientific and engineering basis for software engineering.


J. Bailey and V. R. Basili, "A Meta-Model for Software Development Resource Expenditures," Proceedings of the Fifth International Conference on Software Engineering, San Diego, USA, pp. 107-116, March 1981.

V. R. Basili and A. J. Turner, "Iterative Enhancement: A Practical Technique for Software Development," IEEE Transactions on Software Engineering, Vol. SE-1, No. 4, December 1975.

L. A. Belady and M. M. Lehman, "An Introduction to Growth Dynamics," Statistical Computer Performance Evaluation, Academic Press, New York, 1972.

L. A. Belady and M. M. Lehman, "A Model of Large Program Development," IBM Systems Journal, Vol. 15, No. 3, pp. 225-252, 1976.

B. W. Boehm, "Software Engineering Economics," Prentice-Hall, Englewood Cliffs, NJ, 1981.

C. Walston and C. Felix, "A Method of Programming Measurement and Estimation," IBM Systems Journal, Vol. 16, No. 1, pp. 54-73, 1977.

Vic Basili

Empirical Software Engineering, vol. 1, no. 2, 1996