CMSC 828/838: Information Visualization
Jeff Carver
Application Project: NASA SEL Defect Data using Spotfire
October 5, 1999
Motivation and Background
In software development, successful companies are interested in continually improving their
development processes. Many kinds of data can be examined to guide that improvement, and defect data
is one of the most important because it shows directly where mistakes were made. Some of those
mistakes may have been unavoidable, but many are due to human error, which may take the form of
commission, omission, misunderstanding, etc. If a software development company is able to determine
that a specific type of error is common across many projects, then it has located an ideal place to
start improving its process. Data visualization can be very useful for this. What a company is
looking for is a correlation between some aspect of the defects and the effort taken to fix those
defects.
My Dataset
I have chosen to examine a set of defects that were found in software created in the SEL
(Software Engineering Laboratory) at NASA’s Goddard Space Flight Center. The dataset consists of 3095
defects that were found over multiple projects (Excel file). For a description of what the fields
represent, see Table 1; for the possible values of those fields, see Table 2.
Using Spotfire, I have attempted to look for correlation between various attributes of these defects
and the effort to fix the defect.
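One way to quantify such a relationship outside Spotfire is sketched below. Because several of the attributes take only a few ordered values, a rank correlation such as Spearman's is a reasonable choice. This is only a sketch under assumptions: the field names and sample values are hypothetical and are not taken from the SEL dataset.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical values for illustration only (not real SEL data).
components_examined = [1, 1, 2, 3, 5, 8]
effort_to_fix = [1, 1, 2, 2, 3, 4]  # assumed ordinal effort codes
print(round(spearman(components_examined, effort_to_fix), 3))
```

A value near 1 would indicate that defects touching more components tend to take more effort to fix; a value near 0 would indicate little monotonic relationship.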
Interesting information about the data
- Confirmation of logical assumptions
  - Strong relationship between the number of components examined and the effort required for fixing
    the defect. (Figure 1)
  - Strong relationship between the effort required to isolate the defect and the effort required for
    fixing the defect. (Figure 2)
- Unexpected findings
  - Weak relationship between the severity of the error and the effort required for fixing it. (Figure 3)
- New information
  - The type of defect is correlated with the effort required for fixing it. The ordering, from easiest
    to most difficult, is Other < Error Correction < Adaptation to Environment < Implementation
    of Requirement < Enhancement (Figure 4)
Critique and suggestions for improvement
Spotfire is a good tool if your data is continuous in the attribute that you wish to examine.
In the case of my dataset, the attributes of interest have only a small number of discrete values.
Because of this, many of the data points lie on top of each other. The tool is still useful for
finding correlations, but I would suggest a feature similar to the call-out lens presented in the
Moveable Filters paper. Another problem that I had with Spotfire was that many of the attributes
that I was interested in exploring were unordered. Because the attribute values had no inherent
ordering, I would have liked to be able to rearrange the columns in real time to examine the
relationships among the values. As an example, I wished to determine the order of increasing
difficulty in fixing different types of errors. Because I didn’t know a priori what the order
should be, I could not force an ordering on the columns. The initial ordering of the columns
didn’t tell me what I wanted to know (Figure 5). So I had to examine the relationship between each
pair (Figure 6, Figure 7, Figure 8, Figure 9), and I was able to determine the ordering by hand.
Had I been able to reorder the columns, I could have found this result much more easily. This is
what the correlation looks like with the newly discovered ordering forced on the columns (Figure 4).
I had the same problem with the class and source information.
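The by-hand ordering described above can also be approximated numerically. The following is a minimal sketch, assuming each defect is reduced to a (category, effort) pair with effort coded as a number; it sorts the categories by mean effort. The function name and the sample values are made up for illustration and do not come from the SEL data.

```python
from collections import defaultdict


def order_by_mean_effort(records):
    """Return category names sorted from lowest to highest mean effort.

    records: iterable of (category, effort) pairs, where effort is numeric.
    Ties in mean effort are broken alphabetically so the result is stable.
    """
    totals = defaultdict(lambda: [0.0, 0])  # category -> [sum of effort, count]
    for category, effort in records:
        totals[category][0] += effort
        totals[category][1] += 1
    means = {cat: total / count for cat, (total, count) in totals.items()}
    return sorted(means, key=lambda cat: (means[cat], cat))


# Invented effort codes for illustration only (not real SEL data).
sample = [
    ("Enhancement", 4), ("Error Correction", 2), ("Other", 1),
    ("Enhancement", 3), ("Error Correction", 1), ("Other", 1),
]
for category in order_by_mean_effort(sample):
    print(category)
```

The resulting order could then be imposed on the columns of an unordered attribute, such as defect type, class, or source, before plotting it against effort.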
Web Accessibility