CMSC 828/838: Information Visualization

Jeff Carver

Application Project: NASA SEL Defect Data using Spotfire

October 5, 1999


Motivation and Background

Successful software companies are interested in continually improving their development processes. Many kinds of data can be examined to guide that improvement, and defect data is one of them. Defect data is important to software companies because it shows directly where mistakes were made. Some of those mistakes may have been unavoidable, but many are due to human error, which may take the form of commission, omission, misunderstanding, etc. If a software development company is able to determine that a specific type of error is common across many projects, then it has located an ideal place to start improving its process. Data visualization can be very useful for this. What a company is looking for is a correlation between some aspect of the defects and the effort taken to fix those defects.

My Dataset

I have chosen to examine a set of defects that were found in software created in the SEL (Software Engineering Laboratory) at NASA’s Goddard Space Flight Center. The dataset consists of 3095 defects that were found over multiple projects (Excel file). For a description of what the fields represent, see Table 1; for the possible values of those fields, see Table 2. Using Spotfire, I have looked for correlations between various attributes of these defects and the effort required to fix them.
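As a rough, tool-independent check on the same question, the defects can be counted for each pair of attribute value and effort value. The sketch below is only an illustration: the field names "error_type" and "effort" and the sample rows are hypothetical placeholders, not the actual SEL fields or data (those are described in Table 1 and Table 2).

```python
from collections import Counter


def cross_tabulate(records, attribute_key, effort_key):
    """Count defects for each (attribute value, effort value) pair."""
    return Counter((r[attribute_key], r[effort_key]) for r in records)


# Made-up rows for illustration only; not taken from the SEL dataset.
sample = [
    {"error_type": "A", "effort": 1},
    {"error_type": "A", "effort": 1},
    {"error_type": "A", "effort": 2},
    {"error_type": "B", "effort": 3},
    {"error_type": "B", "effort": 3},
]

table = cross_tabulate(sample, "error_type", "effort")
for (value, effort), count in sorted(table.items()):
    print(value, effort, count)
```

If the counts for one attribute value cluster at the high-effort end, that value is a candidate for closer inspection in the visualization.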

Interesting information about the data



Critique and suggestions for improvement

Spotfire is a good tool if the attribute you wish to examine is continuous. In my dataset, the attributes of interest have only a small number of discrete values. Because of this, many of the data points lie on top of each other. The tool is still useful for finding correlations, but I would suggest adding a feature similar to the call-out lens presented in the Moveable Filters paper.

Another problem I had with Spotfire was that many of the attributes I was interested in exploring were unordered. Because the attribute values had no inherent ordering, I would have liked to be able to rearrange the columns in real time to examine the relationships among the values. As an example, I wished to order the error types by increasing difficulty of fixing them. Because I didn’t know a priori what the order should be, I could not force an ordering on the columns. The initial ordering of the columns didn’t show me what I wanted to know (Figure 5). So I had to examine the relationship between each pair (Figure 6, Figure 7, Figure 8, Figure 9) and determine the ordering by hand. Had I been able to reorder the columns, I could have found this result much more easily. Figure 4 shows what the correlation looks like with the newly discovered ordering forced on the columns. I had the same problem with the class and source information.
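One way to approximate the ordering I worked out by hand is to sort the values of an unordered attribute by the average effort of their defects. This is a minimal sketch, not a Spotfire feature and not the procedure I actually followed. It assumes effort is coded as a number where larger means more effort, and the field names and sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def order_by_mean_effort(records, category_key, effort_key):
    """Return the category values sorted by increasing mean effort."""
    efforts = defaultdict(list)
    for record in records:
        efforts[record[category_key]].append(record[effort_key])
    return sorted(efforts, key=lambda value: mean(efforts[value]))


# Made-up rows for illustration only; not taken from the SEL dataset.
sample = [
    {"error_type": "A", "effort": 1},
    {"error_type": "B", "effort": 3},
    {"error_type": "A", "effort": 2},
    {"error_type": "C", "effort": 2},
    {"error_type": "B", "effort": 4},
]

ordering = order_by_mean_effort(sample, "error_type", "effort")
print(ordering)
```

The resulting list could then be used as the column order in the display. A mean is only one possible summary; if the effort values are ordinal categories, a median may be a more defensible choice.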

Web Accessibility