CMSC 838b: Information Visualization

Application Project: Analysis of Baseball Players using Treemaps(2000) and Spotfire

Bongshin Lee
February 28, 2001


Introduction

Analyzing and describing the performance of baseball players is important, and it is also interesting to look for relationships between the salaries of major league players and their performance. The main goal of this project is to use visualization tools to try to explain differences in the salaries of major league baseball players and to find interesting features in the data. I also looked for advantages and disadvantages of Treemap2000 and evaluated it against a set of criteria.

 

Description of Data

I used data on baseball players from the 1986 season, together with team statistics. Salary data for 1987 were also included. The data are taken from CMU's StatLib--Datasets Archive.

 

Tools Used

At first I used only Treemap2000, which is designed to present large hierarchical information spaces on planar display areas of limited size. It does display the items well, categorized by team, division, and league. However, when I want to find relationships between attributes, I can map them to only two visual properties: size and color. Furthermore, since the dataset is small and incomplete, it is very hard to find interesting features. So I chose one more tool, SpotFire, which is very useful for finding relationships between attributes.

 

Analysis of Data

 

 

Top 10 Hitters based on salary

This is a very simple kind of query, but an essential one. Both tools gave me the same results, but Treemap2000 gave me more specific information: five of the top 10 hitters are in the American League East Division, and none are in the American League West Division.
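The same query can be written down outside either tool. The following is a minimal sketch, assuming each hitter is a (name, league, division, salary) tuple; the rows and the function names are hypothetical placeholders, not values from the StatLib dataset.

```python
from collections import Counter

# Hypothetical rows: (name, league, division, salary in thousands of dollars).
# These are made-up placeholders, not values from the StatLib dataset.
HITTERS = [
    ("Player A", "AL", "East", 2400),
    ("Player B", "NL", "West", 1900),
    ("Player C", "AL", "East", 2100),
    ("Player D", "AL", "West", 600),
    ("Player E", "NL", "East", 1500),
]


def top_n_by_salary(rows, n=10):
    """Return the n rows with the highest salary, highest first."""
    return sorted(rows, key=lambda row: row[3], reverse=True)[:n]


def count_by_division(rows):
    """Count rows per (league, division) pair."""
    return Counter((league, division) for _, league, division, _ in rows)


top = top_n_by_salary(HITTERS, n=3)
print([name for name, *_ in top])
print(count_by_division(top))
```

Grouping the top rows by league and division is the step that the treemap hierarchy gives for free, which is why Treemap2000 showed the division breakdown directly.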

 
Team Position (Ranking) vs Attendance for away games

By coincidence, the team data is arranged by position. From the SpotFire result, I found that the first- and second-place teams have larger attendance than the other teams. I missed this at first in the Treemap2000 result, because the y position of a point is easier to compare than the size of a rectangle when the difference is not large.

 
Find the pitchers who changed teams

For this query, Treemap2000 gave a wrong result. Because Treemap2000 uses only 10 colors to distinguish categories, 15 teams were coded with the same color. As a result, Treemap2000 could not show a change between two different teams that share a color. SpotFire, on the other hand, displays the data in a two-dimensional space, so we can easily see the differences between two attributes that share the same domain.
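The effect of a limited palette can be illustrated with a small sketch. It assumes, purely for illustration, that the i-th category gets color i modulo the palette size; the actual assignment scheme in Treemap2000 is not described here, and the team names and moves are hypothetical.

```python
PALETTE_SIZE = 10  # the report states that Treemap2000 uses only 10 colors


def color_index(category, categories, palette_size=PALETTE_SIZE):
    """Assumed scheme: the i-th category gets color i modulo the palette size."""
    return categories.index(category) % palette_size


def visible_changes(moves, categories, palette_size=PALETTE_SIZE):
    """Keep only the (old, new) moves whose two teams get different colors."""
    return [
        (old, new)
        for old, new in moves
        if color_index(old, categories, palette_size)
        != color_index(new, categories, palette_size)
    ]


# Hypothetical team names and team changes, for illustration only.
teams = [f"Team{i:02d}" for i in range(25)]
moves = [("Team00", "Team10"), ("Team03", "Team04"), ("Team05", "Team15")]

print(visible_changes(moves, teams))
```

Under this assumed scheme, a move between two teams that land on the same color looks identical to no move at all, which is the kind of miss described above. A scatter plot of old team against new team does not have this limit, because position rather than color carries the category.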

 
Which factors influence salary?

At first, I guessed that batting average for hitters and winning average for pitchers would be the most important factors in determining salary. But both tools showed no strong relationship between these attributes and salary. Instead, the number of years in the major leagues has some relation to salary: salary gradually increases up to about 10 years and decreases after about 10 years. This pattern is not obvious in the result, but we can think of it as reflecting the career lifetime of a baseball player.

 

 

Tool Evaluation

I focused mainly on Treemap2000, because I used SpotFire as a supplementary tool to verify the results and to compare against Treemap2000. Other application projects that use SpotFire can evaluate it better.

Advantages

Although it takes some time to get acquainted with the hierarchical display, dynamic queries using double sliders are very intuitive and easy.

 

Disadvantages

We can't import data directly from an Excel file, and it is almost impossible to write the text input file by hand. Fortunately, I was able to get a converter that converts an Excel file into a text file. However, the format is hard to read and understand, and I have to add some hierarchy information redundantly.

As I mentioned earlier, with Treemap2000 it is very easy to find objects that have large values, but it is hard to find objects that have small values.

If the range of values of an attribute is very small, for example the winning average of a pitcher, which lies between 0 and 1, we can hardly see the differences between them. Furthermore, we can't query on that attribute because the double-slider interface is based on integers.
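One possible workaround for the integer-only sliders is to rescale fractional attributes before loading them. The sketch below assumes the values are multiplied by 1000 in the input file; this is a preprocessing idea, not a feature of Treemap2000, and the function names and sample values are hypothetical.

```python
def to_slider_units(value, scale=1000):
    """Map a fractional value such as 0.625 to an integer such as 625."""
    return round(value * scale)


def from_slider_units(units, scale=1000):
    """Map an integer slider position back to the original fractional value."""
    return units / scale


def in_range(values, low, high, scale=1000):
    """Return the values whose integer form lies within [low, high] inclusive."""
    low_units = to_slider_units(low, scale)
    high_units = to_slider_units(high, scale)
    return [v for v in values if low_units <= to_slider_units(v, scale) <= high_units]


# Hypothetical winning averages, for illustration only.
winning_averages = [0.250, 0.500, 0.625, 0.700]
print([to_slider_units(v) for v in winning_averages])
print(in_range(winning_averages, 0.5, 0.65))
```

The scale factor sets the resolution of the query: with a scale of 1000, two values that differ by less than 0.001 can still fall on the same slider position.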

No Progress Bar
No Help
Different zooming mechanism

 

Grades

Criteria | Treemap2000 | SpotFire | Comment
Easy to learn - Easy to use | 10 | 10 | Intuitive.
Data import (cleanup, missing values) | 8 | 10 | In Treemap2000, we first have to make an Excel file that includes the hierarchy information, and then convert it into a text file.
Scalability | 9 | 10 | I'm not sure, but I guess Treemap2000 can handle a large amount of data.
Speed - responsiveness | 10 | 10 | Both tools respond very quickly.
Type of data | hierarchical data | tabular data | Treemap2000 shows a tree hierarchy effectively.
Screen Space Usage | 10 | 9 | Treemap2000 uses the screen space fully.
Match the task | 8 | 9 | It depends on the dataset.
Range of questions that can be answered | 7 | 9 | The types of questions that can be answered are limited. It is hard to find correlations between attributes.
How generic is it? | 9 | 10 | It takes several minutes to get acquainted with the Treemap2000 display. We are very accustomed to the two-dimensional scatter plot.
First impression (Wow factor) | 8 | 9 | A little bit complex.
Target audience | 10 | 10 | After some training, everyone can use both tools.
Advanced capabilities - complex concept | 8 | 10 | Although the display is complex, the basic concept is very simple.
Exporting capabilities | 1 | 10 | I can only capture the result screen.
Stability - consistency - continuity | 9 | 10 | It is stable overall, but I found some bugs.
Integration / Multiple visualization | 7 | 10 | All the data must have the same attributes, and it supports only one type of visualization.
Average | 8.14 | 9.71 |
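The averages can be checked from the 14 numeric rows of the table (the "Type of data" row has no score). The short sketch below recomputes them; the variable and function names are illustrative.

```python
# Scores in table order, skipping the non-numeric "Type of data" row.
treemap_scores = [10, 8, 9, 10, 10, 8, 7, 9, 8, 10, 8, 1, 9, 7]
spotfire_scores = [10, 10, 10, 10, 9, 9, 9, 10, 9, 10, 10, 10, 10, 10]


def average(scores):
    """Arithmetic mean rounded to two decimal places."""
    return round(sum(scores) / len(scores), 2)


print(average(treemap_scores))   # 114 / 14
print(average(spotfire_scores))  # 136 / 14
```

Note that the single score of 1 for exporting accounts for much of the gap: it alone lowers the Treemap2000 average by more than half a point compared with a score of 9 in that row.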

 

Suggestions

It would be very useful to be able to save the results in other formats. We can't memorize the results and don't want to write them down.

If Treemap2000 rearranged the rectangles by size at a given level, we could easily see their order.

When we select a non-leaf node, all of its attribute values are set to 0 or null. Parent nodes only have the number of children and the sum of sizes.
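The reordering suggested above can be sketched on a simple tree. The code assumes nodes are dictionaries with a "name", an optional "size" on leaves, and an optional "children" list; this structure and the sample tree are hypothetical and are not the Treemap2000 input format.

```python
def total_size(node):
    """Size of a leaf, or the sum of the sizes of all leaves below a parent."""
    children = node.get("children", [])
    if not children:
        return node["size"]
    return sum(total_size(child) for child in children)


def sorted_at_level(node, level, depth=0):
    """Return a copy of the tree whose children are sorted by size, largest
    first, for every node at the given depth (the root is depth 0)."""
    children = [sorted_at_level(c, level, depth + 1) for c in node.get("children", [])]
    if depth == level:
        children.sort(key=total_size, reverse=True)
    result = {key: value for key, value in node.items() if key != "children"}
    if children:
        result["children"] = children
    return result


# Hypothetical league -> team -> player tree with made-up sizes.
tree = {"name": "League", "children": [
    {"name": "TeamX", "children": [{"name": "P1", "size": 300},
                                   {"name": "P2", "size": 900}]},
    {"name": "TeamY", "children": [{"name": "P3", "size": 1000},
                                   {"name": "P4", "size": 500}]},
]}

print([team["name"] for team in sorted_at_level(tree, 0)["children"]])
```

Sorting only one level leaves the rest of the hierarchy in its original order, so teams could be ranked without disturbing the order of players inside each team, or the other way around.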