Visualization of 1985 Wages Data using Spotfire and TreeMap

by

Fusun Yaman

fusun@cs.umd.edu

 

Introduction

Patterns, correlations, unusual behavior and outliers in a large data set are easier to discover when the data is presented visually. For this study I used a data set with 534 rows and 11 columns. The dependent variable is a person's wage; the other variables include gender, occupation, sector, union membership, marital status and race. I examined the correlation between wage and the other variables, and explored the following questions:

 

•      Do more highly educated and more experienced people get higher salaries?

•      Do union members get higher wages than non-members?

•      Is there any correlation between wages and gender or race?

•      Are there occupation groups that have higher salaries?

•      Are there occupation groups that are preferred especially by females or by males?

•      What is the correlation between age and the other variables?

 

I utilized Spotfire and TreeMap tools to visualize the data and look for the answers.

 

Description of Data

 

Data: Determinants of Wages from the 1985 Current Population Survey (CPS_85_Wages)

Description: The data file contains 534 observations on 11 variables sampled from the Current Population Survey of 1985. This data set demonstrates multiple regression, confounding, transformations, multicollinearity, categorical variables, ANOVA, pooled tests of significance, interactions and model building strategies.

 

Column Name      Explanation                                Values

EDUCATION        Number of years of education               2 to 18
LIVES_IN_SOUTH   Indicator variable for Southern region     Yes, No
SEX              Indicator variable for sex                 Female, Male
EXPERIENCE       Number of years of work experience         0 to 55
UNION MEMBER     Indicator variable for union membership    Yes, No
WAGE             Wage (dollars per hour)                    1 to 26.29
AGE              Age (years)                                18 to 64
RACE             Race                                       1=Other, 2=Hispanic, 3=White
OCCUPATION       Occupational category                      Management, Sales, Clerical, Service, Professional, Other
SECTOR           Sector                                     0=Other, 1=Manufacturing, 2=Construction
MARR             Marital status                             0=Unmarried, 1=Married
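
The coded columns listed above (RACE, SECTOR and MARR) are easier to read in a visualization once the codes are replaced by their labels. The following is a minimal sketch of that step in Python, assuming each observation is held as a dictionary keyed by column name. The function name decode_row and the sample row are hypothetical and are not part of the original study.

```python
# Code-to-label mappings, copied from the column descriptions above.
RACE = {1: "Other", 2: "Hispanic", 3: "White"}
SECTOR = {0: "Other", 1: "Manufacturing", 2: "Construction"}
MARR = {0: "Unmarried", 1: "Married"}


def decode_row(row):
    """Return a copy of row with RACE, SECTOR and MARR codes replaced by labels."""
    decoded = dict(row)
    decoded["RACE"] = RACE[int(row["RACE"])]
    decoded["SECTOR"] = SECTOR[int(row["SECTOR"])]
    decoded["MARR"] = MARR[int(row["MARR"])]
    return decoded


# Made-up row for illustration only; it is not taken from the survey.
sample = {"WAGE": 10.0, "RACE": 3, "SECTOR": 1, "MARR": 0}
decoded_sample = decode_row(sample)
```

An unknown code raises a KeyError, which makes unexpected values visible before the data reaches the visualization tool.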

 

 
Visualizations by Spotfire

 

Figure 1 plots education on the Y axis against experience on the X axis. Color encodes occupation and size encodes wage. The view is zoomed to experience of at most 15 years. Figure 2 uses the same axes, color and size encodings, and in addition encodes union membership by shape (circle for member, square for non-member). It is zoomed to education of at least 8 years and experience of at most 15 years.

I aimed to explore the correlation between wage and the education-experience pair. I expected more highly educated and more experienced people to earn more. Although Figure 1 and Figure 2 confirm this expectation, occupation appears to affect the wage strongly: a professional (light blue) generally earns more than a clerk (red) regardless of education or experience. Most people have 12 years of education, which causes heavy occlusion and makes it impossible to draw clear distinctions in that part of the plot. Examining Figure 2 to see whether union membership makes any difference, I conclude that it does not.

    

 

Figure 1

Figure 2
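
The visual impression of a relationship between wage and education or experience could also be checked numerically. The following is a minimal sketch of a Pearson correlation coefficient in plain Python. It was not part of the original study, the function name pearson is an assumption, and the two short lists are made-up values for illustration only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only; they are not taken from the survey.
education = [8, 12, 12, 16, 18]
wage = [4.0, 6.0, 7.0, 11.0, 15.0]
r = pearson(education, wage)
```

A value of r near 1 indicates a strong positive linear relationship, a value near 0 indicates none. Such a coefficient summarizes only the linear trend and does not show effects such as the occupation differences visible in the plots.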

 

Figure 3 and Figure 4 plot wage on the Y axis against experience on the X axis. Color encodes race (Hispanic: red, White: yellow, Black: blue), size encodes education and shape encodes gender (male: circle, female: square). Figure 4 is zoomed to wages of at least 15 dollars per hour. I expected gender and race not to affect the wage. Although Figure 3 confirms this expectation at the lower end of the wage axis, Figure 4, which is zoomed on the higher end, shows that white men get higher wages. The women who get high wages are well educated. One could argue that this difference is due to the jobs preferred by women and men, but the following queries support the opposite of this argument.

 

Figure 3

Figure 4

 

Figure 5 plots wage on the Y axis against experience on the X axis. Color encodes occupation (Clerical: red, Sales: green, Service: aqua, Professional: light blue, Management: blue, Other: yellow), size encodes education and shape encodes union membership (member: circle, non-member: square). My aim in this visualization is to see which occupations have higher wages. Professionals and managers appear to have higher wages than the other groups.

 

Figure 5

 

 

Figure 6 uses pie charts to show the percentage of each occupation group within each gender. The total wages determine the size of each whole pie.

 

Figure 6
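
The percentages behind pie charts of this kind can be reproduced with a simple grouping. The following is a minimal sketch, assuming the shares are computed from the number of people in each occupation (the report does not state whether counts or wages were used for the slices). The function name occupation_shares and the rows are hypothetical.

```python
from collections import Counter, defaultdict


def occupation_shares(rows):
    """Map each SEX value to {OCCUPATION: percentage of that sex's rows}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["SEX"]][row["OCCUPATION"]] += 1
    shares = {}
    for sex, occ_counts in counts.items():
        total = sum(occ_counts.values())
        shares[sex] = {occ: 100.0 * n / total for occ, n in occ_counts.items()}
    return shares


# Made-up rows for illustration only; they are not taken from the survey.
rows = [
    {"SEX": "Female", "OCCUPATION": "Clerical"},
    {"SEX": "Female", "OCCUPATION": "Clerical"},
    {"SEX": "Female", "OCCUPATION": "Professional"},
    {"SEX": "Female", "OCCUPATION": "Service"},
    {"SEX": "Male", "OCCUPATION": "Management"},
    {"SEX": "Male", "OCCUPATION": "Other"},
]
shares = occupation_shares(rows)
```

Comparing the per-gender dictionaries side by side shows which occupation groups are over-represented in one gender, which is the question the pie charts are meant to answer.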

 

Figure 7 plots age on the Y axis against experience on the X axis. Color encodes Lives in South (Yes: blue, No: red), size encodes wage and shape encodes union membership (member: circle, non-member: square). My aim in this visualization is to see whether age is correlated with the other variables. My observations are: older people obviously have more experience; the youngest people have lower wages; and for the oldest people, experience is in some cases rewarded by higher wages. In general, middle-aged people have higher wages on average.

 

Figure 7

 

Visualizations using TreeMap

I tried queries to cross-validate some of my results.

Figure 8 and Figure 9 have the same hierarchy. I mapped the wide ranges of education and experience levels into 5 and 6 categories and created a hierarchy over education and experience. In Figure 8 both the size and the color attributes are given by wage. It is easy to see that the lower right corner, with higher education and higher experience, has higher wages. Occupational categories are also visible when the wage is high enough. Again, Professionals and Management seem to lead.

 

Figure 8
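
The mapping of education and experience into categories, and the hierarchy built over them, can be sketched as follows. This is only an illustration: it assumes 5 education bins and 6 experience bins, the cut points are hypothetical because the report does not state the boundaries that were used, and the names bin_index and mean_wage_by_cell are not from the original study.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical cut points: each value is the lower bound of the next bin.
EDUCATION_CUTS = [9, 12, 13, 16]        # 4 cuts -> 5 bins (0..4)
EXPERIENCE_CUTS = [5, 10, 20, 30, 40]   # 5 cuts -> 6 bins (0..5)


def bin_index(value, cuts):
    """Return the 0-based bin of value for a sorted list of cut points."""
    return bisect_right(cuts, value)


def mean_wage_by_cell(rows):
    """Average WAGE for each (education bin, experience bin) cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for row in rows:
        key = (
            bin_index(row["EDUCATION"], EDUCATION_CUTS),
            bin_index(row["EXPERIENCE"], EXPERIENCE_CUTS),
        )
        sums[key][0] += row["WAGE"]
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Made-up rows for illustration only; they are not taken from the survey.
rows = [
    {"EDUCATION": 12, "EXPERIENCE": 3, "WAGE": 5.0},
    {"EDUCATION": 12, "EXPERIENCE": 4, "WAGE": 7.0},
    {"EDUCATION": 18, "EXPERIENCE": 25, "WAGE": 20.0},
]
cells = mean_wage_by_cell(rows)
```

Each key of cells corresponds to one rectangle group in a two-level education and experience hierarchy, and the averaged wage is the kind of value that could drive size or color.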

 

Figure 9 has the same hierarchy as Figure 8. Size is given by wage and color by age. The correlation between experience and age is easy to detect.

 

Figure 9

 

The hierarchy in Figure 10 is by gender and occupation. The sizes of the areas are given by wage and the colors by occupation. The occupation distribution for each gender can be seen clearly.

 

Figure 10

 

Critique of the tools

 

Spotfire

•      Easy to use and learn.

•      Can create powerful queries using the shape, size and color attributes.

•      Effective filtering for both numeric and categorical data.

•      The main problem is occlusion.

TreeMap

•      Easy to use and learn.

•      Filtering is available only for numerical values.

•      The main problem is preparing the data set: if one wants to see different hierarchies, a different data set must be created for each.
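
The data preparation effort for different hierarchies can be reduced by generating each hierarchy from the same rows. The following is a minimal sketch, assuming rows held as dictionaries; it builds a nested dictionary and does not produce TreeMap's own input format. The function name build_hierarchy and the rows are hypothetical.

```python
def build_hierarchy(rows, levels, value="WAGE"):
    """Nest rows by the given columns, in order; leaves hold the summed value."""
    if not levels:
        raise ValueError("at least one hierarchy level is required")
    tree = {}
    for row in rows:
        node = tree
        # Walk (and create) one nested dictionary per inner level.
        for column in levels[:-1]:
            node = node.setdefault(row[column], {})
        leaf = row[levels[-1]]
        node[leaf] = node.get(leaf, 0.0) + row[value]
    return tree


# Made-up rows for illustration only; they are not taken from the survey.
rows = [
    {"SEX": "Female", "OCCUPATION": "Clerical", "WAGE": 5.0},
    {"SEX": "Female", "OCCUPATION": "Clerical", "WAGE": 6.0},
    {"SEX": "Male", "OCCUPATION": "Clerical", "WAGE": 7.0},
    {"SEX": "Male", "OCCUPATION": "Sales", "WAGE": 9.0},
]
by_sex = build_hierarchy(rows, ["SEX", "OCCUPATION"])
by_occupation = build_hierarchy(rows, ["OCCUPATION", "SEX"])
```

Changing only the order of the levels list yields a different hierarchy from the same source rows, so a separate export step per hierarchy would be the only remaining manual work.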