In Search of Proper State to Live
(Census Data Analysis using Spotfire)
Haixia Zhao
March 6, 2002
The purpose of this application is to visualize a subset of the census data, explore the relationships among some factors that people might consider when choosing a state to live in, and offer some hints on residence choice to different categories of people.
The original data are state-level census data from various government Web sites, covering population distribution, crime, law enforcement, income, expenses, employment rate, etc. After being combined, extracted, and cleaned, the datasets are:
· Population distribution data:
1999 population by state, age, gender, origin, race, and so on: http://www.census.gov/population/www/projections/st_yr95to00.html
22 variables:
Age | White non-Hispanic Male | White non-Hispanic Female | White Hispanic Male | White Hispanic Female | Black non-Hispanic Male | Black non-Hispanic Female | Black Hispanic Male
Black Hispanic Female | AEA (American Indian, Eskimo, Aleut) non-Hispanic Male | AEA non-Hispanic Female | AEA Hispanic Male | AEA Hispanic Female | AP (Asian, Pacific Islander) non-Hispanic Male | AP non-Hispanic Female | AP Hispanic Male
AP Hispanic Female | White percentage | Black percentage | AP percentage | Male percentage | Female percentage
· Safety data:
1999 crime statistics by state and crime type: http://149.101.22.40/dataonline/Search/Crime/State/StateCrime.cfm
10 variables, each given as a total and as a rate:
Index offenses total | Violent crime total | Murder total | Forcible rape | Robbery | Aggravated assault | Property crime total | Burglary | Larceny-theft | Motor vehicle theft
Index offense rate | Violent crime rate | Murder rate | Forcible rape rate | Robbery rate | Aggravated assault rate | Property crime rate | Burglary rate | Larceny-theft rate | Motor vehicle theft rate
1999 Police numbers and percentages by state:
http://149.101.22.40/dataonline/Search/Law/State/RunLawStateSelectedTables.cfm
4 variables:
Police total | Sworn_police total | Civilian_police total | Total police percentage based on total population
· Income data
1999 personal income by state: http://www.bea.gov/bea/regional/reis/
2 variables:
per_capital_personal_income | disposable_per_capital_personal_income
· Expense level data
1999 gasoline price by state and gasoline type: http://www.eia.doe.gov/emeu/states/_multi_states.html
3 variables:
Regular gasoline price | Midgrade gasoline price | Premium gasoline price
· Labor data
1999 Unemployment rate by state: http://www.bls.gov/ro6/home.htm#data
1 variable:
Unemployment rate
Of course, when choosing which state to live in, different people have different considerations. Other data such as education, weather, and health are also very important. This report does not attempt to cover every potential factor; rather, it gives a rough idea of how information visualization tools can help reveal interesting trends in massive data, and presents some facts about important factors most people would consider.
(A little complaint about collecting data through government agency websites: locating and obtaining the right data is much more difficult than expected. One reason is that different types of census data are scattered among different government agencies, and agencies present their data in different formats. For example, many agencies publish data as PDF files, which makes raw data extraction very difficult. We need a consistent and integrated information access interface to all government-published data! After the raw data were finally obtained, they needed to be cleaned and rearranged into the desired format, and aggregated attributes needed to be calculated.)
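The combining and aggregation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used for this report (the original work was done with Excel sheets and Spotfire). The table names, the helper functions `per_100k` and `join_by_state`, and all numbers are hypothetical.

```python
# Hypothetical per-state tables keyed by state name; every value is made up.
population = {"State A": 2_000_000, "State B": 500_000, "State C": 900_000}
burglaries = {"State A": 14_000, "State B": 2_000}
police = {"State A": 5_000, "State B": 1_500}


def per_100k(count, residents):
    """Scale a raw count to a rate per 100,000 residents."""
    return count * 100_000 / residents


def join_by_state(**tables):
    """Inner-join several {state: value} tables on the state key.

    States missing from any table are dropped, which mirrors the
    cleaning step of keeping only complete records.
    """
    common = set.intersection(*(set(t) for t in tables.values()))
    return {
        state: {name: table[state] for name, table in tables.items()}
        for state in sorted(common)
    }


merged = join_by_state(population=population, burglaries=burglaries, police=police)

# Derived (aggregated) attributes, computed after the join.
for row in merged.values():
    row["burglary_rate"] = per_100k(row["burglaries"], row["population"])
    row["police_per_100k"] = per_100k(row["police"], row["population"])
```

State C appears only in the population table, so the inner join leaves it out; whether to drop or keep such incomplete records is a design choice that depends on the analysis.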
In this part, I try to find some interesting trends in the data by exploring the relationships between different variables.

Burglary Rate vs. Unemployment Rate

Robbery Rate vs. Police Percentage Based on Population (scaled up by 100,000)
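The scatter plots above compare pairs of variables visually. As a numeric companion, one could compute a Pearson correlation coefficient for each pair. This is a swapped-in technique, not something computed in this report, and the sketch below uses made-up values for four hypothetical states rather than the real 1999 figures. The function name `pearson` and the variable names are assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for four hypothetical states, NOT real census data.
unemployment_rate = [3.0, 4.0, 5.0, 6.0]        # percent
burglary_rate = [600.0, 700.0, 800.0, 900.0]    # per 100,000 residents
police_per_100k = [300.0, 250.0, 280.0, 220.0]  # per 100,000 residents
robbery_rate = [100.0, 150.0, 120.0, 180.0]     # per 100,000 residents

r_burglary = pearson(unemployment_rate, burglary_rate)
r_robbery = pearson(police_per_100k, robbery_rate)
```

A coefficient near +1 or -1 indicates a strong linear relationship and a value near 0 indicates none. The toy values here are deliberately linear, so they say nothing about the real data.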





In this part, I try to give some hints on which states to live in for different categories of people.
Category 1:
Retired White person, looking for states with low income, a low crime rate, a low expense level, and a high percentage of white elders.
The figure below shows Index Offense Rate vs. Total White Percentage, colored by state and sized by age. The data are filtered to age >= 60, per-capita personal income below the US average, and gasoline price below the US average.
The states with a high percentage of elderly white people and a low crime rate are North Dakota, South Dakota, West Virginia, Vermont, Maine, Kentucky, Iowa, Wisconsin, and Indiana. (This can be seen more clearly in the enlarged figure.)
Index Crime Rate vs. Total White Percentage, filtered by age, per-capita personal income, and gasoline price.
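The Category 1 filter (age >= 60, income below the US average, gasoline price below the US average) can also be written as a small Python sketch. This is an illustration only: the report applied these filters interactively in Spotfire. The field names, the two `US_AVG_*` constants, the helper `category1_candidates`, and all values are hypothetical.

```python
# Hypothetical rows, one per (state, age group); every value is made up.
rows = [
    {"state": "State A", "age": 65, "white_pct": 92.0, "income": 24_000, "gas": 1.10, "offense_rate": 2_500},
    {"state": "State B", "age": 65, "white_pct": 90.0, "income": 33_000, "gas": 1.05, "offense_rate": 2_000},
    {"state": "State C", "age": 40, "white_pct": 85.0, "income": 23_000, "gas": 1.08, "offense_rate": 3_000},
    {"state": "State D", "age": 70, "white_pct": 80.0, "income": 25_000, "gas": 1.30, "offense_rate": 2_200},
    {"state": "State E", "age": 62, "white_pct": 88.0, "income": 26_000, "gas": 1.12, "offense_rate": 2_100},
]

US_AVG_INCOME = 28_000  # assumed national per-capita income, in dollars
US_AVG_GAS = 1.20       # assumed national gasoline price, in dollars per gallon


def category1_candidates(rows, us_income, us_gas, min_age=60):
    """Keep rows that pass all three filters, lowest crime rate first."""
    kept = [
        r for r in rows
        if r["age"] >= min_age and r["income"] < us_income and r["gas"] < us_gas
    ]
    return sorted(kept, key=lambda r: r["offense_rate"])


picks = category1_candidates(rows, US_AVG_INCOME, US_AVG_GAS)
```

Sorting by crime rate is one possible ordering. The figure instead leaves the trade-off between white percentage and crime rate to the reader's eye.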


Category 2:
Working Asian people, looking for high income and a low unemployment rate, who may prefer having more Asian or Pacific Islander people of a similar age around, and may also prefer a low crime rate and a low expense level.
The following two figures show Unemployment Rate vs. Per-capita Personal Income, colored and sized by Index Offense Rate. The data are not filtered. We can see that CT, DC, NJ, MA, NY, MD, NH, MN, DE, and VA have high income (in descending order) and relatively low unemployment rates. However, DC has an extremely high crime rate, which you might want to avoid.

Enlarged Partial Screenshot
Furthermore, you may prefer a place with more Asian people of a similar age, so you can have fun and maybe have a better chance of meeting your future wife or husband. By further filtering the data by age and Asian/Pacific Islander population percentage, I got the figure below, which shows states with an above-average Asian/Pacific Islander population percentage between ages 20 and 40. Among those states, NJ, MA, NY, MD, and VA (in descending order of income) have relatively high income and low crime rates.
Unemployment Rate vs. Per-capita Personal Income, colored and sized by Index Offense Rate, for states with an above-average Asian/Pacific Islander population percentage between ages 20 and 40.
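The Category 2 selection (above-average Asian/Pacific Islander percentage among ages 20 to 40, then high income and low crime rate) can be sketched as follows. This is a hedged illustration, not the Spotfire workflow itself. The field names, the helper `category2_ranking`, the crime-rate cap, and all values are hypothetical, and the "average" here is a simple unweighted mean over the rows, which may differ from the national figure.

```python
from statistics import mean

# Hypothetical state-level rows; every value is made up.
states = [
    {"state": "State A", "ap_pct_20_40": 4.0, "income": 36_000, "offense_rate": 3_000},
    {"state": "State B", "ap_pct_20_40": 1.0, "income": 39_000, "offense_rate": 2_800},
    {"state": "State C", "ap_pct_20_40": 6.0, "income": 38_000, "offense_rate": 3_500},
    {"state": "State D", "ap_pct_20_40": 1.0, "income": 30_000, "offense_rate": 2_900},
    {"state": "State E", "ap_pct_20_40": 5.0, "income": 40_000, "offense_rate": 8_000},
]


def category2_ranking(rows, max_offense_rate):
    """Filter by above-average AP share and a crime cap, richest first."""
    avg_ap = mean(r["ap_pct_20_40"] for r in rows)  # unweighted mean (assumption)
    kept = [
        r for r in rows
        if r["ap_pct_20_40"] > avg_ap and r["offense_rate"] <= max_offense_rate
    ]
    return sorted(kept, key=lambda r: r["income"], reverse=True)


ranking = category2_ranking(states, max_offense_rate=5_000)
```

In this toy data, State E has the highest income but is excluded by the crime cap, much as DC stands out in the unfiltered figure for its crime rate.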

Spotfire is a very good tool for analyzing datasets with both discrete and continuous values. It accepts many input formats and can even automatically join records from different Excel sheets. It is very easy to understand and use, and very flexible in terms of changing viewpoints, colors, and sizes, marking interesting records, and so on. I found the view tip function extremely useful, because it helped me find interesting data distribution patterns that I did not expect at the beginning. The filtering function provides a dynamic view and helps me focus on the part of the data that I am interested in. The 3-D capability is useful for viewing data trends and reducing the occlusion problem, but I personally prefer the 2-D view with the help of filters and color-coding, because I like its marking and labeling capability.
Because Spotfire is such a general tool, one can show the geographical locations of data by using a background picture, such as a map, but the tools for scaling and positioning the background are too primitive to be of real benefit.
Another problem that came up while playing around with the data, especially when changing the axes of a graph, is that there is no way of saving a particular graph setting. I can open another graph window, but there is no way of copying the same view over automatically, which would be useful for a multi-window view of the same data. A save function could be added that saves the current setting of the active window, accessible through shortcut keys. These saved settings could then be applied to any window.
Another very nice feature to add could be an undo/redo function.
Other than these particular problems, exploration of the data was easy and painless. The hard part was finding the data, and explaining trends.