In Search of a Proper State to Live In

(Census Data Analysis using Spotfire)

Haixia Zhao

{haixia@cs.umd.edu}

March 6, 2002

Introduction

 

The purpose of this application is to visualize a subset of the census data, explore the relations among some factors that people might consider when choosing a state to live in, and offer some residence hints to different categories of people.

 

Datasets

 

The original data are state-level census data from various government Web sites, covering population distribution, crime, law enforcement, income, expenses, employment, etc. After being combined, extracted, and cleaned, the datasets are:

 

·  Population distribution data:

 

1999 population by state, age, gender, origin, race, and so on: http://www.census.gov/population/www/projections/st_yr95to00.html

 

22 variables:

 

Age

 White non-Hispanic Male

 White non-Hispanic Female

 White Hispanic Male

 White Hispanic Female

 Black non-Hispanic Male

 Black non-Hispanic Female

 Black Hispanic Male

 Black Hispanic Female

AEA (American Indian, Eskimo, Aleut) non-Hispanic Male

 AEA non-Hispanic Female

 AEA Hispanic Male

 AEA Hispanic Female

AP (Asian, Pacific Islander) non-Hispanic Male

 AP non-Hispanic Female

 AP Hispanic Male

 AP Hispanic Female

  White percentage

 Black percentage

 AP percentage

 Male percentage

 Female percentage

 

 

 

 

·  Safety data:

 

1999 crime statistics by state and crime type: http://149.101.22.40/dataonline/Search/Crime/State/StateCrime.cfm

 

20 variables (10 totals and 10 rates):

 

Index offenses total

Violent crime total

Murder total

Forcible rape

Robbery

Aggravated assault

Property crime total

Burglary

Larceny-theft

Motor vehicle theft

Index offense rate

Violent crime rate

Murder rate

Forcible rape rate

Robbery rate

Aggravated assault rate

Property crime rate

Burglary rate

Larceny-theft rate

Motor vehicle theft rate

 

 

1999 Police numbers and percentages by state: 

http://149.101.22.40/dataonline/Search/Law/State/RunLawStateSelectedTables.cfm

 

4 variables:

 

Police total

Sworn_police total

Civilian_police total

Total police percentage based on total population

 

 

·  Income data

 

1999 personal income by state: http://www.bea.gov/bea/regional/reis/

 

2 variables:

 

per_capita_personal_income

disposable_per_capita_personal_income

 

·  Expense level data

 

1999 gasoline price by state and gasoline type: http://www.eia.doe.gov/emeu/states/_multi_states.html

 

3 variables:

 

Regular gasoline price

Midgrade gasoline price

Premium gasoline price

 

 

·  Labor data

 

1999 Unemployment rate by state: http://www.bls.gov/ro6/home.htm#data

 

1 variable:

Unemployment rate

 

 

Of course, different people have different considerations when choosing a state to live in. Other data, such as education, weather, and health, are also very important. This report does not try to cover every potential factor. Instead, it gives a rough idea of how information visualization tools can reveal interesting trends in large datasets, and presents some facts about a few important factors that most people would consider.

 

(A small complaint about collecting data through government agency websites: locating and obtaining the right data is much more difficult than expected. One reason is that different types of census data are scattered among different government agencies, and each agency presents its data in a different format. For example, many agencies publish data as PDF files, which makes raw data extraction very difficult. We need a consistent, integrated interface to all government-published data! Once the raw data are finally obtained, they need to be cleaned and rearranged into the desired format, and the aggregated attributes need to be calculated.)
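The cleaning and aggregation step mentioned above can be sketched roughly as follows. This is only an illustration, not the procedure actually used for this report: the table names, the helper functions `join_by_state` and `rate_per_100k`, and all of the numbers are hypothetical placeholders, not census figures.

```python
# Hypothetical per-state tables; the values are placeholders, not census figures.
population = {"State A": 2_000_000, "State B": 500_000}
burglaries = {"State A": 16_000, "State B": 2_500}
police = {"State A": 5_000, "State B": 1_500}


def rate_per_100k(count, population_total):
    """Return the number of occurrences per 100,000 residents."""
    return count * 100_000 / population_total


def join_by_state(**tables):
    """Join several {state: value} tables, keeping states present in all of them."""
    common = set.intersection(*(set(table) for table in tables.values()))
    return {
        state: {name: table[state] for name, table in tables.items()}
        for state in sorted(common)
    }


# One record per state, with the raw counts side by side.
records = join_by_state(population=population, burglaries=burglaries, police=police)

# Derived (aggregated) attributes, computed after the join.
for row in records.values():
    row["burglary_rate"] = rate_per_100k(row["burglaries"], row["population"])
    row["police_rate"] = rate_per_100k(row["police"], row["population"])
```

Keeping only the states that appear in every table is one possible choice; states missing from one source could also be kept with empty values and filtered out later in the visualization tool.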

 

Explore the Trends of Data

 

 

In this part, I look for interesting trends in the data by exploring the relationships between different variables.

 

  1. The first question I asked was “Does low income increase the crime rate?” I expected the answer to be yes, but there is no such trend in the data. The figure below shows Murder Rate vs. Per-capita Personal Income. The spots are distributed in the lower part of the plot, except for the District of Columbia. In fact, DC has a very high income but also an extremely high murder rate. (The distribution patterns are similar when other crime rates and/or disposable per-capita personal income are used.) Other high-income states are Connecticut, New Jersey, and Massachusetts. I then asked “Does the relative expense level have an effect?”, using the Spotfire 3-D view with Robbery Rate vs. Per-capita Personal Income vs. Regular Gasoline Price. There is still no such trend.

 

Murder Rate vs. Per-capita Personal Income

 

 

 

 

 

 

  2. “Does the unemployment rate affect the crime rate?” The figure below shows Burglary Rate vs. Unemployment Rate. The answer appears to be yes, because the spots cluster along the diagonal: when the unemployment rate goes up, the burglary rate generally goes up. The same trend holds for all other crime types except murder. The states with high unemployment rates are North Carolina, Louisiana, Mississippi, Nevada, Washington, and Oregon. The states with relatively low unemployment rates and low burglary rates are New Hampshire, North Dakota, South Dakota, Wyoming, Virginia, and Montana.

 

Burglary Rate vs. Unemployment Rate

 

 

 

 

  3. “Does having more police officers help reduce the crime rate?” The figure below shows Robbery Rate vs. Policemen Percentage Based on Population (scaled up by 100,000). There is no obvious trend that supports the assumption. The District of Columbia has a very high police percentage, but its crime rate is even higher. Most states have a low police percentage (below 1/1,000), except North Dakota and Vermont, which have very low crime rates.

 

Robbery Rate vs. Policemen Percentage Based on Population (scaled up by 100,000)

 

 

  4. “Does the expense level go up when the income level goes up?” The figure below shows Regular Gasoline Price vs. Per-capita Personal Income. I do not see such a trend in the gasoline price. The states with high gasoline prices are Hawaii, Nevada, and Alaska. Hawaii and Nevada are famous recreation destinations, and Alaska is a cold place far north of the main part of the US.

 

Regular Gasoline Price vs. Per-capita Personal Income

 

 

 

  5. “Women live longer!” The figure below shows Female Population Percentage at Different Ages, colored by state. Spots of the same color represent the populations of different ages (0 to 85+) in one state. The female population is close to half of the total population before age 60, and the percentage starts to increase quickly (exponentially) from age 60. This trend supports a conclusion that most people probably already know: women live longer. The figure also shows that there are fewer female babies than male babies. Comparing the states, DC generally has the highest female percentage and Alaska the lowest. Another interesting observation is that Hawaii has an extremely low female percentage between ages 20 and 30.

 

 

 

Female Percentage vs. Age, colored by state

 

 

 

  6. “White and black people are the two dominant races in the US.” The figure below shows Black Population Percentage vs. White Population Percentage, colored by state. Spots of the same color represent the populations of different ages (0 to 85+) in one state. Most of the spots lie along the anti-diagonal, which shows that in most states the white and black populations together make up nearly 96% of the total population. An exception is Hawaii, which has neither many whites nor many blacks. The dominant group in Hawaii is actually Asians and Pacific Islanders. This can be seen more clearly in the figure Asian-Pacific-Islander Percentage vs. White Percentage, colored by state.

 

Black Percentage vs. White Percentage, colored by state

 

 

 

 

Asian-Pacific-Islander Percentage vs. White Percentage
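The questions in this section were answered by eye from the scatter plots. A numeric complement, which was not part of the original analysis, would be the Pearson correlation coefficient between the two plotted variables. The sketch below is a plain implementation of the standard formula; the function name `pearson` and the sample values are placeholders chosen for illustration and are not taken from the census data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only; not taken from the census data.
unemployment = [3.0, 4.0, 5.0, 6.0]
burglary_rate = [600.0, 700.0, 900.0, 1000.0]
r = pearson(unemployment, burglary_rate)
```

A value of `r` near +1 or -1 corresponds to spots clustering along a diagonal, and a value near 0 to no visible linear trend. A single extreme point, such as DC in the plots above, can dominate the coefficient, so it is worth computing it both with and without such outliers.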

 

 

 

 

 

Where to live?

In this part, I offer some hints about which states to live in for different categories of people.

 

Category 1:

 

A retired white person looking for states with low income, a low crime rate, a low expense level, and a high percentage of white elders.

 

The figure below shows Index Offense Rate vs. Total White Percentage, colored by state and sized by age. The data are filtered to age >= 60, per-capita personal income below the US average, and gasoline price below the US average.

 

 

The states with a high percentage of white elderly people and a low crime rate are North Dakota, South Dakota, West Virginia, Vermont, Maine, Kentucky, Iowa, Wisconsin, and Indiana. (This can be seen more clearly in the enlarged figure.)

 

Index Offense Rate vs. Total White Percentage, filtered by age, per-capita personal income, and gasoline price.

 

 

Enlarged Partial Screenshot
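The Category 1 filter was applied interactively with the Spotfire sliders. A rough script equivalent is sketched below under several assumptions: the rows are already restricted to the age >= 60 group, the "US average" is approximated by the unweighted mean over the listed states, and the field names (`income`, `gas_price`, `white_pct`, `index_rate`) and all values are hypothetical placeholders.

```python
from statistics import mean

# Hypothetical rows, one per state, already restricted to the age >= 60 group.
# All values are placeholders, not census figures.
rows = [
    {"state": "State A", "income": 24_000, "gas_price": 1.10, "white_pct": 95.0, "index_rate": 2_500},
    {"state": "State B", "income": 36_000, "gas_price": 1.30, "white_pct": 80.0, "index_rate": 5_000},
    {"state": "State C", "income": 25_000, "gas_price": 1.15, "white_pct": 70.0, "index_rate": 4_500},
    {"state": "State D", "income": 27_000, "gas_price": 1.45, "white_pct": 92.0, "index_rate": 3_000},
]


def category_1_candidates(rows):
    """Keep states below both averages, then rank the survivors."""
    # Assumption: the unweighted mean over states stands in for the US average.
    income_cutoff = mean(r["income"] for r in rows)
    gas_cutoff = mean(r["gas_price"] for r in rows)
    kept = [
        r for r in rows
        if r["income"] < income_cutoff and r["gas_price"] < gas_cutoff
    ]
    # Highest white percentage first; ties broken by lower index offense rate.
    return sorted(kept, key=lambda r: (-r["white_pct"], r["index_rate"]))


candidates = [r["state"] for r in category_1_candidates(rows)]
```

Ranking by white percentage first and crime rate second is only one way to combine the two criteria; in the scatter plot both are read off together from a spot's position.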

 

 

 

Category 2:

 

A working Asian person looking for high income and a low unemployment rate, who may prefer having more Asian or Pacific Islander people of a similar age around, and may also prefer a low crime rate and a low expense level.

 

The following two figures show Unemployment Rate vs. Per-capita Personal Income, colored and sized by Index Offense Rate. The data are not filtered. CT, DC, NJ, MA, NY, MD, NH, MN, DE, and VA have high income (in descending order) and relatively low unemployment rates. However, DC has an extremely high crime rate, which you might want to avoid.

 

 

Unemployment Rate vs. Per-capita Personal Income, colored and sized by Index Offense Rate

 

Enlarged Partial Screenshot

 

 

 

Furthermore, you may prefer a place with more Asian people of a similar age, so that you can have fun and perhaps have a better chance of meeting your future wife or husband. By further filtering the data by age and Asian/Pacific Islander population percentage, I got the figure below, which shows the states with an above-average Asian/Pacific Islander population percentage between ages 20 and 40. Among those states, NJ, MA, NY, MD, and VA (in descending order of income) have relatively high income and low crime rates.

 

Unemployment Rate vs. Per-capita Personal Income, colored and sized by Index Offense Rate, with above-average Asian/Pacific Islander population percentage between ages 20 and 40.

 

Enlarged Partial Screenshot
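The age-band filter for Category 2 can be approximated in a script as follows. This is a sketch under stated assumptions rather than the steps performed in Spotfire: the input is assumed to be one row per state and age with an Asian/Pacific Islander count and a total count, "above average" is taken to mean above the unweighted mean of the per-state percentages, and every name and number is a hypothetical placeholder.

```python
from collections import defaultdict

# Placeholder rows: (state, age, ap_count, total_count). Not census figures.
age_rows = [
    ("State A", 25, 30, 1000),
    ("State A", 35, 50, 1000),
    ("State A", 65, 5, 1000),
    ("State B", 25, 10, 1000),
    ("State B", 35, 10, 1000),
    ("State C", 30, 120, 2000),
]
income = {"State A": 30_000, "State B": 40_000, "State C": 35_000}


def ap_share_by_state(rows, min_age=20, max_age=40):
    """Asian/Pacific Islander percentage per state within an age band."""
    ap = defaultdict(int)
    total = defaultdict(int)
    for state, age, ap_count, total_count in rows:
        if min_age <= age <= max_age:
            ap[state] += ap_count
            total[state] += total_count
    return {state: 100 * ap[state] / total[state] for state in total}


def category_2_candidates(rows, income):
    """States with an above-average share, in descending order of income."""
    share = ap_share_by_state(rows)
    average = sum(share.values()) / len(share)
    above = [state for state, pct in share.items() if pct > average]
    return sorted(above, key=lambda state: income[state], reverse=True)


ranked = category_2_candidates(age_rows, income)
```

The percentage is computed from summed counts rather than by averaging per-age percentages, so that ages with larger populations carry proportionally more weight within the band.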

 

 

 

 

About the Tool

 

Spotfire is a very good tool for analyzing datasets with both discrete and continuous values. It accepts many input formats and can even automatically join records from different Excel sheets. It is easy to understand and use, and very flexible in terms of changing viewpoints, colors, and sizes, marking interesting records, and so on. I found the view tip function extremely useful, because it helped me find interesting data distribution patterns that I did not expect at the beginning. The filtering function provides a dynamic view and helps me focus on the part of the data that I am interested in. The 3-D capability is useful for viewing data trends and reducing the occlusion problem, but I prefer the 2-D view with the help of filters, color-coding, etc., because I like its marking and labeling capability.

 

Because Spotfire is such a general tool, one can show the geographical locations of data by using a background picture, such as a map, but the background scaling and positioning tools are too primitive to be of real benefit.

 

Another problem that came up while playing with the data, especially when changing the axes of a graph, is that there is no way to save a particular graph setting. I can open another graph window, but there is no way to copy the same view over automatically, which would be useful for a multi-window view of the same data. A save function could be added that saves the current settings of the active window and is accessible through shortcut keys. The saved settings could then be applied to any window.
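To make the suggestion concrete, the saved setting could be little more than a small record of the axes, color, size, and filter ranges, written out and read back. The sketch below is purely hypothetical: `ViewSettings` and its fields are my own invention for illustration and do not correspond to any actual Spotfire API or file format.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import Optional


@dataclass
class ViewSettings:
    """Hypothetical description of one scatter-plot window; not a Spotfire type."""
    x_axis: str
    y_axis: str
    color_by: Optional[str] = None
    size_by: Optional[str] = None
    filters: dict = field(default_factory=dict)  # column name -> [low, high]

    def to_json(self) -> str:
        """Serialize the settings so they can be stored or copied."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ViewSettings":
        """Rebuild the settings from their serialized form."""
        return cls(**json.loads(text))


# Save the settings of the active window.
saved = ViewSettings(
    x_axis="Unemployment Rate",
    y_axis="Burglary Rate",
    color_by="State",
    filters={"Age": [60, 85]},
).to_json()

# Applying the saved settings to a second window amounts to rebuilding the object.
second_window = ViewSettings.from_json(saved)
```

Storing the settings as plain text would also make it possible to keep several named views and switch between them, which is what the shortcut keys would be for.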

 

Another very nice feature to add would be an undo/redo function.

 

Other than these particular problems, exploring the data was easy and painless. The hard part was finding the data and explaining the trends.