Hurricane Katrina: Kicking Them When They’re Down[1]

Gleneesha Johnson (gjohnson@cs.umd.edu)

Application Assignment

CMSC 838S

February 28, 2006

1      Introduction

 

Katrina, a Category 5 hurricane and the sixth-strongest Atlantic hurricane ever recorded, made landfall along the central Gulf Coast on August 29, 2005.  Katrina caused approximately $75 billion in damage along the coastlines of Alabama, Louisiana, and Mississippi, making it the costliest hurricane in United States history.  The storm broke levees separating Lake Pontchartrain from Orleans Parish, flooding 80% of the parish.  The storm killed at least 1,418 people, making it the deadliest U.S. hurricane since 1928 [1].

 

In this paper I perform exploratory data analysis (EDA) of demographic statistics on counties and parishes that were declared disaster areas as a result of Katrina (Figure 1).

 

Figure 1. Hurricane Katrina Disaster Areas [2]

 

 

2      Data Set and Information Visualization Tool

 

The data set consists of demographic data from the 2000 census on 49 counties and parishes in Alabama, Louisiana, and Mississippi affected by Katrina.  The data are available at http://www.louisiana.gov, which compiles census data on the counties and parishes declared disaster areas.

 

Each county/parish has the following 21 self-explanatory variables:

 

State

Percent male

Percent Female

Percent White

Percent Black

Percent American Indian or Alaska Native

Percent Asian

Percent 21 and over

Percent 62 and older

Average family size

Percent married-couple family with own children under 18 years old

Percent female householder with no husband present and own children under 18 years old (single mother)

Percent below poverty level

Median household income (dollars)

Percent in labor force (employed)

Percent in armed forces

Per capita income (dollars)

Percent worked outside county of residence

Percent of population 25 and over high school graduates

Percent of population 25 and over college graduates

Percent live in a house with no vehicle

 

Counties in Mississippi have an additional variable, Percent registered for assistance with the Federal Emergency Management Agency (FEMA).

 

I used the Hierarchical Clustering Explorer (HCE) 3.0 to perform the analysis.

 

 

3      Analysis

 

I used the Graphics, Ranking, and Interaction for Discovery (GRID) [3] principles as a guide to exploring the data set.  While importing the data into HCE, I performed column-by-column normalization to allow meaningful comparison of the variables, and clustered by all variables except “State.”
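
The column-by-column normalization step can be sketched in Python as follows. This is an illustration only: it assumes z-score standardization (subtract the column mean, divide by the column standard deviation), which is one common choice, and the exact method applied inside HCE is not stated here. The function name normalize_columns and the sample values are hypothetical.

```python
from statistics import mean, pstdev

def normalize_columns(rows):
    """Z-score each column of a list of equal-length numeric rows.

    Assumption: z-score standardization; HCE may use a different method.
    """
    columns = list(zip(*rows))
    normalized_columns = []
    for col in columns:
        mu = mean(col)
        sigma = pstdev(col)
        if sigma == 0:
            # A constant column has no spread; map it to zeros.
            normalized_columns.append([0.0] * len(col))
        else:
            normalized_columns.append([(v - mu) / sigma for v in col])
    # Transpose back so each inner list is one county again.
    return [list(r) for r in zip(*normalized_columns)]

# Hypothetical rows: (median household income in dollars, percent below poverty level)
sample = [[27000.0, 28.0], [35000.0, 18.0], [31000.0, 23.0]]
normalized = normalize_columns(sample)
```

After this step every variable has mean 0 and standard deviation 1, so a variable measured in dollars no longer dominates one measured in percent when distances between counties are computed.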

 

3.1     Dendrogram and Color Mosaic

 

The following figure shows the dendrogram and color mosaic representations of the data. 

 

Figure 2. Dendrogram and color mosaic

 

 

As illustrated by the rows that are almost entirely bright green, the vast majority of the counties have relatively few American Indian or Alaska Native residents and relatively few residents in the armed forces.  The row that is almost entirely bright red shows that most of the counties have a relatively high female population.

 

3.2     Profile Search

 

I used the “Profile Search” tab to perform a model-based query for dire counties.  I define a dire county as one with very low values for “Median household income”, “Per capita income”, “Percent in labor force”, “Percent of population 25 and over high school graduates”, and “Percent of population 25 and over college graduates”, and very high values for “Percent live in a house with no vehicle”, “Percent below poverty level”, and “Percent female householder with no husband present and own children under 18 years old”.  Figure 3 displays the results.

 

Figure 3. Model-based query for dire counties

 

 

Wilkinson County, Orleans Parish, and Iberville Parish are each at least 80.4% similar to my model pattern according to the Pearson correlation coefficient.  Orleans residents were already doing poorly before the hurricane, which flooded 80% of the parish.
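
A model-based query of this kind can be sketched as a Pearson correlation between each county's normalized profile and a model pattern. The sketch below is not HCE's implementation. It assumes the model is encoded as -1 for the five "very low" variables and +1 for the three "very high" variables, and that the similarity percentage reported by the tool corresponds to the correlation coefficient. The county names and profile values are hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / sqrt(var_x * var_y)

def profile_search(profiles, model, threshold):
    """Return (name, r) pairs with r >= threshold, most similar first."""
    scored = [(name, pearson(values, model)) for name, values in profiles.items()]
    hits = [(name, r) for name, r in scored if r >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

# Assumed encoding: five "very low" variables, then three "very high" variables.
model = [-1, -1, -1, -1, -1, 1, 1, 1]

# Hypothetical normalized profiles, in the same variable order as the model.
profiles = {
    "County A": [-1.5, -1.2, -0.9, -1.1, -0.8, 1.4, 1.6, 1.3],
    "County B": [0.9, 1.1, 0.4, 0.7, 1.2, -0.8, -1.0, -0.6],
    "County C": [0.1, -0.2, 0.3, -0.1, 0.2, 0.0, 0.1, -0.3],
}
matches = profile_search(profiles, model, threshold=0.804)
```

Because Pearson correlation compares the shape of a profile rather than its magnitude, a county matches the model when its low and high variables line up with the pattern, regardless of how extreme the individual values are.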

 

3.3     1D - Histograms

 

Figure 4. Histograms ordered by normality

 

Figure 4 illustrates the ranking of histograms by “Normality”.  A majority of the variables have normal distributions, with the exceptions being “Percent in armed forces”, “Percent American Indian or Alaska Native”, “Percent Female”, and consequently “Percent male”.  This follows from the insight gained from the color mosaic, where the values for these variables are mostly at one extreme or the other.

 

Figure 5. Histograms ordered by size of biggest gap

 

When the histograms were ordered according to the size of the biggest gap (Figure 5), “Percent live in a house with no vehicle” ranked the highest.  The extreme value at the far right of the gap is Orleans Parish.  The other counties identified as dire are directly to the left of the gap.
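
The idea behind the gap ranking can be sketched as finding the widest empty interval between neighboring values of a variable. This sketch works on the sorted raw values and scales the gap by the variable's range; HCE's own criterion may be computed differently (for example, on histogram bins), so treat the definition here as an assumption. The function name biggest_gap and the sample values are hypothetical.

```python
def biggest_gap(values):
    """Return (gap_fraction, low, high) for the widest gap between sorted neighbors.

    gap_fraction is the gap width divided by the full range of the values.
    Returns (0.0, None, None) when there are fewer than two distinct values.
    """
    ordered = sorted(values)
    if len(ordered) < 2 or ordered[0] == ordered[-1]:
        return 0.0, None, None
    span = ordered[-1] - ordered[0]
    # Compare each value with its successor and keep the widest interval.
    gap, low, high = max((b - a, a, b) for a, b in zip(ordered, ordered[1:]))
    return gap / span, low, high

# Hypothetical "Percent live in a house with no vehicle" values with one outlier.
no_vehicle = [6.0, 7.5, 8.0, 9.5, 11.0, 12.0, 27.0]
fraction, low, high = biggest_gap(no_vehicle)
```

A large gap fraction with a single value on the far side, as in this sample, is the signature of an outlier such as the one described above.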

 

3.4     2D - Scatterplots

 

Figure 6 shows scatterplots ordered by correlation coefficient.  The figure highlights the positive correlation among the variables “Percent Black”, “Percent female householder with no husband present and own children under 18 years old”, “Percent below poverty level”, and “Percent live in a house with no vehicle”.  Orleans Parish has the highest value for all of these variables except “Percent below poverty level”, for which it has the third-highest value (Figure 7).  This indicates that Orleans Parish had relatively high proportions of poor residents, Black residents, single mothers, and residents with no vehicle.  The highest positive correlation, 0.861, is between “Percent female householder with no husband present and own children under 18 years old” and “Percent Black”.  This suggests that there were many single Black mothers in the counties and parishes devastated by Katrina, perhaps because of the high female population.  Figure 6 also illustrates trivially strong relationships between pairs of variables such as “Per capita income” and “Median household income”, and “Percent of population 25 and over college graduates” and “Percent in labor force”.
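
Ordering scatterplots by correlation coefficient amounts to scoring every pair of variables and sorting the pairs. The following is a minimal sketch of that ranking, not HCE's code; the function names and all values are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / sqrt(var_x * var_y)

def rank_pairs(columns):
    """Score every pair of variables; return (r, name_a, name_b), highest r first."""
    scored = [
        (pearson(columns[a], columns[b]), a, b)
        for a, b in combinations(columns, 2)
    ]
    return sorted(scored, reverse=True)

# Hypothetical values for five counties.
columns = {
    "Percent Black": [10.0, 25.0, 40.0, 55.0, 67.0],
    "Percent below poverty level": [12.0, 18.0, 22.0, 30.0, 28.0],
    "Median household income": [42000.0, 35000.0, 30000.0, 24000.0, 27000.0],
}
ranked = rank_pairs(columns)
```

With 20 numeric variables there are 190 pairs, which is why an automatic ranking is useful for deciding which scatterplots to inspect first.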

 

Figure 6. Scatterplots ordered by correlation coefficient

 

 

Figure 7. Orleans parish represented by triangle in the scatterplots

 

The FEMA website only had statistics on people who registered for assistance in Mississippi, so I created an additional data set consisting exclusively of Mississippi counties.  In addition to the previously mentioned variables, this data set has “Percent registered for assistance”.  Figure 8 illustrates a very strong negative correlation (-0.987) between “Percent registered for assistance” and “Percent 62 and older”.  This indicates that few elderly hurricane victims are registering for assistance.  The fact that many of the victims killed by the hurricane were elderly [4] could be a contributing factor.  There is a somewhat strong positive correlation between “Percent registered for assistance” and both “Percent married-couple family with own children under 18 years old” and “Median household income”.  Surprisingly, there is only a very weak negative correlation between “Percent registered for assistance” and “Percent below poverty level”, “Percent live in a house with no vehicle”, and “Percent female householder with no husband present and own children under 18 years old”.  This indicates that the people who need assistance the most are not registering for it as much as those who might be considered more fortunate.

 

Figure 8. Mississippi counties only scatterplots ordered by correlation coefficient

 

 

4      Tool Critique

 

HCE greatly facilitates a systematic approach to exploratory data analysis, but some aspects of it could be improved.

 

4.1     Functionality

 

It would be nice if users could edit data and filter it by categorical values in the “Load + Filter + Transfer” box.  If this were possible, I wouldn’t have had to create a new input file when I wanted to analyze only the counties in Mississippi.
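
The workaround of creating a separate input file can be scripted. The sketch below filters a tab-delimited file by a categorical column. It assumes a plain tab-delimited layout with a single header row and a column named "State"; the actual HCE input format may include additional header information that this sketch does not handle. The function name filter_rows and the sample rows are hypothetical.

```python
import csv
import io

def filter_rows(source, column, wanted, delimiter="\t"):
    """Return delimited text holding the header and only the rows where column == wanted."""
    reader = csv.DictReader(source, delimiter=delimiter)
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        if row[column] == wanted:
            writer.writerow(row)
    return out.getvalue()

# Hypothetical input; a real run would pass an open file object instead.
sample = io.StringIO(
    "County\tState\tPercent 62 and older\n"
    "County A\tMS\t15.0\n"
    "County B\tLA\t14.0\n"
    "County C\tMS\t13.5\n"
)
mississippi_only = filter_rows(sample, "State", "MS")
```

The filtered text can then be written to a new file and loaded into the tool as a separate data set.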

 

I planned on tracking the results of my model-based query from the “Profile Search” tab in the other visualization tabs.  When I saw something interesting in another tab (e.g., an outlier in a scatterplot), I had to select it to see the details of the data point.  This caused the results of my query to lose focus.  This was frustrating because, although I wanted to investigate the interesting item further, I was ultimately interested in tracking the results of my query.  Upon returning to the “Profile Search” tab, my model was no longer there and I had to re-enter it.  This can become quite time-consuming if a user sees several secondarily interesting things they would like to investigate.  This problem could be eliminated by allowing users to save a model pattern on the “Profile Search” tab, or by enabling mouseover tooltips that identify a data item without it being selected.

 

4.2     Visualization

 

HCE 3.0 leaves a large gap between the color mosaic and the dendrogram if the color mosaic doesn’t fill the entire area designated for it (Figure 9).  Although it is not intuitive, the dendrogram can be dragged closer to the color mosaic.  It would be nice if this positioning were automatic, because it would make it easier to line up the variables in the dendrogram with their corresponding rows in the color mosaic.

 

Figure 9. Dendrogram and color mosaic

 

4.3     Bugs

 

Figure 10. Bug in Order by box

 

Clicking in the “Order by” box (Figure 10) toggles the “Use Original Values” checkbox even though I did not click on the checkbox itself.

 

I was unable to select and view multiple scatterplots simultaneously by clicking on them with the “control” key pressed.

 

 

 

References

 

  1. Hurricane Katrina. Wikipedia. http://en.wikipedia.org/wiki/Hurricane_katrina (accessed 17 February 2006).
  2. Louisiana.gov. http://www.louisiana.gov/wps/portal/.cmd/cs/.ce/155/.s/3329/_s.155/3313 (accessed 17 February 2006).
  3. Jinwook Seo and Ben Shneiderman. A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data. Information Visualization, 4, 2 (June 2005), 99-113. (HCIL-2004-31).
  4. The Charlotte Observer. http://www.miami.com/mld/charlotte/news/13513079.htm?source=rss&channel=charlotte_news (accessed 17 February 2006).

 



[1] This title refers to Hurricane Katrina and does not have any political undertones.