Hurricane Katrina: Kicking Them When They’re Down[1]
Gleneesha Johnson (gjohnson@cs.umd.edu)
Application Assignment
CMSC 838S
February 28, 2006
Katrina, a Category 5 hurricane that was the sixth-strongest Atlantic hurricane ever recorded, made landfall along the central Gulf Coast on August 29, 2005. Katrina caused approximately $75 billion in damages along the coastlines of
In this paper I perform exploratory data analysis (EDA) of demographic statistics on counties and parishes that were declared disaster areas as a result of Katrina (Figure 1).

Figure 1. Hurricane Katrina Disaster Areas [2]
The data set consists of demographic data from the 2000 census on 49 counties and parishes in
Each county/parish has the following 21 self-explanatory variables:
State | Percent male
Percent Female | Percent White
Percent Black | Percent American Indian or Alaska Native
Percent Asian | Percent 21 and over
Percent 62 and older | Average family size
Percent married-couple family with own children under 18 years old (single mother) | Percent below poverty level
Median household income (dollars) | Percent in labor force (employed)
Percent in armed forces | Per capita income (dollars)
Percent worked outside county of residence | Percent of population 25 and over high school graduates
Percent of population 25 and over college graduates | Percent live in a house with no vehicle
Counties in
I used the Hierarchical Clustering Explorer (HCE) 3.0 to perform the analysis.
I used the Graphics, Ranking, and Interaction for Discovery (GRID) [3] principles as a guide to exploring the data set. While importing the data into HCE, I performed column-by-column normalization to allow meaningful comparison of the variables, and clustered by all variables except “State.”
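Column-by-column normalization can be sketched as follows. This is a minimal illustration, not HCE's implementation: it assumes z-score standardization (HCE offers its own normalization options, and the one used here is not specified), and the function name and sample values are hypothetical.

```python
from statistics import mean, pstdev


def normalize_columns(rows):
    """Z-score each column: subtract the column mean, divide by the column
    standard deviation. Assumed normalization, chosen for illustration."""
    columns = list(zip(*rows))
    normalized_columns = []
    for col in columns:
        mu = mean(col)
        sigma = pstdev(col)
        # A constant column has no spread; map it to all zeros.
        normalized_columns.append(
            [0.0 if sigma == 0 else (v - mu) / sigma for v in col]
        )
    # Transpose back so each inner list is one county again.
    return [list(r) for r in zip(*normalized_columns)]


# Hypothetical values: (median household income in dollars,
# percent below poverty level) for three made-up counties.
rows = [
    [30000.0, 20.0],
    [40000.0, 15.0],
    [50000.0, 10.0],
]
normalized = normalize_columns(rows)
```

After normalization, income in dollars and poverty in percent sit on the same scale, so neither variable dominates the distances used for clustering.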
The following figure shows the dendrogram and color mosaic representations of the data.

Figure 2. Dendrogram and color mosaic
As illustrated by the rows that are almost entirely bright green, the vast majority of the counties have relatively few American Indian or Alaska Native citizens and relatively few citizens in the armed forces. The row that is almost entirely bright red shows that most of the counties have a relatively high female population.
I used the “Profile Search” tab to perform a model-based query for very dire counties. I define a dire county as one with very low values for “Median household income”, “Per capita income”, “Percent in labor force”, “Percent of population 25 and over high school graduates”, and “Percent of population 25 and over college graduates”, and very high values for “Percent live in a house with no vehicle”, “Percent below poverty level”, and “Percent female householder with no husband present and own children under 18 years old”. Figure 3 displays the results.

Figure 3. Model-based query for dire counties
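The idea behind a model-based query like the one in Figure 3 can be sketched as ranking counties by their distance to a target profile. The sketch below is not HCE's code: it assumes Euclidean distance on values scaled to the range 0 to 1, and the county names, variable keys, and values are all hypothetical.

```python
import math


def rank_by_profile(items, target):
    """Return item names sorted by Euclidean distance to the target profile,
    closest first. Only the variables named in the target are compared."""
    def distance(values):
        return math.sqrt(sum((values[k] - target[k]) ** 2 for k in target))
    return sorted(items, key=lambda name: distance(items[name]))


# Target "dire" profile (assumed scale: 0 = lowest value, 1 = highest value).
target = {"median_income": 0.0, "below_poverty": 1.0, "no_vehicle": 1.0}

# Hypothetical counties with made-up scaled values.
counties = {
    "County A": {"median_income": 0.1, "below_poverty": 0.9, "no_vehicle": 0.8},
    "County B": {"median_income": 0.8, "below_poverty": 0.2, "no_vehicle": 0.1},
    "County C": {"median_income": 0.4, "below_poverty": 0.6, "no_vehicle": 0.5},
}
ranking = rank_by_profile(counties, target)
```

The counties at the front of `ranking` are the ones whose values most closely match the drawn model.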
Wilkinson county,

Figure 4. Histograms ordered by normality
Figure 4 illustrates the ranking of histograms by “Normality”. A majority of the variables have normal distributions, with the exceptions being “Percent in armed forces”, “Percent American Indian or Alaska Native”, “Percent Female”, and consequently “Percent male”. This follows from the insight gained from the color mosaic, where the values for these variables are mostly at one extreme or the other.
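Ranking variables by how close their distributions are to normal can be sketched with a moment-based score. The score below, |skewness| + |kurtosis - 3|, is an assumption made for illustration; this paper does not state which normality measure HCE computes. The variable names and sample values are hypothetical.

```python
from statistics import mean


def normality_score(values):
    """Assumed score: |skewness| + |kurtosis - 3|. A normal distribution has
    skewness 0 and kurtosis 3, so lower scores mean closer to normal."""
    n = len(values)
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n  # second central moment
    if m2 == 0:
        return float("inf")  # constant column: no meaningful shape
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return abs(skewness) + abs(kurtosis - 3.0)


# Hypothetical columns: one roughly bell-shaped, one piled up at an extreme.
columns = {
    "bell_shaped": [1, 2, 2, 3, 3, 3, 4, 4, 5],
    "one_extreme": [0, 0, 0, 0, 0, 0, 0, 0, 10],
}
by_normality = sorted(columns, key=lambda name: normality_score(columns[name]))
```

A column whose values sit mostly at one extreme, like the armed forces variable described above, gets a large score and lands at the far end of the ordering.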

Figure 5. Histograms ordered by size of biggest gap
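The ordering in Figure 5 depends on the biggest gap within each variable. One way to compute such a gap, assuming it means the largest difference between adjacent values once the column is sorted (HCE's exact definition is not given here), is sketched below with hypothetical values.

```python
def biggest_gap(values):
    """Return (gap_size, low, high) for the largest difference between
    adjacent values in sorted order. Assumed definition of 'biggest gap'."""
    if len(values) < 2:
        raise ValueError("need at least two values to have a gap")
    ordered = sorted(values)
    gaps = [(high - low, low, high) for low, high in zip(ordered, ordered[1:])]
    # max() compares gap size first, which is what we want to rank by.
    return max(gaps)


# Hypothetical percentages for one variable across five made-up counties.
no_vehicle = [4.0, 3.5, 21.0, 2.0, 5.5]
gap_size, gap_low, gap_high = biggest_gap(no_vehicle)
```

A large gap with a single value on its far side is a quick way to spot an outlier in that variable.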
When the histograms were ordered according to the size of the biggest gap (Figure 5), “Percent living in a house with no vehicle” ranked the highest. The extreme value seen at the far right of the gap is the
Figure 6 shows scatterplots ordered by correlation coefficient. The figure highlights the positive correlations among the variables “Percent Black”, “Percent female householder with no husband present and own children under 18 years old”, “Percent below poverty level”, and “Percent live in a house with no vehicle”. The

Figure 6. Scatterplots ordered by correlation coefficient
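Ordering variable pairs by correlation coefficient, as in Figure 6, can be sketched with the Pearson coefficient. This is an illustrative sketch rather than HCE's code; the variable keys and values are hypothetical and are not taken from the census data.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rank_pairs(table):
    """Return (r, name_a, name_b) for every pair of columns,
    ordered from most positive to most negative correlation."""
    pairs = [
        (pearson(table[a], table[b]), a, b) for a, b in combinations(table, 2)
    ]
    return sorted(pairs, reverse=True)


# Hypothetical columns for four made-up counties.
table = {
    "percent_black": [10.0, 20.0, 30.0, 40.0],
    "below_poverty": [12.0, 18.0, 33.0, 37.0],
    "median_income": [50.0, 42.0, 35.0, 30.0],  # thousands of dollars
}
ranked = rank_pairs(table)
```

Pairs at the front of `ranked` rise together, while pairs at the end move in opposite directions.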

Figure 7.
The FEMA website only had statistics on people who registered for assistance in

Figure 8.
HCE greatly facilitates a systematic approach to exploratory data analysis, but some aspects of it could be improved.
It would be nice if users could edit data and filter it by categorical values in the “Load + Filter + Transfer” box. If this were possible, I wouldn’t have had to create a new input file when I wanted to analyze only the counties in
I planned on tracking the results of my model-based query from the “Profile Search” tab in the other visualization tabs. When I saw something interesting in another tab (e.g., an outlier in a scatterplot), I had to select it to see the details of the data point. This caused the results of my query to lose focus. This was frustrating because, although I wanted to investigate the interesting item further, I was ultimately interested in tracking the results of my query. Upon returning to the “Profile Search” tab, my model was no longer there and I had to re-enter it. This can become quite time-consuming if a user sees several secondarily interesting things they would like to investigate. This problem can be eliminated by allowing users to save a model pattern on the “Profile Search” tab, or by enabling mouseover tooltips that identify a data item without it being selected.
HCE 3.0 leaves a big gap between the color mosaic and the dendrogram if the color mosaic doesn’t fill the entire area designated for it. Although it is not intuitive, the dendrogram can be dragged closer to the color mosaic. It would be nice if this positioning were automatic because it would make it easier to line up the variables in the dendrogram with their corresponding rows in the color mosaic.

Figure 9. Dendrogram and color mosaic

Figure 10. Bug in Order by box
Clicking in the “Order by” box (Figure 10) causes the “Use Original Values” checkbox to toggle even when I don’t click in it.
I was unable to select and view multiple scatterplots simultaneously by clicking on them with the “control” key pressed.