03/10/2006
CMSC 838S: Information Visualization
Spring 2006
Application Project Report
Neeti Ogale
Visualization of human development data by using HCE
Diseases like Malaria, Tuberculosis, and AIDS exist in developing as well as developed countries. In this application project, I tried to find the correlation between the prevalence of these diseases and various factors such as literacy rate of the population, medical facilities available to the people, and public health expenditure of the country.
In the three studies given below, I look for correlations between various factors and AIDS (study 1), malaria and tuberculosis (study 2), and general life expectancy data (study 3). In all cases, the countries fall into 3 or 4 groups when clustered with the hierarchical clustering algorithm, using the Pearson correlation coefficient as the similarity measure. These groups roughly correspond to the developed and less developed countries.
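The clustering step described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not HCE's implementation: the average-linkage rule, the use of 1 - r as the distance, the function names, and the toy profiles are all assumptions made for the example, and none of the numbers come from the datasets used in this report.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.
    Assumes neither sequence is constant (otherwise this divides by zero)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster_distance(c1, c2, rows):
    """Average-linkage distance between two clusters of row indices,
    using 1 - r as the dissimilarity (an assumed choice)."""
    total = sum(1 - pearson(rows[i], rows[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def hierarchical_clusters(rows, k):
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = cluster_distance(clusters[a], clusters[b], rows)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical normalized country profiles (one row per country).
profiles = [
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.2],
    [0.2, 0.1, 0.9],
    [0.1, 0.3, 0.8],
]
groups = hierarchical_clusters(profiles, 2)
print(groups)
```

Rows whose profiles rise and fall together have a correlation close to 1 and therefore a distance close to 0, so they are merged first.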
STUDY 1: Prevalence of AIDS: relationship with Adult literacy rate (%), Public health expenditure (as % of GDP) and HDI (human development index) for 131 countries for the year 2002.
Summary of Results:
· HIV prevalence is low in countries having high literacy rate.
· There is a weak correlation between HIV prevalence and public health expenditure.
· HDI values are high for the countries having low HIV prevalence.
Figure columns: HDI value, Adult literacy, Health expenditure, HIV prevalence

The countries are clustered into three groups by the hierarchical clustering algorithm. Some countries in the first cluster are
The strong correlation between HIV prevalence and adult literacy rate can be clearly seen in the diagram below, which shows k-means clustering of the columns.

In the above figure, the columns are grouped into 3 clusters by the k-means algorithm. Adult literacy rate is grouped with HIV prevalence, which indicates that, among all the factors, adult literacy rate is the one most closely related to HIV prevalence.
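For readers unfamiliar with k-means, the idea of grouping columns can be sketched as follows. This is a plain textbook k-means written for illustration, not the routine HCE uses: the squared Euclidean distance, the choice of the first k columns as starting centroids, the column labels A to D, and the values are all assumptions, and the data is made up.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(vals) / n for vals in zip(*points)]

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_point(members)
    return labels

# Hypothetical standardized columns; each vector holds one indicator's
# values across four countries.
columns = {
    "A": [1.0, 0.9, -1.0, -0.9],
    "B": [-1.0, -0.8, 0.9, 1.0],
    "C": [0.9, 1.0, -0.8, -1.0],
    "D": [-0.9, -1.0, 1.0, 0.8],
}
labels = kmeans(list(columns.values()), 2)
print(dict(zip(columns, labels)))
```

Columns that receive the same label vary in a similar way across countries. With a plain Euclidean distance, two strongly but inversely related columns would land far apart, so the grouping depends on the distance measure chosen.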
The strong negative correlation between HIV prevalence and adult literacy rate (low prevalence where literacy is high) can also be seen in the following profile diagram:

In this figure, all the countries in the second group, which have low HIV prevalence, are selected in the upper graph, and thus only the profiles of those countries are shown in the lower graph. We can see that these countries have a high literacy rate and low HIV prevalence, while their health expenditures vary but lie in the lower half of the range (i.e. they are relatively low).
In general, it is seen that not many countries have an extremely high prevalence of HIV. This can be seen in the following histogram:

STUDY 2: Prevalence of Malaria and Tuberculosis in the year 2002. Dataset includes figures for 157 countries for (a) Prevalence of malaria and tuberculosis, (b) Population with access to improved water sources and improved sanitation (%), (c) Physicians per 10000 people, (d) Immunization of 1-year olds against Tuberculosis (%).
Summary of results:
· As expected, there is a high negative correlation between prevalence of tuberculosis and immunization against it.
· Prevalence of tuberculosis is also related to the availability of improved water and sanitation, but the correlation is not high.
· Prevalence of malaria is also weakly negatively correlated with availability of improved water and sanitation.
Figure columns: Improved water, Improved sanitation, Physicians, Malaria cases, TB cases, Immunization (%)

Hierarchical clustering divides the countries into four clusters. In the first three clusters, the prevalence of tuberculosis is low, while it is high in the last cluster. Prevalence of malaria is also low in the first three clusters and relatively high in the last cluster, but the malaria data has many missing values, which are shown in grey. Some countries in the first group are –
From this analysis, it appears that some factors not considered here would be highly correlated with the prevalence of these diseases.
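Because the malaria column has many missing values, any correlation involving it can only use the countries for which both values are present. A minimal sketch of such a pairwise-complete Pearson correlation is given below. It is an assumption about how one might handle the gaps, not a description of what HCE does internally; the function name and the numbers are hypothetical.

```python
from math import sqrt

def pearson_complete(x, y):
    """Pearson r over the positions where neither value is missing (None).
    Returns None when fewer than two complete pairs remain or when one
    of the two variables is constant over the complete pairs."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    sx = sqrt(sum((a - mx) ** 2 for a, _ in pairs))
    sy = sqrt(sum((b - my) ** 2 for _, b in pairs))
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)

# Hypothetical values for five countries; None marks a missing entry.
improved_water = [95, 80, 60, 40, 30]
malaria_cases = [None, 10, None, 300, 500]
r = pearson_complete(improved_water, malaria_cases)
print(r)
```

With many gaps, the coefficient rests on only a few countries, which is one reason to treat the malaria correlations with caution.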
The following figure shows the histogram of tuberculosis cases in 2003. It can be seen that there is a broad distribution in tuberculosis prevalence among countries.

STUDY 3: Influence of various factors on Life expectancy and under-5 mortality.
Data used includes:
(a) Life expectancy at birth (years).
(b) Life expectancy index.
(c) Public health expenditure (% of GDP).
(d) One year olds fully immunized against Tuberculosis (%).
(e) One year olds fully immunized against Measles (%).
(f) Births attended by skilled health personnel (%).
(g) Physicians per 100000 people.
(h) Under 5 mortality rate.
Summary of Results:
· Life expectancy is positively correlated with births attended by skilled health personnel, public health expenditure, physicians per 100000 people, and immunizations.
· Under 5 mortality is highly negatively correlated with immunizations and births attended by skilled personnel. It is also negatively correlated with public health expenditure.
Figure columns: Life expectancy at birth, Life expectancy index, Public health expenditure, 1-yr olds immunized against TB, 1-yr olds immunized against Measles, Births attended by skilled health personnel, Physicians per 100000 people, Under 5 mortality rate

Hierarchical clustering groups the countries into three clusters. The first group has high life expectancy, a low mortality rate for ages under five, a high percentage of births attended by skilled health personnel, and a high number of physicians per 100000 people. Some countries in this group are –
The scatter plot below shows the relation between life expectancy and physicians per 100000 people. In this plot, the life expectancy is seen to increase as the number of physicians per 100000 increases.

The figure below shows the scatterplot of public health expenditure versus under five mortality rate. It can be seen that there is a roughly linear relationship between the two with a negative slope. When public health expenditure decreases, the under five mortality rate increases. There are a few outliers that have high mortality rate and relatively high public health expenditure.
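The roughly linear relationship with a negative slope can be made concrete with an ordinary least-squares line. The sketch below is an illustration only: HCE shows the scatterplot and I did not fit a line in this study, so the function name and the sample points are hypothetical and do not come from the dataset.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept.
    Assumes x is not constant (otherwise this divides by zero)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical points: public health expenditure (% of GDP) against
# under five mortality rate.
expenditure = [1.0, 2.0, 3.0, 4.0, 5.0]
mortality = [200.0, 150.0, 110.0, 60.0, 30.0]
slope, intercept = linear_fit(expenditure, mortality)
print(slope, intercept)
```

A negative slope means that higher expenditure goes together with lower mortality in the sample. Outliers such as the ones mentioned above can pull the fitted line noticeably, so it is worth inspecting them separately.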

References and credits:
Datasets
The datasets used in this study were obtained from the Human Development Report Office’s website (http://hdr.undp.org/statistics/data/advanced.cfm).
Clustering Software Package
I used the software Hierarchical Clustering Explorer (HCE) 3.5 available for download at http://www.cs.umd.edu/hcil/hce/.
HCE is a versatile software tool that allows the user to cluster and visualize high-dimensional data. I found it useful to see the relations (by comparing the colors) and the clusters at the same time. Its ability to import data from standard formats like Excel made it easy to load the data that I obtained from the Human Development Report database (which exports in Excel format).
Original version submitted on: 02/28/2006
Current version (with minor revisions) submitted: 03/10/2006
My email: neeti.ogale@gmail.com