03/10/2006

CMSC 838S: Information Visualization

Spring 2006

Application Project Report

 

Neeti Ogale

 

Visualization of human development data by using HCE

 

Diseases like Malaria, Tuberculosis, and AIDS exist in developing as well as developed countries. In this application project, I tried to find the correlation between the prevalence of these diseases and various factors such as literacy rate of the population, medical facilities available to the people, and public health expenditure of the country.

 

In the three studies given below, I try to find the correlation of AIDS (study 1), malaria and tuberculosis (study 2), and general life expectancy data (study 3) with various factors. In all cases, the countries cluster into 3 or 4 groups under a hierarchical clustering algorithm that uses the Pearson correlation coefficient. In the clustering results, the groups roughly correspond to the developed and less developed countries.
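The clustering setup described above can be sketched in a few lines of Python. This is only an illustration, not HCE's implementation: the average-linkage rule, the distance 1 - r, the function names, and the four made-up country profiles (A to D, with values assumed to be already scaled to the 0..1 range) are all assumptions.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster(rows, k):
    """Average-linkage agglomerative clustering with distance 1 - r.

    rows maps a name to its list of values; returns k sets of names.
    """
    clusters = [{name} for name in rows]

    def dist(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        pairs = [(p, q) for p in c1 for q in c2]
        return sum(1 - pearson(rows[p], rows[q]) for p, q in pairs) / len(pairs)

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda ab: dist(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters

# Made-up profiles: [HDI, literacy, health expenditure, HIV prevalence],
# hypothetical values that are NOT taken from the HDR dataset.
data = {
    "A": [0.90, 0.95, 0.80, 0.05],
    "B": [0.92, 0.99, 0.70, 0.02],
    "C": [0.40, 0.50, 0.20, 0.90],
    "D": [0.35, 0.45, 0.25, 0.80],
}
groups = cluster(data, 2)
print(sorted(sorted(g) for g in groups))
```

Because the distance is based on correlation, two countries end up close together when their profiles have the same shape across the columns, even if the absolute values differ.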

 

 

STUDY 1: Prevalence of AIDS: relationship with Adult literacy rate (%), Public health expenditure (as % of GDP) and HDI (human development index) for 131 countries for the year 2002.

 

Summary of Results:

·         HIV prevalence is low in countries having high literacy rate.

·         There is a weak correlation between HIV prevalence and public health expenditure.

·         HDI values are high for the countries having low HIV prevalence.

 

 

[Figure: HCE hierarchical clustering display for Study 1. Columns: HDI value, Adult literacy, Health expenditure, HIV prevalence.]

 

 

 

The countries are clustered into three groups by the hierarchical clustering algorithm. Some countries in the first cluster are USA, France, Canada and Austria. These countries have very low AIDS prevalence, very high health expenditures, and high adult literacy rates. Some countries are shown in grey for the literacy rate because of missing data. Some countries in the second cluster are China, Sri Lanka, Mexico and Singapore. These countries have relatively low health expenditures but high adult literacy rates and low AIDS prevalence. Countries in the third cluster, like Ghana, Zambia, Zimbabwe and Sudan, have lower literacy rates and health expenditures and relatively high AIDS prevalence.

 

The high correlation between HIV prevalence and adult literacy rate can be clearly seen from the diagram below showing k-means clustering of the columns.

 

 

In the above figure, the columns are grouped into 3 clusters by the k-means algorithm. Adult literacy rate is grouped with HIV prevalence, which indicates that, among all the factors, adult literacy rate is the one most closely related to HIV prevalence.

 

The strong relationship between HIV prevalence and adult literacy rate (low prevalence where literacy is high) can also be seen in the following profile diagram:

 

In this figure, all the countries in the second group having low HIV prevalence are selected in the upper graph, and thus only the profiles of those countries are shown in the lower graph. We can see that these countries have high literacy rates and low HIV prevalence, while their health expenditures vary but stay in the lower half of the range (i.e. they are relatively low).
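The selection step in the profile view can be mimicked with a short script. This is a rough sketch rather than a description of how HCE works internally: the min-max scaling, the 0.5 cut-off, and the four made-up countries (P to S) are assumptions chosen only for illustration.

```python
# Made-up rows: (country, adult literacy %, health expenditure % of GDP,
# HIV prevalence %). Hypothetical values, NOT taken from the HDR dataset.
rows = [
    ("P", 97.0, 2.1, 0.1),
    ("Q", 91.0, 2.8, 0.3),
    ("R", 68.0, 1.9, 21.5),
    ("S", 99.0, 6.9, 0.2),
]

def normalize(values):
    """Rescale values linearly to the 0..1 range (min-max scaling)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

names = [r[0] for r in rows]
# Normalize each column so that all profiles share one vertical axis.
columns = [normalize([r[i] for r in rows]) for i in (1, 2, 3)]
profiles = {name: [col[k] for col in columns] for k, name in enumerate(names)}

# Keep only the profiles whose normalized HIV prevalence is in the lower half.
selected = {n: p for n, p in profiles.items() if p[2] < 0.5}
for name, (lit, health, hiv) in selected.items():
    print(f"{name}: literacy={lit:.2f} health={health:.2f} hiv={hiv:.2f}")
```

Each printed line is one profile: the three normalized values that would be joined by a line in the lower graph.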

 

In general, it is seen that not many countries have an extremely high prevalence of HIV. This can be seen in the following histogram:
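A histogram of this kind is just a count of values per equal-width bin. The sketch below shows the idea on made-up prevalence values (not the HDR figures); the bin count of 4 and the function name are arbitrary assumptions.

```python
def histogram(values, bins):
    """Count values in `bins` equal-width intervals spanning min..max.

    Assumes the values are not all equal (otherwise the bin width is zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value falls on the right edge; put it in the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts

# Made-up prevalence values (%), chosen only to show a right-skewed shape.
prevalence = [0.1, 0.1, 0.2, 0.3, 0.5, 0.7, 1.2, 2.4, 6.5, 20.1]
counts = histogram(prevalence, 4)
for i, c in enumerate(counts):
    print(f"bin {i}: {'#' * c}")
```

With skewed data like this, most of the values pile up in the first bin and only a few fall in the higher ones, which is the shape described above.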

 

 

 

STUDY 2: Prevalence of Malaria and Tuberculosis in the year 2002. The dataset includes figures for 157 countries for (a) prevalence of malaria and tuberculosis, (b) population with access to improved water sources and improved sanitation (%), (c) physicians per 10000 people, and (d) immunization of 1-year-olds against tuberculosis (%).

 

Summary of results:

·         As expected, there is a high negative correlation between prevalence of tuberculosis and immunization against it.

·         Prevalence of tuberculosis is also related to the availability of improved water and sanitation, but the correlation is not strong.

·         Prevalence of malaria is also weakly negatively correlated with availability of improved water and sanitation.

[Figure: HCE hierarchical clustering display for Study 2. Columns: Improved water, Improved sanitation, Physicians, Malaria cases, TB cases, Immunization (%).]

 

 

 

 

 

Hierarchical clustering divides the countries into four clusters. In the first three clusters, the prevalence of tuberculosis is low, while it is high in the last cluster. Prevalence of malaria is also low in the first three clusters and relatively high in the last cluster, but this column has many missing values, which are shown in grey. Some countries in the first group are Turkey, Egypt, Brazil, Russia, Oman, Qatar, Jamaica, Peru and Thailand. These countries have high immunization rates, good access to water and sanitation, and low prevalence of tuberculosis and malaria. Some countries in the second group are USA, Canada, Spain, Italy, Sweden and New Zealand. These countries have very high public expenditure on health and low prevalence of tuberculosis and malaria. Some countries in the third group are France, Argentina, Greece, Ireland and Poland. These countries also have high public expenditure on health and low prevalence of tuberculosis and malaria. Some countries in the fourth group are Kenya, Indonesia, Cambodia, Namibia, and Nigeria. These countries show high prevalence of tuberculosis, low availability of improved water and sanitation, and low public expenditure on health.

 

This analysis suggests that there are factors not considered here that are likely to be highly correlated with the prevalence of these diseases.

 

The following figure shows the histogram of tuberculosis cases in 2003. It can be seen that there is a broad distribution in tuberculosis prevalence among countries.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

STUDY 3: Influence of various factors on life expectancy and under-5 mortality.

 

Data used includes:

(a) Life expectancy at birth (years).

(b) Life expectancy index.

(c) Public health expenditure (% of GDP).

(d) One-year-olds fully immunized against Tuberculosis (%).

(e) One-year-olds fully immunized against Measles (%).

(f) Births attended by skilled health personnel (%).

(g) Physicians per 100000 people.

(h) Under 5 mortality rate.

 

 

Summary of Results:

·         Life expectancy is positively correlated with births attended by skilled health personnel, public health expenditure, physicians per 100000 people, and immunizations.

·         Under-5 mortality is highly negatively correlated with immunizations and births attended by skilled personnel. It is also negatively correlated with public health expenditure.

 

 

[Figure: HCE hierarchical clustering display for Study 3. Columns: Life expectancy at birth, Life expectancy index, Public health expenditure, 1-yr olds immunized against TB, 1-yr olds immunized against Measles, Births attended by skilled health personnel, Physicians per 100000 people, Under 5 mortality rate.]

 

 

 

 

 

Hierarchical clustering groups the countries into three clusters. The first group has high life expectancy, a low under-five mortality rate, a high percentage of births attended by skilled health personnel, and a high number of physicians per 100000 people. Some countries in this group are USA, Canada, Greece, France, Switzerland and Japan. The missing values for these countries are shown in grey. The third group has relatively low life expectancy, a high under-five mortality rate, low public health expenditure, a low percentage of births attended by skilled health personnel, and a low number of physicians per 100000 people. Some countries in this group are India, Pakistan, Nepal, Bangladesh and Indonesia. The second (middle) group has life expectancy indices lower than those of the first group and higher than those of the third group. The values for the remaining parameters also lie between those of the first and third groups. Some countries in this group are Sri Lanka, Egypt, Malaysia, Singapore and Mexico.

 

 

The scatter plot below shows the relation between life expectancy and physicians per 100000 people. In this plot, the life expectancy is seen to increase as the number of physicians per 100000 increases.

 

 

 

 

 

The figure below shows the scatterplot of public health expenditure versus under five mortality rate. It can be seen that there is a roughly linear relationship between the two with a negative slope. When public health expenditure decreases, the under five mortality rate increases. There are a few outliers that have high mortality rate and relatively high public health expenditure.
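The "roughly linear relationship with a negative slope" can be made concrete with an ordinary least-squares fit. The sketch below is only an illustration on made-up points: the values, the variable names, and the `fit_line` helper are assumptions and do not come from the HDR data or from HCE.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Made-up points: public health expenditure (% of GDP) vs under-5 mortality.
expenditure = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mortality = [150.0, 120.0, 95.0, 60.0, 35.0, 10.0]
slope, intercept = fit_line(expenditure, mortality)

# Residuals show how far each point lies from the fitted line; a large
# positive residual marks a point with higher mortality than the line predicts.
residuals = [y - (slope * x + intercept) for x, y in zip(expenditure, mortality)]
print(f"slope = {slope:.1f}, intercept = {intercept:.1f}")
```

A negative slope corresponds to the trend described above, and countries with high mortality despite relatively high expenditure would show up as large positive residuals.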

 

References and credits:

 

Datasets

 

The datasets used in this study were obtained from the Human Development Report Office’s website (http://hdr.undp.org/statistics/data/advanced.cfm).

 

Clustering Software Package

 

I used the software Hierarchical Clustering Explorer (HCE) 3.5 available for download at  http://www.cs.umd.edu/hcil/hce/.

 

HCE is a versatile software tool that allows the user to cluster and visualize high-dimensional data. I found it useful for visualizing the relations between factors (by comparing the colors) and the clusters at the same time. The ability to import data from standard formats like Excel made it easy to load the data that I obtained from the Human Development Report database (which exports in Excel format).

 

Original version submitted on: 02/28/2006

Current version (with minor revisions) submitted: 03/10/2006

 

My email: neeti.ogale@gmail.com