CMSC 838S Application Report
Analyzing
By Vlad Morariu (morariu@umd.edu)
Introduction
This report presents medical school admission data analyzed using
two tools: TreeMap 4.1.1 and Spotfire
DecisionSite 8.0. The data consisted of
two datasets available on the Association of American Medical Colleges website,
http://www.aamc.org/data/facts/start.htm. Both datasets provided GPA and MCAT scores
for the year 2005, but one broke the data down by state and the other by
undergraduate major. Though the
analysis provided many insights, the four most important were the following: 1)
Biological Sciences majors have low acceptance rates compared to most others;
2) most states have roughly the same average acceptance rate and MCAT scores,
but Puerto Rico, Canada, and U.S. Territories are glaring exceptions; 3) some
states and regions have much higher acceptance rates than others; and 4) by
major, acceptance rates depend more directly on MCAT scores than on GPA.
Biological Sciences majors have low acceptance rates
compared to most others
Surprisingly, Biological Sciences majors do not have higher
average acceptance rates than most others. This is interesting because most students who
plan to attend medical school major in Biological Sciences, and one
might expect that they also have higher acceptance rates. In fact, Figure 1 shows that Math and
Statistics, Humanities, and Social Sciences all have higher acceptance rates
than Biological Sciences! It is also
interesting to note that Specialized Health Sciences majors have the lowest acceptance
rate. By simply looking at the major
descriptions, one would expect Biological, Physical, and Specialized Health
Sciences to top the list when ranked by acceptance rate, since they seem to be
most related to the medical field.
Figures 2 and 3 show that Biological Sciences majors have good
scores in the Biology section of the MCAT (tied for third place with
Humanities), but poor scores in the other two sections. This hints at the possibility that the
Biology section of the MCAT is not the most important in determining
admission. Figure 4 reinforces all
of the points made so far and also shows that Math and Statistics majors
are the smallest group of medical school applicants yet have the highest
acceptance rate. Biological Sciences majors are
indeed the largest group of applicants, with only average acceptance
rates.

Figure 1. Bar graph of total GPA and MCAT scores of
applicants. Specialized Health Sciences
majors perform the worst, and Math and Statistics majors perform the best.

Figure 2. This heat map again shows that applicants who
are Specialized Health Science majors have the lowest scores in the three
sections of the MCAT shown above. Not
surprisingly, Humanities majors performed the best in the verbal section and
Math and Statistics and Physical Sciences majors performed the best in the
physical science section.

Figure 3. Bar graph version of data shown in Figure
2. The bar graph makes relationships
between scores more evident, but does not provide an overview as quickly as the
heat map.

Figure 4. TreeMap view of the acceptance rate for
applicants. The size of the squares
represents the number of applicants for each major, and the color represents
acceptance rate. Specialized Health
Science has the worst acceptance rate, and although Math and Statistics
applicants are few in number, they have the highest acceptance rate, followed
by Humanities majors.
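Most states have roughly the same average acceptance
rate and MCAT scores, but Puerto Rico, Canada, and U.S. Territories are
glaring exceptions
Figure 5 shows that most states are somewhat tightly
clustered around an acceptance rate of 50% and an MCAT score of 30. However,
Puerto Rico, Canada, and U.S. Territories lie well outside this cluster.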

Figure 5. Scatter plot that shows MCAT scores against
acceptance rate by state. Colors
indicate region. Most states are
clustered around an average point, but Puerto Rico, Canada, and U.S. Territories
are clear exceptions.
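The exceptions in Figure 5 were identified by eye. As a rough, hypothetical way to make "clustered around an average point" precise, the sketch below assumes each state has been reduced to an (average MCAT score, acceptance rate) pair and flags any state whose value on either axis lies more than two standard deviations from that axis's mean. The function name `flag_outliers` and the two-standard-deviation threshold are illustrative assumptions; this is not how Spotfire or this report determined the exceptions.

```python
from statistics import mean, pstdev


def flag_outliers(points, threshold=2.0):
    """Return the names whose (MCAT, acceptance rate) pair lies far from the cluster.

    points: dict mapping a (hypothetical) state name to (avg_mcat, acceptance_rate).
    A name is flagged when either coordinate is more than `threshold`
    population standard deviations away from that coordinate's mean.
    """
    names = list(points)
    mcats = [points[name][0] for name in names]
    rates = [points[name][1] for name in names]
    mu_m, sd_m = mean(mcats), pstdev(mcats)
    mu_r, sd_r = mean(rates), pstdev(rates)

    flagged = []
    for name in names:
        mcat, rate = points[name]
        # A zero spread means every value is identical, so nothing is unusual.
        z_m = (mcat - mu_m) / sd_m if sd_m else 0.0
        z_r = (rate - mu_r) / sd_r if sd_r else 0.0
        if abs(z_m) > threshold or abs(z_r) > threshold:
            flagged.append(name)
    return flagged
```

A per-axis test was chosen for simplicity; a state that is unusual only in the combination of the two values, but not on either axis alone, would not be flagged.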
Some states and regions have much higher acceptance
rates than others
Figure 6 shows the acceptance rate of each state in
descending order.
Figure 6. This bar
graph shows the acceptance rate by state sorted from highest to lowest. The colors indicate the region that each
state belongs to. Puerto Rico and Canada
have very high and very low acceptance rates, respectively.
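The report does not state how acceptance rate is derived. Assuming it is the number of accepted applicants divided by the total number of applicants in each group, the ranking behind a chart like Figure 6 could be reproduced with the sketch below; the input layout and the function name `acceptance_rates` are hypothetical.

```python
def acceptance_rates(counts):
    """Compute the acceptance rate per group and sort from highest to lowest.

    counts: dict mapping a group name (state or major) to a hypothetical
    (applicants, accepted) pair. The rate is assumed to be
    accepted / applicants; groups with no applicants are skipped.
    """
    rates = {
        name: accepted / applicants
        for name, (applicants, accepted) in counts.items()
        if applicants > 0
    }
    # Sort by descending rate, breaking ties alphabetically for stable output.
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))
```

For example, `acceptance_rates({"A": (100, 50), "B": (10, 9)})` returns `[("B", 0.9), ("A", 0.5)]`.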

Figure 7. TreeMap showing total number of applicants as
size and acceptance rate as color. Some
of the smaller states have the highest acceptance rates, and most of them are
located in either the Southern or Northeastern regions.

Figure 8. This
TreeMap shows the total number of applicants as size and the MCAT scores as
color.
By major, acceptance rates depend more directly on
MCAT scores than on GPA
The last three figures show how average acceptance rates are
related to GPA and MCAT scores for each major.
Figures 9 and 10 show that Social Sciences is an outlier in both cases,
and Humanities is an outlier when compared only with the science GPA. Thus, GPA does not fully predict
acceptance rates, at least when the averages are computed by major. When MCAT scores are compared with acceptance
rates (Figure 11), the linear relationship becomes very evident. Thus, it seems that the acceptance rate per
major is more closely related to average MCAT score than to the GPA of students
in that major (even though MCAT scores are probably related to GPA, for the
most part). When acceptance rates are compared only with
science GPA, it is not surprising that Social Sciences and Humanities do not
fall in line with the other majors; though those majors might have weaker GPAs
in science classes, they have stronger verbal skills that bolster their
overall MCAT scores. However, the true relationship between
acceptance rate and GPA or MCAT score might be more evident if each applicant’s
values were used instead of the average values per major.



Figures 9-11. Scatter plots of GPA, science GPA, and MCAT
scores versus acceptance rate. Although
a linear relationship is generally present, MCAT scores best explain acceptance
rates.
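The comparison between GPA and MCAT scores above is made visually from Figures 9-11. A minimal sketch for quantifying it, assuming the per-major averages are available as aligned lists, computes Pearson's correlation coefficient r between each predictor and the acceptance rate and reports the predictor with the largest |r|. The names `pearson_r` and `strongest_predictor` are hypothetical, and, as noted above, r computed on per-major averages may differ from r computed on individual applicants.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def strongest_predictor(predictors, rates):
    """Return the predictor name whose |r| with `rates` is largest.

    predictors: dict such as {"GPA": [...], "Science GPA": [...], "MCAT": [...]},
    holding one per-major average per entry, aligned with `rates`.
    """
    return max(predictors, key=lambda name: abs(pearson_r(predictors[name], rates)))
```

Python 3.10 and later also provide `statistics.correlation`, which could replace the hand-written `pearson_r`.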
Critique of Software Used
Both Spotfire and TreeMap performed very well during our
analysis. Only minimal format changes were needed to input the data into
either tool, and both loaded the datasets without much input from the user. One
shortcoming of TreeMap is that it does not accept XLS files, the format of
Excel files. However, XLS files can be
easily converted to tab-separated TXT files, which TreeMap can read. Also, changing data colors or label font size
was somewhat more complicated than desired in Spotfire. Another shortcoming of Spotfire is that there
is no option (at least no easy-to-find option, if one exists) to
export a legend along with a view.
When a graph contains color-coded information, a legend is
necessary. TreeMap suffers from a
similar problem: it cannot export views at all, so the user must take a screenshot
manually. Functions such as
exporting views and including the proper information in them are important
in tools such as TreeMap and Spotfire, since the result of data analysis is
often a graph or visual representation of the findings that needs to be
included in a report. Although the two
software tools were meant to examine differing types of data, they both
provided good insight into different aspects of our data.
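For the XLS-to-TXT conversion mentioned above, one hypothetical route is to save the sheet from Excel as CSV and then rewrite it with tabs as separators, as in the sketch below. The file paths and the function name `csv_to_tsv` are assumptions, and TreeMap's own input format may require more than plain tab separation (for example, a row declaring column types), which this sketch does not attempt to produce.

```python
import csv


def csv_to_tsv(src_path, dst_path):
    """Rewrite a comma-separated file as a tab-separated text file.

    Assumes the spreadsheet was first saved from Excel as CSV. Reading with
    "utf-8-sig" also strips the byte-order mark Excel may add. Tabs inside
    cells are replaced with spaces so they cannot be mistaken for
    column separators in the output.
    """
    with open(src_path, newline="", encoding="utf-8-sig") as src:
        rows = [[cell.replace("\t", " ") for cell in row] for row in csv.reader(src)]
    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)
```

Using the `csv` module on both sides, rather than a plain string replace of commas, keeps quoted cells that contain commas intact.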
Conclusions
Spotfire and TreeMap provided meaningful visualizations of the
medical school data and led to interesting conclusions. Both tools performed well
during the analysis, each providing valuable insight into the data. Although both tools could benefit from additional
features, such as easier color and font setup in Spotfire, better view export
in TreeMap, and improved import functions in both, they were very user-friendly. Both tools facilitated the discovery of relationships
in the data without requiring a long learning period.