CMSC 838S Application Report

Analyzing Medical School Admission Data

By Vlad Morariu (morariu@umd.edu)

February 28, 2006

 

Introduction

 

This report presents medical school admission data analyzed using two tools:  TreeMap 4.1.1 and Spotfire DecisionSite 8.0.  The data consisted of two datasets available on the Association of American Medical Colleges website, http://www.aamc.org/data/facts/start.htm.  Both datasets provided GPA and MCAT scores for the year 2005, but one split the data by state and the other by undergraduate major.  Though the analysis provided many insights, the four most important were the following: 1) Biological Sciences majors have low acceptance rates compared to most others; 2) most states have roughly the same average acceptance rate and MCAT scores, but Puerto Rico, Canada, and U.S. Territories are glaring exceptions; 3) some states and regions have much higher acceptance rates than others; and 4) by major, acceptance rates depend more directly on MCAT scores than on GPA.

 

Biological Sciences majors have low acceptance rates compared to most others

 

Surprisingly, Biological Sciences majors do not have higher average acceptance rates than most others.  This is interesting because most students who plan to attend medical school major in the Biological Sciences, and one might expect them to have higher acceptance rates as well.  In fact, Figure 1 shows that Math and Statistics, Humanities, and Social Sciences all have higher acceptance rates than Biological Sciences.  It is also interesting to note that Specialized Health Sciences has the lowest acceptance rate.  Judging by the major descriptions alone, one would expect Biological, Physical, and Specialized Health Sciences to top the list when ranked by acceptance rate, since they seem most closely related to the medical field.  Figures 2 and 3 show that students in Biological Sciences score well in the Biology section of the MCAT (tied for third place with Humanities), but score poorly in the other two sections.  This hints at the possibility that the Biology section of the MCAT is not the most important in determining admission.  Figure 4 reinforces all of the points made so far, and also shows that Math and Statistics majors are the smallest group of medical school applicants and have the highest acceptance rate.  Biological Sciences is indeed the largest group of applicants, with only average acceptance rates.

Figure 1.  Bar graph of total GPA and MCAT scores of applicants.  Specialized Health Sciences majors perform the worst, and Math and Statistics perform the best.

 

Figure 2.  This heat map again shows that applicants who are Specialized Health Sciences majors have the lowest scores in the three sections of the MCAT shown above.  Not surprisingly, Humanities majors performed the best in the verbal section, and Math and Statistics and Physical Sciences majors performed the best in the physical science section.

 

Figure 3.  Bar graph version of data shown in Figure 2.  The bar graph makes relationships between scores more evident, but does not provide an overview as quickly as the heat map.

 

Figure 4.  TreeMap view of the acceptance rate for applicants.  The size of the squares represents the number of applicants for each major, and the color represents acceptance rate.  Specialized Health Sciences has the worst acceptance rate, and although Math and Statistics applicants are few in number, they have the highest acceptance rate, followed by Humanities majors.

 

Most states have roughly the same average acceptance rate and MCAT scores, but Puerto Rico, Canada, and U.S. Territories are glaring exceptions

 

Figure 5 shows that the states are fairly tightly clustered around an acceptance rate of 50% and an MCAT score of 30.  However, Puerto Rico, the U.S. Territories, Canada, and other countries are very noticeable outliers.  Puerto Rico has the highest average acceptance rate despite having the lowest average MCAT score.  What could cause this result?  And why does Canada have such a low acceptance rate despite having average MCAT scores?  Though there is a linear trend between MCAT scores and acceptance rate when the averages are calculated by major, the trend is no longer apparent when the averages are calculated by state.  Why does the trend disappear?

 

Figure 5.  Scatter plot that shows MCAT scores against acceptance rate by state.  Colors indicate region.  Most states are clustered around an average point, but Puerto Rico, the U.S. Territories, Canada, and Other are noticeable outliers. 

 

Some states and regions have much higher acceptance rates than others

 

Figure 6 shows the acceptance rate of each state in descending order.  Puerto Rico has a very high acceptance rate of 63% and Canada has an acceptance rate of only 6%.  The colors of the bars show that each region clusters in a somewhat ordered fashion.  The Northeast seems to have the highest average acceptance rate, followed by the Central region, and then by the Western region.  The Southern region is interesting since it is spread throughout the graph, with some of the highest acceptance rates and some of the lowest.  Figure 7 shows similar results, namely that some states have much higher acceptance rates than others, but it also shows that those states happen to be some of the smaller states.  Although some of the states have a high average acceptance rate in Figures 6 and 7 and a high average MCAT score in Figure 8, there is no clear relationship between average acceptance rates and MCAT scores when they are calculated by state.  In fact, Puerto Rico has the lowest MCAT score and the highest acceptance rate.
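The ranking described above can be reproduced outside of Spotfire with a few lines of code.  The following is a minimal sketch, assuming that the acceptance rate is the number of accepted applicants divided by the number of applicants.  The names counts, acceptance_rate, and rank_by_acceptance_rate are hypothetical, and the numbers are placeholders rather than AAMC figures.

```python
# Placeholder counts for illustration only; these are NOT AAMC figures.
counts = {
    "State A": {"region": "Northeast", "applicants": 200, "accepted": 110},
    "State B": {"region": "South", "applicants": 1000, "accepted": 450},
    "State C": {"region": "West", "applicants": 500, "accepted": 200},
}


def acceptance_rate(applicants, accepted):
    """Return accepted / applicants as a fraction, or None if nobody applied."""
    if applicants == 0:
        return None
    return accepted / applicants


def rank_by_acceptance_rate(table):
    """Return (name, region, rate) tuples sorted from highest to lowest rate."""
    rows = []
    for name, rec in table.items():
        rate = acceptance_rate(rec["applicants"], rec["accepted"])
        if rate is not None:
            rows.append((name, rec["region"], rate))
    return sorted(rows, key=lambda row: row[2], reverse=True)


ranked = rank_by_acceptance_rate(counts)
for name, region, rate in ranked:
    # Print one line per state, highest acceptance rate first.
    print(f"{name:10s} {region:10s} {rate:.0%}")
```

Keeping the region next to each rate makes it possible to see how regions cluster in the sorted list, much as the bar colors do in Figure 6.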

 

 

Figure 6. This bar graph shows the acceptance rate by state sorted from highest to lowest.  The colors indicate the region that each state belongs to.  Puerto Rico and Canada have very high and very low acceptance rates, respectively.

Figure 7.  TreeMap showing total number of applicants as size and acceptance rate as color.  Some of the smaller states have the highest acceptance rates, and most of them are located in either the Southern or Northeastern regions.

Figure 8. This TreeMap shows the total number of applicants as size and the MCAT scores as color.  Puerto Rico clearly has the lowest MCAT scores.  Some of the Northeastern states that have high acceptance rates also have high MCAT scores, but the Southern states do not have the MCAT scores to match the acceptance rates.

 

By major, acceptance rates depend more directly on MCAT scores than on GPA 

 

The last three figures show how average acceptance rates are related to GPA and MCAT scores for each major.  Figures 9 and 10 show that Social Sciences is an outlier in both cases, and Humanities is an outlier when compared only with the science GPA.  Thus, GPA does not fully predict acceptance rates, at least when the averages are computed by major.  When MCAT scores are compared with acceptance rates (Figure 11), the linear relationship becomes very evident.  Thus, it seems that the acceptance rate per major is more closely related to average MCAT score than to the GPA of students in that major (even though MCAT scores are probably related to GPA, for the most part).  When comparing only to the science GPA, it is not surprising that Social Sciences and Humanities do not fall in line with the other majors; though those majors might have weaker GPAs in the science classes, they have stronger verbal skills that bolster their overall MCAT scores.  However, the true relationship between acceptance rate and GPA or MCAT score might be more evident if each applicant’s values were used instead of the average values per major.

Figures 9-11.  Scatter plots of GPA, science GPA, and MCAT scores versus acceptance rate.  Although a linear relationship is generally present, MCAT scores best explain acceptance rates. 
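The visual comparison above could be backed by a number.  One common choice, which was not used in the original analysis, is the Pearson correlation coefficient between each predictor and the acceptance rate.  The following is a minimal sketch; the function pearson_r and the per-major values are hypothetical placeholders, chosen only to exercise the calculation, and are not taken from the AAMC data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder per-major averages (label, mean GPA, mean MCAT, acceptance rate).
# These are made-up values for illustration, NOT the AAMC numbers.
majors = [
    ("Major A", 3.40, 26.0, 0.38),
    ("Major B", 3.55, 28.0, 0.46),
    ("Major C", 3.45, 30.0, 0.54),
    ("Major D", 3.60, 29.0, 0.50),
]

gpa = [m[1] for m in majors]
mcat = [m[2] for m in majors]
rate = [m[3] for m in majors]

r_gpa = pearson_r(gpa, rate)
r_mcat = pearson_r(mcat, rate)
print(f"r(GPA, rate)  = {r_gpa:.2f}")
print(f"r(MCAT, rate) = {r_mcat:.2f}")
```

With only a handful of majors, such a coefficient is a rough summary at best, and the caveat about per-major averages hiding the per-applicant relationship applies here as well.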

 

 

 

Critique of Software Used

 

Both Spotfire and TreeMap performed very well during our analysis.  Only minimal format changes were needed to input data into either tool, and both loaded the datasets without much input from the user.  One shortcoming of TreeMap is that it does not accept XLS files, the native Excel format.  However, XLS files can easily be converted to tab-separated TXT files, which TreeMap can read.  In Spotfire, changing data colors or label font size was somewhat more complicated than desired.  Another shortcoming of Spotfire is that there is no option (or at least no easy-to-find option) to export a legend along with a view.  When a graph contains color-coded information, a legend is necessary.  TreeMap suffers from a similar problem: it cannot export views at all, unless the user manually takes a screenshot.  Functions such as exporting views and including the proper information in them are important in software such as TreeMap and Spotfire, since the result of data analysis is often a graph or visual representation of the findings that needs to be included in a report.  Although the two tools were meant to examine different types of data, they both provided good insight into different aspects of our data.
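The conversion to tab-separated text mentioned above can also be scripted.  The following is a minimal sketch, assuming the spreadsheet has first been saved from Excel as CSV; it uses only the Python standard library.  The function name csv_to_tsv and the sample row are hypothetical, and any additional formatting that TreeMap expects in its input files is outside the scope of this sketch.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert CSV text to tab-separated text, one output row per input row."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        # A tab inside a cell would be mistaken for a separator, so replace it.
        writer.writerow([cell.replace("\t", " ") for cell in row])
    return out.getvalue()


# Placeholder content for illustration; the count is not an AAMC figure.
sample = 'Major,Applicants\n"Math and Statistics",100\n'
tsv_text = csv_to_tsv(sample)
print(tsv_text)
```

To convert a file, read its contents with open(path, newline="") and write the returned string to a new TXT file.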

 

Conclusions

 

Spotfire and TreeMap provided meaningful visualizations of the medical school data and led to interesting conclusions.  Both tools performed well during the analysis, each providing valuable insight into the data.  Although both could benefit from additional features, such as easier color and font setup for Spotfire, better view export functions for TreeMap, and improved import functions for both, they were very user-friendly.  Both tools facilitated the discovery of relationships in the data without requiring a long learning period.