Mudit Agrawal
mudit@cs.umd.edu
Exploring factors for ranking among various universities using Spotfire and HCE
Goal:
The goal is to study various patterns (such as education quality, faculty reputation, publications, activity measures, program size, and funding) in the Computer Science departments of various universities.
Problem:
Many datasets exist on graduate and undergraduate school rankings; however, most of them are sorted by total points and assign fixed weights to each variable. A PhD candidate, by contrast, is usually more focused on specific aspects of a school's program. Moreover, the current ranking of schools should be compared with the factors on which it depends, in order to get the bigger picture.
In any classification or sorting problem, many features contribute to the result, and these features need not be mutually independent. When some features depend on others, dimensionality reduction may be possible. For example, in a character recognition problem, the length and breadth of a character may contribute more to classification than the mean. Likewise, the area of a character depends on its length and breadth, and hence contributes little as a separate feature.
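One simple way to spot such dependencies is to flag feature pairs with a high absolute Pearson correlation, since one member of each flagged pair is a candidate for removal. The sketch below is a minimal illustration, not part of the original study; the names `pearson` and `redundant_pairs` and the 0.9 threshold are hypothetical choices.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant feature carries no linear information
    return cov / (sx * sy)


def redundant_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for every feature pair with |r| >= threshold.

    `features` maps a feature name to its list of values (one per sample).
    The threshold is a hypothetical default, not taken from the study.
    """
    pairs = []
    for a, b in combinations(sorted(features), 2):
        r = pearson(features[a], features[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs
```

Pairwise correlation only detects linear, pairwise dependence. A feature such as area, which is the product of length and breadth, is better tested by correlating it with the product of the two features directly.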
Hence, visualizing all these aspects is valuable to any new candidate, whether a prospective faculty member or a student.
Dataset:
The top 100 academic institutions in Computer Science were taken.
Figure 1 below shows a sample .xls file from our dataset. The various parameters and their descriptions are as follows:


Figure 1
Result 1: Qualified Faculty increases Education Efficiency (*Using Spotfire*)
The aim was to find out whether Education Efficiency is related to Faculty Qualification. In a scatter plot, these two factors appear to be closely related, suggesting that qualified faculty increases education efficiency.
“Evenly distributed qualified faculty increases education efficiency, which leads to higher scores”
“Lower-ranked universities have higher variation in faculty qualification than higher-ranked universities”
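The second observation compares the spread of faculty qualification between rank groups. A minimal sketch, assuming each university is a dict with hypothetical keys `rank` and `fac_qual`, and an arbitrary split at rank 50 (the study does not state where it draws the line):

```python
from statistics import pstdev


def qualification_spread(universities, split_rank=50):
    """Population standard deviation of faculty qualification for
    universities ranked at or above split_rank versus those ranked lower.

    Both groups must be non-empty, otherwise pstdev raises StatisticsError.
    """
    top = [u["fac_qual"] for u in universities if u["rank"] <= split_rank]
    rest = [u["fac_qual"] for u in universities if u["rank"] > split_rank]
    return pstdev(top), pstdev(rest)
```

If the first value is clearly smaller than the second, the data agree with the claim that lower-ranked universities vary more in faculty qualification.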


Result 2: Faculty-to-student ratio is higher in better-ranked universities (*Using Spotfire*)
The dataset does not directly relate the number of students to the number of faculty. Hence, to determine how the two together contribute to a university's rank, #Faculty was plotted against #Students in a scatter plot. The color coding (red to blue) encodes the score of each university.
The plot shows that for a similar number of students (e.g. the 100-200 range), the score of a university rises (its point becomes bluer) as the number of faculty increases.
“A higher faculty-to-student ratio implies a better university”
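The comparison behind this result can also be made numerically: restrict to a band of student counts, compute each university's faculty-to-student ratio, and check whether score rises with the ratio. A sketch under the assumption that records carry hypothetical keys `students`, `faculty` and `score`; the default 100 to 200 band mirrors the example above.

```python
def ratio_vs_score(universities, min_students=100, max_students=200):
    """Return (faculty/student ratio, score) pairs for universities whose
    student count lies in [min_students, max_students], sorted by ratio.

    If the score column rises along the sorted list, higher ratios go
    with higher scores inside the band.
    """
    band = [
        u for u in universities
        if min_students <= u["students"] <= max_students and u["students"] > 0
    ]
    return sorted((u["faculty"] / u["students"], u["score"]) for u in band)
```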


Result 3: Are higher-ranked universities more research-oriented? (*Using Spotfire*)
To answer this question, the number of CS PhD students at each university was plotted against %TA (the percentage of those PhD students holding a TAship). The graph shows that for higher-ranked universities (bluer data points), the %TA of PhD students is much smaller than for lower-ranked universities!
(The exception is Illinois Inst. of Tech., which is lower ranked yet has a low %TA among its PhD students, the reason being its very low #Fac.)


To substantiate this result, it was essential to check whether higher-ranked universities have a higher %RA. The graph shown below shows that the same selected top-ranked universities have a much higher %RA (with IIT again being an outlier).


Result 4: Universities vary most in #students, %RAship, %TAship and unevenness in publications (*using HCE*)
Sometimes simple tables convey a great deal of information when combined with graphs. The mean and standard deviation of each factor (across all universities) were calculated, and the five most varied parameters were found to be:
Also shown are two scatter plots: one shows the proportional relationship between faculty qualification and publications per faculty, and the other shows GiniPub plotted in clustering order. Note that the universities with lower GiniPub are the ones with higher faculty qualification and publications per faculty (shown as the selected data items in both plots).
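Both computations in this result can be sketched in a few lines. The study does not say how "most varied" was compared across parameters with different units, so the sketch assumes the coefficient of variation (standard deviation divided by mean). It also assumes that GiniPub is the standard Gini coefficient of per-faculty publication counts, which is 0 for perfectly even output and (n - 1) / n when a single faculty member accounts for all publications. The names `most_varied` and `gini` are hypothetical.

```python
from statistics import mean, pstdev


def most_varied(columns, k=5):
    """Rank parameters by coefficient of variation (pstdev / |mean|).

    `columns` maps a parameter name to its values across universities.
    Parameters with a zero mean are skipped because their CV is undefined.
    """
    cv = {}
    for name, values in columns.items():
        m = mean(values)
        if m != 0:
            cv[name] = pstdev(values) / abs(m)
    return sorted(cv, key=cv.get, reverse=True)[:k]


def gini(values):
    """Gini coefficient of non-negative values via the sorted-rank formula:
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, x sorted ascending."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n
```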


Result 5: Consistency is the key to being at the top (*using HCE*)
A dendrogram view of all parameters (on parallel axes) was shown for all universities. When flooring was applied to look at the top few universities, a clear variation from the overall silhouette became visible, showing that the top-ranked universities are consistently good at all measured aspects rather than topping only one or two factors. The selected item is the rank-1 university.
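HCE's dendrogram is not reproduced here, but the notion of consistency can be made concrete: min-max normalize every parameter across universities, then summarize each university by the mean and the spread of its normalized values. A consistently good university has a high mean and a small spread. This is a hypothetical illustration, not the study's method; it assumes a higher value is better for every parameter, so a parameter such as GiniPub (where the study associates lower values with better universities) would need to be inverted first.

```python
from statistics import mean, pstdev


def consistency_profile(table):
    """table maps university -> {parameter: value}; all rows share the same keys.

    Returns university -> (mean, spread) of its min-max normalized values,
    where spread is the population standard deviation. A parameter that is
    constant across universities normalizes to 0.0.
    """
    params = list(next(iter(table.values())))
    lo = {p: min(row[p] for row in table.values()) for p in params}
    hi = {p: max(row[p] for row in table.values()) for p in params}
    profile = {}
    for uni, row in table.items():
        norm = [
            (row[p] - lo[p]) / (hi[p] - lo[p]) if hi[p] > lo[p] else 0.0
            for p in params
        ]
        profile[uni] = (mean(norm), pstdev(norm))
    return profile
```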

Result 6: Private schools, though fewer in number, have more federal or other funding (on average) than public ones! (*Using Spotfire*)
When federal funds and R&D expenditure were plotted against ranking, a linear relationship appeared. Notably, however, for a given range of ranks, private schools have higher R&D expenditure than public ones.
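The linear relationship and the private-versus-public gap at similar ranks can be checked together: fit an ordinary least-squares line of R&D expenditure against rank, then average the residuals per school type. A positive mean residual means a group spends more than the line predicts for its rank. This is a sketch, not the study's analysis, and the record keys `rank`, `rd_expenditure` and `control` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mean_residual_by_control(schools):
    """Average residual (actual minus fitted R&D expenditure) per control type."""
    a, b = fit_line([s["rank"] for s in schools],
                    [s["rd_expenditure"] for s in schools])
    sums, counts = {}, {}
    for s in schools:
        residual = s["rd_expenditure"] - (a + b * s["rank"])
        sums[s["control"]] = sums.get(s["control"], 0.0) + residual
        counts[s["control"]] = counts.get(s["control"], 0) + 1
    return {c: sums[c] / counts[c] for c in sums}
```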
The pie chart shows that only 29% of the schools are private and that together they account for much less R&D expenditure (top-right bar graph); on average, however, private schools have more R&D expenditure than public ones (bottom-left bar graph).
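The apparent paradox of a smaller total but a larger average is simply the difference between summing and averaging per group. A sketch with toy numbers (not the study's data) and hypothetical keys `control` and `rd_expenditure`:

```python
from collections import defaultdict


def rd_by_control(schools):
    """Return control type -> (total, average) R&D expenditure."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for s in schools:
        totals[s["control"]] += s["rd_expenditure"]
        counts[s["control"]] += 1
    return {c: (totals[c], totals[c] / counts[c]) for c in totals}


# Toy data: one private school, three public ones.
toy = [
    {"control": "private", "rd_expenditure": 90.0},
    {"control": "public", "rd_expenditure": 50.0},
    {"control": "public", "rd_expenditure": 60.0},
    {"control": "public", "rd_expenditure": 40.0},
]
print(rd_by_control(toy))  # {'private': (90.0, 90.0), 'public': (150.0, 50.0)}
```

In the toy data the private group has the smaller total (90 against 150) but the larger average (90 against 50), the same pattern the two bar graphs describe.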

Conclusion:
Various relations among the factors contributing to a university's score were studied using Spotfire and the HCE visualization tool. The ease with which one can explore the data and visualize it with different colors and sizes of data points is remarkable; even a single scatter plot can convey a lot of information. The dynamic linking of graphs (as in Results 4 and 6) gives a clear idea of how each university (or data item) varies across different perspectives. Every product has room for improvement, and Spotfire and HCE are no exception.
Spotfire:
HCE:
The ceiling and floor can only be moved until they meet each other. This prevents selecting both high- and low-valued data items while ignoring the middle ones.
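The selection described here, keeping both extremes and discarding the middle band, is the complement of a floor-to-ceiling range filter. A hypothetical sketch of such a two-sided filter; it does not describe an existing HCE or Spotfire feature, and `select_tails` is an illustrative name.

```python
def select_tails(items, value, floor, ceiling):
    """Keep items whose value(item) is <= floor or >= ceiling.

    Unlike a single [floor, ceiling] range, this discards the middle band
    and keeps both the low-valued and the high-valued items.
    """
    if floor > ceiling:
        raise ValueError("floor must not exceed ceiling")
    return [it for it in items if value(it) <= floor or value(it) >= ceiling]
```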
Nevertheless, both tools give the user a lot of flexibility to perform complex tasks with neat visualizations. For example, k-means clustering of the school data, shown below, groups the schools into 9 (user-defined) clusters quite well. Both tools are a great help in information visualization and a boon to researchers who want to better understand their data.
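For reference, the k-means technique mentioned above can be sketched as Lloyd's algorithm: assign each school to its nearest centroid, move each centroid to the mean of its members, and repeat until the assignments stop changing. This is a generic illustration, not HCE's implementation; the function name, the random initialization and the fixed seed are assumptions. In practice each parameter would usually be normalized first so that large-valued columns do not dominate the distance.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on a list of equal-length numeric tuples.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels
```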
