Information visualization is extremely important to the three major sports in the United States (American football, baseball, and basketball), because a sizable portion of each team's revenue derives from television contracts. The ability of the major networks to deliver information to viewers at home is crucial to keeping loyal viewers and attracting new ones. Visualization of statistics is arguably most important to American football, hereafter referred to as football. With its plethora of rules, frequent rule changes between seasons, wide discrepancies between the relevant statistical categories for each position, and the relatively short careers of many players, football broadcasting needs easily digestible charts and graphs to make sense of the abundance of information. However, I've grown tired of the same standard bar graphs, tables, animations, and light-pen diagrams. These visualizations are limited in that they compare only one or two stats between one or two teams. I want to visualize more stats about many teams at the same time, and show relationships or correlations between several categories at once.
Thus, the goals of this project are two-fold: First, to explore various Parallel Coordinate tools as candidates for encoding more information about football statistics, and second, to test the validity of certain long-held notions about the game.
My dataset is NFL football data for the 2000 season, which ended in January 2001. I gathered the data from ESPN.com and NFL.com. Statistics for previous seasons are not readily available in digital form, and often are not available free of charge. This seems to be because gamblers and fantasy football enthusiasts will pay quite a lot of money for this type of information. At any rate, I have 19 statistical categories for 31 teams. My main goal is to establish Parallel Coordinates as an effective and useful method of visualizing this kind of information. This is a small data set, and I would need to conduct a more thorough analysis using more data to draw any general conclusions.
Now, with Parallel Coordinates, I've encoded the same information about total yards of offense and wins. I have also encoded additional information about other offensive categories, including a breakdown of rushing offense and passing offense, as well as total points scored.
For this visualization, only the 12 teams with 10 or more wins are visible. We have eliminated the other 19 teams using the slider bar for the wins category.
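The slider filter described above amounts to keeping only the rows whose wins value meets a threshold. Here is a minimal sketch in Python, not the tool's own mechanism. The TeamStats type, the filter_by_min_wins function, and the team rows are all hypothetical placeholders, not real 2000 season figures.

```python
from typing import List, NamedTuple


class TeamStats(NamedTuple):
    """One row of the table (hypothetical, reduced to two categories)."""
    name: str
    wins: int
    total_yards: int


def filter_by_min_wins(teams: List[TeamStats], min_wins: int) -> List[TeamStats]:
    """Keep only the teams with at least min_wins wins, like a slider lower bound."""
    return [team for team in teams if team.wins >= min_wins]


# Made-up placeholder rows, for illustration only.
teams = [
    TeamStats("Team A", 12, 5600),
    TeamStats("Team B", 9, 5100),
    TeamStats("Team C", 10, 4800),
    TeamStats("Team D", 3, 4300),
]

visible = filter_by_min_wins(teams, 10)
print([team.name for team in visible])
```

With the placeholder rows above, only Team A and Team C remain visible, since the threshold is inclusive (10 or more wins).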
We can see a stronger positive correlation between run defense, run offense, and wins than between pass defense, pass offense, and wins. This sample is too small to draw any general conclusions, but with more data we could try to visualize and verify this claim in general.
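One way to check a visual impression like this numerically is the Pearson correlation coefficient between a category and wins. The sketch below is an assumption about how one might do that, not something the visualization tool computes. The pearson function name and the sample numbers are hypothetical. Note that for categories where lower is better, such as yards allowed, a helpful relationship with wins would show up as a negative coefficient.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up placeholder numbers, not real 2000 season statistics.
wins = [12, 10, 9, 3]
rush_yards = [2100, 1900, 1700, 1400]
pass_yards = [3300, 3900, 3100, 3600]

print("rushing vs. wins:", pearson(rush_yards, wins))
print("passing vs. wins:", pearson(pass_yards, wins))
```

With only 31 teams in a single season, any coefficient computed this way carries the same small-sample caveat as the visualization itself.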
For this visualization, I left in all of the teams, but highlighted the teams with 10 or more wins. This clutters the display somewhat, but places the better teams in context with their peers.
Based on this visualization, the correlation between defensive rank, defensive yards per game, and wins does not seem to be any stronger than the correlation between the offensive categories and wins. If we choose only the truly elite teams, i.e., teams with 11 or more wins, we see a stronger correlation between defense and winning. However, this particular visualization is important because it seems to show an unexpected inverse relationship between defensive and offensive rank, and between defensive and offensive yards per game.
If I had more data on player salaries and time of possession, I would like to delve into this subject more carefully. Announcers often claim that the defense gets tired if they're on the field too much, while they never make this claim about the offense. It's possible that there is a statistical basis for this seemingly irrational statement. Furthermore, I will bet that teams with a better defense probably have higher player salaries for the defensive squad than for their offensive unit.
It is very difficult to discern correlations with this much data. With parallel coordinates, there are too many lines to discern patterns or correlations, and with the sliders we have no notion of how dense each range is.
The trellis function lets us plot all the data without overlaying. If we pare down our search space to only the offensive categories, we can easily visualize and compare the entire league.
We can also do the same for the defense. Note that in this visualization, the 12 teams with 10 or more wins are all highlighted.
Finally, here is a trellis plotting only three things: wins, running defense, and running offense, in an attempt to discern which is most strongly related to winning. You can tell by the "slope" of the line and its relative height: if it's sloping up from right to left, this suggests that rushing defense is more important than rushing offense. If there is a dip in the middle, on the other hand, this suggests that rushing offense is more heavily weighted towards winning. If the line is flat, this suggests that a balance is effective.
Note that the trellis function isn't necessarily supposed to be used for this. However, I think it is an excellent technique, and should be expanded.
Here is a visualization of the entire league, with the league average manually selected.
For more information about standard deviations and sports statistics, read Eddie Epstein's insightful article, The Best Defense of All Time?
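To make the idea of comparing a team against the league average concrete, here is a minimal sketch of standard scores (z-scores): each value is expressed as the number of standard deviations it lies above or below the league mean. This is an illustration under my own assumptions, not the method used in the article mentioned above. The z_scores function name and the sample values are hypothetical.

```python
import statistics
from typing import List, Sequence


def z_scores(values: Sequence[float]) -> List[float]:
    """Express each value as standard deviations above or below the mean."""
    mean = statistics.mean(values)
    # Population standard deviation: the whole league is treated as the population.
    spread = statistics.pstdev(values)
    if spread == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(value - mean) / spread for value in values]


# Made-up placeholder values for one statistical category across eight teams.
points_allowed = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_scores(points_allowed))
```

A team with a score near 0 sits at the league average, while a score of 2 or more marks a team far from the pack in that category.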
A link to my impressions of this tool.
Conclusions about visualizing this data with Spotfire:
This is a fun project. Spotfire is a very richly featured tool.
I think you can use parallel coordinates for these types of visualizations, but it would require more data and more time to make any general claims.
I already knew what I was looking for. Discerning arbitrary patterns and unknown relationships in a fresh dataset seems difficult.
A Parallel Coordinates Applet by Amit Goel from Virginia Tech. It's written in Java, it's free, and it was home-grown by a grad student. Thus, it is not fair to pick on this tool. It does have one excellent feature which Spotfire lacks.
The applet is flaky, temperamental, and effectively not extendable. Though it is written in Java, it uses deprecated AWT components from JDK 1.1.x and 1.0.x. The source is available, but does not compile under 1.1.8. The code seems to contain errors in addition to the deprecation warnings. Luckily, a jar file is available which works locally. Data must be imported in STF (Simple Table Format). This is a pretty standard format for many columns of data. You cannot import Excel spreadsheets, and if your data file contains errors, the applet silently does nothing instead of returning an error message.
I was not able to capture any screenshots because I could not get a large enough data set into .stf format and successfully import it into the applet.
This applet does have one excellent feature which Spotfire lacks: You can drag-N-drop columns! This feature allows you to quickly move and manipulate columns to search for patterns.