CMSC 828S/838S: Information Visualization

Exploring the Ozone:
Interactive Visualization of EPA Ozone Data with SpotFire

October 4, 1999

Bill Kules
wmk@takoma-software.com

Introduction

Air pollution is a recognized problem in the United States. Smog, soot and other pollutants pose a well-known hazard to public health. The Environmental Protection Agency tracks these pollutants in its Aerometric Information Retrieval System (AIRS) database and makes the data (beginning with 1994) available in summarized form on the AIRSData web site (http://www.epa.gov/airsdata/).

Using the AIRSData web site, a user can create custom queries and reports with tabular outputs, based on criteria such as location, specific pollutants and year. A limited number of pre-generated, large-scale maps are available. No interactive map tool or visual presentations are available.

The purpose of this visualization is to augment the AIRSData web site by providing an interactive tool for visualizing one pollutant, ozone. SpotFire Pro 4 is used to provide several ways to explore this data. A geographic map is used to plot the location of monitoring stations and associated ozone readings. Other scattergraphs are used to show correlation (or lack of correlation) between several attributes of the data. Using SpotFire controls, the user can quickly zoom to a specific geographic location, then change views to look at trends or compare ozone readings.

Description of Data

Each record in the AIRSData ozone table summarizes a year of measurements made at a specific site. In addition to fields for the location and characteristics of the site, each record contains the measured ozone value for the day with the highest ozone value, as well as for the second, third and fourth highest days. Each record also contains the number of days in the year on which the measured ozone value exceeded the National Ambient Air Quality Standards value of 0.12 ppm in a one-hour average. This attribute is called the Actual Number of Exceedences. An estimated value (called the Estimated Number of Exceedences) is also computed, adjusting for differences in monitoring. This latter value provides an indication of air quality during the year and is one of the primary attributes displayed in the visualization.

Most locations have data starting in 1994, so each site is generally represented up to 5 times in the data table. The 1999 data is complete only through July, and cannot be directly compared to other years, although it is retained in this visualization.

Method

Two visualizations were created. The first contains 5841 records covering the entire United States. The second contains 277 records covering Maryland, Virginia, the District of Columbia and Delaware. For both, a new column was created by binning the Estimated Number of Exceedences into one of three values, which are displayed as follows:

Exceedence Value    Binned Value    Display Color
0                   0               Green
1                   1               Yellow
2 or more           2               Red
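The binning itself was done inside SpotFire, and the report does not show the exact rule. As a rough sketch of the same mapping in Python, the function and dictionary names below are hypothetical, and the handling of fractional estimates (anything above 0 and below 2 falls into bin 1) is an assumption rather than something stated in the data description:

```python
# Hypothetical helper names; the original column was created with SpotFire's binning.
BIN_COLORS = {0: "green", 1: "yellow", 2: "red"}


def bin_exceedences(estimated):
    """Collapse an Estimated Number of Exceedences into 0, 1, or 2."""
    if estimated < 0:
        raise ValueError("exceedence count cannot be negative")
    if estimated == 0:
        return 0
    # Assumption: fractional estimates between 0 and 2 are treated as bin 1.
    if estimated < 2:
        return 1
    return 2


def marker_color(estimated):
    """Return the display color for a record's estimated exceedence value."""
    return BIN_COLORS[bin_exceedences(estimated)]
```

Keeping the color lookup separate from the binning rule means the thresholds can change without touching the display mapping.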

A geographic view plots the site location and colors the markers according to the binned exceedence value. A background image provides approximate state boundaries.

A temporal view plots the estimated exceedence value (not the binned value) by year. In the U.S. visualization, markers are colored according to the location type (urban, suburban, rural). In the regional visualization, markers are colored according to the state.

Observations

The visualizations clearly show several phenomena at the national level:

Figure 1 shows all EPA monitoring sites in the U.S. By hiding the green markers (Figure 2), we can easily see that the East and West Coasts and the Great Lakes region have the most significant problem with high ozone days.

Figure 1. All EPA Ozone monitoring sites in US

Figure 2. EPA Ozone monitoring sites exceeding permissible ozone levels

Figure 3 shows that the sites with a large number of high ozone days are commercial or residential, with some industrial and one forest. Location type (urban, suburban or rural) is encoded in the color, and there is no apparent correlation between that and the number of high ozone days. Note that the markers have been artificially spread out by adding jitter so that they are more visible.

Figure 3. Exceedences by land use and location type

Figure 4 shows the same view, restricted to records from San Bernardino and Los Angeles Counties (California). These counties had by far the largest ozone problems in 1994. Figure 5 shows the same records in a temporal view. Since 1994, these sites have seen steady reductions in high ozone days.

 

Figure 4. Exceedences by land use and location type in Los Angeles and San Bernardino Counties

Figure 5. High ozone days by year in Los Angeles and San Bernardino Counties

At the regional level, the visualizations can be used to explore specific sites and trends at each site. For example, readings at the NASA Goddard Space Flight Center (Greenbelt, MD) have fluctuated significantly.

 

By looking at the data through a one-year sliding window, we see that air quality in the region was markedly worse in 1997 (Figure 7) than in 1996 (Figure 6).

Figure 6. EPA Ozone monitoring sites in MD/DE/DC/VA region, 1996

Figure 7. EPA Ozone monitoring sites in MD/DE/DC/VA region, 1997
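The year-to-year comparison above was made visually with SpotFire's range slider. A minimal sketch of the equivalent tabulation, assuming each record is reduced to a hypothetical (site_id, year, estimated_exceedences) tuple and using the same three bins as the color coding, might look like this:

```python
from collections import Counter


def bin_value(estimated):
    """Map an estimated exceedence count to bin 0, 1, or 2 (assumed thresholds)."""
    if estimated <= 0:
        return 0
    return 1 if estimated < 2 else 2


def bin_counts_for_year(records, year):
    """Count sites per bin for one year.

    records is an iterable of (site_id, year, estimated_exceedences) tuples;
    this layout is an assumption, not the actual AIRSData schema.
    """
    counts = Counter({0: 0, 1: 0, 2: 0})
    for _site_id, record_year, estimated in records:
        if record_year == year:
            counts[bin_value(estimated)] += 1
    return counts
```

Calling bin_counts_for_year once per year and comparing the bin 2 totals gives a numeric counterpart to comparing the red markers in the two maps.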

 

Although not shown here, there is no apparent correlation between location type and ozone problems at the regional level, just as we saw at the national level.

Limitations and Opportunities for Improvement

This visualization shows only a single pollutant. With a modest effort, it could be extended to include the other six pollutants available from AIRSData.

The bitmaps used for the state boundaries are mediocre. They do not register well against the latitude/longitude grid, and this leads to the appearance of markers in the Atlantic or Pacific Oceans, or in the wrong state.

Many markers occupy the same point, due to the geographic and repeating nature of the samples. To prevent one marker from hiding all other markers at a specific location, the locations are artificially spread out with the addition of jitter. This allows the user to see the otherwise occluded data points. However, it can also cause data points from adjacent locations to overlap, making it difficult to interpret a plot.
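The jitter here was applied with SpotFire's built-in control. For illustration only, the same idea can be sketched in Python; the function name, the default offset of 0.05 degrees and the fixed seed are all assumptions chosen for the example:

```python
import random


def add_jitter(points, amount=0.05, seed=0):
    """Return a copy of (lat, lon) points, each shifted by a small random offset.

    amount is the maximum offset in degrees along each axis. A larger value
    separates stacked markers more, but also increases the overlap between
    neighboring sites that the text above describes.
    """
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return [
        (lat + rng.uniform(-amount, amount), lon + rng.uniform(-amount, amount))
        for lat, lon in points
    ]
```

Because the offsets are drawn independently for every record, five yearly records at one site become five nearby markers rather than a single one.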

Critique of SpotFire

SpotFire's mapping capabilities are somewhat limited because it is a general-purpose tool, not a dedicated mapping tool such as MapInfo. For example, it does not provide any built-in geocoding functions, nor does it provide any way to lock the aspect ratio of latitude to longitude. When the visualization window is resized (e.g. due to tiling), this can result in a very distorted map. The lack of built-in layers for political boundaries or other geographic features necessitates the use of bitmapped images, with consequent sizing and positioning problems.
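To make the aspect ratio issue concrete: on a simple latitude/longitude plot, a degree of longitude covers less ground than a degree of latitude by a factor of roughly the cosine of the latitude. The sketch below (a hypothetical helper, not a SpotFire feature) computes the window height that would keep a plot of a given width approximately undistorted, using the middle latitude of the view as the reference:

```python
import math


def plot_height_for_width(width_px, lat_min, lat_max, lon_min, lon_max):
    """Height in pixels that keeps a lat/lon scatter plot roughly undistorted."""
    if lat_max <= lat_min or lon_max <= lon_min:
        raise ValueError("bounds must be increasing")
    mid_lat = math.radians((lat_min + lat_max) / 2.0)
    # Scale the longitude span by cos(latitude) so both axes use comparable ground units.
    lon_extent = (lon_max - lon_min) * math.cos(mid_lat)
    lat_extent = lat_max - lat_min
    return width_px * lat_extent / lon_extent
```

This is only an approximation that works best for small regions such as the MD/DE/DC/VA view; a national view would need a proper map projection.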

SpotFire gives the user the ability to filter records dynamically using the dynamic query controls. It also allows the user to tightly couple alternate views by marking and selecting a set of records. This hides all other records in all views, allowing the user to easily view the reduced set of records from different perspectives. Unfortunately, there appears to be no way to unhide records without resetting all dynamic query parameters. This can have the undesirable side effect of undoing a zoomed view.

The window tiling capability would be more useful on larger screens, but on a 14" or 15" screen the tiled windows are too small.

Overall, SpotFire worked well for this type of visualization.

Lessons Learned

I learned two lessons about visualizations during this short project:

Web Accessibility