Preliminary Exploration of the VAST 2006 Data

(CaseMap and Treemap)

Rachael Bradley
Graduate Student
College of Information Studies, University of Maryland College Park
rlb@umd.edu
Revised 3/5/2006

Data

The VAST dataset was created for the IEEE VAST 2006 Symposium to "promote the development of benchmarks for visual analytics and establish a forum to advance evaluation methods" [1]. The task is to hypothesize about what events are occurring in the town of Alderwood and to provide visual evidence for these events.

The data consists of:

The exploration that follows focuses primarily on the news stories, phone log and voter registry. The data sets along with additional information about VAST can be found at http://www.cs.umd.edu/hcil/VASTcontest06.

Software

CaseMap is a tool for conducting legal research. It allows the researcher to enter facts, objects, and issues and then graph the frequency of events against time or expand and customize a timeline of events using an extension called TimeMap [2]. This tool was selected because it provides a mechanism to import the complete text of the news articles, interactively examine them through the timeline visualization, and filter them using keyword searches. This facilitated quick interaction with all of the data. Treemap charts data across multiple dimensions and provides an alternate, complementary method for viewing the data [3].

Hypothesis 1: Something out of the ordinary occurred involving someone in the City Council on or about January 17th.

Figure 1: Treemap of City Council Phone Log Organized by Date, Color Coded by Call Type

On January 17th, someone at city hall called four out-of-town legal consultants (indicated in orange in Figure 1). No other similar calls were made during the logged time period. They also called the city attorney and municipal court. Exploration of the news articles provided no apparent reason for the calls to the attorneys. The occurrence most closely related in time was a news article about whistleblowers in the meat industry, which came out on January 15th. It might also be worth noting that January contained the highest frequency of news relating to mad cow disease (see Figure 2); however, nothing definitive is apparent using these tools.
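The check behind this observation, that a call type appears on only one date in the log, could also be done outside Treemap. The following is a minimal sketch, assuming the phone log can be reduced to (date, call type) pairs; the rows and the function name are invented for illustration and are not taken from the VAST data.

```python
from collections import Counter, defaultdict

# Hypothetical log rows: (date, call_type). Invented for illustration only.
SAMPLE_LOG = [
    ("01-15", "local"),
    ("01-16", "local"),
    ("01-17", "legal consultant"),
    ("01-17", "legal consultant"),
    ("01-17", "local"),
    ("01-18", "local"),
]

def types_on_single_date(log):
    """Map each call type that occurs on exactly one date to that date."""
    dates_by_type = defaultdict(set)
    for day, call_type in log:
        dates_by_type[call_type].add(day)
    return {
        call_type: next(iter(days))
        for call_type, days in dates_by_type.items()
        if len(days) == 1
    }

# Number of calls for each (date, call_type) pair.
counts = Counter(SAMPLE_LOG)
isolated = types_on_single_date(SAMPLE_LOG)
print(counts[("01-17", "legal consultant")], isolated)
```

In this invented sample, only the legal consultant calls are confined to a single day, which mirrors the pattern visible in Figure 1.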

Figure 2: News Article Frequency in CaseMap Filtered by Terms Related to Mad Cow Disease

Hypothesis 2: There is some connection between someone at city hall and either Boynton Laboratories or Swiss developers.

Figure 1 also shows four calls made to three unknown numbers in Switzerland, indicated in yellow. The news points toward two possible reasons for this: a connection with Boynton Laboratories or a connection with land developers. Boynton Laboratories, a laboratory that tests for mad cow disease, opened in Alderwood in September 2002 and was announced by the Mayor on February 2nd, 2002. According to a news article on 1/2/02, two members of an international committee investigating a mad cow outbreak in Alderwood are from Switzerland. News after 2002 paints a questionable picture of the lab and suggests strong ties between the city council, the mayor, and the laboratory. If the relationship between the city government and the laboratory existed before the opening, it would explain a possible interest in the investigation by someone at city hall. The second possibility would be a connection between the city council and a land developer in Switzerland. The theme of land annexation and development occurs throughout the news stories, and a Swiss developer is rumored to be involved.

Hypothesis 3: The youth who die in Alderwood are Catholics of Mexican Descent

Figure 3: Treemap of Obituaries in Alderwood organized by age, colored by religion

Figure 3 is a treemap of the obituaries in the news. With the exception of one individual, all the people who died under the age of 30 were Catholics of Mexican descent. The breakdown remains the same regardless of whether religion or descent is used to color code it, indicating a high correlation between Catholicism and Mexican descent. The outlier, indicated in gray in the figure, is a Native American. There are three individuals not shown in the map, ages 100, 103, and 107, who are all of unknown religion. The conclusion: it is better to be Protestant or religiously ambiguous than Catholic in this area of the country.

Hypothesis 4: There are a few people who started voting before being born

Figure 4: Treemap of Voting records ordered by date born, color coded by voter registration date
Please Note: The upper left is the earliest birth date and lower right is the most recent; the dates increase in strips from left to right and then start again on the left.

As can be seen in Figure 4, people in Alderwood generally register to vote at about the same age. The few outliers worth noting are the black dots in the bottom four bands. The darkness indicates a registration date much earlier than the birth date. No indication of how this occurred is evident in the data.
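The outliers can also be listed directly rather than spotted by eye. The following is a minimal sketch, assuming each voter record carries a birth date and a registration date; the record layout, the voter IDs, and the function name are hypothetical and the rows are invented for illustration.

```python
from datetime import date

# Hypothetical records: (voter_id, birth_date, registration_date).
# These rows are invented for illustration; they are not from the VAST data.
SAMPLE_VOTERS = [
    ("V001", date(1960, 5, 14), date(1978, 6, 1)),
    ("V002", date(1985, 3, 2), date(1979, 11, 20)),  # registered before birth
    ("V003", date(1972, 9, 30), date(1990, 10, 5)),
]

def registered_before_birth(voters):
    """Return the IDs of voters whose registration date precedes their birth date."""
    return [
        voter_id
        for voter_id, born, registered in voters
        if registered < born
    ]

flagged = registered_before_birth(SAMPLE_VOTERS)
print(flagged)
```

A filter like this only identifies the suspicious records; as with the treemap, it says nothing about how they came to exist.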

Tool Evaluation

This work relied on both CaseMap and Treemap. The Treemap software provides much more compelling visualizations than the bar graphs used by CaseMap, but CaseMap allows for searching the larger full-text news articles quickly and efficiently. Thus, the best method of combining these tools was using CaseMap to conduct initial investigations, Treemap to visualize patterns, and CaseMap to identify possible causalities.

The largest drawback to CaseMap is an automatic feature that assigns short names to individuals entered into the system and then cross-references them against other information. This feature was originally a plus, but its execution leaves much to be desired when using the software on large, uploaded data sets. CaseMap assigns a short name according to an algorithm based on the last and first name. It then cross-references the short name against all files. So if a person was named Sam Hill, the short name might be hills. Regrettably, this means any mention of hills in text is incorrectly cross-referenced to this individual. There is no easy mechanism to correct this error, and it dilutes the power of the tool. Additional visualization methods such as link visualization tools would also complement this tool.
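The collision can be reproduced in a few lines. The following is a minimal sketch, not CaseMap's actual algorithm: the short-name rule (last name plus first initial, lowercased) is an assumption chosen to match the Sam Hill example, and the sample sentence is invented.

```python
import re

def short_name(first, last):
    """Hypothetical short-name rule: last name plus first initial, lowercased."""
    return (last + first[0]).lower()

def naive_cross_reference(text, name):
    """Return every whole word in text that matches the short name, ignoring case."""
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.findall(pattern, text, flags=re.IGNORECASE)

name = short_name("Sam", "Hill")
# Invented sentence that never mentions Sam Hill.
article = "Developers plan new homes in the hills east of Alderwood."
matches = naive_cross_reference(article, name)
print(name, matches)
```

Because the match is made on the short name alone, the ordinary word in the sentence is linked to the person even though the article has nothing to do with him.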

Treemap provides excellent interaction with a few notable drawbacks. The first is that it does not handle the larger datasets with speed. The second is that if borders are turned on with larger datasets, some of the colors are lost; however, turning the borders off loses the hierarchical category information as well.

Future Work

A truly valuable next step in analyzing this data would be to extract the names and relationships of people from the articles and voter records and then use a network analysis tool to explore those relationships. Figure 4's lack of date information illustrates this drawback.

References

  1. VAST contest 2006. http://www.cs.umd.edu/hcil/VASTcontest06
  2. CaseMap. http://www.casesoft.com/
  3. Treemap. http://www.cs.umd.edu/hcil/treemap/