IEEE VAST 2007 Contest
Information about data set and tasks

Dataset

The dataset consists of:

         about 1500 news stories from a fictitious newspaper plus a few other items collected by the previous investigators

         about 150 blog entries

         a few pictures (in jpg format)

         a few small databases (in XLS and CSV format)

         a few pages of background information (in .DOC or PDF format).

 

 

The dataset contains fictitious information and was created for testing and evaluation of visual analytic tools only. No part of this dataset should be taken as real.

Two forms: RAW and PREPROCESSED

New this year, we provide a pre-processed version of the dataset (e.g., with entities already extracted) so that more teams can participate. Your team will need to decide whether to enter the contest using the preprocessed data or the raw data.


Note that the 2006 data set is still available, with solutions, from the VAST 2006 contest web site. This allows you to practice for the 2007 contest, as we now provide preprocessed data for the 2006 data set as well. Just register as usual and download the 2006 data set, and you will find the preprocessed data together with the raw data.

Here is the direct link to the readme file of the 2007 preprocessed data set to give you an idea of what preprocessing we provide. This will help you decide which level of the contest your team should enter in 2007. We used MITRE's Alembic for the entity extraction process, with some modifications and hand work. See http://www.mitre.org/tech/alembic-workbench for details on Alembic.

In 2007 you have to choose one or the other (Raw or Preprocessed). We will keep track of who downloads which version, but of course we rely on the honor system, and you should report correctly which data set you used. We will evaluate the two categories separately but use similar criteria.

Scenario

 

It is the fall of 2004 and one of your analyst colleagues has been called away from her current tasks to an emergency. The boss has given you the assignment of picking up her investigation and completing her task. She has been asked to pursue a line of investigation into some unexpected activities concerning wildlife law enforcement, endangered species issues, and ecoterrorism. This isn't exactly your specialty area, but your boss believes you are one of the few people who could get to the bottom of whatever is going on. In fact, you would have been given this investigation if you hadn't been busy on another assignment when your colleague started.

 

Your colleague hasn't gotten very far, but she has assembled all the data you need to crack this case. It is a mixed assortment of information: text, images, numbers. The agency you work for is very accommodating -- you may use any analytical tool you need to help your investigations.

 

You do know a few things coming into this effort. First, you were instrumental in cracking those investigations from last year, so issues about mad cow or Alderwood, Washington, are not part of this. Also, you know a little about ecoterrorism and animal rights groups, so, for example, the activities of the People for the Ethical Treatment of Animals (PETA) and the Earth Liberation Front (ELF) are not of interest, unless they happen to be tied to some larger or more pervasive plot.

 

Your Task

 

Find the threat! A scenario should emerge when you have put together the pieces of this puzzle.

 

Your boss is interested in knowing the whos-whats-wheres-whens-hows and -whys of the story, and how they are connected.

 

Key Questions to be answered


What is the situation in this scenario and what is your assessment of the situation?

(Note that a situation may have multiple plots.)

 

We will provide a standard form for you to answer those questions (see the VAST 2006 contest if you want an idea of what to expect).

 

For each of the most relevant plots (there may be only one), consider the following questions:

1. Who are the players relevant to the plot?

     Which of the relevant players are innocent bystanders?

     Which of the relevant players are deliberately engaged in deceptive activities?

     How are the relevant players connected?

2. What is the time frame in which this situation unfolded?

     What events occurring during this time frame are relevant to the plot?

3. What locations were relevant to the plot?

     What, if any, connections are there between relevant locations?

4. What activities were going on in this time frame?

     Which players are involved in the different activities?

This question will be answered in the form of a written debriefing document.

 

Remember, the goal is to answer the main question:

What is the situation and what is your assessment of the situation?

 

In addition, we will ask for an explanation of the process you used, insights gathered from the various displays, screen prints, and a video demonstration, so we can judge the utility of the tools.

 

Definitions

Events are things that occur in a short, discrete time frame.

Activities occur over a much longer span of time.

For example, graduating from school would be an event. Going to graduate school is an activity.
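
The distinction above can be sketched in code. This is only an illustration, not part of the contest materials: the `Occurrence` record, the one-day cutoff, and the dates are all hypothetical, since the contest gives no numeric boundary between an event and an activity.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: this cutoff is illustrative only. The contest does not
# define a numeric boundary between events and activities.
EVENT_MAX_DURATION = timedelta(days=1)


@dataclass(frozen=True)
class Occurrence:
    """Hypothetical record for something that happened between two dates."""
    description: str
    start: date
    end: date

    @property
    def duration(self) -> timedelta:
        return self.end - self.start

    def kind(self) -> str:
        """Return 'event' for a short, discrete occurrence, else 'activity'."""
        return "event" if self.duration <= EVENT_MAX_DURATION else "activity"


# Made-up dates, chosen only to mirror the graduation example.
graduation = Occurrence("graduating from school", date(2004, 5, 15), date(2004, 5, 15))
grad_school = Occurrence("going to graduate school", date(2002, 9, 1), date(2004, 5, 15))

print(graduation.kind())
print(grad_school.kind())
```

In practice the boundary is a judgment call for the analyst; a fixed threshold like this one is merely a convenient starting point when sorting extracted dates.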

 

All the submission information is on the contest home page.

 

Some Advice

 

  • The dataset development team tries very hard to create a scenario and dataset that are both believable and interesting and have a strong tie to reality. However, to create a synthetic dataset, there is an element of "let's pretend". When you are analyzing the dataset -- be flexible and go with the scenario. Think of the dataset as you would a mystery novel. We know there was no widely famous detective named Hercule Poirot working in England in the 1920s, yet if we suspend our disbelief for a while, his stories are enjoyable and sometimes educational!
  • If you have questions about the dataset or analysis, ask them!
  • Not all of the data is equally important. In fact, some data may be red herrings. Consider all information very carefully as you evaluate your hypotheses.

 

Acknowledgments

The dataset was prepared with the assistance of:

TSG team: Jereme Haack, Carrie Varley, Wendy Cowley, Doug Love, Stephen Tratz

UPA team: Alex Gibson, Nick Cramer

NVAC: Jim Thomas, Richard May

Testing: Larry Becker Jr., Dave McColgin

Advisors: Cindy Henderson

 

Questions? Email the Contest Chairs

 
