The dataset consists of:
• about 1500 news stories from a fictitious newspaper, plus a few other items collected by the previous investigators
• about 150 blog entries
• a few pictures (in JPG format)
• a few small databases (in XLS and CSV format)
• a few pages of background information (in DOC or PDF format).
The dataset contains fictitious information and was created for testing and evaluation of visual analytic tools only. No part of this dataset should be taken as real.
Note that the 2006 data set is still available, with solutions, from the VAST 2006 contest web site. This allows you to practice for the 2007 contest, as we now provide preprocessed data for the 2006 data set as well. Just register as usual and download the 2006 data set; you will find the preprocessed data together with the raw data.
Here is the direct link to the readme file of the 2007 preprocessed data set to give you an idea of what preprocessing we provide. This will help you decide which level of the contest your team should enter in 2007. We used MITRE's Alembic for the entity extraction process, with some modifications and hand work. See http://www.mitre.org/tech/alembic-workbench for details on Alembic.
In 2007 you have to choose one or the other (Raw or Preprocessed). We will keep track of who downloads which version, but of course we rely on the honor system, and you should report correctly which data set you used. We will evaluate the two categories separately but use similar criteria.
It is Fall of 2004 and one of your analyst colleagues has been called away from her current tasks to an emergency. The boss has given you the assignment of picking up her investigation and completing her task. She had been asked to pursue a line of investigation into some unexpected activities concerning wildlife law enforcement, endangered species issues, and ecoterrorism. This isn't exactly your specialty area, but your boss believes you are one of the few people who could get to the bottom of whatever is going on. In fact, you would have been given this investigation if you hadn't been busy on another assignment when your colleague started.
Your colleague hasn't gotten very far, but she has assembled all the data you need to crack this case. It is a mixed assortment of information: text, images, numbers. The agency you work for is very accommodating; you may use any analytical tool you need to help your investigations.
You do know a few things coming into this effort. First, you were instrumental in cracking those investigations from last year, so issues about mad cow or
Find the threat! A scenario should emerge once you have put together the pieces of this puzzle.
Your boss is interested in knowing the whos, whats, wheres, whens, hows, and whys of the story, and how they are connected.
Key Questions to be answered
What is the situation in this scenario and what is your assessment of the situation?
(Note that a situation may have multiple plots.)
We will provide a standard form for you to answer those questions (see the VAST 2006 contest if you want an idea of what to expect).
For each of the most relevant plots (there may be only one), consider the following questions:
1. Who are the players relevant to the plot?
   Which of the relevant players are innocent bystanders?
   Which of the relevant players are deliberately engaged in deceptive activities?
   How are the relevant players connected?
2. What is the time frame in which this situation unfolded?
   What events occurring during this time frame are relevant to the plot?
3. What locations were relevant to the plot?
   What, if any, connections are there between relevant locations?
4. What activities were going on in this time frame?
   Which players are involved in the different activities? This question will be answered in the form of a written debriefing document.
Remember, the goal is to answer the main question:
What is the situation and what is your assessment of the situation?
In addition, we will ask for an explanation of the process you used, insights gathered from the various displays, screen prints, and a video demonstration, so that we can judge the utility of the tools.
Events are things that occur in a short, discrete time frame.
Activities occur over a much longer span of time.
For example, graduating from school would be an event. Going to graduate school is an activity.
All the submission information is on the contest home page.
The dataset was prepared with the assistance of:
TSG team: Jereme Haack, Carrie Varley, Wendy Cowley, Doug Love, Stephen Tratz
UPA team: Alex Gibson, Nick Cramer
NVAC: Jim Thomas, Richard May
Testing: Larry Becker Jr., Dave McColgin
Advisors: Cindy Henderson
Questions? Email the Contest Chairs.