
Dataset
The dataset consists of:
· about 1500 news stories from a fictitious newspaper, plus a few other items collected by the previous investigators
· about 150 blog entries
· a few pictures (in JPG format)
· a few small databases (in XLS and CSV format)
· a few pages of background information (in DOC or PDF format)
The dataset contains fictitious information and was created
for testing and evaluation of visual analytic tools only. No part of this
dataset should be taken as real.
Note that the 2006 data set is still available, with solutions, from the VAST 2006 contest
web site. This allows you to practice for the 2007 contest, as we now provide
preprocessed data for the 2006 data set as well. Just register as usual and download the 2006
data set; you will find the preprocessed data together with the raw data.
Here is the
direct link to the readme file of the 2007 preprocessed data
set to give you an idea of
what preprocessing we provide. This will help you decide which level of
the contest your team should enter in 2007. We used MITRE’s Alembic
for the entity extraction process, with some modifications and hand work. See http://www.mitre.org/tech/alembic-workbench
for details on Alembic.
In 2007 you have to choose one or the other (Raw or Preprocessed). We will keep track of who downloads which
version, but of course we rely on the honor system, and you should report
correctly which data set you used. We will evaluate the two categories separately but use similar criteria.
It is Fall of 2004 and one of your analyst colleagues has
been called away from her current tasks to an emergency. The boss has given you the assignment of
picking up her investigation and completing her task. She had been asked to pursue a line of
investigation into some unexpected activities concerning wildlife law
enforcement, endangered species issues, and ecoterrorism. This isn’t exactly your specialty area, but
your boss believes you are one of the few people who could get to the bottom of
whatever is going on. In fact, you would have been given this investigation if
you hadn’t been busy on another assignment when your colleague started.
Your colleague hasn’t gotten very far, but she has assembled
all the data you need to crack this case.
It is a mixed assortment of information: text, images, numbers. The agency you work for is very accommodating:
you may use any analytical tool you need to help your investigation.
You do know a few things coming into this effort. First, you were instrumental in cracking
those investigations from last year, so issues about mad cow or
Your Task
Find the threat! A scenario should emerge when you have put together the pieces of this
puzzle.
Your boss is interested in knowing the who, what, where, when, how, and why of the story,
and how they are connected.
Key Questions to be answered
What is the situation in this scenario and what is your assessment of the
situation?
(Note that a situation may have multiple plots.)
We will provide a standard form for you to answer those questions
(see the VAST 2006 contest if you want an idea of what to expect).
For each of the most relevant plots (there may be only one), consider
the following questions:
1. Who are the players relevant to the plot?
· Which of the relevant players are innocent bystanders?
· Which of the relevant players are deliberately engaged in deceptive activities?
· How are the relevant players connected?
2. What is the time frame in which this situation unfolded?
· What events occurring during this time frame are relevant to the plot?
3. What locations were relevant to the plot?
· What, if any, connections are there between relevant locations?
4. What activities were going on in this time frame?
· Which players are involved in the different activities?
This question will be answered in the form of a written debriefing document.
Remember, the goal is to answer the main question:
What is the situation and what is your assessment of the
situation?
In addition we will ask for an explanation of the process
you used, insights gathered from the various displays, screen prints and a
video demonstration, so we can judge the utility of the tools.
Definitions
Events are things that occur in a short, discrete time
frame.
Activities occur over a much longer span of time.
For example, graduating from school would be an event. Going to graduate school is an activity.
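The distinction between the two definitions can be sketched in code. The following is only an illustration: the `Occurrence` class, the one-week cutoff, and the dates are all hypothetical, since the contest text does not specify how short "short" is or give dates for the school example.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: the contest text gives no numeric cutoff between a "short,
# discrete time frame" and a "much longer span"; one week is arbitrary.
EVENT_MAX_DURATION = timedelta(days=7)


@dataclass(frozen=True)
class Occurrence:
    """Something found in the data, with the time span it covers."""
    label: str
    start: date
    end: date

    def kind(self) -> str:
        """Return "event" for short spans and "activity" for longer ones."""
        duration = self.end - self.start
        return "event" if duration <= EVENT_MAX_DURATION else "activity"


# Hypothetical dates for the two examples given in the definitions.
graduation = Occurrence("graduating from school", date(2004, 5, 15), date(2004, 5, 15))
grad_school = Occurrence("going to graduate school", date(2002, 9, 1), date(2004, 5, 15))

for item in (graduation, grad_school):
    print(f"{item.label}: {item.kind()}")
```

In practice the cutoff would depend on the time scale of the scenario being analyzed, so it is kept as a single named constant here.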
All the submission information is on the contest home page.
Some Advice
Acknowledgments
The dataset was prepared with the assistance of:
TSG team: Jereme Haack, Carrie Varley, Wendy Cowley, Doug Love, Stephen Tratz
UPA team: Alex Gibson, Nick Cramer
NVAC: Jim Thomas, Richard May
Testing: Larry Becker Jr., Dave McColgin
Advisors: Cindy Henderson
Questions? Email the Contest Chairs