IEEE VAST 2007 Contest
Update History and FAQs


June 19th
The answer form was updated (it was previously a draft). Only minor changes: added a question about the data set used, added the name and email of the faculty sponsor for student teams, improved the layout of tables for consistency, and added a link to the judging page.
- The details of the submission materials are now posted on the main contest page.

April 24th
The top entry(ies) will coauthor a joint journal paper with the contest chairs, to appear in the Applications Department of the March 2008 issue of IEEE Computer Graphics and Applications (CG&A). (This was added to the home page.)
- Judging information updated, draft answer form posted.

April 11th
Added FAQ Q9. We also decided not to ask for a 2-page summary with each submission (we will request it only from the selected entries).

March 14th
The preprocessed data set is now available. FAQ added below (see Q7 and Q8).

February 27th, 2007
Raw Dataset available for download.

February 15th, 2007
Website up; announcement sent out to mailing lists
Also, to let you practice for the 2007 contest, we provide preprocessed data for the 2006 contest. Just register as usual and download the 2006 data set; the preprocessed data is included with it. Here is the direct link to the readme file, which gives an idea of what we provide for 2006.


Note: FAQs from 2006 that are still valid have been copied here.

Q1.  Is this data set similar to those given to real-life analysts?


A1.  The easy answer to this question is "yes" and "no".  Real-life analysts are given anything from hundreds of thousands of network traffic records, to photographs and Excel spreadsheets, to forensic ballistic evidence.  Much depends on what question the analyst is being asked to answer, and on the mission of the agency for which the analyst works.  The data set you are working with is a realistic one. It contains all manner of information, important and not, through which a "real life" analyst would have to sift in order to provide a hypothesis to a decision-maker.  The plot of the dataset isn't one you might find on the front page of the Washington Post, but then stranger things have happened!



Q2.  We're tool builders, not analysts.  Can we participate without an analyst on our team?


A2.  Yes!  When it comes down to it, analysis is simply the use of a systematic, logical thinking methodology, much like using the "scientific method" for an experiment. There are standard, well-documented analysis methods that you could reference and use in your efforts, for example: Link Analysis, Analysis of Competing Hypotheses, and Social Network Analysis.

If you are interested in learning more, there are also a couple of books you might want to look at:

Richards J. Heuer, Jr., Psychology of Intelligence Analysis

Morgan D. Jones, The Thinker's Toolkit: 14 Powerful Techniques for Problem Solving

A nice article about subjective thinking and competing hypotheses is at: (or our local copy of the paper)

or see the presentations at:


Another site to explore:


BUT remember that we are interested in new approaches and ideas as well!  Surprise us… What really matters here is answering the questions.



Q3. I found some interesting anomalies in the dataset.  I should report these, right?


A3.  Possibly -- only report data anomalies if they are relevant to your hypotheses and/or conclusions.  For example, if you found that the days of the news articles are only Monday, Wednesday, and Friday, that might be an anomaly when just considering data, but you shouldn't report that unless you can find a link to the scenario.  However, if you found that stories about "Sam" were always associated with Merino sheep and Merino sheep play a part in your evolving hypotheses, then you should report this in some way. 



Q4. Could you move the deadline to August? That would allow us to have a summer intern to help out.


A4. Unfortunately, the July deadline IS firm, as we need to determine who will participate in the live session and to collect the camera-ready materials in time.



Q5. How much are you allowing teams to "build a tool to fit this data"? That is, we could wind up building a tool incrementally, trying to solve the problem as we went through it and bringing in custom pieces as needed, even building more special tools to present the "answer" afterwards. I wasn't sure if that was allowed. Some contests (like TREC) are not run that way (e.g. TREC is more "run your best existing system on this test data", and you're not allowed to look at the test data). In that case we definitely couldn't participate, as we'd have to pull in a lot of things to extend our system to work with this type of unstructured data.


A5. (Updated 07/2007) Your approach is fine for the contest, but keep in mind that if your tool does well and is selected for the live event at the symposium, it will be used for a different (simpler but similar) problem.  The live event at the symposium is not a contest but an opportunity for top-scoring teams to get feedback from analysts.  Teams will be given the data only an hour or two prior to the event.  Each team will consist of one or two members from the tool builder team and a professional analyst, working together to assess as much as possible of a new situation in a few hours.  It is important that your contest submission describes the process by which you arrived at your answers and identifies whether your success can only occur for this specific dataset (in other words, does your process generalize?). If you feel it does, then that's great.



Q6.  Can PNNL employees participate?


A6.  PNNL employees cannot participate if their team is not clearly separated from the group which created the dataset.  If the separation is clear, the team can submit its entry to the contest but will not be considered for a prize (i.e. they will be “hors concours”).



Q7. Are the teams who use the pre-processed data judged separately from the teams who use only the raw data?  If not, how do the criteria for each entry type differ?

A7. The teams who use the pre-processed data will be judged in a separate category from the teams who use the raw data and do their own processing.  The criteria will, however, be the same for both types of entry.


Q8. What tool did you use for the preprocessing?

A8. We used MITRE’s Alembic with some modifications and some manual editing. See  for details on Alembic.


Q9. Was metadata extracted from the pictures (and provided in the preprocessed data set)?

A9. No, we only preprocessed the text, so you will have to look at the pictures yourselves and add that information.


Questions?   Email the Contest Chairs
