Q1. Is this data set similar to those given to
real-life analysts?
A1.
The short answer to this question is both "yes" and "no". Real-life analysts are
given anything from hundreds of thousands of network traffic records, to photographs
and Excel spreadsheets, to forensic ballistic evidence. Much depends on the question the analyst is
being asked to answer and on the mission of the agency for which the analyst
works. The data set you are working with is a realistic one: it contains all manner of information, important
and not, through which a "real life" analyst would have to sift in
order to provide a hypothesis to a decision-maker. The plot of the dataset isn't one you might
find on the front page of the Washington Post, but then stranger things have
happened!
Q2. We're tool builders, not analysts. Can we participate without an analyst on our
team?
A2.
Yes! When it comes down to it,
analysis is simply the application of a systematic, logical method of thinking, much like using
the "scientific method" for an experiment. There are standard,
well-documented analysis methods that you could reference and use in your
efforts, for example Link Analysis, Analysis of Competing Hypotheses, and Social
Network Analysis.
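As a rough illustration of the Link Analysis idea mentioned above, here is a minimal Python sketch. It assumes each document has already been reduced to the set of entities it mentions. The `documents` dictionary, the entity names other than "Sam" and "Merino sheep", and the helper names `build_links` and `degree` are all hypothetical and are not part of the contest data or of any prescribed method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: document id -> set of entities mentioned in it.
documents = {
    "doc1": {"Sam", "Merino sheep", "Farm A"},
    "doc2": {"Sam", "Merino sheep"},
    "doc3": {"Farm A", "Bank B"},
}

def build_links(docs):
    """Count how many documents mention each pair of entities together."""
    links = Counter()
    for entities in docs.values():
        # Sort so that each pair is always stored in the same order.
        for a, b in combinations(sorted(entities), 2):
            links[(a, b)] += 1
    return links

def degree(links):
    """Count how many distinct entities each entity is linked to."""
    deg = Counter()
    for a, b in links:
        deg[a] += 1
        deg[b] += 1
    return deg

links = build_links(documents)
deg = degree(links)

# Strongest links first, then the best-connected entities.
print(links.most_common(3))
print(deg.most_common(3))
```

Entities with many links, or pairs that co-occur unusually often, are candidates for a closer look. Whether they matter still depends on the hypotheses being tested.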
If you are interested in learning more,
there are also a couple of books you might want to look at:
Richards J. Heuer, Jr., Psychology of
Intelligence Analysis
Morgan D. Jones, The Thinker's
Toolkit: 14 Powerful Techniques for
Problem Solving
A nice article about subjective
thinking and competing hypotheses is at:
http://www.dodccrp.org/events/10th_ICCRTS/CD/papers/126.pdf
(or our local copy of the paper)
or see the presentations at:
http://www.dodccrp.org/events/10th_ICCRTS/CD/presentations/126.pdf
Another site to explore:
http://www.insna.org/INSNA/na_inf.html
BUT remember that we are interested
in new approaches and ideas as well!
Surprise us… What really matters here is answering the questions.
Q3.
I found some interesting anomalies in the dataset. I should report these, right?
A3.
Possibly. Report data anomalies only if they are relevant to your
hypotheses and/or conclusions. For example,
if you found that the news articles are dated only on Mondays, Wednesdays, and
Fridays, that might be an anomaly when considering the data alone, but you shouldn't
report it unless you can find a link to the scenario. However, if you found that stories about
"Sam" were always associated with Merino sheep, and Merino sheep play
a part in your evolving hypotheses, then you should report this in some
way.
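The two checks in this answer can be sketched in a few lines of Python. This is only an illustration under assumptions: the `articles` records are invented toy data, and `weekday_counts` and `cooccurrence_rate` are hypothetical helper names. Simple substring matching stands in for whatever entity extraction you actually use.

```python
from collections import Counter
from datetime import date

# Hypothetical toy records: (publication date, article text).
articles = [
    (date(2007, 1, 1), "Sam was seen near the Merino sheep auction."),  # Monday
    (date(2007, 1, 3), "Local council debates road repairs."),          # Wednesday
    (date(2007, 1, 5), "Sam buys more Merino sheep."),                  # Friday
]

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def weekday_counts(records):
    """Count articles per weekday (a data-level anomaly check)."""
    return Counter(WEEKDAYS[d.weekday()] for d, _ in records)

def cooccurrence_rate(records, entity, term):
    """Fraction of articles mentioning `entity` that also mention `term`."""
    with_entity = [text for _, text in records if entity in text]
    if not with_entity:
        return 0.0
    return sum(term in text for text in with_entity) / len(with_entity)

print(weekday_counts(articles))
print(cooccurrence_rate(articles, "Sam", "Merino sheep"))
```

A skewed weekday distribution is worth reporting only if it ties into the scenario. A strong co-occurrence is worth reporting when the term already plays a part in your hypotheses.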
Q4.
Could you move the deadline to August? That would allow us to have a summer
intern to help out.
A4. Unfortunately, the July deadline
is firm: we need to determine who will participate in the live session and
to collect the camera-ready materials in time.
Q5.
How much are you allowing teams to "build a tool to fit this data"?
That is, we could wind up building a tool incrementally, trying to solve the
problem as we went through it and bringing in custom pieces as needed, even
building more special tools to present the "answer" afterwards. I
wasn't sure if that was allowed. Some contests (like TREC) are not run that way:
TREC is more "run your best existing system on this test data", and
you're not allowed to look at the test data. If this contest were run that way, we definitely couldn't
participate, as we'd have to pull in a lot of things to extend our system to
work with this type of unstructured data.
A5.
(Updated 07/2007) Your approach is fine for the contest, but keep in mind that
if your tool does well and is selected for the live event at the symposium, it
will be used for a different (simpler but similar) problem. The live event at the symposium is not a
contest but an opportunity for top-scoring teams to get feedback from
analysts. Teams will be given the data only an hour or two prior to
the event. Each team will consist of
one or two members from the tool builder team and a professional analyst,
working together to assess as much as possible of a new situation in a few
hours. It is important that your contest
submission describes the process by which you arrived at your answers and identifies
whether your success can occur only for this specific dataset (in other words,
does your process generalize?). If you feel it does, then that's great.
Q6. Can PNNL employees participate?
A6. PNNL employees
cannot participate if their team is not clearly separated from the group that
created the dataset. If the separation
is clear, the team can submit an entry to the contest but will not be
considered for a prize (i.e. they will be “hors concours”).
Q7.
Are the teams who use the pre-processed data judged separately from the teams
who use only the raw data? If not, how do the criteria for each entry
type differ?
A7. The teams who use the pre-processed data will
be judged in a separate category from those teams who use the raw data and do
their own processing. The criteria for each type of entry will,
however, be the same.
Q8.
What tool did you use for the preprocessing?
A8. We used MITRE’s Alembic with some modifications and
some manual editing. See http://www.mitre.org/tech/alembic-workbench
for details on Alembic.
Q9.
Was metadata extracted from the pictures (and provided in the preprocessed data
set)?
A9. No, we preprocessed only the text, so you have to
look at the pictures yourselves and add that information.
Questions? Email
the Contest Chairs