IEEE VAST 2008 Challenge
Judging Criteria
Last updated June 4th, 2008
(May still be slightly revised. Significant changes are unlikely, but any revisions will be reflected in the history page.)
Submissions will be reviewed by judges from the analytic community and judges from the visualization community. The two teams of judges will review the submissions separately and merge their findings. Both qualitative and quantitative metrics will be computed. The quantitative metrics evaluate the correctness of the answers based on the “ground truth” embedded in the dataset. The qualitative metrics are based on the perceived utility of the system, including the visualizations and the analytic process, and will be judged from your descriptive explanations (short or detailed answers). The section below details the scoring criteria.
Judging will be based on:
1. The correctness of answers to the questions and the evidence provided. Participants will be given points for correct answers and penalized for incorrect answers. The correct answers will be based on the “ground truth” embedded in the data. For example:
· LIST OF PARTICIPANTS: If you are asked for the participants in a certain activity, you will be given points for finding those who did participate and penalized for missing participants or for identifying irrelevant people.
· SOCIAL NETWORK: To score the correctness of the submitted social network files, we will compare the network with the ground-truth network. Correct nodes and links count positively; missing or extraneous nodes and links count negatively. For question 1 of the phone mini challenge we can also check whether the people’s names have been correctly identified. (An illustrative scoring sketch follows this list.)
·
SHORT and DETAILED ANSWERS and THE DEBRIEF: A human
reader will identify which of the ground truth elements (e.g. activities,
groups, changes over time, etc.) have been identified in your debrief and
explanations. Those elements are
weighted by difficulty and points calculated for each elements which has been found
or missed. Additional suspicious elements identified that were not part of the
known ground truth are reviewed by analysts who participated in the creation
the datasets. Those additional elements judged
legitimate are added to the list, and extra points given for their discovery to
any other team that finds them.
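To make the quantitative scoring above concrete, the sketch below shows one way such counting could be implemented in Python. It is only an illustration under stated assumptions: the committee has not published its scoring program, so the function names, the one-point reward and one-point penalty, the undirected treatment of links, and the example data are all hypothetical, and the actual point values and difficulty weights are set by the judges.

# Illustrative sketch only (not the official scoring program). The point
# values, function names, and example data are assumptions.

def score_set(submitted, ground_truth, reward=1.0, penalty=1.0):
    """Score a collection of items (participants, nodes, or links).

    Correct items add `reward` points; missing or extraneous items each
    subtract `penalty` points (assumed values).
    """
    submitted, ground_truth = set(submitted), set(ground_truth)
    correct = submitted & ground_truth
    missing = ground_truth - submitted
    extraneous = submitted - ground_truth
    return reward * len(correct) - penalty * (len(missing) + len(extraneous))


def score_social_network(sub_nodes, sub_links, truth_nodes, truth_links):
    """Compare a submitted social network with the ground-truth network."""
    node_score = score_set(sub_nodes, truth_nodes)
    # Links are treated as undirected, so each one is stored as a frozenset
    # of its two endpoints (an assumption about the network format).
    link_score = score_set((frozenset(l) for l in sub_links),
                           (frozenset(l) for l in truth_links))
    return node_score + link_score


def score_debrief(found_elements, element_weights):
    """Weighted tally of ground-truth elements found or missed in a debrief."""
    return sum(w if name in found_elements else -w
               for name, w in element_weights.items())


if __name__ == "__main__":
    truth_nodes = {"Alice", "Bob", "Carol"}
    truth_links = [("Alice", "Bob"), ("Bob", "Carol")]
    sub_nodes = {"Alice", "Bob", "Dave"}        # Carol missed, Dave extraneous
    sub_links = [("Bob", "Alice")]              # one correct link, one missed
    print(score_social_network(sub_nodes, sub_links, truth_nodes, truth_links))  # 0.0

    weights = {"meeting at the park": 2.0, "leadership change": 3.0}  # assumed weights
    print(score_debrief({"meeting at the park"}, weights))            # -1.0

In this sketch, running the example prints 0.0 for the network (two correct nodes and one correct link, offset by one missing node, one extraneous node, and one missing link) and -1.0 for the debrief tally.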
2. Subjective assessment of the quality of the visualizations, interactions, and support for the analytical process. This assessment will be based on your visualizations (shown in your answers, process description, and any video). Note that during these assessments the judges will not be able to ask you questions, so the clarity of the explanations you provide is critical. The judges cannot correctly assess something they don’t understand.
The debrief is also judged qualitatively by analysts, who will review and evaluate your process and your approach: whether it is scalable, whether it is generalizable to other situations, and whether the explanations are clear and understandable.
For each Mini Challenge entry, the questions will be scored first on the correctness of the answers and on the qualitative measures described above. In addition, the interfaces and visualizations will be subjectively scored as follows:
Subjective Scoring of the interfaces and visualizations:
Based on the written descriptions, screen captures, and video you provide, and the insights you report gathering from those displays, the judges will be asked to give a subjective assessment rating based on the following criteria:
Primary criteria (the basis for the main score):
· Utility of the interface components and visualizations - based on the specific INSIGHTS reported in your descriptions.
· Quality of the static representations - based mostly on the visualizations (meaningful layout, good use of color or icons, good labeling, saliency of information, etc.).
· Quality of the interaction - based on the descriptions and the video.
Secondary criteria (i.e. criteria that are also very important and will this year be used to award bonus points):
· Scalability (i.e. are some aspects of the analysis automated? Do the results of the automation seem understandable? Are there mechanisms to guide your use of the visualizations?)
· Versatility (i.e. the variety of data types that can be handled)
· Handling of missing data and uncertainty
· Support for collaboration
· Learnability (note that the clarity of the explanations will have a strong impact here)
· Other features, such as: a history mechanism, ease of importing and exporting data, innovative features in general, etc.
Questions? Send email to challengecommittee AT cs.umd.edu