IEEE VAST 2008 Challenge

Judging Criteria

 

Last updated June 4th, 2008

(May still be slightly revised; significant changes are unlikely. Any revisions will be reflected in the history page.)

 

 

Submissions will be reviewed by judges from the analytic community and judges from the visualization community.  Two teams of judges will review the submissions separately and merge their findings.

 

Both qualitative and quantitative metrics will be computed.  The quantitative metrics evaluate the correctness of the answers based on the “ground truth” embedded in the dataset.  The qualitative metrics are based on the perceived utility of the system, including the visualizations and the analytic process, and will be based on your descriptive explanations (short or detailed answers).  The section below details the scoring criteria.

Scoring criteria

 

Judging will be based on:

1.       The correctness of answers to the questions and the evidence provided.  Participants will be given points for correct answers and penalized for incorrect answers. The correct answers will be based on the “ground truth” embedded in the data. 

For example:

·         LIST OF PARTICIPANTS: If you are asked for the participants in a certain activity, you will be given points for finding those who did participate and penalized for missing participants or identifying irrelevant people.

·         SOCIAL NETWORK: To score the correctness of the submitted social network files, we will compare the network with the ground-truth network. Correct nodes and links count positively; missing or extraneous nodes and links count negatively. For the Phone question 1, we can also check whether people’s names have been correctly identified. (An illustrative sketch of this kind of set comparison appears after this list.)

·         SHORT and DETAILED ANSWERS and THE DEBRIEF: A human reader will identify which of the ground-truth elements (e.g. activities, groups, changes over time, etc.) have been identified in your debrief and explanations.  Those elements are weighted by difficulty, and points are calculated for each element that has been found or missed.  Additional suspicious elements that were not part of the known ground truth are reviewed by analysts who participated in the creation of the datasets.  Those additional elements judged legitimate are added to the list, and extra points are given to any other team that finds them.
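
To make the set comparison described above concrete, here is a minimal, illustrative sketch in Python. The set_score helper, the hit/miss/extra weights, and the example names are assumptions for illustration only; the committee's actual scoring weights ground-truth elements by difficulty and is not reproduced here.

    # Illustrative sketch only: the hit/miss/extra weights below are assumed,
    # not the committee's actual values.  Correct items count positively;
    # missing or extraneous items count negatively, as described above.

    def set_score(submitted, truth, hit=1.0, miss=-1.0, extra=-1.0):
        """Score a submitted collection of items against the ground truth."""
        submitted, truth = set(submitted), set(truth)
        hits = len(submitted & truth)        # correctly identified items
        missed = len(truth - submitted)      # ground-truth items not found
        extraneous = len(submitted - truth)  # reported items not in the truth
        return hit * hits + miss * missed + extra * extraneous

    # Hypothetical list-of-participants example: 2 hits, 1 miss, 1 extraneous -> 0.0
    print(set_score({"P1", "P2", "P4"}, {"P1", "P2", "P3"}))

    # Hypothetical social network example: links are scored the same way as nodes.
    truth_links = {("P1", "P2"), ("P2", "P3")}
    submitted_links = {("P1", "P2")}
    print(set_score(submitted_links, truth_links))  # 1 hit, 1 miss -> 0.0

The same node-and-link counting could be applied to the submitted social network files, with name identification checked separately where the question asks for it.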

 


2.       Subjective assessment of the quality of the visualizations, interactions, and support for the analytical process.  This assessment will be based on your visualizations (shown in your answers, process description, and any video).  Note that during these assessments the judges will not be able to ask you questions, so the clarity of the explanations you provide is critical.  The judges cannot correctly assess something they do not understand.

 

The debrief is also judged qualitatively by analysts, who will review and evaluate your process and your approach: whether it is scalable, whether it generalizes to other situations, and whether the explanations are clear and understandable.

 

 

For each Mini Challenge entry, the answers will first be measured on correctness and on the qualitative criteria described above. In addition, the interfaces and visualizations will be subjectively scored as follows:

 

 

Subjective Scoring of the interfaces and visualizations:

Based on the written descriptions, screen captures, and video you provide, and the insights you report gathering from those displays, judges will be asked to give a subjective assessment rating based on the following criteria:

 

Primary criteria (the basis for the main score): 

·         Utility of the interface components and visualizations - based on the specific INSIGHTS reported in your descriptions.

·         Quality of the static representations - based mostly on the visualizations (meaningful layout, good use of color or icons, good labeling, saliency of information, etc.)

·         Quality of the interaction - based on the descriptions and the video

 

Secondary criteria (criteria that are also very important; this year they will be used to award bonus points):

·         Scalability (i.e. are some aspects of the analysis automated?  Do results of the automation seem understandable?  Are there mechanisms to guide your use of the visualizations?)

·         Versatility (i.e. variety of data types which can be handled)

·         Handling of missing data and uncertainty

·         Support for collaboration

·         Learnability (note that the clarity of the explanations will have a strong impact here)

·         Other features such as: History mechanism, ease of importing and exporting data, innovative features in general, etc.

 

 

Questions?  Send email to challengecommittee AT cs.umd.edu
