
Data Samples for the VAST Challenge 2010
(Posted March 14)

Data provided in these challenges are synthetic and should not be construed as involving any real people or events.  There may be similarities to real events you have heard of, but the information you need to puzzle out the challenges is contained in the datasets provided.

The VAST 2010 Challenge features a rather strange tale in the general topic areas of arms dealing and public health.  You will be given three datasets, each representing a mini-challenge, which can be integrated to form an overall picture of what is transpiring among a certain set of people of interest.

Mini-Challenge 1:

The first dataset focuses on some suspected arms deals that have multi-national implications.  You will be provided several different types of text reports from “US Government Intelligence Sources”, newspaper reports, web sites and blogs, and so on.  The following is an example of the data that will be provided in this mini-challenge:

 

10. US GOVERNMENT TELEPHONE INTERCEPT: Translated from Russian, from an Internet Café in Moscow to a pre-paid cell phone in Yemen, 4 February 2009.

 

The caller says, “Your brother is dead, but I will be like a brother to you.” The person receiving the call says in Russian, but with an Arabic accent, “I am truly blessed for your friendship.  I will need your support to continue to build the family farm. Shall we plan a family reunion?”  The caller says, “Soon, my brother, very soon.”

 

Mini-Challenge 2:

The second dataset looks at tracking and characterizing the outbreak of a disease in disparate locations.  Health officials have pooled their data for these locations in the hope of learning more about the disease and its causes.  We will provide hospital admittance records and death records for various cities, and participants will be asked to generate visualizations of the course of the disease over the time span of the data.  The following is an example of the data that will be provided:

Hospital Admittance:

USER_WARNING    DATE       GENDER  AGE  SYNDROME                ID
SYNTHETIC_DATA  4/15/2005  M       53   LEFT KNEE PAIN          1
SYNTHETIC_DATA  4/15/2005  F       55   HEAD ACHE               2
SYNTHETIC_DATA  4/15/2005  F       57   NAUSEA, VOMITING        3
SYNTHETIC_DATA  4/15/2005  F       40   R ANKLE INJ             4
SYNTHETIC_DATA  4/15/2005  F       55   STOMACH CRAMPS          5
SYNTHETIC_DATA  4/15/2005  F       81   TREMORS                 6
SYNTHETIC_DATA  4/15/2005  F       45   LOWER ABDOMINAL PAIN    7
SYNTHETIC_DATA  4/15/2005  M       28   SKIN RASHABCESS         8
SYNTHETIC_DATA  4/15/2005  M       38   LEFT KNEE PAIN          9
SYNTHETIC_DATA  4/15/2005  F       40   LACERATION TO FOREHEAD  10

 

Patient Death Records:

USER_WARNING    DATE       ID
SYNTHETIC_DATA  4/15/2005  2
SYNTHETIC_DATA  4/15/2005  3
SYNTHETIC_DATA  4/15/2005  23
SYNTHETIC_DATA  4/15/2005  32
SYNTHETIC_DATA  4/15/2005  101
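
As a starting point for plotting the course of the disease, the sketch below loads a few of the sample rows and counts admissions and deaths per day. It assumes the records arrive as comma-separated text with the column headers shown above; the actual file format is not stated here. It also assumes that ID identifies the same patient in both record types, which the samples suggest but do not confirm. The names read_records and daily_counts are hypothetical helpers.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Inline copies of a few sample rows. Comma-separated layout is an assumption.
ADMITTANCE_CSV = """USER_WARNING,DATE,GENDER,AGE,SYNDROME,ID
SYNTHETIC_DATA,4/15/2005,M,53,LEFT KNEE PAIN,1
SYNTHETIC_DATA,4/15/2005,F,55,HEAD ACHE,2
SYNTHETIC_DATA,4/15/2005,F,57,"NAUSEA, VOMITING",3
"""

DEATHS_CSV = """USER_WARNING,DATE,ID
SYNTHETIC_DATA,4/15/2005,2
SYNTHETIC_DATA,4/15/2005,3
SYNTHETIC_DATA,4/15/2005,23
"""


def read_records(text):
    """Parse rows into dicts, converting DATE to a date and ID to an int."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["DATE"] = datetime.strptime(row["DATE"], "%m/%d/%Y").date()
        row["ID"] = int(row["ID"])
        rows.append(row)
    return rows


def daily_counts(rows):
    """Count how many records fall on each date."""
    return Counter(row["DATE"] for row in rows)


admissions = read_records(ADMITTANCE_CSV)
deaths = read_records(DEATHS_CSV)

# Look up the admitting syndrome for each death whose ID appears in the
# admissions sample (assumes ID means the same patient in both tables).
admitted_by_id = {row["ID"]: row for row in admissions}
syndromes_of_deceased = {
    d["ID"]: admitted_by_id[d["ID"]]["SYNDROME"]
    for d in deaths
    if d["ID"] in admitted_by_id
}

print(daily_counts(admissions))
print(daily_counts(deaths))
print(syndromes_of_deceased)
```

The per-day counters can be fed directly to a plotting library to draw admissions and deaths over time, and the ID lookup shows one way the two record types might be linked.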

 

Mini-Challenge 3:

This challenge focuses on medical information collected from some hospital patients that is relevant to the overall scenario.  For this mini-challenge, you will be asked to analyze some genetic data and provide visualizations of sequence variations and their evolution.

The data consists of multiple text files containing genetic sequences collected from viral mutants present in human blood samples.

Each sequence is composed of hundreds to several thousand bases.  Each base is encoded as A, T, C, or G.

A genetic sequence file looks like this:

>Sequence24

ATGGATTCCAACACTGTGTCAAGTTTCCAGGACATACTATTGAGGATGTCAAAAATGCAATTGGGGTCCTATGGTTCATACATGTTGGAAAGGGAACTGGTCCGCAAAACCAGATTCCTACCGGTAGCAGGCGGAACAAGCAGTGTGTACATTGAGGTATTGCATCTGACTCAAGGGACCTGCTGGGAACAGATGTACACTCCAGGCGGCATCGGAGGGCTTGAATGGAATGATAACACAGTTCGAGTCTCTAAAAATCTACAGAGATTCGCTTGGTAA

 

>Sequence32

ATGAGTAATGAGAATGGGGGACCTCCACTTACTCCAAAACAGAAACGGAAAATGGCGAGAACAGCTAGGTGTATCAGCGGATCCACTGGCATCACTGCTGGAGATGTGTCACAGCACACAAATCGGTGGGATAAGGATGGTGGACATCCTTAGGCAAAATCCAACTGAGGAACAAGCTGTGGATATATGCAAAGCAGCAATGGGTCTGAGCAAAAGTTTGAAGAGATAAGATGGCTGATTGAAGAAGTGAGACACAGACTAAGAACAACTGAGAATAGTTTTGAGCAAATAACATTCATGCTAG
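
A file in this layout can be read with a few lines of code. The following is a minimal sketch, assuming each record is a ">" header line followed by one or more lines of bases, with blank lines ignored. The helper names parse_sequences and base_counts are hypothetical, and the sample strings are only the first nine bases of the two sequences above.

```python
from collections import Counter


def parse_sequences(text):
    """Return {name: bases} from '>name' headers followed by base lines."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank separator lines
        if line.startswith(">"):
            name = line[1:]
            parts[name] = []
        elif name is not None:
            parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


def base_counts(sequence):
    """Count each base, rejecting anything other than A, T, C, or G."""
    unexpected = set(sequence) - set("ATCG")
    if unexpected:
        raise ValueError(f"unexpected symbols: {sorted(unexpected)}")
    return Counter(sequence)


# Truncated prefixes of the sample sequences, for illustration only.
sample = """>Sequence24
ATGGATTCC

>Sequence32
ATGAGTAAT
"""

sequences = parse_sequences(sample)
for name, bases in sequences.items():
    print(name, len(bases), dict(base_counts(bases)))
```

Base counts and sequence lengths are a simple first summary; comparing sequences position by position would need an alignment step, since the sequences vary in length.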

 
