INFOVIS 2006 CONTEST
DESCRIPTION OF THE DATA AND THE TASKS
The contest data set is a 1% sample of the results of the 2000 U.S. Census (details below). The data is 180 MB zipped, so you may also choose to work with a smaller sample (subset), selected to suit the questions we ask below.
DATA – COMPLETE SET AND SUBSET
The U.S. Census is a broad demographic survey of the people of the United States, conducted once every ten years. It is meant to be an accurate picture of the state of the country and is used for a variety of political, economic, and social decisions. Because surveyors attempt to question every household, the resulting data set is huge and fraught with privacy concerns; therefore we are using a 1% sample, known as the Public Use Microdata Sample (PUMS 1%).
You can read the detailed documentation at the U.S. Census 2000 web page: http://www.census.gov/prod/cen2000/doc/pums.pdf. Chapter 6 of that document describes the attributes of the PUMS 1% file.
Each data file contains a Housing Unit record followed by the Person records for the surveyed people living in that housing unit, and this pattern repeats for every housing unit. A Housing Unit record may appear without any Person records (if no one was surveyed). The highest geographic level in the files is the State; each state is provided as a zipped file.
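The record layout described above can be read with a short loop. The following is only a minimal sketch in Python, not an official reader: it assumes that each record is one line of text and that the first character of the line is the record type, "H" for a Housing Unit record and "P" for a Person record. Please check Chapter 6 of the PUMS documentation for the actual layout before relying on it. The function name group_households and the sample lines are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def group_households(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (housing_record, person_records) pairs in file order.

    Assumption: the first character of each line is the record type,
    "H" for a Housing Unit record and "P" for a Person record.
    """
    housing = None
    persons: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        rectype = line[0]
        if rectype == "H":
            # A new housing unit starts; emit the previous one, if any.
            if housing is not None:
                yield housing, persons
            housing, persons = line, []
        elif rectype == "P":
            if housing is None:
                raise ValueError("person record before any housing unit record")
            persons.append(line)
        else:
            raise ValueError(f"unexpected record type {rectype!r}")
    if housing is not None:
        yield housing, persons


# Made-up sample lines, for illustration only (not real PUMS records).
sample = [
    "H0000001",
    "P0000001a",
    "P0000001b",
    "H0000002",  # a housing unit with no person records
    "H0000003",
    "P0000003a",
]
households = list(group_households(sample))
sizes = [len(persons) for _, persons in households]
print(sizes)
```

Because the function is a generator, a state file can be streamed one household at a time (for example by passing an open file object) instead of being loaded into memory at once.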
Although we encourage contestants to enter using the complete U.S. Census PUMS data set, contestants may also select a subset (sample) covering one or more geographic regions of the complete data set. For example, California, the city of Los Angeles, or a single Metropolitan Area would all be valid subsets of the data set with which you could answer the questions of the contest.
AREAS OF FOCUS
Creating a general tool to explore the census data is well beyond the scope of this contest! Instead, we ask that contestants focus their efforts on at least one of the following three areas.
1. Nationalities & Languages
Create a visualization that lets users understand where various nationalities, ethnicities, and linguistic/cultural backgrounds tend to concentrate. Showing the geographic concentration of just one ethnicity is interesting, but showing several at once is even more interesting! Note: Housing unit records have information on household language; Person records have several variables that can potentially help capture the answer: Asian recode, Black or African American recode, Hispanic or Latino origin, and Citizenship status, just to name a few.
2. Professions
Where do different professions tend to concentrate? Which professions are found in the same places? Professions have a natural hierarchy (see Industry codes), which makes this a slightly different visualization task from focus area #1. Note: Person records have the Industry (Census) and Industry (NAICS) variables present, which could help identify the hierarchical levels of information.
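One way to work with the industry hierarchy is to roll person counts up by code prefix. The sketch below, in Python, assumes the standard NAICS convention in which the first two digits of a code identify the sector and the first three the subsector. The codes stored in the PUMS Industry (NAICS) variable may be formatted differently, so treat the prefix rule as an assumption to verify against the documentation. The function name rollup_by_prefix and the sample codes are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def rollup_by_prefix(codes: Iterable[str], digits: int) -> Dict[str, int]:
    """Count persons per industry group, where a group is the first
    `digits` characters of each code (assumed NAICS-style prefixes)."""
    counts: Counter = Counter()
    for code in codes:
        counts[code[:digits]] += 1
    return dict(counts)


# Made-up industry codes for six persons, for illustration only.
person_codes = ["5415", "5415", "5413", "6221", "6211", "6221"]

by_sector = rollup_by_prefix(person_codes, 2)     # coarsest level
by_subsector = rollup_by_prefix(person_codes, 3)  # one level finer
print(by_sector)
print(by_subsector)
```

Running the same roll-up per geographic region would give, for each level of the hierarchy, the counts needed to compare where industries concentrate.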
3. Small space visualization
Assume that you only have a color iPod (16-bit color) available for your visualization. Use its 220 x 176 pixel space to display a visualization that provides the most comprehensive information about one or more aspects of the data set of your choice.
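To get a feel for how tight this budget is, the arithmetic can be sketched in Python. The pixel dimensions and the 16-bit depth come from the description above. The 5-6-5 split of red, green, and blue bits is an assumption (a common 16-bit layout), and the 4 x 4 pixel cell size is just an example choice.

```python
WIDTH, HEIGHT = 220, 176   # display size from the task description
BYTES_PER_PIXEL = 2        # 16-bit color


def pack_rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit R, G, B values into one 16-bit value.

    Assumption: 5 bits red, 6 bits green, 5 bits blue (RGB565).
    """
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


pixels = WIDTH * HEIGHT                  # total pixels on screen
frame_bytes = pixels * BYTES_PER_PIXEL   # size of one full frame
cells_4x4 = (WIDTH // 4) * (HEIGHT // 4) # number of 4 x 4 pixel cells

print(pixels, frame_bytes, cells_4x4)
print(hex(pack_rgb565(255, 0, 0)))
```

Under these assumptions, any color ramp is limited to at most 32 distinct levels of red or blue and 64 of green, which is worth keeping in mind when choosing a color encoding.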
We will be looking, above all, for a combination of creativity and utility: winning entries must have ideas that are new, interesting, and helpful. (In other words, if you submit a program that is nothing but an interactive zooming choropleth map linked to sliders, scatterplots, and parallel coordinates displays, you will definitely not win!) Beyond interesting new ideas, we will look for excellence in implementation and usability.
If the entry focuses on more than one of the above areas, each of the parts will be scored. However, a very creative and interesting entry in one focus area might win a higher score than another entry with less helpful or interesting solutions for two focus areas.
Augmenting the data
Many data sets available on the web can be used to augment the census data. Please feel free to use any publicly available data of your choice in your entry. If you do so, make sure to identify the source of the supplemental data set and describe how you used it.
Entry categories
Academic – the authors are faculty and students, and the first author is not a student
Industry – at least the first author is from industry
Student – the first author is a student, with no more than two faculty advisors
There will likely be a monetary prize for the top student entry.
Results of the competition will be published on the Conference Web site and on the Conference DVD. Winning authors will also be invited to present their work during the conference.