InfoVis 2004 Contest
History of Changes


Oct 19th, 2004

Known improvements to the dataset (available after the contest)




June 30th

- The submission website will stay open through the night of July 1st, until Maryland wakes up around 8am Friday (US East Coast time), so don't worry about the exact time on Thursday. Also, it's OK if you didn't register before June 21st; the important deadline is July 1st. If possible, check your email next week in case we have problems reading what you submitted.

- Clarification: student teams usually have only student authors (and maybe a mentor who didn't contribute much to the ideas); typically it's a class project. If in doubt, add a note in the submission (e.g. at the beginning of the standard form) saying how you worked together and who did what.


June 14

The submission page is now up (see the contest home page for all submission information).
We extended the deadline for the submission of final materials to July 1st, but we ask that you please register your submission and submit what you have ready by the original deadline of June 21st so we can prepare the short review process. You will be able to submit as many revisions as you want until the July 1st deadline.


May 21

The template for the standard form you have to use to submit your answers has been posted (see "submission information" on the main contest page). We also added more details about the video submission.



May 20

A beta version of the standard form was posted.


Answered some questions:

- You should use the whole dataset for all questions, including question 1 (you can visualize papers and refs differently if you want!)

- You should not use more data than is provided in the contest dataset. For example, we never received the 2003 papers from the Digital Library (they are still not posted as of today!), so you should NOT add the 2003 data yourself. But note that after the contest results are announced, you will be given a chance to resubmit a revised version that could add more material for the repository and could include the 2003 data as an additional task (e.g. "How do things change when you add the 2003 data?").



May 5

The final dataset is posted.

The main dataset now includes:

- lots of extra abstracts (was 227, now 429 out of 614 entries)

- lots of extra keywords (added or completed)

- lots of extra references (added or completed)

- some entries have been unified (were id..., are now acm...), but no entries were added.


Improvements to the main dataset are thanks to Kevin Stamper, Tzu-Wei Hsu, Dave McColgin, Chris Plaue, Jason Day, Bob Amar, Justin Godfrey, and Lee Inman Farabaugh (all at Georgia Tech), and to Howie Goodell (UML) and Niklas Elmqvist (Chalmers, Sweden).


In addition:

- a tabular version of the data is provided by the team at UBC (Jung-Rung Han, Chia-Ning Chiang and Tamara Munzner)

- a small file provided by Maylis Delest (Universite de Bordeaux) that lists 4 autoreferences (really references to variants of the same content).

- Note that the list of duplicate names is still external to the dataset (3 duplicates were added).


The dataset is not perfect and probably never will be, but we need to freeze it so that you all use the SAME data.

May 3
Answer to a question:
Q: "Will I be able to submit more than one figure to explain how my tool accomplishes the task?"
A: Yes, of course; you are encouraged to show several figures in the submitted web form. Your recorded video demonstration will also help.

Mar 16
Clarified some of the tasks.

Someone asked:
Under "Tasks" section, #2, "Characterize the research areas (areas/topics to be defined by you)..."

Is this "you" referring to the users who will be using the tool (i.e. allowing users to define and add topics), or is it referring to us (the designers) who will define research areas/topics when designing the tool?

[our answer] It refers to you, the designers.


Someone asked:
Under task 4, "Additional related items to build into the visualizations include uncertainty, reliability, range, flexibility, broader applicability", can you elaborate further on these terms? (i.e. uncertainty on what? reliability of what? etc.)

[our answer] These terms are open to interpretation. For example, you could handle missing values, ambiguous authors, and partial names, and define an uncertainty measure that is then represented visually, giving the viewer information about the data used for the visualization. You could also compute p-values for confidence and find a way to represent these within your visualizations.

It is important to realize that the metadata is NOT COMPLETE, which will lead to uncertainties. For example, you can take the author list as provided or parse it automatically from the provided reference string, but note that some names can be ambiguous (e.g., "North" could refer to several individuals).
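One hypothetical way to turn ambiguous or partial author names into the kind of uncertainty measure described above is to count how many known author IDs a name could refer to. The Python sketch below assumes names arrive as "Initial. Surname", "Given Surname" or "Surname, Initial." strings; the function names, the (surname, first initial) matching key, the 1 - 1/k score and the sample IDs are all illustrative assumptions, not something the contest dataset defines.

```python
from collections import defaultdict

# Hypothetical sketch: the name formats handled, the (surname, initial) key,
# and the 1 - 1/k measure are illustrative choices, not part of the dataset.

def name_key(full_name):
    """Reduce 'A. North', 'Alan North' or 'North, A.' to ('north', 'a').

    A bare surname such as 'North' yields an empty initial.
    """
    full_name = full_name.strip()
    if "," in full_name:
        surname, given = (part.strip() for part in full_name.split(",", 1))
    else:
        parts = full_name.split() or [""]
        surname, given = parts[-1], " ".join(parts[:-1])
    return surname.lower(), given[:1].lower()


def build_index(authors):
    """Index (author_id, full_name) pairs as {surname: {initial: {ids}}}."""
    index = defaultdict(lambda: defaultdict(set))
    for author_id, full_name in authors:
        surname, initial = name_key(full_name)
        index[surname][initial].add(author_id)
    return index


def name_uncertainty(name, index):
    """Return (candidate IDs, uncertainty) for a name parsed from a reference.

    With k candidates the uncertainty is 1 - 1/k: 0.0 when the name is
    unambiguous, growing toward 1.0 as candidates multiply, and 1.0 when no
    known author matches. A partial name (no initial) matches every ID
    sharing that surname.
    """
    surname, initial = name_key(name)
    by_initial = index.get(surname, {})  # .get does not create entries
    if initial:
        candidates = set(by_initial.get(initial, ()))
    else:
        candidates = set().union(*by_initial.values())
    if not candidates:
        return candidates, 1.0
    return candidates, 1.0 - 1.0 / len(candidates)


index = build_index([("x1", "A. North"), ("x2", "Alan North"), ("x3", "B. North")])
ids, u = name_uncertainty("North", index)
print(sorted(ids), round(u, 2))  # ['x1', 'x2', 'x3'] 0.67
```

Per-reference scores like these could then be aggregated per paper or mapped to a visual channel such as opacity; other definitions (for instance weighting candidates by co-author overlap) are equally valid interpretations.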

Mar 15
We cleaned up the dataset, mostly correcting characters that were incorrect (accents, colons, etc.).
Jeff Klingner from Stanford sent us a file containing a list of equivalent author names and IDs.

Our dataset uses author IDs provided by the ACM Digital Library that contain errors: the same person can be referenced with different IDs. Jeff found many similar names and also gave us a Python script that applies the name mapping to the dataset and unifies the author IDs. Both files are now included in the dataset zip file. We haven't applied the script to our dataset, but you might want to do so. Notice that this is an example of uncertainty in the dataset (i.e. we think those are the same authors, but they may not be).
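Jeff's script itself is not reproduced here. As a rough, hypothetical sketch of the same idea, assume the duplicate list has been read into pairs of author IDs believed to denote the same person; the pair format, the function names build_canonical_map and unify_author_ids, the paper dictionaries and the sample IDs are all illustrative assumptions, not the format of the files in the zip. Union-find merges chains of equivalences, and the smallest ID in each group is kept as the canonical one.

```python
# Hypothetical sketch: the pair format and IDs are assumptions, not the
# contest's duplicate-author file format or Jeff Klingner's actual script.

def build_canonical_map(equivalent_pairs):
    """Map every ID appearing in a pair to one canonical ID for its group.

    Union-find merges chains (A~B, B~C) into one group; the group's
    canonical ID is its smallest member, so the result is deterministic.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in equivalent_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # attach the larger root under the smaller so roots stay minimal
            lo, hi = (ra, rb) if ra < rb else (rb, ra)
            parent[hi] = lo

    return {x: find(x) for x in parent}


def unify_author_ids(papers, canonical):
    """Return copies of papers whose author ID lists use canonical IDs.

    IDs not in the map are kept unchanged; an author listed twice under
    two merged IDs appears only once, in first-seen order.
    """
    unified = []
    for paper in papers:
        authors = []
        for author_id in paper["authors"]:
            cid = canonical.get(author_id, author_id)
            if cid not in authors:
                authors.append(cid)
        unified.append({**paper, "authors": authors})
    return unified


pairs = [("a2", "a7"), ("a7", "a3")]
papers = [{"id": "p1", "authors": ["a7", "a3", "b5"]}]
print(unify_author_ids(papers, build_canonical_map(pairs)))
# [{'id': 'p1', 'authors': ['a2', 'b5']}]
```

Because the mapping only records a belief that two IDs are the same person, you may want to keep the original IDs next to the unified ones so that merged authors can be shown as uncertain.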

Keep in mind that the dataset remains noisy. If you spot other names with several IDs, let us know so that we can update the duplicate author list. There might be other problems you can help us with. For example, authors' names may also appear (and probably do) in the references to non-ACM-DL articles (i.e. those for which we do not have metadata). Please consider helping the contest by locating them and making the list available.

Mar 12
We realized that the registration script was broken and we had lost 2 weeks of registration information (between Feb 22 and March 12). Please re-register, or you may not get email notices of updates. (An email was sent to the list on March 16, so if you didn't get it, it's because your address is not on the list or you typed it wrong.)

Feb 18
We extended the deadlines for task revisions to March 10. Please send comments and suggestions. (We also made minor cosmetic changes to the pages.)

Feb 13

On the Data and Task page we added a link to a new page describing the dataset format, with examples.
We also describe how the dataset was built and how you can help, which might answer many of your questions (for example, why the metadata is sometimes incomplete).
We corrected the tasks by removing the task about panels (since we don't have data about panels yet!). We also gave examples of other tasks that could be addressed if you help us add extra tags to the data.
Next we will work on adding some of the references that are in the ACM DL but were not included the first time around, probably because the DL was busy and did not finish the search.

Feb 7
Cleaned up some references and improved the <source> description by adding a "ref" attribute to uniquely identify proceedings/journals. For books, the source is no longer empty; it contains the ISBN. More ACM articles referenced by the InfoVis articles have been added.

