Several authors have continued to improve the dataset provided for the InfoVis 2004 contest. We list their contributions here in the hope that they will be useful to you. We also hope that someone will volunteer to merge all the improvements and generate a newer version of the dataset that we could post here.
Cleaning of source names (314 names in the provided dataset, 106 unique after data cleaning)
Cleaning keywords (from 1859 to 1753)
Identifying authors (from 1161 to 1036)
Complete cleaned database available at: http://iv.slis.indiana.edu/ref/iv04contest/iv04-contest.mdb
Provides extra information on the articles of the benchmark
dataset, such as session information, affiliation of authors,
location of authors, whether the article contains a user
study, and the email addresses of the authors.
dataset-improved-microsoft-8-2-04.zip