Project Reporting ANNUAL REPORT FOR AWARD # 0219492

Benjamin B Bederson ; U of MD College Park
Search Interfaces for Biodiversity Informatics

Participant Individuals:
Senior personnel(s) : Cynthia S Parr
Graduate student(s) : Bongshin Lee
Senior personnel(s) : Dana L Campbell
Undergraduate student(s) : Liz Noppinger; Marie Pitts; Lauren Culler; Sara Lustusky; Elizabeth Thammasuvimol
Graduate student(s) : Alessandra Rung
Undergraduate student(s) : Valerie Harrell; Stephanie Little; Svetlana Yarosh
Senior personnel(s) : Catherine Plaisant

 

Partner Organizations:
University of Michigan Ann Arbor: In-kind Support; Collaborative Research

Some equipment (e.g., a PC printer) owned by the University of Michigan
was used by C. Parr in pursuit of work relevant to both this project and
BioKIDS (Interagency Education Research Initiative REC-0089283).
Personnel on each project advised the efforts of the other, and
programmers communicated to ensure compatibility across projects.

University of Maryland Baltimore County: Collaborative Research


Rensselaer Polytechnic Institute: In-kind Support; Collaborative Research
To collaborate on user studies of graph visualization software,
personnel from the SAGE project at RPI provided and installed an
eye-tracking system in our facilities and will assist in designing
data collection and analyzing results.

Other collaborators:

Jeffrey Jenson, University of Maryland College of Life Sciences,
instructor of the college course for which we designed software and
conducted user testing.  He provided content, advice, and logistical
support, and administered user surveys.

William Fagan, University of Maryland College of Life Sciences, is
providing content and advice related to food web visualizations.

Wayne Gray, Rensselaer Polytechnic Inst., and other members of the SAGE
project are collaborating on cognitive aspects of the graph
visualization user study.

We are collaborating with Tim Finin, University of Maryland Baltimore
County, and other members of the SPIRE project on semantic web
representations of ecological interaction data and natural history
information.

Kevin Omland, University of Maryland Baltimore County.  We have begun
investigating the role of visualizations in 'tree-thinking' for
evolutionary biology.

Activities and findings:

Research and Education Activities: 
We continued our studies of tree visualization techniques through further development and testing of our software applications TaxonTree and DoubleTree. TaxonTree allows users to browse and search a very large tree of almost 200,000 animal names that we constructed by integrating data from a number of public and private sources. Names link to external web pages on three different, publicly available websites. TaxonTree uses zooming interactivity, integrated searching and browsing, and dynamic queries. In particular, search results are presented in a larger biological context. We used the zoomable user interface toolkits Jazz and Piccolo, developed by our project personnel, as well as Java, Microsoft Access, and MySQL.

We published results from a qualitative user study of TaxonTree conducted near the end of our first year. Through interviews and specific tasks we characterized information-seeking behavior and interaction preferences in the biodiversity domain.

We developed DoubleTree, an extension of TaxonTree that allows comparison of two trees using coupled interaction. We applied the prototype to several datasets. First, we compared the original TaxonTree dataset of 200,000 names with a widely used classification from ITIS. Then, we constructed a 'folk biology' tree using a small set of animal names from the BioKIDS project in southeastern Michigan, manually coupled this to the original large TaxonTree scientific tree, and demonstrated how a 'folk' tree could help non-expert users navigate a scientific tree.

In our third year we began collaborating with Kevin Omland at UMBC to further investigate the role of tree visualizations in 'tree-thinking' for evolutionary biology. We ported TaxonTree to MySQL and Java Web Start so that it could be deployed on the web, and compared its querying and browsing performance to that of the stand-alone versions and DoubleTree. We presented our results at two conferences: Advanced Visual Interfaces and the Ecological Society of America.

We have completed integrating TaxonTree with an existing high-traffic website, the University of Michigan's Animal Diversity Web (ADW) (http://www.animaldiversity.org); it is functional now for Windows users and will soon be available for Mac OS X users. This fall we will conduct two quantitative studies comparing the effectiveness of our interface with the more traditional interface already available on ADW. One study will involve volunteers in a controlled laboratory setting, and one will involve surveys and site log analysis of usage by the worldwide general public. When complete this fall, these studies will fulfill our original project objectives.

We have engaged in three activities beyond our original goals. First, we are extending our exploration beyond simple hierarchical data. We have compiled ecological interaction datasets and have implemented a new tool, TreePlus (Figures 1 and 2, attached PDF), that employs similar and new techniques to allow users to interact with graph data. We have also adapted an existing tool, PaperLens, to serve as a database visualization platform, EcoLens, for more than 250 food web datasets. Qualitative assessment of EcoLens will be conducted by asking food web researchers to evaluate it using their own data. We will soon conduct a quantitative user study to determine for which tasks and graph densities TreePlus, which uses an incremental tree-layout approach, outperforms a more traditional spring-embedded graph layout.

Second, we have begun exploring ontologies, both applying visual interactive tools to them and building them in collaboration with the Animal Diversity Web. A preview of this work was presented at the CODATA 2005 conference in Berlin. We are exploring the use of ontologies in collaboration with the Semantic Prototypes in Research Ecoinformatics project (SPIRE) at UMBC, providing data and expertise towards developing a semantically rich predictive modeling framework. SPIRE personnel will probably use EcoLens and TreePlus. The Animal Diversity Web will be one of the first large databases available on the semantic web, serving as a case study and data source for biologists.

Third, we organized a full-day workshop entitled 'HCI in Biodiversity Informatics' (http://www.cs.umd.edu/hcil/biodiversity/workshop/). Largely supported by the National Biological Information Infrastructure, it was held on June 2, 2005 in association with the University of Maryland Human-Computer Interaction Laboratory's 22nd Annual Symposium and Open House. Twenty-five participants from academia, industry, and government gathered to hear 8 invited speakers, join 3 invited panelists for discussion, and engage in hands-on software demonstrations. Participants decided to prepare a white paper, to be submitted to a journal, covering the state of the field and suggestions for future work.

Findings:
Our qualitative user study of TaxonTree in an undergraduate course is the first to describe usage patterns in the biodiversity domain. We found that interaction with an animated, zoomable node-link diagram aided users' understanding of the data. Most users approached biodiversity data by browsing, using common names and general knowledge rather than the scientific keyword expertise necessary to search using traditional interfaces. Users with different levels of interest in the domain had different interaction preferences; results suggest that users with higher interest levels (usually female) prefer greater control over node opening.

Performance of TaxonTree and DoubleTree on large datasets was good: basic browsing and querying tasks took between 62 ms and 2547 ms. This is because our approach is to show only the subset of data of immediate interest to the user, while retaining the ability for users to browse to obtain nearby detail. Our work demonstrates trade-offs inherent in displaying phylogenetic vs. classification trees and shows that a combined approach is not only feasible but usable. Coupling a folk tree with a large scientific tree shows promise, but a more effective way to illustrate one-to-many mappings is needed. Preliminary, unpublished results suggest that TaxonTree's tree-drawing style does not result in significantly more misconceptions when compared to more traditional cladogram styles. In fact, subjects naive to evolutionary biology often misinterpret both kinds of trees, and further research will be necessary to explore the role of visualizations in fostering 'tree-thinking.'
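
The idea of showing only the subset of data of immediate interest can be illustrated with a minimal sketch. It is not the TaxonTree implementation (which is written in Java on the Jazz and Piccolo toolkits); the TaxonNode class, the visible_subset function, and the choice of "ancestors, focus, and children" as the visible subset are assumptions made for illustration.

```python
from typing import Dict, List, Optional


class TaxonNode:
    """One name in a classification tree (hypothetical structure)."""

    def __init__(self, name: str, parent: Optional["TaxonNode"] = None):
        self.name = name
        self.parent = parent
        self.children: List["TaxonNode"] = []

    def add_child(self, name: str) -> "TaxonNode":
        child = TaxonNode(name, parent=self)
        self.children.append(child)
        return child


def visible_subset(focus: TaxonNode) -> Dict[str, List[str]]:
    """Collect only the names needed to draw the view around `focus`.

    Assumption: the view shows the path of ancestors back to the root,
    the focus node itself, and the focus node's immediate children.
    The rest of the tree is never touched, so the cost depends on the
    depth and fan-out at the focus, not on the total number of names.
    """
    ancestors: List[str] = []
    node = focus.parent
    while node is not None:
        ancestors.append(node.name)
        node = node.parent
    ancestors.reverse()  # list the root first
    return {
        "ancestors": ancestors,
        "focus": [focus.name],
        "children": [child.name for child in focus.children],
    }


# Small example tree; a real tree would hold on the order of 200,000 names.
root = TaxonNode("Animalia")
chordata = root.add_child("Chordata")
root.add_child("Arthropoda")
mammalia = chordata.add_child("Mammalia")
chordata.add_child("Aves")
mammalia.add_child("Primates")
mammalia.add_child("Carnivora")

view = visible_subset(mammalia)
```

Browsing to a child or an ancestor simply calls visible_subset again with the new focus, which is one plausible reason response times can stay low on large trees.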

Training and Development:
In addition to the teams of undergraduate design partners with whom we worked in our early years, we continued to work with a computer science undergraduate, Svetlana Yarosh. She gained database implementation skills as well as experience implementing zoomable interfaces, and designed a user study. Our graduate student, Bongshin Lee, developed and defended her doctoral dissertation proposal last year and gained expertise in graph visualization implementation this year. Our project biologist, Cynthia Parr, continued to learn semantic web technologies and community ecology theory. Together with Bill Fagan, she led a seminar course on ecological informatics for ten biology graduate students (http://hcil.cs.umd.edu/wiki/tiki-index.php?page=ecoinformatics+course). Students discussed recent literature and technology concepts, tested software, and conducted independent research projects in ecology.

Outreach Activities:
By collaborating with a college course for two years we exposed nearly 200 undergraduates to a novel use of technology in the classroom. Our technology is freely available to the general public on our website, and is now available on the high-traffic outreach and education site, Animal Diversity Web. We demonstrated the technology to visitors at several Human-Computer Interaction Laboratory open houses. The CIPRes project evaluated our software with high school teachers to determine its suitability for future outreach related to the tree of life.

Journal Publications:
B. Lee, C.S. Parr, D. Campbell, and B. Bederson, "How Users Interact with Biodiversity Information Using TaxonTree.", Proceedings of Advanced Visual Interfaces (Gallipoli, Italy), vol. , (2004), p. 320. Published
Parr, C.S., B. Lee, D. Campbell, and B. Bederson, "Tree visualizations for taxonomies and phylogenies.", Bioinformatics, vol. 20, (2004), p. 2997. Published
Parr, C.S., R. Espinosa, T. Dewey, G. Hammond, P. Myers , "Building a biodiversity content management system for research, education, and outreach", Data Science Journal, vol. 4, (2005), p. 1. Published
Parr, Cynthia S. and Cummings, Michael, J., "Data sharing in ecology and evolution", Trends in Ecology and Evolution, vol. 20, (2005), p. 362. Published
Lee, B., Parr, C.S., Plaisant, C., Bederson, B.B., "Visualizing directed networks with enhanced tree layouts: can interaction tame complexity?", 13th Annual Symposium on Graph Drawing, vol. , (), p. . Submitted

Book(s) or other one-time publication(s):

Other Specific Products:

Data or databases
Two XML-formatted biological classification trees to be used for
benchmark testing of information visualization tools
Parr, C.S., B. Lee, D. Campbell, and B. Bederson.  'Classification
Datasets,' Datasets made available for the IEEE Symposium on
Information Visualization (InfoVis) Contest 2003: Visualization and
pairwise comparisons of trees.  (2003) Available at:
http://www.cs.umd.edu/hcil/iv03contest/.
Software (or netware)
TaxonTree version 1.3 is now available to Windows OS users by choosing
"Find in TaxonTree" from most Animal Diversity Web pages.  Names now
link to different kinds of resources available at that website. A new
feature, "Load List", makes it possible to see how any given list of
taxonomic names is grouped taxonomically.

Version 1.2 is still available from our project web page.
Originally designed for teaching college-level animal diversity
courses, this is an interactive, searchable tree of almost 200,000
scientific names.  Many names link to web pages created by others
outside our project, and some names have synapomorphies diagnosing
groups. Users can choose to work online, using any web browser and
Java Web Start, or can work offline using a stand-alone version.
Version 1.2 is available free to all users at
http://www.cs.umd.edu/hcil/biodiversity/

Version 1.3 is available from most pages at
http://animaldiversity.ummz.umich.edu.
Software (or netware)
DoubleTree is an application for comparing two trees using coupled
interaction.  The download includes several datasets, including two
large scientific trees of animal names and one smaller "folk" tree of
animals in southeastern Michigan.
Available at http://www.cs.umd.edu/hcil/biodiversity
Software (or netware)
TreePlus: a tree-layout approach to graph visualization using
animation, zooming, and incremental exploration.
Will be downloadable from website
http://www.cs.umd.edu/hcil/biodiversity.  Video demonstrations are
available there.
Software (or netware)
EcoLens: a database visualization tool for ecological interactions
Will be made available on http://www.cs.umd.edu/hcil/biodiversity
Data or databases
ADW Ontology: an ontology for animal natural history
Freely available at Open Biological Ontologies,
http://obo.sourceforge.net
Data or databases
EcoLens data: a collection of ecological interaction datasets taken
from public or private sources and modified for use in ecological
predictive modeling
Selected data available upon request of the researcher.  Some data is
not publicly available and/or should not be redistributed by us.

Internet Dissemination:

http://www.cs.umd.edu/hcil/biodiversity

This page provides a synopsis of the project and offers online usage
or free downloads of all of our products for non-commercial use.

Contributions:

Contributions within Discipline:

 Our work in the third year consisted of extensive programming to
implement graph and database visualization tools drawing upon our
previous results, and integration of our previously developed tool
with a widely used online database. 

Generally, our findings extend the understanding of zooming,
integrated searching and browsing, and incremental exploration.  Our
emphasis on content understanding, and not just ease of use, shows
promise for contributing to efforts to innovate in this arena.  We are
adding to knowledge about the behavior of non-content experts and how
they can be supported in exploring complex biological databases.  We
have also begun to better characterize changes in user preferences as
users gain content expertise.  We have illustrated the value of coupled
interaction in comparing two large trees or in allowing one folk tree
to foster exploration of a large scientific tree.  We have developed
new ways to implement incremental exploration of graphs using a tree
layout, adjacent node previews, and path previews.  Our work on
EcoLens has identified directions for generalizing coupled views to
robustly support two-element schema.
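
Incremental exploration of a graph through a tree layout can be
sketched as follows.  This is only an illustration under stated
assumptions, not the TreePlus algorithm: it assumes a breadth-first
expansion from a focus node, and the function name tree_from_focus,
the depth parameter, and the small food web are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[str, List[str]]  # directed adjacency list


def tree_from_focus(
    graph: Graph, focus: str, depth: int
) -> Tuple[Dict[str, List[str]], Set[Tuple[str, str]]]:
    """Expand `graph` breadth-first from `focus`, at most `depth` levels.

    Returns (tree, cross_links).  `tree` maps each visible node to the
    children it is drawn above in the tree layout.  `cross_links` holds
    edges between visible nodes that do not fit the tree and would need
    to be shown some other way (for example as a preview or highlight).
    Edges leaving nodes at the depth limit are not examined until the
    user expands further.
    """
    tree: Dict[str, List[str]] = {focus: []}
    cross_links: Set[Tuple[str, str]] = set()
    level = {focus: 0}
    queue = deque([focus])
    while queue:
        node = queue.popleft()
        if level[node] == depth:
            continue  # leave this node collapsed for now
        for neighbor in graph.get(node, []):
            if neighbor not in level:
                # First time we reach this node: place it in the tree.
                level[neighbor] = level[node] + 1
                tree[node].append(neighbor)
                tree[neighbor] = []
                queue.append(neighbor)
            else:
                # Already placed elsewhere: keep the edge as a cross link.
                cross_links.add((node, neighbor))
    return tree, cross_links


# Hypothetical food web; edges point from prey to predator.
food_web: Graph = {
    "grass": ["rabbit", "mouse"],
    "rabbit": ["fox"],
    "mouse": ["fox", "owl"],
    "fox": [],
    "owl": [],
}

shallow_tree, shallow_links = tree_from_focus(food_web, "grass", depth=1)
deeper_tree, deeper_links = tree_from_focus(food_web, "grass", depth=2)
```

Increasing depth, or calling the function again with a new focus,
reveals the graph a step at a time while every intermediate view
remains a tree.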

Contributions to Other Disciplines:
Most current work in biodiversity informatics emphasizes back-end
issues such as metadata standards, interoperability, and distributed
computing. Our work provides some of the first findings focused on
front-end systems, and in particular on reaching an expanded user
community of non-experts. It could be argued that directing energy
towards user needs and experiences will drive further innovation in
back-end systems, as new users provide feedback on the kinds and
sources of biodiversity information resources available to them.

Our biologist collaborators at the Animal Diversity Web have already
begun using TaxonTree to proofread their taxonomic database.  Other
projects such as CIPRes and SEEK are evaluating our tools so that they
may modify them for taxonomic concept displays for experts and for the
general public.  A future Assembling the Tree of Life project will use
a TaxonTree-like interface for data sharing.

We have provided an innovative application for education.

Finally, we provide further evidence of lay user cognition,
contributing to the field of cognitive anthropology, or
'folkbiology.'

Contributions to Education and Human Resources:
We have previously exposed undergraduate design partners to research,
particularly in an exciting new field, thus modeling potential career
paths.  All of these design partners were women. This project provides
research opportunities for our project biologist in a way that
substantially expands her technical skill sets.  Finally, computer
science students (undergraduate and graduate) are further developing
their skills in working in multidisciplinary teams.

Contributions to Resources for Research and Education:
 In addition to producing TaxonTree as a resource for teaching, we have
provided content resources for three other projects.

First, our integrated taxonomic content is now being used by the
University of Michigan's Animal Diversity Web
(http://www.animaldiversity.org).   

Second, we also developed sample datasets for an Information
Visualization Contest (http://www.cs.umd.edu/hcil/iv03contest/),
expected to generate innovative solutions to the problem of comparing
large trees.

Third, the ADW natural history ontology and augmented composite
ecological interaction database are being used by the SPIRE project at
University of Maryland Baltimore County.