Search Interfaces for Biodiversity Informatics

ITR/IDM 0219492


Principal Investigator

Benjamin B. Bederson
Department of Computer Science
University of Maryland
Human-Computer Interaction Laboratory
A.V. Williams Bldg.
College Park
MD 20742
Phone: (301) 405-2764  Fax: (301) 405-6707
bederson@cs.umd.edu   
http://www.cs.umd.edu/~bederson

Keywords

information visualization, interactive user interfaces, biodiversity, zooming, bioinformatics

Project Summary

This project explores technologies for visualizing complex datasets to assist information retrieval and understanding. Our particular interest is biodiversity information, which underlies most pressing environmental and conservation debates but is needed by users without significant content expertise. The project combines information visualization techniques and rapid-feedback dynamic query interfaces with an aggressive approach of working with representative users from design through evaluation. Zoomable interfaces will allow users to navigate multiple hierarchies so that they can visually accommodate and understand highly interconnected data.

Publications and Products

  1. Parr, C.S., B. Lee, D. Campbell, and B. Bederson. 2004. Tree visualizations for taxonomies and phylogenies. Bioinformatics. Accepted.
  2. Lee, B., C.S. Parr, D. Campbell, and B. Bederson. 2004. How Users Interact With Biodiversity Information Using TaxonTree. In Proceedings of Advanced Visual Interfaces. ACM Press, pp. 320-327.
  3. Parr, C.S., B. Lee, D. Campbell, and B. Bederson. 2003. Classification Datasets. Datasets made available for the IEEE Symposium on Information Visualization InfoVis Contest 2003: Visualization and pairwise comparisons of trees.
  4. Parr, C.S., B. Lee, D. Campbell, and B. Bederson. 2002. Tree browsing: Visualizing biodiversity information, Biosciences Day poster presentation, University of Maryland.

Project Impact

Discoveries at and across the frontier of science and engineering
Our findings extend our understanding of zooming and integrated searching and browsing as tools for information retrieval. We are adding to knowledge about the behavior of non-content experts and how they can be supported in exploring complex biological databases, even as they gain content-expertise. Our first applications, TaxonTree and DoubleTree, scale up to very large trees (up to 400,000 nodes) through use of a database backend. We contributed sample datasets for an Information Visualization Contest which generated other innovative solutions to the problem of comparing large trees. We are exploring and evaluating different ways to display node-link diagrams and node attributes.   This interdisciplinary work provides some of the first findings focused on front-end systems in biodiversity informatics. In particular it targets an expanded user community of non-experts. At the same time, expert biologists will benefit from the ability to visualize and interact with taxonomic and phylogenetic databases.

Connections between discoveries and their use in service to society
Supporting users across content-expertise levels is of vital importance to the global information economy. People in governments and schools and private industry rely on internet resources for decision-making and learning. Specifically, this project represents a new approach for visualizing and reducing biodiversity data complexity so that it can be successfully used across society.

A diverse, globally oriented workforce of scientists and engineers
Other than the PI, all project personnel including one Ph.D. student, two part-time research scientists, one undergraduate researcher, and seven undergraduate design partners have been women.

Improved achievement in mathematics and science skills needed by all Americans
Our tools are expected to support increased understanding of scientific databases and biodiversity data. In addition to its use in a core biology course at University of Maryland, TaxonTree is being adapted for use by the Animal Diversity Web, part of the BioKIDS project (NSF REC 0089283). BioKIDS’ inquiry-based biodiversity curriculum targets 5th and 6th graders in the Detroit Public School System.

Goals, Objectives and Targeted Activities

Our project goals are to:
1) Develop a searching interface for biodiversity databases targeting domain-novice adults.
2) Build interfaces combining "folk" and "scientific" understanding.
3) Evaluate the developed interfaces and compare them to existing interface models in the biodiversity domain.

Since initiating the project in September 2002 we have created one application and two prototypes towards the first two goals and have conducted one qualitative user study towards the third goal.  This second year was spent disseminating results and preparing for final refinement and evaluation of our first application and its design principles.  In addition we are expanding our scope and have begun developing datasets and tools for visualization of ecological interaction data.

We developed a new software application, TaxonTree, by modifying an existing application, SpaceTree. TaxonTree allows users to browse and search a very large node-link diagram of animal names that we constructed by integrating data from a number of public and private sources. Names link to external web pages. TaxonTree uses zooming interactivity and integrated searching and browsing. Search results are presented in the larger biological context of their classification tree. Towards the second goal, we developed a prototype called DoubleTree, which couples navigation in the scientific biological classification in TaxonTree with a simpler folk tree; another prototype supported multi-dimensional natural history data exploration.
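The idea of presenting search results within their classification tree can be sketched as follows. This is a minimal illustration in Python, not TaxonTree's actual implementation: the tiny taxonomy, the `PARENT` map, and the function names are hypothetical and exist only for this example. Given a query, the sketch collects each matching name together with all of its ancestors, which is the subset of the tree a display would need in order to show the matches in context.

```python
# Hypothetical child -> parent map for a tiny classification tree.
# (Illustrative data only; not the project's dataset.)
PARENT = {
    "Animalia": None,
    "Chordata": "Animalia",
    "Mammalia": "Chordata",
    "Aves": "Chordata",
    "Mollusca": "Animalia",
    "Gastropoda": "Mollusca",
}


def path_to_root(name, parent=PARENT):
    """Return the list of nodes from `name` up to the root, inclusive."""
    path = []
    while name is not None:
        path.append(name)
        name = parent[name]
    return path


def search_in_context(query, parent=PARENT):
    """Return (matches, visible): the names containing `query`, plus the
    set of nodes needed to draw every match together with its ancestors."""
    q = query.lower()
    matches = sorted(n for n in parent if q in n.lower())
    visible = set()
    for m in matches:
        visible.update(path_to_root(m, parent))
    return matches, visible
```

For instance, `search_in_context("gastro")` would report the single match "Gastropoda" and mark "Gastropoda", "Mollusca", and "Animalia" as the nodes to display.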

Our qualitative user study of TaxonTree in an undergraduate course is the first to describe usage patterns in the biodiversity domain. We found that interaction with an animated, zoomable node-link diagram aided users' understanding of the data. Most users approached biodiversity data by browsing, using common names and general knowledge rather than the scientific keyword expertise necessary to search using traditional interfaces. Users with different levels of interest in the domain had different interaction preferences; results suggest that users with higher interest levels (usually female) prefer greater control over node opening. Performance of TaxonTree and DoubleTree on large datasets was quite good, with basic browsing and querying tasks requiring from 62 ms to 2547 ms. This is because we show only the subset of data of immediate interest to the user, while retaining the ability for users to browse to obtain nearby detail. Our work demonstrates trade-offs inherent in displaying phylogenetic vs. classification trees and shows that a combined approach is not only feasible but usable. Coupling a folk tree with a large scientific tree shows promise, but a more effective way to illustrate one-to-many mappings is needed. We can now refine the tasks and metrics to allow comparative studies to accomplish goal 3 for TaxonTree and future applications.
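The approach of showing only the subset of data of immediate interest, backed by a database, might look roughly like the sketch below. It is an assumption-laden illustration rather than the project's code: the report says only that a database backend is used, so the table layout, the `build_demo_db` helper, and the `LazyTree` class are all hypothetical. The point is that a node's children are fetched from the database only when the user opens that node, so memory use tracks what is on screen rather than the size of the full tree.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding a tiny hypothetical tree."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE node (name TEXT PRIMARY KEY, parent TEXT)")
    # An index on parent keeps child lookups fast on large trees.
    db.execute("CREATE INDEX idx_parent ON node (parent)")
    rows = [
        ("Animalia", None),
        ("Chordata", "Animalia"),
        ("Mollusca", "Animalia"),
        ("Mammalia", "Chordata"),
        ("Aves", "Chordata"),
    ]
    db.executemany("INSERT INTO node VALUES (?, ?)", rows)
    return db


class LazyTree:
    """Holds only the nodes the user has opened; children are fetched
    from the database the first time a node is expanded."""

    def __init__(self, db, root):
        self.db = db
        self.root = root
        self.children = {}  # name -> list of child names, loaded on demand

    def expand(self, name):
        """Return the children of `name`, querying the database once."""
        if name not in self.children:
            cur = self.db.execute(
                "SELECT name FROM node WHERE parent = ? ORDER BY name",
                (name,),
            )
            self.children[name] = [row[0] for row in cur]
        return self.children[name]

    def loaded_count(self):
        """Number of nodes currently held in memory, including the root."""
        return 1 + sum(len(c) for c in self.children.values())
```

With this layout, expanding the root loads two nodes and expanding "Chordata" loads two more; the rest of the tree stays in the database until the user browses to it.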

Area Background

This research addresses the general problem of diverse users and complex information sources via visualization. In the same way that bioinformatics has revolutionized the fields of molecular biology and biophysics, biodiversity informatics is at the threshold of providing data and tools to allow the next generation to discover and understand global patterns and processes governing the diversity of life. Much biodiversity information is already available on the Internet (Bisby, 2000), where keyword searches remain the predominant method of access (Cockburn & McKenzie, 2000). In the biodiversity domain, the efficiency of single-word searches is constrained because inherently complex biological data are stored in a controlled language that is not necessarily understood by domain novices. Users may be professionals such as taxonomists and conservation biologists, or they may be domain novices, such as students or educated professionals in other fields such as land-use planners or lawyers (Maier et al., 2000).

Area References

  1. Bisby, F.A. (2000). The quiet revolution: biodiversity informatics and the Internet. Science 289:2309-2312.
  2. Cockburn, A., & McKenzie, B. (2000). What Do Web Users Do? An Empirical Analysis of Web Use. International Journal of Human-Computer Studies, 54(6), pp. 903-922, Academic Press.
  3. Maier, D., E. Landis, J. Cushing, A. Frondorf, A. Silberschatz, M. Frame, and J.L. Schnase, eds. (2000). Research directions in biodiversity and ecosystem informatics. Report of an NSF, USGS, NASA Goddard Space Flight Center, June 22-23, 2000. Available http://bio.gsfc.nasa.gov/ (accessed 30 January 2002).

Potential Related Projects

Related projects include visualizing trees and other information for content experts (e.g. Guimbretière and Munzner's TreeJuxtaposer, Maddison et al.'s Tree of Life). IDM projects involving visualization of knowledge maps/ontologies may be related (e.g. Chen's Intelligent Patent Analysis and Visualization, Garg's Information Visualization Research and Education). In addition, digital library querying systems aimed at diverse audiences face similar challenges (e.g. Druin et al.'s International Children's Digital Library).

Project Websites

http://www.cs.umd.edu/hcil/biodiversity/    The HCIL Biodiversity Informatics Visualization project page provides a synopsis of the project and offers a free download of all products for non-commercial use. 

Illustrations

http://www.cs.umd.edu/hcil/biodiversity/suppfig.shtml  This screenshot of DoubleTree shows differences in tree topology between a folk tree and a composite scientific tree. The folk tree is a reduced dataset of organisms found in southeastern Michigan, as grouped by a non-biologist. Interaction with either tree results in corresponding nodes opening and closing in the other tree. Browsing and searching of the “Earthworms and slugs” group in the folk tree maps onto two distant groups in the scientific tree (circled).

http://www.cs.umd.edu/hcil/biodiversity/tt_overview.jpg  This screenshot of TaxonTree shows an overview of a hierarchical biological classification with tooltip magnification of a node. Triangles on nodes indicate more information. External links to further information are shown as dots. Synapomorphies (characters that diagnose a node) are shown on links between nodes.

Online Software

All software is available for download or use at http://www.cs.umd.edu/hcil/biodiversity .  DoubleTree version 0.7 was released December 2003.   TaxonTree DB version 1.2 was released September 16, 2003.

Online Data

http://www.cs.umd.edu/hcil/iv03contest/datasets/classif_A_03-04-16.xml   http://www.cs.umd.edu/hcil/iv03contest/datasets/classif_A_03-04-16.xml
These datasets (40 MB each) were prepared for use in the InfoVis 2003 contest, which focuses on techniques for visualizing and comparing trees. A full description can be found on the Classification Datasets page.