Serving computational ecology from a digital library
Cynthia Sims Parr
1-301-405-7445
csparr@umd.edu
Roger Espinosa
IT
1-734-763-4677
roger@umich.edu
Philip Myers
Dept of EEOB,
1-734-647-2206
pmyers@umich.edu
ABSTRACT
We describe a case study using a digital library resource to assist ecological research that involves computational approaches. Our purpose is to detail the approach and demonstrate the power of combining encyclopedic content presentation with harvestable data. While acknowledging the advantages and generality of this approach, we also consider the challenges faced before digital libraries can adequately support research in this way.
Categories and Subject Descriptors
General Terms
Design, Standardization
Keywords
Biodiversity, ecology, digital library, encyclopedia
While efforts to design, implement, and populate digital libraries for education and literature access are well underway (e.g. NSDL and DLESE), effective use of them for scientific research is not yet common practice. Notable leaders are Unidata, e.g. THREDDS [1], and FishBase (http://www.fishbase.org). Research taking a synthesizing approach typically involves manual coding of data from tables or text found via literature searches, or downloading and integrating data from multiple, specialized data archives. How can digital libraries improve the process of compiling data for these studies?
We describe a computational approach to ecological interaction analysis, and present results of integrating data from a digital encyclopedia and data archives to support this analysis. Deepening the contribution of digital libraries to such research will require thoughtful structuring and exposure of data to facilitate discovery, export, and integration.
When one organism regularly eats, parasitizes, or benefits another organism in its community, they have an ecological interaction. A food web is a well-known example of a collection of ecological interactions. We chose this domain for testing ideas on the use of digital libraries in scientific research for several reasons. This field has a number of well-established datasets, a history of synthetic studies, and recent theories that are amenable to computational approaches (reviewed in [2]). Also, food webs provide an example familiar to non-biologists.
Our primary digital library resource in this case study is the Animal Diversity Web (ADW) (http://www.animaldiversity.org). Initially designed for education by zoologists at the
Our science goals are to 1) investigate general ecological interaction rules in known food webs, and 2) predict interactions in less-well-known food webs. We begin by combining large numbers of known food webs in a relational database, as described in more detail below. These data on “who eats whom” can come partly from data archives of the results of particular studies, but can also come from aggregated summaries in digital encyclopedias such as ADW. Interactors are identified where possible to the scientific name at the most appropriate taxonomic level. This allows data from different sources to be combined, using scientific names. Additional data tables with traits or attributes such as size, habitat preferences, reproductive characteristics, and nutritive requirements allow the construction of “trait-space” for each organism. Visualization tools, under development, will allow biologists to explore the data for patterns or to select subsets for analysis. Algorithms, to be discussed elsewhere, involve predictive modeling using trait-spaces and inferences across related organisms. Once parameterized by well-studied systems, these algorithms will generate testable hypotheses about unstudied systems.
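The combination step described above can be sketched in a few lines. This is a minimal illustration, not the schema used in the study: the table names (`link`, `trait`), the column names, the source labels, and the example rows are all assumptions made for the sketch. It shows links from different sources pooled in one relational table and joined to trait data through the scientific name.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE link (source TEXT, predator TEXT, prey TEXT);
CREATE TABLE trait (taxon TEXT, name TEXT, value TEXT);
""")

# Made-up example rows; the scientific name is the key shared by all sources.
conn.executemany("INSERT INTO link VALUES (?, ?, ?)", [
    ("encyclopedia", "Vulpes vulpes", "Microtus pennsylvanicus"),
    ("archive", "Vulpes vulpes", "Sylvilagus floridanus"),
    ("archive", "Bubo virginianus", "Microtus pennsylvanicus"),
])
conn.executemany("INSERT INTO trait VALUES (?, ?, ?)", [
    ("Vulpes vulpes", "habitat", "forest"),
    ("Bubo virginianus", "habitat", "forest"),
    ("Microtus pennsylvanicus", "habitat", "grassland"),
])


def predators_of(prey):
    """Return (predator, habitat) pairs for a prey taxon, pooled over all sources."""
    rows = conn.execute(
        """SELECT DISTINCT l.predator, t.value
           FROM link AS l
           LEFT JOIN trait AS t
             ON t.taxon = l.predator AND t.name = 'habitat'
           WHERE l.prey = ?
           ORDER BY l.predator""",
        (prey,),
    )
    return rows.fetchall()
```

Storing traits as (taxon, name, value) rows is one possible design; it lets new attribute categories be added without altering the table, at the cost of one join per trait.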
Our approach requires large quantities of data to be brought together into a single analysis, which should expand as new results are added to digital libraries. It does not rely on particular algorithms, but is essentially a blueprint for the workflow of data gathering, analysis, and prediction.
We obtained delimited ASCII files or spreadsheets directly from researchers (Webs on the Web, EcoWEB) or from a public data archive (Interaction Web Database, http://www.nceas.ucsb.edu/interactionweb/). Common name searches using ADW, TaxonTree [4], FishBase, ITIS (http://www.itis.usda.gov), and other online sources aided identification of interactors to scientific name. These sources include both animal and non-animal interactors.
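As a rough sketch of this identification step, the following assumes a tab-delimited file with `predator` and `prey` columns and a small hand-built lookup from common to scientific names. The column names, the lookup entries, and the sample rows are hypothetical; in practice the lookup would be assembled from searches of the sources listed above, and unresolved rows would be set aside for manual review.

```python
import csv
import io

# Hypothetical lookup compiled from name searches; entries are illustrative only.
COMMON_TO_SCIENTIFIC = {
    "red fox": "Vulpes vulpes",
    "meadow vole": "Microtus pennsylvanicus",
}


def resolve(name):
    """Map a common name to a scientific name; return None if it is unknown."""
    return COMMON_TO_SCIENTIFIC.get(name.strip().lower())


def load_links(text, delimiter="\t"):
    """Parse predator/prey rows, separating resolved links from unresolved rows."""
    resolved, unresolved = [], []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        predator, prey = resolve(row["predator"]), resolve(row["prey"])
        if predator and prey:
            resolved.append((predator, prey))
        else:
            # Keep the original names so a person can identify them later.
            unresolved.append((row["predator"], row["prey"]))
    return resolved, unresolved


# Made-up sample input in the assumed format.
SAMPLE = "predator\tprey\nRed Fox\tmeadow vole\nred fox\tunknown beetle\n"
```

Calling `load_links(SAMPLE)` returns one resolved link and one unresolved row, so nothing is silently dropped when a name cannot be matched.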
We also obtained delimited ASCII for the entire structured contents of the Animal Diversity Web. This included lists of animal predators and their prey (predator-prey links) in addition to quantitative data such as lifespan or size, as well as natural history keywords applying to each scientific name. These attribute data use a controlled vocabulary associated with an OWL ontology [3]. ADW’s controlled vocabulary structured the coding of non-standardized portions (such as location and habitat of food web site) from the other datasets.
ADW contributed over 30,000 attribute records (Table 1), representing the distillation of about 10,000 references, compiled by about 1400 authors. A comparison with specialized archive data shows the relative contribution of a digital encyclopedia to predator-prey interaction data (Table 2).
Table 1. Large amounts of structured data can be downloaded from ADW. The 6 most populated, relevant categories are shown.
Attribute category | # records
Reproduction keywords | 9858
Habitat keywords | 5799
Physical characteristics keywords | 4174
Behavior keywords | 4170
Food habits (e.g. trophic levels) | 3000
Size | 1819
Table 2. ADW supplements data from 3 food web data archives.
Source | # webs | # interactors | # links
ADW | n/a | 1012 | 2869
Webs on the Web | 17 | 1537 | 6328
Interaction Web DB | 26 | 2177 | 9882
EcoWEB | 213 | 4064 | 6363
Our approach generalizes to most comparative studies using compiled data. An already aggregated resource such as ADW has disadvantages. One must trust the coding that others have done, which may be subject to hidden biases (though ADW’s authoring model should randomize errors). Coding schemes for such an all-purpose resource may not be as effective as a taxon-specific dataset (e.g. focusing only on birds), or one with coding tailored to answer a specific research question.
At the same time, there are many advantages to using a digital encyclopedia. Data are easier to explore before downloading. Compiling data is less time consuming because data are pre-aggregated according to a single standard. Fewer mappings of schema are required in order to integrate the data with other sources. Coding can be checked against accompanying text and references in the encyclopedic source. As digital library collections grow, analyses can be rerun with more data or with additional attributes. Importantly, digital encyclopedia data also serve education and outreach purposes.
Digital encyclopedias can never replace high-quality, specialized archives, but ADW can serve as a model for encyclopedic resources. Currently one cannot easily find or retrieve the data we used via the National Science Digital Library, though ADW metadata is available there. We recommend that digital collections in general expose data to harvesting and discovery by indexing controlled vocabulary terms, not just the general metadata. Semantic web approaches to data discovery and integration, such as those pursued by the SPIRE project (http://spire.umbc.edu), are also promising for ecological research.
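To make the indexing recommendation concrete, here is a minimal sketch of an index over controlled vocabulary terms. The terms, the taxa, and the function names are assumptions chosen for illustration; they are not drawn from ADW's vocabulary or from any NSDL service. Each term maps to the set of taxa tagged with it, so a harvester could discover records by attribute rather than by general metadata alone.

```python
from collections import defaultdict

# Made-up records: (scientific name, controlled-vocabulary terms).
RECORDS = [
    ("Vulpes vulpes", ["carnivore", "forest"]),
    ("Bubo virginianus", ["carnivore", "nocturnal", "forest"]),
    ("Microtus pennsylvanicus", ["herbivore", "grassland"]),
]


def build_term_index(records):
    """Map each controlled-vocabulary term to the set of taxa tagged with it."""
    index = defaultdict(set)
    for taxon, terms in records:
        for term in terms:
            index[term.lower()].add(taxon)
    return index


def taxa_with_all(index, *terms):
    """Return the taxa tagged with every requested term, sorted by name."""
    sets = [index.get(term.lower(), set()) for term in terms]
    return sorted(set.intersection(*sets)) if sets else []
```

For example, `taxa_with_all(build_term_index(RECORDS), "carnivore", "forest")` returns the two taxa that carry both terms in this toy dataset.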
Our thanks to J. Cohen (EcoWEB) and to J. Dunne and N. Martinez (Webs on the Web) for providing food web data, to B. Fagan for discussions on “trait-spaces,” to B. Lee for help with the database schema, and to T. Jones and T. Dewey for helpful comments on the manuscript. Funding was provided by NSF IDM/ITR 0219492 (PI Bederson) and IERI REC-0089283 (PIs Songer and Myers).
[1] Domenico, B., Caron, J., Davis, E., Kambic, R., and Nativi, S. Thematic real-time environmental distributed data services (THREDDS): Incorporating interactive analysis tools into NSDL. Journal of Digital Information, 2, 4 (2002), No. 114.
[2] Pimm, S. Food webs.
[3] Parr, C.S., Espinosa, R., Dewey, T.,
[4] Parr, C.S., Lee, B.,