Attendees
Stephen M. Mount, University of Maryland, smount@wam.umd.edu
Abstracts
TimeSearcher:
Interactive Querying for Identification of Patterns in Genetic Data
Harry Hochheiser and Ben Shneiderman
University of Maryland, Department of
Computer Science
PowerPoint slides
Microarray experiments are often used to examine changes in gene
expression over time. Generally, these data sets are analyzed using
clusters, self-organizing maps, heat maps, and other standard microarray
analysis tools. TimeSearcher is a general purpose tool for
exploration and pattern identification in time series
data. TimeSearcher is based on the use of timeboxes -
rectangular, direct-manipulation queries - to support interactive
exploration via dynamic queries (100ms response time). TimeSearcher
also provides overviews of query results and drag-and-drop
support for query-by-example. The use of TimeSearcher for analysis of
microarray time series will be discussed, along with other potential
applications of TimeSearcher to bioinformatics problems.
http://www.cs.umd.edu/hcil/timesearcher
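As an illustration of the timebox idea, the following minimal Python sketch treats a timebox as a rectangle over (time index, value) and keeps only the series whose samples stay inside the value range for the whole time range. The names Timebox, matches, and query, the inclusive bounds, and the sample profiles are assumptions made for this example; they are not taken from TimeSearcher itself.

```python
from typing import Dict, List, NamedTuple, Sequence


class Timebox(NamedTuple):
    """A rectangular query: time-index range by value range (all bounds inclusive)."""
    t_start: int
    t_end: int
    v_min: float
    v_max: float


def matches(series: Sequence[float], box: Timebox) -> bool:
    """True if every sample inside the box's time range lies in its value range."""
    window = series[box.t_start:box.t_end + 1]
    return len(window) > 0 and all(box.v_min <= v <= box.v_max for v in window)


def query(data: Dict[str, Sequence[float]], boxes: List[Timebox]) -> List[str]:
    """Names of the series that satisfy every timebox (a conjunctive query)."""
    return sorted(name for name, series in data.items()
                  if all(matches(series, box) for box in boxes))


# Hypothetical expression profiles, five time points each.
profiles = {
    "geneA": [0.1, 0.5, 1.2, 1.4, 0.9],
    "geneB": [0.2, 0.3, 0.2, 0.1, 0.0],
    "geneC": [1.5, 1.1, 1.3, 1.6, 0.2],
}
early_high = Timebox(t_start=2, t_end=3, v_min=1.0, v_max=2.0)
late_low = Timebox(t_start=4, t_end=4, v_min=0.0, v_max=0.5)

result_one = query(profiles, [early_high])
result_both = query(profiles, [early_high, late_low])
```

Adding a second timebox narrows the result set, which is the behavior a dynamic query interface would recompute on every drag or resize of a box.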
Hierarchical Clustering Explorer - Understanding Hierarchical
Clustering Results by Interactive Exploration of Dendrograms: A Case
Study with Genomic Microarray
Jinwook Seo and Ben Shneiderman
University of Maryland, Department of
Computer Science
PowerPoint slides
Hierarchical clustering is widely used to find patterns in
multi-dimensional datasets, especially for genomic microarray data.
Finding groups of genes with similar expression patterns can lead to
better understanding of the functions of genes. Current visualization
tools for hierarchical clustering that provide static outputs on
screens or even large printouts can be improved by adding interactive
exploration tools. HCE (Hierarchical Clustering Explorer) is a
visualization tool that integrates four general techniques that could
be used in interactive explorations of hierarchical clustering
results: (1) overview of the entire dataset, coupled with a detail
view so that high-level patterns and hot spots can be easily found and
examined, (2) dynamic query controls so that users can restrict the
number of clusters they view at a time and show those clusters more
clearly, (3) coordinated displays: the overview mosaic has a
bi-directional link to 2-dimensional scattergrams, (4) cluster
comparisons to allow researchers to see how different clustering
algorithms group the genes. In this talk, I’ll discuss how HCE can
be used for the clustering results of microarray data, together with
some other issues in multi-dimensional data visualization.
http://www.cs.umd.edu/hcil/multi-cluster
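The effect of a dynamic query control on the number of visible clusters can be sketched as cutting a dendrogram at an adjustable height. The sketch below assumes single-linkage agglomerative clustering with Euclidean distance and a hypothetical max_distance threshold; HCE's own linkage options and controls are not described in the abstract, so this is an illustration rather than its implementation.

```python
from itertools import combinations
from math import dist
from typing import List, Sequence


def cluster(points: Sequence[Sequence[float]], max_distance: float) -> List[List[int]]:
    """Single-linkage agglomerative clustering, stopped at max_distance.

    Stopping once the closest pair of clusters is farther apart than
    max_distance is equivalent to cutting the dendrogram at that height.
    Returns clusters as sorted lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Single linkage: distance between the two closest members.
        return min(dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        if linkage(clusters[i], clusters[j]) > max_distance:
            break
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # safe because i < j
    return sorted(clusters)


# Hypothetical two-condition expression values for five genes.
expression = [[0.0, 0.1], [0.1, 0.0], [1.0, 1.1], [1.1, 1.0], [5.0, 5.0]]
tight = cluster(expression, max_distance=0.5)
loose = cluster(expression, max_distance=2.0)
```

Raising the threshold merges the two nearby pairs into one group while the outlying gene stays alone, mirroring how a user could trade detail for overview by moving a single control.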
Yidong Chen
NHGRI
PDF slides
With the advance of microarray technologies, biologists can now
observe the abundance of transcripts from tens of thousands of genes
in biological samples, enabling the exploration of the dynamics of
transcription and interaction between genes on a genome-wide scale.
With the accumulation of gene expression datasets, the challenge in
all microarray experiments is how to extract meaningful and
trustworthy information from among thousands of genes that do not
contribute to the designed experiments. To achieve this goal,
many rigorous mathematical tools and computational software packages
have been introduced to the field, such as statistical techniques for
data normalization, clustering algorithms, class prediction methods,
ANOVA, and gene-gene interaction studies. Because many gene expression
experiments collect a relatively small number of samples from
patients, cell lines, or other biological sources, rendering some
popular statistical tools meaningless, the development of data
visualization techniques is crucial in the early stages of microarray
experiment design. To help biologists efficiently organize, and
therefore understand, the properties of their datasets, we introduced
and implemented the multidimensional scaling (MDS) technique to
provide direct appreciation of the clustering outcome, various
clustering techniques for organizing data and finding patterns,
techniques for visualizing gene-gene interaction via the coefficient
of determination (CoD), and many others. In this presentation, we will
use one of the gene expression profile studies of melanoma samples in
our lab to illustrate, step by step, the visualization tasks required
in the lab and the many tools available at the NHGRI microarray data
analysis web site.
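For readers unfamiliar with the coefficient of determination, a minimal sketch follows. It assumes the common form CoD = (e0 - e_opt) / e0, where e0 is the error of the best guess of a target gene made without any predictors and e_opt is the error of the best predictor that uses the predictor genes. Expression levels are assumed to be discretized (for example -1, 0, +1), and both errors are estimated on the same data. The function name cod and the sample values are hypothetical; the abstract does not specify the estimator used at NHGRI.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple


def cod(predictors: Sequence[Tuple[int, ...]], target: Sequence[int]) -> float:
    """Resubstitution estimate of the coefficient of determination.

    predictors[i] is the tuple of discretized predictor-gene levels in
    sample i; target[i] is the discretized target-gene level in sample i.
    """
    n = len(target)
    # e0: errors made by always guessing the most frequent target level.
    err0 = n - max(Counter(target).values())
    # e_opt: errors made by guessing the most frequent target level
    # separately for each observed combination of predictor levels.
    by_state: Dict[Tuple[int, ...], Counter] = defaultdict(Counter)
    for state, level in zip(predictors, target):
        by_state[state][level] += 1
    err_opt = sum(sum(c.values()) - max(c.values()) for c in by_state.values())
    if err0 == 0:
        return 0.0  # a constant target leaves nothing to improve
    return (err0 - err_opt) / err0


# Hypothetical ternary target gene across six samples.
target_levels = [1, 1, 0, 0, -1, -1]
perfect = cod([(1,), (1,), (0,), (0,), (-1,), (-1,)], target_levels)
partial = cod([(1,), (1,), (0,), (0,), (0,), (0,)], target_levels)
uninformative = cod([(0,)] * 6, target_levels)
```

A value near 1 means the predictor genes remove almost all of the prediction error, and a value near 0 means they add nothing over the best constant guess. With small sample sizes a resubstitution estimate like this one is optimistic, so it serves here only to show the arithmetic.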
Discovering Functional
Similarity of Genes by Mining in Visualizations of Gene Profiles
Tanveer Syeda-Mahmood
IBM
Almaden Research Center
Traditionally, visualization techniques
have been used to illustrate the results of mining. Visualization scientists,
on the other hand, have recognized that often the visualization itself
can be a good source of mining for further information. Automatic
tools to mine such visualized representations, however, are lacking. In
this talk, I will present a method for simultaneously discovering similarities
between multiple time-varying profiles that operates directly on
the combined multi-dimensional visualization of such profiles. Specifically,
scale-space analysis is used to identify salient curvature changes in multi-dimensional
curves forming the basis of similarity between time profiles.
An application of this technique for discovering functional similarities
in genes will be discussed.
Interactive
Graphical Display of Protein Structures
Amitabh Varshney
University of Maryland, Department of
Computer Science
Recent successes in human genome sequencing have taken us a
step closer to the goal of designing novel therapeutic drugs. We are
working on developing visual computing tools and technologies that
will give scientists deeper insight into the
relationships between form and function in various biological
proteins. The smooth, solvent-accessible molecular surface is useful
for studying the structure and interactions of proteins, especially
for testing the accessibility of a solvent in a molecule; for
prediction of three-dimensional structures of biological
macromolecules and assemblies; and for evaluating different docking
conformations of molecules which can be used in drug design. I shall
discuss a fast and efficient parallel algorithm for interactive
computation of solvent-accessible smooth molecular surfaces. I shall
also discuss some of our recent approaches to studying surface
complementarity and efficient algorithms for computing and visualizing
molecular electrostatics.
Using Self-Similar Geometry
to Represent Letter-Sequence-Indexed Statistics
With Application to Nucleotide and Peptide Docking
Daniel B. Carr
George Mason
University
The paper addresses the challenge of representing
statistics indexed by sequences of letters. Letters of a sequence represent
nucleotides or amino acids in the motivating applications. The number of
letter combinations grows exponentially with sequence length. The challenge
is to develop representations for the space of possibilities that are cognitively
accessible and that convey scientific relationships. The approach described
in this paper develops coordinate systems based on simple geometric structures:
tetrahedrons in the case of 4 nucleotides and icosahedron face centers
in the case of 20 amino acids. The paper demonstrates two self-similar
coordinate generating mechanisms that help to provide cognitive accessibility:
self-similarity at the same scale and at different scales. The coordinate
systems directly represent short sequences of, say, 6 nucleotides or 3 amino
acids and extend to longer sequences by connecting points. Layout variations
modify the representations to produce simpler appearance and concentrate
sequences with similar statistics. New visualization software also handles
the representation of features in two-, three- and four-dimensional margin
tables and provides dynamic options such as filtering.
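One way to picture a self-similar coordinate system at different scales is sketched below. It assumes that each nucleotide is assigned a vertex of a regular tetrahedron and that each successive letter moves the point toward its vertex by a step that shrinks by a fixed ratio, so sequences sharing a prefix fall inside the same sub-tetrahedron. The vertex assignment, the ratio of 0.5, and the name sequence_point are assumptions for illustration; the paper's actual construction may differ.

```python
from itertools import product
from typing import Dict, Tuple

Vec = Tuple[float, float, float]

# Vertices of a regular tetrahedron centered at the origin
# (the letter-to-vertex assignment is an arbitrary choice for this sketch).
VERTEX: Dict[str, Vec] = {
    "A": (1.0, 1.0, 1.0),
    "C": (1.0, -1.0, -1.0),
    "G": (-1.0, 1.0, -1.0),
    "T": (-1.0, -1.0, 1.0),
}


def sequence_point(seq: str, ratio: float = 0.5) -> Vec:
    """Map a nucleotide sequence to 3-D coordinates.

    The k-th letter contributes its vertex scaled by ratio**k, so the
    first letter picks a large tetrahedron, the second a smaller copy
    inside it, and so on.
    """
    x = y = z = 0.0
    scale = 1.0
    for letter in seq:
        vx, vy, vz = VERTEX[letter]
        x += scale * vx
        y += scale * vy
        z += scale * vz
        scale *= ratio
    return (x, y, z)


point_a = sequence_point("A")
point_ac = sequence_point("AC")
# Every length-3 sequence should receive its own coordinate.
trimer_points = {sequence_point("".join(t)) for t in product("ACGT", repeat=3)}
trimer_count = len(trimer_points)
```

A statistic indexed by sequence could then be drawn as a glyph at each point, and a longer sequence could be shown by connecting the points of its consecutive short subsequences, as the abstract describes.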
Numbers, Images
and Geometry: Using Visualization to Explore Patterns
Across Multiple Data Types in the Life
Sciences
Bernice Rogowitz
IBM
Genome and Literature:
Combining Two Massive Data Sets through Ontologies
Peter Li
Celera
A challenge facing bioinformaticians in the era of post-genome
research is the integration of genome data with other domains. One
such domain is literature, which is massive and just as
complex. Medline provides easy access to the majority of the
published literature that is of interest to biomedical
researchers. While only the abstracts are available, Medline can
nevertheless serve as a representative literature source for
integration with genome data. A basic integration approach is to find
common names and sequences "quoted" by both sources. A more semantic
approach would take advantage of the active development of ontologies
in both data sets, e.g. Gene Ontology for genes and MESH/UMLS for
Medline. We will explore both approaches and the user interface
challenges they present.
The Celera Genome
Browser: A Tool for Visualizing and Annotating the Human Genome
Russell Turner
Celera
We present the Genome Browser, an interactive graphical tool for
visualizing and curating the nucleotide sequences of large genomes, in
particular, the human genome. This tool, developed by Celera Genomics
and used by Celera scientists and customers, permits raw nucleotide
information to be visualized, together with accompanying annotation
information. It also provides interactive capabilities for human
curation of genes. The software is written completely in Java and has
a three-tiered architecture with a high-performance "thick" graphical
client, an EJB-based middle-tier server, and an Oracle database
backend. This architecture allows a terabyte-sized genomic database
containing annotations on sequences exceeding 3 billion base pairs in
length to be viewed using a direct manipulation graphical user
interface displaying tens of thousands of zoomable data points at a
time. It also allows layering of additional user-specified data on top
of the database data via an XML import capability. Curation operations
are performed by the user using an interactive "drag-and-drop" style
to create and modify gene and transcript information. Curation
information is exported via XML files which can then be loaded into
the database using a separate curation "promotion" utility. This
combined XML and three-tiered data architecture provides sufficient
flexibility to support a variety of different genomic data formats and
curation workflows.
The Comprehensive
Microbial Resource
Owen White, Lowell Umayam, Tanja
Dickinson, Jeremy Peterson
TIGR
PowerPoint slides
One of the challenges presented by large-scale genome sequencing
efforts is the effective display of information in a format that is
accessible to the laboratory scientist. The Comprehensive Microbial
Resource (CMR) contains all of the fully sequenced microbial genomes,
the curation from the original sequencing centers, and further
curation from TIGR (for those genomes sequenced outside TIGR). The
interface to this database effectively "slices" the vast amounts of
data in the sequencing databases in a wide variety of ways, allowing
the user to formulate queries that search for specific genes as well
as to investigate broader topics, such as genes that might serve as
vaccine and drug targets. The web presentation of the CMR includes the
comprehensive collection of bacterial genome sequences, curated
information, and related informatics methodologies. The scientist can
view genes within a genome and can also link across to related genes
in other genomes. As a result, the user can construct queries that
combine sequence searches, biological role, taxonomy, function,
environment, and other criteria to isolate the genes of
interest. The database contains extensive curated data as well as
pre-run homology searches to facilitate data mining. The interface
allows the display of the results in numerous formats that will help
the user ask more accurate questions. The methodology for populating
the database, the user interface, and new methods for automated
functional assignment will be presented.
Comparative Visualization
of Genome-Scale Datasets
Brian Wylie
Maggie Werner-Washburne
VisWave
University of New Mexico, Department of Biology
PowerPoint slides
Genome-scale data presents incredible analytical
challenges to biologists. Here we report the comparative, visual analysis
of yeast gene-expression (cell cycle and exit from stationary phase/G0)
and several protein-interaction datasets using VxInsight, a clustering
and visualization tool, to develop hypotheses, speed data mining and, thus,
enhance the discovery process. Differences in gene clusters between
the gene-expression datasets for the two related biological processes led
to new, testable hypotheses. For example, lack of
clustering of G1-regulated (cell-cycle) genes in the exit from stationary
phase dataset suggests that either the cells exiting stationary phase are
not synchronous or that a subset of G1-regulated genes is required for
this process. Additionally, the relative lack of interactions between
ribosomal proteins in both 2-hybrid datasets, which is easily observed
as a function of gene expression, suggests that 2-hybrid methods may not
be able to detect ribosomal protein interactions, possibly because the
bait and prey proteins are incorporated into ribosomes in the nucleus.
Biologists tend to be visually oriented. Thus, providing a tool that
allows large datasets to be "learned" and queried visually enhances hypothesis
development and, eventually, the design of these large experiments, as
biologists learn to use visual analysis in designing genome-scale experiments
to ask more specific and novel questions.
Supporting
Collaborative Bio-Informatics Discovery with Visualization and
Analytics
Christopher Ahlberg
Spotfire
Pharmaceutical discovery has over the last 10 years seen an explosion
in data generated from high throughput technologies as well as from
procurement of high value content - across the whole pharmaceutical
discovery value chain. In addition to the underlying data explosion,
pharma discovery is also facing a decision explosion where new ways of
organizing research and development drive novel decision making
approaches.
This presentation will draw from the speaker's experience in deploying
visualization and analytics in large pharma and biotech over the last
five years - and show successes and challenges - what works and what
doesn't work. Further, key insights into how to make novel
visualizations and algorithms matter beyond small groups of high end
researchers will be presented - trying to show how the power of high
end individuals can be spread to large user communities.
Building Biological
Explanations for Gene Expression Patterns
Terry Gaasterland
Rockefeller Institute
Biological Storytelling:
A Software Tool for Biological Information Organization Based upon Narrative Structure
Allan Kuchinsky, Kathy Graham, David Moh,
Michael L. Creech, Ketan Babaria, and Annette Adler
Agilent
Corporation
PowerPoint slides
The work of molecular biologists seeking
to understand the molecular basis of disease centers on identifying and
interpreting the relationships of genes, proteins, and pathways in living
organisms. While emerging technologies have provided powerful analysis
tools to this end, they have also produced an explosion of data, which
biologists need to make sense of. We have built software tools to support
the synthesis activities of molecular biologists, in particular the activities
of organizing, retrieving, using, sharing, and reusing diverse biological
information. A key aspect of our approach, based upon the findings of user
studies, is the use of narrative structure as a conceptual framework for
developing and representing the "story" of how genes, proteins, and other
molecules interact in biological processes. Biological stories are represented
both textually and graphically within a simple conceptual model of items,
collections, and stories. Using our software, biologists can build up high-level
graphical and narrative models of biological processes in living cells,
interactively explore those models, and evaluate these models against detailed
experimental data, using visual data overlays.
Modeling
Intra-Cellular Regulatory Networks with Applications in Model
Definition and Evaluation
Naren Ramakrishnan, Cliff Shaffer, and Marc Vass
Virginia
Tech
PowerPoint slides
The JigCell Problem Solving Environment (PSE) provides experimentalists
and modelers with a set of tools for modeling intra-cellular
regulatory networks. Users define models in terms of chemical
reactions entered into a Model Builder. Our approach simplifies model
building through a spreadsheet metaphor that reduces visual clutter
and segments the model into chunks that naturally fit the typical
user's mental image. Specifications for simulating the model with
specific parameters and initial conditions are made in a Run Builder.
The Run Builder then takes the set of chemical equations and the
various parameter settings to generate a set of ordinary differential
equations. Several tools may then be used to explore the output
produced by solving these ODEs. The Comparator quantitatively
evaluates collections of experimental measurements and simulation
results to assist the user in validating the model. Numerical and
graphical visualizations are provided with support for external
visualization packages. JigCell is currently being tested with frog
egg extract models and budding yeast cell models from John Tyson's
Computational Cell Biology Lab at Virginia Tech.
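The step from a set of chemical reactions to a set of ordinary differential equations can be sketched as follows. The sketch assumes mass-action kinetics and a fixed-step forward Euler solver, and the names Reaction, derivatives, and simulate are hypothetical; JigCell's own rate laws, file formats, and solvers are not described in the abstract and are not reproduced here.

```python
from typing import Dict, List, NamedTuple


class Reaction(NamedTuple):
    """One irreversible reaction with mass-action kinetics (an assumption)."""
    reactants: Dict[str, int]  # species name -> stoichiometric coefficient
    products: Dict[str, int]
    rate_constant: float


def derivatives(reactions: List[Reaction], conc: Dict[str, float]) -> Dict[str, float]:
    """Right-hand side of the ODE system: d[species]/dt for every species."""
    rates = {species: 0.0 for species in conc}
    for reaction in reactions:
        # Mass-action flux: k times the product of reactant concentrations.
        flux = reaction.rate_constant
        for species, count in reaction.reactants.items():
            flux *= conc[species] ** count
        for species, count in reaction.reactants.items():
            rates[species] -= count * flux
        for species, count in reaction.products.items():
            rates[species] += count * flux
    return rates


def simulate(reactions: List[Reaction], initial: Dict[str, float],
             dt: float, steps: int) -> Dict[str, float]:
    """Integrate the ODEs with forward Euler and return final concentrations."""
    conc = dict(initial)
    for _ in range(steps):
        rates = derivatives(reactions, conc)
        conc = {species: conc[species] + dt * rates[species] for species in conc}
    return conc


# Hypothetical reversible binding: A + B -> C and C -> A + B.
model = [
    Reaction({"A": 1, "B": 1}, {"C": 1}, rate_constant=1.0),
    Reaction({"C": 1}, {"A": 1, "B": 1}, rate_constant=0.5),
]
start = {"A": 1.0, "B": 1.0, "C": 0.0}
initial_rates = derivatives(model, start)
final = simulate(model, start, dt=0.01, steps=1000)
```

Keeping the reaction list separate from the parameters and initial conditions loosely mirrors the split between a model definition and a run specification. A production tool would use an adaptive stiff solver rather than forward Euler.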