Capital Area Theory Seminar: Fall 2001

Current CATS homepage

The Capital Area Theory Symposia is an UMIACS sponsored series of symposia in theoretical computer science bringing computer scientists from around the world to the Capital area. The Symposia are given at the University of Maryland in cooperation with the Computer Science Department.

Talks

If you'd like to meet the speaker, send e-mail to samir@cs.umd.edu.

CATS will usually meet on Wednesday's, at 1:00pm in AVW 3258, but some talks may be at a different place and time. Abstracts are available.

Contact: Send email to samir@cs.umd.edu for additional information. To join the theory-local mailing list, send email to majordomo@cs.umd.edu with "subscribe theory-local {emailaddress}" in the body.

Talks from previous semesters

Other cats

Abstracts

CATS panning meeting

Host: Samir Khuller

We plan on having a short meeting on Sep 5 (Wed) at 4pm in AVW 2120 to see how many students are interested in giving talks this semster. We can also decide on which papers to cover at the meeting. The planning meeting will not last more than 25-30 mins.

If you would like to be on the CATS mailing list, please email smith@cs.umd.edu and join the list. I will not be sending announcements on gradlist@cs in the future.


The Dense k-Subgraph Problem

Nan Wang

This paper considers the problem of computing the dense k-vertex subgraph of a given graph, namely, the subgraph (of k vertices) with the most edges. This problem is NP-hard. An approximation algorithm is developed for the problem, with approximation ratio O(n^delta), for some delta < 1/3.

Download the paper: Local mirror


Simple Communication Strategies for Dynamic Networks

Christian Scheideler

In this talk I will consider the problem of delivering dynamically changing input streams in dynamically changing networks where both the topology and the input streams can change in an unpredictable way. In particular, I will present two simple distributed balancing algorithms (one for packet injections and one for flow injections) and show that for the case of a single receiver, these algorithms will always ensure that the number of packets in the system is bounded at any time step, even for an injection process that completely saturates the capacities of the available edges, and even if the network topology changes in a completely unpredictable way. Also, an essentially matching lower bound that holds for ANY online algorithm will be presented. Interestingly, the balancing approach does not only behave well in a completely adversarial setting. I will show that also in the other extreme of a static network and a static injection pattern the algorithms will converge to a point in which they achieve an average routing time that is close to the best possible average routing time that can be achieved by ANY strategy. This demonstrates that there are simple, distributed algorithms that can be efficient at the same time for very different communication scenarios. To have such algorithms will be of particular importance for communication in wireless mobile ad hoc networks (or short MANETs), in which at some time the connections between mobile nodes and/or the rates of input streams may change quickly and unpredictably and at some other time may be quasi static.

Christian Scheideler is a faculty member in the Department of Computer Science of the Johns Hopkins University.


Reverse and Recursive Combinatorics

Bill Gasarch

Some proofs in math are nonconstructive. Both Nerode's recursive math program and H. Freidman's reverse math program are ways to try to quantify and understand this phenomena (e.g., do some theorems REQUIRE nonconstructive proofs?)

This talk will survey both of these fields and include historical motivation. The main examples given will be from infinite combinatorics. For example, the infinite Ramsey theory uses nonconstructive techniques--- does it have to? Come and find out!

Bill Gasarch is a faculty member in the Department of Computer Science of the University of Maryland, College Park.


Coloring a region

Clyde Kruskal

The chromatic number of the plane is the fewest colors needed in order to paint each point of the plane so that no two points distance (exactly) 1 apart are the same color. It is known that seven colors suffice and at least four colors are necessary.

We investigate how large a bounded region can be, and still be 2-colorable or 3-colorable. We obtain tight bounds for coloring regions enclosed by circles, regular polygons, and rectangles.

It turns out that if a region is 2-colorable or 3-colorable, there is a simple coloring for the region. Mostly, we will look at ``pretty'' pictures.

Clyde Kruskal is a faculty member in the Department of Computer Science of the University of Maryland, College Park.


An Optimal Procedure for Gap Closing in Whole Genome Shotgun Sequencing

Richard Beigel

Tettelin et. al. proposed a new method for closing the gaps in whole genome shotgun sequencing projects. The method uses a multiplex PCR strategy in order to minimize the time and effort required to sequence the DNA in the missing gaps. This procedure has been used in a number of microbial sequencing projects including Streptococcus pneumoniae genome sequencing that recently appeared in Science. We describe a computational framework for gap closing. Our problem can be reduced to certain graph learning problems. We propose several methods that guarantee to minimize the number of steps involved in multiplex PCR procedures in general and whole genome gap closure in particular.

Joint work with Noga Alon, Mehmet Serkan Apaydin, Lance Fortnow, Steven Rudich, and Simon Kasif.

Richard Beigel is a professor in the Department of Computer and Information Sciences of the Temple University.


Approximate Pattern Matching - the Hamming Distance Case

Amihood Amir

Many different meanings can be assigned to the words "approximation" of pattern matching. One possibility is that of allowing "local" errors.

Intuitively, we group under the label "local" errors that take place in a bounded location, as opposed to changes that permeate the entire data (e.g. scaling, rotation).

Specifically, consider the most limited local error -- the mismatch. This error occurs in a single symbol and effects only its location. In contrast, insertions and deletions have a global effect, although the error itself is confined to a single location.

We discuss known techniques and upper bounds for matching with the mismatch local error. We also present a new algorithm that finds in a length-n text all locations where a length-m pattern matches with at most k mismatches, in time O(n sqrt(k log k).

The talk uses the approximate Hamming distance problem only as an excuse for a quick review of some main techniques in the field. Some of these may come in handy even if the listener is not (heaven forbid) a researcher in pattern matching.

The new result is joint with Moshe Lewenstein and Ely Porat.

Amihood Amir is a member in AT&T Labs - Research (on leave from Bar-Ilan University)


Distributions on level-sets with applications to approximation algorithms

Aravind Srinivasan

We will show how to sample from certain distributions that have a natural negative correlation property, as well as pre-specified conditions on their marginal distributions. We will then present applications to improved approximation algorithms (relating to routing, versions of set cover, etc.).


What to do with all this hardware?

Uzi Vishkin

The upcoming so-called ``on-chip Billion transistor'' era raises the question: What to do with all the on-chip hardware once the returns on adding more on-chip memory start to diminish?

Parallel computing (many computers operating in parallel) has been a strategic area of growth for computer science since the 1940s. So far, parallel computing affected main stream computer science only in a limited way. The key problem with parallel computers has been their programmability.

The parallel algorithms research community has developed a theory of parallel algorithms, for a very simple parallel computation model. That theory appears to be second in magnitude only to serial algorithmics.

However, the evolution of parallel computers never reached a situation where that algorithmic computation model offered effective abstraction for them. So, this elegant algorithmic theory remained in the ivory towers of theorists. Not only that it has not been matched with a real computer system, there has hardly been an experimental study of what works better, more refined performance measurements, and a broad study of applications. For example, the general question ``how good parallel algorithms can really be'' has remained generally open.

Explicit Multi-Threading (XMT) is a new fine-grained computation framework which tries to address the hardware opportunity using the parallel algorithmic knowledge base. XMT aims at faster single-task completion time by way of executing in parallel many instruction all within a single chip. Building on some key ideas of parallel computing, XMT covers the spectrum from algorithms through architecture to implementation; the main implementation related innovation in XMT was through the incorporation of low-overhead hardware mechanisms (for more effective fine-grained parallelism).

The two key research questions are ``how to build?'' an XMT computer and ``who cares?''; that is, what will be the key applications?

As it turns out, the effort of answering these questions is integrative in nature, which could make it interesting to CS and ECE graduate students in with a variety of interests; the effort requires flexible and adaptive minds, and willingness to continue learning new domains; for some questions, it is even hard to predict which areas of knowledge will be needed.

URL: http://www.umiacs.umd.edu/~vishkin/XMT


Reusable Time-Lines and Applications

Juan A. Garay

While probably the most important pillar of cryptography is the believed existence of hard problems, the notion of moderately hard problems also bears promise. A moderately hard problem is one which is not computationally infeasible to solve, but also not easy, meaning that a solution can be obtained after a certain - verifiable - amount of work. One such candidate problem, which, moreover, seems to be ``intrinsically sequential,'' is modular exponentiation. Indeed, modular exponentiation has been the basis for recent work in timed-release cryptography, including ``time-lock'' puzzles by Rivest, Shamir and Wagner, and timed commitments by Boneh and Naor.

In this work we build on the Boneh-Naor structure for timed commitments, which we call a ``time-line.'' In a nutshell, a time-line is a vector of elements, where each element is obtained by iterated squaring of the previous element. We first introduce the notion of derived time-lines, which are time-lines that are obtained by ``shifting'' of a master time-line. While expensive proofs are needed to verify the correctness of a master time-line, a much simpler and less expensive verification of correct shifting can be used. As a result, existing applications using time-lines (including timed commitments, electronic contract signing, and concurrent zero-knowledge proofs) benefit from the efficiency gain.

Additionally, our derived time-line construction allows us to solve a problem not addressed previously: the timed release of standard digital signatures (RSA, DSS, Schnorr). Such signatures, once released, cannot be distinguished from signatures of the same type obtained without a timed release, making it transparent to an observer of the end result. We demonstrate how to achieve a transparent timed release of RSA signatures.

An effort will be made to make this talk self-contained. This is joint work with Markus Jakobsson.

Key words: Timed-release cryptography, timed commitments, blind signatures, discrete logarithm problem.


Primer: Some Biology That Computer Scientists Need for Bioinformatics

Lenwood S. Heath

(40 minutes)

Improved experimental technologies in the life sciences, such as DNA sequencing, microarray miniaturization of gene expression studies, and high resolution mass spectrometry for proteomics, has created an explosion in the production and availability of biological data. Biologists now deposit data from their experiments in databases, as it is no longer feasible to directly report the mass of detailed experimental data in a journal paper.

There are numerous online databases of sequence data --- genomic DNA, cDNAs, open reading frames, and proteins. The sequencing of the entire genomes of over 800 organisms have been completed and the sequences placed online, including drosophila (fruit fly), human, mouse, and arabidopsis (thale cress). Numerous microarray gene expression data sets are also available through the Internet. This abundance of biological data demands computational resources for managing, searching, analyzing, and mining that data, giving rise to the interdisciplinary field of bioinformatics. Bioinformatics, in turn, presents major new career and research opportunities for computer scientists.

In this talk, we give an overview of some of the key biological concepts needed by computer scientists to understand the challenges and opportunities of bioinformatics. We will also give a succinct list of the bioinformatics challenges we currently find most interesting in our bioinformatics group.


Functional Genomics and Bioinformatics Applied to Understanding Oxidative Stress Resistance in Plants

Lenwood S. Heath

In this talk, we discuss the application of microarray technology and bioinformatics to studying successful resistance of loblolly pine trees to drought stress. Microarray technology gives biologists access to information about gene expression for thousands of genes simultaneously. The computational component of the study is supported by an NSF-funded project named Expresso. Expresso is a problem solving environment that is being developed by computer scientists at Virginia Tech to support the management of the process of microarray experiment design, data capture, and data analysis. It is being developed in parallel with several biological studies involving microarray technology, including a large study of drought stress in pine. We describe both that biological study and the computational ideas being developed by our bioinformatics collaborators at Virginia Tech and elsewhere.


Justin Wan / ycwan@cs.umd.edu