University of 
Maryland

Capital Area Theory Seminar: Spring 2002


Current CATS homepage

The Capital Area Theory Symposia is an UMIACS sponsored series of symposia in theoretical computer science bringing computer scientists from around the world to the Capital area. The Symposia are given at the University of Maryland in cooperation with the Computer Science Department.

Talks

If you'd like to meet the speaker, send e-mail to samir@cs.umd.edu.

CATS will usually meet on Wednesday's, at 2:00pm in AVW 2120, but some talks may be at a different place and time. Abstracts are available.

Contact: Send email to samir@cs.umd.edu for additional information. To join the theory-local mailing list, send email to majordomo@cs.umd.edu with "subscribe theory-local {emailaddress}" in the body.

Talks from previous semesters

Other cats

Abstracts

Efficient and Secure Authentication using Short Passwords

Jonathan Katz

A long-standing goal of computer security has been to design protocols for user authentication and authenticated key exchange which are secure even when users choose "weak" (i.e., low-entropy) passwords, as is typically the case. Commonly-used protocols are secure when users choose long, hard-to-remember secrets, but are completely vulnerable to off-line dictionary attacks when short, easily-memorized passwords are used.

We propose the first efficient protocols for password-only authentication and authenticated key exchange which are provably secure against off-line dictionary attacks. Proofs of security rely on standard cryptographic assumptions, and the protocols require only slightly more computation than the original key exchange protocol of Diffie and Hellman (which provides no authentication at all).

We also discuss more recent work on password-based authentication and authenticated key exchange in the setting where a secure PKI is available. We demonstrate the first efficient protocols for these tasks whose security may be based on, for example, the RSA assumption or the hardness of factoring.


Computational Approach to Understanding Protein Structure

Teresa Przytycka

Proteins are molecules that perform a large variety of tasks in a living organism. The ability to carry out protein's characteristic function is the direct consequence of protein's three-dimensional shape. The central dogma states that three-dimensional structure of a protein is determined by its sequence and the environment.In principle, given a protein sequence one shall be able to compute it's structure based on biophysical principles. Despite of tremendous effort this problem remains unsolved. With the large data base of protein structures available today, many successful predictions are made by extracting folding patters observed in structure data base and applying them to predict structure. Can such patters also be used to gain insight into the underlying protein folding process? This talk provides a brief introduction to some of the computational methods used in protein structure research and subsequently introduce characterization of protein structures using "folding rules." Folding rules attempt to describe protein folds in a way similar in which productions describe a formal language. The talk is directed to computer science audience and makes no assumption of molecular biology knowledge.


Algorithmic Tools for Analyzing Large Scale Genome Rearrangements

Suleyman Cenk Sahinalp

The human genome contains several classes of repeat segments, i.e., sequences that appear more than once in the genome. Among these, two well identified classes are tandem and interspersed repeats. Both of these repeats tend to be short and have high copy numbers; they differ mainly by the mechanisms responsible for their propagation. Tandem repeats are believed to be generated by unequal cross-overs or replication slippage whereas interspersed repeats are propagated via mechanisms of retrotransposition.

There are also less well-understood classes of repetitive DNA such as the long repeat segments within the pericentromeric regions of the chromosomes. These repeats usually have low copy numbers (2 to 10) and high similarity (over 90%) and may include partial or complete gene segments. In the course of meiosis, these repeats may lead to misalignment between sister chromatids, which may result in an excess or lack of developmentally important genes and cause genomic disorders.

It is conjectured that there may be unique DNA transposition mechanisms for duplication and dispersal of such segments, different from the mechanisms mentioned above. In fact, there are potentially many more types of structural rearrangement mechanisms waiting to be discovered. The abundance and the hierarchical structure of many of these segmental rearrangements necessitate development of specialized sequence search, discovery and evolutionary analysis algorithms. More specifically, efficient solutions to the following algorithmic challenges are needed towards a comprehensive understanding of the human genome organization. (1) Identifying the exact boundaries of all repeat segments, (2) analyzing evolutionary relationships between these segments, and (3) determining the order of genomic rearrangements.

In this talk we will focus particularly on the first challenge and discuss a number of algorithmic approaches which we developed utilizing tools from Combinatorial Pattern Matching and Hidden Markov Models. We will also discuss novel approaches for computing evolutionary distances between genomic segments. Our techniques take into account all known mechanisms for genome-wide structural rearrangements such as duplications, deletions, translocations and repeats, and provide efficient tools for large scale sequence indexing.


Computational Aspects of the Mouse Genome Project

Dan Brown

Beginning in 2000, an international consortium has begun large-scale sequencing of the genome of the mouse. The mouse has been chosen because of its evolutionary closeness to the human genome, and also because of its importance as an experimental system.

As a result of this sequencing effort, several computational challenges have become available. First, how to assemble the mouse sequence into a composite assembly. Second, how to identify regions of the genomes that have been conserved by evolution in the tens of millions of years since the human and mouse genomes diverged. Third, how to categorize these regions by their function and use this information to generate testable hypotheses.

We will survey several of these areas. Many of the computational challenges are easily addressed by parallel computation, as the underlying questions are themselves easily separated into small subproblems. On the other hand, a new challenge which faces us is that there are many different sources of often inaccurate annotation information for a genome. On that front, we will discuss work in progress to combine information from noisy and uncertain sources, and suggest possible ways to cope with the computational overhead.

Much of this is joint work with researchers at the Whitehead Institute / MIT Center for Genome Research, and also with Brona Brejova and Tomas Vinar at Waterloo.


Approximating the Number of Inversions in a Data Stream

T.S. Jayram

Inversions are used as a fundamental quantity to measure the sortedness of data, to evaluate different ranking methods for databases, and in the context of rank aggregation. Considering the volume of the data sets in these applications, the data stream model is a natural setting to design efficient algorithms. (If there is enough time, I will give an overview of this model and some of the classical results in this area.)

We obtain a suite of space-efficient streaming algorithms for approximating the number of inversions in a permutation. The best space bound we achieve for this problem is O(log n loglog n) through a deterministic algorithm. For the more general problem of approximating the number of inversions between two permutations, we obtain a randomized O(sqrt(n) log n)-space algorithm. For approximating the number of inversions in a general list, we give a randomized O(sqrt(n) log^2 n)-space two-pass algorithm.

Joint work with: Miklos Ajtai, Ravi Kumar, and D. Sivakumar


Anonymous Credentials Anna Lysyanskaya

In order to obtain access to a resource, a credential is usually required. For example, one needs a driver's license to rent a car, or a library card to borrow a book. As paper-based transactions are being replaced by electronic ones, credentials are also taking electronic form.

Hand-in-hand with the convenience of electronic credentials, comes the danger that all transactions can be easily recorded and analyzed. To be sure, the recording happens in the paper-based world, as well; however, electronic records make searching and aggregating information practical on a large scale. Thus, electronic credentials make it ever so easy to collect too much personal information. For example, by observing a person's use of a driver's license, one can trace this person's itinerary.

I will present an anonymous credential system designed to solve this problem. In this system, a user (credential owner) can prove possession of a credential without revealing more than this single bit of information, and can obtain a credential without revealing more than the information than required by the issuing authority. As a result, the user's identity remains hidden, and, moreover, transactions carried out by the same user cannot be linked. Our system thus guarantees privacy of users.

Ours is the first anonymous credential system suitable for practical use: it requires no involvement of trusted third parties, and all the protocols are efficient. In addition to ensuring privacy of users, our system enables credentials that have expiration dates and other attributes, and ensures that they are non-transferable and revocable. Moreover, a user's anonymity can be revoked in case of emergency.

A commercial product based on this technology is currently being developed by IBM.


Optical Network Optimization

Lisa Zhang

In this talk, I present some problems in optimizing optical networks for the Lucent ONG unit (Optical Network Group).

To carry traffic over dark fibers we need equipment to perform signal generation, amplification, switching, etc. How to route traffic and how to deploy equipment to support the routing have a large impact on the cost of the network. Equipment for an optical network typically costs hundreds of millions of dollars. Hence, even saving a small percentage results in large dollar savings. However, the optimization is subject to complex engineering rules and has hard components such as coloring and buy-at-bulk design.

I present some heuristics that have dramatically reduced the network cost when deploying the LambdaXtreme system (Lucent's new optical product). Key components of LambdaXtreme include OADMs (optical add-drop multiplexers) and long-haul OTs (optical transponders). I discuss how best to take advantage of these components.

This work is joint with David Einstein, Steven Fortune and Wim Sweldens.


Handling Long Targets and Errors in Sequencing by Hybridization

Eran Halperin

Sequencing by hybridization (SBH) is a DNA sequencing technique, in which the sequence is reconstructed using its k-mer content. This content, which is called the spectrum of the sequence, is obtained from detecting hybridizations to a universal DNA array. However, hybridization errors, as well as the fact that standard DNA chips are not sufficient to uniquely reconstruct sequences more than a few hundred long, make SBH incompetitive with current standard gel-based sequencing methods.

In this talk we deal with both problems. We introduce a simple SBH algorithm which is based on standard DNA arrays (which consist of all k-mers), and has provable performance in the presence of both false negative and false positive errors. We propose a novel chip design, alternative to the one of Preparata et al., which uses universal bases. By applying a simple algorithm we reconstruct sequences of length optimal up to a square-log factor of the information theoretic bound. Moreover, our algorithm is very robust, and has a provable performance even if there are both false neagtive and false positive errors. The theoretical results are complemented by simulations of the latter algorithm, which indicate that its sensitivty to errors is very small also in practice.

This is a joint work with Shay Halperin, Tzvika Hartman and Ron Shamir.


Computational Depth

Lance Fortnow

We develop a measure of "useful information" that we call computational depth as the difference of two different Kolmogorov complexity measures. We believe our measure is simpler and cleaner than previous definitions. We also get many interesting applications, including:

- We develop a notion of shallow sets that generalizes sparse and random sets, and show that any computable set reducible to a shallow set has small circuits.

- We show that an algorithm is polynomial-time on average for all "reasonable" distributions if and only for all inputs x, the algorithm runs in time exponential in the depth of x.

We also discuss other results on depth and some future directions.

This work is based on the two papers below: L. Antunes, L. Fortnow, and D. van Melkebeek. Computational depth. In Proceedings of the 16th IEEE Conference on Computational Complexity, pages 266-273. IEEE, New York, 2001.

L. Antunes, L. Fortnow, and V. Vinodchandran. Computational depth vs. average polynomial time. Technical Report 2001-097, NEC Research Institute, 2001.

Both of these papers are available at http://www.neci.nj.nec.com/homepages/fortnow/papers


Improved Results for Stackelberg Scheduling Strategies

Anil Vullikanti

We continue the study initiated by Roughgarden on Stackelberg Scheduling Strategies. We are given a set of m independent parallel machines or equivalently a set of m parallel edges on which certain flow has to be sent. Each edge e is endowed with a latency function. The setting is that of a non-cooperative game: players choose edges so as minimize their individual latencies. Additionally, there is a single player (a leader) who controls a fixed fraction of the total flow. The goal is to find a strategy for the leader (i.e. an assignment of flow to individual links) such that the selfish users react so as to minimize the total latency of the system.

We devise a fully polynomial approximate scheme to compute a flow assignment, so that the cost of the induced equilibrium is within a $1+\epsilon$ factor of the optimum Stackelberg strategy, for any given $\epsilon$.

We then consider a two round Stackelberg strategy. In this strategy, the game consists of three rounds: a move by the leader followed by the moves of all the followers followed again by a move by the leader who possibly reassigns some of the flows. We show that this always dominates the one round scheme, and for some classes of latency functions, is guaranteed to be closer to the global social optimum. It is easy to see that more than 2 rounds do not help.


A 3/2 ratio approximation algorithm for augmenting the edge-connectivity of a graph from 1 to 2

Guy Kortsarz

We consider the following 2-ECST problem: given a connected graph G=(V,E) and an additional edge set E_A \subseteq V^2, find a minimum size subset of edges F of E_A such that (V, E \cup F) is 2-edge connected. As the 2-edge-connected components of G form a tree, it follows that by contracting these components, one may assume that G is a tree. Hence, the 2-ECST Problem is equivalent to the Tree Augmentation Problem (TAP) defined as follows. The input consists of a tree T(V,E) and a set of links E_A \subseteq V^2. The goal is to find a smallest subset F \subseteq E_A such that G(V, E \cup F) is 2-edge connected. This problem is NP-hard. For a long time, 2 was the best approximation ratio known. Recently, Nagamochi reported a (1.875+epsilon)-approximation algorithm. We give a new algorithm with a better approximation ratio of 3/2 and a practical running time. Our analysis is based on amortized local ratio.


Justin Wan / ycwan@cs.umd.edu