| Capital Area Theory Seminar: Spring 2005 |
| The Capital Area Theory Symposia is an UMIACS sponsored series of symposia in theoretical computer science bringing computer scientists from around the world to the Capital area. The Symposia are given at the University of Maryland in cooperation with the Computer Science Department. |
CATS usually meets on Monday 12:00pm-1:00pm in CSIC 2107 but some talks may be at a different place and time.
For additional information send email to Samir Khuller.
If you want to receive announcements of upcoming talks join the theory-local mailing list.
Papers from SODA 2005: Zip'ed PDF Proceedings (25 MB)
Papers from STOC 2005: List of accepted papers
![]() |
The identification of peptides from tandem mass spectra is an important part of many high-throughput proteomics pipelines. In the high-throughput setting, the spectra are typically identified using software that matches tandem mass spectra with peptides from amino-acid sequence databases. These search engines test each peptide against various experimentally derived properties, such as intact mass, enzymatic digestion motifs, and ultimately, the tandem mass spectra itself. The running times of these search engines are typically proportional to the size of the amino-acid sequence database used, rather than the number of distinct peptides it contains.
We model the set of distinct peptides of an amino-acid sequence database by its k-mers for suitable k, and find superstrings that contain all the original k-mers and introduce no new ones. We show that the minimum length k-mer superstring can be found in polynomial time under two regimes, one in which k-mers from the original sequence database must appear exactly once in the output, and one in which k-mers may appear more than once.
We demonstrate that the minimum length k-mer superstring can be used to represent the peptide content of amino-acid sequence databases using significantly fewer amino-acids. In some cases, where the original amino-acid sequence database contains many redundant peptides, we have been able to reduce the size of the amino-acid sequence to almost half of its original size. We develop a lower bound for achievable compression and demonstrate empirically that the optimal k-mer superstrings are consistently within 15-25% of this lower bound.
It is shown that the problem of maintaining the topological order of the nodes of a directed acyclic graph while inserting m edges can be solved in O(min{m^{3/2}logn, m^{3/2}+n^{2}logn}) time, an improvement over the best known result of O(mn).
In a topological ordering each node 'v' of a given directed acyclic graph (DAG) is associated with a value 'ord(v)', such that for each edge (u,v) we have odr(u) < ord(v). In the online variant of the topological ordering problem, the edages of the DAG are not know in advance but are given one at a time.
The online topological ordering problem has several applications. It has been studied in the context of compilation where dependencies between modules are maintained to reduce the amount of recompilation performed when an update occurs, source code analysis where the aim is to statically determine the smallest possible target set of all pointers in a program and as a subroutine of an algorithm for incremental evaluation of computational circuits.
Reference: "Online Topological Ordering," Irit Katriel and Hans L. Bodlaender, SODA 2005.
This paper considers two inter-related questions:
(i) Given a wireless ad-hoc network and a collection of source-destination pairs {(s_i,t_i): i = 1...n}, what is the maximum throughput capacity of the network, i.e. the rate at which data from the sources to their corresponding destinations can be transferred in the network?
(ii) Can network protocols be designed that jointly route the packets and schedule transmissions at rates close to the maximum throughput capacity?
Much of the earlier work focused on random instances and proved analytical lower and upper bounds on the maximum throughput capacity. Here, in contrast, we consider arbitrary wireless networks. Further, we study the algorithmic aspects of the above questions: the goal is to design provably good algorithms for arbitrary instances. We develop analytical performance evaluation models and distributed algorithms for routing and scheduling which incorporate fairness, energy and dilation (path-length) requirements and provide a unified framework for utilizing the network close to its maximum throughput capacity.
Motivated by certain popular wireless protocols used in practice, we also explore "shortest-path like" path selection strategies which maximize the network throughput. The theoretical results naturally suggest an interesting class of congestion aware link metrics which can be directly "plugged into" several existing routing protocols such as AODV, DSR, etc. We complement the theoretical analysis with extensive experiments. The results indicate that routes obtained using our congestion aware link metrics consistently yield higher throughput than hop-count based shortest path metrics.
Reference: "Algorithmic Aspects of Capacity in Wireless Networks," V.S. Anil Kumar, Madhav V. Marathe, Srinivasan Parthasarathy, and Aravind Srinivasan, to appear in SIGMETRICS 2005.
Broadcasting is a classical problem that has been widely studied for decades. One source node wishes to broadcast a message to every other node. We study generalizations of broadcasting to the situation when the source node has several pieces of data to broadcast or multicast. Moreover, each piece of data may have a different destination set. Communication typically proceeds in rounds, and in each round each node can send (receive) a message to (from) one other node. The objective is to minimize the number of rounds. We present a near optimal algorithm for this problem.
In addition we develop broadcast and multicast algorithms for a more complex communication model, where the nodes are arranged in clusters. In this model, sending a message from one node to another node in the same cluster takes one round, and sending a message to a node in a different cluster takes C rounds. We develop a simple and efficient approximation algorithm for this model. We then extend the model to more complex models where we remove the assumption that an unbounded amount of simultaneous communication may happen using the global network. Finally, we discuss broadcasting algorithms for the postal model where the sending node does not block for C rounds when the message is in transit.
Reference: Samir Khuller, Yoo-Ah Kim, and Yung-Chun (Justin) Wan, "On Generalized Gossiping and Broadcasting," ESA 2003.
Finding the nearest neighbor (NN) of a query point q, among a set of n points in d-dimensional Euclidean space, is a fundamental geometric data structures problem. Unfortunately, efficient solutions are not known for high dimensions. We consider the problem of approximate nearest neighbor (ANN), which consists of finding a point p such that dist(p,q) <= (1 +eps)dist(NN(q),q), for 0 < eps < 1/2. We show how to construct a linear size quad-tree based data structure that answers ANN queries in logarithmic time. Lastly, we present techniques that lead to improved performance, especially by reducing the size dependence on 1/eps and d.
Given a graph G=(V,E) a vertex cover is a set of vertices C that cover every edge, that is, for every edge at least one of its endpoint is chosen in C. In the partial version of the problem we are given a coverage target t, now C must cover not all, but at least t of the edges. This relaxation can be regarded as a tradeoff between cost and coverage.
In this talk I will present a primal-dual 2-approximation algorithm for the problem that runs in O(V log V + E) time. This is an improvement in running time over the best known algorithm for the problem.
We present a fully distributed algorithm which chooses a node uniformly at random from the set of all nodes in a large-scale network. Our algorithm makes use of a distributed hash table (DHT) data structure. Our algorithm has latency O(log n) and sends O(log n) messages in expectation for a standard DHT like Chord. Our motivation for studying this problem is threefold: to enable data collection by statistically rigorous sampling methods; to provide support for randomized, distributed algorithms; and to support the creation and maintenance of random links, and thereby offer a simple means to improve fault-tolerance.
This is joint work with Valerie King (U. Victoria) and Scott Lewis (U. New Mexico). The talk covers work published in Principles of Distributed Computing (PODC) 2004 and submitted to Symposium on Parallel Architectures and Algorithms (SPAA) 2005.
Here I give a primal-dual approach with approximation guarantee of H_n for the set cover problem using the greedy algorithm and I show that the bound can not be improved.
Then I consider a more general version of set cover called set multicover in which each element need to be covered a specified integer number of times. The objective is to cover all elements up to their coverage requirements at minimum cost. A constrained version of this problem has the additional constraint that each set can be picked at most once. We consider this constrained version. This version of the problem has approximation guarantee of H_n too.
Reference: Chapter 13 of the book "Approximation Algorithms" by Vijay Vazirani.
Suppose that you want to solve a geometric optimization problem approximately on a large data set. One approach would be to extract a small subset that has the property that solving the problem on the subset provides an approximate solution to the original problem. Such a set is called a coreset.
In this talk I will present the paradigm of coresets and mention its applications in efficiently approximating various extent measures of a point set P as well as its applications in efficiently approximating solutions to various optimization problems on P.
Reference: "Geometric Approximation via Coresets" by Pankaj K. Agarwal, Sariel Har-Peled, and Kasturi R. Varadarajan.
The edge coloring problem asks for assigning colors from a minimum number of colors to edges of a graph such that no two edges with the same color are incident to the same node. In this paper a polynomial time for approxime edge coloring of multigraphs is given. The best previous algorithms achieve a fixed constant approximation factor plus a small additive offset. The algorithms in this paper achieve arbitrarily good approximation factors at the cost of slightly larger additive term. In particular, for any e>0, they achieve a solution quality of (1+e)opt + O(1/e).
Reference: Peter Sanders and David Steurer, "An Asymptotic Approximation Scheme for Multigraph Edge Coloring," SODA 2005.
An additive k-spanner of a graph is a subgraph whose distance metric approximates that of the original graph to within an additive error of k. (Graphs are undirected and unweighted). It is known that every graph contains an additive 2-spanner with O(n^{3/2}) edges and that this bound is optimal. Until the present work no other additive spanners were known and their existence was very much in doubt.
In this talk I'll discuss a completely new approach to computing spanners based on an economics inspired view of the problem. Our primary result is that every graph contains an additive 6-spanner with O(n^{4/3}) edges. The same method is used to prove several lesser results.