![]() |
Capital Area Theory Seminar: Spring 2003 |
The Capital Area Theory Symposia is an UMIACS sponsored series of symposia in theoretical computer science bringing computer scientists from around the world to the Capital area. The Symposia are given at the University of Maryland in cooperation with the Computer Science Department.
CATS will usually meet on Friday 1:00pm-2:00pm CSIC 2107 but some talks may be at a different place and time. Abstracts are available.
Several algorithmic techniques have been devised recently to deal with large volumes of data. At the heart of many of these techniques are ingenious schemes to represent data compactly. This talk will present some constructions of such compact representation schemes. We will see an application to identifying similar objects (e.g. text documents) in a large collection. We will also see how such schemes lead to efficient, one-pass algorithms for processing large volumes of data (streaming algorithms).
Dot maps--drawings of point sets--are a well known carthographic method to visualize density functions over an area. We study the problem of simplifying a given dot map: given a set P of points in the plane, we want to compute a smaller set Q of points whose distribution approximates the distribution of the original set P.
We formalize this using the concept of epsilon-approximations, and we give efficient algorithms to compute the approximation error of a set Q with respect to a set P, for several families of ranges, namely unit squares, arbitrary squares, and rectangles. We also present experimental results on several heuristics to compute a good approximation Q of a given set P.
This is joint work with: Jit Bose, Otfried Cheong, Pat Morin, and Antoine Vigneron
We show that for any data set in a metric space, it is possible to construct a hierarchical clustering with the guarantee that for every k the induced k-clustering has cost at most 8 times the optimal k clustering. Here the cost of a clustering is taken to be the maximum radius of its clusters. Our algorithms is similar in simplicity and efficiency to common heuristics for hierarchical clustering and we show that these heuristics have poorer approximation factors.
Consider a mobile robot delivering packages in a building. Each package has an associated urgency value, and we would like the robot to deliver packages in a timely manner while respecting their various degrees of urgency. Unfortunately, the robot's behavior is somewhat unreliable, and the problem is therefore not completely deterministic.
Such planning problems are usually modelled using Markov decision processes, by building a matrix of states, actions, and transition probabilities. The goal is to maximize the expected aggregate reward (where the robot receives a reward for delivering each package). One way to model the notion of urgency involves decreasing the reward (i.e. by a multiplicative penalty) of each package over time. There are well known algorithms for solving Markov decision processes, but the running time is polynomial in the size of the state space. In our problem, the state is the robot's current location and time, and the set of delivered packages. Since the number of possible subsets of the packages is exponential, we must consider ways to generate a good solution without considering all the states.
The problem outlined above is a version of the famous travelling salesman problem: a salesman (in this case a robot) must visit various locations while minimizing the total travel time. The main distinguishing feature of our formulation is the dependance of the reward function on the time at which the package is delivered -- two tours of equal length will have different values if one delivers most of the packages near the end.
In this talk, we will describe the first constant approximation for this discounted travelling salesman problem. We also describe the first constant approximation for the related orienteering problem, in which we must maximize the number of packages delivered within a given time constraint. These problems are NP-Hard by a simple reduction from TSP.
The results described are the product of joint work with Avrim Blum and Shuchi Chawla of Carnegie-Mellon, and with David Karger and Maria Minkoff of MIT. The problem (and its basis in robot navigation) was brought to our attention by Terran Lane of the University of New Mexico.
We present a probabilistic algorithm that, given a connected graph G (represented by adjacency lists) of maximum degree d, with edge weights in the set {1, ..., W}, and a given parameter 0 < epsilon < 1/2, estimates in time O(d w epsilon^-2 log(w/e)) the weight of the minimum spanning tree of G with a relative error of at most epsilon. Note that the running time does not depend on the number of vertices in G. We also prove a nearly matching lower bound of Omega(d w epsilon^-2) on the probe and time complexity of any approximation algorithm for MST weight.
In many standards, e.g. SSL/TLS, IPSEC, WTLS, messages are first pre-formatted, then encrypted in CBC mode with a block cipher. Decryption needs to check if the format is valid. Validity of the format is easily leaked from communication protocols in a chosen ciphertext attack since the receiver usually sends an acknowledgment or an error message. This is a side channel. In this paper we show various ways to perform an efficient side channel attack. We discuss potential applications, extensions to other padding schemes and various ways to fix the problem.
Abstract: Although the study of clustering is centered around an intuitively compelling goal, it has been very difficult to develop an unified framework for reasoning about it at a technical level, and profoundly diverse approaches to clustering abound in the research community. Here we suggest a formal perspective on the difficulty in finding such a unification, in the form of an impossibility theorem: for a set of three simple properties, we show that there is no clustering function satisfying all three. Relaxations of these properties expose some of the interesting (and unavoidable) trade-offs at work in well-studied clustering techniques such as single-linkage, sum-of-pairs, k-means, and k-median.
Reference: You can get the paper here
Abstract: Randomization is vital in cryptography: secret keys should be randomly generated and most cryptographic primitives (e.g., encryption) must be probabilistic. As a common abstraction, it is assumed that there is a source of truly random bits available to all the participants of the system. While convenient, this assumption is often highly unrealistic, and cryptographic systems have to be built based imperfect sources of randomness. Remarkably, this fundamental problem has received little or no attention so far, despite the fact that a related question of simulating probabilistic (BPP) algorithms with imperfect random sources has a long and rich history.
In this work we initiate the quantitative study concerning feasibility of building secure cryptographic primitives using imperfect random sources. Specifically, we concentrate on symmetric-key encryption and message authentication, where the shared secret key comes from an imperfect random source instead of being assumed truly random. We show that the corresponding cryptographic sources lie strictly in between extractable and simulatable sources, which implies that "cryptographic usage" of randomness is more demanding than the corresponding "algorithmic usage", but still does not require perfect randomness.
Reference: You can get the paper here
Abstract: Given a graph G (possibly directed) and set of pairs of vertices the maximum edge disjoint paths problem (EDP) is the optimization problem of maximizing the number of pairs that can be connected by edge disjoint paths. EDP is NP-hard and in directed graphs also hard to approximate as shown by the Omega(m^{1/2-epsilon})-hardness result of Guruswami et al. This seemed to have settled the approximability of the problem since an O(m^{1/2})-approximation is achievable. Here m is the number of edges in the graph. The O(m^{1/2}) bound is also the best known upper bound on the approximability of the problem in undirected graphs.
We show that approximability of the problem in directed graphs is still open by suggesting that n, the number of vertices, is a more appropriate measure to express the approximability. In particular we conjecture that the approximability threshold for directed EDP is Omega(n^{1/2}). We justify this by obtaining the first "sub-linear" (in terms of n) approximation ratios for the the EDP and by noting that the hardness result of Guruswami et al applies only to sparse graphs. In particular we show that the greedy algorithm has an O(min(n^{2/3}, m^{1/2}))-approximation in undirected graphs and an O(min(n^{4/5},m^{1/2}))-approximation in directed graphs. Our analysis is based on an interesting question on the maximum multicommodity flow in simple unit-capacity graphs. We also give an O(n^{1/2} log n)-approximation EDP in directed acyclic graphs via LP rounding, matching the integrality gap to within a log factor.
Reference: "Edge Disjoint Paths Revisited", Chandra Chekuri, Sanjeev Khanna, SODA 2003.
A decomposition of a graph G=(V,E) is a partition of the vertex set into subsets called (blocks). The diameter of a decomposition is the least d such that any two vertices belonging to the same connected component of a block are at distance <= d. In this paper we prove (nearly best possible) statements of the form: Any n-vertex graph has a decomposition into a small number of blocks each having small diameter. Such decompositions provide a tool for efficiently decentralizing distributed computations. It is known that every graph has a decomposition into at most s(n) blocks of diameter at most s(n) for s(n) = n^(O(sqrt(loglogn/logn))). Using known techniques, we improve this result by showing that every graph has a decomposition of diameter O(logn) into O(logn) blocks. In addition, we give a randomized distributed algorithm that produces such a decomposition and runs in time O(log^2(n)). The construction can be parametrized to provide decompositions that trade-off between the number of blocks and the diameters.
Reference: N. Linial and M. Saks. Low diameter graph decompositions. Combinatorica, 13:441-454, 1993.
Given an undirected graph G=(V,E), non-negative weights on its edges and vertices, root vertex r in V, and a positive number k, we consider the capacitated minimum spanning tree (CMST) problem which asks for a minimum spanning tree rooted at r such that the sum of the vertex weights of every subtree, hanging off r, is at most k. The problem is NP-complete, even if all vertices are of unit weight and k=3. We present improved approximation algorithms for the CMST problem and several of its variants. Our results improve the, 15 year old, current best ratios for the respective problems.
This is a joint work with Balaji Raghavachari.
I will speak on a simple dynamical system which models the Internet protocol TCP. The simple model embodies TCP's "additive increase, multiplicative decrease" rule. Two sources s1 and s2 send packets at varying rates r1 and r2 to a recipient; whenever packets are lost, the sender halves its sending rate. We prove that for infinitely many choices of the parameters, the set of feasible rate pairs that can occur in the limit is a fractal. (This does not mean, however, that the traffic is statistically self-similar.) No previous knowledge of TCP or of fractals will be assumed.
This is joint work with Anna Gilbert of AT&T Labs--Research.
I'll aim the talk at theory students unfamiliar with Markov Chain Monte Carlo (MCMC) methods.
After a brief survey of the major highlights of MCMC methods in Theoretical Computer Science, we'll focus on a series of recent works involving the coupling technique, a classical tool from Probability Theory.
Our emphasis is the analysis of the convergence rate of a simple Markov chain, known as the Glauber dynamics, for randomly sampling (proper) k-colorings of an input graph G=(V,E) with maximum degree D and girth g.
We'll conclude with a very recent result (joint with Tom Hayes) proving the Glauber dynamics is close to the uniform distribution after O(nlog{n}) steps whenever k>(1+epsilon)D, for all epsilon>0, assuming sufficiently large constant g and D=Omega(log{n}), where n=|V|. The best previously known bounds were k>11D/6 for general graphs, and k>1.489D for graphs satisfying girth and maximum degree requirements.
Our proof relies on the construction and analysis of a so-called non-Markovian coupling. This appears to be the first application of a non-Markovian coupling to substantially improve upon known results.
Let G = (V, E) be an n-vertex connected graph with positive edge weights. A subgraph G' = (V, E') is a t-spanner of G if for all (u,v) in V, the weighted distance between u and v in G is at most t times the weighted distance between u and v in G.
We consider the problem of constructing sparse spanners. Sparseness of spanners is measured by two criteria, the size, defined as the number of edges in the spanner, and the weight, defined as the sum of the edge weights in the spanner. In this paper, we concentrate on constructing spanners of small weight.
For an arbitrary positive edge-weighted graph G, for any t>1 and e>0, we show that a t-spanner of G with weight O(n^( (2+e)/(t-1) )) wt(MST) can be constructed in polynomial time. We also show that (log^2 n)-spanners of weight O(1) wt(MST) can be constructed.
Reference: Barun Chandra, Gautam Das, Giri Narasimhan, Jose Soares. "New sparseness results on graph spanners", Proc. 8th Annual ACM Symposium on Computational Geometry, pp 192-201, 1992.