Capital Area Theory Seminar: Fall 2006

 The Capital Area Theory Symposia is an UMIACS sponsored series of symposia in theoretical computer science bringing computer scientists from around the world to the Capital area. The Symposia are given at the University of Maryland in cooperation with the Computer Science Department.

Schedule

CATS usually meets on Fridays 11:00pm-12:00pm AVW 3258 but some talks may be at a different place and time.

Date Time Speaker Title
Sep 15 11:00am David Mount
UMD
On Approximate Range Searching - or - Get in Shape; Round is a Good Choice

Sep 22 11:00am Azarakhsh Malekian
UMD
Near Optimal Online Auction

Sep 29 11:00am Samir Khuller
UMD
K-Center Clusterings and Generalizations

UMD
New Proofs for NMAC and HMAC

Oct 13
Theory Day

Oct 20 11:00am William Gasarch
UMD
VAN DER WAERDEN"S THEOREM and THE POLYNOMIAL VDW THEOREM:an Exposition in two parts. Part I.

Oct 27 11:00am William Gasarch
UMD

Nov 3 11:00am Alon Orlitsky Information Theory and Probability Estimation:
From Shannon to Shakespeare via Laplace,
Good, Turing, Hardy, Ramanujan, and Fisher

Nov 10 11:00am Aravind Srinivasan
UMD
Approximating the Domatic Number

Nov 17 11:00am Mihai Pop Combinatorial problems in genome assembly
Nov 27 2:00pm An Zhu

Towards Achieving Anonymity
Dec 8 11:00am Marius Zimand Exposure-resilient extractors and their application in the derandomization of BPTIME[sublinear]

Dec 22 1:00 pm (4185) David Kempe Maximizing the Spread of Influence in a Social Network

For additional information send email to Samir Khuller.

If you want to receive announcements of upcoming talks join the theory-local mailing list.

Abstracts

On Approximate Range Searching - or - Get in Shape; Round is a Good Choice

David Mount

ABSTRACT

Range searching is among the most fundamental and well-studied problems in computational geometry. Given an n-element point set in real d-dimensional space the problem is to preprocess the points so that the total weight, or generally the semigroup sum, of the points lying within a given query range Q can be determined quickly. Years of research have resulted in nearly matching upper and lower bounds for many formulations of this problem. In approximate range searching, the user provides an approximation parameter, and the search algorithm is allowed to either count or ignore points that are sufficiently close the range's boundary. For many formulations the complexity of exact range searching is rather insensitive to semigroup properties and range properties. Recent research has indicated that matters are much different for approximate range searching. In approximate range searching the interaction of semigroup properties (such as idempotence) and range shape properties (such as smoothness) can have a significant impact on the complexity of answering queries. In this talk we will explore the differences between exact and approximate range searching, survey recent complexity results, present links to other retrieval problems, and discuss open problems and challenges for future research.

Near Optimal Online Auction

Azarakhsh Malekian

ABSTRACT

In This talk, the online auction problem in which an auctioneer is selling identical items to bidders arriving one at a time. In this paper the authors propose an online algorithm that achieves a constant factor of the optimal profit less an O(h) additive loss term, where h is the value of the highest bid. Furthermore, this auction does not require forknowledge of the range of bidders valuations. Also they improve the result on the online posted price problem by reducing the additive loss term from O(hlog(h)loglog(h)) to O(hloglog(h)). Finally, they define the notion of an (offline) attribute auction for modeling the problem of auctioning items to consumers who are not a-priori in-distinguishable. The online auction solution is applied to this problem to achieve good bounds with 1-dimensional attributes. research.microsoft.com/users/hartline/papers/online-auctions-SODA-05.pdf

New Proofs for NMAC and HMAC

ABSTRACT

HMAC is a popular hash-function-based technique for Message Authentication Codes (MAC). The basic construct is NMAC of which HMAC can be viewed as a derivative. HMAC is widely standardized and implemented, e.g. in SSL, SSH, IPsec and TLS amongst others. It is also often used as a Pseudo Random Function (PRF), e.g. for key-derivation in TLS and IKE rather than just as a MAC. In a paper from 1996, Bellare et. al. proved that HMAC is a PRF (which implies secure MAC) assuming (1) the underlying compression function is a PRF and (2) the iterated hash function is weakly collision-resistant. HMAC is usually implemented with MD5 or SHA-1, but recent attacks show that assumption (2) is false for them. This paper proves that HMAC is a PRF under the sole assumption (1). This recovers a proof based guarantee since no known attacks compromise the pseudorandomness of the compression function which also helps explain the resistance-to-attack that HMAC has shown even when implemented with hash functions whose weak colision-resistance is compromised.

http://eprint.iacr.org/2006/043

K-Center Clusterings and Generalizations

Samir Khuller

ABSTRACT

The K-center problem is a fundamental clustering question. How do we cluster n points into K clusters, so that we minimize the radius of the largest cluster? This problem is NP-hard, and an optimal approximation factor of 2 can be obtained quite easily. Our focus will be on studying generalizations of this for the case where
we have to cluster only a certain fraction of the points, rather than all the points. We also study generalizations where there are upper and lower bounds on the sizes of the clusters.
This is a survey talk with results from several papers.

William Gasarch

ABSTRACT

A Corollary of VDW theorem is as follows: For any 2-coloring COL of Z there exists a,d such that COL(a) = COL(a+d) = COL(a+2d) = ... = COL(a+44d) As you may have guessed the 44 is not important, it can be anything. In fact, the 2 is also not important, it can be anything. The full VDW theorem is as follows: For any c\in N, for any k\in N, For any c-coloring COL of Z there exists a,d such that COL(a) = COL(a+d) = COL(a+2d) = ... = COL(a+kd) This has a NICE proof. The PolyVDW is a generalization where we use poly's in d instead of the linear polys d, 2d, ..., kd. Formally: PolyVDW (proven by Bergelson and Leibman) is the following: for all c,\in N, p_1,\ldots,p_k polynomials over the integers (with p_i(0)=0) for any c-coloring COL of Z, there exists a,d\in Z such that COL(a) = COL(a+p_1(d)) = COL(a+p_2(d)) = ... = COL(a+p_k(d)) The original proof of the polyvdw theorem involved ergodic technqiues and was somewhat difficult. Walters has an elementary proof. We will present a simpler version or his proof and discuss even further generalizations.

Professor Alon Orlitsky

ABSTRACT

Standard information-theoretic results show that data over small, typically binary, alphabets can be compressed to Shannon's entropy limit. Yet most practical sources, such as text, audio, or video, have essentially infinite support. Compressing such sources requires estimating probabilities of unlikely, even unseen, events, a problem considered by Laplace. Of existing estimators, an ingenious if cryptic one derived by Good and Turing while deciphering the Enigma code works best yet not optimally. Hardy and Ramanujan's celebrated results on the number of integer partitions yield an asymptotically optimal estimator that compresses arbitrary-alphabet data patterns to their entropy. The same approach generalizes Fisher's seminal work estimating the number of butterfly species and its extension authenticating a poem purportedly written by The Bard. The talk covers these topics and is self contained.

Approximating the Domatic Number

Aravind Srinivasan

ABSTRACT

A set of vertices in a graph is a dominating set if every vertex outside the set has a neighbor in the set. The domatic number problem is that of partitioning the vertices of a graph into the maximum number of disjoint dominating sets. Let n denote the number of vertices, delta the minimum degree, and Delta the maximum degree. I will present two results from my paper "Approximating the Domatic Number" (joint work with Feige, Halldorsson and Kortsarz, available at www.cs.umd.edu/~srin/PS/domat-jou.ps). 1. We show that every graph has a domatic partition with $(1 - o(1))(\delta + 1)/\ln n$ dominating sets, and moreover, that such a domatic partition can be found in polynomial time. This implies a $(1 + o(1))\ln n$ approximation algorithm for domatic number, since the domatic number is always at most $\delta + 1$. (We also show this to be essentially best possible.) 2. We also show that every graph has a domatic partition with $(1 - o(1))(\delta + 1)/\ln \Delta$ dominating sets, where the $o(1)$'' term goes to zero as $\Delta$ increases. This can be turned into an efficient algorithm that produces a domatic partition of $\Omega(\delta/\ln \Delta)$ sets. Though this paper appeared a few years ago, result 2 above illustrates a nice and powerful "slow coloring" approach that has several applications (especially in graph coloring), which I plan to spend some time on.

Combinatorial problems in genome assembly

Mihai Pop

ABSTRACT

Current sequencing technologies can "read" only approximately 1000 base pairs of DNA at a time.  Determining the DNA sequence of an organism
(whose genetic code ranges between millions and billions of base pairs) thus requires us to join together many such short fragments in a process
not unlike solving a jigsaw puzzle  - genome assembly.  I will present several combinatorial problems that arise in the genome assembly process, as well  as some emerging scientific questions raised by the introduction of new technologies for sequencing DNA.

Towards Achieving Anonymity

An zhu

ABSTRACT

We study the problem of publishing data from a table containing personal data, in a privacy preserving manner.  In particular, we aim to anonymize quasi identifiers, i.e., non-key attributes that combinedly identifiy a unique record in the table.
The first model is proposed by Sweeney, called k-anonymity.  This approach suppresses some values of the quasi-identifiers, such that for every record in the modified table, there are at least k-1 other records with exactly the same value.  And the quality measure here is the number of quasi-identifier values suppressed.  We provide a O(k)-approximation algorithm for this problem, improving upon the previous O(k log k) result.   We also show that this is the best
approximation bound possible using the distance representation. For small values of k, we provide improved bounds as well.
We propose a second model which generalizes the quasi-identifier values via clustering.  The records are first clustered and then the cluster centers are published.  To ensure privacy, we impose the constraint that each cluster must contain at least k records.  We consider the measure of minimizing the maximum cluster radius, for which we provide a tight 2-approximation algorithm.    We also take into consideration the number of records per cluster.  The second
measure concerns minimizing the sum, over all clusters, the product of number of records per cluster and the cluster radius.  For this
measure we also provide a constant approximation algorithm.  Further, we extend the algorithms to handle the case where we can omit outliners.
This talk is based on two papers:
Anonymizing Tables, coauthored with Aggarwal, Feder, Kenthapadi,
Motwani, Panigrahy, and Thomas in ICDT 2005.
Achieveing Anonymity via Clustering, coauthored with Aggrawal, Feder,
Kenthapadi, Khuller, Panigrahy, and Thomas in PODS 2006.

Exposure-resilient extractors and their application in the derandomization of BPTIME[sublinear]

Marius Zimand

ABSTRACT

Extractors are procedures that improve the quality of randomness sources. They transform distributions on binary strings of length n with min-entropy less than n, into distributions that are close to uniform. An exposure-resilient extractor is a more robust type of an extractor in that the output looks random even to tests that have adaptive but bounded-query access to the original distribution. I will present a construction of such an extractor and I will show that probabilistic algorithms running in sublinear time can be simulated deterministically with little time overhead on all inputs with the exception of a negligible fraction

Maximizing the Spread of Influence in a Social Network

David Kempe

ABSTRACT

joing work with Jon Kleinberg, Eva Tardos) Based on papers from KDD 2003 and ICALP 2005
A social network - the graph of relationships and interactions within a group of individuals - plays a fundamental role as a medium for the spread of information, ideas, and influence among its members.
An idea or innovation will appear - for example, the use of cellphones among college students, the adoption of a new drug within the medical profession, or the rise of a political movement in an unstable society - and it can either die out quickly or make significant inroads into the population.
The resulting collective behavior of individuals in a social network has a long history of study in sociology. Recently, motivated by applications to word-of-mouth marketing, Domingos and Richardson proposed the following optimization problem: allocate a given "advertising" budget so as to maximize the (expected) number of individuals who will have adopted a given product or behavior.
In this talk, we will investigate this question under the mathematical models of influence studied by sociologists. We present and analyze a simple approximation algorithm, and show that it guarantees to reach at least a 1-1/e (roughly 63%) fraction of what the optimal solution can achieve, under many quite general models. In addition, we experimentally validate our algorithm, comparing it to several widely used heuristics on a data set consisting of collaborations among scientists.

http://www.cs.umd.edu/areas/Theory/ / Webmaster: Azarakhsh Malekian