Srinivas R. Kashyap
PhD 2007, Currently at IBM T.J. Watson Research Center
Department of Computer Science
University of Maryland
College Park, MD 20742
Email: raaghav AT cs DOT umd DOT edu .

Research Interests

  • Design and analysis of algorithms for networked and distributed systems, information networks and combinatorial optimization.

Published Papers

  • Efficient Constraint Monitoring Using Adaptive Thresholds (with Jai Ramamirtham, Rajeev Rastogi and Pushpraj Shukla). In Proceedings of the the 24th IEEE International Conference on Data Engineering (ICDE) 2008.

      Abstract: Detecting constraint violations in large-scale distributed systems has recently attracted plenty of attention from the research community due to its varied applications (security, network monitoring, etc.). Communication efficiency of these systems is a critical concern and determines their practicality. In this paper, we introduce a new set of methods called non-zero slack schemes to implement distributed SUM queries efficiently. We show, both analytically and empirically, that these methods can lead to a considerable reduction in the amount of communication. We propose three adaptive non-zero slack schemes that adapt to changing data distributions; our best scheme is a lightweight reactive scheme that probabilistically adjusts local constraints based on the occurrence of certain events (using only a periodic probability estimation). We conduct an extensive experimental study using real-life and synthetic data sets, and show that our non-zero slack schemes incur significantly less communication overhead compared to the state of the art zero slack scheme (over a 60% savings).


  • Efficient Gossip-Based Aggregate Computation (with Supratim Deb, K.V.M. Naidu, Rajeev Rastogi and Anand Srinivasan). In Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS) 2006.

      Abstract: There has been a growing interest in gossip-based protocols that employ randomized communication to ensure robust information dissemination. In this paper we present a novel gossip-based scheme using which all nodes in an n-node overlay network can compute the common aggregates of MIN, MAX, SUM, AVERAGE and RANK of their values using O(n loglog n) messages in O(log n loglog n) rounds of communication. To the best of ours knowledge, ours is the first result that shows how to compute these aggregates with high probability using only O(n loglog n) messages. In contrast, the best known gossip-based algorithm for computing these aggregates requires O(n log n) messages and O(log n) rounds. Thus, our algorithm allows system designers to trade off a small increase in round complexity with a significant reduction in message complexity. This can lead to dramatically lower network congestion and longer node lifetimes in wireless and sensor networks, where channel bandwidth and battery life are severly constrained.


  • Fast Reconfiguartion of Data Placement in Parallel Disks (with Samir Khuller, Yung-Chun (Justin) Wan and Leana Golubchik). In Proceedings of the 8th ALENEX 2006.

      Abstract: The “How much information?” study produced by the school of information management and systems at the University of California at Berkeley [10], estimates that about 5 exabytes of new information was produced in 2002. It estimates that the amount of stored information doubled in the period between 1999 and 2002. It is believed that more data will be created in the next five years than in the history of the world. Clearly we live in an era of data explosion. This data explosion necessitates the use of large storage systems. Storage Area Networks (or SANs) are the leading [13] infrastructure for enterprise storage. A SAN essentially allows multiple processors to access several storage devices. They typically access the storage medium as though it were one large shared repository. One of the crucial functions of such a storage system is that of deciding the placement of data within the system. This data placement is dependent on demand pattern for the data. For instance, if a particular data item is very popular the storage system might want to host it on a disk with high bandwidth or make multiple copies of the item. The storage system needs to be capable of handling flash crowds [8]. During events triggered by such flash crowds, the demand distribution becomes highly skewed and different from the normal demand distribution. In this paper we consider a new approach to dealing with the problem of changes in the demand pattern. We ask the following question: In a given number of migration rounds, can we obtain a layout by making changes to the existing layout, so that the resulting layout will be the best possible layout that we can obtain within the specified number of rounds? Of course, such a layout is interesting only if it is significantly better than the existing layout for the new demand pattern. We approach the problem of finding a good layout that can be reached in a specified number of rounds by trying to find a sequence of layouts. Each layout in the sequence can be transformed to the next layout in the sequence by applying a small set of changes to the current layout. These changes are computed so that they can be applied within one round of migration (a disk may be involved in at most one transfer per round).


  • Similarity Searching in P2P databases (with Indrajit Bhattacharya and Srinivasan Parthasarathy). In Proceedings of the 25th IEEE ICDCS 2005.

      Abstract: We consider the problem of handling similarity queries in peer-to-peer databases. We propose an indexing and searching mechanism which, given a query object, returns the set of objects in the database that are semantically related to the query. We propose an indexing scheme which clusters data such that semantically related objects are partitioned into a small set of clusters, allowing for a simple and efficient similarity search strategy. Our indexing scheme also decouples object and node locations. Our adaptive replication and randomized lookup schemes exploit this feature and ensure that the number of copies of an object is proportional to its popularity and all replicas are equally likely to serve a given query, thus achieving perfect load balancing. The techniques developed in this work are oblivious to the underlying DHT topology and can be implemented on a variety of structured overlays such as CAN,CHORD, Pastry, and Tapestry. We also present DHT independent analytical guarantees for the performance of our algorithms in terms of search accuracy, cost, and loadbalance; the experimental results from our simulations confirm the insights derived from these analytical models.


  • Node Ranking in Labeled Directed Graphs (with Krishna Chitrapura). In Proceedings of the 13th ACM CIKM 2004.

      Abstract: Our work is motivated by the problem of ranking hyperlinked documents for a given query. Given an arbitrary directed graph with edge and node labels, we present a new flow-based model and an efficient method to dynamically rank the nodes of this graph with respect to any of the original labels. Ranking documents for a given query in a hyperlinked document set and ranking of authors/articles for a given topic in a citation database are some typical applications of our method. We outline the structural conditions that the graph must satisfy for our ranking to be different from the traditional PageRank.

      We have built a system using two indices that is capable of dynamically ranking documents for any given query. We validate our system and method using experiments on a few datasets: a crawl of the IBM Intranet (12 million pages), a crawl of the www (30 million pages) and the DBLP citation dataset. We compare our method to existing schemes for topic-biased ranking that require a classifier and the traditional PageRank. In these experiments, we demonstrate that our method is well suited for fine-grained ranking and that our method performs better than the existing schemes. We also demonstrate that our system can obtain an improved ranking with very little impact on query time.


  • Approximation algorithms for non-uniform size data placement on parallel disks. (with Samir Khuller). FSTTCS 2003. Journal of Algorithms 2006.

      Abstract: Consider a set of disks that are part of a storage system. Each disk has a storage capacity and a load capacity. The storage capacity limits the total size of items that can be stored on the disk. The load capacity limits the total amount of demand that can be routed to the disk. We are also given a set of items - each item has a size and a demand. The data placement problem is that of finding a placement of these items on the disks and an assignment of demand for each of the items stored on the disk such that the total (over all the disks) assigned demand is maximized.

      Previous work on this problem has assumed that all data items have unit size. We present the first successful attempt at removing the assumption that all items have same size. We develop a polynomial time approximation scheme (PTAS) for case where the maximum item size is a constant. For the case where we have two distinct item sizes, we develop an algorithm with a tight performance guarantee. Specifically, no algorithm can guarantee (for all distributions) to pack a larger fraction of items than our algorithm.



Samir Khuller (my advisor / mentor) is here.
CSD.
Nedstat Basic - Free web site statistics
Personal homepage website counter