Barna Saha

We explore the approximability of several well-known coverage problems in streaming model, design new algorithms with approximation guarantee and show how these algorithms can be useful in searching blogs containing posts on a given list of topics.

We extend the minimum-cut tree based algorithm for clustering to handle dynamic graphs. We show how the clusters can be efficiently maintained in presence of edge insertion and deletion.

We improve the previous implementation of Cluster-Rank query on deterministic databases by providing new constant pass algorithm for clustering, which seamlessly integrate ranking as well. Our clustering algorithm allows data points to belong to more than one cluster ( but less than some specified number of clusters) and ranking of tuples is performed in parallel. We define new more relevant ranking function for probabilistic databases and show how to compute them efficiently, even when there is non-trivial dependency. We provide the first algorithm for Cluster-Rank Query in probabilistic Databases

Cluster Rank Query on Deterministic and Probabilistic Databases

(with Jian Li and Amol Deshpande, Fall 2007)

Space Complexity of Estimating Path Aggregates over Massive Graph Streams and Related Metrics

(with sumit ganguly, mtech thesis, 2005-06)

Covering Problems in Data Streams with Applications to Blog Search

( with Lise getoor, Fall 2007)

PUBLICATIONs:

 

1.``Ranking and Clustering in Probabilistic Databases'', with Jian Li and Amol Deshpande, submitted to VLDB, 2008. Techreport.

 

2.``On Estimating Path-Aggregates over Streaming Graphs'', with Sumit Ganguly, Proceedings of the 17th International Symposium on Algorithms and Computations (ISAAC) , 2006.

 

3.``Dynamic Algorithm for Graph Clustering using Minimum Cut Tree'', with Pabitra Mitra, Proceedings of the Seventh SIAM International Conference on Data Mining (SDM), 2007.

 

4.``Bi-directional Fuzzy Regression Model for Road-lines Detection'', with Arya Mazumdar and N R Pal,  Proceedings of IEEE International Conference on Engineering of Intelligent Systems (ICEIS)}, 2006.

Research Summary

Dynamic Clustering algorithm

(with pabitra mitra, summer 2006)

Undergraduate Project: Automatic Voice Recognition

Undergraduate Summer Project: Intelligent Traffic Monitoring System

Course Projects:

1. Clustering Data-Streams.

2. Clustering Web-Graph using Random Walk Techniques

3. GroupKey Aggrement Protocols'

4. Sparse Matrix Solver

5. Counting Classes

Other Projects

The \emph{path-aggregate}metric $P_k$ counts the number of pairs of vertices that have a simple path between them of length $k$. Estimating path-aggregates plays a significant role in understanding the changing behavior of network topology and has intrinsic relevance in query optimization. We present the first sub-linear space algorithms for estimating $P_2$ over undirected streaming graphs(multigraphs). We show space lower bound results for $P_2$ in streaming model and show that in this model, estimating $P_k$, for $k > 2$, is intractable in low space. We show that if edge deletions are allowed over the graph stream, then any deterministic algorithm for, (a) checking whether a symmetric graph is connected, (b) estimating the size of transitive closure to within $1 \pm \frac{1}{5}$ and (c) estimating diameter and distances, all require $\Omega(n^2)$ space. These problems can be easily solved  using $O(n log n)$ space in the insert-only streaming model, thus revealing an essential difference between the insert-only and updatable graph streaming models.