Robust Uncertain Data Management

The goal of this project is to develop a systematic framework to enable "robust" query processing in presence of data uncertainties, that arise naturally in a wide variety of application domains including social media analysis, scientific and biological data management, sensor data management, and text analytics. Data uncertainties may take the form of missing or incomplete data, inherent noise in the data, trust or reputation scores assigned to data based on their sources of origin, or confidences in predictions made using automated modeling tools. The input uncertainties naturally lead to uncertainties in the results of any queries or analysis performed on such data. To enable robust and systematic reasoning over such uncertain query results, we are designing algorithms and tools to identify the input uncertainties to which the query results are most sensitive, to decide how to use scarce resources like subject matter experts to resolve uncertainties in query results, and to incorporate user feedback to improve the robustness of the input uncertainty parameters themselves.

Project Participants

Publications

  • Approximation algorithms for stochastic submodular set cover with applications to Boolean function evaluation and min-knapsack;
    Amol Deshpande, Lisa Hellerstein, and Devorah Kletenik;
    ACM Transacations on Algorithms 12:3 (42), 2016.
  • Scenario Submodular Cover;
    Nathaniel Grammel, Lisa Hellerstein, Devorah Kletenik, and Patrick Lin;
    CoRR Technical Report arXiv:1603.03158 (extended abstract to appear in Proceedings of the Workshop on Approximation and Online Algorithms (WAOA), 2016).
  • Evaluation of monotone DNF Formulas;
    Sarah R. Allen, Lisa Hellerstein, Devorah Kletenik, and Tonguc Unluyurt;
    Algorithmica, 2016.
  • CrowdGather: Entity Extraction over Structured Domains;
    Theodoros Rekatsinas, Amol Deshpande, Aditya Parameswaran;
    CoRR Technical Report arXiv:1502.06823, 2015. [abstract]
  • Discrete Stochastic Submodular Maximization: Adaptive vs. Non-Adaptive vs. Offline;
    Lisa Hellerstein, Devorah Kletenik, and Patrick Lin;
    International Conference on Algorithms and Complexity (CIAC), 2015 (also CoRR Technical Report arXiv:1504.02146). [pdf]
  • Evaluation of DNF Formulas (on-line extended abstract);
    Sarah R. Allen, Lisa Hellerstein, Devorah Kletenik, and Tonguc Unluyurt;
    International Symposium on Artificial Intelligence and Mathematics (also CoRR Technical Report arXiv:1310.3673), 2014.
  • Boolean function evaluation over a sample;
    Lisa Hellerstein, Devorah Kletenik, and Patrick Lin;
    NIPS Workshop on Discrete and Combinatorial Problems in Machine Learning (DISCML), 2014. [pdf]
  • Algorithms and structural results for DNF and set cover problems;
    Devorah Kletenik;
    PhD Dissertation, Polytechnic University, 2014. [pdf]
  • Approximation Algorithms for Stochastic Boolean Function Evaluation and Stochastic Submodular Set Cover;
    Amol Deshpande, Lisa Hellerstein, and Devorah Kletenik;
    SODA 2014 (also CoRR Technical Report arXiv:1303.0726). [pdf] [abstract]
  • Approximation algorithms for reducing classification costs in ensembles of classifiers;
    Sarah R. Allen, and Lisa Hellerstein;
    NIPS Workshop on Discrete and Combinatorial Problems in Machine Learning (DISCML), 2013. [pdf]
  • Maximizing Expected Utility for Stochastic Combinatorial Optimization Problems;
    Jian Li, and Amol Deshpande;
    FOCS 2011 (also CoRR Technical Report arXiv:1012.3189). [pdf] [abstract]
  • Sensitivity Analysis and Explanations for Robust Query Evaluation in Probabilistic Databases;
    Bhargav Kanagal, Jian Li, Amol Deshpande;
    SIGMOD 2011. [pdf] [abstract]

Acknowledgments

This material is based upon work supported in part by the National Science Foundation under Grants 0916736, 0917153, 1217968, and 1218367. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.