Declarative Graph Analytics and Querying over Very Large, Dynamic Information Networks

Over the last decade, information networks have become ubiquitous and widespread. These include social networks, communication networks, financial transaction networks, citation networks, gene regulatory networks, disease transmission networks, ecological food networks, sensor networks, social contact graphs, and many more. Network data arises even in mundane applications like phone call data, IP traffic data, or parcel shipment data. Social contact graphs are expected to be available for analysis in near future, and can potentially be used to gain insights into various social phenomena as well as in disease outbreak and prevention. There is a growing need for data management systems that can support real-time ingest, storage, querying, and complex analytics over such network data. Network data is most naturally represented as a graph, with nodes representing the entities and edges denoting the interactions between them. However, despite much work on graph querying algorithms and graph programming frameworks in recent years, there is still a lack of established data management systems that provide declarative frameworks for querying and analyzing such graph-structured data, especially very large volumes of heterogeneous, complex-structured, and rapidly changing data. The raw observational network data is also often noisy and needs to cleaned and annotated through use of statistical models before querying and analysis. The increasing availability of historical traces of time-evolving graphs has also opened up opportunities in temporal evolutionary analysis as well as in data mining and comparative analytics over historical information. Similarly there is increasing interest in continuous query processing and real-time analytics, especially anomaly or event detection, on streaming graph data. Further, the graph sizes and the number of operations that need to be supported are growing at an unprecedented pace, necessitating use of parallel and distributed solutions, both for efficiency and for better fault-tolerance; however, graph operations are notoriously hard to parallelize.

In this project, we are building a graph data management system and a suite of tools aimed at supporting real-time and historical querying and analytics over very large, dynamic, heterogeneous, and noisy graphs. Some of the key components of our overall system include:
  • Declarative graph transformation framework (GDM 2011, SIGMOD Demo 2013): We are developing a Datalog-based framework for declaratively specifying iterative complex graph transformation tasks. Examples of such tasks include a variety of graph cleaning tasks like entity resolution, link prediction, and collective classification.
  • Temporal and Historical Analytics (ICDE 2013, SIGMOD Demo 2013, EDBT 2016): We are developing tools for managing and querying very large historical network traces. We are also building a toolkit that supports temporal analytics over such traces.
  • Real-time query processing (SIGMOD 2012, SIGMOD 2014, DEBS 2016): We are working on developing techniques for efficiently supporting stream reasoning and querying tasks that can handle highly dynamic information networks. We are also designing a Datalog-based query language for specifying various types of continuous queries and analytics (e.g., anomaly detection, trend identification, etc.) over such dynamic networks.
  • Distributed programming framework (NDA@SIGMOD 2016, VLDB Journal 2016): We are developing a distributed programming framework, called NScale, to support a large variety of analytics on very large graphs in a distributed fashion. Our framework generalizes previous distributed programming frameworks like Pregel and Giraph.

Project Participants

Publications (Chronologically Ordered)

  • GraphGen: Adaptive Graph Extraction and Analytics over Relational Databases;
    Konstantinos Xirogiannopoulos, Virinchi Srinivas, and Amol Deshpande;
    GRADES SIGMOD Workshop 2017. [pdf]
  • Extracting and Analyzing Hidden Graphs from Relational Databases;
    Konstantinos Xirogiannopoulos, and Amol Deshpande;
    SIGMOD 2017. [abstract]
  • Parallel SPARQL Query Optimization;
    Buwen Wu, Yongluan Zhou, Hai Jin and Amol Deshpande;
    ICDE 2017.
  • Continuous Detection of Activity-based Subgraph Patterns on Dynamic Graphs;
    Jayanta Mondal and Amol Deshpande;
    DEBS 2016. [abstract]
  • NScaleSpark: Subgraph-centric Graph Analytics on Apache Spark;
    Abdul Quamar and Amol Deshpande;
    NDA 2016 (International Workshop on Network Data Analytics, co-located with SIGMOD 2016). [abstract]
  • Storing and Analyzing Historical Graph Data at Scale;
    Udayan Khurana, and Amol Deshpande;
    EDBT 2016 (also CoRR Technical Report arXiv:1509.08960). [abstract]
  • NScale: Neighborhood-centric Large-Scale Graph Analytics in the Cloud;
    Abdul Quamar, Amol Deshpande, Jimmy Lin;
    To appear in VLDB Journal (also CoRR Technical Report arXiv:1405.1499). [pdf] [abstract]
  • GraphGen: Exploring Interesting Graphs in Relational Data (Demonstration Proposal);
    Konstantinos Xirogiannopoulos, Udayan Khurana, and Amol Deshpande;
    VLDB 2015. [abstract]
  • VERTEXICA: Your Relational Friend for Graph Analytics!;
    Alekh Jindal, Praynaa Rawlani, Eugene Wu, Samuel Madden, Amol Deshpande, Mike Stonebraker;
    VLDB Demo 2014. [pdf] [abstract]
  • NScale: Neighborhood-centric Analytics on Large Graphs;
    Abdul Quamar, Amol Deshpande, Jimmy Lin;
    VLDB Demo 2014. [pdf] [abstract]
  • EAGr: Supporting Continuous Ego-centric Aggregate Queries over Large Dynamic Graphs;
    Jayanta Mondal and Amol Deshpande;
    SIGMOD 2014 (also CoRR Technical Report arXiv:1404.6570). [pdf] [abstract]
  • Subgraph Pattern Matching over Uncertain Graphs with Identity Linkage Uncertainty;
    Walaa Eldin Moustafa, Angelika Kimmig, Amol Deshpande, Lise Getoor;
    ICDE 2014 (also CoRR Technical Report arXiv:1305.7006). [pdf] [abstract]
  • Stream Querying and Reasoning on Social Data;
    Jayanta Mondal and Amol Deshpande;
    Chapter in Encyclopedia of Social Network Analysis and Mining, Springer, 2014. [pdf] [abstract]
  • GrDB: A System for Declarative and Interactive Analysis of Noisy Information Networks;
    Walaa Eldin Moustafa, Hui Miao, Amol Deshpande, Lise Getoor;
    SIGMOD Demo 2013. [abstract]
  • HiNGE: Enabling Temporal Network Analytics at Scale;
    Udayan Khurana, and Amol Deshpande;
    SIGMOD Demo 2013. [abstract]
  • Efficient Snapshot Retrieval over Historical Graph Data;
    Udayan Khurana, and Amol Deshpande;
    ICDE 2013 (also CoRR Technical Report arXiv:1207.5777). [pdf] [abstract]
  • Managing Large Dynamic Graphs Efficiently;
    Jayanta Mondal, Amol Deshpande;
    SIGMOD 2012. [pdf] [abstract]
  • Ego-centric Graph Pattern Census;
    Walaa Eldin Moustafa, Amol Deshpande, Lise Getoor;
    ICDE 2012. [abstract]
  • Declarative Analysis of Noisy Information Networks;
    Walaa Eldin Moustafa, Galileo Namata, Amol Deshpande, Lise Getoor;
    ICDE Workshop on Graph Data Management: Techniques and Applications (GDM 2011). [abstract]

Acknowledgments

This material is based upon work supported in part by the National Science Foundation under Grants 0916736, and 1319432. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.