shopify analytics tool

Publications( DBLP, Google Scholar )

  • Answering Complex Queries on Large Dynamic Graphs
    Jayanta Mondal, Amol Deshpande;
    Under Submission

  • A Coarse-grained Partitioning Advisor for OLTP Workloads
    Jayanta Mondal, Sudipto Das;
    Under preparation

  • GSTREAM: Towards a Unified Streaming Graph Data Management System
    Jayanta Mondal, Amol Deshpande;
    Under Preparation

  • The Dropper Effect: Insights into Malware Distribution with Downloader Graph Analytics
    BumJun Kwon, Jayanta Mondal, Jiyong Jang, Leyla Bilge, Tudor Dumitras;
    To appear in ACM SIGSAC (CCS), 2015 [pdf][abstract][blog]

    Malware remains an important security threat, as miscreants continue to deliver a variety of malicious programs to hosts around the world. At the heart of all the malware delivery techniques are executable files (known as downloader trojans or droppers) that download other malware. Because the act of downloading software components from the Internet is not inherently malicious, benign and malicious downloaders are difficult to distinguish based only on their content and behavior. In this paper, we introduce the downloader-graph abstraction, which captures the download activity on end hosts, and we explore the growth patterns of benign and malicious graphs. Downloader graphs have the potential of exposing large parts of the malware download activity, which may otherwise remain undetected. By combining telemetry from anti-virus and intrusion-prevention systems, we reconstruct and analyze 19 million graphs from 5 million real hosts. We identify several strong indicators of malicious activity, such as the growth rate, the diameter and the Internet access patterns of downloader graphs. Building on these insights, we implement and evaluate a machine learning system for malware detection. Our system achieves a 96.0% true-positive rate, with a 1.0% false-positive rate, and is able to detect malware an average of 9.24 days earlier than leading anti-virus products. We also perform an external validation of our system by examining some of the unlabeled samples predicted to be malicious and we find that 41.41% are blocked by anti-virus products.

  • EAGr: Supporting Continuous Ego-centric Aggregate Queries over Large Dynamic Graphs;
    Jayanta Mondal, Amol Deshpande;
    Appeared in SIGMOD 2014 [pdf][abstract][arXiv]

    In this paper, we present EAGr, a system for supporting large numbers of continuous neighborhood-based ("ego-centric") aggregate queries over large, highly dynamic, rapidly evolving graphs. Examples of such queries include computation of personalized, tailored trends in social networks, anomaly or event detection in communication or financial transaction networks, local search and alerts in spatio-temporal networks, to name a few. Key challenges in supporting such continuous queries include very high update rates typically seen in these situations, large numbers of queries that need to be executed simultaneously, stringent low latency requirements. In this paper, we propose a flexible, general, extensible in-memory framework for executing different types of ego-centric aggregate queries over large dynamic graphs with low latencies. Our framework is built around the notion of an aggregation overlay graph, a pre-compiled data structure that encodes the computations to be performed when an update or a query is received. The overlay graph enables sharing of partial aggregates across different ego centric queries (corresponding to different nodes in the graph), also allows partial pre-computation of the aggregates to minimize the query latencies. We present several highly scalable techniques for constructing an overlay graph given an aggregation function, also design incremental algorithms for handling changes to the structure of the underlying graph itself, that may result in significant changes to the neighborhoods on which queries are posed. We also present an optimal, polynomial-time algorithm for making the pre-computation decisions given an overlay graph. Although our approach is naturally parallelizable, we focus on a single-machine deployment in this paper and show that our techniques can easily handle graphs of size up to 320 million nodes and edges, achieve update and query throughputs of over 500,000/s using a single, powerful machine.

  • Stream Querying and Reasoning on Social Data;
    Jayanta Mondal, Amol Deshpande;
    Appeared in Encyclopedia of Social Network Analysis and Mining, Springer, 2014 [abstract]

  • Managing Large Dynamic Graphs Efficiently;
    Jayanta Mondal, Amol Deshpande;
    Appeared in SIGMOD 2012. [pdf] [abstract]