I developed three visualization systems that exploit knowledge of how users visually explore datasets to implement context-aware database optimizations. ScalaR dynamically adjusts the size of query results output from a DBMS, based on the available screen space to render the results, avoiding rendering issues such as over- plotting. ForeCache uses predictive prefetching techniques to support large-scale 2D data browsing. Sculpin combines predictive pre-fetching, incremental pre-computation, and visualization-aware caching techniques to support interactive visualization of queries executed on large, multidimensional array data. We collaborated with scientists to design and evaluate these systems when exploring satellite sensor data from the NASA MODIS.
"How common is interactive visualization on the web?" "What is the most popular visualization design?" "How prevalent are pie charts really?" These questions intimate the role of interactive visualization in the real (online) world. In this project, we present our approach (and findings) to answering these questions. First, we introduce Beagle, which mines the web for SVG-based visualizations and automatically classifies them by type (i.e., bar, pie, etc.). With Beagle, we extract over 41,000 visualizations across five different tools and repositories, and classify them with 85% accuracy, across 24 visualization types. Given this visualization collection, we study usage across tools. We find that most visualizations fall under four types: bar charts, line charts, scatter charts, and geographic maps. Though controversial, pie charts are relatively rare for the visualization tools that were studied. Our findings also suggest that the total visualization types supported by a given tool could factor into its ease of use. However this effect appears to be mitigated by providing a variety of diverse expert visualization examples to users. By using a scalable and automated data collection process, Beagle can support a variety of data-driven visualization design techniques, where a large input corpus is used to train machine learning models and extract design heuristics for automated visualization of new and unfamiliar datasets.
As real-time monitoring and analysis become increasingly important, researchers and developers turn to data stream management systems (DSMS's) for fast, efficient ways to pose temporal queries over their datasets. However, these systems are inherently complex, and even database experts find it difficult to understand the behavior of DSMS queries. To help analysts better understand these temporal queries, we developed StreamTrace, an interactive visualization tool that breaks down how a temporal query processes a given dataset, step-by-step. The design of StreamTrace is based on input from expert DSMS users; we evaluated the system with a lab study of programmers who were new to streaming queries. Results from the study demonstrate that StreamTrace can help users to verify that queries behave as expected and to isolate the regions of a query that may be causing unexpected results. This project was completed during my internship at Microsoft Research in 2014, and was presented at the CHI 2016 conference.
ForeCache was connected to BigDAWG to support interactive exploration of medical patient data from the MIMIC II dataset, and was presented as part of the BigDAWG demo at the VLDB 2015 conference. BigDAWG is a federated database supporting multiple data models to power a variety of analysis and exploration use cases.