PhD Proposal: Linked Visual Data Summaries Framework for Rapid, Comprehensive, and Scalable Data Exploration

Talk
Mehmet Adil Yalcin
Time: 04.29.2015 14:00 to 15:30
Location: AVW 4172

Data exploration can be modeled as a search for insights in data. Interactive visualization tools enable and shape navigation in this search space through visual interfaces for technical and non-technical audiences. Tools offering low-level visual languages aim to support creative expression with rich and flexible visual configurations.
However, every option for data transformation, visualization, and interaction expands the search space within the tool. With more options, expressing goals becomes more demanding (the gulf of execution). Given a wide variety of possible configurations, many lose visual clarity through inappropriate mappings (the gulf of evaluation) and visual scalability through clutter (the volume and variety of data). Maintaining consistency also becomes more challenging in large design spaces. Exploratory paths can be suggested, yet suggestions may not match our conceptual models, and iterating on the navigation would still depend on the underlying, potentially complex, search space. Alternatively, we can design higher-level tools that structure our navigation for efficient insight seeking.
I propose a novel framework to enable rapid, comprehensible, and scalable data exploration: linked visual data summaries. To build a well-structured, tightly controlled exploratory search space, summaries are used as the unit of data exploration. A summary is based on an existing or calculated data attribute, with types such as categorical (single- and multi-valued), numeric, and time data. To create a scalable and comprehensible design that supports overview-to-detail seeking, the attribute values are aggregated and visualized without overlaps in summaries. Browsers merge multiple data summaries and a record list. Interaction is tightly linked across all aggregates, offering a consistent, visual query interface. Mouse-over selection of an aggregate fluidly previews its characteristics, enabling rapid discoveries. To enrich exploration, the framework supports one-to-many comparison of aggregate selections and analysis of relative distributions within aggregations.
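The aggregation and linked preview described above can be illustrated with a minimal Python sketch. It is not keshif's implementation or API: the records, attribute names, and the functions values_of, summarize, select, and preview are all hypothetical, chosen only to show how one count per attribute value forms a summary and how selecting one aggregate restricts the counts in every other summary.

```python
from collections import Counter

# Hypothetical records for illustration only; not a dataset from the proposal.
records = [
    {"title": "A", "genre": ["drama", "crime"], "year": 1994},
    {"title": "B", "genre": ["drama"], "year": 2001},
    {"title": "C", "genre": ["comedy"], "year": 2001},
    {"title": "D", "genre": ["comedy", "crime"], "year": 1994},
]


def values_of(record, attribute):
    """Return the attribute's values as a list (single- or multi-valued)."""
    value = record[attribute]
    return value if isinstance(value, list) else [value]


def summarize(records, attribute):
    """Build a summary: the number of records per distinct attribute value."""
    counts = Counter()
    for record in records:
        # set() ensures a record is counted once per value it carries.
        for value in set(values_of(record, attribute)):
            counts[value] += 1
    return dict(counts)


def select(records, attribute, value):
    """Return the records that belong to one aggregate."""
    return [r for r in records if value in values_of(r, attribute)]


def preview(records, selected_attribute, selected_value, attributes):
    """Linked preview: recompute every summary over the selected aggregate."""
    subset = select(records, selected_attribute, selected_value)
    return {a: summarize(subset, a) for a in attributes}


# Overview: one summary per attribute over all records.
overview = {a: summarize(records, a) for a in ("genre", "year")}

# Hovering over the "crime" aggregate previews its distribution elsewhere.
linked = preview(records, "genre", "crime", ("genre", "year"))
```

Comparing linked["year"] with overview["year"] gives the kind of relative distribution the abstract mentions: how the selected aggregate is spread across the values of another attribute, against the totals for all records.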
The framework will be validated through multiple means. In the preliminary work, I implemented the framework as a new data exploration tool called keshif, as well as a simple API to build data browsers. I conducted user studies throughout the design, including case studies, expert reviews, usability studies, and action logging. I imported over a hundred datasets from different domains to study the capabilities and generalizability of the framework. In the planned work, I will model and analyze insights from multivariate datasets as a qualitative and quantitative measure of the open-ended exploratory search process. User studies will be performed to understand how rapidly and how deeply data enthusiasts, not necessarily with visualization and design skills, can explore new data sources for insights in a limited time. Measuring insights will provide a higher-level alternative evaluation method (within and across tools) compared to limited exploratory analysis through pre-defined tasks and domain-specific case studies. To enable this evaluation, the planned work includes the design and development of visual interfaces to modify and create data summaries and browsers. Optionally, new techniques to manage (capture, organize, share, tell) insights within the framework can be developed based on the insight model.
Examining Committee:
Committee Chair: Dr. Benjamin B. Bederson
Dept's Representative: Dr. Amitabh Varshney
Committee Member(s): Dr. Niklas Elmqvist, Dr. Catherine Plaisant