Research

Data Science Collaboration

  • In Progress
  • 2020
  • Deepthi Raghunandan
  • Aayushi Roy

In this paper, we seek to understand how the leading data science tool, the computational notebook, meets real-world collaboration demands.

Interview study

Computational Notebooks

Collaboration

Code Code Evolution: Understanding Changes in Data Science Notebooks Over Time

  • In Progress
  • 2019
  • Deepthi Raghunandan
  • Aayushi Roy
  • Shenzhi Shi
  • Niklas Elmqvist
  • Leilani Battle

Sensemaking is the iterative process of identifying, extracting, and explaining insights from data. Pirolli and Card refer to each iteration as the “sensemaking loop.” Although recent work observes snapshots of the sensemaking loop within computational notebooks, none measures shifts in sensemaking behaviors over time. This gap limits our ability to understand the full scope of the sensemaking process and thus our ability to design tools that fully support sensemaking. We contribute the first quantitative method to characterize how sensemaking evolves within data science computational notebooks. To this end, we conducted a quantitative study of 60,000 Jupyter notebooks mined from GitHub. First, we identify data science-focused notebooks that have undergone significant iterations. Second, we present regression models that automatically characterize sensemaking activity within individual notebooks by assigning them a score representing their position within the sensemaking loop. Third, we use our regression models to calculate and analyze shifts in notebook scores across GitHub versions. Our results show that notebook authors participate in a diverse range of sensemaking tasks over time, where some but not all of these behaviors align with the findings of prior work. Finally, we propose design recommendations for extending notebook environments to support the sensemaking behaviors we observed.
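
As a rough illustration of the scoring idea above, the sketch below scores each version of a notebook with a linear model and then takes the differences between consecutive versions. The feature names, weights, and version data are hypothetical placeholders, not the features or coefficients used in the study.

```python
def sensemaking_score(features, weights, intercept=0.0):
    """Linear-model score: intercept plus the weighted sum of notebook features."""
    return intercept + sum(weights[name] * value for name, value in features.items())


def score_shifts(scores):
    """Differences between the scores of consecutive notebook versions."""
    return [later - earlier for earlier, later in zip(scores, scores[1:])]


# Hypothetical weights: not taken from the paper's regression models.
weights = {"plot_cells": 2, "markdown_cells": 1, "import_cells": -1}

# Hypothetical feature counts for three successive versions of one notebook.
versions = [
    {"plot_cells": 0, "markdown_cells": 1, "import_cells": 3},
    {"plot_cells": 2, "markdown_cells": 1, "import_cells": 3},
    {"plot_cells": 2, "markdown_cells": 4, "import_cells": 3},
]

scores = [sensemaking_score(version, weights) for version in versions]
shifts = score_shifts(scores)
print(scores)
print(shifts)
```

A positive shift here means the notebook's score rose between two versions; how a score maps onto a position in the sensemaking loop depends on the fitted model and is not specified by this sketch.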

Mixed methods analysis

Computational Notebooks

GitHub

Diagnosing NASA Earth Science Models (ESM) using a live visualization environment—A novel approach

  • 2021
  • 2021
  • Deepthi Raghunandan
  • Carlos Cruz
  • Vanessa Valenti
  • Jules Kouatchou
  • K. Emma Knowland
  • Megan Rose Damon
  • Craig Pelissier

Software solutions that manage voluminous earth science model data need to handle the different needs of earth scientists as model output varies on temporal and spatial scales. Scientists often leverage human effort to manage changes in model data. Though reliable, these methods can require significant effort and are rarely scalable. To alleviate this, we developed a scalable visualization pipeline that automatically manages data streams for the NASA Goddard Earth Observing System (GEOS) Chemistry Climate Model (CCM). These visualizations are important for the exploration and validation of GEOSCCM data products. Visualizations allow scientists to quickly observe changes in the GEOSCCM simulations and validate against observations or previous benchmarks. However, GEOSCCM data are challenging to visualize due to the varying time scales, the use of different gridding schemes, the need for high resolution visualizations, and the large size of the datasets. We developed a custom web-based live environment to address these challenges. These tools, which are being hosted at the NASA Center for Climate Simulation (NCCS) computing facilities, allow scientists to engage with the latest model results automatically, through a dashboard, and without any external user intervention. Previous tools required manual and offline preparation of the model diagnostic visualizations and additional work to communicate the results. Our ‘live environment’ approach greatly reduces person-hours and empowers the scientists to quickly and efficiently diagnose model runs. This diagnostic dashboard makes visualizations easy to share and provides a level of interactivity which enables the comparison of visualizations from different model executions and times. Additionally, scientists can customize visualizations by specifying configurations. We will discuss our pipeline’s architectural design, the tools/libraries we leveraged, and lessons learned. We will also demonstrate the multiple ways in which we have provided access to the pipeline, and present visualization results generated for scientists working with NASA GEOSCCM.

Preventing Deforestation: Modeling and Prediction of Vulnerabilities in Forest Conservation

  • Finished
  • 2021
  • 2021
  • Deepthi Raghunandan
  • Saotarashmi Bandyopadhyay
  • Dhruva Sahrawat
  • John Dickerson

We predict attacks on tree cover, a green security asset, in subnational regions of Indonesia using a boosted Decision Tree Classifier, the BoostIT algorithm. Our models are based on a thorough literature survey, which found that deforestation occurs in hotspots and that proximity to other anthropogenic activity is the strongest predictor of deforestation in other subnational regions. Coarse-grained prediction of targets vulnerable to attacks is a significant challenge for defenders strategizing in Green Security Games. We find that a boosted Decision Tree Classifier takes minimal resources to build, is accurate in its predictions, and scales well when expanding on the assumptions made regarding the drivers of deforestation. We show that such an algorithm can empower communities to manage forest resources effectively.
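
To make the idea of a boosted decision tree concrete, here is a minimal sketch of generic AdaBoost over one-level decision trees (stumps) in plain Python. It is not the BoostIT algorithm named above, and the two features (distance to the nearest road and to the nearest settlement, in km) and all data values are invented for illustration.

```python
import math


def fit_stump(X, y, weights):
    """Find the single-feature threshold split with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for polarity in (1, -1):
                preds = [polarity if row[feature] >= threshold else -polarity for row in X]
                error = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                if best is None or error < best[0]:
                    best = (error, feature, threshold, polarity)
    return best


def stump_predict(stump, row):
    """Predict +1 or -1 for one row using a fitted stump."""
    _, feature, threshold, polarity = stump
    return polarity if row[feature] >= threshold else -polarity


def fit_adaboost(X, y, rounds=5):
    """Train a weighted ensemble of stumps, reweighting misclassified rows each round."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, weights)
        # Clamp the error away from 0 and 1 so the logarithm stays finite.
        error = min(max(stump[0], 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, stump))
        weights = [
            w * math.exp(-alpha * t * stump_predict(stump, row))
            for w, row, t in zip(weights, X, y)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Invented toy data: [km to nearest road, km to nearest settlement].
# Label +1 means tree cover loss was observed, -1 means it was not.
X = [[0.5, 2.0], [1.0, 1.0], [1.5, 3.0], [6.0, 8.0], [7.5, 5.0], [9.0, 9.0]]
y = [1, 1, 1, -1, -1, -1]

ensemble = fit_adaboost(X, y)
print([predict(ensemble, row) for row in X])
```

On this toy data a single stump already separates the classes, so boosting adds little; the reweighting step matters when no single split classifies every row correctly.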

Deforestation

AI

Lodestar

  • Finished
  • 2019
  • 2021
  • Deepthi Raghunandan
  • Zhe Cui
  • Kartik Krishnan
  • Segen Tirfe
  • Shenzhi Shi
  • Tejaswi Darshan Shrestha
  • Leilani Battle
  • Niklas Elmqvist

Keeping abreast of current trends, technologies, and best practices in visualization and data analysis is becoming increasingly difficult, especially for fledgling data scientists. In this paper, we propose Lodestar, an interactive computational notebook that allows users to quickly explore and construct new data science workflows by selecting from a list of automated analysis recommendations. We derive our recommendations from directed graphs of known analysis states, with two input sources: one manually curated from online data science tutorials, and another extracted through semi-automatic analysis of a corpus of over 6,000 Jupyter notebooks. We evaluate Lodestar in a formative study guiding improvements to the tool and a summative study investigating them. We also showcase the system utility in three examples. Our results suggest that users find Lodestar useful for rapidly creating data science workflows.
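
The recommendation idea described above can be sketched as a directed graph whose edges count how often one analysis step follows another, with the most frequent successors offered as suggestions. This is a simplified illustration, not Lodestar's implementation; the step names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def build_transition_graph(workflows):
    """Count how often each analysis step is directly followed by another."""
    graph = defaultdict(Counter)
    for steps in workflows:
        for current, following in zip(steps, steps[1:]):
            graph[current][following] += 1
    return graph


def recommend_next(graph, current_step, k=3):
    """Return up to k successors of current_step, most frequent first."""
    successors = graph.get(current_step, {})
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(successors.items(), key=lambda item: (-item[1], item[0]))
    return [step for step, _ in ranked[:k]]


# Hypothetical corpus: each list is the ordered steps of one notebook.
workflows = [
    ["load_csv", "describe", "plot_histogram"],
    ["load_csv", "drop_missing", "fit_model"],
    ["load_csv", "describe", "fit_model"],
]

graph = build_transition_graph(workflows)
print(recommend_next(graph, "load_csv"))
```

Ranking by raw transition counts is the simplest choice; a curated graph and a mined graph, as in the two input sources described above, could each be built this way and queried separately.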

Data Science

Machine Learning

Computational Notebooks

Data Visualization

Latent Variable Models for Understanding Transactional Behavior

  • Inactive
  • Deepthi Raghunandan
  • Furong Huang

Developing an unsupervised learning algorithm to characterize personal behavior based on only financial transactions

Location Determination on Mobile Applications

  • Inactive
  • Deepthi Raghunandan
  • Ashok Agrawala

Engineering a real-time system with Android and JEE to construct and convey location using Wi-Fi information

Posts and Presentations

A Bifocal View

  • Deepthi Raghunandan
  • Ameya Patil

Recently, the Association for Computing Machinery (ACM) decided to make its historical SIGCHI video archive available online. For decades, the CHI conference, organized by SIGCHI, has been the main stage for breakthrough ideas and technology in Human-Computer Interaction (HCI) and in related fields such as virtual reality and visualization. The archive includes videos of the CHI conference from 1983 to 2002, including demonstrations and talks that were recorded on magnetic tapes and played via a VHS player. To bring these hidden treasures to light, Dr. Catherine Plaisant, of the University of Maryland’s iSchool, and Ms. Nat DeMenthon, a PhD student at the University of Maryland, took up the monumental task of digitizing the archive, painstakingly reviewing decades of conference proceedings, and obtaining copyright permissions from the authors. This effort is part of the Historical CHI Video Project, and the videos will be made available on the ACM Digital Library with the aim of spreading awareness of the work of HCI pioneers and inspiring future research.

Lodestar at Visual Data Science Workshop at KDD and InfoVis Conferences

  • Deepthi Raghunandan

Coding Projects

Westminster Website

Developed a simple static website for a loan management company.