PhD Proposal: Beyond Descriptive xAI: Cross-Domain Methods for Converting Interpretability into Model Improvement

Talk

Sai Yerramreddy

Time:

06.02.2026 14:00 to 15:30

Location:

University of Maryland Institute for Health Computing - Meeting Room 4 https://umd.zoom.us/j/3473881918

URL:

https://talks.cs.umd.edu/talks/4631

Machine learning models can deliver high accuracy, but they often do so while obscuring the evidence behind their decisions. Practitioners increasingly need more than accuracy alone: they need to understand what drives a prediction, whether the model will generalize, and how to correct problematic behavior. Explainable AI (xAI) seeks to provide this transparency, but explanations on their own rarely fix underlying flaws, leaving a gap between knowing why a model behaves a certain way and making it behave better. This thesis attempts to close that gap by treating explanations not as descriptive artifacts but as diagnostics and actionable training signals for building more robust, faithful models. The approach is investigated across three domains: software static analysis, scientific imaging, and computational drug discovery.

In the first part of this thesis, we study machine-learning-based triage of static analysis reports: the warnings produced by automated code-scanning tools, the majority of which are false positives that developers must manually filter. A large-scale empirical comparison across multiple analyzers and languages reveals two complementary patterns: classifiers with comparable overall accuracy disagree substantially on which specific warnings they label true or false, and simple data-preparation choices can swing accuracy by as much as sixteen points. Both findings point to the same gap: aggregate accuracy tells us little about what a classifier is actually attending to. This raises a critical question: are these filters relying on the right evidence, or merely latching onto superficial code patterns? To answer this, we curate the first line-level, relevance-annotated dataset for static analysis and demonstrate that even high-accuracy filters frequently attend to irrelevant code.
By adapting two relevance-guided training strategies (prediction consistency under masking and explanation-guided learning), we improve classification accuracy while also increasing the top-1 relevant-line hit rate from 6-35% to 12-63%.

In the second part, we extend our work to scientific imaging, where post-hoc xAI often falters: networks fixate on dataset-source artifacts, saliency maps contradict domain experts, and attributions fail to capture the domain's native vocabulary. We argue for a representational rather than algorithmic solution. By converting scientific images into tabular features with direct physical meaning, inherently interpretable models like Explainable Boosting Machines (EBMs) become readily deployable, yielding explanations immediately legible to experts. Benchmarking eight model families on two scientific image datasets (dark solitons in Bose-Einstein condensates and quantum-dot triangle plots) and four external validation sets, we uncover a stable faithfulness-robustness trade-off governed by architecture. EBMs uniquely occupy the Pareto frontier, matching the classification accuracy of the best black-box ensembles. We further extend this benchmark to frontier large language models, showing that classification quality is bounded by the strength of the LLM's domain priors.

In the final part, we propose to apply this framework to computational drug discovery, where shortcut learning during active learning can derail virtual screening campaigns. We will develop a neural-screening pipeline that pairs molecular docking with graph neural network surrogates, employing counterfactual trust checks and xAI-derived diagnostics to detect and correct reliance on shortcuts. Together, these studies translate interpretability insights into measurable gains in robustness, stability, and practical utility.
The methods contributed here (relevance-guided training, interpretable representational shifts, and xAI-driven trust diagnostics) generalize naturally to other high-stakes domains where accountability matters as much as accuracy. The unifying claim is that when explanations are treated as inputs to training rather than as post-hoc decoration, they cease to function solely as audit artifacts and instead become useful engineering tools.