Panning for gold: Interpretable and error-controlled hypothesis generation from biomedical data

Talk
Yang Lu
Cheriton School of Computer Science, University of Waterloo
Time: 02.20.2024 11:00 to 12:00

Rapid developments in high-throughput sequencing have enabled biologists to collect large volumes of multi-omics data at unprecedented resolution. However, interpreting this growing amount of heterogeneous biological data is highly nontrivial. In this talk, I will present a data-driven research paradigm for discovering testable hypotheses directly from biological data in an interpretable and error-controlled fashion. The talk will focus on three recent works that span the critical components of biomedical research (data collection, hypothesis generation, and hypothesis prioritization). (1) An interpretation method that generates testable biological hypotheses from deep learning models. Specifically, I developed an uncertainty-aware method that identifies, from single-cell RNA-seq data, a combinatorial gene set signature to characterize a single-cell type. This method pioneers efforts to streamline existing single-cell analysis pipelines through a unified framework for easy interpretation. (2) A statistical method that subjects the hypotheses generated from deep learning models to error control, without relying on p-values. This method demonstrated to the community for the first time that the interpretation of deep learning models can achieve confidence guarantees. (3) A critical reevaluation of the problematic statistical significance estimation in the Basic Local Alignment Search Tool (BLAST), a cornerstone tool used in daily biomedical analysis for the past 30 years. We introduced an alternative method that addresses this issue and does not yield inflated estimates of significance. Our study has the potential to influence and reshape numerous conclusions drawn by researchers.