Analytic pipelines are impacted by and create errors in biological databases

Talk
Mihai Pop
Time: 12.05.2025 11:00 to 12:00

Biological databases are growing so rapidly that scientists can no longer verify the data and correct errors. The impact of database errors on the conclusions of analytic workflows that rely on these databases is not currently well understood. Given the increasing reliance of both biomedical research and clinical practice on computational analytics, it is important to develop a better understanding of how data and software interact. I will describe new results from my lab that demonstrate that some classifiers can be influenced by even small errors in the data, and that using computationally inferred labels in databases can skew the classification output. These results underscore the need for deeper research into the interaction between software and data in biomedical applications.