Analyzing Programs in the Era of Software 2.0

Xin Zhang
02.27.2020 11:00 to 12:00

IRB 4105

With the software industry experiencing a major shift to machine learning, the programming systems community faces both opportunities and challenges. On one hand, advances in machine learning provide new toolkits for building better programming systems that ensure software quality. On the other hand, as machine learning programs are increasingly used in critical applications, it is now paramount to ensure their quality as well. In this talk, I will describe a set of new analysis techniques that address these opportunities and challenges.

First, I will talk about a data-driven framework for improving program analyses. It enables both online and offline learning by incorporating probabilities into the representation, which is conventionally purely logical. While the logical part still encodes the expert knowledge of the analysis designer and ensures correctness, the probabilistic part offers new abilities to handle uncertainty. Our approach reduces the number of false positives by 70% for foundational program analyses such as data race detection and pointer analysis. In addition, our inference engine can solve problems containing up to 10^30 clauses from various domains, including program analysis, statistical AI, and Big Data analytics.

While existing program analyses work well with conventional programs, they cannot be applied to the novel properties that arise in machine learning. To address this challenge, we have developed program analyses for emerging properties such as interpretability and fairness. Our interpretability analysis is the first to use corrections as actionable feedback on judgments made by a neural network, and our fairness analysis scales to models more than five orders of magnitude larger than the largest previously verified model. To enable building machine learning programs that satisfy these properties by construction, we have also developed a probabilistic programming language that supports distributional inference and causal inference.