PhD Proposal: Understanding and Enriching the Algorithmic Reasoning Abilities in Neural Networks

Jingling Li
01.21.2022 13:00 to 15:00

IRB 5165

Learning to reason is an essential step toward achieving general intelligence. As the reasoning process can be described as a step-by-step algorithmic procedure, understanding and enriching the algorithmic reasoning capabilities of neural networks has drawn increasing attention in the deep learning community. To bridge algorithms and neural networks, we propose a framework, algorithmic alignment, which characterizes how well a neural network's computation structure aligns with the algorithmic structure of the relevant reasoning process. Based on this framework, we study three important questions across generalization, extrapolation, and robustness.

First, why and when does one network structure generalize better than another, despite having equal expressive power? In studying this, we unify seemingly different reasoning tasks, such as intuitive physics, visual question answering, and shortest paths, through the lens of a powerful algorithmic paradigm, dynamic programming (DP). We then show that Graph Neural Networks (GNNs), a popular class of reasoning models, align well algorithmically with DP, which explains their empirical success on many reasoning tasks. Using the same framework, we also identify limitations of GNNs and demonstrate how algorithmic alignment can guide the design of a structured network for a given task.

Second, what does a neural network learn outside the support of the training distribution? To answer this, we identify conditions under which MLPs and GNNs extrapolate well. We first quantify the observation that ReLU MLPs quickly converge to linear functions along any direction from the origin, implying that ReLU MLPs cannot extrapolate most nonlinear functions. Yet we prove that MLPs can successfully extrapolate a linear target function given sufficient training support.
These results suggest a hypothesis on GNNs, for which we provide theoretical and empirical evidence: the success of GNNs in extrapolating algorithmic tasks to new data (e.g., larger graphs or larger edge weights) relies on encoding task-specific non-linearities in the architecture or features.

Third, how does a neural network's architecture affect its robustness to noisy labels? To answer this, we use the proposed framework to connect a network's robustness to how well its architecture aligns with the target and noise functions. We hypothesize that a network is more robust to noisy labels if its architecture is more aligned with the target function than with the noise, and we provide both theoretical and empirical evidence across a variety of neural network architectures and domains.

These works suggest several interesting directions for our ongoing and future projects, including re-using learned heuristics to improve generalization in reinforcement learning and studying how to embed task-specific inductive biases in representation learning.

Examining Committee:

Chair: Dr. John Dickerson
Department Representative: Dr. Aravind Srinivasan
Members: Dr. Tom Goldstein
         Dr. Petar Velickovic