Why do neural networks learn?
Neural networks used in practice have millions of parameters and yet they generalize well even when they are trained on small datasets. While there exists networks with zero training error and a large test error, the optimization algorithms used in practice magically find the networks that generalizes well to test data. How can we characterize such networks? What are the properties of networks that generalize well? How do these properties ensure generalization?
In this talk, we will develop techniques to understand generalization in neural networks. Towards the end, I will show how this understanding can help us design architectures and optimization algorithms with better generalization performance.