PhD Proposal: Understanding and Enhancing Machine Learning Models with Theoretical Foundations

Talk
Zhengmian Hu
Time: 02.02.2024, 10:00 to 12:00
Location: IRB-5105

I will present a study of neural network kernel distributions, aimed at understanding the performance and scalability of neural networks. Specifically, I investigate the distributions of the Conjugate Kernel (CK) and the Neural Tangent Kernel (NTK) for ReLU networks under random initialization. Through rigorous analysis, I derive precise distributions of the diagonal elements of these kernels. For a feedforward network, these values converge in law to a log-normal distribution as the network depth d and width n tend to infinity simultaneously, with the variance of the log diagonal elements proportional to d/n. For a residual network, in the limit where the number of branches m tends to infinity while the width n stays fixed, the diagonal elements of the Conjugate Kernel converge in law to a log-normal distribution whose log-variance is proportional to 1/n, and the diagonal elements of the NTK converge in law to a log-normally distributed variable times the Conjugate Kernel of a feedforward network. These theoretical findings suggest that residual networks can remain trainable even in the limit of infinite branching with constant network width. Numerical experiments are conducted, and the results validate the soundness of the theoretical analysis.
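As an illustration (not part of the talk itself), the feedforward claim can be probed with a small Monte Carlo sketch: propagate a fixed input through a randomly initialized ReLU network, record the width-normalized squared norm of the final activations (a diagonal Conjugate Kernel entry), and inspect the spread of its logarithm. The layer sizes, He-style initialization scale, trial count, and the helper name ck_diagonal are all assumptions made for this sketch, not details taken from the talk.

```python
import numpy as np

# Minimal sketch (illustrative only): estimate the distribution of a diagonal
# Conjugate Kernel entry of a ReLU feedforward network at random initialization.
# The abstract states that the log of this quantity is approximately normal,
# with variance proportional to depth/width.

def ck_diagonal(x, depth, width, rng):
    """Propagate x through `depth` random ReLU layers (He-style initialization)
    and return the width-normalized squared norm of the final activations."""
    h = x
    for _ in range(depth):
        W = rng.normal(0.0, np.sqrt(2.0 / h.size), size=(width, h.size))
        h = np.maximum(W @ h, 0.0)  # ReLU activation
    return np.sum(h ** 2) / width

rng = np.random.default_rng(0)
n_in, width, depth, trials = 64, 64, 32, 2000   # assumed sizes for the demo
x = rng.normal(size=n_in)
x *= np.sqrt(n_in) / np.linalg.norm(x)          # set ||x||^2 = n_in

log_ck = np.log([ck_diagonal(x, depth, width, rng) for _ in range(trials)])
print(f"var(log CK diag) = {log_ck.var():.3f}")
print(f"depth / width    = {depth / width:.3f}  (log-variance should scale with this ratio)")
```

With these assumed sizes, increasing the depth or shrinking the width should visibly widen the spread of log CK across trials, which is the qualitative behavior that the d/n scaling in the abstract describes; the exact proportionality constant is not claimed here.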