Local Signal Adaptivity: Feature Learning in Neural Networks Beyond Kernels
Neural networks have been shown to significantly outperform kernel methods (including neural tangent kernels) in problems such as image classification. Most theoretical explanations of this performance gap focus on learning a complex or stylized hypothesis class. In this talk, I will present a related but simpler hypothesis class, inspired by natural images, that explains this performance gap through the task of finding a sparse signal in the presence of noise. Specifically, we show that, for a simple data distribution with a sparse signal amidst high-variance noise, a convolutional neural network trained using stochastic gradient descent learns to threshold out the noise and find the signal. In contrast, the corresponding neural tangent kernel, with its fixed set of predetermined features, is unable to adapt to the signal in this manner. This is joint work with Stefani Karp, Ezra Winston and Yuanzhi Li.
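To make the setup concrete, here is a minimal, purely illustrative sketch of the kind of distribution and comparison described above; it is not the construction from the talk. All choices (the number of patches P, the signal vector w and its magnitude, the noise scale sigma, the hidden width) are assumptions, and the frozen-first-layer model is only a crude stand-in for a fixed-feature kernel predictor such as the NTK.

```python
# Illustrative sketch only: parameters (P, d, sigma, signal magnitude, width)
# are assumptions, not the construction analyzed in the talk/paper.
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
P, d, sigma = 32, 10, 2.0                     # patches per image, patch dim, noise std
w = torch.zeros(d); w[0] = 3.0                # sparse signal direction (assumed)

def sample(n):
    # Each example: P patches; one random patch carries the signal +/- w
    # (sign = label), every other patch is high-variance Gaussian noise.
    y = torch.randint(0, 2, (n,)).float() * 2 - 1
    x = sigma * torch.randn(n, P, d)
    loc = torch.randint(0, P, (n,))
    x[torch.arange(n), loc] = y[:, None] * w
    return x, y

class PatchCNN(nn.Module):
    # Weight-shared "convolution" over patches: ReLU units whose learnable
    # bias can threshold out noise patches, followed by sum-pooling.
    def __init__(self, k=32):
        super().__init__()
        self.conv = nn.Linear(d, k)
        self.out = nn.Linear(k, 1, bias=False)
    def forward(self, x):                     # x: (n, P, d)
        h = torch.relu(self.conv(x))          # per-patch features
        return self.out(h.sum(dim=1)).squeeze(-1)  # pool over patches

def train(model, fixed_features=False, steps=2000, lr=0.05):
    # fixed_features=True freezes the first layer (random features),
    # training only the linear output: a crude fixed-kernel baseline.
    params = model.out.parameters() if fixed_features else model.parameters()
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(steps):
        x, y = sample(128)
        loss = F.softplus(-y * model(x)).mean()   # logistic loss
        opt.zero_grad(); loss.backward(); opt.step()
    return model

@torch.no_grad()
def accuracy(model):
    x, y = sample(4096)
    return ((model(x) > 0).float() * 2 - 1).eq(y).float().mean().item()

cnn = train(PatchCNN())
ntk_proxy = train(PatchCNN(), fixed_features=True)
print(f"trained CNN acc: {accuracy(cnn):.3f}, fixed-feature acc: {accuracy(ntk_proxy):.3f}")
```

The intuition the sketch is meant to surface: a trained first layer can align a filter with w and set a negative bias so that ReLU zeroes out the noise patches before pooling, whereas with frozen random features the summed noise from the P-1 noise patches is baked into every pooled feature and cannot be thresholded away.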