Minimum weighted norm interpolation and Fourier scattering
Many modern data science methods contain more parameters than training samples. Over-parameterization enables an algorithm to fit the training data exactly, challenging conventional wisdom, which instead advocates under-parameterization. In the first part of this talk, we consider a concrete setting in which $n$ training points are interpolated by a function belonging to an $N$-dimensional Hilbert space, chosen as a minimizer of a weighted norm. We relate this method to kernel interpolation and show that both the interpolating functions and their generalization errors converge as $N$ tends to infinity. This rigorously establishes the inductive bias of weighted norm interpolation and shows that the method can succeed even under extreme over-parameterization. In the second part of this talk, we discuss how weighted norm minimization relates to scattering transforms, which are examples of highly over-parameterized methods that also enjoy good generalization on a variety of datasets.
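The sketch below illustrates the kind of interpolation scheme described above; it is a minimal example, not the talk's exact construction. It assumes a truncated complex Fourier basis on $[0,1]$, weights that grow with frequency, and illustrative names such as `min_weighted_norm_interpolant`; the specific basis, weights, and problem sizes are chosen only for demonstration.

```python
import numpy as np

def min_weighted_norm_interpolant(x_train, y_train, N, weights):
    """Fit the minimum weighted-norm interpolant in a truncated Fourier basis.

    Solves  min_c  sum_k weights[k] * |c_k|^2   subject to  Phi c = y,
    where Phi[i, k] = phi_k(x_train[i]) and phi_k(x) = exp(2*pi*i*k*x).
    """
    ks = np.arange(N)
    # Design matrix of the first N complex Fourier modes evaluated at the samples.
    Phi = np.exp(2j * np.pi * np.outer(x_train, ks))            # shape (n, N)

    # Closed-form minimizer: c = D^{-1} Phi^* (Phi D^{-1} Phi^*)^{-1} y,
    # with D = diag(weights). The n x n Gram matrix Phi D^{-1} Phi^* is the
    # kernel matrix of the associated weighted reproducing kernel Hilbert space.
    Dinv_PhiH = Phi.conj().T / weights[:, None]                  # shape (N, n)
    K = Phi @ Dinv_PhiH                                          # shape (n, n)
    c = Dinv_PhiH @ np.linalg.solve(K, y_train.astype(complex))  # coefficients

    def f(x):
        # Evaluate the interpolant at new points x.
        return np.exp(2j * np.pi * np.outer(np.atleast_1d(x), ks)) @ c
    return f

# Usage: interpolate n = 10 samples with N = 200 >> n Fourier modes,
# using weights that penalize high frequencies (an assumed choice).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(size=10))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)
w = (1.0 + np.arange(200)) ** 2
f_hat = min_weighted_norm_interpolant(x, y, N=200, weights=w)
print(np.max(np.abs(f_hat(x) - y)))  # ~0: the training data are fit exactly
```

The Gram matrix computed above is one way to see the connection to kernel interpolation mentioned in the abstract: minimizing the weighted norm subject to exact interpolation is equivalent to kernel interpolation with the kernel $K(x, x') = \sum_k w_k^{-1} \phi_k(x) \overline{\phi_k(x')}$, and the choice of weights encodes the inductive bias of the method.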