On Gradient-Based Optimization: Accelerated, Stochastic and Nonconvex
Optimization methods play a key enabling role in statistical inference, both frequentist and Bayesian. Moreover, as statistics embraces computation more fully, what is often meant by "computation" is in fact "optimization". I will discuss some recent progress in high-dimensional, large-scale optimization, where new theory and algorithms have provided non-asymptotic rates, sharp dimension dependence, elegant ties to geometry, and practical relevance. In particular, I will discuss three recent results: (1) a new framework for understanding Nesterov acceleration, obtained by taking a continuous-time Lagrangian/Hamiltonian/symplectic perspective; (2) how to escape saddle points efficiently in nonconvex optimization; and (3) the acceleration of Langevin diffusion.
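
To indicate what the continuous-time perspective in (1) looks like, here is a brief sketch added for illustration (notation follows the standard presentation of these ideas, not the talk itself). For a smooth convex $f$, Nesterov's accelerated gradient method can be viewed as a discretization of the second-order ODE

\[
\ddot{X}_t + \frac{3}{t}\,\dot{X}_t + \nabla f(X_t) = 0,
\]

whose solutions satisfy $f(X_t) - f(x^\star) = O(1/t^2)$. The Lagrangian view generates a whole family of such dynamics: as I recall the standard parameterization, one takes the Euler-Lagrange equations of a Bregman Lagrangian of the form

\[
\mathcal{L}(X, \dot{X}, t) = e^{\alpha_t + \gamma_t}\Big( D_h\big(X + e^{-\alpha_t}\dot{X},\, X\big) - e^{\beta_t} f(X) \Big),
\]

where $D_h$ is a Bregman divergence, which yields rates of order $O(e^{-\beta_t})$ under ideal-scaling conditions on $\alpha_t$, $\beta_t$, $\gamma_t$; symplectic discretization is one route to turning these continuous dynamics back into implementable algorithms.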
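
As a rough illustration of item (2), the following minimal Python sketch adds a small random perturbation whenever the gradient is nearly zero, which is the basic mechanism by which perturbed gradient descent escapes strict saddle points. This is a toy, not the algorithm analyzed in the talk: the step size, thresholds, and test function are illustrative choices, and the full method also uses a function-decrease check after each perturbation to decide when to stop.

import numpy as np

def perturbed_gradient_descent(grad, x0, step=1e-2, grad_tol=1e-3,
                               radius=1e-2, wait=50, n_iters=20_000,
                               seed=0):
    # Gradient descent that adds a small random perturbation whenever the
    # gradient is nearly zero (a candidate saddle point) and no perturbation
    # has been added recently.  Illustrative sketch only.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    last = -np.inf
    for t in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) <= grad_tol and t - last > wait:
            # Uniform draw from a ball of the given radius.
            d = rng.normal(size=x.shape)
            d *= radius * rng.uniform() ** (1.0 / x.size) / np.linalg.norm(d)
            x = x + d
            last = t
        x = x - step * grad(x)
    return x

# Toy example: f(x, y) = x^2 + y^4/4 - y^2/2 has a strict saddle at the
# origin and minimizers at (0, +1) and (0, -1).
grad_f = lambda z: np.array([2.0 * z[0], z[1] ** 3 - z[1]])
print(perturbed_gradient_descent(grad_f, np.zeros(2)))  # lands near (0, +1) or (0, -1)

Starting exactly at the saddle, plain gradient descent would never move; the perturbation pushes the iterate off the unstable manifold, after which ordinary gradient steps carry it to a local minimizer.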