Gradient Estimation with Stochastic Softmax Tricks
Gradient estimation is a central problem in modern machine learning, which relies heavily on gradient-based optimization. For discrete random variables, Gumbel-based relaxed gradient estimators are easy to implement and have low variance, but scaling them to large combinatorial distributions remains an open problem. Working within the perturbation model framework, we introduce stochastic softmax tricks, which generalize the Gumbel-Softmax trick to combinatorial spaces. Our framework offers a unified perspective on existing relaxed estimators for perturbation models and contains many novel relaxations. We design structured relaxations for subset selection, spanning trees, arborescences, and other combinatorial objects, and we consider an application to making machine learning models more explainable.
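As background for the talk, the sketch below illustrates the standard Gumbel-Softmax trick that stochastic softmax tricks generalize: categorical logits are perturbed with Gumbel noise and passed through a tempered softmax, yielding a differentiable, nearly one-hot sample. The NumPy implementation and function name are illustrative assumptions, not code from the talk or the paper.

```python
# Minimal NumPy sketch of the Gumbel-Softmax relaxation (illustrative only).
import numpy as np

def gumbel_softmax_sample(logits, temperature=0.5, rng=None):
    """Draw a relaxed one-hot sample from a categorical distribution.

    logits: unnormalized log-probabilities, shape (..., n_categories).
    temperature: relaxation temperature; as it approaches 0 the sample
        approaches a discrete one-hot vector, at the cost of noisier gradients.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Sample standard Gumbel noise: G = -log(-log(U)), U ~ Uniform(0, 1).
    u = rng.uniform(size=np.shape(logits))
    gumbel = -np.log(-np.log(u))
    # Perturb the logits and apply a tempered softmax.
    perturbed = (np.asarray(logits) + gumbel) / temperature
    perturbed -= perturbed.max(axis=-1, keepdims=True)  # numerical stability
    expd = np.exp(perturbed)
    return expd / expd.sum(axis=-1, keepdims=True)

# Example: a relaxed sample over 4 categories; entries sum to 1 and are
# nearly one-hot at low temperature.
sample = gumbel_softmax_sample(np.log([0.1, 0.2, 0.3, 0.4]), temperature=0.5)
print(sample, sample.sum())
```

Stochastic softmax tricks replace the argmax over categories implicit in this relaxation with a combinatorial optimization problem (e.g., over subsets or spanning trees) and relax it accordingly; the code above shows only the categorical special case.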
Bio: Chris Maddison is an Assistant Professor in the Department of Computer Science and the Department of Statistical Sciences at the University of Toronto. He is a CIFAR AI Chair at the Vector Institute, a research scientist at DeepMind, a member of the ELLIS Society, and a Faculty Affiliate of the Schwartz Reisman Institute for Technology and Society. Maddison works on the methodology of statistical machine learning, with an emphasis on methods that work at scale in deep learning applications. His research interests lie in Bayesian inference, optimization, and discrete search. Previously, he was a member of the Institute for Advanced Study in Princeton, NJ, from 2019 to 2020, and he completed his DPhil at the University of Oxford. Maddison was an Open Philanthropy AI Fellow during his graduate studies. He received a NeurIPS Best Paper Award in 2014. He was a founding member of the AlphaGo project, which received the IJCAI Marvin Minsky Medal for Outstanding Achievements in AI in 2018.