Scalable semiparametric inference for the means of heavy-tailed distributions
Heavy-tailed distributions present a difficult setting for inference. They are also common in industrial applications, particularly in Internet transaction datasets, and machine learners often analyze such data without considering the biases and risks that come from misusing standard tools. This paper outlines a procedure for inference about the mean of a (possibly conditional) heavy-tailed distribution. The procedure combines nonparametric analysis for the bulk of the support with Bayesian parametric modeling, motivated by extreme value theory, for the heavy tail. It is fast and massively scalable. The resulting point estimators attain the lowest possible error rates and, unlike alternative methods, the procedure provides accurate uncertainty quantification for these estimators. The work should find application in any setting where correct inference is important and reward tails are heavy; we illustrate the framework in causal inference for A/B experiments involving data from hundreds of millions of eBay.com users.
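The abstract describes the bulk/tail split only at a high level. A minimal Python sketch of the general idea follows. It rests on our own assumptions and is not the authors' implementation. The mean decomposes as E[X] = (1 - p) E[X | X <= u] + p E[X | X > u] for a threshold u, where p = P(X > u). Here the bulk term and p are handled with a Bayesian bootstrap, which is one nonparametric choice among several. The tail above u is modeled as an exact Pareto distribution (a special case of the generalized Pareto family from extreme value theory) with a conjugate Gamma prior on its index alpha. The function names `semiparametric_mean_draws` and `summarize`, the threshold choice, and the prior values `a0`, `b0` are all hypothetical.

```python
import math
import random


def semiparametric_mean_draws(x, u, n_draws=1000, a0=1.0, b0=1.0, seed=0):
    """Approximate posterior draws of E[X] for positive data x split at threshold u.

    Bulk (x <= u): Bayesian bootstrap, i.e. flat Dirichlet weights on observations.
    Tail (x > u): Pareto model P(X > t | X > u) = (t / u) ** (-alpha) for t > u,
    with a conjugate Gamma(a0, rate=b0) prior on alpha. Prior values are hypothetical.
    """
    if u <= 0 or any(v <= 0 for v in x):
        raise ValueError("this sketch assumes positive data and a positive threshold")
    bulk = [v for v in x if v <= u]
    tail = [v for v in x if v > u]
    if not bulk or not tail:
        raise ValueError("threshold must leave observations on both sides")
    rng = random.Random(seed)
    k = len(tail)
    # Sufficient statistic for alpha: sum of log exceedance ratios.
    s = sum(math.log(v / u) for v in tail)
    draws = []
    for _ in range(n_draws):
        # Flat Dirichlet weights via normalized Exp(1) draws (Bayesian bootstrap).
        w_bulk = [rng.expovariate(1.0) for _ in bulk]
        w_tail = [rng.expovariate(1.0) for _ in tail]
        sb, st = sum(w_bulk), sum(w_tail)
        p_tail = st / (sb + st)  # posterior draw of P(X > u)
        bulk_mean = sum(w * v for w, v in zip(w_bulk, bulk)) / sb
        # Conjugate update: alpha | tail ~ Gamma(a0 + k, rate = b0 + s).
        alpha = rng.gammavariate(a0 + k, 1.0 / (b0 + s))  # 2nd arg is the scale
        # E[X | X > u] = u * alpha / (alpha - 1), finite only when alpha > 1.
        tail_mean = u * alpha / (alpha - 1.0) if alpha > 1.0 else math.inf
        draws.append((1.0 - p_tail) * bulk_mean + p_tail * tail_mean)
    return draws


def summarize(draws, level=0.9):
    """Posterior median and equal-tailed interval from sorted draws."""
    s = sorted(draws)
    n = len(s)
    med = s[(n - 1) // 2]
    lo = s[int((1 - level) / 2 * (n - 1))]
    hi = s[int((1 + level) / 2 * (n - 1))]
    return med, (lo, hi)


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.paretovariate(3.0) for _ in range(5000)]  # alpha = 3, true mean 1.5
    u = sorted(data)[int(0.9 * len(data))]  # hypothetical choice: 90th percentile
    med, (lo, hi) = summarize(semiparametric_mean_draws(data, u, n_draws=200))
    print(f"posterior median {med:.3f}, 90% interval ({lo:.3f}, {hi:.3f})")
```

In this sketch, conjugacy means the tail posterior depends on the data only through the exceedance count k and the sum of log ratios s. Both can be accumulated in a single pass over sharded data. That property is one plausible route to scalability, though the abstract does not say how the paper achieves it.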