Using Subset Log-Likelihoods to Trim Outliers in Gaussian Mixture Models
Speaker:
Paul McNicholas, McMaster University
Date and Time:
Tuesday, November 12, 2019 - 4:00pm to 4:30pm
Location:
Fields Institute, Stewart Library
Abstract:
Mixtures of Gaussian distributions are a popular choice in model-based clustering. Outliers can affect parameter estimation and, as such, must be accounted for. Algorithms such as TCLUST discern the most likely outliers, but only when the proportion of outlying points is known a priori. It is shown that, for a finite Gaussian mixture model, the log-likelihoods of the subset models are beta-distributed. An algorithm, called OCLUST, is then proposed that predicts the proportion of outliers by measuring the adherence of a set of subset log-likelihoods to a beta reference distribution. The OCLUST approach is compared to popular alternatives using simulated and real data.
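
To illustrate what a subset log-likelihood is, the following minimal sketch refits a model to each subset of the data and records the maximised log-likelihood. It is not the OCLUST algorithm: the abstract does not define the subsets, so leave-one-out subsets are assumed here, the model is reduced to a single univariate Gaussian (a one-component mixture), and the comparison against a beta reference distribution is omitted. The function names and the toy data are hypothetical.

```python
import math
import random


def gaussian_mle_loglik(xs):
    """Maximised log-likelihood of a univariate Gaussian fitted to xs."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n  # MLE variance (divides by n)
    # At the MLE the quadratic term sums to n / 2.
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)


def subset_logliks(xs):
    """Log-likelihood of each subset model (assumption: leave-one-out subsets)."""
    return [gaussian_mle_loglik(xs[:i] + xs[i + 1:]) for i in range(len(xs))]


# Toy data (hypothetical): 50 draws from N(0, 1) plus one planted outlier.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(50)] + [10.0]

lls = subset_logliks(data)

# The subset with the highest log-likelihood is the one that left out
# the point the fitted model explains worst.
candidate = max(range(len(data)), key=lls.__getitem__)
print("most likely outlier index:", candidate)
```

In this simplified setting, the point whose removal raises the log-likelihood the most is the natural outlier candidate. Deciding how many such points to trim is the part the talk addresses with the beta reference distribution.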