Comparing EM to a greedy search algorithm to optimize ICL for mixture models
The integrated complete-data likelihood (ICL) is a popular criterion in model-based clustering for choosing the number of clusters in a finite mixture model. Typically, the ICL is computed using a BIC-like approximation, which depends on maximum likelihood estimates found using the expectation-maximisation (EM) algorithm. An alternative method for clustering with the ICL calculates the exact ICL in closed form within a Bayesian framework. A greedy search (GS) algorithm is then used to allocate observations to clusters so as to maximise the ICL directly and hence obtain an optimal clustering solution. This approach can be used to search the model space and cluster the data simultaneously. To better understand the properties of the GS method, we conducted an extensive simulation study comparing its performance to the standard EM approach in terms of the number of clusters selected, clustering accuracy, and computational cost. The performance of the methods on real data is also discussed.
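For orientation, the BIC-like approximation mentioned above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: it assumes the commonly used form log p(x, z_hat | theta_hat) - (nu / 2) log n on the log-likelihood scale, where z_hat is the hard (maximum a posteriori) allocation. The function name `icl_bic` and its arguments `loglik`, `resp`, and `n_params` are hypothetical; `resp` is assumed to hold the EM posterior membership probabilities, one row per observation.

```python
import numpy as np

def icl_bic(loglik, resp, n_params):
    """BIC-like ICL approximation on the log-likelihood scale (larger is better).

    Hypothetical helper for illustration only.
    loglik   : maximised observed-data log-likelihood from EM
    resp     : (n, K) array of posterior membership probabilities
    n_params : number of free parameters in the fitted mixture
    """
    resp = np.asarray(resp, dtype=float)
    n = resp.shape[0]
    # Hard (MAP) allocation: each observation goes to its most probable cluster.
    labels = resp.argmax(axis=1)
    # log p(z_hat | x, theta_hat): sum of log-probabilities of the chosen clusters.
    log_post = np.log(resp[np.arange(n), labels]).sum()
    # Complete-data log-likelihood minus the BIC-style complexity penalty.
    return loglik + log_post - 0.5 * n_params * np.log(n)
```

Under this form, a fit with well-separated clusters (membership probabilities close to 0 or 1) incurs almost no penalty beyond that of BIC, whereas overlapping clusters lower the criterion, which is why the ICL tends to favour fewer, better-separated clusters.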