Variable Selection for Model-Based Clustering of Functional Data
In studying the health effects of radiation, identifying subpopulations from densely sampled functional data is important for detecting late effects of radiation treatment. However, extraneous variables can mask the true group structure, so variable selection is particularly important when a large number of variables must be considered. Few variable selection methods for model-based clustering have been applied to functional data. We propose a greedy search algorithm that integrates variable selection into the clustering procedure, as in Raftery and Dean (2006), but adapted for use with functional data. At each step of our method, two models are compared using the Akaike information criterion corrected for small samples (AICc). One difficulty in implementing this approach is the lack of software for constructing multivariate fully functional linear models of functional data represented by splines. We avoid this obstacle by building the full model from a series of univariate partial functional linear regression models. In a simulation study, our new method successfully identifies the most important variables for clustering. We then apply the method to a study examining the respiratory function of irradiated and non-irradiated mice.
By: Kyra Singh, Tanzy Love*, Eric Hernandy, Jacob Finkelstein, and Jacqueline Williams.
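To make the model comparison step in the abstract concrete, the following is a minimal sketch of the small-sample corrected AIC and of a single pairwise comparison. It uses the standard AICc formula, AIC + 2k(k+1)/(n - k - 1). The function names, the labels "clustering" and "regression" for the two competing models, and the numbers in the usage note are illustrative assumptions, not the authors' implementation.

```python
def aicc(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Small-sample corrected AIC: -2*logL + 2k + 2k(k+1)/(n-k-1)."""
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc is undefined unless n_obs > n_params + 1")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    return aic + (2.0 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


def prefers_clustering_model(ll_cluster: float, k_cluster: int,
                             ll_regress: float, k_regress: int,
                             n_obs: int) -> bool:
    """Hypothetical decision rule for one greedy step (an assumption).

    Returns True when the model in which the candidate variable carries
    clustering information has the lower AICc than the model in which it
    is only explained by regression on the already selected variables.
    """
    return aicc(ll_cluster, k_cluster, n_obs) < aicc(ll_regress, k_regress, n_obs)
```

For example, with made-up values `prefers_clustering_model(-100.0, 5, -105.0, 3, 50)`, the first model has the lower AICc despite its two extra parameters, so under this assumed rule the candidate variable would be kept for clustering.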