Variable clustering with G-models: CORD vs PECOK
The problem of variable clustering is that of grouping similar components of a p-dimensional vector X=(X1,…,Xp), and estimating these groups from n independent copies of X. When cluster similarity is defined via G-latent models, in which groups of X-variables have a common latent generator and the groups correspond to an unknown partition G of the index set {1,…,p}, the most natural clustering strategy is K-means. We explain why this strategy cannot lead to perfect cluster recovery and offer a correction, based on semi-definite programming, that can be viewed as a penalized convex relaxation of K-means (PECOK). We introduce a cluster separation measure tailored to G-latent models, and derive its minimax lower bound for perfect cluster recovery. The clusters estimated by PECOK are shown to recover G at a near minimax optimal cluster separation rate, a result that holds true even if K, the number of clusters, is estimated adaptively from the data. We contrast this with cluster estimation via CORD, a method tailored to the more general class of G-block covariance models. We also compare PECOK with appropriate corrections of spectral clustering-type procedures, and show that the former outperforms the latter for perfect cluster recovery of minimally separated clusters.
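To make the phrase "penalized convex relaxation of K-means" concrete, the PECOK criterion can be sketched as a semi-definite program. The notation below is an assumption for illustration (it is not spelled out in this abstract): Σ̂ denotes the sample covariance of X, Γ̂ a diagonal correction estimating the noise levels (the "penalty" that distinguishes PECOK from an uncorrected relaxation), and B a p×p relaxed partnership matrix whose block structure encodes the partition.

```latex
% Hedged sketch of a PECOK-style SDP; Sigma-hat, Gamma-hat, B as assumed above.
\widehat{B} \;\in\; \arg\max_{B \in \mathbb{R}^{p\times p}}
  \;\big\langle \widehat{\Sigma} - \widehat{\Gamma},\, B \big\rangle
\quad \text{subject to} \quad
B \succeq 0, \qquad
\sum_{b=1}^{p} B_{ab} = 1 \ \ \forall a, \qquad
\operatorname{tr}(B) = K, \qquad
B_{ab} \ge 0 \ \ \forall a,b.
```

Under this reading, an estimated partition Ĝ is recovered from the (approximately block-constant) rows of B̂; dropping the correction term Γ̂ recovers a plain convex relaxation of K-means, which, as the abstract notes, cannot achieve perfect cluster recovery in G-latent models.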