Abstract Details

Activity Number: 314 - Recent Advances in High-Dimensional Inferences
Type: Invited
Date/Time: Tuesday, August 1, 2017 : 10:30 AM to 12:20 PM
Sponsor: IMS
Abstract #325050
Title: Overlapping clustering with LOVE
Author(s): Florentina Bunea*
Companies: Cornell
Keywords: Clustering ; Overlap ; Latent model ; Identifiability ; Support recovery ; High dimension

The area of overlapping variable clustering with statistical guarantees is largely unexplored. We propose a novel Latent model-based OVErlapping clustering method (LOVE) to recover overlapping sub-groups of a p-dimensional vector X from a sample of size n on X, with p allowed to be larger than n. In our model-based formulation, a cluster is given by the variables associated with the same latent factor. Clusters are anchored by a few components of X that are each associated with only one latent factor, while the majority of the X-components may have multi-factor association. We prove that, under minimal conditions, these clusters are identifiable up to label switching. LOVE first estimates the set of variables that anchor each cluster, along with the number of clusters. In a second step, the clusters are populated with the variables that have multiple associations. Under minimal signal strength conditions, LOVE recovers the population-level overlapping clusters consistently. The practical relevance of LOVE is illustrated through the analysis of an RNA-seq data set aimed at determining the functional annotation of genes with unknown function.
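
The cluster definition above can be illustrated with a small sketch. It is not the LOVE estimator: it assumes the variable-to-factor associations are already known and stored in a loading matrix A (one row per variable, one column per latent factor), and it only reads off the population-level clusters and anchor variables from the support of A. The matrix A, its values, and the function names are hypothetical and chosen for illustration.

```python
def clusters_from_loadings(A, tol=1e-12):
    """One cluster per latent factor: the variables with a nonzero loading on it."""
    n_factors = len(A[0])
    return [
        [j for j, row in enumerate(A) if abs(row[k]) > tol]
        for k in range(n_factors)
    ]


def anchor_variables(A, tol=1e-12):
    """Variables associated with exactly one latent factor."""
    return [
        j for j, row in enumerate(A)
        if sum(abs(a) > tol for a in row) == 1
    ]


# Hypothetical loading matrix: 6 variables, 2 latent factors.
A = [
    [1.0, 0.0],   # anchors factor 0
    [1.0, 0.0],   # anchors factor 0
    [0.0, 1.0],   # anchors factor 1
    [0.0, -1.0],  # anchors factor 1
    [0.5, 0.5],   # associated with both factors
    [0.3, -0.7],  # associated with both factors
]

clusters = clusters_from_loadings(A)
anchors = anchor_variables(A)
print(clusters)
print(anchors)
```

In this toy example variables 4 and 5 belong to both clusters, which is the overlap the method is designed to recover; in practice A is unknown and must be estimated from the n observations of X.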

Authors who are presenting talks have a * after their name.

