Abstract Details

Activity Number: 131
Type: Contributed
Date/Time: Monday, August 1, 2016 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #320082
Title: A Statistical Algorithm for Phantom Clustering Using PPCA
Author(s): Wei Q. Deng* and Radu V. Craiu
Companies: University of Toronto and University of Toronto
Keywords: Probabilistic Principal Component Analysis ; Penalized likelihood ; Cluster dimension ; Shrinkage ; Factor analysis

In this paper we extend the idea of spectral clustering using probabilistic PCA (PPCA) to cluster panel data. The key challenge is to determine the true number of clusters. A number of existing solutions assume a factor analysis model in which both the observations and a factor matrix are observed and the loading matrix (W) is estimated. However, when only the observations are available, a latent "phantom" random vector can be used to account for the clustering structure. Under the widely applicable assumption that the number of clusters is small relative to the sample size, a penalized form of PPCA is implemented to directly estimate the number of clusters (p). We show theoretically that the penalized MLE p0 is consistent for reasonable choices of the penalty parameter. This approach resembles shrinkage estimation, since the last N-p0 singular values of the estimated W are shrunk to zero. We demonstrate with data from Google Domestic Trend searches that search terms assigned to the same cluster are conceptually consistent, and a visual inspection of the overlaid raw data confirms the shared similarity of trends over time.
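
The idea of choosing the latent dimension by penalized likelihood can be illustrated with a minimal sketch. The abstract does not specify the authors' penalty, so the code below substitutes a standard BIC penalty on the closed-form PPCA maximized log-likelihood (Tipping and Bishop); it is a stand-in for illustration, not the authors' algorithm. The function names `ppca_profile_loglik` and `select_latent_dim`, the symbols `n`, `d`, and `q`, and the simulated data are all assumptions introduced here.

```python
import numpy as np


def ppca_profile_loglik(eigvals, n, q):
    """Maximized PPCA log-likelihood for latent dimension q.

    eigvals: eigenvalues of the sample covariance, sorted in decreasing order.
    The noise variance MLE is the mean of the d - q smallest eigenvalues.
    """
    d = eigvals.size
    sigma2 = eigvals[q:].mean()
    return -0.5 * n * (
        d * np.log(2 * np.pi)
        + np.log(eigvals[:q]).sum()
        + (d - q) * np.log(sigma2)
        + d
    )


def select_latent_dim(X):
    """Pick q in 0..d-1 maximizing the BIC-penalized PPCA log-likelihood.

    Assumption: a BIC penalty stands in for the unspecified penalty
    described in the abstract.
    """
    n, d = X.shape
    Xc = X - X.mean(axis=0)
    # eigvalsh returns ascending eigenvalues, so reverse them.
    eigvals = np.linalg.eigvalsh(Xc.T @ Xc / n)[::-1]
    eigvals = np.clip(eigvals, 1e-12, None)  # guard against log(0)
    scores = []
    for q in range(d):
        # Free parameters: loadings up to rotation, noise variance, mean.
        n_params = d * q - q * (q - 1) / 2 + 1 + d
        loglik = ppca_profile_loglik(eigvals, n, q)
        scores.append(loglik - 0.5 * n_params * np.log(n))
    return int(np.argmax(scores)), np.array(scores)


# Simulated example (hypothetical data): 3 latent factors in 10 dimensions.
rng = np.random.default_rng(0)
n, d, true_q = 1000, 10, 3
W = 3.0 * rng.standard_normal((d, true_q))   # loading matrix
Z = rng.standard_normal((n, true_q))         # latent "phantom" factors
X = Z @ W.T + 0.5 * rng.standard_normal((n, d))

q_hat, scores = select_latent_dim(X)
print("selected latent dimension:", q_hat)
```

In this sketch the penalty grows with the number of free loading parameters, so dimensions whose eigenvalues are indistinguishable from noise do not pay for themselves. This mirrors, loosely, the shrinkage behavior the abstract describes, in which the trailing singular values of the estimated W are set to zero.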

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

Copyright © American Statistical Association