
All Times EDT

Abstract Details

Activity Number: 347 - Recent Advances in Clustering and Mixture Models Analysis
Type: Topic-Contributed
Date/Time: Thursday, August 12, 2021 : 10:00 AM to 11:50 AM
Sponsor: Section for Statistical Programmers and Analysts
Abstract #317070
Title: Sparse Topic Modeling: Computational Efficiency and Near-Optimal Algorithms
Author(s): Ruijia Wu* and Linjun Zhang and Tony Cai
Companies: Department of Statistics, University of Pennsylvania and Rutgers University and University of Pennsylvania
Keywords: Topic modeling; Matrix factorization; High-dimensional statistics; Estimation
Abstract:

Sparse topic modeling under the probabilistic latent semantic indexing (pLSI) model is studied. Novel, computationally fast algorithms are proposed for estimating both the word-topic matrix and the topic-document matrix, and their theoretical properties are investigated. Our algorithm for the word-topic matrix first finds anchor words and then solves for the matrix. We treat the recovery of the topic-document matrix as a multinomial regression problem with non-negativity and column-sum constraints. Both minimax upper and lower bounds are established, and the results show that the proposed algorithms are rate-optimal up to a logarithmic factor. Simulation results show that the proposed algorithms perform well numerically and are more accurate than existing methods in the literature across a range of simulation settings.
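
The "find anchor words, then solve for the matrix" idea can be illustrated with a small sketch. It assumes the usual pLSI factorization D = A W, where D (p x n) holds expected word frequencies, A (p x K) is the word-topic matrix and W (K x n) is the topic-document matrix, with the columns of A and of W each summing to one. An anchor word for topic k is taken to be a word with positive weight in topic k only. This is not the authors' estimator: anchors are located here with the successive projection algorithm (SPA), a standard vertex-hunting method swapped in for illustration, and the function names are hypothetical. The demo uses noiseless data, so A should be recovered exactly up to a permutation of the topics.

```python
import numpy as np

# Illustrative sketch only, NOT the authors' algorithm: SPA is swapped in for anchor search.


def find_anchor_words(D: np.ndarray, K: int) -> list[int]:
    """Pick K anchor-word rows of a p x n frequency matrix D with SPA.

    After each row is rescaled to sum to one, every row is a convex combination
    of the anchor-word rows, so the anchors are the vertices SPA looks for.
    Assumes every word (row) has a positive total frequency.
    """
    R = D / D.sum(axis=1, keepdims=True)  # row-normalize
    anchors = []
    for _ in range(K):
        j = int(np.argmax((R ** 2).sum(axis=1)))  # largest-norm row is a vertex
        anchors.append(j)
        u = R[j] / np.linalg.norm(R[j])
        R = R - np.outer(R @ u, u)  # project out the chosen vertex direction
    return anchors


def estimate_word_topic(D: np.ndarray, anchors: list[int]) -> np.ndarray:
    """Recover a p x K word-topic matrix from D = A W, given the anchor rows."""
    row_sums = D.sum(axis=1)
    Dn = D / row_sums[:, None]
    V = Dn[anchors]  # K x n matrix of vertex rows
    # Barycentric weights C solve Dn = C V; least squares, then pushed back onto the simplex.
    C = np.linalg.lstsq(V.T, Dn.T, rcond=None)[0].T
    C = np.clip(C, 0.0, None)
    C /= np.maximum(C.sum(axis=1, keepdims=True), 1e-12)
    A_hat = C * row_sums[:, None]  # undo the row normalization
    return A_hat / A_hat.sum(axis=0, keepdims=True)  # each topic column sums to one


# Noiseless demo with hypothetical sizes.
rng = np.random.default_rng(0)
p, K, n = 30, 3, 50
A = rng.uniform(0.1, 1.0, size=(p, K))
A[:K] = np.eye(K)  # words 0..K-1 are anchor words
A /= A.sum(axis=0, keepdims=True)
W = rng.dirichlet(np.ones(K), size=n).T  # K x n, columns sum to one
D = A @ W  # expected word frequencies under pLSI

anchors = find_anchor_words(D, K)
A_hat = estimate_word_topic(D, anchors)
topic_of = [int(np.argmax(A[j])) for j in anchors]  # align estimated columns with true topics
print(sorted(anchors), np.allclose(A_hat, A[:, topic_of]))
```

With real data, D would be replaced by observed word frequencies, and a noise-robust vertex-hunting step and thresholding for sparsity would be needed; the sketch shows only the geometry that makes anchor words useful.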
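
The topic-document step can likewise be sketched for a single document, assuming the word-topic matrix A is already known or estimated. One natural reading of a multinomial regression with non-negativity and column-sum constraints is maximum likelihood: maximize sum_j x_j log((A w)_j) over weight vectors w with w >= 0 and sum(w) = 1, where x holds the document's word counts. The sketch solves this with the classical EM (multiplicative) update for mixture weights with known components; this solver, the function name estimate_topic_weights and its parameters are illustrative assumptions, not the method presented in the talk.

```python
import numpy as np

# Illustrative sketch only: EM for mixture weights, NOT the authors' estimator.


def estimate_topic_weights(x, A, n_iter=2000, tol=1e-10):
    """Maximize sum_j x[j] * log((A @ w)[j]) over {w >= 0, sum(w) = 1}.

    x: length-p vector of word counts for one document.
    A: p x K word-topic matrix whose columns sum to one.
    Each EM update keeps w non-negative and summing to one.
    """
    K = A.shape[1]
    w = np.full(K, 1.0 / K)  # start at the centre of the simplex
    N = x.sum()
    for _ in range(n_iter):
        mix = A @ w  # fitted word probabilities
        ratio = np.divide(x, mix, out=np.zeros_like(mix), where=mix > 0)
        w_new = w * (A.T @ ratio) / N  # multiplicative EM update
        if np.abs(w_new - w).sum() < tol:
            return w_new
        w = w_new
    return w


# Demo with a hypothetical word-topic matrix and one simulated document.
rng = np.random.default_rng(1)
p, K = 40, 3
A = rng.dirichlet(np.ones(p), size=K).T  # p x K, columns sum to one
w_true = np.array([0.6, 0.3, 0.1])
x = rng.multinomial(5000, A @ w_true)  # word counts for a 5000-word document
w_hat = estimate_topic_weights(x, A)
print(np.round(w_hat, 3))
```

Applying the function to each column of a p x n count matrix and stacking the results with np.column_stack gives an estimate of the full topic-document matrix. The multiplicative update was chosen because its iterates never leave the simplex, so no separate projection step is needed.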


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program