Online Program Home
My Program

Abstract Details

Activity Number: 387 - Foundations of Data Science
Type: Invited
Date/Time: Tuesday, July 31, 2018 : 2:00 PM to 3:50 PM
Sponsor: IMS
Abstract #326762 Presentation
Title: A Fast Algorithm with Minimax Optimal Guarantees for Topic Models with an Unknown Number of Topics
Author(s): Florentina Bunea*
Companies: Cornell Univeristy
Keywords: latent models ; topic models ; minimax bounds ; fast algorithms ; high dimensions

We study parameter estimation from data that consists of n independent observations from p-dimensional multinomial distributions with respective parameters N_i and P_i, for each observation i, under the popular topic model assumption. This model links all cell probabilities, collected in a p x n matrix P, via the assumption that P can be factorized as the product of two non-negative matrices A, of dimensions p x K, and W, of dimensions K x n. All three matrices are unknown, have non-negative entries that can be interpreted as conditional probabilities, and sum up to 1 column-wise. The factors are unique, up to column permutations, under slight variations of commonly used separability assumptions. In this work, K, the number of topics, is unknown, in contrast with all previous studies that assume it to be known a priori. We derive finite sample minimax lower bounds for the estimation of A, and develop a new and fast minimax adaptive algorithm. Our finite sample analysis is valid for any n, N_i, p and K. In particular, we allow the number of possible topics, K, to increase with n, a situation not handled well theoretically or practically in previous work.

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program