
Abstract Details

Activity Number: 24 - Statistical Computing and Graphics Student Awards
Type: Topic Contributed
Date/Time: Sunday, July 30, 2017 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Computing
Abstract #323288
Title: Scalable Bayesian Learning for Sparse Logistic Models
Author(s): Xichen Huang* and Feng Liang
Companies: University of Illinois at Urbana-Champaign
Keywords: online algorithm ; logistic regression ; Bayesian variable selection ; variational approximation ; spike-and-slab prior

Logistic regression is a widely used model for classification. Many recent classification problems involve large-scale data sets with huge numbers of predictors, which poses a serious challenge for estimation and computation. We therefore take a Bayesian approach to modeling sparsity in logistic regression. Our model returns the posterior inclusion probability of each variable and the posterior probability of any submodel. To scale to large data sets, we propose a variational Bayesian (VB) algorithm. When the data are too large to fit in memory, however, this VB algorithm is impractical, as it requires a full pass through the data at every iteration. Hence, we propose an online VB algorithm based on the idea of the Follow-the-Regularized-Leader (FTRL) algorithm. Numerical results show that: (1) the batch VB algorithm can be as accurate as LASSO in prediction, but with a sparser model; (2) the prediction accuracy of the online VB algorithm is comparable to that of batch VB and LASSO; and (3) the online algorithm attains nearly the same log-loss as the FTRL-Proximal algorithm (McMahan et al., 2013).
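The authors' VB and online VB algorithms are not reproduced here, but the FTRL-Proximal baseline they compare against is a published, well-known procedure. As context for that comparison, here is a minimal sketch of per-coordinate FTRL-Proximal for L1-regularized logistic regression (McMahan et al., 2013). The hyperparameter values and the dict-based sparse feature representation are illustrative choices, not taken from the abstract.

```python
import math

class FTRLProximal:
    """Per-coordinate FTRL-Proximal for L1-regularized logistic
    regression (McMahan et al., 2013). Weights are reconstructed
    lazily from the accumulators z and n, so coordinates whose
    |z| stays below the L1 threshold remain exactly zero,
    yielding a sparse model."""

    def __init__(self, dim, alpha=0.5, beta=1.0, l1=0.1, l2=1.0):
        # alpha/beta control the per-coordinate learning-rate schedule;
        # l1/l2 are the regularization strengths (illustrative defaults).
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim  # accumulated adjusted gradients
        self.n = [0.0] * dim  # accumulated squared gradients

    def weight(self, i):
        # Closed-form coordinate-wise solution of the FTRL-Proximal
        # objective: soft-threshold z[i] at l1, then scale.
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0
        return -(z - math.copysign(self.l1, z)) / (
            (self.beta + math.sqrt(self.n[i])) / self.alpha + self.l2)

    def predict(self, x):
        # x: sparse example as {feature_index: value}
        s = sum(self.weight(i) * v for i, v in x.items())
        return 1.0 / (1.0 + math.exp(-s))

    def update(self, x, y):
        # One online step on example (x, y) with y in {0, 1}.
        p = self.predict(x)
        for i, v in x.items():
            g = (p - y) * v  # gradient of the log-loss
            sigma = (math.sqrt(self.n[i] + g * g)
                     - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self.weight(i)
            self.n[i] += g * g
        return p
```

Like the online VB algorithm described above, this update touches only the features present in the current example and needs a single pass over the stream, which is what makes the two methods comparable in the log-loss experiment.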

Authors who are presenting talks have a * after their name.


Copyright © American Statistical Association