Abstract:
|
Logistic regression is a widely used model for classification. Many recent classification problems involve large-scale data sets with huge numbers of predictors, which poses major challenges for estimation and computation. We therefore take a Bayesian approach to modeling sparsity in logistic regression. Our model returns the posterior inclusion probabilities of the variables and the posterior probabilities of any submodel. To scale to large data sets, we propose a variational Bayesian (VB) algorithm. However, when the data are too large to be loaded into memory, this VB algorithm is impractical, as it requires a full pass through the data in every iteration. Hence, we propose an online VB algorithm based on the idea of the Follow-the-Regularized-Leader (FTRL) algorithm. Numerical results show that: (1) the batch VB algorithm can be as accurate as LASSO in prediction, but yields a sparser model; (2) the prediction accuracy of the online VB algorithm is comparable to that of the batch VB algorithm and LASSO; and (3) the online algorithm achieves nearly the same log-loss as the FTRL-Proximal algorithm (McMahan et al., 2013).
|