
Abstract Details

Activity Number: 513 - Topics in Monte Carlo Simulation
Type: Contributed
Date/Time: Wednesday, July 31, 2019 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Computing
Abstract #304615
Title: Efficient Sampling for Imbalanced Large Categorical Data Using Piece-Wise Deterministic Markov Chain Monte Carlo
Author(s): Deborshee Sen* and Matthias Sachs and David Dunson and Jianfeng Lu
Companies: Duke University and Duke University and Duke University and Duke University
Keywords: Control variates; Imbalanced data; Logistic regression; Scalable inference; Subsampling; Zig-zag sampler

High-dimensional data are routinely collected in many application areas. In this article, we are particularly interested in classification models in which the predictors are imbalanced, which creates well-known difficulties in estimation. To address these difficulties, Bayesian approaches with appropriate priors are often used, with Markov chain Monte Carlo (MCMC) algorithms for posterior computation. However, current MCMC algorithms can become inefficient as the size of the dataset increases, because the time per step grows. One promising strategy is to use a gradient-based sampler that relies on data subsamples to reduce the computational complexity per step. However, the usual subsampling strategies break down when applied to imbalanced data. Instead, we propose generalizing recent piece-wise deterministic MCMC algorithms to include stratified and importance-weighted subsampling, and we also propose a new subsampling algorithm based on sorting data points. These approaches maintain the correct stationary distribution with arbitrarily small subsamples and substantially outperform current competitors. We provide theoretical support and illustrate the gains in multiple simulated and real data applications.
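
The importance-weighted subsampling mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it shows only the generic idea of an unbiased, importance-weighted estimate of one partial derivative of a logistic regression negative log-likelihood. The function names, the 0/1 response coding, and the choice of sampling probabilities proportional to |x_ik| are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def full_partial(X, y, beta, k):
    """Exact k-th partial derivative of the logistic negative log-likelihood.

    X is a list of predictor rows, y a list of 0/1 responses (assumed coding).
    """
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(b * v for b, v in zip(beta, xi))
        total += (sigmoid(z) - yi) * xi[k]
    return total


def importance_probs(X, k):
    """Sampling probabilities proportional to |x_ik| (an assumed choice).

    Rows with x_ik == 0 contribute nothing to the k-th partial derivative,
    so giving them probability zero keeps the estimator unbiased.
    """
    mags = [abs(xi[k]) for xi in X]
    s = sum(mags)
    if s == 0.0:
        return [1.0 / len(X)] * len(X)  # fall back to uniform sampling
    return [m / s for m in mags]


def subsampled_partial(X, y, beta, k, probs, m, rng):
    """Unbiased estimate of full_partial from m rows drawn with replacement.

    Each sampled term is divided by its sampling probability, which is the
    standard importance-weighting correction.
    """
    idx = rng.choices(range(len(X)), weights=probs, k=m)
    est = 0.0
    for i in idx:
        z = sum(b * v for b, v in zip(beta, X[i]))
        est += (sigmoid(z) - y[i]) * X[i][k] / probs[i]
    return est / m


# Toy data: the second predictor is nonzero in only one of six rows.
X = [[1, 0], [1, 0], [1, 0], [1, 0], [1, 0], [1, 1]]
y = [0, 1, 0, 0, 1, 1]
beta = [0.1, -0.2]
probs = importance_probs(X, k=1)
estimate = subsampled_partial(X, y, beta, 1, probs, m=1, rng=random.Random(0))
exact = full_partial(X, y, beta, 1)
```

In this toy example the rare predictor is nonzero in a single row, so a uniform subsample of size one would usually miss it, whereas the weighted draw always selects the informative row and `estimate` agrees with `exact` up to floating-point rounding.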

Authors who are presenting talks have a * after their name.
