All Times EDT

Abstract Details

Activity Number: 75 - Invited EPoster Session II
Type: Invited
Date/Time: Sunday, August 7, 2022 : 9:35 PM to 10:30 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #323249
Title: Balanced Subsampling for Big Data with Categorical Predictors
Author(s): Lin Wang*
Companies: George Washington University
Keywords: Big data; Data reduction; Random sampling; Robust prediction; Experimental design; Causal effects
Abstract:

The dramatic growth of big datasets presents a new challenge to data storage and analysis. Data reduction, or subsampling, which extracts useful information from a dataset, is a crucial step in big data analysis. I will introduce a balanced subsampling approach for big data with categorical predictors. The merits of the proposed approach are twofold: (i) it is fast and easy to implement; (ii) the selected subsample allows robust effect estimation and prediction. Theoretical and extensive numerical results show that the proposed approach is superior to simple random subsampling. The advantages of the balanced subsampling approach are also illustrated through the analysis of real-life examples.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program