Activity Number: 75 - Invited EPoster Session II
Type: Invited
Date/Time: Sunday, August 7, 2022 : 9:35 PM to 10:30 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #323249
Title: Balanced Subsampling for Big Data with Categorical Predictors
Author(s): Lin Wang*
Companies: George Washington University
Keywords: Big data; Data reduction; Random sampling; Robust prediction; Experimental design; Causal effects
Abstract:
The dramatic growth of big datasets presents a new challenge to data storage and analysis. Data reduction, or subsampling, which extracts useful information from datasets, is a crucial step in big data analysis. I will introduce a balanced subsampling approach for big data with categorical predictors. The merits of the proposed approach are twofold: (i) it is fast and easy to implement; (ii) the selected subsample allows robust effect estimation and prediction. Theoretical and extensive numerical results show that the proposed approach is superior to simple random subsampling. The advantages of the balanced subsampling approach are also illustrated through the analysis of real-life examples.
Authors who are presenting talks have a * after their name.