All Times EDT

Abstract Details

Activity Number: 15 - Subsampling: Basic Tool That Facilitates the Identification of Statistical Relationships in Big Data
Type: Topic Contributed
Date/Time: Sunday, August 7, 2022 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #322735
Title: Subdata Selection Methods
Author(s): John Stufken*
Companies: UNC Greensboro
Keywords: Subsampling; Information; Leverage; Optimal Design; Random Forests; Big Data
Abstract:

Whether because of the size of datasets or because of computational challenges, there is a vast literature on using only some of the data (subdata) for estimation or prediction. This raises the question of how subdata should be selected from the entire dataset (full data). One possibility is to select the subdata completely at random from the full data, but this is typically not the best method. The literature contains various suggestions for better alternatives. After introducing some of these alternatives, we will discuss extensions, weaknesses, and new directions.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program