
Abstract Details

Activity Number: 549 - Optimal Designs for Modeling Asymmetries in Big Data
Type: Invited
Date/Time: Wednesday, July 31, 2019 : 2:00 PM to 3:50 PM
Sponsor: WNAR
Abstract #300402 Presentation
Title: Subdata Selection Methods
Author(s): John Stufken*
Companies: Arizona State University
Keywords: big data; subdata; sampling; deterministic; inference; prediction

The size of big data can cause challenges for even the simplest explorations of the data. Such challenges can, for example, relate to storing the data or to computing even the simplest statistics. One way to deal with these challenges is to select a much smaller subdata set from the original full data set and to proceed with the exploration or analysis using the subdata. The subdata can be selected by a sampling strategy or by a deterministic method based on a specified criterion. Whatever method of subdata selection is used, it is important that it be computationally feasible and efficient. It is also important that inferences or predictions based on the subdata set be comparable to those that would have been obtained from the full data set. Ideally, this holds with as few assumptions as possible about the full data. After a brief discussion of different subdata selection methods, we will focus on their strengths and, especially, their weaknesses.
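
As a rough illustration of the two families of selection methods mentioned in the abstract, the Python sketch below compares a sampling strategy (uniform random subsampling) with a deterministic rule that keeps the rows with the most extreme covariate values. The abstract does not name any specific method, so the extreme-value criterion, the function names, the simulated linear model, and the sizes n, p, and k are all assumptions made for illustration only. The sketch fits ordinary least squares on the full data and on each subdata set so that the estimates can be compared.

```python
import numpy as np


def uniform_subdata(X, y, k, rng):
    """Sampling strategy: draw k rows uniformly at random, without replacement."""
    idx = rng.choice(X.shape[0], size=k, replace=False)
    return X[idx], y[idx]


def extreme_value_subdata(X, y, k):
    """Deterministic criterion (an illustrative assumption, not from the abstract):
    for each covariate in turn, keep the r rows with the smallest values and the
    r rows with the largest values among the rows not yet selected."""
    n, p = X.shape
    r = k // (2 * p)  # rows taken from each tail of each covariate
    if r < 1 or 2 * r * p > n:
        raise ValueError("k must satisfy 2 * p <= k <= n")
    available = np.ones(n, dtype=bool)
    chosen = []
    for j in range(p):
        candidates = np.flatnonzero(available)
        order = candidates[np.argsort(X[candidates, j], kind="stable")]
        picked = np.concatenate([order[:r], order[-r:]])
        available[picked] = False  # never select the same row twice
        chosen.append(picked)
    idx = np.concatenate(chosen)
    return X[idx], y[idx]


def ols(X, y):
    """Least-squares coefficients for a linear model with an intercept."""
    design = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Simulated "full data" (hypothetical): a correctly specified linear model.
rng = np.random.default_rng(0)
n, p, k = 100_000, 3, 600
X = rng.normal(size=(n, p))
beta = np.array([1.0, 2.0, -1.0, 0.5])  # intercept followed by slopes
y = beta[0] + X @ beta[1:] + rng.normal(size=n)

X_unif, y_unif = uniform_subdata(X, y, k, rng)
X_det, y_det = extreme_value_subdata(X, y, k)

coef_full = ols(X, y)
coef_unif = ols(X_unif, y_unif)
coef_det = ols(X_det, y_det)

# Largest absolute deviation of each fit from the true coefficients.
for name, coef in [("full", coef_full), ("uniform", coef_unif), ("extreme", coef_det)]:
    print(name, np.max(np.abs(coef - beta)))
```

Because the simulated data follow the assumed linear model exactly, this comparison says nothing about how either rule behaves when the model is wrong, which is the kind of weakness the talk proposes to examine.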

Authors who are presenting talks have a * after their name.
