Abstract:
|
Advances in computing (processors) have outpaced advances in data storage and network bandwidth. Computational scientists can now perform large-scale simulations at high resolution in both space and time, yet they cannot examine all of the generated data at once, and interactive visualization and queries are prohibitively slow. Data partitioning, or sub-selection, becomes necessary to reduce the data size. Our task is to partition the data so that every element (point, cell, row, etc.) of the raw data belongs to one and only one partition. We then store summary information about each partition, such as a representative value plus an error estimate, or a distribution, rather than the raw data itself, reducing the data size while, most importantly, preserving the interesting data characteristics. Creating these partitions involves many design decisions. We present a metric for evaluating data partitioning quality, inspired by model comparison techniques and designed to balance the tradeoffs among raw data reproducibility, accuracy, and storage cost. We explore and evaluate the metric's performance on partitioning data from real-world, large-scale simulations.
|
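
As a loose illustration of the idea, and not the paper's actual method, the sketch below shows one way the pieces could fit together in Python: each partition is reduced to a representative value plus an error, and a whole partitioning is scored with a model-comparison-style criterion that trades reconstruction error against storage cost. The helpers `summarize` and `score`, the per-summary byte count, and the BIC-like form of the score are all assumptions for illustration.

```python
# Hypothetical sketch, not the paper's implementation: replace each
# partition's raw values with a (mean, error) summary, then score a
# partitioning with a model-comparison-style criterion that balances
# reconstruction error against storage cost.
import numpy as np

def summarize(partition: np.ndarray) -> tuple[float, float]:
    """Summarize one partition as (representative value, RMS error)."""
    mean = float(partition.mean())
    err = float(np.sqrt(((partition - mean) ** 2).mean()))
    return mean, err

def score(partitions: list[np.ndarray], bytes_per_summary: int = 16) -> float:
    """Lower is better: a BIC-like fit term (log mean squared error of
    reconstructing each element from its partition's representative)
    plus an assumed storage penalty per stored summary."""
    sse = sum(((p - p.mean()) ** 2).sum() for p in partitions)
    n = sum(p.size for p in partitions)
    storage = bytes_per_summary * len(partitions)
    return n * np.log(sse / n) + storage

rng = np.random.default_rng(0)
data = rng.normal(size=4096)
coarse = np.array_split(data, 8)    # few partitions: cheap to store, lossy
fine = np.array_split(data, 512)    # many partitions: accurate, costly

print("first coarse summary (mean, err):", summarize(coarse[0]))
print("coarse score:", score(coarse), "fine score:", score(fine))
```

Comparing the two scores shows the intended tradeoff: refining the partitioning shrinks the error term but inflates the storage term, and the metric picks out the balance point between the two.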