
Abstract Details

Activity Number: 395 - Statistical Models for High-Dimensional Computer Output
Type: Topic Contributed
Date/Time: Tuesday, July 31, 2018 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistics and the Environment
Abstract #330887
Title: Compressing Scientific Data: Reducing Storage While Preserving Information
Author(s): Dorit Hammerling* and Joseph Guinness and Allison Baker
Companies: National Center for Atmospheric Research and NC State University and National Center for Atmospheric Research
Keywords: Compression; Climate modeling; Machine learning; Prediction; Quality Metrics

Due to rapid technological progress, our ability to generate increasingly large data sets from high-resolution numerical models is outpacing our ability to store, manage, and effectively access these vast volumes of data. Similar statements can be made with regard to observational data captured by a variety of advanced instruments. One potential solution to this Big Data dilemma is the use of compression. Lossless compression offers perfect reconstruction but provides only limited compaction when confronted with floating-point data. Lossy compression, by contrast, can achieve substantial reduction but, by its very definition, cannot exactly reproduce the original values. We will present results from our efforts to develop statistical metrics to assess when compression starts to affect scientific conclusions drawn from climate model data. We will further discuss ways to predict the optimal compression algorithm and level from features of the data to be compressed.
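
The trade-off between lossless and lossy compression of floating-point data can be illustrated with a minimal sketch. This is not the authors' method or data: the synthetic "temperature-like" series, the choice of zlib as the lossless coder, and the mantissa-truncation scheme (with a hypothetical `keep_bits` parameter) are all assumptions made purely for illustration.

```python
import math
import random
import struct
import zlib


def make_field(n=20000, seed=0):
    """Synthetic smooth signal plus noise (an assumption, not real model output)."""
    rng = random.Random(seed)
    return [
        280.0 + 10.0 * math.sin(2 * math.pi * i / 500) + rng.gauss(0.0, 0.5)
        for i in range(n)
    ]


def to_float32_bytes(values):
    """Pack the values as little-endian IEEE 754 single-precision floats."""
    return struct.pack(f"<{len(values)}f", *values)


def truncate_mantissa(raw, keep_bits):
    """Lossy step: zero the low (23 - keep_bits) mantissa bits of each float32."""
    n = len(raw) // 4
    mask = (0xFFFFFFFF << (23 - keep_bits)) & 0xFFFFFFFF
    words = struct.unpack(f"<{n}I", raw)
    return struct.pack(f"<{n}I", *(w & mask for w in words))


def compression_ratio(raw):
    """Uncompressed size divided by zlib-compressed size."""
    return len(raw) / len(zlib.compress(raw, 9))


def max_abs_error(raw_a, raw_b):
    """Largest absolute difference between two packed float32 arrays."""
    n = len(raw_a) // 4
    a = struct.unpack(f"<{n}f", raw_a)
    b = struct.unpack(f"<{n}f", raw_b)
    return max(abs(x - y) for x, y in zip(a, b))


original = to_float32_bytes(make_field())
lossy = truncate_mantissa(original, keep_bits=10)

# Lossless: the round trip reproduces every byte exactly.
roundtrip_ok = zlib.decompress(zlib.compress(original, 9)) == original
lossless_ratio = compression_ratio(original)

# Lossy: truncation discards information, then the same coder is applied.
lossy_ratio = compression_ratio(lossy)
error = max_abs_error(original, lossy)

print(f"lossless round trip exact: {roundtrip_ok}")
print(f"lossless ratio: {lossless_ratio:.2f}")
print(f"lossy ratio:    {lossy_ratio:.2f}")
print(f"max abs error:  {error:.4f}")
```

In this sketch the lossy variant compresses better because the zeroed mantissa bits remove the near-random low-order digits that defeat a general-purpose coder, at the price of a bounded reconstruction error. Whether an error of that size matters for a scientific conclusion is the kind of question the statistical metrics described in the abstract are meant to answer.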

Authors who are presenting talks have a * after their name.
