All Times EDT

Abstract Details

Activity Number: 185 - Addressing Important Questions in Climate Science Using Advanced Statistical and Machine-Learning Approaches
Type: Topic Contributed
Date/Time: Monday, August 8, 2022 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistics and the Environment
Abstract #323035
Title: Prediction of Optimal Compression Settings for Spatiotemporal Climate Data Sets: Benchmarking Statistical and Machine Learning Techniques
Author(s): Dorit Hammerling* and Alexander Pinard and Allison Baker
Companies: Colorado School of Mines and Colorado School of Mines and National Center for Atmospheric Research
Keywords: Data compression; Data Quality
Abstract:

Climate models are vital tools for informing scientists and society about future climate. However, simulations with these models produce such vast quantities of data that storage becomes a significant burden, a problem that is expected to grow because data storage is improving more slowly than computing infrastructure. As a result, trade-offs must be made among simulation length, resolution, ensemble size, and the number of climate variables tracked, which limits the amount of information obtained from a model run. Lossy compression is a viable approach to data reduction: it gives up perfect reconstruction of the data in exchange for a much smaller file size than traditional lossless compression achieves. When using lossy compression, care must be taken to ensure that the reconstructed data are similar enough to the original that any scientific conclusions drawn from them are unaffected. This requires careful selection of both the compression algorithm and its settings, which may vary between variables and time slices. To that end, we introduce metrics to gauge data quality, as well as statistical and machine learning techniques to predict the optimal compression algorithm and settings for specific variables and time slices.
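
As a rough illustration of the idea of checking data quality before accepting a compression setting, the following minimal Python sketch may help. It is not the authors' method: rounding to a fixed number of decimals stands in for a real lossy compressor, RMSE and Pearson correlation stand in for the quality metrics, and every function name, threshold, and the synthetic series are hypothetical choices made only for this example.

```python
import math
import struct
import zlib


def synthetic_series(n=730):
    """Hypothetical stand-in for one climate variable: seasonal cycle plus small jitter."""
    return [15.0 + 10.0 * math.sin(2 * math.pi * i / 365) + 0.01 * ((i * 7919) % 100)
            for i in range(n)]


def quantize(values, decimals):
    """Lossy step (assumption): round each value to a fixed number of decimal places."""
    return [round(v, decimals) for v in values]


def compressed_size(values):
    """Bytes needed after packing as float64 and applying lossless zlib compression."""
    raw = struct.pack(f"{len(values)}d", *values)
    return len(zlib.compress(raw, 9))


def rmse(original, reconstructed):
    """Root-mean-square error between the original and reconstructed series."""
    total = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return math.sqrt(total / len(original))


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def pick_setting(values, candidates, max_rmse, min_corr):
    """Return the most aggressive setting (fewest decimals) that passes both checks, else None."""
    for decimals in sorted(candidates):
        recon = quantize(values, decimals)
        if rmse(values, recon) <= max_rmse and pearson(values, recon) >= min_corr:
            return decimals
    return None


if __name__ == "__main__":
    series = synthetic_series()
    # Thresholds are illustrative assumptions, not values from the abstract.
    chosen = pick_setting(series, [0, 1, 2, 3], max_rmse=0.05, min_corr=0.9999)
    print("chosen decimals:", chosen)
    print("original bytes:", compressed_size(series))
    if chosen is not None:
        print("lossy bytes:", compressed_size(quantize(series, chosen)))
```

In this sketch the settings are searched exhaustively for a single series. The abstract instead describes predicting the optimal algorithm and settings with statistical and machine learning techniques, which would avoid running every candidate on every variable and time slice.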


Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program