Abstract: |
  Climate models are vital tools for informing scientists and society about the future climate. However, simulations using these models produce such vast quantities of data that storage becomes a significant burden, a problem expected to worsen as data storage capacity improves more slowly than computing infrastructure. As a result, trade-offs must be made among simulation length, resolution, ensemble size, and the number of climate variables tracked, limiting the amount of information obtained from a model run. Lossy compression is a viable approach to data reduction that trades perfect reconstruction of the data for file sizes far smaller than traditional lossless compression can achieve. When using lossy compression, care must be taken to ensure that the reconstructed data are similar enough to the original that any scientific conclusions drawn from them are unaffected. This requires careful selection of both the compression algorithm and its settings, which may vary between variables and time slices. To that end, we introduce metrics to gauge data quality, as well as statistical and machine learning techniques to predict the optimal compression algorithm and settings for specific variables and time slices.