Abstract:
|
Owing to rapid technological progress, our ability to generate ever larger data sets from high-resolution numerical models is outpacing our ability to store, manage, and effectively access these vast volumes of data. The same is true of observational data captured by a variety of advanced instruments. One potential solution to this Big Data dilemma is compression. Lossless compression reconstructs the data perfectly but achieves only limited size reduction on floating-point data. Lossy compression can achieve substantial reductions but, by definition, cannot exactly reproduce the original values. We will present results from our efforts to develop statistical metrics that assess when compression starts to affect the scientific conclusions drawn from climate model data. We will further discuss ways to predict the optimal compression algorithm and compression level from features of the data to be compressed.
|
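The sketch below illustrates the lossless-versus-lossy trade-off described above. It is not the authors' method, and their metrics are not given here. Several pieces are assumptions chosen for illustration: the synthetic field stands in for one model variable, uniform quantization followed by zlib stands in for a real lossy compressor, and max absolute error, RMSE, and Pearson correlation are common generic examples of the kind of statistic one might use. The helper names (`synthetic_field`, `lossless_ratio`, `lossy_compress`, `error_metrics`) are hypothetical.

```python
import math
import random
import struct
import zlib


def synthetic_field(n=10_000, seed=0):
    """Hypothetical stand-in for one model variable: smooth signal plus small noise."""
    rng = random.Random(seed)
    return [math.sin(0.01 * i) + 1e-3 * rng.gauss(0.0, 1.0) for i in range(n)]


def lossless_ratio(values):
    """Compression ratio of the raw float64 bytes under zlib (exact reconstruction)."""
    raw = struct.pack(f"<{len(values)}d", *values)
    return len(raw) / len(zlib.compress(raw, 9))


def lossy_compress(values, step):
    """Assumed lossy scheme: quantize to multiples of `step`, then zlib the int32 codes.

    Returns (ratio relative to the raw float64 size, reconstructed values).
    """
    codes = [round(v / step) for v in values]
    packed = struct.pack(f"<{len(codes)}i", *codes)
    ratio = (8 * len(values)) / len(zlib.compress(packed, 9))
    return ratio, [c * step for c in codes]


def error_metrics(orig, recon):
    """Generic example metrics: max absolute error, RMSE, Pearson correlation."""
    n = len(orig)
    diffs = [a - b for a, b in zip(orig, recon)]
    max_err = max(abs(d) for d in diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    ma, mb = sum(orig) / n, sum(recon) / n
    cov = sum((a - ma) * (b - mb) for a, b in zip(orig, recon))
    var_a = sum((a - ma) ** 2 for a in orig)
    var_b = sum((b - mb) ** 2 for b in recon)
    return max_err, rmse, cov / math.sqrt(var_a * var_b)


field = synthetic_field()
step = 1e-3  # assumed quantization step; a larger step gives more compression and more error
lossy_ratio, recon = lossy_compress(field, step)
max_err, rmse, corr = error_metrics(field, recon)
print(f"lossless ratio: {lossless_ratio(field):.2f}")
print(f"lossy ratio:    {lossy_ratio:.2f}")
print(f"max |error| = {max_err:.2e} (bound {step / 2:.1e}), RMSE = {rmse:.2e}, r = {corr:.6f}")
```

The noisy low-order mantissa bits of the float64 values leave zlib little to exploit, so the lossless ratio stays modest. Quantization discards those bits, which raises the ratio at the cost of an error bounded by `step / 2`. Sweeping `step` and watching when a metric crosses a chosen tolerance is one simple way to frame the question of when compression starts to matter.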