Error Distributions of Lossy Floating-point Compressors
Peter Lindstrom
Lawrence Livermore National Laboratory
With ever-increasing volumes of data being generated in scientific simulations, experiments, and observations, storage and bandwidth concerns are mounting. As a means of data reduction, lossless compression is largely ineffective when applied to such floating-point data, and consequently much recent work has focused on lossy compression methods that reconstruct the data only approximately by allowing for small errors. When such approximated data sets are used in data analysis, it is important to understand how errors due to compression are distributed and how they propagate to impact the accuracy of the analysis. In this paper we perform an empirical study of the statistical distributions of compression-induced errors in scientific data for a number of state-of-the-art data compressors. We find that compression schemes based on scalar quantization tend to give uniformly distributed errors that are weakly data-dependent, and that transform- and decomposition-based methods tend to give Laplace- or normally distributed errors. With the exception of the FPZIP compressor, we find the errors to be unbiased with zero mean. We further analyze the error distribution of the ZFP compressor and show using the central limit theorem that it tends to a normal distribution. We conclude with an examination of correlation, both between the function being compressed and its errors and within the error signal itself. Our results suggest that transform-based compression methods more reliably reduce autocorrelation, especially at high compression ratios.
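The abstract's observation about scalar quantization can be illustrated with a minimal sketch (not code from the paper, and not any of the compressors studied): uniform scalar quantization with step size q snaps each value to the nearest multiple of q, so the pointwise error is bounded by q/2 and, for data that sweeps through many quantization cells, is approximately uniformly distributed with zero mean.

```python
import numpy as np

# Hypothetical illustration: uniform scalar quantization of a smooth
# signal. The synthetic random-walk "data" and step size q are
# assumptions for the demo, not values from the paper.
rng = np.random.default_rng(0)
q = 0.01
data = np.cumsum(rng.normal(size=100_000)) * 1e-3  # smooth synthetic signal

quantized = np.round(data / q) * q  # scalar quantization with step q
errors = quantized - data

print("max |error|:", np.abs(errors).max())  # bounded by q/2
print("mean error :", errors.mean())         # near zero, i.e. unbiased
```

By construction no error exceeds q/2 in magnitude, and the empirical mean is close to zero, matching the "unbiased, uniformly distributed" behavior the paper reports for quantization-based schemes; transform-based compressors, which sum many quantized coefficients per reconstructed value, instead exhibit the Laplace- or normal-shaped error distributions discussed in the text.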