Abstract:
|
In this paper we perform an empirical study of the statistical distributions of compression-induced errors in scientific data for a number of state-of-the-art data compressors. We find that compression schemes based on scalar quantization tend to produce uniformly distributed errors that are only weakly data-dependent, whereas transform- and decomposition-based methods tend to produce Laplace- or normally distributed errors. With the exception of the fpzip compressor, we find the errors to be unbiased, that is, to have zero mean. We further analyze the error distribution of the zfp compressor and use the central limit theorem to show that it tends toward a normal distribution. We conclude with an examination of correlation, both between the function being compressed and its errors and within the error signal itself. Our results suggest that transform-based compression methods more reliably reduce autocorrelation in the errors, especially at high compression ratios.
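The two distributional claims above can be illustrated with a small synthetic experiment. This is a minimal sketch under stated assumptions, not the paper's methodology and not how zfp or any other named compressor actually works. A uniform scalar quantizer with step Δ gives errors that are approximately Uniform(-Δ/2, Δ/2), with zero mean, variance Δ²/12, and excess kurtosis -1.2. The "transform-like" case is a hypothetical stand-in: each output error is the normalized sum of several independent quantization errors, which is the mechanism the central limit theorem acts on, so its excess kurtosis should move toward 0, the value for a normal distribution. The data, step size, and the helpers `quantize`, `excess_kurtosis`, and `simulate` are all illustrative assumptions.

```python
import math
import random
import statistics


def quantize(x: float, step: float) -> float:
    """Uniform scalar quantizer: snap x to the nearest multiple of step."""
    return step * round(x / step)


def excess_kurtosis(samples):
    """Excess kurtosis: about -1.2 for a uniform distribution, 0 for a normal one."""
    m = statistics.fmean(samples)
    var = statistics.fmean([(s - m) ** 2 for s in samples])
    m4 = statistics.fmean([(s - m) ** 4 for s in samples])
    return m4 / var**2 - 3.0


def simulate(n: int = 20000, step: float = 0.01, terms: int = 8, seed: int = 0):
    rng = random.Random(seed)

    def one_error() -> float:
        # Quantization error for one synthetic value drawn from a wide range.
        x = rng.uniform(-100.0, 100.0)
        return quantize(x, step) - x

    # Direct scalar quantization: errors expected to be ~Uniform(-step/2, step/2).
    scalar_err = [one_error() for _ in range(n)]
    # Hypothetical transform-like case: each error is a normalized sum of `terms`
    # independent quantization errors, so the CLT pushes it toward a normal shape.
    mixed_err = [sum(one_error() for _ in range(terms)) / math.sqrt(terms)
                 for _ in range(n)]
    return scalar_err, mixed_err


if __name__ == "__main__":
    scalar_err, mixed_err = simulate()
    print("scalar: mean=%.2e var=%.3e (step^2/12=%.3e) excess kurtosis=%.2f" % (
        statistics.fmean(scalar_err), statistics.pvariance(scalar_err),
        0.01**2 / 12, excess_kurtosis(scalar_err)))
    print("mixed:  excess kurtosis=%.2f" % excess_kurtosis(mixed_err))
```

Dividing the sum by the square root of `terms` keeps the variance of the mixed errors near Δ²/12, so only the shape of the distribution changes: from the flat uniform profile (excess kurtosis near -1.2) toward a bell shape (near -1.2 divided by `terms`, approaching 0 as more errors are combined).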
|