JSM 2017 Online Program

Activity Number:	157 - Compressing Climate Model Data: Lowering Storage Burden While Preserving Information
Type:	Topic Contributed
Date/Time:	Monday, July 31, 2017 : 10:30 AM to 12:20 PM
Sponsor:	Section on Statistics and the Environment
Abstract #324295
Title:	Error Distributions of Lossy Floating-Point Compressors
Author(s):	Peter Lindstrom *
Companies:	Lawrence Livermore National Laboratory
Keywords:	Lossy compression ; Floating-point data ; Approximation ; Error distribution ; Correlation
Abstract:	In this paper, we perform an empirical study of the statistical distributions of compression-induced errors in scientific data for a number of state-of-the-art data compressors. We find that compression schemes based on scalar quantization tend to give uniformly distributed errors that are weakly data-dependent, and that transform- and decomposition-based methods tend to give Laplace-distributed or normally distributed errors. With the exception of the fpzip compressor, we find the errors to be unbiased, with zero mean. We further analyze the error distribution of the zfp compressor and show, using the central limit theorem, that it tends to a normal distribution. We conclude with an examination of correlation, both between the function being compressed and its errors and within the error signal itself. Our results suggest that transform-based compression methods more reliably reduce autocorrelation, especially at high compression ratios.
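The two distributional claims in the abstract can be illustrated with a small numerical sketch. This is not the paper's experiment and does not call fpzip, zfp, or any other real compressor: the Gaussian test signal, the step size `step`, the group size `k`, and the helper names `quantization_error` and `excess_kurtosis` are all assumptions made for illustration. The first part rounds values to a uniform grid and checks that the error looks uniform (near-zero mean, variance close to step²/12, little correlation with the data). The second part sums `k` independent quantization errors, loosely mimicking how a transform-based scheme mixes several coefficient errors into each reconstructed value, and checks that the result is closer to normal, as the central limit theorem suggests.

```python
import numpy as np


def quantization_error(x, step):
    """Round x to the nearest multiple of step and return (quantized - x)."""
    return np.round(x / step) * step - x


def excess_kurtosis(e):
    """Excess kurtosis: about -1.2 for a uniform law, 0 for a normal law."""
    z = (e - e.mean()) / e.std()
    return (z ** 4).mean() - 3.0


rng = np.random.default_rng(0)
step = 0.01                       # hypothetical quantization step
x = rng.normal(size=200_000)      # hypothetical stand-in for scientific data

# Part 1: plain uniform scalar quantization.
err = quantization_error(x, step)
mean_err = err.mean()                       # expected to be close to 0
var_ratio = err.var() / (step ** 2 / 12)    # close to 1 for a uniform error
corr = np.corrcoef(x, err)[0, 1]            # weak dependence on the data
kurt_scalar = excess_kurtosis(err)

# Part 2: normalized sum of k independent quantization errors, a crude
# stand-in for an inverse transform mixing coefficient errors (assumption).
k = 16
coeff_err = quantization_error(rng.normal(size=(50_000, k)), step)
mixed = coeff_err.sum(axis=1) / np.sqrt(k)
kurt_mixed = excess_kurtosis(mixed)

print(f"mean error            : {mean_err:.2e}")
print(f"variance / (step^2/12): {var_ratio:.3f}")
print(f"corr(data, error)     : {corr:.4f}")
print(f"excess kurtosis       : scalar {kurt_scalar:.3f}, mixed {kurt_mixed:.3f}")
```

Excess kurtosis is used here only as a cheap normality indicator; a histogram or a formal goodness-of-fit test would give a fuller picture of the error distribution.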

Authors who are presenting talks have a * after their name.