Online Program

Friday, May 31
Computational Statistics
Statistical Methods for Analyzing Large Scale or Massive Data
Fri, May 31, 1:30 PM - 3:05 PM
Grand Ballroom K
 

Goodness-of-Fit Tests for Large Data Sets (305170)

*Taras Lazariv, TU Dresden 
Christoph Lehmann, TU Dresden 

Keywords: statistical inference, Big Data, Kolmogorov-Smirnov test, exploratory statistics, outlier detection

This contribution offers insights into the methodological and technical issues that arise when inferential methods are applied to Big Data, with the aim of bringing Big Data and inferential statistics together despite the difficulties involved. We present an approach that tests goodness-of-fit without model assumptions, relying only on the empirical distribution. The method can use the information contained in large datasets and rests on a clear theoretical foundation. We concentrate on the widely used Kolmogorov-Smirnov test for goodness-of-fit. Our approach is easy to parallelize, which makes it applicable to distributed datasets, particularly on a computing cluster.
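To illustrate why a Kolmogorov-Smirnov-type statistic lends itself to distributed data, here is a minimal Python sketch. It is not the authors' method, which the abstract does not spell out. It assumes a fixed grid of evaluation points (`edges`) shared by all partitions, and every function name is hypothetical. Each partition reports how many of its values fall at or below each grid point; these counts are summed across partitions, and the largest gap between the merged empirical distribution and the hypothesized CDF is taken over the grid. Because the supremum is evaluated only at the grid points, the result is a lower bound on the exact statistic.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def partition_counts(chunk: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Per-partition step: number of values <= each grid point."""
    ordered = sorted(chunk)
    return [bisect_right(ordered, e) for e in edges]


def merge_counts(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Merge step: cumulative counts from two partitions simply add."""
    return [x + y for x, y in zip(a, b)]


def ks_statistic_on_grid(
    partitions: Sequence[Sequence[float]],
    edges: Sequence[float],
    cdf: Callable[[float], float],
) -> float:
    """Grid approximation of sup |F_n(x) - F(x)| (assumption: sup taken over edges only)."""
    total = [0] * len(edges)
    n = 0
    for chunk in partitions:  # on a cluster, each chunk would be handled by a worker
        total = merge_counts(total, partition_counts(chunk, edges))
        n += len(chunk)
    if n == 0:
        raise ValueError("no observations")
    return max(abs(c / n - cdf(e)) for c, e in zip(total, edges))


# Example: test data split over two partitions against a Uniform(0, 1) CDF.
d_grid = ks_statistic_on_grid(
    [[0.1, 0.2], [0.3, 0.4]],
    edges=[0.25, 0.5, 0.75],
    cdf=lambda x: x,
)
```

Since the merge step is a plain elementwise sum, the result does not depend on how the observations are split across partitions. A finer grid tightens the approximation at the cost of more counts to communicate.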