Keywords: statistical inference, Big Data, Kolmogorov-Smirnov test, exploratory statistics, outlier detection
This contribution offers insights into methodological and technical issues that arise when inferential methods are applied to Big Data, with the aim of bringing Big Data and inferential statistics together despite the difficulties involved. We present an approach that tests goodness-of-fit without model assumptions, relying instead on the empirical distribution. The method can utilize information from large datasets and rests on a clear theoretical background. We concentrate on the widely used Kolmogorov-Smirnov test, a standard goodness-of-fit test in statistics. Our approach is easily parallelized, which makes it applicable to distributed datasets, particularly on a computing cluster.
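To illustrate why a test built on the empirical distribution lends itself to parallelization, the following minimal sketch evaluates the Kolmogorov-Smirnov distance between the empirical distribution function and a hypothesised distribution on a fixed grid of points. It is not the procedure developed in this contribution. It assumes a standard normal reference distribution and a shared evaluation grid, and all function names (`normal_cdf`, `chunk_counts`, `merged_ks_statistic`) are hypothetical. Because each chunk only contributes counts, the chunks can be processed independently and merged by addition. The grid-restricted value is a lower bound on the exact Kolmogorov-Smirnov statistic, which takes the supremum over all points.

```python
import bisect
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of the normal distribution, used here as the assumed reference F_0."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def chunk_counts(chunk, grid):
    """Per-chunk work: number of observations <= each grid point."""
    ordered = sorted(chunk)
    return [bisect.bisect_right(ordered, g) for g in grid]


def merged_ks_statistic(chunks, grid, cdf):
    """Merge per-chunk counts and return max |F_n(g) - F_0(g)| over the grid."""
    n = sum(len(c) for c in chunks)
    totals = [0] * len(grid)
    # Each chunk_counts call is independent and could run on its own worker;
    # here the calls run sequentially to keep the sketch self-contained.
    for chunk in chunks:
        for i, count in enumerate(chunk_counts(chunk, grid)):
            totals[i] += count
    return max(abs(t / n - cdf(g)) for t, g in zip(totals, grid))


# Toy data split into three chunks, standing in for a distributed dataset.
chunks = [[-1.2, 0.3, 0.8], [-0.4, 0.1, 1.5], [-2.0, 0.0, 0.6]]
grid = [x / 10.0 for x in range(-40, 41)]  # assumed grid on [-4, 4]
d_grid = merged_ks_statistic(chunks, grid, normal_cdf)
```

Since the merged counts are plain integer sums, the result does not depend on how the data are partitioned. A finer grid tightens the lower bound at the cost of more work per chunk.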