Abstract:

The one-dimensional two-sample Kolmogorov-Smirnov (KS) test uses a statistic based on the maximum distance between empirical estimates of the two Cumulative Distribution Functions (CDFs). In 1983, Peacock extended the KS test to two dimensions. To build the two-dimensional CDFs, his method uses not only the n observed pairs but all combinations of observed x and y values, a total of n^2 points at which to estimate the maximum distance between the CDFs. This method is computationally expensive for large sample sizes. Fasano and Franceschini (1987) proposed evaluating only the n observed pairs, claiming that this reduces computational cost while preserving the fidelity of the KS test. This work 1) examines differences in the estimated maximum KS distance between the method of Peacock and that of Fasano and Franceschini by building the distribution of standardized KS distances for each method at sample sizes from 2 to 1000, and 2) addresses the trade-off between computational cost and test accuracy, both in estimating the KS distance and in the power and type I error of the KS hypothesis test.
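To make the difference between the two approaches concrete, here is a minimal Python/NumPy sketch. It is not the implementation used in this work. The function names (`quadrant_fractions`, `ks2d_peacock`, `ks2d_fasano_franceschini`) are hypothetical. Ties are assumed to fall on the left/lower side of an origin. Origins are drawn from the pooled sample, and the statistic is taken as the single overall maximum. Published implementations differ in details such as tie handling and whether per-sample maxima are combined, and this sketch omits standardization and p-values. In both methods, the distance at each origin is the largest absolute difference between the two samples' fractions of points in the four quadrants around that origin. The methods differ only in which origins are evaluated.

```python
import itertools

import numpy as np


def quadrant_fractions(sample, x0, y0):
    """Fraction of `sample` points in each of the four quadrants around (x0, y0).

    Assumption: x > x0 counts as "right" and y > y0 as "up"; ties go left/down.
    """
    right = sample[:, 0] > x0
    up = sample[:, 1] > y0
    counts = np.array([
        np.sum(right & up),
        np.sum(~right & up),
        np.sum(~right & ~up),
        np.sum(right & ~up),
    ])
    return counts / len(sample)


def max_quadrant_distance(a, b, origins):
    """Largest |difference| in quadrant fractions of samples a and b over all origins."""
    best = 0.0
    for x0, y0 in origins:
        diff = np.abs(quadrant_fractions(a, x0, y0) - quadrant_fractions(b, x0, y0))
        best = max(best, float(diff.max()))
    return best


def ks2d_peacock(a, b):
    """Peacock-style origins: every combination of an observed x with an observed y."""
    pooled = np.vstack([a, b])
    origins = itertools.product(pooled[:, 0], pooled[:, 1])  # n^2 origins
    return max_quadrant_distance(a, b, origins)


def ks2d_fasano_franceschini(a, b):
    """Fasano-Franceschini-style origins: only the n observed (x, y) pairs."""
    pooled = np.vstack([a, b])
    return max_quadrant_distance(a, b, pooled)  # iterating rows yields (x, y)


# Usage: two bivariate normal samples, the second shifted by 0.5 in each coordinate.
rng = np.random.default_rng(0)
a = rng.normal(size=(50, 2))
b = rng.normal(loc=0.5, size=(50, 2))
d_peacock = ks2d_peacock(a, b)
d_ff = ks2d_fasano_franceschini(a, b)
```

Every observed pair (x_i, y_i) is also one of the x/y combinations, so in this sketch Peacock's distance is always at least as large as the Fasano-Franceschini distance. In this naive form, Peacock's method evaluates n^2 origins at O(n) cost each, roughly O(n^3) work in total. The reduced method evaluates n origins, roughly O(n^2). This gap is the computational trade-off examined here.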
