Abstract:
|
The recent replication crisis has prompted a number of ad hoc suggestions for reducing the chance of false positive findings. Among them is the proposal to use a 0.5% significance level instead of the conventional 5% level. However, it is unclear whether commonly used two-sample tests are robust and/or powerful at such an extreme quantile. Therefore, the robustness and power curves of independent two-sample tests are investigated for metric and ordinal data at nominal significance levels of 0.5% and 5%. An extensive simulation study finds that the permutation versions of the Welch t-test and the Brunner-Munzel test are particularly robust and powerful, while the commonly used two-sample tests that rely on the t-distribution tend to be either liberal or conservative.
|
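
To make the idea of a permutation version of the Welch t-test concrete, the following is a minimal sketch and not the implementation used in the study. It assumes a two-sided Monte Carlo permutation scheme in which group labels are reshuffled and the Welch statistic is recomputed each time; the function names `welch_t` and `permutation_welch_test`, the default `n_perm`, and the fixed `seed` are illustrative choices not taken from the abstract.

```python
import numpy as np


def welch_t(x, y):
    """Welch t statistic for two independent samples (unequal variances)."""
    vx = x.var(ddof=1) / x.size
    vy = y.var(ddof=1) / y.size
    return (x.mean() - y.mean()) / np.sqrt(vx + vy)


def permutation_welch_test(x, y, n_perm=20000, seed=0):
    """Two-sided Monte Carlo permutation p-value using the Welch t statistic.

    Assumption: labels are permuted over the pooled sample and the
    statistic is recomputed for every permutation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    observed = abs(welch_t(x, y))

    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        stat = abs(welch_t(perm[: x.size], perm[x.size:]))
        if stat >= observed:
            count += 1

    # Add-one correction keeps the p-value strictly positive.
    return (count + 1) / (n_perm + 1)
```

With this estimator the smallest attainable p-value is 1 / (n_perm + 1), so testing at the 0.5% level requires well over 200 permutations; the default above is chosen with that in mind. Degenerate permutations with zero variance in both groups are not handled in this sketch.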