Abstract:
Scatterplots are the most common way for data analysts to visually detect relationships between measured variables. At the same time, and despite controversy, P-values remain the most commonly used measure to statistically justify relationships identified between variables. Here we measure the ability to detect statistically significant relationships from scatterplots in a randomized trial of 2,039 students in a statistics massive open online course (MOOC). Each subject was shown a random set of scatterplots and asked to visually determine if the underlying relationships were statistically significant at the P < 0.05 level. Subjects correctly classified only 47.4% (95% CI [45.1%-49.7%]) of statistically significant relationships, and 74.6% (95% CI [72.5%-76.6%]) of non-significant relationships. Classification accuracy in some scenarios improved on repeat attempts of the survey. Our results suggest that data analysts have incorrect intuition about what statistically significant relationships look like, particularly for small effects. We propose that evidence-based data analysis can be used to identify weaknesses in theoretical procedures in the hands of average users.
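As an illustration of the classification task described above, the following minimal sketch simulates a scatterplot and checks whether the relationship is statistically significant at the P < 0.05 level. The abstract does not say which test the study used, so this sketch assumes a two-sided permutation test on the Pearson correlation. The function names (`pearson_r`, `permutation_p_value`, `simulate_scatter`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation P-value for the null of no association.

    Assumption: a permutation test stands in for whatever test the
    study applied; it is not taken from the abstract.
    """
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any pairing between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def simulate_scatter(n, slope, seed):
    """Hypothetical data generator: y = slope * x + standard normal noise."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    ys = [slope * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


if __name__ == "__main__":
    # A small effect of the kind the abstract says is hard to judge by eye.
    xs, ys = simulate_scatter(n=100, slope=0.2, seed=1)
    p = permutation_p_value(xs, ys)
    print(f"r = {pearson_r(xs, ys):.3f}, P = {p:.4f}, significant: {p < 0.05}")
```

Plotting `xs` against `ys` for a range of slopes and sample sizes, and comparing a visual guess with the computed P-value, gives a rough feel for the task the subjects faced.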
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.