Abstract:
|
Standard multivariate techniques have a breakdown point of zero: a single corrupted data point can spoil the estimates. Estimator consistency results are typically fragile in this sense. Outliers can be difficult to detect in multivariate data, especially in large data sets. We present two new methods for multivariate outlier identification and for robust estimation of multivariate location and dispersion. We investigate the performance of these methods via simulation and application to real-world data. The evidence indicates that our methods perform on a par with, or better than, two of the best currently available methods, and that they work well on benchmark data sets. We re-examine prominent economic studies and show that key results can be sensitive to a small percentage of atypical cases in the data. This is valuable information: if atypical data drive a result, that should prompt an investigation of those atypical data, call into question the generality of the result, and potentially lead to richer theory. As the current tools are both feasible and accurate, we suggest that every empirical researcher check whether his or her results are robust.
|