Abstract:
|
For dimension reduction procedures such as Sliced Inverse Regression (SIR), a worst possible case of data contamination can be defined as one that produces an estimated subspace maximally distant from the true dimension reduction subspace, that is, an estimated subspace as orthogonal as possible to the true subspace. To formalize the concept of maximal distance between subspaces, we introduce a metric on subspaces. With distances between dimension reduction subspaces defined by this metric, worst case results for data contamination can be formulated to define a finite sample breakdown point as a measure of global robustness. We present the finite sample breakdown point for SIR and show that the result depends intricately on a combination of factors, such as the dimension of the regressor space, the dimension of the true effective dimension reduction (e.d.r.) subspace, and whether that dimension is known or requires estimation. The analysis is further complicated by the fact that the most disruptive directions of contamination differ depending on whether the regressor covariance structure is known or unknown. Our theoretical findings are illustrated through simulation.
|