Abstract:
|
In multivariate statistics, many parametric families of random vectors, for example location-scale families, are uniquely described by their mean vector and covariance matrix. In practice, however, this idealized assumption is often violated: the data may come from some other distribution, for example one that is multi-modal or not radially symmetric, and may thus differ significantly from the assumed model. Robust estimators of the mean vector and covariance matrix, such as the minimum volume ellipsoid (MVE) or minimum covariance determinant (MCD) estimators, are characterized by being "stable" under certain perturbations of the distribution. We present a novel data-driven robust estimator based on non-parametric multivariate density estimation and modal association clustering. To counter possible oversmoothing, a reweighted version of Rousseeuw's minimum covariance determinant method is described and used to select the bandwidth matrix.
|