Abstract:

Fitting a model to a collection of observations is one of the quintessential questions in statistics. The standard assumption is that the data was generated by a model of a given type (e.g., a mixture model). This assumption is at best only approximately valid, as real datasets are typically exposed to some source of contamination. Hence, any estimator designed for a particular model must also be robust in the presence of corrupted data. This is the prototypical goal in robust statistics, a field that took shape in the 1960s with the pioneering works of Tukey and Huber. Until recently, even for the basic problem of robustly estimating the mean of a high-dimensional dataset, all known robust estimators were hard to compute. Moreover, the quality of the common heuristics degrades badly as the dimension increases.
In this talk, we will survey the recent progress in algorithmic high-dimensional robust statistics. We will describe the first computationally efficient algorithms for robust mean and covariance estimation and the main insights behind them. We will also present practical applications of these estimators to exploratory data analysis and adversarial machine learning.
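As a small illustration of the phenomenon the abstract alludes to (not an algorithm from the talk itself), the following sketch contaminates an eps-fraction of a high-dimensional Gaussian sample and compares the sample mean against the coordinate-wise median, a common heuristic. The specific parameters (dimension, corruption fraction, outlier location) are arbitrary choices for the demonstration: the sample mean can be pulled arbitrarily far by the outliers, while the coordinate-wise median incurs a per-coordinate bias that accumulates across coordinates, so its l2 error still grows with the square root of the dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, eps = 200, 2000, 0.1          # dimension, sample size, corruption fraction
true_mean = np.zeros(d)

# Inliers: standard Gaussian samples centered at the true mean.
X = rng.standard_normal((n, d))

# Adversarial outliers: an eps-fraction of points placed far away,
# at the point (10, ..., 10).
k = int(eps * n)
X[:k] = 10.0

# Sample mean: shifted by roughly eps * 10 in every coordinate,
# so its l2 error is on the order of eps * 10 * sqrt(d).
err_mean = np.linalg.norm(X.mean(axis=0) - true_mean)

# Coordinate-wise median: a small O(eps) bias per coordinate,
# which still adds up to an l2 error growing like sqrt(d).
err_med = np.linalg.norm(np.median(X, axis=0) - true_mean)

print(f"sample mean error:        {err_mean:.2f}")
print(f"coordinate-median error:  {err_med:.2f}")
```

With these parameters the sample mean's error is far larger than the median's, yet the median's error is itself well above the dimension-independent O(eps) guarantee that the efficient algorithms surveyed in the talk achieve.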
