A common assumption in the training of machine learning systems is that the data contain few or no outliers, or that the data distribution does not have very long tails, an assumption that is increasingly indefensible. The key question, then, is how to perform estimation that is robust to departures from these assumptions. This question is of classical interest, and, loosely speaking, there has seemed to be a computation-robustness tradeoff: practical estimators did not have strong robustness guarantees, while estimators with strong robustness guarantees were computationally impractical.
In our work, we provide a new class of computationally efficient estimators for risk minimization that are provably robust in a variety of settings, including arbitrary oblivious contamination and heavy-tailed data. Our workhorse is a novel robust variant of gradient descent, and we provide conditions under which this variant yields accurate and robust estimators for any general convex risk minimization problem. These results provide some of the first computationally tractable, provably robust estimators for general statistical models.
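To make the idea concrete, the sketch below shows the general shape of such a robust gradient descent loop for least-squares risk minimization: at each step the per-sample gradients are combined with a robust aggregator rather than the plain empirical mean. The coordinate-wise trimmed mean used here is one simple illustrative aggregator, chosen for brevity; it is an assumption of this sketch and not necessarily the robust gradient estimator developed in the work itself.

```python
import numpy as np

def trimmed_mean(grads, trim_frac=0.1):
    """Coordinate-wise trimmed mean of per-sample gradients.

    In each coordinate, drop the largest and smallest trim_frac
    fraction of values before averaging, so a small fraction of
    wildly corrupted gradients cannot drag the aggregate around.
    """
    n = grads.shape[0]
    k = int(n * trim_frac)
    sorted_g = np.sort(grads, axis=0)
    return sorted_g[k:n - k].mean(axis=0)

def robust_gd(X, y, steps=500, lr=0.1, trim_frac=0.1):
    """Gradient descent on the squared loss with a robust
    gradient estimate in place of the empirical mean gradient."""
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        residual = X @ theta - y        # shape (n,)
        grads = residual[:, None] * X   # per-sample gradients, shape (n, d)
        theta -= lr * trimmed_mean(grads, trim_frac)
    return theta
```

With a small fraction of grossly corrupted labels, the trimmed-mean aggregate discards the corrupted per-sample gradients each iteration, so the iterates still converge near the true parameter, whereas ordinary gradient descent (equivalently, ordinary least squares) would be pulled far off by the same contamination.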