Machine learning is at the forefront of many recent advances in science and technology, enabled in part by the introduction of increasingly sophisticated models and algorithms. However, as a consequence of this complexity, machine learning models essentially act as black boxes as far as users are concerned, making it incredibly difficult to understand, predict, or "trust" their behavior.
In this talk, I will describe our research on approaches that explain the predictions of any classifier in an interpretable and faithful manner. In particular, these methods identify the relationship between the components of the input instance and the classifier's prediction. I will cover various ways in which we summarize this relationship (as linear weights, as precise rules, and as counter-examples) and present experiments that contrast them and evaluate their utility in understanding and debugging black-box machine learning algorithms.
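
As a rough illustration of the first kind of summary, linear weights, here is a minimal sketch that treats a classifier purely as a function and estimates one weight per input component by removing components and watching the prediction change. It is not the method presented in the talk. The names `toy_classifier` and `component_weights` are hypothetical, the classifier is a hand-written stand-in, and enumerating every mask is only feasible for very short inputs.

```python
from itertools import product


def toy_classifier(tokens):
    """Hypothetical black box: returns a 'positive' score for a list of tokens."""
    score = 0.5
    if "great" in tokens:
        score += 0.4
    if "not" in tokens:
        score -= 0.3
    return score


def component_weights(predict, tokens):
    """Estimate one weight per token using only calls to predict().

    Every keep/drop combination of the tokens is tried. A token's weight is
    the mean prediction when it is kept minus the mean prediction when it is
    dropped. (Assumption: the input is short enough to enumerate 2**n masks.)
    """
    n = len(tokens)
    kept_sum = [0.0] * n
    dropped_sum = [0.0] * n
    masks = list(product([0, 1], repeat=n))
    for mask in masks:
        kept_tokens = [t for t, m in zip(tokens, mask) if m]
        y = predict(kept_tokens)
        for i, m in enumerate(mask):
            if m:
                kept_sum[i] += y
            else:
                dropped_sum[i] += y
    half = len(masks) / 2  # each token is kept in exactly half of the masks
    return [(t, kept_sum[i] / half - dropped_sum[i] / half)
            for i, t in enumerate(tokens)]


weights = component_weights(toy_classifier, ["not", "a", "great", "movie"])
for token, weight in weights:
    print(f"{token:>6}: {weight:+.2f}")
```

In this toy case the weights single out "great" as pushing the score up and "not" as pushing it down, while "a" and "movie" receive a weight of zero. The explanation is built without inspecting the classifier's internals, which is the sense in which such an approach can apply to any classifier.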