Online Program


All Times ET

Thursday, June 3
Data Visualization
Visual Analytics
Thu, Jun 3, 1:10 PM - 2:45 PM
TBD
 

Validating Visual Inference Methods by Use of Deep Learning (309708)

Claus Thorn Ekstrøm, Biostatistics, University of Copenhagen 
*Anne Helby Petersen, University of Copenhagen 

Keywords: visual inference, neural network, model diagnostics, deep learning, validation

When does inspecting a graphical plot allow an investigator to reach the right statistical conclusion? Visual inference is commonly used for various tasks in statistics, including model diagnostics and exploratory data analysis. Although it is attractive due to its intuitive nature, the lack of available methods for validating plots is a major drawback.

We propose a new validation method for visual inference. Our method trains deep neural networks to distinguish between plots simulated under two different data generating mechanisms (null and alternative). We report the classification accuracy as a technical validation score (TVS). The TVS measures the information content in the plots, and TVS values can be used to compare different plots or different data generating mechanisms, thereby providing a meaningful scale against which new visual inference procedures can be validated. We consider this a necessary criterion for validity: if the neural network cannot be trained to classify the plots well, we claim that it is unlikely that humans will be able to do so. On the other hand, if the neural network can classify the plots sufficiently well, then the plots do hold the information necessary for the task at hand, and hence it may also be possible for humans to learn to classify them correctly.
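
To make the scheme concrete, the sketch below illustrates the general idea rather than the authors' implementation. It simulates residual-plot style images under a null mechanism (a correctly specified linear model) and an alternative (an omitted quadratic term), rasterizes each plot, and trains a small convolutional network whose held-out accuracy plays the role of the TVS. The simulation settings, the 2D-histogram rasterization used as a stand-in for plot images, the network architecture, and the helper names are all assumptions made for this example.

```python
# Minimal sketch of the TVS idea (not the authors' implementation).
# Assumed for illustration: a linear regression residual plot, a quadratic
# misspecification as the alternative, a 2D-histogram raster as the "plot
# image", and a small Keras CNN whose held-out accuracy serves as the TVS.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)

def residual_plot_image(n, misspecified, bins=32):
    """Simulate one data set, fit a straight line, and rasterize the
    (fitted value, residual) scatter as a bins x bins binary image."""
    x = rng.uniform(-1, 1, n)
    y = 1 + 2 * x + rng.normal(scale=0.5, size=n)
    if misspecified:
        y += 1.5 * x**2                      # omitted quadratic term (alternative)
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    img, _, _ = np.histogram2d(fitted, resid, bins=bins,
                               range=[[-2, 4], [-3, 3]])
    return (img > 0).astype("float32")       # binary "ink" raster of the plot

def make_data(n_plots, n_points):
    """Generate labelled plot images: label 0 = null, 1 = alternative."""
    imgs, labels = [], []
    for label in (0, 1):
        for _ in range(n_plots):
            imgs.append(residual_plot_image(n_points, misspecified=bool(label)))
            labels.append(label)
    X = np.stack(imgs)[..., None]            # add channel axis for the CNN
    y = np.array(labels, dtype="float32")
    perm = rng.permutation(len(y))
    return X[perm], y[perm]

def technical_validation_score(n_points, n_train=2000, n_test=500):
    """Train a small CNN to separate null from alternative plots and return
    its held-out accuracy, used here as the technical validation score."""
    X_tr, y_tr = make_data(n_train, n_points)
    X_te, y_te = make_data(n_test, n_points)
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=X_tr.shape[1:]),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(X_tr, y_tr, epochs=5, batch_size=64, verbose=0)
    _, accuracy = model.evaluate(X_te, y_te, verbose=0)
    return accuracy

if __name__ == "__main__":
    print("TVS at n = 100:", technical_validation_score(100))
```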

We apply the method to three popular diagnostic plots for linear regression, namely the scatter plot, the quantile-quantile plot and the residual plot. We consider various types and degrees of misspecification, as well as different within-plot sample sizes. Our method produces TVSs that increase with increasing sample size and decrease with increasing difficulty, and hence the TVS is a meaningful measure of validity. Our application also provides an example of how the validation method can be calibrated by applying it to well-known use cases, and we therefore believe that it will be easy to implement in new applications.
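
Continuing the illustrative sketch above (with the same assumptions and the hypothetical technical_validation_score helper), the TVS can be tabulated across within-plot sample sizes to check that it grows as the plots become easier to read:

```python
# Tabulate the sketch's TVS over within-plot sample sizes; larger samples
# should make the null/alternative distinction easier and push the TVS up.
for n_points in (10, 25, 50, 100, 250):
    tvs = technical_validation_score(n_points)
    print(f"n = {n_points:4d}  TVS = {tvs:.3f}")
```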