Assessment of precision in companion diagnostics: past, present, and future methods
Stephanie Sanchez, Roche Tissue Diagnostics  *Crystal Schemp, Roche Tissue Diagnostics 

Keywords: tissue diagnostics, imprecision, variance estimation, Gini index, average agreement, optimal prediction error

Both accuracy and precision of the diagnostics used in precision medicine are essential to delivering the right drug to the right patient. As such, characterization of the variance sources associated with the diagnostic assessment is critical. Variance estimation methods are well described for continuously measured signals. Yet in tissue diagnostics, measurement limitations of the diagnostic device output often preclude using a continuous scale. Ultimately, it is the precision of the final binary diagnostic assessment that must be demonstrated.

No consensus exists on the best method for reporting precision of the binary diagnostic assessment in a way that strictly measures imprecision while remaining intuitive and accessible to the end user. Guidance available to industry suggests several methods for assessing precision with respect to a single variance source, such as average agreement, kappa, or Fisher’s exact test of independence. These methods must be extended, however, to assess the reproducibility of the diagnostic assessment across multiple sources of variability such as laboratory, operator, and instrument. Although generalized linear mixed models can be used to estimate variance components, the results are often not easily interpreted. Thus, extensions to the simpler methods have been proposed to address variance estimation across multiple levels and in multi-factor settings.

A review of published Summary of Safety and Effectiveness Data supporting IVD pre-market approvals showed that a number of reproducibility claims were based on inter-laboratory reproducibility, assessed by averaging multiple pairwise comparisons across sites, days, and observers. The Gini index has also been used as a variance measure to report total imprecision; this method could be extended to assess individual variance components by calculating the proportional reduction in variance. As methods continue to be adapted to this setting, the way precision is reported continues to change. Most recently, reports have included measures of error calculated by comparison to the Optimal Prediction (i.e., the most frequently occurring diagnostic assessment). Here we use simulation studies to examine the performance of average agreement rates, the Gini index, Optimal Prediction Error, and an adjustment to Optimal Prediction Error for assessing the imprecision of the binary diagnostic assessment.
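
To make the measures named above concrete, the following Python sketch computes one plausible version of each for the replicate binary calls on a single specimen. The abstract does not give formulas, so the definitions here are assumptions: pairwise agreement is taken as the fraction of replicate pairs with identical calls, the Gini index as one minus the sum of squared call proportions, and Optimal Prediction Error as the fraction of calls that differ from the most frequent call. The function names are hypothetical, and the adjustment to Optimal Prediction Error is not sketched because it is not described here.

```python
from collections import Counter
from itertools import combinations


def pairwise_agreement(calls):
    """Fraction of unordered replicate pairs whose calls are identical.

    Assumed definition; requires at least two replicate calls.
    """
    pairs = list(combinations(calls, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


def gini_index(calls):
    """Gini index as 1 - sum of squared call proportions (assumed definition).

    For binary calls with positive proportion p this equals 2 * p * (1 - p).
    """
    n = len(calls)
    return 1.0 - sum((count / n) ** 2 for count in Counter(calls).values())


def optimal_prediction_error(calls):
    """Fraction of calls that differ from the most frequent call.

    Assumed definition, with the modal call standing in for the
    Optimal Prediction.
    """
    n = len(calls)
    modal_count = Counter(calls).most_common(1)[0][1]
    return 1.0 - modal_count / n


def mean_over_specimens(measure, specimens):
    """Average a per-specimen measure over a dict of specimen -> calls."""
    values = [measure(calls) for calls in specimens.values()]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical data: 1 = positive call, 0 = negative call.
    specimens = {
        "specimen_A": [1, 1, 1, 0],
        "specimen_B": [0, 0, 0, 0],
    }
    for name, measure in [
        ("pairwise agreement", pairwise_agreement),
        ("Gini index", gini_index),
        ("optimal prediction error", optimal_prediction_error),
    ]:
        print(name, mean_over_specimens(measure, specimens))
```

Under these assumed definitions, a specimen with three positive calls and one negative call has a pairwise agreement of 0.5, a Gini index of 0.375, and an Optimal Prediction Error of 0.25, which illustrates how the three measures can give different impressions of the same data.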