In this talk, we present our latest findings on new statistical approaches for identifying various types of interpretable feature representations that are prognostically informative in classifying complex diseases. Identifying key features and the regulatory relationships among them that underlie biological processes is a fundamental objective of much biological research; this includes the study of human disease, with direct and important implications for the development of targeted therapeutics.
We present new ways to visualise valuable information from the thousands of resamples generated by modern selection methods that use repeated subsampling to identify which features best predict disease progression. We show that subtractive lack-of-fit measures scale well to high-dimensional settings, making aspects of exhaustive procedures available without their computational cost. We also show that these measures provide estimates of feature importance which, when bootstrapped, allow pairwise feature comparisons to both improve the initial ranking and distinguish highly significant features that consistently rank higher than a subset of other features.
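
One way to read the bootstrapped pairwise comparison idea is sketched below in Python. It is an illustration under stated assumptions rather than the method presented in the talk: the lack-of-fit measure is taken to be the increase in residual sum of squares when a single feature is dropped from an ordinary least-squares fit without an intercept, the data are simulated, and the function names (`rss`, `subtractive_importance`, `bootstrap_pairwise`) are hypothetical.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of a least-squares fit of y on X (no intercept)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def subtractive_importance(X, y):
    """Assumed lack-of-fit measure: RSS increase when each column is dropped."""
    full = rss(X, y)
    p = X.shape[1]
    return np.array([rss(np.delete(X, j, axis=1), y) - full for j in range(p)])


def bootstrap_pairwise(X, y, n_boot=100, seed=0):
    """Bootstrap the importance scores and compare features pairwise.

    Returns the (n_boot, p) score matrix and a (p, p) matrix whose entry
    [i, j] is the fraction of resamples in which feature i outscores feature j.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    scores = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        scores[b] = subtractive_importance(X[idx], y[idx])
    wins = (scores[:, :, None] > scores[:, None, :]).mean(axis=0)
    return scores, wins


# Simulated example (illustrative only): features 0 and 1 carry signal,
# features 2 to 5 are pure noise.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 6))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.standard_normal(200)

scores, wins = bootstrap_pairwise(X, y, n_boot=100, seed=0)

# Rank features by their average win rate against all other features.
ranking = np.argsort(-wins.mean(axis=1))
print("ranking:", ranking)
print("win rate of feature 0 over feature 5:", wins[0, 5])
```

In this sketch, a feature whose row of `wins` is close to 1 against a group of other features ranks above that group in almost every resample, which is one possible way to make "consistently ranks higher" concrete.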