Exploratory subgroup analysis: Subgroup identification approaches in clinical trials
Alex Dmitrienko, Quintiles; *Ilya Lipkovich, Quintiles

Keywords: subgroup analysis, machine learning, exploratory analysis, biomarker identification

A vast literature has been generated in medical and statistical journals over the last 15 years concerning subgroup analysis methodology and the assessment of the validity and credibility of subgroup analysis methods for clinical trial data. We see a shift from presenting checklists of “good practices” for subgroup analysis to developing more aggressive subgroup/biomarker identification strategies under the umbrella of “individualized medicine/tailored therapeutics”. This presentation will provide a structured overview of several recent subgroup identification methods that originated in the data mining and machine learning fields. We discuss some challenges of applying data mining methodology to subgroup identification in the context of clinical data and present a recursive partitioning procedure for subgroup identification, SIDEScreen (Lipkovich and Dmitrienko, 2014), which is an extension of the SIDES procedure (Lipkovich et al., 2011). SIDEScreen is an ensemble method based on recursive partitioning. It differs from many other applications of data mining/machine learning to subgroup analysis in that it allows for explicit control of the overall Type I error rate. We will discuss its key elements: (1) generation of multiple promising subgroups based on different splitting criteria, (2) choice of optimal values of complexity parameters via cross-validation, (3) evaluation of variable importance and use of variable importance indices for pre-screening covariates, and (4) addressing Type I error rate inflation using a resampling-based method. The SIDEScreen procedure will be illustrated using a clinical trial example.
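
To make element (4) concrete, the following is a minimal sketch of one generic resampling-based adjustment: a permutation test on the largest subgroup treatment-effect statistic. It is not the SIDES or SIDEScreen algorithm, and it is not taken from the cited papers. The function names (`simulate_trial`, `max_subgroup_z`, `adjusted_p_value`), the simulated data, the restriction to binary covariates, and the one-level subgroup search are all assumptions made for illustration.

```python
import numpy as np


def simulate_trial(n, effect, seed):
    """Hypothetical data: the treatment helps only patients with X[:, 0] == 1."""
    rng = np.random.default_rng(seed)
    treat = rng.integers(0, 2, n)        # 1 = treatment, 0 = control
    X = rng.integers(0, 2, (n, 4))       # four binary candidate biomarkers
    y = rng.normal(size=n) + effect * treat * (X[:, 0] == 1)
    return y, treat, X


def max_subgroup_z(y, treat, X):
    """Largest treatment-vs-control z-statistic over all one-level subgroups."""
    best = -np.inf
    for j in range(X.shape[1]):
        for level in (0, 1):
            in_sub = X[:, j] == level
            y1 = y[in_sub & (treat == 1)]
            y0 = y[in_sub & (treat == 0)]
            if len(y1) < 2 or len(y0) < 2:
                continue                 # too few patients to estimate a variance
            se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
            if se > 0:
                best = max(best, (y1.mean() - y0.mean()) / se)
    return best


def adjusted_p_value(y, treat, X, n_perm=500, seed=0):
    """Permutation p-value for the best subgroup, accounting for the search.

    Treatment labels are shuffled to mimic the null of no treatment effect,
    and the full subgroup search is repeated on every shuffled data set.
    """
    rng = np.random.default_rng(seed)
    observed = max_subgroup_z(y, treat, X)
    null_max = np.array(
        [max_subgroup_z(y, rng.permutation(treat), X) for _ in range(n_perm)]
    )
    return (1 + np.sum(null_max >= observed)) / (n_perm + 1)


if __name__ == "__main__":
    y, treat, X = simulate_trial(n=400, effect=1.5, seed=1)
    print("adjusted p-value:", adjusted_p_value(y, treat, X, n_perm=200, seed=2))
```

Because the search is rerun on each permuted data set, the reference distribution reflects the selection of the best subgroup rather than a single prespecified comparison, which is the general idea behind controlling the overall Type I error rate after a data-driven search.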