Keywords: Precision medicine, Safety monitoring, Predictive modeling, Feature selection, Multiple testing
Recent technological advances have made it feasible to routinely measure a large number of biomarkers in clinical trials. Motivated by several real-life examples in oncology, we consider the problem of developing a monitoring algorithm, based on longitudinally collected biomarkers, that can be used to assess patients' short-term risk of developing an adverse event on an ongoing basis. The complex, high-dimensional longitudinal data structure, however, raises a number of statistical challenges. First, sample collection in clinical trials typically occurs more frequently (e.g., biweekly) early in the study and becomes quite sparse over time (e.g., quarterly). As a result, the risk of an event is expected to vary as a function of the time elapsed since the last biomarker measurement. Second, the data are high-dimensional with a small sample size (small n), which calls for feature selection together with multiplicity-corrected p-values. Although an extensive literature exists on the joint modeling of longitudinal biomarkers and time-to-event outcomes, such models do not provide a straightforward framework for devising a monitoring algorithm. In this proposal, we treat the problem as one of direct classification: we model the risk that a patient develops an adverse event within a given period of time after a biomarker sample is collected. This direct classification approach introduces additional challenges, because the resulting binary outcomes are highly imbalanced and outcomes from the same patient may be correlated. We present a novel analytical framework that addresses these issues by using the decision-tree-based GUIDE algorithm for feature extraction and Duchon logistic splines to build a prediction algorithm from the selected features. We also discuss multiplicity correction in feature selection and the assessment of model performance with this kind of outcome data.
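To make the direct classification framing concrete, the sketch below shows one plausible way to turn a patient's biomarker sampling times and adverse-event time into one binary outcome per sample: the label is 1 if the event occurs within a fixed window after that sample. It also records the gap since the previous sample, since the abstract notes that risk may depend on when the last measurement was taken. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper name `make_window_labels`, the `window` and `followup_end` parameters, and the rule of dropping samples whose window is not fully observed are all hypothetical choices. The GUIDE feature extraction and Duchon spline fitting steps are not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    patient_id: str
    sample_time: float               # time of biomarker sample (e.g., days from study start)
    gap_since_prev: Optional[float]  # time since previous sample; None for the first sample
    label: int                       # 1 if the adverse event falls in (sample_time, sample_time + window]


def make_window_labels(patient_id: str,
                       sample_times: List[float],
                       event_time: Optional[float],
                       window: float,
                       followup_end: float) -> List[Record]:
    """Hypothetical helper: build one binary outcome per biomarker sample.

    Assumptions (not from the source): samples taken at or after the event
    are discarded, and event-free samples whose window extends past the end
    of follow-up are dropped because their label is not fully observed.
    """
    records: List[Record] = []
    prev: Optional[float] = None
    for t in sorted(sample_times):
        if event_time is not None and t >= event_time:
            break  # patient is no longer at risk after the event
        # Here t < event_time, so the event lies in the window iff event_time <= t + window.
        label = int(event_time is not None and event_time <= t + window)
        gap = None if prev is None else t - prev
        prev = t  # the gap always refers to the actual previous sample
        if label == 0 and t + window > followup_end:
            continue  # window censored: outcome unknown, so skip this sample
        records.append(Record(patient_id, t, gap, label))
    return records


if __name__ == "__main__":
    recs = make_window_labels("P01", [0, 14, 28, 56, 140],
                              event_time=60, window=30, followup_end=60)
    for r in recs:
        print(r)
```

Even in this toy example only one of the four retained samples is positive, which illustrates the class imbalance the abstract mentions. Because all rows share `patient_id`, any cross-validation or performance assessment would presumably need to split by patient rather than by row to respect within-patient correlation.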