Industry Demands and Statistical Perspective for Drug Biomarker Development
Jie Cheng, GSK; *Kwan Lee, GSK

Keywords: biomarker discovery, genomic data analysis

We will discuss three important statistical questions that routinely arise in industrial biomarker discovery: (i) How do we select features for predictive modeling in clinical settings? (ii) How do we estimate model performance? (iii) How do we select additional features for further analysis? To answer the first question, we will examine the normality assumption commonly made about genomic data and then briefly present our approach, a grid search over a pair of statistics, which often yields models consisting of a small number of features with large fold changes. For the second question, we advocate robust cross-validation techniques, including nested cross-validation. To answer the third question, we will propose a procedure that iteratively identifies important features and excludes them from subsequent runs until no further good features can be found.
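A minimal, hypothetical sketch of how the three steps could fit together in plain Python. The abstract does not name the pair of statistics, so this sketch assumes log2 fold change and a Welch t-statistic. The threshold grid, the nearest-centroid classifier, the fold counts, the per-round feature cap, and the synthetic data are also illustrative assumptions, not the authors' implementation. Feature selection runs inside each training fold, so the outer folds give an honest performance estimate.

```python
import random
import statistics

# Hypothetical grid over the assumed pair: (min |log2 fold change|, min |Welch t|).
GRID = [(fc, t) for fc in (0.5, 1.0, 1.5) for t in (2.0, 3.0, 4.0)]

def make_data(n_per_class=20, n_features=50, n_signal=5, shift=1.5, seed=0):
    """Synthetic log2-scale data; only the first n_signal features differ by class."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift if label == 1 and j < n_signal else 0.0, 1.0)
                      for j in range(n_features)])
            y.append(label)
    return X, y

def feature_stats(X, y, j):
    """(log2 fold change, Welch t-statistic) of feature j, class 1 vs class 0."""
    a = [r[j] for r, lab in zip(X, y) if lab == 1]
    b = [r[j] for r, lab in zip(X, y) if lab == 0]
    fc = statistics.mean(a) - statistics.mean(b)  # difference of log2 means
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return fc, fc / se

def select(X, y, min_fc, min_t, max_feats=5, exclude=()):
    """Features passing both thresholds, strongest |t| first, capped at max_feats."""
    scored = []
    for j in range(len(X[0])):
        if j in exclude:
            continue
        fc, t = feature_stats(X, y, j)
        if abs(fc) >= min_fc and abs(t) >= min_t:
            scored.append((abs(t), j))
    scored.sort(reverse=True)
    return [j for _, j in scored[:max_feats]]

def centroid_accuracy(Xtr, ytr, Xte, yte, feats):
    """Nearest-centroid classifier on the selected features; 0.0 if none selected."""
    if not feats:
        return 0.0
    cents = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(Xtr, ytr) if l == lab]
        cents[lab] = [statistics.mean(r[j] for r in rows) for j in feats]
    correct = 0
    for r, lab in zip(Xte, yte):
        dist = {c: sum((r[j] - m) ** 2 for j, m in zip(feats, cents[c])) for c in cents}
        correct += min(dist, key=dist.get) == lab
    return correct / len(yte)

def stratified_splits(y, k, seed):
    """Yield (train_idx, test_idx) with each class spread evenly over k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for lab in sorted(set(y)):
        idx = [i for i, l in enumerate(y) if l == lab]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    for test in folds:
        held = set(test)
        yield [i for i in range(len(y)) if i not in held], test

def split_score(X, y, train, test, params):
    """Select features on the training part only, then score on the test part."""
    Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
    feats = select(Xtr, ytr, *params)
    return centroid_accuracy(Xtr, ytr, [X[i] for i in test], [y[i] for i in test], feats)

def nested_cv(X, y, outer_k=5, inner_k=4, seed=0):
    """Outer folds estimate accuracy; inner folds grid-search the threshold pair."""
    outer = []
    for tr, te in stratified_splits(y, outer_k, seed):
        Xin, yin = [X[i] for i in tr], [y[i] for i in tr]

        def inner_cv(params):
            return statistics.mean(split_score(Xin, yin, a, b, params)
                                   for a, b in stratified_splits(yin, inner_k, seed + 1))

        best = max(GRID, key=inner_cv)
        outer.append(split_score(X, y, tr, te, best))
    return statistics.mean(outer)

def iterative_discovery(X, y, params=(1.0, 3.0), per_round=2, max_rounds=20):
    """Pick a small feature set, exclude it from later rounds, stop when none pass."""
    removed, rounds = set(), []
    for _ in range(max_rounds):
        found = select(X, y, *params, max_feats=per_round, exclude=removed)
        if not found:  # no remaining feature passes both thresholds
            break
        rounds.append(found)
        removed.update(found)
    return rounds

X, y = make_data()
print("nested CV accuracy:", round(nested_cv(X, y), 3))
print("features found per round:", iterative_discovery(X, y))
```

For simplicity the iterative procedure here reuses fixed thresholds; a fuller version would rerun the whole grid search in each round. Because the Welch t-statistic leans on approximate normality, a rank-based statistic could be swapped in where that assumption is in doubt.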