Abstract:
|
In the era of precision medicine, building reliable, interpretable, and accurate predictive models from the vast amount of patients’ clinical and demographic information and biomarker data is key to disease prevention, early diagnosis, and targeted therapy. When predictors outnumber the sample size, regularization methods or variable screening approaches are widely used to reduce the dimension of the data. However, these methods either generate biased estimates and incur computational challenges or do not directly generate a predictive model. We propose an automatic, data-driven sequential estimation method that extends forward regression to ultrahigh-dimensional settings. We show that our method possesses model selection consistency and produces unbiased estimates of the regression parameters, which allows the effect sizes of the chosen predictors to be gauged accurately.
|