Advances in next-generation sequencing (NGS) technologies provide exciting opportunities to study disease etiology and pathology. One challenge in developing NGS-based signatures is determining an appropriate sample size.
To handle the large number of candidate biomarkers, penalized regression methods such as the Lasso are often used for biomarker identification. The ideal variable selection procedure would search for the best subset of predictors, which is equivalent to imposing an L0 penalty on the regression coefficients, but this problem is known to be NP-hard. We proposed an efficient augmented and penalized minimization L0 (APM-L0) method to solve the L0-penalized variable selection problem.
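To make the contrast between the two penalties concrete, the display below writes both objectives side by side. It is only an illustrative sketch: it assumes a linear model with squared-error loss, and the symbols (response y, design matrix X with n rows and p columns, coefficient vector \beta, tuning parameter \lambda) are notation introduced here, not taken from the text above. It does not describe the APM-L0 algorithm itself.

```latex
% Best-subset (L0-penalized) estimator: the penalty counts nonzero coefficients.
\hat{\beta}_{L_0} = \arg\min_{\beta \in \mathbb{R}^p}
  \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_0,
\qquad
\lVert \beta \rVert_0 = \#\{\, j : \beta_j \neq 0 \,\}

% Lasso (L1-penalized) estimator: a convex surrogate for the count penalty.
\hat{\beta}_{L_1} = \arg\min_{\beta \in \mathbb{R}^p}
  \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_1,
\qquad
\lVert \beta \rVert_1 = \sum_{j=1}^{p} \lvert \beta_j \rvert
```

The L0 objective is non-convex and combinatorial, since minimizing it exactly amounts to a search over subsets of predictors, whereas the L1 objective is convex and can be minimized efficiently.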
Sample size calculation plays an important role in study planning but has not been well studied, especially for biomarker signature discovery using regularization methods. Our objective is to use simulation studies to quantify the impact of sample size on the performance of various penalized methods, as measured by prediction error and variable selection performance. Our proposed variable selection method can be applied to clinical studies for disease diagnosis and enrichment design.
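A simulation study of the kind described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the study design or the APM-L0 method: it uses a plain Lasso fitted by coordinate descent as a stand-in penalized method, and every setting (Gaussian predictors, `p=30` candidates of which `k=5` are truly active, unit noise, fixed `lam=0.1`, the sample sizes tried) is hypothetical. The function names `soft_threshold`, `lasso_cd`, and `simulate` are introduced here for illustration only.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()  # residual y - X @ beta, with beta = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]       # drop predictor j from the fit
            rho = X[:, j] @ resid / n        # its correlation with the partial residual
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]       # add predictor j back
    return beta

def simulate(n, p=30, k=5, lam=0.1, n_rep=10, seed=0):
    """Average test MSE, true positive rate, and false discovery rate at sample size n.

    All settings are hypothetical illustration values, not taken from the source.
    """
    rng = np.random.default_rng(seed)
    beta_true = np.zeros(p)
    beta_true[:k] = 1.0                      # first k predictors are the true signal
    mse, tpr, fdr = [], [], []
    for _ in range(n_rep):
        X = rng.standard_normal((n, p))
        y = X @ beta_true + rng.standard_normal(n)
        beta_hat = lasso_cd(X, y, lam)
        # Prediction error on an independent test set.
        X_test = rng.standard_normal((1000, p))
        y_test = X_test @ beta_true + rng.standard_normal(1000)
        mse.append(np.mean((y_test - X_test @ beta_hat) ** 2))
        # Selection performance against the known truth.
        selected = beta_hat != 0
        n_sel = int(selected.sum())
        tp = int(selected[:k].sum())
        tpr.append(tp / k)
        fdr.append((n_sel - tp) / max(n_sel, 1))
    return float(np.mean(mse)), float(np.mean(tpr)), float(np.mean(fdr))

results = {n: simulate(n) for n in (50, 100, 200, 400)}
for n, (mse, tpr, fdr) in results.items():
    print(f"n={n:4d}  test MSE={mse:.3f}  TPR={tpr:.2f}  FDR={fdr:.2f}")
```

Repeating the loop for several penalized methods and a grid of sample sizes would give curves of prediction error and selection accuracy against n, from which a minimum adequate sample size could be read off. In practice the tuning parameter would usually be chosen by cross-validation rather than fixed as it is here.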