Abstract:
|
Identifying disease signatures or protein biomarkers is crucial for medical diagnosis, prognosis, and drug target selection in complex diseases such as cancer. Statistical models with single feature selection incur a multiple-testing burden and have low power when the sample size is limited. High correlations among markers, together with small-to-moderate effects, often lead to unstable selections and cause reproducibility issues. Machine learning with ensemble feature selection (EFS) can alleviate and compensate for these drawbacks. Mass spectrometry (MS)-based proteomic technologies have enabled global expression profiling at the protein level to examine the links among proteins, cancer subtypes, and treatment heterogeneity. In this work, we applied and compared various EFS methods in machine learning models, such as random forests, support vector machines, and neural networks, for predicting both binary and multi-class outcomes using MS proteomic ovarian cancer data. Although prediction accuracy differed across the machine learning models, the EFS methods identified consistent and reproducible sets of protein biomarkers linked to the outcomes.
|