Abstract:
|
Recent technological advances have made it possible to collect a variety of genomic data on the same cancer patients. Therefore, it would be of great interest to integrate multiple genomic data types for predicting clinical outcomes such as survival times. We propose a variable selection method called Integrative Boosting (I-Boost), which makes use of all available clinical and genomic data, but treats each data type separately, such that small but predictive data types would not be dominated by the larger ones. Simulation studies and applications to The Cancer Genome Atlas reveal that I-Boost performs substantially better than existing variable selection methods in prediction accuracy. Using I-Boost, we show that combining genomic variables with clinical variables provides more accurate prediction of survival time than the use of either data type alone. In addition, the currently available gene modules provide better or similar prediction accuracy when compared to the totality of individual gene expression data. Finally, clinical and gene expression data have higher prognostic values than other types of genomic data.
|