Abstract:
|
Random survival forests are popular statistical models in biomedical studies, especially cancer studies with high-dimensional genetic information. Given the abundance of cancer genetics and genomics data, new studies can borrow information from existing ones. For this purpose, we propose penalized random survival forests, which use information from existing data to improve model fitting. The penalization is achieved through a new type of splitting rule that shrinks the marginal score of each candidate split according to the provided information. This splitting rule has the potential to improve the convergence rate of random forest models. We consider two shrinkage methods, corresponding to two types of existing information: marginal p-values or summary statistics, which are often released by existing studies, and variable importance measures, which can be calculated from the existing data when the complete data are available. We perform extensive simulation studies to demonstrate superior performance over existing single-data-set approaches, and we apply our method to genetic data in an analysis of skin cancer.
|