Abstract:
We propose a new constrained partial maximum likelihood estimator for dimension reduction in integrative (e.g., pan-cancer) survival analysis with high-dimensional covariates. We assume that each population from which the data are collected has its own distinct baseline hazard function. However, we also assume that survival is explained by a small number of linear combinations of predictors, some of which are shared across multiple populations. We estimate these linear combinations using an algorithmic approach, based on "distance-to-set" penalties, which imposes both low rank and sparsity. Asymptotic results provide insight into the performance of our estimator. Numerical experiments suggest that our method can outperform related competitors under various data-generating models. We use our method to perform a pan-cancer survival analysis relating proteomic profiles to survival across 18 distinct cancer types. Our approach identifies six linear combinations of 20 proteins whose expression explains survival across all cancer types, and our findings largely agree with the existing literature. These findings are further validated on multiple external datasets.
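To make the idea of a distance-to-set penalty concrete, the following is a minimal sketch, not the authors' implementation. It assumes, for illustration only, that the coefficients are stored in a p × J matrix `B` (one column per population), that sparsity means at most `s` nonzero rows (predictors), and that low rank means rank at most `r`. The penalty adds (ρ/2) times the squared Frobenius distance from `B` to each constraint set. All function names and parameters (`project_row_sparse`, `project_low_rank`, `distance_to_set_penalty`, `s`, `r`, `rho`) are hypothetical.

```python
import numpy as np


def project_row_sparse(B, s):
    """Projection onto matrices with at most s nonzero rows (assumed sparsity set):
    keep the s rows with the largest Euclidean norm and zero out the rest."""
    row_norms = np.linalg.norm(B, axis=1)
    keep = np.argsort(row_norms)[::-1][:s]
    P = np.zeros_like(B)
    P[keep] = B[keep]
    return P


def project_low_rank(B, r):
    """Projection onto matrices of rank at most r (Eckart-Young):
    truncate the SVD to the r largest singular values."""
    U, d, Vt = np.linalg.svd(B, full_matrices=False)
    d[r:] = 0.0
    return (U * d) @ Vt  # scales column k of U by d[k]


def distance_to_set_penalty(B, s, r, rho):
    """Hypothetical combined penalty:
    (rho / 2) * [dist(B, sparse set)^2 + dist(B, low-rank set)^2]."""
    d_sparse = np.linalg.norm(B - project_row_sparse(B, s)) ** 2
    d_rank = np.linalg.norm(B - project_low_rank(B, r)) ** 2
    return 0.5 * rho * (d_sparse + d_rank)


# Usage: a rank-1 matrix with two nonzero rows lies in both sets, so its penalty is ~0.
B_in = np.outer([1.0, 2.0, 0.0, 0.0], [1.0, -1.0, 0.5])
print(distance_to_set_penalty(B_in, s=2, r=1, rho=1.0))
```

In distance-to-set (proximal distance) schemes of this kind, the penalty is typically majorized at the current iterate by the squared distance to the projections, which reduces each step to a smooth problem; increasing `rho` over iterations drives the estimate toward the constraint sets. How the paper itself combines the two constraints may differ from this separable sketch.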