Keywords: Bootstrap, Fine-Gray model, Large-scale data, Variable selection
Advances in medical informatics tools and high-throughput biological experimentation are making large-scale biomedical data routinely accessible to researchers. Competing risks data are typical in biomedical studies where individuals are subject to more than one type of event (cause), any of which can preclude the others from occurring. The Fine-Gray model is a popular approach to modeling competing risks data and is implemented in a number of statistical software packages. However, current estimation procedures are not computationally scalable for large-scale data. We develop a novel technique to estimate the parameters of the Fine-Gray model by exploiting the cumulative structure of the risk set for each subject. A two-way linear scan approach allows us to perform parameter estimation in linear time, considerably reducing the runtime for optimization. Extensive numerical studies compare the speed and scalability of our implementation to currently available methods for unpenalized and penalized Fine-Gray regression. We observe runtime reductions of more than 100- to 300-fold for moderately sized data.
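To illustrate how the cumulative structure of the risk set permits a linear-time pass, the following is a minimal sketch and not the implementation evaluated in the paper. It assumes the usual Fine-Gray weighting, in which subjects still under observation enter the risk set with weight 1 and subjects with an earlier competing event enter with weight G(t_i)/G(t_j), where G is an estimate of the censoring survival function. The function names, the status coding (0 censored, 1 event of interest, 2 competing event), the absence of tied times, and the precomputed list `G` are all assumptions made for the example.

```python
import math


def risk_set_sums_naive(times, status, eta, G):
    """O(n^2) reference: recompute each subject's weighted risk-set sum."""
    n = len(times)
    out = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if times[j] >= times[i]:
                # Subject j is still under observation at t_i: weight 1.
                total += math.exp(eta[j])
            elif status[j] == 2:
                # Earlier competing event: weight G(t_i) / G(t_j).
                total += math.exp(eta[j]) * G[i] / G[j]
        out.append(total)
    return out


def risk_set_sums_linear(times, status, eta, G):
    """O(n) version using one backward and one forward cumulative scan.

    Assumes `times` is sorted in strictly increasing order (no ties).
    """
    n = len(times)
    assert all(times[i] < times[i + 1] for i in range(n - 1))
    w = [math.exp(e) for e in eta]

    # Backward scan: back[i] = sum of w[j] over all j with t_j >= t_i.
    back = [0.0] * n
    running = 0.0
    for i in range(n - 1, -1, -1):
        running += w[i]
        back[i] = running

    # Forward scan: fwd[i] = sum of w[j] / G[j] over competing events
    # with t_j < t_i. The factor G[i] is applied afterwards, so the
    # running sum never has to be recomputed.
    fwd = [0.0] * n
    running = 0.0
    for i in range(n):
        fwd[i] = running
        if status[i] == 2:
            running += w[i] / G[i]

    return [back[i] + G[i] * fwd[i] for i in range(n)]


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    times = [1.0, 2.0, 3.0, 4.0, 5.0]
    status = [1, 2, 0, 2, 1]
    eta = [0.1, -0.2, 0.3, 0.0, 0.5]
    G = [1.0, 1.0, 0.8, 0.8, 0.8]
    print(risk_set_sums_linear(times, status, eta, G))
```

The key design choice in this sketch is factoring the weight G(t_i)/G(t_j) into a part that depends only on i and a part that depends only on j, which is what lets the competing-event contribution accumulate in a single forward pass while the ordinary risk set accumulates in a backward pass.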