Abstract:
Divide and conquer (D&C) is a computationally efficient approach to analyzing and making inferences from big data. Its principle is to divide a large dataset into K subsets, analyze each subset separately, and combine the individual solutions into a final solution for the full data. In this paper, we propose a D&C approach for survival analysis with the Cox proportional hazards model. Specifically, we randomly divide the data into K subsets and propose a weighted method for combining the K partial maximum likelihood estimators (PMLEs), each obtained from one sub-dataset. Under mild conditions, we show that the combined estimator is asymptotically equivalent to the PMLE obtained by analyzing the full data all at once. We then extend the approach to variable selection and propose an estimator that combines the K maximum penalized partial likelihood estimators, each obtained from one sub-dataset. We establish the statistical properties of the resulting estimators. Simulation studies evaluate the performance of the proposed methods, including the savings in computation time, and a data example illustrates their use.
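The abstract does not spell out the combining weights. As a hedged illustration only, the Python sketch below assumes a single covariate and no tied event times. It weights each subset's PMLE by its observed partial-likelihood information, an inverse-variance weighting that is a common choice in the D&C literature and not necessarily the weighting proposed in this paper. The function names (`cox_pmle_1d`, `combine_weighted`, `dc_cox`, `simulate`) and the simulation settings are hypothetical. The sketch compares the combined estimate with the PMLE computed on the full simulated data. The penalized variable-selection extension is not covered.

```python
import math
import random


def cox_pmle_1d(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson maximizer of the Cox partial likelihood, one covariate.

    Assumes no tied event times. Returns (beta_hat, observed_information);
    the information is evaluated just before the final (negligible) Newton step.
    """
    # Visit subjects from latest to earliest time so that, when subject i is
    # reached, the running sums cover exactly its risk set {j : t_j >= t_i}.
    order = sorted(range(len(times)), key=times.__getitem__, reverse=True)
    beta, info = 0.0, 0.0
    for _ in range(max_iter):
        s0 = s1 = s2 = 0.0
        score, info = 0.0, 0.0
        for i in order:
            w = math.exp(beta * x[i])
            s0 += w
            s1 += w * x[i]
            s2 += w * x[i] * x[i]
            if events[i]:
                mean = s1 / s0
                score += x[i] - mean           # score (gradient) contribution
                info += s2 / s0 - mean * mean  # weighted variance of x in the risk set
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, info


def combine_weighted(fits):
    """Information-weighted average of per-subset (beta_k, info_k) pairs (assumed weights)."""
    total = sum(info for _, info in fits)
    return sum(b * info for b, info in fits) / total


def dc_cox(times, events, x, K, seed=0):
    """Randomly split the data into K subsets, fit each one, then combine."""
    idx = list(range(len(times)))
    random.Random(seed).shuffle(idx)
    fits = []
    for k in range(K):
        part = idx[k::K]  # k-th of K roughly equal random subsets
        fits.append(cox_pmle_1d([times[i] for i in part],
                                [events[i] for i in part],
                                [x[i] for i in part]))
    return combine_weighted(fits)


def simulate(n, beta, censor_rate=0.3, seed=1):
    """Exponential event times with hazard exp(beta * x); independent exponential censoring."""
    rng = random.Random(seed)
    times, events, x = [], [], []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        t = rng.expovariate(math.exp(beta * xi))
        c = rng.expovariate(censor_rate)
        times.append(min(t, c))
        events.append(t <= c)
        x.append(xi)
    return times, events, x


times, events, x = simulate(4000, beta=0.5)
full_beta, _ = cox_pmle_1d(times, events, x)
dc_beta = dc_cox(times, events, x, K=4)
print(f"full-data PMLE: {full_beta:.4f}  D&C estimate: {dc_beta:.4f}")
```

Under this assumed weighting, subsets that carry more information (typically those with more observed events) pull harder on the combined estimate. With equal-sized random splits, the weights end up nearly equal.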