Abstract:
|
As one of the most popular machine learning tools, tree-based methods have been adapted for survival analysis, where they estimate conditional survival functions nonparametrically. However, few statistical results are available. We first investigate the method from the perspective of splitting rules, where log-rank test statistics are calculated and compared to find the best splitting variable. We demonstrate that this approach is affected by the censoring distribution, which may lead to inconsistency of the method. Based on this observation, we develop an adaptive concentration bound in the sense that, for each terminal node, the estimator concentrates around the true within-node average of the underlying survival model, which may be affected by the censoring distribution. As a result, we show that consistency can be achieved in high-dimensional settings when the splitting rule is modified to satisfy certain restrictions.
|