Abstract:
|
The most popular approach for analyzing survival data is the Cox regression model. However, its proportional hazards assumption is not always fulfilled. An alternative is the use of random forests for survival outcomes. The standard split criterion is the log-rank test statistic, which favors splitting variables with many possible split points. In this presentation, we introduce maximally selected rank statistics for split point selection. To avoid split point selection bias, the method minimizes p-values for association between split points and survival time. We describe several p-value approximations and the implementation of the proposed random forests approach. A simulation study demonstrates that unbiased split point selection is possible. However, there is a trade-off between unbiased split point selection and runtime. Benchmark studies of prediction performance on simulated and real breast cancer datasets show that the new method performs as well as or better than random survival forests, conditional inference forests, and the Cox model. In a runtime comparison, the method is faster than random survival forests and conditional inference forests.
|
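
As a rough illustration of the split point search summarized in the abstract, the sketch below scans the candidate split points of one covariate and returns the largest standardized rank statistic. It is a minimal sketch under stated assumptions, not the implementation presented: it assumes log-rank scores of the usual form (event indicator minus the cumulative sum of events divided by the number at risk), no tied survival times, and a hypothetical `min_prop` parameter that bounds the smaller child node. The function names are illustrative. The p-value approximations mentioned in the abstract are not implemented here; in the described method, the p-value of this maximum would be approximated and compared across covariates.

```python
import math


def logrank_scores(times, events):
    """Log-rank scores: a_i = d_i - sum_{j: t_j <= t_i} d_j / (number at risk at t_j).

    Assumption for illustration: no tied survival times.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    scores = [0.0] * n
    cumulative = 0.0
    for rank, i in enumerate(order):
        at_risk = n - rank  # observations with time >= times[i]
        cumulative += events[i] / at_risk
        scores[i] = events[i] - cumulative
    return scores


def max_selected_rank_statistic(x, times, events, min_prop=0.1):
    """Return (max |standardized statistic|, split point) for one covariate.

    min_prop is a hypothetical parameter: the minimal proportion of
    observations required in each child node.
    """
    n = len(x)
    a = logrank_scores(times, events)
    mean_a = sum(a) / n
    sum_sq = sum((ai - mean_a) ** 2 for ai in a)
    min_size = max(1, math.ceil(min_prop * n))

    best_stat, best_cut = 0.0, None
    # Every distinct value except the largest is a candidate split point.
    for cut in sorted(set(x))[:-1]:
        left = [a[i] for i in range(n) if x[i] <= cut]
        n1, n2 = len(left), n - len(left)
        if n1 < min_size or n2 < min_size:
            continue
        # Permutation mean and variance of the sum of scores in the left node.
        expected = n1 * mean_a
        variance = n1 * n2 / (n * (n - 1)) * sum_sq
        if variance <= 0:
            continue
        stat = abs(sum(left) - expected) / math.sqrt(variance)
        if stat > best_stat:
            best_stat, best_cut = stat, cut
    return best_stat, best_cut


if __name__ == "__main__":
    # Toy data: larger x goes with longer survival; one censored observation.
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    times = [1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0]
    events = [1, 1, 1, 0, 1, 1, 1, 1]
    print(max_selected_rank_statistic(x, times, events))
```

Because the maximum is taken over all candidate split points, a covariate with more candidates tends to yield a larger raw maximum by chance alone, which is why the comparison across covariates is made on p-values of the maximum rather than on the statistic itself.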