Conference Program


All Times ET

Wednesday, June 8
Machine Learning
Computational Statistics
Practice and Applications
Modeling + Non-Parametric Methods, Part 2
Wed, Jun 8, 2:45 PM - 3:40 PM
Allegheny I
 

Oblique and Non-Linear Survival Trees Based on Dipolar Splitting Criteria (310251)

*Drew Lazar, Ball State University  

Keywords: survival analysis, survival trees, survival forests, splitting criteria, machine learning

Survival analysis is the study of time-to-event data. It has wide application in epidemiology, engineering and finance, among other fields. Censoring, which is common in survival data sets, presents challenges to statistical analysis and inference. Semi-parametric and parametric models have been developed to accommodate censored survival data. More recently, machine learning approaches, such as support vector machines, neural networks and survival forests, have been successfully developed to model survival data.

Ensemble methods such as survival forests depend on splitting data at nodes in underlying decision trees. Various splitting criteria have been proposed and implemented using within- or between-node homogeneity. Criteria in the former category include log-likelihood statistics based on parametric assumptions, while criteria in the latter category often depend on the log-rank statistic.
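
As an illustration of a log-rank based splitting criterion, the following is a minimal sketch of the standard two-sample log-rank chi-square statistic applied to a candidate split of a node. It is not taken from the talk; the function name `logrank_statistic` and the argument `in_left` (a boolean mask marking observations sent to the left child) are assumptions made for this example.

```python
import numpy as np

def logrank_statistic(time, event, in_left):
    """Two-sample log-rank chi-square statistic for one candidate split.

    time    : observed times (event or censoring)
    event   : 1/True if the event was observed, 0/False if censored
    in_left : True for observations assigned to the left child (hypothetical name)
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    in_left = np.asarray(in_left, dtype=bool)

    observed = expected = variance = 0.0
    # Loop over the distinct times at which at least one event occurred.
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()                    # at risk in the whole node
        n_left = (at_risk & in_left).sum()   # at risk in the left child
        died = event & (time == t)
        d = died.sum()                       # events at time t
        observed += (died & in_left).sum()
        expected += d * n_left / n
        if n > 1:                            # hypergeometric variance term
            variance += d * (n_left / n) * (1 - n_left / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0                           # no information to separate the children
    return (observed - expected) ** 2 / variance
```

A tree-growing routine would typically evaluate this statistic for each candidate split and keep the split with the largest value, since a larger value indicates a greater difference in survival experience between the two child nodes.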

We improve and clarify existing algorithms that rely on non-parametric dipolar splits by hyperplanes for maximizing between-node homogeneity. We demonstrate improved prediction of survival experience and more parsimonious survival trees using simulated and real data sets. We extend these methods to non-linear surfaces while avoiding overfitting: tree sizes are reduced without decreasing predictive power on test data.
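
The abstract does not specify the dipolar criterion used in the talk. The sketch below illustrates one simplified formulation found in the dipolar-tree literature, stated here as an assumption: a pair of observations is a "pure" dipole if both events are observed and their times differ by less than `eta`, and a "mixed" dipole if the earlier time is an observed event and the times differ by more than `zeta`. A hyperplane `w·x + b = 0` is then scored by how many mixed dipoles it divides and how many pure dipoles it leaves undivided. The function name, the thresholds `eta` and `zeta`, and the simple counting score are all hypothetical choices for illustration, not the authors' algorithm.

```python
from itertools import combinations
import numpy as np

def dipolar_counts(X, time, event, w, b, eta, zeta):
    """Count (mixed dipoles divided, pure dipoles left undivided) by a hyperplane.

    Assumed definitions (illustrative only):
      pure  : both events observed and |t_i - t_j| < eta
      mixed : the earlier time is an observed event and |t_i - t_j| > zeta
    """
    X = np.asarray(X, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    side = X @ np.asarray(w, dtype=float) + b > 0   # side of the hyperplane

    mixed_divided = pure_undivided = 0
    for i, j in combinations(range(len(time)), 2):
        gap = abs(time[i] - time[j])
        earlier = i if time[i] <= time[j] else j
        divided = side[i] != side[j]
        if event[i] and event[j] and gap < eta:
            pure_undivided += not divided           # similar pair kept together
        elif event[earlier] and gap > zeta:
            mixed_divided += divided                # dissimilar pair separated
    return int(mixed_divided), int(pure_undivided)
```

Under these assumptions, a good oblique split is one for which both counts are large. Published dipolar methods typically optimize a smooth or piecewise-linear surrogate of such counts rather than the raw counts themselves.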