Abstract:
|
Random Forests (RF) are a fast, flexible, and robust approach to mining high-dimensional continuous and discrete survival data, and they perform well even in big data problems. The tree-building process of random forests implicitly accommodates interactions between features and high correlation among features. Several approaches are available for measuring variable importance, which is the basis for feature selection. Although RF perform well in many applications, their theoretical properties have been understood only recently. After a non-theoretical introduction to RF, we summarize these theoretical findings. We survey different versions of RF, including random survival forests (RSF) and conditional inference forests (CIF). Split criteria are summarized and their end-cut preference is discussed. Implementations of RF for survival data are compared with respect to available options and runtime. Finally, we provide a brief overview of different areas of application of RF with survival data and present real-data examples from gene expression and genetic studies.
|