Abstract:
Random forests is a popular nonparametric tree ensemble procedure, well known for its highly accurate predictions. Another important feature is that it provides a fully nonparametric measure of variable importance (VIMP) for ranking variables. However, inference for VIMP is difficult because of its highly complex nature. We therefore describe a subsampling approach that can be used to estimate the variance of VIMP and to construct confidence intervals for it. The method is applicable to a wide variety of problems, including regression, classification, and survival. It is found to be highly effective, even surpassing the bootstrap, and, most importantly, it is computationally fast, which makes it attractive for big data settings.
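The general idea of subsampling-based variance estimation can be illustrated with a minimal Python sketch. It assumes a generic `statistic` callable that stands in for VIMP. For a random forest, this callable would refit the forest on each subsample and return one variable's importance; that step is not shown here. The names `subsample_variance` and `normal_ci` are hypothetical. The two rescalings shown, `b / n` for plain subsampling and `b / (n - b)` for a delete-d jackknife, are the textbook forms and are not necessarily the exact estimators used in this work.

```python
import math
import random
import statistics
from statistics import NormalDist
from typing import Callable, Sequence, Tuple


def subsample_variance(
    data: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    b: int,
    K: int = 100,
    seed: int = 0,
    delete_d_jackknife: bool = False,
) -> float:
    """Estimate Var(statistic(data)) from K subsamples of size b drawn WITHOUT replacement.

    Hypothetical helper: `statistic` is a stand-in for VIMP (assumption); for a
    random forest it would refit the forest and return one variable's importance.
    """
    n = len(data)
    if not 1 <= b < n:
        raise ValueError("subsample size b must satisfy 1 <= b < n")
    rng = random.Random(seed)
    pool = list(data)
    # Recompute the statistic on K independent subsamples of size b.
    thetas = [statistic(rng.sample(pool, b)) for _ in range(K)]
    if delete_d_jackknife:
        # Delete-d jackknife (d = n - b): center at the full-sample value, rescale by b / (n - b).
        full = statistic(pool)
        return (b / (n - b)) * sum((t - full) ** 2 for t in thetas) / K
    # Plain subsampling: center at the subsample mean, rescale by b / n (assumes a sqrt(n) rate).
    mean = sum(thetas) / K
    return (b / n) * sum((t - mean) ** 2 for t in thetas) / K


def normal_ci(estimate: float, variance: float, level: float = 0.95) -> Tuple[float, float]:
    """Normal-approximation confidence interval: estimate +/- z * sqrt(variance)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


if __name__ == "__main__":
    # Sanity check with a statistic of known variance: the mean of N(0, 2^2) data,
    # whose variance is about 4 / n.
    gen = random.Random(42)
    x = [gen.gauss(0.0, 2.0) for _ in range(2000)]
    v = subsample_variance(x, statistics.mean, b=200, K=500, delete_d_jackknife=True)
    print(v, 4.0 / len(x))  # both are expected to be roughly 0.002
    print(normal_ci(statistics.mean(x), v))
```

Sampling without replacement is what distinguishes this scheme from the bootstrap. Because each of the K replicates uses only b < n observations, the cost per replicate shrinks, which is one plausible source of the speed advantage described above.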