Abstract:
|
Random forest is a popular method for developing prediction models. The first step in developing a prediction model often involves reducing the number of variables to be included. Dozens of random forest variable selection methods exist; however, little literature guides users on which method is preferable for a given type of dataset. Using several hundred datasets freely available online, we evaluate the prediction error rates, number of variables selected, and computation times of these methods. This presentation will discuss which random forest variable selection methods are preferable for different types of real-world applications.
|