Abstract:
|
Tree-based methods and their ensemble extensions remain popular tools in statistical machine learning. In addition to their demonstrated predictive accuracy, a variety of ad hoc tools are available to assist in understanding the model fit and the underlying processes. In recent years, a flurry of theoretical developments investigating the consistency and asymptotic distributions of predictions from such methods has helped to pull these tools further within the domain of statistics. We will highlight a number of these developments and discuss how the results pave the way for more traditional statistical analyses to be performed within these normally black-box procedures. We focus in particular on confidence intervals for predictions, formal hypothesis tests for variable importance, efficient variable screening procedures, and a recent proposal based on classical permutation tests that allows such procedures to scale to high-dimensional settings and to be performed simultaneously on large test sets. Simulation results and demonstrations on real data will be provided.
|