Abstract:
|
In the context of big and often high-dimensional data, valid procedures for assessing variable importance and identifying accurate model representations are essential tools, especially in the presence of substantial instability. Instead of seeking only a single set of covariates that form the empirically optimal model, we propose an automated procedure for identifying an entire collection of stable and predictively similar models. At each iteration of the selection method, we identify covariates that are predictively similar with respect to a chosen loss function, thereby providing multiple options as to which covariate should be added to the final model. By construction, our procedure acts as a wrapper method that can be applied to any statistical or machine learning technique. Furthermore, we provide a natural and intuitive graphical display of these model paths that makes apparent potential underlying relationships between covariates as well as the relative importance of the covariates selected.
|