
All Times EDT

Abstract Details

Activity Number: 201 - Big Data and Statistical Learning
Type: Contributed
Date/Time: Tuesday, August 4, 2020, 10:00 AM to 2:00 PM
Sponsor: Section on Statistical Computing
Abstract #313243
Title: Improved Two-Stage Model-Averaging for High-Dimensional Linear Regression
Author(s): Juming Pan*
Companies: Rowan University
Keywords: model averaging; model selection; high-dimensional regression models; random forest; jackknife
Abstract:

This talk responds to Ando and Li (2014), who developed a two-stage model-averaging procedure (MCV) for high-dimensional regression. The most notable features of MCV are the use of marginal correlation to group regressors for model construction and the relaxation of the conventional constraint that the model weights sum to 1. We have several concerns with MCV. First, grouping by marginal correlation can produce poor candidate models, since a high degree of marginal correlation does not necessarily imply a strong linear relationship; second, relaxing the total-weight constraint may not always lower the prediction error; third, the simulation study used the same generated data both to fit the model average and to measure forecasting performance, whereas an independent test dataset should be used for evaluation. In this paper, we consider the LASSO, ridge regression, and random forests for partitioning the regressors, and AIC, the jackknife, and Mallows' Cp for optimizing the model weights. We compare these approaches with MCV on simulated datasets and in a real-data application. The results demonstrate that prediction accuracy can be further improved for high-dimensional model averaging.
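
For readers unfamiliar with the two-stage idea, the sketch below illustrates one possible variant, not the procedure evaluated in the talk: regressors are partitioned by LASSO coefficient magnitude (one of the alternatives to marginal correlation mentioned above), one OLS candidate model is fit per group, and nonnegative, unnormalized weights are chosen by a jackknife (leave-one-out) criterion. The function name, defaults, and grouping rule are illustrative assumptions.

```python
# A minimal sketch (not the authors' implementation) of two-stage model
# averaging: group regressors, fit one candidate model per group, then
# weight the candidates by minimizing leave-one-out prediction error.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from scipy.optimize import nnls

def two_stage_model_average(X, y, n_groups=5, lasso_alpha=0.1):
    n, p = X.shape
    # Stage 1: rank regressors by |LASSO coefficient| and split the ranked
    # list into candidate groups (assumes each group is smaller than n).
    score = np.abs(Lasso(alpha=lasso_alpha).fit(X, y).coef_)
    groups = np.array_split(np.argsort(score)[::-1], n_groups)

    models, loo_preds = [], np.zeros((n, n_groups))
    for k, g in enumerate(groups):
        Xg = X[:, g]
        m = LinearRegression().fit(Xg, y)
        models.append((g, m))
        # Closed-form leave-one-out predictions for OLS:
        # yhat_(-i) = y_i - (y_i - yhat_i) / (1 - h_ii).
        Xa = np.column_stack([np.ones(n), Xg])      # include the intercept
        h = np.diag(Xa @ np.linalg.pinv(Xa.T @ Xa) @ Xa.T)
        loo_preds[:, k] = y - (y - m.predict(Xg)) / (1.0 - h)

    # Stage 2: nonnegative least squares for the weights; the weights are
    # not forced to sum to one, mirroring the relaxed constraint discussed
    # in the abstract.
    w, _ = nnls(loo_preds, y)

    def predict(Xnew):
        cand = np.column_stack([m.predict(Xnew[:, g]) for g, m in models])
        return cand @ w

    return predict, w
```

Swapping the jackknife criterion for AIC or Mallows' Cp, or the LASSO ranking for ridge or random-forest importance, changes only the grouping and weighting steps; the two-stage structure stays the same.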


Authors who are presenting talks have a * after their name.
