Activity Number: 97 - New Methods for Structured Variable Selection
Type: Topic Contributed
Date/Time: Monday, August 8, 2022, 8:30 AM to 10:20 AM
Sponsor: SSC (Statistical Society of Canada)
Abstract #322447

Title: Variable Selection in High-Dimensional Linear Regression Accounting for Heterogeneity in Covariate Effects Across Multiple Data Sources
Author(s): Tingting Yu*
Companies: Harvard Pilgrim Health Care Institute and Harvard Medical School
Keywords: Data heterogeneity; Variable selection; Coefficient clustering; K-means; ADMM

Abstract:
When analyzing data combined from multiple sources, the heterogeneity across different sources must be accounted for. We consider high-dimensional linear regression models for integrative data analysis with heterogeneity across units modeled as unit-specific covariate effects. A fully heterogeneous model that assumes distinct covariate effects for each source can be over-parameterized and may impair statistical power when the sample size is small and the number of predictors or units is large. Therefore, identifying sub-homogeneity among heterogeneous covariate effects is necessary to build a more parsimonious model. We propose a new adaptive clustering penalty (ACP) method to simultaneously select variables and cluster unit-specific covariate effects with sub-homogeneity. We show that the estimator based on the ACP method enjoys a strong oracle property under certain regularity conditions, and develop an efficient alternating direction method of multipliers (ADMM) algorithm for parameter estimation. We conduct simulation studies to compare the performance of the proposed method to existing methods and apply the method to real datasets.
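
The notion of sub-homogeneity described above can be illustrated with a small simulation. The sketch below is not the ACP method and does not use ADMM; the abstract does not give the form of the penalty. It is a naive two-step stand-in, assuming six hypothetical sources in which one covariate has an effect of 1.0 in the first three sources and 3.0 in the last three: each source is fitted separately, and the source-specific estimates for that covariate are then grouped with a simple one-dimensional K-means. All function names, sample sizes, and coefficient values are assumptions made for illustration only.

```python
import numpy as np


def fit_per_source(X_list, y_list, ridge=1e-3):
    """Fit a lightly ridge-stabilised least-squares model separately per source."""
    coefs = []
    for X, y in zip(X_list, y_list):
        p = X.shape[1]
        coefs.append(np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T @ y))
    return np.vstack(coefs)  # shape (n_sources, n_covariates)


def kmeans_1d(values, n_clusters, n_iter=50):
    """Lloyd iterations on scalars, initialised at evenly spaced quantiles."""
    values = np.asarray(values, dtype=float)
    centers = np.quantile(values, np.linspace(0.0, 1.0, n_clusters))
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = values[labels == c].mean()
    # Final assignment against the converged centers.
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    return labels, centers


# Hypothetical setup: 6 sources, 200 observations each, 5 covariates.
rng = np.random.default_rng(0)
n_sources, n_obs, n_cov = 6, 200, 5

# Covariate 0: effect 1.0 in sources 0-2 and 3.0 in sources 3-5 (two sub-groups).
# Covariate 1: common effect -2.0 in every source. Covariates 2-4: no effect.
true_beta = np.zeros((n_sources, n_cov))
true_beta[:3, 0] = 1.0
true_beta[3:, 0] = 3.0
true_beta[:, 1] = -2.0

X_list = [rng.normal(size=(n_obs, n_cov)) for _ in range(n_sources)]
y_list = [X @ b + 0.5 * rng.normal(size=n_obs) for X, b in zip(X_list, true_beta)]

# Step 1: fully heterogeneous fit (one coefficient vector per source).
beta_hat = fit_per_source(X_list, y_list)

# Step 2: cluster the source-specific estimates of covariate 0 into two groups.
labels, centers = kmeans_1d(beta_hat[:, 0], n_clusters=2)
print("cluster labels for covariate 0:", labels)
print("cluster centers for covariate 0:", centers)
```

Unlike this two-step illustration, a penalized approach of the kind the abstract describes would estimate the coefficients, select variables, and identify the clusters in a single optimization.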
Authors who are presenting talks have a * after their name.