Activity Number: 166 - Understanding Mixtures in Environmental Epidemiology
Type: Topic Contributed
Date/Time: Tuesday, August 4, 2020 : 10:00 AM to 11:50 AM
Sponsor: Section on Statistics in Epidemiology
Abstract #312215
Title: Principal Component Pursuit for Pattern Identification in Environmental Health
Author(s): Elizabeth A Gibson* and Jingkai Yan and Robert Colgan and John Wright and Jeff Goldsmith and Marianthi-Anna A Kioumourtzoglou
Companies: Columbia University, Dept of Environmental Health Sciences and Columbia University, Data Science Institute and Columbia University, Data Science Institute and Columbia University, Data Science Institute and Columbia University Mailman School of Public Health and Columbia University, Dept of Environmental Health Sciences
Keywords: Environmental Health; Pattern Identification; Dimensionality Reduction; Epidemiology; Environmental Mixtures
Abstract:
Environmental health (EH) researchers often aim to identify the sources or behaviors that give rise to potentially harmful environmental exposures in order to inform policy. Here we adapt Principal Component Pursuit (PCP), a robust and well-established technique for dimensionality reduction and pattern recognition, to EH data. PCP is a convex program, supported by efficient numerical tools and theoretical guarantees, that decomposes the design matrix into a low-rank matrix (to identify consistent patterns of exposure across pollutants and reduce dimensionality) and a sparse matrix (to identify unique exposure events). We extend PCP to accommodate non-negative data and measurements below the analytic limit of detection (LOD). The extension, PCP-LOD, introduces a non-negativity constraint on the low-rank solution matrix and allows for a more stringent penalty on observations below the LOD. We ran simulations with increasing proportions of observations below the LOD to evaluate the performance of PCP-LOD compared with the original PCP and principal component analysis (PCA). PCP-LOD outperforms both in terms of the relative error of the identified components in the low-rank solution matrix. We apply PCP-LOD to a nationally representative dataset to identify sources of exposure to persistent organic pollutants.
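
The low-rank plus sparse decomposition described above can be illustrated with a minimal sketch of standard PCP, solved with an alternating-direction (augmented Lagrangian) scheme. This is not the PCP-LOD method of the abstract: it has no non-negativity constraint and no LOD-specific penalty. The function names (`pcp`, `soft_threshold`, `singular_value_threshold`) are hypothetical, and the default values of `lam` and `mu` are assumptions taken from commonly used choices in the PCP literature, not from this abstract.

```python
import numpy as np


def soft_threshold(X, tau):
    """Elementwise shrinkage: the proximal operator of tau * L1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Shrink singular values by tau: the proximal operator of tau * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def pcp(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Sketch of standard PCP: minimize ||L||_* + lam * ||S||_1 subject to L + S = M.

    Returns (L, S), the low-rank and sparse estimates.
    """
    M = np.asarray(M, dtype=float)
    n1, n2 = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n1, n2))        # assumed default weight on the sparse term
    if mu is None:
        mu = n1 * n2 / (4.0 * np.abs(M).sum())  # assumed default penalty parameter

    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint L + S = M
    norm_M = np.linalg.norm(M)  # Frobenius norm

    for _ in range(max_iter):
        L = singular_value_threshold(M - S + Y / mu, 1.0 / mu)  # low-rank update
        S = soft_threshold(M - L + Y / mu, lam / mu)            # sparse update
        residual = M - L - S
        Y = Y + mu * residual                                   # dual update
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S
```

In an exposure setting, rows of `M` would be participants and columns pollutants; `L` would then carry the shared exposure patterns and `S` the isolated, unusual measurements. Handling non-negative data and values below the LOD would require the additional constraint and penalty that the abstract describes.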
Authors who are presenting talks have a * after their name.