
All Times EDT

Abstract Details

Activity Number: 166 - Understanding Mixtures in Environmental Epidemiology
Type: Topic Contributed
Date/Time: Tuesday, August 4, 2020 : 10:00 AM to 11:50 AM
Sponsor: Section on Statistics in Epidemiology
Abstract #312215
Title: Principal Component Pursuit for Pattern Identification in Environmental Health
Author(s): Elizabeth A Gibson*, Jingkai Yan, Robert Colgan, John Wright, Jeff Goldsmith, and Marianthi-Anna A Kioumourtzoglou
Affiliations: Columbia University, Dept of Environmental Health Sciences; Columbia University, Data Science Institute; Columbia University, Data Science Institute; Columbia University, Data Science Institute; Columbia University Mailman School of Public Health; Columbia University, Dept of Environmental Health Sciences
Keywords: Environmental Health; Pattern Identification; Dimensionality Reduction; Epidemiology; Environmental Mixtures

Environmental health (EH) researchers often aim to identify the sources or behaviors that give rise to potentially harmful environmental exposures, in order to inform policy. Here we adapt Principal Component Pursuit (PCP), a robust and well-established technique for dimensionality reduction and pattern recognition, to EH data. PCP is a convex program, backed by efficient numerical solvers and theoretical guarantees, that decomposes the design matrix into a low-rank matrix, which identifies consistent patterns of exposure across pollutants and reduces dimensionality, and a sparse matrix, which identifies unique exposure events. We extend PCP to accommodate non-negative data and measurements below the analytic limit of detection (LOD): we impose a non-negativity constraint on the low-rank solution matrix and allow a more stringent penalty on observations below the LOD. We ran simulations with increasing proportions of observations below the LOD to compare the performance of the resulting method, PCP-LOD, with that of the original PCP and of principal component analysis (PCA). PCP-LOD outperforms both in terms of the relative error of the components identified in the low-rank solution matrix. We apply PCP-LOD to a nationally representative dataset to identify sources of exposure to persistent organic pollutants.
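To make the low-rank plus sparse decomposition concrete, here is a minimal sketch of the original PCP problem only, minimize ||L||_* + lam * ||S||_1 subject to L + S = M, solved with a standard augmented Lagrangian (ADMM) iteration. It is not the authors' PCP-LOD: it has no non-negativity constraint on L and no special penalty for values below the LOD. The helper names (`soft_threshold`, `singular_value_threshold`, `pcp`), the default `lam = 1/sqrt(max(n, p))`, the step-size heuristic for `mu`, the simulated data, and the Frobenius-norm relative error are all illustrative assumptions, not details taken from the abstract.

```python
import numpy as np


def soft_threshold(X, tau):
    """Entrywise shrinkage: proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Shrink singular values by tau: proximal operator of tau * ||X||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def pcp(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Original PCP (no non-negativity or LOD handling), solved by ADMM.

    Solves: minimize ||L||_* + lam * ||S||_1  subject to  L + S = M.
    """
    n, p = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n, p))  # assumed default sparsity weight
    if mu is None:
        mu = n * p / (4.0 * np.abs(M).sum())  # assumed step-size heuristic
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint L + S = M
    norm_M = np.linalg.norm(M, "fro")
    for _ in range(max_iter):
        # Low-rank update: shrink singular values with S and Y held fixed.
        L = singular_value_threshold(M - S + Y / mu, 1.0 / mu)
        # Sparse update: shrink entries with L and Y held fixed.
        S = soft_threshold(M - L + Y / mu, lam / mu)
        # Dual ascent on the constraint residual; stop once L + S ~= M.
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual, "fro") <= tol * norm_M:
            break
    return L, S


# Hypothetical simulated data: 200 samples x 50 pollutants, rank-2 non-negative
# exposure patterns plus rare, large positive "unique events" in ~5% of entries.
rng = np.random.default_rng(0)
n, p, r = 200, 50, 2
L0 = rng.random((n, r)) @ rng.random((r, p))
S0 = np.zeros((n, p))
events = rng.random((n, p)) < 0.05
S0[events] = rng.uniform(5.0, 10.0, size=events.sum())
M = L0 + S0

L_hat, S_hat = pcp(M)
# Relative error of the recovered low-rank matrix (Frobenius norm).
rel_err = np.linalg.norm(L_hat - L0, "fro") / np.linalg.norm(L0, "fro")
print(f"relative error of L_hat: {rel_err:.2e}")
```

On this toy example the large spikes should end up in `S_hat` and the shared patterns in `L_hat`. The penalty parameter `mu` is held fixed for simplicity; many PCP solvers increase it across iterations to converge faster. Applying this to real exposure data with values below the LOD would call for the non-negativity and LOD extensions described above, which this sketch does not implement.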

Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program