Online Program

Thursday, October 19
Thu, Oct 19, 2:45 PM - 3:50 PM
Aventine Ballroom E
Speed Session 1

Probabilistic Predictive Principal Component Analysis for Spatially Misaligned and High-Dimensional Air Pollution Data (303779)

Adam A Szpiro, University of Washington 
*Phuong T Vu, University of Washington 

Keywords: air pollution, dimension reduction, principal component analysis, missing data, latent variable model, spatial misalignment, universal kriging

Environmental studies often focus on the health impacts of long-term air pollution exposure on human subjects. Pollutant concentrations are measured at regulatory monitoring sites, which usually differ from the locations of the study subjects. This spatial misalignment motivates a two-stage modeling approach with an exposure model and a health regression model. In addition, air pollution is often a mixture of many components with different health implications. Conventional approaches use techniques such as principal component analysis (PCA) to obtain a lower-dimensional representation of the data. Recently developed predictive PCA modifies the optimization criterion to improve the exposure model. However, these approaches require complete data. Real-world data tend to have complex missingness patterns, including some pollutants that are measured at relatively few locations and some locations with many missing measurements. We propose a probabilistic version of predictive PCA that allows for flexible imputation to utilize all available monitoring data. We demonstrate the performance of probabilistic predictive PCA with simulations and an analysis of multivariate air pollution data.
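
To illustrate the general idea of dimension reduction with incomplete monitoring data, the following is a minimal sketch of PCA with missing values handled by iterative low-rank imputation. It is not the authors' probabilistic predictive PCA: it has no predictive criterion, no spatial model, and no kriging step. The function name `pca_with_missing`, its parameters, and the assumption that every pollutant has at least one observed value are all hypothetical choices made for this example.

```python
import numpy as np

def pca_with_missing(X, n_components, n_iter=200, tol=1e-8):
    """Illustrative PCA with missing data via iterative low-rank imputation.

    X: (n_sites, n_pollutants) array, with np.nan marking missing values.
    Assumes every column has at least one observed entry.
    Returns (scores, loadings, filled).
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initialize missing entries with the mean of the observed values per column.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(n_iter):
        # Fit a rank-n_components approximation to the current filled matrix.
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        low_rank = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mu

        # Update only the missing entries; observed values are never changed.
        new_filled = np.where(missing, low_rank, X)
        change = np.max(np.abs(new_filled - filled))
        filled = new_filled
        if change < tol:
            break

    # Final PCA on the completed matrix.
    mu = filled.mean(axis=0)
    U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    loadings = Vt[:n_components]
    return scores, loadings, filled


# Usage on synthetic data (not real air pollution measurements).
rng = np.random.default_rng(0)
X_full = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 6)) + 0.01 * rng.normal(size=(50, 6))
mask = rng.random(X_full.shape) < 0.1
X_obs = X_full.copy()
X_obs[mask] = np.nan
scores, loadings, filled = pca_with_missing(X_obs, n_components=2)
```

A probabilistic formulation, such as the one the abstract proposes, would replace this ad hoc imputation loop with a latent variable model whose likelihood is defined over the observed entries only.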