Online Program

Friday, February 19
CS07 Emerging Challenges and Methods for Large Databases
Fri, Feb 19, 11:00 AM - 12:30 PM
Diamond I&II

High-Dimensional Linear Model Stability and Robustness in Large Database Applications (303072)

*Michael B Brimacombe, KUMC 

Keywords: linear models, high dimensional data analysis, mis-specification, robustness

Large databases are an increasingly important setting for statistical research. Linear models applied to large genomic databases and related high-dimensional analyses require care when the number of variables (p) exceeds the number of subjects (n). The resulting predictive models, even with sparsity restrictions imposed, are often non-robust. These issues are examined here using detailed numerical studies. In particular, nonlinearity, model mis-specification, and issues related to Simpson’s paradox affect the stability of related statistical methods and search algorithms.
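
As a small illustration of the Simpson’s paradox issue named in the abstract, the sketch below fits a simple least-squares slope within two subgroups and then on the pooled data. The data values, the group labels, and the helper name `slope` are hypothetical, chosen only to make the sign reversal visible; this is not the author's numerical study.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of y on x for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical toy data: within each group, y decreases as x increases.
group_a_x, group_a_y = [1.0, 2.0, 3.0], [6.0, 5.0, 4.0]
group_b_x, group_b_y = [6.0, 7.0, 8.0], [11.0, 10.0, 9.0]

slope_a = slope(group_a_x, group_a_y)
slope_b = slope(group_b_x, group_b_y)

# Pooling the groups ignores the grouping variable entirely.
slope_pooled = slope(group_a_x + group_b_x, group_a_y + group_b_y)

print("within group A:", slope_a)
print("within group B:", slope_b)
print("pooled:", slope_pooled)
```

Both within-group slopes are negative while the pooled slope is positive, so a linear model that omits the grouping variable reports the opposite direction of association. This is one way a mis-specified model can look unstable.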