
All Times EDT

Abstract Details

Activity Number: 101 - Section on Statistical Learning and Data Science P.M. Roundtable Discussion (Added Fee)
Type: Roundtables
Date/Time: Monday, August 9, 2021 : 12:00 PM to 1:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #317612
Title: Statistical Learning on Large Population Studies: The Curse of Large P, Large N
Author(s): Prateek Sasan*
Companies: The Ohio State University
Keywords: Statistical Learning; Larger-than-RAM datasets; high-dimensional statistics; dimensionality reduction; population studies
Abstract:

Large population studies are increasingly common in fields such as genetics, neuroscience, and economics. These datasets have large numbers of samples and features, their size can often exceed the available RAM, and they bring their own subject-specific complexities, e.g., data types and missing data. Consequently, the software available to run the statistical learning procedure of our choice might not be able to handle these challenges. For instance, the R package "glmnet" does not support long vectors, and R packages such as "biglasso" and "bigstatsr", which are designed for large datasets, have limited functionality; e.g., "biglasso" does not support multivariate lasso regression. Alternatively, we could reduce the number of samples and features, but we need to be careful, as over-reduction can lead to a loss of information. At this roundtable, we plan to discuss the major methodological and computational challenges, as well as the key lessons learned, in analyzing such datasets.


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program