All Times EDT

Abstract Details

Activity Number: 51 - Recent Developments in Modeling High-Dimensional and Complex Data
Type: Contributed
Date/Time: Monday, August 3, 2020 : 10:00 AM to 2:00 PM
Sponsor: SSC (Statistical Society of Canada)
Abstract #312953
Title: Analyzing high-dimensional gene expression data by using the regularization techniques
Author(s): Kumer Das* and Aiden Kenny and Danielle Solomon
Companies: University of Louisiana At Lafayette and Franklin and Marshall College and St. John's University
Keywords: Sparse Regression; Regularization; Logistic regression; LASSO; Principal Components; Clustering

The idea of using data to train models that are both accurate and interpretable has been around for decades. The goal is to build such a model from the available predictors. However, in the age of big data, it is increasingly common for a data set to be high-dimensional, meaning the number of predictors vastly exceeds the number of observations. In this setting, many long-standing statistical modeling techniques, such as linear and logistic regression, no longer suffice. Regularization is a popular technique that imposes a penalty on the original model; under some penalties the fitted models are sparse, meaning that only a few predictors are retained and the models are easier to interpret. In this study, we investigate the potential effectiveness of using clustering algorithms to generate grouping structures for high-dimensional data sets. Using various regularization techniques, we seek to determine whether the generated groups are truly relevant to the response and whether the accuracy and interpretability of the models can be improved. We support the clustered group structure theory using two real-world data sets.
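
To make the notion of a sparse regularized model concrete, the following is a minimal sketch of LASSO-penalized linear regression fitted by coordinate descent on a synthetic data set with more predictors than observations. It is an illustration only and is not the authors' method: the abstract does not describe an implementation, the data are simulated, the clustering step and the logistic model are omitted, and the names `soft_threshold`, `lambda_max`, and `lasso_coordinate_descent` are hypothetical helpers introduced here.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return 0.0 if it is within the threshold."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lambda_max(X, y):
    """Smallest penalty at which every LASSO coefficient is zero (no intercept)."""
    n, p = len(X), len(X[0])
    return max(abs(sum(X[i][j] * y[i] for i in range(n)) / n) for j in range(p))


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimize (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # y - X @ beta, with beta starting at zero
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue  # a constant-zero column carries no information
            # Correlation of predictor j with the partial residual that excludes it.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_scale[j]
            if new_beta != beta[j]:
                delta = new_beta - beta[j]
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Synthetic high-dimensional data (assumption): 20 observations, 50 predictors,
# and only the first two predictors actually influence the response.
random.seed(0)
n_obs, n_pred = 20, 50
X = [[random.gauss(0.0, 1.0) for _ in range(n_pred)] for _ in range(n_obs)]
y = [3.0 * row[0] - 2.0 * row[1] + random.gauss(0.0, 0.5) for row in X]

lam = 0.5 * lambda_max(X, y)
beta = lasso_coordinate_descent(X, y, lam)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("penalty:", lam)
print("predictors kept:", selected)
```

Raising `lam` toward `lambda_max(X, y)` drives more coefficients to exactly zero, which is the sense in which a penalized model can be sparse. A grouped penalty built on clusters of predictors, as studied in the abstract, would replace the per-coefficient shrinkage with a per-group one; that variant is not shown here.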

Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program