51 – Recent Developments in Modeling High-Dimensional and Complex Data
Analyzing High-Dimensional Gene Expression Data by Using the Regularization Technique
Kumer Das
University of Louisiana at Lafayette
Gene expression data can be difficult to analyze because of its high dimensionality. Regularization techniques help by reducing the number of predictors and highlighting the significant genes, in this case genes that may indicate the presence of cancer. This study asks whether grouping the genes before applying regularization reduces the prediction error of classification. We investigate the effectiveness of using clustering algorithms to generate a grouping structure for high-dimensional data sets. Using various regularization techniques, we assess whether the generated groups are truly relevant to the response and whether the accuracy and interpretability of the models can be improved. We apply the clustered group structure to two real-world data sets.
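To illustrate how a grouping structure can interact with regularization, the sketch below applies block soft-thresholding, the proximal step associated with a group-lasso-style penalty. The abstract does not name the specific regularization or clustering methods used, so the choice of group lasso here is an assumption, and the function name `group_soft_threshold`, the coefficient values, and the group labels (standing in for cluster assignments of genes) are all hypothetical.

```python
import math


def group_soft_threshold(coefs, groups, lam):
    """Shrink coefficients group by group (block soft-thresholding).

    Assumed penalty: lam * sum over groups of sqrt(group size) * L2 norm.
    A group whose joint norm falls below its threshold is set to zero as a
    whole, which is how an entire cluster of genes can drop out of the model.
    """
    # Collect the coefficient indices that belong to each group label.
    members = {}
    for idx, label in enumerate(groups):
        members.setdefault(label, []).append(idx)

    shrunk = [0.0] * len(coefs)
    for idxs in members.values():
        norm = math.sqrt(sum(coefs[i] ** 2 for i in idxs))
        threshold = lam * math.sqrt(len(idxs))
        if norm > threshold:
            # Keep the group, scaling every member by the same factor.
            scale = 1.0 - threshold / norm
            for i in idxs:
                shrunk[i] = scale * coefs[i]
    return shrunk


# Hypothetical example: five genes in two clusters (labels 0 and 1).
coefs = [3.0, 4.0, 0.1, -0.2, 0.05]
groups = [0, 0, 1, 1, 1]
shrunk = group_soft_threshold(coefs, groups, lam=1.0)
print(shrunk)
```

In this toy input, the cluster with large coefficients is kept and shrunk proportionally, while the cluster with uniformly small coefficients is removed entirely. Whether groups produced by clustering are actually relevant to the response is the question the study itself examines.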