Abstract:
|
Gene expression analysis has been of major interest to biostatisticians for decades. Such studies are essential for assessing and predicting disease risk, helping medical professionals and scientists design treatment plans that lessen symptoms and may eventually lead to cures. In this study, we investigate several gene expression analysis and machine learning techniques for disease class prediction, assess the predictive validity of the resulting models, and identify differentially expressed (DE) genes in each dataset. Model accuracy will be tested on multiple gene expression datasets obtained using the Affymetrix U133A platform. The methods we consider are: (1) simple random forest modeling, (2) Gene eXpression Network Analysis (GXNA), (3) RF++, (4) LASSO regression, and (5) Bayesian Neural Networks. Significance Analysis of Microarrays (SAM) will be used to identify potential disease biomarkers, and Principal Component Analysis (PCA) will be used to check for significant clusters before clustering techniques are applied. Our ultimate goal is to find co-expressed genes and to assess the effect of clustering analysis.
|