
All Times EDT

Abstract Details

Activity Number: 81 - Contributed Poster Presentations: Section on Statistics in Epidemiology
Type: Contributed
Date/Time: Monday, August 3, 2020 : 10:00 AM to 2:00 PM
Sponsor: Section on Statistics in Genomics and Genetics
Abstract #313526
Title: Significant Gene Array Analysis and Cluster-Based Random Forest Modeling for Breast Cancer Relapse Prediction
Author(s): Myrine Barreiro-Areval*
Companies: University of Texas Rio Grande Valley
Keywords: Random Forest Modeling; gene clusters; disease prediction
Abstract:

Random forest modeling is a popular ensemble machine learning method built from decision trees. It has recently gained traction in computational biology because it can analyze DNA microarray data, which capture the expression levels of thousands of genes simultaneously. Such data are primarily used to address research questions such as identifying co-regulated and functionally related groups of genes in an organism's genome. This study explores the identification of differentially expressed genes that may serve as potential biomarkers for breast cancer. It uses Significance Analysis of Microarrays (SAM) and random forest modeling, together with the Gene eXpression Network Analysis Tool to form a set of functionally correlated gene clusters. The gene expression data were taken from the Gene Expression Omnibus (GEO) database, series GSE2034 and GSE2990. Model RF1, built on 2071 filtered single genes, yielded an out-of-bag (OOB) error rate of 0.33. Model RF2, built on gene clusters, yielded an OOB error rate of 0.35 while using fewer than 1% of the genes in RF1. EEF1A2 was identified as the most significant up-regulated gene.
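The abstract relies on two quantities: the SAM statistic used to rank genes and the out-of-bag (OOB) error used to compare RF1 and RF2. The short Python sketch below illustrates both. It is not the author's pipeline, whose software and settings are not stated here. Assumptions: relapse status is coded 0/1 and each sample is a list of expression values. The helper names `sam_d`, `fit_stump` and `oob_error` are hypothetical. SAM's fudge constant `s0` is passed in rather than tuned, and no permutation-based false discovery rate is computed. Each tree is also reduced to a single split (a stump) to keep the example short, so this is a simplified random forest rather than a full one.

```python
import math
import random
from statistics import mean


def sam_d(group1, group2, s0=0.0):
    """SAM relative difference for one gene: (mean1 - mean2) / (s + s0)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    # Pooled standard error of the difference in means, as defined for SAM.
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s = math.sqrt(a * ss)
    return (m1 - m2) / (s + s0)


def fit_stump(X, y, features):
    """Hypothetical one-split 'tree': the (feature, threshold, polarity) with fewest errors."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            for pol in (0, 1):
                err = sum((pol if row[f] > t else 1 - pol) != yi for row, yi in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    _, f, t, pol = best
    return lambda row: pol if row[f] > t else 1 - pol


def oob_error(X, y, n_trees=100, mtry=None, seed=0):
    """Out-of-bag misclassification rate of a bagged ensemble of stumps."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    mtry = mtry or max(1, int(math.sqrt(p)))  # common default for classification forests
    votes = [[0, 0] for _ in range(n)]  # OOB votes for class 0 and class 1, per sample
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        in_bag = set(boot)
        feats = rng.sample(range(p), mtry)  # random feature subset for this tree's split
        stump = fit_stump([X[i] for i in boot], [y[i] for i in boot], feats)
        for i in range(n):
            if i not in in_bag:  # only trees that never saw sample i vote on it
                votes[i][stump(X[i])] += 1
    scored = [i for i in range(n) if sum(votes[i]) > 0]
    if not scored:
        return float("nan")
    # Majority OOB vote per sample (ties go to class 0), compared with the true label.
    wrong = sum((1 if votes[i][1] > votes[i][0] else 0) != y[i] for i in scored)
    return wrong / len(scored)
```

Each sample is scored only by the trees whose bootstrap sample excluded it (roughly a third of the trees). This is why the OOB error can serve as a built-in estimate of test error without a separate hold-out set. With the default used in this sketch, a 2071-gene input such as RF1's would try mtry = floor(sqrt(2071)) = 45 genes per split.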


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program