JSM 2017 Online Program

Activity Number:	531 - SPEED: Statistics in Epidemiology and Genomics and Genetics
Type:	Contributed
Date/Time:	Wednesday, August 2, 2017 : 11:35 AM to 12:20 PM
Sponsor:	Section on Statistics in Genomics and Genetics
Abstract #325180
Title:	Exploring High-Dimensional Feature Selection Using Reproducibility Methodology
Author(s):	Frank Shen*
Companies:	Penn State University
Keywords:	Irreproducible Discovery Rate ; Random Forest ; SVM ; high dimensional
Abstract:	High-throughput biological data has become a useful tool for understanding intricate biological systems over the past few decades. However, the resulting data has extremely high dimensionality, making it difficult to detect true associations amid random noise. Several data mining tools, such as SVM and Random Forest, have emerged to handle such analyses. These tools focus primarily on prediction and give inconsistent results when used for variable selection. The Irreproducible Discovery Rate (IDR) has been proposed as a method to better identify important variables in high-dimensional biological data. We explore its use on large, sparse, high-dimensional datasets to increase the accuracy and consistency of variable importance measures used in data mining.

Authors who are presenting talks have a * after their name.