JSM 2012 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 295
Type: Contributed
Date/Time: Tuesday, July 31, 2012, 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Mining
Abstract - #305758
Title: Extensions to Random Forests for High-Dimensional Genetic Data
Author(s): Stacey J Winham*+ and Robert R Freimuth and Joanna M Biernacka
Companies: Mayo Clinic and Mayo Clinic and Mayo Clinic
Address: Department of Health Sciences Research, Rochester, MN, 55905, United States
Keywords: Random Forests; high-dimensional data; data mining; gene-gene interactions; statistical genetics; genome-wide association

Identifying variants associated with complex disease in high-dimensional data is a challenging problem, and complicated etiologies such as gene-gene interactions are often ignored. The data-mining method Random Forests (RF) can accommodate high-dimensional data while also allowing for potentially complex genetic models. RF variable importance measures rank SNPs and can be used as a filter that accounts for interactions, although the impact of dimensionality on this filter has not been explored. We compare how well RF variable importance measures and univariate logistic regression detect interacting SNPs as dimension increases. The two methods show similar detection probabilities for truly interacting SNPs, indicating that in high-dimensional data RF captures marginal effects rather than complex models. To improve performance in high-dimensional data, we propose an extension called Weighted Random Forests (wRF), which incorporates tree-level weights to emphasize more accurate trees when calculating prediction accuracy and variable importance. We demonstrate that in certain situations wRF can improve on the prediction accuracy and variable importance rankings of RF.
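The abstract does not state how wRF computes its tree-level weights. The sketch below assumes, hypothetically, that each tree's weight is its out-of-bag (OOB) accuracy normalized so the weights sum to one. It shows how such weights could enter the ensemble vote, the ensemble accuracy, and a permutation-style variable importance score. Tree fitting is omitted: per-tree predictions, OOB accuracies, and per-tree importance values are taken as given. All function names and SNP labels (such as "rs1") are illustrative only and are not the authors' implementation.

```python
from typing import Dict, List, Sequence

# ASSUMPTION: wRF tree weight = OOB accuracy of the tree, normalized to sum to 1.
# The abstract only says weights "emphasize more accurate trees".


def tree_weights(oob_accuracies: Sequence[float]) -> List[float]:
    """Normalize per-tree OOB accuracies into weights that sum to 1."""
    total = sum(oob_accuracies)
    if total <= 0:
        raise ValueError("at least one tree needs a positive OOB accuracy")
    return [acc / total for acc in oob_accuracies]


def weighted_vote(tree_predictions: Sequence[int], weights: Sequence[float]) -> int:
    """Return the class label (e.g. 0 = control, 1 = case) with the largest summed weight.

    Ties go to the label seen first in tree_predictions.
    """
    score: Dict[int, float] = {}
    for pred, w in zip(tree_predictions, weights):
        score[pred] = score.get(pred, 0.0) + w
    return max(score, key=score.get)


def ensemble_accuracy(
    predictions_by_subject: Sequence[Sequence[int]],
    labels: Sequence[int],
    weights: Sequence[float],
) -> float:
    """Fraction of subjects whose weighted-vote prediction matches the true label.

    Simplification: every tree votes on every subject. In a real forest, the OOB
    accuracy of a subject uses only the trees for which it was out-of-bag.
    """
    correct = sum(
        weighted_vote(preds, weights) == y
        for preds, y in zip(predictions_by_subject, labels)
    )
    return correct / len(labels)


def weighted_importance(
    per_tree_importance: Sequence[Dict[str, float]], weights: Sequence[float]
) -> Dict[str, float]:
    """Weighted average of per-tree importances (e.g. drop in OOB accuracy after
    permuting a SNP), so that more accurate trees contribute more to each SNP's score."""
    result: Dict[str, float] = {}
    for importance, w in zip(per_tree_importance, weights):
        for snp, value in importance.items():
            result[snp] = result.get(snp, 0.0) + w * value
    return result


if __name__ == "__main__":
    # Hypothetical forest of five trees; the two most accurate trees predict "case".
    w = tree_weights([0.9, 0.85, 0.6, 0.55, 0.5])
    print(weighted_vote([1, 1, 0, 0, 0], w))    # weighted vote
    print(weighted_vote([1, 1, 0, 0, 0], [1] * 5))  # unweighted majority, as in standard RF
```

With these made-up numbers, the unweighted majority vote returns 0, while the weighted vote returns 1 because the two more accurate trees outweigh the three weaker ones. This is the kind of shift toward accurate trees that tree-level weighting is meant to produce. Setting all weights equal recovers the standard RF vote and the standard averaged importance.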

The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.

