The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.
Abstract Details
Activity Number: 295
Type: Contributed
Date/Time: Tuesday, July 31, 2012 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Mining
Abstract - #305758
Title: Extensions to Random Forests for High-Dimensional Genetic Data
Author(s): Stacey J Winham*+ and Robert R Freimuth and Joanna M Biernacka
Companies: Mayo Clinic and Mayo Clinic and Mayo Clinic
Address: Department of Health Sciences Research, Rochester, MN, 55905, United States
Keywords: Random Forests; High-dimensional data; data-mining; gene-gene interactions; statistical genetics; genome wide association
Abstract:
Identifying variants associated with complex disease in high-dimensional data is a challenging problem, and complicated etiologies such as gene-gene interactions are often ignored. The data-mining method Random Forests (RF) can accommodate high-dimensional data while also allowing for potentially complex genetic models. RF variable importance measures rank SNPs and can be used as a filter that considers interactions, although the impact of dimensionality on this use has not been explored. We compare the performance of RF variable importance measures with that of univariate logistic regression for detecting interactions as dimension increases. Both methods show similar detection probabilities for true interacting SNPs, indicating that in high-dimensional data RF is capturing marginal effects rather than complex models. To improve performance in high-dimensional data, we propose an extension called Weighted Random Forests (wRF), which incorporates tree-level weights to emphasize more accurate trees in calculations of prediction accuracy and variable importance. We demonstrate that in certain situations wRF can improve on the prediction accuracy and variable importance rankings of RF.
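The tree-level weighting idea can be illustrated with a minimal sketch. The abstract does not specify how the wRF weights are computed, so everything here is an assumption made for illustration only: the function names are hypothetical, the weight is taken to be each tree's out-of-bag accuracy in excess of chance (0.5 for a balanced binary outcome), and votes are coded 0/1. It is not the authors' implementation.

```python
from typing import List, Sequence


def accuracy_weights(oob_accuracies: Sequence[float]) -> List[float]:
    """Turn per-tree out-of-bag accuracies into non-negative tree weights.

    ASSUMPTION: weight = accuracy above chance (0.5), floored at zero.
    The abstract does not state the weighting scheme used by wRF.
    """
    return [max(acc - 0.5, 0.0) for acc in oob_accuracies]


def weighted_vote(tree_votes: Sequence[int], tree_weights: Sequence[float]) -> int:
    """Combine binary (0/1) tree votes into one prediction using tree weights."""
    if len(tree_votes) != len(tree_weights):
        raise ValueError("need exactly one weight per tree")
    total = sum(tree_weights)
    if total <= 0:
        raise ValueError("at least one tree must have a positive weight")
    # Weighted share of trees voting for class 1.
    score = sum(w * v for v, w in zip(tree_votes, tree_weights)) / total
    return 1 if score >= 0.5 else 0


def weighted_importance(
    tree_importances: Sequence[Sequence[float]],
    tree_weights: Sequence[float],
) -> List[float]:
    """Weighted average of per-tree variable importance scores.

    tree_importances[t][j] is the importance of variable j in tree t.
    """
    if len(tree_importances) != len(tree_weights):
        raise ValueError("need exactly one weight per tree")
    total = sum(tree_weights)
    if total <= 0:
        raise ValueError("at least one tree must have a positive weight")
    n_vars = len(tree_importances[0])
    return [
        sum(w * imp[j] for imp, w in zip(tree_importances, tree_weights)) / total
        for j in range(n_vars)
    ]
```

With equal weights, `weighted_vote` reduces to the ordinary majority vote and `weighted_importance` to the ordinary mean over trees. With unequal weights, a single accurate tree can outvote several trees that perform near chance, which is the behavior the abstract describes as emphasizing more accurate trees.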
The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.