JSM 2012 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 580
Type: Contributed
Date/Time: Wednesday, August 1, 2012 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistics in Epidemiology
Abstract - #304590
Title: Statistical Strategies for Developing Classification Algorithms with Application to Insulin Sensitivity Status
Author(s): Wenting Xie*+ and Charmaine S Tam and Bin Li and Eric Ravussin and William D. Johnson
Companies: and Pennington Biomedical Research Center and Louisiana State University and Pennington Biomedical Research Center and Pennington Biomedical Research Center
Address: 4707 Tigerland Ave., Baton Rouge, LA, 70820, United States
Keywords: Boosted Regression Tree; Random Forest; Tree-based methods; Logistic regression; Insulin Sensitivity Status; Metabolic markers
Abstract:

Insulin resistance is a strong precursor to the development of the metabolic syndrome and type 2 diabetes. The hyperinsulinemic-euglycemic clamp, the gold standard for assessing insulin resistance in humans, is labor-intensive and expensive, so surrogate markers of insulin resistance are needed. In this paper, we applied newer statistical algorithms to improve the accuracy of predicting insulin resistance. Data on subject characteristics (age, ethnicity, sex), body composition (BMI), and blood biochemistry (glucose, insulin) were obtained from 270 individuals who participated in research studies at the Pennington Biomedical Research Center in Louisiana between 2001 and 2011. Using these data, we applied and compared four statistical methods for predicting insulin resistance: classical logistic regression and the newer single classification tree, boosted regression tree (BRT), and random forest (RF) methods. We also evaluated a novel approach that combines logistic regression with feature selection from BRT or RF. Random forest (AUC = 0.858) and boosted regression tree (AUC = 0.845) gave the best prediction performance, followed by logistic regression combined with feature selection from BRT or RF (AUC = 0.763) and, finally, the single classification tree (AUC = 0.741). However, when the analysis was restricted to variables without a large proportion of missing values, logistic regression (AUC = 0.84) gave the best prediction performance. These results suggest that boosted regression tree and random forest approaches may provide better algorithms when missing data are an issue. We also found that an appropriate combination of traditional logistic regression and variable selection from BRT or RF may improve model performance. Logistic regression remains appropriate when missing data are not a factor.
In conclusion, we have illustrated how different statistical models can be explored and compared for prediction performance in biomedical studies.
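The abstract compares models by their area under the ROC curve (AUC) but does not say how the AUCs were estimated (for example, on a held-out set or by cross-validation). The following is therefore only a minimal sketch of the quantity itself, assuming binary labels and a continuous predicted score per subject: the AUC equals the probability that a randomly chosen insulin-resistant subject receives a higher score than a randomly chosen non-resistant subject, with ties counted as one half (the Mann-Whitney formulation). The function name `roc_auc` and the toy labels and scores are hypothetical and are not data from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney pairwise formulation.

    labels: sequence of 0/1 outcomes (1 = insulin resistant, hypothetical coding)
    scores: predicted scores or probabilities from any classifier
    Returns the fraction of (positive, negative) pairs in which the positive
    case scores higher; tied pairs contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:          # compare every positive case ...
        for n in neg:      # ... with every negative case
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: two models scoring the same six subjects.
labels = [1, 1, 0, 0, 1, 0]
model_a = [0.9, 0.8, 0.3, 0.4, 0.7, 0.2]
model_b = [0.6, 0.4, 0.5, 0.3, 0.7, 0.2]

if __name__ == "__main__":
    print(roc_auc(labels, model_a))  # 1.0
    print(roc_auc(labels, model_b))  # 8/9, about 0.889
```

With these toy scores, `model_a` ranks every positive above every negative, while `model_b` misorders one of the nine positive-negative pairs. The pairwise loop costs O(n_pos × n_neg), which is adequate for a sample of a few hundred subjects; a rank-based computation scales better for large samples.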


The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.


For information, contact jsm@amstat.org or phone (888) 231-3473.