JSM 2009 Online Program

Activity Number:	362
Type:	Topic Contributed
Date/Time:	Tuesday, August 4, 2009, 2:00 PM to 3:50 PM
Sponsor:	Section on Statistics in Epidemiology
Abstract - #305315
Title:	Classifier Variability: Accounting for Training and Testing
Author(s):	Weijie Chen*+ and Brandon D. Gallas
Companies:	FDA and FDA
Address:	10903 New Hampshire Ave, CDRH, Silver Spring, MD 20993
Keywords:	classification; training variability; U-statistics
Abstract:	This paper concerns the statistical evaluation of classifiers that are commonly used to combine multiple biomarkers to predict the presence or absence of a given disease. A classifier is typically trained on a finite data set and tested on an independent finite data set. We consider two sources of variability in the estimated performance of such a classifier: the finite size of the training set and the finite size of the testing set. We mimic multiple training sets by bootstrapping the training data set. We investigate U-statistics variance estimators for (a) the estimated area under the ROC curve (AUC) conditional on a particular training data set, and (b) the estimated mean AUC, where the mean is taken over multiple training sets. Our U-statistics variance estimators are uniformly minimum-variance unbiased estimators. We demonstrate our methodology with simulated data sets as well as real-world genomic data sets.
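A minimal Python sketch of the two quantities in (a) and (b), offered only to show the mechanics. It assumes the Mann-Whitney form of the empirical AUC (kernel 1 when a diseased score exceeds a non-diseased score, 1/2 for ties) and estimates Var(AUC) for one fixed set of classifier scores by writing E[AUC^2] in terms of four second moments of the kernel and replacing each moment with its unbiased U-statistic estimate. This is a standard construction, not necessarily the authors' exact estimator, and it covers only the testing variability of the conditional AUC; the authors' variance estimator for the mean AUC is not reproduced. The toy classifier (`train_mean_difference`), the stratified bootstrap, the simulated data, and all function names are hypothetical choices made for illustration.

```python
import random
from statistics import mean


def kernel(x, y):
    """Mann-Whitney kernel: 1 if the diseased score x exceeds the
    non-diseased score y, 0.5 for a tie, 0 otherwise."""
    return 1.0 if x > y else (0.5 if x == y else 0.0)


def auc_and_variance(pos, neg):
    """Return (AUC, unbiased estimate of Var(AUC)) for the two-sample
    U-statistic AUC = mean over (i, j) of kernel(pos[i], neg[j]).

    The scores are treated as fixed outputs of one trained classifier, so
    this variance reflects only the finite testing set (conditional AUC).
    Needs at least two scores in each class."""
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two scores per class")
    s = [[kernel(x, y) for y in neg] for x in pos]
    total = sum(sum(row) for row in s)
    sq = sum(v * v for row in s for v in row)                      # i = i', j = j'
    row_sq = sum(sum(row) ** 2 for row in s)                       # pairs sharing i
    col_sq = sum(sum(row[j] for row in s) ** 2 for j in range(n))  # pairs sharing j
    # Unbiased estimates of the four second moments of the kernel.
    m1 = sq / (m * n)
    m2 = (row_sq - sq) / (m * n * (n - 1))
    m3 = (col_sq - sq) / (m * (m - 1) * n)
    m4 = (total ** 2 - row_sq - col_sq + sq) / (m * (m - 1) * n * (n - 1))
    auc = total / (m * n)
    # Var(AUC) = E[AUC^2] - (true AUC)^2, each moment replaced by its estimate.
    var = (m1 + (n - 1) * m2 + (m - 1) * m3 + (m - 1) * (n - 1) * m4) / (m * n) - m4
    return auc, var


def train_mean_difference(pos, neg):
    """Hypothetical toy linear classifier: weights = difference of class means."""
    dim = len(pos[0])
    return [mean(v[k] for v in pos) - mean(v[k] for v in neg) for k in range(dim)]


def score(w, x):
    return sum(wk * xk for wk, xk in zip(w, x))


def mean_auc_over_bootstrap_training(train, test, n_boot=50, seed=0):
    """Average test AUC over classifiers trained on bootstrap copies of train.

    train and test are lists of (feature_vector, label), label 1 = diseased.
    Each class is resampled separately (stratified bootstrap, an assumption)."""
    rng = random.Random(seed)
    tr_pos = [x for x, lab in train if lab == 1]
    tr_neg = [x for x, lab in train if lab == 0]
    te_pos = [x for x, lab in test if lab == 1]
    te_neg = [x for x, lab in test if lab == 0]
    aucs = []
    for _ in range(n_boot):
        boot_pos = [rng.choice(tr_pos) for _ in tr_pos]
        boot_neg = [rng.choice(tr_neg) for _ in tr_neg]
        w = train_mean_difference(boot_pos, boot_neg)
        auc, _ = auc_and_variance([score(w, x) for x in te_pos],
                                  [score(w, x) for x in te_neg])
        aucs.append(auc)
    return mean(aucs)


if __name__ == "__main__":
    rng = random.Random(42)

    def simulate(k, label):
        # Hypothetical two-feature Gaussian data; diseased cases are shifted.
        shift = 1.0 if label == 1 else 0.0
        return [([rng.gauss(shift, 1.0), rng.gauss(0.5 * shift, 1.0)], label)
                for _ in range(k)]

    train = simulate(20, 1) + simulate(20, 0)
    test = simulate(30, 1) + simulate(30, 0)
    w = train_mean_difference([x for x, lab in train if lab == 1],
                              [x for x, lab in train if lab == 0])
    auc, var = auc_and_variance([score(w, x) for x, lab in test if lab == 1],
                                [score(w, x) for x, lab in test if lab == 0])
    print(f"conditional AUC = {auc:.3f}, estimated variance = {var:.5f}")
    print(f"mean AUC over bootstrap training sets = "
          f"{mean_auc_over_bootstrap_training(train, test):.3f}")
```

Running the script prints the conditional AUC with its estimated testing variance for one trained classifier, and the mean AUC over 50 bootstrap training sets, for one simulated data set. Computing row and column sums of the kernel matrix avoids the quadruple loop over (i, i', j, j') that a direct evaluation of the moments would need.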


This is the preliminary program for the 2009 Joint Statistical Meetings in Washington, DC.
The views expressed here are those of the individual authors and not necessarily those of the ASA or its board, officers, or staff.