Activity Number: 362
Type: Topic Contributed
Date/Time: Tuesday, August 4, 2009, 2:00 PM to 3:50 PM
Sponsor: Section on Statistics in Epidemiology
Abstract: #305315
Title: Classifier Variability: Accounting for Training and Testing
Author(s): Weijie Chen*+ and Brandon D. Gallas
Companies: FDA and FDA
Address: 10903 New Hampshire Ave, CDRH, Silver Spring, MD 20993
Keywords: classification; training variability; U-statistics
Abstract:
This paper concerns the statistical evaluation of classifiers that are commonly used to combine multiple biomarkers to predict the presence or absence of a disease. A classifier is typically trained on a finite data set and tested on an independent finite data set. We consider two sources of variability in the estimated performance of such a classifier: the finite size of the training set and the finite size of the testing set. We mimic multiple training sets by bootstrapping the training data set. We investigate U-statistic variance estimators for (a) the estimated conditional AUC, conditional on a particular training data set, and (b) the estimated mean AUC, where the mean is taken over multiple training sets. Our U-statistic variance estimators are unique minimum-variance unbiased estimators. We demonstrate the methodology on simulated data sets as well as real-world genomic data sets.
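The setup described above can be illustrated with a minimal sketch. This is not the authors' estimator: it only shows the empirical AUC as a Mann-Whitney U-statistic and the "mimic multiple training sets by bootstrapping" idea, using a hypothetical difference-of-means linear classifier and simulated Gaussian biomarker data as stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def auc_mann_whitney(scores_pos, scores_neg):
    # Empirical AUC as a two-sample U-statistic: the fraction of
    # (positive, negative) score pairs that are correctly ordered,
    # counting ties as one half.
    diff = scores_pos[:, None] - scores_neg[None, :]
    return np.mean(diff > 0) + 0.5 * np.mean(diff == 0)

def train_linear(X, y):
    # Toy classifier combining the biomarkers: project onto the
    # difference of the class means (a hypothetical stand-in for
    # whatever classifier is actually evaluated).
    return X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)

# Simulated biomarker data: two markers, class-1 mean shifted by 1.
n_train, n_test, d = 100, 200, 2
def simulate(n):
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, d)) + y[:, None] * 1.0
    return X, y

X_tr, y_tr = simulate(n_train)
X_te, y_te = simulate(n_test)

# (a) Conditional AUC: performance of the classifier trained on this
# particular training set, estimated on the independent test set.
w = train_linear(X_tr, y_tr)
s = X_te @ w
auc_cond = auc_mann_whitney(s[y_te == 1], s[y_te == 0])

# (b) Mimic multiple training sets by bootstrapping the training data,
# retraining, and re-testing; the spread of test AUCs reflects the
# variability attributable to the finite training set.
aucs = []
for _ in range(200):
    idx = rng.integers(0, n_train, n_train)
    wb = train_linear(X_tr[idx], y_tr[idx])
    sb = X_te @ wb
    aucs.append(auc_mann_whitney(sb[y_te == 1], sb[y_te == 0]))

mean_auc = np.mean(aucs)          # estimated mean AUC over training sets
train_var = np.var(aucs, ddof=1)  # training-set contribution to variability
```

The naive sample variance at the end is only a bootstrap illustration; the paper's contribution is replacing such resampling-based summaries with unbiased U-statistic variance estimators for both the conditional and the mean AUC.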