JSM 2011 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 256
Type: Contributed
Date/Time: Monday, August 1, 2011 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Mining
Abstract - #302411
Title: Data Preprocessing and Variable Selection in the Study of Proteomic Mass Spectrometry
Author(s): Chamont Wang*+ and Charlene Wang and Michele Meisner
Companies: The College of New Jersey and HealthFirst Inc. and The College of New Jersey
Keywords: Variable Selection ; Data Preprocessing ; False Discovery Rate ; Decision Tree ; Stochastic Gradient ; Regression
Abstract:

This study investigates a proteomic data set collected from the records of 216 individuals: 121 cancer patients and 95 healthy volunteers. For each individual, the raw data contain 368,749 spectral data points. We use a Dynamic Binning technique that merges adjacent spectra, assigning similar compounds to the same bin, without reducing peak resolution. This process reduced the raw data from 1.16 GB to 9.3 MB, organized into 5,155 bins. We compare the effect of this technique with that of other types of binning.
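As a point of comparison, one of the simplest alternative binning schemes is equal-width binning along the m/z axis. The sketch below is only an illustration of that baseline and is not the Dynamic Binning technique of this study, whose details are not given here. The function name `fixed_width_bins`, the bin width, and the toy spectrum are all hypothetical.

```python
from collections import defaultdict


def fixed_width_bins(mz_values, intensities, width):
    """Group intensities into equal-width m/z bins.

    Returns a dict mapping bin index k to the list of intensities whose
    m/z value falls in the half-open interval [k * width, (k + 1) * width).
    """
    if width <= 0:
        raise ValueError("width must be positive")
    if len(mz_values) != len(intensities):
        raise ValueError("mz_values and intensities must have equal length")
    bins = defaultdict(list)
    for mz, inten in zip(mz_values, intensities):
        # Floor division assigns each point to the bin that contains it.
        bins[int(mz // width)].append(inten)
    return dict(bins)


# Hypothetical toy spectrum (made-up numbers, for illustration only).
mz = [100.1, 100.4, 100.9, 101.2, 103.7]
intensity = [5.0, 7.0, 6.0, 2.0, 9.0]
binned = fixed_width_bins(mz, intensity, width=1.0)
```

Each bin's list of intensities can then be reduced to a single value per individual, which is what turns a long raw spectrum into a manageable set of candidate predictors.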

Within each bin, one can compute the mean, maximum, standard deviation, moving average, and other statistics for use in predictive modeling. This study compares the efficiency of these statistics in predicting cancer status. Furthermore, the study investigates the effects of variable selection via the False Discovery Rate, as discussed in Efron (2008, 2010) and Benjamini and Hochberg (1995). In addition, we use various techniques from Dudoit, Shaffer, and Boldrick (2003). We compare these results with the variables selected by Decision Tree, Stochastic Gradient, Regression, and Partial Least Squares.
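For readers unfamiliar with False Discovery Rate control, the step-up procedure of Benjamini and Hochberg (1995) can be sketched as follows. This is a minimal illustration, assuming one p-value per bin has already been obtained from some two-group test; it is not the specific implementation used in this study. The function name `benjamini_hochberg`, the level `q`, and the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return the sorted indices of hypotheses rejected at FDR level q.

    Step-up rule: with m p-values sorted in ascending order, find the
    largest rank k (1-based) such that p_(k) <= k * q / m, and reject
    the k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            # Keep the largest qualifying rank, even if earlier ranks failed.
            cutoff = rank
    return sorted(order[:cutoff])


# Hypothetical p-values, one per bin (made-up numbers, for illustration only).
p = [0.5, 0.001, 0.6, 0.7]
selected = benjamini_hochberg(p, q=0.05)
```

The returned indices identify the bins retained as candidate predictors. Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the sorted list meets its threshold.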


The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.
