Abstract Details

Activity Number: 294 - High-Dimensional Regression
Type: Contributed
Date/Time: Tuesday, August 1, 2017, 8:30 AM to 10:20 AM
Sponsor: Biometrics Section
Abstract #324528
Title: An Application of High-Dimensional Multiclass Classification Methods to Listeria Monocytogenes Whole Genome Multilocus Sequence Typing Data
Author(s): Sunkyung Kim*, Gordana Derado, Anna J Blackstock, Conrad Amanda, and Heather Carleton
Companies: Centers for Disease Control and Prevention; Centers for Disease Control and Prevention; Centers for Disease Control and Prevention; Centers for Disease Control and Prevention; Centers for Disease Control and Prevention
Keywords: Multiclass classification; Whole Genome Sequence; High dimensional; Group lasso
Abstract:

Whole genome sequence (WGS) data enhance the investigation of foodborne outbreaks caused by Listeria monocytogenes. Since September 2013, epidemiologic data collected by the Listeria Initiative at CDC have been combined with WGS data for use in Listeria outbreak investigations. By combining WGS data with food exposure data, clinical isolates from foodborne outbreaks can be linked to likely food sources. In this study, we apply state-of-the-art multiclass classification algorithms to Listeria whole genome multilocus sequence typing data to predict the source of human illness. Although feature selection methods for high-dimensional data are well established in the literature, their application to multiclass classification problems with categorical features is lacking. We compare the performance of multinomial logistic regression with lasso, group lasso, and elastic net penalties in terms of prediction error rate; error rates range from 40% to 60%, depending on the method and the parameters chosen. We conclude with a discussion of our results, a remark on computational challenges, and a plan for future research to incorporate the available patient-level epidemiologic covariates.
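
The abstract does not describe an implementation, so the following is only a minimal sketch of how the three penalties differ, not the authors' method. It shows the shrinkage (proximal) step each penalty applies to coefficients, assuming each categorical locus is dummy-coded into a group of allele indicators. The locus names, coefficient values, and the threshold `lam` are hypothetical and chosen purely for illustration.

```python
import math


def soft_threshold(value, threshold):
    """Lasso step: shrink a single coefficient toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def group_soft_threshold(group, threshold):
    """Group-lasso step: shrink a whole block by its Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in group))
    if norm <= threshold:
        return [0.0] * len(group)
    scale = 1.0 - threshold / norm
    return [scale * v for v in group]


def elastic_net_threshold(value, threshold, alpha):
    """Elastic-net step: soft-threshold the L1 part, then apply ridge shrinkage."""
    l1_part = soft_threshold(value, alpha * threshold)
    return l1_part / (1.0 + (1.0 - alpha) * threshold)


# Hypothetical coefficients: one group of allele indicators per locus.
coefs = {
    "locus_A": [0.9, -0.2, 0.1],
    "locus_B": [0.3, 0.2, -0.1],
}
lam = 0.5  # hypothetical penalty strength

lasso = {k: [soft_threshold(v, lam) for v in g] for k, g in coefs.items()}
group = {k: group_soft_threshold(g, lam) for k, g in coefs.items()}
enet = {k: [elastic_net_threshold(v, lam, 0.5) for v in g] for k, g in coefs.items()}

for name in coefs:
    print(name, "lasso:", lasso[name], "group:", group[name], "enet:", enet[name])
```

In this toy setting the lasso step zeroes individual allele indicators within a locus, whereas the group-lasso step either keeps or drops a locus as a whole. That is one reason a grouped penalty may suit categorical features, where selecting a locus is more interpretable than selecting a single allele.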


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 