Activity Number: 376
Type: Contributed
Date/Time: Tuesday, August 2, 2016, 10:30 AM to 12:20 PM
Sponsor: Section on Statistics in Genomics and Genetics
Abstract #320739

Title: Multicategory Classification Using High-Dimensional Predictors with Applications to Studying Effects of Rice Genome
Author(s): Arkaprava Roy* and Subhashis Ghoshal
Companies: North Carolina State University and North Carolina State University
Keywords: High Dimensional; Genomics; Logistic Regression; Lasso

Abstract:
We develop a multicategory classification technique that is particularly useful when one type of predictor is present in an unbalanced way and one set of predictors tends to mask the effects of the other. Our method is motivated by the problem of classifying the type of rice into one of five groups based on its genome data and the effects of exogenous variables. The gene expressions are high-dimensional, so variable selection is needed, but one type of variable should not overwhelm the other in the selection. Experience shows that the effects of gene expressions tend to be overshadowed by the macro variables, which dominate in a sparse classification procedure. We address this issue by representing the macro variables by their respective sparse regression residuals and then considering all variables together in a combined variable-selection and classification procedure within a high-dimensional penalized logistic regression framework. Variables are selected one at a time in a forward selection framework whose objective function also includes a penalty term. The proposed approach is shown to select very sparse models without losing predictive power.
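The two stages described above (residualizing the macro variables, then penalized forward selection in a multicategory logistic model) can be illustrated with a minimal NumPy sketch. The abstract does not say what the macro variables are regressed on, which penalty enters the forward-selection objective, or how the logistic model is fitted, so this sketch assumes that each macro variable is regressed on the gene expressions by a lasso (cyclic coordinate descent) and replaced by its residual, that the objective is the multinomial negative log-likelihood plus a BIC-style cost of (K - 1) log(n) / 2 per selected variable, and that each candidate model is fitted by plain gradient descent with a small ridge term. All function names (`lasso_cd`, `residualize_macro`, `fit_multinomial`, `forward_select`) and the synthetic data are hypothetical; this is not the authors' implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent: minimizes (1/2n)||y - Xb||^2 + lam*||b||_1.
    Assumes X and y are already centered (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float)                  # current residual y - Xb (astype copies)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]          # remove coordinate j from the fit
            b[j] = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
            r -= X[:, j] * b[j]          # add the updated coordinate back
    return b

def residualize_macro(G, M, lam):
    """Replace each macro variable by its lasso-regression residual.
    Assumption: the macro variables are regressed on the gene expressions G."""
    Gc, Mc = G - G.mean(axis=0), M - M.mean(axis=0)
    R = np.empty_like(Mc, dtype=float)
    for k in range(Mc.shape[1]):
        R[:, k] = Mc[:, k] - Gc @ lasso_cd(Gc, Mc[:, k], lam)
    return R

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)  # shift for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_multinomial(X, y, K, ridge=1e-2, lr=0.5, n_iter=500):
    """Multinomial logistic regression by gradient descent; the small ridge term is an
    assumption for stability. Returns the weights and the negative log-likelihood."""
    n = X.shape[0]
    Xb = np.hstack([np.ones((n, 1)), X])  # prepend an intercept column
    W = np.zeros((Xb.shape[1], K))
    Y = np.eye(K)[y]                       # one-hot labels
    for _ in range(n_iter):
        grad = Xb.T @ (softmax(Xb @ W) - Y) / n
        grad[1:] += ridge * W[1:]          # intercept is not penalized
        W -= lr * grad
    P = softmax(Xb @ W)
    return W, -np.log(P[np.arange(n), y] + 1e-12).sum()

def forward_select(X, y, K, penalty, max_vars=10):
    """Greedy forward selection: each step adds the column minimizing
    NLL + penalty * (number of selected columns), an assumed BIC-style objective.
    Stops when no remaining column lowers the objective."""
    selected = []
    best_obj = fit_multinomial(X[:, :0], y, K)[1]   # intercept-only model
    while len(selected) < max_vars:
        cand_obj, cand_j = np.inf, None
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            obj = fit_multinomial(X[:, cols], y, K)[1] + penalty * len(cols)
            if obj < cand_obj:
                cand_obj, cand_j = obj, j
        if cand_j is None or cand_obj >= best_obj:
            break
        selected.append(cand_j)
        best_obj = cand_obj
    return selected

# Synthetic demo (not the rice data): the single macro variable is a noisy copy
# of gene 0, and the five class labels are driven by gene 0.
rng = np.random.default_rng(0)
n, p_genes, K = 150, 20, 5
G = rng.normal(size=(n, p_genes))
M = (G[:, 0] + 0.3 * rng.normal(size=n)).reshape(-1, 1)
score = 2.0 * G[:, 0] + rng.normal(size=n)
y = np.digitize(score, np.quantile(score, [0.2, 0.4, 0.6, 0.8]))
R = residualize_macro(G, M, lam=0.05)
X = np.hstack([G, R])                      # column 20 is the residualized macro variable
selected = forward_select(X, y, K, penalty=(K - 1) * np.log(n) / 2, max_vars=5)
print("selected columns:", selected)
```

In this toy setup the raw macro variable is almost a copy of gene 0 and would compete with it for selection; after residualization the macro column keeps little of gene 0's signal, so forward selection should pick gene 0 first.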
Authors who are presenting talks have a * after their name.