| Activity Number: | 418 |
| Type: | Contributed |
| Date/Time: | Wednesday, August 9, 2006 : 10:30 AM to 12:20 PM |
| Sponsor: | Section on Statistical Computing |
| Abstract - #306089 |
| Title: | A Systematic Benchmark of Dimension Reduction in Remote Homology Detection with Support Vector Machines |
| Author(s): | Melissa M. Matzke*+ and Bobbie-Jo Webb-Robertson and Christopher S. Oehmen and Jorge F. Reyes Spindola |
| Companies: | Pacific Northwest National Laboratory and Pacific Northwest National Laboratory and Pacific Northwest National Laboratory and Pacific Northwest National Laboratory |
| Address: | MS K1-90, Richland, WA 99352 |
| Keywords: | bioinformatics ; support vector machine (SVM) ; multivariate analysis ; dimensionality reduction ; homology |
| Abstract: | Comparing biopolymer sequences to identify evolutionarily related proteins is one of the most common and data-intensive computing tasks in bioinformatics. One of the most accurate approaches uses support vector machines (SVMs) to classify proteins into families, representing each protein as a vector of sequence similarity scores obtained from the Bayesian Algorithm for Local Sequence Alignment (BALSA). However, a primary computational issue with SVMs is the size of the variable set. In this study, the performance of an SVM built with the complete BALSA score set is compared with that of SVMs built on reduced-dimension representations of the scores. Principal components analysis, sequential projection pursuit, independent component analysis, and kernel principal components analysis are used for dimension reduction. The area under the ROC curve is used to compare model performance. |
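
The abstract names the area under the ROC curve (AUC) as the measure for comparing models. As a minimal sketch of how such a comparison could be scored, the Python below computes AUC as the fraction of positive/negative pairs that a classifier ranks correctly (the Mann-Whitney formulation). The function name `roc_auc`, the labels, and the two score lists are hypothetical illustrations, not data or code from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking (Mann-Whitney U).

    labels: sequence of 0/1 values, where 1 marks a family member.
    scores: classifier outputs; a higher score means "more likely a member".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # Count the positive/negative pairs ranked correctly; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical decision values for the same six test proteins from two models:
# one trained on the full score set, one on a reduced-dimension representation.
labels = [1, 1, 1, 0, 0, 0]
full_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
reduced_scores = [0.9, 0.7, 0.6, 0.5, 0.3, 0.6]

auc_full = roc_auc(labels, full_scores)
auc_reduced = roc_auc(labels, reduced_scores)
print(f"full: {auc_full:.3f}  reduced: {auc_reduced:.3f}")
```

The pairwise count is quadratic in the number of examples, which is acceptable for a sketch; a rank-based computation would be the usual choice for large test sets.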