Abstract:
|
Linear classifiers, such as linear discriminant analysis (LDA) and its variants, are popular tools for classification. However, their use is limited to applications where only one data type is available. Nowadays, multiple data types (e.g., genomics and metabolomics) are often measured on the same subjects, with each data type capturing a different set of characteristics but all of them collectively helping to explain complex underlying relationships. When the goal is to use these data types simultaneously for classification, standard LDA and its existing variants fall short because they are designed for a single data type. We present SIDA (Sparse Integrative Discriminant Analysis), a novel linear discriminant analysis method for simultaneous integrative analysis of multiple data types and classification. The proposed method solves an optimization problem that accounts for both the overall association between data types and the maximum separation of classes within each data type when choosing discriminant vectors that optimally separate subjects into different classes. Simulation studies and a real-data example are used to demonstrate the effectiveness and efficiency of the proposed approach.
|