Abstract:
|
Biomarker studies of a disease often generate many large and diverse data sets. Classification models coupled with feature selection methods are commonly used to identify potential biomarker candidates. When several data sets are available, the most straightforward way to leverage all of the data is to fit a single classification model to the combined data sets. However, given the disparate nature of the data types, one model is unlikely to be appropriate for all available data. We present a novel method for the integration and feature selection of multiple disparate data sets (e.g., dietary information, metabolomics, and demographics) for biomarker discovery studies. We apply the method to a large cohort study of Type 1 diabetes, demonstrating how integrated models could provide novel clues concerning the pathways leading to the disease.
|