Abstract:
|
The rapid accumulation of high-throughput genomic data offers an unprecedented opportunity to study human diseases. We formulate disease diagnosis as a hierarchical multi-label classification (HMC) problem and develop a systematic framework, including a two-stage Bayesian learning approach, that associates a queried expression profile with one or more diseases organized in a hierarchical taxonomy. We further study how to make "optimal" decisions in HMC given classifiers for the individual classes. In particular, we introduce a new procedure that transforms the individual classifier scores into local precision rates or local false discovery rates and uses them to make class assignments along either a tree-structured or directed acyclic graph (DAG)-structured hierarchy. Under some reasonable conditions, this method yields an optimal hit curve.
|
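
To make the score transformation in the abstract concrete, the following is a minimal sketch and not the authors' procedure. It assumes a two-component mixture with known Gaussian densities for negative and positive cases, takes the local precision rate to be one minus the local false discovery rate, and substitutes a naive top-down consistency rule (a class is accepted only if all of its parents are accepted) for the paper's assignment step. All function names, density parameters, the threshold, and the toy taxonomy are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(score, pi0, null=(0.0, 1.0), alt=(2.0, 1.0)):
    """Local false discovery rate pi0*f0 / (pi0*f0 + (1 - pi0)*f1).

    Assumption: f0 and f1 are Gaussian with the given (mean, sd) pairs,
    and pi0 is the prior probability that a case is negative.
    """
    f0 = normal_pdf(score, *null)
    f1 = normal_pdf(score, *alt)
    return pi0 * f0 / (pi0 * f0 + (1.0 - pi0) * f1)


def local_precision(score, pi0, null=(0.0, 1.0), alt=(2.0, 1.0)):
    """Local precision rate, taken here as 1 - local false discovery rate."""
    return 1.0 - local_fdr(score, pi0, null, alt)


def assign_consistent(lpr, parents, threshold=0.5):
    """Accept a class if its rate clears the threshold and every parent
    is accepted. Works for trees and DAGs (parents maps node -> list).

    This naive rule is an illustrative stand-in, not the paper's method.
    """
    memo = {}

    def accepted(node):
        if node not in memo:
            memo[node] = lpr[node] >= threshold and all(
                accepted(p) for p in parents.get(node, ())
            )
        return memo[node]

    return {node for node in lpr if accepted(node)}


# Hypothetical taxonomy and per-class local precision rates.
toy_parents = {
    "cancer": ["disease"],
    "leukemia": ["cancer"],
    "infection": ["disease"],
    "flu": ["infection"],
}
toy_lpr = {"disease": 0.9, "cancer": 0.8, "leukemia": 0.3,
           "infection": 0.2, "flu": 0.95}
toy_labels = assign_consistent(toy_lpr, toy_parents, threshold=0.5)
```

In this toy example "flu" is rejected despite its high rate because its parent "infection" falls below the threshold, which shows why decisions along a hierarchy cannot be made class by class in isolation.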