Abstract:
|
We examine a few Bayesian biclustering models for analyzing biological datasets and demonstrate their use in analyzing biological sequences and gene expression data. These models can be viewed as special Bayesian clustering models with feature selection. For example, in order to predict co-functioning of genes in biologically relevant activities, we develop the algorithm CLIC to partition a given gene set into disjoint co-expression modules (CEMs), while simultaneously assigning a posterior probability of selection to each dataset (in this example, each dataset is a "feature"). We can then further expand each discovered CEM by scanning the whole reference genome for candidate genes that were not in the input gene set but are co-expressed with the genes in this CEM. CLIC is capable of integrating over thousands of gene expression datasets to achieve much higher co-expression prediction accuracy than traditional co-expression methods. Application of CLIC to ~1000 annotated human pathways and ~6000 poorly characterized human genes reveals new components of some well-studied pathways and provides strong functional predictions for some poorly characterized genes.
|