Abstract:
|
Gene expression profiles have been widely used to define subgroups of patients with a disease, which often leads to novel biological insights into the disease. Longitudinal gene expression profiles collected from patients over time may provide more information on disease progression than baseline profiles alone, so subgroup identification could be more accurate with the aid of longitudinal gene expression data. However, existing statistical methods are unable to fully utilize these data for patient clustering. In this article, we introduce a novel Bayesian subgroup identification method based on longitudinal gene expression profiles. This method, called BClustLonG, adopts a linear mixed-effects model framework to model gene expression trajectories over time, and a Dirichlet process prior is assumed for the random effects to induce clustering. In addition, a factor analysis model is used for the regression coefficients to account for correlations among genes and alleviate the curse of dimensionality. Through extensive simulation studies and real data analysis, we show that BClustLonG has improved performance over an empirical method.
|