Abstract:
|
Omics data, including DNA sequence, RNA, and protein data, arise in many biological systems. Quantitative and qualitative modeling of these systems requires a deep understanding of how they function. In modern metagenomics, there are many potentially interacting objects of interest (species, cell populations, molecules, etc.), which may need to be modeled as dynamical systems, leading to an explosion in the number of parameters. We propose a model-based clustering technique for inference from time series data. A simulation study indicates that the model is feasible. Furthermore, typical datasets measure many variables (features) within each sample; for example, the levels of thousands of mRNAs or proteins are collected across many samples. Such high dimensionality makes visualization and interpretation of the samples difficult and limits exploration of the data. We develop a new clustering approach that amounts to determining the dimensionality of the data via principal component analysis, removing outliers, and clustering the principal component loadings in the projected space. In Monte Carlo experiments, this approach performs better than other popular methods.
|