Abstract:
|
Compositional data consist of nonnegative components that represent the proportions of parts of a whole. Large-scale compositional data arise in many fields, such as chemistry, ecology, and geology. Examples include species compositions of the human microbiome and elemental compositions of a chemical mixture or soil. Applying standard dimensionality reduction techniques such as principal component analysis to compositional data is challenging for several reasons: the unit-sum constraint, negative correlations among the components, difficulties in specifying distributions, and potential zero observations. We propose an iterative method to identify a low-dimensional latent space for compositional data. In particular, we take a quasi-likelihood approach to latent space modeling, based on the generalized linear model framework, to deal with the aforementioned challenges. Using synthetic as well as real-world data, we assess the performance of the proposed method against the existing log-ratio-based method in terms of its ability to identify the latent space, recover the underlying parameters, and deal with different types of zero observations.
|