Abstract:
|
Compositional data arise naturally in many research areas, and accurate estimation of the latent covariance matrix is a key task in compositional data analysis. Real-world compositional datasets, however, are often complicated by their compositional structure, high dimensionality, heavy tails, and possible outliers. To address these challenges, we propose a new robust procedure for estimating the latent shape matrix of high-dimensional heavy-tailed compositional data; the shape matrix is a scalar multiple of the latent covariance matrix whenever the latter exists. The proposed method allows a broad class of elliptical distributions to model the latent log-basis variables and yields a positive-definite robust estimator of the large latent shape matrix based on Tyler's M-estimator (Tyler 1987) and Huber's M-estimator (Huber 1964). We establish theoretical guarantees for the proposed method in the high-dimensional setting, including selection consistency, sign consistency, a convergence rate, and an expected risk bound. We demonstrate the performance of the method through simulation studies and a real-data application to microbial inter-taxa analysis.
|