Abstract:
|
We propose a novel method for distance metric learning on multi-class compositional data. This problem setup is motivated by common learning tasks on data sets arising in microbial ecology, such as relative abundances generated from 16S sequencing experiments. Our approach can specifically handle data that contain a large number of zero measurements (zero inflation), a common property of data acquired from targeted, high-throughput sequencing. In previous work, Generalized Aitchison Embeddings were proposed as an extension of John Aitchison's log-ratio-based framework to map image histograms from the simplex to a suitable Euclidean space. Our algorithm learns a Mahalanobis metric from microbial compositions given some metadata (i.e., sample class); the learning task is posed as a non-convex optimization problem.
|