Abstract:
|
In some applications, data may be measured with different instruments or under different conditions, splitting the data into multiple domains or "views". For example, single-cell RNA-sequencing (scRNA-seq) measurements may be obtained from the same cellular system under different treatments, resulting in batch effects that affect downstream analysis. In many cases, these domains are difficult to integrate. Previous works have tackled the purely unsupervised and semi-supervised cases. The former assumes that there is no known shared information between domains, whereas the latter assumes that one-to-one correspondences between some data points are known. In this work, we consider a scenario between these two: we assume that class labels are known in each domain, but that no one-to-one correspondences between data points are available. We present a manifold alignment approach using a neural network that exploits both label information and the data geometry to integrate different domains. We show that our method successfully integrates multiple views in both real and simulated data, and we demonstrate how it can be used to understand inter-domain relationships and improve downstream analysis tasks.
|