Abstract:
The Wasserstein distance can be used to evaluate model fit when likelihood-based methods fail. Brenier showed that when the cost function in the Wasserstein distance integral is the squared L2 distance, the optimal transport map is guaranteed to be the gradient of a convex function of a specific form. We make explicit the relationship between the Wasserstein distance and the optimal transport map, and formulate an algorithm that finds such a convex function whose gradient, the corresponding transport map, sends sets of equal probability measure to each empirical data point. Unlike previous methods for computing the Wasserstein distance, this formulation identifies the optimal transport map without directly computing the transport cost. The transport cost is then evaluated from the optimal transport map alone, which enables model selection. We evaluate our optimization algorithm, describe several methods for model selection, and assess model fit in two-stage hurdle models for single-cell gene expression data by analyzing the uniformity of homogenized P-values.
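The idea can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes the model can be sampled, and it writes the convex function in the standard form for a target made of n data points y_1..y_n: phi(x) = max_i(<x, y_i> - h_i). For any heights h, the map x -> grad phi(x) sends each cell {x : piece i is active} to y_i. Here h is fitted by plain gradient descent on a fixed Monte Carlo sample until every cell holds mass 1/n, and the squared distance W2^2 is then read off as the mean of |x - grad phi(x)|^2. The function names (`assign`, `fit_potential`, `w2_squared`), the step size `lr`, and the iteration count are hypothetical illustrative choices. The step size in particular depends on the scale of the data.

```python
from typing import List, Sequence

Point = Sequence[float]


def dot(a: Point, b: Point) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def sq_dist(a: Point, b: Point) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(x: Point, ys: Sequence[Point], h: Sequence[float]) -> int:
    # Active affine piece of phi(x) = max_i(<x, y_i> - h_i); grad phi(x) = y_i.
    return max(range(len(ys)), key=lambda i: dot(x, ys[i]) - h[i])


def fit_potential(xs: Sequence[Point], ys: Sequence[Point],
                  lr: float = 0.1, iters: int = 200) -> List[float]:
    # Gradient descent on the convex function
    #   F(h) = mean_x max_i(<x, y_i> - h_i) + sum_i h_i / n,
    # whose gradient is 1/n - mass_i(h); at the minimum every cell has mass 1/n.
    n = len(ys)
    h = [0.0] * n
    for _ in range(iters):
        counts = [0] * n
        for x in xs:
            counts[assign(x, ys, h)] += 1
        for i in range(n):
            # A cell holding too much mass gets a larger h_i, which shrinks it.
            h[i] += lr * (counts[i] / len(xs) - 1.0 / n)
    return h


def w2_squared(xs: Sequence[Point], ys: Sequence[Point],
               h: Sequence[float]) -> float:
    # Transport cost computed from the fitted map x -> grad phi(x) alone.
    return sum(sq_dist(x, ys[assign(x, ys, h)]) for x in xs) / len(xs)


if __name__ == "__main__":
    # Hypothetical model Uniform[0, 1], represented by a 600-point midpoint grid.
    xs = [((k + 0.5) / 600,) for k in range(600)]
    ys = [(0.1,), (0.5,), (0.9,)]  # hypothetical "data"
    h = fit_potential(xs, ys)
    print(w2_squared(xs, ys, h))
```

In this toy case the fitted cells should come out close to [0, 1/3), [1/3, 2/3) and [2/3, 1], and the printed cost should be close to 11/900 (about 0.0122), which is the exact squared W2 distance between Uniform[0, 1] and equal weights on {0.1, 0.5, 0.9}. Because any gradient of a convex function that balances the masses is optimal, the cost never has to enter the fitting loop.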