Project A1: Wasserstein Metrics in Statistics: Inference


The Wasserstein distance and related optimal transport (OT) distances have played a prominent role in physics, probability theory, analysis and related areas for more than two centuries. More recently, they have been recognized as a powerful tool in statistics at its interface with computational science and various applications. They excel where conventional methods fail; in particular, OT distances adapt to the underlying geometric structure of the ground space. However, owing in part to the lack of appropriate distributional results, rigorous statistical inference with Wasserstein distances was until recently mainly limited to measures on the real line.

This project therefore aims to facilitate reliable statistical inference for broader classes of measures by providing distributional limits, deviation and concentration results, and resampling schemes that imitate them. In the first period of this project we derived limit laws and distributional bounds for the empirical optimal transport (EOT) distance for specific parametric classes, such as the Gaussian family, and for finite and countable spaces. In the latter case the limiting random variable is given in full generality as the dual optimization problem over a Gaussian process; for spaces with a spanning tree structure it can be expressed as an explicit sum, which can then be computed or simulated efficiently. Based on this, we provided risk bounds that make it possible to balance statistical accuracy against computational efficiency when subsampling the EOT, and that thus lead to EOT-based inference tools for many types of data.
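To illustrate why a tree structure makes such quantities explicit, the following minimal sketch computes the 1-Wasserstein distance itself between two probability vectors on a rooted tree, using the well-known closed form: the sum over edges of the edge length times the absolute difference in mass below that edge. The function name, the node-ordering convention (root is node 0 and every parent has a smaller index than its children) and the toy inputs are assumptions made for illustration only. The limit law discussed above is a related but distinct object and is not reproduced here.

```python
def tree_wasserstein_1(parent, edge_length, mu, nu):
    """1-Wasserstein distance between mu and nu on a rooted tree.

    Assumed conventions (illustrative, not from the project):
      * node 0 is the root and parent[0] is ignored,
      * parent[i] < i for every non-root node i,
      * edge_length[i] is the length of the edge (i, parent[i]).
    """
    n = len(parent)
    # Net mass difference at each node; accumulated into subtree sums below.
    diff = [m - v for m, v in zip(mu, nu)]
    total = 0.0
    # Visit nodes from highest to lowest index, so children come before parents.
    for i in range(n - 1, 0, -1):
        # diff[i] now holds the net mass of the whole subtree rooted at i,
        # all of which must cross the edge (i, parent[i]).
        total += edge_length[i] * abs(diff[i])
        diff[parent[i]] += diff[i]
    return total


# Toy example: a star with root 0 and two leaves at distances 1 and 3.
parent = [-1, 0, 0]
edge_length = [0.0, 1.0, 3.0]
mu = [0.0, 1.0, 0.0]   # all mass on leaf 1
nu = [0.0, 0.0, 1.0]   # all mass on leaf 2
distance = tree_wasserstein_1(parent, edge_length, mu, nu)
```

The single pass over the nodes costs linear time in the number of nodes, in contrast to solving a general linear program, which is the kind of computational saving that tree structure offers.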

Current work on the regularized OT plan offers a route to limit distributions and deviation bounds for the unregularized OT plan, and not only for the OT distance. We aim to use this for refined OT-based statistical inference. To this end we plan to develop resampling schemes that imitate these laws and to provide explicit error expressions for fine-tuning these schemes. In particular, we will tailor these to specific geometries of the ground space, such as spheres, and extend our methodology to dependent data. Based on this we will continue to explore real-world applications of our methodology, for example in three-dimensional image alignment problems.
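As a rough illustration of what a resampling scheme for an OT statistic can look like, the sketch below applies an m-out-of-n bootstrap to the rescaled 1-Wasserstein distance between a resample and the original empirical measure on an equally spaced grid. This is a generic textbook-style scheme chosen for illustration, not the scheme developed in the project; the function names, the grid setting, the synthetic data and the choice of m are all assumptions.

```python
import math
import random


def w1_on_line(p, q, spacing=1.0):
    """W_1 between probability vectors on an equally spaced grid.

    Uses the one-dimensional identity W_1 = sum of |CDF differences| * spacing.
    """
    total, cdf_diff = 0.0, 0.0
    for a, b in zip(p[:-1], q[:-1]):
        cdf_diff += a - b
        total += abs(cdf_diff) * spacing
    return total


def empirical_weights(sample, k):
    """Relative frequencies of the grid points 0..k-1 in the sample."""
    counts = [0] * k
    for x in sample:
        counts[x] += 1
    return [c / len(sample) for c in counts]


def m_out_of_n_bootstrap(sample, k, m, n_boot=500, seed=0):
    """Sorted bootstrap replicates of sqrt(m) * W_1(resample, sample).

    m is the resample size (assumed smaller than len(sample)); how to pick it
    is a tuning question that this sketch does not address.
    """
    rng = random.Random(seed)
    r_n = empirical_weights(sample, k)
    replicates = []
    for _ in range(n_boot):
        resample = rng.choices(sample, k=m)  # draw m points with replacement
        r_star = empirical_weights(resample, k)
        replicates.append(math.sqrt(m) * w1_on_line(r_star, r_n))
    return sorted(replicates)


# Hypothetical data: 400 draws from a distribution on the grid {0, 1, 2, 3}.
data_rng = random.Random(1)
sample = data_rng.choices(range(4), weights=[0.1, 0.4, 0.3, 0.2], k=400)
replicates = m_out_of_n_bootstrap(sample, k=4, m=20, n_boot=200)
# Empirical 95% quantile of the replicates, usable as a critical value.
critical_value = replicates[int(0.95 * len(replicates)) - 1]
```

The quantile in the last line is the kind of quantity a test or confidence statement would use; explicit error expressions of the sort mentioned above would be what guides the choice of m and the number of replicates.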

We collaborate with A2 and A5 on the development and implementation of algorithms for computing OT plans and barycenters.
Further connections, e.g. via geometric constraints and empirical processes, exist to projects B2 and B5.

Methods: Wasserstein distances, optimal transport, limit theorems, tree approximation, bootstrap, sensitivity analysis
Applications: statistical tests and confidence bands, randomized computation, image analysis, biochemical network analysis