An augmented Lagrangian approach to Wasserstein gradient flows and applications

Taking advantage of the Benamou-Brenier dynamic formulation of optimal transport, we propose a convex formulation for each step of the JKO scheme for Wasserstein gradient ﬂows which can be attacked by an augmented Lagrangian method which we call the ALG2-JKO scheme. We test the algorithm in particular on the porous medium equation. We also consider a semi implicit variant which enables us to treat nonlocal interactions as well as systems of interacting species. Regarding systems, we can also use the ALG2-JKO scheme for the simulation of crowd motion models with several species.


Introduction
It is well-known since the seminal work of Jordan Kinderlehrer and Otto [25] that the Fokker-Planck equation where the initial condition ρ 0 is a probability density may be viewed as the Wasserstein gradient flow of the (relative) entropy functional More generally, given an internal energy E, a potential V and an interaction potential W , evolution equations of the form ∂ t ρ = div(ρ∇(E (ρ) + V + W ρ)), ρ| t=0 = ρ 0 (1.3) is the Wasserstein gradient flow of the energy For instance, if E(ρ) = 1 m−1 ρ m and V = W = 0 one in particular recovers the porous medium equation ∂ t ρ = ∆ρ m , see the seminal work of Otto [34]. Convolution terms ∇W ρ in (1.3) arise naturally in aggregation equations [16] and models of granular media [18], [19].
The celebrated Jordan-Kinderlehrer-Otto (henceforth JKO) scheme consists, given a time-step τ > 0 in constructing inductively, starting from ρ 0 a sequence of probability measures ρ k by the implicit Euler scheme: ρ k+1 ∈ argmin ρ∈P 2 1 2τ W 2 2 (ρ, ρ k ) + E(ρ) (1.4) where P 2 denotes the set of probability measures on R d having finite second moments and W 2 2 is the squared 2-Wasserstein distance defined for every (ρ, ν) ∈ P 2 × P 2 by W 2 (ρ, ν) := inf γ∈Π(ρ,ν) R d ×R d |x − y| 2 dγ(x, y) 1 2 where Π(µ, ν) is the set of transport plans between ρ and ν i.e. the set of Borel probability measures on R d × R d having µ and ν as marginals. When E = S V is given by (1.2), Jordan, Kinderlehrer and Otto [25] proved that one recovers the solution of the Fokker-Planck equation (1.1) by letting τ tend to 0 in the JKO scheme. Similar convergence results hold for the more general equation (1.3) under suitable assumptions on E, V and W . The theory of Wasserstein gradient flows is by now well-developed and it is detailed in the textbooks of Ambrosio, Gigli and Savaré [1], Villani [38], [39] and Santambrogio [37]. We remark that the JKO scheme (1.4) is constructive and it is very natural and tempting to try to apply it for numerical purposes. The positivity, mass conservation and energy dissipation are inbuilt in the JKO scheme and non trivial to preserve with non-linear finite-difference or finite volume schemes (see [14] and references therein). Also some JKO gradient flows, like congested crowd motions [31], cannot be formulated as nonlinear PDEs and the JKO semi-discretisation is the only numerical option.
A serious difficulty with this approach is in the Wasserstein term which involves solving a costly optimal transport problem at each step. In dimension one, this is not really an issue since the optimal transport is essentially a rearrangement problem, and in fact, this 1-D numerical approach was proposed in the early work of Kinderlehrer and Walkington [26] and was used repeatedly. See in particular the recent work of Osberger and Matthes [33] for application to fourth-order evolution PDEs of thin films type. In higher dimensions, the optimisation problem in (1.4) is much more complicated because the optimal transport is given by Brenier's map, the gradient of a convex potential which solves some Monge-Ampère equation.
At least three categories of approaches have been followed to solve (1.4) numerically.
A first "Lagrangian" strategy based on Brenier's Theorem is to formulate the problem in terms of the transport map or its potential instead of the density ρ to avoid dealing with the positivity and mass constraints. It also allows a more consistent discretisation of the mass when the density concentrates or dilates. This is done for instance in Carrillo and Moll [17] who proposed a Lagrangian scheme, based on a gradient flow for evolving diffeomorphisms (not necessarily the optimal transport maps) related to a system of evolution equations which is very nonlinear since it involves cofactors. Osberger and Matthes [30] introduce a Galerkin discretisation of the potential. As illustrated in [24], where another Lagrangian method is introduced, a difficulty with the Lagrangian approach is the construction of a discrete density to be used in the internal energy. A semi-discrete solution to this problem has been proposed in [3] based on optimal maps, a discretisation of the Monge-Ampère operator and techniques of computational geometry. This method is provably convergent and enables one to use of a Newton method. Note that using monotone finite difference Monge-Ampère solvers as introduced in [10], [9] could be an option but it does not seem to have been tried.
A second strategy, which is Eulerian, is to use the Monge-Kantorovich linear relaxation of the Wasserstein distance in (1.4). The size of the discretisation is very limited by the linear programming approach. However, Peyré [36] recently generalised entropic regularisation techniques that are computationally efficient in optimal transport [8] to treat JKO gradient flows.
In the present paper we investigate a third approach, also Eulerian, based on replacing the Wasserstein distance with the Benamou-Brenier formulation [5]. This idea has already been used in [11], [12], [2], [4] either for JKO steps or in optimisation problems where the Wasserstein distance intervenes. Our original contribution, initiated in [7] on a general class of Optimal Transport problems and variational Mean Field Games [29], is to extend this convex reformulation and its augmented Lagrangian numerical resolution based on the algorithm ALG2 of Glowinski and Fortin [22] to solve a succession of problems of the form (1.4), we will call this method ALG2-JKO. We also show that the method can be adapted to treat systems (which are not necessarily gradient flows) using the relaxation introduced in [20] by Di Francesco and Fagioli and extended to the diffusive case in [15].
The Benamou-Brenier formulation induces an extra time dimension in each of the JKO steps. The resulting extra cost because of the discretisation of the inner (Benamou-Brenier) time dimension is usually considered a draw back. In the ALG2-JKO scheme however, since the successive JKO density time snapshots are close only a very few inner timesteps are needed in practice. The ALG2 augmented Lagrangian method is very robust, can deal with non-smooth energies but remains a proximal splitting first order method and converges slowly [35].
The paper is organized as follows. In section 2, we describe the ALG2-JKO scheme. In section 3, we illustrate the algorithm on two examples: the porous medium equation and a model of crowd motion with diffusion introduced by Santambrogio and Mészáros in [32]. Section 4 addresses the case of interaction terms, as in aggregation or granular media equations by a semi-implicit variant of the ALG2-JKO scheme. Finally, section 5 extends the method to several systems: interacting species (which are not gradient flows) and diffusive models of crowd motionsà la Santambrogio and Mészáros with several populations.

The ALG2-JKO scheme
Let us consider one step of the JKO scheme (1.4) in the case where the energy E is of the form with E a convex internal energy (typical cases being the entropy or a convex power), which corresponds to the time discretization of the PDE: Our goal is to rewrite (1.4) as a tractable convex problem. To do so, we use the Benamou-Brenier dynamic formula [5] to rewrite W 2 2 as 2) which is a convex variational program (it is implicit that the energy above is set to +∞ whenever µ becomes negative or when µ = 0 and m = 0 so that momentum m can be written as m = µv that is m vanishes where µ does and then |m| 2 /µ = µ|v| 2 is the kinetic energy).
Thanks to (2.2) one can rewrite one step of the JKO scheme (1.4) as the convex minimization: subject to the constraints that µ ≥ 0, m = 0 when µ = 0 and the linear constraint One then recovers ρ k+1 = µ 1 (and actually even an interpolation (µ t ) t∈[0,1] between ρ k and ρ k+1 ). Of course we can consider variants, for instance the periodic (in space) case or the case of a smooth bounded domain Ω of R d . In the latter case, we have to supplement the PDE (2.1) with the Neumann boundary condition: subject to the constraints that µ ≥ 0, m = 0 when µ = 0 and the linear constraint

Augmented Lagrangian formulation
Convex time-dependent problems like (2.6) subject to a divergence constraint (2.7) appear in various contexts, they are actually particular cases of deterministic Mean-Field Games (a class of games with a continuum of players introduced by Lions and Lasry [27], [28]). Such problems can be solved by Augmented Lagrangian methods, see in particular [7] for applications to Mean-Field Games, Papadakis, Peyré and Oudet [35] for connections with proximal schemes and Buttazzo, Jimenez and Oudet [13] for applications to congested transport. We now recall the principle of the Augmented Lagrangian approach and explain how to use it in the JKO framework. As was observed by Benamou and Brenier [5] the convex lsc 1-homogeneous function defined for (µ, m) ∈ R × R d by: is the support function of the convex set Our convex problem (2.6)-(2.7) then is dual to: where E * is the Legendre tranform of E (extended by +∞ on (−∞, 0]): where, slightly abusing notations, we have set We then rewrite the dual as where χ K denotes the indicator function in the sense that (φ, σ) satisfies the optimality conditions of (2.10) and (2.6)-(2.7) respectively if and only if is a saddle-point of L. Now for r > 0, we consider the augmented Lagrangian function and recall (see for instance [22], [23]) that being a saddle-point of L is equivalent to being a saddle-point of L r . The augmented Lagrangian algorithm ALG2 consists, starting from (φ 0 , q 0 , σ 0 ) to generate inductively a sequence (φ n , q n , σ n ) as follows: • Step 1: minimization with respect to φ: Step 2: minimization with respect to q: 14) • Step 3: update the multiplier by the gradient ascent formula The convergence of ALG2 to a saddle-point is well documented see in particular the general results of Bertsekas and Eckstein [21] in finite dimensions. We therefore have to understand that in the problems above, we have already projected the potentials in (2.10) on a finite-dimensional space of finite-elements and therefore deal with a finite-dimensional problem for which existence of a saddle-point is rather standard and convergence follows from [21].

Details for the three steps
Step 1 corresponds to a linear elliptic problem in t and x, −r∆ t,x φ n+1 = div t,x ((µ n , m n ) − r(a n , b n )), in (0, 1) × Ω, (2.16) together with the boundary conditions and (or periodic boundary conditions if Ω is replaced by the flat torus).
Step 2 splits into two convex pointwise (i.e. for every t and x) minimization subproblems, the first one (minimization with respect to (a, b)) is a projection problem onto the parabola K: where the projection P K onto K is explicit (see [5] or [35]): The second subproblem gives the update for c which is obtained by solving so that (2.22) can be rewritten as Thanks to the well-known (and actually easy to check) Moreau's identity we see that it is not necessary to compute E * to solve (2.22) if the computation of prox E turns out to be easier.
Step 3 is an explicit update which may be detailed as Note that only the minimization with respect to c (2.22) in step 2 depends on the form of the energy E, we shall give details for this step in each application given in the sequel.
In the Finite Element discretisation of this algorithm, we use P 2 FE (in time and space) for φ and P 1 FE for σ so that in (2.20), in fact, one has to understand Dφ n+1 as its projection onto P 1 FE. It was implemented in FreeFeem++ 1 . In practice, a discretization with 32 × 32 triangles in space and 4 inner timesteps needs a few hundreds iterations of ALG2 for each JKO time step. This is a few minutes on a standard laptop. Larger discretizations can be done using FreeFem mpi version which uses for instance MUMPS parallel linear solver 2 .

Applications
We now present two appplications of the ALG2-JKO scheme: the first one deals with the porous medium equation and the second one with a diffusive model of crowd motion recently introduced in [32].

Application to the porous medium equation
The porous medium equation In this case and then (2.22) consists in a pointwise minimization problem: given The case of a linear diffusion (Fokker-Planck) corresponds to an entropic internal energy E(ρ) = ρ log(ρ), in which case by similar computations, one finds c = c(x) by solving Figure 1 shows the evolution of the density for m = 3 and V = |x| 2 2 . As expected, we converge towards the stationnary Barenblatt profile Discretization is 16 × 16 in space. We use 8 internal time steps in each JKO step and make sure the ALG2 iterations converge "reasonably" (a 10 −6 tolerance is prescribed in the optimality system). There are 200 JKO steps (timestep = 0.01).
The plot in figure 2 gives quantitative information on the convergence towards the stationnary profile, decrease of energy and conservation of mass.

Application to crowd motions
In [32], Santambrogio and Mészáros considered a model of congested crowd motion with diffusion which leads to m−1 , reached up to discretization error again (in log scale).The last curve shows the decrease of the difference between the density energy (potential + entropy) E(ρ) and Barenblatt energy E(BB) in log scale. As expected the decrease behaves like −2 t down to numerical discretization error.
with no flux boundary condition. In the nondiffusive case (no Laplacian in the left hand side), this model is due to Maury, Roudneff-Chupin and Santambrogio [31] who made it clear that it has a gradient flow structure. The diffusive case (3.5), of course also has a gradient flow structure for the following energy A direct computation gives ). The first row represents the evolution under the constraint ρ 1 and the second the evolution under the constraint ρ 2.
In figure 3, we represent the evolution of one species, the potential has three minima (hot spots where the crowd wants to go) but with two different density constraints. When the density threshold is higher (second row ρ 2) then, at the end, the density is more concentrated around the three minima of the potential. 13

Semi-implicit variant and nonlocal interactions
If we consider a nonlocal interaction term Ω×Ω W (x, y)ρ(x)ρ(y)dxdy (with W symmetric and smooth) in the general form of the equation (1.3), the final term E becomes nonconvex. Therefore, in order to be able to use our Augmented Lagrangian strategy, we have to modify the JKO scheme in a semiimplicit way by replacing the nonconvex bilinear term 1 2 Ω×Ω W (x, y)ρ(x)ρ(y)dxdy by the linear one Ω W (x, y)ρ(x)ρ k (y)dxdy. This leads to the semi-implicit scheme (4.2) We refer to [15] for the convergence of this scheme to the solution of equation (1.3). For each time step, we then have to solve exactly the same type of problems as in section 3.1 except that the potential has to be updated at each step.

Systems
Our aim now is to show that the ALG2-JKO scheme can also be used for systems. We first show how the semi-implicit ALG2-JKO scheme can easily be extended to systems which are coupled through interaction drift terms (such systems are not gradient flows in general). We then address the generalization of the Mészáros and Santambrogio model to the case of several populations, in this case, the coupling is through a total density constraint (and then a common pressure field) and the corresponding system actually has a gradient flow structure.

Interacting species
Let us take two species for the sake of simplicity and consider the evolution of the densities of these two species, coupled only through interaction terms: )) (5.1) with energies E 1 and E 2 corresponding to independent linear or non linear diffusion terms and the coupling drift terms given (as in Di Francesco and Fagioli [20]) by convolutions with smooth kernels and individual potentials V 1 and V 2 : The nondiffusive case where E 1 = E 2 = 0 was studied by Di Francesco and Fagioli [20]. Note that (5.1) is not a gradient flow except in the particular case where interactions are symmetric i.e. W 12 = W 21 . The semi-implicit JKO scheme first proposed by Di Francesco and Fagioli in the nondiffusive case consists in defining inductively ρ k 1 and ρ k 2 by ρ k+1 The convergence of this scheme is established in Di Francesco and Fagioli [20] in the nondiffusive case and by Carlier and Laborde [15] in the diffusive case. Clearly, each independent subproblem (5.2) and (5.3) can be solved by the ALG2-JKO scheme described in section 2 and the extension to more than two species is direct as shown in the simulations below.
In figure 5, we see the evolution of two species which solve (5.1) with and W 12 (x) = − |x| 2 2 . In other words, the first species is attracted by the second one but the second species is repelled by the first one. Since we have attractive self-interactions, the two species do not spread too much.  This scheme can treat systems with more than two species. In figure 6, we represent evolution of three species which run after each other. More precisely, the interaction potentials we use are of the form where ρ 4 := ρ 1 and ρ 0 := ρ 3 .  Figure 6: Evolution of three species running after each other with linear diffusion. Top row: display of ρ 1 + ρ 2 + ρ 3 . Bottom row: display of ρ 1 .

Crowd motion with diffusion and several species
A natural variant of the model of [32], consists in considering two (or more) populations, each of whom having its own potential but coupled through the constraint that the total density cannot exceed 1 and then subject to a common pressure field. Note that a variant, without diffusion, with fixed initial and final densities and total density equal to 1 was treated in [6]. For a linear diffusion (corresponding to a Brownian noise on each species), the two-species crowd dynamics is expressed by the PDEs Which is the gradient flow (for the product Wasserstein distance) of the energy For a more general energy of the form the gradient flow of E is the following system with a coupling in the diffusion: The JKO scheme for this energy then reads which, in the particular case of the linear diffusion crowd motion problem with two species, takes the form Arguing as in section 2, setting φ = (φ 1 , φ 2 ), (Dφ 1 , Dφ 2 ) := (∂ t φ 1 , ∇φ 1 , ∂ t φ 2 , ∇φ 2 ), q = (q 1 , q 2 ) = (a 1 , b 1 , c 1 , a 2 , b 2 , c 2 ), σ = (σ 1 , σ 2 ) = ((µ 1 , m 1 , µ 1 ), (µ 2 , m 2 , µ 2 )) and defining the convex set K by (2.8), one can rewrite the discretization of (5.5) as a saddle-point problem for the augmented Lagrangian One can then again use ALG2 for this Lagrangian, note that since the coupling between the two species only appears in term τ E * c 1 τ , c 2 τ , the only significant difference is in the proximal problem in the variables c = (c 1 , c 2 ) of the second step of the algorithm. Taking r = 1 for simplicity and using remark 2.1, we see that the two-populations analogue of (2.22) simply takes the form In the two populations crowd motion model with linear diffusion, setting whose solution is again quasi explicit. More precisely where ψ τ (α) is the root in (0, 1) of otherwise. This proximal computation therefore only involves scalar monotone equations and is therefore not more complicated than what we saw in the case of a single equation. In figure 7, we see two populations which cross each other. When they start to cross each other at time t = 0.05, we remark that the density of ρ 1 and ρ 2 decrease and the sum is saturated.  Top row: display of ρ 1 + ρ 2 . Bottom row: display of ρ 1 .
In figure 8, we add a obstacle in the middle. This can be done using a potential with very high value in this area.