CROWD MOTION AND EVOLUTION PDES UNDER DENSITY CONSTRAINTS

Abstract. This is a survey of the theory of density-constrained evolutions in the Wasserstein space developed by B. Maury, the author, and their collaborators as a model for crowd motion. Connections with microscopic models and other PDEs are presented, as well as several time-discretization schemes based on variational techniques, together with the main theorems guaranteeing their convergence as a tool to prove existence results. Then, a section is devoted to the uniqueness question, and a last one to different numerical methods inspired by optimal transport.


Introduction
Modeling the behavior of human crowds when individuals are an obstacle for each other is a natural issue in applied mathematics, connected in a broad sense to many subdisciplines including, for instance, game theory (when individuals are considered to be rational agents) and fluid mechanics (when they are described as the particles of a fluid). Applications of such a modeling challenge can be crucial in particular in emergency scenarios, such as the evacuation of an area in case of danger, or in potentially critical situations, whenever huge crowds gather together (as in large sporting, political, or religious gatherings).
Many models have been studied, each with the goal of handling a particular building block of the whole modeling construction. We mention for instance the widely cited studies by Helbing [26] and Hughes [27].
The present paper is concerned with a sort of meta-model introduced by B. Maury and his collaborators approximately ten years ago, first in a microscopic setting and then in a macroscopic framework in collaboration with the author. Calling it meta-model instead of model means that it is not really concerned with the behavioral choices of the individuals but only with a rigorous mathematical handling of the contacts between them. The motion of the crowd is described through a spontaneous velocity field u (which can be determined in exogenous or endogenous ways, and can take into account exterior obstacles, individual preferences, strategic choices...), which stands for the velocity each agent would follow if the others were not present, and is then transformed into the actual velocity field v which drives the evolution; the difference between v and u is due to the contacts between individuals.
The paper will be essentially devoted to the macroscopic setting, which has been developed as a continuous counterpart of the microscopic one via tools coming from optimal transport. The work leading to the first paper on the subject, [34], started more than ten years ago, and many developments have occurred since then.
After a brief introduction we will see how the problem can be formulated into a class of evolution PDEs under density constraints, and discuss its connections with other evolution equations. The questions to be studied are the classical ones in PDEs: existence, approximation, uniqueness and numerical simulations, and the present paper is a survey on these issues. Existence (together with approximation) and uniqueness are treated in two different sections, which explains why the convergence results of Section 2 are stated "up to subsequences", with remarks explaining how the convergence can be obtained on the full sequence using the results of Section 3.
The idea to write such a survey comes from the need to clarify and summarize the evolution of the theory since [34]. For instance, we now understand uniqueness issues better, thanks in particular to [22], but we are also able to formulate clear conjectures on this matter. In what concerns existence and approximation, [34] was only concerned with the gradient-flow case; this is an important setting but not the only reasonable one: we will see how the gradient structure simplifies the setting but the PDE can also be studied without it. Looking back at [34], the reader will find how much effort went into dealing with the mass which exits the domain (in the evacuation setting: the mass going out of the door), which no longer appears in the subsequent literature. This was essentially due to a certain fear of dealing with non-convex domains (a remnant of the geodesic convexity condition usually required for gradient flows in [4]), which suggested not extending the domain beyond the door, and instead using subtle mathematical tricks to handle possible concentrations of the mass on part of the boundary. One of the goals of this survey is also to provide a better reference for the readers, so that they do not have to refer to [34] and get confused with the exit problem, and to optimize the assumptions in the main theorems. Finally, numerical methods have evolved since [34] (for a review of computational optimal transport, see [18]) and we will present both the stochastic projection idea used there and the methods which are the state of the art now for Wasserstein gradient flows in the optimal transport community (see also [48]).
Of course, there is a lot of material to present, and this survey is not especially concise. The main goal is to put some order into the growing literature on these equations, and the typical readers of this survey are those who have already encountered part of this literature but look for a more structured reference. The applications to crowd motion modeling are the starting point and the motivation but, due to the meta-model nature of the subject, most of the paper will be devoted to the mathematical analysis of the abstract evolution PDEs which are motivated by these models.

Micro and macro models for crowd motions with constraints
In this section, starting from the microscopic setting, we will describe the models presented in [34,35,36,37] to deal with crowd evolution involving constraints due to contacts between the individuals.
We consider a population of particles in a domain Ω ⊂ R^d, and suppose that each particle, representing an individual, has its own spontaneous velocity u (which could a priori depend on time, position, on the particle label itself, on the interaction with the others... this depends on the exact model one wants to consider), which represents the velocity it would follow if alone. Yet, particles are described as rigid disks that cannot overlap, and the presence of the others has an influence on the feasible velocities. The actual velocity cannot always be u, in particular when u tends to concentrate (which is the case every time many individuals in a large area try to exit through a small opening, for instance). We will call v the actual velocity of each particle, and the main assumption of the model is the following: v will be the projection of u onto the set of feasible velocities, which depends on the configuration of the particles. More precisely, v = P_{adm(q)}[u], where q is the particle configuration, adm(q) is the set of velocities that do not induce (for an infinitesimal time) overlapping starting from the configuration q, and P_{adm(q)} is the projection (in the high-dimensional Euclidean space R^{dN}, N being the number of particles) onto this set.
For simplicity, we will suppose that every particle is a disk with the same radius R and denote by q_i the position of the center of the i-th particle. In this case we define the admissible set K of configurations through
K := {q = (q_1, . . . , q_N) ∈ Ω^N : |q_i − q_j| ≥ 2R for all i ≠ j}.
The set of admissible velocities is easily seen to be the following convex cone:
adm(q) = {v ∈ R^{dN} : (v_i − v_j) · (q_i − q_j) ≥ 0 for all i ≠ j such that |q_i − q_j| = 2R}.
The evolution equation which models the motion of q will then be the ODE
(1.1) q′(t) = P_{adm(q(t))}[u(q(t))]
(with q(0) given). Equation (1.1), not easy from a mathematical point of view, was studied by Maury and Venel in [36,37] in the framework of the so-called Moreau sweeping process. This process consists in the differential inclusion x′(t) ∈ −N_{C(t)}(x(t)), for a given moving convex domain C(t), and represents the motion of a particle forced to stay inside C(t) and pushed by its boundary (see, for instance, [43]); variants with exterior forcing terms, which make the process interesting even when C does not depend on t, can also be studied, as well as variants where C(t) is not convex. In our case, (1.1) can be written as the differential inclusion
q′(t) + N_K(q(t)) ∋ u(q(t)),
where N_K is the normal cone to the set K, defined (in the proximal sense, suitable for non-convex sets) via
N_K(q) := {w ∈ R^{dN} : ∃α > 0 such that q ∈ P_K(q + αw)}.
It is also possible to discretize this equation in time via the so-called catching-up algorithm as follows (for a small time step τ > 0 which will tend to 0):
q^τ_{n+1} := P_K[q^τ_n + τ u(q^τ_n)],
where P_K is the projection onto the set K (note the difference between the projection in the space of velocities onto adm(q) and the projection in the configuration space onto K). The same algorithm is also called the prediction-correction algorithm or, less specifically, a splitting algorithm. It is important here to notice that K, even if not convex in Ω^N, is at least prox-regular (the projection on K is well defined on a neighborhood of K), which makes the above differential inclusion and the above discrete scheme well-defined. All in all, we exactly have a perturbed sweeping process with exterior forcing u. The mathematical and numerical analysis (with numerical simulations with impressively many particles) has been done in [36,37].
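As an illustration (our own sketch, not taken from the papers cited above), the catching-up scheme can be made completely explicit in one dimension, where disks become segments of length 2R on a line. Assuming the particles stay ordered, the Euclidean projection onto K reduces, after the change of variables y_i = q_i − 2Ri, to an isotonic regression, computed here by the classical pool-adjacent-violators algorithm; all function names (pav, project_K, catching_up) are ours:

```python
# A minimal 1D sketch of the catching-up scheme q_{n+1} = P_K(q_n + tau*u(q_n)):
# particles are segments of length 2R on a line, K = {q : q_{i+1} - q_i >= 2R}.
# With y_i = q_i - 2R*i the constraint becomes "y non-decreasing", so P_K is
# an isotonic regression, computed by pool-adjacent-violators (PAV).

def pav(y):
    """L2 projection of the vector y onto non-decreasing vectors."""
    # blocks of (sum, count); merge while the previous block average
    # is larger than (or equal to) the current one
    blocks = []
    for v in y:
        s, c = v, 1
        while blocks and blocks[-1][0] / blocks[-1][1] >= s / c:
            ps, pc = blocks.pop()
            s += ps
            c += pc
        blocks.append((s, c))
    out = []
    for s, c in blocks:
        out.extend([s / c] * c)
    return out

def project_K(q, R):
    """Euclidean projection onto {q : q_{i+1} - q_i >= 2R} (ordered particles)."""
    y = [qi - 2 * R * i for i, qi in enumerate(q)]
    return [yi + 2 * R * i for i, yi in enumerate(pav(y))]

def catching_up(q0, u, R, tau, n_steps):
    """Prediction (move with spontaneous velocity u) then correction (project)."""
    q = list(q0)
    for _ in range(n_steps):
        q = project_K([qi + tau * u(qi) for qi in q], R)
    return q

# everybody moves left towards an exit: contacts create a queue of touching disks
q_final = catching_up([0.0, 1.0, 1.1, 3.0], u=lambda x: -1.0, R=0.25,
                      tau=0.05, n_steps=10)
```

The queue formation is visible in the output: the middle particles, initially overlapping, end up exactly 2R apart.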
We are now interested in the simplest continuous counterpart of this microscopic model. We do not claim at all that this is any kind of homogenized limit of the microscopic case; we only propose it as an easy re-formulation in a density setting. In this case
• the particle population will be described by a probability density ρ ∈ P(Ω),
• the constraint becomes a maximal density constraint ρ ≤ 1 (we define the set K = {ρ ∈ P(Ω) : ρ ≤ 1}, which means that measures in K are absolutely continuous and their density is smaller than 1),
• the set of admissible velocities will be described by the sign of the divergence on the saturated region {ρ = 1}: adm(ρ) = {v : Ω → R^d : ∇ · v ≥ 0 on {ρ = 1}} (in a sense to be made precise, because we are considering divergences of non-smooth vector fields),
• the projection P in the space of velocities will be computed either in L^2(L^d) (L^d denotes the Lebesgue measure in R^d) or in L^2(ρ), but this turns out to be the same, since the only relevant zone is {ρ = 1},
• the motion of the crowd will be described by the continuity equation
(1.2) ∂_t ρ_t + ∇ · (ρ_t v_t) = 0,  v_t = P_{adm(ρ_t)}[u_t],
with no-flux boundary conditions on ∂Ω.
The main difficulties in studying Equation (1.2) come from the lack of regularity: the vector field v_t = P_{adm(ρ_t)}[u_t] is neither smooth itself (since it is obtained as an L^2 projection, and may only be expected to be L^2 a priori), nor does it depend continuously on ρ (it is very sensitive to small changes in the values of ρ: passing from a density 1 to a density 1 − ε completely modifies the saturated zone, the admissible set of velocities, and the projection onto it).
Before introducing an approximation scheme for this equation, we want to make precise the definitions above, in particular in what concerns the divergence. Indeed, it is more convenient to describe adm(ρ) by duality:
adm(ρ) := {v ∈ L^2(ρ) : ∫_Ω v · ∇p ≤ 0 for every p ∈ press(ρ)},  press(ρ) := {p ∈ H^1(Ω) : p ≥ 0, p(1 − ρ) = 0}.
In this way we characterize v = P_{adm(ρ)}(u) through
v = u − ∇p,  p ∈ press(ρ),  v ∈ adm(ρ),  ∫_Ω v · ∇p = 0,
where press(ρ) is the space of functions p used as test functions in the dual definition of adm(ρ). They play the role of pressures affecting the movement. The two cones ∇press(ρ) (the set of gradients of elements of press(ρ)) and adm(ρ) are in duality for the L^2 scalar product (i.e. one is defined as the set of vectors with a negative scalar product with all the elements of the other). This allows for the orthogonal decomposition u_t = v_t + ∇p_t, and gives an alternative expression of Equation (1.2), i.e.
(1.3) ∂_t ρ_t + ∇ · (ρ_t (u_t − ∇p_t)) = 0,  p_t ≥ 0,  ρ_t ≤ 1,  p_t(1 − ρ_t) = 0.
Note (details are in [21], for instance) that the orthogonality condition ∫(u_t − ∇p_t) · ∇p_t = 0 has disappeared, since it is guaranteed, for a.e. t, by the other conditions. Finally, the term ∇ · (ρ_t ∇p_t) can be replaced by Δp_t since anyway the pressure p_t and its gradient are non-zero only on the saturated region {ρ_t = 1}.
It is interesting to see that the above equation strongly recalls the density-dynamics formulation of the Hele-Shaw flow. In order to present it in a few words, consider an equation of the form
∂_t ρ_t − Δp_t = G,
where the term G possibly stands for reaction terms, and p and ρ are bound by the condition p_t ∈ H(ρ_t), where H is a monotone graph. When H is given by H(s) = s^m we have a porous-medium equation (a form of non-linear diffusion, see [49]), and when H = ∂I_{[0,1]} (by I_A we denote the indicator function in the convex analysis setting: I_A = 0 on A, I_A = +∞ on A^c; thus, we have H(s) = [0, +∞[ for s = 1, H(s) = {0} for 0 < s < 1), the pressure p ≥ 0 is arbitrary, but satisfies p(1 − ρ) = 0, exactly as in (1.3). We refer, for instance, to [11,16] and to the more recent [45] for these equations.
In particular, when G = ρG̃, with G̃ ≥ 0, and ρ_0 = 1_{Ω_0} is a patch (i.e. no intermediate density between 0 and 1 is allowed), the evolution is of the form ρ_t = 1_{Ω_t}, with Ω_t evolving with normal velocity given by the pressure gradient on ∂Ω_t. This free-boundary formulation of the Hele-Shaw flow, usually studied in terms of viscosity solutions, could be more familiar to some readers (see, for instance, [30,31]). Our equation (1.3) has the same form, but we replace reaction terms with an advection term. This kind of term happens to be much less studied in the Hele-Shaw literature and lets different difficulties appear, in particular when studying the uniqueness of the solution (see Section 3). Moreover, it seems that the approach with optimal transport (which will be the key to prove the existence of a solution in the next section) has not yet been exploited in the Hele-Shaw community.

Existence and approximation: the role of optimal transport
We will present in this section an iterative scheme, together with some variants, which provide time-discrete approximations of the solution of (1.3).
First, we fix a time step τ > 0: we look for a sequence (ρ^τ_n)_n where ρ^τ_n will represent the measure ρ_t at time t = nτ. To pass from ρ^τ_n to ρ^τ_{n+1}, we define a splitting scheme in the space of measures:
(2.1) ρ̃^τ_{n+1} := (id + τ u_{nτ})_# ρ^τ_n,  ρ^τ_{n+1} := P_K[ρ̃^τ_{n+1}].
Here # denotes the push-forward (or image) measure, and the important point is to choose the metric in the projection P_K. The vector field u_{nτ} is the evaluation of the time-dependent vector field u_t at time t = nτ.
The key point is using the W 2 projection (where W 2 is the quadratic Wasserstein distance, induced by optimal transport, see below), instead of L 2 or other Hilbertian projections. Indeed, only the distance W 2 corresponds to the L 2 projection of velocity fields and of (Lagrangian) positions.

Optimal transport and Wasserstein distances
For the whole theory of optimal transport we refer to [50] or [47]. Here we only give the most basic ingredients that we need, with no proofs. If two probabilities µ, ν ∈ P(Ω) are given on a compact domain, the Monge problem, [42], for the quadratic cost reads
(2.2) inf {∫ 1/2 |x − T(x)|^2 dµ(x) : T : Ω → Ω, T_#µ = ν}.
The formulation given by Kantorovich, [29], is a relaxation of the above optimization problem:
(2.3) inf {∫ 1/2 |x − y|^2 dγ : γ ∈ P(Ω × Ω), (π_x)_#γ = µ, (π_y)_#γ = ν},
where π_x and π_y are the two projections of Ω × Ω onto its factors (hence, the admissible measures γ, called transport plans, are those with prescribed marginals µ and ν). Since this second problem is convex, it is possible to find a dual problem (this is the main contribution by Kantorovich), which reads
(2.4) sup {∫ ϕ dµ + ∫ ψ dν : ϕ(x) + ψ(y) ≤ 1/2 |x − y|^2 for all (x, y) ∈ Ω × Ω}.
It is possible to prove that both the primal and the dual Kantorovich problems admit solutions and that, whenever µ ≪ L^d, the optimal γ in the Kantorovich problem is of the form γ = (id, T)_#µ (it is concentrated on the graph of a map T : Ω → Ω), and this map T, which is optimal for the Monge problem, is related to the optimal pair (ϕ, ψ) in the dual problem via T(x) = x − ∇ϕ(x). The optimal ϕ is called Kantorovich potential, is Lipschitz continuous (which guarantees differentiability a.e.), and is such that u(x) = |x|^2/2 − ϕ(x) is convex. In particular the optimal transport map is of the form T = ∇u, i.e. it is the gradient of a convex function. This is a well-known result by Brenier [12,13].
Moreover, one can use the values of the above problems (which all coincide, at least when µ ≪ L^d) in order to define a distance on probability measures. More precisely, we define
W_2(µ, ν) := (min {∫ |x − y|^2 dγ : γ ∈ P(Ω × Ω), (π_x)_#γ = µ, (π_y)_#γ = ν})^{1/2}.
Hence, the values in (2.2), (2.3), (2.4), which depend on µ and ν, are equal to 1/2 W_2^2(µ, ν). Then, it is possible to prove that W_2, called the Wasserstein distance, is a distance on P(Ω) which metrizes the weak-* convergence of probabilities (on compact domains; see Chapter 5 in [47]), and to use this distance to define projections.
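Since in dimension one the optimal plan for the quadratic cost is the monotone one, W_2 between two uniform discrete measures with the same number of atoms can be computed by simply sorting the supports. A minimal sketch (the function name is ours, not standard notation):

```python
# In dimension one the optimal transport plan is monotone: the i-th smallest
# atom of mu is sent to the i-th smallest atom of nu. For two uniform discrete
# measures (n atoms of mass 1/n each), W_2^2 is obtained by matching sorted
# supports.

def w2_squared(xs, ys):
    """Squared quadratic Wasserstein distance between uniform measures on xs, ys."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# translating a measure by c gives W_2 distance |c|:
mu = [0.0, 1.0, 2.0]
nu = [x + 3.0 for x in mu]
print(w2_squared(mu, nu))  # 9.0 = 3.0**2
```

The invariance under relabeling (only the sorted supports matter) reflects the fact that W_2 compares measures, not labeled particle systems.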
A final important property that we need about Wasserstein distances and Kantorovich potentials is the following (see Chapter 7 in [47]): if ν is fixed and one defines ρ_ε := (1 − ε)ρ + ερ̃, then we have
d/dε [1/2 W_2^2(ρ_ε, ν)]|_{ε=0} = ∫ ϕ d(ρ̃ − ρ),
where ϕ is the Kantorovich potential in the transport from ρ to ν, provided it is unique up to additive constants. This means that ϕ is the first variation of the half squared Wasserstein distance.

Wasserstein projections
The projection problem, which is a crucial tool in all the analysis of this paper, reads as follows: fix a measure ν ∈ P(Ω) and solve
min {1/2 W_2^2(ρ, ν) : ρ ∈ P(Ω), ρ ≤ 1}.
Writing 1/2 W_2^2(·, ν) through the dual formulation (2.4), as a supremum of linear functionals of ρ, shows that this is a convex problem, and it is possible to characterize its solution via its optimality conditions. As usual, we can say that, if ρ̄ is a solution of the above projection problem, then ρ̄ must also minimize, under the same constraints, the linearization (around ρ = ρ̄) of the functional ρ → 1/2 W_2^2(ρ, ν). Hence, it is possible to prove (see [34]) that there exists a function ϕ̄, which is a Kantorovich potential in the transport from ρ̄ to ν, such that ρ̄ also solves
min {∫ ϕ̄ dρ : ρ ∈ P(Ω), ρ ≤ 1}.
The solutions of the above problem are easy to characterize: there should exist a constant ℓ such that
ρ̄ = 1 on {ϕ̄ < ℓ},  ρ̄ = 0 on {ϕ̄ > ℓ}.
In particular, we can define p := (ℓ − ϕ̄)_+ and observe that we have p ∈ press(ρ̄) and, passing to gradients, we have ∇p = −∇ϕ̄, ρ̄-a.e.
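The bang-bang structure of this linearized problem is easy to illustrate numerically: on a grid of cells of volume h, one fills the cells in increasing order of the potential until the prescribed mass is reached (at most one cell, on the level {ϕ̄ = ℓ}, is partially filled). A toy sketch; the grid, the potential values, and the function name are hypothetical:

```python
# The linearized problem min { \int phi d(rho) : 0 <= rho <= 1, total mass M }
# on a grid of cells of volume h is solved greedily: fill cells in increasing
# order of phi, so that rho = 1 on {phi < l} and rho = 0 on {phi > l} for the
# threshold l selected by the mass constraint.

def bang_bang(phi, h, M):
    """Minimize sum(phi[i]*rho[i]*h) over 0 <= rho[i] <= 1 with sum(rho[i]*h) = M."""
    rho = [0.0] * len(phi)
    remaining = M
    for i in sorted(range(len(phi)), key=lambda i: phi[i]):
        fill = min(1.0, remaining / h)  # saturate the cheapest cells first
        rho[i] = fill
        remaining -= fill * h
        if remaining <= 0:
            break
    return rho

phi = [0.3, 0.1, 0.4, 0.2]          # a hypothetical potential on 4 cells
rho = bang_bang(phi, h=1.0, M=2.5)
# cells are filled in order of phi: indices 1 and 3 saturated, 0 half-filled, 2 empty
```

Except possibly on one threshold cell, the minimizer only takes the values 0 and 1, mirroring the characterization of ρ̄ above.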
It is useful to understand why this projection, when applied to the setting of the splitting scheme described so far, allows one to recover the desired PDE, i.e. why we chose the distance W_2 in the projection step.
Indeed, in this case T transports ρ^τ_{n+1} to ρ̃^τ_{n+1}. We first note that we have T = id − ∇ϕ̄ and, by the above characterization, ∇ϕ̄ = −∇(ℓ − ϕ̄)_+, ρ^τ_{n+1}-a.e. This suggests to scale the pressure (we call it now τ p) and we get the situation in Figure 1. Formally, we have
(id − T)/τ = −∇p,
so that ρ^τ_{n+1} is obtained from ρ^τ_n by following, during a time τ, first the velocity u_{nτ} and then the velocity −∇p, and at the limit τ → 0 we get a solution of the PDE (1.3).
Before going on, we want to summarize some properties of the projection operator P_K. Its properties are essential for proving convergence. To be more general, we will describe the properties of the operator P_{K(m)}, defined via P_{K(m)}[ν] := argmin {W_2(ρ, ν) : ρ ≤ m}, for general measures m. Here is what we know about P_{K(m)}:
• The functional ρ → W_2^2(ρ, ν) is strictly convex as soon as ν ≪ L^d (see Chapter 7 in [47]). This provides uniqueness of the projection, and hence continuity of the projection operator (for arbitrary m, if ν is absolutely continuous).
• Uniqueness actually holds for every ν, in the case m ≪ L^d (see [20]), as a consequence of the fact that the projection must be of the form ρ = m1_A + ν1_{A^c}.
• For m = 1, when Ω is convex, the geodesic convexity of {ρ : ρ ≤ 1} (w.r.t. Wasserstein geodesics, see Section 3) also gives uniqueness, and Hölder continuity w.r.t. W_2 (see [34], but in particular Proposition 2.3.4 in [46]); on the other hand, we do not know whether the projection is Lipschitz continuous for W_2 (see below).
• The projection preserves ordering (in order to give a meaning to this statement, one needs to extend P_{K(m)} to measures with arbitrary finite mass, and not restrict to probabilities) and decreases the L^1 distance between densities (these facts are folklore among specialists, but a precise written reference is unfortunately missing; they are true for every m ≪ L^d).
• Estimates (order 0): for m = 1 and f convex, ρ → ∫ f(ρ(x))dx decreases under projection (see [41]).
• Estimates (order 1): the BV norm decreases under projection in the case m = 1 (see [20]; a corresponding estimate, involving the BV norm of m, also exists for non-constant m, but reads ||P_{K(m)}[ν]||_BV ≤ ||ν||_BV + 2||m||_BV, and the constant 2 is sharp). On the other hand, W^{1,p} norms do not decrease under projection, and it can happen that ρ ∉ W^{1,p} while ν ∈ W^{1,p} (as in the example illustrated in Figure 2, where the projection ρ = P_K[ν] has jumps which were absent in ν).
Open question 1. Is it true that the projection P_{K(m)} is Lipschitz continuous for the W_2 distance? Is it 1-Lipschitz, at least for m = 1 or under particular assumptions on m?

Convergence of the scheme and diffusive variants
We summarize here some convergence statements for the splitting scheme. In all these statements, what we do is the following: we take an initial measure ρ_0 ∈ P(Ω) and a time horizon T < +∞; we iterate the splitting scheme, defining the two sequences ρ̃^τ_n and ρ^τ_n as long as nτ ≤ T; then, we consider the piecewise constant interpolation ρ^τ_t = ρ^τ_{n+1} for t ∈ ]nτ, (n + 1)τ]. The goal is to prove that, up to subsequences, as τ → 0, the curves t → ρ^τ_t uniformly converge on [0, T] (as curves valued in the Wasserstein space with distance W_2) to a curve t → ρ_t which solves the desired equation. The proofs proposed in [21,34,41,47] actually use several interpolation sequences, but this is just a technical tool and we will not enter here into the details of the proofs. Also, the convergence as τ → 0 is essentially a weak convergence (since we mentioned convergence for W_2) and this requires special attention for the convergence of non-linear terms, involving in particular the pressure. The non-linear term ∇ · (ρ∇p) in the equation has been transformed, as we said in Section 1, into an easier term Δp, but the non-linearity is preserved in the constraint p(1 − ρ) = 0. In order to handle it, the main tool exploits H^1 bounds in x on the pressure and a variant of the so-called Aubin-Lions lemma (see [5]), which allows one to combine the bounds in time and space. For the precise proofs in the framework of density-constrained equations, we refer to Step 3 of Section 3.2 in [34], Lemma 3.5 in [41], and Lemma 4.6 in [21]. A similar argument is also performed in Lemma 1 in [14].
We start with the statement related to the scheme we described so far.
Theorem 2.1. Given ρ_0 ∈ K and a time-dependent velocity field u, define the splitting scheme according to (2.1). Suppose that u is continuous in (t, x), and C^1 in x, uniformly in t. Then, the scheme converges up to a subsequence to a solution of (1.3).
We do not discuss here the need to pass to subsequences: of course, we have full convergence in case of uniqueness of the solution of (1.3). The results of Section 3 can indeed be applied to this case, since our assumption on u implies Lipschitz continuity in x, and hence u satisfies the monotonicity assumption of Theorem 3.1.
Note that this statement is given in [46] (Theorem 2.3.6) with two extra assumptions. First, Ω was required to be convex, which is not really necessary (it was used to guarantee the uniqueness of the projection, but this can also be obtained in other ways, as we saw above). Then, there was the assumption that (id + τ u)(Ω) ⊂ Ω for small τ (which amounts to a condition on u on ∂Ω). This was necessary so that the first step of the scheme does not exit Ω, but it is not crucial either. We can define the projection step also for measures not supported on Ω but on a larger domain Ω′ ⊃ Ω, and project onto the set of measures which belong to K and are concentrated on Ω. Also note that, in this case, close to the regions where u points outwards there will be instantaneous creation of a saturated region {ρ = 1}.
We now observe that taking ρ̃^τ_{n+1} = (id + τ u_{nτ})_# ρ^τ_n is just one possible choice. When u is not regular enough, or depends on ρ, there are better options, which we will describe in a more general framework. We will consider a possible diffusion in the equation, by replacing the first equation in (1.3) with
(2.5) ∂_t ρ_t + ∇ · (ρ_t (u_t − ∇p_t)) − σΔρ_t = 0,
where σ ≥ 0 is a volatility. Again, we consider a no-flux boundary condition on ∂Ω, which reads in this case (ρ_t u_t − ρ_t ∇p_t − σ∇ρ_t) · n = 0. We define now a new splitting scheme by first taking the solution ϱ_t of the Fokker-Planck equation
(2.6) ∂_t ϱ_t + ∇ · (ϱ_t u_t) − σΔϱ_t = 0
on the interval (nτ, (n + 1)τ), with initial datum ϱ_{nτ} = ρ^τ_n, and defining ρ̃^τ_{n+1} := ϱ_{(n+1)τ}, then ρ^τ_{n+1} := P_K[ρ̃^τ_{n+1}] (we first follow the Fokker-Planck equation, ignoring the density constraint, for a duration τ, and we project the final measure we obtained). Note that in this case, due to the definition of the first step via the solution of the PDE and the effect of the diffusion, the behavior of u on ∂Ω is not relevant.
In [41] a detailed proof of convergence is presented in the case σ > 0, thus obtaining:
Theorem 2.2. Given ρ_0 ∈ K and a time-dependent vector field u ∈ L^∞, suppose σ > 0 and define the splitting scheme as above. Then, the scheme converges up to a subsequence to a solution of (2.5).
Again, the convergence can be obtained for the full sequence (with no need to pass to subsequences), thanks to the uniqueness result that we will present in Theorem 3.2.
It is interesting to see that this scheme converges under much weaker assumptions on u than what is required in Theorem 2.1. This is partially due to the effect of the diffusion, and partially to better properties of the scheme. It has not been proven rigorously, but we expect the same scheme to converge even with σ = 0, provided u satisfies the assumptions for the equation (2.6) to be well-posed (in the sense of admitting existence and stability properties). It should at least work whenever u satisfies the assumptions of the DiPerna-Lions (or Ambrosio) theory (i.e. u ∈ W^{1,1} or u ∈ BV, and some bounds on ∇ · u, see [3,23]), but we will not develop the details here.

The gradient flow case
An interesting framework, which is actually the original one studied in [34], occurs when u has a suitable gradient structure. In this case, the scheme can be made easier and the results require weaker assumptions on u. In particular, it is possible to do the two steps of the splitting algorithm at once, thanks to the theory of gradient flows.
First, let us recall in a few words the general theory of gradient flows (the interested reader can look at [4] or [48]). Consider the simple ODE
x′(t) = −∇F(x(t))
(which follows the steepest descent lines of a function F : R^n → R). We can discretize such an equation in time by solving
x^τ_{n+1} ∈ argmin_x { |x − x^τ_n|^2/(2τ) + F(x) }.
The optimal x^τ_{n+1} satisfies
(x^τ_{n+1} − x^τ_n)/τ = −∇F(x^τ_{n+1}),
which corresponds to the implicit Euler scheme for x′ = −∇F(x), the solution being found as a limit τ → 0. The important point (see [2,19]) is that this formulation may easily be adapted to a general metric space (X, d), by replacing the Euclidean distance |x − x^τ_n| with the distance d in X (as it was introduced in [2,19], under the name of minimizing movements). In particular, we can look at what happens whenever we take (X, d) = (P(Ω), W_2).
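As an illustration of the Euclidean scheme (our own sketch, assuming F smooth so that the strictly convex inner minimization can be solved by a few Newton iterations; all names are ours):

```python
# Minimizing-movement sketch for x' = -F'(x) in one Euclidean dimension:
# x_{n+1} = argmin_x |x - x_n|^2/(2 tau) + F(x), i.e. the implicit Euler step
# (x_{n+1} - x_n)/tau = -F'(x_{n+1}), solved here by Newton's method.

def mm_step(x_n, tau, Fp, Fpp, iters=30):
    """One minimizing-movement step; Fp, Fpp are F' and F''."""
    x = x_n
    for _ in range(iters):
        g = (x - x_n) / tau + Fp(x)       # derivative of the step functional
        x -= g / (1.0 / tau + Fpp(x))     # Newton update (second deriv: 1/tau + F'')
    return x

def minimizing_movement(x0, tau, T, Fp, Fpp):
    """Iterate the scheme up to the time horizon T; return the discrete trajectory."""
    x, traj = x0, [x0]
    for _ in range(int(T / tau)):
        x = mm_step(x, tau, Fp, Fpp)
        traj.append(x)
    return traj

# F(x) = x^2/2: the step has the closed form x_{n+1} = x_n/(1 + tau),
# approximating the exact gradient-flow solution x(t) = x0 * exp(-t)
traj = minimizing_movement(1.0, tau=0.01, T=1.0, Fp=lambda x: x, Fpp=lambda x: 1.0)
```

For this quadratic F the scheme is unconditionally stable, which is the usual advantage of the implicit (variational) discretization over the explicit one.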
Let F be a functional over (P(Ω), W_2), and let us follow the so-called JKO scheme (see [28]):
ρ^τ_{n+1} ∈ argmin_ρ { F(ρ) + W_2^2(ρ, ρ^τ_n)/(2τ) }.
As in the Euclidean case above, in order to find the equation solved by the limit curve we need to write down the optimality conditions at each time step of the discrete scheme. Let us define, for given F, the first variation of F as the function δF/δρ such that, setting again ρ_ε := (1 − ε)ρ + ερ̃, we have
d/dε [F(ρ_ε)]|_{ε=0} = ∫ (δF/δρ)(ρ) d(ρ̃ − ρ)
(see Chapter 7 in [47] for precise definitions). Then, using also the considerations about the first variation of the Wasserstein distance, we can say that the optimality conditions of the above minimization problem read
(δF/δρ)(ρ^τ_{n+1}) + ϕ/τ = const,
where ϕ is the Kantorovich potential in the transport from ρ^τ_{n+1} to ρ^τ_n. This implies that, if we define a vector field v via
(2.7) v := −∇((δF/δρ)(ρ)),
then v = ∇ϕ/τ for ρ = ρ^τ_{n+1}. Here, the vector field v represents a discrete velocity (in the sense of a ratio displacement/time step), and we can guess that, at the limit τ → 0, the limit curve will solve a continuity equation of the form ∂_t ρ + ∇ · (ρv) = 0, with v given as in (2.7), i.e.
∂_t ρ − ∇ · (ρ∇((δF/δρ)(ρ))) = 0.
An example is given by F(ρ) = ∫ f(ρ(x))dx (for f convex and superlinear: a functional which is set to +∞ when ρ is not absolutely continuous). Then δF/δρ = f′(ρ) and the above equation becomes a non-linear diffusion equation. For the density-constrained equation (1.3) with u = −∇V, the relevant choice is
(2.8) F(ρ) := ∫ V(x) dρ(x) if ρ ∈ K, F(ρ) := +∞ otherwise.
By the way, this functional is the limit as m → ∞ of ∫ (1/(m−1) ρ(x)^m + V(x)ρ(x))dx, which shows that (1.3) can be seen as the limit of the porous medium equation when the exponent tends to ∞ (as in [45] and in [1], where explicit rates of convergence are also presented). For the diffusive variant, just add σ∫ ρ(x) log ρ(x)dx.
The result proven in [34] can be summarized as follows:
Theorem 2.3. Given ρ_0 ∈ K and u = −∇V, define (ρ^τ_n)_n according to the JKO scheme with the choice of F in (2.8). Suppose that V ∈ W^{1,1}(Ω). Then, the scheme converges up to a subsequence to a solution of (1.3).
Note that the original statement in [34] required many unnecessary assumptions, in particular convexity of the domain Ω and λ-convexity of V. Both these assumptions were necessary to guarantee λ-convexity (in the geodesic sense) of the functional. The point where λ-convexity of V is really important is the proof of uniqueness (see Theorem 3.1), which allows one to obtain convergence with no need to pass to subsequences; without this assumption, we do not know if we have uniqueness of the solution of (1.3), even if we suppose a gradient structure for u, and we do not know whether different subsequences of the same JKO scheme could select different solutions.
The assumption V ∈ W^{1,1}(Ω) is needed in order to handle the terms involving ∇V in the equation (we have ρ ∈ L^∞ and ∇V ∈ L^1), and this is all we need, with the same proof strategy as in [34]. We also insist on the fact that, thanks to the gradient structure of u, the regularity we require on u itself is very weak, since it is only supposed to be L^1. Finally, also note that it would not be difficult to handle the case of a time-dependent gradient vector field u_t = −∇V_t, provided the dependence on t is nice (t → V_t needs to be Lipschitz for the L^1 norm). Also, we want to stress that the gradient flow case is indeed a very natural one for modeling purposes. Indeed, the first case which was considered was exactly the following: u = −∇d(·, Γ), where Γ ⊂ ∂Ω is a subset of the boundary standing for the exit. On the other hand, the λ-convexity assumption on V needed to be removed if one wanted to consider V defined as the geodesic distance to the exit in a non-convex domain Ω (in case of obstacles inside the evacuation area).
We finish this gradient flow part with a challenging open question about the multi-population case (which is slightly outside the scope of this survey).
Open question 2. Can we prove existence for the evolution of two populations ρ_1, ρ_2, driven by two gradient vector fields u_i = −∇V_i under the joint constraint ρ_1 + ρ_2 ≤ 1, in the non-trivial case ∇V_1 ≠ ∇V_2? More generally, prove existence for gradient flows involving energies of the form ∫ f(ρ_1 + ρ_2) and not individual congestion terms ∫ f(ρ_i), which are difficult to analyze because of the lack of strong compactness on each density ρ_i; see for instance [32] for some partial results in dimension 1.

Moving domains
We finish this section by briefly addressing a variant, presented in [21] and inspired by the Moreau sweeping process (see [43]). We will consider the sweeping of a probability measure ρ_t in a moving convex set Ω(t), with the constraint ρ_t ≤ 1. Considering a large domain Ω′ which contains all the Ω(t), the set of probability measures we are interested in is now
(2.9) K(t) := {ρ ∈ P(Ω′) : ρ ≤ 1, spt(ρ) ⊂ Ω(t)}.
By the word "sweeping", in the spirit of Moreau, we mean that the particles of ρ_t try to stay at rest, and only move in order to satisfy the constraint. These particles are either pushed by the boundary of the moving domain Ω(t) (as it happens in the standard sweeping process) or by the other particles (in particular, those which are interposed between them and the boundary). Given a density ρ ∈ P(Ω(t)) such that ρ ≤ 1, we can heuristically describe the set of "admissible" velocities adm(ρ, t) by requiring ∇ · v ≥ 0 on the set {ρ = 1} (in order to preserve the density constraint) and taking care of the fact that, on the boundary, the inward normal velocity of the particles must be at least that of the boundary: this sums up as v · n ≤ v_∂Ω · n, where v_∂Ω is the boundary velocity (we will denote V = v_∂Ω · n). As usual, this will be expressed by duality. We consider the following formal computation, for p ∈ press(ρ):
∫_{Ω(t)} ∇p · v = −∫_{Ω(t)} p ∇ · v + ∫_{∂Ω(t)} p v · n ≤ ∫_{∂Ω(t)} p V.
This leads us to the following definition:
adm(ρ, t) := {v ∈ L^2(ρ) : ∫ ∇p · v ≤ ∫_{∂Ω(t)} p V for every p ∈ press(ρ)},
where the dependence of adm(ρ, t) on t includes both the domain Ω(t) and its normal velocity V (which is a function defined on ∂Ω(t)). Now, the evolution equation we want to solve is a continuity equation where the velocity has to be admissible, and to have minimal L^2 norm (it is the projection of the 0 velocity field):
(2.10) ∂_t ρ_t + ∇ · (ρ_t v_t) = 0,  v_t = P_{adm(ρ_t, t)}[0].
Using the dual characterization of the cone adm(ρ_t, t), it is possible to translate this system into
(2.11) ∂_t ρ_t + ∇ · (ρ_t v_t) = 0,  v_t = −∇p_t,  p_t ∈ press(ρ_t),  spt(ρ_t) ⊂ Ω(t).
Note that the very last condition, spt(ρ_t) ⊂ Ω(t), includes the fact that no mass can exit through the boundary ∂Ω(t), which, considering that this boundary is itself moving, means −∇p_t · n ≤ V on ∂Ω(t).
The discrete scheme used in [21] to find a solution of (2.10) is inspired from [43]: given ρ^τ_n ∈ P(Ω), define ρ^τ_{n+1} as the projection of ρ^τ_n onto K((n+1)τ). The convergence result is not surprising:

Theorem 2.4. Given ρ_0 ∈ K(0), define (ρ^τ_n)_n according to the above projection scheme. Suppose that all the sets Ω(t) are convex, that inf_{t∈[0,T]} |Ω(t)| > 1 (where |Ω| denotes the volume of a domain Ω ⊂ R^d), and that t ↦ Ω(t) is Lipschitz continuous for the Hausdorff distance. Then the scheme converges, up to a subsequence, to a solution of (2.11).
This theorem is proved in [21], where uniqueness of the solution of (2.11) is also proven (but, for simplicity, the uniqueness statement for moving domains is not included in the next section). This implies in particular that the convergence holds with no need to pass to subsequences.
While the fact that the domain is moving turns out not to be a source of problems for proving existence and uniqueness, it makes things much more difficult as far as some estimates are concerned. For instance, we have the following open question.
Open question 3. Can we prove that the solution ρ_t of (2.11) is bounded in BV(R^d) if ρ_0 ∈ BV(R^d), under possible assumptions on Ω(t)? Note that even in the case where Ω(t) is just obtained by translation of a smooth convex domain Ω(0), neither t ↦ ||ρ_t||_{BV(Ω(t))} nor t ↦ ||ρ_t||_{BV(R^d)} is, in general, decreasing.

A few words about uniqueness
An easy computation for the Euclidean gradient flow x′ = −∇F(x) shows that uniqueness of the solution is guaranteed as soon as F is semi-convex, i.e. if for some λ ∈ R the function x ↦ F(x) − λ|x|²/2 is convex (which is the same, for smooth functions, as D²F ≥ λI; we say in this case that F is λ-convex). Indeed, it suffices to take two solutions x_1(t) and x_2(t), differentiate |x_1 − x_2|²/2 in time, and apply Grönwall's lemma. A very similar result is available in the general theory of gradient flows in metric spaces [4]. Without entering into details, what is needed is geodesic semi-convexity. In the case of the metric space W_2, this means that for some λ ∈ R the function s ↦ g(s) := F(ρ_s) satisfies g″ ≥ λ W_2²(ρ_0, ρ_1) whenever s ↦ ρ_s is a constant-speed geodesic connecting ρ_0 to ρ_1. This condition is quite easy to check, since in W_2 we know the geodesic curves (see, for instance, Chapter 5 in [47]): the geodesic connecting µ to ν has the form ρ_s = (id − s∇ϕ)_# µ = ((1−s)id + sT)_# µ, where T is the optimal transport map from µ to ν (which exists as soon as µ ≪ L^d; this is not restrictive in our setting, since we apply it to measures which are absolutely continuous with bounded densities).
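The Euclidean computation alluded to above is the standard Grönwall argument; explicitly, for two solutions x_1(t) and x_2(t):

```latex
\frac{d}{dt}\,\frac{|x_1-x_2|^2}{2}
  = -\big(\nabla F(x_1)-\nabla F(x_2)\big)\cdot(x_1-x_2)
  \le -\lambda\,|x_1-x_2|^2 ,
```

so that |x_1(t) − x_2(t)| ≤ e^{−λt}|x_1(0) − x_2(0)|, which gives uniqueness for a fixed initial datum (and at most exponential expansion when λ < 0).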
Also, for many functionals F it is well known whether or not they are geodesically (semi-)convex (see, for instance, Chapter 7 in [47]). The functional ρ ↦ ∫ V dρ is λ-geodesically convex if and only if the function V itself is λ-convex; for the functional ρ ↦ ∫ f(ρ(x))dx the situation is trickier, and the sharp conditions have been found by McCann (see [38]). Such a functional is geodesically convex (i.e., convex for λ = 0) if and only if f is such that s ↦ s^d f(s^{−d}) is convex and decreasing, which is the case in many examples, including f(s) = s^m for m > 1 and f(s) = s log s. In particular, this means that the set of densities with ∫ ρ^m ≤ C is geodesically convex, and, letting m → ∞, the same is true for the set K of densities bounded above by 1.
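As a quick check of McCann's condition on the two examples just mentioned:

```latex
f(s)=s\log s:\quad s^d f(s^{-d}) = -d\log s, \qquad\qquad
f(s)=s^m:\quad s^d f(s^{-d}) = s^{d(1-m)},
```

and both functions of s are indeed convex and decreasing on (0, +∞) (for m > 1 in the second case, since the exponent d(1−m) is negative).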
The general technique of [4] allows one to obtain the estimate W_2(ρ^1_t, ρ^2_t) ≤ e^{−λt} W_2(ρ^1_0, ρ^2_0) whenever the curves ρ^i_t are two gradient flows of F and F is λ-geodesically convex. Using the fact that the constraint ρ ∈ K is geodesically convex, this applies to the case u = −∇V, provided D²V ≥ λI.
In order to consider more general velocity fields u (i.e., of non-gradient type), we can try to directly differentiate the Wasserstein distance between two solutions of (1.3). This requires using the formula for differentiating the W_2 distance along two curves (see Chapter 5 in [47]).
Take two solutions (ρ^i_t, p^i_t), and let ϕ and ψ be the Kantorovich potentials between ρ^1_t and ρ^2_t, with T(x) = x − ∇ϕ(x). We can then compute

d/dt ½W_2²(ρ^1_t, ρ^2_t) = ∫ ∇ϕ·v^1_t dρ^1_t + ∫ ∇ψ·v^2_t dρ^2_t,

where v^i_t = u_t − ∇p^i_t, and try to estimate the right-hand side. Yet, the technique used to prove uniqueness in the DiPerna-Lions theory is not at all based on W_2 estimates. It is much more reminiscent of an estimate on ∫ |ρ^1_t − ρ^2_t|, i.e. an L¹-contraction result. The question then becomes whether it is possible to combine these two techniques, or to directly prove that we have an L¹ contraction.
Note that this is what is usually done for the Hele-Shaw flow, where uniqueness comes from estimates on the adjoint equation (by the way, the technique based on the differentiation of the W_2 distance seems to be essentially unknown in the Hele-Shaw community, and uniqueness results in the presence of non-trivial drifts are not standard).
However, uniqueness for (1.3) with non-smooth vector fields u is so far only a conjecture, and the only result in this direction is the one obtained in the second part of [22], about the diffusive case (σ > 0 in (2.5)):

Theorem 3.2. Given u ∈ L^∞ and σ > 0, consider (ρ^1, p^1) and (ρ^2, p^2) two solutions of (2.5). Then we have a quantitative L¹ estimate between ρ^1_t and ρ^2_t; in particular, it gives ||ρ^1_t − ρ^2_t||_{L¹} ≤ ||ρ^1_0 − ρ^2_0||_{L¹}, and uniqueness of the solution for a fixed initial datum.

Open question 4. Can we prove uniqueness in the case σ = 0 (i.e. for the equation (1.3)) under possible assumptions on u, weaker than those of Theorem 3.1? In particular, do we have uniqueness for u_t ∈ BV? Given two different solutions ρ^1_t and ρ^2_t of (1.3), with initial data ρ^1_0 and ρ^2_0, can we prove the L¹ contraction estimate ||ρ^1_t − ρ^2_t||_{L¹} ≤ ||ρ^1_0 − ρ^2_0||_{L¹}? Note that the discrete steps in the splitting scheme are indeed an L¹ contraction, but, to pass this property to the limit on arbitrary solutions, one already needs uniqueness.

Numerical methods
The approximation results of Section 2 naturally suggest that they could be used to develop numerical methods to approximate the solutions of the corresponding equations. This requires numerical solvers for one of the two (strictly related) variational problems

(4.1)  min { F(ρ) + ½ W_2²(ρ, ν) : ρ ∈ P(Ω) },

(4.2)  min { ½ W_2²(ρ, ν) : ρ ∈ K }.

Of course, Problem (4.2) is a particular (but singular) case of the first one (choosing F = I_K, this notation standing for the indicator function in convex analysis, which vanishes on K and takes the value +∞ outside K) and corresponds to the projection step, while Problem (4.1) appears in the JKO scheme in the gradient case. Note that the optimizer of (4.2) is exactly the projection P_K[ν]. Using suitable smooth functions f : R_+ → [0, +∞] which approximate I_[0,1], it is also possible to approximate the second problem via standard (non-singular) instances of the first one. In this section we first present some numerical methods which have been recently proposed to handle gradient flows in W_2 via their variational JKO scheme, and then a numerical method which is ad hoc for Problem (4.2).
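For the reader's convenience, recall that the JKO scheme mentioned above consists, given a time step τ > 0, in iterating

```latex
\rho^\tau_{n+1} \in \operatorname{argmin}_{\rho\in\mathcal P(\Omega)}
  \; F(\rho) + \frac{W_2^2(\rho,\rho^\tau_n)}{2\tau},
```

which is an instance of (4.1) with ν = ρ^τ_n, up to replacing F with τF.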

Numerical methods for the JKO scheme
We will present three methods. One, essentially taken from [9], is based on the Benamou-Brenier formula, first introduced in [6] as a numerical tool for optimal transport (also see [7]). This method is well suited for the case where the energy F(ρ) is a convex function of ρ and can be easily dualized (i.e. F* is explicit and easy to handle). The second method (essentially developed by M. Cuturi and collaborators, see [8,17,18]) comes from a clever approximation of the linear programming formulation of optimal transport in the discrete case (while the Benamou-Brenier formula best fits the continuous setting), and is easy to perform whenever F is separable (i.e. of the form ∫ f(x, ρ(x))dx). The third method (essentially adapted from the techniques used by Q. Mérigot for semidiscrete optimal transport, see [33,39]) translates the problem into an optimization problem in the class of convex functions; it is well suited for the case where F is geodesically convex, which means that ∫ f(ρ(x))dx is only admissible if f satisfies McCann's condition (see Section 3).
Luckily, the three above methods are all able to handle the case F(ρ) = ∫ f(ρ(x))dx + ∫ V(x)ρ(x)dx, under suitable assumptions on f and V, and these assumptions are satisfied for f = I_[0,1] and for many of its approximations. They can be used for Fokker-Planck and porous-medium equations, or to approximate the gradient-flow version of Equation (1.3) (or to perform a projection step in the non-gradient case).

Augmented Lagrangian methods. Let us recall the basis of the Benamou-Brenier method. This method is based on the equality

W_2²(µ, ν) = min { ∫_0^1 ∫_Ω |v_t(x)|² dρ_t(x) dt : ∂_t ρ_t + ∇·(ρ_t v_t) = 0, ρ_0 = µ, ρ_1 = ν }.

Using E = ρv as a variable instead of v, we can see (and this was the main idea in [6] to produce a numerical method) that this problem is convex, since we have

|E|²/ρ = sup { aρ + b·E : (a, b) ∈ K₂ },  where K₂ := { (a, b) ∈ R × R^d : a + |b|²/4 ≤ 0 },

which exhibits the integrand as a supremum of linear functions of (ρ, E). The continuity equation constraint can also be written as a sup penalization, by adding a supremum which is 0 if the constraint is satisfied and +∞ if not. We then express everything in the space-time formalism, writing ∇_{t,x}φ for (∂_t φ, ∇φ), m for (ρ, E) and A for (a, b). We also set G(φ) := ∫ φ_1 dν − ∫ φ_0 dµ, thus getting

min_m sup_{φ, A ∈ K₂} G(φ) + ⟨m, ∇_{t,x}φ − A⟩,

where the scalar product is here the L² scalar product, but becomes a standard Euclidean scalar product as soon as one discretizes (in time-space).
The problem can now be seen as the search for a saddle point of the Lagrangian: we look for a pair (m, (A, φ)) (A and φ play together the role of the second variable) where m minimizes for fixed (A, φ) and (A, φ) maximizes for fixed m. The idea of the Augmented Lagrangian method (see [25]) is that the saddle points are the same if we subtract from the min/max expression the quantity (r/2)||A − ∇_{t,x}φ||², which vanishes, together with its first-order expansion, at optimal solutions. We then obtain a new saddle-point problem

min_m sup_{φ, A ∈ K₂} G(φ) + ⟨m, ∇_{t,x}φ − A⟩ − (r/2)||A − ∇_{t,x}φ||²

(again, the last term is an L² norm in time and space), which is then solved by iteratively repeating three steps: for fixed A and m, find the optimal φ (which amounts to minimizing a quadratic functional in the calculus of variations, i.e. solving a Poisson equation in space-time with prescribed Neumann boundary conditions, coming from the term G); then, for fixed φ and m, find the optimal A (which amounts to a pointwise minimization problem, in this case a projection onto the convex set K₂); finally, update m by replacing it with m − r(A − ∇_{t,x}φ) (it is convenient to choose the parameter of this update equal to that of the Augmented Lagrangian). This is what is done in the case where the initial and final measures are fixed. In a JKO scheme, at every step one measure is fixed (say, µ), but the other is not, and a penalization on the final density ρ_1 is added, of the form τF(ρ_1). Inspired by the considerations above, an analogous saddle-point formulation allows one to treat this problem, with an extra variable λ dual to ρ_1, which allows one to express f(ρ_1) as sup_λ ρ_1λ − f*(λ). To find a solution of this saddle-point problem, an iterative procedure is used, as above.
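The only genuinely nonlinear ingredient in the three iterated steps is the pointwise projection onto K₂. A minimal sketch in dimension d = 1, assuming the convention K₂ = {(a, b) : a + b²/4 ≤ 0} (the exact scaling of K₂ depends on the normalization chosen for the Benamou-Brenier functional; the function name is ours):

```python
import numpy as np

def project_K2(alpha, beta):
    """Euclidean projection of the point (alpha, beta) onto the convex
    parabolic region K2 = {(a, b) : a + b^2/4 <= 0}, for d = 1.

    If the point is outside, the projection lies on the boundary
    a = -b^2/4, and first-order optimality in b reduces to the cubic
    b^3/4 + (alpha + 2) b - 2 beta = 0."""
    if alpha + beta**2 / 4 <= 0:
        return alpha, beta                      # already admissible
    roots = np.roots([0.25, 0.0, alpha + 2.0, -2.0 * beta])
    real = roots[np.abs(roots.imag) < 1e-9].real
    # among the real critical points, keep the closest boundary point
    b = min(real, key=lambda bb: (bb**2 / 4 + alpha)**2 + (bb - beta)**2)
    return -b**2 / 4, b
```

In the discretized algorithm this scalar (or small-dimensional) problem is solved independently at every space-time grid node, which is why the A-step parallelizes so well.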
For the applications to gradient flows, a small time step τ > 0 has to be fixed, and this scheme has to be performed for each n, using µ = ρ^τ_n and setting ρ^τ_{n+1} equal to the optimizer ρ_1; the functions f and V must include the scale factor τ. The time-space [0,1] × Ω has to be discretized, but the evolution in time is infinitesimal (due to the small time scale τ), which allows one to discretize the fictitious time interval [0,1] using fewer than 10 time steps for each n and still obtain satisfactory results.
Finally, in order to apply this to density-constrained evolutions, one has to take f = I_[0,1], or approximate it (via f(s) = s^m/(m−1) for m → ∞, for example). The interested reader can consult [9] for details, examples and simulations.

Entropic regularization. Another, very recent approach to optimal transport (see [8,17]), easy to explain and efficient, is based on its linear programming formulation in the discrete case (when the measures are finitely atomic, or have been approximated by atomic measures). As before, we first present the procedure in the case of two fixed measures. We suppose µ = Σ_i a_i δ_{x_i} and ν = Σ_j b_j δ_{y_j}, and we set c_ij := c(x_i, y_j), where c is the transport cost we use (in the applications to W_2 gradient flows, c(x,y) = |x − y|²). We consider the following approximation of Kantorovich's linear programming problem: fix ε > 0 and look at

(4.3)  min { Σ_ij c_ij γ_ij + ε Σ_ij h(γ_ij) : Σ_j γ_ij = a_i, Σ_i γ_ij = b_j },

where h(s) = s log s − s, so that h′(s) = log s. It is clear that, for ε → 0, the above minimization converges to the standard transport problem for the cost c. The objective function in (4.3) can be re-written using the notation η_ij := e^{−c_ij/ε}: dividing by ε and adding a constant, minimizing (4.3) is the same as minimizing

KL(γ|η) := Σ_ij ( γ_ij log(γ_ij/η_ij) − γ_ij + η_ij ).

This last quantity is the so-called Kullback-Leibler divergence (a relative entropy) of γ relative to η. An interesting feature of this minimization problem is that it consists in computing the projection, for the Kullback-Leibler divergence, of the matrix η (a point in a high-dimensional Euclidean space) onto the intersection of the two affine subspaces given by the marginal constraints (the positivity constraint on γ is now included in the definition of the entropy function h). Alternate projections can be proven to converge, as happens for "standard" (i.e. Euclidean) projections.
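The alternate KL projections onto the two marginal constraints are precisely the Sinkhorn iterations; a minimal dense-matrix sketch (the function name and this simple implementation are ours):

```python
import numpy as np

def sinkhorn(a, b, C, eps, n_iter=2000):
    """Alternate KL projections (Sinkhorn iterations) for the entropic
    transport problem (4.3).

    a, b : marginal weight vectors; C : cost matrix c_ij; eps : regularization.
    Returns the (approximately) optimal coupling gamma_ij = u_i eta_ij v_j."""
    eta = np.exp(-C / eps)          # fixed kernel eta_ij = exp(-c_ij / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (eta @ v)           # KL projection onto the row-marginal constraint
        v = b / (eta.T @ u)         # KL projection onto the column-marginal constraint
    return u[:, None] * eta * v[None, :]
```

Each iteration only involves matrix-vector products, which explains the efficiency of the method and its suitability for parallelization.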
Yet, we will not develop this approach further here (it is described, for instance, in Section 6.4.1 of [47]), since it cannot be adapted to the case where one of the marginals is not fixed, but penalized. Hence, for applications to the JKO scheme or to the projection problem, we will rather consider the minimization problem

(4.4)  min { ε KL(γ|η) + Σ_j f_j(Σ_i γ_ij) + Σ_i g_i(Σ_j γ_ij) }.

This problem, which reduces to the one with fixed marginals when f_j = I_{b_j} and g_i = I_{a_i}, admits a dual one, obtained by exchanging inf and sup, which provides the optimal choice γ_ij = η_ij e^{−(u_j+v_i)/ε}; after dividing by ε, we obtain a dual problem which can be solved by alternate optimization over the two families of dual variables. Depending on the functions f_j and g_i, each partial optimization can be solved explicitly or not, but in any case it allows for parallelization, and the solution is always of the form γ_ij = η_ij p_i q_j. In the case where g_i = I_{a_i} and f_j = I_{[0,1]}, we get p_i = a_i / Σ_j η_ij q_j and q_j = min{ 1/Σ_i η_ij p_i , 1 }, which allows one to solve the projection problem (see [18]).

Optimization among convex functions and computational geometry. It is clear that the optimization problem min ½W_2²(ρ, µ) + F(ρ) can be formulated in terms of transport maps and, taking advantage of Brenier's theorem, we can recast it as

(4.5)  min { ∫ ½|∇u(x) − x|² dµ(x) + F((∇u)_# µ) : u convex, ∇u ∈ Ω }.

Hence, we are facing a calculus of variations problem in the class of convex functions. These are non-trivial problems, as the literature shows (see [15,40]), even in the case of functionals only involving u and ∇u.
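Before moving on to the third method, the clipped scaling iteration for the projection case just described (g_i = I_{a_i}, f_j = I_[0,1]) can be sketched as follows (a toy dense implementation with our naming; the column capacity 1 stands for the density constraint on each target cell):

```python
import numpy as np

def entropic_projection(a, C, eps, n_iter=5000):
    """Scaling iterations for the entropically regularized projection:
    row marginals fixed to a_i (g_i = I_{a_i}), column sums constrained
    to [0, 1] (f_j = I_[0,1]).

    Returns the coupling gamma_ij = p_i eta_ij q_j; its second marginal
    approximates the projected (capped) density."""
    eta = np.exp(-C / eps)
    q = np.ones(C.shape[1])
    for _ in range(n_iter):
        p = a / (eta @ q)                          # enforce row marginals
        q = np.minimum(1.0 / (eta.T @ p), 1.0)     # clipped update for I_[0,1]
    return p[:, None] * eta * q[None, :]
```

Note how the only change with respect to plain Sinkhorn is the clipping in the q-update: where the column constraint is inactive, q_j = 1 and the kernel is left untouched.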
Here instead, if we suppose that F is of the form F(ρ) := ∫ f(ρ(x))dx + ∫ V dρ, the first part of this expression involves the Monge-Ampère operator det(D²u), which appears in the computation of the density of the image measure (∇u)_# µ.
We will roughly present here the method proposed in [10]. The main idea is to suppose that µ is a discrete measure of atomic type, i.e. of the form Σ_j a_j δ_{x_j}. We look for a function u defined on its support S := {x_j}_j. We define at each point x ∈ S the subdifferential ∂u(x) := {p ∈ R^d : u(x) + p·(y − x) ≤ u(y) for all y ∈ S}.
We say that u is convex if ∂u(x) ≠ ∅ for all x ∈ S.
There are two difficulties in understanding the minimization problem (4.5) when µ is discrete: first, in order to define (∇u)_# µ we need to define a gradient ∇u(x) at each point x ∈ S, and even if we manage to do so, (∇u)_# µ will be an atomic measure; second, one part of our functional, namely ∫ f(ρ(x))dx, is never finite on atomic measures. Yet, we need to define the minimizer as an atomic measure, since we need to iterate the variational scheme and use the minimizer as the starting point of the next JKO step.
First, let us define how to handle the part of the functional which requires absolutely continuous measures. We need to define a reasonable notion of ρ := (∇u)_# µ when µ is atomic, in such a way that we can compute functionals of the form ∫ f(ρ(x))dx. To do this, we take all the mass a_j contained in the point x_j ∈ S and we spread it uniformly on the whole subdifferential ∂u(x_j). This means that we define a measure, denoted by ρ := (∂u)_# µ, whose density on each set A_j := ∂u(x_j) ∩ Ω is the constant a_j/|A_j| (the intersection with Ω is taken in order to take care of the constraint ∇u ∈ Ω). Computing ∫ f(ρ(x))dx hence gives Σ_j f(a_j/|A_j|)|A_j|. It is possible to prove, thanks to the Brunn-Minkowski inequality, that the above expression is convex in u as soon as f satisfies McCann's condition (see Section 3).
In order to associate with every u a discrete version of (∇u)_# µ, we need to introduce an extra variable, due to the fact that the gradient is not uniquely defined: we only have subgradients. The choice described in [10] is the following: define the optimization problem over the set of pairs (u, P) : S → R × R^d with P(x) ∈ ∂u(x) for every x ∈ S. Use u to define (∂u)_# µ (the "diffuse representative" of (∇u)_# µ), and P_# µ as the "atomic representative" of (∇u)_# µ. The functional we minimize is then

F̃(u, P) := ∫ ( ½|P(x) − x|² + V(P(x)) ) dµ(x) + Σ_j f(a_j/|A_j|)|A_j|.

Given ρ^τ_n = µ, one finds the optimal (u, P) and then sets ρ^τ_{n+1} := P_# µ. The above minimization problem can be seen to be convex in (u, P) whenever x ↦ |x|²/2 + V(x) is convex and f satisfies McCann's condition; the constraint P(x_j) ∈ A_j := ∂u(x_j) ∩ Ω is convex in (u, P) if Ω is convex. Note that, incorporating the scale factor τ in the functional F, convexity in P is guaranteed as soon as V is semi-convex and τ is small (the quadratic term will always overwhelm the possible concavity of the V term). The delicate point is how to compute the subdifferentials ∂u(x_j), and how to optimize over them (i.e. compute derivatives of the relevant quantities w.r.t. u).
This is now possible, and in a very efficient way, thanks to tools from computational geometry. Indeed, in this context, subdifferentials are exactly a particular case of what are called Laguerre cells. Laguerre cells, similar to Voronoi cells, are very useful in optimal transport: given a set of values ψ_j, they are defined as the cells

Lag_j := { p ∈ R^d : ½|p − x_j|² + ψ_j ≤ ½|p − x_k|² + ψ_k for all k }.

This means that we look at points which are closer to x_j than to the other points x_k, up to a correction given by the values ψ_j. It is not difficult to see that in this case too the cells are convex polyhedra. Also, it can easily be seen that the Laguerre cells corresponding to ψ_j := u(x_j) − ½|x_j|² are nothing but the subdifferentials of u (possibly intersected with Ω).
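A crude way to visualize Laguerre cells and estimate their volumes is to label the points of a fine grid (a toy sketch with our naming; dedicated semidiscrete solvers compute the cells and the derivatives of their volumes exactly instead):

```python
import numpy as np

def laguerre_volumes(points, psi, grid, cell_area):
    """Assign each grid point to the Laguerre cell
    Lag_j = {p : |p - x_j|^2/2 + psi_j <= |p - x_k|^2/2 + psi_k for all k}
    and estimate cell volumes by counting grid points."""
    # score of grid point p in cell j: |p - x_j|^2 / 2 + psi_j
    scores = 0.5 * ((grid[:, None, :] - points[None, :, :])**2).sum(-1) + psi[None, :]
    labels = scores.argmin(axis=1)             # index of the winning cell
    return np.bincount(labels, minlength=len(points)) * cell_area
```

With ψ ≡ 0 one recovers the Voronoi diagram; increasing ψ_j shrinks the cell of x_j, which is exactly the mechanism exploited when optimizing over the weights.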
Handling Laguerre cells computationally has long been difficult, but it is now state-of-the-art in computational geometry, and it is possible to compute their volumes very easily, as well as the derivatives of these volumes (which depend on the measures of the faces) with respect to the values ψ_j. The reader can have a look at [33,39], but also at Section 6.4.2 in [47].
We conclude the description of this method by noting that, compared to the previous ones, it is essentially a Lagrangian approach to the JKO scheme, as we follow the motion of each particle, while in the other two we looked at the evolution of the density on a fixed grid, or at the evolution of the masses at fixed atom points.

A stochastic method for the projection on the density constraint
We now move to a different numerical method, the first one used for the continuous models of crowd motion under density constraints which are at the core of this survey. This method is only concerned with the projection problem (4.2), which means that, even in the gradient-flow case, it has to be preceded by a transport step, where ρ̃^τ_{n+1} is computed via more standard tools (finite volumes, . . . ). This step is then followed by a less standard one, i.e. the projection onto K.
The scheme proposed in [34,35] to approximate the projection onto K with respect to the Wasserstein distance is based on the following (extremely heuristic) considerations. Consider a density ν ∉ K such that the violation of the constraint ν ≤ 1 occurs on a domain ω ⊂ Ω, but is small: this means that there exists a non-zero density g ≥ 0 supported on ω with ν = 1_ω + εg on ω. The projection of ν onto K will necessarily be identically 1 on ω. Let us denote by εv the displacement field which corresponds to the optimal transport between ν and P_K[ν]. For ε small, the displacement v will be small far from ω. From (id + εv)_# ν = P_K[ν], looking at what happens on ω and using the fact that the projection is identically 1 there, one obtains (1 + εg)/det(I + εDv) = 1, so that, at first order in ε, ∇·v = g. As id + εv is an optimal map, v is a gradient: v = −∇p, where p approximately satisfies the Poisson problem −∆p = g with Dirichlet boundary conditions on ∂ω. Since we are considering a violation of the density constraint on a set ω, and supposing that, after the projection, the density will be saturated on ω itself, we are only interested in finding the mass that exits ω when the displacement is given by εv. At first order, we only need to estimate the flux of density through the boundary ∂ω, i.e. v·n = −∂p/∂n. Now consider the stochastic interpretation of the Poisson equation −∆p = g (see, for instance, classical texts such as [24]). Suppose that g is a probability density on ω, and consider a random variable X ∈ ω distributed according to g. Then consider a Brownian motion starting from X, i.e. t ↦ X + B_t, where B is a Brownian motion independent of X. Set T := inf{t : X + B_t ∉ ω} and Y := X + B_T. This random variable stands for the first exit point of X + B_t from ω, and takes values in ∂ω. It is known that Y follows the probability law whose density, with respect to the (d−1)-dimensional Hausdorff measure on ∂ω, is −∂p/∂n.
The idea is therefore to redistribute the exceeding mass of the saturated zone in the following way (see Figure 3): discretize the domain on a regular grid and, for each saturated cell, start a random walk transporting the exceeding mass (ρ − 1)_+. When this random walk encounters a non-saturated cell, it deposits as much mass as it can, and continues as long as the transported mass is not fully distributed. When all the saturated cells have been treated, the obtained density ρ_{n+1} is admissible. As it should represent a good approximation of what is obtained by stopping X + B_t upon meeting the non-saturated zone (up to the fact that we use a random walk on a discrete grid instead of a Brownian motion), the saturated zone should have moved with a normal speed close to −∂p/∂n, which was the desired motion.
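The grid procedure just described can be sketched as follows (a toy illustration with our naming; no convergence result is claimed for this heuristic, as recalled below):

```python
import numpy as np

def random_walk_projection(rho, rng):
    """Heuristic stochastic projection onto {densities <= 1} on a grid:
    for each saturated cell, carry the excess mass (rho - 1)_+ along a
    random walk and deposit it in the first under-saturated cells met."""
    rho = rho.copy()
    n, m = rho.shape
    moves = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]])
    for i, j in np.argwhere(rho > 1.0):
        excess = rho[i, j] - 1.0
        rho[i, j] = 1.0                 # the saturated cell stays at density 1
        x = np.array([i, j])
        while excess > 1e-12:
            x = np.clip(x + moves[rng.integers(4)], 0, [n - 1, m - 1])  # stay in the grid
            room = 1.0 - rho[x[0], x[1]]
            if room > 0:
                dep = min(room, excess)   # deposit as much mass as the cell can take
                rho[x[0], x[1]] += dep
                excess -= dep
    return rho
```

Mass is conserved by construction, and the output satisfies the constraint ρ ≤ 1 exactly, which is all the method guarantees.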
It is important to insist that no convergence result has been proven for this stochastic algorithm, but the results obtained in [35] are convincing. Also, the same method has been applied in [21] to the case of moving domains. This is not a variational algorithm, but its pertinence is justified by the heuristics described so far: for small violations of the constraint ρ ≤ 1, the projection (a transport optimization problem) can be well approximated by the linearization of the Monge-Ampère equation (this is also what we see in the expression of Equation (1.3), and it can also be interpreted in terms of the approximation of the W_2 distance via the H^{−1} distance, see Section 5.5.2 in [47]); then, the stochastic interpretation of such an equation provides an efficient way to mimic the motion of the mass in the projection.

Figure 3. A random walk realization of the projection onto K.
Open question 5. Prove the a.s. convergence of the above fully discrete scheme when both the time step τ and the space grid discretization tend to 0, and/or find suitable static and/or discrete convergence formulations.