Numerical Probabilistic Approach to MFG

This project investigates numerical methods for solving fully coupled forward-backward stochastic differential equations (FBSDEs) of McKean-Vlasov type. Having numerical solvers for such mean field FBSDEs is of interest because of the potential application of these equations to optimization problems over a large population, say for instance mean field games (MFG) and optimal mean field control problems. Theory for this kind of problems has met with great success since the early works on mean field games by Lasry and Lions, see \cite{Lasry_Lions}, and by Huang, Caines, and Malham\'{e}, see \cite{Huang}. Generally speaking, the purpose is to understand the continuum limit of optimizers or of equilibria (say in Nash sense) as the number of underlying players tends to infinity. When approached from the probabilistic viewpoint, solutions to these control problems (or games) can be described by coupled mean field FBSDEs, meaning that the coefficients depend upon the own marginal laws of the solution. In this note, we detail two methods for solving such FBSDEs which we implement and apply to five benchmark problems. The first method uses a tree structure to represent the pathwise laws of the solution, whereas the second method uses a grid discretization to represent the time marginal laws of the solutions. Both are based on a Picard scheme; importantly, we combine each of them with a generic continuation method that permits to extend the time horizon (or equivalently the coupling strength between the two equations) for which the Picard iteration converges.


Introduction
In this project, we examine numerical methods for solving forward backward stochastic differential equations (FBSDEs) of McKean-Vlasov type.We are particularly interested in equations of this type as they can be used to represent, from the probabilistic viewpoint, solutions to mean field games or, more generally, to mean field stochastic control problems.
Mean field games were developed independently and at about the same time by Lasry and Lions [27], and Huang, Caines, and Malhamé [24].The goal of this theory is to understand the limit as N → ∞ of the Nash equilibria for an N player stochastic dynamic game with mean field interaction, meaning that players interact with one another through their collective state.Equivalently, mean field games must be regarded as the continuum limit of games with a large number of symmetric players, each of them having a small effect on the dynamics of the whole group.The applications of mean field games are numerous, and spread across many disciplines, including social science (congestion [25][19] [3], cyber attacks [12]), biology (flocking [28] [29]), and economics (systemic risk [11], production of exhaustible resources [23] [13]), just to name a few.As explained in [8,9], solutions to mean field games can be characterized through a coupled system of two forward and backward stochastic differential equations of mean field type, like those we address in this note.The forward equation provides the dynamics of one typical player in the population at equilibrium.Generally speaking, the backward equation accounts for the optimality condition in the definition of an equilibrium and the McKean-Vlasov structure is precisely here to stress the fact that the player in hand is representative of the others.As exemplified in [10], other forms of equilibria can be addressed by means of this kind of equations.This includes optimal mean field control problems, which can be regarded as the continuum limit of a control problem involving a large number of symmetric players obeying a central planner.Below, we mostly focus on examples arising in the theory of mean field games.
Whilst deterministic numerical methods, based upon finite differences or variational approaches, are also conceivable for handling mean field games, see [1,2,4] and [5,26,22], we here focus on the approach based on FBSDEs.In this regard, we implement (and compare) two different algorithms.The first algorithm, which is based on the paper of Chassagneux, Crisan, and Delarue [14], relies on a tree structure to represent the pathwise law of the solution.The second algorithm, and main contribution of this report, takes the algorithm presented in the paper of Delarue and Menozzi [18] for solving FBSDEs and extends it to the mean field framework in hand.In this algorithm, a grid structure is used to represent the marginal laws of the solution.The serious issue that we are facing in this note is that both methods are based upon a Picard scheme, the first method involving a global Picard scheme upon the whole process and the second one involving a Picard scheme on the sole marginal laws of the process.It is indeed a well-known fact that, because of the strong coupling between the forward and backward equations, Picard schemes for FBSDEs may just converge in small time, even in the classical case without mean field interaction.For sure, this limitation should persist in the mean field setting for the global Picard method; as exemplified below, it turns out that it persists as well when the Picard scheme is applied to the marginal laws.One of our main contribution in this report is to apply the time continuation approach presented in [14] to the grid algorithm and to compare the results with the tree algorithm for which the time continuation approach was originally designed in [14].In brief, the time continuation permits to extend, by a continuation argument, the time interval on which the Picard scheme converges.We illustrate both algorithms on a handful of example problems.
Section 2 provides a review of Nash equilibria in N player stochastic differential games, and their continuum mean field game counterparts.We review two probabilistic approaches to formulate the solutions of mean field games and provide the general FBSDE system which we would like to solve.In Section 3, we describe the algorithms that we implement in the report.Some benchmark examples and the corresponding numerical results are presented in Section 4. We conclude in Section 5.

Overview of Mean Field Games and FBSDEs
The purpose of this section is to introduce the theoretical material that is needed for our numerical analysis.The objective is purely pedagogical and the text does not contain any new result.

N Player Stochastic Differential Games
We start with the description of the prototype of a finite player game in the theory of mean field games.We consider N ∈ Z + players indexed by i ∈ {1, . . ., N }.The dynamic game occurs over a fixed time horizon [0, T ] for some T > 0. We have N independent m-dimensional Brownian motions (W i t ) 0≤t≤T which are supported by a filtered probability space (Ω, F, F = (F t ) 0≤t≤T , P).Each player chooses its control α i = (α i t ) 0≤t≤T from the set A defined as the set of square integrable F adapted processes with values in a given set A (typically A is a closed convex subset of a Euclidean space).Each player i has a state X i which evolves according to the stochastic differential equation: μt , α t )dt + σ i (t, X i t , μt , α t )dW i t , where μt denotes the empirical distribution of the players' states: μt = 1 N N j=0 δ X j t ∈ P 2 (R d ).Here P 2 (R d ) is the space of probability measures with a finite second moment, which we equip with the 2-Wasserstein distance, denoted by W 2 .For µ, ν ∈ P 2 (R d ), we call Γ(µ, ν) the set of all the joint laws with marginals µ and ν.Then, the 2-Wasserstein distance is defined by: .
The drift and volatility functions, b i and σ i , respectively, are deterministic functions Most of the time, they are assumed to be bounded in time and to be Lipschitz continuous with respect to all the arguments, the Lipschitz property in the measure argument being understood with respect to W 2 .This ensures that, for a given Given a tuple of controls α = (α 1 , . . .α N ), we associate with player i a cost objective which we take to be of the form: Thus each player considers a deterministic running cost Of course, each of them wishes to minimize its own cost by tuning its own control in the most relevant way.Note that we only allow the interaction of the players through their empirical measure, as this will be needed in our formulation of the continuum limit.Still, extensions exist, in which players also interact through the controls, see Subsection 2.2.3.
The players are in a Nash equilibrium if each player is no better off for switching their strategy when they consider the other players' strategies to be fixed.More precisely, the set of strategies α is a Nash equilibrium if

Mean Field Games
For games where N is large, the problem quickly becomes of an intractable complexity.Thus we turn to the continuum limit by considering the limit as N tends to infinity.In order for this limit to make sense, we require the players to be symmetric.Precisely, we require b = b i , σ = σ i , f = f i , and g = g i ∀i ∈ {1, . . ., N }.As the number of players increase, the impact of each player on the empirical distribution decreases, and we expect to have a propagation of chaos such that the players become asymptotically independent of each other.This is the rationale for passing to the limit: Asymptotically, the influence of one player on the group should be null and the statistical structure of the whole should be pretty simple.
We wish to formulate the analogue of a Nash equilibrium when there is a continuum of players.To this end, we consider the states and actions of the other players to be fixed, and consider the best response for a representative player (as we expect equilibria to inherit the symmetric structure of the game).Thus, the first step is to solve an optimization problem.The next step is to find a fixed point, providing an analogue of a Nash equilibrium for the mean field game.
We again have a filtered probability space (Ω, F, F = (F t ) 0≤t≤T , P) where the filtration supports an m-dimensional Brownian motion W = (W t ) 0≤t≤T and an initial condition ξ ∈ L 2 (Ω, F 0 , P; R d ).
The strategy for solving the asymptotic game is the following: 1.For a fixed deterministic flow of probability measures µ = (µ t ) 0≤t≤T ∈ C([0, T ], P 2 (R d )), solve the standard stochastic control problem: subject to 2. Find a fixed point, µ, such that L(X α t ) = µ t for all 0 ≤ t ≤ T .
This strategy can be tackled from either the PDE viewpoint (leading to a coupled Hamilton-Jacobi-Bellman and Kolmogorov/Fokker-Plank equations, known as the MFG system in the literature) [27] [24] or the probabilistic viewpoint [8] [9], which is the focus of this project.Within the probabilistic viewpoint, there are two approaches, both of which are formulated with FBSDEs.See Chapters 3 and 4 of the manuscript of Carmona and Delarue [8] for reference on the two probabilistic approaches.
For simplicity, from now on we assume m, the dimension of the Brownian motion matches d, the dimension of the state variable.We also assume the diffusion coefficient, σ, is a constant matrix σ ∈ R d×d .For both approaches, we will utilize the Hamiltonian deriving from the aforementioned stochastic control problem (1).In fact, since we assume that the drift is uncontrolled, we can just write the reduced Hamiltonian: and an adjoint variable y ∈ R d .Then, a key object in order to formulate the solution to ( 1) is (whenever it exists): α(t, x, µ, y) = arg inf α∈A H(t, x, µ, α, y).
We will provide below explicit examples for α(t, x, µ, y).We now give a brief introduction to the two probabilistic approaches.

Weak approach
In the first probabilistic approach, which we will refer to as the weak approach, the optimization problem is solved using a backward SDE for the probabilistic representation of the value function.For a fixed flow of measures µ = (µ t ) 0≤t≤T , let u : [0, T ] × R → R denote the value function: The strategy is to evaluate the value function along the solution of the state process (X t ) 0≤t≤T , namely we let Y t = u(t, X t ).The weak formulation of the stochastic control problem underpinning u says that, under suitable assumptions that are exemplified below, the pair (X t , Y t ) 0≤t≤T has to solve the following FBSDE: where we assume σ to be invertible.For instance, we take the following set of assumptions from Chapter 3 of the manuscript by Carmona and Delarue [8]: We may assume that the set A for the values of the controls is a bounded closed convex subset of a Euclidean space, the deterministic functions b, f , and g are defined on [0 , respectively, and there exists a constant C 0 > 1 such that: • The functions b, f , σ and g are bounded by C 0 .
• There exists a function x, µ, y) is the unique minimizer of the Hamiltonian H(t, x, µ, y, α).
Under this set of assumptions, it is shown in Chapter 3 of the manuscript by Carmona and Delarue [8] that a flow of measures µ = (µ t ) 0≤t≤T is a mean field game equilibrium if and only if µ t = L(X t ), ∀t ∈ [0, T ], where (X, Y, Z) is a solution of the weak approach FBSDE system in Equation ( 2), in which case (2) becomes where we use the generic notation L(•) for the law of a random variable.This approach is developed further in the papers and manuscript of Carmona and Delarue [8][9].

Pontryagin approach
The second probabilistic approach, which we will refer to as the Pontryagin approach, is based on the Pontryagin stochastic maximum principle.In this formulation, the optimization problem is solved using a backward SDE for the probabilistic representation of the spatial derivative of the value function u.Formally, the strategy is thus to evaluate the process (X t ) 0≤t≤T along ∇ x u.Hence we let Y t = ∇ x u(t, X t ), which makes sense when ∇ x u is well-defined.In fact the Pontryagin system may be formulated without any further reference to the regularity of u, the Pontryagin formulation having the following general form: where we assume b, f and g to be differentiable with respect to x.We may use the following set of assumptions taken from Chapter 3 of the manuscript by Carmona and Delarue [8] to guarantee that the Pontryagin system is both a necessary and a sufficient condition of optimality: We assume the coefficients b, f , and g are defined on [0 , respectively.We also assume that they satisfy: • The drift b is an affine function of (x, α) of the form: R d×d and R d×d valued, respectively, and are measurable and bounded on bounded subsets of their respective domains.
• There exist two constants C 1 > 0 and C 2 ≥ 1 such that the function ), the Lipschitz constant in x and α being bounded by C 2 (so that it is uniform in t and µ).Moreover, it satisfies the following strong form of convexity: The notation ∂ (x,α) f stands for the gradient in the joint variables (x, α).Finally, f , ∂ x f and Under these assumptions, it is shown in Chapter 3 of the manuscript by Carmona and Delarue [8] that a flow of measures µ = (µ t ) 0≤t≤T is a mean field game equilibrium if and only if µ t = L(X t ), ∀t ∈ [0, T ], where (X, Y, Z) is a solution of the Pontryagin approach FBSDE system in Equation ( 4), in which case (4) becomes This approach is also developed further in the papers and manuscript of Carmona and Delarue [8] [9].

Mean Field Games of Control
In many applications, individuals may interact through their controls, instead of their states.One example is an application of trade crowding which was tackled with a mean field game approach in the paper of Cardaliaguet and Lehalle [7].There is also a model for price impact in the book of Carmona and Delarue which we take as one of our example problems in Section 4.4.2.Mean field games where players interact through the law of their controls is sometimes referred to as extended mean field games [20], see also Chapter 4 in [8].We are interested in testing our numerical methods on a certain class of mean field games of control: those in which the interaction is through the marginal distributions, L(X t ) and L(α t ).To design our algorithms to handle such a general framework of mean field interaction, we study numerical methods for solving a general FBSDE system which includes the two approaches detailed above, as well as this class of mean field games of control.

General System
We can address both probabilistic formulations for mean field games and a class of mean field games of control simultaneously by considering the following general FBSDE system.We now take the dimension of the state space to be d = 1 since, for simplicity, our algorithms will be just applied to this case.Let [X] = L(X) denote the law of a process X.With an abuse of notation, let [X, Y, Z] := (L(X), L(Y ), L(Z)) denote the laws of the individual processes (and not their joint law).The general system is the following: The assumption that the coefficients depend, at most, upon the marginal laws L(X t ), L(Y t ) and L(Z t ) of the triple (X t , Y t , Z t ) (and not upon the full joint law) is tailor made to the applications we have mind: As explained in the previous paragraph, we want to handle games in which players interact with one another through the law of the control.In order to guess the impact this may have on the FBSDE representation, it is worth recalling that, in the problem (1), the optimal control may be represented in a quite generic way through the function α with the adjoint variable y therein taken as the gradient of the value function.Under the weak formulation approach, this turns the FBSDE system (3) into an FBSDE system depending on the marginal laws of Z, as Z t is known to have the representation Z t = ∇ x u(t, X t )σ.Under the Pontryagin approach, it turns the FBSDE system (5) into an FBSDE system depending upon the marginal laws of Y .Hence our choice to handle systems of the form (6). Still the reader should observe that, in order to handle the more general case when the players interact through the joint law of the state and of the control, it is necessary to address systems of the same type as ( 6) but with the convention that [X t , Y t , Z t ] is understood as the joint law of the triple (X t , Y t , Z t ); as we just mentioned, we do not address this level of generality in the report.
In ( 6), the diffusion process X is coupled to the diffusion process (Y, Z) through the functions B and F , representing the drift of the forward process and the driver of the backward process, respectively.The functions B and F are assumed to be Lipschitz in each of their arguments on ([0, T ], R 3 , (P 2 (R)) 3 ) and G Lipschitz on (R, P 2 (R)), namely, for (x, y, z, x ′ , y ′ , z) ∈ R 6 and (µ, ν, λ, µ ′ , ν ′ , λ ′ ) ∈ (P 2 (R)) 6 , we have: The goal of this project is to study numerical methods for solving this general FBSDE system.At some point in the report, we will relax the Lipschitz condition of F in the variables z and λ and address an FBSDE with a quadratic driver F , see our examples in Section 4.

Algorithms
We implement two algorithms for numerically solving the FBSDE system in Equation ( 6).In the first algorithm, we represent paths of the stochastic processes (X, Y, Z) using a tree structure, where branches of the tree represent quantization of the Brownian motion.In the second algorithm, we no longer represent the paths of the process, but the marginal laws of the process.In this case, the law of the process is discretized on a fixed temporal and spatial grid.
In both cases, the representation serves as a basis for a Picard scheme for approaching the solution.For the first algorithm, the Picard scheme is implemented in the form of a global Picard scheme on the whole process; for the second one, iterations are just performed on the marginal laws of the process.Although Picard's method sounds very natural, this strategy suffers, whatever the algorithm, from a serious drawback as Picard iterations for forward backward systems are only expected to converge in small time.Basically, the value of T for which the algorithm will converge depends on the coupling strength of the system; we make this fact clear for the global method by showing how T depends on the Lipschitz coefficients K B , K F and K G .In any case, bifurcations can be observed when T is increased, or equivalently, when the coupling strength between the two equations is increased.For our convenience (since it can be costly to increase T ), we will fix T and explore the convergence of the algorithms as we vary the coupling strength.This is one first step in our report: Compare how the two algorithms suffer from the coupling strength between the forward and backward equations.
The second key feature of our report is to use, for both algorithms, a continuation in time, which allows us to extend the value of the coupling parameter for which the algorithms converge.
In the following sections, we detail our Picard approaches for the two algorithms and the continuation in time method.

Global Picard Iteration on a Small Time Interval
The main difficulty in numerically solving the FBSDE system is the fact that, not only the forward component X = (X t ) 0≤t≤T and backward component (Y, Z) = (Y t , Z t ) 0≤t≤T are coupled, but also they run in opposite directions.Thus, neither equation can be solved independently of the other, seemingly requiring us to manage both time directions simultaneously.Several strategies are conceivable to sort out this issue.A first one is to make use of the decoupling field of the system in order to work with one time direction instead of two (roughly speaking, the decoupling field is the value function u in the weak approach and its derivative ∇ x u in the Pontryagin one).We will investigate this method for our second algorithm; its implementation is indeed pretty subtle in the mean field setting and it leads to the aforementioned Picard method on the marginal laws.For our first algorithm, however, we limit ourselves to a brute force approach.To decouple the equations, we propose a global Picard iteration scheme, whose definition is as follows.For the initial and terminal data of the problem (ξ and G), we want to define a mapping Φ ξ,G that will take the j − 1 Picard iterate and produce the j Picard iterate: We define the decoupled Picard scheme Φ ξ,G as the following: 1. First, solve for X j which gives us [X j ].
2. Next, solve for Y j and Z j which gives us [Y j ] and [Z j ].
After initializing (X 0 , Y 0 , Z 0 , [X 0 , Y 0 , Z 0 ]), we can define a sequence by: and thus, (X, Y, Z) is a solution to the original FBSDE system in Equation ( 6).Picard schemes such as this one are only guaranteed to converge for a small time horizon, T , depending on the Lipschitz coefficients of the functions B, F and G (and in fact not only the convergence of the Picard sequence but also the solvability of the equation may fail on an arbitrary time interval).We illustrate this idea through the following example (the reader may find the general case in [17]): Let the driver function F = 0 and define the common Lipschitz coefficient K = max(K B , K G ).The FBSDE system becomes: We write the equation above in the integral form and take the expectation conditional to the filtration F t in the backward equation: Instead of one single X, let us now consider two (initial) processes denoted by X and X.After one Picard iteration, we arrive at X′ and X′ .The Picard iteration is given by: and similarly for X′ .From the forward component, we have the following estimate for the difference between X′ and X′ : Then we write for the backward component: In the second to last inequality, we used Doob's martingale inequality for the martingale term Combining the inequalities above for both forward and backward components, we obtain the following estimate: And then, Finally, when 16T 2 K 4 < 1, i.e.T ≤ 4/K 2 , the mapping Φ ξ,G realizes a contraction on the forward component and the Picard iteration defined above will converge to the fixed point, providing a solution to the original FBSDE.Thus, we have illustrated that Picard schemes are only expected to converge in small time or for small coupling (i.e.smaller Lipschitz coefficients).
Keeping in mind that we eventually want to describe numerical schemes, we define a solver picard, that will implement a finite number, J p ∈ Z + , of Picard iterations.Thus, we define: 1. Initialize X t = ξ, Y t = 0, and

Continuation in Time of the Global Method for Arbitrary Interval/Coupling
Of course, we would like the mapping Φ ξ,G to realize a contraction to make sure that the Picard iteration converges.As we just explained, this is the case when the forward and backward processes have a small enough coupling strength or for small time horizon T .But in practice, this is not always the case: It may happen that the FBSDE system is uniquely solvable, but that we are unable to prove that the Picard sequence converges.In order to overcome this issue, we follow the approach introduced in Chassagneux, Crisan, Delarue [14].Basically, the point is to divide the time interval into smaller intervals, called levels, and to apply a Picard solver recursively between the levels.
To define the levels, we fix a time mesh: We would like to use the picard solver introduced in the previous section to apply the Picard iteration on a given level.Notice that for an arbitrary level [T k , T k+1 ], we do not know the initial condition for X T k or the terminal condition for Y T k+1 .Thus, our current approximation of these values will be inputs to the picard solver in place of ξ and G.We would also like to modify the solver picard to also take the current estimate of (X, [X]) so that we don't have to start from scratch every time we wish to use picard.Thus, we define for level k: 1. Initialize Y t = 0, and where X = (X t ) T k ≤t≤T k+1 , and similarly for Y and Z and their laws.Note that Φ X T k ,Y T k+1 is the same as Φ ξ,G defined earlier except the time horizon is [T k , T k+1 ] instead of [0, T ] and the initial and terminal conditions are given by X T k and Y T k+1 instead of ξ and G, respectively.In particular, pay attention that we no longer consider the terminal condition in the form of a mapping (like G) but in the form a random variable (like Y T k+1 ).Implicitly, this requires to store, from one step to another, the full random variable Y T k+1 ; hence our choice below to use a tree.Now that we have a solver picard which will implement the Picard iteration for a given level, next, we want to define a global solver to apply a continuation in time.The global solver, called solver, is recursively defined as follows for some J s ∈ Z + .For a given level k, define: As before, it is important to notice that the entry X T k in solver is a random variable; [X T k ] is its law.Having the two in our notations is a bit redundant, but we feel it is more transparent for the reader.The break condition of the recursion is given by the terminal condition: The goal of the continuation in time is for a Picard iteration scheme to converge even for large coupling parameters or large time horizon.We will see in Section 4 that the continuation in time successfully achieves this goal for our benchmark examples.We refer to [14] for its theoretical analysis.
Thus far, we have described our Picard approach and a continuation in time method.Now we need to provide a scheme for discretizing the Picard iteration mapping Φ ξ,G .In this report, we implement a tree algorithm; we also give a variant of it, in the form of a grid algorithm.For both algorithms, we consider the uniform time mesh with time step h = T /N t > 0 with N ℓ < N t ∈ Z + and t i = ih, i = 0, ..., N t .For convenience, we will assume the coarse time mesh used to define the levels, {0 = T 0 < T 1 , . . ., T k , . . .T N ℓ −1 < T N ℓ = T }, is a subset of the fine time mesh {0 = t 0 < t 1 , . . ., t N t−1 < t Nt = T }.

Tree Algorithm for the Global Method
The first implementation of the Picard iteration, Φ ξ,G , is the tree algorithm presented in Chassagneux, Crisan, Delarue [14].We now provide a brief presentation of this algorithm.

Time Discretization
Our first step in developing a discretization for the Picard iteration Φ ξ,G is to discretize the problem in the time domain.We use the decoupled scheme derived in Chassagneux, Crisan, Delarue [14], and repeated below for convenience.As noted above, we consider the uniform time mesh with time step h = T /N t > 0 with N ℓ < N t ∈ Z + and t i = ih, i = 0, ..., N t .
Step 1) in defining Φ ξ,G requires solving the forward equation for X j .We use the classical Euler scheme: Note that when we calculate X j t i+1 , the value of X j t i and its law is known and could be substituted for X j−1 t i and its law in the drift function, B.
Step 2) in defining Φ ξ,G requires solving the backward equation for Y j and Z j .We derive the discrete-time scheme to approximate the backward component, see e.g.[6].The Y j component at time t i in the backward scheme is obtained by taking the expectation conditional to F t i , denoted as E t i , of the backward equation between t i and t i+1 .Let ∆W i = W t i+1 − W t i denote the forward Brownian increment between t i and t i+1 .The driver function F is approximated by its value at time t i and we use the fact that F at t i and Z t i are F t i -measurable, and As for the Z j component, we multiply the approximation of Y j t i by the Brownian increment ∆W i , taking the conditional expectation E t i , and using E((∆W i ) 2 ) = h.By noticing that our scheme will never make use of Z j T , we can simply set the terminal condition for the Z j component to 0.
Putting this together, the time-discretized decoupled forward-backward scheme for Picard iteration of our general FBSDE system (6) is the following: Other variants of this forward-backward scheme are possible.For example, in the forward scheme we could change j to j − 1 and t i to t i+1 on the right hand side.It is easy to observe that the forward-backward system is decoupled because of lagged Picard indices j − 1 and j.Thus, given ( , and similarly for Y and Z, we can solve the backward scheme autonomously and then the forward scheme to obtain (X j , Y j , Z j , [X j , Y j , Z j ]).We have not fully provided a discrete scheme for Φ ξ,G yet, however, because for a given t i , we have not discretized (X t i , Y t i , Z t i ).This is the goal of the next section.

Spatial Discretization via Tree Structure
The forward-backward decoupled scheme (7) above looks quite simple and explicit.However, it still presents some difficulties for the numerical computation.Firstly, it is difficult to compute the conditional expectation in the backward scheme.Secondly, it is non trivial to compute the law of X t i+1 forward in time.Even if there was no drift, the computation would involve the convolution of the law of X t i and a Gaussian law of the Brownian increment.Ultimately, we will need a spatial dicretization.
The approach of this algorithm is to approximate the Brownian increments using a simple binomial approximation: ∆W i = ± √ h with probability 1/2.This gives rise to a binomial tree for the forward scheme.Each node on the tree at depth i represents a value of X t i , and has two children nodes representing the two possible values of X t i+1 (the "up ↑" and the "down ↓" value), conditioned on the value of X t i .The two values are computed as follows: Suppose that we use M points x 1 , ..., x M for the approximation of the law ξ of the forward process at the initial time, i.e. [X 0 ] = ξ ≈ M k=1 p 0 k δ x k (•).Then we have M parallel binomial trees at each Picard iteration.For Picard iterate j and time t i , the number of nodes at depth i is M × 2 i with values of (X j t i , Y j t i , Z j t i ) saved on the nodes of the tree at depth i.The marginal law of each process at time t i can be determined by looking at all the values on the nodes at depth i.The backward scheme can be easily computed on the binomial tree.At the last time step, T = t Nt , we have ) for each of the M × 2 Nt nodes.The conditional expectation in the backward scheme at t i is simply the average of the "up" and "down" branches at t i+1 .
To initialize the j = 0 Picard iterate as in the definition of the solver picard, we want to set X t i = ξ, ∀i ∈ {0, . . ., N t }.This amounts to taking each initial value x k and initializing its entire tree to this value, meaning that X 0 t i = x k for all nodes at depth i and for all i ∈ {1, . . ., N t } of the k-th binomial tree.We then begin the Picard iteration by applying the mapping Φ ξ,G as detailed above.Using the binomial tree and approximation of Brownian increments, the forward-backward decoupled scheme becomes fully implementable.

Picard Iteration on the Marginal Laws: a Grid Algorithm
The complexity of the tree algorithm is exponential with respect to the number of time steps N t , since the size of the binomial tree is of order 2 Nt .The exponential complexity becomes problematic and makes continuation in time much slower when we deal with large time horizons.In order to reduce the size of the tree, a natural idea is to make some "recombination" of the binomial tree.But since the drift function B depends on the value of the process, the two branches "up-down" and "down-up" from the same node at time t i will not coincide at time t i+2 , in general.Instead of recombination, we may fix a spatial grid of a controllable size that the binomial tree can be projected onto, in order to avoid exponential complexity.This will be the first ingredient of our grid algorithm, which is the main novelty in our report.
Inspired by the paper of Delarue and Menozzi [18], where the authors used a spatial grid for the approximation of FBSDEs without mean-field interaction, we make an intensive use of the notion of decoupling field, which is the second key ingredient of our strategy.Indeed, using the representation result in Proposition 2.2 in [16], we know that there exist deterministic feedback functions (u, v) : [0, T ] × R × P 2 (R) → R with u a solution to a nonlinear PDE (on the space of measures) such that for a solution (X, Y, Z) to the general FBSDE system in Equation ( 6): Generally speaking, u is called the decoupling field of (6).Here comes the main observation: The time marginals of the solution to the general FBSDE system in Equation ( 6), ([X t , Y t , Z t ]) 0≤t≤T , can be equivalently characterized by (µ t , u(t, •), v(t, •)) 0≤t≤T where [X t ] = µ t .As in [18], our strategy is to thus approximate (u, v) instead of (X, Y, Z), but this is not so easy as u and v are defined on a space of infinite dimension (because the mean field component lives in P(R)).In order to overcome this difficulty, we propose to freeze the mean field argument in (u, v).This permits to regard u and v as functions of a finite-dimensional variable and thus to approximate both along the underlying spatial grid.Once an approximation of u and v has been computed for the given proxy of the marginal laws, we can update the value of this proxy by using a Picard method.Hence the name of this subsection.So, as opposed to the tree algorithm, we will no longer keep track of the pathwise laws of the processes.Instead, we will only compute the marginal laws at each time step of the time mesh by means of a Picard iteration.

Picard Iteration without Grid
We first give the inputs and outputs of our new Picard mapping without any grid approximation: Denoting below ϕ♯ν as the push-forward measure of the measure ν by the function ϕ, ) is defined by: 1. Solve the following SDE for X j : ) by solving: 3. Return ((µ j t , u j (t, •), v j (t, •)) 0≤t≤T ).
Given Ψ [ξ],G , we can go through the construction of the global Picard method and define similar routines picard and solver for our new Picard approach by replacing formally Φ ξ,G by Ψ [ξ],G .The various inputs and outputs in the new routines should be clear.The next step to make the whole algorithm entirely tractable is to show how to compute µ j t , u j , and v j explicitly in the mapping Ψ [ξ],G .Similar to Φ ξ,G , this mapping will be defined explicitly for a discretized scheme on a temporal and spatial grid in the next two sections.As in the tree algorithm, we consider the uniform time mesh with time step h = T /N t > 0 with N ℓ < N t ∈ Z + and t i = ih, i = 0, ..., N t .

Grid Approximation of Forward Component and its Law
We begin by fixing a spatial discretization grid.This grid could in principle be defined differently for each time step t i , but for simplicity, we consider a homogeneous grid Γ fixed for all time steps t i , i ∈ {0, ..., N t } with constant spatial step size ∆x.Let Π be the projection function on the grid Precisely, Π is given by The initial law ξ of the forward process is approximated as ξ ≈ µ 0 (•) on the grid Γ with N x points.Recall that in the tree algorithm, we cannot choose M , the number of points for the approximation of the initial law, to be too large as we will need M parallel binomial trees.Because the tree algorithm has exponential complexity, we are able to choose N x to be much larger than M , and in turn, the approximation of the initial law is more accurate in the grid algorithm than in the tree algorithm.We can initialize the Picard iterate (µ 0 i , u 0 i , v 0 i ) 0≤i≤Nt similar as before by letting µ 0 i = µ 0 and (u 0 i , v 0 i ) = (0, 0), for all i ≤ N t .We follow the definition of Step 1) for Ψ [ξ],G , but we use the Euler scheme for the forward process between t i and t i+1 : Suppose the j Picard iterate at time t i is given by µ j i with µ j 0 = µ 0 : The law of X j t i in the Euler scheme of the forward process is µ j i .Then we would like to define µ j i+1 as the law of X j t i+1 in the Euler scheme above, but the quantity X j t i+1 may not belong to the grid.
Instead of using formula ( 9), the natural idea is then to replace it by its projection on the grid Γ: The law of X j t i+1 is the convolution of µ j i with transition density q j (t i , t i+1 ; x k , x n ) with x k , x n two points of the grid Γ and k, n ∈ {1, ..., N x }.The transition probability is equal to the conditional probability that the process X j starting at time t i from point x k arrives at time t i+1 at the point x n , i.e.
The discretized law µ j i+1 is then written as: It is worth noticing that if we did not take the projection when computing X j t i+1 in the scheme (10), then its law would be given by convolution of the law µ j i with the Gaussian transition density qj (t i , t i+1 ; x k , y) associated to the Euler scheme.The transition densities q j and qj have the following relation: qj (t i , t i+1 ; x k , y)dy.
In fact, for a more tractable implementation of the forward scheme, we use the binomial approximation for the Brownian increments (8) introduced in the previous section.Note that quantization with more points can be easily applied.In this binomial case, with (↑)/(↓) representing the "up" and "down" branches, respectively, the transition probabilities on the grid can be easily computed: Then we can write µ j i+1 by computing the probabilities: At Picard iteration j ≥ 1, the forward scheme finally gives the flow of measures (µ j i ) Nt i=0 at discrete time steps (t i ) Nt i=0 of the discretized law defined on the grid Γ.Thus, we have described an implementation of Step 1) in the definition of the Picard mapping Ψ [ξ],G .In the next section, we detail the implementation of the backward components in Step 2).

Grid for the approximation of the backward component
Given the current Picard iterates (µ j i ,

we would like to detail
Step 2) in the definition of Ψ [ξ],G .Since we have a discrete spatial grid Γ, we wish to compute the values of u j i (x) and v j i (x) for x ∈ Γ.By replacing Y j t i and Z j t i with their respective feedback functions in the backward component in Equation ( 7), for x ∈ Γ we have the following backward scheme starting, for i ≤ N t − 1, with terminal condition at time T = t Nt , (u j Nt , v j Nt ) = (G, 0): The variable X j t i+1 with law [X j t i+1 ] = µ j i+1 is given by the forward scheme presented in the previous section with starting point X j t i = x.Notice that by construction of the forward scheme, X j t i+1 ∈ Γ and supp(µ j i+1 ) = Γ so u j i+1 (•) has been calculated and saved at time t i+1 .We have a more explicit scheme in the case of binomial approximation of the Brownian increments, which is used for the numerical results for our examples, with X j t i+1 (↑) and X j t i+1 (↓) as defined in the forward scheme, always conditional to X j t i = x and given µ j i at time t i : We have now described a scheme for Step 2) in the definition of Ψ [ξ],G .Putting this together with the previous section, we have described a fully implementable scheme for Ψ [ξ],G .Note that we can use this Picard iteration mapping to define the analogue of the solvers picard and solver.Importantly, we can also apply the continuation in time method to the grid algorithm as well.In the next section, we apply these two methods, the tree and grid algorithms, to five example problems.

Examples
We have collected five example problems to test the algorithms presented in Section 3.

Linear Example
The first example is a linear model which comes directly from Chassagneux, Crisan, Delarue [14], in which they implemented the tree algorithm.The system of interest is the following: For this problem, the solution is known explicitly: x 0 e aT 1 + ρ a (e aT − 1) For the numerical results, we let ρ = 0.1, a = 0.25, σ = 1, T = 1, and x 0 = 2.We vary h, the time step size, and for the grid algorithm, we use ∆x = h 2 .Figure 1 shows the log error between the numerical and true solution values of Y 0 as a function of log number of time steps for the tree and grid algorithm with one level (i.e.without using continuation in time).The left figure (tree algorithm) repeats the results in [14] and as expected, it decreases linearly.On the other hand, we also observe a negative trend in the rate of convergence for the grid algorithm.Thus, both algorithms appear to converge to the true solution.

Trigonometric Drivers Example
The second example also comes directly from Chassagneux, Crisan, Delarue [14].The system of interest is the following: For the numerical results, we let σ = 1, T = 1, x 0 = 0, h = 1/6 for the tree algorithm and h = 1/12 and ∆x = h 2 for the grid algorithm.For this problem, we observe a bifurcation when using the tree algorithm as we increase the coupling parameter, ρ. Figure 2 shows the values of Y 0 from the last 5 Picard iterations.Starting at about ρ = 3.5, the tree algorithm without continuation in time bifurcates.If the continuation in time method is used for the tree algorithm with two levels, there is no bifurcation for the range of values of ρ showed in the plot.Note that the results from the tree algorithm repeat those in [14].The grid algorithm performs quite well in the sense that even with only one level, the algorithm converges for all of the values of ρ in the plot.In particular, it avoids the exponential growth of the data structure characterizing the tree algorithm.Note that even though both the tree method with two levels and the grid method with one level converge, they produce different values for Y 0 for larger values of ρ.We believe this is because the tree algorithm is less accurate since the time step is larger.For values of ρ for which the tree algorithm bifurcates, we were interested in the effect of changing σ on the convergence.Figure 3 shows the last 5 values of Y 0 from the Picard iteration as σ varies for ρ = 5. Surprisingly, the value of σ plays no role in the bifurcation for this value of ρ.To see if σ would play a role when ρ is closer to the bifurcation point, we have analogous plots when ρ = 3.5 and ρ = 4 (see Figure 4).In these plots, however, it is clear that σ does affect the bifurcation.Understanding the role of σ on the bifurcation is an open question.It is interesting to note that even though the tree algorithm does not bifurcate for ρ = 3.5 when σ = 1, we still observe a bifurcation when we fix ρ = 3.5 and vary σ.This suggests that we check if the grid algorithm bifurcates as we vary σ for a fixed value of ρ where there is no bifurcation when σ = 1, such as ρ = 5. Figure 5 shows that the grid algorithm does not bifurcate as we change σ for fixed ρ = 5.

Mixed Model
The third example also comes directly from Chassagneux, Crisan, Delarue [14].The system of interest is the following: For the numerical results, we let σ = 1, T = 1, x 0 = 2, h = 1/6 for the tree algorithm and h = 1/12 and ∆x = h 2 for the grid algorithm.We also observe a bifurcation for this problem as we increase the coupling parameter, ρ. Figure 6 shows the values of Y 0 from the last 5 Picard iterations.Starting at about ρ = 1.5, the tree algorithm without continuation in time bifurcates.If the continuation in time method is used for the tree algorithm with two levels, the bifurcation point is pushed back to about ρ = 3, and pushed back further when using three levels to about ρ = 5.Note that these results for the tree algorithm repeat those in [14].The grid algorithm performs quite well again in the sense that it converges for all of the values of ρ shown, even when using only one level.Further, its major attractivness is the lower complexity compared to the tree algorithm.As in the trigonometric drivers example, we can also investigate the effect of changing σ for a value of ρ where the tree algorithm without continuation in time bifurcates.For ρ = 2, Figure 7 shows that the tree algorithm converges for large enough values of σ.Since we have observed that the tree algorithm converges for small value of ρ and large values of σ, this suggests trying a continuation in ρ and/or σ, instead of the continuation in time.Instead of implementing a full continuation method, we used a simpler method of incrementing the parameter of interest.
The incrementation in ρ is performed by starting the algorithm with a small value of ρ and letting it converge.Then ρ is increased by some fixed ∆ρ, and the algorithm is initialized with the solution from the previous value of ρ. Figure 8 shows the results from the incrementation in ρ.The incrementation in ρ only increases the bifurcation point by a small amount.
The incrementation in σ is similar, except for each value of ρ, we start the algorithm with a sufficiently large value of σ such that it will converge.Then σ is decreased by a fixed ∆σ. Figure 9 shows the results from the incrementation in σ.The incrementation in σ also only increases the bifurcation point by a small amount.If we take the previous example but change the drift and driver functions to also be in terms of the mean of the processes, then we have the following system, which we will refer to as the mixed model of means: For the last example, we noticed that increasing σ allows the tree algorithm to converge.We are thus interested to see if replacing the dynamics with the mean of the process will remove the effect of changing σ on the convergence.We use the same values of the parameters as before.Figure 10 shows the bifurcation with one level of the tree algorithm (i.e.without using continuation in time).The effect of changing σ is shown in Figure 11.Our prediction is confirmed: σ no longer affects the convergence when the dynamics are replaced with the mean of the process.

Examples: Linear Quadratic Mean Field Games
The last two examples belong to the family of linear quadratic (LQ) games.In these models, the dynamics of the state are linear in the sense that the drift is defined by an affine function: Furthermore, the running and the terminal cost are quadratic in the state and control variables.For the sake of simplicity, we choose not to include the cross terms, so that we define f and g as where: Using the weak formulation, the FBSDE system of interest is the following: If we use the Pontryagin formulation, the FBSDE system becomes: The numerical results are presented for ρ = 1, σ = 1, T = 1, x 0 = 0, h = 1/20 for the tree algorithm and h = 1/130 and ∆x = h 2 for the grid algorithm.Figure 12(a) shows the results for the grid algorithm for both the weak and Pontryagin approaches.The plot shows the weights of the distribution L(X T ).The results are similar between both approaches and coincide with the true solution.
We can look at the convergence rate by calculating the 2-Wasserstein distance between the numerical results and the true solution as we change the number of time steps.Since our state space is in one dimension, we can calculate the Wasserstein distance explicitly using the representation provided by Prokhorov [30]: , where F µ (x) = µ([0, x]), denotes the cumulative distribution function.Figure 12(b) presents the convergence rate of the grid algorithm in terms of the 2-Wasserstein distance calculated between the true solution and numerical results with respect to the number of time steps.As expected, the 2-Wasserstein distance decreases towards 0 as we increase the number of time steps, for both the Pontryagin and weak approaches.
The results for the tree algorithm are shown in Figure 13 for the weak and Pontryagin approaches.As with the grid algorithm, the weak and Pontryagin solutions are similar to each other and coincide with the true solution.

Trader Problem
The last example shows an application of mean field games to finance, such as the trader congestion model in the paper of Cardialaguet and Lehalle, [7].We focus on the Price Impact Model presented in the book of Carmona and Delarue in [8].The interest for this kind of model where α t corresponds to the trading rate.The price of the asset (S t ) 0≤t≤T is influenced by the trading strategies of all the traders trough the law of the controls (θ t = L(α t )) 0≤t≤T as follows: where γ and σ 0 are constants and the Brownian motion W 0 is independent from W .The amount of cash held by the trader at time t is denoted by the process (K t ) 0≤t≤T .The dynamic of K is modeled by where the function α → c α (α) is a non-negative convex function satisfying c α (0) = 0, representing the cost for trading at rate α.The wealth V t of the trader at time t is defined as the sum of the cash held by the trader and the value of the inventory with respect to the price S t : Applying the self-financing condition of Black-Scholes' theory, the changes over time of the wealth V are given by the equation: We assume that the trader is subject to a running liquidation constraint modeled by a function c X of the shares they hold, and to a terminal liquidation constraint at maturity T represented by a scalar function g.Thus, the cost function is defined by: Applying Equation (12), it follows that where the running cost is defined by We assume that the functions c X and g are quadratic and that the function c α is strongly convex in the sense that its second derivative is bounded away from 0. Such a particular case is known as the Almgren-Chriss linear price impact model.Thus, the control is chosen to minimize: T , over α ∈ A. To summarize, the running cost consists of three components.The first term represents the cost for trading at rate α.The second term takes into consideration the running liquidation constraint in order to penalize unwanted inventories.The third term defines the actual price impact.Finally, the terminal cost represents the terminal liquidation constraint.As for the flocking example, this model falls in the class of linear quadratic games.Assume that the initial condition is given by a constant, X 0 = x 0 .The solution is Gaussian with mean and variance defined as: Using the weak approach yields the following FBSDEs system: Alternatively, the FBSDE system obtained via the Pontryagin approach is: The numerical results focus on the effect of the continuation method for the grid algorithm.In contrast with the previous examples, we show that the grid algorithm is also affected by bifurcation.Figure 14 shows the last five Picard iterations of Y 0 for the Pontryagin approach when the number of levels ranges from 1 to 3. Fixing the parameters x 0 = 1, σ = 0.7, 1/c α = 1.5, c g = 0.3, γ = 2, T = 1, h = 1/12, ∆x = h 2 and increasing the coupling parameter, c X , we observe that the bifurcation effect can be corrected by increasing the number of levels.In fact, Figure 14 shows that the true value of Y 0 matches the value computed numerically when using three levels.
Figure 15(b) presents the convergence rate in terms of the 2-Wasserstein distance calculated between the true solution and numerical results with respect to the number of time steps.We again make use of the explicit representation of the Wasserstein distance [30].The numerical solution is obtained using the grid algorithm with parameters x 0 = 1, σ = 0.7, 1/c α = 0.3, c g = 0.3, γ = 2, c X = 2, T = 1, and ∆x = h 2 .As expected, the 2-Wasserstein distance decreases towards 0 as we increase the number of time steps.The last plot, Figure 16, shows the error from the true solution of the control at time 0, α 0 , as we increase the number of time steps.This value is given by α 0 = −Y 0 /c X for the Pontryagin approach and α 0 = −Z 0 /(c X σ) for the weak approach.The true value is given by α 0 = −η 0 x 0 /c X .As for the 2-Wasserstein distance, the error in the control decreases towards 0.

Conclusion
In conclusion, we have provided two algorithms for numerically solving FBSDEs of McKean-Vlasov type, which can be used to formulate the solutions to mean field game problems.The first algorithm is based on a pathwise tree structure.The second algorithm is based on a marginal grid structure.We have also proposed various refinements to the algorithms, including a continuation in time, and incrementation of a coupling parameter or the diffusion coefficient.The different numerical methods were illustrated on five benchmark examples.
The tree algorithm's main advantage is that we do not need to project the values of X t onto a discretized spatial grid, which potentially makes the algorithm more accurate.However, a significant disadvantage of the tree algorithm is the exponential growth of the data structure as the number of time steps is increased.This exponential growth is made worse yet if a higher order of quantization were to be used for approximating the Brownian increments.
The grid algorithm's main advantage is it avoids the exponential growth of the data structure.A higher order of quantization may be used without drastically changing the algorithm's complexity.A disadvantage of the grid algorithm is its sensitivity to the spatial step size with respect to the time step size.For the algorithm to be stable, the two step sizes need to be well adjusted to each other.
For both the tree and grid algorithms, we have observed that the continuation in time is able to extend the range of values of the coupling parameters for which the algorithms will converge.The incrementation methods proposed in Subsection 4.3, however, were not very successful at avoiding the bifurcations.
This report has touched on many things that could be explored deeper.First of all, the extension of the grid algorithm in [18] to the mean field setting has not been studied from a theoretical standpoint.It is an open question to determine if this algorithm converges (meaning that the error decreases as the grid size decreases).The effect of the continuation in time or incrementation of the coupling parameter and/or diffusion coefficient has also yet to be studied.The numerical results also raised questions on the influence of the diffusion coefficient in the bifurcations.

Figure 1 :
Figure 1: Linear Example: Convergence of the algorithms with one level as the number of time steps increases.

1 Figure 2 :
Figure 2: Trigonometric Drivers Example: Bifurcations in the values of Y 0 appear as the coupling parameter, ρ, increases.The tree algorithm with one level is shown in blue circles and with two levels is shown in red squares.The grid algorithm with one level is shown in black asterisks.

Figure 6 :
Figure6: Mixed Model: Bifurcations in the values of Y 0 appear as the coupling parameter, ρ, increases.The tree algorithm with one, two, and three levels is shown in blue circles, red squares, and green triangles, respectively.The grid algorithm with one level is shown in black asterisks.

2 Y0Figure 7 :
Figure 7: Mixed Model: Tree algorithm with one level for ρ = 2.The algorithm converges for large enough values of σ.

Figure 8 :
Figure 8: Mixed Model: Tree algorithm with one level.Without continuation or incrementation is shown in black.Incrementation in ρ with ∆ρ = 0.1, 0.01, and 0.001 are shown in blue circles, red squares, and green triangles, respectively.

Figure 9 :
Figure 9: Mixed Model: Tree algorithm with one level.Without continuation or incrementation is shown in black asterisks.Incrementation in σ with ∆σ = 1, 0.5, and 0.1 are shown in blue circles, red squares, and green triangles, respectively.

Figure 10 :Figure 11 :
Figure 10: Mixed Model of Means: Bifurcation for the tree algorithm with one level.

Figure 14 :
Figure 14: Trader Problem: Bifurcations in the values of Y 0 depending on the coupling parameter c X for different number of levels in the grid algorithm.One, two, and three levels are shown in blue circles, red squares, and green triangles, respectively.The true value of Y 0 is shown in black asterisks.

Figure 15 :
Figure 15: Trader Problem: (a) Distribution µ T of the players' states at time T for the grid algorithm with one level.Pontryagin is shown in blue circles, weak is shown in red squares, and the true solution is shown in black asterisks.(b) 2-Wasserstein distance between true solution and numerical solution for grid algorithm with one level as we increase the number of time steps, plotted as a log-log plot.Pontryagin approach is shown in blue circles and weak approach is shown in red squares.

Figure 16 :
Figure16: Trader Problem: Error in the control at time 0, α 0 , as we increase the number of time steps, plotted as a log-log plot.