Numerical approximation of general Lipschitz BSDEs with branching processes

We extend the branching process based numerical algorithm of Bouchard et al. [3], which is dedicated to semilinear PDEs (or BSDEs) with Lipschitz nonlinearity, to the case where the nonlinearity involves the gradient of the solution. As in [3], this requires a localization procedure that uses a priori estimates on the true solution, so as to ensure the well-posedness of the Picard iteration scheme involved and the global convergence of the algorithm. When the nonlinearity depends on the gradient, the latter needs to be controlled as well. This is done by using a face-lifting procedure. Convergence of our algorithm is proved without any limitation on the time horizon. We also provide numerical simulations to illustrate the performance of the algorithm.


Introduction
The aim of this paper is to extend the branching process based numerical algorithm proposed in Bouchard et al. [3] to general BSDEs of the form

Y_t = g(X_T) + ∫_t^T f(X_s, Y_s, Z_s) ds − ∫_t^T Z_s^⊤ dW_s, t ≤ T, (1.1)

where W is a standard d-dimensional Brownian motion, f : R^d × R × R^d → R is the driver function, g : R^d → R is the terminal condition, and X is the solution of

X_t = X_0 + ∫_0^t µ(X_s) ds + ∫_0^t σ(X_s) dW_s, (1.2)

with constant initial condition X_0 ∈ R^d and coefficients (µ, σ) : R^d → R^d × R^{d×d} that are assumed to be Lipschitz. From the PDE point of view, this amounts to solving the parabolic equation

∂_t u + µ^⊤ Du + (1/2) Tr[σσ^⊤ D²u] + f(·, u, Duσ) = 0 on [0, T) × R^d, u(T, ·) = g.
The main idea of [3] was to approximate the driver function by local polynomials and to use a Picard iteration argument so as to reduce the problem to solving BSDEs with (stochastic) global polynomial drivers, see Section 2, to which the branching process based pure forward Monte-Carlo algorithm of [12,13,14] can be applied. See for instance [16,17,18] for the related Feynman-Kac representation of the KPP (Kolmogorov-Petrovskii-Piskunov) equation. This algorithm seems very well adapted to situations where the original driver can be well approximated by polynomials with rather small coefficients on quite large domains. The reason is that, in such a situation, it is basically a pure forward Monte-Carlo method, see in particular [3, Remark 2.10(ii)], which can be expected to be less costly than the classical schemes, see e.g. [1,4,5,11,21] and the references therein. However, the numerical scheme of [3] only works when the driver function (x, y, z) → f(x, y, z) is independent of z, i.e. when the nonlinearity in the above equation does not depend on the gradient of the solution.
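The branching-process representation underlying the forward Monte-Carlo algorithm can be illustrated on the KPP example mentioned above. The sketch below is not the algorithm of the paper, but the classical McKean representation it builds on, for the simple one-dimensional PDE ∂_t u + (1/2)u'' + β(u² − u) = 0 with terminal condition g: a particle performs a Brownian motion, dies at rate β and splits into two independent copies, and u(0, x) is the expectation of the product of g evaluated at the positions of all particles alive at the horizon. The branching rate β = 0.5 and all parameter names are illustrative choices.

```python
import math
import random

def kpp_branching_estimate(g, x0, T, beta=0.5, n_samples=2000, rng=None):
    """Monte-Carlo estimate of u(0, x0) for the 1-D KPP-type PDE
        d_t u + (1/2) u'' + beta * (u^2 - u) = 0,  u(T, .) = g,
    via McKean's branching-Brownian-motion representation
        u(0, x) = E[ prod over particles alive at T of g(position) ].
    Particles live Exp(beta) lifetimes and split into two offspring."""
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(n_samples):
        stack = [(x0, T)]      # (position, remaining time) of live particles
        prod = 1.0
        while stack:
            x, t = stack.pop()
            tau = rng.expovariate(beta)                        # lifetime
            x += math.sqrt(min(tau, t)) * rng.gauss(0.0, 1.0)  # Brownian move
            if tau >= t:
                prod *= g(x)   # particle reaches the horizon: collect g
            else:
                stack.append((x, t - tau))   # two offspring restart
                stack.append((x, t - tau))   # at the branching point
        total += prod
    return total / n_samples
```

Note that the scheme is purely forward: no space grid and no backward induction in time is needed, which is the source of the efficiency discussed in [3, Remark 2.10(ii)] when the polynomial coefficients are small.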
Importantly, the algorithm proposed in [3] requires the truncation of the approximation of the Y-component at some given time steps. The reason is that BSDEs with polynomial drivers may only be defined up to an explosion time. This truncation is based on a priori estimates of the true solution. It ensures the well-posedness of the algorithm on an arbitrary time horizon, its stability, and its global convergence.
In the case where the driver also depends on the Z component of the BSDE, a similar truncation has to be performed on the gradient itself. It can however not be done by simply projecting Z on a suitable compact set at certain time steps, since Z only matters up to an equivalence class on ([0, T] × Ω, dt × dP).
Alternatively, we propose to use a face-lift procedure at certain time steps, see (2.10). Again, these time steps depend on the explosion times of the corresponding BSDEs with polynomial drivers. Note that a similar face-lift procedure is used in Chassagneux, Elie and Kharroubi [6] in the context of the discrete time approximation of BSDEs with a constraint on the Z-component.
We prove the convergence of the scheme. The very good performance of this approach is illustrated in Section 4 by a numerical test case.
Notations: Throughout this paper, we view elements of R^d, d ≥ 1, as column vectors. Transposition is denoted by the superscript ⊤. We consider a complete filtered probability space (Ω, F, F = (F_t)_{t≤T}, P) supporting a d-dimensional Brownian motion W. We simply write E_t[·] for E[·|F_t], t ≤ T. We use the standard notations S² (resp. L²) for the class of progressively measurable processes ξ such that E[sup_{t≤T} |ξ_t|²] (resp. E[∫_0^T |ξ_t|² dt]) is finite. The dimension of the process ξ is given by the context. For a map (t, x) → ψ(t, x), we denote by ∂_t ψ its derivative with respect to its first variable, and by Dψ and D²ψ its Jacobian and Hessian matrix with respect to its second variable.

Approximation of BSDE using local polynomial drivers and the Picard iteration
For the rest of this paper, let us consider the (decoupled) forward-backward system (1.1)-(1.2), in which f and g are both bounded and Lipschitz continuous, and σ is non-degenerate, in the sense that there is a constant a_0 > 0 such that σσ^⊤ ≥ a_0 I_d. We also assume that µ, σ, Dµ and Dσ are all bounded and continuous.

Remark 2.1. The above assumptions can be relaxed by using standard localization or mollification arguments. For instance, one could simply assume that g has polynomial growth and is locally Lipschitz. In this case, it can be truncated outside a compact set so as to reduce to the above setting. Then, standard estimates and stability results for SDEs and BSDEs can be used to control the additional error in a very standard way. See e.g. [7].

Local polynomial approximation of the generator
As in [3], our first step is to approximate the driver f by a driver f• that has a local polynomial structure. The difference is that it now depends on both components of the solution of the BSDE. Namely, f• is a sum of local polynomials in which the functions (b_q)_{0≤q≤q•} and (c_{j,ℓ}, φ_j)_{ℓ∈L, 1≤j≤j•} (with j• ∈ N) are continuous and satisfy suitable bounds, for all 1 ≤ j ≤ j•, 0 ≤ q ≤ q• and ℓ ∈ L, for some constants C_L, L_φ ≥ 0. For a good choice of the local polynomial f•, we can assume that (x, y, z) → f̄•(x, y, z) := f•(x, y, z, y, z) is globally bounded and Lipschitz. Then, the BSDE (2.4) has a unique solution (Ȳ, Z̄) ∈ S² × L², and standard estimates imply that (Ȳ, Z̄) provides a good approximation of (Y, Z) whenever f̄• is a good approximation of f, up to a constant C• > 0 that depends on the global Lipschitz constant of f̄• (but not on the precise expression of f̄•), see e.g. [7]. One can think of the (c_{j,ℓ})_{ℓ∈L} as the coefficients of a polynomial approximation of f in terms of (y, (b_q(x)^⊤ z)_{q≤q•}) on a subset A_j, the A_j's forming a partition of [−M, M]^{1+d}. Then the φ_j's have to be considered as smoothing kernels that allow one to pass in a Lipschitz way from one element of the partition to another. The choice of the basis functions (b_q)_{q≤q•} as well as of the (φ_j)_{1≤j≤j•} will obviously depend on the application, but in practice they should typically be constructed so that the sets A_j are large and the intersections between the supports of the φ_j's are small. See [3, Remark 2.10(ii)] and below. Finally, since the function f̄• is chosen to be globally bounded and Lipschitz, by possibly adjusting the constant M, we can assume without loss of generality that (2.6) holds. For later use, let us recall that Ȳ is related to the unique bounded and continuous viscosity solution ū of the corresponding semilinear PDE. Moreover, ū is bounded by M and M-Lipschitz.
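The role of the smoothing kernels φ_j can be illustrated by a minimal one-dimensional sketch. Here we assume, purely for illustration, piecewise-linear "hat" kernels attached to a knot grid: they form a partition of unity, each is supported on two adjacent cells only, and gluing local coefficients with them yields a Lipschitz global approximation, as required of f•.

```python
def hat(y, knots, j):
    """Hat kernel phi_j: equals 1 at knots[j], 0 at the neighbouring knots,
    linear in between, 0 elsewhere.  The phi_j sum to 1 on [knots[0], knots[-1]]."""
    m = knots[j]
    left = knots[j - 1] if j > 0 else m
    right = knots[j + 1] if j < len(knots) - 1 else m
    if left < y < m:
        return (y - left) / (m - left)
    if m < y < right:
        return (right - y) / (right - m)
    return 1.0 if y == m else 0.0

def local_poly(f, knots, y):
    """Glue local values f(knots[j]) into a Lipschitz approximation
    f_circ(y) = sum_j f(knots[j]) * phi_j(y)."""
    return sum(f(k) * hat(y, knots, j) for j, k in enumerate(knots))
```

Since the hats sum to one and only two of them overlap at any point, the resulting approximation is exact for affine functions and Lipschitz with a constant controlled by the local coefficients, mirroring the requirement that f̄• be globally Lipschitz.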

Picard iteration with truncation and face-lifting
Our next step is to introduce a Picard iteration scheme to approximate the solution Ȳ of (2.4), so as to be able to apply the branching process based forward Monte-Carlo approach of [12,13,14] to each iteration: given (Ȳ^{m−1}, Z̄^{m−1}), use the representation of the BSDE with driver f•(X, ·, ·, Ȳ^{m−1}, Z̄^{m−1}).
For fixed (x, y′, z′), the map (y, z) → f•(x, y, z, y′, z′) is a polynomial, and hence only locally Lipschitz in general. In order to reduce to a Lipschitz driver, we need to truncate the solution at certain time steps that are smaller than the natural explosion time of the corresponding BSDE with (stochastic) polynomial driver. As in [3], this can be performed by a simple truncation at the level of the first component of the solution. As for the second component, which should be interpreted as a gradient, a simple truncation does not make sense: the gradient needs to be modified by modifying the function itself. Moreover, from the BSDE viewpoint, Z is only defined up to an equivalence class on ([0, T] × Ω, dt × dP), so that changing its value at a finite number of given times does not change the corresponding BSDE. We instead use a face-lifting procedure, as in [6].
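A deterministic toy version of this Picard scheme with truncation can be sketched as follows, for the scalar backward equation y(t) = g + ∫_t^T f(y(s)) ds: the driver is frozen at the previous iterate, the integral is computed backwards in time, and each iterate is optionally clamped at an a priori bound, mimicking the truncation of the Y component. The parameters `n_steps`, `n_iter` and `bound` are hypothetical names introduced for this sketch.

```python
def picard_ode(f, g, T, n_steps=200, n_iter=8, bound=None):
    """Picard iteration for the deterministic analogue of the BSDE,
        y(t) = g + int_t^T f(y(s)) ds,
    freezing the driver at the previous iterate.  `bound` (illustrative)
    truncates each iterate at +/- bound, mimicking the truncation based on
    a priori estimates of the true solution."""
    dt = T / n_steps
    y = [float(g)] * (n_steps + 1)            # y^0 == terminal condition
    for _ in range(n_iter):
        new = [0.0] * (n_steps + 1)
        new[n_steps] = float(g)
        for i in range(n_steps - 1, -1, -1):  # integrate backwards in time
            new[i] = new[i + 1] + f(y[i]) * dt  # driver frozen at y^{m-1}
        if bound is not None:                 # truncation step
            new = [max(-bound, min(bound, v)) for v in new]
        y = new
    return y[0]
```

With f(y) = y² and no truncation the iterates can blow up for large T, which is exactly why the truncation (and, for the gradient, the face-lifting below in the text) is needed on long horizons.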
More precisely, let us define the operator R on the space of bounded functions ψ : R^d → R by

Rψ := sup_{p∈R^d} [ψ(· + p) − δ(p)], with δ(p) := M|p|.

This operation maps ψ into the smallest M-Lipschitz function above ψ. This is the so-called face-lifting procedure, which has to be understood as a projection on the set of M-Lipschitz functions. We then fix h ∈ (0, h•] such that N_h := T/h ∈ N, and define the time grid T := {ih, i = 0, . . . , N_h}. Our algorithm consists in using a Picard iteration scheme to solve (2.4), in which we re-localize the solution at each time step of the grid T by applying the operator R. Namely, we use the notation X^{t,x} to denote the solution of (1.2) starting from x at time t. The existence and uniqueness of the solution (Y^{x,m}, Z^{x,m}) to (2.9) is ensured by Proposition A.1. The projection operation in (2.10) is consistent with the behavior of the solution of (2.4), recall (2.7), and it is crucial to control the explosion of (Ȳ^m, Z̄^m) and therefore to ensure both the stability and the convergence of the scheme. This procedure is non-expansive, as explained in the following remark, and therefore cannot alter the convergence of the scheme.
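On a one-dimensional grid, the face-lift sup_p [ψ(· + p) − δ(p)] can be computed directly as a pairwise sup-convolution. The sketch below assumes the cone penalty δ(p) = M|p|, which produces the smallest M-Lipschitz function dominating ψ (restricted to the grid); it is a minimal illustration, not the implementation used in the paper.

```python
import numpy as np

def facelift(values, grid, M):
    """Discrete face-lift on a 1-D grid:
        (R psi)(x_i) = max_j [ psi(x_j) - M * |x_i - x_j| ],
    the smallest M-Lipschitz function above psi on the grid."""
    x = np.asarray(grid, dtype=float)
    v = np.asarray(values, dtype=float)
    # pairwise sup-convolution with the cone penalty M |x - y|
    return np.max(v[None, :] - M * np.abs(x[:, None] - x[None, :]), axis=1)
```

Two properties used in the text are immediate from this formula: R is a majorant projection (taking p = 0 shows Rψ ≥ ψ, with equality when ψ is already M-Lipschitz), and it is non-expansive in the sup norm.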
Remark 2.2. Let ψ and ψ′ be two measurable and bounded maps on R^d. Then the operator R is non-expansive, in the sense that sup_{R^d} |Rψ − Rψ′| ≤ sup_{R^d} |ψ − ψ′|. Moreover, Rψ = ψ whenever ψ is M-Lipschitz.
Recall (2.6) and the definition of f̄• in terms of f•. This means that we do not need to be very precise on the original a priori estimates, whenever the sets A_j can be chosen to be large.
Moreover, for any constant ρ ∈ (0, 1), there is some constant C_ρ > 0 such that the corresponding estimate holds. As a consequence, (Ȳ^m, Z̄^m) is also uniquely defined and satisfies |Ȳ^m| ∨ |Z̄^m σ(X)^{−1}| ≤ M_{h•}. Using again Proposition A.1, one obtains the existence of (ū^m, v̄^m) satisfying the condition in the statement. ii) We next prove the convergence of the sequence (Ȳ^m, Z̄^m)_{m≥0} to (Ȳ, Z̄). Since {(Ȳ^{x,m}, Z̄^{x,m}), x ∈ R^d} is uniformly bounded, the generator f•(x, y, z, y′, z′) in (2.9) can be considered to be uniformly Lipschitz in (y, z) and in (y′, z′). We denote the corresponding Lipschitz constants by L_1 and L_2.

Recalling Remark 2.2, this shows that
Assume now that (2.15) holds true for i + 1 ≤ N_h and some given C_{i+1} > 0.
Recall that ρ_i ≥ ρ_{N_h}. Applying (2.13) with the above choice of λ_1, λ_2 and β, we obtain an estimate which, by (2.14) and the fact that ρ_i < 1, implies the claim at rank i. Let us further choose λ_2 > 0 such that L_2/λ_2 ≤ ρ_{N_h}, and recall that ρ_i ≥ ρ_{N_h}. Then, using again (2.13), (2.14) and (2.15) applied to i + 1, we obtain the required bound, so that the induction step follows from Remark 2.2. This concludes the proof.
Given the above, we construct particles X^{(k)} that have the dynamics (1.2) up to a killing time T_k, at which they split into ξ_k := ξ_{k,0} + · · · + ξ_{k,q•} different (conditionally) independent particles with dynamics (1.2) up to their own killing times. The construction is done as follows. First, we set T_{(1)} := δ_1 and, given k = (1, k_2, . . . , k_n) ∈ K_n with n ≥ 2, we let T_k := δ_k + T_{k−}, in which k− := (1, k_2, . . . , k_{n−1}) ∈ K_{n−1}. We can then define the Brownian particles (W^{(k)})_{k∈K} by induction over the generations n ≥ 1. Now observe that the solution X^x of (1.2) on [0, T] with initial condition X^x_0 = x ∈ R^d can be identified in law on the canonical space with a process of the form Φ[x](·, W), in which the deterministic map (x, s, ω) → Φ[x](s, ω) is B(R^d) ⊗ P-measurable, where P is the predictable σ-field on [0, T] × Ω. We then define the corresponding particles (X^{x,(k)})_{k∈K} by X^{x,(k)} := Φ[x](·, W^{(k)}). Moreover, we define the d × d-dimensional matrix valued tangent process ∇X^x, where I_d denotes the d × d identity matrix and σ_i denotes the i-th column of the matrix σ.
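The labelled genealogy just described can be sketched as follows. This is a simplified illustration: each particle lives an Exp(intensity) lifetime δ_k (the intensity 0.4 used later in the numerical section is taken as default), splits into a fixed number of children labelled by appending an index to its label, and particles still alive at T are absorbed at the horizon. The fixed offspring number and the child-index convention are simplifying assumptions for this sketch.

```python
import random

def sample_tree(T, intensity=0.4, offspring=2, rng=None):
    """Sample the label set of a branching process on [0, T]: the root is
    (1,); a particle k that is killed at T_k < T splits into `offspring`
    children, whose birth time is the parent's death time T_k = delta_k + T_{k-}.
    Returns a dict {label: (birth_time, death_time)}, death capped at T."""
    rng = rng or random.Random(7)
    tree = {}
    stack = [((1,), 0.0)]
    while stack:
        label, birth = stack.pop()
        death = birth + rng.expovariate(intensity)   # lifetime delta_k
        if death >= T:
            tree[label] = (birth, T)                 # alive at the horizon
        else:
            tree[label] = (birth, death)
            for i in range(offspring):               # children k + (i,)
                stack.append((label + (i,), death))
    return tree
```

Each particle in the returned dictionary would then be equipped with its own Brownian increment W^{(k)} on its lifetime, and the processes X^{x,(k)} obtained by applying Φ[x] along the concatenated ancestral path.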
The next proposition shows that (ũ^m, ṽ^m) actually coincides with (ū^m, v̄^m) in (2.12), a result that follows essentially from [13]. Nevertheless, to be more precise on the square integrability of U^m_{t,x} and V^m_{t,x}, we will fix a special density function ρ as well as probability weights (p_ℓ)_{ℓ∈L}.

Numerical example
This section is dedicated to a simple example in dimension one showing the efficiency of the proposed methodology. We use Method A of [3, Section 3] together with the normalization technique described in [20] and a finite difference scheme to compute ṽ^m. In particular, this normalization technique allows us to take ρ as an exponential density rather than the one in Proposition 3.1. See Remark 3.2.
We consider the SDE (1.2) in dimension one. The maturity is T = 1, and the nonlinearity in (1.1), together with the terminal condition g(x) = (1 + cos(x))/2, is chosen so that an analytic solution is available. We use the algorithm to compute an estimation of u(0, ·) on X := [−1, 1].
To construct our local polynomial approximation of f̄, we use a linear spline interpolation in each direction, obtained by tensorization, leading to a local approximation on the basis {1, y, z, yz} on each mesh of the domain [0, 1] × [−1, 1]. Figure 1 displays the function f̄ and the error obtained with a discretization of 20 × 20 meshes.
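The tensorized linear spline can be sketched as a plain bilinear interpolation: on each rectangular mesh the interpolant is exactly a local polynomial on the basis {1, y, z, yz}, determined by the four corner values. The helper below is a minimal self-contained version of this construction, not the StOpt-based implementation used for the experiments.

```python
import bisect

def bilinear(f, ygrid, zgrid, y, z):
    """Tensorised linear-spline approximation of f(y, z): on each rectangular
    cell the interpolant is a local polynomial on the basis {1, y, z, y*z},
    matching f at the four corners of the cell containing (y, z)."""
    iy = min(max(bisect.bisect_right(ygrid, y) - 1, 0), len(ygrid) - 2)
    iz = min(max(bisect.bisect_right(zgrid, z) - 1, 0), len(zgrid) - 2)
    y0, y1 = ygrid[iy], ygrid[iy + 1]
    z0, z1 = zgrid[iz], zgrid[iz + 1]
    ty, tz = (y - y0) / (y1 - y0), (z - z0) / (z1 - z0)
    return ((1 - ty) * (1 - tz) * f(y0, z0) + ty * (1 - tz) * f(y1, z0)
            + (1 - ty) * tz * f(y0, z1) + ty * tz * f(y1, z1))
```

By construction, any driver of the form a + by + cz + dyz is reproduced exactly on each cell, so the interpolation error only comes from the curvature of f̄ within a mesh, which explains why refining (n_y, n_z) reduces the error in Figure 1.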

Figure 1: The driver f̄ and the error on the driver due to the linear spline representation.

The parameters affecting the convergence of the algorithm are:
• The pair (n_y, n_z) used in the spline representation of (4.1), where n_y (resp. n_z) is the number of linear spline meshes for the y (resp. z) discretization.
• The number of time steps N h .
• The grid and the interpolation used on X at t_0 = 0 and for all dates t_i, i = 1, . . . , N_h. Note that the size of the grid has to be adapted to the value of T, because of the diffusive feature of (1.2). All interpolations are achieved with the StOpt library (see [9,10]) using a modified quadratic interpolator as in [19]. In the following, ∆x denotes the mesh size of the space discretization.
• The time step is set to 0.002 and we use an Euler scheme to approximate (1.2).
• The accuracy of the estimation of the expectations appearing in our algorithm. We compute the empirical standard deviation θ associated to each Monte Carlo estimation of the expectation in (3.2). We try to fix the number M̄ of samples such that θ/√M̄ does not exceed a certain level, fixed at 0.000125, at each point of our grid. We cap this number of simulations at 5 × 10^5.
• The intensity, set to 0.4, of the exponential distribution used to define the random variables (δ_k)_{k∈K}.
Finally, we take M = 1 in the definition of R.
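The Euler scheme mentioned above for approximating (1.2) can be sketched as follows in dimension one, with the time step 0.002 from the parameter list as the default; the coefficient functions `mu` and `sigma` are placeholders to be replaced by the coefficients of the example.

```python
import math
import random

def euler_path(x0, mu, sigma, T, dt=0.002, rng=None):
    """Euler scheme for the 1-D SDE dX = mu(X) dt + sigma(X) dW:
    X_{t+dt} ~ X_t + mu(X_t) dt + sigma(X_t) sqrt(dt) N(0, 1)."""
    rng = rng or random.Random(0)
    x = float(x0)
    for _ in range(int(round(T / dt))):
        x += mu(x) * dt + sigma(x) * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x
```

The scheme has strong order 1/2 in general (order 1 for additive noise), so with dt = 0.002 the time-discretization error is negligible compared with the spline and Monte-Carlo errors discussed above.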
On the different figures below, we plot the errors obtained on X for different values of N_h, (n_y, n_z) and ∆x. We first use 20 time steps and an interpolation step of 0.1. In Figure 2, we display the error as a function of the number of spline meshes. We provide two plots:
• on the left-hand side, n_y varies above 5 and n_z varies above 10;
• on the right-hand side, we add (n_y, n_z) = (5, 5). This leads to a maximal error of 0.11, showing that a quite good accuracy in the spline representation in z is necessary.
In Figure 4, we finally let the number of time steps N_h vary. Once again we give two plots:
• one with N_h above or equal to 20;
• one with small values of N_h.
The results clearly show that the algorithm produces bad results when N_h is too small: the time steps are too large for the branching method, which then exhibits a large variance. When N_h is too large, interpolation errors propagate, also leading to a deterioration of our estimations. Numerically, it can be checked that the face-lifting procedure is in most cases not needed when only one Picard iteration is used:
• when the variance of the branching scheme is small, the face-lifting and truncation procedure has no effect;
• when the variance becomes too large, the face-lifting procedure regularizes the solution, which permits to reduce the error due to our interpolations.
In Figure 5, we provide the estimation with and without face-lifting, obtained with N_h = 10, (n_y, n_z) = (20, 10) and a space discretization ∆x = 0.1. For N_h = 20, the computational time is 320 seconds on a three-year-old laptop.
for t ≤ s ≤ T. Standard estimates then provide a bound with a constant C_{µ,σ} > 0 that only depends on ∥µ∥_∞, ∥σ∥_∞, ∥Dµ∥_∞, ∥Dσ∥_∞, the constant a_0 in (2.1) and T. In particular, it does not depend on X_0. Up to changing this constant, we may assume that the estimate below holds, and we let M_{h•} ≥ M and h• ∈ (0, T] be such that the corresponding conditions are satisfied.

A.2 Proof of the representation formula
We adapt the proof of [13, Theorem 3.12] to our context, proceeding by induction. Throughout this section, we fix = 4 and assume that the result of Proposition 3.1 holds up to rank m − 1 ≥ 0 on [0, T] (with the convention U^0 := y, V^0 := Dy), and up to rank m on [t_{i+1}, T]. In particular, we assume that ũ^m(t_{i+1}, ·) = ū^m(t_{i+1}, ·). We then define the relevant quantities in terms of ∇X^{t,x}, the tangent process of X^{t,x} with initial condition I_d at t.
We then set the corresponding quantities accordingly. Since F̄ is non-increasing and ũ^m(t_{i+1}, ·) = ū^m(t_{i+1}, ·) is bounded by M and M-Lipschitz, direct computations imply the required bound.

Proof. Let b̄ ∈ R^d be a fixed vector and set Q^{t,x} := ∇X^{t,x} b̄. Then it follows from direct computations that the corresponding estimate holds. Further, remember that each b_q is assumed to be bounded by 1, so that |b_q^⊤ σ^{−1}|² is uniformly bounded by λ̄ ∥(σσ^⊤)^{−1}∥. Then, direct computations lead to the announced bound.