NUMERICAL METHODS FOR STOCHASTIC DIFFERENTIAL EQUATIONS: TWO EXAMPLES

Abstract. The goal of this paper is to present a series of recent contributions arising in numerical probability. First, we present a contribution to a recently introduced problem: stochastic differential equations with constraints in law, investigated through various theoretical and numerical viewpoints. Such a problem may appear as an extension of the famous Skorokhod problem. Then we present a generic method to approximate in a weak sense the invariant distribution of an ergodic Feller process by a Langevin Monte Carlo simulation. It is an extension of a method originally developed for diffusions, based on the weighted empirical measure of an Euler scheme with decreasing step. Finally, we mention without details a recent development of a multilevel Langevin Monte Carlo simulation method for this type of problem.


Introduction
Monte Carlo simulation-based probabilistic numerical methods, which were initially developed after World War II to solve neutron simulation problems, received renewed interest in the 1990s with the emergence of tradable derivative products in finance (option pricing and hedging). This is largely due to the conceptual simplicity of the most popular of these methods, the Monte Carlo simulation method, and to the direct access to more and more powerful computing means via the development of microcomputers (PCs). Gradually, the analysis of these simulation-based numerical methods - not only the Monte Carlo method but also the Metropolis-Hastings algorithm, simulated annealing, particle methods, etc. - became a more autonomous field under the name of Numerical Probability.
In this paper, we would like to present three recent contributions on this topic or in close connection to it. In Section 2, P.-É. Chaudru de Raynal introduces a new problem in the field of diffusion processes by considering a constraint in law on the process, which can be seen in some sense as an extension of the Skorokhod problem. After a theoretical analysis, numerical simulation methods are proposed and analyzed. This contribution is based on a work with P. Briand, A. Guillin and C. Labart [2]. In Section 3, C. Rey and G. Pagès propose a simulation method to compute the invariant distribution of a (non-simulatable) Feller continuous Markov process when the increments of this process can be approximated by simulatable Markov kernels. This work appears as an extension of a series of papers devoted to the approximation of ergodic diffusion processes by an Euler scheme with decreasing steps ([10, 11, 13, 16, 19]). The proposed algorithm, based on the simulation of only one path of the approximating process, appears for this reason as a Langevin Monte Carlo simulation method (with a possibly non-constant diffusion parameter). An extended version of the presented results, including detailed proofs, is available in [18]. Section 4 briefly presents an application of the multilevel paradigm (see [8]), in its weighted version (see [14]), still for the recursive approximation of the invariant distribution in a Brownian diffusion framework. This is a joint work by G. Pagès and F. Panloup [17].
2. Stochastic differential equation with constraint in law: well-posedness, mean field limit and numerical scheme

We briefly describe stochastic differential equations constrained in law: an existence and uniqueness result, as well as propagation of chaos for the corresponding interacting particle system constrained in mean field. Based on these results, we propose a numerical algorithm to approximate the solution of such a system. The present exposition is based on a work done with P. Briand, A. Guillin and C. Labart [2]. Here, we will only sketch the proofs and give the main ideas of how the results are obtained, and we refer the reader to the aforementioned paper for details.

The system
Stochastic Differential Equations (SDEs for short) constrained in law have been introduced recently, in their backward form, by Briand, Elie and Hu in [3]. In that framework, the path is reflected in order to constrain its own law to live in a given set. This somehow extends the usual Skorokhod problem for SDEs (see e.g. the work of Tanaka [23] or Lions and Sznitman [15]). From a generic point of view, given T > 0, coefficients b, σ, h : R → R and a Brownian motion (B_s)_{s≥0} defined on some probability space (Ω, F, P), the issue consists in finding a couple (X, K), K being deterministic, satisfying (1) on [0, T], where ξ is a square integrable random variable independent of the Brownian motion and such that E[h(ξ)] ≥ 0. Since this kind of dynamics depends on its own law, such a system is called non-linear or of McKean-Vlasov type (see [22]).

Formal derivation of the equation (1) from a financial issue
Let X denote the value of some portfolio of assets. Under the classical Black & Scholes assumptions, the basket of assets included in the portfolio can be modeled as a drifted multidimensional geometric Brownian motion {S_t}_{t≥0}. Hence, denoting by {π_t}_{t≥0} the investment strategy on this basket of assets, the value of the portfolio at time t, denoted by X_t, is given by π_t · S_t, i.e.
when we assume the standard self-financing condition of the Black & Scholes theory. Therefore, assuming that π_t depends on X_t (which means that the strategy depends on the wealth), the dynamics of X_t is given (up to a change of variable) as the solution of the SDE (2). Suppose now that the owner of this portfolio wants to guarantee that the value of his investment cannot fall below a given threshold α < 0. The rule can be applied stricto sensu, but it can also be exceeded with a (small) probability β. In that case, the constraint on the portfolio is of the form P(X_t < α) ≤ β. Using the terminology of risk measures (see e.g. [1]), this means that the portfolio is constrained to live in acceptable sets of the form A = {X : E[h(X)] ≥ 0} for a given function h : x ∈ R ↦ u(x) − p ∈ R, where u is some utility function from R to R and p is a given level of risk, see (3). Obviously, in order to satisfy this constraint, the investor must have a way to add some cash to the portfolio throughout the investment period. Let us denote by K_t the amount of cash that has to be added to the portfolio at time t in order to satisfy (3) (this can also be seen as the "price to pay" to balance the risk associated to X_t). Equation (2) then becomes (4). Finally, we can assume that the investor does not want to add too much extra cash to the portfolio, in order to optimize his investment, which means that the amount K_t should be minimal. This translates, mathematically, into the so-called Skorokhod condition (5). Hence, putting together all our conditions ((3) and (5)) on the dynamics (4), we end up with a dynamics of the form (1) for the portfolio.

2.3. Heuristic derivation of the reflecting process
In the framework considered here, the explicit representation of the reflecting process K plays a central role. We hence give a heuristic derivation of such a representation. Let us denote by U the ("non-reflected part" of the) dynamics of the portfolio, defined in (6), so that X_t = U_t + K_t, and define for all t in [0, T] the mapping H_t : x ∈ R ↦ H_t(x) = E[h(x + U_t)]. Our problem consists in finding, at any time t, the minimal quantity of cash that has to be added to the portfolio X in order to ensure that the portfolio satisfies the risk constraint, i.e. we are looking for the solution of the equation H_t(x) = 0, that is x = (H_t^{-1}(0))^+ once we take into account that the amount of cash added cannot be negative, provided the mapping H_t is invertible for any t. Setting K_t as this solution at time t, we obtain that:
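Numerically, this inversion is straightforward as soon as H_t can be estimated by Monte Carlo, since h (and hence H_t) is increasing. The following sketch (our own illustration: the choice h(x) = x and the Gaussian stand-in for the law of U_t are assumptions, not those of [2]) computes (H_t^{-1}(0))^+ by bisection:

```python
import numpy as np

def reflection_level(h, u_samples, lo=-100.0, hi=100.0, tol=1e-10):
    """Solve H_t(x) = mean(h(x + U_t)) = 0 by bisection (H_t is
    increasing because h is), then return the positive part
    (H_t^{-1}(0))^+, i.e. the minimal cash injection."""
    def H(x):
        return np.mean(h(x + u_samples))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if H(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return max(0.5 * (lo + hi), 0.0)

# Illustrative data: h(x) = x and U_t ~ N(-1, 1), so that
# H_t(x) = x - 1 and the cash to inject is K_t = 1.
rng = np.random.default_rng(0)
u = rng.normal(-1.0, 1.0, size=1_000_000)
k = reflection_level(lambda x: x, u)
```

In practice U_t is itself only available through simulation, so the samples above would come from a discretization scheme of (6).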

Existence and uniqueness result
The existence and uniqueness result for (1) relies on a usual fixed-point argument, and hence on the Lipschitz property of the coefficients. Therefore, we suppose that the following assumptions hold:
(HC) The coefficients b and σ are Lipschitz continuous.
(Hh) The mapping h : R → R satisfies:
• h is an increasing function;
• there exist two positive constants m and M such that m(y − x) ≤ h(y) − h(x) ≤ M(y − x) for all x ≤ y (bi-Lipschitz property).
Under these assumptions we can prove the following result.
Theorem 2.1. Under (HC) and (Hh) the system (1) has a unique strong solution, i.e. there exists a unique pair (X, K) satisfying (1) such that K is a deterministic increasing process.
Sketch of proof. The proof heavily relies on the explicit representation of the process K heuristically derived in the previous paragraph. Let P(R) denote the space of probability measures on R and define the functional G_0 on P(R) accordingly. Hence, from the representation (7) of K we get that, for any t, where µ_s is the marginal at time s of U defined by (6). Then, the bi-Lipschitz property of the function h propagates to H, so that the mapping G_0 : P(R) ∋ µ ↦ G_0(µ) ∈ R is a Lipschitz function w.r.t. the Wasserstein-1 distance (recall that, for any probability measures ν and ν′ on R, the Wasserstein-1 distance between ν and ν′ is given by the supremum of ∫ f d(ν − ν′) over 1-Lipschitz functions f). The result then follows from a usual fixed-point procedure. One can then focus on the properties of the Stieltjes measure dK when h is a C_b^2(R, R) function. Namely, using Itô's formula, it can be shown that the following result holds.
Corollary 2.1. The Stieltjes measure dK is absolutely continuous w.r.t. the Lebesgue measure, with density. This in particular ensures that, in this smooth setting, the SDE constrained in law is a usual McKean-Vlasov SDE with singular law dependence, and this also emphasizes the importance of the non-negativity assumption on h (or, equivalently, of the bi-Lipschitz property of h).

Constrained in law SDE as mean field limit of interacting particles system
The analogy of this kind of system with usual McKean-Vlasov SDEs suggests considering the mean field limit of the corresponding particle system. Consider for instance the following particles constrained in mean field: One can also use the analogous representation for the process K^N: and show that this system is well posed using a fixed-point argument as well.
Then, thanks to the explicit representation (7) of K and to this representation of K^N, we can implement a coupling argument as proposed in [22] and prove that chaos propagates in (8) (the proof again relies on the Lipschitz properties of G_0), so that, denoting for any i in {1, . . . , N} by X̄^i the i.i.d. copies of the solution of (1) driven by the same Brownian motion as the i-th particle, we obtain that As suggested above, the proof is an adaptation to our framework of the coupling argument of [22]. Let us only focus on the two rates of convergence obtained. In the proof, the critical quantity which gives the rate is: where W is the Wasserstein-1 distance and where µ̄^N_s is the empirical measure generated by the "non-reflected part" of the (X̄^i)_{1≤i≤N}. We know from [21] and [7] that the rate of convergence of the empirical measure of i.i.d. random variables to their common law is of order N^{-1/2}. Moreover, this rate is, in full generality, the optimal one. In particular, in our framework, it is not damaged by the supremum in time. In the second result, we then use the smoothness of h and Itô's formula to benefit from the fact that the family of empirical measures (µ̄^N_s)_{0≤s≤T} is related to i.i.d. diffusion processes. This is how a better estimate is obtained.
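The N^{-1/2} rate for the Wasserstein-1 distance of empirical measures is easy to observe numerically. On the real line, the optimal coupling between two empirical measures with the same number of atoms is monotone, so W_1 reduces to sorting; the following sketch (our own illustration with standard Gaussian samples) exhibits the rate:

```python
import numpy as np

def w1_empirical(x, y):
    """Wasserstein-1 distance between two empirical measures on R with
    the same number of atoms: sort both samples and average the gaps
    (on the line the optimal coupling is the monotone one)."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

rng = np.random.default_rng(1)
dists = []
for n in (100, 10_000):
    # average over repetitions to smooth the estimate
    d = np.mean([w1_empirical(rng.normal(size=n), rng.normal(size=n))
                 for _ in range(50)])
    dists.append(d)
# multiplying n by 100 should divide W1 by roughly 10 (the N^{-1/2} rate)
ratio = dists[0] / dists[1]
```

The observed ratio is close to 10, consistent with the N^{-1/2} rate quoted from [21] and [7].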

On the numerical simulation of constrained in law SDE
As a direct application of that propagation of chaos result, one can design a numerical approximation of (1). The basic idea consists in using a classical Euler scheme for the numerical simulation of the particle system (8).
Let 0 = T_0 < T_1 < · · · < T_n = T be a subdivision of [0, T]. Given this subdivision, we denote by t̲ the discretization mapping t ↦ T_k for t ∈ [T_k, T_{k+1}). For simplicity, we consider only the case of regular subdivisions: for a given integer n, T_k = kT/n, k = 0, . . . , n. We introduce the following discrete version of the particle system X̄, with the notation: Then, our numerical algorithm works as follows.
Algorithm [A]: particle approximation of (8).
We obtain the following strong error bounds on the resulting approximation process.
Theorem 2.3. Suppose that assumptions (HC) and (Hh) are in force, and suppose moreover that there exists p > 4 such that E|ξ|^p < ∞. Then the algorithm [A] above leads to an approximation of order (1) (T/n)^{1/2} + N^{-1/2};
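To fix ideas, here is a minimal sketch in the spirit of Algorithm [A] (our own simplified rendition: the coefficients, the constraint function h and the bisection-based computation of the common increment of K are illustrative choices, not those of [2]):

```python
import numpy as np

def particle_scheme(b, sig, h, xi, T, n, rng):
    """Euler scheme for the particle system constrained in mean field.
    After each Euler step, the common increment dK is the smallest
    k >= 0 restoring the empirical constraint mean(h(X + k)) >= 0,
    found by bisection (h is increasing)."""
    N = len(xi)
    x = xi.copy()
    K = 0.0
    dt = T / n
    for _ in range(n):
        x = x + b(x) * dt + sig(x) * np.sqrt(dt) * rng.standard_normal(N)
        if np.mean(h(x)) < 0.0:          # constraint violated: inject cash
            lo, hi = 0.0, 1.0
            while np.mean(h(x + hi)) < 0.0:   # bracket the root
                hi *= 2.0
            for _ in range(60):               # bisection, keeps mean(h) >= 0 at hi
                mid = 0.5 * (lo + hi)
                if np.mean(h(x + mid)) < 0.0:
                    lo = mid
                else:
                    hi = mid
            x = x + hi
            K += hi
    return x, K

rng = np.random.default_rng(2)
# toy data: a downward drift makes the constraint E[h(X)] = E[X] + 1 >= 0 bind
x_T, K_T = particle_scheme(b=lambda x: -2.0 * np.ones_like(x),
                           sig=lambda x: 0.5 * np.ones_like(x),
                           h=lambda x: x + 1.0,
                           xi=rng.standard_normal(10_000),
                           T=1.0, n=100, rng=rng)
```

With this drift the unconstrained mean at T = 1 would be -2, while the constraint keeps it at -1, so the total injected cash K_T should be close to 1.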

Recursive computation of the invariant distributions of a Feller process
In this paper, we propose a method for the recursive computation of the invariant distribution (denoted ν) of a Feller process (X_t)_{t≥0}. The starting idea is to consider a non-homogeneous discrete-time Markov process which can be simulated using a family of transition kernels (Q_γ)_{γ>0} and which approximates (X_t)_{t≥0} in a sense made precise further on.
As suggested by the pointwise Birkhoff ergodic theorem, we then show that some sequence (ν_n)_{n∈N*} of random empirical measures converges a.s. weakly toward ν under appropriate mean-reverting and moment assumptions. An abstract framework is developed which, among other things, enables us to extend this convergence to the L^p-Wasserstein distance. For a given f, ν_n(f) can be defined recursively, making its computation straightforward.

Convergence to invariant distributions -A general approach
In this section, we build a sequence of empirical measures from an approximation (X̄^γ_{Γ_n})_{n∈N} of a Feller process (X_t)_{t≥0} (neither being specified explicitly), where Γ_n = γ_1 + · · · + γ_n is the time grid and the step sequence (γ_n)_{n∈N*} tends to 0 as n → +∞. We show how this sequence converges a.s. weakly to the set V of the invariant distributions of (X_t)_{t≥0}. To this end, we will provide mean-reverting assumptions, as weak as possible, on the pseudo-generator of (X̄^γ_{Γ_n})_{n∈N} on the one hand, and appropriate rate conditions on the step sequence (γ_n)_{n∈N*} on the other hand.

Construction of the random measures
Let (Ω, G, P) be a probability space. We consider a Feller process (X_t)_{t≥0} (see [6] for details) on (Ω, G, P) taking values in a locally compact and separable metric space E. We denote by (P_t)_{t≥0} the Feller semigroup (see [20]) of this process. We recall that (P_t)_{t≥0} is a family of linear operators from C_0(E) to itself such that P_0 f = f, P_{t+s} f = P_t P_s f for t, s ≥ 0 (semigroup property) and lim_{t→0} ‖P_t f − f‖_∞ = 0 (Feller property). Using this semigroup, we can introduce the infinitesimal generator of (X_t)_{t≥0} as the linear operator A defined on a subspace D(A) of C_0(E) as follows: for every f ∈ D(A), the limit Af := lim_{t→0} (P_t f − f)/t exists for the ‖·‖_∞-norm. The operator A : D(A) → C_0(E) is thus well defined and D(A) is called the domain of A. From the Echeverría-Weiss theorem (see Theorem 9.17 in [5]), the set of invariant distributions of (X_t)_{t≥0} can be characterized in the following way: ν ∈ V if and only if ν(Af) = 0 for every f ∈ D(A). The starting point of our reasoning is thus to consider an approximation of A. First, we introduce the family of transition kernels (Q_γ)_{γ>0}, acting from C_0(E) to itself. Now, let us define the family of linear operators Ã := (Ã_γ)_{γ>0} from C_0(E) into itself by Ã_γ f := (Q_γ f − f)/γ. The family Ã is usually called the pseudo-generator of the transition kernels (Q_γ)_{γ>0} and is an approximation of A as γ tends to zero. From a practical viewpoint, the main interest of our approach is that we may consider that there exists γ̄ > 0 such that, for every x ∈ E and every γ ∈ (0, γ̄], Q_γ(x, dy) can be simulated at a reasonable computational cost. We use the family (Q_γ)_{γ>0} to build (X̄_{Γ_n})_{n∈N} (this notation replaces (X̄^γ_{Γ_n})_{n∈N}). Its transition probability distributions are given by Q_{γ_n}(x, dy), n ∈ N*, x ∈ E, i.e. P(X̄_{Γ_{n+1}} ∈ dy | X̄_{Γ_n} = x) = Q_{γ_{n+1}}(x, dy). We can canonically extend (X̄_{Γ_n})_{n∈N} into a càdlàg process by setting X̄(t, ω) = X̄_{Γ_{n(t)}}(ω) with n(t) = inf{n ∈ N, Γ_{n+1} > t}.
Then (X̄_{Γ_n})_{n∈N} is a simulable (as soon as X̄_0 is) non-homogeneous Markov chain with transitions
∀m ≤ n, P_{Γ_m,Γ_n}(x, dy) = Q_{γ_{m+1}} • · · · • Q_{γ_n}(x, dy),
and law L(X̄_{Γ_n} | X̄_0 = x) = P_{Γ_n}(x, dy) = Q_{γ_1} • · · · • Q_{γ_n}(x, dy).
We use (X̄_{Γ_n})_{n∈N} to design a Langevin Monte Carlo algorithm. Notice that this approach is generic since the approximating transition kernels (Q_γ)_{γ>0} are not explicitly specified, so it can be used in many different configurations including, among others, weak numerical schemes or exact simulation, i.e. (X̄_{Γ_n})_{n∈N} = (X_{Γ_n})_{n∈N}. In particular, using high weak order schemes for (X_t)_{t≥0} may lead to higher rates of convergence for the empirical measures. The approach we use to build the empirical measures is also quite general since we consider general weights which are not necessarily equal to the time steps. This generalization is motivated by technical reasons related to the analysis of the (weak) convergence rate, not investigated in this note (see e.g. [10]). We now define this weight sequence: let η := (η_n)_{n∈N*} be such that η_n ≥ 0 for every n ∈ N* and lim_{n→+∞} H_n = +∞, with H_n := η_1 + · · · + η_n. Now we present our algorithm, adapted from the one introduced in [10], which was designed with an Euler scheme with decreasing steps (X̄_{Γ_n})_{n∈N} of a Brownian diffusion process (X_t)_{t≥0}. For x ∈ E, let δ_x denote the Dirac mass at point x. For every n ∈ N*, we define the random weighted empirical measures as follows: This paper is dedicated to showing that, a.s., every weak limiting distribution of (ν^η_n)_{n∈N*} belongs to V. In particular, when the invariant measure of (X_t)_{t≥0} is unique, i.e. V = {ν}, we show that P-a.s. lim_{n→+∞} ν^η_n f = νf for a generic class of continuous test functions f. The approach we develop can be decomposed into two steps. First, we establish a tightness property to obtain the existence of at least one weak limiting distribution of (ν^η_n)_{n∈N*}. Then, in a second step, we identify each of these limiting distributions with an invariant distribution of the Feller process (X_t)_{t≥0}, exploiting the Echeverría-Weiss theorem (see [5], Theorem 9.17).
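The recursive computability of ν^η_n(f) mentioned above can be sketched on a toy model of our choosing (an Ornstein-Uhlenbeck process, whose Euler kernel Q_γ is exactly simulatable up to a Gaussian increment; all numerical choices below are illustrative assumptions):

```python
import numpy as np

# Langevin-type recursion: a single path of an Euler scheme with
# decreasing steps gamma_n, with nu_n(f) updated online.
# Toy model: dX = -X dt + sqrt(2) dW, invariant distribution N(0, 1).
rng = np.random.default_rng(3)
n_iter = 200_000
x = 0.0
H = 0.0            # cumulated weights H_n = eta_1 + ... + eta_n
nu_f = 0.0         # running value of nu_n(f) for f(x) = x**2
for n in range(1, n_iter + 1):
    gamma = n ** (-1.0 / 3.0)       # decreasing step
    eta = gamma                      # here the weights equal the steps
    x = x - x * gamma + np.sqrt(2.0 * gamma) * rng.standard_normal()
    H += eta
    # nu_n(f) = nu_{n-1}(f) + (eta_n / H_n) * (f(X_n) - nu_{n-1}(f))
    nu_f += (eta / H) * (x * x - nu_f)
# nu_f should approach the invariant second moment, here 1.0
```

The one-line update of nu_f is exactly the recursive definition evoked above: no path storage is needed, which is the practical appeal of the Langevin Monte Carlo approach.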

Assumptions on the random measures
In this part, we present the assumptions on the pseudo-generator Ã = (Ã_γ)_{γ>0} needed to prove the convergence of the empirical measures (ν^η_n)_{n∈N*}.
Recursive control. In our framework, we introduce a well-suited assumption, referred to as the mean-reverting recursive control of the pseudo-generator Ã, that leads to a tightness property on (ν^η_n)_{n∈N*}, from which follows the existence (in the weak sense) of a limiting distribution for (ν^η_n)_{n∈N*}. A supplementary interest of our approach is that it is designed to obtain the a.s. convergence of (ν^η_n(f))_{n∈N*} for a generic class of continuous test functions f which is larger than C_b(E). To do so, we introduce a Lyapunov function V related to (X̄_{Γ_n})_{n∈N}. Assume that V is a Borel function such that We now relate V to (X̄_{Γ_n})_{n∈N} through its mean-reverting Lyapunov property. Let ψ, φ : [v*, ∞) → (0, +∞) be Borel functions such that Ã_γ ψ ∘ V exists for every γ ∈ (0, γ̄]. Let α > 0 and β ∈ R. We assume Lyapunov functions are usually used to show the existence, and sometimes the uniqueness, of the invariant measure of Feller processes. We refer to the extensive literature on the topic for more details: see for instance [9], [5] or [16]. Notice that the condition RC_{Q,V}(I_d, φ, α, β)(i) with φ concave appears in [4] to prove sub-geometric ergodicity of Markov chains. In [12], a hypothesis similar to RC_{Q,V}(I_d, φ, α, β)(i), with φ not necessarily concave, is also used (with Ã_{γ_n} replaced by A) to study the convergence of the weighted empirical measures (14) for the Euler scheme of a Brownian diffusion. The function φ controls the mean-reverting property. In particular, we speak of a strongly mean-reverting property when φ = I_d, and of a weakly mean-reverting property when lim_{y→+∞} φ(y)/y = 0, for instance φ(y) = y^a, a ∈ (0, 1), for every y ∈ [v*, ∞).
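The recursive control can be checked in closed form on a toy example (our own choice, not from the text): for the Euler kernel of the Ornstein-Uhlenbeck process dX_t = -X_t dt + √2 dW_t and V(x) = 1 + x², the pseudo-generator Ã_γ V is explicit and satisfies a strongly mean-reverting bound Ã_γ V ≤ β − αV for every γ ≤ γ̄ < 2:

```python
import numpy as np

# For the Euler kernel of dX = -X dt + sqrt(2) dW and V(x) = 1 + x**2,
# Q_gamma V(x) = 1 + (1 - gamma)**2 * x**2 + 2 * gamma exactly, hence
# A_gamma V(x) = (gamma - 2) * x**2 + 2 <= beta - alpha * V(x)
# with alpha = 2 - gamma_bar and beta = 2 + alpha, for gamma <= gamma_bar.
def pseudo_generator_V(x, gamma):
    """(Q_gamma V(x) - V(x)) / gamma, computed in closed form."""
    return ((1.0 - gamma) ** 2 - 1.0) * x * x / gamma + 2.0

gamma_bar = 0.5
alpha = 2.0 - gamma_bar
beta = 2.0 + alpha
xs = np.linspace(-10.0, 10.0, 201)
# verify the recursive control on a grid of states and several steps
ok = all(pseudo_generator_V(x, g) <= beta - alpha * (1.0 + x * x) + 1e-12
         for x in xs for g in (0.01, 0.1, gamma_bar))
```

This corresponds to the strongly mean-reverting case ψ = I_d, φ = I_d of the abstract assumption.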
The function ψ is closely related to the identification of the set of test functions f for which lim_{n→+∞} ν^η_n(f) = ν(f) a.s., when ν is the unique invariant distribution of the underlying Feller process. To this end, for s ≥ 1, which is related to the step weight assumption, we introduce the sets of test functions for which we will show the a.s. convergence of the weighted empirical measures (14): with Ṽ_{ψ,φ,s} : Notice that our approach benefits from providing generic results not only because we consider general Feller processes and approximations, but also because the functions φ and ψ are not specified explicitly.
Remark 3.1. The above assumption is devised to be checked on the time discretization schemes of a Feller process that will be used for practical simulation, typically schemes of Euler type with decreasing steps. However, it always corresponds to an "underlying" - and more intuitive - mean-reverting assumption satisfied by the Feller process itself, which may appear somewhat hidden by this formulation. In fact, in seminal studies such as [10] (see also [13], [19] for related works), the results are established (for the Euler scheme of a diffusion process, possibly with jumps or locally Lipschitz coefficients) under the natural mean-reverting assumption (18). Its version with the pseudo-generator Ã_{γ_n} corresponds to Assumption RC_{Q,V}(ψ, φ, α, β) (see (16)) when ψ = I_d and φ(y) = |y|^a. In fact, under the infinitesimal generator approximation assumption (19) below and suitable hypotheses on the regularity of (X_t)_{t≥0} (related to the square integrability of its increments for the empirical measures), (16) follows from (18), where ψ is a polynomial or an exponential function and φ(y) = y^a, a ∈ (0, 1]. This connection is detailed in [18].
Infinitesimal generator approximation. This section presents the assumption that enables us to characterize the limiting distributions of the a.s. tight sequence (ν^η_n(dx, ω))_{n∈N*}. We aim to estimate the distance between V and ν^η_n (see (14)) for n large enough. We thus introduce a hypothesis concerning the distance between (Ã_γ)_{γ>0}, the pseudo-generator of (Q_γ)_{γ>0}, and A, the infinitesimal generator of (P_t)_{t≥0}. We assume that there exists D(A)_0 ⊂ D(A), with D(A)_0 dense in C_0(E), such that: where Λ_f : E × R_+ → R_+ can be represented in the following way. Let (Ω̃, G̃, P̃) be a probability space.
Let g : E → R^q_+, q ∈ N, be a locally bounded Borel measurable function and let Λ̃_f : (E × R_+ × Ω̃, B(E) ⊗ B(R_+) ⊗ G̃) → R^q_+ be a measurable function such that sup_{i∈{1,...,q}} Ẽ[sup_{x∈E} sup_{γ∈(0,γ̄]} Λ̃_{f,i}(x, γ, ω̃)] < +∞ and Moreover, we assume that, for every i ∈ {1, . . . , q}, sup_{n∈N*} ν^η_n(g_i, ω) < +∞, P(dω)-a.s., and that Λ̃_{f,i} satisfies one of the following two properties: There exists a measurable function γ̃ : (Ω̃, G̃) → ((0, γ̄], B((0, γ̄])) such that: with K_E the set of compact subsets of E.
Remark 3.2. Let (F, F, λ) be a measurable space. Using the exact same approach, the results we obtain still hold when we replace the probability space (Ω̃, G̃, P̃) by the product measurable space (Ω̃ × F, G̃ ⊗ F, P̃ ⊗ λ) in the above representations of Λ_f, but we restrict ourselves to this case for the sake of clarity. This observation can be useful when studying jump processes, where λ stands for the jump intensity.
This representation assumption benefits from the fact that the transition functions (Q_γ(x, dy))_{γ∈(0,γ̄]}, x ∈ E, can be represented using the distributions of the random variables involved in the computation of (X̄_{Γ_n})_{n∈N*}. In particular, this approach is well adapted to numerical schemes associated with a time grid (implemented here with a decreasing time step), such as the Euler, Milstein or Ninomiya-Victoir schemes for stochastic differential equations with a Brownian part and/or a jump component.
Growth control and step weight assumptions. We conclude with hypotheses concerning the control of the martingale part of one step of our approximation. Let ρ ∈ [1, 2] and let I : R_+ → R_+ be an increasing function. For F ⊂ {f, f : (E, B(E)) → (R, B(R))} and g : E → R_+ a Borel function, we assume that, for every n ∈ N, with C_f > 0 a finite constant which may depend on f. We will combine this assumption with the following step weight related ones:
Remark 3.3. The reader may notice that GC_Q(F, g, ρ, I) holds as soon as (20) is satisfied with Q_{γ_{n+1}} f(X̄_{Γ_n}), n ∈ N*, replaced by an F^X̄_n := σ(X̄_{Γ_k}, k ≤ n)-progressively measurable process (X_n)_{n∈N*}, since we have for every X_n ∈ L^2(F^X̄_n). We will also use the hypothesis with the convention η_0/γ_0 = 1. Notice that this last assumption holds as soon as the sequence (η_n/γ_n)_{n∈N*} is non-increasing.
We can now focus on the main results concerning this general approach.

Almost sure tightness
Building on the recursive control assumption, the following theorem establishes the a.s. tightness of the sequence (ν^η_n)_{n∈N*} and also provides a uniform control of (ν^η_n)_{n∈N*} on a generic class of test functions.

Identification of the limit
In Theorem 3.1, we obtained the tightness of (ν^η_n)_{n∈N*}. It remains to show that every limiting point of this sequence is an invariant distribution of the Feller process with infinitesimal generator A. This is the purpose of the following theorem, which relies on the infinitesimal generator approximation and the Echeverría-Weiss theorem (see [5], Theorem 9.17).
Theorem 3.2. Let ρ ∈ [1, 2]. We have the following properties.
A. Let D(A)_0 ⊂ D(A), with D(A)_0 dense in C_0(E). We assume that Ã_{γ_n} f exists for every f ∈ D(A)_0 and every n ∈ N*. Assume also that there exist a Borel function g : E → R_+ and an increasing function I : R_+ → R_+ such that GC_Q(D(A)_0, g, ρ, I) (see (20)) and SW_{I,γ,η}(g, ρ, I) (see (21)) hold, and that It follows that, P-a.s., every weak limiting distribution ν^η_∞ of the sequence (ν^η_n)_{n∈N*} belongs to V, the set of the invariant distributions of (X_t)_{t≥0}.
Finally, if the hypotheses from Theorem 3.1, point B, hold and (X_t)_{t≥0} has a unique invariant distribution, i.e. V = {ν}, then, P-a.s.,
∀f ∈ C_{Ṽ_{ψ,φ,s}}(E), lim_{n→+∞} ν^η_n(f) = ν(f),
with C_{Ṽ_{ψ,φ,s}}(E) defined in (17). In the particular case where the function ψ is polynomial, (26) also reads as the a.s. convergence of the empirical measures for some L^p-Wasserstein distances, p > 0. Thanks to the freedom granted by the choice of ψ in this abstract framework, where only a recursive control with mean reversion is required, it is also possible to obtain convergence for functions ψ with exponential growth.

Toward a Multilevel Langevin Monte Carlo
Multilevel Monte Carlo simulation methods were introduced by M. Giles in [8] in 2008 to speed up simulations when only biased approximations of the random variable of interest can be simulated at a reasonable computational cost. The method relies on the combination of a coarse approximation corrected by several levels, each made of the difference of two refined approximations at different scales. The refined levels only need a few simulated paths to produce good correctors, which accounts for the efficiency of these methods, inspired by the multigrid methods developed in Numerical Analysis. Under assumptions on the weak and strong quadratic convergence rates of the approximations, it is possible to optimize the effort (product of the variance by the complexity) on each level, including the coarse one, to produce an estimator which behaves almost like an unbiased one. Weighted versions were introduced in [14].
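For readers unfamiliar with the paradigm, here is a minimal unweighted multilevel sketch in the spirit of [8] (our own toy illustration for the expectation of a geometric-Brownian-type SDE at a fixed horizon, not for invariant distributions; all parameter choices are assumptions):

```python
import numpy as np

# Estimate E[X_T] for the toy SDE dX = r X dt + s X dW via Euler schemes
# with n_l = 2**l steps; level l corrects level l-1 using the SAME
# Brownian increments, so each correction has a small variance.
def euler_pair(r, s, x0, T, level, n_paths, rng):
    """Return samples of P_l - P_{l-1} (or P_0 when level == 0)."""
    nf = 2 ** level
    dt = T / nf
    dw = np.sqrt(dt) * rng.standard_normal((n_paths, nf))
    xf = np.full(n_paths, x0)
    for k in range(nf):                   # fine Euler path
        xf = xf * (1.0 + r * dt + s * dw[:, k])
    if level == 0:
        return xf
    xc = np.full(n_paths, x0)
    dtc = 2.0 * dt
    for k in range(nf // 2):              # coarse path: pairwise-summed increments
        xc = xc * (1.0 + r * dtc + s * (dw[:, 2 * k] + dw[:, 2 * k + 1]))
    return xf - xc

rng = np.random.default_rng(4)
r, s, x0, T = 0.05, 0.2, 1.0, 1.0
levels = 5
# fewer paths on the finer levels, as in the multilevel paradigm
est = sum(np.mean(euler_pair(r, s, x0, T, l, 100_000 // 2 ** l, rng))
          for l in range(levels + 1))
# exact value: E[X_T] = x0 * exp(r * T)
```

The telescoping sum of corrections reproduces the fine-level estimator at a fraction of its cost, which is the effect exploited, in its weighted Langevin form, in [17].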
In [17], the weighted version of the multilevel paradigm is adapted to Langevin simulation. Langevin simulation of the invariant distributions of diffusions originally relies on the simulation of only one path of an Euler scheme with decreasing steps (see e.g. [10]). The Langevin version of the multilevel paradigm consists in simulating paths of various lengths on each refined level in order to control the effort optimally. The new method is applied to various problems, like the double-well oscillator and sparse regression learning, with a huge gain in performance, in fact quite comparable to what is observed with regular Monte Carlo simulation.