Bias behaviour and antithetic sampling in mean-field particle approximations of SDEs nonlinear in the sense of McKean

In this paper, we prove that the weak error between a stochastic differential equation with nonlinearity in the sense of McKean given by moments and its approximation by the Euler discretization with time step h of a system of N interacting particles is O(1/N + h). We provide numerical experiments confirming this behaviour, showing that it extends to more general mean-field interactions, and studying the efficiency of the antithetic sampling technique on the same examples.


Introduction
According to [16], the strong rate of convergence of particle approximations of McKean-Vlasov stochastic differential equations with Lipschitz coefficients is O(N^{-1/2}), where N denotes the number of particles. This rate is driven by the statistical error, and one may wonder whether the bias vanishes faster. A parallel can be drawn with the time discretization of standard stochastic differential equations where, for Lipschitz coefficients, the strong rate of convergence of the explicit Euler-Maruyama scheme is O(√h) [12], with h > 0 denoting the time step. Using the Feynman-Kac partial differential equation associated with the stochastic differential equation, Talay and Tubaro [17] checked that, for smooth coefficients, the weak error behaves as O(h) and can even be expanded in powers of h. In the context of particles interacting through jumps, the O(N^{-1}) behaviour of the bias is known. According to [6], for particle approximations of generalized Boltzmann equations, the total variation distance between the law of the path of a particle and that of the limiting nonlinear Boltzmann process behaves as O(N^{-1}). For Feynman-Kac particle models, expansions in powers of 1/N are obtained in [4].
The interest in the bias introduced by particle approximations is motivated by numerical efficiency. Indeed, the numerical experiments performed in Section 3 show a general O(N^{-1}) behaviour of the bias, even in models with not so smooth coefficients. Under this behaviour, simulating √N independent copies of the system with √N particles leads to the same order of error (a bias of the same order as the O(N^{-1/4} × N^{-1/4}) = O(N^{-1/2}) statistical error) as one simulation of the system with N particles (a bias smaller than the O(N^{-1/2}) statistical error). And the former approach is less expensive than the latter as soon as the computational cost of the particle system grows more than linearly with the number of particles. The behaviour of the bias is also of interest in order to adapt to the number of particles the multilevel Monte Carlo method introduced by Giles [7] in the context of time discretization of standard stochastic differential equations. In [10], Haji-Ali and Tempone combine both discretizations through the Multi-index Monte Carlo method. In this perspective, another interesting question is the possibility to take advantage of the antithetic sampling technique introduced in [9] to reduce the variance (see [3,1] or [8] p. 57 for the use of this technique in nested Monte Carlo computations). Does the variance of the difference between the empirical mean of the system with 2N particles driven by the i.i.d. couples (X_0^i, W^i)_{1≤i≤2N} (X_0^i and W^i respectively denote the initial condition and the Brownian motion of the i-th particle) and the mean of the empirical means of the two independent systems with N particles respectively driven by (X_0^i, W^i)_{1≤i≤N} and (X_0^i, W^i)_{N+1≤i≤2N} converge to 0 faster than O(N^{-1})?
In this paper, using the Feynman-Kac partial differential equation associated with the limiting nonlinear process and its moments, we prove the respective O(N^{-1}) and O(N^{-1} + h) behaviours of the bias for systems of diffusive particles interacting through moments and for their Euler discretization with time step h. Of course, the computational cost of such systems is linear in the number N of particles, so the numerical motivation above does not apply to them. Nevertheless, this result can be seen as a first step before addressing more general interactions, which could necessitate more advanced tools like the master equation for mean-field games introduced by Lions in his seminal lectures at the Collège de France and studied in [2] from a probabilistic point of view. Theorem 3.2 of [13] is proved using the master equation and implies the O(N^{-1}) behaviour of the bias for one-dimensional stochastic differential equations with general interaction in the drift coefficient but no interaction in the diffusion coefficient. We provide numerical experiments showing that the O(N^{-1}) behaviour holds in more general situations, including ones with non-smooth coefficients. Last, on the same examples, we check that the antithetic variance does not in general decrease faster than O(N^{-1}).

Estimation of the bias for systems of particles interacting through moments
Let α : R^n → R^p be Lipschitz continuous, and let b : [0,T] × R^p × R^n → R^n and σ = (σ_j^l)_{1≤j≤n, 1≤l≤d} : [0,T] × R^p × R^n → R^{n×d} be Lipschitz continuous in their p + n last components uniformly in their first component and such that sup_{t∈[0,T]} (|σ(t,0,0)| + |b(t,0,0)|) < ∞. We consider the stochastic differential equation in dimension n nonlinear in the sense of McKean

X_t = X_0 + ∫_0^t b(s, E[α(X_s)], X_s) ds + ∫_0^t σ(s, E[α(X_s)], X_s) dW_s,  t ∈ [0,T],  (2.1)

where X_0 is some square-integrable R^n-valued random variable independent from the d-dimensional Brownian motion (W_t)_{t≥0}. The drift and diffusion coefficients at time s depend on the law of X_s through the moments E[α(X_s)].
Since σ may only depend on a subset of the coordinates of this expectation and b on another subset, the moments appearing in the drift and in the diffusion coefficient can differ, as well as their respective dimensions. By a solution, we mean a continuous process (X_t)_{t∈[0,T]} adapted to the filtration generated by the Brownian motion W and X_0 such that sup_{t∈[0,T]} E[|α(X_t)|] < ∞ and the above equation holds with the integrals with respect to dW_s and ds making sense. Notice that for any solution, x ↦ (b, σ)(s, E[α(X_s)], x) has affine growth uniformly for s ∈ [0,T]. Together with the square integrability of X_0, this implies that s ↦ E[α(X_s)] is Hölder continuous with exponent 1/2 on [0,T].
The particle approximation of the SDE nonlinear in the sense of McKean (2.1) is given by the system with mean-field interaction

X_t^{i,N} = X_0^i + ∫_0^t b(s, (1/N) Σ_{j=1}^N α(X_s^{j,N}), X_s^{i,N}) ds + ∫_0^t σ(s, (1/N) Σ_{j=1}^N α(X_s^{j,N}), X_s^{i,N}) dW_s^i,  1 ≤ i ≤ N,  (2.2)

where (X_0^i, W^i)_{1≤i≤N} are independent copies of (X_0, W). By the Lipschitz assumptions, existence and trajectorial uniqueness hold for this N × n dimensional equation. The Yamada-Watanabe theorem ensures weak uniqueness and therefore exchangeability of the particles ((X_t^{i,N})_{t∈[0,T]})_{1≤i≤N}. Let us also introduce the Euler discretizations with time step h > 0 of the SDE (2.1) and of the particle system (2.2), obtained by freezing the coefficients at the last grid time τ_s := ⌊s/h⌋h before s:

X_t^{i,N,h} = X_0^i + ∫_0^t b(τ_s, (1/N) Σ_{j=1}^N α(X_{τ_s}^{j,N,h}), X_{τ_s}^{i,N,h}) ds + ∫_0^t σ(τ_s, (1/N) Σ_{j=1}^N α(X_{τ_s}^{j,N,h}), X_{τ_s}^{i,N,h}) dW_s^i,  1 ≤ i ≤ N,  (2.3)

and similarly the Euler discretization (X_t^h)_{t∈[0,T]} of (2.1), with E[α(X_{τ_s}^h)] in place of the empirical mean. It is natural and convenient to assume that the horizon T is a multiple of h. Reasoning as in [16] or in Theorem 1.3 of [11], one easily checks the following result.
If, for i ∈ N*, (X_t^i)_{t∈[0,T]} denotes the process obtained by replacing (X_0, W) with (X_0^i, W^i) in (2.1), this implies the following estimation of the bias introduced by the particle discretization: for any function ψ : R^n → R Lipschitz with constant L_ψ, and in particular for each coordinate of α,

sup_{t∈[0,T]} |E[ψ(X_t^{1,N})] − E[ψ(X_t)]| ≤ L_ψ sup_{t∈[0,T]} E[|X_t^{1,N} − X_t^1|] = O(N^{-1/2}).

The first inequality is crude since it prevents cancellations in average, and one may wonder whether the bias converges faster to 0 as N → ∞. Under additional regularity, the answer is positive; this is our main result.
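For concreteness, the Euler-discretized particle system above can be simulated along the following lines in the scalar case n = d = 1. This is an illustrative sketch, not the code used for the experiments of Section 3; the coefficient functions `b`, `sigma` and `alpha` are placeholders to be instantiated with the models considered below.

```python
import numpy as np

def euler_particle_system(b, sigma, alpha, x0, T, h, N, rng):
    """Euler scheme with time step h for N scalar particles whose
    coefficients depend on the empirical moments (1/N) sum_j alpha(X^j).

    b, sigma : callables (t, y, x) -> drift / diffusion, vectorized in x
    alpha    : moment function, vectorized in x
    x0       : array with the N initial particle positions
    Returns the array of particle positions at time T."""
    x = np.asarray(x0, dtype=float).copy()
    for k in range(int(round(T / h))):
        y = alpha(x).mean(axis=0)                 # empirical moments at time kh
        dw = rng.normal(0.0, np.sqrt(h), size=N)  # Brownian increments
        x = x + b(k * h, y, x) * h + sigma(k * h, y, x) * dw
    return x
```

For instance, the linear mean-field example of Section 3 corresponds to b(t, y, x) = γx + βy, σ ≡ υ and α(x) = x.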

Theorem 2.2. Assume that
• b and σ are Lipschitz continuous in their p + n last components uniformly in their first component and such that sup_{t∈[0,T]} (|σ(t,0,0)| + |b(t,0,0)|) < ∞,
• α and ψ are twice continuously differentiable with bounded derivatives up to order two and Lipschitz continuous second order derivatives,
• a = σσ* and b are globally Lipschitz continuous and continuously differentiable with respect to their variables with index in {2, ..., 1+p}, with derivatives Lipschitz continuous with respect to their p + n last variables uniformly in their first variable,
• there exist d̃ ∈ N* and σ̃ : [0,T] × R^p × R^n → R^{n×d̃} such that for all (t,y,x) ∈ [0,T] × R^p × R^n, a(t,y,x) = σ̃σ̃*(t,y,x), and σ̃ and b are continuous and twice continuously differentiable with respect to their n last variables with bounded derivatives up to order two and second order derivatives Lipschitz continuous with respect to their n last variables uniformly in their 1+p first variables.
Then there exists a finite constant C not depending on N and h such that sup_{t∈[0,T]} |E[ψ(X_t^{1,N,h})] − E[ψ(X_t)]| ≤ C(1/N + h).

The idea of the proof is, like in [17], to use the Feynman-Kac partial differential equation associated with the nonlinear SDE (2.1) to first check that the estimation holds for the coordinates of α before concluding that it holds for each test function ψ. The following proposition ensures existence and smoothness of the solution to this Feynman-Kac PDE, which only depends on σ through a = σσ*. In its statement and its proof, based on a stochastic flow approach, we take advantage of the flexibility given by the choice of a square root of a, which explains the last assumption in Theorem 2.2. Let (W̃_t)_{t≥0} be a d̃-dimensional Brownian motion with coordinates (W̃_t^l)_{1≤l≤d̃}, and for 0 ≤ s ≤ t ≤ T and x ∈ R^n, let u_{t,ψ}(s,x) := E[ψ(X̃_t^{s,x})], where (X̃_r^{s,x})_{r∈[s,t]} is the flow with frozen moments starting from X̃_s^{s,x} = x. Then u_{t,ψ} is once (resp. twice) continuously differentiable with respect to the time (resp. space) variable s (resp. x), and u_{t,ψ} together with its spatial derivatives up to order two is Lipschitz continuous in x uniformly in 0 ≤ s ≤ t ≤ T.
Moreover, u_{t,ψ} is a classical solution to the Feynman-Kac PDE

∂_s u_{t,ψ}(s,x) + Σ_{j=1}^n b_j(s, E[α(X_s)], x) ∂_j u_{t,ψ}(s,x) + (1/2) Σ_{j,k=1}^n a_{jk}(s, E[α(X_s)], x) ∂_{jk} u_{t,ψ}(s,x) = 0,  (s,x) ∈ [0,t] × R^n,  u_{t,ψ}(t,·) = ψ.  (2.5)

Here ∂_j and ∂_{jk} denote the partial derivative with respect to the j-th coordinate of x and the second order partial derivative with respect to its j-th and k-th coordinates.
Proof. Even if this result seems to be well known, we could not find its proof in the literature; we therefore give a sketch of proof. For notational simplicity, we set σ̃^0 = b and W̃_t^0 = t. For j ∈ {1, ..., n}, let e_j denote the j-th vector of the canonical basis of R^n and, for l ∈ {0, ..., d̃}, let ∂σ̃^l = (∂_{1+p+k} σ̃_j^l)_{1≤j,k≤n}, where ∂_{1+p+k} is the partial derivative with respect to the (1+p+k)-th variable. For l ∈ {0, ..., d̃} and y, z ∈ R^n, let also the second order directional derivative ∂²σ̃^l_{yz} ∈ R^n be defined accordingly. By Theorem 3.3 p. 223 of [15] and its proof, for r ∈ [s,T], x ↦ X̃_r^{s,x} is twice continuously differentiable P-almost surely, with derivatives satisfying, for j, k ∈ {1, ..., n}, linear stochastic differential equations, and these derivatives admit moments of any order q ∈ [1,∞) uniformly in s, r and in ε ∈ (0,s]. By the flow property stated in Theorem 3.3 of [15], u_{t,ψ}(s−ε, x) = E[u_{t,ψ}(s, X̃_s^{s−ε,x})]. On the other hand, by a Taylor expansion, we can express u_{t,ψ}(s, X̃_s^{s−ε,x}) − u_{t,ψ}(s,x) in terms of the first and second order spatial derivatives of u_{t,ψ}. By continuity and uniform integrability, the first two terms in the right-hand side converge to 0 as ε → 0, while the expectations in the two last terms are smaller than Cε^{3/2}. Taking into account the continuity with respect to s of the spatial derivatives, this yields the time differentiability of u_{t,ψ} and the PDE (2.5). We are now ready to prove Theorem 2.2.
Proof of Theorem 2.2. Let t ∈ [0,T]. Applying Itô's formula to u_{t,ψ}(s, X_s^{1,N,h}) and taking into account the Feynman-Kac PDE (2.5), we obtain a decomposition of the bias. Integrability deduced from Propositions 2.1 and 2.3 together with the properties of σ ensures that the expectations of the stochastic integrals vanish. Therefore, setting suitable error terms e^ψ(s) and e^ψ_{jk}(s) for s ∈ [0,t] and using that u_{t,ψ}(t,·) = ψ(·), we express the bias as a time integral of these terms. Let us now estimate e^ψ_{jk}(s). We decompose it into five terms, respectively denoted by e^{1,ψ}_{jk}(s), e^{2,ψ}_{jk}(s), e^{3,ψ}_{jk}(s), e^{4,ψ}_{jk}(s) and ē^ψ_{jk}(s). Since, by Proposition 2.3, ∂_{jk} u_{t,ψ} is bounded by a finite constant M^{(2)}_ψ not depending on t and a is Lipschitz continuous, the first term is easily controlled. Since α is C² with bounded derivatives and Lipschitz continuous second order derivatives, for 0 ≤ r ≤ s ≤ T, computing α(X_s) − α(X_r) by Itô's formula, taking expectations and remarking that the expectation of the stochastic integral vanishes, we obtain the existence of a finite constant C not depending on r and s such that |E[α(X_s)] − E[α(X_r)]| ≤ C(s − r). Since ∂_{jk} u_{t,ψ} is bounded by M^{(2)}_ψ not depending on t and a_{jk} is Lipschitz continuous in space with constant L_a, we can bound |e^{2,ψ}_{jk}(s)|. Computing the increment of the Euler scheme between τ_s and s, taking expectations and remarking that the contribution of the stochastic integral vanishes, we obtain that |e^{3,ψ}_{jk}(s)| ≤ Ch with C not depending on N, s, t, h.
Hence there is a finite constant C not depending on N, s, t, h such that the first four terms are bounded in absolute value by C(1/N + h). Let us now estimate |ē^ψ_{jk}(s)|. Denoting by ∇_2 a_{jk} the partial gradient of a_{jk} with respect to its variables with indices in {2, ..., 1+p}, we decompose ē^ψ_{jk}(s) into two terms, respectively denoted by e^{5,ψ}_{jk}(s) and e^{6,ψ}_{jk}(s). By Proposition 2.3, ∂_{jk} u_{t,ψ} is bounded by M^{(2)}_ψ not depending on t.
By assumption, ∇_2 a_{jk} is bounded by M^{(2)}_a and Lipschitz continuous in its p + n last variables with constant L^{(2)}_a. The second term in the right-hand side of the resulting inequality is the variance of the empirical mean of i.i.d. random variables and is therefore equal to 1/N times the variance of a single term. The first term is also a variance, and we bound it from above by the corresponding second order moment, which, by Jensen's inequality for the empirical mean, is not greater than the mean of the individual second order moments. Therefore, denoting by L_α the Lipschitz constant of α, we obtain a bound of order 1/N for e^{5,ψ}_{jk}(s). Since the summands are centered and by exchangeability of (X^{i,N,h})_{1≤i≤N}, we may replace ∂_{jk} u_{t,ψ} by its deviation from its expectation in the expectation defining e^{6,ψ}_{jk}(s). With the Cauchy-Schwarz inequality, we deduce a bound by a product of two factors. Since ∂_{jk} u_{t,ψ} (resp. ∇_2 a_{jk}) is bounded by M^{(2)}_ψ (resp. M^{(2)}_a) and Lipschitz continuous with constant L^{(2)}_ψ (resp. L^{(2)}_a), reasoning as in the derivation of (2.8), we obtain that the second factor is smaller than C/N with C finite and not depending on N, s, t, ψ. With (2.7), we deduce a bound of order 1/N for ē^ψ_{jk}(s). Estimating e^ψ(s) in a similar way and using (2.6), we deduce the existence of a finite constant C not depending on N and h such that the bias is controlled by C(1/N + h) up to a term involving the bias of the coordinates of α. Summing for j ∈ {1, ..., p} this inequality applied with ψ equal to the j-th coordinate of α, we conclude for the coordinates of α, and then for a general test function ψ. The key step is that the decomposition of e^{2,ψ}_{jk}(s) + e^{3,ψ}_{jk}(s) avoids differentiating the solution to the Feynman-Kac partial differential equation more than twice in space and only requires Lipschitz continuity of the second order spatial derivatives.

Numerical Experiments
We conduct two types of numerical tests. First, we estimate the bias using plain Monte Carlo for examples of one-dimensional (n = 1) mean-field SDEs taken from [14], in order to provide numerical evidence of the O(N^{-1}) behaviour of the bias for a fixed value of the time step h. Then we present the antithetic sampling results on the same examples.
The code for running these experiments takes a number of iterations as an input parameter. The latter is useful to observe the behaviour of the bias when increasing the number of particles: we give an initial number of particles that we multiply by two from one iteration to the next. Except for the polynomial drift and the plane rotator examples, where it is respectively equal to 8 and 4, the number of iterations is equal to five and the initial number of particles is twenty, so that the final number of particles is 320. The simulation is done with 5 × 10^6 runs, except for the plane rotator example where we push the number of Monte Carlo runs up to 4.9 × 10^8.
We also define the precision as half the width of the 95% confidence interval of the empirical mean, i.e.

Precision = 1.96 × √(Variance / number of runs),

where Variance denotes the empirical variance over the runs of the empirical mean over the particles.
In order to measure the relevance of the antithetic sampling technique for variance reduction, we compute the variance of the difference between the empirical mean of the system with 2N particles and the mean of the empirical means of the two independent systems with N particles:

Var( (1/(2N)) Σ_{i=1}^{2N} ψ(X_t^{i,2N}) − (1/2)[ (1/N) Σ_{i=1}^N ψ(X_t^{i,N}) + (1/N) Σ_{i=1}^N ψ(Y_t^{i,N}) ] ).  (3.1)

Here (Y_t^{i,N})_{1≤i≤N} is the system with N particles driven by (X_0^i, W^i)_{N+1≤i≤2N}.
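A minimal sketch of how one sample of the difference inside this variance can be produced, assuming the linear mean-field model of the next subsection with illustrative parameters and the test function ψ(x) = x². The key point is that the 2N-particle system and the two N-particle systems share the same Brownian increments.

```python
import numpy as np

def antithetic_difference(gamma, beta, upsilon, x0, T, h, N, rng):
    """One sample of the difference between the empirical mean of
    psi(X) = X^2 over the 2N-particle system and the average of the
    empirical means over the two N-particle subsystems, all driven by
    the same Brownian increments (illustrative linear mean-field model:
    drift gamma*x + beta*mean, constant diffusion upsilon)."""
    x2n = np.full(2 * N, x0, dtype=float)  # 2N-particle system
    xa = x2n[:N].copy()                    # first N-particle system
    xb = x2n[N:].copy()                    # second N-particle system
    for _ in range(int(round(T / h))):
        dw = rng.normal(0.0, np.sqrt(h), size=2 * N)  # shared increments
        for x, d in ((x2n, dw), (xa, dw[:N]), (xb, dw[N:])):
            x += (gamma * x + beta * x.mean()) * h + upsilon * d
    psi = lambda x: (x ** 2).mean()
    return psi(x2n) - 0.5 * (psi(xa) + psi(xb))

# crude Monte Carlo estimate of the antithetic variance (3.1)
rng = np.random.default_rng(1)
diffs = [antithetic_difference(-0.5, 0.8, np.sqrt(0.5), 1.0, 1.0, 0.02, 20, rng)
         for _ in range(2000)]
antithetic_variance = float(np.var(diffs))
```

Note that for the first order moment ψ(x) = x this difference vanishes identically in the linear model, since the empirical mean of the 2N-particle system satisfies the same linear recursion as the average of the two N-particle empirical means; a nonlinear test function such as x² is needed to make the comparison informative there.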

Model
We consider the following generalization of the Ornstein-Uhlenbeck SDE to a linear mean-field SDE:

dX_t = (γ X_t + β E[X_t]) dt + υ dW_t,  X_0 = x_0,

for parameters γ, β, υ ∈ R and initial data x_0 ∈ R. The functions α(x) = x, b(t,y,x) = γx + βy and σ(t,y,x) = υ therefore satisfy the regularity assumptions of Theorem 2.2.
The first and second moments of X_t are respectively given by

E[X_t] = x_0 exp((γ+β)t)  and  E[X_t²] = x_0² exp(2(γ+β)t) + (υ²/(2γ))(exp(2γt) − 1).

The associated particle approximation of the SDE is given by the following system:

dX_t^{i,N} = (γ X_t^{i,N} + (β/N) Σ_{j=1}^N X_t^{j,N}) dt + υ dW_t^i,  1 ≤ i ≤ N,

where (W^i)_{1≤i≤N} are independent Brownian motions.
Because of the linearity of the drift coefficient, it is possible to obtain closed-form expressions for E[X_t^{1,N,h}] and E[(X_t^{1,N,h})²] and to deduce the asymptotic behaviour of the biases of the first and second order moments as N → ∞ and h → 0.
Let h = T/K for some K ∈ N*. For k ∈ {0, ..., K−1} and i ∈ {1, ..., N}, we have

X_{(k+1)h}^{i,N,h} = X_{kh}^{i,N,h} + (γ X_{kh}^{i,N,h} + (β/N) Σ_{j=1}^N X_{kh}^{j,N,h}) h + υ(W_{(k+1)h}^i − W_{kh}^i).

Taking expectations and using the exchangeability of the particles, we obtain E[X_{(k+1)h}^{1,N,h}] = (1 + (γ+β)h) E[X_{kh}^{1,N,h}], so that E[X_{kh}^{1,N,h}] = x_0 (1 + (γ+β)h)^k does not depend on the number N of particles. A similar but lengthier computation shows that the bias of the time-discretized second order moment is exactly of order 1 in 1/N.
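The limiting (N → ∞) time-discretized moments can be computed exactly by iterating the recursions obtained from the Euler step. The sketch below assumes (as the closed-form value 1.34865 quoted in the results suggests) the test-case parameters γ = −1/2, β = 4/5, υ² = 1/2 and x_0 = 1; the second-moment recursion follows by squaring the limiting Euler step X' = X + (γX + β m1)h + υ ΔW and taking expectations.

```python
def discretized_moments(gamma, beta, upsilon2, x0, T, K):
    """First two moments (m1, m2) of the Euler scheme (time step h = T/K)
    of the limiting linear mean-field SDE, from the exact recursions
      m1' = (1 + (gamma+beta)h) m1,
      m2' = (1+gamma*h)^2 m2 + (2(1+gamma*h)*beta*h + (beta*h)^2) m1^2
            + upsilon2 * h.
    (upsilon2 is the squared diffusion coefficient.)"""
    h = T / K
    m1, m2 = x0, x0 ** 2
    for _ in range(K):
        # simultaneous update: the m2 recursion uses the old m1
        m1, m2 = ((1 + (gamma + beta) * h) * m1,
                  (1 + gamma * h) ** 2 * m2
                  + (2 * (1 + gamma * h) * beta * h + (beta * h) ** 2) * m1 ** 2
                  + upsilon2 * h)
    return m1, m2
```

With K = 50 and T = 1 this reproduces the closed-form discretized first moment 1.34865 used as reference below.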

Results
In order to illustrate the behaviour of the first and second order moments, we compute the difference between the closed-form discretized moments and the estimated moments and expect this difference to be zero up to the statistical error. The results are shown in Tables 1 and 2 below, where we denote that quantity by "Difference". As a test case, we use this model with γ = −1/2, β = 4/5, υ² = 1/2 and x_0 = 1.
Concerning the first order moment, we proved above that the bias does not depend on the number N of particles, which is what we observe in the first row of Table 1.

Table 1: Generalised Ornstein-Uhlenbeck SDE: comparison of the estimated first moments with the closed-form discretized value 1.34865, as well as the associated precision, when increasing the number of particles, for 5 × 10^6 runs, 50 time steps and a time horizon T = 1.
As for the second order moment, from the third row we observe that the estimation fits the closed-form discretized value, which confirms that the bias is of order 1 in 1/N. For the second order moment, the ratio of successive antithetic variances is around 4, which means that the variance of the antithetic estimator is roughly proportional to N^{-2}. The antithetic sampling technique therefore brings an important improvement for this diffusion.

Model
The following SDE refers to a model for coupled oscillators in the presence of noise, also known as the Kuramoto model:

dX_t = −K ∫_R sin(X_t − y) P_t^μ(dy) dt + √(2 k_B T) dW_t,

where X_0 is distributed according to μ, P_t^μ denotes the distribution of X_t, K > 0 is a coupling parameter, k_B the Boltzmann constant and T the temperature.
The particle system has the following dynamics:

dX_t^{i,N} = −(K/N) Σ_{j=1}^N sin(X_t^{i,N} − X_t^{j,N}) dt + √(2 k_B T) dW_t^i,  1 ≤ i ≤ N,

where (W^i)_{1≤i≤N} are independent Brownian motions.
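A sketch of the Euler scheme for this particle system. The pairwise drift can be computed in O(N) per step using sin(x_i − x_j) = sin(x_i)cos(x_j) − cos(x_i)sin(x_j); the Gaussian parametrization of the initial law and the observable cos(X_T) below are illustrative assumptions, not necessarily the choices of the experiments.

```python
import numpy as np

def kuramoto_particles(K_coupling, kBT, T, h, N, rng):
    """Euler scheme for the noisy Kuramoto particle system
    dX^i = -(K/N) sum_j sin(X^i - X^j) dt + sqrt(2*kBT) dW^i.
    The O(N^2) pairwise sum is reduced to O(N) via the identity
    sin(x_i - x_j) = sin(x_i)cos(x_j) - cos(x_i)sin(x_j).
    Initial law: Gaussian, mean pi/4, std 3*pi/4 (assumed parametrization)."""
    x = rng.normal(np.pi / 4, 3 * np.pi / 4, size=N)
    for _ in range(int(round(T / h))):
        s, c = np.sin(x), np.cos(x)
        drift = -K_coupling * (s * c.mean() - c * s.mean())
        x = x + drift * h + np.sqrt(2 * kBT * h) * rng.normal(size=N)
    return x

rng = np.random.default_rng(2)
x_T = kuramoto_particles(1.0, 0.125, 1.0, 0.02, 1000, rng)
observable = float(np.cos(x_T).mean())  # illustrative observable, lies in [-1, 1]
```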

Results
We use this model with K = 1, k_B T = 1/8 and initial distribution μ = N(π/4, 3π/4) as a test case. The reference value was computed with 2.5 × 10^8 Monte Carlo runs, 1000 particles and the same input parameters as for the general estimation of the bias (time horizon T = 1 and 50 time steps). The value obtained is 0.737576. The results are shown in Table 4 and are consistent with a bias proportional to N^{-1}. The antithetic sampling method is also relevant in this example, since the ratio of decrease is close to four, which means that the variance of the antithetic estimator is roughly proportional to N^{-2}. This behaviour has also been observed in Section 3.2.2 of [10].

Model
Let us consider the following mean-field SDE:

dX_t = (γ X_t + E[X_t] − X_t E[X_t²]) dt + X_t dW_t,  X_0 = x_0,

for a parameter γ ∈ R and initial data x_0 ∈ R. The function σ(t,y,x) = x satisfies the regularity assumptions. However, the functions b(t,y,x) = γx + y_1 − x y_2 and α(x) = (x, x²) are not Lipschitz.
From the dynamics of the Euler discretization X^h of the limiting SDE, we derive a closed system of inductive equations satisfied by the first and second order moments of X_{kh}^h, and we solve this system numerically to obtain reference values for the first and second order moments.
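A sketch of this moment computation: taking expectations (and squares) in the Euler step X' = X + (γX + m1 − X m2)h + X ΔW yields a closed two-dimensional recursion, since given the moments the drift is affine in X and the diffusion coefficient is linear in X. The recursion below is our own derivation under these dynamics, so it is an illustration rather than the authors' code; with γ = 2, x_0 = 1 and h = 1/50 it produces values close to the reference values reported below.

```python
def poly_drift_moments(gamma, x0, T, K):
    """First two moments (m1, m2) of the Euler scheme (h = T/K) of the
    limiting SDE dX = (gamma*X + E[X] - X*E[X^2]) dt + X dW.
    One Euler step reads X' = X*(1 + (gamma - m2)*h) + m1*h + X*dW, so
      m1' = m1 * (1 + (gamma + 1 - m2)*h),
      m2' = (1 + (gamma - m2)*h)^2 m2 + 2(1 + (gamma - m2)*h) h m1^2
            + (h*m1)^2 + m2*h,
    where the cross terms in dW vanish and E[(X*dW)^2] = m2*h."""
    h = T / K
    m1, m2 = x0, x0 ** 2
    for _ in range(K):
        # simultaneous update: both formulas use the old (m1, m2)
        m1, m2 = (m1 * (1 + (gamma + 1 - m2) * h),
                  (1 + (gamma - m2) * h) ** 2 * m2
                  + 2 * (1 + (gamma - m2) * h) * h * m1 ** 2
                  + (h * m1) ** 2
                  + m2 * h)
    return m1, m2
```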
The idea here, once again, is to approximate E[X_t] and E[X_t²] by their empirical means in order to define the dynamics of the particle system:

dX_t^{i,N} = (γ X_t^{i,N} + (1/N) Σ_{j=1}^N X_t^{j,N} − X_t^{i,N} (1/N) Σ_{j=1}^N (X_t^{j,N})²) dt + X_t^{i,N} dW_t^i,  1 ≤ i ≤ N,

where (W^i)_{1≤i≤N} are independent Brownian motions.

Results
We use this model with γ = 2 and x_0 = 1 as a test case. The reference values obtained for the time-discretized first and second moments are respectively 1.3845 and 3.13743. For this example, we push the number of iterations to 8, so that the final number of particles is 2560. We denote by "Ratio of decrease 1" and "Ratio of decrease 2" the respective quantities First moment error(N/2) / First moment error(N) and Second moment error(N/2) / Second moment error(N). The results are shown in Tables 6 and 7. We observe that the ratios of decrease of both the first and the second order moment errors seem to grow as the number of particles increases. This confirms the O(1/N) behaviour of the bias stated in Theorem 2.2 and even suggests a behaviour of order exactly 1 in 1/N when the number of particles is large.
In order to measure the relevance of the antithetic sampling, we compute the variance (3.1) for the first and second order moments. We denote by "Ratio of decrease V1" and "Ratio of decrease V2" the corresponding ratios of successive variances.

Table 9: Polynomial drift SDE: evolution of the antithetic variance for ψ(x) = x² with its associated precision when increasing the number of particles, for 5 × 10^6 runs, 50 time steps and a time horizon T = 1.

The results exposed in Tables 8 and 9 both show that, when increasing the number of particles, the ratios of decrease grow gradually. Therefore, the antithetic sampling method may be relevant when simulating with a large number of particles.

Viscous Burgers equation
Model
Let us consider the following mean-field SDE for a parameter υ > 0:

dX_t = F̄_t(X_t) dt + υ dW_t,  where F̄_t(x) = P(X_t ≥ x).
Using the Fokker-Planck equation satisfied by the density of X_t, one checks that F̄_t(x) solves the viscous Burgers equation. We now suppose that the initial condition is X_0 = 0, so that F̄_0(x) = 1_{x≤0}. The Cole-Hopf transformation then gives a closed-form expression of F̄_t(x) in terms of the Gaussian cumulative distribution function. The functions σ(t,y,x) = υ and b(t,y,x) = y are regular enough to satisfy the hypotheses of Theorem 2.2. However, in this example the interaction is not through moments but through a kernel, and even a discontinuous one.
We approximate F̄_t(x) by the associated empirical mean F̄_t^N(x) := (1/N) Σ_{j=1}^N 1_{X_t^{j,N} ≥ x} computed over the N particles, which leads to the following dynamics for the particle system:

dX_t^{i,N} = F̄_t^N(X_t^{i,N}) dt + υ dW_t^i,  1 ≤ i ≤ N,

where (W^i)_{1≤i≤N} are independent Brownian motions.
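A sketch of this particle scheme: the empirical tail function is evaluated at each particle by pairwise comparison (O(N²) per step, enough for illustration), and the run below estimates F̄_1(1/2), whose exact value in this test case is 1/2. The particle and step counts are illustrative choices.

```python
import numpy as np

def burgers_particles(upsilon, T, h, N, rng):
    """Euler scheme for the particle system of the viscous Burgers example:
    the drift of particle i is Fbar^N(X^i) = (1/N) #{j : X^j >= X^i}.
    All particles start from x0 = 0."""
    x = np.zeros(N)
    for _ in range(int(round(T / h))):
        # empirical tail function at each particle (includes j = i)
        drift = (x[None, :] >= x[:, None]).mean(axis=1)
        x = x + drift * h + upsilon * np.sqrt(h) * rng.normal(size=N)
    return x

rng = np.random.default_rng(3)
x_T = burgers_particles(0.25, 1.0, 1.0 / 500, 500, rng)
fbar_estimate = float((x_T >= 0.5).mean())  # estimates Fbar_1(1/2), expected near 1/2
```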

Results
We use this model with υ = 1/4 and x_0 = 0 as a test case, and we estimate the solution F̄_1(1/2) of the viscous Burgers equation. Since for t = 1 and x = 1/2 we have t − x = x and 2x − t = 0, the closed-form expression gives F̄_1(1/2) = 1/2. The results are shown in Table 10. We observe that the ratio of decrease of the solution error, defined by Solution error(N/2) / Solution error(N), is consistent with an error proportional to N^{-1}, confirming a bias of order 1 in 1/N.
As for the antithetic sampling, we compute the corresponding variance (3.1).

Table 11: Viscous Burgers equation: evolution of the antithetic variance and its associated precision when increasing the number of particles, for 5 × 10^6 runs, 500 time steps and a time horizon T = 1.

From Table 11, we observe that the ratio of decrease of the variance, Variance of solution(N/2) / Variance of solution(N), is around 2.64, which is greater than two. Therefore, there is a slight gain in using the antithetic sampling for this type of diffusion.