New particle representations for ergodic McKean-Vlasov SDEs

The aim of this paper is to introduce several new particle representations for \textit{ergodic} McKean-Vlasov SDEs. We construct new algorithms by leveraging recent progress in weak convergence analysis of interacting particle system. We present detailed analysis of errors and associated costs of various estimators, highlighting key differences between long-time simulations of linear (classical SDEs) versus non-linear (Mckean-Vlasov SDEs) process.


Introduction
Diffusion processes are at the core of many algorithms used in statistics to sample from, typically high-dimensional, distributions. These algorithms are often based on some variant of Langevin stochastic dynamics [LRS10]. Given a probability measure π (possibly known up to a normalising constant), the key idea is to construct a diffusion process which admits π as its invariant measure. Then one can run long-time simulations of that diffusion to obtain samples from π. This ideas has been extensively studied in the context of classical SDEs [Tal90, Tal02, LP + 02, LP + 10, MST10, PP + 12, Dal17, DM + 17].
Recently new promising classes of algorithms based on the theory of gradient flows takes the form of McKean-Vlasov ODEs or SDEs [ŞLMD18,Ber18,Liu17]. To turn them into practical algorithms one needs to approximate them with systems of interacting diffusions also called stochastic interacting particle systems. The key challenge is that, typically, with the increase of the dimension of the problem one needs to consider large number N of particles. Because, for most models, the cost of particle samples growths as N 2 (as each particle interacts with others), the computational cost for simulating the particle systems is prohibitive. Another complication is that when using a single ensemble of particles the statistical error due to the approximation of the measure creates biased dynamics. Put differently bias is a non-linear function of the statistical error. In addition particles are not independent. All of that renders classical variance reduction techniques not directly applicable and consequently simulations of particle systems challenging. The high computational cost is even more pronounced when the aim is to simulate particle systems over a long-time horizon. This should not come a surprise as particle systems give rise to probabilistic numerical methods for highly non-linear PDEs e.g Burgers or Navier-Stokes PDEs.
In this work we leverage recent progress in weak convergence analysis of interacting diffusions [JFC,CCD,Kol10,CD18]. With this new insight we propose several new algorithms and analyse their errors and costs. The emphasis of the work is on algorithmic side and we gloss over some theoretical bounds that will require further research in future. As such we see this work as beacon that helps to identify the most promising research directions in the area of simulations of the ergodic particle systems.
Let (Ω, F, (F t ) t≥0 , P) be a filtered probability space endowed with an R k -valued Wiener process w = (w t ) t≥0 . Let b : R d × P(R d ) → R d and σ : R d × P(R d ) → R d×k . We consider, for t ≥ 0, the McKean-Vlasov SDEs (McKV-SDE) (1.1) where x 0 is distributed according to a given R d -measure µ 0 . The nonlinearity in the McKean-Vlasov SDEs (1.1) appears through the dependency of its coefficients on the law of the process. Existence of the unique solution to (1.1) has been established under various conditions on (b, σ). See [Szn91,Mél96] classical results on that topic that mainly cover the case of finite time interval. For the infinite time horizon we refer to [Ver06,HŠ18].
Furthermore, [BGT08] gives conditions for the existence and uniqueness of the invariant measure π for the equation (1.1). We refer to [EGZ16] for more complete theory. In particular [EGZ16] gives fairly general conditions that guarantee that the convergence to the invariant measure in the L 2 -Wasserstein distance is exponentially fast for some λ > 0, i.e W 2 (µ t , π) ≤ exp (−λt)W 2 (µ 0 , π). (1.2) One can also control the bias of ergodic averages Consider the following system of N particles (x 1,N , x 2,N , ..., x N,N ) defined as The measure valued random variable µ N t is an empirical measure of the system at time t. For the purpose of computer simulations one needs to introduce time discretisation to simulate (1.4). We will do that in the forthcoming section. Under classical Lipschitz continuity conditions the law, seen as element of P([0, T ], R d ), of every fixed subsystem of k particles from (x i,N ) converges, when N tends to infinity, to the law µ ⊗k . This property is called propagation of chaos phenomena. Under strong convexity of the drift, a time-uniform version of propagation of chaos has been established in [BGT08]. In a recent work, [DEGZ18] it has been shown that in general only convexity at infinity is needed.
Let f : R d → R. The objective of this work is to derive, analyse and numerically investigate, several novel particle representations that will allow to approximate: (1.5) To motivate our work, let's temporarily assume that (b, σ) do not depend on measure, i.e we are dealing with a classical SDEs. Then a typical strategy in obtaining an approximation to (1.5) would be to set a finite time t and take N i.i.d. trajectories to compute 1 f (x i s )ds (N = 1 corresponds to ergodic estimator). The error of this estimator can be decomposed as follows The first term in the right hand side is the (weak) error of approximating the invariant measure which decays to zero as exp (−λt) due to (1.2). The second term is CLT type result and can be shown to decay to zero as (N · t) −1/2 , see e.g [KV86,CCG12]). We see that both t and N have the same impact on the variance. In the case of SDEs the cost of simulations is linear in t and N and hence one may be indifferent weather to simulate one long trajectory (ergodic estimator) and many shorter ones (space average). Of course if one uses parallel computer architecture, taking more samples is much more efficient. The situation of McKean-Vlasov SDE (1.1) is dramatically different. The cost of simulating interacting particles (1.4) is N 2 while it still increases linearly with time. As we will show it is possible to construct estimator that has one-order of magnitude lower cost while maintains the same accuracy. Furthermore, we will investigate ensemble version of interacting diffusions where we generate M independent systems of particles with N particles in each system (ensemble). More precisely we define (1.6) where (w i,j , i, j) are independent Brownian motions. That way particles within each ensemble j * driven by (w i,j * ) i,j * are interacting and are not independent. The particle systems j and j ′ , j = j ′ , driven by (w i,j ) i,j and (w i,j ′ ) i,j respectively, are independent. This idea for finite time simulations has been proposed in [HAT16]. Another approach that we investigate are self-interacting diffusion We expect that the law of z t approximates the law of x t for large t due to ergodic property (1.2,1.3). This gives an alternative to the particle system. We will show that the structure of the equation seems to play a crucial role in this set up. The rest of the paper is organised as follows. In Section 2, we recall some classical methods to approximate (1.5) and give error estimations and computational costs of the associated algorithms. In Section 3 we study several variants of algorithms for ergodic interacting particle systems In Section 4, we present the ensemble algorithm with ergodic average particle system. We end this paper with a general conclusion and some perspectives in Section 5.

Algorithms
As in this work we are interested in designing implementable algorithms for (1.1), we need to introduce time discretisation for (1.4). Let us define t n k := k n , k = 0, 1, . . . and κ n (t) = t n k for t ∈ [t n k , t n k+1 ). We introduce (continuous time) Euler approximations for each . (2.1) One may approximate (1.5) by ergodic average estimator with fixed t, For error analysis one needs to choose N (number of particles) and n (number of timesteps) to control the bias of the approximation of (1.1). While (EA) estimator is a reasonable choice for computing approximation of (1.5) for the invariant measures induced by classical SDEs as we already argued, the case of McKean-Vlasov SDEs approximated with a particle system, (EA) estimator does not seem to the best choice. This is because when using particle system (1.4), one computes N particles and therefore calculating ergodic average along one trajectory is not efficient. See sections 3.1, 3.2 for more details. Hence, improvement can be obtained by computing, averaged ergodic average estimator Of course one expects that t ′ in (AEA) to be smaller than t in (EA) for fixed accuracy. Alternative strategy for approximating (1.5) is to resort to the standard Monte Carlo estimator where the average is taken only "over the space". More precisely we compute Monte Carlo average Of course, the above estimator is less efficient than (AEA), as we explain in the coming section. The only reason we study it here is to warn practitioners that if not careful with setting up particle estimators the cost might be huge.

Computational Cost.
By A(η) we denote an algorithm that outputs the approximation for the quantity (1.5), where η denotes the set of all the parameters we need to choose to implement it. To be able to compare the algorithms we need to fix a measure of error. For simplicity, we restore to the mean-square-error: With the measure of error of a given estimator set up, the second equally important quantity is the computation cost of algorithm A, denoted by cost(A). With both quantities in place we can wonder about the optimal choice of parameters achieving a prescribed tolerance. More precisely, for fixed error tolerance ǫ > 0, we need to solve the following optimisation problem:

Assumptions
In this section we list all the assumptions needed for our considerations. The only assumption that has not been yet established in the literature is uniform in time particle error (HW) estimation below. To the best of authors knowledge only finite time weak particle error has been studied. It is clear how to extend the weak convergence to be uniform in time but this would require lengthy introduction of heavy machinery of PDEs on measure spaces. This falls outside this paper. All other assumptions are established in literature under various level of generality and we point out reader to the corresponding papers. We label by x i a McKean-Vlasov SDE driven by ith Brownian motion, that is Assumption 2.1. Convergence rate to ergodic measure: there exists λ > 0 such that As we already mentioned this has been proved in [BGT08] and [EGZ16] under fairly general conditions.

Assumption 2.2. Convergence rate of ergodic average
This is classical CLT result. See [CCG12].
Assumption 2.3. Uniform in time weak convergence of the particle system: for sufficiently smooth f , This type of bound is new in the literature. We refer a reader to [JFC, CCD, Kol10, CD18] for more details.

Assumption 2.4. Uniform in time strong propagation of chaos
See [DEGZ18] for details.
Assumption 2.5. Uniform in time weak discretisation error: for sufficiently smooth f , One can refer to [BGT08], where such results are proved.

Monte Carlo Average
For the Monte Carlo Average estimator we introduce the following notation A MCA (t, n, N ). The aim is to find the optimal allocation of the parameters (t, n, N ) for fixed mean-square-error. We have The four error terms are in order: bias (due to finite time simulation); weak particle approximation error; weak time discretisation error; variance/propagation of chaos. The first three error terms can be estimated directly from the Assumptions in Section 2.2. The last variance error term requires extra comment, as the fact that particles are not i.i.d does not allow to use classical central limit theorem (CLT). Indeed Next, define (ỹ i ) i as the solution of the continuous Euler scheme: It is an easy exercise to show that strong propagation of chaos (HS) can be established on the level of Euler discretisation. This together with Cauchy-Schwarz inequality gives (3.1) This, and the fact that (ỹ i t ) i are i.i.d. allows to conclude that From here and Assumptions in Section 2.2 we have Notice that because of the term 1/ √ N , it is not clear how we can take advantage of the assumption (HW). Fix ǫ > 0 and set mse(A MCA (t, n, N )) ǫ. This leads to the following choice of the parameters t ≈ λ −1 log(ǫ −1 ), N ≈ ǫ −2 , n ≈ ǫ −1 . As the cost of simulating particle system at every step of the Euler scheme is N 2 we have cost(A MCA (t, n, N )) = tnN 2 ≈ log(ǫ −1 )ǫ −5 .
This should be compared to tnN = log(ǫ −1 )ǫ −3 for the simulation of standard SDEs.

Averaged Ergodic Average
As before, we denote by A AEA (t, n, N ) the averaged ergodic average estimator in (AEA). We have To estimate the variance term we note that Hence to estimate the variance we use (HEA) combined with the computation (3.1). Therefore, by the Assumptions in Section 2.2, we have We notice that comparing to the (MCA) case, the last term is multiplied by 1/ √ t. The (asymptotic) cost of the algorithm is the same as before. Again we fix ǫ. The following choice of the parameters ensures that mse(A AEA (t, n, N )) ǫ 2 is t ≈ ǫ −1 , N ≈ ǫ −1 , n ≈ ǫ −1 . The cost consists of two parts: the cost of simulating particle system and the cost of computing averaged ergodic estimator. We have cost(A AEA (t, n, N )) = tnN 2 + tN ≈ ǫ −4 .
Which is an order of magnitude lower than for Monte Carlo average! Notice that similar computation for ergodic average estimator gives mse(A EA (t, n, N )) e −λt + 1 N + 1 n + 1 √ t , leading to the same cost than Monte Carlo average.

ensemble AEA
For the ensemble version of the algorithm we generate M independent systems of particles with N particles in each system. More precisely we define, where (w i,j , i, j) are independent Brownian motions. That way particle within each cloud j * driven by (w i,j * ) i,j * are interacting and are not independent. The particle systems j and j ′ , j = j ′ , driven by (w i,j ) i,j and (w i,j ′ ) i,j respectively, are independent. This idea has been proposed in [HAT16] for the finite time simulations.
We consider ensemble version of AEA.
In fact, all algorithms that we study can have their ensemble versions. By denoting A C-AEA the new method
We notice that comparing to the previous case the last term is multiplied by 1/ √ M . Crucially, the cost of the algorithm growths linearly in M . As the cost growths as N 2 , we are better off taking M ≈ N to balance the error in the last term (instead of taking M = 1 and N 2 ). To make it precise we fix ǫ. The following choice of the parameters ensures that mse(A C-AEA (t, n, N )) ǫ.
The cost of simulating particles and computing the estimator is This is the same as for averaged ergodic estimator. However the above computations do not take under consideration the fact that ensemble algorithms can take full advantage from the parallel computer architecture and therefore will be superior in practice.

Algorithms for Sef-interacting Particle systems
In this section, we present the key ideas improvement in the definition of more efficient algorithm. From the decomposition of the mean square error, we see that different algorithms that we considered only affected the "variance" of the final estimator. Therefore, to improve the efficiency of the algorithm we need to either modified particle system itself or consider different simulation strategies such as Multilevel-Monte Carlo. Here we focus on the former.
Significance of the ergodic theorem is that one can approximate the integral (1.5) by simulating only one path of the process (1.1) rather then the whole particle system. From now on, we will keep the structural assumptions on the coefficients of (1.1), namely One may consider (4.1) We expect that the law of z t approximates the law of x t for large t due to ergodic property (1.2) Processes of the form (4.1) are known in literature as self interacting diffusions. We refer to [KK12] where the convergence to the invariant measure has been established. Notice that there is no need for the particle system any more as one could simply simulate one path of the process to calculate ergodic integral (1.5).
However, motivated by computations in the previous section where mixed ergodic/Monte Carlo average we introduce the corresponding mean self-integrated SDE and its independent copies (z i ) driven by Brownian motion (w i ). Note that "one-particle" approximation of (4.5) is precisely a self-interacting diffusion.
To gain better insight, into the idea of using self-interacting diffusions to approximate McKean-Vlasov SDEs we consider a simple example first.
Example 4.1. Consider a simple scalar McKean-Vlasov SDE x, together with its mean self-integrated version y, and its self-interaction motion SDE z, We stress out that dissipativity comes from the part of the drift that do not depend on measure. We assume α > β. To estimate the convergence rate to the invariant measure we analyse the evolution of the difference of two solutions to the above x SDE initiated at L 2 random variables ξ 1 and ξ 2 . With we have Due to properties of W 2 distance and the fact that the above calculation does not depend on a particular choice of random variables ξ 1 and ξ 2 , we have W 2 2 (L (x ξ 1 t ), L (x ξ 2 t )) e −2(α−β)t W 2 2 (L (ξ 1 ), L (ξ 2 )) . Furthermore, in this simple example we can take advantage from the explicit solutions to calculate Hence if log t < (α − β)t then we have that Let us consider now process z. By integration by part we have that Then we observe that if E[z 0 ] > 0, then E[z t ] stays non-negative and do not cross 0. We observe also In particular Taking limit s → t, we obtain after integration that (4.4) The following computation on z is a tentative to evidence the rate's gain in the convergence rate to equilibrium. To this aim, we analyse the evolution of the difference of two solutions z initiated at L 2 random variables ξ 1 and ξ 2 .
Repeating the previous computation for E[ξ 1 − ξ 2 ] ≥ 0, we also obtain , and we use this weak estimation to derive the L 2 -norm bound. We have first that (e (α+β)s − e 2βs )ds .
As α > β, it is not difficult to check that for t big enough, And we obtain the contraction inequality, which means that after a time t ≥ t 0 , the convergence in L 2 is exponentially fast with a rate α ∧ (α + log(t) − β)/2, that accelerates and becomes with time better than the rate for process x (in α − β).
We have also to show that x and z have the same equilibrium measure. A sufficient condition is the L 2 -convergence for z toward x in time. To not spend to much time on this example, we make the following assumption from the previous Wasserstein contraction for z and from what we obtain for x, that We consider now The convergence is then ensured. Hence, mean self-interacting version z of x is converging to x in L 2 , which means that z convergence to the equilibrium measure π and hence can used as an alternative model for sampling.

Self-averaged ergodic averaged algorithm
By considering error decomposition studies in the previous section, we see that the key ingredient that we ought to understand is the following rate of convergence Key ideas: here we want to take part of a special structure, i.e such that potential V is say convex and W has a small Lipschitz constant we should obtain exponential "forgetting property", as we observed in Example 4.1. For small time this will be bad approximation and because we average the error the bad approximation from the initial time will prevail. Further, we consider particle system of the form for which the error to be investigate is to have in place of (HW). According to our computation on example 4.1, we impose the following assumption The following bound gives a leading error term. We chose λ as a exponent for simplicity as it particular value does not affect asymptotic cost/error analysis. (4.6) Let's introduce the corresponding estimator, an ensemble self-integrated version of AEA.
In what follows, we test (HEW) in the cost analysis. Notice that we cannot test it directly, as we need to do Monte-Carlo approximation for the expectation. We first observe that by exchangeability of the law of the particle systems, Next we consider M independent ensembles to approximate expectations that is where the first and last error are standard MC estimates while the middle one is given by (HEW).

Cost analysis
We analyse the cost of the self-averaged ergodic averaged estimator The mean-square error decomposition reads Reasoning as before, with (HW) replaced by (HEW), we have mse(A ES-AEA (t, n, N t )) e −λt + e −λt (N t ) −1 + 1/n + 1/ tN t .
Note that because of the variance (last term) there is no benefit of exponential decay in t assumed in HEW. Again we fix ǫ. The following choice of the parameters ensures that mse(A ES-AEA (t, n, N t )) ǫ: Notice that the choice implies that tN t to be "constant". Hence we chose N t = N t −1 . Which in the case of t ≈ λ −1 log(ǫ −1 ) implies that N ≈ ǫ −2 / log(ǫ −1 ). Now we study the computational cost of simulating self-interacting diffusions (for the nonlinear interacting kernel). Note that, due nonlinear interactions at every step of the Euler scheme, we have N t particles and each particle interacts with itself from all the past times-steps. Recall also that we take n time steps in each unit time interval so that overall number of steps on the interval [0, t] is tn. With that in mind and the fact that we take N t = N t −1 we have cost(A ES-AEA (t, n, N )) = N 2 1/n 1 + N 2 2/n 2 + N 2 3/n 3 + . . . + N 2 tn/n tn = (nN ) 2 (1 + 1/2 + 1/3 + . . . + 1/tn) = (nN ) 2 tn k=1 1/k ≈ (nN ) 2 log(1 + tn).
Let us consider ensemble implementation of the above algorithm. Reasoning as before, by the Assumption HEW, we have mse(A ES-AEA (t, n, N t )) e −λt + e −λt (N t ) −1 + 1/n + 1/ tN t M .
Hence one more time we achieved order better computational cost in comparison to naive estimator. Note that the presented analysis implies that ensemble of M independent self-interacting diffusions yields best result.

Conclusion and perspectives
We presented a number of different algorithms for the approximation of the invariant measure of the McKean-Vlasov SDE. We have achieved one order better computational costs comparing to naive particle based estimator. On algorithmic side possible extensions are that may consider fixed length window for self-interacting diffusion and it also possible to study Multilevel Monte Carlo strategies in this setup. Overall we anticipate that it will be possible to bring the cost of simulating particle system to the same level as for standard independent copies of SDEs.