A sparse grid approach to balance sheet risk measurement

In this work, we present a numerical method based on a sparse grid approximation to compute the loss distribution of the balance sheet of a financial or an insurance company. We first describe, in a stylised way, the asset and liability dynamics that are used for the numerical estimation of the balance sheet distribution. For the pricing and hedging model, we chose a classical Black & Scholes model with a stochastic interest rate following a Hull & White model. The risk management model describing the evolution of the parameters of the pricing and hedging model is a Gaussian model. The new numerical method is compared with the traditional nested simulation approach. We review the convergence of both methods for estimating the risk indicators under consideration. Finally, we provide numerical results showing that the sparse grid approach is extremely competitive for models of moderate dimension.


Introduction
The goal of this paper is to present a robust and efficient method to numerically assess risks on the balance sheet distribution of, say, an insurance company at a given horizon. In practice, the horizon is chosen to be one year, consistently with the Solvency 2 regulation, the prudential framework for assessing the required solvency capital of a European insurance company.
On a filtered probability space (Ω, A, P, (F_t)_{t≥0}), the balance sheet of the company is a random process summarised, at any time t ≥ 0, by the value of the assets of the company (A_t)_{t≥0} and the value of the liabilities (L_t)_{t≥0}. The quantity of interest is the Profit and Loss (PnL in the sequel) associated to the balance sheet, which is given by
P_t := L_t − A_t, t ≥ 0.
By convention, and adopting the point of view of risk management, we measure the loss as a positive quantity.
On the Liability side, the insurance company has sold a structured financial product which depends on the evolution of a one-dimensional stock price (S_t) and the risk-free interest rate (r_t). Several insurance products are of this type, in particular Unit-Linked contracts (with or without financial guarantees) and Variable Annuity contracts. For those contracts, clients' money is invested in equity and bond markets, while the insurance company may also provide financial guarantees similar to long-term put options. The long maturity of those contracts requires the introduction of an interest rate model, as they are very sensitive to movements of the interest rate curve. The value L_1 is just the price of this product taking into account the value of some risk factors X_1 (stock price, interest rate curve, etc.) at time t = 1 used to calibrate the pricing model.
On the Asset side, the insurance company manages some assets to hedge the risk associated to the product sale. The pricing actually includes a margin which is secured through hedging. The hedging assets are the stock and swaps of several maturities, mostly concentrated on the long term; bond futures are sometimes also included. The hedging portfolio is typically rebalanced on a weekly basis, and the hedging quantities are determined by a financial model, taken to be the same as the liability pricing model, whose inputs are the risk factors X_t at the time t when the hedge is computed.
We describe precisely in Section 2 the pricing and hedging model, the dynamics of the risk factor X and the value of the asset and liability sides of the balance sheet. Let us stress that the risk factor model is given under the so-called real-world probability measure P, which might be objectively calibrated using time series of financial markets, or represent the management view. This real-world model may be (and most of the time is) completely different from the pricing and hedging model, which might be simplified for runtime/tractability purposes, prudent (pricing and hedging include a margin), or constrained by regulation.
Our goal is then to compute various risk indicators for the loss distribution of the balance sheet at one year, namely the distribution of P_1 under the real-world probability measure P, which we denote hereafter by η.
Precisely, we measure the risk associated to η using a (law invariant) risk measure ρ defined over the class of square integrable measures, ρ : P_2(R) → R. First, we consider the so-called Value-at-Risk (V@R), which is defined by the left-side quantile:
V@R_α(η) := inf{ x ∈ R : η((−∞, x]) ≥ α }, α ∈ (0, 1). (1)
We will also work with the class of spectral risk measures: a spectral risk measure is defined as
ρ_h(η) := ∫_0^1 V@R_u(η) h(u) du, (2)
where h is a non-decreasing probability density on [0, 1]. In the numerics, we will focus on the Average Value-at-Risk (AV@R), which is given by
AV@R_α(η) := (1 / (1 − α)) ∫_α^1 V@R_u(η) du (3)
and is a special case of a spectral risk measure. For a law invariant risk measure ρ, we denote by ρ[·] its "lift" on L^2(Ω, A, P; R) =: L^2, namely ρ[X] = ρ(L(X)) for any X ∈ L^2, where L(X) denotes the law of X. The lift ρ_h[·] of a spectral risk measure ρ_h satisfies the following properties:
(1) Monotonicity: ρ_h[X] ≤ ρ_h[Y] whenever X ≤ Y a.s., X, Y ∈ L^2;
(2) Cash invariance: ρ_h[X + c] = ρ_h[X] + c for X ∈ L^2 and c ∈ R;
(3) Positive homogeneity: ρ_h[tX] = t ρ_h[X], t ≥ 0 and X ∈ L^2;
(4) Convexity: ρ_h[tX + (1 − t)Y] ≤ t ρ_h[X] + (1 − t) ρ_h[Y], whenever 0 ≤ t ≤ 1, for X, Y ∈ L^2.
Let us stress the fact that V@R only satisfies (1)-(3). We refer to [7] and the references therein for more insights on risk measures and spectral risk measures.
In our setting, the loss distribution η of the balance sheet PnL is obtained through the following expression: η = p_1 ♯ ν, where ♯ denotes the push-forward operator, p_1 : R^γ → R is the function describing the PnL in terms of the risk factors, and ν stands for the distribution of the risk factors X̄. In practice, the estimation of ρ(η) requires sampling from η. In turn, this demands a sample of the model parameter distribution ν and a numerical approximation of p_1. In this note, we compare two main approaches to form the sample of η given one of ν.
The first one is known as the nested simulation approach and is a two-step method. First, a set of "outer simulations", describing the random values of the risk factors, is drawn. Then, for each value of the risk factors, a sample of "inner simulations" is drawn to compute the various hedges and prices. In this approach, all computations are realised "online". The main advantage of this approach is its simplicity of implementation in practice, described in the first paragraph of Subsection 3.1. However, it is well known that this approach is computationally greedy, even if optimised as in [6]. We also want to stress that, when computing the η-sample, no information about p_1 is stored for future use: for example, if ν is modified, due to the passage of time or a model change, a full recalculation would be required.
The other approach, which we chose to adopt and would like to promote, is a grid approach where the approximation of p_1 is made "offline", by a Monte Carlo approach, and then stored. The numerical computation is then done through a (multi-linear) interpolation on a grid. The main drawback of this approach is that the size of the grid, in high dimension, can become intractable, especially if one uses a regular grid. To partially circumvent this difficulty, we introduce a sparse grid [2], which drastically reduces the number of points to be used (equivalently, the values to be stored) with only a small reduction of the accuracy of the method.
We prove that, for a spectral risk measure, the two approaches give an estimation of ρ(η) which converges to the true value, see Theorem 3.8.
Furthermore, we show in the numerical Section 4 that the grid approach, together with a sparse grid of low level, yields a good approximation of the loss distribution η, and of some related risk measures, while drastically reducing the computational time and keeping information about the balance sheet function p_1. Lastly, this permits a numerical quantification of uncertainty. Indeed, since the computations on the grid are stored, the computation of the distribution of the PnL under other distributions for the parameters is almost instantaneous and can be compared with the results obtained with the initial one. An application to uncertainty estimation is given in the last numerical application.
The rest of the paper is organised as follows. In Section 2, we first describe the mathematical models that are used to describe the evolution of the prices under the risk-neutral measure Q. We then describe precisely how A and L are specified. In Section 3, we describe the two numerical methods used to compute L_t and A_t at any given time t ≥ 0. In particular, we show how to efficiently compute, at time t, the quantities to hold in the hedging portfolio, which are expressed in terms of the derivatives of the claim's price. We also explain how to compute the price of the product and of the assets used to construct the hedging portfolio, leading to the computation of L_t and A_t. We show how to obtain an approximation of the distribution of P_1 under the physical measure P, and we prove an upper bound for the mean square error of the overall procedure. Finally, in Section 4, we present our numerical results, comparing the two methods.

Financial Model
In this section, we give the precise specification of the asset and liability sides of the balance sheet. We also present the risk-neutral model and the real-world model that are used.

Description of the sold product
Let us assume that a company sells a contingent claim at time t = 0 which is a (discretely) path-dependent option with a payoff function G paid at the maturity T > 0, depending upon the evolution of a one-dimensional risky asset's price S. We focus here on a put lookback option, that is, a discretely path-dependent option whose strike at maturity T is given by the maximum of the asset's price S over the times t ∈ {τ_0 = 0, τ_1, · · · , τ_κ = T}, where κ ≥ 1:
G = ( max_{0≤i≤κ} S_{τ_i} − S_T )^+ . (4)
Remark 2.1. The proxy provided above is close to financial guarantees offered in Variable Annuity contracts. Those contracts are structured insurance products composed of a fund investment on top of which both insurance and financial protection are added. In our case, the contract is a Guaranteed Minimum Accumulation Benefit including a ratchet mechanism. At time t = 0, the customer invests his/her money in the underlying fund and will receive at a given maturity the maximum between the terminal fund value and the terminal benefit base, provided he/she is still alive. The terminal benefit base is equal to the maximum of the underlying fund values observed at each anniversary date of the contract (ratchet mechanism). We do not consider the modeling of death/survival in this work, nor the possibility that the client can surrender at any time during the life of the contract.
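A minimal implementation of the discretely monitored lookback payoff above (the function name and the convention of passing the prices at the monitoring dates as a plain array are ours):

```python
import numpy as np

def lookback_put_payoff(path):
    """Payoff G of the discretely monitored lookback put: the strike is
    the maximum of the asset price over the monitoring dates
    tau_0, ..., tau_kappa = T (the entries of `path`), and the payoff
    (strike minus terminal price, floored at zero) is paid at maturity T."""
    path = np.asarray(path, dtype=float)
    return max(path.max() - path[-1], 0.0)
```

For instance, with monitored prices 100, 120, 90 the strike ratchets up to 120 and the payoff is 30.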

Market model under the risk-neutral measure
We assume that all pricing and hedging is done with a market risk-neutral measure Q.
The derivative with a payoff function G as above depends upon a one-dimensional stock's price S = (S t ) t∈[0,T ] . We assume here that the dynamics of the asset under Q are of the Black & Scholes type as described in Section 2.2.2 with a stochastic interest rate r = (r t ) t∈[0,T ] which follows a Hull & White model.
As the payoff G is a proxy for a Variable Annuity guarantee, which is a long-term savings product (in practice, maturities range from 10 to 30 years depending on the product type), the modeling of interest rates is essential, as the product, and therefore the overall balance sheet of the company, is very sensitive to this risk.

The short rate model
Let Θ ∈ R^d (d := 3 in the sequel) be a set of parameters representing some market observations. The short rate evolution is governed by the Hull & White dynamics
dr_s = (μ_{t,Θ}(s) − a r_s) ds + b dB_s, s ∈ [t, T], (5)
where B is a Q-Brownian motion, a and b are real constants and μ_{t,Θ} : [t, T] → R is a function. We refer to [1] for a more complete analysis of the Hull & White short rate model. The parameter μ_{t,Θ} is calibrated using the market observations Θ, so that the model reproduces the interest rate curve observed on the market. It is given by
μ_{t,Θ}(s) = ∂_s f_Θ(t, s) + a f_Θ(t, s) + (b²/(2a)) (1 − e^{−2a(s−t)}), (6)
where f_Θ(t, ·) denotes the instantaneous forward rate curve observed at time t. We refer to the Appendix for a derivation of (6). As a consequence, the Θ parameter must be chosen in order to represent adequately the forward rate curve observed on the market.
We suppose here that the forward rate curve f_Θ(t, ·) is directly observed and is a linear combination of three elementary functions h_{t,1}, h_{t,2}, h_{t,3} from [t, T] to R, given by
f_Θ(t, u) = θ_1 h_{t,1}(u) + θ_2 h_{t,2}(u) + θ_3 h_{t,3}(u), u ∈ [t, T],
where the elementary functions are defined in terms of four fixed real numbers 0 ≤ t_1 < t_2 < t_3 < t_4 ≤ T. The function h_{t,1} (resp. h_{t,2}, h_{t,3}) models the short (resp. middle, long) term structure of the interest rate curve.
Remark 2.2. In practice, the parameters a, b appearing in (5) can be calibrated so that the model reproduces the prices, observed on the market, of some contracts such as swaps or swaptions. We could more generally allow the parameters a, b of the Hull & White model to depend upon the market observations Θ. The parameter Θ should live in a higher-dimensional space to take into account the observed swap(tion)s prices. Regular recalibration of parameters is largely performed by practitioners, in particular when they perform dynamic hedging.
There are several reasons explaining the choice of this model. First of all, it is quite simple to calibrate using the data. In fact, the function µ is directly given as a function of the forward rate curve. We should note again that the choice of keeping a, b fixed through time simplifies the calibration. Secondly, we will see later in Proposition 3.2 that this short rate model, associated to the stock model described below, leads to an exact simulation under the risk-neutral measure. Lastly, closed and easily tractable formulas can be obtained for the prices of the zero-coupon bonds and swaps which are the products used to construct the hedging portfolio and then to compute the value of the company's assets A.
These prices are as follows.
Proposition 2.3. (1) The price at time t of a zero-coupon bond maturing at time u ∈ [t, T] is available in closed form, together with its derivatives with respect to Θ := (θ_1, θ_2, θ_3). (2) Let (0, Θ_0) be the observation made at time 0, and consider a swap contract issued at s = 0, with maturity M > 0, rate R > 0, and coupons paid at every time i ∈ {1, . . . , M}. The price of this contract at time t, and its derivatives with respect to Θ, are likewise available in closed form.
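The paper's closed-form zero-coupon price carries the Θ-dependence through the forward curve basis. As a simplified illustration of the affine structure behind it only (not the paper's formula), here is the textbook Hull & White zero-coupon price under a flat initial forward curve, a simplifying assumption of ours:

```python
import math

def hw_zero_coupon(t, T, r_t, a, b, f):
    """Textbook Hull & White zero-coupon bond price P(t, T), assuming a
    FLAT instantaneous forward curve f(0, u) = f (an illustrative
    simplification of the curve f_Theta of the paper).  `a` and `b` are
    the mean-reversion speed and the volatility of the short rate."""
    B = (1.0 - math.exp(-a * (T - t))) / a
    # log of the deterministic factor A(t, T)
    lnA = (-f * (T - t)                     # ratio of initial discount bonds
           + B * f                          # forward-rate term at time t
           - (b ** 2 / (4.0 * a)) * (1.0 - math.exp(-2.0 * a * t)) * B ** 2)
    return math.exp(lnA - B * r_t)
```

As a sanity check, at t = 0 with r_0 = f the price collapses to exp(−fT), the flat-curve discount factor.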

The stock model
Given the observations Θ of the interest rate factors and the risky asset's price x ∈ (0, ∞), the evolution of the price under the risk-neutral measure Q is given by
dS^{t,x,Θ}_s = S^{t,x,Θ}_s ( r^{t,Θ}_s ds + σ dW̃_s ), S^{t,x,Θ}_t = x, (12)
where σ > 0 and W̃ is another Q-Brownian motion, whose quadratic covariation with B is given by d⟨W̃, B⟩_s = ρ ds, where ρ ∈ [−1, 1]. Equivalently, S^{t,x,Θ} can be written as:
S^{t,x,Θ}_s = x exp( ∫_t^s r^{t,Θ}_u du − σ²(s − t)/2 + σ ( ρ (B_s − B_t) + √(1 − ρ²) (W_s − W_t) ) ),
where W is a Q-Brownian motion, independent of B.
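The decomposition W̃ = ρB + √(1 − ρ²)W can be checked by simulation; a small sketch (the variable names and the chosen value of ρ are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
rho = -0.3
n = 200_000

# Unit-time increments of the two independent Q-Brownian motions B and W
dB = rng.standard_normal(n)
dW = rng.standard_normal(n)

# Increments of W~ = rho * B + sqrt(1 - rho^2) * W
dWt = rho * dB + np.sqrt(1.0 - rho ** 2) * dW

# W~ is again a standard Brownian motion, with d<W~, B>_t = rho dt
corr = np.corrcoef(dWt, dB)[0, 1]
```

The empirical correlation `corr` is close to ρ and the increments of W̃ keep unit variance, as the quadratic covariation formula requires.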
Remark 2.4. In practice, the parameter σ appearing in (12) can be calibrated so that the model reproduces the prices of some derivatives over the risky asset. This can be taken into account by increasing the dimension of the space where Θ lives, and by adding this calibration procedure.
Remark 2.5. Naturally, the general sparse grid approach can be applied to different models and functional representations. We made the choice of using a Black & Scholes model for the stock value and a Hull & White model with the given functional representation in terms of h 1 , h 2 , h 3 for the short rate, since they are convenient to obtain explicit pricing and sensitivities formulae as we show in the following.

Modeling the Balance Sheet
The key point for us is to approximate the distribution of the balance sheet of an insurance company at time t = 1 (here, one year) given the market observations at t = 0. As mentioned in the introduction, the PnL is a process P which can be decomposed as
P_t = L_t − A_t,
where L is the value of the liabilities of the company and A is the value of the assets.
We assume that at time t = 0 the balance sheet is clear, meaning that the company has neither assets nor liabilities, that is, L_0 = A_0 = 0. We describe precisely in the two subsequent sections how these quantities are defined. Importantly, we denote by X̄_t := (S̄_t, Θ̄_t), 0 ≤ t ≤ 1, the stochastic process representing the random evolution of the market parameters under the real-world measure P. Namely, S̄ is the stock price and Θ̄ the interest rate curve parameters as described above in Section 2.2.1. It is important to keep in mind that the models chosen for the stock price S (under Q) and S̄ (under P) are completely different, as they do not serve the same purpose (pricing and hedging on the one hand, risk management of the balance sheet or regulatory assessment of required capital on the other).

Liability side
For any market observation X̄_t := (S̄_t, Θ̄_t), the value L_t =: ℓ(t, S̄_t, Θ̄_t) of the liabilities has to be computed, especially at time t = 1 in our application. The company's liabilities are reduced to one derivative product sold at t = 0. In our setting, the contingent claim's price is simply given by:
ℓ(t, x, Θ) = E^Q[ e^{−∫_t^T r^{t,Θ}_s ds} G(S^{t,x,Θ}) ], (15)
where (r^{t,Θ}_s, S^{t,x,Θ}_s)_{t≤s≤T} are the risk-neutral dynamics of the short rate and stock price, see Sections 2.2.1 and 2.2.2, calibrated from the observed market parameter (x, Θ) at time t. We recall that the payoff G depends on S^{t,x,Θ} only through the values S^{t,x,Θ}_τ, τ ∈ Γ_G := {τ_0, . . . , τ_κ} (κ ≥ 0), see (4).
As explained in more detail below, the computation of P_1 first requires approximating ℓ(t, x, Θ) for (t, x, Θ) on a (possibly stochastic) discrete grid of [0, 1] × (0, ∞) × R^3. This approximation at any point (t, x, Θ) ∈ [0, 1] × (0, ∞) × R^3 basically follows from the simulation of the processes r^{t,Θ} and S^{t,x,Θ} under the risk-neutral measure Q and a Monte Carlo procedure. We will see in Section 3 that the simulation can be done in an exact manner in our model. Remark 2.6. A classical approach to compute ℓ would be to use a dynamic programming principle.
Step by step, it requires (1) to numerically obtain ℓ(1, x, Θ) for all (x, Θ) on the grid with (15), (2) then to iteratively compute ℓ(t_{k+1}, x, Θ) for all (x, Θ) on the grid using a backward induction (16). However, the inner conditional expectation appearing in (16) is not of the form ℓ(t_{k+1}, x, Θ) for x > 0 and Θ ∈ R^3. In fact, the time of the market observation Θ is still t_k, while the discount factor goes only from t_{k+1} to T, in contrast with (15). Therefore, using the dynamic programming equation (16) would require introducing some additional and artificial parameters, namely the time of calibration and the value of r^{t_k,Θ} at time t_{k+1}. This would make the overall procedure heavier, which is why we compute ℓ simply using (15).

Asset side
The company wants to replicate the product with payoff G. The classical theory of mathematical finance ensures that this is equivalent, in theory, to holding a portfolio which cancels the variations of the price of the product with respect to the evolution of the underlying parameters. In our context, the insurer wants to be protected against the variations with respect to the stock price S_t and the interest rate curve, which is modeled through the parameter Θ. The dynamic hedging portfolio is constructed and rebalanced in discrete time, on the time grid Γ := {t_0 = 0 < t_1 < · · · < t_n = 1} (in practice, the portfolio is rebalanced up to the maturity of the product, but in our setting we are only interested in the portfolio's value up to t = 1). At each time t ∈ Γ, the insurer computes the derivatives of the price with respect to S_t and Θ_t, and then buys some financial assets (the stock and swaps) in order to construct a portfolio whose derivatives match those computed. To model this framework, we decompose the hedging portfolio's value A into two parts:
A = A^Δ + A^ρ.
The process A^Δ is the value of the portfolio designed to cancel the variations of the price with respect to S, while A^ρ deals with the variations with respect to Θ.
Remark 2.7. Obviously, since in practice the hedging is done in discrete time and some underlying parameters are not considered, the payoff G is neither exactly replicated nor super-replicated. Therefore the PnL of the company is neither null nor always positive, and the goal of this work is precisely to propose a new numerical method to estimate the distribution of this quantity at time t = 1.
To construct the hedging portfolio, the insurer can buy the underlying stock, together with three swap contracts, defined by some rates R_1, R_2, R_3 > 0 and maturity dates T_1, T_2, T_3 ∈ [1, T]. Interest rate hedging is performed so that the portfolio is insensitive to the variations of the main maturities of the interest rate curve. For long-term products, this means building a hedging portfolio containing several different maturities, from 1 year to 30 years. Here, only 3 maturities, representing the short, medium and long-term parts of the curve, are considered for simplicity. The formula for their price SW^{t,Θ,T_i,R_i}, i = 1, 2, 3, is given in Proposition 2.3 above. We now describe how to compute the quantities of stock and swaps to buy at a time t ∈ Γ to rebalance the hedging portfolio. Denote by Δ (resp. ρ_i, i = 1, 2, 3) the quantities of stock (resp. of swap with rate R_i and maturity date T_i). The value of the portfolio of the company is then
V_t = Δ_t S_t + Σ_{i=1}^{3} ρ^i_t SW^{t,Θ_t,T_i,R_i}.
By Itô's formula, assuming a semimartingale decomposition for the process Θ under P, the dynamics of V − ℓ contain terms in dS_t and dθ^i_t, i = 1, 2, 3. To cancel the risks induced by the variations of the stock price and the interest rate curve, these four terms must vanish.
Those considerations lead to the following construction for the hedging portfolio A = A^Δ + A^ρ.
Δ-hedging: The value of A^Δ at time 1 is the cumulative gain of the discretely rebalanced stock position over the grid Γ. Note that the function Δ has to be computed, at each time t_i, i = 0, . . . , n, and for any market situation (x, Θ) at this time. A method leading to a numerical estimation of Δ is proposed in Section 3.
ρ-hedging: The value of A^ρ_1 is, similarly, the cumulative gain of the swap positions. At time t ∈ [0, T], for a market at (x, Θ), the quantities ρ_i(t, x, Θ), i = 1, 2, 3, of each swap contract required for the hedging are given by the solution of the linear system
Σ_{i=1}^{3} ρ_i(t, x, Θ) (∂SW^{t,Θ,T_i,R_i}/∂θ_j) = (∂ℓ/∂θ_j)(t, x, Θ), j = 1, 2, 3.
One key quantity to compute for us in this setting is thus the vector of sensitivities ((∂ℓ/∂θ_i)(t, x, Θ))_{i=1,2,3}.
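In practice, the ρ-hedging quantities solve a 3×3 linear system matching the swap sensitivities to the liability sensitivities. A minimal sketch with purely hypothetical sensitivity values (all numbers and names are ours, for illustration only):

```python
import numpy as np

# Hypothetical sensitivities, for illustration only:
# dSW[i, j] = derivative of the i-th swap price w.r.t. theta_j
dSW = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.8, 0.1],
                [0.0, 0.3, 0.7]])
# dL[j] = derivative of the liability price w.r.t. theta_j
dL = np.array([0.5, -0.2, 0.4])

# Quantities rho_1, rho_2, rho_3 such that, for each j,
#   sum_i rho_i * dSW[i, j] = dL[j]
rho = np.linalg.solve(dSW.T, dL)
```

The system is well posed as soon as the three swaps have linearly independent sensitivity profiles, which is why maturities spanning the short, medium and long parts of the curve are chosen.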
Remark 2.8. (i) We choose to always use the same swap contracts, issued at t = 0, as hedging instruments. We could have decided to enter for free into swaps (at the swap rate) at each rebalancing time. However, this strategy requires keeping track of the swap rate in order to compute the swap price at the next rebalancing date.
(ii) The T i , R i should be chosen so that they represent some liquid contracts.

The global PnL function
From the previous two sections, we conclude that the PnL of the balance sheet at time 1 can be expressed as
P_1 = p_1((S̄_t, Θ̄_t)_{t∈Γ}), (22)
where (S̄, Θ̄) are the market parameters (risk factors) and the PnL function p_1 : R^γ → R, with γ = 4 × (n + 1), is built from the liability price ℓ and the hedging quantities Δ and ρ_i described above. In the next section, we describe the model we will consider for the market parameter (S̄, Θ̄).

Market parameters under the real-world measure
We describe here the model that will be used for the simulation of the market parameters in the numerical part. Let us insist that this real-world measure P might represent the view of the management on the evolution of the market parameters over the period [0, 1]. As already mentioned, it can be completely different from the model used for risk-neutral pricing.
We assume that we know, or at least are able to estimate, the first two moments of the distribution of X_1 := (log(S̄_1), Θ̄^1_1, Θ̄^2_1, Θ̄^3_1). More precisely, we assume that under P, this random vector has mean μ and covariance matrix V. To model the random process X_t := (log(S̄_t), Θ̄_t), t ∈ [0, 1], under P, we assume that its dynamics are given by
dX^i_t = b_i dt + Σ_{j=0}^{3} c_{ij} dW^j_t, i = 0, 1, 2, 3,
where W^i, i = 0, 1, 2, 3, are independent P-Brownian motions.
Proposition 2.9. There exists at most one set of coefficients b_i, c_{ij}, i, j = 0, 1, 2, 3, such that the random vector X_1 has mean μ and covariance matrix V.
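The uniqueness in Proposition 2.9 mirrors the uniqueness of the Cholesky factorisation: if the matrix (c_ij) is taken lower triangular with positive diagonal (an assumption of this sketch), there is a unique c with c cᵀ = V. A quick numerical check with purely illustrative moments (all values ours):

```python
import numpy as np

# Hypothetical target moments for (X^0_1, ..., X^3_1), illustration only
mu = np.array([4.6, 0.010, 0.015, 0.020])
V = np.array([[0.040, 0.002, 0.001, 0.000],
              [0.002, 0.010, 0.003, 0.001],
              [0.001, 0.003, 0.012, 0.004],
              [0.000, 0.001, 0.004, 0.015]])

# Over [0, 1]: X_1 = X_0 + b + c W_1 with W_1 standard Gaussian.
# Taking X_0 = 0 for simplicity, the drift matches the mean directly
b = mu
c = np.linalg.cholesky(V)   # unique lower-triangular factor of V
```

Sampling `b + c @ rng.standard_normal(4)` then reproduces μ and V exactly in distribution.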
Proof. We refer to the appendix for a proof, cf. Proposition 5.1.
Remark 2.10. It is well-known that it is difficult to estimate accurately the drift parameter in a Black-Scholes model. This makes our computation subject to model risk. We leave it to a future research work to find a robust way to approximate the law of P 1 under P. Nevertheless, let us point out that the grid approach allows us to compute, with minimal re-computation, risk measures for different approximations of the law of P 1 . This is one of the advantages of this method with respect to using "nested simulations", as illustrated in Section 4.

Numerical methods
In this section, we describe the two numerical methods that we use to compute the risk indicators on the balance sheet PnL. The first one is known as the nested simulation approach and is already used in the industry, see the seminal paper [6]. The second one is a sparse grid approach and is designed to be more efficient than the nested simulation approach in the case of moderate dimensions. In the next section, we present numerical simulations that confirm this fact for the moderate-dimensional model that we consider here.

Estimating the risk measure
Given a risk measure ρ and the loss distribution η of the balance sheet at one year, we estimate the quantity of interest ρ(η) by simulating a sample of N i.i.d. random variables (Ψ_j)_{1≤j≤N} with law η and then simply computing ρ(η_N) using the formulae (1), (2) and (3) with η_N instead of η. Here, η_N stands for the empirical measure associated to the Ψ_j, i.e.
η_N := (1/N) Σ_{j=1}^{N} δ_{Ψ_j},
where δ x is the Dirac mass at the point x.
Recall that, in our work, the loss distribution η is obtained through the following expression: η = p_1 ♯ ν, where p_1 is described in (22) and ν stands for the distribution of the market parameters. Namely, ν is the law of the random variable X̄ under the real-world probability measure P.
In order to estimate ρ(η) for a chosen risk measure, we need to be able to sample from η, which involves two steps in our setting. First, we need to be able to sample X̄, and then we use an approximation p^υ_1 of p_1, υ ∈ {N, S}, namely:
• p^N_1 if one chooses the nested simulation approach;
• p^S_1 if one chooses the sparse grid approach.
Eventually, the estimator of ρ(η) is given by ρ(η^υ_N), where η^υ_N := (1/N) Σ_{j=1}^{N} δ_{p^υ_1(X̄_j)}.
To summarise, the two numerical methods have the following steps.
Nested simulation approach.
(1) Outer step: Simulate the N model parameter samples (X̄_j)_{1≤j≤N}.
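Applied to the empirical measure, formulae (1) and (3) reduce to simple order-statistics computations; a minimal Python sketch (function names are ours; the AV@R version ignores the fractional tail-weight correction when αN is not an integer):

```python
import numpy as np

def empirical_var(sample, alpha):
    """Left-side empirical quantile: inf{x : eta_N((-inf, x]) >= alpha}."""
    s = np.sort(np.asarray(sample, dtype=float))
    k = int(np.ceil(alpha * len(s)))      # number of points at or below V@R
    return s[k - 1]

def empirical_avar(sample, alpha):
    """Average of the losses strictly beyond the empirical V@R index
    (exact when alpha * N is an integer)."""
    s = np.sort(np.asarray(sample, dtype=float))
    k = int(np.ceil(alpha * len(s)))
    return s[k:].mean()
```

For instance, on the sample 1, 2, …, 100 with α = 0.95, the empirical V@R is 95 and the empirical AV@R is the average of the five worst losses, 98.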
(2) Inner step: Simulate Ψ_j = p^N_1(X̄_j) using MC simulations. This requires computing the option prices with Monte Carlo estimates, the interest rate derivative prices, and the various sensitivities of the price, see Subsection 3.2.2. All these computations are done using closed-form formulae that are derived below. (3) Estimate the risk measure.
All computations are made "online". Sparse grid approach.
(1) Fix a (sparse) grid V and compute the approximation p^S_1 at each required value on the grid by an MC simulation. This involves exactly the same computations as step (2) above. (2) Simulate the N model parameter samples (X̄_j) and evaluate Ψ_j = p^S_1(X̄_j). (3) Estimate the risk measure.
Computations at step (1) are done "offline"; the next two steps are done "online".
We now describe precisely how to compute p^N_1 and p^S_1.

Nested Simulation approach
In the nested simulation approach, once the market parameters X̄ have been simulated, the function p^N_1 has itself to be computed. Recalling (22), this requires evaluating the functions ℓ^N, Δ^N, ρ^N (approximations of ℓ, Δ, ρ) at the value of the market parameters. This computation is made using again a Monte Carlo approach. In order to compute the risk-neutral expectations in the above formulae, one has to sample the short rate process and compute its integral over [t, T], and also to simulate the stock price S at least at the times τ ∈ Γ_G ∩ [t, T]. Under the model described in Section 2.2, defined by (5) and (12), the simulation is exact, as described in the two following statements, whose proofs are postponed to the appendix. We first start with an easy and well-known observation.
Lemma 3.1. We have the following decomposition for the short rate process:
r^{t,Θ}_s = ξ^t_s + α^{t,Θ}_s, s ∈ [t, T],
where α^{t,Θ} is a deterministic function and (ξ^t_s)_{s∈[t,T]} is the solution of the SDE
dξ^t_s = −a ξ^t_s ds + b dB_s, ξ^t_t = 0.
The following proposition provides a recursive way to produce sample paths of the triplet (ξ^t, A^t, X^{t,x,Θ}) on the discrete grid {t} ∪ (Γ_G ∩ [t, 1]). We are thus in a position to simulate the evolution of (r^{t,Θ}, S^{t,x,Θ}) and the discount factor β^{t,Θ}_T := e^{−∫_t^T r^{t,Θ}_s ds} under the risk-neutral measure Q.
Proposition 3.2. Conditionally upon F_s, the triplet (ξ^t_u, A^t_u, X^{t,x,Θ}_u) is a Gaussian vector of dimension 3, whose mean vector and covariance matrix are given in closed form.

Approximation of the Asset side
The approximation of the asset side is slightly more involved, as it requires the computation of sensitivities with respect to the model parameters: ∂ℓ/∂x and ∂ℓ/∂θ_i, i = 1, 2, 3, see (19), (20) and (21). We choose to compute the sensitivities using a "weight" approach, namely we express them as the expectation of the discounted payoff times a well-chosen random weight. Note that other techniques are available to compute these sensitivities, e.g., automatic differentiation. In our context, we have compared the two methods and observed that the weight approach is more efficient, see Section 5.4 in the Appendix.
We now describe how to obtain the sensitivities in a convenient form for Monte Carlo simulation. Recall that we consider a liability function of the form (15), where G depends upon S^{t,x,Θ} through its values on a finite set Γ_G = {τ_0 = 0, . . . , τ_κ = T} ⊂ [0, T]. In the following, we assume for simplicity that τ_1 > t; otherwise, some deterministic terms have to be added, but the method remains the same.
The Delta: We want to compute
Δ(t, x, Θ) := (∂ℓ/∂x)(t, x, Θ),
and we have the following result.
Proof. We write the expectation as an integral, using the law of the triplet (ξ^t_u, A^t_u, X^{t,x,Θ}_u) conditionally upon F_s: denote by p_Θ(s, a, x, u, ·, ·) the density of the couple (A^{t,a}_u, X^{t,x,Θ}_u) conditionally upon F_s. We have previously seen that this is a Gaussian vector with explicit mean vector and covariance matrix. Thus, using Fubini's theorem, we can differentiate under the integral sign; since the dependence on x appears only in the first density, the sensitivity of the discounted price with respect to the initial stock price is computed only by calculating the derivative of that density with respect to x.
Reinjecting this equality into (43) and rewriting the result in terms of expectations, we finally get the result.
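As a sanity check of the weight approach in a much simpler setting (constant rate, plain Black & Scholes, a call payoff instead of the lookback: all simplifications of ours, not the paper's model), the likelihood-ratio weight for the delta is W_T/(xσT):

```python
import numpy as np
from math import log, sqrt, erf

rng = np.random.default_rng(1)
x, K, r, sigma, T, n = 100.0, 100.0, 0.02, 0.2, 1.0, 500_000

# Terminal Brownian value and stock price under Q (constant rate r)
W_T = sqrt(T) * rng.standard_normal(n)
S_T = x * np.exp((r - 0.5 * sigma ** 2) * T + sigma * W_T)

# Weight approach: Delta = E[ e^{-rT} G(S_T) * W_T / (x * sigma * T) ]
payoff = np.maximum(S_T - K, 0.0)
delta_mc = np.mean(np.exp(-r * T) * payoff * W_T / (x * sigma * T))

# Closed-form Black & Scholes delta Phi(d1) for comparison
d1 = (log(x / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * sqrt(T))
delta_cf = 0.5 * (1.0 + erf(d1 / sqrt(2.0)))
```

The Monte Carlo estimate agrees with the closed-form delta up to the statistical error, without differentiating the (possibly non-smooth) payoff.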
The function Δ(t_i, ·) is computed using the Monte Carlo estimator given in (40), as in (37), where we simulate in addition the weight H.
Sensitivities with respect to the interest rate curve: We consider now the derivatives with respect to the interest rate curve. For i = 1, 2, 3, we want to compute (∂ℓ/∂θ_i)(t, x, Θ), again expressed as the expectation of the discounted payoff times a random weight, where μ_{s,u} and Σ_{s,u} are the mean and the covariance matrix of the Gaussian vector (ξ^t_u, A^t_u, X^{t,x,Θ}_u) conditionally upon F_s.
Proof. We perform an analysis similar to the one above. Here, the computations are more involved since every density depends upon the a_i, i = 1, 2, 3, but the idea is the same as before.
The only difference is that, when we use (35)-(36) to differentiate, the short rate itself appears in the formulae. This is not a problem, as we can rewrite the previous integral as an integral over R^{3κ}, with the short rate process taken at the times τ_l, l = 1, . . . , κ, as new variables to integrate against.
This method of calculating derivatives allows us to compute the function ℓ(t, x, Θ) and its four derivatives with only one Monte Carlo simulation. Furthermore, given a risk-neutral scenario, each quantity involved in formula (41) can be evaluated in closed form.

Sparse Grid Approach
The nested simulation approach requires the approximation of the function ℓ and its derivatives ∂ℓ/∂x, ∂ℓ/∂θ_i, i = 1, 2, 3, for each path (S̄^j_t, Θ̄^j_t)_{t∈Γ} of the market parameters. These values are computed on the fly, which is quite time consuming.
We suggest here an alternative method which pre-computes the quantities of interest (ℓ and its derivatives) and stores them. The requested value for a given market parameter is then obtained by an interpolation procedure.
A first simple approach is to consider an equidistant grid on the domain A := Π_{p=1}^{d} [m_p, M_p], which is a truncation of the support of X̄ (a subset of R^d, d = 4 in our setting). One can then use a multi-linear interpolation to reconstruct the function on the whole space. If one sets 2^p points in each dimension, the total number of points will be 2^{dp} for one function at one given time, and overall (n + 1) 2^{dp} values to store (here d = 4). This rapidly becomes too large, especially if one allows the number of market parameters to grow. This is a typical example of the "curse of dimensionality" encountered in numerical analysis when dealing with high-dimensional problems.
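For concreteness, the full-grid storage count above can be compared with the classical sparse-grid bound O(2^p p^{d−1}) of [2] (constants are ignored in this back-of-the-envelope sketch):

```python
def full_grid_points(d, p):
    """Points of a regular grid with 2^p points per dimension."""
    return 2 ** (d * p)

def sparse_grid_points_bound(d, p):
    """Classical O(2^p * p^(d-1)) bound on the number of sparse grid
    points at level p in dimension d (multiplicative constants ignored)."""
    return 2 ** p * p ** (d - 1)

# d = 4 market parameters, level p = 5
full = full_grid_points(4, 5)            # 2^20 = 1_048_576 points
sparse = sparse_grid_points_bound(4, 5)  # 32 * 125 = 4_000 points
```

Even at this modest level, the sparse grid stores orders of magnitude fewer values per function and per time step, and the gap widens quickly with d and p.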
Instead of considering a regular grid, we shall rely on sparse grids, which lower the number of points required to store the numerical approximation of the function. We now briefly present the main concepts related to sparse grids; see [2] for a comprehensive survey. In our numerical examples, see Section 4, the sparse grid is computed using the StOpt C++ library [5].
For each multi-index k ≤ l, we define a grid mesh h_k = 2^{−k} and grid points y_{k,j}. Using the hat function, we can associate to the previous grid a set of nodal basis functions. When using a "full" linear interpolation, the function is approximated using the whole set of nodal basis functions at the finest level l. Instead, we consider the sparse grid nodal space of order p defined by (l_i > 0 and j_i odd) or (l_i = 0), for i = 1, . . . , d. (52)
For a function ψ : A → R with support in A, we define its associated V_κ-interpolator, where the operator θ_{l,j} can be defined recursively in terms of r, the dimension of l, by:
- ψ(y_{l,j}), if r = 0;
- θ_{l−,j−}(ψ(·, y^r_{l_r,j_r}); A−), if l_r = 0;
- θ_{l−,j−}(ψ(·, y^r_{l_r,j_r}); A−) − (1/2) θ_{l−,j−}(ψ(·, y^r_{l_r,j_r−1}); A−) − (1/2) θ_{l−,j−}(ψ(·, y^r_{l_r,j_r+1}); A−), if l_r > 0.
Let us now introduce the approximations that we use to compute the loss distribution. These functions are built by computing the coefficients appearing in (53), which are then stored in memory. For p_S, say, this amounts to computing the function N on the sparse grid V_p, and this is done by Monte Carlo simulation as in the previous approach; recall the definition of N in (37).
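To fix ideas, here is a minimal one-dimensional sketch of hierarchical interpolation with hat functions. It uses interior points only, so it implicitly assumes the function vanishes on the boundary of [0, 1]; the sparse grid construction of [2] tensorises this hierarchy across dimensions:

```python
import numpy as np

def hat(l, j, x):
    # hat basis function of level l centred at the grid point j / 2**l
    return np.maximum(0.0, 1.0 - np.abs(x * 2**l - j))

def hierarchize_1d(f, L):
    # hierarchical surpluses alpha[(l, j)] on the interior points of [0, 1]:
    # levels l = 1..L, odd indices j, grid points y_{l,j} = j / 2**l
    alpha = {}
    for l in range(1, L + 1):
        for j in range(1, 2**l, 2):
            y = j / 2**l
            # surplus = nodal value minus the coarser-level interpolant at y
            s = f(y)
            for (lp, jp), a in alpha.items():
                s -= a * hat(lp, jp, y)
            alpha[(l, j)] = float(s)
    return alpha

def interp_1d(alpha, x):
    # evaluate the hierarchical interpolant at x
    return sum(a * hat(l, j, x) for (l, j), a in alpha.items())
```

By construction the interpolant reproduces the function exactly at every grid node up to level L, and the surpluses decay rapidly for smooth functions, which is what makes discarding most cross-level combinations affordable.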
Complexity. The main limitation of this method is the memory usage and the time needed to pre-compute the functions. Both are proportional to the number of points in the grid, which can be estimated to be of order O(2^κ κ^{d−1}), see Proposition 4.1 in [2]. Let us insist on the fact that this work is done "offline", in contrast with the nested simulation approach. For the "online" computations, the main effort lies in the evaluation of the function, which is slightly more involved than a linear interpolation and is of O(κ), where κ is the maximum level chosen.
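As a quick sanity check on these orders of magnitude, the number of interior points of a full tensor grid versus a sparse grid of level L can be counted directly (a standard computation, in the spirit of [2]; the closed form below is for the interior construction without boundary points):

```python
from math import comb

def full_grid_points(d, L):
    # interior points of the full tensor-product grid of level L in dimension d
    return (2**L - 1)**d

def sparse_grid_points(d, L):
    # interior sparse grid of level L: multi-indices l with l_i >= 1 and
    # |l|_1 <= L + d - 1, level l contributing prod_i 2**(l_i - 1) points
    return sum(2**(k - d) * comb(k - 1, d - 1) for k in range(d, L + d))
```

For d = 4 the sparse count grows like O(2^L L^3) instead of 2^{4L}. Note that the grids used in Section 4 include boundary points (the l_i = 0 case in (52)), so their cardinalities (81, 297, 945) are larger than this interior-only count.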

Convergence study
The goal is to obtain, in an efficient way, a reasonable approximation of the risk associated with the loss distribution of the balance sheet. In this section, we explain why the methods introduced above indeed provide good approximations of the risk indicators. We also study theoretically the numerical complexity of both methods in terms of memory and time consumption.

Error analysis
For the risk estimation, we will investigate a root Mean Square Error (rMSE) of the following form. The expectation operator E[·] acts under P^{⊗N} ⊗ Q^{⊗M}; namely, it averages both over the simulation of the market parameters under the real-world measure used for calibration and over the risk-neutral evolution of the market model under the pricing measure.
The first observation is that, under reasonable assumptions on the risk measure used in the risk indicator, the error made in the numerical simulation can be separated into two main contributions: the error due to the sampling of the loss distribution, coming from the sampling of the market parameters, and the error made when approximating the different pricing and hedging functions. Lemma 3.6. Assume that the risk measure has a monotonic and cash-invariant lift. Then the stated bound holds. Proof. We denote by X_N(ω) the random variable with distribution η_N(ω). The first inequality follows, which leads to the stated bound; by symmetry, we easily get the reverse bound, and the proof is then concluded using the Minkowski inequality.
The error due to the approximation of the function p_1 is well understood when the function is smooth enough. Note that the asset side of the function is quite involved, and we will not attempt to obtain conditions for smoothness of the overall function p_1. We will simply review the error made on the liability part (1, ·), assuming that the mapping G is bounded. Even though this cannot be almost surely true in the model presented above, we will assume that β^{1,Θ}_T is bounded in the discussion below. A more precise analysis should take care of these extreme events, which arise with small probability. Another possibility would be to force the interest rate to be non-negative, by truncation or by considering a CIR-type model in (5).
Lemma 3.7. Assume that (58) holds true, and recall the definition of N in (37). Then the stated bound holds. Proof. We denote by c the bound on the mapping (x, Θ) → β^{1,Θ}_T G(S^{1,x,Θ}) (recall the discussion after equation (58)), and thus the bound on (1, ·). For the reader's convenience, we introduce the quantities Σ^j_M, observe that E[Σ^j_M] = 0, and recall that the (Σ^j_M) are i.i.d. Using Hoeffding's inequality and then the independence property, we obtain the stated estimates. Now set ξ := cM log(N); a direct computation then concludes the proof.
We conclude this section by giving the overall estimation error induced by the numerical procedure above. We will admit that the upper bound given for (1, ·) in Lemma 3.7 holds true for the PnL function p(1, ·), with a scaling by n coming from the number of rebalancing dates.
Theorem 3.8. Assume that h is a spectral risk measure. Then the following holds, for some α > 0: (1) for the Nested Simulation approach; (2) for the Sparse Grid approach with maximum level κ. Proof. 1. We first show (63). Indeed, we start from the decomposition above and observe the stated identity. Let us denote by (x, Θ) → φ_k(x, Θ) = e^{−∫_1^T r^{1,Θ,k}_s ds} G(S^{1,x,Θ,k}), which is a random function, as it depends on the random realisation of the (r, S) process. Under (58), φ_k is smooth enough to apply the results of Proposition 4.1 in [2], and we obtain the interpolation bound. We then observe the remaining estimate; the proof of (63) is concluded by combining the above inequality with (64) and Lemma 3.7. 2. We now prove (61). Applying Lemma 3.6, we obtain the stated decomposition. The second term on the right-hand side of the above inequality is controlled using Lemma 3.7. We now study the first term on the right-hand side, which is the error introduced by the sampling of the loss distribution. Applying Corollary 11 in [7] to the spectral risk measure, we first get a bound in terms of the Wasserstein distance. We then use Theorem 1 in [4] to bound the Wasserstein distance, which concludes the proof of this step. 3. To prove (62), we follow arguments similar to those of step 2, but using (63) instead of invoking Lemma 3.7.
Remark 3.9. We can compare the bound obtained for the nested simulation with the ones in [6]. Using a different approach, the authors prove a very nice bound on the overall error for the V@R (which is not a spectral risk measure) and the AV@R. Note that the term 1/M is obtained by cancellation of the first-order term through an error expansion. It would be interesting to understand under which assumptions their bound can be retrieved in our setting of general spectral risk measures. This topic is left for further research.
We conclude this section with a short account of the numerical complexity of the two methods.
The Nested Simulation approach is a pure "online" method, very simple to implement but with a huge drawback in terms of running time. Each time an estimation is requested, the numerical complexity is overall of order nNM, where, recall, n is the number of rebalancing dates, M the number of samples for the risk-neutral simulation and N the number of samples for the real-world simulation. The memory requirement comes only from the estimation of the loss distribution and is of order N.
As already mentioned, the Sparse Grid approach is both an "online" and an "offline" method. In terms of memory requirements, it is thus greedier than the nested simulation approach. On top of the memory needed to store the sample distribution (of order N), memory is also needed to store the sparse grid approximation p_S; the requirement is of order O(2^κ κ^{d−1}). In terms of running time, the gain is important, as the complexity of evaluating p_S is of O(κ) only, where κ is the maximum level used.

Numerics
In the numerical applications below, we will compare the loss distributions obtained via our two numerical procedures by computing the Wasserstein distance between the two empirical distributions. Since the loss distribution is one-dimensional, we use the following formula [8] for two probability distributions η and η̃ on R. In the setting of empirical distributions, the above distance is easily computed. Suppose η = (1/N) Σ_{i=1}^N δ_{x_i} and η̃ = (1/N) Σ_{i=1}^N δ_{y_i}. We straightforwardly compute the distance, where the subscript (i) refers to the i-th order statistic of the sample, since x_(i) (resp. y_(i)) is simply the i/N-quantile of η (resp. η̃).
Besides the Wasserstein distance between the two empirical distributions, we will also compare the estimated V@R and AV@R, which are computed in a similar way. Indeed, for α ∈ (0, 1], we have the stated formulae, where (i_α − 1)/N < α ≤ i_α/N, i_α ∈ {1, . . . , N}. Using the formulae (67), (68) and (69), we now present numerical results showing the efficiency and usefulness of the sparse grid approach. We start with a comparison with the classically used nested simulation approach.
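For one-dimensional empirical distributions with the same sample size, all three quantities reduce to sorting. A minimal sketch follows; the AV@R estimator below is the usual tail average, which may differ slightly from the paper's exact convention in (68)-(69):

```python
import numpy as np

def wasserstein_1d(x, y):
    # W1 distance between two empirical distributions with equal size N:
    # mean absolute difference of the order statistics
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

def var_avar(losses, alpha):
    # empirical V@R and AV@R at level alpha, losses counted as positive values
    xs = np.sort(losses)
    n = len(xs)
    i = int(np.ceil(alpha * n))   # smallest i with i/n >= alpha
    var = xs[i - 1]
    avar = xs[i - 1:].mean()      # average of the losses beyond the V@R
    return var, avar
```

Both computations cost O(N log N) from the sort, so comparing the empirical distributions is negligible next to generating them.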

Sparse grid approach versus nested simulations approach
We computed the empirical distribution of the PnL at the one-year horizon using the nested simulations approach, recall Section 3.2, and the sparse grid approach, recall Section 3.3. For both methods, we used a sample of size N = 11000 describing the real-world evolution of S and Θ.
For the nested simulations approach, using the overall error bound given in [6], we want to approximate the risk-neutral expectations with Monte Carlo samples of size M ≈ √N, that is M = 100. In practice, however, we observe that convergence has not yet occurred, and we see non-negligible changes in the risk measures under consideration, see Table 1. In this table, we compute the Wasserstein distance with respect to the distribution obtained for M = 10000. Operational constraints do not allow us to change the size of the sample used for the Monte Carlo simulations under P. We thus consider a sample of size M = 2000 for the risk-neutral Monte Carlo simulations. Following Proposition 2.9, we calibrated a Gaussian model such that (X_1, (θ_1)_1, (θ_2)_1, (θ_3)_1) has the given mean and covariance matrix.
In this setting, the nested simulations method was tested with the Put Lookback option described in Section 2.1, with maturity T = 30 years. Figure 2 shows the resulting PnL distribution. We next looked at the grid method. Figure 3 shows the resulting PnL distributions for sparse grids of levels 1, 2 and 3, which contain 81, 297 and 945 points respectively. For each level, we chose the number of risk-neutral simulations M so that the error induced by the Monte Carlo estimation is small compared with the sparse interpolation error, meaning that we can run the program multiple times without significantly changing the outcome. Furthermore, as in the nested simulations case, we chose M so that increasing it has no effect on the distribution. Empirically, we chose M = 20000 for the sparse grids of levels 1, 2 and 3.
Figure 4 compares the distribution obtained with nested simulations to the distribution obtained with the sparse grid of level 3. Table 2 compares computational times, and Table 3 compares the V@R and AV@R of the empirical distributions obtained in each case.
We observe that the computational time on the sparse grid of level 3 with M = 20000 is similar to the one for the nested simulations with only M = 2000. Moreover, a significant gain in time is already obtained with the sparse grid of level 2, which gives good results, see Table 2. As already observed in Remark 3.5, this gain in time can be further improved by parallelisation of the computations. In addition, we observe that, once the computations on the grid are done, the PnL distribution is obtained almost immediately. This is a key feature of the method, since the computations on the grid can be kept. Indeed, if one needs to change the distribution of (S, Θ) under P, say because the view of the risk management on the evolution of the market parameters has changed, then they can easily be re-used. In the next section, we give an application in this direction.
Figure 2. PnL distribution, nested simulations
Level of the sparse grid               l = 1          l = 2           l = 3        Nested simulations
Computations on the grid               9 min 30 sec   35 min 10 sec   1 h 58 min   -
Computation of the PnL distribution    2 sec          4 sec           8 sec        2 h 15 min
Table 2. Computational times

Model risk
As mentioned above, an interesting feature of the sparse grid approach developed in this paper is the ability to change the distribution of the processes X = log(S) and Θ under the real-world probability P. In this section, we use the model described in Section 2.4 to simulate a first sample. Then, we consider some uncertainty over the estimated moments μ, V of (X_1, Θ_1) used to calibrate the Gaussian model: we only assume that these estimates are known up to a small error. To better understand the risk associated with this uncertainty under P, we simulate two "extreme" new samples of (X, Θ), in which every moment used to calibrate the model is multiplied by 0.95 (resp. 1.05). Thanks to the grid computations done previously with the initial model, we are in a position to compute almost instantaneously the empirical PnL distributions associated with these two new samples. Table 4 shows the Wasserstein distance between the initial distribution and the two distributions obtained for the shifted parameters, together with the V@R and AV@R obtained at different quantile levels. We observe that, with these small changes, the distributions remain quite close to each other. The main discrepancies are obtained for the diminished moments.


Proof of Proposition 2.9
We provide a recursive proof of Proposition 2.9, which allows one to compute effectively the coefficients defining the processes.
Suppose, more generally, that a vector μ ∈ R^n and a covariance matrix V ∈ R^{n×n} are given. We look for n processes X^i (i = 1, . . . , n) defined by the stated dynamics, where W^j_t (j = 1, . . . , n) are n independent Brownian motions, and b ∈ R^n, C = (c_ij) ∈ R^{n×n}.
Proposition 5.1. There is at most one (b, C) ∈ R^n × R^{n×n} such that:
- c_ij = 0 whenever i > j,
- E[X^i_1] = μ^i (i = 1, . . . , n),
- Cov(X^i_1, X^j_1) = V_ij (i, j = 1, . . . , n).
Proof. We have E[X^i_1] = X^i_0 + b^i, so setting b^i := μ^i − X^i_0 ensures E[X^i_1] = μ^i for all i. We next determine the matrix C through a recursive algorithm.
Ascending step: Let i, l ∈ {1, . . . , n}, and assume that c_ik, k > l, and c_lk, k ≥ l, are already determined. Then we can determine c_il.
Indeed, if i > l, we set c_il = 0. If i < l, we use the stated identity and set c_il accordingly.
Back step: Let l ∈ {1, . . . , n} and assume c_lk is determined for k > l. Then we can determine c_ll, again from the stated identity.
5.3. Proofs of Lemma 3.1 and Proposition 3.2. We prove here Lemma 3.1 and Proposition 3.2, which give a recursive procedure for exact simulation under Q.
and an easy computation using equality (6) shows the stated identity, where α^{t,Θ} is defined by (32). In addition, if ξ^t is defined by (33), applying Itô's formula again gives the result, which ends the proof, as r^{t,Θ}_t = α^{t,Θ}_t by (75). We now turn to the proof of Proposition 3.2.
Proof of Proposition 3.2. Let t ≤ s ≤ u ≤ T. Itô's formula implies that the triplet (ξ^t_r, A^{t,s}_r, X^{t,x,Θ}_r)_{r∈[s,u]} is the solution of the following linear stochastic differential equation, with initial conditions ξ_s = ξ^t_s, A_s = 0, X_s = X^{t,x,Θ}_s. This linear equation has a closed-form solution. Conditionally upon F_s, the vector (ξ^t_u, A^{t,s}_u, X^{t,x,Θ}_u) is Gaussian, and the expectations and covariances given in the Proposition are easily computed from the above formulae.
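Coming back to the calibration of Proposition 5.1: when V is positive definite, the recursion can also be realised in one shot, since the upper-triangular C with C C^T = V is, up to column signs, a permuted Cholesky factor. A sketch, where X0 denotes the vector of initial values:

```python
import numpy as np

def calibrate_gaussian(mu, V, X0):
    # b matches the mean: E[X_1] = X0 + b
    b = np.asarray(mu, dtype=float) - np.asarray(X0, dtype=float)
    # Upper-triangular C with C @ C.T = V ("reverse" Cholesky):
    # flip rows/columns, take the standard lower Cholesky factor, flip back.
    n = len(b)
    P = np.eye(n)[::-1]            # anti-diagonal permutation, P @ P = I
    L = np.linalg.cholesky(P @ V @ P)
    C = P @ L @ P                  # upper triangular by construction
    return b, C
```

This matches the recursion's output because the upper-triangular factorisation of a positive-definite matrix is unique once the diagonal signs are fixed.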

Comparison with Automatic Differentiation.
We use the stan math C++ library [3], which allows one to easily implement a (reverse-mode) Automatic Differentiation procedure in order to deduce the derivatives directly from the Monte Carlo computation of the function L. We compare the results obtained with the weights method developed here to the results obtained by Automatic Differentiation. We also provide a comparison of the computational times.
Precisely, we computed the derivatives with respect to the 4 variables (x, θ_1, θ_2, θ_3) at 256 points (x^i, θ^i_1, θ^i_2, θ^i_3), i = 1, . . . , 256. In the case of Automatic Differentiation, we take only 1000 risk-neutral simulations to compute the function, while for the approach involving the computation of weights, we took 10000 simulations to compute the function and its four derivatives. Table 5 sums up the time taken by the computations. Clearly, the gain in time resulting from the weights algorithm is significant. Additionally, Figure 5 shows the accuracy of the computation using the weights derivatives in comparison with Automatic Differentiation.
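The principle behind the weights approach can be illustrated on a toy example. For X ~ N(θ, σ²), differentiating the Gaussian density gives ∂_θ E[f(X)] = E[f(X)(X − θ)/σ²], so the same sample yields both the expectation and its derivative. This is a generic likelihood-ratio sketch, not the paper's actual weight H:

```python
import numpy as np

def price_and_delta_lr(theta, sigma, payoff, M=200000, seed=0):
    # Monte Carlo estimate of E[payoff(X)], X ~ N(theta, sigma^2), together
    # with its theta-derivative via the likelihood-ratio ("weight")
    # H = (X - theta) / sigma^2, reusing the same sample for both.
    rng = np.random.default_rng(seed)
    x = theta + sigma * rng.standard_normal(M)
    p = payoff(x)
    return p.mean(), (p * (x - theta) / sigma**2).mean()
```

As in the paper's setting, the marginal cost of the derivative is a single extra weighted average over an existing simulation, which is why the weights run remains faster than Automatic Differentiation even with ten times more paths.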

Algorithm                   Put Lookback option
Automatic Differentiation   179 sec
Weights                     97 sec
Table 5. Computational times