Monte-Carlo methods for the pricing of American options: a semilinear BSDE point of view

We extend the viscosity solution characterization proved in [5] for call/put American option prices to the case of a general payoff function in a multi-dimensional setting: the price satisfies a semilinear re-action/diffusion type equation. Based on this, we propose two new numerical schemes inspired by the branching processes based algorithm of [8]. Our numerical experiments show that approximating the discontinu-ous driver of the associated reaction/diffusion PDE by local polynomials is not efficient, while a simple randomization procedure provides very good results.


Introduction
An American option is a financial contract which can be exercised by its holder at any time until a given future date, called maturity. When it is exercised, the holder receives a payoff that depends on the value of the underlying assets.
Putting this problem in a mathematical context, let us first consider the case of a single stock (non-dividend paying) market under the famous Black and Scholes setting, [6]. Namely, let (Ω, F, (F t ) t≥0 , P) be a filtered probability space carrying a standard one dimensional Brownian motion W and let us model the stock price process X as under the risk natural probability. Here, x > 0 is the stock price at time t, r > 0 is the risk-free interest rate and σ > 0 is the volatility. Then, the arbitrage free value at time t of an American option maturing at T ≥ t is given by where T [t,T ] is the collection of [t, T ]-valued stopping times, and g is the payoff function, say continuous, see e.g. [7] and the references therein. Typical examples are g(x ) = (x − K) + , for a call option (K − x ) + , for a put option, where K > 0 denotes the strike price. By construction, V (·, X) ≥ g(X), and the option should be exercised only when V (·, X)=g(X). This leads to define the following two regions: • the continuation region: • the stopping (or the exercise) region: These are the basics of the common formulation of the American option price as a free boundary problem, which already appears in McKean [15]: V solves a heat-equation type linear parabolic problem on C and equals g on S, with the constraint of being always greater than g. Another formulation is based on the quasivariational approach of Bensoussan and Lions [3]: the price solves (at least in the viscosity solution sense) the quasi-variational partial differential equation min (rϕ − L BS ϕ, ϕ − g) = 0, on [0, T ) × (0, ∞) ϕ(T, ·) = g, on (0, ∞) in which L BS is the Dynkin operator associated to X: where D and D 2 are the Jacobian and Hessian operators. In this paper, we focus on another formulation that can be found in [5], see also [4] and the references therein. The American option valuation problem can be stated in terms of a semilinear Black and Scholes partial differential equation set on a fixed domain, namely: where q is a nonlinear reaction term defined as in which c is a certain cash flow function, e.g. c = rK for a put option, and H is the Heaviside function.
Note that this semilinear Black and Scholes equation does not make sense if we consider classical solutions because of the discontinuity of y → q(x, y). It has to be considered in the discontinuous viscosity solution sense, see e.g. Crandall, Ishii and Lions [11]. Namely, even if V is continuous, the supersolution property should be stated in terms of the lower-semicontinuous envelope of q, the other way round for the subsolution property. This means in particular that the super-and subsolution properties are not defined with respect to the same operator. Still, thanks to the very specific monotonicity of y → q(x, y), it is proved in [5] that, within the Black and Scholes model, the American option price in the unique solution of (2) in the appropriate sense.
In this work, we first extend the characterization of [5] in terms of (2) to a general payoff function and to a general market model, see Section 2. Then, we suggest two numerical schemes based on this formulation. The general idea consists in (formally) identifying the solution V of (2) to the solution (Y, Z) of the backward stochastic differential equation In the first algorithm, we follow the approach of Bouchard et al. [8] and approximate the nonlinear driver q by local polynomials so as to be able to apply an extended version of the pure forward branching processes based Feynman-Kac representation of the Kolmogorov-Petrovskii-Piskunov equation, see [13,14]. Unfortunately, our numerical experiments show that this algorithm is quite unstable, see Section 3.1.
In the second algorithm, we do not try to approximate q by local polynomials but in place regularize it with a noise by replacing q(X, e r· Y ) by c(X)1 {g(X)+ ≥e r· Y } , in which is an independent random variable. When the variance of vanishes, this provides a converging estimator. For given, the corresponding Y is estimated by using the approach of Bouchard et al. [8] with (random) polynomial (t, x, y, y ) → c(x)1 {g(x)+ ≥e rt y } and particles that can only die (without creating any children). This algorithm turns out to be very precise, see Section 3.2.

Non-linear parabolic equation representation
From now on, we take Ω as the space of R d -valued continuous maps on [0, T ] starting at 0, endowed with the Wiener measure P. We let W denote the canonical process and let (F t ) t≤T be its completed filtration. Given t ∈ [0, T ] and x ∈ (0, ∞) d , we consider a financial market with d stocks whose prices process X t,x evolves according to in which r ∈ R is a constant 1 , the risk free interest rate, and σ : [0, T ] × (0, ∞) d → R d×d is a matrix valuedfunction that is assumed to be continuous and uniformly Lipschitz in its second component. We also assume thatσ We also assume that P is the only (equivalent) probability measure under which e −r(·−t) X t,x is a (local) martingale, for (t, x) ∈ [0, T ] × (0, ∞) d . Then, given a continuous payoff function g : (0, ∞) d → R, with polynomial growth, the price of the American option with payoff g is given by in which T [t,T ] is the collection of [t, T ]-valued stopping times. See [7].
x) is continuous with polynomial growth follows from standard estimates under the above assumptions. In particular, the set The aim of this section is to prove that V is a viscosity solution of the non-linear parabolic equation for a suitable reaction function q on (0, ∞) d × R. In the above, L denotes the Dynkin operator associated to (3): for a smooth function ϕ.
To be more precise, we define the function q by where c is a measurable map satisfying the following Assumption 2.2.
Before providing examples of such a function c, let us make some important observations.
which is typically the case in practice (e.g. because g is non-negative and the probability that then it can not be touched from above by a C 2 function at a point at which it is not C 1 , which implies that one can forget some singularity points in the verification of Assumption 2.2 above. In Section 3, we shall suggest Monte-Carlo based numerical methods for the computation of V . One can then try to minimize the variance of the estimator over the choice of c. However, it seems natural to choose the function c so that g is actually a viscosity solution of In the numerical study of Section 3, this choice coincides with the c with the minimal absolute value, which intuitively should correspond to the one minimizing the variance of the Monte-Carlo estimator. We leave the theoretical study of this variance minimization problem to future researches.

Example 2.4. Let us consider the following examples in whichσ is a constant matrix with
• For d = 1 and a put g : x ∈ (0, ∞) → [K − x] + , the function c is given by the constant rK. This is one of the cases treated in [5]. ' • For d = 1 and a strangle g : the function c can be any continuous function equal to rK 1 on (0, K 1 ) and equal to −rK 2 on (K 2 , ∞), whenever V > 0.
• For d = 2 and a put on arithmetic mean g : x i ] + , we can take c = rK.
• For d = 2 and a put on geometric mean g : Since q is discontinuous, we need to consider (4) in the sense of viscosity solutions for discontinuous operators. More precisely, let q * and q * denote the lower-and upper-semicontinuous envelopes of q. We say that a lowersemicontinuous function v is a viscosity supersolution of (4) if it is a viscosity supersolution of Similarly, we say that a upper-semicontinuous function v is a viscosity subsolution of (4) if it is a viscosity subsolution of We say that a continuous function is a viscosity solution of (4) if it is both a viscosity super-and subsolution Then, we have the following characterization of the American option price, which extends the result of [5] to our context. Recall Remark 2.1 Theorem 2.5. Let c be as in Assumption 2.2. Then, V is a viscosity solution of (4). It has a polynomial growth.
Let us now assume that (t, x) ∈ S := {V = g}. In particular, ϕ(t, x) = V (t, x) = g(x) and therefore q * (x, ϕ(t, x)) = q * (x, V (t, x)) = c(x). Since V ≥ g, (t, x) is also a maximum of g − ϕ and ϕ satisfies This viscosity solution property can be complemented with a comparison principle as in [5]. Combined with Theorem 2.5, it shows that V is the unique viscosity solution of (4) with polynomial growth. Proposition 2.6. Let the conditions of Theorem 2.5 hold. Let v and w be respectively a super-and a subsolution of (4), with polynomial growth.

Monte-Carlo estimation
The solution of (4) is formally related to the solution (Y, Z) ∈ S 2 × L 2 of the backward stochastic differential equation In practice the above BSDE is not well-posed because q is not continuous. However, it can be smoothed out for the purpose of numerical approximations. In the following, we write E s [·] to denote the expectation given F s , s ≤ T .
for s ∈ [t, T ], and set V n (t, x) := e rt Y t,x,n t . Then, (V n ) n≥1 converges pointwise to V as n → ∞.
Moreover, (V n ) n≥1 has (uniformly) polynomial growth, thanks to the uniform polynomial growth assumption on (q n ) n≥1 . See e.g. [16]. By stability and (9), see e.g. [2], it follows that the relaxed limsup V * and liminf V * of (V n ) n≥1 are respectively sub-and super-solutions of (4). By Proposition 2.6, V * ≤ V ≤ V * and therefore equality holds among the three functions.
Therefore, up to a smoothing procedure, we are back to essentially solving a BSDE. In the next two sections, we propose two approaches. The first one consists in smoothing q into a a smooth function q n to which we apply the local polynomial approximation procedure of [8]. This allows us to use a pure forward Monte-Carlo method for the estimation of V n , based on branching processes. In the second approach, we only add an independent noise in the definition of q, which also has the effect of smoothing it out, and then use a very simple version of the algorithm in [8]. As our numerical experiments show, the first approach is quite unstable while the second one is very efficient.

Local polynomial approximation and branching processes
Given Proposition 3.2, it is tempting to estimate the American option price by using the recently developed Monte-Carlo method for BSDEs, see [10] and the references therein. Here, we propose to use the forward approach suggested by [8], which is based on the use of branching processes coupled (in theory) with Picard iterations.
The first step consists in approximating the Heaviside function H : z → 1 {z≥0} by a sequence of Lipschitz functions (H n ) n≥1 and to define q n by q n : (x, y) → c(x)H n (g(x) − y).

See below for examples.
Then, q n is approximated by a map (x, y) →q n (x, y, y) of local polynomial form: where (a j,l , φ j ) l≤l0,j≤j0 is a family of continuous and bounded maps satisfying for all y 1 ,y 2 ∈ R, j ≤ j 0 and l ≤ l 0 , for some constants C l0 , L φ ≥ 0. The elements of (a j,l (x)) l≤l0 should be interpreted as the coefficients of a polynomial approximation of q n on a subset A j , in which (A j ) j≤j• forms a partition of R and the φ j 's as smoothing kernels that allow one to pass in a Lipschitz way from one part of the partition to another one, see [8].
Then, one can consider the sequence of BSDEs withȲ t,x,n,1 given as an initial prior (e.g. e r· g(X t,x )). GivenȲ t,x,n,k ,Ȳ t,x,n,k+1 solves a BSDE with polynomial driver that can be estimated by using branching processes as in the Feynman-Kac representation of the Kolmogorov-Petrovskii-Piskunov equation, see [13,14]. We refer to [8] for more details.
In practice, we use the Method A of [8, Section 3]. We perform a numerical experiment in dimension 1, with a time horizon of one year, and a risk-free interest rate set at 6%. We consider the Black and Scholes model with one single stock whose volatility is 40%. We price a put option which strike is K := 40. At the money, the American option price is around 5.30, while the European option is worth 5.05. In view of Example 2.4, we take c = rK 4 . We first smooth the driver with a centered Gaussian density with variance κ −2 , so as to replace it by 0.5rKe −rt erfc(κ * (y − e −rt g(x))) with κ = 10. See Figure 3.1. Then, we apply a quadratic spline approximation. In actual computation, as it is impossible to apply spline approximation on the whole half real line, we limited the domain of y for the driver function to [0, 40(1 − e −0.06 )]. We partition this bounded domain into 20 intervals with equal-distant points and define a piecewise polynomial on this domain by assigning a quadric polynomial to each intervals. Finally, we match the values and derivatives of our piecewise polynomial at each point of the grid to the original function (except at the right-end where the derivative is assumed to be zero). The truncation of domain will not alter the computational result as our limited domain includes the maximum payoff for the put option. The resulting approximation is indistinguishable from the original function displayed on Figure 3.1. We also partition [0, T ] in 10 periods. As for the grid in the x-component, we use a 25-point uniform space-grid on the interval [e −20 , 80]. We estimate the early exercise value by first using 1.000 Monte-Carlo paths. As can be seen on Figure 3.2, the results are not good and this does not improve much with a higher number of simulations. The algorithm turns out to be quite unstable and not accurate. It remains pretty unstable even for a large number of simulated paths. This is not so surprising. Indeed, as explained in [8], their approach is dedicated to situations where the driver functions is rather smooth, so that the local polynomial's coefficients (a j,l ) j,l are small, and the supports of the φ j 's are large and do not intersect too much. Since we are approximating the Heaviside function, none of these requirements are met.

Driver randomization
In this second approach, we enlarge the state space so as to introduce an independent integrable random variable with density f such that z → (1 + |z|)f (z) is integrable. We assume that the interior of the support of f is of the form (m , M ) with −∞ ≤ m < M ≤ ∞. Then, we define the sequence of random maps for n ≥ 1. If c is non-negative, continuous and has polynomial growth, then the sequence (q n ) n≥1 matches the requirements of Proposition 3.2.
We now let τ be an independent exponentially distributed random variable with density ρ and cumulative distribution 1 −F . Then, Y t,x,n defined as in Proposition 3.2 satisfies .
This can be viewed as a branching based representation in which particles die at an exponential time. When a particle die before T , we give it the (random) markq n (X t,x t+τ , e rτ Y t,x,n t+τ ). In terms of the representation of Section 3.1, this corresponds to j 0 = 1, l 0 = 0, to replacing a 1,0 (x)φ 1 (y ) byq n (x, y ), and to not using a Picard iteration scheme.
On a finite time grid π ⊂ [0, T ] containing {0, T }, it can be approximated by the sequence v π n defined by v π n (T, ·) = g and v π n (t, where φ π s := inf{s ≥ s : s ∈ π} for s ≤ T . Showing that v π n (φ π t , x) converges point-wise to Y t,x,n t as the modulus of π vanishes can be done by working along the lines of [1, Section 4.3] or [12]. In view of Proposition 3.2, v π n converges point-wise to V as |π| → 0 and n → ∞. A similar analysis could be performed when considering a grid in space, which will be necessary in practice.
Then, (11) provides a natural backward algorithm: given a space-time grid Π := (t i , x j ) i,j , (11) can be used to compute v π n (t i , x j ) given the already computed values of v π n at the later times in the grid, by replacing the expectation by a Monte-Carlo counterpart.
Let us now consider a put option pricing problem within the Black-Scholes model as in the previous section. The interest rate is 6%, the volatility is 20% and the strike is 25. The partition π of [0, T ] is uniform with 100 time steps. However, we update v π n only every 10 time steps (and consider that it is constant in time in between). The fine grid π is therefore only used to approximate X t,x τ by X t,x φ π τ accurately. We use a 40-points equidistant space-grid on the interval [5,50]. The random variable /n is exponentially distributed, with mean equal to 10 −100 , while τ has mean 0.6. In Figures 3.3, 3.4 and 3.5, we provide the estimated prices, the estimated early exercise premium as well as the corresponding relative errors. The statistics are based on 50 independent trials. The reference values are computed with an implicit scheme for the associated pde, with regular grids of 500 points in space and 1.000 points in time (we also provide the European option price in the top-left graph, for comparison). The relative errors are capped to 10% or 40% for ease of readability. These graphs show that the numerical method is very efficient. The relative error for a stock price higher that 30/35 are not significant since it corresponds to option prices very close to 0. For 10.000 simulated paths, it takes 12 secondes for one estimation of the whole price curve with a R code running on a Macbook 2014, 2.5 GHz Intel Core i7, with 4 physical cores.
We next consider a strangle with strikes 25 and 27, see Example 2.4. The results obtained with 50.000 sample paths are displayed in Figures 3.6.
Note that we do not use any variance reduction technique in these experiments.