A policy iteration algorithm for nonzero-sum stochastic impulse games

This work presents a novel policy iteration algorithm to tackle nonzero-sum stochastic impulse games arising naturally in many applications. Despite the obvious impact of solving such problems, there are no suitable numerical methods available, to the best of our knowledge. Our method relies on the recently introduced characterization of the value functions and Nash equilibrium via a system of quasi-variational inequalities. While our algorithm is heuristic and we do not provide a convergence analysis, numerical tests show that it performs convincingly in a wide range of situations, including the only analytically solvable example available in the literature at the time of writing.


Introduction
Stochastic impulse games (SIGs) are at the intersection between differential game theory and stochastic impulse control. In the case of one sole player, they reduce to stochastic impulse optimization problems where an agent seeks to control an underlying (or state variable)-otherwise governed by a stochastic differential equation-in order to maximise the expected value of some target functional. One way in which she can influence the underlying is this: whenever it leaves a continuation region (i.e. at her intervention times), she suddenly shifts it back somewhere into the region (by providing an impulse). Together, the continuation region and impulses define what we will call a strategy. Thus, the control-theoretical problem typically boils down to determining an optimal strategy as a function of the current value of the underlying. The purpose of the analysis is to apply that optimal strategy to any specific realization of the underlying. When such a strategy is followed, the expected value of the target functional-the largest possible-is called the value function. This problem is a classical one and it is well understood by now [ØS09]. In particular, a rigorous framework has been put in place-combining Howard's algorithm [Bel70,Bel10,How60,BMZ09] with viscosity-capturing finite difference schemes-which allows for robust numerical approximations [CMS07,CØS02,ØS09].
A natural extension are two-player SIGs. In such games, two players seek to control the underlying with different, and often opposed, aims. Let us give two concrete examples, due to [ABC + 17].
• Two central banks competing to influence the exchange rate between their respective currencies, both seeking to devalue their own one. In this application, the exchange rate is modelled as a stochastic process, and either central bank intervenes when it deems its currency too strong. Each bank's strategy is made up of the ensuing devaluation and the threshold exchange rate triggering it. Both quantities are to be optimally determined in advance, by solving the game.
• A wholesale producer of energy and a big client of hers who resells it. Due to the high consumption of the client, both players are capable of affecting the wholesale price of energy, with naturally opposed targets (a high price versus a low one, respectively). Each player incurs costs when modifying the price while possibly benefiting from state compensation upon adverse price movements led by the opponent. In order to avoid superfluous costs, both players need a strategy, based on the current wholesale price, determining the optimal triggering threshold and the extent of the price shift.
Contrary to one-player SIGs, the value functions in two-player SIGs (one for each player) cannot be naively defined by maximisation. Instead, optimality can naturally be characterized via the notion of Nash equilibrium: intuitively, the pair of value functions displays the best expected outcome in the sense that if a player changes her strategy while her opponent does not, then the former can only be worse off. A pair of strategies at which these values are attained is then called a Nash equilibrium. Because of this, two-player SIGs are distinctly different (and more complex) from one-player SIGs, and the theoretical/numerical frameworks for the latter are not suitable for the former. This is further accentuated in the so-called nonzero-sum SIGs (NZSSIGs), roughly meaning that the losses (resp. gains) undergone by one player may not exactly translate into gains (resp. losses) for the opponent. (For instance, the previously described examples are best modelled under this very flexible framework.) NZSSIGs are in contraposition with the simpler and more particular zero-sum SIGs, in which both value functions add up to zero-effectively reducing the problem to finding one. (This allows for more natural extensions of the theoretical and numerical tools of the one-player case [Cos13,Bas16].) Due to the greater complication involved in the analysis, NZSSIGs are much underdeveloped. Very recently, a breakthrough has been achieved in [ABC + 17], where the value functions and a Nash equilibrium of a very general class of NZSSIGs (assuming they exist and possess some regularity) have been characterized via a system of quasi-variational inequalities (QVIs). Besides the theoretical interest of this link by itself, it opens up the possibility of finding approximations to the value functions by numerically solving that system.
In this paper, we make the first attempt (to the best of our knowledge) at numerically solving NZSSIGs. In a nutshell, the iterative algorithm we put forward treats the NZSSIG at each iteration as a combination of a fixed point problem and a slowly relaxing one-player SIG. This allows us to take advantage of the machinery for the latter. Because, at the moment, we lack a proof of convergence, the proposed algorithm is admittedly heuristic. Instead, we report on a range of numerical experiments, which show convergence of the error with respect to the discretisation. In fact, errors are quite satisfactory and-in the examples we have tackled-the relative errors easily drop well below 0.1%. The algorithm can thus be used to assist further development of the field, as well as to gain insight into applications modelled by NZSSIGs. For even crafted NZSSIGs with exact solution are hard to construct: in fact, the authors of [ABC + 17] also provide (as far as we know) the first example in the literature-thanks to which we were able to validate our algorithm in the first place. (For the sake of completeness, this solution has been included in Section 1.3.) Moreover, this example also evidences that even when an analytical solution is available, its computation in practice may involve parameters which require solving complex nonlinear systems of equations-thus stressing the convenience of numerical approximations.
Let us outline the organization of the rest of the paper. We start in Section 2 by properly setting up NZSSIGs and recalling the main result of [ABC + 17], namely, the Verification Theorem with the corresponding system of QVIs. For the sake of illustration, the concrete application of competition in energy markets is presented according to this framework. In Section 3, we briefly review the state-of-the-art numerics for one-player SIGs, with an emphasis on Howard's algorithm, which will later become a pivotal ingredient of our own numerical method for NZS-SIGs. The latter is motivated, listed, and discussed in Section 4. Section 5 presents numerical evidence supporting it, in the light of which we draw some conclusions in Section 6. 1 The theoretical model 1.1 Two-player nonzero-sum stochastic impulse games In this section we introduce a general class of two-player NZSSIG, to which our numerical algorithm is applicable. For the sake of briefness and pertinence, we will skip some of the most technical matters of the rigorous construction of the model. We refer the interested reader to [ABC + 17] for more details and a more general modelling framework.
Let (Ω, F, (F t ) t≥0 , P) be a filtered probability space under the usual conditions, that supports a standard one-dimensional Brownian motion W = (W t ). For each x ∈ R we consider a process X (the state variable) starting at x, where: is the control of player i, which consists of a sequence of stopping times τ i k (the k-th intervention time of player i) and F τ i k -measurable random variables δ i k (the k-th intervention impulse of player i).
In words, when none of the players intervenes the state variable behaves as an Itô diffusion. The players can at any time decide to shift this process by applying a certain impulse. The specific type of interventions are determined by (threshold-type) strategies, which means that each player acts only when the state variable exits a given region of R. More specifically, the control of player i is defined in terms of a strategy ϕ i := (C i , ξ i ), where C i ⊆ R is an open set (the continuation region) and ξ i : C i c → R is a continuous function. 1 Player i intervenes if X exits C i -i.e., if for some t ≥ 0 and ω ∈ Ω it holds X t (w) / ∈ C i -by applying an impulse ξ i (X t (w)). We decree that the game never stops and if both players want to intervene at the same time, then player 1 has the priority. The latter assumption is a matter of convention and not very restrictive. 2 Henceforth, we write E x to denote the conditional expectation given X 0 − = x, and define the objective function for player i given the strategies (ϕ 1 , ϕ 2 ) and the starting point x of X as with i, j ∈ {1, 2}, j = i; where: (i) ρ i > 0 is the subjective discount rate of player i.
(ii) f i : R → R is a continuous function giving the running payoff of player i.
(iii) φ i : R 2 → R (resp. ψ i : R 2 → R) is a continuous function giving the cost of intervention (resp. gain due to the opponent's intervention) for player i.
In addition, we will only consider pairs of strategies (ϕ 1 , ϕ 2 ) such that the previous expectations are well defined for all x ∈ R, and we refer to these pairs as admissible (see [ABC + 17] for more details). 3 Note for example that, for payoffs with polynomial growth, the 'no intervention strategies' φ 1 = φ 2 = (R, ∅) are always admissible. In this context, we say that the players behave optimally if their strategies form a Nash equilibrium. We recall that (ϕ * 1 , ϕ * 2 ) is a Nash equilibrium if it is an (admissible) pair of strategies such that, for all (ϕ 1 , ϕ 2 ), i.e., if player i changes strategy while player j does not, then on average player i will be worse off, and vice versa. If a Nash equilibrium (ϕ * 1 , ϕ * 2 ) exists, we define the value function for player 1 For any A ⊆ R, A c denotes the complement of A in R.
2 Under the assumptions of the Verification Theorem 1.2.1, the value of the game and the Nash equilibrium found would not change if player 2 had the priority instead. This can be checked by taking a pair (Ṽ1,Ṽ2) as in the theorem and noticing that (Ṽ2,Ṽ1) solves the game in which the structures (costs, gains, rates and payoffs) of the players have been swapped without swapping the priority.
3 In [ABC + 17] admissibility is defined pointwise for greater generality. We refrain from doing this for simplicity.
i when using the strategies (ϕ * 1 , ϕ * 2 ) by Our aim is to compute (V 1 , V 2 ) for some Nash equilibrium, and more importantly, to retrieve from these values the equilibrium itself.

The system of quasi-variational inequalities
In order to establish a system of QVIs for (V 1 , V 2 ) we need to define one last ingredient, known as the intervention operators, which will display the effect of each player's intervention on the value functions. For any two arbitrary functionsṼ 1 ,Ṽ 2 : R → R, i, j ∈ {1, 2}, i = j, and x ∈ R, the loss operator of player i is given by IfṼ i = V i this operator gives the recomputed present value of i due to the cost of her own intervention. If for each x ∈ R there exists a unique δ j (x) = δ j V j (x) that realizes the supremum in (4) , we also define the gain operator of player i as IfṼ i = V i this operator gives the recomputed present value of player i due to her opponent's intervention. Whenever we make use of a gain operator, we are implicitly stating that the above assumptions and notations are in place. We also emphasize that H iṼi (x) depends on the whole functionṼ j through δ j V j and we will write H i (Ṽ j )Ṽ i (x) instead, when we want to make this dependence explicit.
We can now state the Verification Theorem, due to [ABC + 17], that will allow us to tackle the problem of finding (V 1 , V 2 ) and a Nash equilibrium by numerically solving a deterministic system of QVIs. In the next theorem we use the notation A for the infinitesimal generator of X when no interventions take place. That is, A is the operator such that if g ∈ C 2 (S) for some S ⊆ R, then We concur that whenever we apply A to some function g, we are implicitly stating this function g is C 2 at every x at which we compute Ag(x).
has polynomial growth and bounded second derivative on some reduced neighbourhood of , is an admissible pair of strategies. Then is a Nash equilibrium.

Example of application: competition in energy markets
We finish this Section by providing a specific example of application: competition in energy markets [ABC + 17]. Let process X model the forward price of energy, evolving as a Brownian motion when there are no interventions. Consider the following two players: player 1 is an energy producer with unitary production cost s 1 , so that in a simplified model her payoff is X − s 1 . Player 2 runs a large company that buys from player 1 and sells at a unitary price s 2 > s 1 , with payoff s 2 − X. Because of the high consumption of player 2, she can also affect the price of energy in the same way as player 1. Whenever these players intervene they incur a cost (advertising among other factors) which is modelled linearly for tractability-one constant component and another proportional to the change induced in the energy price. At the same time, because of their impact in the economy, the government subsidizes upon adverse movements in the energy price. For each player this represents a gain when the opponent intervenes, which is modelled in the same way as the cost of intervention, but with different parameters. We further assume that both players discount their winnings/losses at the same rate ρ > 0 and they have the same cost/gain parameters. More specifically, The exact solution to this game can be found by application of Theorem 1.2.1. To this purpose, the system of QVIs (7) is heuristically solved, yielding candidates for value functions and a Nash equilibrium. This is done, first, by making some educated guesses regarding the shape of the optimal continuation regions and value functions. Second, by solving the ordinary differential equations (ODEs) in (7) where appropriate. Finally, by imposing the regularity requirements of Theorem 1.2.1 through pasting conditions. Upon verification of the remaining hypotheses, the following turns out to be the solution to the game: where: and ξ ∈ (0, η) is the unique zero of F (y) := 2y − η log η+y η−y + θc.
As it can be readily noticed, the analytical solution involves the computation of several parameters and the resolution of at least one nonlinear equation. As a matter of fact, the number of parameters in this solution was reduced making use of the symmetry in the problem. In a more general two-player NZSSIG under the modelling framework of Section 1.1, if we assume the optimal continuation regions in Theorem 1.2.1 are semi-bounded intervals, then an analytical solution can easily involve eight parameters: two finite limits of both continuation regions, two maximising points of the net value functions (subtracting intervention costs) and four undetermined constants coming from the second order ODEs. Moreover, these parameters are the solution to a nonlinear system of equations that arises from smooth pasting and optimality conditions [ABC + 17, Def. 4.1.]. In the more frequent than not situation in which this system cannot be simplified, computing the analytical solution may become prohibitive. This further motivates the need for a numerical algorithm to solve the system of QVIs (7).
Remark. To the best of our knowledge this is, at the time of writing, the only one example in the literature of an analytically solvable NZSSIG. Henceforth, we will refer to it as the benchmark game.
2 Numerics for one-player SIGs: state of the art Notation. Henceforward, finite grids will be denoted by K and S. For a fixed grid, we will use the same font type when discretising an operator, e.g., M, H and A will be replaced by M, H and A. We shall specify later the way the discretisations have been done in our experiments. We will not change notation for functions over grids as their domain will always be clearly stated. The only exception will be those functions that have been redefined to account for boundary conditions (BCs). Lastly, we recall that for any set A, R A denotes the set of functions from A to R.
Policy iteration or Howard's algorithm for (discrete) variational inequalities (VIs) was originally developed in [Bel70,Bel10,How60]. The method was then extended to QVIs [CMS07, CØS02, ØS09], i.e., variational inequalities in which the obstacle depends on the solution itself. Let K ⊂ R be a finite set (the grid). These problems have the form: where g ∈ R K and L, B y : R K → R K are linear and affine operators resp., for each y ∈ K.
Problem 9 encomprises in particular a discrete localized version of a one-player SIG (i.e., a stochastic impulse control problem) since the system (7) reduces in this case to one single QVI. Indeed, by an appropriate finite difference approximation (more details in Section 3) one can take g = f | K (modified to account for Dirichlet or Neumann BCs), 5 A − ρ 1 Id ≈ L and . By their antisymmetric structure, two-player zero-sum SIGs can also be accommodated with a similar formulation as a max-min double obstacle problem [ABC + 17, page 7] for one single value function V := V 1 = −V 2 , and the policy iteration algorithm can be adapted [BMZ09].
However, this framework is not general enough to tackle the NZSSIGs described in Section 1.1 and the full system of QVIs (7). There are, to the best of our knowledge, no available numerical methods to approach the latter problem. We present now the classical policy iteration algorithm (Algorithm 1) used to solve (9). It will be embedded in the numerical scheme we will put forward to solve the much more general problem (7). While we note that our later use of this algorithm is not fully within the theoretical scope studied in [CMS07]-in particular, we will need to relinquish the affine nature of the operators B y , y ∈ K-the two following main assumptions will be in place nonetheless (#K is the cardinal of set K): (ii) B y is a non-expansive function for · ∞ , for all y ∈ K. That is, We remark that, in particular, Assumption (i) will guarantee that Algorithm 1 is well defined, as the linear operators L k will be non-singular [BMZ09]. Lastly, note that in Algorithm 1, computing the operators L k amounts simply to redefining a matrix row by row, using either the matrix of L or −Id. We have chosen an 'operators-type' notation for this paper, as it will simplify matters in the sequel, when compared with its matrix counterpart.

Proposed algorithm for two-player NZSSIGs
Compared with the single-value-function problems in the previous Section, general two-player NZSSIGs are distinctly more challenging. The main challenges are: • two value functions, governed by a system of QVIs, must be solved for, • the dependence between (V 1 , V 2 ) is highly nonlinear due to the presence of the gain oper- • each gain operator is expansive as a function of (Ṽ 1 ,Ṽ 2 ); and • solutions will typically be less regular. For example, if (V 1 , V 2 ) is the solution to the benchmark game in section 1.3 then V i is singular atx j for i, j ∈ {1, 2}, i = j (i.e., each value function is non differentiable at the border of the opponent's continuation region), in spite of the game having linear payoffs, costs and gains. This is somehow to be expected, as Theorem 1.2.1 contemplates this lack of regularity within the smoothness assumptions (compare to the classical Verification Theorems for one-player problems [ØS09] where greater regularity is assumed).
Algorithm 2 below is, as far as we know, the first ever numerical attempt at two-player NZSSIGs. At present, it is admittedly heuristic and supported only by the numerical evidence reported in Section 4.
The remainder of this Section is organized as follows. We start by explaining the idea and motivating the underlying heuristics. Then, we describe in detail the discretisation of the system of QVIs (7). Finally, we list the new algorithm.
Heuristics. Having discretised the system of QVIs (7), we start from an initial guess (V 0 1 , V 0 2 ) to approximate its solution. We seek an iterative procedure to consecutively compute A natural idea is, first, to partition the grid into the 'approximate continuation region' of player j-{M j V k j − V k j < 0}-and its complement, the 'approximate intervention region'; and then to compute V k+1 i either by calculating a gain as H i (V k j )V k i in the former case, or by solving one QVI with Howard's algorithm (Algorithm 1) in the latter. Note, however, that naively defining {M j V k j − V k j < 0} as the 'approximate continuation region' of player j poses difficulties.
Indeed, the discretisation of the loss operator M j as M j implies that even the true value function V j will generally not verify ) for x in the true intervention region of player j. We therefore need to relax this constraint to account both for numerical error and the discrepancy between the discrete and space-continuous problems. Our experiments have shown that a successive relaxation procedure turns out to be the most effective. Consequently, we define the approximate continuation region of player j as {M j V k j − V k j < −r k } instead, where r k is a small positive number. By letting r k relax to a preset small tolerance ε > 0, the iterative approximations will hopefully converge to the correct discrete solution.
It remains to schedule the relaxation procedure and to define a measure of convergence for the algorithm. Regarding the former, having computed (V k+1 1 , V k+1 2 ), r k is linearly relaxed (line 11). (This relaxation procedure is chosen for simplicity.). Then the largest pointwise residual to the system of QVIs (incurred by either approximate value function) is calculated across the grid (line 12), taking into account the numerical tolerance ε. We denote this residual R k+1 and we consider the algorithm has converged when it drops below a certain tolerance-unlike in Algorithm 1, where the residual is taken as the distance between consecutive approximations. This alternative approach has been chosen for being more informative. It reflects whether a solution to the discrete system of QVIs has been found-as opposed to the algorithm stagnating-on top of giving valuable information at each grid node.
By construction, the junctions between the approximate continuation and intervention regions for each of the players will necessarily take place on top of a finite difference node. This may lead to numerical issues when the exact value functions are non differentiable there (as will often be the case). Namely, the pointwise residual to the QVIs at the junction nodes may not be made arbitrarily small by refining the grid-because the derivatives contained in the residual are not defined at the exact junction, in the first place. In those cases, the algorithm must be stopped at that point, since adding more grid nodes in a naive way cannot lead to any improvement.
As a final remark, we mention that numerical experiments show Algorithm 2 does not enjoy global convergence (i.e. it is not guaranteed to converge from arbitrary initial guesses). Providing a good enough pair (V 0 1 , V 0 2 ) is thus a practical issue; in Section 4, a natural way of constructing educated guesses is explained. discretisation. Let S ⊆ R be a finite set. S is the grid we will use to discretise the system (7). In all of the numerical experiments described in the sequel we have taken S as an equispaced grid of M steps between certain x min < 0 < x max with |x min |, |x max | and M big enough (more about this below).
Let i, j ∈ {1, 2}, i = j,Ṽ 1 ,Ṽ 2 ∈ R S and x ∈ S. We proceed to define the discretised versions of the loss and gain operators, over the grid S, as Next, we choose a finite difference scheme for the ODEs which is consistent, monotone and stable, adding Dirichlet or Neumann-type BCs. In all of our experiments, we have chosen an upwind finite difference scheme, wherẽ were solved for using Neumann conditions oñ respectively. How to get these conditions, and in particular how to choose x min and x max , is a problem-specific question. In some situations, and particularly in the models numerically tested in this paper, one can heuristically assert that at a Nash equilibrium the continuation region of player i should be a semi-interval of the form C i = (x i , x i )-with one end-point finite and the other infinite. Intuitively, one can further guess that there should exist a unique y * i ∈ (x i , x i ) that maximises the net value of player i when she intervenes (see, e.g., [ABC + 17, Sect. 4.2] and [Bas16, Sect. 2.3.1]). These conjectures arise mainly from the observation of the payoff functions f 1 , f 2 which encode the goals of the players and roughly hint at some broad regions where each player would like the state variable to remain at. On R\C i and R\C j one of the two players will intervene, thus giving either

If on the interiors (R\C
exist and do not depend on y * i , y * j resp., differentiating the previous relations yields the Neumann conditions (provided x min , x max are 'extreme' enough). The previous requirements on the derivatives are satisfied, for example, if the cost and gain structures have the form φ i (x, δ) = g i (x) + a i |δ| and ψ i (x, δ) = h i (x) + b i |δ|, for some differentiable functions g i , h i : R → R and a i , b i ∈ R. For the benchmark game the Neumann BCs read V 1 (x min − h) = λ, V 1 (x max ) =λ, V 2 (x min − h) = −λ and V 2 (x max ) = −λ. We will denote by A the discretised version of A and f i the restriction of f i to S, redefined at min S and max S to account for the BCs. For Algorithm 2 we recall that given any real number a, a + denotes its positive part (i.e., a + := max{a, 0}) and for any subset S ⊆ R, 1 S denotes the indicator function of S.

12:
Let R k+1 be the largest pointwise residual to the system of QVIs, i.e.

13:
Let k = k + 1. 14: end while Line 9 of Algorithm 2 deserves some special attention. Although at this step we want to solve a problem restricted to the subgrid K = C k j (for fixed k, i, j), we still need the information in S\C k j in two ways, namely: • To compute the non-local operators M i .
Then in order to apply Algorithm 1 we take for allṼ ∈ R K , x, y ∈ K. We remark once again that the functions B y fail to be affine operators. This, together with some assumptions which are not satisfied, make this application of Algorithm 1 fall outside of the scope of [CMS07] (and even more of [CØS02,ØS09]). However, it is easy to check that the main assumptions, Assumptions 2.1.1, are still verified under our discretisation and we have observed unconditional convergence of this subroutine (to very high orders of precision) in all the experiments we have ran. Lastly, we note that Algorithm 1 requires setting a numerical tolerance which needs not be the same as the one of Algorithm 2. In fact, we have always taken it strictly smaller given the perceived unconditional convergence of the former and its precision. 8

Numerical results
In this section we assess the performance of Algorithm 2. We shall call the problems considered in the experiments simply 'games'. The two value functions V 1 , V 2 are approximated as described in Section 3 on an equispaced grid of M + 1 nodes, and the computational domain is the plotted one. In all the subsequent games, ε = 10 −8 , α = 0.8 and r 0 = 1. The largest pointwise residual (equation (10)) at convergence is denoted by R ∞ .
As mentioned before, the only NZSSIG for which an analytical solution is currently available is the benchmark game in Section 1.3. Therefore, we shall eventually focus on that problem. However, we introduce two other games first (for which we do not have an analytical solution). They illustrate how the initial guess for the benchmark game can be constructed and provide further numerical evidence supporting Algorithm 2.

Parabolic game
This is a version of the benchmark game where the payoff is replaced by a concave parabola with roots r L i and r R i : Let us motivate this game. We seek to use as initial guess the value functions of the 'unilateral games', that is, the control problems in which one of the players never intervenes (her continuation region is fixed and equal to R). Removing the action of one of the players, however, may not always lead to a well-posed problem. Indeed, for payoffs without maximum one could end up for example with 'infinite-valued value functions' or 'infinite-valued optimal impulses'. This is indeed the case for the benchmark game. Thus, in order to skirt that difficulty it is convenient to define variations of the benchmark with payoffs that attain a maximum, like (11). Note that, when well-posed, the unilateral games can be readily solved with Howard's algorithm (Algorithm 1). Figure 1 shows the numerical solution, (V 1 , V 2 ), to a parabolic game (pair of solid curves) along with the initial guess (pair of dashed curves). The latter are, in turn, the value functions of the corresponding unilateral games for players 1 and 2. It is intuitively clear that (V 1 , V 2 ) approximate, over the grid, functions which indeed satisfy the assumptions of the Verification Theorem 1.2.1. The Nash equilibrium exhibited in this Theorem-(ϕ * 1 , ϕ * 2 )-can be retrieved from this graph. Indeed, note that in this case the cost for player i is φ i = −100 and her approximate continuation region is Further, her optimal impulse at a given x is the one that realizes (the discrete analogue of) the supremum in equation (4), which in this case amounts to translating x to the maximising point of V i . Consequently, we get ϕ * 1 = (−∞, 1.068), −1.848 − x and ϕ * 2 = (−3.048, +∞), −0.120 − x .  Table 1 for other values.
As it turns out, this parabolic game also converges from the 'zero guess' (V 0 1 = V 0 2 = 0). On Table 1, the convergence of R ∞ over a wide range of finite difference grids (with 301, 601, . . . , 3001 nodes) is compiled. In all cases, both initial guesses lead to convergence (to within ε) of Algorithm 2. Nonetheless, the algorithm takes in general fewer iterations when starting from the values of the unilateral parabolic games. (We stress that those on Table 1   Iterations to convergence within ε = 10 −8 starting from: zero guess (its. a ) and value functions for one-player games (its. b ). (Same parameters as in Figure 1.) Since the exact solution to the parabolic game is unknown, we cannot say anything about the convergence of Algorithm 2 in continuous-space. On the other hand, we see the system of QVIs is (approximately) enforced to within ε 1 by a pair of numerical functions which-also approximately-comply with the regularity assumptions of Theorem 1.2.1. As such, they are the (approximate) value functions of the parabolic game.
We conclude this example by illustrating the interplay between the value functions which numerically solve the system of QVIs, the Nash equilibrium derived from them, and the evolution of the optimally controlled underlying. Once the optimal strategies (ϕ * 1 , ϕ * 2 ) are available, they can be executed on specific realizations of the game. Sticking to the parameters and numerical solution in Figure 1, Figure 2 depicts one exemplary trajectory of the underlying in the time interval 0 ≤ t ≤ 1000, starting from x = 0 and subjected to the pair of optimal strategies. For numerical purposes, let us definê  The Nash equilibrium itself can be visually explored in the following way. For a given starting point x, we keep the optimal strategy for one of the two players, and slightly alter the strategy of the other one. For concreteness, let us assume that player 1 "moves" (i.e. ϕ 1 = (1 ± 0.25U)ϕ * 1 ) while player 2 does not (ϕ 2 = ϕ * 2 ). (U is the uniform distribution, and "±" means with equal chance.) Then, we proceed to calculate J 1 (x; ϕ 1 , ϕ * 2 ) by Monte Carlo simulation as before. By definition of the Nash equilibrium, J 1 (x; ϕ 1 , ϕ * 2 ) can not be larger than V 1 (x). Within numerical tolerance, this is indeed observed in Figure 4, where the blue empty circles (representing J 1 (x; ϕ 1 , ϕ * 2 )) do not lie over the blue solid curve (which represents V 1 (x)). Full blue circles represent J 2 (x; ϕ 1 , ϕ * 2 ): note that the player who sticks to her optimal strategy may indeed improve over V 2 (x), should her opponent depart from a Nash equilibrium. (When player 2 is the one who changes, the red circles and red curve in Figure 4 apply instead.) Figure 4: Empty blue [red] circles mean that the player 1 [2] has departed from her optimal strategy while her opponent has not. By virtue of Nash equilibrium, she cannot be (within numerical errors) better off than V 1 (x) [V 2 (x)]. Full circles: objective function of the player who does not drop her optimal strategy. Parameters and optimal strategies from Figure 1. (Note that results are subjected to numerical error.) We stress, however, that Monte Carlo simulations such as those cannot prove that a pair of strategies form a Nash equilibrium. (At most, they could disprove it.) The only way-in the current state of the theory-of fully characterizing a Nash equilibrium calls for solving the system of QVIs-which Algorithm 2 has now made possible.

Capped benchmark game
In this game, we replace the running payoffs of the benchmark by a version capped at K > 0: We shall always take K = 5. Once again, the corresponding unilateral games are well-posed and their solutions can be used as initial guess for the capped benchmark game. As in the previous example, the capped game also seems to converge from the zero guess. Some convergence results are compiled in Table 2. Note that convergence falters with M = 600 and M = 3300 (independently of k max ); we will come back to this later.
M R ∞ its.

600
-∞ 900 6.5 × 10 −10 100 1200 8.8 × 10 −9 124 1500 7.4 × 10 −9 78 2700 4.1 × 10 −9 96 3000 6.7 × 10 −9 125 3300 -∞ Table 2: Convergence of R ∞ in the capped benchmark game using the zero guess. The hyphen stands for lack of convergence within ε. Same parameters as in Figure 5. The value functions of the capped benchmark game are a good approximation to those of the benchmark game itself (see Figure 5). This seems to make sense: due to the action of the opponent and for σ K, the discarded portion of the payoff is not very relevant in practice.

Benchmark game with educated initial guess
Finally, we tackle the benchmark game, for which an exact solution is available (see Section 1.3). Contrary to the previous examples, Algorithm 2 does not seem to enjoy unconditional convergence here. In fact, when the zero guess was used, it failed to converge more often than not (not reported). In order to construct an adequate initial guess, we first solve for the value functions of the capped benchmark game. Using them as the initial guess, convergence was achieved in every experiment.  Table 4: Convergence of Algorithm 2 for benchmark game (same parameters as in Figure 6 (right).
On Tables 3 and 4, the convergence of the numerical approximation provided by Algorithm 2 to the true solution is demonstrated. However, R ∞ fails to drop below ε for a fine enough discretisation. This reflects a pattern: R ∞ stagnates as M grows. Upon closer inspection, it turns out that the stagnating largest pointwise residual for each player takes place at the junction between the intervention and continuation regions of the opponent, where the exact value function of the former player has a singularity (a kink ). As discussed in Section 3, Algorithm 2 is always going to place those kinks at finite difference nodes. The overshooting residuals are thus due to the inability of a numerical solution to reproduce a sharp, nonsmooth feature-where the finite difference derivatives grow unbounded as the distance between nodes goes to zero. Figure 7 illustrates this situation. It can be seen that errors continue to be acceptable (far less than 1% relative error in the worst case). We also highlight the fact that the kinks do not seem to bring about oscillations of the numerical solutions around them-the notorious Gibbs' phenomenon.

Conclusions
We have designed and tested a novel policy iteration algorithm-the first one as far as we knowto numerically solve nonzero-sum stochastic impulse games (NZSSIGs). The approach consists in solving a system of quasi-variational inequalities which characterizes the value functions and Nash equilibrium, exploiting a recent theoretical breakthrough in [ABC + 17].
Our algorithm computes iteratively the approximate solution by partitioning, for each of the players, the discretised spatial domain into an approximate continuation region and an approximate intervention region. They are defined through a relaxation parameter that evolves along the iterations. In the continuation region, we solve one quasi-variational inequality by means of (a generalization of) Howard's algorithm, whereas in the complement, a gain is computed. A strategy for producing an educated initial guess to start the iterations-which relies on solving two associated, standard impulse control problems-has been presented along with the new Algorithm as well.
We have not carried out a convergence analysis. Instead, we have gathered plenty of nu-merical evidence showing that the Algorithm can be applied confidently. In particular, we have performed convergence tests on the only available-at the time of writing-example of an analytically solvable NZSSIG. In other test NZSSIGs for which we do not have an analytic solution, we have explored consistency and numerical results on the value functions and Nash equilibrium, also with satisfactory results. Value functions of NZSSIGs can develop sharp kinks at the confluence between the continuation and the intervention regions of the opponent. Capturing such features into numerical approximations is a pervasive challenge of numerical analysis. Numerical tests show that the presence of such kinks may eventually put a cap to the accuracy attained by our algorithm. On the other hand, its stability is not affected by them. Moreover, in every case the largest pointwise error was perfectly acceptable for the purposes of most applications.
In sum, the new Algorithm offers a means of gaining quantitative insight into applications modelled by NZSSIGs. Natural continuations of the present work include the convergence analysis of the Algorithm, and enriching the discretisation method so as to better capture singularities in the solutions.