ON THE NUMERICAL SOLUTION OF A FREE END-TIME HOMICIDAL CHAUFFEUR GAME

A functional formulation of the classical homicidal chauffeur Nash game is presented, and a numerical framework for its solution is discussed. This methodology combines a Hamiltonian-based scheme with proximal penalty, which determines the time horizon on which the game takes place, with a Lagrangian optimal control approach and relaxation, which solve the Nash game at a fixed end-time.


Introduction
The Homicidal Chauffeur (HC) game is a classic problem in the field of differential (dynamical) games that was introduced by Rufus Philip Isaacs in [11] and further elaborated in his seminal book [12]. The statement of the problem is that of a car with a limited turning radius and constant speed that pursues a pedestrian, whose speed is bounded by a given value and who tries to prevent collision. We refer to the car as the pursuer and to the pedestrian as the evader; both are players of the HC game. This is a continuous pursuit-evasion game that can be considered the archetype of this class of problems, and it has motivated much research work, with early fundamental contributions as in [17]. We refer to [18] for a review of results and a survey of the literature on this topic. We remark that these works focus on a geometric setting of the HC game and construct solutions based on optimal trajectories and singular lines that disperse, join or refract. In particular, Isaacs investigated the homicidal chauffeur game using a particular method for solving differential games based on the backward computation of characteristics.
On the other hand, in subsequent works, different functional settings of pursuit-evasion differential games were considered, where the actions of the players are modelled by time-dependent functions, and the purpose of each player and the cost of its action are formulated in terms of a cost functional: the player's objective. In this framework, zero-sum versions of the HC problem were proposed in [2,10]; see [3] for a review. Furthermore, also based on the solution concept of the Nash equilibrium (NE), nonzero-sum pursuit-evasion games were introduced in [19]; see also [16] for a recent contribution and further references in this field. However, we notice that in many of the latter references the pursuit-evasion game is considered on a fixed end-time horizon, and only a few works address the case of a free end-time; see, e.g., [6]. Furthermore, we remark that, in many research works on the HC game, the focus is on theoretical results such as the existence of a NE, whereas less effort has been put into the construction of algorithms that accommodate the functional framework for pursuit-evasion games.
It is the purpose of this work to develop a numerical scheme for solving a nonzero-sum HC game in a time-optimal formulation. Our scheme has a bilevel structure: the outer procedure determines the optimal time horizon in a Hamiltonian framework including a proximal penalty, and the inner procedure solves an HC game on a fixed time interval by using an optimal control strategy. Then a relaxation step is performed to obtain a common end-time for the players and a new control pair. At convergence, these controls, together with the end-time, give the equilibrium solution of the HC game.
In the HC game formulation, the state of our pursuit-evasion system consists of the planar space coordinates of the positions of the players, and the controls/strategies are the velocity of the evader and the angular velocity of turn of the pursuer. As illustrated in [18], we discuss this HC system both in an inertial reference frame and in a frame moving with the pursuer. Further, by choosing appropriate values for the weights of the cost functionals, we focus on the case where collision may occur and, correspondingly, determine the time of this event.
In the next section, we illustrate the dynamical system modelling the motion of the evader and pursuer and including the control mechanisms (an Appendix is included to facilitate the modelling procedure). Further, we introduce the functional objectives of these two players and define the corresponding NE problem. Also in this section, we draw a connection with optimal control problems and discuss the (partial) characterization of the NE solution in terms of optimality systems.
In Section 2, we present our numerical framework that combines an optimal control strategy and relaxation with a method for determining the time horizon of the game. This latter scheme represents an extension of previous work in [7,8,9]. Section 3 is devoted to the numerical validation of our NE game formulation and of our solution procedure. With our method, we are able to find different solutions of the HC problem starting from different initialisations. On the other hand, small changes of the optimization weights result in similar solutions.
A conclusion section completes this work.

A homicidal-chauffeur game
We consider a planar system of a pursuer and an evader whose positions, with respect to an inertial Cartesian reference frame, are subject to the following dynamics [18]

ẋ_p = sin θ,  ẏ_p = cos θ,  θ̇ = u,  ẋ_e = v_1,  ẏ_e = v_2,   (1)

with given initial conditions, where |u(t)| ≤ ν_u and ‖(v_1(t), v_2(t))‖ ≤ ν̄_v. In this system, (x_p, y_p) represents the position of the pursuer (P), which moves with a unit-vector velocity whose orientation with respect to the y-axis (clockwise) is given by θ. On the other hand, the position of the evader (E) is given by (x_e, y_e), and its velocity is denoted by (v_1, v_2). As in the original work of Isaacs [12], it is convenient to reformulate this model in a reference frame moving with the pursuer. In this setting, the origin of the reference system corresponds to the position of P, and the direction of the (new) y-axis coincides with the velocity vector of the pursuer. By coordinate transformation (see the Appendix), we obtain the following dynamical system for the position (x, y) of the evader in the reference frame of the pursuer

ẋ = −u y + v_x,  ẏ = u x + v_y − 1,  (x(0), y(0)) = (x_0, y_0),   (2)

where (x_0, y_0) represents the initial position of the evader in the moving reference frame at time t = 0.
Notice that u = u(t) denotes the scalar control (strategy) function of P, and v = v(t) represents the vector control (strategy) function of the evader E. As in (1), these functions are required to lie in given admissible sets: u ∈ U_ad and v ∈ V_ad. Specifically, we have

U_ad = { u ∈ L²(0, T) : |u(t)| ≤ ν_u a.e. in (0, T) },
V_ad = { v ∈ L²(0, T; R²) : |v_s(t)| ≤ ν_v a.e. in (0, T), s ∈ {x, y} }.

The numbers ν_u and ν_v are given control bounds, and T > 0 is the final time, which will be determined by our algorithm. Notice that the components of v refer to the new reference frame and are different from (v_1, v_2) in (1), and that the value of ν_v possibly differs from ν̄_v in (1) because the absolute value of each component is constrained, rather than the 2-norm.
The HC model (2) can be put in compact form as follows. Denote z := (x, y)′, z_0 := (x_0, y_0)′,

B := ( 0 −1 ; 1 0 ),  b := (0, −1)′

(the prime means transpose); then we can write (2) in the following form

ż = u B z + v + b,  z(0) = z_0.   (3)

The objective of player P is to come as close as possible to player E, while minimizing the L² cost of its action. The player E wants to prevent its capture, while also minimizing the cost of its action. These objectives are modelled by the following functionals

J_p(u, v, z, T) = (ν_p/2) ∫_0^T u(t)² dt + (r_p/2) |z(T)|² + ω_p T,   (4)
J_e(u, v, z, T) = (ν_e/2) ∫_0^T |v(t)|² dt − (r_e/2) |z(T)|² + ω_e T,   (5)

where ν_p, ν_e > 0 and r_p, r_e ≥ 0 represent the relative strength of the interaction, and ω_p > 0, ω_e ≥ 0 weight the duration of the game. One can prove that, for given u* ∈ U_ad and v* ∈ V_ad, the control-to-state maps u ↦ z(u, v*) and v ↦ z(u*, v) are well defined, where z(u, v) is the unique solution to (3), with the given initial condition z(0) = z_0, u ∈ U_ad, and v ∈ V_ad. With these maps, we can introduce the reduced cost functionals J_p(u, v, T) := J_p(u, v, z(u, v), T) and J_e(u, v, T) := J_e(u, v, z(u, v), T). Notice that both functionals depend on both strategies; therefore the solution of the differential game is sought as a Nash equilibrium, which is defined as follows.

Definition 1. Let ν_u, ν_v ∈ (0, ∞] be fixed. The functions u* ∈ U_ad and v* ∈ V_ad are said to form a Nash equilibrium strategy for the game G = (J_e, J_p; ν_u, ν_v) for a T = T* if it holds that

J_p(u*, v*, T*) ≤ J_p(u, v*, T*),   (6)
J_e(u*, v*, T*) ≤ J_e(u*, v, T*),   (7)

for all u ∈ U_ad and v ∈ V_ad.
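To fix ideas, the compact HC model ż = u B z + v + b can be integrated numerically. The following minimal sketch (not part of the method of this paper) simulates the model with forward Euler for illustrative constant strategies; the chosen u, v, T and step count are assumptions made only for this example.

```python
import numpy as np

# Compact HC model data: B is the 90-degree rotation matrix and
# b = (0, -1)' accounts for the pursuer's unit forward speed.
B = np.array([[0.0, -1.0], [1.0, 0.0]])
b = np.array([0.0, -1.0])

def simulate(z0, u, v, T, nt=1000):
    """Forward-Euler trajectory of z' = u(t) B z + v(t) + b on [0, T]."""
    h = T / (nt - 1)
    z = np.array(z0, dtype=float)
    traj = [z.copy()]
    for j in range(nt - 1):
        t = j * h
        z = z + h * (u(t) * (B @ z) + v(t) + b)
        traj.append(z.copy())
    return np.array(traj)

# Illustrative constant strategies (hypothetical values):
traj = simulate([3.0, 2.0],
                u=lambda t: 1.0,
                v=lambda t: np.array([0.1, 0.1]),
                T=2.0)
```

A Runge-Kutta or Crank-Nicolson integrator would of course be preferred for accuracy; forward Euler is used here only to keep the sketch short.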
We see that (6) can be written as u* = arg min_u J_p(u, v*, T*), and (7) can be written as v* = arg min_v J_e(u*, v, T*). This definition allows us to interpret our Nash game as two coupled optimal control problems. If (u*, v*, T*) is the Nash equilibrium, then u* is optimal for player P, in the sense that it solves the following optimal control problem

min J_p(u, v*, z, T*)  s.t.  ż = u B z + v* + b,  z(0) = z_0,  u ∈ U_ad.   (8)

On the other hand, the function v* is optimal for player E, that is, it is a solution of the following optimal control problem

min J_e(u*, v, z, T*)  s.t.  ż = u* B z + v + b,  z(0) = z_0,  v ∈ V_ad.   (9)

Therefore (u*, v*) simultaneously solves two optimal control problems whose solutions are characterized by the following optimality system

ż = u* B z + v* + b,  z(0) = z_0,
−ṗ_p = u* B p_p,  p_p(T*) = r_p z(T*),  u* = P_{U_ad}( −(1/ν_p) ⟨B z, p_p⟩ ),
−ṗ_e = u* B p_e,  p_e(T*) = r_e z(T*),  v* = P_{V_ad}( −(1/ν_e) p_e ),

where P_{U_ad} and P_{V_ad} represent the L² projections onto U_ad and V_ad, respectively, and ⟨·, ·⟩ denotes the Euclidean scalar product in R². We refer to the functions p_p and p_e as the adjoint variables (Lagrange multipliers), and the corresponding differential equations are called adjoint equations.
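Since U_ad and V_ad impose pointwise bounds, the L² projections appearing in the optimality system act pointwise in time. A minimal sketch, assuming the controls are sampled on a time grid:

```python
import numpy as np

# Pointwise L2 projections onto the box-constrained admissible sets:
# |u(t)| <= nu_u for the pursuer, |v_s(t)| <= nu_v componentwise for
# the evader; both reduce to clipping the sampled control values.
def project_U(u, nu_u):
    """Project a sampled scalar control (array of shape (nt,))."""
    return np.clip(u, -nu_u, nu_u)

def project_V(v, nu_v):
    """Project a sampled vector control (array of shape (nt, 2))."""
    return np.clip(v, -nu_v, nu_v)
```

That the L² projection onto such box constraints acts pointwise is a standard fact; no optimization solve is needed.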

A Numerical Method to Solve the HC Nash Game
In this section, we illustrate all aspects of our numerical method to compute a NE strategy for the HC game with free end-time. For clarity of presentation, we first discuss some subproblems with fixed end-time.

Fixed End-Time Optimal Control Problems
At the core of our iterative numerical scheme, where a tentative end-time T is available and iterates u^k and v^k have been computed, we consider the solutions to the two optimal control problems (8) and (9). Specifically, given v^k, we aim at computing ū by solving the optimality system

ż = ū B z + v^k + b,  z(0) = z_0,
−ṗ_p = ū B p_p,  p_p(T) = r_p z(T),
ū = P_{U_ad}( −(1/ν_p) ⟨B z, p_p⟩ ).

In a similar way, given u^k, we can compute v̄ by solving the following optimality system

ż = u^k B z + v̄ + b,  z(0) = z_0,
−ṗ_e = u^k B p_e,  p_e(T) = r_e z(T),
v̄ = P_{V_ad}( −(1/ν_e) p_e ).

In practice, to compute ū and v̄, we use these optimality systems to determine the reduced gradients, ∇_u J_p and ∇_v J_e, that are required to implement the well-known nonlinear conjugate gradient (NCG) method. We employ the Polak-Ribière variant of NCG to determine ū and v̄; see [5] for more details.
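The Polak-Ribière step of the NCG iteration can be sketched as follows; the reduced gradient computation itself (a state solve followed by an adjoint solve) is problem-specific and assumed given here.

```python
import numpy as np

# Polak-Ribiere direction update for NCG: given the current and previous
# sampled reduced gradients g_new, g_old and the previous search
# direction d_old, return the new search direction. The max(0, .)
# truncation is the usual PR+ safeguard.
def pr_direction(g_new, g_old, d_old):
    beta = max(0.0, np.dot(g_new, g_new - g_old) / np.dot(g_old, g_old))
    return -g_new + beta * d_old
```

In the full method this direction is combined with a line search and with the projections onto U_ad and V_ad; those details are omitted in this sketch.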
This approach requires the numerical solution of the HC model and its adjoint. For this purpose, we use the so-called modified Crank-Nicolson (MCN) scheme [4], which requires a time grid obtained by subdividing the interval [0, T] into uniform intervals of size h with N_t points, such that t_j = (j − 1) h and 0 = t_1 < · · · < t_{N_t} = T. In this setting, the MCN approximation to our HC model is given by

(z_{j+1} − z_j)/h = u_{j+1/2} B (z_{j+1} + z_j)/2 + v_{j+1/2} + b,

where j = 1, . . . , N_t − 1, h = T/(N_t − 1), the controls are evaluated at the midpoints t_j + h/2, and z_j, v_j, etc., represent the numerical approximations to z(t_j), v(t_j), etc. The initial point z_1 = z(0) is given. A similar scheme is used to solve the adjoint equations. As in [4], one can prove that this scheme is stable and second-order accurate. For ease of presentation, we summarize the numerical procedure discussed above in the following algorithm.
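Because the HC model is linear in z, each step of a Crank-Nicolson-type scheme reduces to a 2×2 linear solve. A sketch of one such step follows; the precise placement of the control values follows [4], and here they are simply passed as per-step values u_mid, v_mid.

```python
import numpy as np

# One Crank-Nicolson-type step for z' = u B z + v + b: averaging the
# state between t_j and t_{j+1} gives the linear-implicit update
#   (I - h/2 u B) z_{j+1} = (I + h/2 u B) z_j + h (v + b).
B = np.array([[0.0, -1.0], [1.0, 0.0]])
b = np.array([0.0, -1.0])
I = np.eye(2)

def cn_step(z, u_mid, v_mid, h):
    A_minus = I - 0.5 * h * u_mid * B
    A_plus = I + 0.5 * h * u_mid * B
    return np.linalg.solve(A_minus, A_plus @ z + h * (v_mid + b))
```

Marching this step over the grid, and the analogous backward sweep for the adjoints, yields the fixed end-time solver used inside the NCG iteration.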
Algorithm 1 Solution of fixed end-time optimal control problems

Fixed End-Time Nash Games
A classical iterative method for solving NE problems is the relaxation scheme discussed in [13] and implemented in the following algorithm, where τ is a relaxation factor that we specify in our numerical experiments. In general, there is no a priori choice of τ available, and a discussion of the convergence of Algorithms 2 and 4 requires some strong convexity criteria; see [1,14]. However, in our numerical experiments we always attain convergence of this scheme with a moderate choice of the relaxation factor. Notice that Steps 4 and 5 of Algorithm 2 compute ū and v̄ in parallel, as illustrated in Algorithm 1.
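The relaxation update at the core of Algorithm 2 can be sketched as follows, where u_bar and v_bar are the best responses computed by Algorithm 1 and tau ∈ (0, 1] is the relaxation factor:

```python
import numpy as np

# Relaxation step of the fixed end-time Nash iteration: blend the
# current iterates with the freshly computed best responses.
# tau = 1 recovers the plain best-response (Jacobi) iteration.
def relax(u_k, v_k, u_bar, v_bar, tau):
    u_next = u_k + tau * (u_bar - u_k)
    v_next = v_k + tau * (v_bar - v_k)
    return u_next, v_next
```

The iteration stops once consecutive control iterates agree up to a prescribed tolerance.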

Free End-Time Optimal Control Problems
The free end-time versions of the optimal control problems (8) and (9) are solved through a bilevel approach, which aims at decoupling the determination of the end-time from the solution of the optimal control problems. This allows us to avoid time transformation techniques, which usually require solving a problem that is very sensitive to the initial guess; see [8].
Consequently, our iterative solution procedure leads to a two-nested-loops algorithm. The outer loop is devoted to computing the end-time, while the inner loop solves the optimal control problems for a given, tentative time horizon. The key result supporting this approach, in particular the way the two levels interact, is given in [7, Thm. 10] for linear-quadratic problems and extended in [8] to nonlinear problems.
In order to illustrate our outer loop, we assume that at the k-th iteration the estimates u^k, v^k and T^k are available. Further, let us denote with u[T] ∈ U_ad and v[T] ∈ V_ad the optimal controls associated with the final time T, that is,

u[T] := arg min_{u ∈ U_ad} J_p(u, v^k, T),   v[T] := arg min_{v ∈ V_ad} J_e(u^k, v, T).

Clearly, this construction requires extending or restricting the functions u^k and v^k, defined on [0, T^k], to the interval [0, T]. With this preparation, we can consider the following optimization problems, which formally involve only one decision variable: the final time.
For the purpose of defining a robust iteration procedure, a (quadratic) proximal regularization term with parameter μ_k > 0 is introduced, which leads to the problems

min_{T>0} J_p(u[T], v^k, T) + (μ_k/2) (T − T^k)²,   (15)
min_{T>0} J_e(u^k, v[T], T) + (μ_k/2) (T − T^k)².   (16)

Corresponding to (8) and (9), we have the Pontryagin-Hamiltonian (HP) functions given by

H_p(z, u, p_p) = (ν_p/2) u² + ω_p + ⟨p_p, u B z + v + b⟩,
H_e(z, v, p_e) = (ν_e/2) |v|² + ω_e + ⟨p_e, u B z + v + b⟩.

Then, considering (15) and (16), we have the following augmented HP functions

H̃_p = H_p + μ_k (T − T^k),  H̃_e = H_e + μ_k (T − T^k),

where the term μ_k (T − T^k) is associated to the quadratic proximal penalty in (15)-(16). In order to implement a gradient-based scheme for determining T, we consider the sensitivity of the cost functional with respect to this variable. This sensitivity can be evaluated using the HP function of the underlying problem along a fixed-time solution [7, Thm. 10], namely

d/dT J_p(u[T], v^k, T) = H_p(T),   d/dT J_e(u^k, v[T], T) = H_e(T).

By exploiting the fact that the (non-augmented) system is autonomous, and thus the Hamiltonian is constant along a solution, these sensitivities can be evaluated at any t ∈ [0, T] in the outer loop of our solution procedure; for more details see [8, Algorithm 3]. Summarizing, the method for solving the free end-time optimal control problem (ū, T_u) = arg min_{u ∈ U_ad, T>0} J_p(u, v^k, z, T) alternates the inner step ū = arg min_u J_p(u, v^k, T^k) with a Hamiltonian-based update of the end-time.
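A sketch of the resulting end-time update: by [7, Thm. 10] the sensitivity of the cost with respect to T is the Hamiltonian value along the fixed-time solution, to which the proximal term contributes a gradient μ_k (T − T^k). The step size alpha and the lower bound guarding T > 0 are assumptions of this sketch, not specifications from the paper.

```python
# Proximal gradient step on the end-time: T is the current trial
# end-time, T_k the proximal center, H_value the Hamiltonian of the
# underlying fixed-time problem (constant along the solution, so it
# can be evaluated at any t in [0, T]).
def update_T(T, T_k, H_value, alpha, mu_k):
    grad = H_value + mu_k * (T - T_k)
    return max(T - alpha * grad, 1e-8)  # keep the end-time positive
```

At optimality the Hamiltonian vanishes, so the update becomes stationary, consistent with the stopping criterion of Algorithm 4.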

Free End-Time Nash Games
In this section, we assemble the numerical optimization schemes discussed above to define our algorithm for solving the HC Nash game. It can be described as a relaxation method with free end-time sub-problems and proximal penalty. A numerical issue that arises when considering the separate free end-time optimal control problems for P and E is that the corresponding optimal final times may differ during the iterations. However, the relaxation step requires combining approximations of the control functions that are given on different intervals. Thus, before the relaxation step is performed, an extrapolation/restriction procedure is applied to define all the functions involved on the same time horizon. Our free end-time HC Nash game solver is given by Algorithm 4.
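The extrapolation/restriction step can be sketched as a nearest-neighbor resampling of a control from a grid on [0, T_old] to a grid on [0, T_new]; extending constantly beyond T_old is a choice of this sketch, mimicking MATLAB's interp1 with the 'nearest' method.

```python
import numpy as np

# Resample a control sampled on a uniform grid over [0, T_old] onto a
# uniform grid of nt points over [0, T_new], by nearest-neighbor
# interpolation. For T_new > T_old this extrapolates with the value of
# the nearest (last) sample; for T_new < T_old it restricts the control.
def resample(u, T_old, T_new, nt):
    t_old = np.linspace(0.0, T_old, len(u))
    t_new = np.linspace(0.0, T_new, nt)
    idx = np.abs(t_new[:, None] - t_old[None, :]).argmin(axis=1)
    return u[idx]
```

After this step all controls live on the same horizon and the relaxation update can be applied componentwise.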
Algorithm 4 Relaxation scheme with free end-time sub-problems and proximal penalty

10: Set k := k + 1
11: until max(|H_e|, |H_p|) < ε

In this algorithm, the parameters τ and α are relaxation factors that we specify in our numerical experiments. The main advantage of Algorithm 4 is that we can compute (ū, T_u) and (v̄, T_v) separately (in parallel) using an efficient optimization scheme. In this respect, Algorithm 4 is of Jacobi type, not of Gauß-Seidel type.
Notice that Algorithm 4 stops when the two Hamilton functions are smaller, in absolute value, than a given tolerance ε. In fact, since our problem is autonomous, at optimality these Hamiltonians should have zero value.
A different approach, not further investigated in this paper, may consist in replacing the free end-time optimal control problems at Steps 4-5 of Algorithm 4 with fixed end-time problems with final time T^k. Subsequently, lacking estimates of T_u and T_v, Step 6 could determine a new end-time based on ū, v̄, and T^k. This approach would avoid solving free end-time optimization problems, which may be a valuable property. On the other hand, it would still require extrapolation or truncation to match the time domains. Also, further decoupling the optimization problems may lead to slower, less robust convergence. This is a subject for future research.
To conclude this section, we remark that we prefer to avoid time scaling techniques in our game framework and rather use a bilevel approach. However, time scaling provides useful tools for studying free end-time optimal control problems. The core idea is to transform the original problem, with time variable t ∈ [0, T], T > 0, into a fixed-time one by treating the final time T as a parameter (or a constant state). By introducing the mapping s ↦ t(s) := T s, the new time variable is s ∈ [0, 1], and the system dynamics have to be transformed as well, scaled by T. The problem then consists of finding the optimal control over the scaled time domain [0, 1] and the optimal final time T. Given a final time T > 0, the (original) control u ∈ U_ad can be recovered as u(t) = u_s(t/T) for t ∈ [0, T]. See [9,15] for more details.
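For the compact HC model (3), this time scaling reads, as a sketch:

```latex
% With s \in [0,1], t = Ts, and the scaled quantities
% z_s(s) := z(Ts), u_s(s) := u(Ts), v_s(s) := v(Ts), the chain rule gives
\frac{d z_s}{d s}(s) = T \bigl( u_s(s)\, B\, z_s(s) + v_s(s) + b \bigr),
\qquad z_s(0) = z_0 ,
```

so that the final time T enters the fixed-horizon dynamics (and, analogously, the time integrals in the cost functionals) as a multiplicative parameter.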

Numerical experiments
In this section, we present results of numerical experiments to validate our HC Nash game formulation and the ability of our numerical framework to solve the resulting problems.
In our numerical experiments, we choose the initial state of the system z_0 = (3, 2)′ and initialize T^0 = 10, ν_u = 1 and ν_v = 0.3. Moreover, let u^{00}(t) = 0 and v^{00}(t) = (0.1, 0.1)′, t ∈ [0, T^0]. With these parameters, we solve the HC game for the fixed end-time T^0 in order to obtain a better initial guess for the controls in our algorithm, namely u^0 and v^0, for faster convergence. The extrapolation step is performed with the MATLAB function interp1 using the 'nearest' method [20]. Similar results are obtained with the 'linear' method.
Next, assume that player P has an advantage over player E. This situation results from an appropriate choice of the parameters of the game. For example, let ν_e = 10^−6, ν_p = 10^−6, r_e = 10^−4, r_p = 1 and ω_e = 0, ω_p = 10^−6.
Notice that this choice of parameters aims at having the pursuer catch the evader. Otherwise, the game would stop only upon reaching a prescribed maximum number of iterations, at which point P and E would occupy different positions.
To solve this NE problem, we use Algorithm 4 with τ = 0.5, α = 0.5 and ε = 10^−5. The regularization parameters are μ_k = μ = 10^−10. In this implementation, the differential HC model and its adjoint are approximated by the MCN scheme on a grid of N_t = 500 points. Further experiments have been performed to verify that the results reported in this section are not mesh dependent.
With this setting, we obtain the P and E strategies depicted in Figure 1, and the time at which the pursuer catches the evader turns out to be T* = 8.88. This result is obtained after 15 iterations of Algorithm 4. Starting the algorithm with a different end-time initialization, e.g., T^0 = 9, we obtain T* = 8.81, and the control functions are very similar to those obtained above; hence they are omitted for brevity.
We remark that, even using the tolerance ε = 10^−5 in the stopping condition, we obtain relatively different values of T*; that is, the end-time varies much more than the two Hamilton functions. Hence, a very strict tolerance is not needed.
However, it is well known that Nash games admit many solutions, and different initializations may lead to different equilibria. In fact, choosing T^0 = 7, we obtain T* = 10.14, and the corresponding strategies and trajectories are shown in Figure 2. We conclude this section by showing that if we change the regularization parameters to ν_e = 10^−6, ν_p = 10^−4 and start with T^0 = 9 (all other parameters are as above), then we again obtain a similar optimal final time, T* = 8.79, and similar trajectories, as depicted in Figure 3.
The results of these experiments can be interpreted as follows. At the beginning of the optimization procedure, the evader tries to push the end-time toward the lower bound of the time interval, in order to reduce the chance of being captured, while the pursuer tries to gain enough time to complete the capture. After a few iterations the opposite holds, i.e., the evader aims at gaining more time to escape, while the pursuer tries to catch it earlier.
Moreover, as explained in [11], if the pursuer drives straight toward the evader, then the evader can frustrate the pursuit by entering the pursuer's circle of maximal curvature. Hence, the pursuer should reduce the distance by swerving around the evader until capture.

Conclusion
In this paper, a numerical framework for solving a pursuit-evasion homicidal chauffeur Nash game was presented. The Nash game was formulated in a functional setting involving cost functionals for the two players and the classical homicidal chauffeur differential model. In this model, the strategies of the pursuer and of the evader are represented by control functions that are subject to constraints. The numerical solution procedure was obtained by combining a Hamiltonian-based scheme with proximal penalty, which determines the time horizon on which the game takes place, with a Lagrangian optimal control approach and relaxation, which solve the Nash game at a fixed end-time.

Acknowledgement
We would like to thank G. Ciaramella, A. Habbal and S. Roy for many insightful discussions and helpful notes on differential games.

Appendix
In this appendix, we derive (2) from (1). For this purpose, we consider the coordinates (x, y) of the evader in the pursuer's reference system, whose y-axis forms the angle θ, measured clockwise, with the y-axis of the inertial frame. Differentiating the transformed coordinates with respect to time gives

ẋ = (ẋ_e − ẋ_p) cos θ − (ẏ_e − ẏ_p) sin θ − θ̇ [ (x_e − x_p) sin θ + (y_e − y_p) cos θ ],
ẏ = (ẋ_e − ẋ_p) sin θ + (ẏ_e − ẏ_p) cos θ + θ̇ [ (x_e − x_p) cos θ − (y_e − y_p) sin θ ].   (20)
Hence, equations (2) are obtained by replacing in (20) the values of ẋ_p, ẏ_p, ẋ_e, ẏ_e, θ̇ given in (1) and by using the following geometric relations:

x_e − x_p = y sin θ + x cos θ,   y_e − y_p = y cos θ − x sin θ.
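The geometric relations above can be checked numerically; the following snippet (not from the paper) inverts them and verifies the round trip for a random configuration.

```python
import numpy as np

# Check of the geometric relations
#   x_e - x_p = y sin(th) + x cos(th),  y_e - y_p = y cos(th) - x sin(th):
# given a relative inertial position (xe_xp, ye_yp) and an angle th,
# compute (x, y) by the inverse (clockwise) rotation and verify the
# relations hold.
rng = np.random.default_rng(0)
xe_xp, ye_yp, th = rng.normal(size=3)

x = xe_xp * np.cos(th) - ye_yp * np.sin(th)
y = xe_xp * np.sin(th) + ye_yp * np.cos(th)

assert np.isclose(y * np.sin(th) + x * np.cos(th), xe_xp)
assert np.isclose(y * np.cos(th) - x * np.sin(th), ye_yp)
```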