Penalty function method for the minimal time crisis problem

In this note, we propose a new method to approximate the minimal time crisis problem using an auxiliary control and a penalty function, and show its convergence to a solution of the original problem. The interest of this approach is illustrated on numerical examples for which optimal trajectories can leave and enter the crisis set tangentially.

Keywords: Optimal control, Penalty method, Value function, Discontinuous integrand


Introduction
The minimal time of crisis problem was introduced in [12] in the context of viability theory (see [1]). It consists in minimizing the time spent by a solution of a controlled dynamics outside a given closed set K (representing typically some constraints). It has been mainly studied in the context of ordinary differential equations (ODEs). Notice however that a similar approach has been proposed for linear parabolic partial differential equations (see [8,9]) in connection with practical applications.
In the context of ODEs, the objective function of the time of crisis can be expressed via the indicator function of the complement of the set K. This function is discontinuous with respect to the state variable, so the Pontryagin Maximum Principle (PMP) cannot be directly applied to compute an optimal control. The Hybrid Maximum Principle (HMP) [10] can be used to express necessary optimality conditions (see e.g. [3,4,5,6,13]), but only under a "transverse crossing condition" on the set K (see also [13]). This condition requires that optimal trajectories enter and leave the set K non-tangentially. In this work, our main aim is to present a different approach to the time crisis problem, based on approximating its solutions. This allows us to bypass both the discontinuity of the indicator function and the transverse crossing condition.
Our methodology relies on the introduction of an additional control function and on the definition of an auxiliary optimal control problem with mixed state-control constraint, whose solutions exactly coincide with those of the time crisis problem. This new problem is then approximated by means of a penalty function, which is the main purpose of this paper.
The paper is structured as follows. In Section 2, we introduce an auxiliary optimal control problem in which the discontinuity of the integrand with respect to the state is reformulated using a new control with values in {±1} and a mixed state-control constraint, and we show that this problem is equivalent to the time crisis problem. Because of this new constraint, we then introduce a penalty method and study properties of the value function associated with the approximated problem, as well as properties of the regularization arising from the penalty function. In Section 3, we prove convergence properties, namely that the sequence of value functions converges to the value function associated with the time crisis problem. We also establish the convergence of optimal solutions of the regularized problem to an optimal solution of the original problem (although the velocity set is non-convex). Finally, these convergence results are illustrated in Section 4 on two academic examples for which optimal trajectories can enter and leave the crisis set K tangentially.

Statement of the problem
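Let $K \subset \mathbb{R}^n$ be a set, $T$ a positive number, $U \subset \mathbb{R}^m$ a set, and $f : \mathbb{R}^n \times U \to \mathbb{R}^n$ a map that fulfill the following assumptions:

(H1) The set $U$ is a non-empty compact subset of $\mathbb{R}^m$.

(H2) The map $f : \mathbb{R}^n \times U \to \mathbb{R}^n$ is continuous w.r.t. $(x, u)$, locally Lipschitz w.r.t. $x$, and satisfies the linear growth condition: there exist $c_1 > 0$ and $c_2 > 0$ such that for all $x \in \mathbb{R}^n$ and all $u \in U$, one has
\[ |f(x,u)| \le c_1 |x| + c_2. \]

We consider, for any $\tau \in [0, T]$ and $y \in \mathbb{R}^n$, the following optimal control problem ("minimal time crisis"):
\[ \inf_{u \in \mathcal{U}} \int_\tau^T \mathbb{1}_{K^c}\bigl(x_{u,\tau,y}(t)\bigr)\,dt, \tag{TC} \]
where $\mathbb{1}_{K^c}$ is the indicator function of the complement of $K$ (equal to 1 outside $K$ and to 0 on $K$), and $x_{u,\tau,y}(\cdot) : [\tau, T] \to \mathbb{R}^n$ (simply denoted by $x(\cdot)$ hereafter) is the unique solution to the Cauchy problem
\[ \dot{x}(t) = f(x(t), u(t)) \ \text{ for a.e. } t \in [\tau, T], \qquad x(\tau) = y, \tag{2} \]
associated with a control $u \in \mathcal{U}$, the set of all measurable controls $u : [0, T] \to U$. We also assume that the following holds:

(H4) The set $K$ is a non-empty closed subset of $\mathbb{R}^n$ described by
\[ K := \{ x \in \mathbb{R}^n \; ; \; \varphi(x) \le 0 \}, \]
where $\varphi : \mathbb{R}^n \to \mathbb{R}$ is a locally Lipschitz continuous function.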
Note that the existence of an optimal solution to this problem is standard (see, e.g., [4, Proposition 2.1]). Since the integrand defining (TC) is discontinuous, we cannot apply the Pontryagin Maximum Principle (PMP), which requires data to be Lipschitz continuous w.r.t. the variable $x$. There exist many ways to approximate the indicator function by a sequence of Lipschitz continuous functions (see, e.g., [16]) so as to obtain a sequence of optimal trajectories converging to an optimal solution of the original problem. With Lipschitz data, one can use different numerical techniques, such as direct methods, or the Hamilton-Jacobi-Bellman (HJB) equation (to obtain a sequence of Lipschitz continuous value functions, from which one can construct a sequence of optimal trajectories). If one approximates the indicator function by more regular functions, say of class $C^1$, then the (classical) PMP can be used to characterize a sequence of optimal trajectories. Here, we discuss another way to represent the discontinuity of the indicator function, by means of an additional control. This takes advantage of the fact that, in classical optimal control theory, control functions are naturally sought among measurable (thus possibly discontinuous) functions.

Formulation with mixed constraint
Let $\mathcal{V}$ be the set of all measurable controls $v : [0, T] \to \Omega$, where $\Omega := \{\pm 1\}$, and consider the mixed state-control constraint (3), where $x(\cdot)$ is any admissible solution and $v(\cdot)$ any control function in $\mathcal{V}$. We define a new optimal control problem with mixed state-control constraint for the controlled dynamics (2):
\[ \inf_{(u,v) \in \mathcal{U} \times \mathcal{V}} \int_\tau^T \frac{1 + v(t)}{2}\,dt \quad \text{subject to (2) and (3).} \tag{TCR} \]
Note that, given any admissible solution $(x(\cdot), u(\cdot), v(\cdot))$ satisfying the constraint (3), one has $v(t) = \mathrm{sign}(\varphi(x(t)))$ provided that $\varphi(x(t)) \neq 0$, and $v(t) \in \Omega$ if $\varphi(x(t)) = 0$. It is known that the lack of regularity of the integrand w.r.t. the state is bothersome, whereas it is common for an optimal control to be discontinuous w.r.t. $t$. In this new formulation, the lack of regularity of the integrand defining (TC) has been replaced by the addition of the new control variable $v$, which only takes the two values $+1$ and $-1$, together with the mixed state-control constraint (3). We now prove that problems (TC) and (TCR) are equivalent.
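Proof. Notice first that for any admissible solution $x$, the control $v$ defined by (4) satisfies the constraint (3), and thus one has
\[ \int_\tau^T \mathbb{1}_{K^c}(x(t))\,dt = \int_\tau^T \frac{1 + v(t)}{2}\,dt. \]
Therefore, if $(x^\star, u^\star, v^\star)$ is an optimal triple for (TCR), then the pair $(x^\star, u^\star)$ is also optimal for (TC). Conversely, let $(x^\star, u^\star)$ be an optimal solution of (TC), and suppose by contradiction that there exists an optimal pair $(\bar u, \bar v) \in \mathcal{U} \times \mathcal{V}$ for Problem (TCR) such that
\[ \int_\tau^T \frac{1 + \bar v(t)}{2}\,dt < \int_\tau^T \mathbb{1}_{K^c}(x^\star(t))\,dt. \]
Let $\bar x$ be its associated trajectory. Since $(\bar x, \bar v)$ satisfies the constraint (3), it follows that one has
\[ \int_\tau^T \mathbb{1}_{K^c}(\bar x(t))\,dt \le \int_\tau^T \frac{1 + \bar v(t)}{2}\,dt. \]
Hence, we deduce that
\[ \int_\tau^T \mathbb{1}_{K^c}(\bar x(t))\,dt < \int_\tau^T \mathbb{1}_{K^c}(x^\star(t))\,dt, \]
which contradicts the optimality of $x^\star$ and concludes the proof.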
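The identity used in this proof can be checked numerically on a sampled trajectory. The sketch below is only illustrative: it assumes the convention $K = \{\varphi \le 0\}$, picks $v = -1$ where $\varphi$ vanishes (one admissible choice on the boundary), and uses a made-up sample of values of $\varphi$ along a trajectory; none of the function names come from the source.

```python
import numpy as np

def indicator_Kc(phi_vals):
    """Indicator of the complement of K = {phi <= 0} (assumed convention)."""
    return (phi_vals > 0).astype(float)

def sign_control(phi_vals):
    """v = sign(phi) where phi != 0; v = -1 is chosen where phi == 0."""
    return np.where(phi_vals > 0, 1.0, -1.0)

def crisis_time(phi_vals, dt):
    """Left Riemann sum of the indicator along the sampled trajectory."""
    return float(np.sum(indicator_Kc(phi_vals)) * dt)

def reformulated_cost(phi_vals, dt):
    """Left Riemann sum of (1 + v) / 2 with v given by sign_control."""
    v = sign_control(phi_vals)
    return float(np.sum((1.0 + v) / 2.0) * dt)

# Hypothetical samples of phi(x(t)) on [0, 5) with step 0.01.
dt = 0.01
t = np.arange(500) * dt
phi_samples = np.sin(2.0 * t)

cost_tc = crisis_time(phi_samples, dt)
cost_tcr = reformulated_cost(phi_samples, dt)
```

Both sums coincide because $(1+v)/2$ takes the value 1 exactly where $\varphi > 0$ and 0 elsewhere, which is the discrete counterpart of replacing the discontinuous integrand by the two-valued control.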

Approximation with a penalty function
The penalty method is a common technique in optimization to approximate constrained optimization problems by a sequence of unconstrained optimization problems [11,14,15,18]. In this section, we apply a penalty approach to (TCR). We start by introducing a penalty function $P : \mathbb{R}^n \times \mathbb{R} \to \mathbb{R}_+$ associated with the constraint (3), together with an auxiliary optimal control problem, denoted (TCR$^\#_n$), in which the term $nP(x, v)$ is added to the integrand and the controls $v(\cdot)$ take values in $\mathrm{co}(\Omega) = [-1, 1]$; the corresponding set of measurable controls is denoted by $\mathcal{V}^\#$. The extended velocity set associated with this problem is convex for any $x \in \mathbb{R}^n$. The existence of an optimal solution for Problem (TCR$^\#_n$) then follows by a direct application of Filippov's Theorem [10], under Assumptions (H1)-(H2)-(H3)-(H4).

Next, $V^\#_n : [0, T) \times \mathbb{R}^n \to \mathbb{R}$ will stand for the value function associated with (TCR$^\#_n$). For each fixed $n \in \mathbb{N}$, Problem (TCR$^\#_n$) is a classical optimal control problem of Bolza type with Lipschitz bounded data, so that its value function $V^\#_n$ is locally Lipschitz continuous over $[0, T) \times \mathbb{R}^n$ (see, e.g., [2]). In addition, $V^\#_n$ is the unique viscosity solution to the HJB equation (5) with the boundary condition (6), associated with a Hamiltonian $H^\#_n$. By maximizing this Hamiltonian w.r.t. $v$, one obtains the expression of a maximizer $\hat v^\#_n$. One can also check, thanks to this expression, that the integrand evaluated at this maximizer does not exceed 1. Without any loss of generality, we can then write $V^\#_n$ with the bounded integrand $\min\left(1, \frac{1+v}{2} + nP(x, v)\right)$.

Let us now introduce a slight variation of the previous optimal control problem, in which the controls $v(\cdot)$ take values in $\Omega$, and not in $\mathrm{co}(\Omega)$:
\[ \inf_{(u,v) \in \mathcal{U} \times \mathcal{V}} \int_\tau^T \min\left(1, \frac{1 + v(t)}{2} + nP(x(t), v(t))\right) dt, \tag{TCR$_n$} \]
and for which we denote by $V_n$ the associated value function. By a similar argument as above, one can show that $V_n$ is the unique viscosity solution of the HJB equation (7) with the boundary condition (6), associated with a Hamiltonian $H_n$. Note that the extended velocity set associated with this optimal control problem is not convex, due to the fact that we consider controls $v$ taking the values $-1$ or $1$ only. Therefore, one cannot apply Filippov's Theorem to ensure the existence of an optimal solution.

To investigate the behavior of $(V_n)$, let $V : [0, T) \times \mathbb{R}^n \to \mathbb{R}$ be the value function associated with the auxiliary Problem (TCR).

Proposition 2.1. For each $n \in \mathbb{N}$ and for each $(\tau, y) \in [0, T) \times \mathbb{R}^n$, one has the following inequality:
\[ V^\#_n(\tau, y) \le V_n(\tau, y) \le V(\tau, y). \]
Moreover, the problem (TCR$_n$) admits an optimal solution for any $n \in \mathbb{N}$.
Proof. Let $(x^\star, u^\star)$ be an optimal pair for Problem (TC). According to Lemma 2.1, the triple $(x^\star, u^\star, v^\star)$, where $v^\star$ is defined by (4), is optimal for Problem (TCR). It follows that one has $P(x^\star(t), v^\star(t)) = 0$ for any $t \in [\tau, T]$, which gives $V_n(\tau, y) \le V(\tau, y)$ for any $n$. Clearly, since Problem (TCR$_n$) involves the same criterion as Problem (TCR$^\#_n$), but a smaller set of control functions, we get the inequality $V^\#_n \le V_n$ for any $n \in \mathbb{N}$.

Consider now Problem (TCR$_n$). By maximizing the Hamiltonian $H_n$ w.r.t. $v$ (with values $\pm 1$ only), one obtains the expression of a maximizer $\hat v_n(x)$, for each $n \in \mathbb{N}$. By replacing $v$ in (TCR$_n$) with the expression of $\hat v_n(x)$, one obtains the optimal control problem
\[ \inf_{u \in \mathcal{U}} \int_\tau^T Q_n(\varphi(x(t)))\,dt, \]
which is a classical Bolza problem with Lipschitz data, for which the value function is the unique viscosity solution of the HJB equation (7) with boundary condition (6). By uniqueness of solutions of (7)-(6) in the class of Lipschitz functions, we deduce that its value function coincides with $V_n$ for any $n \in \mathbb{N}$. Moreover, the latter problem admits an optimal pair $(x_n, u_n)$, thanks to Filippov's Theorem (under Assumptions (H1) to (H4)). Then, the triple $(x_n, u_n, v_n)$, where $v_n(t) := \hat v_n(x_n(t))$ for any $t \in [\tau, T]$, is optimal for Problem (TCR$_n$).
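As a worked illustration of the substitution of $v$ by its optimal value, suppose that the penalty takes the form $P(x, v) = (|\varphi(x)| - v\varphi(x))^2$. This specific form is an assumption made here for illustration: it vanishes exactly when $v\varphi(x) = |\varphi(x)|$, and it reproduces the non-differentiability point $z = \frac{1}{2\sqrt{n}}$ of $Q_n$ mentioned in the discussion of the regularization. Writing $z = \varphi(x)$ and using the bounded integrand $\min\left(1, \frac{1+v}{2} + nP\right)$, the two admissible values of $v$ give:

```latex
\begin{align*}
v = +1 &: \quad \min\bigl(1,\; 1 + n(|z| - z)^2\bigr) = 1, \\
v = -1 &: \quad \min\bigl(1,\; n(|z| + z)^2\bigr) = \min\bigl(1,\; 4n\max(z,0)^2\bigr), \\
\text{hence}\quad Q_n(z) &= \min\bigl(1,\; 4n\max(z,0)^2\bigr).
\end{align*}
```

Under this assumption, $Q_n$ vanishes for $z \le 0$ (that is, on $K$), equals 1 for $z \ge \frac{1}{2\sqrt{n}}$, and $v = -1$ is a minimizer of the integrand whenever $z < \frac{1}{2\sqrt{n}}$.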

Discussion in terms of regularization of the indicator function
We next give some properties of this approach in terms of regularization of the indicator function of $K^c$ defining the time crisis function. Indeed, Problems (TCR$^\#_n$) and (TCR$_n$) amount to considering two different regularizations of the indicator function, given that an optimal control function $v(\cdot)$ has to maximize the corresponding Hamiltonian at almost every $t$, and that its maximizing expression can simply be substituted into the integrand. One can straightforwardly show the following results.

Proposition 2.2. There exists a function $Q^\#_n$ such that the criterion of Problem (TCR$^\#_n$) can be written as $\int_\tau^T Q^\#_n(\varphi(x(t)))\,dt$, while the criterion of Problem (TCR$_n$) can be written as $\int_\tau^T Q_n(\varphi(x(t)))\,dt$, where $Q_n(\cdot)$ is defined in (9).

(2) $Q^\#_n$ is differentiable. Provided that $\varphi$ is differentiable, one can then use the (regular) PMP to characterize optimal pairs $(x_n, u_n)$ for Problem (TCR$^\#_n$) considering the single control $u$. By contrast, $Q_n$ is not differentiable at $z = \frac{1}{2\sqrt{n}}$. However, it is possible to use the PMP provided that Problem (TCR$_n$) is considered as a problem with two controls $u$ and $v$, since the function $P$ is differentiable w.r.t. $x$.

Figure 1. Example of graphs of the functions $Q^\#_n$ and $Q_n$ (for $n = 5$).
These facts justify the interest of considering Problem (TCR$_n$) (with two controls) instead of (TCR$^\#_n$) as an approximation procedure for Problem (TC), all the more so as this problem admits optimal solutions despite the lack of convexity of its augmented velocity set. However, the consideration of Problem (TCR$^\#_n$) has been useful to justify the choice of the bounded penalization $\min\left(1, \frac{1+v}{2} + nP(x, v)\right)$ instead of the unbounded one given by $\frac{1+v}{2} + nP(x, v)$ in the criterion. From a purely numerical viewpoint, having to consider only two possible values for the control $v$ can be an advantage if one considers numerical schemes based on dynamic programming, as we did in the examples of Section 4.
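The two regularizations discussed above can be compared numerically. The following sketch assumes the penalty $P(x, v) = (|\varphi(x)| - v\varphi(x))^2$, which is not taken from the source but is consistent with the kink of $Q_n$ at $z = \frac{1}{2\sqrt{n}}$. The relaxed minimization over $v \in [-1, 1]$ is approximated on a grid, and all function names are illustrative.

```python
import numpy as np

def penalty(z, v):
    """Assumed penalty P = (|z| - v z)^2, with z standing for phi(x)."""
    return (np.abs(z) - v * z) ** 2

def integrand(z, v, n):
    """Bounded penalized integrand min(1, (1 + v)/2 + n P)."""
    return np.minimum(1.0, (1.0 + v) / 2.0 + n * penalty(z, v))

def Q_two_valued(z, n):
    """Minimum of the integrand over v in {-1, +1}."""
    return np.minimum(integrand(z, -1.0, n), integrand(z, 1.0, n))

def Q_relaxed(z, n, num_v=2001):
    """Minimum over a grid of v in [-1, 1]; the grid contains both endpoints."""
    v_grid = np.linspace(-1.0, 1.0, num_v)
    values = integrand(np.asarray(z, dtype=float)[..., None], v_grid, n)
    return values.min(axis=-1)

z = np.linspace(-1.0, 1.0, 401)
indicator = (z > 0).astype(float)

# Mean absolute gap between Q_n and the indicator, for increasing n.
gaps = [float(np.mean(np.abs(Q_two_valued(z, n) - indicator)))
        for n in (5, 50, 500)]
```

Under the stated assumption, both functions vanish for $z \le 0$, the relaxed one lies below the two-valued one, and the gap to the indicator function shrinks as $n$ grows, which is the qualitative behavior described in this section.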

Convergence results
We now provide convergence results of solutions to the penalized optimal control problems (namely (TCR$_n$) and (TCR$^\#_n$)) to an optimal solution of (TC).

Proposition 3.1. The functions $V^\#_n$, resp. $V_n$, converge pointwise to the function $V$ in $[0, T] \times \mathbb{R}^n$. Moreover, any sequence of optimal trajectories $x^\#_n$, resp. $x_n$, for Problem (TCR$^\#_n$), resp. (TCR$_n$), converges uniformly, up to a sub-sequence, to an optimal solution $x^\star$ of Problem (TC), and their derivatives converge weakly to $\dot x^\star$ in $L^2(\tau, T; \mathbb{R}^n)$.
Proof. Since each $V^\#_n$ is Lipschitz continuous and, for each $(\tau, y) \in [0, T) \times \mathbb{R}^n$, the sequence $(V^\#_n(\tau, y))_n$ is non-decreasing and bounded above, it converges pointwise to some function $V_\infty(\tau, y) \le V(\tau, y)$. It can also be observed that Problem (TCR$^\#_n$) can be equivalently rewritten as a Mayer problem in $\mathbb{R}^{n+2}$:
\[ \inf_{(u,v) \in \mathcal{U} \times \mathcal{V}^\#} \; l(T) + n\,p(T), \]
subject to the augmented dynamics
\[ \dot x = f(x, u), \qquad \dot l = \frac{1 + v}{2}, \qquad \dot p = P(x, v), \qquad (x(\tau), l(\tau), p(\tau)) = (y, 0, 0). \tag{10} \]
Under hypotheses (H1) to (H4), Filippov's Theorem gives the existence of an optimal solution $(x_n, l_n, p_n)$ associated with a pair of controls $(u_n, v_n) \in \mathcal{U} \times \mathcal{V}^\#$, for any $n \in \mathbb{N}$. Then, from the standard compactness properties of trajectories (see, e.g., [10, Theorem 1.11]), there exist a sub-sequence, also denoted $(x_n, l_n, p_n)$, and a pair of controls $(u^\star, v^\star)$ such that $(x_n, l_n, p_n)$ converges uniformly to a solution of (10), denoted by $(x^\star, l^\star, p^\star)$ and associated with the control $(u^\star, v^\star)$. In addition, the sequence $(\dot x_n, \dot l_n, \dot p_n)$ converges weakly to $(\dot x^\star, \dot l^\star, \dot p^\star)$ in $L^2(\tau, T; \mathbb{R}^{n+2})$.

Let us now show that the pair $(x^\star, u^\star)$ is optimal. First, note that one has
\[ 0 \le l_n(T) + n\,p_n(T) \le V(\tau, y), \qquad n \in \mathbb{N}, \]
where $l_n$, $p_n$ are non-negative functions. Therefore, $p_n(T)$ has to converge to 0 when $n$ tends to $+\infty$, which implies $p^\star(T) = 0$. Since $p^\star$ is absolutely continuous with $p^\star(\tau) = 0$ and satisfies $\dot p^\star \ge 0$ a.e., we deduce that the function $p^\star$ is identically zero. Then, the constraint (3) is satisfied, and we conclude (by definition of $V$) that one has
\[ V(\tau, y) \le l^\star(T) \le V_\infty(\tau, y) \le V(\tau, y), \]
which proves the equality $V_\infty(\tau, y) = V(\tau, y)$ and that $(x^\star, u^\star)$ is optimal for Problem (TC).

Consider now Problem (TCR$_n$). From Proposition 2.1, we obtain that $V_n$ converges pointwise to the function $V$ as well, together with the existence of a sequence of optimal trajectories $x_n$ for Problem (TCR$_n$), with associated controls $(u_n, v_n)$ in $\mathcal{U} \times \mathcal{V} \subset \mathcal{U} \times \mathcal{V}^\#$. Let now $(l_n, p_n)$ be the solution of (10) associated with those controls. Using again the compactness properties of trajectories of the solutions of (10) for controls in $\mathcal{U} \times \mathcal{V}^\#$, we obtain the uniform convergence of $(x_n, l_n, p_n)$, up to a sub-sequence, to a certain $(x^\star, l^\star, p^\star)$, and we conclude as before that $x^\star$ is optimal for Problem (TC).
Remark 3.1. The "strong-weak" compactness of the set of trajectories does not in general provide the convergence of controls. Nevertheless, here, one can see that optimal controls $(v_n)$ for Problem (TCR$^\#_n$) or Problem (TCR$_n$) converge a.e. to $v^\star$ defined in (4) (up to a sub-sequence). Indeed, since $\dot p_n$ converges weakly to zero and $\dot p_n \ge 0$ a.e., it converges a.e. to zero (up to a sub-sequence). It follows that the limiting triple is optimal for Problem (TCR) with cost $V(\tau, y)$, and from Lemma 2.1, $v^\star$ is then given by (4).

Numerical examples
We provide two examples illustrating numerically the convergence of approximated optimal solutions (to the penalized problem) to an optimal solution of the minimal time crisis problem. These two examples have the particularity that the optimal trajectories enter and leave the set K tangentially, which does not allow the use of the HMP. Moreover, in the second example, the optimal trajectory stays on the boundary of K for a non-zero duration. As controlled dynamics, we consider the planar system, as in [4],
\[ \dot x_1(t) = -x_2(t)\,(2 + u(t)), \qquad \dot x_2(t) = x_1(t)\,(2 + u(t)), \]
with initial condition $(x_1(0), x_2(0)) = (0, 1)$ and $u(t)$ taking values in $[-1, 1]$, but here with different sets K.
The function $\varphi$ (defining K) is chosen differently in each example. In the first example, the initial condition lies inside the set K, whereas it lies outside K in the second one. Both examples highlight the possibility of entering or leaving the set K tangentially (see Fig. 2). In the present context, we introduce the so-called myopic strategy (see [5,6]), defined as the feedback
\[ u^{m}(x) := \begin{cases} +1 & \text{if } x \notin K, \\ -1 & \text{if } x \in K. \end{cases} \]
Roughly speaking, taking $u = +1$ outside K drives the state as fast as possible into K, whereas taking $u = -1$ in K allows the system to spend as much time as possible in K. This feedback clearly minimizes the time spent by the trajectory outside the set K in both examples. The corresponding trajectories and times of crisis can then serve as a test of numerical methods solving Problem (TCR$^\#_n$) or Problem (TCR$_n$) for different values of $n$. We have used here the BocopHJB software (see [7,17]), which solves the HJB equation (other numerical methods could be used as well), to compare the optimal solution of Problem (TC) with the approximated solutions.
For numerical purposes, we took $\tau = 0$ and $T = 5$ in both examples. Numerical results are depicted in Figs. 3 and 4. We can see the convergence of the approximated trajectories to the myopic solution, which illustrates our convergence results. One can also observe the convergence of the controls $u_n$, although we are not able to prove it here. As expected, the solution of Problem (TCR$_n$) is closer to the optimal solution than the solution of Problem (TCR$^\#_n$) (in particular in the first example). Finally, optimal values for the various costs are reported in Table 1. This highlights the interest of choosing $v \in \{\pm 1\}$ despite the lack of convexity.
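To make the myopic strategy concrete, the sketch below simulates it with an explicit Euler scheme. Several ingredients are assumptions rather than the setting of the examples above: the set K is a hypothetical half-plane $\{x_1 \le 0\}$ (so that $\varphi(x) = x_1$, with transverse rather than tangential crossings), and the second component of the dynamics is taken as $\dot x_2 = x_1(2 + u)$, which makes the system a rotation with angular speed $2 + u$. It does not reproduce the BocopHJB computations.

```python
import math

def phi(x1, x2):
    """Hypothetical constraint function: K = {phi <= 0} is the half-plane x1 <= 0."""
    return x1

def myopic(x1, x2):
    """Myopic feedback: u = +1 outside K, u = -1 inside K."""
    return 1.0 if phi(x1, x2) > 0.0 else -1.0

def simulate(feedback, T=5.0, dt=1e-3, x0=(0.0, 1.0)):
    """Explicit Euler integration; returns the time spent outside K."""
    x1, x2 = x0
    crisis = 0.0
    for _ in range(int(round(T / dt))):
        u = feedback(x1, x2)
        if phi(x1, x2) > 0.0:
            crisis += dt  # the state is currently outside K
        w = 2.0 + u  # assumed rotation speed (see lead-in)
        x1, x2 = x1 - dt * x2 * w, x2 + dt * x1 * w
    return crisis

crisis_myopic = simulate(myopic)
crisis_slow = simulate(lambda x1, x2: -1.0)  # constant control u = -1
crisis_fast = simulate(lambda x1, x2: 1.0)   # constant control u = +1
```

For this hypothetical half-plane, the myopic trajectory spends a time $\pi$ in K at angular speed 1 and then crosses the complement at angular speed 3, so its time of crisis is close to $\pi/3$, which is smaller than the value obtained with either constant control.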

Conclusion
In this note, we have proposed a new smoothing procedure for the minimal time crisis problem by introducing an additional control $v$ with values in $\{\pm 1\}$, replacing the original problem with discontinuous integrand by a regular optimal control problem involving a mixed state-control constraint. We have then used a penalty method to avoid dealing with this constraint. This has led us to the study of a sequence of unconstrained optimal control problems. We have proved the convergence of the value functions and of the optimal trajectories of the regularized problem to those of the original problem. Our numerical examples illustrate these results and support the good performance of the proposed method. We observed that considering an additional control with only two possible values is quite efficient from a numerical viewpoint, despite the lack of convexity of the augmented velocity set. This regularization technique allows trajectories to leave and enter a set K without requiring a transverse condition on optimal paths or convexity of the set K. In future work, we shall investigate necessary optimality conditions for (TC) using this approach.