Stability theory for some scalar finite difference schemes : Validity of the modified equations approach

In this paper, we discuss some limitations of the modified equations approach as a tool for stability analysis for a class of explicit linear schemes to scalar partial derivative equations. We show that the infinite series obtained by Fourier transform of the modified equation is not always convergent and that in the case of divergence, it becomes unrelated to the scheme. Based on these results, we explain when the stability analysis of a given truncation of a modified equation may yield a reasonable estimation of a stability condition for the associated scheme. We illustrate our analysis by some examples of schemes namely for the heat equation and the transport equation.


Introduction
Modified equations have been the subject of investigations and debates in numerical analysis. And yet, despite being relatively easy to establish, their use has been regarded with a lot of skepticism due to the lack of theoretical justification and the trickiness of their analysis. The first use of this technique for stability purposes is due to Hirt [1]. He provided the first practical examples for which modified equations give relevant information, in a heuristic manner. The pionneering work of Warming and Hyett [2] introduced how to obtain these equations in the general case and mostly clarified the link between the stability of the scheme and the modified equations. They provided a simple and efficient way to obtain the exact Von Neumann stability conditions from a finite number of the modified equation coefficients, for linear scalar schemes. An alternate way to obtain modified equations was introduced in [3]. The method permits to obtain the same modified equation through a series expansion of a an explicitly known function rather than the elimination technique of [2]. Other works that tried to tackle stability analysis through modified equations include [4,5].
In this paper, we investigate the modified equations for some finite difference schemes, in an attempt to make it clear why this technique often fails to provide relevant information on stability. First, we present a reminder on how to obtain modified equations for a given linear scalar scheme. We explain the basics of the heuristic stability theory and its limitations through some examples. In the second part, we present the technical framework and tools needed for stability analysis. We then clarify the mathematical reasons behind the frequent failures of the heuristic stability theory by investigating the convergence of the Fourier transform of the modified equation. We show that the latter is conditionally convergent and only then does it give significant results. Then, we compare the stability conditions of truncated modified equations with the corresponding scheme in the region of convergence. Finally, we present some examples to justify our analysis.

Obtaining the equations
Consider as an example, the linear scalar transport equation given by : with positive velocity c. In order to solve this equation with a finite difference scheme, we first introduce a uniform grid of points defined as usual by (x j = j∆x, t n = n∆t) and we denote by u n j = u(x j , t n ) the value of the numerical solution in the corresponding grid point. Under these notations, take for instance the upwind Euler scheme for equation (1) : Generally, in order to get relevant information on the consistency of the scheme [6] or its convergence rate, we assume the existence of a smooth, infinitely differentiable numerical solution u(x, t) that satisfies u(x j , t n ) = u n j in each grid point. Provided this solution, we expand each term of the scheme in Taylor series in the vicinity of (x j , t n ) which leads to an equation containing an infinite number of partial derivatives : This equation as is, is sufficient to prove consistency. In fact, we can clearly see that the in the limit ∆t → 0 and ∆x → 0 we recover the original transport equation. However, for further analysis of the numerical effects induced by the scheme, it would be more intuitive to consider an evolution equation with only space derivatives.
To do that, we use the elimination procedure introduced by Warming and Hyett in [2], that is we repeatedly use linear combinations of the equation (3) and its derivatives in order to eliminate higher order time derivatives and finally obtain the equation : This is called the modified equation associated with the upwind Euler scheme for the transport equation (1). It is worth noting that this is not a partial derivative equation in the conventional sense as it does not have an order or a finite amount of partial derivatives. For a solution to exist, one also needs a proper definition of infinitely many boundary conditions. Therefore, we restrict our analysis to solutions that are periodic [2].

Heuristic stability and limitations
Besides proving consistency, the modified equation quantifies explicitly the additional numerical effects. For example, it tells that up to first order in ∆x and ∆t, the numerical solution rather satisfies a convection diffusion equation with a numerical dissipation coefficient given by c ∆x 2 1 − c ∆t ∆x . Note however that the sign of the latter can be negative in which case this equation becomes unstable and admits solutions that grow exponentially in time rather than decay. It follows from this that a necessary and sufficient condition of stability for the solutions of this convection-diffusion equivalent is : which turns out to be exactly the CFL condition for the scheme (2). This gives a lot of potential for the modified equations to be a practical tool for stability analysis. Although the analysis is completely heuristic and has no rigorous foundation, its results make sense for a large class of schemes. However, there are also many examples for which this analysis fails to provide any practical stability condition. Consider for example the one dimensional heat equation : where α > 0 is the diffusion coefficient. We discretize this equation using centered finite differences : Using the same elimination procedure we obtain the following modified equation up to 4th order : If we choose to stop at the first non-zero truncation term as previously, the conclusions are already less straightforward as we have a non-vanishing second order term that comes from the heat equation itself and a 4th order term that comes from the numerical effects. Therefore in order to look into stability of this equation, let us shift to Fourier space. Let v(k, t) be the Fourier transform of u(x, t) then under these notations, equation (8) yields : It is reasonable to only consider the wavenumbers that are bound by |k∆x| ≤ π, since these are the only wavenumbers admissible by the discrete mesh . In this setting, it is easy to verify that : which implies that the considered truncation is unconditionally stable for all admissible ∆t and ∆x. This result is obviously erroneous as it is well known that the Von Neumann stability analysis proves that a necessary and sufficient stability condition for the scheme (7) is : Thus, the heuristic analysis of this truncation of the modified equation failed to provide any practical stability condition for the scheme and truncating the equation at higher orders does not seem to do any better. This is one of the examples that demonstrates limitations of the method. In what follows, we first introduce all the necessary notations and setting for stability analysis before attempting to explain why the stability of the truncation is sometimes incoherent with the stability of the scheme.

Notations and assumptions
We will consider for our analysis partial derivative equations that are linear, first order in time and of arbitrary order in space : where A p are constants. We consider explicit in time schemes, that are consistent with equation (12) and can be written as : where n l and n r are the number of mesh points to the left and to the right of x j respectively, used in every iteration. The coefficients b p verify for consistency purposes, so that constant solutions, which are solution of the pde (12) remain as such for the scheme. Sometimes it is preferable to cast the scheme (13) into an equivalent form : where q is the highest power of 1/∆x present in the summation. Very often, when using standard finite differences, the coefficients b p are polynomials in 1/∆x and q is the highest degree among these polynomials which is frequently equal to the order P of the pde (12). This formulation can be practical for stability analysis since most stability conditions for explicit schemes are given by bounds on the quantity λ q = ∆t/∆x q in the limit ∆x → 0 and ∆t → 0. Therefore, setting λ q as a non-vanishing parameter is a reasonable assumption that permits to reduce the number of free small parameters, that is we can take ∆t = g q (∆x) = λ q ∆x q and consider instead λ q and ∆x as free independent parameters. In what follows, in order to perform stability analysis in Fourier space, we consider a space continuous counterpart of the scheme (15) : Let v(k, t) be the Fourier transform in space of u(x, t). If we take θ = k∆x, then the Fourier transform of equation (16) yields : Since we will be operating most of the time in Fourier space, it seems necessary to recall consistency of the scheme in the same setting. Therefore if we assume that the exact solution to the pde (12) satisfies in Fourier space:ṽ then the scheme (13) is consistent with the pde (12) to order s if and only if [7] : Lastly, we denote the modified equation associated to the scheme (13) by : where µ p are constants depending on ∆x and λ q .

Fourier stability analysis
In this part, we focus on the link between the modified equation and the scheme [2]. The amplification factor of the scheme is none other than the modulus of S(θ, λ q , ∆x). Now, in order to recover an equivalent expression in the continuous time for the modified equation, we apply the Fourier transform to (20). This implies that v(k, t) satisfies the differential equation: where α p (λ q , ∆x) = µ p (λ q , ∆x)i p /∆x p . For convenience, we will say that the modified equation is stable if, for an initial condition v(k, t = 0) = v 0 (k), the solution to the Cauchy problem : that is given by: remains bounded ∀t < T , where T > 0. Given this solution, one can obviously write : Now, since the solution to the scheme (13) with the same initial condition is also an exact solution to the modified equation (20) [2], uniqueness of this solution gives : It follows from that the proposition : For any scheme that writes as (13) and that is consistent with (12), there exists a positive number θ m that depends only on λ q and ∆x such that, ∀θ ∈ ]−θ m , θ m [ the expansion (1 − S(θ, λ q , ∆x)) p pλ q ∆x q holds and the series ∞ p=1 α p (λ q , ∆x)θ p converges to 1 λq∆x q ln(S(θ, λ q , ∆x)).
Proof. The scheme (13) is consistent with the pde (12) and so we have |S(0, λ q , ∆x) − 1| = 0 < 1. Since S is a continuous function with respect to θ, ∃θ m (λ q , ∆x) > 0 such that |S(θ, λ q , ∆x) − 1| < 1 ∀θ ∈] − θ m , θ m [, and consequently, the principal logarithm ln(S(θ, λ q , ∆x)) defined by the series expansion : is convergent and e ln(S(θ,λq,∆x)) = S(θ, λ q , ∆x). This proves that the function: is a solution of equation (25) for θ ∈] − θ m , θ m [. The exponential function is locally invertible in the vicinity of 0 and so this expansion is unique in this vicinity [3] and therefore ∀θ ∈] − θ m , θ m [ . Moreover, since S(θ, λ q , ∆x) is entire (as a finite sum of exponential functions) then further expanding S into power series of θ for θ ∈]−θ m , θ m [ finally yields: and hence, ∞ p=1 α p (λ q , ∆x)θ p is the series expansion of 1 λq∆x q ln(S(θ, λ q , ∆x)). This concludes our proof. Remark 2.2. Equality (28) also implies that the coefficients α p (λ q , ∆x) are given by : Remark 2.3. θ m is not the radius of convergence of the series G(θ, λ q , ∆x). If we denote by R the radius of convergence, then we have R ≥ θ m . The exact radius is not trivial to find in practice in the general case, since G(θ, λ q , ∆x) is a series expansion of a composite function. However it is sometimes possible to give estimates or exact values of the radius for some examples as will be shown later.
Remark 2.4. In practice, when looking for convergence, we are mainly searching for constraints in λ q and ∆x, for which the series G(θ, λ q , ∆x) is convergent ∀θ ∈ [−π, π], that is we want R > π. A sufficient condition is θ m > π.
So far, we have shown that for fixed parameters ∆x and λ q , the series G(θ, λ q , ∆x) converges if θ ∈]−θ m , θ m [. This being a sufficient condition of convergence, we do not know what happens for |θ| ≥ θ m . On the other hand it is worth noting that, for |θ| > R, the series is divergent and equality (25) does not hold anymore. Therefore, the modified equation stability is not linked to that of the scheme for |θ| > R.

Scheme stability domain and series convergence domain
Generally, in the Von Neumann setting, stability conditions of the scheme are given by constraints linking λ q and ∆x so that the inequality |S(θ, λ q , ∆x)| ≤ 1 ∀θ ∈ [−π, π] (30) is verified. These constraints define a region of stability R s in the (λ q , ∆x) plane, that is : In the same manner, we can define a region R c of the same plane, in which the series G(θ, λ q , ∆x) converges : which is also equivalent to : R c = (λ q , ∆x) ∈ R 2 + : R(λ q , ∆x) ≥ π (33) In practice, it is not always possible to exactly determine R c . It is however easier to explicitly find a subset of this region, that is : It is worth mentioning that Ω c is always a non-empty set. Indeed, for λ q = 0, we have S(θ, 0, ∆x) = 1 and consequently ∀∆x ∈ R + there exists in R 2 a neighborhood of (∆x, λ q ) in which we have |1 − S(θ, λ q , ∆x)| < 1. This means that we always have convergence for sufficiently small values of λ q . Lastly, we denote by R m the region of stability of the modified equation : Provided these definitions, we can distinguish two cases : (1) if R s ⊂ R c then the stability of the modified equation provides reliable and complete information regarding the scheme stability. (2) If any subset of R s lies outside of the convergence domain R c , this means that there is information on the stability limit of the scheme that is missed by the modified equations since its Fourier transform is non existing outside of R c . The following proposition is a direct consequence : Proposition 2.5. For any scheme that writes as (13) and that is consistent with (12) we have (R m ∩ R c ) ⊂ R s , that is if the modified equation is stable and its Fourier series is convergent then the scheme is also stable.
This proves that inside the convergence domain R c , the stability of the modified equation is a sufficient stability condition for the scheme. Furthermore, we shall add that if R s ⊂ R c then this condition is also necessary. This result, literally, is not of practical interest as the stability of the full series S(θ, λ q , ∆x) is either nontrivial or impossible to obtain. But we will show that the above classification permits to justify whether a truncated version of the modified equation yields significant information on stability.

Link between the stability of the scheme and the stability of a truncation
Instead of the full series expansion, let us consider a truncated modified equation to an arbitrary order N > P : We recall that our main purpose is to know in which case does the stability of this truncation provide relevant information on the stability of the corresponding scheme. This differs from the approach of Warming and Hyett [2] in the sense that they showed how to reconstruct the exact Von Neumann amplification factor using a finite amount of coefficients µ p without actually analyzing the stability of the truncated version. Under the Fourier transform, the previous equation writes : Here, P N (θ, λ q , ∆x) is a polynomial of θ of degree N which is trivially a truncation of the series G(θ, λ q , ∆x). In the same manner as previously, the ordinary differential equation (37) yields : v(k, t + ∆t) = e ∆tP N (θ,λq,∆x) v(k, t) = S N (θ, λ q , ∆x)v(k, t) In contrast to the full series, the stability conditions of the truncation are obtainable in most cases through the analysis of the polynomial function P N (θ, λ q , ∆x). Let R N (θ, λ q , ∆x) be the rest of the series defined by : In the convergence domain R c , the rest R N (θ, λ q , ∆x) is bounded and we have : In this setting we can state the following result : Proposition 2.6. Assume an initial condition satisfying supp(v 0 ) ∈ [−M, M ] and an arbitrary truncation order N > P , then for any (∆x, λ q ) ∈ R c , if the truncated modified equation is stable in the sense that there exists C > 0 such that: |S N (θ, λ q , ∆x)| ≤ 1 + C∆t (41) then the scheme is also stable in the same sense.
Proof. For (∆x, λ q ) ∈ R c we have : Thus, since v n = S n v 0 , we can write : That is, v n is L 2 -stable for initial conditions that are of compact support in the frequency domain, that is

Heat equation -centered finite differences
Consider the centered finite differences scheme for the heat equation. It can be cast into the form : We take for simplicity α = 1. The modified equations associated to this scheme up to 8th order for example is given by :

Straightforward computations yield :
S(θ, λ 2 , ∆x) = 1 + 4λ 2 sin 2 (θ/2) (43) The domains of stability and convergence only depend on the parameter λ 2 independently of ∆x. This permits to take an arbitrary value of ∆x = 1 and carry on the analysis, based on only λ 2 . Furthermore since |S(θ, λ 2 , ∆x)| is an even function with respect to θ it suffices to look at θ ∈ [0, π]. Figure 1 shows a comparison between |S(θ, λ 2 , ∆x)| and |S N (θ, λ 2 , ∆x)| for N = 2 and N = 8 in two cases. In the limit of stability λ 2 = 1/2, the scheme is stable but the series G(θ, λ 2 , ∆x) is not convergent for θ > R = π/2 (See Appendix A). As displayed in the left-hand side of figure 1, the curves begin aligned in the low frequencies, and then the truncation curves begin to diverge completely from the function |S(θ, λ 2 , ∆x)| once θ surpasses the threshold R. This is not the case for λ 2 = 1/4. For this value we can calculate the radius of convergence R = π (See Appendix A). In fact, we can see on the right-hand side of the figure that for λ 2 , the curves remain very close ∀θ ∈ [0, π]. For N = 8, the two curves almost overlap.  Figure 1. Plot of the function |S(θ, λ 2 , ∆x)| along |S 2 (θ, λ 2 , ∆x)| and |S 8 (θ, λ 2 , ∆x)| for the values of λ 2 = 1/2 (left) and λ 2 = 1/4 (right). We can see that for λ 2 = 1/2, which lies outside of the convergence domain, the truncation curves stray away from the curve of S starting from θ = R. For λ 2 = 1/4, the truncations match well with S.

Transport equation -Upwind Euler
The scheme writes : We take c = 1. In this case, the modified equation up to 4th order for example is given by : Straightforward computations yield : In contrast to the previous example for the heat equation, we could not compute explicitly the radius of convergence for values of interest of λ 1 . Nevertheless, it is possible to extend Ω c to cover all the values of λ 1 that are in R s (See appendix B). Hence, as shown in figure 2, for values of λ 1 ≤ 1, the amplification factor of the truncation to 6th order |S 6 (θ, λ 1 , ∆x)| seems to be a very good approximation of the scheme amplification factor |S(θ, λ 1 , ∆x)|, even for values of λ 1 that are in the vicinity of the stability threshold.  Figure 2. Plot of the function |S(θ, λ 1 , ∆x)| along |S 6 (θ, λ 2 , ∆x)| for different values of λ 1 in the stable region (left) and in the unstable region (right). The continuous lines stand for the amplification factor |S(θ, λ 1 , ∆x)| and the dashed lines represent the truncated modified equation amplification factor |S 6 (θ, λ 1 , ∆x)|. We can see that for λ 1 ≤ 1 the truncation curves match well with the exact amplification factor even in the boundaries of stability. Inversely, it seems according to the right-hand graphic that λ 1 = 1 marks the threshold of convergence.

Conclusion and perspectives
We could explain throughout this work that one of the main reasons behind the failures of the modified equations technique is non other than series divergence. Although the analysis only provides sufficient conditions in general, it lifts some well-known ambiguities and provides some assumptions required by the technique to be of a more justified practical use. We show that in these settings, the stability of a truncation gives already reasonable approximations to stability conditions of the scheme. An extension of the obtained results to the case of systems is underway. It would be also interesting to extend this approach to cover a larger class of explicit and also implicit schemes.

A. Computing some radii of convergence
In each of the following cases, we would like to check the convergence radius of the series : For that, we use the root convergence test, that is the radius of convergence R is given by : Centered scheme for the heat equation : We have : S(θ, λ 2 , ∆x) = 1 − 2 sin(θ/2) 2 (53) In this case, it is possible to obtain an explicit expression of the pth term of the sequence a p which is given by : where E p (x) denotes the pth Euler polynomial function defined by : Since the series has only even order coefficients, it is equivalent and more convenient to check the convergence of the series : where φ = θ 2 and b p = a 2p . We proceed as follows in order to check the radius of convergence. We use Bernoulli's number [8]: and plug this ansatz into b p to obtain : Hence we can write : This means that the series (56) has a radius R φ = π 2 /4 which implies that the radius of convergence of the series G(θ, 1/2, ∆x) is R = π/2. Centered scheme for the heat equation : λ 2 = 1 4 We have : S(θ, λ 2 , ∆x) = 1 − sin(θ/2) 2 (60) The expression of the pth term a p is given by : Using the same previous notations as in (56), we have : which yields : This gives the radius of convergence of the series G(θ, 1/4, ∆x) is R = π.