IMPROVED SWEEPING PRECONDITIONERS FOR DOMAIN DECOMPOSITION ALGORITHMS APPLIED TO TIME-HARMONIC HELMHOLTZ AND MAXWELL PROBLEMS

. Sweeping-type algorithms have recently gained a lot of interest for the solution of high-frequency time-harmonic wave problems, in particular when used in combination with perfectly matched layers. However, an inherent problem with sweeping approaches is the sequential nature of the process, which makes them inadequate for eﬃcient implementation on parallel computers. We propose several improvements to the double-sweep preconditioner originally presented in [18], which uses sweeping as a matrix-free preconditioner for a Schwarz domain decomposition method. Similarly, the improved pre-conditioners are based on approximations of the inverse of the Schwarz iteration operator: the general methodology is to apply well-known algebraic techniques to the operator seen as a matrix, which in turn is processed to obtain equivalent matrix-free routines that we use as preconditioners. A notable feature of the new variants is the introduction of partial sweeps that can be performed concurrently in order to make a better usage of the resources. As these modiﬁcations still leave some unexploited computational power, we also propose to combine them with right-hand side pipelining to further improve parallelism and achieve signiﬁcant speed-ups. Examples are presented on high-frequency Helmholtz and Maxwell problems, in two and three dimensions, to demonstrate the properties of our improvements on parallel computer architectures.


Introduction
Solving time-harmonic wave equations using finite element-type methods is a notoriously difficult problem, especially in the high-frequency regime where the number of unknowns becomes so large that the direct solution of the resulting linear system is computationally intractable. Iterative solvers based on a domain decomposition method and a sweeping process over the subdomains appear as an interesting approach: these techniques use direct solutions on slices of the domain, and propagate information across the slices using some transmission condition, among which perfectly matched layers have recently deserved special attention [6,7,12,14,18]. The sweeping process, which is linked to the symbolic block factorization of the problem, is inherently sequential: each slice must be solved before the next one can be considered. So, although the number of iterations of the iterative algorithm can be shown to be independent of frequency under some conditions, sweeping algorithms do not scale on parallel computer architectures, as only the (direct) solution in each slice can be parallelized. The double-sweep preconditioner originally proposed in [18] introduced sweeping naturally as a matrix-free preconditioner for a non-overlapping Schwarz domain decomposition method. Recasting the Schwarz domain decomposition algorithm as the solution of a linear system, the main idea was to derive an explicit, approximate inverse of the Schwarz iteration matrix in the case of a simple one-dimensional, layer-like decomposition of the computational domain. This explicit inverse could be written recursively as the computation of two concurrent sweeps across the subdomains, one in the forward direction and one backwards. While other sweeping approaches directly precondition or solve the discrete wave equation problem, this approach preconditions the iteration operator of the domain decomposition solver, leading to very small problem sizes (the unknowns are located only on the interfaces between the subdomains). Provided that accurate approximations of the Dirichletto-Neumann (DtN) operator exist on the boundaries between the subdomains (e.g. using high-order local transmission conditions [3,4,8,13] or perfectly matched layers [1,2,11]), an iteration count independent of the number of subdomains and frequency is observed in practical numerical simulations. While this approach possesses several advantages compared to competing sweeping techniques (notably its matrix-free nature and the ease with which it can be integrated in existing simulation codes), each sweep is still a sequential process [12,17].
In this paper we investigate several variants of this preconditioner, by considering different approximations of the inverse of the Schwarz iteration operator. Some of the new variants can be linked to well-known algebraic techniques (Block-Jacobi, Gauss-Seidel); thanks to the introduction of cuts in the sweeping process, multiple subsweeps can be performed concurrently and thus lead to enhanced scalability on parallel computer architectures.
The paper is organized as follows. Section 2 recalls the classical non-overlapping Schwarz algorithm for a layered decomposition of the computational domain, and recasts it as the solution of a linear system of equations. In Section 3 we take advantage of the matrix representation of the Schwarz iteration operator to design a family of preconditioners, the original double-sweep preconditioner from [18] being a particular case. An extension with shorter, concurrent sweeps is proposed, as well as two Gauss-Seidel variants. Section 4 then presents a pipelining strategy to further improve parallelism when the new sweeping preconditioners are used to solve problems with multiple right-hand-sides. Finally, numerical results on large acoustic (Helmholtz) and electromagnetic (Maxwell) problems are presented in Section 5.

Non-Overlapping Schwarz Algorithm
Let us begin by describing the well-known non-overlapping Schwarz algorithm using operator notations. This will be useful for describing the overall algorithm as the application of an iteration operator, represented by a matrix, without detailing the underlying partial differential equations or the discretization scheme. The sweeping preconditioners presented in Section 3 will be derived directly using this operator form. The numerical results presented in Section 5 will use these preconditioners in combination with a finite element discretization of both Helmholtz (scalar) and Maxwell's (vector) equations: for a detailed presentation of the finite element formulations and their implementation in the context of Schwarz domain decomposition methods, see [15].

Layered Decomposition of the Domain
We consider a "layered", non-overlapping decomposition of the domain Ω with boundary Γ (and exterior unit normal n), where each subdomain Ω i , i = 1, . . . , N , has at most two strictly non-adjacent neighbors and the first and last subdomains are unconnected, as illustrated on Figure 1. Such one-dimensional decompositions are naturally well suited for the sweeping methods that we are considering. (The case of "cyclic" decompositions where the first and last domains have a common boundary can be treated in a similar way.) For every i = 1, . . . , N , we define Γ i = Γ ∂Ω i ; and for j = 1, . . . , N, j = i, we introduce the artificial interface (or "transmission boundary") Σ i,j = Σ j,i = ∂Ω i ∂Ω j . For the considered layered decompositions, Σ i,j = ∅ if |i − j| = 1. We also denote by n i the unit exterior normal to the subdomain Ω i .

Algorithm in Operator Form
A generic time-harmonic wave propagation problem with sources in the domain Ω and homogeneous boundary conditions on Γ can be written in the form where L and B are linear operators. For example, for an acoustic wave problem with a Sommerfeld radiation condition (the unknown scalar field u denoting the pressure), one would have L := (∆ + k 2 ) (with k the acoustic wavenumber) and B := (∂ n −ik). For an electromagnetic wave problem with a Silver-Müller absorbing boundary condition (the unknown vector field u now denoting the electric field), one would have L := (curl curl − k 2 ) (with k the electromagnetic wavenumber) and B := (γ t n curl + ikγ T n ), with γ t n and γ T n respectively the tangential and tangential component trace operators (γ t n : v → n × v and γ T n : v → n × (v × n)). Different sources and more general boundary conditions can be considered without difficulty.
The additive Schwarz domain decomposition algorithm can be described as follows at iteration k+1, assuming e.g. g Then for i, j = 1, . . . , N such that |i − j| = 1, update the interface unknowns according to: Detailed expressions for the operators B i,j in the case of time-harmonic acoustic and electromagnetic wave problems can be found in [15]: they are respectively of the general form B i,j = ∂ ni + S for acoustics, and B i,j = (γ t ni curl + S) for electromagnetics, where S is called the transmission operator. The simplest, lowest order transmission operator for acoustics is S := −ik, and S := ikγ T ni = ikγ T nj for electromagnetics. Physically, the nonhomogeneous boundary conditions on the artificial interfaces Σ i,j can be interpreted as (generalized) impedance boundary conditions.
Thanks to the general form of the operators B i,j , in the case of non-overlapping decompositions the update relation (3) can be rewritten as since n i = −n j on Σ i,j = Σ j,i .
Considering the full vector of unknowns g = [g 1,2 , g 2,1 , g 2,3 , . . . ] T made of all the interface unknowns g i,j , one step of the above algorithm can be summarized as: for some right hand side b. Iteration (5) is a fixed-point iteration, the solution of which solves the linear system that we will call the Schwarz problem: Letting aside the choice of a discretization method, the full algorithm can thus be decomposed into four steps (see [15] for details): 1. Assemble and factorize the system matrix for each one of the subproblems (2); 2. Compute the right-hand side of the Schwarz problem (6), that contains the contributions of the (physical) sources; 3. Solve the Schwarz problem (6) for the interface unknowns g; 4. Compute the solution u i in each of the subdomains Ω i , based on g.
Steps 2-4 constitute the "online" part of the algorithm that must be repeated for any new right-hand side (step 1 can be performed once, offline). From a parallel computing perspective, the most challenging step is 3: all the others are independent and can be carried out simultaneously without communication. In practice (6) is solved using an iterative Krylov subspace method such as GMRES or ORTHODIR, which only requires the application of the operator F on a given iterate. This application involves the solution of the subproblems, and is thus usually made "matrix-free", i.e. without explicitly constructing the associated matrix coefficients.
As will be explained in the next section, however, one can provide an abstract matrix representation by associating transfer operators to the subproblem solves, that we will use later on to design a family of preconditioners for that system.

Matrix Form of the Iteration Operator
For our layered decomposition of the domain, the unknown vector g of the Schwarz algorithm contains a total of M = 2(N − 1) unknown fields, which is thus the size of the (continuous) Schwarz problem. We introduce the forward and backward transfer operators B f i and B b i across Ω i , defined by: where u i (g i,i−1 , g i,i+1 ) |Σ refers to the restriction on Σ of the solution to the subproblem Lu i = f defined by (2). Notice that the forward and backward transfer operators only involve the solution of subproblems with a nonhomogeneous impedance boundary condition on one side only (left or right), since either g i,i−1 or g i,i+1 is set to 0. We also define the self-coupling operators E f i and E b i , defined from an interface to itself: These operators correspond to the contribution on an interface of the part of a wave that travels through the domain and that is reflected back to its interface of origin. These operators are illustrated on Figure 2.
Lu i = 0 in Ω i Figure 2. Illustration of the definition of the B f i and E f i operators. The source is placed on the left interface and the outputs are respectively related to the solutions on the right and left interfaces. The application of both operators can be obtained with a single solve of the subdomain Ω i for some impedance data g i,i−1 ; they can be seen as Green's functions on these interfaces. With the backward operators, the source is placed on the right interface.
With these definitions, the matrix corresponding to the Schwarz operator writes: with I the identity operator. Horizontal and vertical lines have been added to emphasize the layered structure of the matrix: for each additional domain, two lines and two columns are added, conserving the same pattern. Therefore, that matrix can easily be constructed for an arbitrary number of subdomains. It is a banded matrix, with a bandwidth of 5, although at most 3 elements are non-zero on a given line or column. This structure illustrates well the local couplings between subdomains. Note that for the layered decomposition, the same structure is kept in any dimension, as each subdomain (except the first and last ones) has only 2 neighbors; in a cyclic decomposition extra-diagonal elements should be added to close the loop with the first and last domains.
If we now consider discretized interface functions g ij , we can still obtain a matrix expression of the iteration operator, with the same structure as (9). The major difference is that the transfer and self-coupling operators , that can be seen as the discrete Green functions for the problems with single-sided impedance data, evaluated on the opposite and same interface respectively. These blocks are very expensive to form, thus it is never done in practice. But if the full numerical matrix was available, inverting it would solve the Schwarz problem; this inverse would however be quite dense, making its storage, factorization and application rather expensive.

Sweeping Preconditioners
Sweeping preconditioners can be constructed by looking for approximate inverses of (9). The underlying idea is to look for approximations that would involve the same subproblem solves than for the (direct) iteration operator: this way a fully matrix-free preconditioner can be obtained, whose application is carried out in a similar fashion to the original Schwarz iteration, without any pre-calculation.

Double Sweep Preconditioner
Let us suppose that the shape of the domain and the transmission conditions used on the artificial interfaces lead to no internal reflections. In that case, the self-coupling operators E {f,b} vanish and the Schwarz iteration matrix (9) becomes [18]: Compared to (9), the matrix has an even simpler structure, with half of its off-diagonal elements removed.
In particular, notice how the first and last columns are only made of identity blocks, and how the forward and backward parts are totally independent from each other (lines and columns are made exclusively of either forward or backward operators). This allows for the direct construction of its inverse: with the general expressions for its coefficients: where the composition operator is defined as {f,b} i appear in the iteration matrix and in its inverse, though the structure of the latter is more complex, with the operators being composed together. But we see that, as a consequence of the repetitive structure of the matrix F A (N ), a recursion formula can be found for forming its inverse for an arbitrary number of subdomains.
We now take advantage of this recurrence property to write the matrix-vector product g = F −1 A r (in the context of right-preconditioned linear solvers, r denotes the residual) as follows: where any sum b p=a for which a > b should be ignored. Factoring these expressions, we rewrite them as: where the operators B f i with i ≤ 1 and B f i with i ≥ N are not defined and should be ignored. This finally gives the double recurrence relation: The first relation describes a forward sweep: we start from the first artificial interface Σ 1,2 and propagate the information by solving a problem at each step to move to the next interface and incorporate its contribution. The other relation describes the same procedure in the backward direction; because these sequences are independent of each other (they act on distinct subsets of the unknowns), they can be done in parallel. We note that the extreme subproblems (the first and the N -th) are not solved in any of the sequences, so each sweep has an actual length of N − 2 steps. Another convenient way of rewriting relations (13) is to use multi-indices I k that are dense collections of sorted subdomain indices belonging to the k-th group of N k ≥ 2 subdomains. (So far we worked with a single group made of all subdomains (I 1 ), but we will later subdivide them.) We also define I k , where the sequence of indices is reversed. We now define the sweeps of length l > 0 in the k-th group as: Then we have the induction property for 1 ≤ l ≤ N k − 2: and the recurrence relations rewrite simply as: In our numerical experiments with the double sweep (see [16,18]), we have observed that, provided that the approximation of the DtN map used in the transmission condition is sufficiently accurate, the preconditioned Schwarz algorithm has the desirable properties of having a convergence rate virtually independent of the wavenumber k and number of subdomains N , even in the case of a heterogeneous medium. For complex heterogeneous media, the most accurate approximations of the DtN were obtained using Perfectly Matched Layers [16,18]. A detailed description of the various DtN approximations and their implementation in a finite element context, for both acoustic and electromagnetic problems, is provided in [15].
Examining the double sweep algorithm, it is important to note that it only involves subproblem solutions that are the same as the one performed in the main iteration, albeit with different right-hand-sides. In particular this means that no specific preprocessing is required to build the standard double sweep preconditioner. In total, the standard double sweep algorithm requires N + 2(N − 2) = 3N − 4 subproblem solutions per iteration.
It is interesting to note that an equivalent algorithm can be derived by combining the application of the preconditioner and the Schwarz operator: for right-preconditioning it gives We denote the resulting algorithm as the combined double sweep preconditioner, which amounts to solve an unpreconditioned problem: Applying F c rather than its two factors allows to bring down the total number of subproblem solves to 2N − 2 [18] by skipping redundant subproblem solves. It also has the fully parallel step removed and only performs sweeps, which will let us pipeline several right-hand sides as will be detailed in Section 4. Although an apparent link with multiplicative Schwarz can be seen in this combined sweep procedure, we point out some differences: simple multiplicative Schwarz sequentially solves all problem in an arbitrarily chosen direction, while we do the sweeps in both directions (which is closer to a symmetrical Gauss-Seidel iteration); the way information is exchanged and stored during the sweep is also different.
The main drawback of sweeping preconditioners is the sequential nature of the sweeps. If each CPU has a single subdomain assigned to it, it will remain idle during most of the duration of the sweep, as illustrated on Figure 4 (left). The next section introduces cuts in the sweeps in order to improve parallelism.

Block-Jacobi Preconditioners: Double Sweep with Cuts
If we introduce one or more cuts in the sweeping process, the resulting partial sweeps can be performed independently and in parallel. This is akin to a block-Jacobi preconditioning strategy, improving parallelism at the cost of a less accurate approximation of the inverse of F A .
The matrix corresponding to these reduced sweeps is derived from the full preconditioner matrix, where the transmission operators corresponding to the cut domain is replaced by 0. As a consequence, communication between the two groups of domains is forbidden and the preconditioner matrix has well separated blocks, that correspond to the independent restricted double sweeps. Each block is the inverse of a Schwarz operator F Ai over the i-th portion of the domain consisting of a few adjacent domains, taken apart from the other portions. respectively, the matrix is: This is very much like a block-Jacobi preconditioner for the simplified Schwarz iteration operator. In the k-th group of subdomains, we have the relations: In the ideal case where the exact DtN is used as transmission condition, the preconditioned operator has the following structure: that is relatively close to an identity. Spectral analysis reveals that the eigenvalues are well clustered (λ 1,...,2N −2 = 1) but that it is defective, as is the unpreconditioned Schwarz operator. However, the number of missing  rate that degrades as the number of cuts increases for a strongly guided problem, and plateaus appearing on the convergence curve. This situation is to be compared with the behavior of the unpreconditioned DDM algorithm, that corresponds to the limit case where a cut is placed in every other domain.
To illustrate the gain in application time and the improved parallelism of the double sweep with cuts, Figure  4 presents timelines of the solves performed during the application of the Schwarz iteration operator and the double sweep preconditioner, with and without cuts. The timelines for the combined double sweep algorithm mentioned at the end of Section 3.1 are illustrated on Figure 5. The combined application avoids redundant solves, while not being faster in a fully parallel implementation. The main difference is in the flow of information during the sweeps.

Symmetric Gauss-Seidel Preconditioners
Instead of simplifying the Schwarz operator in order to derive explicit expressions for its inverse like was done for double sweep algorithms, standard preconditioning techniques could also be directly applied to the full Schwarz operator. For example, a symmetric Gauss-Seidel (SGS) preconditioner M −1 SGS =L −1 NŪ −1 N (Ū N andL N are the upper and lower triangular parts, with a unit diagonal) can be considered for the full matrix F N (N ), as the factors can also be computed a priori with a recursion formula, e.g. for 5 subdomains: The application of these factors to a vector is quite similar to the lower or upper parts of the standard double sweep matrix (11), although they have roughly twice as many non-zero elements for the forward or backward sweep. They give the recurrence relations for the applications g =L −1 N r or g =Ū −1 N r: Comparing these relations with (13), we observe that a new term has appeared, that corresponds to a source on the same interface as the target interface. Consequently, the first solve can no longer be skipped, but the total number of solves per sweep is only increased by 1; the presence of the additional term essentially indicates that more information is incorporated in our sweeps, as we now account for reflections. The recurrence relations can be written in the same form as (14), if the sweep definitions are slightly modified with the induction property: A significant difference with the standard double sweep is that the 2 sweeps are now performed sequentially instead of being parallel, hence with an application time that is doubled. Although the additive relation one can still design a preconditioner with the same additive structure: It was observed experimentally that this modification does not result in a significant degradation of the convergence rate, which makes it a better choice since its application time is reduced by a factor of 2. Strictly speaking, a copy of the input vector should be kept since the sweeps otherwise modify data that will be used by the other sweep after they have crossed. In practical implementations, it is however more convenient to ignore this and modify the data in-place in a more "Gauss-Seidel-like" fashion, which also resulted in faster convergence rates in our experiments. Compared to the standard double sweep preconditioner, this variant has demonstrated a convergence rate that is roughly doubled in problems where significant reflections occur, despite its very limited additional cost; this is especially true when less efficient transmission conditions are used, as we will see in the examples of Section 5.

Other Variants
Any other standard preconditioning technique could also be investigated along the lines of SGS and SGSa. For example, an SOR-like preconditioner with a relaxation factor ω could be constructed as follows: Our numerical experiments however concluded that the optimal relaxation parameter is actually ω = 1, which reduces to SGSa. Incomplete LU factorizations could also be considered. Introducing overlap in the (partial) sweeps would in turn lead to Additive Schwarz or Restricted Additive Schwarz preconditioners... of the non-overlapping Schwarz domain decomposition method.

Additional Parallelization Strategies
The sequential nature of the sweeping process can be alleviated when problems with multiple right-hand sides are considered, as is common in many geophysical exploration simulations, or in the electromagnetic simulation of antenna arrays. We pointed out earlier that applying the combination of the Schwarz operator and the sweeping preconditioner as a single routine has the effect of suppressing the "barrier" step where all subproblems are solved simultaneously, as illustrated on Figure 5. This gives room for the simultaneous execution of several sweeps acting on different right-hand sides in a pipelining fashion, see Figure 6, as a complementary way of avoiding CPU idleness when coupled with the cuts of the previous Section. The number of right-hand sides than can be efficiently pipelined depends on the length of the sweep, i.e. on the number of cuts. . By pipelining multiple right-hand sides in the sweeps, CPU usage can be improved, while not increasing the duration of an iteration, except the last one. Left: 1 iteration with 3 right-hand sides, without cut; right: usage is maximized by pipelining as many RHS as the sweep length, here with 2 cuts. It is no coincidence that the same symbols (each corresponding to a given RHS) appear simultaneously on the cut domains, where the partial sweeps converge and for which one can solve a single problem.
We now study more in detail the algorithmic aspects of the combined double sweep with cuts. The application of operator F c on some residual vector r is illustrated below: Ω 4b Ω5 Ω6 Ω7 Figure 7. Illustration of the flow of information in the combined sweeping algorithm, with a cut.
The diagonal blocks of the matrix are combined sweeps over the groups of domains; off-diagonal blocks are the "bridges" between the groups (see Figure 7). We recall that the operators B {f,b} i and E {f,b} i actually correspond to the same solve of subdomain Ω i : the difference is simply in their target interface. Since the cut domain needs to be solved with the same data in the forward and backward sweeps, a single solve must be performed with non-zero data on both sides, as opposed to the interior domains of each group.
The generic update expressions read: The Kronecker symbol δ determines whether the current domain is the last of the current sweep, in which case the contribution from the next group should be added. In practice, the algorithm works in a similar way than the preconditioner alone, by sweeping over the subdomains, while the flow of information is more complex: instead of ignoring the reflections, this data is now "left behind", as is materialized by the presence of the E {f,b} operators in the expressions. It is illustrated on Figure 7, to be compared with Figure 3. Looking at the timing diagram for an iteration with a single RHS with the combined algorithm ( Figure 5, right), one can observe long diagonals that are prolonged in the next iteration. This suggests that even in that case, pipelining multiple RHS is possible with little interference (it is obviously not in the separate case). When the sweeps cross each other, one should ideally be able to solve the same subproblem twice at the same time (a forward and a backward), regardless of the number of pipelined RHS, as illustrated on Figure 6.

Numerical Results
The results presented in this section were obtained with the GetDDM 1 framework [15], based on the opensource finite element mesh generator Gmsh [10] and finite element solver GetDP [5,9].

Marmousi Model
The Marmousi model is a synthetic 2D acoustic model that reproduces the complex velocity profile of a slice of earth. It features a wide range of speeds, from 1500 m/s to 5500 m/s, with several layers and faults as depicted on Figure 8. It has become a classic test case for benchmarking seismic inversion codes [14]. We simulate the propagation of time-harmonic acoustic waves produced by a point source located right below the center of the top boundary (see Figure 9 for an illustration at a frequency of 40 Hz). Sommerfeld boundary conditions are imposed on all sides. A typical GMRES convergence history for the unpreconditioned and preconditioned domain decomposition algorithm is depicted on Figure 10 for a frequency of 400 Hz, with 352 subdomains. The finite element mesh is Cartesian, with about 750 million nodes, corresponding to 20 points per (smallest) wavelength in the domain. Figure 11 compares the convergence of the proposed algorithms in  The analysis of Figure 10 is interesting as it gives an idea of the actual speed-up that can be expected by using the sweeping preconditioners and the pipelining. Solving for a single right-hand side without preconditoner takes approximately 1000 time units (defined as the time for a subdomain solve). Solving the same problem with the SGS preconditioner without cuts requires approximately 20 * 350 = 7000 time units, which clearly shows the penalty due to CPU idleness during the sweeps. One should therefore use cuts to decrease the iterations duration: Figure 11 shows that the number of iterations remains constant for sweep lengths as short as 25 subdomains, leading to a duration of 20 * 25 = 500. Hence a speed-up of 2 can be expected for a single right-hand side, while a significant amount of CPU idleness is still present.
Consider now the opposite situation where we want to solve for as many right-hand sides as there are subdomains; using full pipelining without cuts, it would take N (2 * N it + 4) 2 time units to complete, while N * N it ≈ 350.000 time units would be necessary without preconditioner. The global speed-up is thus of the order of half the ratio of iteration counts between the two methods, which is approximately 25 in our example. 2 We break down the time consumption of the pipelined algorithm as follows: the computation of the right-hand sides F −1 N b for the combined algorithm takes an application of the sweeps. As the sweeps overlap, two subproblems must be solved at each step which doubles an iteration duration. Finally the roll-in and roll-out phases each take N/2 time units and once the problem is solved the solution must be computed in volume for each right-hand side. In total we have a duration of 2 * N * (N it + 1) + N/2 + N/2 + N time units.  [15].
We note that the speed-up potential of pipelining for many right-hand sides can be boosted by increasing the number of subdomains, as the convergence rates of DS and SGS weakly depend on N , as opposed to the standard algorithm.

COBRA Model
We now present results obtained on a 3D S-shaped cavity: the COBRA benchmark defined by the JINA98 workgroup (see Figure 12). We solve this benchmark for both acoustic and electromagnetic waves at k = 314π with a normalized wave speed of c = 1 m/s, with N = 100 subdomains and a finite element mesh with 20 points per wavelength. Figure 13 illustrate typical acoustic and electromagnetic solutions (at a lower frequency  of k = 30π). Typical convergence results for the proposed algorithms are reported on Figure 14 for the acoustic case, and on Figure 15 for the electromagnetic case. The same conclusions as for the Marmousi model can be drawn: relatively short sweeps allow for a good compromise between iteration count and parallel efficiency.

Conclusion
We have presented several improvements to the double-sweep preconditioner for time-harmonic wave problems originally proposed in [18], which uses sweeping as a matrix-free preconditioner for a Schwarz domain decomposition method. The matrix representations of the Schwarz operator and its inverse were used to design these preconditioners: some of the new variants can be linked to well-known algebraic techniques like Block-Jacobi and Gauss-Seidel methods, which opens some perspectives as other techniques could be explored in a similar fashion. The introduction of cuts in the sweeps allows to restore some parallel efficiency to achieve some speed-up over the standard Schwarz method, which can be further improved by pipelining for problems with multiple right-hand sides. All these algorithms have the important property of being independent of the underlying problem formulation; we have presented results for the Helmholtz and Maxwell equations, in 2D and 3D, discretized using finite elements. The extension to elastodynamics is currently under investigation. vs. sweep length. GIBC(2) denotes second order Padé impedance conditions on the artificial interfaces [15].