WASSERSTEIN MODEL REDUCTION APPROACH FOR PARAMETRIZED FLOW PROBLEMS IN POROUS MEDIA

Abstract. The aim of this work is to build a reduced-order model for parametrized porous media equations. The main challenge for this type of problem is that the Kolmogorov width of the solution manifold typically decays quite slowly, which makes the usual linear model-order reduction methods inappropriate. In this work, we investigate an adaptation of the methodology proposed in [15], based on the use of Wasserstein barycenters [1], to the case of non-conservative problems. Numerical examples on one-dimensional test cases illustrate the advantages and limitations of this approach and suggest further research directions that we intend to explore in the future.


Introduction
Model-order reduction methods have proved to be extremely useful tools to accelerate parametric studies in various contexts. Given a Partial Differential Equation (PDE) involving parameters, the main goal of model-order reduction is to provide a fast approximation of the parameter-to-solution map. Classical strategies are linear in the sense that they rely on approximating the set of solutions with linear or affine spaces. This approach is the backbone of most existing methods, among which stands the reduced basis method (see [22, 30]).

1. Problem Setting: Two-phase flow problem in porous media

The physical model
We consider a simplified model of two-phase flow through a porous rock. This type of model is often used in geosciences to describe the displacement of two non-miscible fluid phases. The wetting phase often corresponds to the water component, whereas the second one depends on the application: air in hydrogeology, CO2 for underground gas storage, etc.
In the following, we denote by Ω ⊂ R^d, d ∈ N*, the spatial domain occupied by the porous medium. The domain Ω has a total volume V_t which is the sum of the volume V_v occupied by the void and the volume V_s occupied by the solid material. We can thus characterize the medium by its porosity, which is the ratio of the volume of void space V_v over the total volume V_t of the material. We assume here that the porosity of the medium is constant in time and characterized by a function ϕ : Ω → [0, 1]. The volume V_v of the void space is here assumed to be completely saturated by two incompressible and non-miscible fluid phases, called wetting (w) and non-wetting (nw), with respective volumes V_w and V_nw which are such that V_w + V_nw = V_v. Their volume fraction is given by the saturation s_α := V_α / V_v for α ∈ {w, nw}, and we have that s_nw + s_w = 1. Due to incoming/outgoing flows from outside the domain Ω (runoff waters from the surface, injecting/pumping wells, etc.), both saturations s_nw and s_w depend on space and time. For a given time interval [0, T], they are solutions to an equation stating the conservation of their phase volume,

ϕ(x) ∂_t s_α(x, t) + ∇ · v_α(x, t) = 0,  ∀(x, t) ∈ Ω × (0, T), ∀α ∈ {w, nw}.   (1)
In equation (1), v_α denotes the generalized Darcy velocity. Neglecting gravity and capillary effects, the velocity is defined for all (x, t) ∈ Ω × (0, T) as

v_α(x, t) = −K(x) λ_α(x, t) ∇P(x, t),   (2)

where P is the fluid pressure. The pressure depends on space and time, and it is assumed to be the same for both phases. The coefficient K : Ω → R_+ is the absolute permeability of the porous medium and reflects the ability of one fluid to flow through that medium. The phase mobility λ_α is also involved in the definition of the Darcy velocity. It is defined as λ_α(x, t) := k_rα(x, t) / µ_α, where µ_α ∈ R_+ is the phase viscosity and k_rα is the relative permeability, an increasing function of s_α with values in [0, 1]. In hydrogeology or in reservoir engineering applications, this law is often defined thanks to the Brooks-Corey model [9], which we simplify here to a power law k_rα(x, t) = s_α^β(x, t), with β > 0. The dependency of k_rα on s_α induces a nonlinear coupling between equations (1) and (2) for the unknowns s_w, s_nw and P. In this work, the system is closed with boundary conditions (3)-(4), where n is the outward unit normal of ∂Ω, and initial conditions (5). In general, the problem is often reformulated in terms of the pressure P and the saturation s_w. By introducing the total Darcy velocity

v := v_w + v_nw = −K (λ_w + λ_nw) ∇P,   (6)

the previous system of equations is equivalent to system (7), where f_w := λ_w / (λ_w + λ_nw) is the fractional flow. Problem (7) with conditions (3)-(4)-(5) is solved using a finite-volume discretization in space and an IMPES (Implicit for Pressure, Explicit for Saturation) scheme in time. This latter scheme corresponds to an Euler implicit scheme for the pressure in the first equation of (7) and to an explicit scheme for the saturation in the second equation of (7), where the phase mobilities in (6) are evaluated explicitly too. A similar discrete system, made of a two-point scheme for the pressure gradient and an upwind scheme for the saturation, has already been considered in [8] to study the reducibility of the first equation of (7) with respect to some rock and fluid properties. The only difference between the present discretization and that study lies in the time discretization used for the saturation equation, which was implicit in that previous work, allowing the use of larger time steps. We refer to that work and to the references cited therein for more details on these discretizations. The IMPES scheme decouples the resolution of (7): starting from the pressure, updating the total Darcy fluxes, and finally computing the saturations.
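The nonlinear coupling through the relative permeabilities can be made concrete with a small sketch of the fractional flow (our own illustration, not the paper's implementation; the power law k_rα = s_α^β is the simplified Brooks-Corey law from the text, and the default viscosity values are placeholders mirroring those used later in Example 2):

```python
import numpy as np

def fractional_flow(s_w, beta=2.0, mu_w=0.003, mu_nw=0.03):
    """Fractional flow f_w = lambda_w / (lambda_w + lambda_nw) with the
    power-law relative permeabilities k_r,alpha = s_alpha^beta.
    Viscosities in Pa*s; defaults are illustrative placeholders."""
    lam_w = s_w ** beta / mu_w            # wetting-phase mobility
    lam_nw = (1.0 - s_w) ** beta / mu_nw  # non-wetting-phase mobility
    return lam_w / (lam_w + lam_nw)
```

For β > 1 this yields the classical S-shaped flux whose non-convexity is responsible for the sharp saturation fronts seen in the snapshots below.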

Parametric Problem and Manifold of Solutions
Let Y ⊂ R^{n_p} be a compact set of parameter values for some n_p ∈ N*. For all y ∈ Y, let (K^y, ϕ^y, β^y, µ^y_α) denote the corresponding data of problem (7). Moreover, let us denote by s^y = s^y_w the solution s_w to system (7) with data K = K^y, ϕ = ϕ^y, β = β^y and µ_α = µ^y_α. One may need, for uncertainty quantification for instance, to evaluate solutions to (7) for many instances of parameter values y ∈ Y; since solving (7) may be too computationally demanding, one then wishes to reduce the computational time needed to evaluate the parameter-to-solution map.
More precisely, we wish to approximate the set of solutions

M := {s^y(t, ·) : y ∈ Y, t ∈ [0, T]}.

Note that time is treated as a parameter in M. Thus, defining Z := Y × [0, T] and denoting by s_z the function s^y(t, ·) for all z = (y, t) ∈ Z, the solution set M can be rewritten as

M = {s_z : z ∈ Z}.

In the following, we develop a model reduction approach to approximate elements of the solution set M based on the use of Wasserstein barycenters. For the sake of simplicity, we will assume from now on that d = 1, since certain Wasserstein computations are greatly facilitated in one space dimension, and that Ω = (x_min, x_max) ⊂ R, with −∞ ≤ x_min < x_max ≤ ∞. Section 2 presents the essential notions of Wasserstein spaces and barycenters, and Section 4 presents our model reduction algorithm. Note that our proposed approach makes it possible to deal with parametric solutions s_z of the PDE of interest that belong to L^1(Ω) (and not necessarily L^2(Ω) or L^∞(Ω)).

Wasserstein barycenters in dimension 1
In this section, we recall some basic properties of the L^2-Wasserstein space in dimension 1 and introduce the notion of Wasserstein barycenters.

Definition of the L^2-Wasserstein space in one dimension
Let P_2(Ω) denote the set of probability measures on Ω ⊆ R with finite second-order moments. For all u ∈ P_2(Ω), we denote by

cdf_u(x) := u((−∞, x] ∩ Ω), ∀x ∈ Ω,   (10)

its cumulative distribution function (cdf), and by

icdf_u(p) := inf{x ∈ Ω : cdf_u(x) ≥ p}, ∀p ∈ (0, 1),

the generalized inverse of the cdf (icdf). The L^2-Wasserstein distance is defined by

W_2(u, v) := ( inf_{π ∈ Π(u,v)} ∫_{Ω×Ω} |x − y|^2 dπ(x, y) )^{1/2},

where Π(u, v) is the set of probability measures on Ω × Ω with marginals u and v. In the particular case of one-dimensional marginal domains, it can be equivalently expressed using the inverse cumulative distribution functions as

W_2(u, v) = ‖icdf_u − icdf_v‖_{L^2(0,1)}.

The space P_2(Ω) endowed with the distance W_2 is a metric space, usually called the L^2-Wasserstein space (see [36] for more details).
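The one-dimensional formula above can be sketched numerically (our own illustration: discrete measures on a common grid, with a midpoint quadrature of (0, 1) to approximate the L^2(0, 1) norm):

```python
import numpy as np

def icdf_from_weights(x, w, p):
    """Generalized inverse cdf of a discrete measure with atoms x (sorted)
    and probability weights w, evaluated at quantile levels p."""
    cdf = np.cumsum(w)
    # smallest index i such that cdf[i] >= p_j, as in the definition of icdf_u
    idx = np.searchsorted(cdf, p, side="left")
    idx = np.minimum(idx, len(x) - 1)
    return x[idx]

def w2_1d(x, u, v, n_quad=1000):
    """W_2(u, v) = ||icdf_u - icdf_v||_{L2(0,1)} for two discrete measures
    u, v given as weight vectors on the same sorted grid x."""
    p = (np.arange(n_quad) + 0.5) / n_quad   # midpoint quadrature on (0, 1)
    fu = icdf_from_weights(x, u, p)
    fv = icdf_from_weights(x, v, p)
    return np.sqrt(np.mean((fu - fv) ** 2))
```

For two Dirac masses δ_{x1} and δ_{x2}, this reproduces the exact value W_2 = |x1 − x2|.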

Wasserstein barycenters
Let n ∈ N* and let

Σ_n := { Λ_n = (λ_i)_{1≤i≤n} ∈ [0, 1]^n : Σ_{i=1}^n λ_i = 1 }

be the probability simplex of dimension n − 1.
For any family of probability measures U_n = (u_i)_{1≤i≤n} ∈ P_2(Ω)^n and barycentric weights Λ_n = (λ_i)_{1≤i≤n} ∈ Σ_n, there exists a unique minimizer

Bar(U_n, Λ_n) := argmin_{v ∈ P_2(Ω)} Σ_{i=1}^n λ_i W_2^2(v, u_i),   (13)

which is the barycenter with respect to U_n and Λ_n. The barycenter can be easily characterized in terms of its inverse cumulative distribution function, since (13) implies that

icdf_{Bar(U_n,Λ_n)} = argmin_{f ∈ {icdf_v : v ∈ P_2(Ω)}} Σ_{i=1}^n λ_i ‖f − icdf_{u_i}‖²_{L²(0,1)},

which yields

icdf_{Bar(U_n,Λ_n)} = Σ_{i=1}^n λ_i icdf_{u_i}.   (14)

For any given function u ∈ P_2(Ω), one can define a notion of best barycenter b(u, U_n) ∈ P_2(Ω) approximating u among the family of barycenters

B(U_n) := {Bar(U_n, Λ_n) : Λ_n ∈ Σ_n}.

The best barycenter is characterized in terms of its optimal weights

Λ_n^opt ∈ argmin_{Λ_n ∈ Σ_n} W_2(u, Bar(U_n, Λ_n)),   (16)

which are sometimes called the barycentric weights. This allows us to define the optimal barycenter as b(u, U_n) := Bar(U_n, Λ_n^opt). It can be easily characterized in terms of its inverse cumulative distribution function. This is due to the fact that, for all Λ_n ∈ Σ_n and all u ∈ P_2(Ω),

W_2^2(u, Bar(U_n, Λ_n)) = ‖icdf_u − Σ_{i=1}^n λ_i icdf_{u_i}‖²_{L²(0,1)},

so the computation of the optimal weights Λ_n^opt in problem (16) is a simple cone quadratic optimization problem. Note that at least one minimizer Λ_n^opt exists but uniqueness is not guaranteed.
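For n = 2 the quadratic problem for the optimal weights even has a closed-form solution, which the following toy sketch illustrates (our own construction, using uniform distributions whose icdfs are affine functions of the quantile level p):

```python
import numpy as np

p = np.linspace(0.005, 0.995, 100)      # quantile levels in (0, 1)

# Atoms: icdfs of Uniform(0, 1) and Uniform(2, 3)
a1 = p
a2 = 2.0 + p
# Target: Uniform(1, 2), whose icdf is 1 + p
f = 1.0 + p

# For n = 2, the simplex-constrained least-squares problem
#   min_{0 <= lam <= 1} || f - ((1 - lam) a1 + lam a2) ||^2
# is solved by the unconstrained minimizer clipped to [0, 1]:
d = a2 - a1
lam = np.clip(np.dot(f - a1, d) / np.dot(d, d), 0.0, 1.0)

bar = (1.0 - lam) * a1 + lam * a2       # icdf of the best barycenter
w2_err = np.sqrt(np.mean((f - bar) ** 2))
```

Here the target is recovered exactly with lam = 1/2, as expected: Uniform(1, 2) is the equal-weight W_2 barycenter of Uniform(0, 1) and Uniform(2, 3).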

The Greedy Barycenter algorithm
We detail in this section the greedy barycenter algorithm that we use to build a reduced-order model for efficiently computing approximations of elements of M.

Offline phase: greedy algorithm
For all s ∈ M, we denote by m(s) := ∫_Ω s its mass and by u_s := s / m(s) the associated normalized probability density, and we set U := {u_{s_z} : z ∈ Z}. From now on, we make the assumption that U ⊂ P_2(Ω), write u_z := u_{s_z}, and denote by f_z := icdf_{u_z} ∈ L^2(0, 1) for all z ∈ Z.
The Greedy Barycenter algorithm is an iterative procedure which aims at selecting, after n iterations of the algorithm, a family of parameters (z_1, ..., z_n) ⊂ Z, and hence a family U_n = (u_{z_i})_{1≤i≤n} ⊂ U, such that any element u ∈ U can be approximated with good accuracy in the Wasserstein distance by a Wasserstein barycenter Bar(U_n, Λ_n) for some Λ_n ∈ Σ_n. Let Z_train ⊆ Z be a finite subset of Z and let U_train := {u_z : z ∈ Z_train}. In practice, the strategy consists in finding a good collection of atoms A_n = (a_i := f_{z_i})_{1≤i≤n} such that, for all z ∈ Z, f_z = icdf_{u_z} is approximated with good accuracy in the L^2(0, 1) norm by a convex combination of elements of A_n, following similar guidelines as in [16].
The obtained set of atoms is the output of the greedy algorithm. As stated in Section 2.2, solving (18) amounts to solving |Z_train| quadratic optimization problems on unit simplices in R^n. These problems are solved using the COSMO package [19]. While the number of optimization problems to be solved in each greedy iteration is large, they are independent of each other and could easily be parallelized. More details on the optimization problem will be discussed in Section 4.3.
The greedy algorithm terminates with one of the following three termination criteria:
(1) when an absolute tolerance ε_abs ≥ 0 on the training error is reached;
(2) when the addition of a supplementary atom to the dictionary does not reduce the error by more than a given relative tolerance ε_rel;
(3) when a specified maximum number n_max of atoms is reached.
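The structure of the greedy loop can be sketched as follows (a simplified illustration of our own: the COSMO solver is replaced by a basic projected-gradient method on the simplex, the first atom is fixed arbitrarily rather than chosen by a principled rule, and the termination criteria above are reduced to a maximum atom count):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def best_weights(A, f, iters=2000):
    """min_{Lam in simplex} ||A Lam - f||^2 by projected gradient descent."""
    n = A.shape[1]
    lam = np.full(n, 1.0 / n)
    step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-12)
    for _ in range(iters):
        lam = project_simplex(lam - step * (A.T @ (A @ lam - f)))
    return lam

def greedy_barycenter(F, n_max):
    """F: array of shape (n_train, M) whose rows are training icdfs.
    Returns the indices of the greedily selected atoms."""
    selected = [0]                              # arbitrary first atom
    for _ in range(n_max - 1):
        A = F[selected].T                       # M x n atom matrix
        errs = [np.linalg.norm(A @ best_weights(A, f) - f) for f in F]
        selected.append(int(np.argmax(errs)))   # worst-approximated snapshot
    return selected
```

Note that, for brevity, no guard against re-selecting an already chosen atom is included; in practice the worst-approximated snapshot is never one of the atoms themselves.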

Online phase: interpolation
Let us assume here that we know the values of m(s_z) for all z ∈ Z_train. We then denote by I_m : Z → R an interpolation function such that I_m(z) = m(s_z) for all z ∈ Z_train. Furthermore, let us assume that, for all z ∈ Z_train, we know the values of the optimal barycentric weights Λ_n^opt(z) = (λ_i^opt(z))_{1≤i≤n}. We then denote, for all 1 ≤ i ≤ n, by I_i : Z → R an interpolation function such that I_i(z) = λ_i^opt(z) for all z ∈ Z_train. The online phase of the reduced-order modeling method then works as follows:

Online phase:
(1) Given a new parameter value z* ∈ Z, we compute the interpolated weights I_i(z*) for all 1 ≤ i ≤ n and the interpolated mass m* := I_m(z*).
(2) We form the inverse cumulative distribution function f* := Σ_{i=1}^n I_i(z*) f_{z_i}.
(3) We numerically invert f* to obtain the corresponding cumulative distribution function.
(4) We differentiate the latter to obtain the probability density u*.
(5) The reduced-order modeling approximation s_n^{gB,z*} of s_{z*} is then defined as s_n^{gB,z*} = m* u*.

We use piece-wise linear interpolation in the numerical experiments. Since the interpolants are rather simple functions (see Figures 3 and 8), this appears sufficient. Further work could be done experimenting with higher-order interpolation schemes.
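An end-to-end sketch of the online phase on a hypothetical one-parameter toy problem (all data below — the atoms, training weights, and masses — are made up for illustration; np.interp plays the role of the piece-wise linear interpolants I_i and I_m):

```python
import numpy as np

# --- assumed offline outputs for a hypothetical scalar parameter z ---
p = np.linspace(0.005, 0.995, 100)          # quantile grid
atoms = np.stack([p, 2.0 + p])              # icdfs of Uniform(0,1), Uniform(2,3)
z_train = np.array([0.0, 1.0])              # training parameter values
lam_train = np.array([[1.0, 0.0],           # optimal weights at z_train[0]
                      [0.0, 1.0]])          # optimal weights at z_train[1]
m_train = np.array([2.0, 4.0])              # snapshot masses m(s_z)

def online(z_star, x_grid):
    # (1) interpolate the weights and the mass (piece-wise linear)
    lam = np.array([np.interp(z_star, z_train, lam_train[:, i])
                    for i in range(atoms.shape[0])])
    m = np.interp(z_star, z_train, m_train)
    # (2) icdf of the barycenter: convex combination of the atom icdfs
    f = lam @ atoms
    # (3)-(4) invert the (monotone) icdf to a cdf, then differentiate
    cdf = np.interp(x_grid, f, p, left=0.0, right=1.0)
    u = np.gradient(cdf, x_grid)            # density, integrates to ~1
    # (5) rescale by the interpolated mass
    return m * u
```

At z* = 0.5 this returns (approximately) three times the density of Uniform(1, 2): the interpolated weights are (1/2, 1/2) and the interpolated mass is 3.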

Computational aspects
The aim of this section is to present the methodology used to build a reduced-order model for the parametrized porous media flow problem presented in the preceding section. We also discuss some choices made in our implementation. The code used to generate the examples shown here is available at https://github.com/ToBlick/MorporJ. The repository also contains some interactive examples that help build intuition for the behavior of weighted Wasserstein barycenters as the weights vary.

Computing cumulative distribution functions and their inverse
In practice, a snapshot s ∈ M (i.e. the wetting saturation s_w) is provided, for a given value of the set of parameters, as a piece-wise constant function on a uniform grid of size N = 1000. We denote this discrete representation by ŝ ∈ R^N and use typewriter font for all discrete distributions, cumulative distribution functions, and their inverses.
In our one-dimensional numerical examples (with x_min = 0 and x_max = 1 for the sake of simplicity), we always consider that the saturation s has to satisfy the boundary condition s(x_min) = 1. This is the reason why, from ŝ, we compute another vector s := (s_i)_{1≤i≤N+2} ∈ R^{N+2} such that s_1 = 0.0, s_2 = 1.0 and s_{i+2} = ŝ_i for all 1 ≤ i ≤ N. The value of s_2 is chosen to enforce the boundary condition mentioned above, and the value of s_1 is chosen so that the algorithm we use to compute a discrete approximation of the icdf of u = s / m(s) is more numerically stable.
Let us describe this procedure here. First, from s, an approximation of u is given by the vector u = (u_i)_{1≤i≤N+2} ∈ R^{N+2} defined by u_i := s_i / Σ_{j=1}^{N+2} s_j. Second, an approximation of cdf_u is given by the vector cdf = (cdf_i)_{1≤i≤N+2} ∈ R^{N+2} defined by cdf_i := Σ_{j=1}^{i} u_j. Note that the definition of s guarantees that cdf_1 = 0 and cdf_{N+2} = 1.
The discrete approximation of icdf_u will be given by its values on a uniform grid of the interval [0, 1]. Let x_1 = x_min < x_2 < ... < x_{N+2} = x_max be a uniform discretization grid of (x_min, x_max). Let also M ∈ N* (in practice M is chosen to be equal to N + 2 in the numerical experiments) and p_1 = 0 < p_2 < ... < p_M = 1 be a uniform discretization grid of the interval (0, 1). The discrete approximation of icdf_u is then given as a vector icdf = (icdf_j)_{1≤j≤M} ∈ R^M, the components of which are defined as follows: for all 1 ≤ j ≤ M, let i ≥ 2 be the smallest integer such that cdf_i ≥ p_j. Then,

icdf_j := x_{i−1} + (p_j − cdf_{i−1}) / (cdf_i − cdf_{i−1}) (x_i − x_{i−1}).   (21)

Note that this can be done in a single pass over j, since the cdf is monotonically increasing. To avoid numerical instabilities whenever (cdf_i − cdf_{i−1}) ≈ 0, we let icdf_j = x_{i−1} whenever (cdf_i − cdf_{i−1}) is smaller than a prescribed tolerance (10^{−12} in the numerical experiments).
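A sketch of this inversion procedure (our own reading of the construction above, with linear interpolation between grid points and the flat-segment guard):

```python
import numpy as np

def discrete_icdf(x, cdf, p_grid, tol=1e-12):
    """Approximate icdf of a discrete cdf given on the grid x, evaluated at
    the sorted quantile levels p_grid. Linear interpolation between grid
    points; a fallback value for (nearly) flat cdf segments."""
    icdf = np.empty_like(p_grid)
    i = 1
    for j, pj in enumerate(p_grid):          # single pass: p_grid is sorted
        while cdf[i] < pj and i < len(cdf) - 1:
            i += 1                           # smallest i with cdf[i] >= p_j
        dc = cdf[i] - cdf[i - 1]
        if dc < tol:                         # flat segment: avoid dividing by ~0
            icdf[j] = x[i - 1]
        else:                                # linear interpolation
            icdf[j] = x[i - 1] + (pj - cdf[i - 1]) / dc * (x[i] - x[i - 1])
    return icdf
```

On the cdf of a uniform distribution this recovers the identity map, and on a cdf with a flat initial segment it correctly skips the region of zero mass.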
If we apply the same procedure to the resulting icdf, with the roles of x and p swapped, we obtain a numerical approximation of the inverse of the inverse cumulative distribution function, given as a vector iicdf ∈ R^{N+2}. Up to the error introduced by the finite grid size and the linear interpolation, iicdf ≈ cdf. On average, this error lies between 10^{−4} and 10^{−3} in the discrete L^2 norm for the data from our numerical experiments.
As mentioned at the end of Section 2.2, the solution to the minimization problem yields the icdf of a normalized snapshot. We invert it to obtain the iicdf and apply simple first-order backward finite differences to compute the corresponding pdf. This guarantees that the reconstructed pdf integrates to one.
Clearly, both the inversion and the differentiation could be carried out using higher-order methods, such as a spline interpolant of the discrete pdf. We choose the simple linear interpolants as they prove to be robust when applied to the snapshots considered in this work, which feature jump discontinuities and, in some cases, even resemble discrete Dirac distributions. This, of course, limits the numerical accuracy of our method, as will be discussed later.

Sources of Error
Given a target s* = s_{z*} for some z* ∈ Z and the obtained reduced-order model approximation s_n^{gB,*} = s_n^{gB,z*}, we can identify three sources of error.
(1) Firstly, s* might not be close to any s_n^{gB} that can be obtained by inverting and differentiating Σ_{i=1}^n I_i(z*) f_{z_i} using the given atoms and interpolations. This error describes how rich the range of the collection of atoms is. The decay of this error as n increases is of prime interest for evaluating the quality of the proposed method.
(2) Secondly, the optimization may fail to find the optimal barycentric weights Λ_n^opt. The properties of the system that has to be solved are addressed in Section 4.3.
(3) Thirdly, there is a numerical error introduced simply by following the pdf → cdf → icdf → iicdf → pdf pipeline outlined at the beginning of this section. Naturally, this sets a lower bound on the error we can obtain, even with arbitrarily large n. For all our numerical experiments, however, this error is at least one order of magnitude smaller than the average errors we observe, so it is not the limiting factor in these cases. Note that the proposed method operates entirely in the W_2 space and on the icdfs of the saturations. The error in W_2 need not reflect the error in L^1 between the reconstructed saturations themselves, which we chose as a reference: two distributions can be close in W_2 while remaining far apart in L^1, and the distributions in our numerical experiments are shaped in a way that makes this discrepancy relevant. As a result, reducing the training error in W_2 through the addition of an atom to the dictionary via the greedy algorithm may not be reflected in the L^1 norm. The properties of the interpolation function do not add to the error in our experiments, as we are only looking at the training error. This decision is made to keep the presentation brief, as the generalization error depends largely on the selection of the training and test data sets, which we do not address.
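A concrete construction (our own, illustrative) makes the W_2-versus-L^1 discrepancy explicit: two discrete near-Dirac densities a few cells apart — similar in shape to the sharpest snapshots of our experiments — are close in W_2 but maximally far apart in L^1:

```python
import numpy as np

N = 1000
x = np.linspace(0.0, 1.0, N)
dx = x[1] - x[0]

# Two sharply peaked densities (discrete Diracs) a small distance apart:
u = np.zeros(N); u[500] = 1.0 / dx      # delta at x = x[500]
v = np.zeros(N); v[510] = 1.0 / dx      # delta shifted by 10 cells

w2 = 10 * dx                            # W2 between two Diracs = |x1 - x2| ~ 0.01
l1 = np.sum(np.abs(u - v)) * dx         # L1 distance = 2: supports are disjoint
rel_l1 = l1 / (np.sum(np.abs(u)) * dx)  # relative L1 error = 2, i.e. 200%
```

Here W_2 ≈ 0.01 while the relative L^1 error is 200%: an approximation that is excellent in the Wasserstein metric can be useless pointwise.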

Short comments on computational cost
A thorough study of the run-times for the proposed method is beyond the scope of this article. The numerical inversion of a discrete cdf or icdf depends only linearly on N, which allows the computation of Wasserstein barycenters for given weights in real time, as can be seen in the demo notebook accompanying the code provided along with this work.
Selecting atoms with the greedy algorithm involves solving quadratic programs of the form min_Λ ‖AΛ − f‖², with the constraints Λ ≥ 0 element-wise and Σ_i λ_i = 1. Here A is an N × n matrix with the discrete atoms as its columns, so n increases with every iteration of the algorithm, Λ is the vector of barycentric weights to be optimized, and f is the icdf of the saturation we aim to approximate. As stated, this system has to be solved |M_train| times in every iteration, and we observe that AᵀA becomes very ill-conditioned with growing iteration number, see Figure 11 below. We cannot orthogonalize the columns of A, since the resulting atoms would no longer be discrete icdfs (for example, they would most likely no longer be strictly positive). This point is discussed further in Section 5.3.
Since A does not change within a greedy iteration, significant computational time can be saved by assembling the system once and then modifying f in place. Furthermore, we initialize Λ inversely proportional to the W_2 distance between each atom in the dictionary and the icdf f that is to be approximated; that is, if f is very close to the i-th atom, the corresponding λ_i will be initialized close to one.

Numerical Examples
In this section, we compare the performance of the method for two test-cases to a POD representation. In the latter case, we project every element of M_train onto the POD basis and back. We choose the relative L^1 error to compare the methods. To be precise, let Ψ_POD be the N × n matrix that has as columns the first n POD modes. The POD reconstruction of a solution s is then s_POD := Ψ_POD Ψ_PODᵀ s, and the corresponding error is ‖s − s_POD‖_{L^1} / ‖s‖_{L^1}. The reconstruction error for the barycentric method is given by ‖s − s_gB‖_{L^1} / ‖s‖_{L^1}, where s_gB is obtained by applying the method described in Sections 3-4. In the following examples, Ω = (0, 1), corresponding to a physical domain of length 1 km, and the grid is composed of N + 2 = 1002 cells. The time interval, which covers several years, is discretized thanks to time steps satisfying the CFL condition given in [17]. The boundary and initial conditions are: p_D(0, t) = 4.137 × 10^7 Pa, p_D(L, t) = 2.758 × 10^7 Pa, s_w(0, t) = 1 for all t ∈ [0, T], and s_w(x, 0) = 0 for all x ∈ (0, L).
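The POD comparison just described can be sketched as follows (our own minimal implementation via an SVD of the snapshot matrix; Psi_n plays the role of Ψ_POD):

```python
import numpy as np

def pod_reconstruction_error(S, n):
    """S: (N, K) snapshot matrix with snapshots as columns. Project every
    snapshot onto the first n POD modes and back, and return the mean
    relative L1 reconstruction error over the snapshots."""
    Psi, _, _ = np.linalg.svd(S, full_matrices=False)
    Psi_n = Psi[:, :n]                       # first n POD modes
    S_pod = Psi_n @ (Psi_n.T @ S)            # encode and decode
    errs = np.sum(np.abs(S - S_pod), axis=0) / np.sum(np.abs(S), axis=0)
    return errs.mean()
```

As a sanity check, a snapshot family of rank two is reconstructed exactly by two modes, while a single mode leaves a nonzero residual.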

Example 1: Homogeneous medium with a varying exponent of the relative permeability law and various viscosity ratios
For the first example, we consider the case of a homogeneous medium with porosity ϕ = 0.1 and permeability k = 10^{−13} m². The wetting viscosity is kept constant at a value of µ_w = 0.003 Pa s, but the non-wetting viscosity is changed in such a way that µ = µ_nw / µ_w ranges from 1 to 25. The sampling is chosen analogous to a geometric progression, so that µ ∈ {1, 2, 3, 6, 12, 25}. The exponent β of the relative permeability varies uniformly from 2 to 6, that is, β ∈ {2, 3, 4, 5, 6}. The evolution of the water saturation is calculated over 5 years, saving snapshots at a fixed time-step, t ∈ {0.2, 0.4, ..., 5.0} years, leading to |Z_train| = 750 and z = (t, µ, β). Some snapshots of the training set at the final time are shown in Figure 1. The methodology described in Sections 3-4 is then applied. A few atoms selected by the greedy algorithm are shown in Figure 2. Figure 3 shows cross-sections of the interpolating functions from the parameter space for the case n = 3. Note how these are simple functions providing a smooth mapping from the parameter space to the reduced coordinates. The average reconstruction error, measured in the L^1 norm, is already below 5% for n = 3 (see Table 1). The decay of the maximum and average Wasserstein error on U_train as more atoms are added is shown in Figure 4. Note that there is an increase in the maximum error at the addition of the 29th atom. This can be explained by the iterative solver not converging before reaching its maximum iteration number while determining the optimal weights for one of the training snapshots. While the addition of atoms should only decrease the reconstruction error, we observe in practice that many atoms may make it harder to solve for the optimal barycentric weights (cf. Section 5.3).
Table 1 shows that decent reconstructions can already be achieved with very few atoms. This highlights the strength of the method and its applicability to the example at hand. As expected, the POD reconstruction is ill-suited: to obtain an L^1 error of one percent, one needs over 230 modes, which is almost one third of the size of the training set; barely any reduction is possible. This is mainly due to oscillations occurring in the POD reconstruction around the jump discontinuity, as shown in Figure 5.

Table 1. Atoms and POD modes needed to reach a given tolerance ε, in the L^1 norm, for the snapshots used in Example 1.

Figure 5. Example 1: Snapshot and its reconstruction using the gBar algorithm and a POD basis.

Example 2: Porous medium made of two rock types with varying permeability and position of the heterogeneity
For the second example, a heterogeneous medium is tested, composed of two different rock types: a highly permeable one (HP) characterized by ϕ_HP = 0.1 and k_HP = 10^{−13} m², and a low-permeability one (LP) with fixed porosity ϕ_LP = 0.01 but uniformly varying permeability k_LP ∈ {5, 6, 7, 8, 9} × 10^{−14} m². In this case, the viscosities are kept constant at µ_w = 0.003 Pa s and µ_nw = 0.03 Pa s (giving a ratio of µ = 10), and so is the relative permeability exponent, β = 2. The location of the interface between the two rock types is denoted by γ: for x < γ, the rock type is HP, and LP otherwise. The interface position γ is sampled at irregular intervals, with a finer sampling around x = 0: γ ∈ {0.0, 0.05, 0.1, 0.15, 0.2, 0.4, 0.6}. The evolution of the water saturation is calculated over 10 years and snapshots are regularly taken at t ∈ {0.5, 1.0, ..., 10.0} years, leading to z = (t, k_LP, γ) and |Z_train| = 700. From Figure 6, where some snapshots at the final time are shown, we see that the solution is quite different from the homogeneous case: when the saturation front reaches the permeability interface, the velocity of propagation changes, leading to a kink in the saturation profile. In some cases, the front does not reach the interface and the evolution of the saturation still behaves as in a homogeneous medium. The first atoms selected by the greedy algorithm are shown in Figure 7, and cross-sections of the interpolating functions for (λ_i)_{1≤i≤3}(t, γ, k_LP) and m(t, γ, k_LP) in Figure 8. Letting the greedy algorithm add atoms allows the W_2 error to decrease in a way similar to the homogeneous case (Figure 9). Note, however, that both the maximum and average errors in W_2 are about one order of magnitude above the values from Example 1. This is also reflected in the errors as measured in the L^1 norm, shown in Table 2. It was not possible to obtain an accuracy of 0.5% for this test-case before the barycentric algorithm became unstable and the iterative solver failed to converge to optimal barycentric weights. The reconstruction of a snapshot, plotted in Figure 10, shows that the proposed method does not accurately capture the kink of the solution at the interface of the HP/LP regions at x ≈ 0.2. On the other hand, the POD reconstruction shows an instability at the solution front. Several elements of the training set no longer show this jump discontinuity, however, and in these cases the POD method performs better, which is also reflected in the second row of Table 2.

Table 2. Atoms and POD modes needed to reach a given tolerance ε, in the L^1 norm, for the snapshots used in Example 2.
Figure 10. Example 2: Snapshot and its reconstruction using the gBar algorithm and a POD basis. Note how the barycentric interpolation does not capture well the kink located at the HP/LP transition.

Properties of the minimization problem

In this section, we want to provide some insights into the properties of the optimization problem min_Λ ‖AΛ − f‖, i.e. the approximation of a discrete icdf f as a convex combination of a collection of atoms (the columns of A). Figure 11 shows the condition number of the matrix AᵀA throughout the greedy algorithm iterations for Examples 1 and 2. In both cases, the condition number grows quickly with increasing n, making it harder to solve for the optimal Λ. Note that a bad conditioning of the matrix AᵀA does not necessarily mean that the optimization problem has no solution. For example, for s_1 = δ_{x_1} and s_2 = δ_{x_2}, we find icdf_1 ≡ x_1 and icdf_2 ≡ x_2, and AᵀA is singular. Nevertheless, the optimization problem for the optimal barycenter approximating a given u by these atoms is well posed: it is solved by projecting the mean of u onto {(1 − λ)x_1 + λx_2 : 0 ≤ λ ≤ 1}.
Figure 12 shows the W_2 error between two target icdfs and the barycenter of atoms for different n and Λ, in log-scale. We display elements of the (n − 1)-dimensional simplex as points in a convex polygon. For n = 3, this corresponds to the well-known barycentric coordinates within a triangle, depicted in Figures 12a and 12d. For n > 3, we use generalized barycentric coordinates (in particular, so-called Wachspress coordinates) within polygons, as described in [18]. These coordinates associate to every point x_poly in the n-sided polygon ⊂ R² a vector of barycentric weights λ_gbc ∈ Σ_n ⊂ R^n. The i-th vertex of a polygon corresponds to barycentric weights with only one non-zero entry, for the i-th atom, while points along the edge between the vertices corresponding to a_i and a_j represent (1 − λ)a_i + λa_j, where 0 < λ < 1 along the edge. The center of each polygon corresponds to the point where λ_gbc,i = 1/n for all i. The mapping x_poly → λ_gbc is not surjective, so the second and third columns of Figure 12 show only a specific cut through the true energy landscape defined on Σ_4 and Σ_8. For example, from Figure 12b, we can see that approximations with W_2 error below 10^{−3} lie close to a line that enters the tetrahedron Σ_4 at the a_1-a_2 edge and exits at the a_3-a_4 edge. The minimum lies on the a_1-a_3-a_4 face, very close to the a_3-a_4 edge. In the case n = 8, it is naturally quite hard to faithfully represent this high-dimensional setting with a two-dimensional plot. Recall that the objective is a quadratic function constrained to a convex set; therefore, no disjoint minima are possible. In all depicted cases, the contour lines of the energy are very eccentric, a consequence of the poor conditioning of the matrix AᵀA.
In both examples and for large n, the optimal barycentric weights lie on the faces of the simplices, with several zero entries. This sparsity is seen throughout the elements of M_train and has also been observed in [6]. It is an interesting question for future work how one could choose and/or modify the atoms in order to avoid this and thereby move the minimum of the objective into the interior of the simplex.
The existence of a mapping from the Wasserstein space to R^N (namely, pdf_s → icdf_s) is special to the case of one spatial dimension and does not hold in the higher-dimensional case ([29], Proposition 8.2). We can exploit this to get an understanding of whether the atom added to the collection in step n + 1 is already close to a convex combination of the atoms {a_1, ..., a_n}. Using the distance matrix (‖a_i − a_j‖²_{L²})_{1≤i,j≤n}, we calculate the volume of the simplex with vertices {a_i}_{1≤i≤n} as a function of n using the Cayley-Menger determinant [7]. We normalize this value by the volume of the unit n-simplex. The results are shown in Figure 13. Adding another atom that is close to the simplex with vertices a_1, ..., a_n will lead to a small volume for the resulting simplex with vertices a_1, ..., a_{n+1}; for n = 2, this would be a very slim triangle. We observe that up until n ≈ 10, in both examples, the relative volume of the simplices remains almost constant, which we interpret as every additional atom providing incremental information. For larger n, the relative volume decreases exponentially as the simplices become more degenerate, i.e. some of the a_i's are almost affinely dependent.
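The volume computation can be sketched as follows (a generic implementation of the Cayley-Menger determinant; in the paper the "points" are the discrete icdf atoms and the distances are their L² distances, while here, as an assumption of this sketch, the normalization uses the regular simplex with unit edge length):

```python
import numpy as np
from math import factorial

def simplex_volume(points):
    """Volume of the simplex whose vertices are the rows of `points`,
    computed from pairwise distances via the Cayley-Menger determinant."""
    n = len(points)                  # n vertices span an (n-1)-simplex
    k = n - 1
    d2 = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1)
    B = np.ones((n + 1, n + 1))      # bordered matrix of squared distances
    B[0, 0] = 0.0
    B[1:, 1:] = d2
    v2 = (-1) ** (k + 1) / (2 ** k * factorial(k) ** 2) * np.linalg.det(B)
    return np.sqrt(max(v2, 0.0))     # guard against tiny negative round-off

def relative_volume(points):
    """Normalize by the volume of the regular simplex with unit edge length."""
    k = len(points) - 1
    v_reg = np.sqrt((k + 1) / 2.0 ** k) / factorial(k)
    return simplex_volume(points) / v_reg
```

A degenerate (almost affinely dependent) vertex set yields a relative volume near zero, which is exactly the signature visible for large n in Figure 13.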

Conclusions from the numerical experiments
The numerical experiments show that the presented method is able to represent the solution space of a complex nonlinear equation using only very few atoms. In the first test-case considered, using only n = 3 atoms makes it possible to reconstruct every element of the training data set with an accuracy of 95%. The data depend on three parameters and the solutions vary considerably with those parameter values; note, for example, how slowly the front of the wetting phase moves at high values of µ and β. The map from the parameter space to the barycentric weights has a simple structure for a small number of atoms, as seen in Figures 3a and 3b. The second test-case features a collection of solutions that are both more challenging for the presented method and easier for the POD method to represent. In this case, several elements of M_train are supported on the entire domain, which removes the biggest challenge for the POD representation, i.e. the jump discontinuity. At the same time, the barycentric interpolation does not preserve the characteristic kink located at the HP/LP interface, reducing the quality of the barycentric reconstruction. Ultimately, the barycentric method still performs well even with very few atoms, but cannot reach the same level of accuracy as in test-case 1 before the dictionary becomes too ill-conditioned and the greedy algorithm terminates.
When comparing the presented method to POD, it is important to note that there is no straightforward online method for the latter; we merely compare the encoding and reconstruction properties of the two methods. The online method for the barycentric representation is purely interpolating and comes at a very small computational cost (namely, evaluation of the functions I_1, ..., I_n, computation of the convex combination, and inversion and differentiation of the icdf) that is linear in N, the dimension of the full-order problem.
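As a sketch of these online steps, assume the atoms' icdfs are sampled on a common quantile grid; the function and variable names below are illustrative, not taken from our implementation:

```python
import numpy as np

def reconstruct_density(icdf_atoms, weights, s_grid, x_grid):
    """Online barycentric reconstruction in the 1D setting: the barycenter's
    icdf is the convex combination of the atoms' icdfs, and the density is
    recovered by inverting and differentiating that icdf.

    icdf_atoms: (n, M) array, each row an atom's icdf sampled on s_grid
    weights:    (n,) barycentric weights (nonnegative, summing to one)
    s_grid:     (M,) quantile levels in [0, 1]
    x_grid:     (K,) spatial grid on which the density is evaluated
    """
    icdf_bar = weights @ icdf_atoms              # convex combination, O(nM)
    # invert: the cdf is the inverse of the (nondecreasing) icdf
    cdf = np.interp(x_grid, icdf_bar, s_grid, left=0.0, right=1.0)
    pdf = np.gradient(cdf, x_grid)               # differentiate the cdf
    return pdf

# sanity check: the equal-weight barycenter of U(0,1) and U(1,2) is U(0.5,1.5)
s = np.linspace(0.0, 1.0, 101)
atoms = np.vstack([s, 1.0 + s])                  # icdfs of U(0,1) and U(1,2)
x = np.linspace(0.0, 2.0, 201)
pdf = reconstruct_density(atoms, np.array([0.5, 0.5]), s, x)
```

Every operation above is a pass over arrays of length M, N, or K, which is the linear-in-N cost mentioned in the text.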
Lastly, we provide some run-time values. These are naturally strongly dependent on the implementation, the size of the problem, the solver tolerances, and many other factors, but we believe they may nonetheless be of interest. The run time of the full-order model depends on the parameters, as it employs a variable time-step size to deal with the stiffness of the equation. On an all-purpose personal notebook, it takes between 6 and 66 seconds to generate a time-series of snapshots for example one, and between 120 and 610 seconds for example two. Running the greedy algorithm to obtain n = 5, 10, and 30 atoms in the offline phase takes around 11, 16, and 140 seconds, respectively, for example one, and 102, 115, and 586 seconds for example two. Reconstructing one snapshot in the online phase takes around 1.6 × 10⁻⁵, 3.4 × 10⁻⁵, and 8.9 × 10⁻⁵ seconds in example one and 1.6 × 10⁻⁵, 3.4 × 10⁻⁵, and 8.4 × 10⁻⁵ seconds in example two.

Future research directions and extensions
The computation of Wasserstein distances and barycenters through inverse cumulative distribution functions is limited to one-dimensional problems. There has been much recent progress in the development of algorithms for arbitrary dimensions, which could be used to extend the present work to these cases [29, 34]; we intend to address this in upcoming work. It could also be worthwhile to allow the inclusion of atoms in the dictionary that are not themselves snapshots of the solution. This approach is taken in [33], where the atoms are initialized as uniform distributions and then optimized alongside the barycentric weights, i.e., one solves

min_{U_n ∈ P_2(Ω)^n} min_{Λ_n ∈ Σ_n} ∑_{u ∈ U_train} W_2^2(u, Bar(U_n, Λ_n)).
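For reference, the one-dimensional closed form that the whole construction relies on, W_2^2(u, v) = ∫_0^1 |icdf_u(s) − icdf_v(s)|² ds, is straightforward to evaluate numerically. The following is a minimal sketch with illustrative names:

```python
import numpy as np

def w2_squared_1d(icdf_u, icdf_v, s_grid):
    """Squared 2-Wasserstein distance between two 1D probability measures,
    given their icdfs sampled on a common quantile grid s_grid in [0, 1]."""
    return np.trapz((icdf_u - icdf_v) ** 2, s_grid)

# sanity check: translating a measure by c gives W_2 distance |c|
s = np.linspace(0.0, 1.0, 1001)
icdf_u = s                                 # uniform on (0, 1)
icdf_v = s + 0.3                           # uniform on (0.3, 1.3)
w2_sq = w2_squared_1d(icdf_u, icdf_v, s)   # expected: 0.3**2 = 0.09
```

In higher dimensions no such closed form exists, which is precisely why the extensions in [29, 34] require dedicated algorithms.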
We would further like to explore ideas on how to improve the conditioning of the matrix arising from the atoms. Instead of adding new atoms to the dictionary as they come, one could look for a way of adding a modified atom which is, in a sense that has to be made precise, fully outside the current range of the dictionary. In classical linear reduced basis methods, this is ensured by projecting any new element onto the orthogonal complement of the current basis, but such an operation is not appropriate in the present framework. Greedy algorithms developed for the cone of nonnegative functions, such as the Enlarged Nonnegative Greedy algorithm introduced in [3], could be an interesting starting point for our purposes.
It would also be very interesting to see whether the approach proposed in [31] could be adapted to the present non-conservative context.
Lastly, the proposed method is purely interpolating in the online phase. It would be very interesting to develop a more intrusive method, perhaps one that works directly with the residual of the equation, the way a projection-based ROM does. This is made difficult by the fact that this residual is generally given as a function of the pdf, while the reduced representation by atoms in our method is formulated in terms of icdfs; moving back and forth between these two representations is costly (in particular, the cost depends on the size N of the high-fidelity model). Of particular interest could be equations that can be expressed as Wasserstein gradient flows of the (time-discrete) form

f(t + Δt) ∈ argmin_f W_2^2(f(t), f) + 2Δt V(f)

for a given potential V [32]. In this case, the first term can be cheaply evaluated in terms of icdfs in the one-dimensional case. As a perspective, it would also be interesting to compare our approach with other nonlinear intrusive methods, like [35], or with the transformed snapshot interpolation from [38].
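To make the gradient-flow idea concrete, the following sketch performs one such time-discrete (JKO) step directly on a sampled icdf, for the illustrative choice of a linear potential energy V(f) = ∫ x f(x) dx, which in icdf variables becomes ∫_0^1 g(s) ds. All names are ours and this is a toy minimization, not the intrusive method envisioned above:

```python
import numpy as np
from scipy.optimize import minimize

def jko_step_icdf(icdf0, s_grid, potential, dt):
    """One time-discrete Wasserstein gradient-flow step posed on the icdf:
    in 1D, W_2^2(f(t), f) = int_0^1 (g(s) - g0(s))^2 ds for the icdfs g0, g,
    and a potential energy V(f) = int V(x) f(x) dx becomes
    int_0^1 V(g(s)) ds, so the whole step is a small unconstrained
    minimization over the sampled icdf values (monotonicity of g would
    have to be enforced in general; it holds here for the test below)."""
    def objective(g):
        w2 = np.trapz((g - icdf0) ** 2, s_grid)
        pot = np.trapz(potential(g), s_grid)
        return w2 + 2.0 * dt * pot
    res = minimize(objective, icdf0, method="L-BFGS-B",
                   options={"gtol": 1e-8})
    return res.x

# sanity check: for V(x) = x the step merely translates the measure,
# i.e. the minimizer is g = g0 - dt pointwise
s = np.linspace(0.0, 1.0, 101)
g1 = jko_step_icdf(s, s, lambda x: x, 0.1)
```

The point of the sketch is that the W_2 term costs only O(M) per evaluation in the icdf variables, whereas evaluating a pdf-based residual would require converting representations at O(N) cost at every step.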

Figure 1. Randomly selected snapshots at final time from example 1.

Figure 2. Dictionary atoms (left) and corresponding densities (right) for the first five atoms chosen by the greedy algorithm for example 1.

Figure 3. Interpolations from parameter space to barycentric weights and mass in example 1.

Figure 4. Decay of the W_2 error through the iterations of the greedy algorithm for example 1.

Figure 6. Randomly selected snapshots at final time from example 2.

Figure 7. Dictionary atoms (left) and corresponding densities (right) for the first five atoms chosen by the greedy algorithm for example 2.

Figure 8. Interpolations from parameter space to barycentric weights and mass in example 2.

Figure 9. Decay of the W_2 error through the iterations of the greedy algorithm for example 2.

Figure 11. Condition number of the matrix A^T A as n increases.

Figure 13. Normalized volume of the simplex with vertices a_1, ..., a_n as n increases.