Goal-oriented adaptive sampling under random field modelling of response probability distributions

In the study of natural and artificial complex systems, responses that are not completely determined by the considered decision variables are commonly modelled probabilistically, resulting in response distributions varying across decision space. We consider cases where the spatial variation of these response distributions does not only concern their mean and/or variance but also other features including for instance shape or uni-modality versus multi-modality. Our contributions build upon a non-parametric Bayesian approach to modelling the thereby induced fields of probability distributions, and in particular to a spatial extension of the logistic Gaussian model. The considered models deliver probabilistic predictions of response distributions at candidate points, allowing for instance to perform (approximate) posterior simulations of probability density functions, to jointly predict multiple moments and other functionals of target distributions, as well as to quantify the impact of collecting new samples on the state of knowledge of the distribution field of interest. In particular, we introduce adaptive sampling strategies leveraging the potential of the considered random distribution field models to guide system evaluations in a goal-oriented way, with a view towards parsimoniously addressing calibration and related problems from non-linear (stochastic) inversion and global optimisation.


Introduction
Many problems in science and engineering boil down to studying the effect on a response of interest of varying some decision or control variables x. Yet it is typically unrealistic to assume a deterministic relationship between x and the response, be it for instance because of uncertainty in other input variables or because the assumed response and/or observation generating processes themselves involve some randomness. Addressing optimisation and related tasks in such frameworks comes with both conceptual and implementation challenges. While a number of contributions around stochastic and "noisy" optimisation have been developed, they often rely on homoscedasticity or specific distributional assumptions. In contrast, expensive-to-evaluate systems for which various features of response distributions evolve in decision space call for versatile modelling approaches.
Here we consider the situation where responses follow probability distributions µ x indexed by decision variables x ∈ D and with a common support I. Without loss of generality, the index set -or decision space-D is assumed to be a compact subset of R d (d = 1 or 2 in forthcoming examples). We furthermore assume that an objective function g(x) = ρ(µ x ) depending on x through µ x is to be minimised, where ρ returns for any considered probability distribution a real-valued quantity such as a moment or a quantile with some given level. One of the main challenges in the considered framework is that typically, g(x) cannot be exactly evaluated as one only observes draws from µ x . We use the letter t (resp. T in random form) to denote such draws.
Index space D x 1 x 2 x 3 150 data points available No data available 10 data points available Figure 1. Schematic representation of a field of probability distributions µ x (represented by their probability density functions, in red) to be predicted based on scattered samples (with associated histograms in grey). Here the end goal is to collect additional samples towards the minimisation of g(x) = ρ(µ x ).
In the context of global optimisation under expensive objective function evaluations, Bayesian Optimisation (BO) leverages the potential of meta-modelling, especially Gaussian Processes (GP) (Rasmussen and Williams [37]), to keep a memory of explored points with the aim to explore decision space while keeping a parsimonious evaluation budget. First introduced by Močkus et al. [29] and further developed and popularized by Jones et al. [21] in the noise free setting, it has latter been extended to stochastic black box optimisation (Frazier et al. [15], Frazier [16]; Srinivas et al. [43]; Picheny et al. [33]; Hernández-Lobato et al. [18]; Jalali et al. [19] and references therein) and further sequential strategies have been studied (Risk and Ludkovski [38]; Binois et al. [6]). However, existing approaches typically assume Gaussian response distributions µ x . Another branch of study pertaining to geostatistics and that does not rely on Gaussian or specific distributional assumptions is the so-called distributional Kriging (Aitchison [1]; Egozcue et al. [14]; Talská et al. [44]) but such approaches are ill-suited in the case of moderate sample size heterogeneously scattered across space.
In this paper, we investigate and leverage a non-parametric Bayesian model for fields of probability distributions that generalizes the logistic Gaussian process model. We focus on the situation where, for each x, the distribution µ x admits a density p x with respect to a prescribed dominating measure. Our approach is adapted to the case of moderate and heterogeneously scattered across space sample size. It delivers a probabilistic prediction of the field of probability distributions which allows us to derive sampling acquisition schemes. We introduce the aforementioned model in the Section 2. Then, we study an application of this model in sequential design in Section 3 and demonstrate its applicability on toy probability density fields and on a contaminant localization problem in Section 4.
The main contribution of this paper lies in the exploitation of a spatial extension of the logistic Gaussian process model for the sequential design of stochastic simulations towards optimisation and inversion.

2.
Sample-based modelling of density valued fields 2.1. The logistic Gaussian process for density estimation Before considering the full scale problem of density field estimation, let us briefly consider the simpler problem of (univariate) density estimation without any indexing on a spatial variable.
Our focus is on a class of Bayesian density estimation methods where prior distributions of probability density functions are constructed by rescaling positive random processes in such a way that resulting realisations integrate to one. When the considered random processes are obtained by taking the logistic density transform (exponentiation and normalisation) of a Gaussian process, the resulting model is called Logistic Gaussian Process (LGP).
LGP for density estimation were established and studied by Leonard [26] and Lenk [24,25]. More recently, Tokdar and Ghosh [47] demonstrated that a large class of LGP achieves strong and weak consistency (under some regularity assumptions) at true densities that are either continuous or piecewise continuous.
Using this prior within a Markov Chain Monte Carlo framework allows approximate sampling from the posterior distribution over the space of densities on I, as illustrated in Figure 2. Models relying on other mappings than the logistic density transform have been introduced and studied. In the particular case where the sigmoid (logistic function) is used instead of an exponential, Murray et al. [31] introduced an exact sampling approach relying on the boundedness of the sigmoid. More recently, Donner and Opper [11] proposed an augmented model that allows for efficient inference by Gibbs sampling and an approximate variational mean field approach. However, our first experiments hinted that choosing a bounded function such as the sigmoid often leads to a lack of expressiveness, hence we decided to rely on LGP settings.

A spatial extension of the logistic Gaussian process for density field estimation
When modelling and optimising functions depending on both decision and uncontrolled variables, it can be a relevant option to use augmented models incorporating the uncontrolled variables as part of their inputs. In Picheny et al. [33], a GP indexed by the joint space of design parameters and computational time is considered to model partially converged simulation data. In Janusevskis and Le Riche [20], a GP model over the joint spaces of controllable and uncontrollable variables is used for simulation-based robust optimisation.
Here we consider a spatial extension of the logistic Gaussian process. We work with a GP indexed by the product space D ×I and apply a logistic density transform to each x-slice of the process to produce a probability density field. A similar spatial LGP was introduced in Tokdar et al. [48] in the context of conditional density modelling (i.e., with random x). To our knowledge, this family of models has not yet been used in the present context of sequential design of experiments, be it for stochastic optimisation or towards other goals.
Here we assume that our probability measures µ x of interest possess probability densities p x (with respect to some prescribed reference measure(s)) and we appeal to the said generalization of LGP to model the resulting density field. The resulting Spatial Logistic Gaussian process (SLGP) is defined by where W x,t ∼ GP(m, k) is a measurable GP with given mean function m : D × I → R and covariance kernel k : (D × I) × (D × I) → R such that for any x ∈ D, I e Wx,u du < ∞ almost surely.
Remark 2.1. When the index space D is a singleton, SLGP coincides with the LGP mentioned in Section 2.1.
Remark 2.2. Throughout the paper, the Lebesgue measure is assumed as dominating one, I is [0, 1], and the considered GPs possess smoothness properties amply guaranteeing the a.s. existence of the considered integrals.

Conditioning on data and practicalities
In the SLGP model, the denominator involves values of W over the whole domain. This infinite dimensional objects makes likelihood-based computations cumbersome. In Tokdar [46], this problem is reduced to the one of a finite dimensional integral by relying on a moderate number of inducing points, introduced to reduce the dimensionality of the LGP. These inducing points are selected according to a life and death process.
Here, we opt for another way to reduce the dimensionality that is more suitable for sequential optimisation. We consider finite-rank GP with the following parametrisation where p ∈ N, the e i 's are functions, the λ i 's are scalars satisfying λ 1 ≥ . . . ≥ λ p > 0 and the ε i 's are i.i.d. N (0, 1). Truncated Karhunen-Loève expansions constitute an important class falling under this framework. This simple form of the GP allows us to pre-calculate the values of e j 's at the grid points and then swiftly obtain evaluations of W and its integral (through a quadrature) for varying realisations (or tentative values) i 's of the ε i 's. Now, consider that our data consist in n couples of locations and observations {(x i , t i )} 1≤i≤n ∈ (D × I) n . Moreover, assume the t i 's are realisations of some independent random variables T i , and let µ xi be the (unknown) distribution of T i . We will note T n = (T i ) 1≤i≤n and t n = (t i ) 1≤i≤n . The GP W defined by the Equation 2, as well as the associated SLGP inherit their randomness from the Gaussian random vector ε ε ε = (ε i ) 1≤i≤p . To highlight this dependency we shall also denote the SLGP p x (·) by p x (·; ε ε ε) and use the notation p x (·; ) when instantiating the SLGP with a fixed value of ε ε ε. We perform a Bayesian estimation of the true unknown density field p xi in the following way: Hence, for T 1 , ..., T n independent responses at locations x 1 , ..., x n , one obtains the likelihood • The posterior density of ε ε ε, given by Bayes formula is π( We leverage the fact that the posterior is known (up to a constant) to implement our density field estimation via a preconditioned Crank Nicholson algorithm (Cotter et al. [10]). This approach delivers a probabilistic prediction of (p x ) x∈D , and allows us to approximately sample from the SLGP posterior distribution and recommend new sampling locations.

Some first contribution in stochastic optimisation
One of the main challenges in optimisation is to define suitable sequential design strategies. Choosing where to add observations is indeed crucial and a good design strategy must achieve a trade-off between exploration and exploitation of the input space so as to discover regions of interest while avoiding getting trapped in the vicinity of local minimisers or of artefactual basins of attraction.
Deriving such strategies requires anticipating (probabilistically) the effect of adding new observations. To this end, meta-models are commonly used. We now present the considered optimisation problem, the criterion used to derive a sequential design strategy, and a stochastic simulation-based approach to estimate this criterion.

Optimisation problem considered
We recall that we are interested in minimising g(x) = ρ(µ x ), while we do not know the field µ x (or equivalently its density p x ). We consider G x , the random function obtained by applying ρ to the density valued field delivered by the SLGP model. G x is the model for g(x) induced by the SLGP p x (·).
In the rest of the paper we will consider the particular case where ρ is the median, but the presented approach is not restricted to this choice and can be applied to arbitrary (measurable) mappings, potentially also mappings depending on x.
Remark 3.1. G x remains uncertain knowing T n = t n because of the conditional variability of ε ε ε.
In the spirit of robust optimisation, we will consider the problem of minimising an α-quantile of the random function G x . Here the value of α is arbitrarily set to 0.9 and stays fixed through the optimisation procedure. We note Q n (x) the α-quantile of G x . Q n (x) is a random function as the observations T n are left in random form. We shall also consider Q n+K (x; x new ), the α-quantile of G x conditioned on past observations T n and on a batch of K i.i.d. observations to be made at x new ∈ D. Denoting E n [·] = E[·|T n = t n ], we consider: Remark 3.2. This criterion was inspired by the Expected Quantile Improvement presented in Picheny et al. [32], which would write here as E n [(min , but is modified in the spirit of knowledge gradient approaches from Frazier et al. [15] to account for improvements on the whole domain.

Simulation-based computation of criteria
Classically, in Sequential Uncertainty Reduction (SUR) (Bect et al. [2]) approaches, it is assumed that the function of interest is a realisation of a GP. Under these assumptions, several criteria enjoy (semi-)analytical forms, favouring criterion optimisation and the implementation of design strategies. However, in our situation, it does not appear feasible to obtain a closed-form formula for the considered EQI criterion and we therefore estimate it via stochastic simulation.
In order to quantify the effect of adding an observation T n+1 at a given location x new , one needs to study the probability density of a new observation T at x conditioned on T n = t n . In favour of the law of total probability, and by considering Equation 3, one finds that it is given by: π(t|T n = t n ) ∝ π(t| )π( |t n ) d ∝ p x (t; )π( |t n ) d .
This motivates the following approach, which can be considered as a basic application of Sequential Monte Carlo (Doucet et al. [12]), where we use a simple simulation-based particle filter to approximate an unknown future quantity: (1) The generative model given by the SLGP model is implemented within a MCMC framework and used to obtain N approximate realisations of ε ε ε|T n = t n , noted ( (j) ) 1≤j≤N .
The density of a new observation at x (See Equation 5) is approximated by the mixture N −1 N j=1 p x (·; (j) ).
(2) The impact of adding K observations at a given location In light of Equation 5, the response density at x conditional on past data and on the simulated batch is given by π(t|T n = t n , T new ; (j) ) and summing to one.
(3) The density field distribution after adding K observations at x new is predicted by

Applications
For all the coming applications, we consider a zero mean GP W x,t = p j=1 λ j e j (x, t)ε j , p ∈ N, with t and x being uni-variate. To ensure consistency with the rest of the document, we will keep the bold notation for x. To define the e j 's, we start from bi-variate Fourier functions of order q > 0: sine and cosine of 2π(ω 1 t + ω 2 x), where ω 1 and ω 2 are integers satisfying −q ≤ ω 1 , ω 2 ≤ q. Then, we remove redundant terms as well as those irrelevant in the SLGP setting (i.e. functions independent of t, that would be simplified with the normalisation of the process) and set the corresponding λ 2 j to be 1/(1 + ω 1 + ω 2 ).

Some analytical applications
In the analytical applications, we have D = I = [0, 1] and consider four known density fields. The median functions of these fields appear in Figure 3 and are defined as f 1 (x) = 0.25 sin(16x+9)+0.25 sin(4.8x+2.7)+0.625 (minimum at x * ≈ 0.5095) and f 2 (x) = 0.15 + 7 72 (minimum at x * ≈ 0.7414). Our first class of probability density fields, that we will refer to as "truncated Gaussian fields", writes as h The truncation ensures that the distribution remains symmetrical around its median and mean f i . The second case -that we will refer to as "multi-modal field"-is of the same form yet with h x median-0 but multi-modal such as represented in Figure 3.
We perform density field estimation as presented in Section 2.3 for such reference fields for different sample sizes and order of the GP. In this section, we are not yet in an adaptive setting: each time a new observation is added to the model, its location is determined randomly, with uniform distribution over the index set. Figure 4 displays the posterior mean field with two sample sizes for the truncated Gaussian field with median f 1 . We observe in Figure 4 that a higher sample size seems to yield a better estimation. In order to quantify the prediction error for different sample sizes and GP's order, we define an integrated squared Hellinger distance to measure dissimilarity between two probability density valued fields p i = (p i x ) x∈D (i = 1, 2): In Figure 5, we display the distribution of d 2 ISH between true and estimated fields for various sample sizes and SLGP orders. As both functions yielded close results, we show only the results for f 1 . We see that the errors are comparable for small sample sizes. The order becomes limiting when more observations are available as those of the considered SLGPs relying on the smallest numbers of basis functions appear to struggle capturing small scale variations. Figure 5. Integrated squared Hellinger distance distribution for different sample sizes and process orders, when the reference field has f 1 as its median.

Test case: contaminant source localization under uncertain geology
We now consider a hydro-geological test case in which, given breakthrough curves at several monitoring depths, we want to localize the source depth of a contaminant injected in a saturated aquifer whose geological properties are highly uncertain. We use the term breakthrough map to speak of the stacked breakthrough curves at prescribed monitoring depths. Also, denote by observed the given breakthrough map, even if throughout the section these observed maps stem from simulations with prescribed source depth and geology. This inference is performed by measuring the dissimilarity (misfit) between observed and simulated breakthrough maps, and our goal here is to uncover the depth that yields simulations with minimal median misfit. Here the index x stands for a candidate source depth while the response t represents the dissimilarity between simulated and observed breakthrough maps. The stochasticity of the response is due to our imperfect knowledge of the geological properties of the aquifer, as sketched on Figure 6 and further detailed below. Figure 7 shows the misfit values obtained by comparing 10000 simulations (200 replications at 50 different source depth values) to two different observed breakthrough maps, as well as the posterior mean field obtained with a SLGP of order 121.
To perform a flow simulation with a source at depth x while accounting for the uncertainty regarding the geological properties of the domain, we rely on plausible geological realisations. To keep the problem simple, the zone of interest of the aquifer is modelled as a 2D vertical section (10 meter deep and 5 meter wide) aligned with the main flow direction. For each simulation performed, the plausible geological realisation used is a multiple-point statistics realisation generated with the Deesse algorithm (Mariethoz et al. [28]) that reproduce the complex features of braided-river aquifer models (Pirot et al. [35]). Then, the contaminant flow and transport is simulated under steady-state flow and fixed boundary conditions (constant hydraulic gradient) using the Maflot Matlab code (Künze and Lunati [23]). After running such a simulation for a contaminant released at time 0, width 0 and depth x, we obtain a concentration breakthrough map (such as depicted in Figure 6 under three different scenarios). Here 100 candidate observation depths, regularly spaced between 0m and 10m, are considered. Each breakthrough curve, i.e., section of a breakthrough map at a a prescribed depth, describes the contaminant concentration observed at width 5m for varying times (discretised over 101 points regularly spaced between 0s and 12.5s). Misfits between observed and simulated curves are defined here in terms of a re-scaled 2-norm distance.
Throughout the section and as illustrated in Figure 7, we use two reference fields of misfit distributions that are obtained by fitting SLGP models to misfit data produced under the two latent geologies of Figure 6. Using these two reference density fields to draw new samples, we follow the methodology introduced in part 3 and represent in Figure 8 the posterior mean field and its estimated median before and after running 25 steps of the algorithm on the first geological structure. In these settings, the global minimum (at 2,24m) is easily found and the algorithm focuses on improving its estimation of the median. This corresponds to an exploitation-oriented approach. Although not displayed here, we found out, as reflected in Figure 9 that with the other geological structure, our approach was also able to locate the minimum, this time by focusing on exploring by adding observations at new locations of interest.

Optimisation benchmark
For our small benchmark, the starting design consists in n = 20 data points (x i , t i ) from the reference fields heterogeneously scattered across space. Their location are independent uniformly distributed across parameter space. At each step, observations are added in batches of 20 at the same location. We repeat 24 independent instances of the optimisation process for each strategy and each application and compare the performances in term of optimality gap (difference between real and estimated optimal medians). This approach is favoured due to the relatively high cost of one evaluation of the EQI criterion for SLGP, but we expect it to be detrimental to our GP-based competitors, as GPs would benefit more from having scattered observations rather than batches scattered over different points.
We compare different strategies for modelling the field and choosing the next sampling location. The value of the minimiser is inferred by modelling the function of interest with one of three models: the first one, that will be called homoscedastic GP consists in a GP regression where the observation noise level is assumed to be uniform throughout the domain. The second one, a GP regression with input dependent noise rates as in Kersting et al. [22], Binois et al. [5] will be called heteroscedastic GP. The last one is the SLGP model. For each of these three models, we compare a non-adaptive approach (at each step, new observations are added at a location chosen uniformly at random) to an adaptive approach. The criteria used are: Approximated Knowledge Gradient (AKG) as implemented in the R package DiceOptim version 2.0.1 [34] for the homoscedastic GP, the Expected improvement, as implemented in the R package hetGP version 1.2.1 [4] for the heteroscedastic GP, and the criterion 4 of part 3 for the SLGP. The results of the benchmark are shown in Figure 9. Hyper-parameters of the two GP's are estimated by maximum likelihood. For the adaptive SLGP, we decided to display the results obtained when minimising a 90% quantile of the random median, as our first experiments showed no strong sensitivity to the chosen level α. On a personal laptop with 16Go RAM and a simple R implementation, performing the SLGP density estimation took between 13 and 24 minutes depending on the sample size, and inferring the EQI with 150 simulations for 101 locations took 5 more minutes.
One notices that in the most complex situations, the sampling scheme based on GP modelling of the functional performs worse than the approaches based on SLGP modelling. We found out that GP based approaches tends to get trapped in local optima. This might be explained by the credit allocation.

Conclusion and perspectives
We demonstrated how a spatial generalization of the Logistic Gaussian Process model can be used for sample-based modelling of distribution fields, and how the resulting probabilistic predictions of distribution fields can be leveraged for moderate-dimensional stochastic optimization problems under unknown heterogeneous noise distributions. In particular, we introduced a simple sequential Monte Carlo-based approach to quantify the impact of sampling at a location and showed, on a modest benchmark, that it enables implementing sequential design strategies trading off between exploration and exploitation.
The conducted experiments yielded promising results. However, further work is needed to more extensively compare algorithms with respect to the employed meta-models and criteria, respectively, and identify which of these degrees of freedom are most influential on performances. Tuning model settings and strategies appropriately could ultimately lead to substantial efficiency gains, be it towards stochastic optimization or broader uncertainty reduction goals. In particular, SLGP would benefit from more systematic model selection and parameter estimation approaches.
Upcoming research directions encompass making the SLGP model more suitable for higher dimensions and reducing associated computational costs, assessing the performance of the resulting probabilistic predictions, and deriving further uncertainty reduction strategies. It is also of interest to investigate theoretical properties of SLGP models and considered sequential strategies, notably in view of underlying mean and covariance functions.