Regression Monte Carlo for Microgrid Management

We study an islanded microgrid system designed to supply a small village with the power produced by photovoltaic panels, wind turbines and a diesel generator. A battery storage device is used to shift power from times of high renewable production to times of high demand. We introduce a methodology to solve the microgrid management problem using different variants of Regression Monte Carlo algorithms and use numerical simulations to infer results about the optimal design of the grid.


Introduction
A microgrid is a network of loads and energy generating units that often include renewable sources like photovoltaic (PV) panels and wind turbines alongside more traditional forms of thermal electricity production. Microgrids can be part of the main grid or isolated. Communities in rural areas of the world have long benefited from isolated microgrid systems, which provide a reliable and often environmentally friendly source of electricity to meet their power needs.
The elementary purpose of a microgrid is to provide a continuous electricity supply from the variable power produced by renewable generators while minimizing installation and running costs. In such systems, the uncertainty of both the load and the renewable production is high, and its negative effect on system stability can be mitigated by including a battery energy storage system in the microgrid. Energy storage devices ensure power quality, including frequency and voltage regulation (see Hayashi et al. (2017)), and provide backup power in case of any contingency. A dispatchable unit in the form of a diesel generator is also used as a backup solution and to provide baseload power.
In this paper, we consider a traditional microgrid serving a small group of customers in islanded mode, meaning that the network is not connected to the main national grid. The system consists of an intermittent renewable generation unit, a conventional dispatchable generator and a battery storage system. Both the load and the intermittent renewable production are stochastic, and we use a stochastic differential equation (SDE) to directly model the residual demand, that is, the difference between the load and the renewable production. We then set up a stochastic optimization problem whose goal is to minimize the cost of using the diesel generator plus the cost of curtailing renewable energy in case of excess production, subject to the constraint of ensuring a reliable energy supply. A Regression Monte Carlo method from the mathematical finance literature is used to solve this stochastic optimization problem numerically. Three variants of the regression algorithm, called Grid Discretisation, Regress Now and Regress Later, are proposed and compared in this paper. The numerical examples illustrate the performance of the optimal policies, provide insights on the optimal sizing of the battery, and compare the policies obtained by stochastic optimization to the industry standard, which uses deterministic policies.
The optimization problem arising from the search for a cost-effective control strategy has been extensively studied. Three recent survey papers, Olivares et al. (2014), Reddy et al. (2017) and Liang and Zhuang (2014), summarize different methods used for the optimal usage, expansion and voltage control of microgrids. Heymann et al. (2016) transform the optimization problem associated with microgrid management into an optimal control framework and solve it using the corresponding Hamilton-Jacobi-Bellman equation. Besides proposing an optimal strategy, the authors also compare the solutions of the deterministic and stochastic representations of the problem. However, like most PDE methods, this approach suffers from the curse of dimensionality and, as a result, is difficult to scale. The main contribution of this paper is to solve the microgrid control problem using Regression Monte Carlo algorithms. In contrast to existing approaches, the method used in this paper scales more easily and works well in moderately large dimensions (Bouchard and Warin (2012)).
Identifying the optimal mix, size and placement of the different components of a microgrid is an important challenge for its large-scale use. The papers Mashayekh et al. (2017b,a) use mixed-integer linear programming to address the design problem and test their model on a real data set from a microgrid in Alaska. In a similar work, Olatomiwa et al. (2015) studied the economically optimal mix of PV, wind, batteries and diesel for rural areas in Nigeria. In Haessig et al. (2015), the optimal battery storage sizing is deduced from the autocorrelation structure of renewable production forecast errors. In this paper, we propose an alternative approach to the optimal sizing of the battery energy storage system, assuming stochastic load dynamics and a fixed battery lifetime. Our in-depth analysis of the system behavior leads to practical guidelines for the design and control of islanded microgrids.
Finally, several authors (Ding et al. (2012, 2015); Collet et al. (2017)) used stochastic control techniques to determine optimal operation strategies for wind production-storage systems with access to energy markets. In contrast to these papers, in the present study energy prices appear only as constant penalty factors in the cost functional, and the main focus is on the stable operation of the microgrid without blackouts.
The rest of the paper is organized as follows. In section 2 we describe the microgrid model and introduce the different components of the system. In section 3 we translate the problem of managing the microgrid into a stochastic optimization problem and present the dynamic programming equation that we intend to solve numerically. Section 4 introduces the numerical algorithms used to solve the control problem: we give a general framework for solving the dynamic programming equation and then provide three algorithms for the approximation of conditional expectations. In section 5 we illustrate the results of the numerical experiments, identify the best algorithm among those we studied and then employ it to analyze the system behavior. We conclude with section 6, where the estimated policy for the stochastic problem is compared, in an appropriate manner, with a deterministically trained one; the aim is to provide evidence that the deterministic approaches widespread in industry underperform stochastic methods.

Model description
In this section, we discuss the topology of the microgrid, its operation, its components and their respective dynamics. Although we discuss a simplified microgrid model, more complicated topologies can be studied using straightforward generalizations of the methods presented in this paper.
Consider a microgrid serving a small, isolated village; most of the power to the village is supplied by generating units whose output has zero marginal cost, is intermittent and is uncontrolled. Additional power is supplied by a controlled generator whose operation incurs a cost for the microgrid owner (either the community itself or a power utility). Often the intermittent units include PV panels and wind turbines, while the controlled unit is a diesel generator. To fully exploit the free power generated by the renewable units at times when production exceeds demand, microgrids are equipped with energy storage devices, here represented by a battery energy storage system.
The introduction of the battery not only allows for the inter-temporal transfer of energy from times when demand is low to times when it is higher, but also introduces an element of strategic behavior that the system controller can employ to minimize operational costs. Without energy storage, the diesel generator has to run whenever demand exceeds production. When a battery is installed, the intensity and timing of the diesel generator output can be adjusted to move the state of charge of the battery towards the most cost-effective levels.
In figure 1 we propose a schematic description of the system which might help the reader to familiarize themselves with the microgrid, whose components are described more in depth in the following subsections.
Remark 1. For convenience, in the following we work in discrete time only. This setting is not restrictive since, in reality, measurements of the system are repeated at a given, finite frequency. We also consider a finite optimization horizon, represented by the number of periods $T$ over which we want to optimize the system operations.

Residual Demand
Consider two stochastic processes $L_t$ and $R_t$: the former represents the demand (load) and the latter the production of the renewable generators. Both processes are uncontrolled and represent, respectively, the unconditional withdrawal and injection of power in the system (constant during each time step). For the purpose of managing the microgrid, the controller is interested only in the net effect of the two processes, denoted by $X_t$:

$$X_t := L_t - R_t. \qquad (1)$$

Remark 2. The state variable $X_t$ represents the residual demand of power at each time $t$: for $X_t > 0$ we must provide power through the battery or the diesel generator, while for $X_t < 0$ we can store the extra power in the battery.

Figure 1: The network is arranged as follows: photovoltaic panels and wind turbines provide renewable generation, a diesel generator provides dispatchable power for the village and a battery storage system is used to inject or withdraw energy.
For simplicity, we model the residual demand as an AR(1) process, the discrete-time equivalent of an Ornstein-Uhlenbeck process. In practical applications we expect $X_t$ to be an $\mathbb{R}$-valued mean-reverting process with many different sources of noise and time-dependent random parameters; our formulation avoids cumbersome notation by using constants in place of stochastic processes, while still leaving scope for generalization. The process $X_t$ is driven by the following difference equation, starting from an initial point $X_0 = x_0$:

$$X_{t+1} = X_t + b\,(\Lambda_t - X_t)\,\Delta t + \sigma\sqrt{\Delta t}\,\xi_t,$$

where $\xi_t \sim \mathcal{N}(0, 1)$, $\Delta t$ is the amount of time before new information is acquired, $b$ is the mean-reversion speed, $\sigma$ the volatility of the process and $\Lambda_t$ the time-dependent mean-reversion level.
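The residual demand dynamics can be simulated path by path. The following Python sketch assumes the Euler-type AR(1) form $X_{t+1} = X_t + b(\Lambda_t - X_t)\Delta t + \sigma\sqrt{\Delta t}\,\xi_t$ with independent standard normal $\xi_t$; the function name `simulate_residual_demand` and the parameter values in the usage lines are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np


def simulate_residual_demand(x0, lam, b, sigma, dt, n_paths, rng):
    """Simulate AR(1) residual-demand paths (assumed Euler form).

    lam holds the mean-reversion levels Lambda_t, one per time step.
    Returns an array of shape (n_paths, len(lam) + 1); column 0 equals x0.
    """
    n_steps = len(lam)
    x = np.empty((n_paths, n_steps + 1))
    x[:, 0] = x0
    for t in range(n_steps):
        xi = rng.standard_normal(n_paths)  # xi_t ~ N(0, 1), independent across t
        x[:, t + 1] = x[:, t] + b * (lam[t] - x[:, t]) * dt + sigma * np.sqrt(dt) * xi
    return x


# Illustrative (hypothetical) parameters with a constant forecast Lambda_t = 0.
rng = np.random.default_rng(0)
paths = simulate_residual_demand(x0=2.0, lam=np.zeros(200), b=0.5, sigma=1.0,
                                 dt=0.5, n_paths=10_000, rng=rng)
```

With a constant level $\Lambda_t = \Lambda$ and $0 < b\Delta t < 2$, the simulated paths settle around $\Lambda$ with stationary variance $\sigma^2\Delta t / (1 - (1 - b\Delta t)^2)$, which gives a quick sanity check on any simulated sample.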
Remark 3. In real applications the function Λ t should represent the best forecast available for future residual demand at the time of the estimation of the policy.

Diesel generator
The diesel generator represents the controlled dispatchable unit. The state of the generator is represented by $m_t \in \{0, 1\}$: if $m_t = 0$ the diesel generator is OFF, while it is ON when $m_t = 1$. When the engine is ON, it produces a power output $d_t \in [d_{\min}, d_{\max}]$ at time $t$, with $d_{\min} > 0$.
Notice that, in addition, when the engine is turned ON, an extra amount of fuel is burned for the generator to warm up and reach its working regime. We model the cost of this extra fuel with a switching cost $K$ that is paid every time the switch changes from 0 to 1. The fuel consumption of the diesel generator is modeled by an increasing function $\rho(d_t)$, which maps the power $d_t$ produced during one time step into the quantity of diesel necessary for that output. Denoting by $P_t$ the price of fuel at time $t$, the cost of producing $d_t$ kW of power over one time step is $P_t\,\rho(d_t)$; for simplicity we take a constant fuel price $P_t = p$. Two examples of efficiency functions $\rho$ are shown in Figure 2.

Figure 2: Two examples of fuel consumption functions $\rho$. On the left, a function typical of a generator designed to operate at medium regime; on the right, $\rho(d) = d^{0.9}$, typical of a generator designed to operate at full capacity.
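A minimal sketch of the two generator cost terms, assuming the constant fuel price $P_t = p$. The helper names are hypothetical, and $\rho(d) = d^{0.9}$ is the full-capacity example given for Figure 2.

```python
def fuel_cost(d, p, rho):
    """Fuel cost p * rho(d) of producing power d during one time step (0 when OFF)."""
    return p * rho(d) if d > 0 else 0.0


def switching_cost(m_prev, m_next, K):
    """Start-up cost K, paid only when the generator switches from OFF (0) to ON (1)."""
    return K if (m_prev == 0 and m_next == 1) else 0.0


def rho_full(d):
    """Concave consumption curve rho(d) = d**0.9 (full-capacity example)."""
    return d ** 0.9


# Fuel per kW decreases with the output level, so running near full capacity is cheaper per kW.
print(rho_full(10.0) / 10.0 < rho_full(5.0) / 5.0)  # True
```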

Dynamics of the Battery
The storage device is directly connected to the microgrid, and therefore its output is equal to the imbalance between the demand $X_t$ and the diesel generator output $d_t$, whenever the physical constraints allow it. The battery is therefore discharged when the diesel output is insufficient and charged when the diesel generator and the renewables provide a surplus of power.
Let us denote the power output of the battery by $B^d_t$ and its power rating by $B_{\max}$ and $B_{\min}$, which represent respectively the maximum and minimum output. Thus $B_{\min} \le B^d_t \le B_{\max}$. The case $B^d_t < 0$ means that the battery is charging, while $B^d_t > 0$ means that the battery is supplying power.
Notice also that an energy storage device has a limited capacity, above which it cannot be charged further, as well as an "empty" level below which no more power can be provided by the battery. We denote the state of charge by the controlled process $I^d_t$, which is described by the following equation:

$$I^d_{t+1} = I^d_t - B^d_t\,\Delta t,$$

where $I^d_t \in [0, I_{\max}]$ and $B^d_t \in [B_{\min}, B_{\max}]$, for $B_{\min} < 0$ and $B_{\max} > 0$. For simplicity we assume that the battery is 100% efficient. Notice that we use the superscript $d$ on $B^d$ and $I^d$ to highlight the dependence of these processes on the controlled diesel output $d_t$.
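The state-of-charge recursion, together with the power and capacity limits, can be sketched as a single step function. The clipping rule below is an assumption made for illustration: the battery absorbs the imbalance $X_t - d_t$ as far as both its power rating and its remaining capacity allow, so that the next state of charge stays in $[0, I_{\max}]$. The function name is hypothetical.

```python
def battery_step(x, d, w, dt, b_min, b_max, i_max):
    """Return (B, w_next): battery output B_t and next state of charge I_{t+1}.

    Assumption: B_t is the imbalance x - d clipped to the power rating
    [b_min, b_max] and to what the current charge w allows.
    """
    upper = min(b_max, w / dt)            # cannot discharge below empty
    lower = max(b_min, (w - i_max) / dt)  # cannot charge above full
    B = min(max(x - d, lower), upper)     # for w in [0, i_max], lower <= 0 <= upper
    w_next = w - B * dt                   # I_{t+1} = I_t - B_t * dt (100% efficiency)
    return B, w_next


print(battery_step(x=2.0, d=0.0, w=5.0, dt=1.0, b_min=-3.0, b_max=3.0, i_max=10.0))  # (2.0, 3.0)
```

When the battery is full and there is a surplus, this rule returns $B_t = 0$ and leaves the excess to be curtailed.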
Intuition tells us that the bigger the battery, the less diesel is needed to run the microgrid, because a bigger battery allows a larger proportion of the excess renewable power to be stored for later use. Batteries, however, are very expensive, and the cost per kWh of capacity scales almost linearly for the kind of devices we consider in this paper (parallel connection of smaller batteries); hence it is important to find the optimal battery size for the needs of each specific microgrid.

Management of the Microgrid
The purpose of the microgrid is to provide a cheap and reliable power supply that at least matches the demand. Therefore, we search for a control policy for the diesel generator which minimizes the operating cost and produces enough electricity to match the residual demand. To assess how well the demand is supplied, we introduce the controlled imbalance process $S_t$ defined as follows:

$$S_t := X_t - d_t - B^d_t.$$

Ideally, the owner of the microgrid would like to have $S_t = 0$ for all $t$; this situation represents the perfect balance of demand and generation. When $S_t > 0$ we observe a blackout: the residual demand is greater than the production, so some loads are automatically disconnected from the system. The situation $S_t < 0$ is defined as a curtailment of renewable resources and takes place when there is a surplus of electricity. We treat the two scenarios, blackout and curtailment, asymmetrically. To ensure that there are no blackouts and that power is supplied regularly, we impose a constraint on the set of admissible controls:

$$S_t \le 0 \quad \forall t.$$

However, for $S_t < 0$, i.e. a surplus of electricity, we penalize the microgrid with a proportional cost denoted by $C$. A large penalty leads to a low level of curtailment, and $C$ can be thought of as a parameter of the subsequent optimization problem.
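The imbalance, the no-blackout check and the curtailment penalty can be computed as follows. This sketch assumes, for illustration, a clipping rule for $B^d_t$ in which the battery absorbs $X_t - d_t$ within its power and capacity limits; all function names are hypothetical.

```python
def imbalance(x, d, w, dt, b_min, b_max, i_max):
    """S_t = X_t - d_t - B_t, with B_t given by an assumed clipping rule."""
    upper = min(b_max, w / dt)
    lower = max(b_min, (w - i_max) / dt)
    B = min(max(x - d, lower), upper)
    return x - d - B


def is_admissible(S, tol=1e-12):
    """No-blackout constraint S_t <= 0 (small tolerance for rounding)."""
    return S <= tol


def curtailment_cost(S, C):
    """Proportional penalty C * |S_t|, charged only on a surplus (S_t < 0)."""
    return C * (-S) if S < 0 else 0.0


limits = dict(dt=1.0, b_min=-3.0, b_max=3.0, i_max=10.0)
S = imbalance(x=5.0, d=0.0, w=5.0, **limits)   # demand exceeds the battery power rating
print(S, is_admissible(S))                      # 2.0 False -> d = 0 would cause a blackout
```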
A rigorous mathematical description of the microgrid management problem follows in section 3.

Stochastic optimization problem
We now state the stochastic control problem for the diesel generator operating in the microgrid system described in section 2. We seek a control that minimizes the cost of diesel usage $p\,\rho(d)$, the switching cost $K$ and the curtailment cost $C|S_t|\mathbf{1}_{\{S_t < 0\}}$, under the no-blackout constraint $S_t \le 0$. Note that, given the type of control we have on the diesel generator, we can frame the optimization problem as a special case of the stochastic control problems known as optimal switching problems.
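Let us denote by $\mathcal{F}_t$ the filtration generated by the residual demand process $(X_s)_{s=0}^t$, the state of charge process $(I^d_s)_{s=0}^t$ and the current regime $m_t$, which represents all the information available on the system up to time $t$. In practice, given the Markov property of the problem, $\mathcal{F}_t$ reduces to the σ-field generated by the triple $(X_t, I^d_t, m_t)$. Let us define the pathwise value $J$, given by

$$J(t, X_t, I_t, m_t; d_t) = \sum_{s=t}^{T-1} \Big[ p\,\rho(d_s) + K\,\mathbf{1}_{\{m_s = 0,\ m_{s+1} = 1\}} + C\,|S_s|\,\mathbf{1}_{\{S_s < 0\}} \Big] + g(I^d_T),$$

where $(X_t, I_t, m_t; d_t) = (X_s, I^d_s, m_s; d_s)_{s=t}^T$ and $g$ is the terminal condition. As a consequence, we define the value function as

$$V(t, x, w, m) = \inf_{(d_s)_{s=t}^{T-1}} \mathbb{E}\big[ J(t, X_t, I_t, m_t; d_t) \,\big|\, X_t = x,\ I^d_t = w,\ m_t = m \big], \qquad (8)$$

subject to, for $s = t, \dots, T-1$,

$$S_s \le 0, \qquad (9a)$$
$$d_s \in [d_{\min}, d_{\max}] \text{ when the generator is ON}, \quad d_s = 0 \text{ when it is OFF}, \qquad (9b)$$
$$B^d_s \in [B_{\min}, B_{\max}], \quad I^d_{s+1} \in [0, I_{\max}], \qquad (9c)$$

where (9a) represents the blackout constraint translated into a constraint on the power produced by the diesel generator, (9b) represents the minimum and maximum power output of the generator and (9c) models the physical constraints of the battery: maximum input/output power and maximum capacity. From equation (8) we can write the associated dynamic programming formulation, which helps to understand the structure of the problem as the composition of two optimal control problems: an optimal switching problem between the regimes ON and OFF, and an absolutely continuous control problem when the regime is ON. The equation reads as follows:

$$V(t, x, w, m) = \min_{m' \in \{0, 1\}} \inf_{d \in U_t} \Big\{ p\,\rho(d) + K\,\mathbf{1}_{\{m = 0,\ m' = 1\}} + C\,|S^d_t|\,\mathbf{1}_{\{S^d_t < 0\}} + \mathcal{C}(t, x, w, m'; d) \Big\}, \qquad V(T, x, w, m) = g(w), \qquad (10)$$

where $d = 0$ when $m' = 0$,

$$\mathcal{C}(t, x, w, m'; d) = \mathbb{E}\big[ V(t+1, X_{t+1}, I^d_{t+1}, m') \,\big|\, X_t = x,\ I^d_t = w,\ d_t = d \big]$$

is the conditional expectation of the future costs, and $U_t$ is the collection of admissible controls $d$ at each time step $t$, i.e.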
$$U_t := \{ d_t : \text{equations (9a) to (9c) are satisfied and } d_t \text{ is adapted to } \mathcal{F}_t \}.$$
In order to ensure that the set of admissible controls is nonempty, we introduce the following assumption.
Assumption 1. The diesel generator is powerful enough to supply the demand at all times, i.e. there is always a control $d$ that satisfies the blackout constraint.
Remark 4. We enforce Assumption 1 by redefining the residual demand process as a truncated version of (1), such that $\bar X_t = \min(X_t, X_{\max})$ is the residual demand. In practice this is reasonable because the maximum power that could be required from the microgrid is known a priori, and the diesel generator is generally sized to the maximum capacity installed on the system. For the sake of notational simplicity, we drop the bar on $\bar X_t$ in the following sections.
Note that (10) provides a direct technique to solve problem (8): iterate backward in time from a known terminal condition and solve a static, one-period optimization problem at each time step. The only difficulty in this procedure lies in the estimation of the conditional expectations of the future value function, which cannot be computed exactly. In section 4 we focus on the numerical solution of (8).
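To make the backward recursion concrete, here is a sketch of the static one-period problem solved at a single state $(x, w, m)$. It assumes a continuation estimate `cont(m_next, w_next)` already evaluated at the current $x$ (a hypothetical interface standing in for the conditional expectation of future costs), searches the ON output over a uniform grid of $[d_{\min}, d_{\max}]$ instead of solving the continuous minimization exactly, and uses an assumed battery clipping rule. Candidates with a positive imbalance are discarded as blackouts, i.e. as controls outside $U_t$.

```python
import numpy as np


def one_step_decision(x, w, m, cont, p, rho, K, C, d_min, d_max,
                      dt, b_min, b_max, i_max, n_grid=51):
    """Cheapest admissible (value, m_next, d) for the one-period problem at (x, w, m).

    cont(m_next, w_next): hypothetical continuation-value estimate at the current x.
    Candidates are OFF (d = 0) and ON with d on a uniform grid of [d_min, d_max].
    """
    best = (np.inf, None, None)
    candidates = [(0, 0.0)] + [(1, float(d)) for d in np.linspace(d_min, d_max, n_grid)]
    for m_next, d in candidates:
        upper = min(b_max, w / dt)
        lower = max(b_min, (w - i_max) / dt)
        B = min(max(x - d, lower), upper)                # assumed battery clipping rule
        S = x - d - B
        if S > 1e-12:                                    # blackout: d is not in U_t
            continue
        cost = p * rho(d) if m_next == 1 else 0.0        # fuel
        cost += K if (m == 0 and m_next == 1) else 0.0   # start-up
        cost += C * max(-S, 0.0)                         # curtailment
        total = cost + cont(m_next, w - B * dt)
        if total < best[0]:
            best = (total, m_next, d)
    return best


params = dict(p=1.0, rho=lambda d: d ** 0.9, K=5.0, C=10.0, d_min=1.0, d_max=5.0,
              dt=1.0, b_min=-3.0, b_max=3.0, i_max=10.0)
no_future = lambda m_next, w_next: 0.0   # e.g. the last step with a zero terminal cost
print(one_step_decision(x=1.0, w=5.0, m=0, cont=no_future, **params))  # (0.0, 0, 0.0)
```

With a charged battery the generator stays OFF; with an empty battery the smallest grid output that covers the demand is selected, and the start-up cost $K$ is added only when the generator was previously OFF.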

Numerical Resolution
In this section we describe the algorithm which we want to employ in the solution of the energy management problem for the Microgrid system described in section 3. The main mathematical difficulty comes from the approximation of conditional expectations in (10), which we will tackle using a family of methods called Regression Monte Carlo.
The algorithm we propose fully exploits the dynamic programming formulation (10): we start by generating a set of simulations (scenarios) of the process $X$, which we refer to as training points, and then we optimize our policy so that it performs well on average (weighted by the probability of each scenario) over the different scenarios.
In practice, we initialize the value function at the last time step of the backward procedure to be equal to the terminal condition $g$. We then iterate backward in time and, at each time step and for each training point, we choose the control that minimizes the sum of the one-step cost function and the estimated conditional expectation of future costs $\tilde C(t, x, w, m; d)$. Note that, as expected, the conditional expectation is a function of time, of the state of the system $(x, w)$, of the state of the diesel generator, represented by the ON/OFF switch $m$, and of the control $d$.
When the iteration reaches the initial time, we have collected a set of optimal actions for each time step and many different scenarios; in addition, since the problem is Markovian, we can summarize such strategies in the form of control maps: the best action at each time $t$ given the pair of state variables $(X_t, I_t)$ and the state of the diesel generator $m_t$. We propose three different techniques to compute $\tilde C$ in section 4.1.
A fair assessment of the quality of the control policies approximated by the algorithm just introduced is obtained by running a number of forward Monte Carlo simulations of the residual demand, controlling the system using such policies and then taking the average performance.
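A sketch of this forward evaluation, returning the average cost over the simulated paths together with its standard error. The feedback interface `policy(t, x, w, m) -> (m_next, d)`, the zero default terminal cost and the battery clipping rule are illustrative assumptions; blackouts are not checked here because an admissible policy should not produce them.

```python
import numpy as np


def evaluate_policy(policy, x_paths, w0, m0, p, rho, K, C, dt, b_min, b_max, i_max,
                    g=lambda w: 0.0):
    """Monte Carlo estimate (mean, standard error) of the cost of a feedback policy.

    policy(t, x, w, m) -> (m_next, d) is a hypothetical control-map interface;
    x_paths has shape (n_paths, n_steps + 1) and holds out-of-sample residual demand.
    """
    n_paths, n_cols = x_paths.shape
    costs = np.empty(n_paths)
    for j in range(n_paths):
        w, m, total = w0, m0, 0.0
        for t in range(n_cols - 1):
            x = x_paths[j, t]
            m_next, d = policy(t, x, w, m)
            upper = min(b_max, w / dt)
            lower = max(b_min, (w - i_max) / dt)
            B = min(max(x - d, lower), upper)      # assumed battery clipping rule
            S = x - d - B
            total += p * rho(d) if m_next == 1 else 0.0
            total += K if (m == 0 and m_next == 1) else 0.0
            total += C * max(-S, 0.0)
            w, m = w - B * dt, m_next
        costs[j] = total + g(w)                    # terminal condition
    return costs.mean(), costs.std(ddof=1) / np.sqrt(n_paths)


follow_load = lambda t, x, w, m: (1, x)            # diesel tracks the residual demand exactly
print(evaluate_policy(follow_load, np.ones((2, 3)), w0=5.0, m0=0, p=2.0,
                      rho=lambda d: d ** 0.9, K=5.0, C=10.0, dt=1.0,
                      b_min=-3.0, b_max=3.0, i_max=10.0))
```

In the usage example every path pays one start-up and two steps of fuel, so the mean cost is 9 with zero standard error.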
Algorithm 1 gives a general description of the procedure in pseudocode.
Remark 5. It is typical of Regression Monte Carlo algorithms to provide the optimal policy only implicitly, in the form of the minimizer of an explicit parameterized function. The outputs of the algorithm are therefore the parameters (regression coefficients) of such a function.

Regression for continuation value
In this section we present the numerical techniques we use to estimate the conditional expectations $\mathcal{C}(t, x, w, m; d)$ in Algorithm 1. These techniques belong to the realm of Regression Monte Carlo methods, and these specific variants allow us to deal with degenerate controlled processes (the inventory). We focus on two main variants: a two-dimensional approximation of the conditional expectation and a discretisation technique which considers a collection of one-dimensional approximations.
In particular, we test three algorithms: Grid Discretisation (GD), Regress Now and Regress Later. Grid Discretisation is characterized by a one-dimensional projection in the residual demand dimension, repeated at different inventory points. Regress Now and Regress Later, on the other hand, use a two-dimensional regression in residual demand and inventory. Moreover, while Grid Discretisation and Regress Now require the projection of the value function at $t+1$ on $\mathcal{F}_t$-measurable basis functions, Regress Later requires an $\mathcal{F}_{t+1}$ projection. For details on these techniques see Balata and Palczewski (2017) for Regress Later, Boogert and de Jong (2008) and Warin (2012) for GD, and Carmona and Ludkovski (2010) for 2D Regress Now. Note that in all three algorithms we repeat the regression approximation for both values of $m$. An open-source platform to numerically solve a wide variety of stochastic optimization problems has also been developed in Gevret et al. (2016).
Let us denote by $\{X^j_t\}_{j=1}^M$ the collection of training points at time $t$; similar notation is used for the inventory, $\{I^j_t\}_{j=1}^M$.

Algorithm 1 (lines of the original listing that could not be recovered are marked "...")
    ...
    Generate a customary grid $\{w_0, \dots, w_D\}$ of points over the domain of $I_t$;
4:  Simulate $\{X^j_t\}_{j=1,t=1}^{M',N}$ according to its dynamics, where $M' = M/(D+1)$;
5:  Define $\{X^j_t, I^j_t\}_{j=1}^M$ as the cross product of $\{X^j_t\}_{j=1}^{M'}$ and $\{w_i\}_{i=0}^D$, for all $t$;
6:  if Regression 2D then
7:    if Regress Later then
8:      Generate $\{X^j_t, I^j_t\}_{j=1,t=1}^{M,N}$ according to a distribution $\mu$;
9:    if Regress Now then
10:     Generate $\{X^j_t\}_{j=1,t=1}^{M,N}$ according to its dynamics and $\{I^j_t\}_{j=1,t=1}^{M,N}$ according to a distribution $\mu$;
11: Initialize the value function $V(N, X^j_N, I^j_N, 1) = V(N, X^j_N, I^j_N, 0) = g(I^j_N)$, for all $j = 1, \dots, M$;
12: for $t = N$ to $1$ do
13:   Compute the approximated continuation value $\tilde C$ using Algorithm 2 or 3;
14:   for $j = 1$ to $M$ do
15:     for $m = 0$ to $1$ do
        ...
16: for $j = 1$ to $M$ do
    ...
22:   $F_1 = \tilde C(X^j_t, I^j_t; 0, 0)$;
23:   ...
24:   $m^j_{t+1} = \mathbf{1}_{\{(0 \notin U_t) \text{ or } (0 \in U_t \text{ and } F_2 < F_1)\}}$;
25:   if $m^j_{t+1} = 1$ then
26:     compute $X^j_{t+1}$ and $I^j_{t+1} = I^j_t - B^d_t \Delta t$;
27:   ...
28: ... $\frac{1}{M} \sum_{j=1}^M \big( J^j_N + g(I^j_N) \big)$
output: control policy $\{d_t\}$, value function $V$.
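The training-point generation in lines 4 to 10 of Algorithm 1 can be sketched for a single time step as follows. We assume an equally spaced inventory grid for Grid Discretisation and read "distributed according to the Lebesgue measure" as uniform sampling on the stated intervals; the function name and the method labels are hypothetical.

```python
import numpy as np


def training_points(method, M, D, x_sim, i_max, x_max, rng):
    """Training pairs (X_t^j, I_t^j), j = 1..M, at one time step.

    method: "gd" (Grid Discretisation), "now" (Regress Now) or "later" (Regress Later).
    x_sim: residual-demand values at time t simulated from the dynamics of X.
    """
    if method == "gd":
        m_prime = M // (D + 1)                    # M' = M / (D + 1) simulated values
        grid = np.linspace(0.0, i_max, D + 1)     # w_0 = 0, ..., w_D = I_max
        xs = np.repeat(x_sim[:m_prime], D + 1)    # cross product with the grid
        ws = np.tile(grid, m_prime)
        return xs, ws
    if method == "now":
        # X from its dynamics, I uniform on [0, I_max]
        return x_sim[:M], rng.uniform(0.0, i_max, M)
    if method == "later":
        # (X, I) uniform on [-X_max, X_max] x [0, I_max]
        return rng.uniform(-x_max, x_max, M), rng.uniform(0.0, i_max, M)
    raise ValueError(f"unknown method: {method}")
```

The cross-product design pairs every simulated $X^j_t$ with every grid level, which is what allows Grid Discretisation to run one regression per inventory level.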

Grid Discretisation
Grid discretisation is characterized by a one-dimensional approximation of the conditional expectation repeated at different levels of inventory. Let $\Upsilon_I = \{w_0 = 0, \dots, w_D = I_{\max}\}$ be a discretisation of the state space of the inventory and let $\{X^j_t\}_{j=1,t=1}^{M,N}$ be generated from a forward simulation of the dynamics of $X$. We define the approximation of the continuation value on the grid $\Upsilon_I$ by regressing the set of value functions $\{V(t+1, X^j_{t+1}, w_i, m)\}_{j=1}^M$ over the basis functions $\{\phi_k(x)\}_{k=1}^K$, for each $i = 0, \dots, D$ and $m \in \{0, 1\}$:

$$\hat C(t, x, w_i; m) = \sum_{k=1}^K \alpha^t_{k,i,m}\, \phi_k(x),$$

where we compute the collection of regression coefficients through least-squares minimization

$$\alpha^t_{i,m} = \arg\min_{\alpha \in \mathbb{R}^K} \sum_{j=1}^M \Big( V(t+1, X^j_{t+1}, w_i, m) - \sum_{k=1}^K \alpha_k\, \phi_k(X^j_t) \Big)^2,$$

with $\mathbb{R}^K \ni \alpha^t_{i,m} = (\alpha^t_{1,i,m}, \dots, \alpha^t_{K,i,m})$. Note that the least-squares projection is a sample estimate of the $L^2$ projection induced by the conditional expectation; for this reason we can approximate the function $\mathcal{C}(t, \cdot)$ using a least-squares projection of the value function at time $t+1$. However, as we have not included the inventory in the basis functions, we need to interpolate between the values of $\hat C(t, x, w_i; m)$ in order to obtain an estimate for inventory levels in $(w_i, w_{i+1})$. Let us define by $\tilde C(t, x, w; m, d)$ the linear interpolation

$$\tilde C(t, x, w; m, d) = \omega(t, w, d)\, \hat C(t, x, w_i; m) + \big(1 - \omega(t, w, d)\big)\, \hat C(t, x, w_{i+1}; m), \qquad w - B^d_t \Delta t \in [w_i, w_{i+1}],$$

where

$$\omega(t, w, d) = \frac{w_{i+1} - (w - B^d_t \Delta t)}{w_{i+1} - w_i}$$

and $i = 0, \dots, D-1$. Details of the algorithm are given in Algorithm 2.
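The projection and the interpolation can be sketched with NumPy's least-squares solver. We assume monomial basis functions $\phi_k(x) = x^k$ and a fixed regime $m$; `np.interp` performs the linear interpolation in inventory and, outside $[w_0, w_D]$, returns the end values. Function names are hypothetical.

```python
import numpy as np


def poly_basis_1d(x, degree):
    """Monomial basis phi_k(x) = x**k, k = 0..degree (an assumed choice of basis)."""
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)


def gd_coefficients(x_t, v_next, degree):
    """Least-squares coefficients alpha_{i,m}, one column per grid level w_i.

    x_t: (M',) residual demand X_t^j; v_next: (M', D + 1) array with entries
    V(t+1, X_{t+1}^j, w_i, m) for a fixed regime m.
    """
    alpha, *_ = np.linalg.lstsq(poly_basis_1d(x_t, degree), v_next, rcond=None)
    return alpha  # shape (degree + 1, D + 1)


def gd_continuation(x, w_next, alpha, grid, degree):
    """Interpolate C_hat(t, x, w_i; m) linearly in inventory at w_next = w - B_t * dt."""
    c_hat = (poly_basis_1d([x], degree) @ alpha)[0]   # C_hat at every grid level w_i
    return float(np.interp(w_next, grid, c_hat))


# Synthetic, noise-free usage: V = x + 2w is reproduced exactly by the basis.
rng = np.random.default_rng(1)
x_t = rng.normal(size=200)
grid = np.linspace(0.0, 3.0, 4)
v_next = x_t[:, None] + 2.0 * grid[None, :]
alpha = gd_coefficients(x_t, v_next, degree=2)
```

Passing the whole matrix `v_next` to a single `lstsq` call fits all $D + 1$ grid levels at once, since they share the same design matrix.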

2D Regression
Contrary to the grid discretisation approach, the 2D regression methods approximate the conditional expectation of the value function as a surface, a function of both the residual demand $X$ and the inventory $I$, without the need for interpolation. In the problem we consider, the control only acts on a degenerate (deterministic) process, and we can therefore test two specifications of the method: "Regress Now", where we project over $\{\phi_k(X_t, I_{t+1})\}_{k=1}^K$, and "Regress Later", where we project over $\{\phi_k(X_{t+1}, I_{t+1})\}_{k=1}^K$. The names Regress Now and Regress Later refer to the time index of the exogenous variable $X$ used in the projection.
In Regress Now, we generate training points $\{X^j_t\}_{j=1,t=1}^{M,N}$ from a forward simulation of the dynamics of $X$ and $\{I^j_t\}_{j=1,t=1}^{M,N}$ from a distribution $\mu_N$ on $[0, I_{\max}]$. In Regress Later, on the other hand, we generate both processes $\{X^j_t, I^j_t\}_{j=1,t=1}^{M,N}$ from an appropriate distribution $\mu_L$; for details see Balata and Palczewski (2017). In the following we treat the two approaches together by using the subscript $r$, with $r = t$ for the Regress Now algorithm and $r = t+1$ for Regress Later. As training measures we choose $\mu_N$ to be the Lebesgue measure on $[0, I_{\max}]$ and $\mu_L$ to be the Lebesgue measure on $[0, I_{\max}] \times [-X_{\max}, X_{\max}]$.
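The regression coefficients in the 2D Regression Monte Carlo method are computed by least-squares projection as

$$\alpha^t_m = \arg\min_{\alpha \in \mathbb{R}^K} \sum_{j=1}^M \Big( V(t+1, X^j_{t+1}, I^j_{t+1}, m) - \sum_{k=1}^K \alpha_k\, \phi_k(X^j_r, I^j_{t+1}) \Big)^2,$$

where $\mathbb{R}^K \ni \alpha^t_m = (\alpha^t_{1,m}, \dots, \alpha^t_{K,m})$. Denoting by $\phi$ the vector $(\phi_1(\cdot), \dots, \phi_K(\cdot))^\top$, recall that the coefficients $\alpha^t_m$ can be computed explicitly by

$$\alpha^t_m = \Big( \sum_{j=1}^M \phi(X^j_r, I^j_{t+1})\, \phi(X^j_r, I^j_{t+1})^\top \Big)^{-1} \sum_{j=1}^M \phi(X^j_r, I^j_{t+1})\, V(t+1, X^j_{t+1}, I^j_{t+1}, m),$$

and therefore, even though the regression coefficients are random (they are sample average approximations of expectations with respect to the measure $\mu$), they are independent of $\mathcal{F}_t$. Given the previous remark, we can estimate the conditional expectation of the future value through

$$\mathcal{C}(t, x, w; m, d) \approx \sum_{k=1}^K \alpha^t_{k,m}\, \mathbb{E}\big[ \phi_k(X_r, I_{t+1}) \,\big|\, X_t = x,\ I_t = w,\ d_t = d \big].$$

The explicit value of $\mathbb{E}[\phi_k(X_r, I_{t+1}) \mid X_t = x, I_t = w, d_t = d]$ now depends on $r$, i.e. on whether we use Regress Now or Regress Later to deal with the uncontrolled residual demand. In the first case, from the measurability of $X_t$, we simply obtain

$$\mathbb{E}\big[ \phi_k(X_t, I_{t+1}) \,\big|\, \mathcal{F}_t \big] = \phi_k(x, w - B^d_t \Delta t) =: \hat\phi_k(x, w, d).$$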
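In the second case we need to compute the expectation with respect to the randomness contained in the transition from $X_t$ to $X_{t+1}$, and we simply write

$$\hat\phi_k(x, w, d) := \mathbb{E}\big[ \phi_k(X_{t+1}, w - B^d_t \Delta t) \,\big|\, X_t = x \big].$$

Remark 6. For polynomial basis functions, i.e. $\phi_k(X_{t+1}, I_{t+1}) := X^p_{t+1} I^q_{t+1}$, the conditional expectation $\hat\phi_k(x, w, d)$ can be written in closed form as

$$\hat\phi_k(x, w, d) = (w - B^d_t \Delta t)^q\, \mathbb{E}\big[ X^p_{t+1} \,\big|\, X_t = x \big],$$

where the moment on the right-hand side is explicit for the Gaussian transition of the AR(1) residual demand.

Using the notation just introduced, we can summarize the differences between the two techniques in the following table:

Method        | $r$   | Training points                  | $\hat\phi_k(x, w, d)$
Regress Now   | $t$   | $X$ simulated, $I \sim \mu_N$    | $\phi_k(x, w - B^d_t \Delta t)$
Regress Later | $t+1$ | $(X, I) \sim \mu_L$              | $\mathbb{E}[\phi_k(X_{t+1}, w - B^d_t \Delta t) \mid X_t = x]$

Details of the algorithms are given in Algorithm 3.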
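The two 2D variants differ only in how $\hat\phi_k$ is evaluated, which the following sketch makes explicit. It assumes monomial basis functions $x^p i^q$ with $p + q \le$ `degree` and solves the least-squares problem with `np.linalg.lstsq`, which minimizes the same sum of squares without forming the normal equations explicitly. For Regress Later it assumes that $X_{t+1}$ given $X_t = x$ is Gaussian with mean `mu` and standard deviation `s` (for instance $\mu = x + b(\Lambda_t - x)\Delta t$ and $s = \sigma\sqrt{\Delta t}$ under an Euler-type AR(1) form). All function names are hypothetical.

```python
from math import comb

import numpy as np


def exponents(degree):
    """Exponent pairs (p, q) with p + q <= degree, so that phi_k(x, i) = x**p * i**q."""
    return [(p, q) for p in range(degree + 1) for q in range(degree + 1 - p)]


def poly_basis_2d(x, i, degree):
    """Design matrix with one column per monomial x**p * i**q."""
    x, i = np.asarray(x, dtype=float), np.asarray(i, dtype=float)
    return np.column_stack([x**p * i**q for p, q in exponents(degree)])


def fit_2d(x_r, i_next, v_next, degree):
    """alpha_m minimizing sum_j (V_j - alpha . phi(X_r^j, I_{t+1}^j))**2."""
    alpha, *_ = np.linalg.lstsq(poly_basis_2d(x_r, i_next, degree), v_next, rcond=None)
    return alpha


def gaussian_moment(p, mu, s):
    """E[X**p] for X ~ N(mu, s**2) = sum over even k of C(p, k) mu**(p-k) s**k (k-1)!!."""
    total = 0.0
    for k in range(0, p + 1, 2):
        dfact = 1
        for odd in range(k - 1, 0, -2):  # (k - 1)!!, equal to 1 for k = 0 and k = 2
            dfact *= odd
        total += comb(p, k) * mu ** (p - k) * s ** k * dfact
    return total


def continuation_now(x, w_next, alpha, degree):
    """Regress Now: sum_k alpha_k * phi_k(x, w - B * dt)."""
    return float((poly_basis_2d([x], [w_next], degree) @ alpha)[0])


def continuation_later(mu, s, w_next, alpha, degree):
    """Regress Later: sum_k alpha_k * (w - B * dt)**q * E[X_{t+1}**p | X_t = x] (Gaussian case)."""
    return float(sum(a * w_next**q * gaussian_moment(p, mu, s)
                     for a, (p, q) in zip(alpha, exponents(degree))))
```

For example, fitting `fit_2d` to noise-free samples of $V = 1 + x\,i$ recovers that surface, and `continuation_later` then returns $1 + \mu\,(w - B^d_t\Delta t)$, since only the first moment $\mathbb{E}[X_{t+1} \mid X_t = x] = \mu$ enters. Because the expectation over $X_{t+1}$ is taken analytically, no nested simulation is needed; the price is that the basis must admit closed-form conditional moments.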

Numerical Experiments
In this section we use the algorithms introduced in section 4 to solve a simple instance of the microgrid management problem. We fix some base parameters and test the three algorithms; the best-performing one is then used to study the sensitivity of the control policy and of the operational costs to changes in the system parameters, with the aim of gaining some insight into the optimal design of the microgrid. We now list the base parameters chosen for the numerical experiments; notice that the "s" column indicates whether a sensitivity analysis is run for that parameter. For the meaning of the parameters refer to section 2.
According to the parameter table above, and recalling Remark 4, the residual demand has the following dynamics:
where $\xi_t \sim \mathcal{N}(0, 1)$. We decided to use such simple dynamics for illustrative purposes, to make the sensitivity of the optimal control policy to the remaining parameters more straightforward to understand.

Figure 3: Estimated regression coefficients corresponding to the basis $\{x, i, xi\}$ in the case of 2D regression, and $\{x\}$ at three different inventory levels for GD, for $m_t = 1$ (panels (a), (b), (c)). Although we used basis functions up to polynomial degree 2, we present only a few coefficients for clarity of presentation. Notice that the time axis is inverted to show the number of time steps computed backward. Remarkably smooth coefficients are computed by the Regress Later algorithm.
Consider now that, for the parameters listed above, the problem is time homogeneous. We have also observed empirically that the estimated continuation values forget the terminal condition rather quickly. We show in Figure 3 that the regression coefficients of all algorithms converge to a stationary value as the backward iteration proceeds, suggesting that running the optimization over longer time horizons would not have any noticeable effect on the control policy. Since all three methods use a polynomial basis of degree two for the projection, the figure also allows an easy comparison of the dynamics of the coefficients across methods. For example, at inventory level $I = 0$ the coefficient of $x$ reaches the same stationary level for both Grid Discretisation and Regress Now. Although an exact comparison between Regress Now and Regress Later is not possible, we continue to observe similar signs and dynamics for each of the coefficients. However, the coefficients estimated by Regress Later are almost free of the noise that affects those of Regress Now.
As a result, we define a stationary policy $d(x, w, m)$, to be used over a longer time horizon than the one employed for its estimation. We tested the value of both the stationary policy and the time-dependent policy $d(t, x, w, m)$ and found that the performance of the stationary policy is comparable to that of the time-dependent one.

Analysis of the controllers
In this section we compare the control policies estimated by the three algorithms and we try to assess whether one of the approaches is preferable.

Control maps
We now compare the stationary control policies produced by the different algorithms; recall that these are feedback policies, i.e. they can be written as functions $d_m(x, w)$ of the state. Figure 4 displays an example of the feedback control policy in the form of a control map, a graphical representation of the value of the optimal control for each pair $(x, w)$. We observed that the three policies agree with the intuition that the diesel generator should produce more power when the residual demand is high and the inventory is low. We can also notice that the switching cost influences the policy, forcing the diesel generator to keep running for longer in order to charge the battery sufficiently and avoid turning the generator ON and OFF too often. Since little difference among the algorithms can be seen just by observing the control maps, we also display in Figure 4 the effect of the control policy on the state of charge of the battery. The estimated unconditional probability density of the process $I$ shows that the policies induced by Regress Now and Regress Later are very similar. Both induce a peculiar mass of probability around $I_n = 2.5$, which differentiates the behavior of the inventory from that obtained with Grid Discretisation. The distribution of the state of charge, obtained by plotting the histogram of all simulations over all time steps, shows that Regress Now and Regress Later do not fully exploit the whole inventory but are rather more conservative, saving energy to avoid turning ON the diesel generator in the future. In the next section we investigate the value associated with these control maps.
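The pooled histogram of the state of charge can be sketched as below, assuming the simulated inventories are stored as an array of shape (paths, time steps); `density=True` normalizes the histogram so that it integrates to one over $[0, I_{\max}]$. The function name is hypothetical.

```python
import numpy as np


def soc_density(i_paths, i_max, n_bins=20):
    """Empirical density of the state of charge, pooled over all paths and time steps."""
    density, edges = np.histogram(np.ravel(i_paths), bins=n_bins,
                                  range=(0.0, i_max), density=True)
    return density, edges
```

Pooling over time steps is what turns the simulated trajectories into an estimate of the unconditional distribution of $I$ under a given policy.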

Performance of the policies
In order to assess the performance of each policy in an unbiased manner, we select a collection of simulated paths of the residual demand process X, and record the costs associated with managing the microgrid as indicated by each control map.
We first study how the quality of each policy improves as we increase the computational budget given to each algorithm to compute the stationary policy. In Figure 5 we show the estimated value of the policy for the initial state $(x, i, m) = (0, 5, 0)$. For the 2D regressions we use polynomial basis functions of increasing degree, while for GD we increase the number of discretisation points for the inventory. In particular, we increase the computational time by providing the problem with more training points and more parameters in the definition of C, that is, by increasing the number of basis functions. Surprisingly, in the case of 2D regression, the performance of the estimated control improves only when polynomials of even degree are added, and the effect is more prominent for Regress Later.
The comparison shows that Grid Discretisation converges quickly, making it the best algorithm in terms of the trade-off between running time and precision. Among the 2D regressions, we observe a similar bias for Regress Now and Regress Later (not displayed to keep the presentation clear, but available on request); however, the latter has a lower standard error. This is not surprising: Regress Later has only one source of approximation error, the finite set of basis functions, while Regress Now has two, the finite set of basis functions and the pathwise estimation of the conditional expectation.
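The difference between the two error sources can be seen on a single time step. The sketch below assumes one Gaussian transition $X_{t+1} \mid X_t \sim \mathcal{N}(m(X_t), s^2)$ with $m(x) = x + b(\Lambda - x)\Delta t$ and $s = \sigma\sqrt{\Delta t}$, and a hypothetical payoff $V(y) = \max(y, 0)$ chosen because $\mathbb{E}[V(X_{t+1}) \mid X_t]$ has a closed form. Regress Now projects $V(X_{t+1})$ onto polynomials of $X_t$; Regress Later fits $V$ on polynomials of $X_{t+1}$ and then takes the conditional expectation of each basis function exactly, using Gaussian raw moments.

```python
import math
import numpy as np


def gaussian_moments(m, s, degree):
    """E[Y^k], k = 0..degree, for Y ~ N(m, s^2); recursion E[Y^k] = m E[Y^(k-1)] + (k-1) s^2 E[Y^(k-2)]."""
    m = np.asarray(m, dtype=float)
    moments = [np.ones_like(m), m.copy()]
    for k in range(2, degree + 1):
        moments.append(m * moments[k - 1] + (k - 1) * s**2 * moments[k - 2])
    return np.stack(moments[: degree + 1], axis=-1)


def normal_cdf(z):
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))


b, mean_level, dt, sigma, degree = 0.5, 0.0, 1.0, 1.0, 4  # illustrative values
s = sigma * math.sqrt(dt)


def cond_mean(x):
    return x + b * (mean_level - x) * dt


rng = np.random.default_rng(0)
x_now = rng.standard_normal(20000)
x_next = cond_mean(x_now) + s * rng.standard_normal(x_now.size)
v_next = np.maximum(x_next, 0.0)

# Regress Now: basis of X_t; noise from the pathwise conditional expectation enters the fit
beta_now, *_ = np.linalg.lstsq(np.vander(x_now, degree + 1, increasing=True), v_next, rcond=None)
# Regress Later: basis of X_{t+1}; the conditional expectation is then taken exactly
beta_later, *_ = np.linalg.lstsq(np.vander(x_next, degree + 1, increasing=True), v_next, rcond=None)

x_test = np.linspace(-1.5, 1.5, 31)
m_test = cond_mean(x_test)
exact = m_test * normal_cdf(m_test / s) + s * np.exp(-0.5 * (m_test / s) ** 2) / math.sqrt(2 * math.pi)
regress_now = np.vander(x_test, degree + 1, increasing=True) @ beta_now
regress_later = gaussian_moments(m_test, s, degree) @ beta_later
print("max error, Regress Now:  ", np.abs(regress_now - exact).max())
print("max error, Regress Later:", np.abs(regress_later - exact).max())
```

In this sketch Regress Later avoids the regression of a noisy target on $X_t$; its remaining error comes only from approximating $V$ with a finite polynomial basis.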

System behavior
In the previous section we selected Grid Discretisation as the best-performing algorithm according to our criteria. In what follows we always employ Grid Discretisation to study the sensitivity of the control policy, and of the associated cost of managing the grid, to some of the parameters of the model.
The aim of this section is to build a solid understanding of the behavior of the microgrid in order to gain insight into the optimal design of the system. We study the following aspects of the grid: the battery capacity, represented by $I_{\max}$; different proportions of renewable production, via the volatility σ and the mean reversion b; and the tunable behavior of the policy, via the switching cost K and the curtailment cost C.
To carry out our analysis without introducing cumbersome economic and engineering details about the microgrid components, we have to make very simplistic assumptions. Our aim, however, is to guide the reader through a methodology that can be replicated to study real-world microgrid systems.

Battery capacity
We first study the behaviour of the system with respect to changes in the capacity of the battery. We expect a negative correlation between the quantity of diesel consumed and the battery size. Figure 6 displays both the quantity of energy curtailed and the cost of running the diesel generator for different values of the battery capacity. As expected, increasing the size of the battery leads to lower diesel usage, thanks to the higher proportion of renewable energy retained within the system. As the capacity of the battery reaches 30 to 40 kWh, the cost reduction per kWh of additional capacity starts to decrease, which suggests that further analysis is needed to understand up to which size it is worth paying for additional storage capacity.
We now show how to infer information about the optimal sizing of the battery, balancing the installation cost of a bigger battery against the reduced use of the diesel generator. Note that including battery ageing in the stochastic control problem is outside the scope of this paper; in this section we present only a post-optimization analysis. Assuming that the microgrid runs under similar conditions for the next 10 years, we can quickly estimate the total throughput of energy for the different battery capacities. A battery does not have an infinite lifetime: it should be scrapped after the equivalent of 4000 cycles (one cycle being the amount of energy of one full charge and discharge). Under these assumptions, we can compute how many batteries are needed to cover the next 10 years of operation. Similarly, using the data on the usage of the diesel generator for different levels of capacity, we can compute the operating cost of the diesel generator over the same time period. Further exploiting the assumption about the lifetime of a battery, we obtain the cost of running the grid for 10 years as a function of the number of batteries. Finally, assuming a linear cost of 400 €/kWh of capacity, we work out the installation cost of the storage devices of different sizes.
Once this information is collected, we search for the minimum of the sum of installation and running costs and thereby compute the optimal capacity. Figure 7, on the left, displays a graphical summary of this procedure and shows that, under the current set of assumptions, the optimal size of the battery in our problem is 14 kWh. We then study how much this result is affected by the cost per kWh of capacity by repeating the procedure above. As expected, the size of the optimal battery decreases as the cost increases; Figure 7, on the right, displays this behaviour.
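The sizing arithmetic can be written down compactly. The sketch below assumes that "throughput" means energy discharged per year, so that the number of equivalent full cycles over the horizon is total throughput divided by capacity, and that a new battery is bought each time the 4000-cycle life is exhausted. The input numbers are placeholders standing in for simulation output; they are not results from the paper.

```python
import math
import numpy as np


def ten_year_cost(capacity_kwh, annual_throughput_kwh, annual_diesel_cost,
                  years=10, cycle_life=4000, cost_per_kwh=400.0):
    """Installation plus diesel cost over the horizon for one battery capacity."""
    equivalent_cycles = years * annual_throughput_kwh / capacity_kwh
    # A new battery each time the cycle life is exhausted (at least one battery)
    n_batteries = max(1, math.ceil(equivalent_cycles / cycle_life))
    installation = n_batteries * capacity_kwh * cost_per_kwh
    running = years * annual_diesel_cost
    return installation + running, n_batteries


def optimal_capacity(capacities, throughputs, diesel_costs, cost_per_kwh=400.0):
    """Capacity minimizing the total cost, together with all the totals."""
    totals = [ten_year_cost(c, q, d, cost_per_kwh=cost_per_kwh)[0]
              for c, q, d in zip(capacities, throughputs, diesel_costs)]
    return capacities[int(np.argmin(totals))], totals


# Placeholder inputs (NOT results from the paper)
capacities = [5, 10, 20, 40]             # kWh
throughputs = [2500, 2800, 4500, 6000]   # kWh discharged per year
diesel_costs = [3000, 2200, 1700, 1500]  # euro per year
for price in (400.0, 1000.0):            # euro per kWh of capacity
    best, totals = optimal_capacity(capacities, throughputs, diesel_costs, price)
    print(price, best, totals)
```

With these placeholder inputs, the smallest battery needs a replacement within the horizon, and raising the price per kWh shifts the minimum toward a smaller capacity, which is the qualitative effect described above.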

Renewable penetration
In this section we investigate how robust the microgrid is to a higher penetration of renewable generation, in other words, to what extent the algorithm can cope with increasing randomness and decreasing predictability of the system. To model this phenomenon, we assume that a greater penetration of renewables can be represented by increasing both the volatility σ and the mean reversion rate λ. Increasing these two parameters makes the problem more difficult to solve, since the control policy can rely less and less on the statistical properties of the process X.

Figure 7: Total cost of installing and running the grid for ten years, assuming the battery is replaced every 4000 cycles, plotted against the battery capacity (left panel). From the corresponding minimum we can work out the optimal battery capacity and, further, compute the sensitivity of this result with respect to the cost per kWh of capacity.
To establish the real added value of our stochastic optimization algorithm, we compare the estimated policy with a heuristic myopic control, which can be reproduced in our model by solving the dynamic programming equation (10) with a conditional expectation that is constant with respect to the control. Figure 8 shows the value of the two control policies as a function of the increasing learning difficulty; we observe that the value of accounting for the statistical estimation of future conditional expectations when taking decisions decreases.
Figure 8 presents the cost of diesel as a function of σ for the stochastic and the myopic policies. Since increasing σ alone would change the variance of the stationary distribution of X, we define the mean reversion rate $\lambda := \sigma^2/(2c)$, so that the stationary variance $\sigma^2/(2\lambda) = c$ remains constant while σ increases. The stochastic policy leads to at least a 12% reduction in the cost of diesel usage compared to the myopic policy, and the difference grows with increasing "fluctuations" in the process. The decreasing relationship between the cost and σ signifies the importance of the battery storage system in the microgrid, which absorbs sharp changes in the demand. Figure 9 compares, for two levels of σ, the residual demand, the dynamics of the diesel generator and the inventory. Notice the significantly lower usage of the diesel for high fluctuations, σ = 5, compared to σ = 1.175.
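The choice $\lambda = \sigma^2/(2c)$ can be checked numerically. The sketch below assumes an Ornstein-Uhlenbeck process $dX = \lambda(\mu - X)\,dt + \sigma\,dW$ and simulates it with its exact Gaussian transition rather than an Euler step: for the Euler scheme the stationary variance is $\sigma^2/(\lambda(2 - \lambda\Delta t))$, which is close to $c$ only when $\lambda\Delta t$ is small. The target variance `c = 4.0` is a hypothetical value.

```python
import numpy as np


def mean_reversion_for_variance(sigma, c):
    """lambda := sigma^2 / (2c), so that the stationary variance sigma^2 / (2 lambda) equals c."""
    return sigma**2 / (2.0 * c)


def simulate_ou_exact(sigma, lam, mu, x0, dt, n_steps, n_paths, rng):
    """Exact transition of dX = lam (mu - X) dt + sigma dW over steps of length dt."""
    decay = np.exp(-lam * dt)
    step_std = sigma * np.sqrt((1.0 - decay**2) / (2.0 * lam))
    x = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        x = mu + (x - mu) * decay + step_std * rng.standard_normal(n_paths)
    return x


c = 4.0  # hypothetical target stationary variance
rng = np.random.default_rng(1)
for sigma in (1.175, 2.0, 5.0):
    lam = mean_reversion_for_variance(sigma, c)
    x_end = simulate_ou_exact(sigma, lam, mu=0.0, x0=0.0, dt=0.5,
                              n_steps=200, n_paths=20000, rng=rng)
    print(f"sigma={sigma}: lambda={lam:.3f}, empirical variance={x_end.var():.3f}")
```

The empirical variance stays near `c` for every σ, while a larger σ makes the process revert faster and fluctuate more from one step to the next.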
The results of this experiment are affected by the over-pessimistic assumption of modeling greater penetration of renewables with an increasingly unpredictable, and eventually completely random, residual demand process. This sort of analysis can however provide insight into how much (weather and load) forecasting capability will be necessary for a given level of renewable penetration.

Switching and curtailment
We conclude this section by analyzing the dependence of the system behavior on two key parameters of the model: the switching cost K and the curtailment cost C. The switching cost is a property of the system, over which the microgrid controller has little freedom; however, the controller can significantly reduce the amount of curtailed energy by choosing an appropriate curtailment cost. In Figure 10, we observe that increasing the curtailment cost reduces the total curtailed energy by approximately 4%. However, this comes at the cost of a less efficient usage of the diesel generator, shown in the right panel of Figure 10. The histograms represent the differences in diesel cost (blue) and in curtailed energy (orange) between C = 20 and C = 2. A positive diesel-cost difference indicates less efficient usage of the diesel at C = 20 compared to C = 2. Depending on the specific cost functional for the diesel, the controller can use C as a parameter for better optimization.
When the generator is ON ($m_t = 1$), the optimal policy is significantly altered by the switching cost. For example, Figure 11 presents the control maps associated with K = 2 and K = 5. As expected, a larger switching cost discourages the controller from switching OFF the diesel generator once it is ON. However, we do not observe a "significant" change in the control policy due to the increase in switching cost when the generator is OFF.

Comparison with deterministically trained policy
In this section we compare our stochastic optimization algorithm with a deterministically trained policy. The latter is widely used in online optimization, where the solution is computed with respect to the best forecast available at a given time. We emulate this situation by computing the optimal set of actions for a particular deterministic demand trajectory at different levels of the inventory. We assume that the forecast of the demand is given by

$$X_{t+1} = X_t + 0.5\,\big(6\sin(\pi t/12) - X_t\big)\,\Delta t, \qquad t \in \{0, 1, \dots, T-1\}. \qquad (13)$$
Equation (13) implies a periodicity of one day in the residual demand and is equivalent to setting σ = 0, b = 0.5 and $\Lambda_t = 6\sin(\pi t/12)$ in (2). Zero volatility in the residual demand leads to a deterministic optimal control problem, rather than the stochastic control problem presented in Section 5.
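The forecast (13) is a linear recursion driven by a sinusoid, so after a short transient it settles onto a path that repeats every 24 steps. The sketch below assumes an hourly step ($\Delta t = 1$, hypothetical), in which case the period is one day.

```python
import numpy as np


def forecast_residual_demand(x0, n_steps, dt=1.0, b=0.5, amplitude=6.0):
    """Deterministic forecast (13): X_{t+1} = X_t + b (amplitude sin(pi t / 12) - X_t) dt."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for t in range(n_steps):
        x[t + 1] = x[t] + b * (amplitude * np.sin(np.pi * t / 12) - x[t]) * dt
    return x


x = forecast_residual_demand(x0=0.0, n_steps=240)
# With b * dt = 0.5 the transient decays like 0.5**t, after which x repeats every 24 steps
print(np.max(np.abs(x[200:217] - x[224:241])))
```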
Notice that the deterministic optimal control problem results in a sequence of control maps $d_t : (w, m) \to [d_{\min}, d_{\max}] \cup \{0\}$. As a result, although the policy has been trained on a deterministic residual demand, it dynamically adapts itself to different inventory levels and states of the diesel generator when tested in a stochastic environment. We present the modified algorithm in Algorithm 4. There are two key differences from the previous algorithm: first, we use a one-dimensional projection of the value function; second, we replace regression with interpolation, since there is no randomness left in the problem. The deterministic control problem was solved using the residual demand curve (13), while the stochastic control problem was fed with the residual demand curve (14). Finally, we test both strategies on fresh out-of-sample paths following the residual demand (14):
$$X_{t+1} = \Big(X_t + 0.5\,\big(6\sin(\pi t/12) - X_t\big)\,\Delta t + 2\sqrt{\Delta t}\,\xi_t\Big) \wedge 10, \qquad t \in \{0, 1, \dots, T-1\}. \qquad (14)$$

Figure 13 presents the histogram of the pathwise difference between the cost of the stochastic policy and that of the deterministic policy over 10,000 out-of-sample paths. Most of the distribution lies on the negative side, indicating a gain from the stochastic policy. Table 1 quantifies this gain for different switching costs. For a switching cost K = 5, the stochastic policy is 7.5% better than the deterministic policy. As the switching cost increases, the mistakes made by the deterministic policy become more expensive, leading to a higher percentage difference.
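The out-of-sample test paths of (14) and the pathwise comparison can be sketched as follows. This assumes i.i.d. standard normal $\xi_t$, an hourly step $\Delta t = 1$, reads $\wedge$ as a minimum (cap at 10), and defines the percentage gain relative to the mean deterministic cost; all of these are assumptions, and `pathwise_gain` is a hypothetical helper.

```python
import numpy as np


def simulate_residual_demand_14(n_paths, n_steps, dt=1.0, x0=0.0, cap=10.0, seed=0):
    """Paths of (14): forecast dynamics plus noise 2 sqrt(dt) xi_t, capped at `cap`."""
    rng = np.random.default_rng(seed)
    x = np.empty((n_paths, n_steps + 1))
    x[:, 0] = x0
    for t in range(n_steps):
        drift = 0.5 * (6.0 * np.sin(np.pi * t / 12) - x[:, t]) * dt
        xi = rng.standard_normal(n_paths)  # assumption: i.i.d. N(0, 1)
        x[:, t + 1] = np.minimum(x[:, t] + drift + 2.0 * np.sqrt(dt) * xi, cap)
    return x


def pathwise_gain(cost_stochastic, cost_deterministic):
    """Pathwise differences (negative = stochastic policy cheaper) and mean gain in %."""
    cost_stochastic = np.asarray(cost_stochastic, dtype=float)
    cost_deterministic = np.asarray(cost_deterministic, dtype=float)
    diff = cost_stochastic - cost_deterministic
    gain_pct = 100.0 * (cost_deterministic.mean() - cost_stochastic.mean()) / cost_deterministic.mean()
    return diff, gain_pct


paths = simulate_residual_demand_14(n_paths=10_000, n_steps=48)
print(paths.shape, paths.max() <= 10.0)
```

Feeding both policies the same paths and differencing per path, as in `pathwise_gain`, removes much of the common noise from the comparison.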
Finally, Figure 14 displays the behavior of the inventory and of the cost along a random trajectory of the residual demand. In blue we show the stochastically trained control policy and in orange the deterministically trained one. The stochastic policy switches the diesel generator less often and thus incurs lower costs. The spikes in the cost for the deterministic policy are due to poor management of the inventory and thus inefficient usage of the microgrid.
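The deterministic counterpart, in which regression is replaced by interpolation because the residual demand path is known, can be sketched as a backward induction on an inventory grid. This is not the authors' Algorithm 4. Everything about the cost model is a hypothetical illustration: battery dynamics $w' = w + (d - x_t)\Delta t$ clipped to the grid, a fuel cost proportional to output, the switching cost K charged when the generator is turned ON, the curtailment cost C per unit of energy that does not fit, and a large penalty for unmet demand. The value function here depends only on inventory and diesel state, in the spirit of the one-dimensional projection mentioned above.

```python
import numpy as np


def solve_deterministic_dp(x_forecast, w_grid, d_levels, dt=1.0, K=5.0, C=2.0,
                           fuel_cost=1.0, shortage_penalty=100.0):
    """Backward induction for a known residual-demand path on an inventory grid.

    Returns value[t, m, j] and policy[t, m, j] for diesel state m in {0, 1} and
    inventory w_grid[j]. Off-grid continuation values use linear interpolation.
    """
    x_forecast = np.asarray(x_forecast, dtype=float)
    w_grid = np.asarray(w_grid, dtype=float)
    n_steps = x_forecast.size
    w_min, w_max = w_grid[0], w_grid[-1]
    controls = np.concatenate(([0.0], np.asarray(d_levels, dtype=float)))  # 0 means OFF
    value = np.zeros((n_steps + 1, 2, w_grid.size))  # terminal value V_T = 0
    policy = np.zeros((n_steps, 2, w_grid.size))
    for t in range(n_steps - 1, -1, -1):
        for m in (0, 1):
            best = np.full(w_grid.size, np.inf)
            for d in controls:
                m_next = int(d > 0)
                w_raw = w_grid + (d - x_forecast[t]) * dt
                curtailed = np.maximum(w_raw - w_max, 0.0)  # energy that does not fit
                shortage = np.maximum(w_min - w_raw, 0.0)   # demand that cannot be met
                w_next = np.clip(w_raw, w_min, w_max)
                switch_on = float(m == 0 and m_next == 1)
                cost = (fuel_cost * d * dt + K * switch_on + C * curtailed
                        + shortage_penalty * shortage
                        + np.interp(w_next, w_grid, value[t + 1, m_next]))
                improved = cost < best
                best = np.where(improved, cost, best)
                policy[t, m] = np.where(improved, d, policy[t, m])
            value[t, m] = best
    return value, policy


x_fc = 3.0 + 3.0 * np.sin(np.pi * np.arange(24) / 12)  # hypothetical one-day forecast
value, policy = solve_deterministic_dp(x_fc, np.linspace(0.0, 10.0, 41), np.linspace(1.0, 6.0, 6))
print(policy[0, 0, :5])
```

Because `policy[t]` is tabulated over every inventory level and diesel state, it can be applied to stochastic paths of (14) even though it was trained on a single forecast.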

Conclusion
In this paper we solved the problem of optimal management of a microgrid by employing three algorithms from the Regression Monte Carlo literature, namely Regress Now, Regress Later and Grid Discretization. We find that Grid Discretization significantly outperforms the other two methods in terms of the trade-off between running time and precision. Besides algorithm design, we propose a methodology to optimize the design of the grid and determine the optimal sizing of the battery. In addition, we perform a thorough sensitivity analysis with respect to some of the key parameters, showing the robustness of our solution. Finally, we compare the control policy estimated by our algorithm to the industry-standard deterministic control, observing a 5-10% reduction in cost.
Future research in this direction will include further studies of the optimal sizing of the battery that explicitly incorporate the wear caused by usage. A more challenging direction is to understand the impact of delays, e.g., in the switching of the diesel generator, on the optimal management of the microgrid. This problem introduces several mathematical and algorithmic issues, which are currently the focus of our research.