Construction of a statistical learning tool based on ordinary differential equations to model the digestive behaviour of Ross chickens

Being able to monitor and forecast farm animal performance is a strategic problem in the agronomy industry. We use a Data-Model Coupling approach to build a biomimetic Statistical Learning tool that takes into account some aspects of the biological dynamics of the animal body. The objective is to build a tool able to assimilate data on daily feed consumption and measured performance. The model encompasses several sub-models corresponding to compartments, which together mimic a kinetic process divided into several steps. Each sub-model contains parameters that can be learnt from data with an optimization algorithm. The goal of the first application of the model to field data was to simulate and predict the growth of chickens. An experiment was performed over 70 days to collect the daily feed consumption and weight gain of a male and a female chicken. After the model parameters were learnt, the model gives a very good approximation of the chicken's weight evolution over time.


Introduction
Being able to monitor and forecast farm animal performance is a strategic problem in the agronomy industry: for several decades, efforts have been made in this sector to optimize breeding-related production, such as meat, eggs and milk (see [9]).
Today, new technologies make it possible to monitor farm animals and collect a wide range of information, as explained in [6, 12, 19, 27, 30]. Yet these tools are still expensive and rarely used.
Furthermore, biological data are highly variable (noise, heterogeneity, missing and aberrant values, etc.; see [17, 20]). Therefore, to address most biological problems, we have to build precise predictive tools from few exploitable data.
Animals are complex living organisms in which intakes induce complex physico-chemical phenomena. Therefore, to link inputs and outputs describing the evolution of some sensed biological factors, we need a mathematical model that takes into account some aspects of the dynamics of the animal's body (see [23]).
The main objective of this study is to build a tool able to predict outputs from inputs for farm animals while accounting for the underlying complex biological phenomena. In this paper, we present the construction of a Statistical Learning tool based on a dynamical mathematical model, namely a system of Ordinary Differential Equations (ODEs). The first application of this Statistical Learning tool consisted in simulating and predicting the growth of chickens.

Purpose
Growth simulation is a well-known problem, and some classical models, such as the Gompertz and Verhulst models, are very well suited to fitting growth data, as shown in [5] and [24]. Nevertheless, these classical growth models do not allow data assimilation: almost no input data can be integrated into this kind of model.
On the other hand, some works have already addressed biological modeling issues by developing realistic and specific models, as in [28]. But building such realistic models is costly and results in models containing many equations and unknown parameters, which makes them difficult to implement ([31], [4] and [8]).
A model-free approach has also been explored, as in [11] for instance. Machine Learning tools based on Neural Networks were developed in [10] to simulate and predict the evolution of biological factors. This approach, which relies only on data, does not require knowledge about the link between the inputs and outputs used, and yields models that are easier to handle. But these models need a large amount of data to be fitted and to compensate for the knowledge they do not take into account (see [29]).
In the light of existing methods for predicting biological responses, we decided to explore an approach halfway between the "model-free" and "full-model" approaches: we used Model-Data Coupling theory to construct a tool that integrates biological knowledge into a mathematical model and uses data to optimize the model parameters. Model-Data Coupling is an expanding approach, primarily developed to address issues in meteorology (see [26]), hydrology (see [15, 16] and [18]) and biogeochemistry (see [2, 3, 22] and [25]). We built our model by splitting the whole complex kinetics into several parsimonious biomimetic sub-processes carried out by a combination of successive compartments. Each compartment computes its own behavior, and the compartments then exchange messages that can be assimilated to fluxes. To do so, we built several Ordinary Differential Equations integrating mathematical expressions of biological phenomena (storage, saturation, etc.).

Scope
Within the scope of this study, we focus on meat poultry, and specifically on two strains:
• Ross 308 (Aviagen)
• Cobb 500 (Cobb-Vantress)
Some information about these strains is publicly available ([1] and [7]). We know the average daily feed consumption and the average daily weight gain of an average female and an average male of each strain over 70 days. These available data therefore correspond to smoothed experimental data.
We decided to add Gaussian noise to these data in order to obtain experimental-like data.
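A minimal sketch of this noise-injection step, assuming additive, independent Gaussian noise with a fixed standard deviation. The noise level used in the study is not stated, so `sigma`, the function name `add_gaussian_noise` and the example values are hypothetical.

```python
import random
from typing import List, Sequence


def add_gaussian_noise(series: Sequence[float], sigma: float, seed: int = 0) -> List[float]:
    """Return a copy of `series` with i.i.d. N(0, sigma^2) noise added to each point.

    The additive form and the noise level `sigma` are illustrative assumptions.
    """
    rng = random.Random(seed)  # seeded so that the noisy dataset is reproducible
    return [x + rng.gauss(0.0, sigma) for x in series]


# Hypothetical smoothed daily weight gains (g/day); not data from this study.
smooth_gain = [10.0, 12.0, 15.0, 19.0, 24.0]
noisy_gain = add_gaussian_noise(smooth_gain, sigma=1.0, seed=42)
```

Seeding the generator keeps the "experimental-like" dataset identical across runs, which matters when several models are compared on the same data.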
As a first step, we considered the growth (mass gained each day) and the loss (ratio of unmetabolized food). We therefore have a database covering four individuals (two females and two males) and containing 70 rows and 4 columns.
In this paper, we introduce the applied methodology in Section 1. Then, in Section 2, we present the developed mathematical model and the results obtained by using it to simulate the growth of chickens. In that section, the parameters of the model are learnt from the database presented above.

Biological-like function
The basic computational unit used in the following work is what we call here an organ-like compartment. We define it as a tuple (input, output, function, state), where input and output are vectors of fixed dimensions in and out, each relating to a flux of a particular kind. The function maps the input vector to the output vector and corresponds to a biological-like function, which models a biological phenomenon in a synthesized way. Finally, state is a vector that gives the computational unit a state, in other words a kind of memory, which can take part in the expression of the function.
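The tuple above can be sketched as a small Python class. This is a minimal illustration under our own assumptions: the names `OrganLikeCompartment` and `storage_function`, and the storage phenomenon used as an example, are hypothetical and not taken from the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Vector = List[float]
# A biological-like function maps (input, state) to (output, new state).
BioFunction = Callable[[Vector, Vector], Tuple[Vector, Vector]]


@dataclass
class OrganLikeCompartment:
    """Sketch of the (input, output, function, state) tuple."""
    dim_in: int
    dim_out: int
    function: BioFunction
    state: Vector = field(default_factory=list)

    def step(self, inputs: Vector) -> Vector:
        # Check the fixed input dimension, apply the function, store the new state.
        if len(inputs) != self.dim_in:
            raise ValueError(f"expected {self.dim_in} inputs, got {len(inputs)}")
        outputs, self.state = self.function(inputs, self.state)
        if len(outputs) != self.dim_out:
            raise ValueError(f"expected {self.dim_out} outputs, got {len(outputs)}")
        return outputs


def storage_function(rate: float) -> BioFunction:
    """Hypothetical storage phenomenon: accumulate the input, release `rate` of the stock."""
    def f(inputs: Vector, state: Vector) -> Tuple[Vector, Vector]:
        stock = state[0] + inputs[0]
        released = rate * stock
        return [released], [stock - released]
    return f


crop = OrganLikeCompartment(dim_in=1, dim_out=1, function=storage_function(0.5), state=[0.0])
print(crop.step([100.0]))  # [50.0]
print(crop.step([0.0]))    # [25.0]
```

The second call returns a non-zero output for a zero input because the state (the remaining stock) acts as the compartment's memory.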
The dimensions of these input and output vectors therefore correspond to information exchanged in the form of messages between the organ-like compartments; the exchanged quantities can be, for example, nutrients or active principles of drugs. The functions can, for example, mimic a phenomenon of fixation, convection or diffusion. Figure 1 shows how such a biological-like function can be summarized as a computational unit.

Functions network
These deliberately simple functions are not complex enough on their own to model an organism. However, by connecting them, it becomes possible to model a more complex structure. To do this, the output vector of one biomimetic function is connected to the input vector of the next function (Figure 2). Since these functions have states, the network as a whole can be likened to a finite state machine.
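A minimal sketch of such a network, assuming compartments are chained linearly so that each output vector becomes the next input vector. The classes `Compartment` and `Network` and the `transfer` function are hypothetical illustrations, not the authors' code.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Function = Callable[[Vector, Vector], Tuple[Vector, Vector]]


class Compartment:
    """A stateful biological-like function: (inputs, state) -> (outputs, new state)."""

    def __init__(self, function: Function, state: Vector):
        self.function = function
        self.state = state

    def step(self, inputs: Vector) -> Vector:
        outputs, self.state = self.function(inputs, self.state)
        return outputs


def transfer(rate: float) -> Function:
    """Hypothetical function: keep a stock and pass on a fraction `rate` of it."""
    def f(inputs: Vector, state: Vector) -> Tuple[Vector, Vector]:
        stock = state[0] + inputs[0]
        out = rate * stock
        return [out], [stock - out]
    return f


class Network:
    """Linear chain: the output vector of one compartment feeds the next one."""

    def __init__(self, compartments: List[Compartment]):
        self.compartments = compartments

    def step(self, inputs: Vector) -> Vector:
        signal = inputs
        for compartment in self.compartments:
            signal = compartment.step(signal)
        return signal

    def states(self) -> List[Vector]:
        return [c.state for c in self.compartments]


net = Network([Compartment(transfer(0.5), [0.0]), Compartment(transfer(0.5), [0.0])])
print(net.step([100.0]))  # [25.0]
print(net.states())       # [[50.0], [25.0]]
```

Calling `net.step([0.0])` afterwards still returns `[25.0]`, because the stocks left in both compartments are released: the output depends on the accumulated states, as in a finite state machine.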

Optimization steps
This network acts as a model containing several parameters to optimize, which fall into two groups. The first optimization step is at the level of the biological-like functions. Most of the phenomena we try to model involve parameters corresponding, for example, to the fraction of information exchanged between compartments, saturation thresholds, etc. These parameters are optimized so that the model fits the data as closely as possible.
To determine these parameters, we use a non-linear optimization algorithm implemented in Matlab (fmincon). This algorithm finds the set of parameters that minimizes the error of the model with respect to the data.
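The study relies on Matlab's fmincon. As a self-contained illustration only, the sketch below swaps in a much simpler derivative-free method, a bounded compass (pattern) search; it is not fmincon's algorithm. The function name `compass_search`, the linear toy model and its data are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def compass_search(objective: Callable[[List[float]], float],
                   x0: Sequence[float],
                   lower: Sequence[float],
                   upper: Sequence[float],
                   step: float = 0.5,
                   tol: float = 1e-6,
                   max_iter: int = 10_000) -> Tuple[List[float], float]:
    """Derivative-free bounded local search: try +/- step along each axis,
    accept any improvement, halve the step when no move improves."""
    def clip(value: float, i: int) -> float:
        return min(max(value, lower[i]), upper[i])

    x = [clip(v, i) for i, v in enumerate(x0)]
    fx = objective(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for i in range(len(x)):
            for direction in (1.0, -1.0):
                trial = list(x)
                trial[i] = clip(x[i] + direction * step, i)
                f_trial = objective(trial)
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
                    break
        if not improved:
            step /= 2.0
    return x, fx


# Hypothetical data generated from y = 2t + 1; the "model" is y = a*t + b.
t_data = [0.0, 1.0, 2.0, 3.0]
y_data = [1.0, 3.0, 5.0, 7.0]


def sse(params: List[float]) -> float:
    a, b = params
    return sum((a * t + b - y) ** 2 for t, y in zip(t_data, y_data))


best, err = compass_search(sse, x0=[1.0, 1.0], lower=[0.0, 0.0], upper=[5.0, 5.0])
# best == [2.0, 1.0], err == 0.0
```

Like fmincon, this is a local method: it returns the first minimum it reaches from `x0`, inside the box given by `lower` and `upper`.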

Network structure (organism level)
The nature of the links between these functions also involves parameters (link weights, preprocessing of the vectors), which are therefore also subject to optimization.
In our study, these parameters essentially correspond to the amount of information exchanged between compartments. Given the small number of computational units involved at this stage, and hence the reasonable dimensionality of the problem, a conventional optimization can be used by simply including them in the above set of parameters; on a more complex structure, however, a heuristic search method (a genetic algorithm, for example) could be preferable.
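To illustrate the heuristic alternative, here is a minimal real-coded genetic algorithm (truncation selection, blend crossover, Gaussian mutation with a shrinking step). It is a hypothetical sketch, not a method used in the study; `genetic_search`, its default settings and the toy objective `sphere` (standing in for the model error) are our own assumptions.

```python
import random
from typing import Callable, List, Sequence, Tuple


def genetic_search(objective: Callable[[List[float]], float],
                   lower: Sequence[float], upper: Sequence[float],
                   pop_size: int = 40, generations: int = 100,
                   mutation_scale: float = 0.1, decay: float = 0.95,
                   seed: int = 0) -> Tuple[List[float], float]:
    """Minimal real-coded genetic algorithm with box bounds (illustrative)."""
    rng = random.Random(seed)
    dim = len(lower)

    def clip(value: float, i: int) -> float:
        return min(max(value, lower[i]), upper[i])

    # Initial population drawn uniformly inside the bounds.
    pop = [[rng.uniform(lower[i], upper[i]) for i in range(dim)]
           for _ in range(pop_size)]
    for gen in range(generations):
        pop.sort(key=objective)
        parents = pop[: pop_size // 2]         # truncation selection (elitist)
        sigma = mutation_scale * decay ** gen  # shrinking mutation step
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            w = rng.random()                   # blend (arithmetic) crossover
            child = [w * a + (1.0 - w) * b for a, b in zip(p1, p2)]
            # Gaussian mutation scaled to each parameter's range, then clipped.
            child = [clip(v + rng.gauss(0.0, sigma * (upper[i] - lower[i])), i)
                     for i, v in enumerate(child)]
            children.append(child)
        pop = parents + children
    best = min(pop, key=objective)
    return best, objective(best)


def sphere(params: List[float]) -> float:
    # Hypothetical objective with its minimum at (2, 1).
    return (params[0] - 2.0) ** 2 + (params[1] - 1.0) ** 2


best, err = genetic_search(sphere, lower=[0.0, 0.0], upper=[5.0, 5.0])
```

Unlike the local search used for a few parameters, the population explores the whole box, which is what makes this kind of method attractive when the network structure, and thus the number of link parameters, grows.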
We could also favor separate optimization processes, in which each organ would be adjusted independently before the whole network is adjusted.

Main goals
The goal of the following work is to build a Statistical Learning tool that can simulate the food consumption and mass gain of poultry and obtain results close to those of a fitted growth law, with the notable difference that this network of functions depends not only on time but also on the food consumed, and thus has a stronger biological correspondence.
It therefore seeks to replicate the growth data, but by adjusting a network of biomimetic functions instead of fitting a Gompertz growth function. The latter is, however, used as a reference for the metrics considered (coefficient of determination R² and root mean square error RMSE).
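For reference, the two metrics can be computed as follows, assuming their usual definitions (R² = 1 - SS_res / SS_tot, and RMSE as the square root of the mean squared residual). The function names and example values are hypothetical.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical weights and predictions (kg).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
# rmse(...) is about 0.158 and r_squared(...) is about 0.98 for these values.
```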

Construction of the model
Growth requires substrate and complex physico-chemical phenomena that convert substrate into dry weight. Weight formation can be influenced by many factors, such as substrate provision, assimilation, digestibility, storage, daily losses and environmental parameters. Modeling all these processes would lead to a huge model. We therefore do not try to reproduce the digestion kinetics in detail, but only integrate into the mathematical model information exchanges, delays, fixation, accumulation and saturation effects.
In the model, we use the Gompertz growth equation as in [14]:

$$\frac{dW}{dt} = A\,W \ln\!\left(\frac{W_f}{W}\right),$$

where $W$ is the weight, which varies with time $t$, $W_f$ is the maximum weight that can be reached with the available nutrients and $A$ is a constant.
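Assuming the Gompertz law is used in its differential form dW/dt = A·W·ln(W_f/W) (our reading of the equation taken from [14]), a short numerical check compares a fourth-order Runge-Kutta integration with the closed-form solution W(t) = W_f·exp(ln(W_0/W_f)·exp(-A·t)). The initial weight W_0, the helper names and all numerical values are hypothetical.

```python
import math


def gompertz_rhs(w: float, a: float, w_f: float) -> float:
    """Assumed Gompertz right-hand side: dW/dt = A * W * ln(W_f / W)."""
    return a * w * math.log(w_f / w)


def gompertz_rk4(w0: float, a: float, w_f: float, t_end: float, dt: float = 0.01) -> float:
    """Integrate the assumed ODE from t = 0 to t_end (a multiple of dt) with classical RK4."""
    w = w0
    for _ in range(round(t_end / dt)):
        k1 = gompertz_rhs(w, a, w_f)
        k2 = gompertz_rhs(w + 0.5 * dt * k1, a, w_f)
        k3 = gompertz_rhs(w + 0.5 * dt * k2, a, w_f)
        k4 = gompertz_rhs(w + dt * k3, a, w_f)
        w += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return w


def gompertz_exact(t: float, w0: float, a: float, w_f: float) -> float:
    """Closed form: with u = ln(W / W_f), du/dt = -A u, so u(t) = u(0) * exp(-A t)."""
    return w_f * math.exp(math.log(w0 / w_f) * math.exp(-a * t))


# Hypothetical values: 40 g chick, 5 kg asymptote, A = 0.05 per day, 70 days.
w_num = gompertz_rk4(0.04, 0.05, 5.0, 70.0)
w_ref = gompertz_exact(70.0, 0.04, 0.05, 5.0)
```

The substitution u = ln(W/W_f) turns the Gompertz ODE into a linear one, which is why a closed form exists; it makes a convenient test for any numerical integrator used for the other compartments.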

Assumptions
In our model, we assume that the organs participating in food digestion are the stomach, the small intestine and the large intestine. We built our model under the following assumptions:
(A1) The ingested food is stored in the compartment named "crop" before moving to the compartment assimilable to the stomach.
(A2) The flux rates between compartments are constant.
(A3) There are transmission delays between some compartments.
(A4) The emptying of the small intestine starts after some time $\tau$.

Following these assumptions, we can draw the graph associated with our model, which synthesizes the digestive system of a chicken (Figure 3), where
• $S_d$ is the amount of food consumed each day;
• $Q_{cp}$ (resp. $Q_{st}$, $Q_{si}$, $Q_{Li}$, $Q_w$) is the information inside the crop (resp. stomach, small intestine, large intestine, storage compartment);
• $\gamma_{cp}$ (resp. $\gamma_{st}$, $\gamma_{si}$, $\gamma_{abs}$) is the rate of nutrients transferred from the crop to the stomach (resp. from the stomach to the small intestine, from the small intestine to the large intestine, from the small intestine to the storage compartment);
• $\gamma_o$ corresponds to the non-metabolized part of the nutrients.

The dynamics of the digestive functions on each day $d$ are synthesized by a system of ODEs, together with initial conditions. Note that information is transmitted from the crop to the stomach with a delay that decays exponentially with rate $\beta_1$. The storage compartment uses a proportion of the nutrients leaving the small intestine to gain weight, and all the information left from the previous day is kept for use on the present day. The quantity of growth machinery is proportional to the amount of information transmitted to the storage compartment. However, there are losses during this transmission, corresponding to the part of the nutrients used for physical activity and reproduction needs, and to the non-metabolizable nutrients. These losses are proportional to the weight of the animal and to the nutrients produced by metabolism. The effectiveness $\kappa$ of the machinery decays exponentially in time with delay $\mu$, from which Equation (12) follows. As a result, the final weight is the total gain up to the considered day, where $N_d$ is the number of days and $t_f$ is the time at the end of the digestive cycle of a day.
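Since the ODE system itself is not reproduced here, the following sketch only illustrates the compartment flow of Figure 3 under simplifying assumptions of our own: first-order transfers with constant rates γ_cp, γ_st, γ_si and γ_abs (in the spirit of assumption A2), explicit Euler integration, daily contents carried over to the next day, and the large intestine treated as a sink. The delays (β_1, τ), the term γ_o and the machinery effectiveness κ are deliberately omitted, so this is not the authors' model; the names `euler_step` and `simulate` and all numbers are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

State = Dict[str, float]


def euler_step(q: State, g: State, dt: float) -> State:
    """One explicit-Euler step of an assumed linear crop -> stomach -> intestine chain."""
    f_cp = g["cp"] * q["cp"]    # crop -> stomach
    f_st = g["st"] * q["st"]    # stomach -> small intestine
    f_si = g["si"] * q["si"]    # small intestine -> large intestine (lost)
    f_abs = g["abs"] * q["si"]  # small intestine -> storage compartment
    return {
        "cp": q["cp"] - dt * f_cp,
        "st": q["st"] + dt * (f_cp - f_st),
        "si": q["si"] + dt * (f_st - f_si - f_abs),
        "li": q["li"] + dt * f_si,
        "w": q["w"] + dt * f_abs,
    }


def simulate(feed_per_day: Sequence[float], g: State,
             steps_per_day: int = 1000) -> Tuple[State, List[float]]:
    """Add S_d to the crop at the start of each day, then integrate over one day.

    Compartment contents are carried over from one day to the next, and the
    storage level Q_w at the end of each day is recorded.
    """
    q: State = {"cp": 0.0, "st": 0.0, "si": 0.0, "li": 0.0, "w": 0.0}
    dt = 1.0 / steps_per_day
    storage: List[float] = []
    for s_d in feed_per_day:
        q["cp"] += s_d
        for _ in range(steps_per_day):
            q = euler_step(q, g, dt)
        storage.append(q["w"])
    return q, storage


# Hypothetical rates (per day) and daily intakes (g); not the fitted values of Table 1.
rates = {"cp": 8.0, "st": 6.0, "si": 3.0, "abs": 5.0}
final_q, storage = simulate([100.0, 110.0, 120.0], rates)
```

Because every flux leaves one compartment and enters another, the total content always equals the cumulative intake, which is a cheap sanity check to keep when delays, losses and saturation terms are added.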

Global optimization
The parameters contained in the different biological-like functions are initially unknown. Because they can be learnt from data with optimization algorithms, they give the model its learning capability. We used the function directL ([13]) available in R ([21]) to learn the model parameters. To find the parameter values that let the model fit the data, this function minimizes an objective function. In this application, the objective function is the mean squared difference between the initial and predicted curves:

$$\frac{1}{n}\sum_{i=1}^{n}\left(PredCurve(i) - InitCurve(i)\right)^2, \qquad (15)$$

where $PredCurve(i)$ and $InitCurve(i)$ are the $i$-th points of the predicted and initial curves respectively, and $n$ is the total number of points, that is, 70 days.
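The objective (15), and the way an optimizer sees it, can be sketched as follows. The study uses directL in R; this Python sketch only shows the objective, and the names `mean_squared_difference`, `make_objective` and the toy simulator in the usage example are hypothetical.

```python
from typing import Callable, List, Sequence


def mean_squared_difference(pred_curve: Sequence[float], init_curve: Sequence[float]) -> float:
    """Objective (15): mean of (PredCurve(i) - InitCurve(i))^2 over the n points."""
    if len(pred_curve) != len(init_curve):
        raise ValueError("curves must have the same number of points")
    n = len(init_curve)
    return sum((p - o) ** 2 for p, o in zip(pred_curve, init_curve)) / n


def make_objective(simulate: Callable[[Sequence[float]], List[float]],
                   init_curve: Sequence[float]) -> Callable[[Sequence[float]], float]:
    """Wrap a simulator so that the optimizer only sees parameters -> error."""
    def objective(params: Sequence[float]) -> float:
        return mean_squared_difference(simulate(params), init_curve)
    return objective


# Hypothetical toy simulator: linear growth over 3 days.
def toy_simulate(params: Sequence[float]) -> List[float]:
    return [params[0] * day for day in range(1, 4)]


objective = make_objective(toy_simulate, [2.0, 4.0, 6.0])
# objective([2.0]) == 0.0 and objective([1.0]) == 14 / 3
```

Separating the error measure from the simulator keeps the optimizer generic: any global or local method only needs a function from a parameter vector to a scalar.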

Simulation results
After fitting, we parametrized the model with the obtained parameter values (Table 1).
The dynamics of the model over one day and over two days (Figure 4) were simulated with the Female Model, i.e. the model that simulates the growth of the female chicken. We also simulated the growth of the animals over the whole study period (Figure 5).

Knowledge feedback
First, comparing males on one side and females on the other within the same strain, it is interesting to note that some parameters appear stable, while only some of them vary significantly with the poultry's sex. In this way, the fitted model gives back information that could be confronted with domain expertise (here, zootechnicians).
Indeed, the difference in behavior between organisms appears to be concentrated on the following parameters:
• $\beta_2$, relative to the effectiveness of the organism (its ability to metabolize a high proportion of the ingested food over time);
• $\gamma_{abs}$, the strength of the link between the small intestine and the storage compartment;
• $\gamma_{si}$, the strength of the link between the small and the large intestine;
• $\gamma_{cp}$, the strength of the link between the crop and the stomach.
In particular, we observe that the male's organism seems to stay highly efficient longer than the female's (a fact that we observe in ground truth), as reflected by a slightly better $\beta_2$ for the male. At the same time, we observe that in females the amount of material sent from the small intestine to the large intestine (which will be lost) is higher than in males, where a bigger part is sent to the storage compartment.
These observations would need to be confirmed on other strains and confronted with zootechnical expertise, but they are by themselves a first step of knowledge feedback. Figure 5 shows that the built model fits the data with a goodness of fit close to our target. We chose to compare our model with two types of models: a Gompertz growth law (17), usually used to simulate growth phenomena, and a second-order Polynomial Model (16), taken as a baseline with the same number of parameters. The latter has no biological plausibility, so the first goal is for our model to perform better than it.

Accuracy
We fitted the parameters of the Polynomial Model ($p_0$, $p_1$, $p_2$) and of the Gompertz Model ($g_0$, $g_1$, $g_2$) to the data of the female and male chickens.
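As an illustration of the polynomial baseline, the sketch below fits p_0, p_1 and p_2 by ordinary least squares, assuming Model (16) has the form y = p_0 + p_1·t + p_2·t² (the exact form of (16) is an assumption here, and `fit_quadratic` is a hypothetical helper). It solves the 3×3 normal equations with Gaussian elimination to stay dependency-free.

```python
from typing import List, Sequence


def fit_quadratic(t: Sequence[float], y: Sequence[float]) -> List[float]:
    """Least-squares fit of y ~ p0 + p1*t + p2*t**2 via the normal equations."""
    # Normal equations (X^T X) p = X^T y, with X = [1, t, t^2].
    powers = [sum(ti ** k for ti in t) for k in range(5)]
    a = [[powers[i + j] for j in range(3)] for i in range(3)]
    b = [sum(yi * ti ** i for ti, yi in zip(t, y)) for i in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, 3):
            factor = a[r][col] / a[col][col]
            for c in range(col, 3):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    p = [0.0, 0.0, 0.0]
    for r in range(2, -1, -1):
        p[r] = (b[r] - sum(a[r][c] * p[c] for c in range(r + 1, 3))) / a[r][r]
    return p


# Hypothetical curve over 70 days generated from y = 40 + 2t + 0.5t^2.
days = list(range(70))
weights = [40.0 + 2.0 * d + 0.5 * d ** 2 for d in days]
p0, p1, p2 = fit_quadratic(days, weights)
```

Normal equations are the shortest route but square the conditioning of the problem; for longer series or higher degrees, a QR-based least-squares solver would be the safer choice.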
For each model, we calculated the mean of the R² and RMSE values of the Female and Male Models. Figures 5, 6 and 7 and the results in Table 2 show that the accuracy of these models is globally equivalent. We therefore built a model that is satisfactory in terms of accuracy. But the real advantage of the Biomimetic Model is its capability to integrate biological knowledge and assimilate input data.

Perspectives
The developed approach is very modular with regard to the nature of the simulated model, so it could be adapted, with few changes, to a higher level of information.

Fluxes multiplication
Within the scope of this work, the model was only trained on simple weighing data, and therefore not in a context where it could outperform classic growth laws. Its actual added value over those models would appear as soon as finer data become available.
Indeed, the mid-term objective is to exploit a possible higher level of knowledge (theoretical or measured) about the modeled phenomena, in order to define more precisely what happens inside the organism, typically with data on an organ's input and output. Unlike classic growth laws based on the time axis (Gompertz or Verhulst, for example), this kind of model could handle multidimensional data (time, ingested proteins, drugs, etc.) and, above all, could claim biological plausibility.

Model reinterpretation
In the longer term, and if the previous hypothesis about new data is validated, the goal would be to automatically optimize the weight parameters between biological-like functions.
Knowing the organism's structure in sufficient detail, and having a sufficient number of measurements, it would be possible to refine the knowledge of the nature of the connections between these virtual organs by optimizing sets of parameters (with a genetic algorithm, for example).
As these parameters relate to actual phenomena through their biological plausibility, it would be interesting to confront the final set of optimized parameters with domain expertise.

Conclusion
We explored a Data-Model Coupling approach and built a biomimetic dynamical tool based on Ordinary Differential Equations. These equations contain parameters that give the tool a learning capability. Our tool is therefore a Statistical Learning tool able to learn parameter values from field data.
An optimization process learns the parameters, which then parametrize the mathematical model integrating input data on the chickens' feed consumption. After the learning step, we obtained a model able to simulate the growth of chickens with an accuracy similar to that of classic growth models. But the real advantage of our model over these classical growth models is its capability to integrate input variables. Furthermore, the structure of the model is easy to create and manage, and flexible. Hence, the structure of the model can be adapted if we are able to collect more detailed data.
The number of parameters to determine is smaller than in realistic models, but it is still high, and only one model structure was tested. Model selection methods could therefore be used to determine the optimal structure of the model in terms of accuracy and number of parameters to learn.
Given the number of parameters to determine and the amount of data, overfitting can be expected. We suppose that the integrated knowledge and the imposed profile of the growth curve could help avoid overfitting, but this has to be verified on a test dataset.
Furthermore, a larger quantity of data to fit the model would be a major improvement. In particular, more detailed data on the processes occurring in the different organs of the chicken's digestive system would make it possible to fit the parameters compartment by compartment.