NETWORK OF INTERACTING NEURONS WITH RANDOM SYNAPTIC WEIGHTS

Since the pioneering works of Lapicque [17] and of Hodgkin and Huxley [16], many models have been proposed to describe the evolution in time of the membrane potential of a neuron. In this note, we investigate a network of N connected neurons obeying the leaky integrate-and-fire model, previously introduced in [1-3,6,7,15,18,19,22]. As a main feature, neurons interact with one another in a mean-field, instantaneous way. Due to the instantaneity of the interactions, singularities may emerge in finite time. For instance, the solution of the corresponding Fokker-Planck equation describing the collective behavior of the potentials of the neurons in the limit N → ∞ may degenerate and cease to exist in any standard sense after a finite time. Here we focus on a variant of this model in which the interactions between the neurons are also subjected to random synaptic weights. As a typical instance, we address the case when the connection graph is the realization of an Erdös-Renyi graph. After a brief introduction of the model, we collect several theoretical results on the behavior of the solution. Finally, we provide an algorithm for simulating a network of this type with a possibly large value of N.


From a model for one neuron to a model for N neurons
One of the first models for neurons was introduced by Louis Lapicque in 1907, see [17]. It is called the Integrate and Fire (IF) model. The membrane potential (X(t))_t of a neuron evolves in time according to the simple electrical equation

C dX(t)/dt = I(t),

where (I(t))_t is an input current that goes through the membrane and C is the membrane capacitance. The above equation holds true up until the neuron emits a spike. We use the generic notation τ_F for the spiking time. From the mathematical point of view, τ_F is regarded as the first time when the membrane potential reaches a certain hard threshold value X_F, the latter being referred to as a firing value. Equivalently, the time τ_F is nothing but the first hitting time of X_F. As just mentioned, it must be seen as a firing time at which the neuron emits a spike. After τ_F, the dynamics are continued in the following way: at time τ_F, the potential is reset to a reset value X_R and the dynamics are refreshed from X_R. This model serves as the basis for a large variety of other neural network models developed in order to describe more accurately certain behaviors of neurons and neural networks, such as memory, leaking, etc. For instance, a common feature is to write the sub-threshold dynamics of the membrane potential in the form of a stochastic (instead of ordinary) differential equation:

dX_t = b(X_t) dt + I_t dt + dW_t, for t < τ_F,

where (I_t)_t describes the mean trend of the input current (we here use the probabilistic convention: the time parameter is put in index) and (d/dt W_t)_t is a white noise accounting for fluctuations in the input. A typical choice for the coefficient b is b(x) = −λ(x − a), where λ is a leakage parameter. This model is commonly referred to as a stochastic LIF (leaky integrate-and-fire) model.
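Since the note is ultimately concerned with simulation, it may help to see the stochastic LIF dynamics in code. The following is a minimal Euler-Maruyama sketch of a single leaky integrate-and-fire neuron with drift b(x) = −λ(x − a), unit volatility, threshold X_F and reset X_R; all numerical parameter values are illustrative and not taken from the note.

```python
import math
import random

def simulate_lif(x0=0.0, a=0.5, lam=1.0, x_f=1.0, x_r=0.0,
                 t_max=10.0, dt=1e-3, seed=0):
    """Euler-Maruyama simulation of dX_t = -lam*(X_t - a) dt + dW_t,
    with reset from x_f to x_r at each spike.
    Returns (final potential, spike count)."""
    rng = random.Random(seed)
    x, n_spikes = x0, 0
    sqrt_dt = math.sqrt(dt)
    for _ in range(int(t_max / dt)):
        x += -lam * (x - a) * dt + sqrt_dt * rng.gauss(0.0, 1.0)
        if x >= x_f:          # spike: reset the membrane potential
            x = x_r
            n_spikes += 1
    return x, n_spikes
```

Because the reset is checked after every Euler step, the returned potential is always strictly below the threshold.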
In this article, we address a network of N connected neurons that evolve according to a variant of the above stochastic LIF model. We describe the discrete system of neurons through the evolution in time of the membrane potentials of each of the neurons: the membrane potential of neuron i (for i = 1, ..., N) is described by a process X^i = (X^i_t)_{t≥0}. The sub-threshold dynamics of each X^i are of the general form:

dX^i_t = b(X^i_t) dt + dI^{i,network}_t + dW^i_t,

with initial conditions X^i_0 < X_F. Here (X^i_0)_{i=1,...,N} is a family of independent and identically distributed random variables and (W^i)_{i=1,...,N} is a family of independent Brownian motions, the initial conditions and the noises being independent. The term dI^{i,network}_t models the current that neuron i receives at time t from the other neurons. As above, the value X_F is a firing threshold, which is assumed to be common to all the neurons: whenever a neuron reaches the threshold value X_F, its potential is reset to X_R, X_R being the same for all the neurons.
Of course, the term dI^{i,network}_t is of great importance in the description of the model. Inspired by [18,22], we choose dI^{i,network}_t in the form

dI^{i,network}_t = Σ_{j=1}^N J^{j→i}_N Σ_{k∈N} δ_0(t − τ^j_k) dt,

where δ_0 is the Dirac delta measure at 0, τ^j_k (for some j = 1, ..., N and k ∈ N) is the k-th time at which the potential of neuron j reaches the threshold value X_F, and J^{j→i}_N is a synaptic weight that prescribes the influence of neuron j onto neuron i. When the synaptic weight is positive, the interaction from j to i is excitatory; otherwise, it is inhibitory. It is worth mentioning that, with the above prescription, (dI^{i,network}_t)_t is not exactly a current since the measure dI^{i,network}_t is singular. Rather, the term dI^{i,network}_t accounts for impulses received by neuron i from the network at time t. These impulses reflect the instantaneity of the interactions: each time neuron j spikes, neuron i feels a pulse of intensity J^{j→i}_N. The main purpose of this note is to address the case when the synaptic weights are of the form

J^{j→i}_N = β α_{i,j} / N,

where β is a scaling parameter that calibrates the global connectivity of the network and α_{i,j} represents some variability in the synaptic weight between neurons i and j. The factor 1/N is typical of a mean field model. The main feature is that we allow the weights (α_{i,j})_{i,j=1,...,N} to be random, in which case the tuples (X^i_0)_{i=1,...,N}, ((W^i_t)_{t≥0})_{i=1,...,N} and (α_{i,j})_{i,j=1,...,N} are required to be independent. The idea below is to study the behavior of the system, when N is large, for different kinds of randomness in α_{i,j}.

Mean field limit
The time integrated version of the dynamics may be put in the more accessible form:

X^i_t = X^i_0 + ∫_0^t b(X^i_s) ds + Σ_{j=1}^N J^{j→i}_N M^j_t + W^i_t − (X_F − X_R) M^i_t,    (1)

where M^j_t denotes the number of spikes of neuron j up to time t. The case J^{j→i}_N = α/N, for a deterministic factor α, has already been addressed in several papers, among which [1-3, 6, 7, 18, 22]. In all these references, the authors have devoted much effort to the asymptotic regime N → ∞.
Because of the normalization factor α/N in the interaction term and by independence of the various sources of noise, we may indeed expect from standard results on large particle systems with mean field interaction, see for instance [24], that neurons become asymptotically independent, each of them solving in the limit a nonlinear stochastic differential equation in which the interaction occurs through the common theoretical distribution of the network. In the limiting network, the membrane potential (X_t)_t of a representative neuron should evolve according to the equation:

X_t = X_0 + ∫_0^t b(X_s) ds + α E[M_t] + W_t − (X_F − X_R) M_t,    (2)

where W = (W_t)_t is the noise proper to the representative neuron, (τ_k)_k stands for its own spike train and M_t = Σ_{k∈N*} 1_{[0,t]}(τ_k) is the associated counter.
The passage from (1) to (2) is usually referred to as propagation of chaos in the literature. While it may be simpler to justify for systems with regular interactions, it turns out to be a much more challenging problem in the current framework because of the instantaneity of the interactions between the neurons. In fact, even equation (2) itself -which should be called a McKean-Vlasov equation- is a difficult object to study, especially in the excitatory regime α > 0. In order to address its well-posedness, two approaches are conceivable, a PDE one and a probabilistic one. The PDE approach is based upon the description, through the so-called Fokker-Planck equation, of the evolution of the marginal law of the potential. We refer to [1-3]. The probabilistic route is to analyze equation (2) directly, see [6,7] and the more recent contributions [15,19]. In fact, both approaches face the same difficulty: whenever the interaction parameter α is too large, the system may blow up in some sense. Heuristically, a blow-up occurs when a large proportion of neurons in the finite network spike at the same time; in the limit regime, a blow-up occurs when the time derivative of the mean field term in (2) is infinite. It is proven in [1,6] that a blow-up appears if the mass of the initial distribution is too concentrated near the firing threshold X_F and, conversely, that no blow-up appears if the initial mass is sufficiently far away from X_F. While global existence of solutions with a blow-up is addressed in [7], local uniqueness results are shown in [6,15,19]: in these latter references, successive improvements are obtained on the length of the interval on which uniqueness is known to hold.
Blow-up does not only matter for the limiting mean field equation. Going back to the particle system, we understand that the difference between the two values X_F and X_R is certainly important because it must influence, among other things, the typical delay between two consecutive spikes. In particular, when α is large in comparison with X_F − X_R, a neuron may spike at a high frequency because of the pulses received from the others. This stylized fact is at the core of our analysis. Still, the reader must be aware that, in biological models, an additional refractory period is usually added, during which a neuron is somehow locked after a spike. Here we shall not include such a refractory period; of course, it would call for further investigation.
Whilst the case J^{j→i}_N = α/N has received much attention in the literature, the case of random synaptic weights has received limited attention. Our precise objective is to understand how randomness in the synaptic weights may impact the passage from (1) to (2) and how it may impact the emergence of a blow-up in the system. In this regard, it is worth noticing that another form of inhomogeneous connection is addressed in [20]: therein, a finite network of distinct infinite populations is studied; inhomogeneous weights are used to describe the connections between populations.

Model with random synaptic weights
Throughout the text, we will take X_F = 1 and X_R = 0. This permits us to rewrite equation (1) for the network of N ∈ N neurons in the following more concise form:

X^i_t = X^i_0 + ∫_0^t b(X^i_s) ds + Σ_{j=1}^N J^{j→i}_N M^j_t + W^i_t − M^i_t,    (3)

where the process M^i_t counts the number of times that the potential of neuron i has reached the threshold value X_F = 1; formally, we may write

M^i_t = Σ_{k∈N*} 1_{[0,t]}(τ^i_k),

with τ^i_k standing for the random time at which the process X^i reaches the threshold for the k-th time. As already announced, we will discuss the behavior of the system for different types of variables {J^{j→i}_N}_{i,j=1,...,N}. In particular, we will address the case J^{j→i}_N = β α_{i,j}/N, with β > 0 constant and:
(1) (α_{i,j})_{i,j=1,...,N} i.i.d. Bernoulli variables of parameter p;
(2) α_{i,j} = α_i α_j, with (α_i)_{i=1,...,N} i.i.d. Bernoulli variables of parameter p;
(3) α_{i,j} = p α_i, with (α_i)_{i=1,...,N} i.i.d. Bernoulli variables of parameter p;
(4) (α_{i,j})_{i,j=1,...,N} i.i.d. Bernoulli variables of parameter p_N depending on N.
The note is organized as follows. We provide some theoretical results in Section 2. Section 3 is dedicated to the presentation of an algorithm for simulating (a variant of) the particle system with a large value of N. Throughout the text, all the Brownian motions are normalized, meaning that their volatility is 1.
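To fix ideas, here is a hedged numerical sketch of the particle system (3) with zero drift, weights J^{j→i} = β α_{i,j}/N and i.i.d. Bernoulli(p) connections (case (1) in Subsection 1.3). The within-step cascade treatment and all parameter values are our own simplifying choices, not the algorithm of Section 3.

```python
import math
import random

def simulate_network(n=100, beta=1.0, p=0.5, t_max=1.0, dt=1e-3, seed=1):
    """Euler sketch of system (3) with X_F = 1, X_R = 0, zero drift and
    synaptic weights J = beta * alpha_ij / n, alpha_ij ~ Bernoulli(p).
    Spikes triggered inside a time step are resolved by a bounded cascade."""
    rng = random.Random(seed)
    alpha = [[1 if rng.random() < p else 0 for _ in range(n)] for _ in range(n)]
    x = [rng.uniform(0.0, 0.5) for _ in range(n)]   # initial data below threshold
    spikes = [0] * n
    sqrt_dt = math.sqrt(dt)
    for _ in range(int(t_max / dt)):
        for i in range(n):
            x[i] += sqrt_dt * rng.gauss(0.0, 1.0)   # zero drift, unit volatility
        for _ in range(10 * n):                      # bounded spike cascade
            gamma = [i for i in range(n) if x[i] >= 1.0]
            if not gamma:
                break
            for i in gamma:
                x[i] = 0.0                           # reset to X_R = 0
                spikes[i] += 1
            for i in range(n):                       # deliver the pulses
                x[i] += beta / n * sum(alpha[i][j] for j in gamma if j != i)
    return spikes
```

The cap on cascade rounds is a pragmatic guard against the "infinite cascade" scenario discussed later in the note.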

Mathematical inquiries
This section is dedicated to a theoretical analysis of (3), choosing X F = 1 as firing threshold and X R = 0 as reset potential. Throughout the section, we discuss (sometimes in a pretty informal way) the form of the limiting mean field equation together with the existence of a blow-up, as defined in the previous section.

Independent Random Weights: α_{i,j} ∼ B(p)
Let us consider the case where the connections are i.i.d. Bernoulli variables, with a parameter p that is independent of N, namely α_{i,j} ∼ B(p) (this is case (1) in Subsection 1.3). From a modeling point of view, this amounts to saying that the neurons are connected through the realization of an Erdös-Renyi graph. The first purpose is to conjecture what the limit equation is, or equivalently the analogue of (2), for the network.
Observe first that, whenever p = 1, the synaptic weight takes the form J^{j→i} = β/N, which exactly fits the case addressed in (2). By a heuristic calculation, we expect that, in the more general case p ∈ (0, 1], the limiting mean field term manifests in the form β p E[M_t]. Intuitively, the reason is that, as N → ∞, not only should the neurons become asymptotically independent of one another, but they should also become independent of the synaptic weights. Indeed, a given neuron i feels less and less the values of each α_{i,j} as N grows; this means that, applying a law of large numbers, the interaction term in (3) should get closer and closer to β p E[M_t]. So, as a limit equation for the network, we expect the following (at least in the sub-threshold regime):

X_t = X_0 + ∫_0^t b(X_s) ds + β p E[M_t] + W_t − M_t,    (4)

with M_t = Σ_{k∈N*} 1_{[0,t]}(τ_k). In other words, the Bernoulli parameter should scale as a linear factor in the intensity of the interaction. In particular, if our derivation is correct, the limiting equation fits the framework considered in [1-3, 6, 7]. We will come back to this point below.
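The law-of-large-numbers step behind this heuristic can be checked numerically: the average weight β α_{i,j} seen by one neuron concentrates around β p as N grows. A toy check with illustrative values:

```python
import random

def empirical_row_mean(n, p, beta=1.0, seed=0):
    """Average synaptic weight (times N) seen by one neuron:
    (beta/n) * sum_j alpha_1j, with alpha_1j ~ Bernoulli(p)."""
    rng = random.Random(seed)
    return beta * sum(1 for _ in range(n) if rng.random() < p) / n
```

For n of the order of 10^5 and p = 0.3, the returned value is within about 0.01 of β p, illustrating why the interaction term is expected to stabilize around β p E[M_t].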
We now aim to justify the conjectured equation more rigorously. To do so, we propose the following argument based on a simple model for propagation of chaos.

On the Derivation of the limit equation in a toy model
To get an idea of the behavior of particles in our setting, we consider the simpler model in which the processes (X^i)_i satisfy the following system of equations:

X^i_t = x_0 + ∫_0^t (1/N) Σ_{j=1}^N α_{i,j} X^j_s ds + W^i_t,    (5)

where (α_{i,j})_{i,j=1,...,N} is a family of i.i.d. bounded random variables and (W^i)_i is a family of independent standard Brownian motions. The families (α_{i,j})_{i,j} and (W^i)_i are assumed to be independent. For simplicity, we do not address the influence of the threshold and do as if X_F were equal to +∞. Also, we take the same deterministic initial condition X^i_0 = x_0 for all the particles. We use the coupling technique introduced in [24]. That is, we want to show convergence of (5) to

X̄_t = x_0 + ∫_0^t E[α X̄_s] ds + W_t,    (6)

where α is another random variable with the same distribution as each α_{i,j} and W is a standard Brownian motion. Although it is not needed to state the equation, we assume for convenience that α and W are independent and that (α, W) is independent of the family ((α_{i,j})_{i,j}, (W^i)_i). To prove the convergence stated above, let us also introduce, for each i ∈ {1, ..., N}, the SDE:

X̄^i_t = x_0 + ∫_0^t E[α X̄^i_s] ds + W^i_t,    (7)

and let us follow the strategy consisting in comparing (6) with (7) and (7) with (5). As for (6) with (7), we notice that X̄ and X̄^i have the same law; using the independence of α and X̄^i, we also know that E[α X̄^i_s] = E[α] E[X̄^i_s] = E[α X̄_s]. We therefore just need to compare X^i and X̄^i. For the latter point, we observe that the processes (X̄, (X̄^i)_i) are independent of the family (α_{i,j})_{i,j}. Subtracting (7) from (5),

|X^i_t − X̄^i_t| ≤ ∫_0^t (1/N) Σ_{j=1}^N |α_{i,j}| |X^j_s − X̄^j_s| ds + ∫_0^t |(1/N) Σ_{j=1}^N α_{i,j} X̄^j_s − E[α X̄_s]| ds.

Since α_{i,j} is bounded by a constant C, summing over i, dividing by N and defining δ_t := (1/N) Σ_{i=1}^N |X^i_t − X̄^i_t|, we get

δ_t ≤ C ∫_0^t δ_s ds + (1/N) Σ_{i=1}^N ∫_0^t |(1/N) Σ_{j=1}^N α_{i,j} X̄^j_s − E[α X̄_s]| ds.

By using Gronwall's lemma, we then deduce that, for t in a fixed interval [0, T],

δ_t ≤ e^{CT} (1/N) Σ_{i=1}^N ∫_0^T |(1/N) Σ_{j=1}^N α_{i,j} X̄^j_s − E[α X̄_s]| ds.

Taking the expectation, it is now useful to investigate the second moment of α_{i,j} X̄^j_s − E[α X̄_s], which is a finite value, say C_2^2.
Hence, by independence of (α_{i,j})_{i,j} and (X̄^j)_j, we get:

E|(1/N) Σ_{j=1}^N α_{i,j} X̄^j_s − E[α X̄_s]|² = C_2²/N,

so that E[δ_t] ≤ C_2 T e^{CT}/√N on [0, T]. Since it holds that

W_1((1/N) Σ_{i=1}^N δ_{X^i_t}, (1/N) Σ_{i=1}^N δ_{X̄^i_t}) ≤ δ_t,

where W_1(·, ·) is the 1-Wasserstein distance defined as W_1(µ, ν) = inf_π ∫_{R²} |x − y| dπ(x, y), the infimum being taken over all the probability measures π on R² that have µ and ν as marginal distributions, we have also proved that, for any T > 0,

sup_{0≤t≤T} E[W_1((1/N) Σ_{i=1}^N δ_{X^i_t}, (1/N) Σ_{i=1}^N δ_{X̄^i_t})] ≤ C_2 T e^{CT}/√N.

By the law of large numbers, the empirical measure of the i.i.d. processes (X̄^i_t)_i converges to the law of X̄_t, and this implies that the empirical measure of the particle system converges to the law of the solution of the limit equation, which is nothing but propagation of chaos.
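For empirical measures on the real line, the 1-Wasserstein distance defined above can be computed exactly by sorting the samples, which makes propagation-of-chaos estimates easy to monitor in simulations. This is a generic sketch, not tied to the note's model:

```python
def w1_empirical(xs, ys):
    """1-Wasserstein distance between two empirical measures with the
    same number of atoms: on R it reduces to the mean absolute
    difference of the sorted samples (order statistics coupling)."""
    assert len(xs) == len(ys)
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)
```

For instance, the distance between the uniform empirical measures on {0, 1} and {0.5, 1.5} is 0.5, and the distance between two samples with the same atoms is 0.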

Blow-up Argument
We now address the blow-up phenomenon for (4). Since (4) is similar, up to the scaling factor p, to the equation investigated in [1,6], we expect the same picture: If the initial condition is too close to X F = 1, then a blow-up occurs; if it is sufficiently far away, then it cannot occur. Here we show that a blow-up indeed exists if X 0 = x 0 for some deterministic x 0 that is close enough to 1. Our proof is based upon the probabilistic approach, but, in fact, it consists of a reformulation of the arguments used by Cáceres et al. in [1]. Below, we use repeatedly the notation e(t) := E[M t ].
Theorem 1. Assume that the drift coefficient satisfies, for some λ > 0,

b(x) ≥ −λ x, for all x ≤ 1.

If the (deterministic) initial condition X_0 = x_0 < 1 is sufficiently close to the threshold 1, there is no global in time solution of the SDE

X_t = X_0 + ∫_0^t b(X_s) ds + β p e(t) + W_t − M_t.    (8)

Proof. The idea is to retrace the proof in [1]. Assume indeed that we have a global in time solution (X_t)_{t≥0} to (8) with a differentiable mean counter e. The object of our study is then

F_µ(t) := E[ϕ(X_t)], where ϕ(x) := e^{µx} for some µ > 0.

Now we apply Itô's formula to ϕ(X_t) and take expectations. Since a spike resets the potential from 1 to 0, each spike makes ϕ(X_t) jump by 1 − e^µ, and we obtain:

F_µ'(t) = E[µ e^{µX_t} b(X_t)] + (µ²/2) F_µ(t) + µ β p e'(t) F_µ(t) − (e^µ − 1) e'(t).

Then, using the hypothesis on b together with the bound x e^{µx} ≤ e^{µx} for x ≤ 1, we get that

F_µ'(t) ≥ µ(−λ + µ/2) F_µ(t) + e'(t) (µ β p F_µ(t) − (e^µ − 1)).    (9)

Define λ̃ := (e^µ − 1)/(µ β p) and let us choose µ such that −λ + µ/2 > 0. It suffices to define the first time T := inf{t ≥ 0 : F_µ(t) < λ̃} and to observe from the above inequality that, whenever F_µ(0) ≥ λ̃, F_µ is non-decreasing on [0, T), so that T = +∞. Coming back to (9), and using e'(t) ≥ 0 together with F_µ(t) ≥ λ̃ for all t ≥ 0, we get that

F_µ(t) ≥ F_µ(0) exp(µ(−λ + µ/2) t) → +∞ as t → ∞.    (10)

But we know that, since X_t ≤ 1,

F_µ(t) = E[e^{µX_t}] ≤ e^µ.    (11)

From (10) and (11) we get a contradiction in the following sense: if we have a global in time solution (X_t)_{t≥0} to (8) with a differentiable mean counter e and with an initial condition that satisfies F_µ(0) ≥ λ̃, then we get (10) and (11). As the latter two cannot be true at the same time, we deduce that, whenever F_µ(0) ≥ λ̃, there is no global in time solution to (8).
Remark 2. In summary, for a fixed (β, p), the blow-up phenomenon holds for initial conditions that are concentrated around the threshold, namely for initial conditions such that F_µ(0) ≥ (e^µ − 1)/(µ β p) for some µ with −λ + µ/2 > 0. Since we assumed for simplicity that the initial condition is deterministic, we know that F_µ(0) = exp(µ x_0). So, we have a blow-up for all p such that:

p ≥ (e^µ − 1)/(µ β e^{µ x_0}), for some µ with −λ + µ/2 > 0.

In Figure 1 we present the upper bound for the threshold p^+_c, varying with the parameter of the drift λ and the initial condition x_0. The parameter β is taken equal to 1. Of course, values of p^+_c(x_0) that are above 1 are irrelevant, in the sense that there cannot be any blow-up in those cases. Let us focus on a particular case, in the absence of drift, so λ = 0, with deterministic initial condition x_0 = 0.8 and β = 1. From the previous remark, we get that p^+_c ∼ 0.539. Moreover, by Theorem 5.2 in [6], we get that p^-_c ∼ 0.1, where p^-_c is a lower bound under which there is no blow-up. From simulations, we conjecture that there exists in fact a critical threshold p_c(x_0) ∈ (p^-_c(x_0), p^+_c(x_0)) under which there is no blow-up and above which there is a blow-up; numerically, the threshold is around 0.385 (in red in Figure 1, the result obtained by numerics). In the more general case, where β ≠ 1, the threshold for p is simply rescaled by the factor 1/β.
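Under our reading of Remark 2, the upper threshold is p^+_c(x_0) = inf over admissible µ of (e^µ − 1)/(µ β e^{µ x_0}). A crude grid search over µ reproduces the value ≈ 0.539 quoted for λ = 0, x_0 = 0.8, β = 1 (hedged sketch; the grid bounds and resolution are arbitrary choices of ours):

```python
import math

def p_plus_c(x0, beta=1.0, lam=0.0, mu_max=50.0, steps=100000):
    """Upper critical connectivity: minimize the blow-up condition
    (e^mu - 1) / (mu * beta * e^(mu*x0)) over mu with -lam + mu/2 > 0,
    i.e. mu > 2*lam, by brute-force grid search."""
    best = float("inf")
    for k in range(1, steps + 1):
        mu = 2.0 * lam + k * (mu_max - 2.0 * lam) / steps
        best = min(best, (math.exp(mu) - 1.0) / (mu * beta * math.exp(mu * x0)))
    return best
```

For x_0 = 0.8, β = 1 and λ = 0, the minimum is attained near µ ≈ 4.8 and equals roughly 0.540, consistent with the value p^+_c ∼ 0.539 reported in the note.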

Dependent Random Weights
In this section, we present a short analysis of the behavior of the network of neurons when the synaptic weights are dependent. In particular, we focus on two different forms of random connections.
First, we focus on the case when α_{i,j} = α_i α_j, where (α_i)_{i=1,...,N} are independent Bernoulli random variables with the same parameter p, in which case the dynamics of the system read (this is case (2) in Subsection 1.3):

X^i_t = X^i_0 + ∫_0^t b(X^i_s) ds + (β/N) Σ_{j=1}^N α_i α_j M^j_t + W^i_t − M^i_t.    (12)

Our first purpose is to find out the limit equation. We conjecture that the limit equation for this system is the following:

X_t = X_0 + ∫_0^t b(X_s) ds + α β E[α M_t] + W_t − M_t,    (13)

where α is another Bernoulli random variable of parameter p and W is an independent Brownian motion. The initial condition X_0 has the same distribution as any of the X^i_0's and is independent of the pair (α, W). Importantly, (13) no longer reads as a rescaled version of (2). The reason is that the processes X^i and M^i should not become independent of α_i, even if the network size tends to infinity. This is the main rationale for investigating this example.
Equation (13), in fact, defines a network that consists of two different components. The first corresponds to those neurons for which α = 1; it is a complete sub-network, composed of neurons which are all connected among themselves, so that neurons of this sub-network feel an interaction with all the other neurons of the same sub-network. The second component is made of isolated neurons, for which α = 0. As such, the limit equation captures the duality of the population, and the parameter p describes the relative size of the connected sub-network within the whole population. Noticeably, neurons in the asymptotic connected sub-network evolve according to a rescaled version of (2), see Remark 3 right below.
The second case we would like to deal with is the one with α_{i,j} = p α_i, where again (α_i)_i is an i.i.d. family of Bernoulli random variables of parameter p (this is case (3) in Subsection 1.3). The rationale for this choice is that α_{i,j} here reads as the limit of (1/N) α_i Σ_{j=1}^N α_j, which prompts us below to compare this example with the previous one. In this case as well, we have a network of neurons with two different populations: one formed by isolated neurons and the other one composed of neurons that interact with one another with the same deterministic synaptic weight p. The dynamics of the particle system read:

X^i_t = X^i_0 + ∫_0^t b(X^i_s) ds + (β p/N) Σ_{j=1}^N α_i M^j_t + W^i_t − M^i_t.    (14)

Using the fact that p = E[α], our guess is that the limit mean field equation should be given by:

X_t = X_0 + ∫_0^t b(X_s) ds + α β p E[M_t] + W_t − M_t.    (15)

Remark 3. In this remark, we try to make a first comparison between (13) and (15). As for (13), we observe that

E[α M_t] = p E[M_t | α = 1],

which really says that, on the event α = 1, (13) behaves like (4).
Regarding (15), we have:

E[M_t] = p E[M_t | α = 1] + (1 − p) E[M_t | α = 0].    (16)

As we expect the activity of the isolated neurons to be lower than the activity of the connected ones, our guess is that the leading term in (16) for causing a blow-up is p E[M_t | α = 1]. So if we denote by p¹_c the critical value of p (which should depend on the initial condition) for the occurrence of a blow-up in (13) and similarly by p²_c the critical value in (15), we can expect that p²_c ≈ p¹_c. Of course, we make the conjecture, based on the final discussion of the previous section, that there is indeed, for each model, a critical value of the connectivity p under which there is no blow-up and above which there is a blow-up.

Derivation of the limit equations in the toy model
Following Subsection 2.2, we here justify the mean field term manifesting in (13) by focusing on a toy example. To make it clear, instead of rigorously studying the limit behavior of (12) (with Bernoulli variables of parameter p), we consider the simpler model introduced in Subsection 2.2:

X^i_t = x_0 + ∫_0^t (1/N) Σ_{j=1}^N α_i α_j X^j_s ds + W^i_t,

where X^i_0 = x_0 is deterministic. We also introduce, following the notation already used, the processes X̄^i described by

X̄^i_t = x_0 + ∫_0^t α_i E[α X̄^i_s] ds + W^i_t.

Since X^i and X̄^i have the same initial conditions, taking absolute values (and using α_i α_j ≤ α_j together with α_i ≤ 1), summing over i and dividing by N, we can assert that, with δ̃_t := (1/N) Σ_i α_i |X^i_t − X̄^i_t|,

δ̃_t ≤ ∫_0^t δ̃_s ds + (1/N) Σ_i ∫_0^t α_i |(1/N) Σ_j α_j X̄^j_s − E[α X̄_s]| ds.

By using Gronwall's lemma, taking expectations and using the Cauchy-Schwarz inequality, we can manage the last term in the expression above as we did before. In fact, using a standard independence argument, we get

E|(1/N) Σ_j α_j X̄^j_s − E[α X̄_s]|² ≤ C/N,

and we conclude as in Subsection 2.2. The derivation of the toy version of equation (15) from the corresponding toy version of the network of interacting neurons (14) follows from a similar argument.

Blow-up Argument
We now provide analogues of Theorem 1 for (13) and (15). Let us first consider the second of these two cases, i.e., α_{i,j} = p α_i, with (α_i)_i independent Bernoulli random variables of parameter p.
Theorem 4. Assume that the drift coefficient satisfies, for some λ > 0, the same hypothesis as in Theorem 1. If the (deterministic) initial condition X_0 = x_0 < 1 is sufficiently near the threshold 1, there is no global in time solution of the SDE

X_t = X_0 + ∫_0^t b(X_s) ds + α β p e(t) + W_t − M_t, with e(t) := E[M_t].

Proof. The idea of the proof is very close to that of Theorem 1. We want to find an initial condition that leads to a contradiction if we assume that t → e(t) = E[M_t] is differentiable. However, the proof slightly differs since we here focus on the quantity F^α_µ(t) := E[α ϕ(X_t)], with ϕ(x) := e^{µx}. Assuming that e is indeed differentiable, applying Itô's formula to (ϕ(X_t))_{t≥0}, multiplying by α and taking the expectation, we get

(F^α_µ)'(t) = E[α µ e^{µX_t} b(X_t)] + (µ²/2) F^α_µ(t) + µ β p e'(t) F^α_µ(t) + (1 − e^µ) (d/dt) E[α M_t].

The random variable α takes values in {0, 1} and M_t is non-negative and non-decreasing, which makes it possible to state that (d/dt) E[α M_t] ≤ e'(t), whence

(F^α_µ)'(t) ≥ µ(−λ + µ/2) F^α_µ(t) + e'(t) (µ β p F^α_µ(t) − (e^µ − 1)).

Now, the proof exactly matches that of Theorem 1. Choose indeed µ such that −λ + µ/2 > 0 and take the initial condition close enough to the threshold 1 so that

F^α_µ(0) ≥ (e^µ − 1)/(µ β p).

We then get F^α_µ(t) ≥ F^α_µ(0) exp(µ(−λ + µ/2) t) → +∞. This is a contradiction because X_t ≤ 1 and α ∈ {0, 1}, which means that F^α_µ(t) ≤ e^µ. Assuming for simplicity that the initial condition is deterministic, we get F^α_µ(0) = p e^{µ x_0}. So, given a deterministic initial condition, we have a blow-up when p is such that:

p² ≥ (e^µ − 1)/(µ β e^{µ x_0}), for some µ with −λ + µ/2 > 0.

In Figure 2, we present the upper bound for the threshold p^{+,2}_c, varying with the parameter of the drift λ and the initial condition x_0.
Consider now the case α_{i,j} = α_i α_j, with (α_i)_i independent Bernoulli random variables of parameter p.
Theorem 6. Assume that the drift coefficient satisfies, for some λ > 0, the same hypothesis as in Theorem 1. If the (deterministic) initial condition X_0 = x_0 < 1 is sufficiently near the threshold 1, there is no global in time solution of the SDE

X_t = X_0 + ∫_0^t b(X_s) ds + α β e_α(t) + W_t − M_t, with e_α(t) := E[α M_t].

Proof. The strategy is almost the same as in the previous case; the main tool is again the function F^α_µ(t) := E[α ϕ(X_t)]. By Itô's formula:

(F^α_µ)'(t) = E[α µ e^{µX_t} b(X_t)] + (µ²/2) F^α_µ(t) + µ β e'_α(t) F^α_µ(t) + (1 − e^µ) e'_α(t),

with e_α(t) = E[α M_t]. With the same calculation as in the previous case, we get

(F^α_µ)'(t) ≥ µ(−λ + µ/2) F^α_µ(t) + e'_α(t) (µ β F^α_µ(t) − (e^µ − 1)).

The only differences with respect to the previous case are that we have e_α(t) instead of e(t) and that there is no p in the third term.
Remark 7. As before, there is a blow-up when F^α_µ(0) ≥ (e^µ − 1)/(µ β) for some µ with −λ + µ/2 > 0. Assuming for simplicity that the initial condition is deterministic, we get F^α_µ(0) = p e^{µ x_0}. So, given a deterministic initial condition, we get a blow-up when p is such that:

p ≥ (e^µ − 1)/(µ β e^{µ x_0}), for some µ with −λ + µ/2 > 0.

Comparison between the two models
In this section we continue to investigate the differences between the two cases (2) and (3) in Subsection 1.3. We here consider the following SDEs:

X^i_t = X^i_0 + ∫_0^t b(X^i_s) ds + (β/N) Σ_{j=1}^N α_i α_j M^j_t + W^i_t − M^i_t,
X^i_t = X^i_0 + ∫_0^t b(X^i_s) ds + (β p/N) Σ_{j=1}^N α_i M^j_t + W^i_t − M^i_t,    (17)

with the usual definitions. Recall that our guess is that the two systems converge to the limit SDEs (compare with (13) and (15)):

X_t = X_0 + ∫_0^t b(X_s) ds + α β E[α M_t] + W_t − M_t,
X_t = X_0 + ∫_0^t b(X_s) ds + α β p E[M_t] + W_t − M_t,

again with the usual definitions. Using the same conditioning argument as in Remark 3, these two last SDEs can easily be rewritten as:

X_t = X_0 + ∫_0^t b(X_s) ds + α β p E[M_t | α = 1] + W_t − M_t,
X_t = X_0 + ∫_0^t b(X_s) ds + α β p (p E[M_t | α = 1] + (1 − p) E[M_t | α = 0]) + W_t − M_t.

Assuming, as we did to compute thresholds numerically, that the initial condition is deterministic, i.e., X_0 = x_0, and that b is zero, we finally get:

X_t = x_0 + α β p (p E[M_t | α = 1] + (1 − p) E[M_t | α = 0]) + W_t − M_t,    (18)
X_t = x_0 + α β p E[M_t | α = 1] + W_t − M_t.    (19)

We now study equation (18) and compare it to (19). First, let us consider the term E[M_t | α = 0]: this is the expectation of the process M_t given the fact that the neuron is not concerned by the interaction. The good point is that the limiting model without interaction can be solved explicitly. Namely, we can identify E[M_t | α = 0] with e_0(t) := E[M̃_t], where M̃_t solves the SDE:

X̃_t = x_0 + W_t − M̃_t, that is, M̃_t = ⌊sup_{s≤t} (x_0 + W_s)⌋,

where ⌊y⌋ is the floor part of y. Calling f_t the density of the running maximum of Brownian motion until time t, we get:

e_0(t) = Σ_{k≥1} P(sup_{s≤t} W_s ≥ k − x_0) = Σ_{k≥1} ∫_{k−x_0}^∞ f_t(x) dx.

Recalling that f_t(x) = √(2/(πt)) exp(−x²/(2t)) 1_{x≥0}, and observing that, since x_0 < 1, −x_0 + k is positive for every k ≥ 1, we see that the function e_0 is differentiable and that its derivative is

e'_0(t) = Σ_{k≥1} ((k − x_0)/(√(2π) t^{3/2})) exp(−(k − x_0)²/(2t)).
Hence, summing the above series, we obtain that e_0(t) is finite for every t and that e_0 is bounded on compact time intervals; that is, for a finite constant C_{x_0,T},

sup_{0≤t≤T} e_0(t) ≤ C_{x_0,T}.
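The series for e_0 can be evaluated directly, since the reflection principle gives P(sup_{s≤t} W_s ≥ a) = erfc(a/√(2t)). A small sketch truncating the series (x_0 = 0.8 as in the numerical example of the note; the truncation level is our own choice):

```python
import math

def e0(t, x0=0.8, kmax=50):
    """Mean spike counter of an isolated neuron (b = 0):
    e0(t) = sum_{k>=1} P(sup_{s<=t} W_s >= k - x0),
    with each tail probability given by the reflection principle."""
    if t <= 0:
        return 0.0
    return sum(math.erfc((k - x0) / math.sqrt(2.0 * t))
               for k in range(1, kmax + 1))
```

The Gaussian decay of the summands means that only the first few terms contribute for moderate t, which is also why e_0 is bounded on compact time intervals.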
We now come back to (18) and (19). We can rewrite (18) in the form:

X_t = x_0 + α β p² E[M_t | α = 1] + α β p (1 − p) e_0(t) + W_t − M_t,    (20)

which shows that, in comparison with (19), there is not only an additional factor p in the conditional mean field term given the event α = 1, but there is also an additional drift involving e_0. In other words, if we omit the term α β p (1 − p) e_0(t) in (20), we recover (19), but with p replaced by p². Intuitively, this says that, if e_0 were equal to 0, the two critical values would be related through this change of variable. In fact, e_0 is positive: it helps push the particles towards the threshold. As a result, it makes sense to expect p²_c < p¹_c. Fortunately, this is exactly what shows up in the numerical analysis performed at the end of Subsection 2.6.

Independent Bernoulli random variables with parameter p_N
We now address case (4) in Subsection 1.3, namely we deal with a neural network described by the equation

X^i_t = X^i_0 + ∫_0^t b(X^i_s) ds + (β/(p_N N)) Σ_{j=1}^N α^{i,j}_N M^j_t + W^i_t − M^i_t,

where the α^{i,j}_N are i.i.d. Bernoulli random variables with parameter p_N depending on N, the total number of neurons in the network. We are interested in three cases:
• p_N · N = log^{1/2}(N);
• p_N · N = log(N);
• p_N · N = log²(N).
The reason why we choose these three parameters follows from the Erdös-Renyi theory of random graphs, see [9,10] for undirected graphs and [14,23] for directed graphs. We indeed know that p_N = log(N)/N is a sharp threshold for connectedness, meaning that:
• if p_N < (1 − ε) log(N)/N, then the probability that the graph has a unique connected component tends to 0 as N goes to infinity;
• if p_N > (1 + ε) log(N)/N, then the probability that the graph has a unique connected component tends to 1 as N goes to infinity.
In contrast with more standard particle systems like those addressed in [24] or (say) of the same form as in (5), it is here possible to have several neurons hitting the firing potential at the same instant of time and, most of all, it is even possible to have a neuron spiking more than once at the same instant of time, a fact that may be excluded in other models of neural networks, see for instance [6]. As exemplified in [4,7], this requires a sequential definition of the spikes that may occur at the same time, given by an induction with the following initialization:

Γ_0 := {i ∈ {1, ..., N} : X^i_{t−} = 1}.

We say that a spike occurs at time t if Γ_0 ≠ ∅. Because of the interactions between the neurons, neurons in the set Γ_0 may force the others to jump at the same time t. This happens for a neuron i ∉ Γ_0 if

X^i_{1,t−} := X^i_{t−} + (β/(p_N N)) Σ_{j∈Γ_0} α^{i,j}_N ≥ 1,

which prompts us to let Γ_1 := {i ∈ {1, ..., N} \ Γ_0 : X^i_{1,t−} ≥ 1}. Iteratively, we define for any i ∈ {1, ..., N}:

X^i_{k+1,t−} := X^i_{k,t−} + (β/(p_N N)) Σ_{j∈Γ_k} α^{i,j}_N if i ∉ Γ_k, and X^i_{k+1,t−} := (β/(p_N N)) Σ_{j∈Γ_k} α^{i,j}_N if i ∈ Γ_k (the potential being reset to 0 before the kicks of the current round are received), together with Γ_{k+1} := {i ∈ {1, ..., N} \ Γ_k : X^i_{k+1,t−} ≥ 1}.

And then we let

X^i_t := lim_{k→∞} X^i_{k,t−},    (21)

whenever the limit exists, the number of spikes of neuron i at time t being the number of indices k such that i ∈ Γ_k.
Intuitively, Γ_k is the set of neurons that spike at the k-th iteration, X^i_{k,t−} is the potential of neuron i before the k-th iteration and X^i_{k+1,t−} is the potential after the k-th iteration. With this rule we observe that:
• It is possible that one neuron receives (at a given iteration) a cumulative kick (from all the other particles) that is greater than 1, meaning that the potential of a neuron can jump from a value less than 1 to a value greater than 2. With our definition of X^i_{k+1,t−} for i ∈ Γ_k, we just regard the whole as a single spike.
Still, a neuron can spike several times at the same time:
• It is indeed possible that a neuron i spikes, that its kick makes others spike and that those, in return, make i spike again. This behavior may repeat: when it repeats only a finite number of times, we call it a "finite cascade". In that case, the limit in (21) is well defined.
• It may happen that the cascading behavior just described goes on infinitely many times: we call it an "infinite cascade". In that case, the limit in (21) may not exist.
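The inductive construction of the sets Γ_k translates directly into an algorithm. The sketch below is our own simplified reading of it (thresholded neurons are reset to 0 and then receive the kicks of the current round; a cap on the number of rounds stands in for the detection of an infinite cascade):

```python
def resolve_cascade(x, alpha, beta, p_n, max_rounds=1000):
    """Sequential spike resolution at a fixed time t, following the
    iterative construction of the sets Gamma_k: neurons at or above
    the threshold 1 spike, are reset to 0, and kick the others by
    beta * alpha[i][j] / (p_n * n).  Returns (potentials, spike counts,
    finished), where finished is False if max_rounds was reached
    (a proxy for an 'infinite cascade')."""
    n = len(x)
    x = list(x)
    spikes = [0] * n
    for _ in range(max_rounds):
        gamma = [i for i in range(n) if x[i] >= 1.0]
        if not gamma:
            return x, spikes, True
        for i in gamma:
            x[i] = 0.0          # reset to X_R = 0
            spikes[i] += 1
        kick = beta / (p_n * n)
        for i in range(n):      # deliver this round's kicks
            x[i] += kick * sum(alpha[i][j] for j in gamma if j != i)
    return x, spikes, False
```

For instance, with two mutually connected neurons, kick size 0.5 and initial potentials (1.0, 0.6), the cascade stops after two rounds with one spike each; raising the kick size to 1.0 with initial potentials (1.0, 0.5) produces a cascade that never terminates, the two neurons re-exciting each other forever.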
Below, we choose β < 1. We then try to make a connection between neurons that spike more than once and neurons that have a large degree, where, by definition, the degree d_i of neuron i is d_i := Σ_{j=1}^N α^{i,j}_N. Our guess is based on the fact that, for a neuron that is connected to neurons that do not spike more than once at time t, the only way for it to record a second spike at time t is that its degree is higher than p_N N/β. In particular, the first neurons that record a second spike in the inductive construction of the sets (Γ_k)_k must have a large degree. In this regard, we show below that the probability that a neuron has a high degree gets smaller and smaller as N tends to ∞. We thus conjecture that, for the prescribed values of the parameters, most of the neurons can only spike once at a given time and, hence, that, among those that record a second spike at the same time, most have a large degree.
We are therefore interested in computing

P(d_1 ≥ p_N N/β),    (22)

where the index 1 may be substituted by any other index i.
Remark 8 (Estimate of the upper bound for the probability of multiple spikes). Observing that p_N/β is bigger than p_N and using Cramér's theorem (in the context of large deviation theory), we find that

P(d_1 ≥ p_N N/β) ≤ exp(−N I_N(p_N/β)),    (23)

where I_N is the large deviation rate function of the Bernoulli distribution of parameter p_N:

I_N(x) = x log(x/p_N) + (1 − x) log((1 − x)/(1 − p_N)).

Writing a_N := p_N N = log^k(N), where k is such that p_N = log^k(N)/N (recall that k = 1/2, 1, 2 in the three typical cases we have in mind), we have

N I_N(p_N/β) = a_N (1/β) log(1/β) + N (1 − p_N/β) log((1 − p_N/β)/(1 − p_N)).

Expanding the logarithm in the last term, (1 − p_N/β) log((1 − p_N/β)/(1 − p_N)) = (1 − p_N/β)(p_N − p_N/β + O(p_N²)), so that

N I_N(p_N/β) = a_N [(1/β) log(1/β) + (1 − 1/β)(1 + O(p_N))].

Now, the factor multiplying a_N converges to (1/β) log(1/β) + 1 − 1/β as N → +∞, and the "big O", multiplied by log^k(N), goes to 0 as N → +∞; this means that, for every c < 1,

N I_N(p_N/β) ≥ c a_N [(1/β) log(1/β) + 1 − 1/β]

for N large enough. Coming back to (23) and calling C(c, β) := c [(1/β) log(1/β) + 1 − 1/β], which is positive for β < 1, we have found that, for N large enough,

P(d_1 ≥ p_N N/β) ≤ exp(−C(c, β) log^k(N)) = N^{−C(c,β) log^{k−1}(N)},    (24)

which is especially meaningful when k > 0. Our conjecture is that, for k > 0, this should be a good approximation of the probability of observing a second spike for a single neuron at a given time.
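The Cramér-type bound in (23) can be sanity-checked against the exact binomial tail, using the Bernoulli rate function I(x) = x log(x/p) + (1 − x) log((1 − x)/(1 − p)). The following generic sketch is ours; the sample values of N and β in the usage note are illustrative.

```python
import math

def binom_tail(n, p, m):
    """P(Binomial(n, p) >= m), computed exactly with math.comb."""
    return sum(math.comb(n, k) * p**k * (1.0 - p)**(n - k)
               for k in range(m, n + 1))

def chernoff_bound(n, p, x):
    """Chernoff/Cramer upper bound exp(-n * I(x)) on P(S_n >= n*x),
    valid for x > p, with the Bernoulli rate function I."""
    i = x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))
    return math.exp(-n * i)
```

For example, with N = 1000, p_N = log(N)/N and β = 0.5, the degree threshold p_N N/β ≈ 13.8 rounds up to 14, and the exact tail P(d_1 ≥ 14) stays below the bound exp(−N I_N(14/N)), as it must.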
By letting c tend to 1 in (24), we get in fact the following large deviation upper bound:

lim sup_{N→∞} (1/a_N) log P(d_1 ≥ p_N N/β) ≤ −[(1/β) log(1/β) + 1 − 1/β].

Remark 10.
As for the lower bound, we did not manage to implement the usual tilting method used in the proof of Cramér's theorem. The fact that (a_N)_{N≥0} increases at a rate much smaller than the rate of convergence in the law of large numbers, as given by the central limit theorem, is a hindrance.

Comparison with the Mean Field Case
A crucial question is to decide whether the mean field approximation is still valid in each of the three cases addressed in the preliminary discussion of Subsection 2.8. Observe indeed that, since p_N is assumed to get smaller and smaller as N tends to ∞, it is by no means clear that the arguments used in Subsections 2.2 and 2.5 still apply. So, we consider the network given by a family of i.i.d. Bernoulli synaptic weights (α^{i,j}_N)_{i,j} ~ B(p_N), with p_N as before, and we want to compare numerically the behavior of the particle system driven by the interaction term (26) with the behavior of the particle system driven by the interaction term (27), where (γ^{i,j})_{i,j} ~ B(β) are i.i.d. Bernoulli variables of parameter β. The comparison between the two is motivated by the fact that E[β α^{i,j}_N / p_N] = β. And, for sure, our preliminary investigations have shown that (27) has the same limit behavior as the model driven by the simpler interaction term. We provide, in the left pane of Figure 3, simulations of the interaction terms in both cases, (26) and (27), when p_N = log²(N)/N and β = 0.375. They show that, for N large, both are indeed close. This seems to be numerical evidence that the mean field approximation should be valid when p_N is above the Erdös-Renyi threshold for connectedness of the interaction graph. When p_N = log(N)/N and p_N = log^{1/2}(N)/N, and for the same value of β, the center and right panes in Figure 3 suggest that the mean field approximation is no longer valid below the Erdös-Renyi threshold. These observations seem to be in accordance with the results obtained in [8] for the mean field approximation of another interacting diffusion model (the so-called Kuramoto model) on a random Erdös-Renyi graph: therein, the mean field approximation is shown to hold true if p_N N/log(N) → ∞.
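As a numerical sanity check of the identity E[β α^{i,j}_N / p_N] = β, one may compare the empirical averages of the two kinds of weights. The following Python sketch (the function name, trial count and parameter choices are ours, not from the text) draws both families of Bernoulli variables and compares the per-neuron means:

```python
import numpy as np

rng = np.random.default_rng(0)

def interaction_means(N, k, beta, trials=200):
    """Empirically compare the averages of the two kinds of synaptic weights:
    (beta / p_N) * alpha_ij with alpha_ij ~ B(p_N)   (sparse, (26)-like), vs
    gamma_ij ~ B(beta)                               (dense, (27)-like).
    Both have expectation beta; their sample means should agree for large N."""
    p_N = np.log(N) ** k / N
    # per-trial averages over j of the rescaled sparse weights
    a = rng.binomial(1, p_N, size=(trials, N)).mean(axis=1) * beta / p_N
    # per-trial averages over j of the dense Bernoulli(beta) weights
    g = rng.binomial(1, beta, size=(trials, N)).mean(axis=1)
    return a.mean(), g.mean()
```

For instance, with N = 10^5, k = 2 and β = 0.375, both averages come out close to 0.375, the sparse version fluctuating more from trial to trial, which is consistent with the left pane of Figure 3.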

Introduction
In this section, we present the algorithms used to simulate the particle system (3). We also consider a slightly different model, in which the neurons no longer spike when their membrane potential crosses a fixed deterministic threshold but instead spike with rate f_i(X^i_t) at time t (see Section 3.3 for a precise definition).

Clock-driven vs. event-driven algorithm
The simulation of a very large network of neurons requires a carefully organized program. We have used two different methods.
Clock-driven method. We fix a time step ∆ and approximate the values of the membrane potentials at each time T_0 + ℓ∆ for ℓ ∈ N. This method is quite simple to implement, but not very fast. The choice of the time step ∆ is crucial: it has to be very small compared to the typical length of an InterSpike Interval (ISI). Event-driven method. This method consists in, first, simulating the times at which events occur, and second, updating the state of the network at these times. The advantage is that the clock is automatically fitted to the state of the system. In our algorithm, we have chosen the spiking times as the important events of this method. We have used the clock-driven method to simulate the hard threshold model and the event-driven method for the soft threshold model. In the latter case, our algorithm is an extension of Ogata's algorithm, see [21].
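A minimal sketch of the clock-driven method for a single stochastic LIF neuron may look as follows; the drift b(x) = −λ(x − a) plus a constant input current is an assumption for illustration, and all parameter values are ours:

```python
import numpy as np

def clock_driven_lif(T, dt, lam=1.0, a=0.0, sigma=0.5, x_f=1.0, x_r=0.0,
                     current=0.8, x0=0.0, seed=0):
    """Euler-Maruyama discretization of one stochastic LIF neuron:
      dX_t = (-lam * (X_t - a) + current) dt + sigma dW_t,
    with a hard reset to x_r whenever X reaches the firing value x_f.
    Spikes can only be detected at grid nodes T_0 + l*dt, which is why dt
    must be small compared to the typical InterSpike Interval."""
    rng = np.random.default_rng(seed)
    x, spikes = x0, []
    for step in range(int(T / dt)):
        x += (-lam * (x - a) + current) * dt \
             + sigma * np.sqrt(dt) * rng.standard_normal()
        if x >= x_f:                     # threshold crossed at this node
            spikes.append((step + 1) * dt)
            x = x_r                      # reset and refresh the dynamics
    return spikes
```

Running it over a long horizon produces an increasing sequence of spiking times, from which ISI statistics can be estimated.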

Hard Threshold
We consider the finite system defined by (1), which we rewrite below for the sake of convenience:

Algorithm 1 Pseudo code for the simulation of a network of interacting neurons in case of hard threshold
Initialize the processes X^i and M^i of neuron i, for each i = 1, ..., N
repeat
    simulate the potential of each neuron for one time-step, without considering interactions or resetting
    add every neuron with potential above the threshold to the list of spiking neurons
    for each neuron i still below the threshold do
        simulate the event "a Brownian bridge crosses the threshold" for neuron i
        update the list of spiking neurons
    repeat
        update the entire network by adding the kicks caused by the spikes
        reset the spiking neurons
        update the list of spiking neurons
    until the cascade of spiking neurons has been exhausted
until we have simulated the network for enough time-steps
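The step "simulate the event 'a Brownian bridge crosses the threshold'" relies on the classical closed-form expression for the probability that a Brownian bridge between two sub-threshold values exceeds the level X_F within one time-step. A sketch of this computation (the function name and signature are ours):

```python
import numpy as np

def bridge_crossing_prob(x0, x1, x_f, sigma, dt):
    """Probability that a Brownian bridge joining x0 (at time 0) to x1
    (at time dt), with diffusion coefficient sigma, exceeds the level x_f
    somewhere in between.  If an endpoint is already above x_f, the
    crossing is certain; otherwise the classical closed-form expression
    exp(-2 (x_f - x0)(x_f - x1) / (sigma^2 dt)) applies."""
    if x0 >= x_f or x1 >= x_f:
        return 1.0
    return float(np.exp(-2.0 * (x_f - x0) * (x_f - x1) / (sigma ** 2 * dt)))
```

In Algorithm 1, for each sub-threshold neuron one then draws a uniform variable and declares a hidden crossing when it falls below this probability.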

Soft threshold
We now address a new model, which leads to a new algorithm. We explain below how this new model is connected with the previous one, but, first, we want to make clear the reasons why we introduce it:
• One issue with the model developed previously is that it allows a cascade phenomenon; this turns out to be a hindrance from the numerical point of view. We are thus interested in having a smoother version.
• Moreover, the model studied up to now is not perfectly adapted to biological measurements: one main criticism (to which we already alluded in the introduction) is precisely the fact that spikes are transmitted instantaneously to the post-synaptic neurons. Indeed, in (3), the post-synaptic neuron j receives a kick of size J^{i→j}_N exactly at the spiking time of the pre-synaptic neuron i. If its membrane potential then exceeds the threshold, the neuron j also spikes at the same time.
The idea for solving this issue is to use a point-process based model (see [5,11] for related references), with the following main features: neurons have a probability of spiking that is strictly less than one when their potential reaches the threshold (which explains why the threshold is no longer hard); also, the probability of having two spikes at the exact same time is zero (as long as the potential has a finite value), so there is no longer any possibility of a cascade. A point-process based system can be difficult to simulate, but there exist recipes for simplifying the design of the algorithm. One of them is an algorithm developed by Ogata, see [21], specifically designed for simulating point processes. We make it clear below.
Our soft threshold model is a Markov process based upon the following general rule: the neuron i spikes at time t with rate f_i(X^i_t), that is, given the history up to time t, neuron i spikes in the small interval (t, t + dt] with probability f_i(X^i_t) dt + o(dt). Also, given the history before t, neurons behave independently until a new neuron spikes. In particular, for such a model, two neurons spike precisely at the same instant with probability 0, which explains why there are no more cascades. Between two spikes in the system, the membrane potentials of the neurons evolve independently according to the same diffusive dynamics as in (1); when a neuron spikes, all the membrane potentials receive a kick, as prescribed by the values of the synaptic weights.

Remark 11.
(1) One can think of the hard threshold model as a particular case of the soft threshold model, with rate functions that vanish below the threshold X_F and are infinite above it, in which case the rate functions are the same for all the neurons.
(2) A rate function of the form f(x) = C((x − X_F)_+)^p with a large p ∈ N and a good choice of the constant C should be a good approximation of the hard threshold model. Notice that this intensity function is equal to zero for values of the potential below X_F, the latter still playing the role of a threshold: biological neurons indeed do not spike when their potential is below some threshold (actually it depends on the type of neurons we are dealing with, but this is mostly true for cortical neurons). When the potential exceeds X_F, the probability of spiking quickly increases with the membrane potential, so that the neuron spikes quickly.
The soft threshold model can be rigorously defined as a multivariate point process. Although we do not provide the definition explicitly, we borrow materials from this theory to construct our algorithm.
• First, we know that, given the membrane potentials X^1_t, ..., X^N_t at a time t (t being typically an event of the multivariate point process), the first event t + T after t (or equivalently, the first time after t at which a new neuron spikes) is given by the first occurrence of a new point in a point process of intensity (Σ_{i=1}^N f_i(X^i_{t+θ}))_{θ≥0};
• Second, given T, the label i of the spiking neuron is distributed proportionally to f_i(X^i_{t+T}).
Although this construction is pretty simple, it cannot be implemented directly in the form of a simulation method. So, in practice, we combine the last two points with a rejection procedure. Ideally, it reads as follows:
• For all i = 1, ..., N, we introduce a (predictable) process (f̄^i_{t+θ})_{θ≥0} such that f_i(X^i_{t+θ}) ≤ f̄^i_{t+θ}, for all θ ≥ 0;
• provided we can do so, we simulate, conditional on the history up until time t, the next event (or next point) t + T of a point process with intensity (Σ_{i=1}^N f̄^i_{t+θ})_{θ≥0};
• we choose the label i_0 of the spiking neuron proportionally to f̄^{i_0}_{t+T};
• we simulate the membrane potential X^{i_0}_{t+T} according to the sole diffusive dynamics prescribed before;
• we accept the spike of neuron i_0 at time t + T with probability f_{i_0}(X^{i_0}_{t+T}) / f̄^{i_0}_{t+T}. In this case, we update its membrane potential to its reset value X^{i_0}_{t+T} = X_R and add the kick J^{i_0→j}_N to the post-synaptic neurons j (i.e., those such that J^{i_0→j}_N ≠ 0);
• we then restart the procedure.
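One round of the rejection procedure above may be sketched as follows, under the simplifying assumption that the bounds f̄^i are constant over the current window (the helper names next_spike and drift_step are ours):

```python
import numpy as np

def next_spike(x, fbar, f, drift_step, rng):
    """One round of the thinning (rejection) procedure:
    - draw the next candidate time from a Poisson process of intensity
      sum_i fbar[i], where fbar[i] is a constant bound with
      fbar[i] >= f(x[i]) on the current window;
    - pick the candidate neuron i0 proportionally to fbar;
    - evolve neuron i0 alone up to the candidate time;
    - accept the spike with probability f(x_i0) / fbar[i0]."""
    total = fbar.sum()
    T = rng.exponential(1.0 / total)              # next candidate event time
    i0 = int(rng.choice(len(x), p=fbar / total))  # candidate label
    x_i0 = drift_step(x[i0], T)                   # diffusive update of i0 only
    accepted = rng.random() < f(x_i0) / fbar[i0]
    return T, i0, x_i0, accepted
```

Upon acceptance, the caller resets neuron i_0 to X_R and distributes the kicks; upon rejection, only the clock and the state of neuron i_0 move forward.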
Of course, the choice of the "approximation functions" (f̄^i_{t+θ})_{θ≥0} is crucial, as the next event of the process (Σ_{i=1}^N f̄^i_{t+θ})_{θ≥0} should be easily simulated. In practice, the functions (f̄^i)_{i=1,...,N} are taken piecewise constant. We can even think of requiring the condition f̄^i_{t+θ} ≥ f_i(X^i_{t+θ}) for θ in a small neighborhood of t only, and then of choosing (f̄^i_{t+θ})_{θ≥0} as a random function depending on the sole past of the system before t, in which case the next event t + T is simulated as the next event of a standard Poisson process.
In fact, the condition f̄^i_{t+θ} ≥ f_i(X^i_{t+θ}) may be easily verified when we consider the deterministic part of the dynamics (with the sole drift b and without the Brownian motion), as the values of the process X^i are then easily controlled. However, because of the Brownian part of the dynamics, it is impossible to create a piecewise constant approximation function that dominates f_i(X^i) surely, as the support of a non-degenerate Gaussian random variable is the whole real line (−∞, +∞). Still, the probability that these values are very large is quite small; as a result, we can choose the approximation function in such a way that f̄^i_{t+θ} ≥ f_i(X^i_{t+θ}) with very high probability for θ in the neighborhood of t.
For instance, if we consider that, between two spikes t and t + T, X^i has Ornstein-Uhlenbeck dynamics dX^i_s = −λ_i(X^i_s − a_i) ds + σ_i dW^i_s, with X^i_t as initial condition and a_i as attractor, then we have the following explicit expression for the membrane potential: X^i_{t+θ} = a_i + e^{−λ_i θ}(X^i_t − a_i) + σ_i ∫_t^{t+θ} e^{−λ_i(t+θ−s)} dW^i_s, the stochastic integral being Gaussian with variance (1 − e^{−2λ_i θ})/(2λ_i). In words, the membrane potential follows Gaussian dynamics between two spikes and, using standard confidence intervals for the supremum of Gaussian processes, we can easily cook up the approximation function f̄^i.
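Since the Ornstein-Uhlenbeck transition is Gaussian with explicit mean and variance, the membrane potential between two spikes can be simulated exactly, with no discretization error; a minimal sketch (parameter names are ours):

```python
import numpy as np

def ou_exact_step(x, theta, lam, a, sigma, rng):
    """Exact simulation of the Ornstein-Uhlenbeck transition over a
    duration theta for  dX_s = -lam * (X_s - a) ds + sigma dW_s.
    Given X_t = x, the value at time t + theta is Gaussian with
    mean a + exp(-lam*theta) * (x - a) and
    variance sigma^2 * (1 - exp(-2*lam*theta)) / (2*lam)."""
    mean = a + np.exp(-lam * theta) * (x - a)
    var = sigma ** 2 * (1.0 - np.exp(-2.0 * lam * theta)) / (2.0 * lam)
    return mean + np.sqrt(var) * rng.standard_normal()
```

Since the whole path between two events is Gaussian, standard confidence bounds for its supremum can then be used to build the piecewise constant bound f̄^i.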

Remark 12.
(1) For very large N , we do not updatef i at each spiking times of one neuron. We only modify the value off i0 andf j for neurons j s.t. J i0→j N = 0.
(2) The full network is implicitly updated after a fixed time interval: time is segmented by a coarse grid of scale ∆, and the f̄^i are computed within these intervals. At the end of each of these intervals, the maxima f̄^i are updated. We use this method in order to achieve a trade-off: if ∆ is large, then the maxima f̄^i are large, which means a high computational cost due to the percentage of rejected points, whereas if ∆ is small, then the computational cost is high due to the frequent updates of the full network. The scale ∆ is chosen empirically.

Summary
The algorithm can be found in pseudo-code form in Algorithm 2 and is represented in Figure 4. It can be summarized in three steps: generate a spiking time t, select a neuron, and apply a rejection sampling condition to the spike. This method is applied as many times as necessary to reach a certain condition, typically until a final time is reached or a given number of spikes has been found. The rejection method is represented at step (4) in Figure 4, but several preliminary steps are necessary.
Step (0) in Figure 4 consists in finding a time interval that should contain the next event. This step is actually important for the speed of the algorithm. Events are exponentially separated, with the sum of the approximation functions as parameter, see step (1) in Figure 4: this step gives the time at which the event occurs. To find the neuron responsible for the event, we again use the approximation functions: the bigger the value of f̄^i, the higher the probability that neuron i is the next one to spike.
Once we know which neuron spikes, we update its potential X^i(T_next) (step (3)). In this regard, it is important to remark that, while the event is denoted by T_next at step (1), it is regarded in the end as an event proper to neuron i (step (2) in the figure); we denote it by T_{i,n}.
The value of the potential X^i at time T_{i,n} is needed to compute the rate f_i(X^i(T_{i,n})) and thus to classify the event as a false or a true spike (step (4)). A random variable uniformly distributed between 0 and f̄^i gives the classification: if it is less than f_i(X^i(T_{i,n})), the spike is a true one; if it is greater, it is a false one. If the spike is a true one, the potential of the spiking neuron is reset to X_R, while the potentials of all the post-synaptic (children) neurons are updated to the time of the event T_next. The values of the synaptic weights (J^{i→j})_{j=1,...,N} are then added to their potentials (step (5)) (so that the rates (f_j(X^j_{T_{i,n}}))_{j=1,...,N} may increase). If the spike is a false one, nothing happens.
The algorithm then loops back to step (0) until a given condition (number of spikes reached, time elapsed, etc.) is met.

Comparison between clock-driven and event-driven algorithms
We now argue why the event-driven algorithm may be more advantageous than the clock-driven one. We first recall that the clock-driven method is based upon a time discretization: the state of the system is approximated at discrete times of the form (T_0 + ℓ∆)_{ℓ∈N}, for a given time step ∆. The advantage is pretty clear: the clock-driven method may be easier to implement than the event-driven one. Still, it has a major drawback: the time mesh disregards the system's own clock; the former is indeed uniform, whilst the latter is deeply heterogeneous, since spikes may occur much more frequently during certain periods. In other words, the time mesh chosen in the clock-driven method may not be adapted to the dynamics of the system (we did not explore adaptive time meshes, since the event-driven method is automatically adapted). As a result, it may be harder for the clock-driven algorithm to capture the right behavior, leading to precision, stability and complexity issues.
For instance, precision issues may arise because spikes (at least in the simplest form of the clock-driven method) can only occur at the nodes of the time mesh. For sure, this creates a small local error, which may accumulate in the long run. Whilst sophisticated strategies may be implemented to mitigate this effect (for instance, the dynamics between two consecutive nodes may be reconstructed a posteriori in the form of a bridge), a common (and simple) solution is to take a very small time step ∆. However, the smaller the step ∆, the higher the computational cost.
For sure, choosing ∆ small is of little interest when the activity of the network is low: when spikes are rare, the computational effort required for handling a small ∆ looks somewhat disproportionate.
Conversely, ∆ has to be chosen very small in order to account for blow-ups. As stated above, this increases the complexity of the method.
These basic observations explain the need for an alternative, event-driven, method. By focusing on spike generation (and only computing that), the event-driven approach may indeed reduce the global complexity. Also, the times at which the dynamics are simulated are no longer multiples of ∆ but can potentially take any decimal value within the precision range of the machine; these values are then easily precise up to 10^{-15} on modern machines.

Graph management
The storage of the interaction matrix J_N is a strong limitation in our algorithm. To simulate a network with N neurons, we need to know the values of N² synaptic weights J^{i→j}_N. In practice, the size of the memory requested to store this matrix is one of the main limitations of our algorithm. For instance, in the classical algorithm used for creating an N-node Erdös-Renyi directed graph, a matrix of size N² is created, where each element indicates whether there is a connection between two neurons; typically, this may be an issue when N is greater than 10^6. Indeed, it is impossible to store on a laptop a graph with one million nodes, or a network with one million neurons.
The main idea we propose here is to slightly change the classical method for generating the synaptic weights (when they are given as realizations of independent and identically distributed variables, as is the case for the Erdös-Renyi graph); in a sentence, our strategy is: "instead of storing N rows of size N, store only N numbers, namely, for each row, one number that encodes the whole row". We call our new algorithm a reconstruction algorithm. Let us rephrase the main idea: instead of storing the full matrix of interaction, we store N values, say S_0, S_N, ..., S_{N(N−1)} (the fact that the indices are of the form Ni is explained in Figure 5 below). The knowledge of S_{Ni} is sufficient to compute the row J^{i→•}_N of the synaptic weights from i to all the other neurons. In other words, instead of requesting a memory stack of order N × N, we just request a smaller memory stack of order N. Of course, the price to pay for this operation is that, each time we need the elements of row number i, we have to recompute from scratch the whole vector of interactions pointing from i. As we make clear below, this method is numerically efficient if, first, the matrix is static (i.e., the values do not evolve in time), and, second, we do not use it too often; in fact, both conditions are fulfilled in our setting.
The whole trick is based on the fact that, from the numerical point of view, the matrix J_N is obtained by a simulation technique, which is in fact deterministic! Consider indeed a Pseudo Random Number Generator (PRNG) in a state S_0. If one uses this PRNG once, a random number r_0 is produced and the state of the PRNG is changed from S_0 to S_1. After N iterations, the PRNG is in a state S_N and N random numbers have been returned. Now, if we put the PRNG back into state S_0 and ask again for a number, then r_0 will be returned again and the PRNG will be back in state S_1; after N iterations, the PRNG is again in state S_N and the N random numbers that have been returned are the same as the N numbers that had been initially generated. So, going back to the matrix J_N, we can store the states S_{Ni} of the PRNG at the beginning of the simulation of row i + 1. Each time we need the values of the weights J^{i_0→j}_N, we just change the state of the PRNG to S_{N i_0} and generate again (with the PRNG) the pseudo-random values J^{i_0→j}_N. Obviously, putting the PRNG in the same state gives the same pseudo-random result. We make this clear in Figure 5 below.
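A close variant of this trick may be sketched as follows; we substitute one stored seed per row for the saved generator states S_{Ni}, since modern PRNG libraries expose reproducible per-stream seeds more readily than raw internal states, but the principle (O(N) storage, rows regenerated on demand) is the same:

```python
import numpy as np

def build_row_seeds(N, master_seed=0):
    """Store one seed per row instead of the full N x N adjacency matrix.
    Spawning from a SeedSequence is reproducible: the same master seed
    always yields the same N child seeds."""
    return np.random.SeedSequence(master_seed).spawn(N)

def row(i, seeds, N, p):
    """Recompute row i of the Erdos-Renyi adjacency matrix on demand.
    Re-seeding the generator with the stored seed plays the role of
    putting the PRNG back into state S_{Ni}: the same pseudo-random row
    is produced every time."""
    rng = np.random.default_rng(seeds[i])
    return rng.random(N) < p        # boolean vector of children of neuron i
```

The memory footprint thus drops from O(N²) to O(N), at the price of regenerating a full row each time the children of a spiking neuron are needed.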
This method is easily implemented in the case of an Erdös-Renyi graph (which is our benchmark example), see also Algorithm 3. Instead of storing the connections in a matrix, one stores only the initial states (S_{Ni})_{i=0,...,N−1}; at the end of the day, the memory footprint is much lower... but the computational cost is higher. To make it clear, this algorithm is not very efficient if one needs to access the individual entries of the matrix often, or if one wants a dynamical system where the graph connections are allowed to evolve in time. But in our case, the connections are fixed once and for all at the beginning of the simulation, and the only moment we need them is to update the potentials of the children after a spike; in the latter case (and this is the key point), what we want is exactly the full row of the matrix. A natural question is that of the complexity of such a reconstruction-based method compared to the complexity of the classical one. This is a broader matter that is addressed in the next section.

Complexity: memory and instructions
We remind the reader that algorithmic complexity must be read as a function relating the size of the input of an algorithm to the number of steps it takes (its time complexity) or the number of storage locations it uses (its space complexity). The asymptotic behavior of this function (for larger and larger input sizes) is denoted with big O notation.
Of course, the reader must remember that two algorithms with the same order of complexity (or two implementations of the same algorithm) may execute with different amounts of resources, because of a multiplicative factor between the two (which becomes somewhat insignificant asymptotically, but which is not for inputs of small size). Having this limitation in mind, the analysis we provide below focuses on the comparison between the orders of complexity. It must be stressed that the only input that we regard in the analysis of the complexity is the number N of neurons. In particular, all the routines that do not depend explicitly on N do not really appear in the complexity, whilst they could have a non-negligible cost. For instance, the PRNG takes time, but it does not show up in the final computation of the complexity.
The first complexity column is given as an example of how the complexity is computed. In the classical algorithm for generating an interaction matrix, for each parent, we must look at all the other neurons in order to determine whether they are children of that parent; hence the double loop, leading to N². In the reconstruction method presented above, the number of children of a given parent is determined beforehand, and, in the second loop, only their indices are generated.
It must also be stressed that, in practice, the time complexity at creation for the reconstruction method is lower than that of the interaction method; theoretically, this is not the case, as one should take into account the worst-case scenario (i.e., the complete graph).
As for the last two columns, the space complexity is obviously smaller for the reconstruction method, making it very relevant for big graphs. Still, the time complexity using the vector of PRNG states is worse than the time complexity of the interaction method, but, as we already commented, this is not such a hindrance in our case.