Change-Point Detection, Segmentation, and Related Topics

Abstract. Recent contributions to change-point detection, segmentation and inference for non-regular models are presented. Various problems are considered, including multiple change-point estimation with adaptive penalty for time series with different dependency structures, estimation of the singularity point in cusp-type models, inference for threshold autoregressive models, and cross-segmentation of matrices. Résumé. Des contributions récentes dans les problèmes de détection de rupture, de segmentation et de l'inférence pour des modèles non-réguliers sont présentées. Les problèmes considérés incluent l'estimation de plusieurs points de rupture avec une pénalité adaptative pour des séries temporelles avec différentes structures de dépendance, l'estimation d'un point de singularité pour des modèles de type cusp, l'inférence pour des modèles auto-régressifs à seuil et la segmentation croisée des matrices.


Introduction
This article presents some recent results on statistical inference for models with abrupt changes in the parameters and for related non-regular problems. The results were presented at the session Change-point detection and segmentation of the Journées MAS 2016, organized by Farida Enikeeva. Each section of the article is based on one of four talks given by Jean-Marc Bardet, Vincent Brault, Sergueï Dachian and Bruno Saussereau. Farida Enikeeva has written the introduction and coordinated the present article.
Models with irregularities, such as abrupt changes in a signal or in noise parameters, have received a lot of attention for decades. These problems arise naturally in many applications, such as quality control, network traffic data analysis, seismography, analysis of DNA sequences, audio signal processing and many others [8].
A number of methods have been developed for the problem of detecting a single change in a parameter (often the signal mean) of an observed process, both in the i.i.d. noise case and under dependency conditions (see [17], [25]).
Later on, those methods were generalized to the problem of estimating multiple change-points with a known or bounded number of changes. The early techniques include, for example, binary segmentation [81], least squares estimation [62], and the dynamic programming approach [51], which allows one to estimate the unknown change-points in quadratic time with respect to the sample length. Several other techniques were proposed for the case of an unknown number of changes.

We assume that $(X_1, \ldots, X_n)$ has an unknown number $K^* - 1$ of semi-parametric changes. More precisely, in each of the three frameworks, $(X_t)$ is a time series depending on the changing parameter $\theta$ with values $\theta^*_k$ for $t \in \{t^*_{k-1} + 1, t^*_{k-1} + 2, \ldots, t^*_k\}$, where $k = 1, \ldots, K^*$, with $t^*_0 = 0$ and $t^*_{K^*} = n$ by convention. The number of changes $K^* - 1$, the change-points $(t^*_k)_{1 \le k \le K^*-1}$ and the changing parameters $(\theta^*_k)_{1 \le k \le K^*}$ are unknown.

Break detection analysis of the FTSE index
Our aim is to estimate $K$, $(t_k)_{1\le k\le K-1}$ and $(\theta_k)_{1\le k\le K}$. In the following we first consider the situation of a known number of changes $K^*$. In Section 1.1 we give a general overview of the estimation methods based on the minimization of an appropriate contrast function for a known $K^*$. Next, in Section 1.2, we consider several penalization criteria applied in the case of an unknown number of changes. The results on the consistency and the asymptotic distribution of the obtained estimators of $K$, $(t_k)_{1\le k\le K-1}$ and $(\theta_k)_{1\le k\le K}$ are given for each of the three frameworks (1)–(3) stated above. Finally, in Section 1.3, the slope heuristic procedure of adaptive penalization is presented.
We consider the general multiple change-point framework defined below.
Assumption. We observe a sample $(X_1, \ldots, X_n)$ such that $X_t$ is a time series depending on the parameter $\theta$. In order to estimate all these parameters, a global contrast is defined as the sum of contrasts within each interval; the estimators of the parameters minimize this sum of contrasts.

Definition 1.1. Let $K^*$ be known. Define the estimators as the minimizers of the sum of contrasts. Note that a consequence of this definition is the following: if we denote by $\widehat{\theta}_i$ the estimate of $\theta$ within the interval $(t_{i-1}, t_i]$, then the estimates of the change-points $t_i$ are obtained as the solution of a minimization problem in $(t_i)_{1 \le i \le K^*-1}$. Thus the initial minimization problem is reduced to the computation of $(\widehat{t}_i)_{1 \le i \le K^*-1}$, which would normally require examining $O(n^{K^*-1})$ different values of $(t_i)_{1 \le i \le K^*-1}$. By use of dynamic programming (see [35]), this computation is reduced to $O(n^2)$ operations for any $K^*$.
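The dynamic programming reduction just described can be sketched as follows. This is a minimal illustration with a least-squares contrast chosen for concreteness; the function names are illustrative, not the authors' implementation. The recursion computes the best segmentation of the first $j$ observations into $k$ segments, reusing segment costs evaluated in $O(1)$ through prefix sums.

```python
# Dynamic-programming segmentation for a known number of segments K:
# minimizes the total within-segment least-squares contrast in O(n^2 K) time
# instead of scanning the O(n^(K-1)) possible change-point configurations.

def lsq_cost_factory(x):
    n = len(x)
    s1 = [0.0] * (n + 1)   # prefix sums of x
    s2 = [0.0] * (n + 1)   # prefix sums of x^2
    for t, v in enumerate(x):
        s1[t + 1] = s1[t] + v
        s2[t + 1] = s2[t] + v * v
    def cost(i, j):
        # least-squares contrast of segment x[i:j] around its empirical mean
        m = (s1[j] - s1[i]) / (j - i)
        return (s2[j] - s2[i]) - (j - i) * m * m
    return cost

def segment(x, K):
    n, cost = len(x), lsq_cost_factory(x)
    INF = float("inf")
    # best[k][j]: minimal contrast of x[:j] split into k segments
    best = [[INF] * (n + 1) for _ in range(K + 1)]
    arg = [[0] * (n + 1) for _ in range(K + 1)]
    best[0][0] = 0.0
    for k in range(1, K + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j], arg[k][j] = c, i
    # backtrack the estimated change-points t_1 < ... < t_{K-1}
    cps, j = [], n
    for k in range(K, 0, -1):
        j = arg[k][j]
        cps.append(j)
    return sorted(cps)[1:], best[K][n]

x = [0.0] * 30 + [4.0] * 30 + [1.0] * 30   # piecewise-constant toy series
cps, total = segment(x, 3)                  # recovers [30, 60] on this noise-free series
```

The cubic-looking triple loop is $O(n^2 K)$ because the inner cost evaluation is constant time; this is the quadratic-in-$n$ complexity mentioned in the text.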

The case of unknown number of changes
When $K^*$ is unknown, we can define a penalized sum of contrasts: for $0 \le K \le K_{\max}$, where $K_{\max}$ is a fixed integer assumed to be greater than or equal to $K^*$, we add to the sum of contrasts a penalty term involving a sequence $(\kappa_n)$ of positive numbers and an increasing function $K \in \mathbb{N} \mapsto \mathrm{pen}(K)$.
Definition 1.2. Let $K^*$ be unknown. Define the estimators as the minimizers of the penalized sum of contrasts. Estimators of this type arise in the three following well-known situations.
• The case of multiple least squares regression (see for instance [3]), with $(U_k)_{1 \le k \le n}$ a family of observed vectors in $\mathbb{R}^d$; this is the classical Mallows $C_p$ criterion.
• The general case of maximum likelihood estimation with a penalty proportional to the number of parameters; this is the classical AIC criterion.
• The general case of maximum likelihood estimation with a penalty proportional to the number of parameters times $\log n$; this is the classical BIC criterion.
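For reference, the classical criteria mentioned above take the following standard textbook forms (stated here from the general model-selection literature, not reconstructed from the displayed equations of the original paper); $k(K)$ denotes the total number of free parameters of a model with $K$ segments and $\widehat{L}(K)$ the maximized likelihood:

```latex
% Mallows' C_p (least squares, known noise variance sigma^2):
\widehat{K} \in \operatorname*{argmin}_{0 \le K \le K_{\max}}
  \Bigl\{ \mathrm{SSE}(K) + 2\sigma^2\, k(K) \Bigr\},
% Akaike information criterion:
\mathrm{AIC}(K) = -2\log \widehat{L}(K) + 2\, k(K),
% Bayesian information criterion:
\mathrm{BIC}(K) = -2\log \widehat{L}(K) + k(K)\,\log n .
```

In each case the penalized contrast is of the announced form, with $\kappa_n$ constant for $C_p$ and AIC, and $\kappa_n \propto \log n$ for BIC.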
Let us make additional assumptions: the explicit form or expansion of the functions $s_\theta$ in the "signal + noise" model, of $f_\theta$ and $M_\theta$ in the case of causal affine time series, and of the spectral density $f_X(\lambda)$ in the case of long memory processes are assumed to be known.

Framework 1: signal + noise model
We observe $(X_1, \ldots, X_n)$ where the underlying process $(X_t)_{t \in \mathbb{Z}}$ satisfies the "signal + noise" relationship with signal function $s_\theta$. Example 1.3. In a typical situation we consider $s_\theta(t) = \theta \in \mathbb{R}$ and therefore $d = 1$. Then, for different contrast functions $\Phi_\theta$, we have the following estimators of $\theta$: • for the contrast $\Phi^{(1)}$ we have $\widehat{\theta}_n = \mathrm{median}(X_1, \ldots, X_n)$; • for the contrast $\Phi^{(2)}$, which is equivalent to the LS criterion, we have $\widehat{\theta}_n = \frac{1}{n}\sum_{t=1}^{n} X_t$.
Denote now, for any $\theta_i \in \Theta$, the corresponding contrast within each segment.

Framework 2: causal affine processes

In the case of general causal affine processes we can use Gaussian Quasi-Maximum Likelihood (QML) estimation.
We will use the abbreviation QMLE when we refer to the Quasi-Maximum Likelihood Estimator.
Then the Gaussian QML contrast on $\{t_{i-1} + 1, \ldots, t_i\}$ is defined, and the Laplacian QML is defined analogously. Definition 1.7. Define the contrasts $\Phi^G_{\theta_i}$ and $\Phi^L_{\theta_i}$ accordingly. Finally, we will assume that there exist two sequences of Lipschitz coefficients $(\alpha_i(f))_{i \ge 1}$ and $(\alpha_i(M))_{i \ge 1}$, respectively for $f_\theta$ and $M_\theta$, i.e. satisfying a Lipschitz-type inequality for any $x, y \in \mathbb{R}^{\mathbb{N}}$. Theorem 1.8. For $\Phi^G_\theta$ we prove in [6] that, under identifiability conditions on $f_\theta$ and $M_\theta$, if there exists $M > 0$ such that a suitable bound holds for any $x \in \mathbb{R}^{\mathbb{N}}$, then the estimators are consistent. Moreover, if $r \ge 4$ and if $\kappa_n = \sqrt{n}$, under conditions on the first and second derivatives of $f_\theta$ and $M_\theta$ (see more details in [6]), a rate of convergence for the change-point estimators is obtained. Remark. Note that the convergence rate is the same as if $(X_1, \ldots, X_n)$ were a family of independent random variables.

Framework 3: Long memory processes
In this case, the spectral density of $(X_t)$ within each segment admits an expansion near zero parametrized by $d^*_i \in (0, 1/2)$ and positive real numbers $c^*_{0,i}$, $c^*_{1,i}$ and $\beta^*_i$. Robinson [75] introduced the local Whittle contrast for estimating $d^*_i$: for $d_i \in (0, 1/2)$ and $m \in \mathbb{N}^*$, it is defined from the periodogram. Using this contrast, we can define the local Whittle estimator of the long memory parameter. In [5], we prove the following theorem.
We also obtain rates of convergence for these estimators.

An adaptive penalization: the slope heuristic procedure

The slope heuristic procedure was introduced in [1]. It was applied in the three previous frameworks (see respectively [4] or [35], [6] and [5]) for deriving a data-driven penalization. In the sequel we detail the case of long-memory change detection.
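The slope heuristic introduced in [1] can be sketched as follows. This is a minimal illustration under the usual formulation of the heuristic: for large $K$ the raw contrast decreases roughly linearly in the penalty shape, so the slope of that linear part is estimated by least squares and the calibrated penalty rate is twice its absolute value; all names and toy values below are illustrative.

```python
# Slope heuristic (data-driven penalty calibration), minimal sketch:
# J[K] are raw contrast values and pen[K] the penalty shape for K = 0..Kmax.

def slope_heuristic(J, pen, k_min):
    # fit J(K) ~ a + s * pen(K) by least squares over the "large K" range K >= k_min
    xs, ys = pen[k_min:], J[k_min:]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    s = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    kappa = 2.0 * abs(s)          # data-driven penalty rate: twice the slope
    # select the number of segments minimizing the recalibrated penalized contrast
    scores = [j + kappa * p for j, p in zip(J, pen)]
    return min(range(len(J)), key=scores.__getitem__), kappa

J = [10.0, 4.0, 1.0, 0.5, 0.0, -0.5]   # toy contrast values for K = 0..5
pen = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # toy penalty shape, here pen(K) = K
K_hat, kappa = slope_heuristic(J, pen, k_min=2)
```

On this toy input the contrast is exactly linear beyond $K = 2$, so the fitted slope is $-0.5$, the calibrated rate is $\kappa = 1$, and the selected number of segments is $\widehat{K} = 2$.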
Instead of a fixed sequence $(\kappa_n)$, we can use the data-driven penalty rate $\widehat{\kappa}_n = 2 \times |\widehat{s}|$, with $\widehat{s}$ the slope of the least squares regression of the contrast on the penalty shape for large numbers of changes; let $\widehat{K}$ be the resulting estimator of the number of changes. Figure 1.3 shows a graphical illustration of this procedure in the long memory processes framework. Using Monte Carlo experiments, we have shown that this data-driven procedure leads to more accurate results than the procedures based on penalization with a sequence $(\kappa_n)$ chosen a priori (see [4] and [35] for the results for Framework 1, [6] for Framework 2, and [5] for Framework 3).

In research on autonomous vehicles, the cost of a high-performance GPS is a problem, and one proposed solution is to study the environment of the vehicle in order to guide it on regular trips. For this, we need to summarize large amounts of video footage of the environment (see [54]) by separating the distinct places (e.g. straight line, intersection, etc.) (see [10]).
This problem is similar to the analysis of biological Hi-C data (see [30]), for which some algorithms already exist.
Notably, Brault et al. [12] have developed a fast segmentation algorithm for block-wise constant matrices, while Brault et al. [15] have studied a segmentation method based on rank statistics.
In Figure 3, we display an example of a trip and the similarity matrix of the associated video images: the redder a cell $(t_1, t_2)$ is, the stronger the resemblance between the two images at times $t_1$ and $t_2$.
We can observe some colored blocks in the matrix, and our goal is to segment the rows and columns to bring out homogeneous blocks (either with uniform coloring or with the same types of colors).
To estimate the change-points in the matrix, two procedures have been developed for Hi-C data, called respectively BlockSeg (see [12]) and MuChPoint (see [15]). In this section, we compare the two methods with respect to their complexity and theoretical guarantees (see also [16] and [14]).

Statistical framework
In this part, we describe each model.

BlockSeg method
In their article [12], the authors consider estimating the change-points $\mathbf{n}_1 = (n_{1,1}, \ldots, n_{1,L_1})$ and $\mathbf{n}_2 = (n_{2,1}, \ldots, n_{2,L_2})$ from the random matrix $\mathbf{Y} = (Y_{i,j})_{1 \le i,j \le n}$ defined by
$$\mathbf{Y} = \mathbf{C} + \mathbf{E}, \qquad (1)$$
with the convention $n_{1,0} = n_{2,0} = 1$ and $n_{1,L_1+1} = n_{2,L_2+1} = n + 1$. Here $\mathbf{C} = (C_{i,j})_{1 \le i,j \le n}$ is a blockwise constant matrix and the entries $E_{i,j}$ of the matrix $\mathbf{E} = (E_{i,j})_{1 \le i,j \le n}$ are i.i.d. zero-mean random variables. With such a definition, the $Y_{i,j}$ are assumed to be independent random variables with a blockwise constant mean. Let $\mathbf{T}$ be the $n \times n$ lower triangular matrix with non-zero elements equal to one, and $\mathbf{B}$ a sparse matrix containing null entries except for the $B_{i,j}$ such that $(i,j) \in \{n_{1,0}, \ldots, n_{1,L_1}\} \times \{n_{2,0}, \ldots, n_{2,L_2}\}$. Then (1) can be rewritten as
$$\mathbf{Y} = \mathbf{T}\mathbf{B}\mathbf{T}^\top + \mathbf{E}, \qquad (2)$$
where $\mathbf{T}^\top$ denotes the transpose of the matrix $\mathbf{T}$. Let $\mathrm{Vec}(\mathbf{X})$ denote the vectorization of the matrix $\mathbf{X}$, formed by stacking the columns of $\mathbf{X}$ into a single column vector; then $\mathrm{Vec}(\mathbf{Y}) = \mathrm{Vec}(\mathbf{T}\mathbf{B}\mathbf{T}^\top) + \mathrm{Vec}(\mathbf{E})$. Thus (2) can be rewritten as $\mathcal{Y} = \mathcal{X}\mathcal{B} + \mathcal{E}$ with $\mathcal{Y} = \mathrm{Vec}(\mathbf{Y})$, $\mathcal{X} = \mathbf{T} \otimes \mathbf{T}$, where $\otimes$ denotes the Kronecker product, $\mathcal{B} = \mathrm{Vec}(\mathbf{B})$ and $\mathcal{E} = \mathrm{Vec}(\mathbf{E})$. By virtue of these transformations, Model (1) has thus been rephrased as a sparse high-dimensional linear model, where $\mathcal{Y}$ and $\mathcal{E}$ are $n^2 \times 1$ column vectors, $\mathcal{X}$ is an $n^2 \times n^2$ matrix and $\mathcal{B}$ is an $n^2 \times 1$ sparse column vector. The multiple change-point estimation problem (1) can thus be addressed as a variable selection problem, minimizing the $\ell_2$ data-fit term plus an $\ell_1$ penalty over candidate vectors, which is the popular Least Absolute Shrinkage and Selection Operator (LASSO) criterion in least-squares regression.
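The reparametrization through the matrix T can be checked numerically. The sketch below uses illustrative sizes, block positions and jump values (not taken from the paper): it builds a sparse B whose non-zero entries sit at the upper-left corners of the blocks and verifies that T B T^t is blockwise constant.

```python
import numpy as np

# Sketch of the BlockSeg reparametrization: a blockwise-constant mean matrix C
# can be written as C = T B T^t with T lower-triangular of ones and B sparse.

n = 6
T = np.tril(np.ones((n, n)))

# hypothetical block structure: one row and one column change-point at index 3
B = np.zeros((n, n))
B[0, 0] = 1.0     # base level of the top-left block
B[0, 3] = 2.0     # jump when crossing the column change-point
B[3, 0] = -1.0    # jump when crossing the row change-point
B[3, 3] = 0.5     # interaction jump for the bottom-right block

# (T B T^t)[i, j] accumulates all jumps B[k, l] with k <= i and l <= j,
# so C is constant on each of the four blocks by construction.
C = T @ B @ T.T
```

Reading off the cumulative sums gives block values 1, 3, 0 and 2.5 for the top-left, top-right, bottom-left and bottom-right blocks respectively, confirming that the sparse support of B encodes exactly the change-point positions.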
Due to the sparsity-enforcing property of the $\ell_1$-norm, the estimator $\widehat{\mathcal{B}}$ of $\mathcal{B}$ is expected to be sparse and to have non-zero elements matching those of $\mathcal{B}$. Hence, retrieving the positions of the non-zero elements of $\widehat{\mathcal{B}}$ provides estimators of $(n_{1,k})_{1 \le k \le L_1}$ and of $(n_{2,k})_{1 \le k \le L_2}$. More precisely, let us define by $\mathcal{A}(\lambda_n)$ the set of active variables. For each $j$ in $\mathcal{A}(\lambda_n)$, consider the Euclidean division of $(j - 1)$ by $n$, namely $(j - 1) = n q_j + r_j$; the estimators of $L_1$ and $L_2$ then correspond to the number of distinct elements in $\{r_j : j \in \mathcal{A}(\lambda_n)\}$ and $\{q_j : j \in \mathcal{A}(\lambda_n)\}$, respectively.
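The index-recovery step can be illustrated with a toy active set. The values below are hypothetical: `active` stands for a possible active set A(lambda_n) for a 6 x 6 matrix whose non-zero coefficients sit at rows {0, 3} and columns {0, 3} of the vectorized problem.

```python
# Recovering row/column change-point candidates from the active set of the
# vectorized LASSO: for a 1-based active index j, the Euclidean division
# (j - 1) = n * q_j + r_j gives the column index q_j and the row index r_j
# (0-based) of the corresponding matrix entry.

n = 6
active = [1, 4, 19, 22]            # hypothetical 1-based active indices
qs = {(j - 1) // n for j in active}  # distinct column positions
rs = {(j - 1) % n for j in active}   # distinct row positions
L1_hat = len(rs)                   # estimated number of row change-points
L2_hat = len(qs)                   # estimated number of column change-points
```

Here indices 1, 4, 19 and 22 decode to matrix positions (0, 0), (3, 0), (0, 3) and (3, 3), so both estimated counts equal 2.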

MuChPoint method
In their article [15], the authors assume that $\mathbf{Y} = (Y_{i,j})_{1 \le i,j \le n}$ is a symmetric matrix such that the $Y_{i,j}$'s are independent random variables for $i \ge j$.
The statistic is built on ranks: each observation is compared with its mean rank within its group.
To estimate the change-points, the authors propose to maximize the statistic $S_n(n_1, \ldots, n_L)$ using a dynamic programming strategy [51]. By symmetry, we obtain in the end $(L+1)^2$ blocks with the same distribution of elements within each block.

Comparison
In this part, we compare the consistency and the complexity of the two procedures.

Consistency
From a theoretical point of view, the two procedures are consistent, but not under the same assumptions. We compare the assumptions according to the characteristics of each procedure: • Assumptions on the laws: (M1) the $Y_{i,j}$'s are independent random variables for $i \ge j$ and the cumulative distribution functions of the $Y_{i,j}$'s are continuous.
• Assumption on the separation of the blocks, where $F_X$ denotes the distribution function of $X$; • Assumption on the repartition of the observations; • Assumption for the LASSO. Under these assumptions, we obtain the following theorem. Theorem 2.1. Under assumptions (B1), (B2), (B3) and (B4), the estimator of the BlockSeg procedure is consistent.
Under assumptions (M1), (M2) and (M3), the estimator of the MuChPoint procedure is consistent.

Complexity
Since $\mathcal{B}$ is sparse (the number of blocks being fixed), the complexity of the BlockSeg procedure is linear in the number of cells of the matrix, while the complexity of the MuChPoint procedure is slightly worse.
Proposition 2.2. The complexity of the BlockSeg procedure is $O(n^2)$, and that of the MuChPoint procedure is $O(n^3)$.
The procedures are implemented in the packages [11] and [13], respectively.

Perspectives
In theory, the BlockSeg procedure is faster than the MuChPoint procedure, but its consistency conditions are more restrictive. In practice, model selection in the first procedure requires estimating the change-points several times, while in the second a model selection criterion can be used. Moreover, it is possible to improve the estimation of the BlockSeg procedure by using the symmetry of the matrix. It is also important to automate the selection of the number of breaks. These two perspectives are the subject of future work.

On parameter estimation in non-regular situations of cusp type
In this section we present a survey of recent (and less recent) results on change-point location estimation for different observation models (i.i.d. observations, signal in white Gaussian noise, inhomogeneous Poisson processes, ergodic diffusion processes, etc.), in the presence of a cusp-type singularity (the function is continuous but has an infinite derivative) in the driving function (density, signal, intensity function, drift, etc.) at the change-point. The survey follows a recent paper [27] by Dachian, Kordzakhia, Kutoyants and Novikov, though here we consider a slightly more general situation of an asymmetric cusp (having different multiplicative constants on the left and on the right). For each model, we study the asymptotic behavior of the maximum likelihood estimator (MLE), as well as that of the Bayesian estimators (BEs).
Let us recall that, in regular statistical models (irrespective of the nature of the observations), both the MLE and the BEs are well known to be asymptotically normal and asymptotically efficient. This comes from the fact that regular statistical models are locally asymptotically normal (LAN).
The situation is quite different for non-regular statistical models. For classical change-point models (the function is discontinuous at the change-point), the limiting distributions of the MLE and of the BEs are no longer Gaussian, and we can cite at least three different kinds of asymptotic behavior.
• For i.i.d. observations with discontinuous density (see Chernoff and Rubin [22], as well as Ibragimov and Khasminskii [46, 47, 50]) and for Poisson processes with discontinuous intensity function (see Kutoyants [57, 59]), the limiting distributions of the MLE and of the BEs are given by random variables which are (different) functionals of a two-sided Poisson process, and the BEs are asymptotically efficient.
• For a discontinuous signal in white Gaussian noise (see Ibragimov and Khasminskii [49, 50]), for change-point type dynamical systems with small noise (see Kutoyants [57, 58]), for ergodic diffusion processes with discontinuous drift (see Kutoyants [57, 60]), for change-point type delay differential equations with noise (see Küchler and Kutoyants [56]), for a discontinuous signal in time-inhomogeneous diffusion (see Höpfner and Kutoyants [45]), as well as for many other change-point models, the limiting distributions of the MLE and of the BEs are given by random variables which are (different) functionals of a two-sided Brownian motion (Wiener process), and the BEs are asymptotically efficient.
• For the two-phase regression model (see Koul and Qian [55] for the linear case, as well as Ciuperca [24] for the nonlinear case) and for the threshold autoregressive (TAR) model (see K.S. Chan [18], as well as N.H. Chan and Kutoyants [20, 21]), the limiting distributions of the MLE and of the BEs are given by random variables which are (different) functionals of a two-sided compound Poisson process, and the BEs are asymptotically efficient.
Below, we consider the problem of change-point location estimation in the presence of a cusp-type singularity at the change-point for several observation models. Interestingly, unlike the classical (discontinuous) change-point models, and somewhat like regular statistical models, the asymptotic behavior of the MLE and of the BEs is the same across all the models: the limiting distributions are given by random variables which are (different) functionals of a (two-sided) fractional Brownian motion (fBm), and the BEs are asymptotically efficient.
In all the considered models, the asymptotic behavior of the estimators is studied using the likelihood ratio analysis method introduced by Ibragimov and Khasminskii in [50]. The method consists in first showing that the normalized likelihood ratio process (with a suitable normalization) converges to some limiting likelihood ratio process, and then deducing the properties of the estimators (namely, the limiting distributions of the MLE and of the BEs, given by functionals of the limiting likelihood ratio process, as well as the asymptotic efficiency of the BEs). It turns out that in all the considered models the limiting likelihood ratio process is the same: an exponential of a fBm with a power drift (see below for more details).
Though historically the first model in which a cusp-type singularity at the change-point was studied was the model of i.i.d. observations (which will be considered in Section 3.2), we start our survey by presenting in Section 3.1 the signal in white Gaussian noise model, in which the fBm-based limiting likelihood ratio process appears most naturally. Further, in Sections 3.3 and 3.4, we consider inhomogeneous Poisson processes and ergodic diffusion processes, respectively. Finally, in Section 3.5, we present some numerical simulations concerning the limiting distributions of the MLE and of the BEs.
In conclusion, let us note that other observation models have also been studied in the presence of a cusp-type singularity at the change-point by different authors. For nonlinear regression models we refer to Prakasa Rao [73], Döring [31], as well as Döring and Jensen [32]; for delay differential equations with noise we refer to Gushchin and Küchler [40]; and for dynamical systems with small noise we refer to Kutoyants [61]. These authors obtain the same (fBm-based) asymptotic behavior of the MLE and of the BEs. The asymptotic behavior thus seems to be universal in the presence of a cusp-type singularity at the change-point, and it would be interesting to study more observation models to confirm (or infirm) this conjecture.

Signal in white Gaussian noise
Suppose we observe a deterministic signal in white Gaussian noise, that is, a realization of the stochastic process
$$dX_t = S_\theta(t)\,dt + \varepsilon\,dW_t, \quad t \in [0, T], \qquad (5)$$
where $T > 0$ is the observation time, $\varepsilon > 0$ is the noise level, $\theta \in \Theta = (\alpha, \beta) \subset (0, T)$ is the unknown parameter, and $S_\theta$ is the deterministic signal supposed to have the form
$$S_\theta(t) = d(t - \theta)\,|t - \theta|^p + h(t - \theta) \qquad (6)$$
with some continuously differentiable function $h$. Here and throughout the survey, the process $(W_t, t \in \mathbb{R})$ is a standard two-sided Brownian motion (Wiener process), $0 < p < 1/2$ (note that for $p > 1/2$ the Fisher information is finite and the statistical model becomes regular), and
$$d(x) = a\,\mathbb{1}_{\{x < 0\}} + b\,\mathbb{1}_{\{x \ge 0\}} \qquad (7)$$
with $a^2 + b^2 > 0$.
We are interested in the estimation of the parameter $\theta$ in the small noise asymptotics, that is, as $\varepsilon \to 0$. Let us note that the long observation time asymptotics can be reduced to this case in the following way. Suppose we observe $Y = (Y_t,\ t \in [0, n\tau])$ with $Y_0 = 0$ satisfying equation (5) with $T = n\tau$, $\varepsilon = 1$ and some $\tau$-periodic signal $S_\theta$ (the period $\tau > 0$ is supposed to be known). Then, averaging over the $n$ periods, we obtain a stochastic process satisfying (5) with $T = \tau$ and $\varepsilon = n^{-1/2}$ (and a different noise Wiener process).
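The reduction from long-time to small-noise asymptotics can be made explicit. The following computation is a sketch under the assumption that the averaged observation is taken to be $X_t = \frac{1}{n}\sum_{k=0}^{n-1}\bigl(Y_{t + k\tau} - Y_{k\tau}\bigr)$, $t \in [0, \tau]$:

```latex
dX_t = \frac{1}{n}\sum_{k=0}^{n-1} S_\theta(t + k\tau)\,dt
       + \frac{1}{n}\sum_{k=0}^{n-1} dW_{t + k\tau}
     = S_\theta(t)\,dt + \frac{1}{\sqrt{n}}\, d\widetilde{W}_t ,
\qquad t \in [0, \tau],
```

since $S_\theta$ is $\tau$-periodic and the sum of $n$ independent Brownian increments, divided by $n$, has the law of $n^{-1/2}$ times a single Brownian increment (so $\widetilde{W}$ is again a Wiener process).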
The likelihood of our model is given in, for example, Liptser and Shiryaev [66]. Using this likelihood, we can introduce the maximum likelihood estimator (MLE) $\widehat{\theta}_\varepsilon$ and the Bayesian estimator (BE) $\widetilde{\theta}_\varepsilon$, for a given strictly positive and continuous prior density $p$ on $\Theta$ (and for the quadratic loss function), by the usual relations (8). In order to describe the properties of the MLE and of the BEs, we need to introduce some more notation which will be used throughout the survey.
For any $H \in (0, 1)$, we denote by $(W^H(u),\ u \in \mathbb{R})$ the fractional Brownian motion (fBm) of Hurst parameter $H$, that is, the centered Gaussian process with continuous trajectories and covariance function
$$\mathbf{E}\,W^H(u)\,W^H(v) = \tfrac{1}{2}\bigl(|u|^{2H} + |v|^{2H} - |u - v|^{2H}\bigr). \qquad (9)$$
Further, we introduce the process
$$Z_H(u) = \exp\Bigl(W^H(u) - \tfrac{1}{2}\,|u|^{2H}\Bigr), \quad u \in \mathbb{R},$$
and, for any $\gamma > 0$, the rescaled process $Z_{\gamma,H}(u) = Z_H(\gamma u)$. We also introduce the random variables $\xi_H$ and $\widetilde{\xi}_H$ by the relations
$$\xi_H = \operatorname*{argmax}_{u \in \mathbb{R}} Z_H(u), \qquad \widetilde{\xi}_H = \frac{\int_{\mathbb{R}} u\,Z_H(u)\,du}{\int_{\mathbb{R}} Z_H(u)\,du}\,.$$
Note that the random variable $\xi_H$ is well defined, since with probability one the process $Z_H$ attains its maximum at a unique point (see, for example, Ermakov [34]). Note also that the similar random variables associated to the process $Z_{\gamma,H}$ are nothing but $\xi_H/\gamma$ and $\widetilde{\xi}_H/\gamma$. Finally, for $0 < p < 1/2$ and $a$, $b$ from (7), we introduce a constant $\gamma = \gamma_{a,b,p}$; this constant has an explicit expression (see Ibragimov and Khasminskii [50]). Now we can state the main result of this section.
Theorem 3.1. (1) We have an asymptotic lower bound on the mean squared error of an arbitrary estimator $\bar{\theta}_\varepsilon$. (2) The MLE $\widehat{\theta}_\varepsilon$ and the BEs $\widetilde{\theta}_\varepsilon$ are consistent and have limiting distributions given by rescaled versions of $\xi_{p+1/2}$ and $\widetilde{\xi}_{p+1/2}$; the convergence of moments holds for the above convergences in law, and the BEs are asymptotically efficient.

The proof of this theorem can be found in Chernoyarov, Dachian and Kutoyants [23] and consists in checking the conditions of Theorems 1.9.1, 1.10.1 and 1.10.2 of Ibragimov and Khasminskii [50], which provide all the desired results. The main ingredient hidden behind these theorems is the weak convergence (in a suitable functional space) of the normalized likelihood ratio process $Z^{(\varepsilon)}$, which can be continuously extended to the whole of $\mathbb{R}$ so that it decreases to 0 at $\pm\infty$.
Let us give a heuristic argument explaining the convergence of the process $Z^{(\varepsilon)}$ to the process $Z_{\gamma,\,p+1/2}$. It is based on the following representation of the fBm (see Kordzakhia, Kutoyants, Novikov and Hin [52]).
Lemma 3.2. The process
$$Y(u) = \frac{1}{\gamma}\int_{\mathbb{R}} \bigl(d(v - u)\,|v - u|^p - d(v)\,|v|^p\bigr)\,dW_v, \quad u \in \mathbb{R},$$
with $\gamma$ the constant introduced above, is a fBm of Hurst parameter $p + \frac{1}{2}$.

Proof. Obviously, $(Y(u),\ u \in \mathbb{R})$ is a centered Gaussian process with continuous trajectories. It remains to check that its covariance function is of the form (9). Indeed, denoting $g_{v,u} = d(v - u)|v - u|^p - d(v)|v|^p$ for the sake of shortness, the covariance can be written out using the elementary equality $2ab = a^2 + b^2 - (a - b)^2$ and the changes of variable $v = s u_1$ (resp. $v = s u_2$ and $v = u_1 + s(u_2 - u_1)$) in the first (resp. second and third) integral.
For the process $Z^{(\varepsilon)}$, denote $s_{t,u} = d(t - \theta - u\varphi_\varepsilon)$ for the sake of shortness and note that the contribution of the function $h$ defined in (6) is negligible in the limit. The log-likelihood ratio then splits into a deterministic quadratic term, treated with the change of variable $t = \theta + u\varphi_\varepsilon s$, and a stochastic integral term, treated with the change of variable $t = \theta + \varphi_\varepsilon v$, which yields an integral with respect to a Wiener process $W$ and, through Lemma 3.2, a fBm $W^{p+1/2}$. Combining all the above, we finally get the desired convergence.
Model of i.i.d. observations

Suppose we observe i.i.d. random variables whose density has a cusp-type singularity at the change-point. The MLE $\widehat{\theta}_n$ and the BE $\widetilde{\theta}_n$, for a given strictly positive and continuous prior density $p$ on $\Theta$, are defined by the usual relations similar to (8) with the corresponding likelihood, and their asymptotic behavior as $n \to +\infty$ is described by the following theorem.
(1) We have an asymptotic lower bound on the mean squared error of an arbitrary estimator $\bar{\theta}_n$. (2) The MLE $\widehat{\theta}_n$ and the BEs $\widetilde{\theta}_n$ are consistent and have limiting distributions; the convergence of moments holds, and the BEs are asymptotically efficient.
For the proof of this theorem we refer to Ibragimov and Khasminskii [50, Chapter 6].

Inhomogeneous Poisson processes
Suppose we observe $n$ independent realizations on the interval $[0, T]$ of an inhomogeneous Poisson process having a strictly positive intensity function $\lambda_\theta$ with a cusp-type singularity, where $T > 0$ is fixed, $\theta \in \Theta = (\alpha, \beta) \subset (0, T)$ is the unknown parameter, and the function $\Psi$ is continuous and, uniformly in $t$, Hölder continuous of order $\mu > p + \frac{1}{2}$ with respect to $\theta$. We denote the observations $X^{(n)} = (X^1, \ldots, X^n)$, where $X^i = (X^i_t,\ t \in [0, T])$ is the $i$-th realization, $i = 1, \ldots, n$.
The likelihood of this model is given in, for example, Liptser and Shiryaev [66]. Using this likelihood, the MLE $\widehat{\theta}_n$ and the BE $\widetilde{\theta}_n$, for a given strictly positive and continuous prior density $p$ on $\Theta$, are defined by the usual relations similar to (8), and their asymptotic behavior as $n \to +\infty$ is described by the following theorem. (1) We have an asymptotic lower bound on the mean squared error of an arbitrary estimator $\bar{\theta}_n$. (2) The MLE $\widehat{\theta}_n$ and the BEs $\widetilde{\theta}_n$ are consistent and have limiting distributions; the convergence of moments holds, and the BEs are asymptotically efficient.
For the proof of this theorem we refer to Dachian [26].

Ergodic diusion processes
Suppose we observe a realization $X^T = (X_t,\ t \in [0, T])$ of an ergodic diffusion process, where $T > 0$ is the observation time, $\theta \in \Theta = (\alpha, \beta) \subset \mathbb{R}$ is the unknown parameter, and the drift $S_\theta$ is supposed to be of the form $S_\theta(x) = d(x - \theta)\,|x - \theta|^p + h(x - \theta)$ with some function $h$ which is Hölder continuous of order $\mu > p + \frac{1}{2}$. Moreover, we suppose that the conditions ES, EM and $\mathcal{A}_0(\Theta)$ from [60] are fulfilled. These conditions guarantee, in particular, the existence and uniqueness of the solution of the above equation, as well as the existence of the invariant density, where $G(\theta) > 0$ is the normalizing constant.
The likelihood of this model is given in, for example, Liptser and Shiryaev [66]. Using this likelihood, the MLE $\widehat{\theta}_T$ and the BE $\widetilde{\theta}_T$, for a given strictly positive and continuous prior density $p$ on $\Theta$, are defined by the usual relations similar to (8), and their asymptotic behavior as $T \to +\infty$ is described by the following theorem. (1) We have an asymptotic lower bound on the mean squared error of an arbitrary estimator $\bar{\theta}_T$. (2) The MLE $\widehat{\theta}_T$ and the BEs $\widetilde{\theta}_T$ are consistent and have limiting distributions; the convergence of moments holds, and the BEs are asymptotically efficient.
For the proof of this theorem we refer to Dachian and Kutoyants [28].

In Figure 5, we present the densities of the random variables $\xi_H$ and $\widetilde{\xi}_H$ obtained by numerical simulations in Kordzakhia, Kutoyants, Novikov and Hin [52]. Note that on Panel B ($H = 0.5$) the solid curve is plotted using the analytic expression of the density of the MLE; this is the only case where the density is known explicitly.

Threshold autoregressive (TAR) models were introduced by [78] and have been studied by many authors, such as [18, 70, 71] and the references therein. This model captures the dynamic behavior of a time series by switching between regimes. TAR models play an important role in nonlinear time series analysis and have been widely used to model nonlinear phenomena in various fields, for example economics, environment, hydrology, physics, population dynamics and the biological sciences. The TAR process is able to capture asymmetric limit cycles, as the main motivation for these models was to describe limit cycles of cyclical time series [80]. For an up-to-date overview of TAR models, see [79]. The popularity of TAR models is due to the fact that they provide a simple way of representing a complex stochastic system by decomposing it into a set of smaller subsystems.
The main goal in TAR-related problems is to study the asymptotic properties of the estimated parameters and of the estimated threshold. In [18], the author showed that, under some regularity conditions, the least squares estimators of a stationary ergodic TAR model are strongly consistent. Qian, in [74], established results similar to [18] for the maximum likelihood estimators of the same model, under some regularity conditions on the errors density, not necessarily Gaussian. Moreover, [64] provided a numerical method to tabulate the limiting distribution of the estimated threshold in practice. In [41–43], the author developed a statistical theory for threshold estimation in the context of regression; under the assumption that the threshold effect is vanishingly small, he obtained the distribution and a parameter-free limit of the estimated threshold.
Suppose that the sequence of processes $((\widehat{\phi}_n(s))_{s \in \mathbb{R}})_{n \ge 1}$ converges in the Skorokhod space $D(\mathbb{R})$ of càdlàg functions on $\mathbb{R}$ (details will be given hereafter) to a process $(\phi(s))_{s \in \mathbb{R}}$. Then one uses a continuity result on the Skorokhod space established in [76, 77]. This result asserts that the argmin of $\widehat{\phi}_n$ will also converge to the argmin of the process $(\phi(s))_{s \in \mathbb{R}}$ (if it exists).
As soon as we have proved that $\widehat{\phi}_n$ converges to the two-sided compound Poisson process $\phi$ in the Skorokhod space, we use Theorem 3.1 of [77]. Then there exists a unique random interval $[M_-, M_+]$ on which the process $\phi$ attains its global minimum a.s., and $n(\widehat{r}_n - r_0)$ converges to $M_-$. Now we can state our convergence result, but we need an additional mixing assumption on the process $X$.
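The interval-valued argmin can be illustrated by simulation. The sketch below uses a generic two-sided compound Poisson process with positive jump sizes (an illustration of the phenomenon, not the exact limit process $\phi$ of the theorem): since all jumps are positive, the process is zero between the first jump to the left and the first jump to the right of the origin, so its global minimum is attained on a whole random interval rather than at a single point.

```python
import random

random.seed(1)

def jumps(rate, njumps):
    # jump times (cumulative exponential spacings) and positive jump sizes
    times, t = [], 0.0
    for _ in range(njumps):
        t += random.expovariate(rate)
        times.append(t)
    sizes = [random.expovariate(1.0) for _ in times]
    return times, sizes

tp, sp = jumps(1.0, 50)   # jumps on the right half-line
tm, sm = jumps(1.0, 50)   # jumps on the left half-line (mirrored)

def phi(s):
    # cumulative jump mass accumulated between the origin and s
    if s >= 0:
        return sum(z for t, z in zip(tp, sp) if t <= s)
    return sum(z for t, z in zip(tm, sm) if t <= -s)

# phi vanishes exactly on the interval between the first left and right jumps
grid = [k / 100.0 - 5.0 for k in range(1001)]
zero_set = [s for s in grid if phi(s) == 0.0]
```

Every point of `zero_set` lies strictly between the first jump times on either side of the origin, mirroring the random interval $[M_-, M_+]$ of the theorem.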
(H6): There exist a real $a$ with $\nu/(2+\nu) < a < 1$, a constant $C$ and a real $0 < \beta < 1$ such that a suitable mixing inequality holds. This is a local mixing assumption, and it is clearly satisfied in the independent case and when (H2) is satisfied. In the context of [18, 64], it is deduced from a conditioning argument and the Markovian setting, which implies that the process is geometrically mixing. Hence (H6) is a technical assumption, but we emphasize that this condition is written in the same spirit as Assumption (II) in [9], so it is quite natural in our non-Markovian context.
Now we can state our other main result as follows. Theorem 4.3. We suppose that Assumptions (H1) to (H6) hold and that the density $\pi$ is Lipschitz. Then $n(\widehat{r}_n - r_0) \to M_-$, and $n(\widehat{r}_n - r_0)$ is asymptotically independent of $\sqrt{n}\,(\widehat{\alpha}_n(r_0) - \alpha_0,\ \widehat{\beta}_n(r_0) - \beta_0)$, which is always asymptotically normally distributed (regardless of whether $r_0$ is known or not).

Figure 1. Log-ratio of the closing values of the FTSE index from July 27, 2005 to March 18, 2011 ($n = 1428$).


Figure 3. On the left, the GPS coordinates of the route made by a vehicle whose film was studied. On the right, the matrix of similarities of the associated video images.

Numerical simulations

It is interesting to compare the limiting mean squared errors of the MLE and of the BEs for different values of $H = p + \frac{1}{2}$. In Novikov, Kordzakhia and Ling [69], it was shown via numerical simulations that $\mathbf{E}\,\xi_H^2$ can be essentially larger than $\mathbf{E}\,\widetilde{\xi}_H^2$. The results are presented in Figure 4 for $H \in (0.4, 1]$.
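One draw of $\xi_H$ can be simulated along these lines. This is a rough numpy sketch, not the simulation scheme of [52]: the grid range and step, the Hurst index and the jitter term are all illustrative choices, and a careful study would use a finer discretization.

```python
import numpy as np

# Simulating the limit objects: a two-sided fBm path W_H on a grid, generated
# by Cholesky factorization of the fBm covariance
#   r(u, v) = (|u|^{2H} + |v|^{2H} - |u - v|^{2H}) / 2,
# then the argmax of Z_H(u) = exp(W_H(u) - |u|^{2H} / 2) gives one draw of xi_H.

rng = np.random.default_rng(0)
H = 0.75                      # Hurst parameter H = p + 1/2 with p = 0.25
step = 0.05
u = np.concatenate([np.arange(-4.0, 0.0, step),
                    np.arange(step, 4.0 + step, step)])  # grid avoiding u = 0
a2h = np.abs(u) ** (2 * H)
cov = 0.5 * (a2h[:, None] + a2h[None, :]
             - np.abs(u[:, None] - u[None, :]) ** (2 * H))
L = np.linalg.cholesky(cov + 1e-9 * np.eye(len(u)))  # tiny jitter for stability
W = L @ rng.standard_normal(len(u))                  # one fBm path on the grid
logZ = W - 0.5 * a2h
xi_sample = float(u[np.argmax(logZ)])                # one draw of xi_H
```

Repeating this over many paths yields empirical densities of $\xi_H$ (and, replacing the argmax by the normalized mean of $Z_H$ over the grid, of $\widetilde{\xi}_H$) comparable in spirit to those of Figures 4 and 5.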

Statistical inference for threshold autoregressive models with independent innovations

Introduction