Random ultrametric trees and applications

Ultrametric trees are trees whose leaves lie at the same distance from the root. They are used to model the genealogy of a population of particles co-existing at the same point in time. We show how the boundary of an ultrametric tree, like any compact ultrametric space, can be represented in a simple way via the so-called comb metric. We display a variety of examples of random combs and explain how they can be used in applications. In particular, we review some old and recent results regarding the genetic structure of the population when throwing neutral mutations on the skeleton of the tree.


Introduction
In this paper, we review some mathematical properties of random tree models, bearing in mind potential applications to evolutionary biology. Trees are used in population genetics, to trace the genealogy of a set of homologous genes or of individuals sampled from asexual populations (in contrast to genealogies of individuals from sexual populations, which are called pedigrees and are not trees); in phylogenetics, to represent the ancestral relationships between species; in epidemiology, to model both the history of transmissions in epidemics and the genetic relationships between pathogenic strains. When these entities (genes, individuals, species, patients, pathogens) are sampled at the same point in time, simply called the present time, these trees are said to be ultrametric, which means that all the leaves of the tree lie at the same graph distance from the root. Mathematically, what actually is ultrametric w.r.t. the graph distance is the set of leaves of the tree, called its boundary.
From a theoretical point of view, a tree starts from one particle (coinciding with the root of the tree) and is generated by a series of replication events (birth, speciation, transmission, division), which produce the so-called branching points of the tree (points whose complement has at least three connected components), and of termination events (death, extinction, recovery, apoptosis), which produce the leaves, or tips, of the tree (points whose complement is connected). Probabilistic models for these processes abound [19,23]: branching processes, birth-death processes, Wright-Fisher and Moran models, lookdown process, etc. The genealogy of particles present at time t is the subtree spanned by all points at the same distance t from the root. It is the ultrametric tree we have introduced in the previous paragraph, called the reduced tree in probability and the coalescent tree in population genetics. It is also the ball of radius t centered at the root.
Seen from an empirical point of view, reduced trees are also called reconstructed trees. Indeed, the tree itself is not available as data, and so has to be inferred (that is, reconstructed), typically from multiple sequence alignments. This can be done because there exist measures of genetic distance (i.e., dissimilarity between two aligned genetic sequences) which are good proxies of their genealogical distance (i.e., graph distance, or twice the time since the most recent common ancestor). In most organisms, most mutations (called neutral) occur along the lineages of the tree at a more or less constant pace, hence the name of molecular clock. Modeling mutational processes by Poisson processes, with mutation rates possibly variable in time and across genes, then provides mathematical relations between genealogical distance and genetic distance, which can in turn be used to infer the tree from the sequences. Statistical methods for the inference of trees from sequences form a scientific field in its own right and will not be considered further here. Note that the empirical trees represented in Figures 1 and 2 are not ultrametric even though the sequences used to infer them do co-exist at the present time. This is because evolutionary trees are often represented with genetic distances rather than genealogical distances.
The study of phylogenetic trees has fueled much mathematical research in graph theory, geometry and probability [7,12,35]. Here, we review some results, essentially those recently obtained by the author and his co-authors, around the study of ultrametric trees, initially motivated by two questions: the inference of the process most likely to have generated a given reduced tree (phylogenetics, epidemiology); the neutral genetic composition expected to be observed in a branching population (population genetics).
We start with the general definition of a real tree as a metric space, then we introduce the reduced tree (root-centered ball) and its boundary (root-centered sphere). We then show how the boundary of an ultrametric tree, like any compact ultrametric space, can be represented in a simple way via the so-called comb metric. We display a variety of examples of deterministic and random combs: the infinite p-ary tree, the Kingman comb and comb-based exchangeable coalescents in general, the boundary of a branching process and coalescent point processes in general. In the last section, we review some old and recent results regarding the genetic structure of the population when throwing neutral mutations on the skeleton of the tree.

Real Trees, Ultrametric Trees, Combs
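2.1. The real tree
Definition 2.1. A real tree, or R-tree, is a complete metric space (t, d) satisfying the following two properties.
(A) Uniqueness of geodesics. For any x, y ∈ t, there is a unique isometric map $\varphi_{x,y} : [0, d(x, y)] \to t$ such that $\varphi_{x,y}(0) = x$ and $\varphi_{x,y}(d(x, y)) = y$.
(B) No loop. For any continuous injective map q : [0, 1] → t, the image q([0, 1]) coincides with the image of $\varphi_{q(0),q(1)}$.
Theorem 2.2 (Four points condition). The metric space (t, d) is a real tree if and only if it is complete, path-connected and satisfies, for any $x_1, x_2, x_3, x_4 \in t$,
$$d(x_1, x_2) + d(x_3, x_4) \le \max\{d(x_1, x_3) + d(x_2, x_4),\ d(x_1, x_4) + d(x_2, x_3)\}.$$
For major references on this topic, see [7,12].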
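As an illustration, the four-point condition can be checked numerically on a finite set of points. The following Python sketch is not part of the original argument: the function name `satisfies_four_point` and the two example metrics are hypothetical choices made for illustration only.

```python
from itertools import combinations

def satisfies_four_point(d, tol=1e-12):
    """Check the four-point condition on a finite metric given as a matrix d."""
    for a, b, c, e in combinations(range(len(d)), 4):
        # The three ways of pairing up four points.
        sums = sorted([d[a][b] + d[c][e], d[a][c] + d[b][e], d[a][e] + d[b][c]])
        # The condition holds for every labeling iff the two largest sums agree.
        if sums[2] - sums[1] > tol:
            return False
    return True

# Leaves of a tree: pairs {0, 1} and {2, 3}, unit pendant edges, unit inner edge.
tree_leaves = [[0, 2, 3, 3],
               [2, 0, 3, 3],
               [3, 3, 0, 2],
               [3, 3, 2, 0]]
# Vertices of a 4-cycle with unit edges (a loop, hence not a tree metric).
cycle = [[0, 1, 2, 1],
         [1, 0, 1, 2],
         [2, 1, 0, 1],
         [1, 2, 1, 0]]
print(satisfies_four_point(tree_leaves), satisfies_four_point(cycle))  # True False
```

The check uses the fact that the inequality holds for all labelings of four points exactly when the two largest of the three pairwise sums coincide.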
For any x ∈ t, the multiplicity, or degree, of x, denoted m(x), is the number of connected components of t \ {x}. If m(x) = 1, then x is called a leaf or a tip, and if m(x) ≥ 3, then x is called a branching point.
We will further need the following notation and terminology, where ρ denotes the root of t (a distinguished point of t) and ⟦x, y⟧ denotes the geodesic between x and y, that is, the image of $\varphi_{x,y}$.
• Mrca. For any x, y ∈ t, the most recent common ancestor (in short mrca) of x and y, denoted x ∧ y, is the unique z ∈ t such that ⟦ρ, x⟧ ∩ ⟦ρ, y⟧ = ⟦ρ, z⟧.
• Partial order. For any x, y ∈ t, y is said to descend from x, and x is then called an ancestor of y, if x ∈ ⟦ρ, y⟧; this is denoted x ⪯ y.
• Length measure. Whenever t is locally compact, there is a unique measure λ on the Borel σ-field of t, called length measure, such that for any x, y ∈ t, λ(⟦x, y⟧) = d(x, y) (see Section 4.3.5 in [12]).
• Reduced tree. For a real tree t and a fixed real number T > 0, the so-called reduced tree at height T is the tree spanned by points at distance T from the root, i.e., the union of the geodesics ⟦ρ, x⟧ over all x ∈ t such that d(ρ, x) = T.
The topology of the reduced tree can be understood from the topology of the sphere of t with center ρ and radius T > 0,
$$t^{\{T\}} := \{x \in t : d(\rho, x) = T\}.$$
Note that by the four-points condition applied to ρ, y, x, z, for any $x, y, z \in t^{\{T\}}$,
$$d(\rho, y) + d(x, z) \le \max\{d(\rho, x) + d(y, z),\ d(\rho, z) + d(y, x)\},$$
which yields, since the three points lie at the same distance T from ρ, d(x, z) ≤ max{d(y, z), d(y, x)}; that is, the metric induced by d on $t^{\{T\}}$ is ultrametric. From now on, we assume that (t, d) is locally compact, so that ($t^{\{T\}}$, d) is a compact ultrametric space (by application of the Hopf-Rinow theorem, since a real tree is a length-metric space). We will see in the next section that any compact ultrametric space can be represented by what we call a comb.

The comb metric
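Let I be a compact interval and f : I → [0, ∞) such that for any ε > 0, {f ≥ ε} is finite. For any s, t ∈ I, set
$$\tilde d_f(s, t) := \sup_{(s \wedge t,\ s \vee t)} f,$$
with the convention sup ∅ = 0. It is clear that $\tilde d_f$ is a pseudo-distance on {f = 0} and that it is ultrametric, i.e., for any r, s, t ∈ {f = 0},
$$\tilde d_f(r, t) \le \max\{\tilde d_f(r, s),\ \tilde d_f(s, t)\}.$$
Let us assume additionally that {f ≠ 0} is dense in I for the usual topology, so that $\tilde d_f$ is a distance on {f = 0}.
Definition 2.4. We call f a comb-like function or comb, and $\tilde d_f$ the comb metric on {f = 0}.
The space ({f = 0}, $\tilde d_f$) is not complete in general. To make it complete, one has to distinguish for each point t ∈ I between its left face (t, l) and its right face (t, r). The distance $\tilde d_f$ is extended to the space I × {l, r} by the following definitions for s < t,
$$\tilde d_f((s,l),(t,l)) := \sup_{[s,t)} f, \quad \tilde d_f((s,l),(t,r)) := \sup_{[s,t]} f, \quad \tilde d_f((s,r),(t,l)) := \sup_{(s,t)} f, \quad \tilde d_f((s,r),(t,r)) := \sup_{(s,t]} f,$$
together with $\tilde d_f((s,l),(s,r)) := f(s)$, and the symmetrized definitions for s > t. If f(s) = 0, then $\tilde d_f((s, l), (s, r)) = 0$, so that (s, l) and (s, r) must be identified. It can be shown [30] that the associated quotient space (Ī, $\tilde d_f$) is a compact, ultrametric space called comb metric space. Actually the converse also holds, as we will see with Theorem 2.6.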
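To make the comb metric concrete, here is a minimal Python sketch for a comb with finitely many teeth. It assumes that the distance between two points of {f = 0} is the height of the tallest tooth lying strictly between them; the names `comb_distance` and `is_ultrametric`, and the example comb, are hypothetical and only serve as an illustration.

```python
from itertools import combinations

def comb_distance(teeth, s, t):
    """Height of the largest tooth strictly between s and t (0.0 if none)."""
    lo, hi = min(s, t), max(s, t)
    return max((h for x, h in teeth.items() if lo < x < hi), default=0.0)

def is_ultrametric(points, dist, tol=1e-12):
    """In an ultrametric space, the two largest sides of any triangle are equal."""
    for a, b, c in combinations(points, 3):
        small, mid, large = sorted([dist(a, b), dist(b, c), dist(a, c)])
        if large - mid > tol:
            return False
    return True

teeth = {0.25: 1.0, 0.5: 0.5, 0.75: 0.3}   # position -> f(position)
points = [0.1, 0.3, 0.6, 0.9]               # points where f = 0

def d(s, t):
    return comb_distance(teeth, s, t)

print(d(0.1, 0.3), d(0.3, 0.9), d(0.6, 0.9), is_ultrametric(points, d))
# 1.0 0.5 0.3 True
```

With only three teeth the comb is not dense, so this toy space has isolated points; it is meant to show the mechanics of the distance, not the completion by left and right faces.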
Before stating this theorem, we wish to construct the ultrametric tree hidden behind the comb metric space, as illustrated in Figure 3b.
The tree $\tau_f(T)$ is defined as the completion of (Sk, $\tilde d_f$), so that in particular Sk is its skeleton. We will always take the root of $\tau_f(T)$ equal to ρ = (0, T). For t ∈ (0, T], we call the lineage of t the subset of the tree $L_t$ defined as the closure of the set It can be shown [8] that the boundary of ($\tau_f$, $\tilde d_f$) is indeed (Ī, $\tilde d_f$), which explains why we keep the same notation for the two distances. It can also be shown that for each t ∈ I, $L_t$ = ⟦ρ, $\alpha_t$⟧, where $\alpha_t$ ∈ Ī is equal to (t, r).
Theorem 2.6 ([30]). Any compact ultrametric space without isolated points is isometric to a comb metric space.
Note that a comb metric space (Ī, $\tilde d_f$) is naturally endowed with a finite measure, defined on products A × B of Borel sets A ⊆ I and B ⊆ {l, r} in terms of the Lebesgue measure Leb (here it is important that {f ≠ 0} is dense), which suggests that any compact ultrametric space can be equipped with a finite measure (isolated points can be treated separately). Actually, any compact ultrametric space can be endowed with a finite measure charging every ball with non-zero radius. One example of such a measure is the so-called visibility measure [31], as shown by the following argument, which is actually also used in the proof of the theorem. For any ultrametric space (U, d) and any r > 0, U can be partitioned into balls of radius r, because the relation $\sim_r$ defined by x $\sim_r$ y ⇔ d(x, y) ≤ r is an equivalence relation. If in addition U is assumed to be compact, the number $M_r$ of blocks in this partition has to be finite, and it is nonincreasing in r. The visibility measure is constructed by putting mass 1 on U and, recursively at each jump time of M as r decreases, by dividing the mass of each fragmenting block equally between its new sub-blocks. The comb can be constructed simultaneously with this recursive construction of the visibility measure, by mapping each ball with measure m to an interval with length m, and putting 'walls' between such intervals (the graph of the comb). See [30] for the mathematical details.
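The recursive construction of the visibility measure can be sketched in Python on a finite ultrametric space, where every step is explicit. This is only an illustration under simplifying assumptions: a finite space has isolated points, and the function `visibility_measure` and the example distance are hypothetical names that do not come from [30] or [31].

```python
def visibility_measure(points, dist):
    """Masses of the visibility measure on a finite ultrametric space."""
    mass = {}

    def split(block, m):
        if len(block) == 1:
            mass[block[0]] = m
            return
        diam = max(dist(a, b) for a in block for b in block)
        # In an ultrametric space, d(x, y) < diam is an equivalence relation;
        # its classes are the sub-blocks into which this ball fragments.
        classes = []
        for x in block:
            for cls in classes:
                if dist(x, cls[0]) < diam:
                    cls.append(x)
                    break
            else:
                classes.append([x])
        # The mass of the fragmenting block is shared equally.
        for cls in classes:
            split(cls, m / len(classes))

    split(list(points), 1.0)
    return mass

# Hypothetical example: four points separated by walls of heights 1.0, 0.5, 0.3.
teeth = {0.25: 1.0, 0.5: 0.5, 0.75: 0.3}

def dist(s, t):
    lo, hi = min(s, t), max(s, t)
    return max((h for x, h in teeth.items() if lo < x < hi), default=0.0)

print(visibility_measure([0.1, 0.3, 0.6, 0.9], dist))
# {0.1: 0.5, 0.3: 0.25, 0.6: 0.125, 0.9: 0.125}
```

Mapping each block of mass m to an interval of length m, in the order of the recursion, recovers the positions of the walls of a comb, in the spirit of the construction described above.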
Of course, if U is already endowed with a measure, the same construction can be done using the given measure. This is in particular the case when the ultrametric space is a sphere of a totally ordered, measured tree. It can then be shown [29] that there is a càdlàg function h : R → R with no negative jumps which codes for the tree in a sense that we specify hereafter.

Sphere of a tree coded by a real function
Let h : [0, ∞) → [0, ∞) be càdlàg with no negative jumps and compact support. We are going to explain how h codes for a real tree. Set $\sigma_h := \sup\{t > 0 : h(t) \neq 0\}$ and, for any s, t ∈ [0, $\sigma_h$],
$$d_h(s, t) := h(s) + h(t) - 2 \inf_{[s \wedge t,\ s \vee t]} h.$$
Let $\sim_h$ denote the equivalence relation on [0, $\sigma_h$] defined by s $\sim_h$ t ⇔ $d_h(s, t) = 0$, let $t_h$ denote the quotient space of [0, $\sigma_h$] by $\sim_h$, endowed with the distance induced by $d_h$, and let $p_h : [0, \sigma_h] \to t_h$ map any element of [0, $\sigma_h$] to its equivalence class relative to $\sim_h$. Note that the tree $t_h$ is naturally endowed with a total order and a mass measure, as follows.
• Total order. We define $\le_h$ as the order of first visits, that is, for any x, y ∈ $t_h$, x $\le_h$ y if and only if $\inf p_h^{-1}(\{x\}) \le \inf p_h^{-1}(\{y\})$.
• Mass measure. The measure $\mu_h$ is defined as the push forward of Lebesgue measure by $p_h$.
Conversely, it can actually be shown [9,29] that if (t, d) is a compact R-tree endowed with a total order ≤ and a finite mass measure µ satisfying some consistency conditions, there is a unique càdlàg map h, called the jumping contour process of t, such that the tree $t_h$ coded by h, endowed with $\le_h$ and $\mu_h$, is isomorphic to (t, d, ≤, µ) (see Figure 4 in reverse order).
Now let us consider a compact real tree t coded by a function h (its jumping contour process if the function is not given a priori) and let T > 0 be such that the sphere $t^{\{T\}}$ is not empty. We know that $t^{\{T\}}$ is an ultrametric space, but we have no guarantee that $\mu_h$ charges $t^{\{T\}}$, so we will directly construct an isometric comb metric space, following [30].
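The coding of a tree by a function can be illustrated on a discretized contour. The sketch below assumes the classical pseudo-distance h(s) + h(t) − 2 min h between two times, evaluated on a finite sequence of heights; the names `contour_distance` and `sphere`, and the example contour, are hypothetical.

```python
def contour_distance(h, i, j):
    """d_h(i, j) = h[i] + h[j] - 2 * (minimum of h between indices i and j)."""
    lo, hi = min(i, j), max(i, j)
    return h[i] + h[j] - 2 * min(h[lo:hi + 1])

def sphere(h, T):
    """First-visit representatives of the points of the coded tree at height T."""
    reps = []
    for i, height in enumerate(h):
        # Keep index i only if it is not identified with an earlier visit.
        if height == T and all(contour_distance(h, i, r) > 0 for r in reps):
            reps.append(i)
    return reps

# Contour of a small tree: one branch with two tips at height 2, then a second branch.
h = [0, 1, 2, 1, 2, 1, 0, 1, 2, 1, 0]
level = sphere(h, 2)
print(level)                                                       # [2, 4, 8]
print([[contour_distance(h, a, b) for b in level] for a in level])
# [[0, 2, 4], [2, 0, 4], [4, 4, 0]]
```

Listing the representatives in increasing order of index mirrors the order of first visits, and the distance matrix restricted to the level set is ultrametric, as expected for a sphere of the tree.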
Assume that the sphere t {T } has no isolated point in itself. Then it can be shown that {h = T } has no isolated point and has empty interior. So we can construct a local time at level T for h, that is a nondecreasing, so for any s < t in I, the distance between the two points p h (J s ) and This indicates that t {T } should be represented by the comb metric space associated to the comb f just defined.
Note that f is a comb with values equal to the depths of the excursions of h away from T . Further definē is a global isometry preserving the order and mapping the Lebesgue measure to the push forward of the measure dL by p h .

The boundary of the infinite p-ary tree
The fundamental example of compact ultrametric space is the boundary of the infinite p-ary tree. This is the set $U_p$ of sequences $x = (x_n)_{n \ge 1}$ with values in {0, . . . , p − 1} endowed with the distance $d_u$ defined for any pair $x = (x_n)$ and $y = (y_n)$ of elements of $U_p$ by
$$d_u(x, y) := p^{-v_u(x - y)}, \qquad v_u(x - y) := \min\{n \ge 1 : x_n \neq y_n\},$$
with the convention $p^{-v_u(0)} = 0$. It is known that ($U_p$, $d_u$) is a compact ultrametric space. The distance $d_u$ is actually the graph distance associated to the tree where each edge between generation n − 1 and generation n has length equal to $p^{-n}$. We show how to construct explicitly the isometry of the theorem between ($U_p$, $d_u$) and a comb metric space, which can intuitively be guessed from the previous remark, as shown in Figure 5.
Recall that t ∈ [0, 1] is called a p-adic number if it has two distinct p-adic decompositions. We denote by $\varphi_l(t)$ its p-adic decomposition stationary at p − 1 and by $\varphi_r(t)$ its p-adic decomposition stationary at 0. When t is not p-adic, its p-adic decomposition is simply denoted φ(t). Now for any x ∈ $U_p$ stationary at 0, set $w(x) := \max\{n \ge 1 : x_n \neq 0\}$, and for any t ∈ [0, 1], define
$$F_p(t) := p^{-w(\varphi_r(t))} \text{ if } t \text{ is } p\text{-adic}, \qquad F_p(t) := 0 \text{ otherwise.}$$
The function $F_p$ is a comb called the p-adic comb and it is not difficult to see that $\tilde d_{F_p}$ defines a comb metric space (Ī, $\tilde d_{F_p}$). Then we define $\tilde\varphi$ : (Ī, $\tilde d_{F_p}$) → ($U_p$, $d_u$), which maps every t ∈ [0, 1] × {l, r} to the p-adic decomposition of t, except when t is p-adic, in which case t has two faces, each mapped to a distinct p-adic decomposition as follows:
$$\tilde\varphi((t, l)) := \varphi_l(t), \qquad \tilde\varphi((t, r)) := \varphi_r(t).$$
It can be seen that $\tilde\varphi$ is a global isometry between (Ī, $\tilde d_{F_p}$) and ($U_p$, $d_u$) conserving the measures, where $U_p$ is endowed with the visibility measure $\mu_p$ that puts mass $p^{-n}$ on any ball with radius $p^{-n}$. The measure $\mu_p$ is also the law of a sequence of i.i.d. random variables uniformly distributed in {0, . . . , p − 1}. See Figure 6 for an illustration of the dyadic comb showing the left and right faces of the same dyadic number.
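The correspondence between the p-adic comb and the p-ary tree can be checked by brute force at finite depth. The sketch below rests on two assumptions that should be compared with the definitions above: the distance between two sequences is a power of 1/p whose exponent is the first index at which they differ, and the tooth at a p-adic number is a power of 1/p whose exponent is the index of its last nonzero digit. All function names are hypothetical.

```python
def digits(k, p, depth):
    """Digits (x_1, ..., x_depth) such that k / p**depth = sum of x_n * p**(-n)."""
    out = []
    for _ in range(depth):
        out.append(k % p)
        k //= p
    return out[::-1]

def last_nonzero_index(k, p, depth):
    """Index of the last nonzero digit of k / p**depth, for k >= 1."""
    return max(n for n, x in enumerate(digits(k, p, depth), start=1) if x != 0)

def first_difference(a, b, p, depth):
    """First index at which the digit strings of a and b differ (a != b)."""
    da, db = digits(a, p, depth), digits(b, p, depth)
    return min(n for n in range(1, depth + 1) if da[n - 1] != db[n - 1])

def comb_matches_tree(p, depth):
    size = p ** depth
    for a in range(size):
        for b in range(a + 1, size):
            # Teeth between the cells of a and b sit at k / size, k = a+1, ..., b.
            w_min = min(last_nonzero_index(k, p, depth) for k in range(a + 1, b + 1))
            # Tallest tooth p**(-w_min) versus tree distance p**(-first difference).
            if w_min != first_difference(a, b, p, depth):
                return False
    return True

print(comb_matches_tree(2, 4), comb_matches_tree(3, 3))  # True True
```

Comparing exponents rather than floating-point powers keeps the check exact; each integer a stands for the cell of width p^(-depth) containing all sequences that start with the digits of a.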

The comb-based exchangeable coalescent
A first example of random ultrametric tree is the following. Start with a comb f on [0, 1] and an independent sequence $(V_i)$ of independent and identically distributed (i.i.d.) random variables uniform in (0, 1). Note that a.s. $f(V_i) = 0$ for all i, so we can define the ultrametric distance $\delta_f$ on N by
$$\delta_f(i, j) := \tilde d_f(V_i, V_j).$$
Definition 3.1. The comb-based coalescent process $(R_f(t); t > 0)$ is the partition-valued process such that, for each t > 0, i and j are in the same block of $R_f(t)$ if and only if $\delta_f(i, j) \le t$.
The comb-based coalescent process is an exchangeable coalescent process, in the sense that its law is invariant under permutations of N.
The converse statement is given in the next proposition.
Proposition 3.2. Let (R(t); t > 0) be a random exchangeable coalescent process such that for each t > 0, R(t) has a finite number of blocks and no singleton. Then there is a random comb f such that the comb-based coalescent process R f is equal in distribution to R.
Proof. We can define the distance δ on N by δ(i, j) := inf{t ≥ 0 : i and j are in the same block of R(t)}.
Then it is straightforward that δ is ultrametric. Indeed, for any integers i, j, k, and for any t ≥ 0, if δ(i, j) ≤ t and δ(j, k) ≤ t, then i and j on the one hand, and j and k on the other hand, are in the same block of R(t), so that i and k are in the same block of R(t). This shows that δ(i, k) ≤ t, so that δ(i, k) ≤ max{δ(i, j), δ(j, k)}. Unfortunately, m is not a measure (it does not satisfy Caratheodory's property) and (N, δ) is not compact. But because we assume that for all t > 0, R(t) has a finite number of blocks and no singleton, we can use the same procedure as outlined after the statement of Theorem 2.6 and construct a comb f on I = [0, 1] and a map φ between balls of (N, δ) and balls of (I, f), which conserves radii, measures and partial order (inclusion). Now let $(V_i)$ be i.i.d. uniform random variables independent of the comb (I, f) and of the map φ, and define the comb-based coalescent $R_f$ from f and $(V_i)$ as in Definition 3.1. By construction, the processes of ranked frequencies of blocks of R and of $R_f$ are a.s. equal. Kingman's representation theorem of exchangeable partitions ensures that, for any ε > 0, the law of R(ε) can be obtained from its frequencies by a paintbox process, so that $R_f(ε)$ has the same law as R(ε). By monotonicity, this implies that $(R_f(t); t ≥ ε)$ and (R(t); t ≥ ε) are equally distributed. Since ε is arbitrary, the result is proved.
The archetypal example of comb-based coalescent is the following.
Definition 3.3. Let f be the random comb on [0, 1] defined by
$$f := \sum_{j \ge 1} \tau_j \mathbf{1}_{\{U_j\}},$$
where the families of r.v. $(U_j)$ and $(\tau_j)$ are independent, the $(U_j)$ are i.i.d. uniform on (0, 1) and $\tau_j = \sum_{k \ge j+1} e_k$, where the $e_k$ are independent exponential r.v. with parameter k(k − 1)/2. Then the comb-based coalescent $R_f$ has the same distribution as the Kingman coalescent [18]. We call f the Kingman comb.
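A truncated version of the Kingman comb, and the coalescent it induces on a finite sample, can be simulated as follows. This is a hedged sketch: the truncation to `n_teeth` teeth, the function names `kingman_comb` and `blocks`, and the rule "two sampled points are in the same block at time t when no tooth taller than t separates them" are assumptions made for illustration.

```python
import random

def kingman_comb(n_teeth, rng):
    """Truncated Kingman comb: list of (U_j, tau_j) for j = 1, ..., n_teeth."""
    # e_k ~ Exp(k(k-1)/2); tau_j = e_{j+1} + e_{j+2} + ..., cut at k = n_teeth + 1.
    e = {k: rng.expovariate(k * (k - 1) / 2) for k in range(2, n_teeth + 2)}
    tau, acc = {}, 0.0
    for j in range(n_teeth, 0, -1):
        acc += e[j + 1]
        tau[j] = acc
    return [(rng.random(), tau[j]) for j in range(1, n_teeth + 1)]

def blocks(comb, sample, t):
    """Partition of sample indices induced by the teeth taller than t."""
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    walls = [x for x, height in comb if height > t]
    partition, current = [], [order[0]]
    for prev, nxt in zip(order, order[1:]):
        if any(sample[prev] < x < sample[nxt] for x in walls):
            partition.append(current)   # a wall separates the two neighbours
            current = [nxt]
        else:
            current.append(nxt)
    partition.append(current)
    return partition

rng = random.Random(0)
comb = kingman_comb(200, rng)
sample = [rng.random() for _ in range(10)]
counts = [len(blocks(comb, sample, t)) for t in (0.01, 0.1, 1.0, 10.0)]
print(counts)  # number of blocks, nonincreasing as t grows
```

The number of blocks can only decrease as t increases, and the whole sample forms a single block once t exceeds the tallest tooth, which plays the role of the time to the most recent common ancestor.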
In the rest of this section, we assume that t denotes a binary R-tree, that is, m(x) ≤ 3 for all x ∈ t. See [26] for extensions to random real trees with arbitrarily large degree.

The coalescent point process
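Recall from Theorem 2.7 that any càdlàg map h codes for a compact R-tree denoted $t_h$. Let us call Brownian tree the tree coded by a Brownian excursion conditioned to have height larger than T. The Brownian excursion has a local time at level T, which allows one to construct as in Theorem 2.8 the comb giving the metric of the reduced tree at level T. This comb is a 'list', in the plane order, of the depths of excursions of the contour away from T. These depths form a Poisson point process, which motivates the following definition.
Definition 3.4. Let ν be a measure on (0, ∞) such that ν([x, ∞)) < ∞ for all x > 0. A coalescent point process (CPP) with intensity measure ν and height T is a random comb whose teeth are the atoms of a Poisson point process on [0, ∞) × (0, ∞) with intensity Leb ⊗ ν, killed at its first atom with second component larger than T.
Proposition 3.5. The comb of the reduced tree at level T of the Brownian tree is a CPP with height T and intensity measure
$$\nu_0(dx) = \frac{dx}{2x^2}.$$
We will call Brownian CPP a CPP with intensity measure $\nu_0$ (or a multiple of it).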
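A Brownian CPP has infinitely many small teeth, so a simulation has to discard the teeth below some threshold. The Python sketch below does this under explicit assumptions: the intensity is taken to be dx/(2x²), so that the mass of depths above x is 1/(2x); teeth of depth below `eps` are ignored; and the comb is stopped at the first depth exceeding T. The function name `brownian_cpp` and its parameters are hypothetical.

```python
import random

def brownian_cpp(T, eps, rng):
    """Teeth (position, depth) with eps <= depth < T, up to the first depth >= T.

    Returns the list of teeth and the width of the killed comb."""
    rate = 1.0 / (2.0 * eps)   # mass of dx / (2 x^2) on [eps, infinity)
    teeth, pos = [], 0.0
    while True:
        pos += rng.expovariate(rate)          # gap to the next atom deeper than eps
        # Inverse transform: P(depth > x | depth >= eps) = eps / x.
        depth = eps / (1.0 - rng.random())
        if depth >= T:
            return teeth, pos                 # killing atom: pos is the width
        teeth.append((pos, depth))

rng = random.Random(1)
teeth, width = brownian_cpp(T=1.0, eps=0.01, rng=rng)
print(len(teeth), width)
```

Since positions of atoms deeper than eps form a Poisson process with rate 1/(2 eps) and each depth exceeds T with probability eps/T, the width of the killed comb is exponential with rate 1/(2T), whatever the truncation level.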
More generally, a (root-centered) sphere of any real tree whose contour process is strongly Markovian is isometric to a coalescent point process (CPP). This is the case of splitting trees, which are the trees generated by binary branching processes where particles give birth at constant rate during a lifetime which follows an arbitrary distribution [14]. A splitting tree is actually isometric to a tree coded by a Lévy process with finite variation [21].
More generally, we say that a random binary R-tree satisfies the splitting property if for any x ∈ t, the subtrees rooted at the branching points of the segment ⟦ρ, x⟧ form a Poisson point process on ⟦ρ, x⟧ × E, where E denotes the space of locally compact R-trees. For example, the tree coded by a Brownian excursion satisfies the splitting property. We have proved that a tree satisfying the splitting property, again called splitting tree, is isometric to a tree coded by a Lévy process with possibly infinite variation [29], so that its sphere is again isometric to a CPP. From a practical point of view, an interesting question is to characterize the intensity measure ν of the CPP. In the case of strongly Markovian contour processes, ν is simply the push forward of the excursion measure away from T of the contour process, by the function which maps an excursion to its depth, and a lot is known on this measure in the Lévy case (see [21] for details). There is also a whole class of random trees whose spheres are CPPs with finite intensity measure (departing from our definition of combs as having dense support) and which have applications in evolutionary biology.
Consider a population where all individuals live and reproduce independently, and each individual is endowed with a trait (some random character living in R for simplicity) that evolves through time according to independent copies of the same, possibly time-inhomogeneous, Markov process. Further assume the following.
• This trait is non-heritable, in the sense that any individual born at time t draws the value of her trait at birth from the same distribution, independently of her ancestors' histories;
• All individuals give birth during their lifetime, according to a Poisson point process with intensity β, where β is a diffuse Radon measure on [0, ∞);
• An individual holding trait x at time t dies at rate d(t, x).
Theorem 3.6 ([28]). For a tree generated by the previously defined population model, starting with one individual at time 0 and conditional on having at least one individual alive at time T, the sphere of radius T is isometric to a CPP with intensity measure ν given by where q(t) denotes the probability that an individual born at time t has no descendants alive by time T. Note that the knowledge of W(t) for t > T is not needed. In addition, if g(t, ·) denotes the density of the death time of an individual born at time t, then W is solution to the following integro-differential equation with initial condition W(0) = 1.
The last statement can be used to compute the likelihood of a given reconstructed tree (cf. Introduction) to infer the model that most likely has generated this tree. Two beautiful biology papers applying this procedure are [32,36]. More recent examples of applications can be found in Section 3.3 of the lecture notes [23], see in particular [2,11,24,25].
The genealogy of such a process can be encoded by the infinite binary tree T (finite sequences of 0's and 1's) endowed with the birth dates α(v) of each finite sequence v. If u is a prefix of v, we write u ⪯ v and we say that v is a descendant of u. We denote by ω(v) the date of death of v and assume that ω(v) = α(v0) = α(v1). The tree T may then be equipped with a measure L on its boundary ∂T = $\{0, 1\}^N$ defined by where $B_u$ denotes the set of infinite sequences v with prefix u and $N_u(t)$ is the number of descendants of u at time t $N_u(t)$ : The mass of L is an exponential r.v. with parameter 1.
In particular, if we take T = 1 and ϕ given by $\phi(t) = e^{-\beta(t)}$, then ϕ(t) has the distribution of $\tau_f$, where f is a CPP with height 1 and intensity $dx/x^2$. Proof. We know from [8] that ϕ(t) is the genealogy of a reversed (i.e., time flows from T to 0) pure-birth tree with birth rate $\beta' = \beta \circ \phi^{-1}$. From the same paper, we know that a reversed pure-birth tree with birth rate β′ and measure L (invariant by the time-change) on its boundary is the tree $\tau_f$ associated with a CPP f with intensity measure ν satisfying that the Laplace-Stieltjes measure associated with the increasing function ln(ν) is equal to β′, where we wrote ν(t) = ν([t, ∞)). The proof follows from a simple calculation.
Notice that we could have applied the same reasoning to a supercritical birth-death process conditioned on non-extinction, by considering the subtree spanned by its boundary, i.e., the tree of indefinitely surviving lineages, which has the law of the tree generated by the pure-birth process with birth rate $\tilde\beta(dt) = \beta(dt)(1 - q(t))$, where q(t) is the probability of extinction of the descendance of a single particle alive at time t. Then everything works as in Theorem 3.7 provided that the counter $N_u(t)$ is restricted to the particles alive at t with indefinitely surviving descendance, and that β is replaced with $\tilde\beta$ in the displayed formula.
Furthermore, we can recover the first part of Theorem 3.6 by replacing in the last paragraph 'infinite survival' with 'survival up to T ' and by taking ϕ(t) = T − t in Theorem 3.7.

Link between Kingman coalescent and CPP
Recall the Kingman comb and the Brownian CPP defined respectively in Definition 3.3 and Proposition 3.5. Both objects code for the genealogy of a large exchangeable population, but the Kingman coalescent is based on the assumption of a stationary population with constant size (total size constraint), whereas in the CPP the size of the population is fluctuating like a branching process, and its foundation time is fixed (time constraint). Following [27], we show that one of the two is embedded in the other. In an exchangeable population with large constant size, the descendance of a small subpopulation is blind to the total size constraint and it is constant in expectation (see for example Theorem 1 in [3]). Our goal is to state a backward-in-time version of the last informal observation, namely 'in a large stationary population with constant size, the genealogy of a subpopulation with recent MRCA is given by a CPP', and to derive some consequences of this fact.
Let K : [0, 1] → [0, ∞) denote the Kingman comb and let C : [0, ∞) → [0, ∞) denote the CPP with intensity measure $\nu(dx) = 2x^{-2}\,dx$. Now for each ε > 0, we define $S_\varepsilon$ as the linear operator mapping each real function f to the function $S_\varepsilon(f) : t \mapsto \varepsilon^{-1} f(\varepsilon^{-1} t)$. The next statement ensures that if we zoom out on the Kingman comb restricted to a small interval, we end up with the Brownian CPP.
The genealogy generated by a comb restricted to a small interval focuses with high probability on parts of the ultrametric tree which are closely related (small genealogical distances). Now conditional on K, let us sample n points conditioned to have a time to mrca smaller than ε, and consider the genealogy of the whole subtree spanned by this sample (quenched conditional sampling). In other words, we let 0 = x ε 0 < x ε 1 < · · · < x ε Nε < 1 = x ε Nε+1 denote the ranked enumeration of the finite set {K > ε} and we let B ε i denote the comb restricted to the i-th interval of the subdivision (1 ≤ i ≤ N ε + 1) where Ω denotes the set of combs endowed with the vague topology), we define conditional on K We naturally extend the definition of S ε to bivariate f , by S ε (f ) : (ω, l) → (S ε (ω), ε −1 l). where L is a Gamma (n + 1, 2) r.v. and M is an independent CPP with intensity measure ν(dx) 1 x<1 restricted to the interval [0, L].
In words, the previous statement ensures that after proper rescaling and averaging over the Kingman coalescent, the subtree spanned by a quenched conditional sample can be described in terms of an n-size-biased (i.e., biased by the n-th power of its size) Brownian CPP with height 1.

Throwing point mutations on a tree
As explained in the Introduction, phylogenetic trees are inferred from genetic distances, due to the existence of a so-called molecular clock which regulates the pace at which new mutations appear on the lineages of the tree. This provides statistical relationships between genealogical distances and genetic distances, that allow biologists to infer the former from the latter.
So we consider a point measure M on the skeleton of a real tree (t, d), that we will call mutation point measure, whose atoms are viewed as mutation events. Assume that each point x of the tree is further given a type, or allele, inherited from the most recent atom of M on ⟦ρ, x⟧, that is, the point σ(x) := arg max{d(ρ, y) : y atom of M, y ⪯ x}, which is set equal to ρ if the above set is empty. This assumption is known as the infinitely-many alleles model. A point which carries the same allele as the root is said to be clonal. The partition of the boundary into distinct alleles is the so-called allelic partition.
There are usually two ways of studying the allelic partition in the framework of combs. One possibility is to consider the allelic partition of a sample of size n. This can readily be done as in Subsection 3.1, by considering n r.v. $V_1, . . . , V_n$ i.i.d. uniform in the interval I of definition of the comb, and associating each i with the most recent atom of the lineage ⟦ρ, $V_i$⟧. This induces a partition of {1, . . . , n} which can be described by the so-called allele frequency spectrum (A(1), . . . , A(n)), where A(k) is the number of blocks of the partition with cardinality k.
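The passage from an allelic partition to its allele frequency spectrum is a simple counting operation, sketched below in Python. The function name `allele_frequency_spectrum` and the allele labels are hypothetical.

```python
from collections import Counter

def allele_frequency_spectrum(alleles):
    """A(k) = number of alleles carried by exactly k sampled individuals, k = 1..n."""
    n = len(alleles)
    block_sizes = Counter(alleles).values()   # size of each block of the partition
    size_counts = Counter(block_sizes)        # how many blocks have each size
    return [size_counts.get(k, 0) for k in range(1, n + 1)]

sample_alleles = ["a", "b", "a", "c", "a", "b"]
spectrum = allele_frequency_spectrum(sample_alleles)
print(spectrum)  # [1, 1, 1, 0, 0, 0]
```

By construction, the spectrum satisfies the identity that the sum of k A(k) over k equals the sample size n.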
A second possibility is to consider the allelic partition of the whole population. If the comb has finite support (finite number of teeth), then one can proceed as previously. Otherwise, the allele frequency spectrum has to be expressed thanks to the measure defined on the boundary, by defining the point measure on (0, ∞) where the sum is taken over all atoms m of M and R m denotes the set of points x in the boundary such that σ(x) = m.
In the context of the molecular clock, the most natural way of modeling mutation events is to use Poisson point processes. We have seen that a locally compact R-tree (t, d) has a length measure λ, so for any nonnegative Borel function µ(t) (mutation rate at time t), we could define the mutation point measure as the Poisson point measure with intensity µ(d(ρ, x)) λ(dx). But if we are only interested in the genetic composition of the population of individuals/species co-existing at the same time, we can restrict our attention to ultrametric trees and, by virtue of Theorem 2.6, focus on the tree $\tau_f(T)$ associated with a comb f defined on the interval I = [0, a]. Let µ denote a diffuse Radon measure on (0, T]. From now on, we define the mutation point measure M on Sk as the Poisson point measure with intensity measure where we assume that f(0) = T so that there can also be mutations on the origin branch $L_0$. We will refer to the measure µ as the mutation rate. Usually, µ(dx) is taken equal to θ dx, except when the comb studied is the image of an infinite tree by some map ϕ, as in Subsection 3.3.

Kingman coalescent with mutations
Here, we consider that the comb f is the Kingman comb and that µ(dx) = θ dx. The well-known results reviewed here are wonderfully exposed in [10].
Let us focus first on the allelic partition of a sample of n individuals as defined in the previous subsection. Let $A_n(k)$ denote the number of blocks of the allelic partition containing k elements. Observe that we must have $\sum_{k=1}^n k A_n(k) = n$.
Theorem 4.1 (Ewens sampling formula [13]). The random vector $(A_n(1), . . . , A_n(n))$ has the same law as the random vector $(Y_1, . . . , Y_n)$ conditional on $\sum_{k=1}^n k Y_k = n$, where the $Y_k$'s are independent, and $Y_k$ is a Poisson r.v. with parameter θ/k. Equivalently, for any vector $(a_1, . . . , a_n)$ of nonnegative integers such that $\sum_{k=1}^n k a_k = n$,
$$P(A_n(1) = a_1, \ldots, A_n(n) = a_n) = c_{\theta,n} \prod_{k=1}^n \frac{(\theta/k)^{a_k}}{a_k!}, \qquad \text{where } c_{\theta,n} := \frac{n!}{\theta(\theta + 1) \cdots (\theta + n - 1)}.$$
Proof. The trick is to follow the lineages of the n individuals backwards in time. It is known (but not obvious from Definition 3.3) that regardless of mutations, each pair of lineages coalesces at unit rate independently. Now each lineage is hit independently by a mutation at constant rate θ. Once it is hit, mutations occurring further in the past can be ignored because they have no consequence on the allelic partition at present time, thanks to the infinite-allele assumption. In addition, by the sampling consistency of the Kingman coalescent, the lineage itself can be frozen and put aside because its presence has no consequence on the process of coalescences of other lineages. So we are left with a Markov process in backward time where pairs of lineages coalesce at unit rate and single lineages are frozen at rate θ. Once all lineages have been frozen by a mutation, each of these mutations corresponds to a distinct allele and the maximum number of lineages downstream from this mutation is the size of the block corresponding to this allele. See Figure 7 for an illustration.
Now it is not difficult to see that, reversing the arrow of time once again and applying a time change, this process becomes a pure-birth process with immigration started at 0 and stopped when it hits n. It has unit birth rate and immigration rate θ. So the allele frequency spectrum has the same law as the vector of sizes of immigrant families after the process is stopped. For the calculations ending the proof see [22] or the references therein.
Corollary 4.2. As n → ∞, the random vector $(A_n(1), . . . , A_n(n))$ converges in the sense of finite-dimensional distributions to a sequence $(Y_k)_{k \ge 1}$ of independent r.v., where $Y_k$ is a Poisson r.v. with parameter θ/k.
This limiting spectrum of small families (i.e., blocks with size O(1)) is sometimes called the harmonic frequency spectrum. Besides the dependence in 1/k, it is important to remember that the number of small families is O(1), which is in deep contrast with what happens for coalescent point processes (see next subsection). Now to understand the behavior of the large families (the other end of the spectrum), we denote by $X_n(i)$ the size of the i-th largest block of the allelic partition. The following result follows from the description of the allelic partition in terms of a pure-birth process with immigration, as in the proof of Theorem 4.1.
In words, the sizes of the largest blocks are of the order of the sample size $n$, and they represent fractions of the sample size which converge to the GEM distribution with parameter $\theta$, a distribution that can be obtained by a recursive stick-breaking procedure of the unit interval.
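The two descriptions above lend themselves to a small simulation. The following is a minimal sketch, not the construction used in the cited references: it runs the jump chain of the pure-birth process with immigration (unit birth rate, immigration rate $\theta$) until the population reaches $n$, and separately draws the first weights of a stick-breaking sequence. The function names are hypothetical, and the use of Beta$(1, \theta)$ breaking fractions is the standard construction of GEM($\theta$), assumed here since the text does not spell it out.

```python
import random
from collections import Counter


def immigrant_family_sizes(n, theta, rng):
    """Sizes of immigrant families when the population first reaches n.

    Jump chain of a pure-birth process with immigration: with `total`
    individuals present, the next event is an immigration with probability
    theta / (theta + total), otherwise a birth in a uniformly chosen individual.
    Requires theta > 0.
    """
    sizes = []   # sizes[j] = current size of the j-th immigrant family
    total = 0
    while total < n:
        if rng.random() < theta / (theta + total):
            sizes.append(1)                 # a new immigrant founds a family
        else:
            target = rng.randrange(total)   # uniformly chosen individual
            for j, s in enumerate(sizes):
                if target < s:
                    sizes[j] += 1           # its family gains one member
                    break
                target -= s
        total += 1
    return sizes


def allele_frequency_spectrum(sizes):
    """Map k -> number of families (blocks) of size k, i.e. A_n(k)."""
    return Counter(sizes)


def gem_stick_breaking(theta, k, rng):
    """First k weights of a stick-breaking sequence with Beta(1, theta) fractions."""
    weights, remaining = [], 1.0
    for _ in range(k):
        b = rng.betavariate(1.0, theta)
        weights.append(remaining * b)       # piece broken off the remaining stick
        remaining *= 1.0 - b
    return weights


if __name__ == "__main__":
    rng = random.Random(0)
    sizes = immigrant_family_sizes(1000, 2.0, rng)
    print(sorted(allele_frequency_spectrum(sizes).items())[:5])
    print(gem_stick_breaking(2.0, 5, rng))
```

Averaging `allele_frequency_spectrum` over many runs gives an empirical counterpart of the small-family counts, while the family sizes divided by $n$ can be compared with the stick-breaking weights.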
One could also tackle the problem directly on the Kingman comb, by considering the measures, say $X_k$, of the largest blocks $(R_k)_{k \ge 1}$ of the allelic partition of the whole population. One would theoretically expect $(X_k)$ to follow the GEM distribution. Another interesting question is to study the geometry of the subsets $(R_k)$ of the unit interval analogously to what is done below in Theorem 4.4.
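As a sanity check on the formula of Theorem 4.1, one can evaluate it exactly for small $n$ and verify that it sums to 1 over all admissible vectors $(a_1, \ldots, a_n)$. The sketch below uses exact rational arithmetic; the function names are hypothetical and the enumeration is only practical for small $n$.

```python
from fractions import Fraction
from math import factorial


def ewens_probability(a, theta):
    """P(A_n(1) = a[0], ..., A_n(n) = a[n-1]) under the Ewens sampling formula."""
    theta = Fraction(theta)
    n = sum(k * ak for k, ak in enumerate(a, start=1))
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i                 # theta (theta + 1) ... (theta + n - 1)
    prob = Fraction(factorial(n)) / rising  # the constant c_{theta, n}
    for k, ak in enumerate(a, start=1):
        prob *= (theta / k) ** ak / factorial(ak)
    return prob


def count_vectors(n):
    """Yield every tuple (a_1, ..., a_n) of non-negative integers with sum k * a_k = n."""
    def rec(remaining, k):
        if k > n:
            if remaining == 0:
                yield ()
            return
        for ak in range(remaining // k + 1):
            for rest in rec(remaining - k * ak, k + 1):
                yield (ak,) + rest
    yield from rec(n, 1)


if __name__ == "__main__":
    theta = Fraction(3, 2)
    total = sum(ewens_probability(a, theta) for a in count_vectors(6))
    print(total)
```

For $\theta = 1$ the formula reduces to the law of the cycle counts of a uniform random permutation, which gives an independent check on small cases.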

CPP with mutations
CPPs are less amenable than the Kingman comb to calculations based on a sample of fixed size. On the other hand, it can be shown that the genealogy of a sample of points thrown on the boundary according to a Poisson point process with constant intensity is equal in distribution to a new CPP, with finite intensity measure [20]. Up to a modification of $\nu$, we can therefore focus on the allelic partition of the whole population without loss of generality.
From now on, we assume that the comb $f$ is a CPP killed at its first atom with second component larger than $T$, and with intensity $\nu$, a diffuse measure on $(0, \infty)$ such that $\bar\nu(x) := \nu([x, \infty)) < \infty$ for all $x > 0$. We additionally assume that the mutation intensity measure $\mu$ is a diffuse Radon measure on $[0, \infty)$. In particular, we can define $\bar\mu(t) := \mu([0, t])$. Then it can be seen [8] that if $\int_0 \bar\mu(x)\,\nu(dx) < \infty$, the total number of mutations is finite a.s., whereas if $\int_0 \bar\mu(x)\,\nu(dx) = \infty$, then the number of mutations in any clade (the set of all descendants of a point) is infinite a.s.
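As a worked illustration of this dichotomy, consider the Brownian CPP that appears later in this section, with intensity $\nu(dx) = x^{-2}\,dx$ and constant mutation rate $\mu(dx) = \theta\,dx$. The computation below is a sketch that assumes the criterion is the integrability near $0$ of $x \mapsto \mu([0, x])$ against $\nu$.

```latex
\[
\mu([0,x]) = \theta x,
\qquad
\int_0^{\varepsilon} \mu([0,x])\,\nu(dx)
  = \theta \int_0^{\varepsilon} \frac{dx}{x} = \infty
\quad \text{for every } \varepsilon > 0 .
\]
```

So in this case every clade carries infinitely many mutations a.s. By contrast, when $\nu$ is a finite measure, $\int_0^{\varepsilon} \mu([0,x])\,\nu(dx) \le \mu([0,\varepsilon])\,\nu((0,\varepsilon]) < \infty$, which falls in the first case of the criterion.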
Recall that a point of $I$ is clonal if it carries the same allele as the root, that is, there is no mutation on its lineage. The next statement is based on the idea that for any clonal $t$, the right-hand side of $t$ only 'sees' the mutation-free lineage of $t$.
Theorem 4.4. Let $T = \infty$ so that $I = [0, \infty)$ and let $R$ denote the closure of the clonal set. Conditional on the absence of mutation on the origin branch $L_0$, $R$ is a regenerative set that can be described as the range of a subordinator whose Laplace exponent $\varphi$ is given by
In addition, there exist semi-explicit formulae (that we will not provide explicitly here) for the expectation of the allele frequency spectrum, namely for
where $T$ is here to recall the dependence on the height of the CPP, and the measure $\Lambda_T$ is defined on $\mathbb{N}$ when $\nu$ is finite [20] and on $(0, \infty)$ when $\nu$ is infinite [8]. In addition, the following convergence holds weakly on $[0, \infty)$
where $a(T)$ is the width of the CPP killed at its first atom with second component larger than $T$ (in particular, $\mathbb{E}(a(T)) = 1/\nu([T, \infty))$). In the case of a unit rate critical birth-death process with mutations at rate $\theta$, we get [20]
Analogously, in the case of the Brownian CPP with intensity measure $x^{-2}\,dx$ and mutation measure $\theta\,dx$, elementary calculations show that
The last two formulae are reminiscent of the harmonic frequency spectrum displayed in Corollary 4.2.
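The identity $\mathbb{E}(a(T)) = 1/\nu([T, \infty))$ for the width of the killed CPP can be illustrated by simulation when $\nu$ is finite. The sketch below is only an illustration under an assumed intensity, $\nu(dx) = dx/(1+x)^2$ on $(0, \infty)$, chosen because it has total mass 1 and an explicit tail $\nu([x, \infty)) = 1/(1+x)$; the function names are hypothetical.

```python
import random


def sample_killed_cpp(T, rng):
    """Return (atoms, width) for a CPP killed at its first atom with height > T.

    Assumption (illustrative only): nu(dx) = dx / (1 + x)^2 on (0, inf), so nu
    has total mass 1 and nu([x, inf)) = 1 / (1 + x).
    """
    atoms = []   # (position, height) pairs, all with height <= T
    t = 0.0
    while True:
        t += rng.expovariate(1.0)              # gaps between atoms are Exp(total mass) = Exp(1)
        h = 1.0 / (1.0 - rng.random()) - 1.0   # inverse transform: P(H > x) = 1 / (1 + x)
        if h > T:
            return atoms, t                    # t is the width a(T)
        atoms.append((t, h))


def mean_width(T, runs, seed=0):
    """Empirical mean of the width a(T) over independent runs."""
    rng = random.Random(seed)
    return sum(sample_killed_cpp(T, rng)[1] for _ in range(runs)) / runs


if __name__ == "__main__":
    T = 2.0
    print(mean_width(T, 20000), "vs", 1.0 + T)   # 1 / nu([T, inf)) = 1 + T here
```

Under this assumed intensity the empirical mean width should be close to $1 + T$, in line with the formula above.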
Next, a LLN type of argument entails the following result.
Theorem 4.5 ([4,8]). Let $\bar A_T(q)$ denote the number of alleles whose carriers on the boundary form a set of measure larger than $q$ (counting measure in the finite case, otherwise). Then the following limit holds a.s.
A similar convergence will be mentioned in the next subsection for the allele frequency spectrum at time $t$ of the supercritical branching process as $t \to \infty$.

Supercritical processes with mutations
Here, we wish to explain how the allelic partition at the boundary of a supercritical branching process can be understood from the allelic partition of a CPP, in the vein of Theorem 3.7. We also say a word on the limiting allele frequency spectrum for large times.
Let $\mathbf{t}$ be the infinite tree generated by a supercritical birth-death process with birth intensity measure $\beta$ conditioned on non-extinction, and let $q(t)$ denote the probability of extinction of a particle born at time $t$ (which depends on some unspecified, possibly zero, possibly age-dependent death rate). Also assume that conditional on $\mathbf{t}$, mutations occur on the lineages of $\mathbf{t}$ according to some intensity measure $\mu$ on $[0, \infty)$. Also recall that there is a measure $L$ defined on the boundary of $\mathbf{t}$ by
where $B_u$ denotes the set of infinite sequences $v$ with prefix $u$, $\tilde N_u(t)$ is the number of descendants of $u$ at time $t$ which have infinite descendance and $\tilde\beta(t) = \int_0^t (1 - q(s))\,\beta(ds)$. Now from Theorem 3.7 and the remarks following it, if $\phi : [0, \infty) \to (0, T]$ is a decreasing bijection, then $\phi(\mathbf{t})$ is isometric to the tree $\tau_f$ constructed from a CPP $f$ with intensity $\nu$ and height $T$, naturally measured by , where
Now it is straightforward to see that the mutations on $\tau_f$ occur according to a mutation point measure with mutation rate $\mu_\phi := \mu \circ \phi^{-1}$.
If we assume that $\mu$ has finite mass $\theta$ and we take $\phi$ given by $\phi(t) = \theta^{-1} \mu([t, \infty))$, then the allelic partition at the boundary is equal in distribution to that of a CPP (with intensity $\nu$ previously displayed) with height 1 and mutations at constant rate $\theta$.
In contrast, if we take $\mu(dx) = \theta\,dx$ and $\phi$ given by $\phi(t) = e^{-\tilde\beta(t)}$, then $\phi(\mathbf{t})$ has the distribution of $\tau_f$ where $f$ is a Brownian CPP with height 1, intensity $x^{-2}\,dx$ and mutation rate $\mu_\phi$ satisfying $\mu_\phi((x, 1]) = \theta\,\phi^{-1}(x)$. In particular, if the birth-death process is time-homogeneous with birth rate $b$ and if we set $a := b(1 - q)$, then $\phi(t) = e^{-at}$, so that $\phi^{-1}(t) = -\ln(t)/a$. As a conclusion, the allelic partition at the boundary of a time-homogeneous supercritical birth-death process with mutations at constant rate $\theta$ is equal in distribution to that of a Brownian CPP with height 1, intensity $x^{-2}\,dx$ and mutations at inhomogeneous rate $\theta/(at)$.
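The inhomogeneous rate $\theta/(at)$ can be read off by differentiating the tail of the time-changed mutation measure. The short computation below is a sketch using only the time-homogeneous inverse $-\ln(x)/a$ of the time change stated above.

```latex
\[
\mu_\phi((x,1]) = \theta\,\phi^{-1}(x) = -\frac{\theta}{a}\,\ln x
\quad\Longrightarrow\quad
\mu_\phi(dx) = -\frac{d}{dx}\Big(-\frac{\theta}{a}\,\ln x\Big)\,dx
             = \frac{\theta}{a\,x}\,dx ,
\qquad x \in (0,1].
\]
```

As a check, $\int_x^1 \frac{\theta}{a s}\,ds = -\frac{\theta}{a}\ln x$, which recovers the tail above.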
Similarly to Theorem 4.5, the allele frequency spectrum at time $t$ of the supercritical birth-death process, properly rescaled, converges a.s. as $t \to \infty$. The LLN type of argument invoked here is known as the theory of branching processes counted with random characteristics [15,16,17,33]. In our setting, the random characteristic of individual $i$, say, can be for example the number $\chi_i^k(t)$ of mutations that $i$ has experienced during her lifetime and which are carried by $k$ alive individuals, $t$ units of time after her birth ($\chi_i^k(t) = 0$ if $t < 0$). Then the total number $A_t(k)$ of alleles carried by $k$ individuals at time $t$ (except possibly the ancestral type) is the sum $\sum_i \chi_i^k(t - \alpha_i)$ over all individuals $i$ (dead or alive), where $\alpha_i$ is the birth time of individual $i$. The theory of branching processes counted with random characteristics ensures that these sums, divided by $e^{at}$, converge a.s. on the survival event.
This method has been used extensively in [37]. Further refinements have been obtained in [5] thanks to the use of coalescent point processes. For example, we have shown the convergence of the (properly rescaled) largest blocks of the partition at time $t$, as $t \to \infty$. In particular, we showed, letting $N \equiv N_t$ denote the total population size at time $t$, that the largest blocks are roughly of size $N^{1 - \theta/a}$ when $\theta < a$, $(\log N)^2$ when $\theta = a$ and $\log N$ when $\theta > a$. Similarly, we showed that the oldest mutations with alive carriers at time $t$ appeared roughly at time $O(1)$ when $\theta < a$, at time $\ln(t)/a$ when $\theta = a$ and at time $(1 - a/\theta)t$ when $\theta > a$. Notice that in contrast to the Kingman case, no block has a size of the order of the total population size when the mutation rate is constant. It remains an open question whether the compactification of the tree into a CPP mentioned earlier in this subsection can shed extra light on these results. In particular, the case of large families is not straightforward at all, since the compactification focuses precisely on subsets of the boundary with measure of the order of the total population size.