Volume 5, Issue 11
Research Article

Intrinsic inference difficulties for trait evolution with Ornstein‐Uhlenbeck models

Lam Si Tung Ho

Department of Statistics, University of Wisconsin, 1300 University Ave., Madison, WI, 53706 USA

Cécile Ané

Corresponding Author

Department of Statistics, University of Wisconsin, 1300 University Ave., Madison, WI, 53706 USA

Department of Botany, University of Wisconsin, Madison, WI, USA

Correspondence author. E‐mail: ane@stat.wisc.edu
First published: 08 October 2014

Summary

  1. For the study of macroevolution, phenotypic data are analysed across species on a dated phylogeny using phylogenetic comparative methods. In this context, the Ornstein‐Uhlenbeck (OU) process is now being used extensively to model selectively driven trait evolution, whereby a trait is attracted to a selection optimum μ.
  2. We report here theoretical properties of the maximum‐likelihood (ML) estimators for these parameters, including their non‐uniqueness and inaccuracy, and show that theoretical expectations indeed apply to real trees. We provide necessary conditions for ML estimators to be well defined and practical implications for model parametrization.
  3. We then show how these limitations carry over to difficulties in detecting shifts in selection regimes along a phylogeny. When the phylogenetic placement of these shifts is unknown, we identify a ‘large p ‐ small n’ problem where traditional model selection criteria fail and favour overly complex scenarios. Instead, we propose a modified criterion that is better adapted to change‐point models.
  4. The challenges we identify here are inherent to trait evolution models on phylogenetic trees when observations are limited to present‐day taxa, and require the addition of fossil taxa to be alleviated. We conclude with recommendations for empiricists.

Introduction

Analysis of trait evolution is at the heart of evolutionary biology. Much effort on reconstructing phylogenies is motivated by the subsequent analysis of traits, to study how traits correlate to each other, to ecological covariates or to study how rates of trait evolution vary across clades (Pennell & Harmon 2013). This paper focuses on the statistical power that phylogenies provide for studying continuous trait evolution, when traits evolve under natural selection as modelled by an Ornstein‐Uhlenbeck (OU) process (Hansen 1997; Hunt, Bell & Travis 2008). Under the OU process, the trait undergoes random drift with variance accumulation rate σ2, but it is also attracted to a ‘selection optimum’ μ with selection strength α. More specifically, changes in a trait Y over time t are described by
$$dY_t = \alpha(\mu - Y_t)\,dt + \sigma\,dB_t$$
where B_t is a Brownian motion. The amount of time needed for the trait to move halfway towards μ, on average, is called the phylogenetic half‐life: t1/2 = ln(2)/α (Hansen, Pienaar & Orzack 2008). There is weak phylogenetic correlation between taxa when t1/2 is very small compared to the tree height T, while a t1/2 that is large compared to T results in strong phylogenetic correlation. In the particular case when α = 0, the constraint to stay near μ disappears and the process reduces to random drift only, as modelled by a Brownian motion (BM). The OU model has been used to detect the presence of natural selection, as opposed to neutral evolution. For instance, Scales, King & Butler (2009) showed evidence for natural selection acting on the fibre‐type composition of locomotor muscles of lizards, with a common selection pressure across their clade. Harmon et al. (2010) showed broad support for the OU model across many clades. The OU model has also been used to account for a partial but unknown degree of phylogenetic effect when testing for correlation between traits. Indeed, the presence of natural selection is expected to reduce the level of phylogenetic correlation. Here, the parameter α is not interpreted as a selection strength, but rather as a parameter to estimate the level of phylogenetic correlation, or phylogenetic inertia (Hansen & Orzack 2005). This is useful when testing relationships between various traits [e.g. Smith, Ané & Baum (2008); Lavin et al. (2008)].
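The half‐life and the pull towards μ can be made concrete with a minimal Python sketch. It is not part of the original analysis (which used R); the function names are hypothetical, and the formulas are the standard conditional mean and variance of an OU process started at a fixed value y0. The values y0 = 3 and μ = 2 are chosen only for illustration.

```python
import math

def half_life(alpha):
    """Phylogenetic half-life t_1/2 = ln(2) / alpha, for alpha > 0."""
    return math.log(2.0) / alpha

def ou_mean(y0, mu, alpha, t):
    """Expected trait value after time t, starting from the fixed value y0."""
    w = math.exp(-alpha * t)
    return y0 * w + mu * (1.0 - w)

def ou_variance(sigma2, alpha, t):
    """Trait variance after time t, starting from a fixed value."""
    if alpha == 0.0:
        return sigma2 * t  # Brownian motion limit: variance grows linearly
    return sigma2 / (2.0 * alpha) * (1.0 - math.exp(-2.0 * alpha * t))

# With alpha = 1, the half-life is about 0.69 time units.
t_half = half_life(1.0)
# After one half-life, the expected trait sits halfway between y0 and mu.
midpoint = ou_mean(3.0, 2.0, 1.0, t_half)
```

For large t the variance approaches σ²/(2α), which is why a strong α erases the phylogenetic signal that a BM (α = 0) would keep accumulating.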

The OU process is ideal to model changes in selection regimes, such as changes in the selection optimum μ between different parts of the tree (as pioneered by Butler & King (2004), see also Scales, King & Butler (2009) for instance), or changes in α or in the variance rate σ2 (Beaulieu et al. 2012). Changes in the selection optimum μ have been modelled elegantly by Hansen, Pienaar & Orzack (2008) and Bartoszek et al. (2012), under the assumption that μ is driven by a continuous predictor whose evolution is modelled with a BM or OU process.

While maximum likelihood (ML) is widely used with OU models, very few studies have focused on the statistical properties of these methods on trees. Some recent work showed that statistical behaviours expected under standard models with independent units should not be taken for granted (Boettiger, Coop & Ralph 2012; Ho & Ané 2013), such as:
  1. Consistent estimators: parameter estimates should converge to the true parameter values as more and more data are collected. For this, it is necessary (but not sufficient) that parameters are identifiable.
  2. Increasing power: tests of particular hypotheses should reach any desired power provided that enough data are collected.
  3. Accurate model selection using likelihood ratio tests, AIC or BIC.

In this paper, we first review cases for which these properties were proved to break down on phylogenies. We then describe limitations with OU models for trait evolution regarding all three properties: lack of identifiability and consistency, lack of power and inaccurate model selection. Our simulations show that the power to detect phylogenetic changes in the selection optimum depends little on the tree size. Moreover, when the phylogenetic position of the changes is unknown, AIC and BIC fail to select the correct model in favour of overly complex models with many changes. To remedy this problem, we introduce a phylogenetic adaptation of the modified BIC suggested by Zhang & Siegmund (2007). Other recommendations are provided to help empiricists handle these various limitations.

Review of known statistical limitations

Limitations in inferring ancestral states have been reported very early in empirical studies because large standard errors were observed (e.g. Schluter et al. 1997; Garland & Ives 2000). Confidence intervals for ancestral states can be so wide as to span the range of observed present‐day values, so much so that ‘[...] it might be best to accept our limitations and not even try to estimate ancestral states from comparative data’ (Martins 2000). These limitations, due to phylogenetic dependence among taxa, have recently been explained theoretically. Ané (2008) showed that under BM and some general conditions, the ancestral state is not estimated consistently as the tree grows indefinitely. More specifically, the variance of the maximum‐likelihood estimator (MLE) of the ancestral state cannot be lower than σ2t/k where k is the number of daughters of the root node and t is the length of the shortest branch stemming from the root. This result proves that the ancestral state reconstruction accuracy is very limited, unless evolution proceeded slowly or the ancestral node of interest consists of a large polytomy, or unless fossil taxa can be included in the analysis. The accuracy loss due to phylogenetic dependence can be substantial. Using the phylogeny from Bininda‐Emonds et al. (2007) with 4507 species for instance (Fig. 4, left) and under BM evolution, the reconstruction at the base of the mammal tree has the same accuracy as that obtained from only 5 independent taxa, if such existed, from a star tree of the same height. With marsupials excluded (4249 species), trait reconstruction for the non‐marsupial mammal ancestor has the same accuracy as would be obtained from about 19 independent species. Interestingly, this number of equivalent independent observations was shown to be lower than the adjusted degree of freedom defined by Paradis & Claude (2002) as the tree length to tree height ratio, when the tree is ultrametric.

Under the OU model, Ho & Ané (2013) proved a similar issue with the ancestral state and the selection optimum μ. If the tree is ultrametric and the height of the tree is bounded, μ cannot be estimated consistently as the tree grows indefinitely, regardless of the estimation method. They also proved that the presence of fossil taxa is crucial to increase precision in μ. Slater & Harmon (2013) point to many other benefits of integrating data from both fossils and extant taxa.

While contemporary species bear limited information on ancestral states and optimal values, rates of evolution and correlation among traits are typically much easier to estimate (also shown experimentally by Oakley & Cunningham 2000). Under BM evolution, independent contrasts (Felsenstein 1985) have been used to detect significant correlations in many studies, too numerous to count. Many studies also successfully detected shifts in the rate of trait evolution σ² (e.g. O'Meara et al. 2006; Davis et al. 2007; Eastman et al. 2011; Venditti, Meade & Pagel 2011). Indeed, the precision in estimated rates and regression coefficients is known to increase with the square root of the number of taxa, unlike ancestral state estimates (Ané 2008).

Inference on the level of phylogenetic correlation from contemporary taxa is also challenging. One common way to measure phylogenetic signal is through a λ parameter (Freckleton, Harvey & Pagel 2002), where λ = 0 corresponds to no phylogenetic correlation and λ = 1 corresponds to the BM model. Using simulations, Boettiger, Coop & Ralph (2012) showed clearly that λ estimates can be very uncertain and typically have a lot less precision than estimates of the variance rate σ2.

Another common way to measure phylogenetic signal is through the OU model (e.g. Lavin et al. 2008), with α = ∞ corresponding to no phylogenetic correlation and α = 0 to the BM model. Here again, Ho & Ané (2013) proved that, in some situations, α is not consistently estimable (while the variance rate σ2 might be), and Boettiger, Coop & Ralph (2012) demonstrated with simulations that the power to detect α>0 can be disappointingly low even on very large trees.

Traditional methods for model selection are also challenged by phylogenetic dependence. The standard BIC was proved to be inappropriate under BM evolution (Ané 2008). Its tendency to select overly simple models was explained by its penalty increasing with the number of present‐day species instead of a number of equivalent independent observations. Both AIC and BIC were shown in simulations to result in a strong bias for overparametrized models when applied to OU models with multiple selection regimes (Boettiger, Coop & Ralph 2012), casting doubts on the appropriateness of these criteria for phylogenetic comparative methods. Given these limitations when selection parameters are constant along the whole tree, precision and model selection issues are expected to be even more pronounced under complex OU models with multiple selection regimes in the tree (Butler & King 2004; Hansen, Pienaar & Orzack 2008; Bartoszek et al. 2012; Beaulieu et al. 2012).
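To see why the sample size used in the penalty matters, the sketch below writes out the textbook definitions of AIC and BIC in Python and compares the per‐parameter BIC penalty computed from the number of taxa with the penalty computed from a much smaller number of equivalent independent observations. The figure of 5 is borrowed from the ancestral state example above purely for illustration; it is an assumption here, not a claim about the correct effective sample size for any given model.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion for a model with k parameters."""
    return -2.0 * log_lik + 2.0 * k

def bic(log_lik, k, n):
    """Bayesian information criterion with nominal sample size n."""
    return -2.0 * log_lik + k * math.log(n)

n_taxa = 4507        # number of tips in the mammal tree
n_effective = 5      # illustrative number of equivalent independent taxa

# Penalty charged by BIC for each additional parameter.
penalty_taxa = math.log(n_taxa)            # about 8.4
penalty_effective = math.log(n_effective)  # about 1.6
```

With strongly correlated taxa, the nominal penalty log(n) can therefore be several times larger than the penalty that the information content of the data would justify.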

(Un)identifiability of selection regimes

A desired property of any model is that the likelihood reaches its maximum at a unique point over the range of parameter values. Unfortunately, we show that this is not always the case for OU models on trees: the maximum likelihood can be reached along a ridge, making the MLE undefined and some parameters unidentifiable. Consider first the simple OU model on an ultrametric tree with a single selection optimum μ. Given trait y0 at the root, the observation Yi for individual i is normally distributed with mean y0 e^(−αT) + μ(1 − e^(−αT)), where T is the height of the tree. We can write this as
$$Y = X\beta + e \qquad \text{(eqn 1)}$$
where β = (y0, μ), the residual variation e has covariance derived from the OU process and X is a design matrix with 2 columns, e^(−αT)·1 and (1 − e^(−αT))·1, where 1 is a vector of ones. In general, X needs to be of full rank for the MLE to be unique and for all parameters in β to be identifiable. In our example, the two columns of X are parallel (each one is a rescaled version of the other), so X is not of full rank and y0 and μ cannot be identified apart from each other. This is true regardless of α. Surprisingly, this lack of identifiability has gone unnoticed for a long time, even though it can occur frequently with OU tree models. One consequence is that the MLE of β is not unique, regardless of the size of the tree. To illustrate the problem, we simulated data under an OU process on the 4507‐species ultrametric mammal tree from Bininda‐Emonds et al. (2007) with ancestral state y0 = 3, selection optimum μ = 2, phylogenetic correlation parameter α = 1 and variance rate σ² = 2. The tree was rescaled to have height T = 1, so that α = 1 corresponds to a half‐life t1/2 = 69% of the tree height, an intermediate level of phylogenetic correlation. For our simulated data, the MLEs were urn:x-wiley:2041210X:media:mee312285:mee312285-math-0003 and urn:x-wiley:2041210X:media:mee312285:mee312285-math-0004 (obtained with the R function phylolm, Ho & Ané 2014). However, every point (y0, μ) lying on the line urn:x-wiley:2041210X:media:mee312285:mee312285-math-0005 maximized the likelihood (Fig. 1). This line formed a ridge of the likelihood surface. Hence, any good optimization procedure should report some convergence issue, and any single MLE value reported on this line is not necessarily a good estimate of the true value of (y0, μ).
image
Log likelihood surface with respect to (y0,μ) from data simulated on a 4507‐taxon tree, with other parameters fixed to their ML values. The dash line urn:x-wiley:2041210X:media:mee312285:mee312285-math-0006 is where the likelihood achieves its maximum.
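The ridge can be checked numerically. The Python sketch below (hypothetical function names, not the code used for Fig. 1) takes the simulation values y0 = 3, μ = 2, α = 1, T = 1, finds another pair (y0, μ) that gives exactly the same expected tip value, and verifies that the two columns of the design matrix are linearly dependent.

```python
import math

def tip_mean(y0, mu, alpha, T):
    """Expected tip value on an ultrametric tree of height T."""
    w = math.exp(-alpha * T)
    return y0 * w + mu * (1.0 - w)

def mu_on_ridge(y0_new, target_mean, alpha, T):
    """Value of mu that, paired with y0_new, reproduces target_mean."""
    w = math.exp(-alpha * T)
    return (target_mean - y0_new * w) / (1.0 - w)

alpha, T = 1.0, 1.0
true_mean = tip_mean(3.0, 2.0, alpha, T)

# A very different ancestral state gives the same tip mean,
# provided mu is moved along the ridge accordingly.
alt_y0 = 0.0
alt_mu = mu_on_ridge(alt_y0, true_mean, alpha, T)
alt_mean = tip_mean(alt_y0, alt_mu, alpha, T)

# Rank check: the determinant of X'X is zero when the columns are parallel.
n = 10
w = math.exp(-alpha * T)
col1, col2 = [w] * n, [1.0 - w] * n
gram_det = (sum(a * a for a in col1) * sum(b * b for b in col2)
            - sum(a * b for a, b in zip(col1, col2)) ** 2)
```

Because the mean vector is identical for every point on the ridge and the covariance does not involve (y0, μ), the likelihood is identical as well, whatever the number of tips n.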

Identifiability of selection optima

More generally, non‐identifiability occurs whenever X is not of full rank, in which case the likelihood surface has a hyperplane ridge and the MLE is undefined. The unidentifiability of the ancestral state and selection optima can occur in models where μ takes different values (μ1, μ2, …, μm) along different parts of an ultrametric tree, even when the location in the phylogeny of the different regimes is known. It is the case when the part of the tree under the influence of μℓ forms a connected subtree for every ℓ (Fig. 2 left, proof in Appendix A1).

image
Edges are ‘painted’ according to their selection regime, with one optimum μ for each colour. Unidentifiability case (left): every selection regime forms a connected component. Identifiability case (right): one regime (black) covers two disconnected parts in the tree.

Note that connected subtrees do not need to form clades. The condition of connected subtrees is equivalent to there being a minimal number m−1 of changes of selection regimes along the tree. This is the case, for instance, if there is only one shift in the selection optimum, dividing the tree into 2 connected subtrees, each with its own selection regime. In this common situation, the ancestral state and the two selection optima are not separately identifiable.

On the contrary, OU models with multiple regimes are identifiable when selection regimes are not perfectly correlated with the tree, with at least one regime covering two disconnected subtrees (Fig. 2 right). However, we need to emphasize that identifiability only guarantees the uniqueness of the ML estimator. It does not guarantee its performance. In fact, the next section discusses situations when the MLE has poor precision even on large trees.

If we relax the assumption of known location(s) in the phylogeny for the putative shifts in selection regimes, then some of these location parameters also become unidentifiable, like the precise timing of a shift along a branch or the number of shifts along a single branch (see Appendix A1). Also, alternative scenarios with separate shifts along adjacent edges might not be distinguishable. In particular, 2 shifts along 2 sister edges have the same signature of expected values at extant taxa as one shift on either of the 2 sister edges and the other shift earlier along the parent edge. This is kept in mind for model selection later on.

Diagnostics and reparametrization

A lack of identifiability can be diagnosed by a lack of convergence during ML optimization, as algorithms may search the ridge of the likelihood surface forever. This might explain some convergence failures and the lack of reasonable estimates for some μ's reported by Butler & King (2004) or Beaulieu et al. (2012). Luckily, in studies using several selective regimes, hypotheses typically have at least one regime forming a disconnected subtree – see for instance the study of fibre‐type composition of iliofibularis muscle in lizards (Fig. 1 in Scales, King & Butler 2009). These models are thus identifiable with unique ML parameter estimates. Another source of non‐identifiability is when α = 0, because μ has no influence on the trait when selection is not acting. This can cause an almost flat ridge in the likelihood surface for very low estimates of α (Butler & King 2004). Note also that a ridge in the likelihood plagues any Bayesian method, with the prior distribution having complete influence over which values get supported along the ridge.

To fix this lack of identifiability, we can reparametrize the model to use a new design matrix of full rank. For example, with a single regime, we can re‐express the model by using (β0, α, σ²) instead of (y0, μ, α, σ²) where β0 = y0 e^(−αT) + μ(1 − e^(−αT)). With this new parametrization, Y = β0·1 + e, but now β0 is identifiable with a unique MLE.

To illustrate, we reanalysed flower diameter of 25 Euphorbiaceae species under an OU model with ancestral optimum size μ0 and a shift to a larger optimum flower size μ1 at the base of the 3‐species Rafflesiaceae clade (see Fig. 1 in Davis et al. 2007). As noted before, y0, μ0 and μ1 are not identifiable separately. We reparametrize the model using (β0, β1, α, σ²) where β0 = y0 e^(−αT) + μ0(1 − e^(−αT)) is the expected flower size in non‐Rafflesiaceae and β1 = (1 − e^(−αt))(μ1 − μ0) is the difference in expected flower diameter between Rafflesiaceae and non‐Rafflesiaceae. Here T is the age of the tree and t the age of the shift, which we do not need to estimate with our reparametrization. The model can be written Y = β0·1 + β1·X1 + e where X1 is a column of zeros and ones, with ones corresponding to Rafflesiaceae. We rescaled branch lengths in the phylogeny to a tree height of 1 and used phylolm (Ho & Ané 2014) on the log‐transformed diameters (as in Davis et al. 2007). We obtained urn:x-wiley:2041210X:media:mee312285:mee312285-math-0007 (that is, exp(0·88) = 2·41 mm), urn:x-wiley:2041210X:media:mee312285:mee312285-math-0008 (that is, an exp(4·4) = 81·2‐fold size increase in extant taxa), urn:x-wiley:2041210X:media:mee312285:mee312285-math-0009 and urn:x-wiley:2041210X:media:mee312285:mee312285-math-0010, which indicates weak phylogenetic correlation (urn:x-wiley:2041210X:media:mee312285:mee312285-math-0011 of the tree height).
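The mapping between the two parametrizations is simple enough to write down directly. The following Python sketch uses hypothetical function names and made‐up parameter values (not the flower diameter estimates); it converts (y0, μ0, μ1) to (β0, β1) for a single shift of age t, and shows that going back from β1 to the difference in optima requires both α and t.

```python
import math

def to_beta(y0, mu0, mu1, alpha, T, t):
    """Map (y0, mu0, mu1) to the identifiable pair (beta0, beta1).

    T is the tree height; t is the age of the shift (time before present).
    """
    w_root = math.exp(-alpha * T)
    beta0 = y0 * w_root + mu0 * (1.0 - w_root)
    beta1 = (1.0 - math.exp(-alpha * t)) * (mu1 - mu0)
    return beta0, beta1

def optimum_shift_from_beta1(beta1, alpha, t):
    """Recover mu1 - mu0 from beta1; only possible if alpha and t are known."""
    return beta1 / (1.0 - math.exp(-alpha * t))

# Illustrative values only.
beta0, beta1 = to_beta(y0=0.0, mu0=0.0, mu1=2.0, alpha=1.0, T=1.0, t=0.5)
shift = optimum_shift_from_beta1(beta1, alpha=1.0, t=0.5)
```

The difference observed in extant taxa, β1, is always smaller in magnitude than the difference in optima μ1 − μ0, because taxa in the new regime have only moved part of the way towards their new optimum.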

Another approach is to assume that the ancestral state y0 is random according to some prior distribution. If the OU process is homogeneous with a single selection optimum, the stationary distribution of that process is a natural choice for the prior distribution of y0: Gaussian with mean μ and variance σ2/(2α). However, the stationary distribution has no clear definition when there are several regimes. To use this approach for the flower diameter data above, we assumed that y0 followed a normal distribution with mean μ0 and variance σ2/(2α). By doing this, we drop y0 from the set of parameters, but keep the selection optima μ0, μ1. We also need to keep the shift age t in the model, which we assumed to be 0·605, at the base of Rafflesiaceae. We obtained the same urn:x-wiley:2041210X:media:mee312285:mee312285-math-0012, urn:x-wiley:2041210X:media:mee312285:mee312285-math-0013 as before, urn:x-wiley:2041210X:media:mee312285:mee312285-math-0014 and urn:x-wiley:2041210X:media:mee312285:mee312285-math-0015 (that is, a 196‐fold increase in optimal flower size).
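Under this random‐root assumption, and with α and σ² shared across the tree, the covariance between two tips takes a simple form: the stationary variance σ²/(2α) multiplied by exp(−α d), where d is the path length separating the two tips. A minimal Python sketch of this standard result for an ultrametric tree follows; the function and argument names are hypothetical.

```python
import math

def stationary_variance(sigma2, alpha):
    """Variance of the stationary distribution of the OU process."""
    return sigma2 / (2.0 * alpha)

def tip_covariance(sigma2, alpha, T, shared_time):
    """Covariance between two tips of an ultrametric tree of height T.

    shared_time is the time from the root to the most recent common
    ancestor of the two tips. The root is assumed to follow the
    stationary distribution.
    """
    distance = 2.0 * (T - shared_time)  # path length between the tips
    return stationary_variance(sigma2, alpha) * math.exp(-alpha * distance)

# A tip with itself: the full stationary variance.
var_tip = tip_covariance(2.0, 1.0, 1.0, shared_time=1.0)
# Two tips that diverged at the root: the weakest correlation on the tree.
cov_root = tip_covariance(2.0, 1.0, 1.0, shared_time=0.0)
```

Even for taxa that diverged at the root, the covariance is not zero, which is one way to see why contemporary taxa never behave like fully independent observations when the tree height is bounded.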

Another key ingredient to restore identifiability is to add fossil data, whenever possible. Fossil taxa can provide very influential information on both y0 and μ (or μ's) because they are at a shorter distance T from the root and under a lesser influence of μ than contemporary tips.

(Non)‐Microergodicity of selection parameters

Unlike in traditional regression models with independent residuals, identifiability does not guarantee that the MLE converges to the true parameter when the sample size increases indefinitely. A requirement stronger than identifiability is needed instead, historically called microergodicity (Stein 1999). To understand this concept, we first consider two simple examples.

Example 1: Assume that Y1, Y2, … are independent binary observations with unknown mean p = P{Yi = 1}, and P{Yi = 0} = 1 − p. In this case, the sample mean Ȳn = (Y1 + … + Yn)/n is a ‘good’ estimator of p, in the sense that it converges to p as the sample size n increases, from the law of large numbers. Here we keep gaining more precision by collecting more data.

Example 2: Consider repeated binary observations Y1, Y2, … with mean p = P{Yi = 1}. Assume that Y2 is independent of Y1, but that the remaining observations simply repeat Y1 and Y2 over and over: Y_(2k−1) = Y1 and Y_(2k) = Y2 for k ≥ 1. Here, observing the entire sequence does not provide any more information than observing Y1 and Y2 only, due to the extreme correlation between sampling units. Collecting more data does not increase information, and obviously, there does not exist any estimator f(Y1, …, Yn) that would converge to the true value of p.
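The contrast between the two examples is easy to reproduce by simulation. In the Python sketch below (hypothetical function names, arbitrary seed and p = 0·8), the sample mean of 100 000 independent observations lands close to p, while the sample mean of the repeated sequence can only ever equal the mean of its first two values.

```python
import random

def example1(n, p, rng):
    """n independent binary observations with P{Y = 1} = p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def example2(n, p, rng):
    """Two independent draws, then repeated: Y1, Y2, Y1, Y2, ..."""
    y1 = 1 if rng.random() < p else 0
    y2 = 1 if rng.random() < p else 0
    # 0-based even positions hold Y1, odd positions hold Y2.
    return [y1 if i % 2 == 0 else y2 for i in range(n)]

def mean(values):
    return sum(values) / len(values)

rng = random.Random(1)
p = 0.8
n = 100_000

mean_independent = mean(example1(n, p, rng))  # close to p
repeated = example2(n, p, rng)
mean_repeated = mean(repeated)                # 0, 0.5 or 1, whatever n is
```

However large n becomes, the second estimate carries the information of two observations only.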

To conceptualize the difference between these two examples and to understand the amount of information carried by comparative data, we use the concepts of orthogonal distributions and of microergodicity.

Definition: Two distributions P1 and P2 are orthogonal if there exists an event A such that P1(A) = 1 and P2(A) = 0.

Intuitively, this means that we can use data and tell with certainty whether these data came from P1 or P2. If the event A is observed, then there is certainty that the model associated with P2 is wrong. But if we observe that A does not occur, then the model associated with P1 is certainly wrong. With example 1, we might want to compare the models where p = p1 = 0·5 vs. p = p2 = 0·8, and call Pi the distribution of the entire sequence Y1, Y2, … when p = pi (i = 1 or 2). Consider the event A = {Ȳn converges to 0·5}. By the law of large numbers, P1(A) = 1 and P2(A) = 0, so P1 and P2 are orthogonal. This reflects the fact that we can identify whether p = 0·5 vs. p = 0·8 with certainty from the entire sequence Y1, Y2, … in example 1.

Definition: Let (Yn)n≥1 be an infinite sequence of observations, and let Pθ be the joint distribution of (Yn)n≥1 under some model and parameters collected in a vector θ. A function f(θ) of θ is said to be microergodic if, for every θ1 and θ2 such that f(θ1) ≠ f(θ2), the distributions Pθ1 and Pθ2 are orthogonal.

We think of (Yn)n≥1 as trait values obtained from infinitely many species as a best case scenario, as if we were able to increase taxon sampling indefinitely. With the definition above, a function of parameters f(θ) is microergodic if the full data set (Yn)n≥1 contains enough information to tell the value of f(θ) with certainty. Unless θ is microergodic, there is no good estimator for it (Zhang 2004), even from an infinite sample. Therefore, microergodicity is necessary for a precise estimation of model parameters. In example 1, the mean p is microergodic, as the argument above used to distinguish p = 0·5 from 0·8 can be repeated to distinguish any two values of p. In example 2, p is not microergodic. The dependence between observations maintains uncertainty about the value of p, even from infinitely many Yn observations.

Recently, Ho & Ané (2013) investigated microergodicity for the one‐regime OU model on an ultrametric tree and with a random ancestral state at the root (to avoid the unidentifiability issue mentioned earlier). While this simple model is unlikely to be adequate for most real traits, it is still considered when comparing hypotheses and more complex models are expected to require even more data. Ho & Ané (2013) proved that if the height of the tree is bounded, as when sampling from a group of interest, then the selection optimum μ is not microergodic. Therefore, information from contemporary taxa is not sufficient to estimate μ exactly, even if infinitely many taxa were observable. On the other hand, σ² was shown to be microergodic in the case when the tree has many ‘young’ internal nodes. Unfortunately, an additional assumption to ensure enough variation in internal node ages was required for α and the stationary variance γ = σ²/(2α) to be microergodic. The microergodicity of α is necessary for a good estimator α̂ to exist, but is it sufficient? This question remains open. In the particular case of a symmetric tree, we can prove that the restricted MLE of α converges to the true α with more data, if the microergodicity condition is met (using tools in Ho & Ané 2013, appendix B).

To illustrate these theoretical results, we simulated data on several very large phylogenies from across the tree of life: a 9993‐species phylogeny of birds (Jetz et al. 2012), a 4507‐species mammal tree (Bininda‐Emonds et al. 2007), an 839‐taxon tree on Fabales (Simon et al. 2009) and a 140‐species phylogeny of ants (Moreau et al. 2006). We also used a 400‐language phylogeny (Gray, Drummond & Greenhill 2009). For simplicity, we present here the results on the largest tree only (9993 taxa). The results on the other phylogenies are similar (see Figs A1–A4).

Twenty sequences of nested phylogenies from 50 to 9993 taxa were created by randomly selecting subsets of taxa from 20 bootstrap trees from Jetz et al. (2012), conditional on the root being the only common ancestor of the selected taxa to guarantee that all trees have the same ancestral species and same height. Trees were rescaled by a common factor to have height 1. Data were simulated from a one‐regime OU model with μ = 0, γ = 1, α = 0·1, 1 or 10 and σ² = 2αγ. This model is very close to a Brownian motion when α = 0·1 (t1/2 = 6·9 much larger than the tree height T = 1) and very close to phylogenetic independence when α = 10 (t1/2 = 0·069 very small). Figure 3 shows the MLEs μ̂, γ̂, α̂ and σ̂², which were obtained using phylolm (Ho & Ané 2014). When phylogenetic correlation is strong to moderate (α = 0·1 or 1), the accuracy of μ̂ does not improve with more taxa, illustrating the non‐microergodicity of μ. Under strong phylogenetic correlation (α = 0·1), α̂ and γ̂ are strongly biased with few taxa, with α being overestimated almost all the time with fewer than 100 taxa. Their bias improves with more taxa but their precision does not: there is as much uncertainty about α and γ from 9993 taxa as from 50. This illustrates that α and γ are not microergodic. On the other hand, the variance rate σ² is estimated precisely, regardless of the phylogenetic correlation. Under weak phylogenetic correlation (α = 10), all parameters are estimated as usual: with little or no bias and with increasing precision from more taxa. These simulations illustrate the main difficulty of detecting selection. When it is moderate, phylogenetic correlation between taxa greatly reduces the actual amount of information on selection parameters, especially on the target of selection. With these limitations in mind for a single selection regime, we now turn to models with multiple regimes.
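A back‐of‐the‐envelope calculation conveys why the precision of the estimated optimum stops improving. The Python sketch below is a simplification and not the simulation design used here: it assumes a star tree of height T with a stationary root, so that every pair of tips has the same correlation ρ = exp(−2αT) and the best linear estimator of μ is the sample mean. Its variance is γ[ρ + (1 − ρ)/n], which cannot fall below γρ however many tips are added.

```python
import math

def variance_of_mean(n, gamma, alpha, T):
    """Variance of the sample mean of n equicorrelated tips.

    Assumes a star tree of height T and a root drawn from the stationary
    distribution, so that all pairs of tips have correlation exp(-2*alpha*T).
    """
    rho = math.exp(-2.0 * alpha * T)
    return gamma * (rho + (1.0 - rho) / n)

gamma, T = 1.0, 1.0
table = {}
for alpha in (0.1, 1.0, 10.0):
    for n in (50, 9993):
        table[(alpha, n)] = variance_of_mean(n, gamma, alpha, T)
```

Under this simplified setting, going from 50 to 9993 tips barely changes the variance when α = 0·1 (both values stay above exp(−0·2) ≈ 0·82), whereas with α = 10 the variance shrinks roughly like 1/n, as it would for independent observations.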

image
Violin plots showing the distribution of the MLEs of μ, γ, α and σ2 = 2αγ on subtrees from the 9993‐species bird phylogeny (Jetz et al. 2012) with 2000 simulations at each sample size. Each column corresponds to one set of true parameter values: μ = 0; γ = 1; α = 0·1 (left, t1/2 = 6·9 compared to the tree height T = 1), 1 (middle, t1/2 = 0·69), 10 (right, t1/2 = 0·069) and σ2 = 2αγ.

Power to detect shifts in the selection optimum

Detecting changes in selection regime is important to study the drivers of selection pressure. For example, Brawand et al. (2011) detected evolutionary shifts in gene expression in testes for a large number of mammalian genes, by comparing a single‐regime OU model to a two‐regime OU model in which the optimal gene expression level μ undergoes a shift on a specific lineage (see also Rohlfs, Harrigan & Nielsen 2014). In this section, we use simulations to study the power to detect such shifts. We first consider the case when the location in the phylogeny of the shift is known, such as when it is hypothesized to be driven by an environmental change or change in another trait that can be mapped onto the phylogeny (e.g. Butler & King 2004). Next we consider the case when the number and phylogenetic position of the shift(s) need to be estimated.

Known location: We simulated data from an OU model with two selection regimes on the 4507‐species mammal tree from Bininda‐Emonds et al. (2007). The shift in selection optimum was placed at the base of Euarchontoglires (Fig. 4, left), so that each regime applied to about half the taxa. Twenty sequences of nested trees from 10 to 4507 taxa were created by randomly selecting subsets of taxa, under the constraint that the number of sampled Euarchontoglires was half the total number of sampled taxa. Along each tree, 100 data sets were simulated according to the OU model with ancestral state and ancestral optimum y0 = μ0 = 0, γ = 1, α = 1, and a Euarchontoglires optimum shift to μ1 = 1·079, 2·157 or 4·314. This model corresponds to expected values in extant taxa of β0 = 0 for non‐Euarchontoglires and of β1 = 0·5, 1 or 2 for Euarchontoglires. The parametrization with β values was used for inference, to avoid the non‐identifiability issue described earlier. Testing the shift in selection regime is then equivalent to testing whether β1 = 0 or not. Figure 5 shows that the estimation error in β1 decreased from 10 to 50 taxa, but did not change much from 50 to 4507 taxa. This phenomenon reflects the non‐microergodicity of both selection optima, and suggests that the power to detect the shift (β1 ≠ 0) does not increase significantly as the tree grows. The same simulation was carried out on the 839‐taxon tree on Fabales (Simon et al. 2009) with similar results (see Fig. S5).

Figure 4. Position of change in the selection optimum, used in simulations along a tree for mammals (Bininda‐Emonds et al. 2007) and Fabales (Simon et al. 2009). The bar shows the half‐life used in simulations (t1/2 = ln(2)/α).

Figure 5. Boxplots showing the distribution of the MLEs of the shift, $\hat{\beta}_1$, from data simulated on the 4507‐species mammal tree in Fig. 4 (left).

From each simulated data set, we rejected the null hypothesis of no shift if urn:x-wiley:2041210X:media:mee312285:mee312285-math-0035 where the standard error of $\hat{\beta}_1$ for each tree size was estimated as the standard deviation of $\hat{\beta}_1$ over the 2000 simulations. Figure 6 shows that the power depends mostly on the effect size and very little on the tree size: when the signal is weak (effect size urn:x-wiley:2041210X:media:mee312285:mee312285-math-0038), the shift is detected only 39·8% of the time even from 4507 taxa. On the other hand, the shift is detected 99% of the time from only 50 taxa when the signal is strong (effect size urn:x-wiley:2041210X:media:mee312285:mee312285-math-0039). In the example above on flower diameter evolution from Davis et al. (2007), the shift at the base of Rafflesiaceae was statistically significant (SE(urn:x-wiley:2041210X:media:mee312285:mee312285-math-0040, P‐value 10⁻¹²). This is likely not because of a large sample size: there were 22 and 3 species in each regime. Instead, the shift was detectable because of a very large effect size; urn:x-wiley:2041210X:media:mee312285:mee312285-math-0041 was estimated to be 8·92.

Figure 6. Proportion of times the no‐shift hypothesis H0: β1 = 0 was rejected.

To further illustrate that the power to detect a selection optimum shift depends little on tree size, we used the approach introduced by Boettiger, Coop & Ralph (2012) to detect the shift. This method uses a bootstrap‐based likelihood ratio test and simultaneously estimates the power of the test. We simulated data from the OU model on the 839‐taxon tree on Fabales (Simon et al. 2009; Fig. 4 right) with one shift in μ at the base of a clade containing roughly half the total number of taxa. Twenty sequences of nested trees from 10 to 839 taxa were created by sampling taxa randomly, but requiring an equal number of taxa in each regime when n < 839. Along each tree, one data set D was simulated under the OU model as before with γ = 1, α = 1, y0 = μ0 = 0 and a shift varying from μ1 = 0·873, 1·747 to 3·494, corresponding to β0 = 0 and β1 = 0·5, 1 or 2. Each data set D was analysed as follows (Boettiger, Coop & Ralph 2012). First, model parameters were estimated under a single regime (model 1). Bootstrap data sets urn:x-wiley:2041210X:media:mee312285:mee312285-math-0042 were then simulated independently with these estimated parameters under model 1 (Fig. 6). For each urn:x-wiley:2041210X:media:mee312285:mee312285-math-0043, the log likelihood ratio δ = 2(log L2 − log L1) was computed to compare model 1 to the two‐regime OU model (model 2) with the shift placed at its correct location. Second, D was used again to estimate parameters under model 2 and parametric bootstrap replicates were simulated under the two‐regime model to obtain a sample of δ values under that model. The area of the overlap between the two distributions of δ was then used to measure the power to detect the shift: the smaller this overlap, the greater the power (see Fig. 7 insets for examples of this overlap). As Fig. 7 shows, the power to detect the shift increases very little (i.e. the overlap decreases very little) with the number of taxa. Instead, the overlap and hence the power are most influenced by the true magnitude of the shift.
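The overlap between two bootstrap distributions of δ can be approximated in several ways. The sketch below is one simple option, assuming a shared histogram grid over both samples; it is not necessarily the estimator used by Boettiger, Coop & Ralph (2012), and the function name and the number of bins are illustrative choices.

```python
def overlap_area(sample_a, sample_b, n_bins=30):
    """Histogram-based overlap between two empirical distributions.

    Returns a value in [0, 1]: 1 for identical histograms, 0 for disjoint ones.
    """
    lo = min(min(sample_a), min(sample_b))
    hi = max(max(sample_a), max(sample_b))
    if hi == lo:
        return 1.0  # all values identical, so the distributions coincide
    width = (hi - lo) / n_bins

    def bin_proportions(sample):
        counts = [0] * n_bins
        for x in sample:
            k = min(int((x - lo) / width), n_bins - 1)  # top edge goes in last bin
            counts[k] += 1
        return [c / len(sample) for c in counts]

    prop_a = bin_proportions(sample_a)
    prop_b = bin_proportions(sample_b)
    # The shared area is the sum over bins of the smaller proportion.
    return sum(min(p, q) for p, q in zip(prop_a, prop_b))

# Toy delta values standing in for bootstrap replicates under each model
delta_under_model_1 = [0.1, 0.4, 0.9, 1.3, 2.0, 2.2]
delta_under_model_2 = [5.0, 6.1, 7.4, 8.0, 9.5, 10.2]
print(overlap_area(delta_under_model_1, delta_under_model_2))
```

A smaller overlap corresponds to greater power to distinguish the two models; the result depends on the bin width, so a kernel density estimate may be preferable with few replicates.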

Figure 7. Average overlap between two distributions of the log likelihood ratio (log LR) statistic δ: under the one‐regime and two‐regime models. Data were simulated under two regimes on the 839‐taxon tree in Fig. 4 (right). Inset: example distribution of δ under 1 regime (dark grey) or 2 regimes (light grey) and overlap (black).

Unknown location: When the number and phylogenetic placement of shifts are unknown, a model selection procedure is needed. The most complex model includes a shift on every branch b in the tree:
$$Y = \beta_0 \mathbf{1} + \sum_{b} \beta_b X_b + e$$
where e has phylogenetic covariance from the OU process and each Xb is a column of zeros and ones, with ones corresponding to descendants of branch b. Due to phylogenetic inertia, the expected change βb in extant taxa is linked to the shift in selection optimum Δμ through $\beta_b = \Delta\mu\,(1 - e^{-\alpha t_b})$, where tb is the shift's age. We seek to identify on which branches there is a shift, that is, the branches b for which βb ≠ 0. To do so, we consider here a traditional stepwise selection method, similar to that used by Ingram & Mahler (2013) in SURFACE. The simplest one‐regime model is first evaluated. At each step, a list of candidate models is made by modifying the current model: either adding a shift on a branch, moving an existing shift to a neighbour branch or dropping a shift. The procedure stops if the current model is better than every candidate model based on a criterion like AIC (Akaike 1974) or BIC (Schwarz 1978). Otherwise, the best model in the list is selected and used as the new current model for the next step. For identifiability purposes again, we consider a maximum of one shift per branch and our procedure disregards models with ‘extinct’ regimes, that is, regimes applying to internal branches only. Models with shifts located on sister branches are considered, with the caveat that their exact location among the 2 sister edges and their parent edge is not identifiable. An R implementation of this stepwise selection method is available from the authors.
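The add, move and drop steps described above can be sketched generically. The code below is a minimal illustration and not the authors' R implementation: `score` stands in for any criterion to be minimized (AIC, BIC or another penalized likelihood), `neighbours` is a hypothetical mapping from each branch to its adjacent branches, and the rules about sister branches and ‘extinct’ regimes are left out.

```python
def stepwise_shift_search(branches, neighbours, score, max_shifts):
    """Greedy search over sets of shifted branches; a lower score is better."""
    current = frozenset()               # start from the one-regime model
    current_score = score(current)
    while True:
        candidates = set()
        if len(current) < max_shifts:
            for b in branches:                        # add a shift
                if b not in current:
                    candidates.add(current | {b})
        for b in current:
            candidates.add(current - {b})             # drop a shift
            for nb in neighbours.get(b, ()):          # move a shift
                if nb not in current:
                    candidates.add((current - {b}) | {nb})
        if not candidates:
            return current, current_score
        # Sort first so that ties are broken in a reproducible order.
        best = min(sorted(candidates, key=sorted), key=score)
        best_score = score(best)
        if current_score <= best_score:               # no candidate improves
            return current, current_score
        current, current_score = best, best_score

# Toy criterion (an assumption for illustration): each missed true shift costs
# 10 units of fit and each fitted shift pays a penalty of 3.
true_shifts = {2, 5}

def toy_score(model):
    return 10.0 * len(true_shifts - model) + 3.0 * len(model)

branches = range(1, 9)
neighbours = {b: [nb for nb in (b - 1, b + 1) if nb in branches] for b in branches}
selected, value = stepwise_shift_search(branches, neighbours, toy_score, max_shifts=5)
print(sorted(selected), value)
```

In a real analysis, `score` would refit the OU model for each candidate set of shifts, which is where nearly all of the computing time goes.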

We simulated 10 data sets along a 140‐taxon tree under the OU model with 3 shifts, located on branches labelled urn:x-wiley:2041210X:media:mee312285:mee312285-math-0046, urn:x-wiley:2041210X:media:mee312285:mee312285-math-0047 and urn:x-wiley:2041210X:media:mee312285:mee312285-math-0048 (Fig. 8, left). We used a moderate α = 1 after rescaling the tree to a height of 1, γ = 1, and simulated shifts in selection optima leading to urn:x-wiley:2041210X:media:mee312285:mee312285-math-0049, urn:x-wiley:2041210X:media:mee312285:mee312285-math-0050 and urn:x-wiley:2041210X:media:mee312285:mee312285-math-0051. The first shift is expected to be easy to detect, but the other two are not, because their magnitude is less and because they are located on sister edges. As discussed before, due to a lack of identifiability, these two shifts are expected to be detected on either urn:x-wiley:2041210X:media:mee312285:mee312285-math-0052, urn:x-wiley:2041210X:media:mee312285:mee312285-math-0053 and/or on their parent edge. The stepwise model selection procedure was used with the constraint that the number of shifts in any proposed model may not exceed a given number. Figure 8 shows one simulation, for which AIC selected m = 10 shifts when allowed no more than 10. The first 3 corresponded to the true shifts (with some location error). With one exception, all other estimated shifts corresponded to a single taxon or a pair of sister taxa, misinterpreting one or two extreme value(s) for a shift in selection regime. Figure 9 (top) shows a steady decrease in AIC and BIC values with m, leading to the estimation of the maximum allowed of 10 shifts in all cases, and a strong overestimation of regime shifts. With the allowed maximum increased to 40, AIC selected 40 shifts for all 10 data sets. BIC also selected 40 shifts for 9 data sets and 28 shifts for the remaining data set.

Figure 8. Left: 140‐species phylogeny of ants (Moreau et al. 2006) used for simulations, with true shifts in the selection optimum μ located on branches urn:x-wiley:2041210X:media:mee312285:mee312285-math-0054, urn:x-wiley:2041210X:media:mee312285:mee312285-math-0055 and urn:x-wiley:2041210X:media:mee312285:mee312285-math-0056. AIC selected 10 shifts when allowed no more than 10, along edges marked in bold. Edge numbers indicate the order in which estimated shifts were added by AIC. Middle: simulated normalized trait data. Right: model selected by SURFACE when allowed no more than 10 shifts. Shifts were detected on the same 10 branches. Identical colours are used for regimes inferred to share the same optimum.
Figure 9. OU model selection with a maximum allowed number of shifts (m) in the selection optimum. Each shift was penalized as one parameter with AIC (top left) or BIC (top right), or as two parameters using AIC as in SURFACE (bottom left), or through the modified BIC (bottom right). Each line corresponds to 1 data set, simulated under OU evolution with 3 true shifts. Separate points are used for strictly decreasing values, indicating ranges where the estimated number of shifts equals the maximum allowed. Horizontal lines indicate cases where the estimated number of shifts does not change with the maximum allowed.

We conjecture that the failure of AIC and BIC may be caused by two phenomena related to the very large number of candidate models on large trees. First, there are more models to choose from than available data points, a problem known as ‘large p, small n’ in the statistics literature. Indeed, with n taxa on a rooted tree, there are 2n−2 models with one shift, one for each of the 2n−2 branches. So with a single shift and a single extra parameter, the number of potential models is already larger than the number of data points, providing ample opportunity for overfitting. The second issue with heterogeneous tree models is combinatorial: the number of models explodes with the number of shifts. On the 140‐taxon tree used above, there are 2n−2 = 278 models with a single shift, but 38 503 models with 2 shifts (from choosing 2 of the 278 edges), and 6·4 × 10¹⁷ models with 10 shifts. This explosion gives an advantage to 10‐shift models over 2‐shift models, and to 2‐shift models over 1‐shift models when using AIC or BIC, which are not meant to protect against multiple testing.
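These counts can be reproduced directly, assuming at most one shift per branch so that a model with m shifts is a choice of m of the 2n−2 branches. The variable names are illustrative.

```python
import math

n_taxa = 140
n_branches = 2 * n_taxa - 2          # branches of a rooted binary tree

# Number of ways to place m shifts on distinct branches
models_by_shift_count = {m: math.comb(n_branches, m) for m in (1, 2, 10)}

for m, count in models_by_shift_count.items():
    print(f"{m:2d} shift(s): {count:.3e} candidate models")
```

The count with 10 shifts exceeds the count with 2 shifts by about 13 orders of magnitude, which is the scale of the multiple‐testing problem that a fixed per‐parameter penalty has to absorb.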

A natural way to avoid overfitting is to increase the penalty in AIC or BIC. Ingram & Mahler (2013) counted both the magnitude and position (edge) of each shift as parameters, therefore using an AIC penalty of 2×(3 + 2m) in SURFACE (m being the number of shifts), compared to the AIC penalty of 2×(3 + m) when the shift locations are known (Butler & King 2004). However, the larger AIC penalty used in SURFACE still favours overly complex models (Fig. 9, bottom left), with 40 shifts selected when allowed up to 40, for all 10 replicates in our simulations. This difficulty occurs in change‐point models for time series, for which the failure of AIC and BIC has already been recognized (e.g. Zhang & Siegmund 2007, 2012; Rigaill, Lebarbier & Robin 2012). In particular, Zhang & Siegmund (2007) introduced a modified BIC with the following penalty:
$$\mathrm{pen}_{\mathrm{mBIC}} = 3\log n + (2m-1)\log n + \sum_{i=0}^{m} \log n_i$$
where m is the number of shifts, n is the sample size. In our phylogenetic context, we used n0 as the number of observations not influenced by any shift and ni the number of observations last influenced by shift i (n0+n1+⋯+nm = n). The first term 3 log n penalizes for β0, γ and α. For a given value of m, the remaining penalty is between (2m−1) log n and 3m log n, as if penalizing each shift by 2 to 3 parameters. We applied this modified BIC in our stepwise selection procedure with the same 10 data sets as before, naively ignoring the independence assumption in Zhang & Siegmund (2007). This modified BIC had much better performance in our simulations (Fig. 9, bottom right), selecting between 2 and 4 shifts only. In all these 10 cases, the true shift on branch urn:x-wiley:2041210X:media:mee312285:mee312285-math-0058 was detected first. The second detected shift was either on the parent branch of urn:x-wiley:2041210X:media:mee312285:mee312285-math-0059 and urn:x-wiley:2041210X:media:mee312285:mee312285-math-0060 (6 times) or on urn:x-wiley:2041210X:media:mee312285:mee312285-math-0061 (4 times), and overfitting at or near external edges was avoided. We repeated the modified BIC stepwise selection on 100 new simulated data sets. The average number of detected shifts was 2·55 (standard deviation 0·99), demonstrating that overfitting is not an issue for the modified BIC here. Further improvements would likely be achieved by adjusting for phylogenetic correlation in the BIC penalty, where n is replaced by an effective sample size (see Ané 2008; for the BM model). The modified BIC could also be used in a variety of problems, such as to detect changes in rate or in selection strength. However, more extensive simulations are needed to quantify the performance of modified BIC penalties for these problems.
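To compare the penalties discussed here, the sketch below assumes that the modified BIC penalty has the form 3 log n + (2m − 1) log n + Σ log n_i, summed over the m + 1 regimes. This form is an assumption that is consistent with the bounds stated above; it is not copied from the authors' code, and the function names are illustrative.

```python
import math

def aic_penalty(m, params_per_shift=1):
    """AIC penalty with 3 base parameters; SURFACE uses params_per_shift=2."""
    return 2 * (3 + params_per_shift * m)

def bic_penalty(m, n):
    """Standard BIC penalty with each shift counted as one parameter."""
    return (3 + m) * math.log(n)

def modified_bic_penalty(regime_sizes):
    """Assumed modified BIC penalty.

    regime_sizes = [n0, n1, ..., nm]: number of tips last influenced by each
    regime, with n0 the tips not influenced by any shift.
    """
    n = sum(regime_sizes)
    m = len(regime_sizes) - 1
    return (3 * math.log(n) + (2 * m - 1) * math.log(n)
            + sum(math.log(n_i) for n_i in regime_sizes))

# Example: 140 tips split among 3 shifts (sizes chosen for illustration only)
sizes = [100, 25, 10, 5]
n, m = sum(sizes), len(sizes) - 1
print("AIC, one parameter per shift:", aic_penalty(m))
print("AIC, SURFACE-style:          ", aic_penalty(m, params_per_shift=2))
print("BIC:                         ", round(bic_penalty(m, n), 2))
print("modified BIC (assumed form): ", round(modified_bic_penalty(sizes), 2))
```

Under this form, a model without shifts receives the ordinary BIC penalty of 3 log n, and a shift that isolates a single tip or a small clade gains the least from the log n_i terms relative to its cost.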

Fully Bayesian methods are not immune to this large p‐small n issue if the prior distribution is not carefully chosen to counteract the combinatorial explosion of models. Venditti, Meade & Pagel (2011) used a Bayesian reversible‐jump framework to detect changes in the evolutionary rate of body size. They detected evidence of shifts in about one‐third of all branches (1494 branches) in their 3185‐species mammal phylogeny. Their prior distribution for the number of changes was not discussed, so it is possible that the very large estimated number of shifts was driven by the combinatorial explosion of models and by equal weights given to all models a priori.

Eastman et al. (2011) proposed a truncated Poisson prior for the number of shifts, which places a high prior weight on models with few shifts and a much lower weight a priori on each individual model with many shifts. Rabosky (2014) used a similar approach to infer differential rates of species diversification or shifts in trait evolutionary rate, using a compound Poisson prior process for the location and number of shifts. Again, this choice controls the combinatorial explosion of models by giving a Poisson prior probability to the entire set of models with a given number of shifts. Simulations showed that overfitting was not an issue. After this work was completed, Uyeda & Harmon (2014) proposed a Bayesian method for OU models with shifts in the selection optimum, implemented in the R package bayou. Their prior on the number of shifts is a conditional Poisson distribution ranging from 0 to half the number of tips. Their simulations showed that the estimated number of shifts is sensitive to this prior.
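The contrast between equal prior weights on all models and a Poisson prior on the number of shifts can be illustrated numerically. In the sketch below, the Poisson rate of 1·5, the cap of 10 shifts and the 278 branches are illustrative assumptions, not values used in the studies cited above.

```python
import math

def shift_count_prior_uniform_models(n_branches, max_shifts):
    """Implied prior on the number of shifts when every model is equally likely."""
    counts = [math.comb(n_branches, m) for m in range(max_shifts + 1)]
    total = sum(counts)
    return [c / total for c in counts]

def shift_count_prior_truncated_poisson(rate, max_shifts):
    """Poisson prior on the number of shifts, truncated at max_shifts."""
    weights = [math.exp(-rate) * rate ** m / math.factorial(m)
               for m in range(max_shifts + 1)]
    total = sum(weights)
    return [w / total for w in weights]

uniform_prior = shift_count_prior_uniform_models(n_branches=278, max_shifts=10)
poisson_prior = shift_count_prior_truncated_poisson(rate=1.5, max_shifts=10)

print("m   equal model weights   truncated Poisson")
for m, (p_uniform, p_poisson) in enumerate(zip(uniform_prior, poisson_prior)):
    print(f"{m:2d}   {p_uniform:19.3e}   {p_poisson:17.3e}")
```

With equal weights on models, almost all prior mass falls on the largest allowed number of shifts, whereas the Poisson prior keeps most of its mass on models with few shifts regardless of how many such models exist.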

The modified BIC used here is one attempt to solve the large p‐small n problem. Many methods have been developed to solve this problem for independent data, such as LASSO (Tibshirani 1996; Zhao & Yu 2006) or least angle regression (Efron et al. 2004). As pointed out for the modified BIC, more work should be done to adjust these methods for phylogenetic correlation and adapt them to change‐point detection problems on evolutionary trees.

Conclusion and Recommendations

We discussed here difficulties in studying trait evolution with OU models from present‐day species: unidentifiability of ancestral states, non‐microergodicity of parameters and limited power to detect changes in selection regimes. These issues are intrinsic to the nature of comparative data with observations at the tips of the tree only. Our study does not mean that OU models are limited and should be abandoned. If trait evolution truly followed an OU process, then analyses using an OU model are certainly most appropriate. But investigators should be aware from the outset that increasing data collection from more present‐day taxa may not provide the power needed to discriminate between hypotheses. These difficulties are not tied to maximum likelihood: non‐microergodic parameters are bound to be estimated with imprecision, regardless of the estimation method. Finally, we illustrated the breakdown of traditional model selection criteria (AIC and BIC) to detect changes in the trait evolution process and we identified a ‘large p‐small n’ issue: the number of candidate models is larger than the number of taxa and explodes with the number of shifts. We introduced a modified criterion borrowed from change‐point models for time series, which does not overestimate the number of shifts like AIC and BIC do. While fully Bayesian methods have been proposed for related heterogeneous trait evolution models on trees, more work is needed to discover appropriate model selection tools in a frequentist framework.

Recommendations for empiricists

When using an OU model to analyse comparative data, the chosen model should first be checked for identifiability. Unidentifiability would plague both maximum‐likelihood and Bayesian approaches. We proposed a reparametrization where all parameters are identifiable. A second recommended step is to compare the half‐life estimated from the data with the total tree height. If the half‐life is found to be very high, this is indicative of strong phylogenetic signal and the researcher should be aware that α is likely to be overestimated. In this situation, also, there is a lack of power to detect shifts of small magnitude in the selection optimum μ. The researcher should be aware that the absence of evidence for a shift might be from a lack of power, even on a huge tree, rather than from a true absence of shift. A recommended action here is to use the Monte Carlo approach introduced by Boettiger, Coop & Ralph (2012), using the tree at hand and estimated parameters, to determine whether a lack of power or a truly small (or absent) shift is responsible for the negative test result. Alternatively, a recommended action is to seek to add fossil data. Fossils can restore the identifiability of parameters and greatly increase the precision of estimated ancestral states, shifts in ancestral states and μ values (see Ho & Ané 2013). The benefit of combining fossils with living taxa has been recognized empirically and is receiving increased recognition (Slater & Harmon 2013), with growing interest in testing more and more complex evolutionary processes (Slater 2013). Finally, we strongly discourage the use of AIC to search for shifts in μ along the phylogeny, when the locations of these shifts are not known in advance. Instead, we recommend using the modified BIC or a fully Bayesian framework (Uyeda, Eastman & Harmon 2014) where the prior is carefully chosen. Running the Bayesian estimation procedure with no data is a good way to check the mean number of shifts a priori. Doing so is important to understand the influence of the prior on the inferred number of shifts from the data.

Acknowledgements

We thank Emmanuel Paradis, Natalie Cooper and an anonymous reviewer for their insightful comments and suggestions. This work was supported in part by the National Science Foundation (DMS 1106483).

Data accessibility

The R code for the stepwise selection method using the modified BIC, and the data on flower diameter in Euphorbiaceae are available at https://github.com/lamho86/.
