Volume 2, Issue 4

Measuring individual differences in reaction norms in field and experimental studies: a power analysis of random regression models

Julien G. A. Martin

Corresponding Author

Département de Biologie, Université de Sherbrooke, Sherbrooke, Québec, Canada

Centre d’Études Nordiques, Université Laval, Québec, Québec, Canada

Correspondence author. E‐mail: julien.martin2@usherbrooke.ca
Daniel H. Nussey

Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK

Alastair J. Wilson

Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK

Denis Réale

Canada Research Chair in Behavioural Ecology, Université du Québec à Montréal, Montréal, Québec, Canada

First published: 22 December 2010
Citations: 123

Summary

1. Interest in measuring individual variation in reaction norms using mixed‐effects and, more specifically, random regression models has grown apace in the last few years within evolution and ecology. However, these are data hungry methods, and little effort to date has been put into understanding how much and what kind of data we need to collect in order to apply these models usefully and reliably.

2. We conducted simulations to address three central questions. First, what is the best sampling strategy to collect sufficient data to test for individual variation using random regression models? Second, on occasions when precision is difficult to assess, can we be confident that a failure to detect significant variance in plasticity using random regression represents a biological reality rather than a lack of statistical power? Finally, does the common practice of censoring individuals with one or few repeated measures improve or reduce power to estimate individual variation in random regressions?

3. We have also developed a series of easy‐to‐use functions in the ‘pamm’ statistical package for R, which is freely available, that will allow researchers to conduct similar power analyses tailored more specifically to their own data.

4. Our results reveal potentially useful rules of thumb: large data sets (N > 200) are needed to evaluate the variance of individual‐specific slopes; a number of individuals/number of observations per individual ratio of approximately 0·5 consistently yielded the highest power to detect random effects; individuals with one or few observations should not generally be censored as this reduces power to detect variance in plasticity.

5. We discuss the wider implications of these simulations and remaining challenges and suggest a new way to standardize results that would better facilitate the comparison of findings across empirical studies.

Introduction

Phenotypic plasticity, defined as a change in the phenotype of an individual or genotype in response to a change in the environment (Pigliucci 2001), is an important source of within‐individual and within‐genotype variation in natural populations (Schlichting & Pigliucci 1998; Pigliucci 2001). Phenotypic plasticity is often conceptualized and measured in terms of ‘reaction norms’: functions that relate individual phenotypes to an environmental variable (De Jong 1990; Schlichting & Pigliucci 1998; Pigliucci 2001; DeWitt & Scheiner 2004).

When studying plastic traits in natural populations, the key evolutionary questions are whether individuals vary in their plastic responses to the environment and if so which environmental factors or genes drive this variation. Of central interest is whether genetic variation in individual plasticity exists, and thus, whether plasticity is likely to evolve in response to natural selection (Brommer et al. 2005; Nussey, Wilson, & Brommer 2007; Dingemanse et al. 2010). Tackling this question requires that we first quantify whether, and to what extent, individuals differ in their phenotypic plasticity. Several recent studies have attempted to answer this question by combining a reaction norm approach with the use of a particular type of mixed model that has come to be known as ‘random regression’ (see Table S1). Although closely related to earlier models developed for growth (e.g. Rao 1965), random regression models as we now know them were largely pioneered by Henderson (1982) and by Kirkpatrick & Heckman (1989) (the latter developing this analytical framework under the term of ‘infinite dimensional models’).

Random regression, or infinite dimensional, models are a particular form of mixed‐effect model in which individual phenotypes are modelled as a continuous function of a covariate. The parameters describing those functions (e.g. slopes and intercepts) are free to vary among individuals and are treated as random effects drawn from distributions with means and (co)variance structures to be estimated. In the context of studying phenotypic plasticity, these models provide a means to quantify and test the significance of variation between individuals or groups in their reaction norms (Nussey, Wilson, & Brommer 2007; van de Pol & Wright 2009; Dingemanse et al. 2010). However, as with any statistical modelling, the success of such a test depends critically on the available data as well as the extent to which any model captures biological reality. Test success is not only defined by the ability to detect a significant effect, but also defined by the ability to determine whether not finding one is because of absence of an effect or lack of power. Here, we undertake several power analyses with the aim of assessing whether useful ‘rules of thumb’ exist with respect to sample size and data structure required to test for between‐individual variation in reaction norms at the phenotypic level.

In the context of phenotypic plasticity, the phenotype of an individual might be described as a function of an environmental covariate that is expected (or known) to influence phenotypic expression (e.g. temperature, population density). In what follows, we refer to individual phenotypic responses to environmental variables, but note that this framework can equally be used to relate phenotype to intrinsic parameters (e.g. age, experience, physiological state). Furthermore, we take the individual as the only unit by which data are grouped, although it should be noted that any other grouping variable could also be used within the random regression framework (e.g. family, social group, population, or genotype).

In its simplest linear form, an individual’s reaction norm can be defined by two parameters: an elevation and a slope. Here, it is the slope of the reaction norm that describes an individual’s response to the environment and thus its phenotypic plasticity. In this linear example, the pattern of variation in reaction norms across a population of individuals can be described by the between‐individual variance in elevation (henceforth, VI), the between‐individual variance in slope (VS) and the covariance between elevation and slope [cov(I,S), or rIS where the covariance has been converted into a correlation coefficient]. The variance in elevation is in fact equivalent to the between‐individual component of phenotypic variance when the environmental covariate equals zero (i.e. the intercept). In the absence of significant between‐individual variation in slopes, the variance attributable to individual identity will remain constant across all values of the environmental covariate and can be used to estimate the repeatability of a trait (i.e. the proportion of phenotypic variance explained by individual identity). However, if individuals also vary in plastic responses to the environment (i.e. slopes), then the among‐individual variance for the trait will necessarily change across environmental conditions, complicating the definition and measurement of repeatability (Martin & Réale 2008; Dingemanse et al. 2010). Significant variance in slopes among individuals has been referred to as an ‘individual by environment interaction’ (or ‘I × E’; see Fig. S1 for illustration; Nussey, Wilson, & Brommer 2007).
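
As a worked illustration of why repeatability becomes environment specific under I × E, the between‐individual variance implied by a linear reaction norm can be written as a function of the covariate. This is a sketch in the notation of this section, not a formula quoted from the studies cited: V_ind(x) and τ(x) are shorthand introduced here for the between‐individual variance and the repeatability at X = x, and VR denotes a residual variance assumed to be constant across X.

```latex
V_{\mathrm{ind}}(x) = V_I + 2x\,\mathrm{cov}(I,S) + x^{2} V_S,
\qquad
\tau(x) = \frac{V_{\mathrm{ind}}(x)}{V_{\mathrm{ind}}(x) + V_R}
```

At x = 0 the expression reduces to VI (VI + VR)⁻¹, and when VS = 0 and cov(I,S) = 0 it takes that value for every x, which is the constant repeatability described above.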

Using random regression analyses, recent studies have provided mounting evidence that individuals vary, not only in their mean phenotype, but also in their plastic response to environmental variation (Table S1). Random regression models thus appear to offer a powerful and flexible approach to detecting and estimating I × E, in particular because they add a small number of parameters to the model of random variation, rather than requiring as many parameters as there are individuals to estimate (as would be the case for using a fixed effect model). When pedigree information is available, a ‘genotype by environment’ interaction, or G × E, can be estimated using a random regression animal model (Wilson, Kruuk, & Coltman 2005; Nussey, Wilson, & Brommer 2007). However, as for any statistical approach, valid conclusions require that underlying model assumptions are not violated. Furthermore, while it is known that application of these models in evolutionary and ecological studies requires that large amounts of repeated‐measures data are available (Pinheiro & Bates 2000; Nussey, Wilson, & Brommer 2007), just how many data are required to detect significant random effects is not clear. Previous studies of power in the context of random regression models have assessed either model robustness or the suitability of these models for particular applications (e.g. detecting quantitative trait loci, Lillehammer, Odegard, & Meuwissen 2007). Maas & Hox (2005) showed that error around estimates of fixed and random terms in mixed models can be a real concern when using relatively small sample sizes that are common in field studies.

In this paper, we use simulations to address three key questions regarding the use of random regression models to assess the patterns of phenotypic variation in labile traits. Use of an animal model to estimate G × E interaction is not investigated here. First, if time (or money) is limited, how should a researcher allocate effort in the field – towards sampling more individuals, or towards obtaining more repeated records per individual? In other words, what sample sizes and number of repeated measures per individual (or group) should researchers aim for when designing a study of phenotypic plasticity?

Secondly, to what extent can and should we attach biological meaning to statistical ‘null results’? Our ability to confidently reject the presence of individual by environment interaction, I × E, hinges on having sufficient statistical power to detect it in the first place. It is notable that all studies listed in Table S1 found evidence for significant between‐individual variation in phenotypic elevations. However, several of the studies – which vary markedly in sample size and data structure – failed to find evidence for I × E (Reed et al. 2006; Charmantier et al. 2008; Martin & Réale 2008). These studies interpreted the lack of variation in slopes as biologically meaningful, arguing, for example, for an adaptive response to previous stabilizing selection on reaction norm slopes (Reed et al. 2006; Charmantier et al. 2008). However, could these ‘null’ results instead reflect limited statistical power? While we share the view of others (e.g. Hoenig & Heisey 2001) that post hoc power analysis has generally limited utility, this view is contingent on adequate determination (and reporting) of precision associated with parameter estimates. In the context of random regression models of plasticity, problems sometimes can and do arise in this regard (discussed further below).

Thirdly, should records pertaining to individuals observed only once (or perhaps two or three times) be removed from the data set prior to analysis? This practice has been common in field studies to date (Brommer et al. 2005; Nussey et al. 2005a,b; Reed et al. 2006; Charmantier et al. 2008) and is based on the intuitive premise that an individual’s plasticity, defined as the slope of its response to environment, cannot be determined for a trait measured in only a single environment. However, observations on these individuals do still provide information with respect to the model parameters to be estimated and thus may improve power to detect between‐individual variance in reaction norms. Furthermore, these individuals may not be a random sample of the population (e.g. the lack of additional observations for some individuals may be caused by selective disappearance), and excluding them may lead to biased results.

With these questions in mind, we undertake simulation‐based power analyses to assess: (i) the sample sizes and data structure required to detect individual variation in elevations, VI, and individual by environment interaction, I × E, using random regression models, (ii) the power to detect I × E in field studies that failed to find evidence for such variation and (iii) the consequences of excluding individuals with a single observation within random regression models. Our approach was restricted to linear mixed regression with a homogeneous Gaussian error, a random intercept and a random slope. While the situations considered here are not exhaustive, we have developed a series of easy‐to‐use functions (freely available in the statistical package R), which will allow researchers to conduct power analyses specifically tailored to their own data set or questions.

Materials and methods

Reaction norm model and overview of simulations

A general formulation of a random regression model for the case of a linear reaction norm is:
$$Y_{ij} = (\mu + \delta_i) + (\beta + \Delta_i) X_{ij} + \varepsilon_{ij}$$
where μ and β are the population average intercept and slope, respectively, whereas δi and Δi are the deviations from those population averages for a group of observations indexed by i (e.g. a particular individual), and j indexes the observations within group i. δi and Δi have means of zero and a covariance structure defined by a matrix, Ψ, to be estimated. Ψ is a 2 × 2 matrix with $\sigma^2_{\delta}$, the inter‐group variance in intercepts, and $\sigma^2_{\Delta}$, the inter‐group variance in slopes, on the diagonal and σ(δ, Δ), the covariance between intercept and slope, as the off‐diagonal element. For clarity, we refer to the estimators of these elements as VI, VS and cov(I,S), respectively. X is the environmental covariate and ε is a residual that is assumed to be uncorrelated across observations and normally distributed with a mean of zero and an estimated variance VR. We assume that this residual variance is homogeneous across X. We simulated phenotypic observations (Y) under this model and an assumed Ψ, and then assessed power to detect the given Ψ matrix according to different sampling strategies by analysing the simulated data. Models were parameterized using a likelihood maximization approach. Specifically, we estimated variance parameters using the restricted maximum likelihood (REML) method to avoid a known downward bias under the maximum likelihood (ML) method (Pinheiro & Bates 2000). Furthermore, variances estimated with a mixed model are conditional on the fixed effects included in the model (Pinheiro & Bates 2000). Therefore, the estimated phenotypic variance (i.e. the sum of variance component estimates from the model) might differ from the observed phenotypic variance (i.e. sample variance of raw data, Wilson 2008). Because the individual by environment interaction is influenced by both slope variance and intercept–slope covariance, we not only vary VS but also cov(I,S) in our simulations to estimate power to detect I × E.
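
The data‐generating step described above can be sketched in a few lines. The following is a minimal illustration in plain Python, not the code used in the ‘pamm’ package: the function name and arguments are hypothetical, the default β = 0·25 and the standard normal X follow the simulation settings given later in this section, and μ = 0 is an assumption made here for convenience.

```python
import math
import random


def simulate_reaction_norms(n_ind, n_obs, v_i, v_s, r_is, v_r,
                            mu=0.0, beta=0.25, seed=None):
    """Simulate a balanced data set under a linear random regression model.

    Returns a list of (individual, x, y) tuples with
    y = (mu + delta_i) + (beta + Delta_i) * x + eps.
    """
    rng = random.Random(seed)
    sd_i, sd_s, sd_r = math.sqrt(v_i), math.sqrt(v_s), math.sqrt(v_r)
    rows = []
    for ind in range(n_ind):
        # Correlated intercept and slope deviations (Cholesky factor of a 2 x 2 Psi).
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        delta = sd_i * z1
        big_delta = sd_s * (r_is * z1 + math.sqrt(1 - r_is ** 2) * z2)
        for _ in range(n_obs):
            x = rng.gauss(0, 1)        # environment: mean 0, variance 1
            eps = rng.gauss(0, sd_r)   # homogeneous Gaussian residual
            y = (mu + delta) + (beta + big_delta) * x + eps
            rows.append((ind, x, y))
    return rows


data = simulate_reaction_norms(24, 4, v_i=0.2, v_s=0.1, r_is=0.5, v_r=0.8, seed=1)
print(len(data))  # 96 rows: 24 individuals x 4 observations
```

Each simulated data set would then be passed to a mixed‐model fitting routine; that step is deliberately left out of this sketch.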

It should be noted that variance components VI, VS and VR are expressed in different units and are therefore not readily comparable. VI and VR are in units of Y² and independent of X, whereas VS is in units of Y² X⁻². Thus, it is important to recognize that in real‐world situations the relative magnitude of this variance component will be influenced by the units and scaling of X (see Nussey, Wilson, & Brommer 2007; Table 1 and discussion for standardization method).
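
The dependence of VS on the units of X follows directly from those units. The sketch below, with hypothetical function names, covers only a pure change of units (X replaced by a constant multiple of X, with no mean centring); the input values are the natural‐scale estimates reported in Table 1.

```python
import math


def rescale_slope_components(v_s, cov_is, factor):
    """Return (VS, cov(I,S)) after replacing X by factor * X.

    A slope per unit of factor * X equals the original slope divided by
    factor, so VS is divided by factor**2 and cov(I,S) by factor. VI and VR
    are unchanged because the position of X = 0 does not move.
    """
    return v_s / factor ** 2, cov_is / factor


def correlation(v_i, v_s, cov_is):
    """Intercept-slope correlation rIS from the (co)variance components."""
    return cov_is / math.sqrt(v_i * v_s)


v_i, v_s, cov_is = 4.75, 1.33, -0.09           # natural-scale values from Table 1
v_s_10x, cov_10x = rescale_slope_components(v_s, cov_is, factor=10)
r_before = correlation(v_i, v_s, cov_is)
r_after = correlation(v_i, v_s_10x, cov_10x)   # unchanged by the change of units
```

Under these assumptions VS shrinks a hundredfold when X is multiplied by ten, while rIS is unaffected, which is why ratios involving VS cannot be compared across studies without a common scaling of X.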

Table 1. Examples of the influence of rescaling the environmental variable (X) on variance component estimates from a random regression analysis. Variance component estimates are derived from a single (simulated) data set but with the environmental covariate (X) scaled in different ways. On the natural scale, simulated X is distributed with a mean and variance of 0·99 and 3·89, respectively. Observed phenotypic variance across all data (VP*) is 33·21. However, the scaling will depend on units of measurement (e.g. X in cm is 10X in mm) and any standardization applied. Here, XSDU is expressed in standard deviation units. XMC is on the natural scale but mean centred, and XSDU.MC is in standard deviation units and mean centred
                              Environmental covariate
                              X            10X          XSDU         XMC          XSDU.MC
VI                            4·75         4·75         4·75         6·04         6·04
VS                            1·33         0·01         5·11         1·33         5·11
VR                            5·33         5·33         5·33         5·33         5·33
cov(I,S)                      −0·09        −0·01        −0·18        2·69         5·09
rIS                           −0·04        −0·04        −0·04        0·95         0·92
VI (VI + VR)⁻¹ = τ at X = 0   0·47         0·47         0·47         0·53         0·53
VI/VP*                        0·14         0·14         0·14         0·18         0·18
VS/VP*                        0·04         0·0003       0·15         0·04         0·15
τ (min–max)                   0·47–0·95    0·47–0·95    0·47–0·95    0·47–0·95    0·47–0·95

Simulation functions

We developed three functions for use with the R statistics package (R Development Core Team 2009) to explore the statistical power associated with random regression–based analyses of I × E. These functions and their related help files are freely available on the Comprehensive R Archive Network (CRAN; http://www.r‐project.org/) in a package called ‘pamm’ (power analysis with mixed models). The first function – PAMM, Power Analysis for Mixed Models – is intended to help one design a sample structure (i.e. number of individuals and number of replicates) to detect a user‐specified level of I × E with an expectation of acceptable power. The second function – EAMM, Exploratory power Analysis for Mixed Models – is useful for the situation when the sampling structure is determined already, but the researcher wishes to estimate his or her power to detect (co)variance components of various magnitudes. In other words, given a sampling design already in place, what level of I × E might a researcher realistically hope to detect? The third function – SSF, Sampling Size Function – is intended to help design a sampling structure to detect variance in random intercepts and/or slopes, given a constraint on the number of total observations that can be made. Note that these functions are designed to help understand the statistical power of varying experimental or observational designs. However, what actually constitutes ‘sufficient power’ is of course a question for the researcher.

Details on use and structure of the three different functions are provided in Appendix S1. Briefly, functions are built on iterative simulations of a balanced data set given a specified random (co)variance structure and sampling structure (number of individuals and number of observations for each individual). Mixed models are then fitted to the simulated data using the ‘lmer()’ function from the ‘lme4’ package in R (Bates, Maechler, & Dai 2008). Random effect significance is assessed using REML log‐likelihood ratio tests (LRTs) between models with and without the relevant random effects term. Following Pinheiro & Bates (2000), the log LRT statistic is assessed against a χ2 distribution with degrees of freedom equal to the difference in the number of parameters in the models for random variation being compared. Thus, in our simulations, the significance of a random intercept term is assessed against a single degree of freedom (i.e. by comparison between a model with the random intercept fitted and one without any random term). In contrast, the significance of I × E is assessed against two degrees of freedom as we compare the log‐likelihood of the full I × E model [in which VI, VS and cov(I,S) are estimated] to that of a model with just the random intercept fitted [i.e. VI estimated but not VS or cov(I,S)]. Power is estimated as the percentage of simulations that provide a P‐value smaller than 0·05.
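
The testing and power steps described above can be illustrated as follows. This is a hedged sketch rather than the ‘pamm’ implementation: the function names are hypothetical, the log‐likelihoods are assumed to come from mixed models fitted elsewhere, and only the one and two degree‐of‐freedom cases used in this paper are handled, for which the χ2 tail probability has a closed form.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variate for df = 1 or 2."""
    if stat <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or 2 are handled in this sketch")


def lrt_p_value(loglik_full, loglik_reduced, df):
    """P-value of a likelihood ratio test from two REML log-likelihoods."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return chi2_sf(stat, df)


def estimated_power(p_values, alpha=0.05):
    """Proportion of simulated data sets with a P-value below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Illustrative (made-up) log-likelihoods: I x E model versus random intercept only.
p_ixe = lrt_p_value(loglik_full=-100.0, loglik_reduced=-103.0, df=2)
```

In a power analysis, `lrt_p_value` would be evaluated once per simulated data set and the resulting P‐values passed to `estimated_power`.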

It should be acknowledged there are some difficulties associated with approximating the distribution of the LRT statistic when testing for random effects in mixed‐effects models. The present approach of using a χ2 distribution with the number of degrees of freedom determined by the difference in the number of non‐redundant parameters in the models is widely used and easily implemented, but there is a sensible argument that it is overly conservative (see Self & Liang 1987; Stram & Lee 1994; Visscher 2006). The issue arises because the use of LRT to test the significance of variance components estimated through random effects can be thought of as an implicitly two‐tailed test. However, it is not (normally) sensible for a variance component to be less than zero (though clearly a covariance can be). Thus, when testing a single variance component, the null hypothesis (i.e. that the true variance equals zero) places the parameter on the boundary of parameter space defined by the alternate hypothesis (i.e. that the variance component is greater than zero; Stram & Lee 1994). It is therefore argued that the distribution of the LRT statistic should instead be assumed to have a distribution determined by a mixture of χ2 distributions with differing degrees of freedom (Visscher 2006). Although we have not implemented this approach, we note that the code in the functions provided could be easily modified by a user to consider a mixture of χ2 distributions with different degrees of freedom.
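
For readers who wish to explore the mixture approach, a minimal sketch is given below. It is not part of the functions provided with this paper. It assumes the equal‐weight mixtures commonly cited from Self & Liang (1987) and Stram & Lee (1994): χ2 with 0 and 1 degrees of freedom when testing a single variance component, and χ2 with 1 and 2 degrees of freedom when adding a random slope and its covariance to a random‐intercept model. The function names are hypothetical.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail chi-square probability for df = 0, 1 or 2."""
    if df == 0:
        return 1.0 if stat <= 0 else 0.0   # chi-square(0) is a point mass at zero
    if stat <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 0, 1 or 2 are handled in this sketch")


def mixture_p_value(stat, df_low, df_high):
    """P-value under a 50:50 mixture of chi-square(df_low) and chi-square(df_high)."""
    return 0.5 * chi2_sf(stat, df_low) + 0.5 * chi2_sf(stat, df_high)


lrt_stat = 4.2                                 # illustrative LRT statistic
p_naive = chi2_sf(lrt_stat, 2)                 # chi-square with 2 df, as used in this paper
p_mixture = mixture_p_value(lrt_stat, 1, 2)    # mixture alternative for the I x E test
```

Because the mixture places half of its weight on the distribution with fewer degrees of freedom, `p_mixture` is never larger than `p_naive`, which is the sense in which the standard test is conservative.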

Simulation settings

We carried out simulations under three scenarios designed to explore the key questions raised earlier. In all cases, the environment effect (X) is assumed to follow a normal distribution. In scenarios 1 and 3, values of X at which individuals are observed are assumed to be uncorrelated both within and between individuals. In scenario 2, we simulate phenotypic observations at values of X that are common to groups of individuals sampled. This mimics the data structure of many field studies that have tested for plasticity with respect to an annually defined environmental covariate (e.g. spring temperature or North Atlantic Oscillation). For all simulations, we set X and β so that the trait would increase 1 unit across 4 SD environment units [environment effect: variance = 1, mean = 0, effect size (β) = 0·25].

Scenario 1: designing a sampling strategy (functions ‘SSF’ and ‘pamm’)

Before conducting an experiment, or beginning work on a field population, one may want to design the sampling strategy to obtain the highest power within a set of constraints. One might be constrained by three different parameters: limited number of total observations, limited number of individuals (or groups, e.g. families, social units, populations) and limited number of observations per individual. All power analyses require some a priori knowledge (or assumption) regarding the magnitude of the variance components and fixed effects in a random regression model. Here, we have simulated four different variance structures. In the first two cases, a relatively small between‐individual variance in elevations was simulated, VI = 0·2, VS = 0·1 and VR = 0·8, with two different values of the intercept–slope correlation (rIS), 0 and 0·5. We then simulated stronger between‐individual variance in elevation, VI = 0·4, VS = 0·1 and VR = 0·6, when rIS = 0 and when rIS = 0·5. In the absence of I × E or at X = 0, repeatability could be estimated as VI (VI + VR)⁻¹, implying simulated repeatabilities of 0·2 and 0·4 at X = 0, respectively. Repeatabilities between 0·2 and 0·4 are commonly observed in personality, behavioural, life history and morphological traits in both field and laboratory settings (Roff 1996; Réale et al. 2007; Bell, Hankison, & Laskowski 2009).

To investigate the optimal allocation of sampling effort between the number of individuals and number of observations per individual, we simulated data sets with three different total numbers of observations (or total sample sizes, TSS): 50, 200 and 1000. The SSF function was used to simulate different ratios of individuals to observations per individual such that the total number of observations always summed to the relevant TSS specified in the function. Thus, the two extreme combinations for TSS = 200 are two individuals with 100 replicates each and 100 individuals with two replicates each. We ran 500 simulations for each scenario.
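
The set of sampling designs compared for a given TSS can be enumerated as sketched below. The function name and its arguments are hypothetical and this is not the SSF code itself; the sketch assumes that, when TSS is not an exact multiple of the number of individuals, the remaining observations are spread so that some individuals have one more observation than others.

```python
def sampling_designs(total_obs, min_ind=2, min_obs=2):
    """List the ways of splitting a fixed total sample size among individuals.

    Each design is (n_individuals, counts), where counts holds the number of
    observations per individual; counts differ by at most one and always sum
    to total_obs.
    """
    designs = []
    for n_ind in range(min_ind, total_obs // min_obs + 1):
        base, extra = divmod(total_obs, n_ind)
        counts = [base + 1] * extra + [base] * (n_ind - extra)
        designs.append((n_ind, counts))
    return designs


designs = sampling_designs(200)
first, last = designs[0], designs[-1]
print(first[0], first[1][0])  # 2 individuals with 100 observations each
print(last[0], last[1][0])    # 100 individuals with 2 observations each
```

Power would then be estimated separately for each design, which traces out the relationship between power and the ratio of individuals to observations per individual.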

As an example of the use of pamm, we estimated the power of a random regression analysis to detect VI and I × E [i.e. both VS and cov(I,S)] using a data structure based on a study of behavioural plasticity in Eastern chipmunks (Tamias striatus; Martin & Réale 2008). In this study, changes in behavioural responses to repeated hole‐board test were analysed in 24 chipmunks with, on average, four tests per individual (range = 2–6 observations). For each individual, hole‐board tests were sequentially numbered, and test number was used as the environmental variable (X) to evaluate the habituation rate. We first tested how an increase in the number of replicates could increase the power to detect significant VI and I × E. We fixed the number of individuals at 24 and varied the number of replicate tests per individual between 2 and 40. Note that 40 observations per individual represent a number of tests that would be impossible to obtain in the field for both practical and ethical reasons. We ran mixed models on 500 simulated data sets for each of the two hypothetical Ψ matrices. We then flipped the scenario to investigate how varying the number of individuals affected the power by fixing the number of replicates at four and by varying number of individuals tested between 2 and 100. Again, we note that 100 is more than twice the actual size of the chipmunk population upon which these simulations are loosely based.

Scenario 2: power analysis in the putative absence of I × E (function ‘EAMM’)

We estimated the power to detect I × E with random regression models using data structures similar to those reported in two recent studies that failed to find significant variances in slope (Reed et al. 2006; Charmantier et al. 2008). These two studies represent different extremes of data structure that are likely to be collected from long‐term individual‐based studies of wild populations. Charmantier et al. (2008) investigated reaction norm variation for first egg laying date in response to spring temperature over 46 years in a population of wild great tits (Parus major) in Wytham Woods in Oxfordshire, UK. The authors ran random regression models on three subsets of their data, each yielding consistent results. Specifically, they restricted their analyses progressively to data sets including only females with two or more observations (4462 observations on 1746 individuals over 46 years), three or more observations (2258 observations on 644 individuals), and four or more observations (1040 observations on 238 individuals). Reed et al. (2006) also investigated variation in reaction norms of egg laying date in response to climate (in this case, the North Atlantic Oscillation) over 23 years in a population of guillemots (Uria aalge) on the Isle of May, UK. Here, the authors restricted their analyses to individuals with four or more laying date observations (2597 observations on 245 individuals over 23 years). While the sample sizes collected in both studies are exemplary for vertebrate field studies, the data structures are rather different. In both studies, the environmental variable was measured on an annual basis such that the same value was assigned to all individuals observed in the population in any given year. While the great tit data set has twice as many years (and so values of the environmental covariate) represented, it is also characterized by a smaller number of repeated observations per individual when compared to the guillemot study. This is a consequence of the marked difference in life span between these two species. With the data restricted to individuals with four or more observations, the mean number of observations per individual was 4·4 for the great tits and 10·6 for the guillemots.

In both of the aforementioned studies, a strong population‐level response to climate was observed in laying date and individuals were found to vary significantly in their laying date elevations, but slope variation was found to be non‐significant. In general, confidence intervals or estimated standard errors allow us to make at least a qualitative judgement as to the statistical power of an analysis. It should be noted, however, that confidence intervals can be asymmetric and the use of estimated standard errors for formal statistical inference is not generally recommended for the type of models being considered here. However, neither of the aforementioned studies reported standard errors of variance components, or specified whether estimated covariance matrices were constrained to be positive definite (such that variance estimates are non‐negative and all correlations lie between −1 and 1). If matrix Ψ is necessarily constrained to be positive definite (as it is in ‘lme’ and ‘lmer’ but not in all statistical software packages), then empirical studies may find that elements of Ψ are constrained to lie at the boundary of allowable parameter space. In such circumstances (e.g. if the estimate of VS is constrained to zero), standard errors and associated confidence intervals of the constrained parameters are typically non‐estimable and a power analysis is both useful and justified.

We ran simulations, using the function EAMM, based on the data structures and estimates generated in these two studies to examine the power of each to reject the null hypothesis of no between‐individual variation in plasticity. Because all individuals observed in a given year are assigned the same value of the environmental covariate (annual spring temperature or NAO), we restricted the number of possible X values to the number of years of observations. X values were then simulated following a normal distribution and randomly attributed with replacement to each observation. For the great tit simulation, we considered the least restrictive data set used by the authors and imposed a worst‐case scenario of only two observations per individual measured over 46 springs (1746 individuals with 2 observations each and 46 different X values). For the guillemot simulation, we specified 245 individuals with 10 observations each over 23 years. The proportion of total phenotypic variance attributed to VI in the great tit data set was 0·25 (Charmantier et al. 2008), but it was not possible to calculate this proportion from the data presented in the guillemot study. We therefore used a magnitude for VI similar to that reported in the great tit study, by specifying VI = 0·25 and VR = 0·75 which implies a repeatability of 0·25 when X = 0 and/or when VS = 0. As we were interested in examining the power to detect I × E, we fixed VI = 0·25 throughout the simulations and varied VS between 0·01 and 0·25. We also varied the correlation between intercept and slope between −0·9 and +0·9. We ran 500 simulations of each scenario.
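
The way X was restricted to a fixed number of annual values can be sketched as follows. This illustrates the description above rather than the EAMM code, and the function name is hypothetical. Because values are attributed with replacement, an individual may receive the same annual value more than once.

```python
import random


def simulate_shared_environment(n_ind, n_obs, n_years, seed=None):
    """Draw one environmental value per year, then assign years to observations.

    Returns a list with one entry per individual, each holding the n_obs
    values of X at which that individual is observed.
    """
    rng = random.Random(seed)
    year_values = [rng.gauss(0, 1) for _ in range(n_years)]   # one X value per year
    return [[rng.choice(year_values) for _ in range(n_obs)]   # sampled with replacement
            for _ in range(n_ind)]


# Great tit-like structure: 1746 individuals, 2 observations each, 46 years.
x_tits = simulate_shared_environment(1746, 2, 46, seed=1)
distinct_x = {x for obs in x_tits for x in obs}
```

However many observations are simulated, the number of distinct X values cannot exceed the number of years.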

Scenario 3: censoring of individuals

When using random regression, individuals with few observations are often removed prior to analysis (Table S1), presumably based on the intuition that a statistical model of an individual’s linear reaction norm cannot be fitted through one or two points. Although it is certainly true that under linear regression, with individual‐specific slopes treated as fixed effects, individuals with only one observation will not be informative, the question of whether to drop those observations is less clear in the mixed model framework. We estimated power to detect random intercept variance and I × E when including and excluding individuals with only a single observation. We also compared the estimates of VI, VS and the size of their confidence intervals with and without these ‘singletons’ included. Two different sample sizes were used. First, we considered a data set with 300 individuals each having four observations and a further 200 individuals with a single observation only. We also considered a much smaller total sample size in which 30 individuals with four replicates each were sampled and a further 20 had only a single phenotypic observation. For both cases, data were simulated according to four underlying covariance structures. Specifically, we used (VI = 0·2, VS = 0·1, VR = 0·8) and (VI = 0·4, VS = 0·1, VR = 0·6) with two different values of intercept–slope correlation, rIS = 0 and rIS = 0·5. We simulated 500 data sets for each of those eight scenarios. Within each scenario, we then compared the mean estimates of VI and VS and the width of their corresponding confidence intervals when individuals with a single observation were alternatively retained for the analysis of a simulated data set, or removed prior to parameter estimation. Comparisons were made for each scenario using paired t‐tests. Confidence intervals of variance components were estimated using a profile‐likelihood approach with the ‘profile()’ function in the development version of ‘lme4’ (D. Bates, unpublished data). Briefly, the profile function systematically varies the parameters in a model, assessing the best possible fit that can be obtained with one parameter fixed at a specific value and comparing this fit to the original model fit (with all parameters unconstrained). The models are compared according to the LRT statistic. If z is the signed square root transformation of the LRT statistic, then a 95% profile deviance confidence interval on the parameter consists of the values for which −1·960 < z < 1·960 (Bates, unpublished data).
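The signed‐root construction described above can be illustrated on a much simpler model than a random regression. The sketch below is not the ‘lme4’ implementation: it assumes a single normal sample, profiles out the mean at its maximum‐likelihood value, and scans a grid of candidate variances to find those with |z| < 1·960. All function names and the grid limits are hypothetical choices made for illustration.

```python
import numpy as np

def profile_z(sigma2, y):
    """Signed square root of the LRT statistic for a fixed variance value."""
    n = y.size
    ss = np.sum((y - y.mean()) ** 2)   # the mean is profiled out at its ML value
    s2_hat = ss / n                    # unconstrained ML estimate of the variance
    lrt = n * np.log(sigma2 / s2_hat) + ss / sigma2 - n
    lrt = np.maximum(lrt, 0.0)         # guard against tiny negative rounding error
    return np.sign(sigma2 - s2_hat) * np.sqrt(lrt)

def profile_ci(y, z_crit=1.960, grid_size=20001):
    """Grid-based 95% profile interval: all variances with |z| < z_crit."""
    s2_hat = np.sum((y - y.mean()) ** 2) / y.size
    grid = np.linspace(0.05 * s2_hat, 10.0 * s2_hat, grid_size)
    inside = grid[np.abs(profile_z(grid, y)) < z_crit]
    return s2_hat, inside.min(), inside.max()

rng = np.random.default_rng(1)
y = rng.normal(0.0, 1.0, size=40)      # illustrative data, not from the study
s2_hat, lower, upper = profile_ci(y)
print(f"estimate={s2_hat:.3f}, 95% profile CI=({lower:.3f}, {upper:.3f})")
```

In this toy case the interval extends further above the estimate than below it, which is the kind of asymmetry that motivates profile‐based intervals for variance components.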

Results

Scenario 1: sampling strategy design

Intuitively and unsurprisingly, Fig. 1 shows that power increases with total sample size and that power to detect VI increases with its magnitude (at least over the range of parameter sets we simulated). Moreover, we found that power to detect significant intercept variance (VI) was higher than power to detect significant individual by environment interaction (I × E), irrespective of total sample size. Finally, in our simulations, non‐zero values of the intercept–slope correlation (rIS) increase power to detect I × E but do not noticeably improve power to detect VI. More interestingly, Fig. 1 also reveals some potentially useful rules of thumb. For example, small TSS (N = 50) yielded low estimates of power to detect between‐individual variance when VI = 0·2 in these simulations (power was generally <0·4; Fig. 1a). The power was somewhat higher (generally >0·65) when VI = 0·4 was used (Fig. 1d). On the other hand, much larger sample sizes (N = 1000) resulted in high estimates of power even when VI = 0·2 in the simulations (generally >0·95; Fig. 1c,f). At intermediate total sample size (N = 200), our simulations suggest that there should be very high power to detect VI of 0·4 (generally >0·95) and reasonable power to detect VI of 0·2 (estimates of power up to 0·9, depending on sampling strategy).

Fig. 1. Power of random regression to detect VI (bold lines) and I × E (dashed lines) for a fixed total sample size [(a) & (d): N = 50; (b) & (e): N = 200 and (c) & (f): N = 1000] with varying allocation of observations between the number of groups and the number of observations per group, for different random (co)variance structures: (a), (b), (c): VI = 0·2; VS = 0·1; VR = 0·8 and (d), (e), (f): VI = 0·4; VS = 0·1; VR = 0·6. Black lines show simulations for which rIS = 0 and grey lines are for rIS = 0·5. Note that the number of observations represents the average number of observations per group, with some groups having one more observation than others.

For a given total sample size, not all ratios of number of individuals to number of observations provided the same power. This is particularly obvious at intermediate total sample size (200; Fig. 1b,e). A consistent pattern in our simulations is that the estimated power for detecting among‐individual variance (VI or I × E) was highest for a ratio of individuals to observations per individual of around 0·5 (e.g. 10 individuals with 20 observations each for a total N = 200). This apparently optimal ratio was similar regardless of whether the goal is to detect differences in intercept alone or to detect I × E (variation in individual slope). However, it is also evident that a much larger sample size will generally be required to detect I × E with high power. For instance, with N = 200 and VS = 0·1, the estimated power to detect I × E had maximum values of approximately 0·6 or 0·75 (depending on the value of VI; Fig. 1b,e), and with N = 1000 and VS = 0·1, the estimated power to detect I × E was always higher than 0·9 (Fig. 1c,f).

Tailoring the simulations to mimic the data structure obtained in our prior study of chipmunk behavioural plasticity (Martin & Réale 2008), we find that 24 individuals with six observations each yielded an estimated power of 1 to detect VI = 0·4 (Fig. 2b), while at least 10 observations per individual were needed to achieve this when VI = 0·2 (Fig. 2a). On the other hand, more than 20 observations per individual are needed in both cases to achieve equivalent power to detect individual slope differences of the magnitude simulated (Fig. 2a,b). Conversely, if only four observations are possible on each individual, more than 100 individuals are needed to achieve adequate power to detect VI = 0·2, although there was reasonable power to detect VI = 0·4 with only 30 individuals (Fig. 2c,d). In general within these simulations, the estimated power to detect I × E was unsatisfactory: even sampling 100 individuals four times each may be insufficient to reliably detect variation in individual plasticity (depending of course on the true parameters; see Fig. 2c,d). A non‐zero correlation between intercept and slope did not improve estimated power to detect VI, but did increase the estimated power to detect I × E. Indeed, the estimated power to detect I × E increased with the magnitude of the correlation between elevations and slopes (Fig. 2).

Fig. 2. Power of a random regression to detect VI (bold lines) and I × E (dashed lines) estimated using the pamm function. (a) & (b): based on a simulated data set of 24 individuals with varying number of replicates; and (c) & (d): four replicates per individual with varying number of individuals. Simulations were based on the following random (co)variance structure: (a) & (c): VI = 0·2, VS = 0·1 and VR = 0·8; (b) & (d): VI = 0·4, VS = 0·1 and VR = 0·6. Intercept–slope correlation (rIS) was set to 0 for black lines and to 0·5 for grey ones.

Scenario 2: interpreting the absence of I × E

In all simulations based on the ‘great tit’ and ‘guillemot’ scenarios, power to detect significant VI was consistently estimated as 1, given our set of parameters. Figure 3 shows our estimates of power to detect significant I × E at different levels of simulated VS and with different correlations between elevation and slope in each scenario. Under both scenarios, the power to detect significant I × E was always estimated as 1 for values of VS > 0·1 and so we have not presented these situations graphically. However, at lower values of VS, estimated power rose as the correlation between elevation and slope increased in absolute magnitude (Fig. 3). At a correlation of zero (i.e. the worst‐case scenario for detecting I × E), both data structures had limited power to detect I × E. The guillemot scenario, in which fewer individuals had many more repeated measures (245 individuals with 10 measures each and 23 different X values), generally offered greater power to detect I × E than the great tit scenario (1746 individuals with two measures each and 46 different X values). In the guillemot scenario, estimated power to detect I × E was close to 1 for VS values in excess of approximately 0·05 (with rIS = 0, Fig. 3b). In the great tit scenario, an estimated power to detect I × E close to 1 was not attained until VS values were in excess of approximately 0·09 (Fig. 3a).

Fig. 3. Power of random regression to detect I × E according to different VS and intercept–slope correlation (rIS) for two different sampling structures. (a): 1746 individuals with 2 replicates over 46 years (‘great tit’ scenario) and (b): 245 individuals with 10 replicates over 23 years (‘guillemot’ scenario). VI was fixed to 0·25 in all simulations.

Scenario 3: censoring individuals

Inclusion of individuals with a single observation did result in qualitatively higher estimates of power to detect I × E in three of the four simulated situations for which sample size was small (i.e. 50 individuals and 140 observations). With simulation parameters of VI = 0·2, VS = 0·1, VR = 0·8 and rIS = 0, inclusion of single observations increased estimated power to detect I × E by 2% (power without singletons = 0·24; power with = 0·26). This increase in estimated power associated with including single observation individuals was also 2% when rIS = 0·5 (power without singletons = 0·30; power with = 0·32). With increased simulated VI (parameters: VI = 0·4, VS = 0·1, VR = 0·6 and rIS = 0·5), a similar effect was seen: estimated power to detect I × E increased by 5% when individuals with a single observation were included (power without singletons = 0·43; power with = 0·48). When a large sample size was simulated (i.e. 500 individuals and 1400 observations), power to detect I × E was much greater (>0·97) and we found no differences in power associated with inclusion/exclusion of these individuals (i.e. power differences <0·5%).

In all cases, including individuals with a single observation yielded similar estimates of VI, VS and VR (all pairwise comparisons of estimates between models with and without these observations included, P > 0·80, Fig. 4). However, inclusion of single data point individuals consistently resulted in narrower confidence intervals for both VI and VS (P < 0·001 for all t‐tests, Fig. 4). For simulations with a large sample size, differences in the width of confidence intervals of both VI and VS were small (below 0·01). For simulations with a small sample size, differences in the width of confidence intervals associated with inclusion of singletons were between 0·04 and 0·06 for both VI and VS. In 19% of the simulations (ranging between 12% and 28% depending on sample size and variance component), confidence intervals of VI, VS and VR using all data were within those obtained when singletons were excluded. This reveals that power to detect individual variation in linear reaction norm components is consistently improved, at least across the scenarios simulated, by inclusion of individuals that were only observed once. Accordingly, we suggest that the common practice of excluding individuals only observed once or a few times in random regression analyses is largely unjustified (although see Discussion for important caveats).

Fig. 4. Comparison of the precision of random regression estimates depending on whether individuals with a single observation only were included or excluded from analyses. Panels (a) & (b) show simulation results for VI and panels (c) & (d) show results for VS. Estimates and their 95% confidence intervals are plotted, obtained either with (full symbols) or without (open symbols) individuals with a single measure. Circles represent simulations where VI = 0·4; VS = 0·1; VR = 0·6 and squares are for VI = 0·2; VS = 0·1; VR = 0·8. Panels (a) & (c) show simulations with rIS = 0, while panels (b) & (d) show simulations with rIS = 0·5. Small sample size represents 30 individuals with four replicates and 20 with only one value, and large sample size was simulated with 300 individuals with four replicates and 200 with only one.

Discussion

Our simulations confirm that random regression models to explore variation in plasticity (I × E) are inherently data hungry. However, a number of other general, and potentially useful, conclusions emerge from our three realistic simulation scenarios. In the following sections, we summarize these insights and suggest potentially useful rules of thumb for the empiricist. We then highlight some of the questions that remain open and unresolved for researchers interested in applying random regression models to studies of phenotypic plasticity.

What is the best sampling strategy?

Although small sample sizes (around 100 observations) appear adequate to detect biologically realistic values of VI, variance in reaction norm slopes is clearly much harder to detect. Furthermore, it is also clear that the allocation of sampling effort (i.e. more individuals vs. more observations per individual) is important in determining power to detect VI and I × E. Extreme combinations of individuals/observations per individual generally had lower power, while, in our simulations at least, the number of individuals to observations per individual ratio that offered the highest power was around 0·5. This result suggests a potentially useful rule of thumb for designing a sampling strategy when the total sample size is limited and confirms the approach already used in different studies. For example, for a total sample size limited to 200, a researcher should aim for around 10 individuals each with 20 observations. However, this might be an unachievable target because of logistical, ecological or ethical questions. In such cases, our simulations suggest that one should target the individual/observations per individual ratio closest to 0·5 that can be attained. With sample size greater than 200 observations, ratios >0·5 (i.e. more individuals with fewer observations each) should be preferred because they provide more power to detect both VI and I × E than ratios smaller than 0·5. This finding is encouraging because it reflects the data structure of most biological data sets collated in the field.
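As a rough planning aid, the rule of thumb above can be turned into a small search over the ways of splitting a fixed total sample size. The sketch below is only an illustration: the function names are hypothetical, only exact splits (every individual measured the same number of times) are enumerated, and "closest" is measured as the absolute difference from the target ratio, which is an assumption rather than anything prescribed by the simulations.

```python
def allocations(total_n):
    """All (individuals, observations per individual) pairs that split total_n exactly."""
    return [(k, total_n // k) for k in range(1, total_n + 1) if total_n % k == 0]

def closest_to_ratio(total_n, target=0.5, max_obs=None):
    """Feasible allocation whose individuals/observations ratio is nearest the target.

    max_obs is an optional cap on observations per individual, standing in for
    logistical, ecological or ethical limits on repeated sampling.
    """
    feasible = [(k, m) for k, m in allocations(total_n)
                if max_obs is None or m <= max_obs]
    return min(feasible, key=lambda km: abs(km[0] / km[1] - target))

# Unconstrained case for N = 200, then a case where at most 10 measures are possible
print(closest_to_ratio(200))              # (10, 20)
print(closest_to_ratio(200, max_obs=10))  # (20, 10)
```

With a cap of 10 observations per individual, no exact split of 200 observations reaches a ratio of 0·5, so the search returns the nearest attainable design.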

How should we interpret non‐significant I × E?

Results from our simulations based on two wild bird studies suggest that power to detect I × E was reasonably high in these scenarios. In our simulated scenarios with the EAMM function, estimated power to detect I × E at VS values of 0·04 (‘Guillemot scenario’) and 0·06 (‘great tit scenario’) was high even when rIS = 0. As noted earlier, setting rIS = 0 represents a worst‐case scenario in the sense that statistical power to detect I × E appears to increase with the magnitude of this correlation. These results are encouraging, not least because few field studies enjoy data sets as large as the two studies on which our simulated scenarios were based.

More generally, we do not suggest that this type of analysis is always necessary if the null hypothesis of no variation in plasticity (i.e. VS = 0) cannot be rejected. In most cases, the uncertainty around an estimate of VS can (and should) be assessed using estimated standard errors or confidence intervals. It should be noted that asymmetric confidence intervals can complicate the interpretation of standard errors (Bates, unpublished data) such that profile‐likelihood‐based confidence intervals are probably preferable. It is crucial that, wherever possible, researchers report confidence intervals for all (co)variance terms and identify whether matrices have been constrained, and it is always necessary to avoid the trap of equating lack of statistical significance with an effect size of zero. However, as highlighted earlier, there are occasions when variance components are constrained to the boundaries of allowable space and standard errors cannot be estimated. In such cases, or when precision has been inadequately reported in the literature, it is well worth applying a post hoc analysis of the kind demonstrated here. Nevertheless, it is important to recognize that the power estimations made in this way are conditioned on the exact parameter values used for the simulations, not the exact parameter values of the motivating studies themselves (which are obviously unknown and therefore estimated with uncertainty).

Should we censor individuals?

Despite the common practice of excluding data from individuals observed only once (or sometimes twice; Table S1), our simulations suggest that this practice has no benefit and reduces statistical power. Thus, including individuals with a single value reduced the confidence intervals around variance component estimates in all simulated cases. This result indicates that there is no need to discard hard won data when using random regression models to analyse plasticity and that all observations contribute usefully to parameter estimation.

It is important to note that in our simulations individuals with only a single observation were in other respects no different from the individuals with higher numbers of observations. In real‐world field studies, however, individuals with more observations may comprise a non‐random sample for the trait of interest. For example, animal personality could induce a bias in sampling if shy (weakly explorative) individuals are sampled less frequently than bold (highly explorative) ones (Biro & Dingemanse 2009). There is also the very real possibility that repeated captures and tests could themselves modify a behaviour being observed. Finally, it is worth noting that some individuals may have more records simply because they lived longer and that mortality may not be random with respect to the trait of interest. We have not simulated such effects here but note that if present then data censoring of individuals with few records may produce a sample that yields an invalid and incomplete biological representation of the population. In such cases, parameter estimates from random regression analyses may be biased (Hadfield 2008) and additional steps (e.g. the ‘within‐individual’ centring method of van de Pol & Wright (2009)) might be useful to ensure accurate partitioning of within‐individual and between‐individual processes.

Further considerations

Homogeneous vs. heterogeneous residual variances

In our simulations, we have made the common assumption that residual variance is homogeneous across X. However, it is worth noting that heterogeneous residual variance structures can often provide a better fit to empirical data in wild animal populations (Wilson et al. 2007; Brommer, Rattiste, & Wilson 2008). For instance, residual variance for a trait may be found to increase with age (e.g. Wilson et al. 2007) or an environmental covariate such as temperature (Brommer, Rattiste, & Wilson 2008). While we have not presented results using heterogeneous error structures here, functions in the ‘pamm’ package do have the capability to simulate such data. We hope that this will facilitate further exploration of impacts on accuracy and power that may arise from mis‐specification of residual variances. It should be noted, however, that not all packages for mixed model analyses have the capacity to fit the more complex error structures required if residual variance is not homogeneous.

Distribution of observations along the X axis

In the simulations presented earlier we assumed that phenotypic observations were made at values of X that were drawn from a normal distribution. In scenarios 1 and 3, these values were also independent among observations (both within and between individuals), while in scenario 2 we assumed they were common to all observations made in a given year. It is likely that in many, and perhaps most, real‐world situations the environmental covariate will be correlated among observations that are grouped in time or space (and such clustering is likely for repeated observations on an individual). Furthermore, X may sometimes have a more uniform distribution (e.g. for studies of growth with size measured at regular, equally spaced time points). Our expectation is that the distribution of X will have an impact on the power of random regression analyses. For example, if X is both measured annually and correlated between observations in successive years (a common occurrence), then we might expect a decrease in power to detect I × E because of reduced within‐individual variance in X. In the great tit and guillemot scenarios, all observations within any given year had the same value of X, and while our simulations indicated there should be sufficient power to detect I × E over the range of effect sizes tested, we did not simulate any temporal correlation of X among years that may also exist. Thus, we certainly have not explored all possible scenarios in the results presented, and we suggest that thought be given to the appropriate distribution of X when using simulations to aid experimental design. Accordingly, functions in the ‘pamm’ package have been written with the capability to simulate phenotypic data using different distributions of X (including non‐independence; see Appendix S1).
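To make the role of the X distribution concrete, the following sketch generates phenotypic data under a linear reaction norm with either normally or uniformly distributed X. It does not reproduce the ‘pamm’ functions: the function name, its arguments, the zero population mean intercept and slope, and the unit‐variance scaling of the uniform case are all assumptions made for illustration.

```python
import numpy as np

def simulate_reaction_norms(obs_per_ind, v_i, v_s, v_r, r_is,
                            x_dist="normal", seed=0):
    """Simulate y = I_i + S_i * x + e for individuals with given observation counts."""
    rng = np.random.default_rng(seed)
    obs_per_ind = np.asarray(obs_per_ind, dtype=int)
    n_ind = obs_per_ind.size
    cov_is = r_is * np.sqrt(v_i * v_s)            # intercept-slope covariance
    g = np.array([[v_i, cov_is], [cov_is, v_s]])  # among-individual (co)variance
    effects = rng.multivariate_normal(np.zeros(2), g, size=n_ind)
    ids = np.repeat(np.arange(n_ind), obs_per_ind)
    n = ids.size
    if x_dist == "normal":
        x = rng.normal(0.0, 1.0, size=n)
    elif x_dist == "uniform":
        half_width = np.sqrt(3.0)                 # gives X a variance of 1
        x = rng.uniform(-half_width, half_width, size=n)
    else:
        raise ValueError("x_dist must be 'normal' or 'uniform'")
    residual = rng.normal(0.0, np.sqrt(v_r), size=n)
    y = effects[ids, 0] + effects[ids, 1] * x + residual
    return ids, x, y

# Small-sample design from scenario 3: 30 individuals with four observations
# and 20 individuals with a single observation (140 observations in total)
counts = np.concatenate([np.full(30, 4), np.full(20, 1)])
ids, x, y = simulate_reaction_norms(counts, 0.2, 0.1, 0.8, 0.5)
ids_u, x_u, y_u = simulate_reaction_norms(counts, 0.2, 0.1, 0.8, 0.5,
                                          x_dist="uniform")
print(ids.size, x_u.min(), x_u.max())
```

Here X is drawn independently for every observation; temporal or spatial correlation in X, as discussed above, would need an additional step that this sketch leaves out.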

How can we compare estimates of repeatability and I × E across studies?

A further question that we have not directly addressed here concerns the plausible range of VS values over which to carry out simulations for power analysis. In the absence of pilot data, educated guesses can be informed from relevant literature, although the magnitude of phenotypic variance components necessarily depends on the scaling of the trait (Y). For this reason, it is common practice to standardize variance components, for example by scaling by the phenotypic mean (Houle 1998) or by expressing them as a proportion of the total phenotypic variance. In the simulations described above, we have effectively taken the second option by scaling the phenotypic variance at X = 0 to 1. With unit variance and at X = 0, VI is equal to the repeatability of the trait (R), which is a dimensionless quantity used in behavioural and evolutionary ecology (Goldstein, Browne, & Rasbash 2002). For example, studies of animal personality or temperament often use trait repeatability as a measure of the importance of personality in a given system (Réale et al. 2007; Dingemanse et al. 2010).

Unfortunately, the variance in reaction norm slopes (VS) cannot be directly compared in size to VI because it is in units of Y² X⁻² as opposed to Y². Furthermore, if VS is non‐zero, then the among‐individual variance itself will change as a function of X and, as a consequence, so will the trait repeatability. Thus, an important implication of the existence of I × E is that between‐individual variation changes as a function of the environment (see Fig. S1). Consequently, all else being equal, estimates of repeatability for a given trait will differ across environments. Finally, because I × E is dependent on both VS and cov(I,S), the intercept–slope covariance should also be reported and standardized to a correlation.

To better allow comparison of the amount of variance in plasticity across studies, one option is to rescale the environmental parameter (X) so that it too is dimensionless (i.e. transform to standard deviation units). VS can then be expressed as a proportion of observed phenotypic variance across all data. This approach was advocated in Nussey, Wilson, & Brommer (2007) and used to suggest that across a handful of studies employing a random regression approach, the amount of phenotypic variance attributable to variation in reaction norm slopes was consistently around 5% (Nussey, Wilson, & Brommer 2007). However, it is also important to recognize that parameter estimates will depend not only on the units of X as highlighted earlier, but also on whether it is zero‐centred (i.e. mean or median X set to 0; see Table 1). VI and the sign as well as the magnitude of rIS are altered by location (i.e. the choice of where zero is for X; see Table 1 for an example), highlighting the fact that a naïve biological interpretation of these terms has enormous potential to mislead.
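The dependence of VI and rIS on the location of zero follows from the linear reaction norm itself: if X is re‐expressed as X − c, the new intercept of an individual is its old value at X = c, that is, intercept plus c times slope. The sketch below applies the resulting variance and covariance identities; the function name and the parameter values are hypothetical and are not taken from Table 1.

```python
def shift_origin(v_i, v_s, r_is, c):
    """(Co)variance components after redefining X' = X - c.

    The new intercept is I + c * S, so
      Var(I + c*S)    = v_i + 2*c*cov + c**2 * v_s
      Cov(I + c*S, S) = cov + c * v_s
    while the slope variance v_s is unchanged.
    """
    cov_is = r_is * (v_i * v_s) ** 0.5
    v_i_new = v_i + 2.0 * c * cov_is + c ** 2 * v_s
    cov_new = cov_is + c * v_s
    r_new = cov_new / (v_i_new * v_s) ** 0.5
    return v_i_new, v_s, r_new

# Same biological system, three choices of where X = 0 is placed
for c in (-2.0, 0.0, 2.0):
    v_i_new, v_s_new, r_new = shift_origin(0.25, 0.04, 0.0, c)
    print(f"c={c:+.1f}: VI={v_i_new:.2f}, VS={v_s_new:.2f}, rIS={r_new:+.2f}")
```

In this example an intercept–slope correlation of zero at the original origin becomes clearly positive or negative once the origin is moved two units in either direction, even though nothing about the underlying reaction norms has changed.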

Another way to express the information in a way that facilitates cross‐study comparison of I × E would be to generate environment‐specific estimates of the repeatability (or ‘variance partition coefficient’, τ) over the observed range of the environmental covariate X (Goldstein, Browne, & Rasbash 2002). The environment‐specific repeatability is readily interpreted as the proportion of total variance attributable to among‐individual variation when measuring all individuals under the same specific environmental conditions and ranges between 0 and 1. Under a linear reaction norm model with non‐zero VS (and homogeneous VR), τ follows a quadratic function of environment (X) (Goldstein, Browne, & Rasbash 2002; see Fig. S2). Assuming a constant residual variance (VR) with X, then the phenotypic variance (VP) in environment X is:
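\[ V_P(X) = V_I + 2X\,\mathrm{cov}(I,S) + X^{2} V_S + V_R \]
and the variance partition coefficient for any value of X:
\[ \tau_X = \frac{V_I + 2X\,\mathrm{cov}(I,S) + X^{2} V_S}{V_I + 2X\,\mathrm{cov}(I,S) + X^{2} V_S + V_R} \]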
When VS = 0 or X = 0, τ is equivalent to the classical estimate of repeatability, R = VI × (VI + VR)⁻¹. This approach would allow authors to report a range of environment‐specific repeatabilities (e.g. τ at X = 0, as well as perhaps the minimum and maximum values of τ and the values of X at which these occur). In conjunction with the estimates of I, VR, and a description of the environmental variation (e.g. mean, variance and range of X) this information would facilitate more meaningful comparison of levels of I × E across studies. Reporting confidence intervals for values of τ across the environment range is also an important though not a trivial exercise. Approximate standard errors of τ across the environment can be computed analytically for REML‐based estimates of Ψ (Fischer, Gilmour, & Werf 2004). However, because distributions of variance component estimates are often asymmetric (Bates, unpublished data), confidence intervals derived from estimated standard errors should be interpreted with some caution. Alternatively, if the random regression model is fitted using a Bayesian approach, then it would be straightforward to determine credible intervals from the posterior distributions of τ (however, it should be noted that the posterior will contain no useful information about the plausibility of VS = 0, if for example the prior for VS has no support at VS = 0 or if the model fitting constrains VS to be non‐negative).
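The reporting suggested here (τ at X = 0, plus its minimum and maximum and where they occur) can be sketched numerically. The code below assumes that the among‐individual variance at X takes the standard linear reaction‐norm form VI + 2X cov(I,S) + X² VS with constant VR; the function name, the grid of X values and the choice of parameter set are illustrative assumptions.

```python
import numpy as np

def tau(x, v_i, v_s, v_r, r_is=0.0):
    """Environment-specific repeatability under a linear reaction norm."""
    x = np.asarray(x, dtype=float)
    cov_is = r_is * np.sqrt(v_i * v_s)
    among = v_i + 2.0 * x * cov_is + x ** 2 * v_s   # among-individual variance at X
    return among / (among + v_r)

# One of the simulated (co)variance structures, evaluated over X from -2 to 2
x_grid = np.linspace(-2.0, 2.0, 9)
t = tau(x_grid, v_i=0.4, v_s=0.1, v_r=0.6, r_is=0.5)

print("tau at X = 0:", float(tau(0.0, 0.4, 0.1, 0.6, 0.5)))
print("minimum tau:", t.min(), "at X =", x_grid[t.argmin()])
print("maximum tau:", t.max(), "at X =", x_grid[t.argmax()])
```

With a positive intercept–slope correlation the minimum repeatability falls at a negative value of X rather than at X = 0, which is one reason to report where the extremes occur and not only their values.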

By way of example, in our simulations using the EAMM function, when rIS = 0, VI = 0·25 and VR = 0·75, we had high power to detect VS at values as low as 0·04 (‘Guillemot scenario’). Although the absolute values of VS here appear encouragingly small, under these conditions the repeatability of trait Y will actually change substantially over X (τx = 0·25–0·42 for the ‘guillemot’ case, where we assumed X to range between −2·7 and 2·7, with a mean of 0 and variance of 1, see Fig. S2). Thus, the apparently small value of VS that can be detected translates into a level of I × E that is substantial when its implications for repeatability are recognized.
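The quoted range can be checked by hand, assuming that the among‐individual variance at X is VI + X² VS when rIS = 0 and that VR is constant. Taking VI = 0·25, VS = 0·04 and VR = 0·75 at the centre and at the extremes of X:

```latex
\[
\tau_{X=0} = \frac{V_I}{V_I + V_R} = \frac{0.25}{0.25 + 0.75} = 0.25
\]
\[
\tau_{X=\pm 2.7} = \frac{V_I + X^{2} V_S}{V_I + X^{2} V_S + V_R}
= \frac{0.25 + 7.29 \times 0.04}{0.25 + 7.29 \times 0.04 + 0.75}
= \frac{0.5416}{1.2916} \approx 0.42
\]
```

At the extremes of X, the slope term (0·2916) contributes slightly more among‐individual variance than VI itself, which is why a numerically small VS has a large effect on repeatability.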

Conclusions

In summary, random regression models offer a useful approach for studies of individual plasticity, but sampling strategies must be carefully designed to ensure that a data set affords sufficient statistical power to detect I × E at biologically plausible levels. The general advice outlined earlier, coupled with the simulation tools provided, should be of use to empiricists interested in employing this method. We would emphasize that large sample sizes are required for detecting I × E and that an intermediate ratio of the number of individuals to the number of observations per individual appears to consistently maximize power. We also note that individuals (or groups) with one or a small number of repeated measures should not be dropped from random regressions. Estimating power represents an important part of the design stage of any experimental study. However, once a study has been undertaken, the easiest means to evaluate power is through reported uncertainty and error terms around parameter estimates. Thus, we urge researchers to report confidence intervals (or equivalents) for all (co)variance components and to highlight any a priori constraints placed on model parameters when publishing their results. Finally, we suggest a means of standardizing and comparing patterns of variation in reaction norms across studies and urge authors to use this approach for reporting these values in the future.

Acknowledgements

NSERC provided financial support to JM through a PhD scholarship and to DR through a discovery grant. DHN is supported by a Natural Environment Research Council (NERC) post‐doctoral fellowship and AJW by a BBSRC David Phillips fellowship. We thank F. Janzen and one anonymous reviewer for helpful comments on earlier drafts of the manuscript and are particularly grateful to D. Elston for his detailed and constructive criticisms that have led to major improvements in this work.

      • A Practical Protocol for the Experimental Design of Comparative Studies on Water Treatment, Water, 10.3390/w11010162, 11, 1, (162), (2019).
      • Top‐down control by an aquatic invertebrate predator increases with temperature but does not depend on individual behavioral type, Ecology and Evolution, 10.1002/ece3.4367, 8, 16, (8256-8265), (2018).
      • The repeatability of cognitive performance: a meta-analysis, Philosophical Transactions of the Royal Society B: Biological Sciences, 10.1098/rstb.2017.0281, 373, 1756, (20170281), (2018).
      • Individual variation in reproductive behaviour is linked to temporal heterogeneity in predation risk, Proceedings of the Royal Society B: Biological Sciences, 10.1098/rspb.2017.1499, 285, 1870, (20171499), (2018).
      • Variation and Evolution of Function-Valued Traits, Annual Review of Ecology, Evolution, and Systematics, 10.1146/annurev-ecolsys-110316-022830, 49, 1, (139-164), (2018).
      • Individual vigilance profiles in flocks of House Sparrows ( Passer domesticus ) , Canadian Journal of Zoology, 10.1139/cjz-2017-0301, 96, 9, (1016-1023), (2018).
      • Morphological homeostasis in the fossil record, Seminars in Cell & Developmental Biology, 10.1016/j.semcdb.2018.05.016, (2018).
      • Complex dynamics and the development of behavioural individuality, Animal Behaviour, 10.1016/j.anbehav.2018.02.015, 138, (e1-e6), (2018).
      • Habituation and individual variation in the endocrine stress response in the Trinidadian guppy (Poecilia reticulata), General and Comparative Endocrinology, 10.1016/j.ygcen.2018.10.013, (2018).
      • Consistent individual differences and population plasticity in network-derived sociality: An experimental manipulation of density in a gregarious ungulate, PLOS ONE, 10.1371/journal.pone.0193425, 13, 3, (e0193425), (2018).
      • Circadian Rhythms of Urinary Cortisol Levels Vary Between Individuals in Wild Male Chimpanzees: A Reaction Norm Approach, Frontiers in Ecology and Evolution, 10.3389/fevo.2018.00085, 6, (2018).
      • Applying the framework and concepts of parasitology to avian brood parasitism: a comment on Avilés, Behavioral Ecology, 10.1093/beheco/arx178, 29, 3, (520-521), (2017).
      • Methods for detecting and quantifying individual specialisation in movement and foraging strategies of marine predators, Marine Ecology Progress Series, 10.3354/meps12215, 578, (151-166), (2017).
      • Adult wheel access interaction with activity and boldness personality in Siberian dwarf hamsters ( Phodopus sungorus ), Behavioural Processes, 10.1016/j.beproc.2017.02.021, 138, (82-90), (2017).
      • Lifelong effects of trapping experience lead to age-biased sampling: lessons from a wild bird population, Animal Behaviour, 10.1016/j.anbehav.2017.06.018, 130, (133-139), (2017).
      • Body size predicts between-individual differences in exploration behaviour in the southern corroboree frog, Animal Behaviour, 10.1016/j.anbehav.2017.05.013, 129, (161-170), (2017).
      • Of Uberfleas and Krakens: Detecting Trade-offs Using Mixed Models, Integrative and Comparative Biology, 10.1093/icb/icx015, 57, 2, (362-371), (2017).
      • Corticosterone regulation in house sparrows invading Senegal, General and Comparative Endocrinology, 10.1016/j.ygcen.2017.05.018, 250, (15-20), (2017).
      • Foraging sparrows exhibit individual differences but not a syndrome when responding to multiple kinds of novelty, Behavioral Ecology, 10.1093/beheco/arx014, 28, 3, (732-743), (2017).
      • Synergistic effect of daily temperature fluctuations and matching light-dark cycle enhances population growth and synchronizes oviposition behavior in a soil arthropod, Journal of Insect Physiology, 10.1016/j.jinsphys.2016.10.002, 96, (108-114), (2017).
      • Long-term fitness consequences of early environment in a long-lived ungulate, Proceedings of the Royal Society B: Biological Sciences, 10.1098/rspb.2017.0222, 284, 1853, (20170222), (2017).
      • Avoiding the misuse of BLUP in behavioural ecology, Behavioral Ecology, 10.1093/beheco/arx023, 28, 4, (948-952), (2017).
      • Statistical Quantification of Individual Differences (SQuID): an educational and statistical tool for understanding multilevel phenotypic data in linear mixed models, Methods in Ecology and Evolution, 10.1111/2041-210X.12659, 8, 2, (257-267), (2016).
      • Two types of dominant male cichlid fish: behavioral and hormonal characteristics, Biology Open, 10.1242/bio.017640, 5, 8, (1061-1071), (2016).
      • A simple statistical guide for the analysis of behaviour when data are constrained due to practical or ethical reasons, Animal Behaviour, 10.1016/j.anbehav.2015.11.009, 120, (223-234), (2016).
      • Employing individual measures of baseline glucocorticoids as population-level conservation biomarkers: considering within-individual variation in a breeding passerine, Conservation Physiology, 10.1093/conphys/cow048, 4, 1, (cow048), (2016).
      • Effective field-based methods to quantify personality in brushtail possums (Trichosurus vulpecula), Wildlife Research, 10.1071/WR15216, 43, 4, (332), (2016).
      • Rate of movement of juvenile lemon sharks in a novel open field, are we measuring activity or reaction to novelty?, Animal Behaviour, 10.1016/j.anbehav.2016.03.032, 116, (75-82), (2016).
      • Male experience buffers female laying date plasticity in a winter-breeding, food-storing passerine, Animal Behaviour, 10.1016/j.anbehav.2016.08.014, 121, (61-70), (2016).
      • Variable Signals in a Complex World, , 10.1016/bs.asb.2016.02.002, (319-386), (2016).
      • Glucocorticoid-Mediated Phenotypes in Vertebrates, , 10.1016/bs.asb.2016.01.002, (41-115), (2016).
      • Repeatability of locomotor performance and morphology–locomotor performance relationships, The Journal of Experimental Biology, 10.1242/jeb.141259, 219, 18, (2888-2897), (2016).
      • Interactions between boldness, foraging performance and behavioural plasticity across social contexts, Behavioral Ecology and Sociobiology, 10.1007/s00265-016-2193-0, 70, 11, (1879-1889), (2016).
      • Telomere length covaries with personality in wild brown trout, Physiology & Behavior, 10.1016/j.physbeh.2016.07.005, 165, (217-222), (2016).
      • Context dependency of trait repeatability and its relevance for management and conservation of fish populations, Conservation Physiology, 10.1093/conphys/cow007, 4, 1, (cow007), (2016).
      • Endocrine Flexibility: Optimizing Phenotypes in a Dynamic World?, Trends in Ecology & Evolution, 10.1016/j.tree.2016.03.005, 31, 6, (476-488), (2016).
      • SIMR: an R package for power analysis of generalized linear mixed models by simulation, Methods in Ecology and Evolution, 10.1111/2041-210X.12504, 7, 4, (493-498), (2016).
      • Demystifying animal ‘personality’ (or not): why individual variation matters to experimental biologists, The Journal of Experimental Biology, 10.1242/jeb.146712, 219, 24, (3832-3843), (2016).
      • When Push Comes to Shove: Compensating and Opportunistic Strategies in a Collective-Risk Household Energy Dilemma, Frontiers in Energy Research, 10.3389/fenrg.2016.00008, 4, (2016).
      • Exploration is dependent on reproductive state, not social state, in a cooperatively breeding bird, Behavioral Ecology, 10.1093/beheco/arw119, (arw119), (2016).
      • Using repeatability to study physiological and behavioural traits: ignore time-related change at your peril, Animal Behaviour, 10.1016/j.anbehav.2015.04.008, 105, (223-230), (2015).
      • Predictors of Individual Variation in Movement in a Natural Population of Threespine Stickleback (Gasterosteus aculeatus), Trait-Based Ecology - From Structure to Function, 10.1016/bs.aecr.2015.01.004, (65-90), (2015).
      • Dynamics of among-individual behavioral variation over adult lifespan in a wild insect, Behavioral Ecology, 10.1093/beheco/arv048, 26, 4, (975-985), (2015).
      • Personality does not constrain social and behavioural flexibility in African striped mice, Behavioral Ecology and Sociobiology, 10.1007/s00265-015-1937-6, 69, 8, (1237-1249), (2015).
      • Among-year variation in the repeatability, within- and between-individual, and phenotypic correlations of behaviors in a natural population, Behavioral Ecology and Sociobiology, 10.1007/s00265-015-2012-z, 69, 12, (2005-2017), (2015).
      • Sand lizard (Lacerta agilis) phenology in a warming world, BMC Evolutionary Biology, 10.1186/s12862-015-0476-0, 15, 1, (2015).
      • Changes in wild red squirrel personality across ontogeny: activity and aggression regress towards the mean, Behaviour, 10.1163/1568539X-00003279, 152, 10, (1291-1306), (2015).
      • An approach to estimate short‐term, long‐term and reaction norm repeatability, Methods in Ecology and Evolution, 10.1111/2041-210X.12430, 6, 12, (1462-1473), (2015).
      • A practical guide and power analysis for GLMMs: detecting among treatment variation in random effects, PeerJ, 10.7717/peerj.1226, 3, (e1226), (2015).
      • Epidemiology and Heritability of Major Depressive Disorder, Stratified by Age of Onset, Sex, and Illness Course in Generation Scotland: Scottish Family Health Study (GS:SFHS), PLOS ONE, 10.1371/journal.pone.0142197, 10, 11, (e0142197), (2015).
      • Conspicuous Female Ornamentation and Tests of Male Mate Preference in Threespine Sticklebacks (Gasterosteus aculeatus), PLOS ONE, 10.1371/journal.pone.0120723, 10, 3, (e0120723), (2015).
      • Does metabolic rate predict risk‐taking behaviour? A field experiment in a wild passerine bird, Functional Ecology, 10.1111/1365-2435.12318, 29, 2, (239-249), (2014).
      • Power analysis for generalized linear mixed models in ecology and evolution, Methods in Ecology and Evolution, 10.1111/2041-210X.12306, 6, 2, (133-142), (2014).
      • Physiological flexibility in an avian range expansion, General and Comparative Endocrinology, 10.1016/j.ygcen.2014.07.016, 206, (227-234), (2014).
      • Born to win? Maybe, but perhaps only against inferior competition, Animal Behaviour, 10.1016/j.anbehav.2014.07.024, 96, (e1-e3), (2014).
      • Individual variation in thermal performance curves: swimming burst speed and jumping endurance in wild-caught tropical clawed frogs, Oecologia, 10.1007/s00442-014-2925-7, 175, 2, (471-480), (2014).
      • The evolution of flexible parenting, Science, 10.1126/science.1253294, 345, 6198, (776-781), (2014).
      • Within-population differences in personality and plasticity in the trade-off between vigilance and foraging in kangaroos, Animal Behaviour, 10.1016/j.anbehav.2014.04.003, 92, (175-184), (2014).
      • Baseline and stress-induced glucocorticoid concentrations are not repeatable but covary within individual great tits (Parus major), General and Comparative Endocrinology, 10.1016/j.ygcen.2014.08.014, 208, (154-163), (2014).
      • Individual and sex‐specific differences in intrinsic growth rate covary with consistent individual differences in behaviour, Journal of Animal Ecology, 10.1111/1365-2656.12210, 83, 5, (1186-1195), (2014).
      • Heritable, Heterogeneous, and Costly Resistance of Sheep against Nematodes and Potential Feedbacks to Epidemiological Dynamics, The American Naturalist, 10.1086/676929, 184, S1, (S58-S76), (2014).
      • Natural Selection on Individual Variation in Tolerance of Gastrointestinal Nematode Infection, PLoS Biology, 10.1371/journal.pbio.1001917, 12, 7, (e1001917), (2014).
      • Reaction Norms in Natural Conditions: How Does Metabolic Performance Respond to Weather Variations in a Small Endotherm Facing Cold Environments?, PLoS ONE, 10.1371/journal.pone.0113617, 9, 11, (e113617), (2014).