Statistical Quantification of Individual Differences (SQuID): an educational and statistical tool for understanding multilevel phenotypic data in linear mixed models
Summary
 Phenotypic variation exists at all levels of biological organization: among species, among individuals within populations and, in the case of labile traits, within individuals. Mixed-effects models are ideal tools for quantifying multilevel measurements of traits and are increasingly used in evolutionary ecology.
 Mixed-effects models are relatively complex, and two main issues may be hampering their proper usage: (i) the relatively few educational resources available to teach new users how to implement and interpret them and (ii) the lack of tools to ensure that the statistical parameters of interest are correctly estimated.
 In this paper, we introduce Statistical Quantification of Individual Differences (SQuID), a simulation-based tool that can be used for research and educational purposes. SQuID creates a virtual world inhabited by subjects whose phenotypes are generated by a user-defined phenotypic equation, which allows easy translation of biological hypotheses into quantifiable parameters.
 Statistical Quantification of Individual Differences currently models normally distributed traits with linear predictors, but SQuID is subject to further development and will adapt to handle more complex scenarios in the future. The current framework is suitable for performing simulation studies, determining optimal sampling designs for user-specific biological problems and making simulation-based inferences to aid in the interpretation of empirical studies.
 Statistical Quantification of Individual Differences is also a teaching tool for biologists interested in learning, or teaching others, how to implement and interpret linear mixed-effects models when studying the processes causing phenotypic variation. Interface-based modules allow users to learn about these issues. As research on effects of sampling designs continues, new issues will be implemented in new modules, including non-linear and non-Gaussian data.
Introduction
Variation is the most striking feature of the natural world (Hallgrímsson & Hall 2005). However, we do not always appreciate that phenotypic variation is caused by processes occurring at multiple hierarchical levels (Wilson 1998; Nussey, Wilson & Brommer 2007; Williams 2008; Westneat, Wright & Dingemanse 2015). Phenotypes vary across species, across populations of the same species, across individuals of the same population and across repeated observations of the same individual. One of the most important biological levels for both ecological and evolutionary processes is the individual, with variation among individuals influencing social interactions (Dingemanse & Araya-Ajoy 2015), population demography (Dochtermann & Gienger 2012), community structure (Bolnick et al. 2011) and evolutionary dynamics (Dall et al. 2012). Moreover, individuals express traits repeatedly and these expressions are known to vary distinctly within and among individuals (e.g. behaviour, life history, morphology, physiology).
By sampling traits repeatedly for a given set of individuals, we can estimate how changeable or stable individuals are. This assessment can be achieved by applying statistical approaches aimed at partitioning phenotypic variation into within- and among-individual variance components (e.g. Nussey, Wilson & Brommer 2007; Nakagawa & Schielzeth 2010; Dingemanse & Dochtermann 2013). With this and other information in hand, we can start examining the genetic and environmental factors contributing to the variation observed at each hierarchical level (Wilson et al. 2010; Schielzeth & Nakagawa 2013). Doing so, however, requires specific sampling schemes and statistical tools to ensure that statistical parameters of interest are estimated accurately, precisely and with sufficient statistical power (Martin et al. 2011; van de Pol 2012; Dingemanse & Dochtermann 2013; Araya-Ajoy, Mathot & Dingemanse 2015; Johnson et al. 2015; Kain, Bolker & McCoy 2015; Green & MacLeod 2016).
The mixed-effects modelling framework has become a particularly popular statistical tool to achieve such aims, especially in the field of evolutionary ecology (Kruuk 2004; Bolker et al. 2009; O'Hara 2009; van de Pol & Wright 2009; Nakagawa & Schielzeth 2010, 2013; Wilson et al. 2010; Dingemanse & Dochtermann 2013). This is because mixed-effects models explicitly model stratification in the data. One of the strengths of mixed-effects models is that they evaluate the importance of fixed effects while simultaneously estimating the relative magnitudes of random effects. Mixed-effects models, which include linear and generalized linear mixed models (Bolker et al. 2009), are often underutilized. One reason may be that they are not usually covered in introductory or mid-level statistical courses, despite being more challenging to learn and more difficult to use correctly than traditional approaches (Bolker et al. 2009; van de Pol & Wright 2009; Nakagawa & Schielzeth 2013). We therefore require tools that teach people how to appropriately analyse the complex data inherent to mixed-effects modelling.
With the complexity of mixed-effects models comes the difficulty of assessing whether statistical parameters of interest are correctly estimated. Several recent papers have addressed this problem by using simulations to evaluate power and accuracy of parameter estimation, particularly for models estimating among- and within-individual variance components (e.g. Martin et al. 2011; Garamszegi & Herczeg 2012; van de Pol 2012; Dingemanse & Dochtermann 2013; Araya-Ajoy, Mathot & Dingemanse 2015; Kain, Bolker & McCoy 2015). Given the many possible biological questions that one may ask (e.g. are individuals repeatable, do they vary in level of plasticity, are phenotypic traits correlated within or among individuals?), we require a flexible simulation environment that enables performance assessment (e.g. power and other sensitivity analyses) for all statistical parameters, both before and after data have been collected. As far as we are aware, there is no widely accessible simulation programme that targets ecologists and evolutionary biologists, and guides them in the best way to both sample and analyse data (as existing packages, detailed below, enable either one or the other rather than both).
In this paper, we introduce SQuID, Statistical Quantification of Individual Differences: an environment for simulating multilevel data. SQuID is an R (v3.3.0) package (R Core Team 2015) that, beyond the usual command-line use of R packages, can also be used through a user interface built with the shiny package (Chang et al. 2016). The latest released version of the SQuID package can be installed from CRAN (url: https://cran.r-project.org/package=squid) by running:

> install.packages("squid")
The latest development version can be installed from GitHub (url: https://github.com/hallegue/squid) by running:

> install.packages("devtools")

> devtools::install_github("hallegue/squid")
The following R code runs the SQuID application (browser-based interface):

> library(squid)

> squidApp()
SQuID serves two main purposes. First, it provides an educational tool useful for students, teachers and researchers who want to learn to use mixed-effects models. Users can learn how the mixed-effects model framework can be used to understand distinct biological phenomena (e.g. the environmental factors generating variation within and among individuals, the hierarchical structure of phenotypes) by interactively exploring simulated multilevel data generated based on phenotypic equations. Secondly, SQuID offers research opportunities to those who are already familiar with mixed-effects models, as SQuID enables the generation of data sets that users may use for a range of simulation-based statistical analyses such as power and sensitivity analyses of highly realistic and complex multilevel data. SQuID thus supports both education and primary research.
We note that while several other R packages (e.g. clusterPower, manque, nlmeU, odprism, pamm, RLRsim, simr) are available to run simulations and to study specific aspects of mixed-effects models (Scheipl, Greven & Kuchenhoff 2008; Martin et al. 2011; van de Pol 2012; Galecki & Burzykowski 2013; Reich & Obeng 2013; Wu 2014; Green & MacLeod 2016), most of them deal with issues related to statistical performance (i.e. power and bias). Because it separates the generation of data into two steps, one generating the world and the second generating the sampled data, SQuID provides more flexibility to manipulate the effects of data sampling on the results of a mixed-effects model analysis. Furthermore, our platform offers a rigorous framework to evaluate some special aspects of the sampling design. We think that the way data are sampled can have profound effects on the estimation of variance components in a mixed-effects model, and this is why, in addition to thinking of SQuID as a simulation tool, we also conceived it as an educational resource.
The anatomy of SQuID
The core of SQuID is a phenotype generator. We introduce SQuID using individuals as the focal entities of interest, though other applications are possible. SQuID generates a virtual world inhabited by individuals whose phenotypes are generated by a user-defined (uni- or bivariate) phenotypic equation (Nussey, Wilson & Brommer 2007) that is explicit on a temporal scale. Time is modelled in discrete steps, the number of which the user can define, but is typically very large to mimic continuous time. As a result, the world contains phenotypic values at all the time points and for all individuals. The SQuID framework therefore allows users to define the rules governing the SQuID world, simulate the environment and the phenotype of the individuals inhabiting it and then collect samples from this world (Fig. 1). Users can proceed to analyse the sampled data, make inferences about the data and try to reconstruct the world, while having the possibility to compare their inferences with the known rules that underlie the created world (Fig. 1).
The Phenotypic Equation
Table 1. Phenotypic equation and summation of variance components^{a}

| Component | Explanation | Variance component^{b} | Remarks |
| --- | --- | --- | --- |
| Fixed effects | | | |
| β_{0} | Population mean | – | – |
| β_{1} | Population-average response to an environmental effect x_{1} (with variance Var(x_{1})) | | In SQuID Var(x_{1}) = 1 |
| β_{2} | Population-average response to an environmental effect x_{2} (with variance Var(x_{2})) | | In SQuID Var(x_{2}) = 1 |
| β_{12} | Population-average interaction response to two environmental effects (x_{1}, x_{2}) | | Since in SQuID Var(x_{1}) = Var(x_{2}) = 1 and x_{1} and x_{2} are independent of each other, the expected variance of the product is Var(x_{1}x_{2}) = 1^{c} |
| Random effects | | | |
| I | Individual-specific deviations (random intercepts) | | In the presence of random slope variation, V_{I} expresses the variance at the point where all covariates are zero. Since all covariates are centred to zero in SQuID, this represents the variance at average values of the covariate(s) |
| S_{1} | Individual-specific response to an environmental effect x_{1} (random slopes) | | In SQuID Var(x_{1}) = 1 and E(x_{1}) = 0, which considerably simplifies the variance component to Var(S_{1}) |
| S_{2} | Individual-specific response to an environmental effect x_{2} (random slopes) | | In SQuID Var(x_{2}) = 1 and E(x_{2}) = 0, which considerably simplifies the variance component to Var(S_{2}) |
| S_{12} | Individual-specific response interaction to two environmental effects (x_{1}, x_{2}) (random slopes) | | In SQuID Var(x_{1}) = Var(x_{2}) = 1, E(x_{1}) = E(x_{2}) = 0 and x_{1} and x_{2} are independent, so the expected variance of the product is Var(x_{1}x_{2}) = 1 and the expected mean of the product is E(x_{1}x_{2}) = 0, which considerably simplifies the variance component to Var(S_{12})^{c} |
| I and S_{1} | Covariance between random intercepts and random slopes in response to an environmental effect x_{1} | | In SQuID E(x_{1}) = 0 and hence the covariance does not contribute to total phenotypic variance^{d} |
| I and S_{2} | Covariance between random intercepts and random slopes in response to an environmental effect x_{2} | | In SQuID E(x_{2}) = 0 and hence the covariance does not contribute to total phenotypic variance^{d} |
| I and S_{12} | Covariance between random intercepts and individual-specific response interaction to two environmental effects (x_{1}, x_{2}) (random slopes) | | In SQuID E(x_{1}) = E(x_{2}) = 0 and the expected mean of the product is E(x_{1}x_{2}) = 0; hence the covariance does not contribute to total phenotypic variance^{d} |
| S_{1} and S_{2} | Covariance between random slopes in response to an environmental effect x_{1} and random slopes in response to an environmental effect x_{2} | | In SQuID E(x_{1}) = E(x_{2}) = 0 and hence the covariance does not contribute to total phenotypic variance^{d} |
| S_{1} and S_{12} | Covariance between random slopes in response to an environmental effect x_{1} and individual-specific response interaction to two environmental effects (x_{1}, x_{2}) | | In SQuID E(x_{1}) = E(x_{2}) = 0 and the expected mean of the product is E(x_{1}x_{2}) = 0; hence the covariance does not contribute to total phenotypic variance^{d} |
| S_{2} and S_{12} | Covariance between random slopes in response to an environmental effect x_{2} and individual-specific response interaction to two environmental effects (x_{1}, x_{2}) | | In SQuID E(x_{1}) = E(x_{2}) = 0 and the expected mean of the product is E(x_{1}x_{2}) = 0; hence the covariance does not contribute to total phenotypic variance^{d} |
| G | Higher-level grouping variance (clusters, groups, families, etc.) | V_{G} = Var(G) | |
| e | Residual^{e} | V_{R} = Var(e) | |
| y | Total phenotypic variance | V_{P} | |
 ^{a} Covariance parameters exist but do not contribute to total phenotypic variance.
 ^{b} Variances as they contribute to the total phenotypic variance. Note that we use V_{x} and Var(x) as alternative notations for the variances, COV_{x,y} and Cov(x,y) as alternative notations for covariances and E(x) for expectations.
 ^{c} We anticipate that the covariance between x_{1} and x_{2} can be set by the user while SQuID evolves, which will affect Var(x_{1}x_{2}) and E(x_{1}x_{2}).
 ^{d} Note the distinction between the covariance term as a potential contributor to the total phenotypic variance and Cov(I, S_{1}) as a covariance between intercepts and slopes, which can be simulated and estimated. Mean centring of the environmental gradients has the advantage that we can interpret the intercept variance as the variance at an average environmental value and the intercept-slope covariance as the location of the minimum of the between-individual variance. With arbitrary scaling of the environmental gradients, the interpretation of the intercept variance will change and the covariance terms will have to appear in the summation of variance components.
 ^{e} We use V_{R} to indicate Var(e) for two reasons. First, e is conventional notation for the deviation of an observation from the values predicted by a statistical model and V_{R} is conventional notation for the residual variance. Secondly, in the SQuID modules we also introduce V_{E}, the variance in phenotype due to environment. V_{e} and V_{E} would mean very different things, so to avoid confusion we adopt V_{R} to indicate residual variance.
In its simplest form, the phenotypic equation reads y_{hi} = (β_{0} + I_{i}) + (β_{1} + S_{1i})x_{1hi} + e_{hi}. Here, a single phenotypic value (y_{hi}), exhibited by individual i at instance h, is modelled as a function of a (user-defined) environmental gradient (x_{1hi} being the measure of that environmental variable x_{1} at instance h for individual i). Each phenotypic expression (y_{hi}) may be described by five distinct elements: (i) the population-mean phenotype in the average environment (β_{0}), (ii) the population-mean slope (β_{1}) to the environmental gradient (x_{1hi}), (iii) the individual's deviation from the population-mean phenotype (I_{i}), (iv) the individual's deviation from the population-mean slope (S_{1i}) in response to environmental gradient x_{1} (see next section) and (v) the instance's deviation from the individual's expected value due to unaccounted effects on the phenotype (residual; e_{hi}). Individual deviations from the population-mean value (intercept, I_{i}) and phenotypic response to the environment (slope, S_{1i}) are determined by a (co)variance matrix, a standard tabulation that holds the among-individual variance in intercepts and slopes (on the diagonals) and their covariances (on the lower off-diagonals). Values of these individual deviations (for both I_{i} and S_{1i}) are generated from a multivariate normal distribution (MNV) with a zero mean and variance/covariance equal to Ω_{IS} [i.e. MNV(0, Ω_{IS})]. More details on the meaning of MNV(0, Ω_{IS}) can be found in the step-by-step full tutorial module available on the SQuID application. Briefly, the covariance matrix holds all the (co)variance components necessary to generate the information associated with relative deviations of individual phenotypes. For instance, in the case of a single trait y with individual differences in intercepts and slopes, each individual deviation I_{i} (from intercept β_{0}) is generated based on the specified variance V_{I}; similarly, each deviation S_{1i} (from slope β_{1}) is generated based on the specified variance V_{S} and covariance Cov_{I,S}.
More complexity can be added by modelling a second environmental gradient (x_{2}), a second trait z that is defined by its own phenotypic equation or a higher-order random effect (see variable G in Table 1) suitable for investigating genetic variance (if G indicates family groups, e.g. Dingemanse et al. 2012), among-population variance (if G indicates populations, e.g. Westneat et al. 2014) or among-taxon variation (if G represents species or above, e.g. Hadfield & Nakagawa 2010; Garamszegi, Marko & Herczeg 2013). Doing so necessitates the specification of trait covariance at each level of replication (detailed further in the different modules of SQuID). The current version of SQuID focuses on linear terms and Gaussian distributions of phenotype and environment, but we plan to develop it further.
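The generating process described in this paragraph can be sketched in a few lines of base R. The sketch is not SQuID's own code: all parameter values and object names (Omega_IS, world and so on) are assumptions chosen for illustration, and the multivariate normal draw uses a Cholesky factor rather than any particular package function.

```r
# Illustrative sketch of a single-trait phenotypic equation with random
# intercepts and slopes. Parameter values are assumptions, not SQuID defaults.
set.seed(1)

n_ind  <- 200   # number of individuals
n_inst <- 50    # instances (records) per individual

beta0 <- 0      # population-mean phenotype in the average environment
beta1 <- 0.5    # population-mean slope
V_I   <- 0.5    # among-individual variance in intercepts
V_S   <- 0.2    # among-individual variance in slopes
C_IS  <- 0.1    # intercept-slope covariance
V_R   <- 0.3    # residual variance

# (Co)variance matrix: variances on the diagonal, covariance off the diagonal
Omega_IS <- matrix(c(V_I,  C_IS,
                     C_IS, V_S), nrow = 2, byrow = TRUE)

# Draw (I_i, S_1i) from a zero-mean multivariate normal with covariance
# Omega_IS by transforming independent standard normal draws
Z   <- matrix(rnorm(n_ind * 2), ncol = 2)
dev <- Z %*% chol(Omega_IS)   # one row per individual
I   <- dev[, 1]
S1  <- dev[, 2]

# Standardised environment (mean 0, variance 1) and residual deviations
id <- rep(seq_len(n_ind), each = n_inst)
x1 <- rnorm(n_ind * n_inst)
e  <- rnorm(n_ind * n_inst, sd = sqrt(V_R))

# y_hi = (beta0 + I_i) + (beta1 + S_1i) * x_1hi + e_hi
y <- (beta0 + I[id]) + (beta1 + S1[id]) * x1 + e

world <- data.frame(individual = id, x1 = x1, y = y)
```

Because the individual deviations are known in this simulated world, any estimate obtained from `world` can be compared directly with `V_I`, `V_S` and `C_IS`.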
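The way each term contributes to the total phenotypic variance follows from standard results for products of independent variables. The derivation below is a reconstruction from the component definitions in Table 1, not a reproduction of the authors' own formulas. It assumes zero-mean random effects that are independent of the environmental variables, uses the index j for the grouping level as an assumption, and writes contributions with Var(), Cov() and E() only.

```latex
% Full phenotypic equation implied by the components listed in Table 1
\begin{align}
y_{hij} &= (\beta_0 + I_i) + (\beta_1 + S_{1i})\,x_{1hij} + (\beta_2 + S_{2i})\,x_{2hij}
          + (\beta_{12} + S_{12i})\,x_{1hij}\,x_{2hij} + G_j + e_{hij} \\
% Contribution of a population-average slope
\operatorname{Var}(\beta_1 x_1) &= \beta_1^{2}\,\operatorname{Var}(x_1) \\
% Contribution of a random slope, with E(S_1) = 0 and S_1 independent of x_1
\operatorname{Var}(S_1 x_1) &= \operatorname{Var}(S_1)\operatorname{Var}(x_1)
          + E(x_1)^{2}\operatorname{Var}(S_1) \\
% Contribution of the intercept-slope covariance
2\operatorname{Cov}(I,\, S_1 x_1) &= 2\,E(x_1)\operatorname{Cov}(I, S_1) \\
% With E(x_1) = E(x_2) = 0, Var(x_1) = Var(x_2) = 1 and x_1 independent of x_2
V_P &= \beta_1^{2} + \beta_2^{2} + \beta_{12}^{2} + V_I + \operatorname{Var}(S_1)
          + \operatorname{Var}(S_2) + \operatorname{Var}(S_{12}) + V_G + V_R
\end{align}
```

The last line shows why mean centring and variance standardization are convenient: every covariance term drops out and each slope contributes through its square or its variance alone.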
The Environment
The SQuID world consists of a (uni- or bivariate) environment that is generated for each time step and can exhibit (i) random fluctuations, (ii) temporal autocorrelation, (iii) temporal trends (e.g. phenological trends), (iv) cyclic changes (e.g. seasonal or daily fluctuations) or (v) a combination of these four types of effect, generated for each environmental variable separately (Figs 2a–c and 3b). Environmental variables are mean and variance standardized to ease interpretation of parameters such as the intercept and variance in slopes. When two environmental variables are fitted, for simplicity, the current version of SQuID assumes that they are uncorrelated, though we appreciate that this assumption might often be invalid in real data. Environmental variables are either shared (‘general environmental effects’; Falconer & Mackay 1996) or non-shared (‘specific environmental effects’; Falconer & Mackay 1996) across simulated individuals (Fig. 2d–f). During data analysis, environmental variables can be treated as measured or unmeasured, which we refer to as ‘known’ or ‘unknown’ environmental effects, respectively. Known environmental effects represent a situation where scientists are aware of the potential causal effect of an environmental gradient and able to include it in the data analysis; in SQuID, these effects have their own explicitly defined variance component (Table 1). An unknown environmental effect has known effects on the generation of phenotypic values but is then not used in analyses, thereby representing a situation where a researcher has less prior knowledge about a particular system or faces logistical constraints on measuring all relevant environmental variables. Such effects typically end up in the residual variance, but they could affect some of the estimated components as well. The consequences of unmeasured environmental effects on the estimation of other fixed effects and the estimation of random effects can thus be explored in SQuID.
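The environmental patterns listed above can be illustrated with a short base R sketch. The generators below are assumptions made for illustration (for example, a first-order autoregressive process stands in for temporal autocorrelation) and are not taken from the SQuID source; the helper name standardise and all parameter values are hypothetical.

```r
# Illustrative environmental time series; generators and values are assumptions.
set.seed(2)
n_steps <- 1000
time    <- seq_len(n_steps)

# Mean and variance standardisation (mean 0, variance 1)
standardise <- function(x) (x - mean(x)) / sd(x)

# (i) random fluctuations: independent draws at each time step
x_random <- rnorm(n_steps)

# (ii) temporal autocorrelation: first-order autoregressive process
rho    <- 0.9
x_auto <- numeric(n_steps)
x_auto[1] <- rnorm(1)
for (k in 2:n_steps) {
  x_auto[k] <- rho * x_auto[k - 1] + rnorm(1, sd = sqrt(1 - rho^2))
}

# (iii) linear temporal trend
x_trend <- 0.01 * time

# (iv) cyclic change with a period of 250 time steps
x_cyclic <- sin(2 * pi * time / 250)

# (v) a combination of autocorrelation, trend and cycle
x_combined <- x_auto + x_trend + x_cyclic

env <- data.frame(
  time           = time,
  random         = standardise(x_random),
  autocorrelated = standardise(x_auto),
  trend          = standardise(x_trend),
  cyclic         = standardise(x_cyclic),
  combined       = standardise(x_combined)
)
```

Generating one such series for the whole population would correspond to a shared environment, whereas generating a separate series per individual would correspond to a non-shared environment.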
The Sampling Design
Within the SQuID world, users define a sampling design that is applied in order to ‘collect data’ (Figs 2g–l and 3d) and make inferences about the hypothetical world (Figs 2d–f and 3c), just as researchers collect samples to understand the real world. Time steps can be seen as continuous compared to the frequency at which the user samples the generated world. The decoupling of the creation of the virtual world and the sampling from this world is one of the core features of the SQuID environment. This decoupling allows the users to work with the data-generating rules (i.e. the true parameters) that govern the world, the sampled data or all of the created data that could potentially be sampled. The SQuID environment allows for the simultaneous creation of multiple realizations of the world, replicates, from the same parameter setting (Fig. 3a), which facilitates simulation studies (detailed below). In addition, various sampling designs can be applied to the same world or replicate (Fig. 3d), such as no (Fig. 2g–i) vs. substantial (Fig. 2j–l) among-individual variance in timing of sampling, tailored to biological questions and practical constraints. Users can thus save different operational data sets from the same replicate. Note that we reserve the term ‘replicates’ for independent simulations, while the term ‘repeated measures’ is used to refer to assayed expressions of the phenotype by a focal simulated individual within a replicate.
A key component of the SQuID simulation environment is that it allows the traits of individuals to be sampled (repeatedly) and provides considerable flexibility as to how this sampling is done. One can, for example, determine how many individuals are sampled, how often individuals are sampled on average, and whether the number of repeated measures taken per individual is identical across individuals or variable by following a Poisson process with a constant expectation. One can also vary the amount of among-individual variance in the timing of sampling. At one extreme, users can generate scenarios where the repeated samples from the same individual are highly clustered (e.g. Fig. 2j–l), such that some individuals are sampled repeatedly when they are ‘young’ (or early in the season), whereas other individuals are instead sampled repeatedly when they are ‘older’ (or late in the season). At the other extreme, it is possible to generate little or no among-individual variance in the timing of sampling (e.g. Fig. 2g–i), such that all individuals, and their traits, are sampled on average (or exactly) at the same time. Importantly, full recovery of all variance components will not always be possible since two components might be completely conflated by the sampling regime (e.g. one observation per individual precludes the separation of between- and within-individual variances). Once the sampling parameters are set, SQuID provides useful visualizations of the true and sampled phenotypes and environments at each point in time (Fig. 2).
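The sampling options described above can be mimicked with a small base R function. The scheme below is a hypothetical stand-in and not SQuID's sampling algorithm: the function sample_world, its arguments and the way v_timing narrows and scatters each individual's sampling window are all assumptions made for illustration.

```r
# Hypothetical sampling scheme applied to a simulated world (not SQuID's own).
set.seed(3)
n_ind   <- 30
n_steps <- 500

# Minimal stand-in world: one row per individual, one column per time step.
# Each row is an individual mean plus within-individual noise.
world <- matrix(rnorm(n_ind), nrow = n_ind, ncol = n_steps) +
  matrix(rnorm(n_ind * n_steps), nrow = n_ind)

sample_world <- function(world, n_sampled, mean_records, v_timing,
                         equal_records = TRUE) {
  n_steps <- ncol(world)
  ids <- sort(sample.int(nrow(world), n_sampled))
  # v_timing in [0, 1): 0 lets every individual be sampled over the whole
  # period; values near 1 give narrow windows scattered through time
  window <- max(1, round((1 - v_timing) * n_steps))
  out <- lapply(ids, function(i) {
    # Identical number of records, or a Poisson draw with constant expectation
    n_rec <- if (equal_records) mean_records else max(1, rpois(1, mean_records))
    n_rec <- min(n_rec, window)
    start <- sample.int(n_steps - window + 1, 1)
    times <- sort(seq(start, start + window - 1)[sample.int(window, n_rec)])
    data.frame(individual = i, time = times, y = world[i, times])
  })
  do.call(rbind, out)
}

# Clustered sampling vs. sampling spread over the whole period
clustered <- sample_world(world, n_sampled = 20, mean_records = 5, v_timing = 0.9)
spread    <- sample_world(world, n_sampled = 20, mean_records = 5, v_timing = 0)

# Variable number of records per individual
variable  <- sample_world(world, n_sampled = 20, mean_records = 5,
                          v_timing = 0, equal_records = FALSE)
```

Several data sets are drawn here from the same world, which mirrors the decoupling of world generation and sampling.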
How to cook and eat SQuID: applications
Primary Research
A major advantage of the SQuID environment is that it can be used efficiently and independently to conduct simulation studies on issues of general importance and publish stand-alone papers without empirical data. A large range of questions may be addressed, including those asking how to trade off the number of individuals vs. the number of repeated measures per individual, and how fitting a model to data that does not correspond to the data-generating model biases estimates of variance components.
A general workflow to perform simulation studies using SQuID goes as follows. First, the researcher determines the simulation design, which consists of choosing the time frame of the simulation, and the number of replicate worlds (Fig. 3a). Secondly, the researcher determines the population characteristics: number of individuals in each replicate and the number of traits to study (Fig. 3a). Thirdly, the researcher defines how the environment in the SQuID world varies over time (Fig. 3b). Fourthly, the researcher defines the phenotypic equation that will determine the phenotype of each individual at each instance (Fig. 3c). At this step, the user defines the effect of the environment on the phenotype, the amount of among-individual variation in average phenotype and level of phenotypic plasticity and the correlation between these two reaction norm components. As a final step, the researcher chooses a particular scheme to sample phenotypes from those generated in SQuID (Fig. 3d), ideally simulating potential protocols for use in the real world, and downloads the generated data sets for analysis (Fig. 3e). Note that the SQuID package can also be used without the user interface by running the function squidR(). This function could be easily included in existing R scripts and hence allows more advanced and efficient simulations.
Simulation-Based Inferences
The SQuID environment also offers alternative means of interpretation for researchers who already implement the mixed-effects modelling approach in their statistical practices. The traditional approach to inference follows a linear process that starts from the design of sampling schemes and usually includes a single (or a very few) data collection step(s). During data collection, (random) samples are drawn from the base population, and statistical models are subsequently fitted to these data. Parameters of interest are then subject to biological interpretation. However, practical constraints for data collection at different hierarchical levels impose limitations for the performance of the mixed-effects modelling approach (e.g. Maas & Hox 2005). Furthermore, variance components are hard to estimate precisely and may become biased near their boundaries (e.g. near-zero variance), in particular if the sample size is small (Gelman & Hill 2007). Knowledge about such biases is essential for the interpretation of estimated parameters but, unfortunately, the sources of such biases are not always obvious in complex models and simulations are essential in order to learn about them.
Simulation-based procedures can help avoid misinterpretation and can be used either a priori or a posteriori to data collection. In the a priori phase, the researcher can use the SQuID R package to flexibly explore the consequences of alternative sampling scenarios and/or consider different phenotypic equations in a set of simulation studies. A benefit of performing simulations before any data are collected is that simulations are not constrained by practical limitations experienced in either the field or laboratory: it is thus possible to create the best-case scenario for sampling. Experience obtained during this exploration stage can be incorporated in an actual study design, and the target sample sizes for the empirical part of the study can be appropriately determined. Therefore, the simulation-conditioned sampling scheme can be used during the collection of real data, which then can be analysed with the predefined mixed model (ideally the same as the one used in the simulation study).
The SQuID environment can also be exploited in the a posteriori phase, wherein an analysis of collected data can feed back into previous simulations. This approach enables the researcher to reinvestigate the behaviour of the model by incorporating additional complexities not previously considered. By performing sets of parametric bootstrapping and sensitivity analyses, the precision and accuracy of the obtained parameter estimates will be determined relative to the distribution of parameter estimates from the model that is fitted to simulated data (and not relative to the error of parameter estimates from the same model). As a consequence, such a simulation-supported inference can lead to more objective biological conclusions that appropriately take into account the constraints of sampling and consider potentially confounding factors.
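A simulation study of the kind outlined in this workflow can be sketched in base R without the SQuID package. The example compares four designs with the same total number of records and summarises the estimated repeatability across replicate worlds. It is a simplified illustration under stated assumptions: there is no environmental effect, repeatability is estimated with a one-way ANOVA estimator for balanced data rather than a mixed-effects model, and the function names simulate_records and anova_repeatability are hypothetical.

```r
# Trade-off between number of individuals and repeated measures per individual.
# Simplified sketch; names and parameter values are assumptions.
set.seed(4)

# One replicate world sampled with a balanced design
simulate_records <- function(n_ind, n_rep, V_I, V_R) {
  id <- rep(seq_len(n_ind), each = n_rep)
  I  <- rnorm(n_ind, sd = sqrt(V_I))
  data.frame(individual = id,
             y = I[id] + rnorm(n_ind * n_rep, sd = sqrt(V_R)))
}

# ANOVA-based repeatability for a balanced design
anova_repeatability <- function(d) {
  n_ind <- length(unique(d$individual))
  n_rep <- nrow(d) / n_ind
  ms_among  <- n_rep * var(as.numeric(tapply(d$y, d$individual, mean)))
  ms_within <- sum((d$y - ave(d$y, d$individual))^2) / (n_ind * (n_rep - 1))
  (ms_among - ms_within) / (ms_among + (n_rep - 1) * ms_within)
}

# Four designs with 200 records each; true repeatability is 0.3
designs  <- data.frame(n_ind = c(100, 40, 20, 10), n_rep = c(2, 5, 10, 20))
n_worlds <- 200   # replicate worlds per design

results <- do.call(rbind, lapply(seq_len(nrow(designs)), function(k) {
  est <- replicate(n_worlds, anova_repeatability(
    simulate_records(designs$n_ind[k], designs$n_rep[k], V_I = 0.3, V_R = 0.7)))
  data.frame(designs[k, ], mean_R = mean(est), sd_R = sd(est))
}))

print(results)
```

Comparing mean_R with the true value gives a view of accuracy, and sd_R summarises the precision achieved by each design.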
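The a posteriori use described here can be illustrated with a parametric bootstrap in base R. In this sketch the "observed" data are themselves simulated as a stand-in for an empirical data set, variance components are obtained with a balanced one-way ANOVA estimator rather than a mixed-effects model, and the function names are hypothetical; all of these are assumptions made to keep the example self-contained.

```r
# Parametric bootstrap of repeatability; a simplified, assumption-laden sketch.
set.seed(5)
n_ind <- 40
n_rep <- 4

simulate_records <- function(n_ind, n_rep, V_I, V_R) {
  id <- rep(seq_len(n_ind), each = n_rep)
  I  <- rnorm(n_ind, sd = sqrt(V_I))
  data.frame(individual = id,
             y = I[id] + rnorm(n_ind * n_rep, sd = sqrt(V_R)))
}

# ANOVA estimates of the among-individual and residual variances
variance_components <- function(d) {
  n_ind <- length(unique(d$individual))
  n_rep <- nrow(d) / n_ind
  ms_among  <- n_rep * var(as.numeric(tapply(d$y, d$individual, mean)))
  ms_within <- sum((d$y - ave(d$y, d$individual))^2) / (n_ind * (n_rep - 1))
  V_I <- max(0, (ms_among - ms_within) / n_rep)   # truncated at the boundary
  c(V_I = V_I, V_R = ms_within, R = V_I / (V_I + ms_within))
}

# Stand-in for an empirical data set
observed <- simulate_records(n_ind, n_rep, V_I = 0.4, V_R = 0.6)
fit <- variance_components(observed)

# Simulate new data from the fitted values under the same sampling design,
# then re-estimate repeatability each time
boot <- replicate(1000, variance_components(
  simulate_records(n_ind, n_rep, V_I = fit[["V_I"]], V_R = fit[["V_R"]]))[["R"]])

# Percentile interval for the repeatability estimate
ci <- quantile(boot, c(0.025, 0.975))
```

The spread of `boot` reflects the sampling design actually used, so the interval describes how much the estimate could vary if the same study were repeated in a world governed by the fitted parameters.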
Education
The SQuID application, which can be launched by running the function squidApp(), offers educational material for those who are newcomers in the analysis of hierarchically structured data and those who want to teach mixed-effects model analysis to newcomers. Consequently, the application can be effectively implemented in both self-training and teaching programmes on mixed-effects modelling. The tutorials in the application are organized into modules and are loosely ordered with increasing complexity from simple two-level analysis towards models that rely on additional variance components, different environmental effects, hierarchical structures and interactions (Fig. 4). Going through the modules step-by-step permits the passive learning of the fundamentals of model building. This interface also allows the user to interactively investigate the consequences of alternative input options for different components of the generated and sampled world, facilitating active learning. The generated data can be immediately visualized in the browser, but can also be externally saved and imported into statistical packages for those who wish to pursue data explorations on their own.
The exploitation of the educational material encourages researchers to appreciate the multilevel structure of phenotypes. By doing so, users will be able to formulate biological hypotheses in the form of phenotypic equations (Westneat, Wright & Dingemanse 2015) and translate these into statistical models to be analysed with actual data. Importantly, the use of the simulation interface can help users understand how study design imposes constraints on the potential inferences that can be made about the world. By adopting simulation-based statistical interpretations, users will become familiar with the concepts of statistical power, accuracy and precision of estimated parameters in mixed-effects model analysis. We consider the interactive aspect of SQuID as embodying a substantially novel component in comparison with printed education materials, such as textbooks, on similar topics.
The evolution of SQuID
We expect SQuID to evolve. The SQuID environment has been created to allow learning and exploring various interesting aspects of mixed-effects models and data sampling. Among others, the current version of SQuID has three notable limitations, which will be resolved in the near future. First, trait distributions are limited to Gaussian. Soon SQuID will allow simulations with non-Gaussian trait distributions, specifically Poisson (with log and square-root link functions) and binomial (with logit and probit links) (Fig. 4). Secondly, environmental variables (i.e. x_{1} and x_{2}) are assumed to be uncorrelated mainly for convenience and simplicity. However, environmental variables are often, to some degree, correlated. Environmental variances are also constrained to unit variance, which we plan to retain through all modules (because otherwise the slope of the response to the environmental gradient and the variance in the environment conflate to influence the effect of the environment on phenotypes). Thirdly, when we view unsampled individuals as missing data, our basic sampling scheme is ‘missing completely at random’ (MCAR), as labelled in missing data theory (Little & Rubin 2002; Nakagawa & Freckleton 2008). This is because SQuID sampling does not depend on environmental variables (x_{1} and x_{2}). However, it is entirely possible that real sampling would be affected by (measured) environmental variables (known as missing at random, MAR) or unmeasured environmental variables and/or traits of interest themselves (known as missing not at random, MNAR). A personality trait, boldness/shyness, is a good example of MNAR because shy individuals might be less likely to be sampled or more likely to be missing in the data set (Biro & Dingemanse 2009).
Additionally, the functionality of SQuID will further evolve as we extend its reach to other difficult problems, including the generation of scenarios where within-individual residual variances differ across environments or individuals (Westneat, Schofield & Wright 2013; Cleasby, Nakagawa & Schielzeth 2015; Westneat, Wright & Dingemanse 2015), the inclusion of more complex hierarchical levels modelled by considering correlation matrices (such as relatedness matrices defined by pedigrees; Kruuk 2004; Wilson et al. 2010), the inclusion of effects of phenotypes expressed by other individuals (i.e. social environments such as parental and indirect genetic effects; McAdam, Garant & Wilson 2014), the incorporation of non-linear responses to environments, the generation of simulations of selective mortality (van de Pol & Verhulst 2006) or consideration of individual variation in the timing of birth and death. Finally, it is possible to extend the current individual-based focus towards scenarios when other hierarchical levels (such as populations or species) are of interest. In such cases, additional sampling designs might be considered, for example when applied to situations where the number of repeats is determined by a biological predictor (Garamszegi & Møller 2011) or when correlation structures are determined by phylogeny or gene flow (Stone, Nee & Felsenstein 2011).
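The difference between MCAR and MNAR sampling in the boldness example can be made concrete with a short base R sketch. This is not a SQuID feature: the logistic capture probability and all parameter values are assumptions introduced purely for illustration.

```r
# MCAR vs. MNAR sampling of a personality trait; illustrative assumptions only.
set.seed(6)
n_pop     <- 1000
n_sampled <- 200

# Individual mean boldness in the population (mean 0, variance 1)
boldness <- rnorm(n_pop)

# MCAR: every individual is equally likely to be sampled
mcar_ids <- sample.int(n_pop, n_sampled)

# MNAR: the chance of being sampled increases with boldness itself
p_capture <- plogis(2 * boldness)
mnar_ids  <- sample.int(n_pop, n_sampled, prob = p_capture)

summary_table <- data.frame(
  scheme   = c("population", "MCAR", "MNAR"),
  mean     = c(mean(boldness), mean(boldness[mcar_ids]), mean(boldness[mnar_ids])),
  variance = c(var(boldness), var(boldness[mcar_ids]), var(boldness[mnar_ids]))
)

print(summary_table)
```

Under these assumptions the MNAR sample over-represents bold individuals, so its mean is shifted relative to the population, whereas the MCAR sample differs from the population only through sampling error.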
Acknowledgements
SQuID was conceived at the Symposium ‘Personality: Causes and Consequences of Consistent Behavioural Variation’ funded by the Volkswagen Foundation (2013), and born and potty-trained during two follow-up workshops at the Max Planck Institute for Ornithology (Seewiesen) funded by the Volkswagen Foundation (2014) and the International Max Planck Research School for Organismal Biology (2015). Y.G.A.A. and N.J.D. were supported by the Max Planck Society, L.Z.G. by the Plan Nacional Program (CGL201570639P) and the National Research, Development and Innovation Office of Hungary (K115970), S.N. by an Australian Future Fellowship, D.R. by a Discovery Grant of the Natural Sciences and Engineering Research Council of Canada, H.S. by the German Research Foundation (SCHI 1188/11) and D.F.W. by the National Science Foundation of the U.S.A. The authors gratefully acknowledge feedback on an earlier version of the manuscript from Jarrod Hadfield, Sandra Hamel, Julien Martin and an anonymous reviewer.
Data accessibility
This paper does not include any data.