Volume 12, Issue 1 p. 42-53
Research Article
Open Access

Estimating heritability of social phenotypes from social networks

Reinder Radersma

Corresponding Author

Reinder Radersma

Biometris, Wageningen University & Research, Wageningen, The Netherlands


First published: 07 October 2020
Citations: 6


  1. For understanding how social behaviour evolves and responds to selection, we need to be able to accurately estimate heritability with quantitative genetic models. More recently, this has moved into using node-specific statistics from social networks as social phenotypes. However, parameter estimation can be problematic because social phenotypes are not independent observations and standard models tend to ignore the uncertainties around their estimates.
  2. Here I present a framework using latent variable modelling to account for these dependencies and uncertainties. I use edge weights, rather than node-specific network statistics, as dependent variables. From these edge weights, two types of latent (i.e. unobserved) phenotypes are estimated: the individual tendency to be social (i.e. social tendency) and the relative contribution to associations (i.e. social governance). Effects of the social environment and indirect genetic effects are accounted for in the model and can be estimated post hoc. If edge weights are a proportion (e.g. simple ratio index) their uncertainty can be accounted for by a binomial sampling process.
  3. I illustrate this method in Stan, a flexible Bayesian inference library, using a publicly available dataset on bottlenose dolphin networks. This method not only accounts for dependencies and uncertainties, it also illuminates aspects of social evolution which are not observed with standard quantitative genetic models. For instance, indirect genetic effects models predict heritable variation in sociality (21.9%), while latent variable modelling shows heritability of social tendency (28.7%), but not for social governance (0.0%). Covariates at different levels in the model (edge and node level) highlight differences in sociality between different foraging strategies and the sexes.
  4. This example shows that failing to account for the assumptions underlying the use of social network statistics can lead to misleading conclusions. Although some model assumption violations are less common, others are inherent to the study of (semi)wild populations. The presented framework offers solutions for some critical assumptions and is a flexible tool that can be further developed and tailored to the needs of specific studies, to ensure a proper fit to the study system.


Most animals regularly engage in social interactions (Krause & Ruxton, 2002). These interactions can have profound consequences, as they may shape phenotypes and affect survival and reproduction (Frank, 1988; Wolf et al., 1998). In recent years, social network analysis has been successfully applied to animal populations, largely due to recent technological advances in animal tracking (Krause et al., 2014). Typically, the nodes of the network represent individuals and the edges are a dyadic measure of social behaviour between individuals such as an association strength. From these networks, statistics can be derived to quantify aspects of the whole network, specific clusters within the network, edges or nodes (Croft et al., 2008; Whitehead, 2008). Node-specific statistics are used to quantify aspects of individual social phenotypes that can be used to describe phenotypic variation in sociality within populations (Wey et al., 2008; Wilson et al., 2012). For many species, individuals consistently differ in network-derived social phenotypes (e.g. birds: Aplin et al., 2015; Hillemann et al., 2019; Plaza et al., 2019, mammals: Blaszczyk, 2018, fish: Jacoby et al., 2014; Krause et al., 2017 and insects: Formica et al., 2017). This raises the question whether social phenotypes have a genetic basis. Since there is currently a large interest in understanding how social behaviour evolved and is maintained (Ward & Webster, 2016), there is a drive for estimating and understanding genetic variance and heritability of these social phenotypes.

A critical issue with quantitative genetic analyses of social behaviour is the lack of independence of observations; social behaviour is only expressed in the presence of other individuals (Fuller & Hahn, 1976; Moore et al., 1997). This has two major consequences. First, a behaviour expressed by a focal individual is additionally affected by the phenotype or identity of the interactee and therefore heritability can operate via both a direct and an indirect route and indirect genetic effects are expected to be the norm (Moore et al., 1997). If not accounted for, quantitative genetic models will fail to properly estimate heritability and give improper insight with respect to the importance of the social environment (Bijma et al., 2007). Second, the measured social behaviour is an expression of the social phenotypes of both individuals and therefore not exclusively a phenotypic measurement of the focal individual, but also to some degree of its interactees (Fuller & Hahn, 1976). A non-random subset of interactees will therefore bias the social phenotypes which are estimated from network edges. Other dependencies are also important, for instance social interactions require individuals to be close in space, leading to correlations between the spatial distribution and perceived social behaviour of individuals (Radersma et al., 2017). This is particularly problematic if genotypes are not randomly distributed in space, such as occurs with limited natal dispersal, territoriality or in any kin structured population.

Various studies have attempted to estimate genetic variation in, and the heritability of, social phenotypes derived from social networks. Studies of humans (Fowler et al., 2009), yellow-bellied marmots (Marmota flaviventris; Lea et al., 2010) and rhesus macaques (Macaca mulatta; Brent et al., 2013) show considerable heritability for some social network metrics, describing both affiliative (average heritability ranging from 0.11 to 0.84) and aggressive (average heritability ranging from 0.11 to 0.66) behaviours. The human study is relatively little affected by the above-mentioned dependency issues; the authors used friendship networks that were directed and based on listings constructed outside the social context, reducing the risk of observations being biased by the expression of phenotypes of others. However, real indirect effects might still be present. The studies of marmots and macaques potentially suffer from dependency issues. In both studies, group composition was not random, with relatedness being higher within social groups (Brent et al., 2013; Wey & Blumstein, 2010), making interactees a non-random subset of the populations, which creates the risk of overestimating heritability. Overestimation of heritability might occur because related individuals experience a similar social environment (namely their kin; see Appendix S1, pp. 42–47 for a simulation demonstrating this), a phenomenon that has been reported for individuals sharing their physical environment (e.g. Stopher et al., 2012; Van Der Jeugd & McCleery, 2002). The marmot and macaque studies used quantitative genetic models without accounting for indirect genetic effects to estimate heritability, but accounted partly for non-random interactees by introducing a group-specific random effect into the models (both studies were based on multiple networks for different social groups).

Though not implemented for social phenotypes from networks to date, alternative modelling techniques are available to account for indirect effects. A class of quantitative genetic models exists which accounts for the social environment and indirect genetic effects for fixed numbers of interactees and allows all individuals within one group to affect each other to the same degree (Bijma, 2014). These models are particularly useful in experimental settings, for instance when animals are kept in pens or cages. The phenotypes of interactees are introduced into the models as covariates. Parallels can also be drawn with some maternal, paternal and parental models (Kirkpatrick & Lande, 1989) in which the phenotype of the mother, the father or both parents is added to the models as a covariate (McAdam et al., 2014). Rather than assuming equal indirect genetic contributions by all interactees (or group members), the contributions can also be weighted. For instance, in trees (Cappa & Cantet, 2008; Costa E Silva et al., 2013) and territorial mammals (Fisher et al., 2019) the indirect genetic effects of a fixed number of neighbours were weighted by their distance. Neighbours closer in space were assumed to have a larger effect on the phenotype of the focal individual. Instead of implementing a spatial distance matrix, a social network matrix can also be used. As such, the social network is used for understanding how the social environment is affecting a particular trait. This differs from the focus of this paper: estimating the contributions of genes and the environment to the social traits that structure the network. Moore et al. (1997) introduced models with reciprocal effects between the phenotype of the focal individual and its interactees, which is more appropriate for social phenotypes from networks. They suggested that in the case of multiple and variable numbers of interactees, each interaction can be treated as a repeated measure. These models can be very difficult to fit for social phenotypes from networks, due to the highly correlated nature of neighbouring node-specific statistics (see Appendix S1).

Here I show how latent variable modelling can be used to estimate genetic variation and heritability of social phenotypes, while accounting for the dependencies and uncertainties mentioned above. In latent variable modelling, observed data are used to estimate underlying unobserved (i.e. latent) variables; in this case social interactions are used to infer the underlying social traits of the interacting individuals. This relies on an assumed relationship between the measured interactions and inferred social traits. Since individuals interact with multiple conspecifics, the contribution of each individual to the interaction can be estimated, and hence the contributions of interactees need not be assumed to be equal. I present the basic latent variable model first, and then discuss expansions of the model to account for edge- and node-specific covariates, and spatial and temporal effects. I illustrate this modelling framework with an example analysis in Stan, a Bayesian general-purpose inference library (Carpenter et al., 2017), using a publicly available dataset on bottlenose dolphins (Wild et al., 2019a), and compare latent variable modelling to standard quantitative genetic models. This example shows how heritability and genetic variation in unobserved social phenotypes are estimated. It also shows how these social phenotypes relate to node-specific network metrics, indirect genetic effects and other effects of the social environment. The example showcases how latent variable models provide additional insights into social behaviour, such as sex-specific social behaviour, which cannot be elucidated with standard quantitative genetic models.


In animal social networks, nodes represent individuals and the edges represent associations between them. Although other metrics are conceivable, edges often represent a ratio between the number of instances two individuals interacted and the number of instances either one or both individuals were observed. Different variations of these ratios exist, of which the simple ratio index (SRI) and half-weight index (HWI) are most commonly used. For more information and considerations of which index to use, see Croft et al. (2008), Whitehead (2008) or Hoppitt and Farine (2018). Edges between different pairs of individuals are assumed to be independent from each other, but this assumption is typically violated since social interactions are often also affected by others in close proximity (more on this in the discussion). From these edge weights node-specific network measures can be calculated and these can be used as social phenotypes (Figure 1a; Farine & Whitehead, 2015; Whitehead, 2008). These social phenotypes can be treated as dependent variables in quantitative genetic models to estimate their heritabilities (Figure 1b). Indirect genetic effects can be accounted for when the number of interactees is small (Moore et al., 1997) or fixed (Bijma, 2014) and all interactees are representative of the whole population (Fuller & Hahn, 1976). Maternal or parental effects models are good examples of this (McAdam et al., 2014) as are models which account for the effects of cage or pen mates (Bijma, 2014). With large numbers of interactees, these methods become impractical. Additionally, other dependencies such as spatial effects and uncertainties should be accounted for.
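The two association indices named above can be written out for a single dyad. The sketch below uses the count notation that is common in the animal social network literature (x, y_ab, y_a, y_b); the function name and the example counts are illustrative assumptions, not values from this study.

```python
def association_indices(x, y_ab, y_a, y_b):
    """Return (SRI, HWI) for one dyad from sampling-period counts.

    x    : periods in which a and b were observed associated
    y_ab : periods in which both were observed, but not associated
    y_a  : periods in which only a was observed
    y_b  : periods in which only b was observed
    """
    sri_denominator = x + y_ab + y_a + y_b
    if sri_denominator == 0:
        raise ValueError("dyad was never observed")
    # The half-weight index counts periods with only one individual seen for half.
    hwi_denominator = x + y_ab + 0.5 * (y_a + y_b)
    return x / sri_denominator, x / hwi_denominator


# Hypothetical counts for one pair of individuals.
sri, hwi = association_indices(x=4, y_ab=1, y_a=3, y_b=2)
```

For the SRI, the numerator and denominator are exactly the two counts (number of interactions and number of opportunities) that a binomial treatment of edge weights would need, which is why keeping the raw counts is more informative than keeping the ratio alone.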

Figure 1. (a) An example of a social network, in which the thickness of the edges indicates the edge weight. The size of the nodes indicates a node-specific statistic of sociality; for instance, the mean weighted degree (the mean edge weight of all connections a node has). (b) Graph representation of an indirect genetic effects model (current standard). Fixed effects such as covariates (ci) and random effects such as direct additive genetic, indirect additive genetic and environmental effects (ei) contribute to a dependent variable, which is a network statistic calculated from the network prior to the quantitative genetics analysis. (c) Graph representation of a quantitative genetic latent variable model which is used to estimate unobserved social phenotypes for the two individuals engaged in an interaction. The social phenotypes of both individuals, together with edge-specific covariates (cij), contribute to the edge weight (wij). Depending on the distribution of edge weight an error term (eij) can be included here if necessary (e.g. in case of normally distributed edge weights). Node-specific fixed effects such as covariates (ci and cj) and node-specific random effects such as additive genetic (ai and aj) and environmental effects (ei and ej) contribute to the social phenotype.

2.1 Latent variable modelling

Here I take a latent variable approach. The rationale of latent variable modelling is that observations are shaped by underlying unobserved processes. These processes can be modelled with functions that describe the assumed relationship between those observations and particular unobserved ‘latent’ variables (Beaujean, 2014; Loehlin & Beaujean, 2017). I aim to infer unobserved social phenotypes (the latent variables) that shaped the observed social interactions (Figure 1c). The relationship between observed interactions and underlying social phenotypes is highly dependent on the type of interactions and social phenotypes of interest. Here I describe a relationship between observed interactions and social phenotypes for undirected networks, but other relationships can be explored as well, even for directed networks such as aggression or grooming networks (more on this in the discussion). I treat edge weights as observations which have four underlying phenotypic latent variables, two for each of the individuals connected by the edge. For both individuals, one latent variable is called social tendency and represents the willingness of individuals to interact. In the case of node-specific statistics this is analogous to, but not the same as, mean weighted degree, which is the mean value of weights of all edges connected to an individual. The other latent variable is social governance and represents how much an individual is affecting the edge weights relative to other individuals. Estimating social governance is particularly useful when the propensity to interact is asymmetrical and mainly driven by one but not the other individual. In some cases, this will relate to some form of social dominance, but it does not have an equivalent network statistic. The relationship between edge weight and these social phenotypes is modelled by summing the social tendencies, which are weighted by their respective social governance;
In which wij is the edge weight between individual i and j, gi and gj are social governance for individual i and j respectively and si and sj are their social tendencies. g (the vector containing the social governance values for all individuals) has to be positive. If edge weight is a proportion such as the simple ratio index (SRI) we can also account for the number of observations the SRI values are calculated from. We can expect a great imbalance in the number of observations giving rise to SRI values for different dyads as typically some pairs of individuals will have many more opportunities to interact than others. Accounting for the variation in uncertainty between SRI values and letting them scale up into the estimates is therefore crucial. Rather than using SRI as edge weight, I assume that the number of times the two individuals interacted comes from a binomial distribution. The probability of interacting is a function of the social phenotypes and the total number of draws from the binomial distribution is the number of opportunities for interacting;
in which pij is the probability of an interaction occurring between individual i and j, and uij and vij being the number of interactions between them and the number of potential interactions respectively (the latter is typically the number of observations of either one or both). An important benefit of this approach is that uncertainty in edge weight wij (resulting from the finite number of potential interactions) is accounted for, and more importantly, propagates through the models into the eventual parameter estimates (e.g. additive genetic effects).
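The two relationships described above can be sketched in a few lines. The equations themselves are not reproduced in this text, so the governance-weighted mean below, (g_i s_i + g_j s_j) / (g_i + g_j), is an assumed form that matches the verbal description; the binomial sampling of u_ij out of v_ij opportunities follows the text. For simplicity, social tendencies are treated here as values between 0 and 1, which is also an assumption, and all function names and numbers are hypothetical.

```python
import math


def edge_probability(g_i, s_i, g_j, s_j):
    """Governance-weighted mean of two social tendencies (assumed form)."""
    if g_i <= 0 or g_j <= 0:
        raise ValueError("social governance has to be positive")
    return (g_i * s_i + g_j * s_j) / (g_i + g_j)


def binomial_log_pmf(u, v, p):
    """Log-probability of u interactions in v opportunities, for 0 < p < 1."""
    log_choose = math.lgamma(v + 1) - math.lgamma(u + 1) - math.lgamma(v - u + 1)
    return log_choose + u * math.log(p) + (v - u) * math.log1p(-p)


def support_gap(u, v):
    """How much more the data support p = 0.5 than p = 0.3 (log scale)."""
    return binomial_log_pmf(u, v, 0.5) - binomial_log_pmf(u, v, 0.3)


# Individual j has three times the governance of i, so p_ij sits closer to s_j.
p_ij = edge_probability(g_i=1.0, s_i=0.2, g_j=3.0, s_j=0.6)

# Two dyads with the same SRI of 0.5 but different numbers of opportunities.
gap_few = support_gap(u=2, v=4)
gap_many = support_gap(u=20, v=40)
```

Both dyads would receive the same edge weight if the SRI were used directly, yet the dyad with 40 opportunities discriminates between candidate probabilities ten times more strongly on the log scale. This is the sense in which the binomial formulation lets uncertainty in edge weights propagate into the parameter estimates.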
Taking this approach has other advantages as well; covariates can be added to both the edge weights and social phenotypes, to account for effects on the interactions as well as individual variation. First, I will explain how temporal and spatial overlap are accounted for, after that how covariates are introduced. The temporal overlap between two individuals can be accounted for by including in vij only instances for which both individuals were present at the study site (e.g. Armansin et al., 2016; Leu et al., 2010). If we want to account for spatial overlap, we can also introduce an overlap parameter, for instance a proportion of overlap between home ranges. The probability of interacting is multiplied by this factor.
in which oij is the proportion of overlap of the home ranges. Note that in territorial animals, a lack of home range overlap might be the result of individuals actively avoiding each other, rather than meaning that they did not have the opportunity to interact. Therefore in territorial animals correcting for home-range overlap might be undesirable, since home-range overlap is potentially part of the phenotype of interest. A more appropriate measure in that case could be the proportion of shared territory borders. The approach presented here can flexibly accommodate species biology.
In a similar fashion we can account for edge-specific covariates, but care must be taken that pij stays between 0 and 1. To accommodate this, working on the logit (or probit) scale is useful. Edge-specific covariates are useful, for instance, when the data come from multiple networks (e.g. multiple populations or multiple years), to correct for population or year effects. In the example, I will account for foraging strategy and sex, assuming that the probability of two dolphins interacting depends on their resemblance with respect to those two traits. The covariate effects can be added to the effects of social phenotypes. The probability of interaction pij then becomes:
with xeij being a row vector containing the covariates for the edge between i and j and βe being the vector with the effect sizes for the edge covariates.
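A minimal sketch of the logit-scale version follows. It assumes that the governance-weighted sum of social tendencies and the edge covariate term x_eij β_e are added on the logit scale and then back-transformed; the exact equation is not reproduced in this text. How the home-range overlap factor combines with the logit link is likewise not spelled out here, so multiplying the back-transformed probability by the overlap is an assumption. The covariate coding (1 if the two individuals differ in sex or in foraging strategy) and all effect sizes are hypothetical.

```python
import math


def inv_logit(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def interaction_probability(g_i, s_i, g_j, s_j, x_edge, beta_edge, overlap=1.0):
    """Probability of interaction with edge covariates on the logit scale.

    s_i and s_j are social tendencies on the logit scale, g_i and g_j are
    positive social governance values, and overlap is an optional proportion
    of spatial overlap (assumed to scale the final probability).
    """
    latent = (g_i * s_i + g_j * s_j) / (g_i + g_j)
    eta = latent + sum(x * b for x, b in zip(x_edge, beta_edge))
    return overlap * inv_logit(eta)


# Hypothetical effects of differing sex and differing foraging strategy.
beta_edge = [-0.5, -1.0]
p_same = interaction_probability(1.0, 0.0, 1.0, 0.0, [0, 0], beta_edge)
p_diff = interaction_probability(1.0, 0.0, 1.0, 0.0, [1, 1], beta_edge)
```

Because the covariates enter before the back-transformation, any real-valued effect size yields a probability strictly between 0 and 1, which is the reason for working on the logit (or probit) scale.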
Next, the quantitative genetic parts of the model need to be defined. By using the logit (or probit) transformation, social tendency can be drawn from a normal distribution. Social governance is drawn from a log-normal distribution to ensure positive values. Since social governance is a relative value and its absolute values do not have any meaning, both its mean and standard deviation should be constrained. This can be done in various ways; for instance, by setting the intercept to 0 and the standard deviation for residuals (i.e. the environmental effects; σge) to 1, or alternatively by setting the sum of all variance components to 1 (as I will do in the example). Social tendency s and social governance g are estimated by summing covariate, additive genetic and environmental effects. Additive genetic effects follow a normal distribution but are sampled from a multivariate normal distribution to account for the correlation of additive genetic effects among relatives, while the environmental effects come from a univariate normal distribution;
in which xsi and xgi are the node level covariates for individual i for social tendency and social governance respectively, βs and βg are the effect sizes for the covariates, as and ag are the additive genetic effects, es and eg are the environmental effects, A is the relatedness matrix and σsa, σse, σga and σge are the additive genetic and environmental standard deviations.
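The sampling scheme described above can be illustrated with a small forward simulation. The sketch draws additive genetic effects from a multivariate normal with covariance σa² A by multiplying standard normal draws with the Cholesky factor of A, and environmental effects from a univariate normal. It assumes that social governance is obtained by exponentiating the summed effects (one way to obtain a log-normal variable); the toy relatedness matrix, the intercept and the standard deviations are hypothetical, and node covariates are left out for brevity.

```python
import math
import random


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - acc)
            else:
                lower[i][j] = (matrix[i][j] - acc) / lower[j][j]
    return lower


def draw_effects(relatedness, sigma_a, sigma_e, rng):
    """Return (additive genetic, environmental) effects for all individuals."""
    n = len(relatedness)
    lower = cholesky(relatedness)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # a = sigma_a * L z has covariance sigma_a^2 * A when A = L L^T.
    a = [sigma_a * sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    e = [rng.gauss(0.0, sigma_e) for _ in range(n)]
    return a, e


# Toy relatedness: individuals 0 and 1 are parent and offspring, 2 is unrelated.
A = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
rng = random.Random(1)

# Social tendency on the logit scale (hypothetical intercept and variances).
a_s, e_s = draw_effects(A, sigma_a=0.6, sigma_e=0.8, rng=rng)
social_tendency = [-2.0 + a + e for a, e in zip(a_s, e_s)]

# Social governance: intercept 0 and variance components summing to 1.
a_g, e_g = draw_effects(A, sigma_a=0.0, sigma_e=1.0, rng=rng)
social_governance = [math.exp(a + e) for a, e in zip(a_g, e_g)]
```

In a Bayesian fit the same structure is used in the other direction: the effects are parameters and the standard deviations are estimated, but the Cholesky construction is a common way to express the multivariate normal prior on the additive genetic effects.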

2.2 Indirect genetic and social environment effects

Since edge weights are decomposed into social phenotypes of two individuals, which are in turn decomposed into genetic and environmental contributions, there are no immediate estimates for indirect effects such as the effects of the social environment and indirect genetic effects on the social phenotypes. To gain insight into the level of influence on others' social behaviour and to estimate the relative contribution of indirect genetic effects to phenotypes, I make use of the additive nature of the various components into which edge weights were decomposed. First, I take all edges of a focal individual and treat its own genetic and environmental contributions to the edge weights as direct effects and its contributions to the social phenotypes of the interactees as indirect effects. Next, I calculate node-specific statistics for each component separately to estimate their relative contributions. Here I calculate the contributions for mean weighted degree, because it is one of the easiest and most biologically straightforward statistics (many social network metrics are highly correlated with mean weighted degree) and it relates clearly to the edge weights (Figure 2). Other node-specific statistics (such as centrality or disparity statistics) can be calculated as well; however, care must be taken when these statistics involve nonlinear transformations. Another important consideration is that edge weight is transformed to the logit scale, which eases further analysis because the distribution of edge weight becomes normal, but has consequences for the interpretation of the results.

Figure 2. Calculating indirect genetic and social environment effects from latent variable model output. (a) For all individuals additive genetic and environmental effects for social tendency are estimated (here the values for three individuals from one iteration are depicted). Individual i is the focal individual in this example. (b) For two interacting individuals, the additive genetic and environmental effects are weighted by social governance (g) to calculate their genetic and environmental contributions to edge weight (Equation 15). Values for the focal individual are treated as direct effects, while the values for the interactee are the indirect effects. (c) The direct and indirect components are used to calculate node-specific statistics, for example, weighted degree, the mean edge weight for all connected edges (Equations 16-19). (d) From these values the variances of the components are estimated (Equations 20-24). Here the indirect additive genetic and indirect environment effects are summed to the social environment effect.
Indirect genetic and social environment effects can be estimated for node-specific statistics by using the breeding values (additive genetic effects) and environment effects of social tendency. We can decompose edge weight on the logit scale into a part for the focal individual i and a part for the interactee j by weighting their respective social tendencies by their relative social governance;
Next, we can decompose social tendency of the focal individual si and the interactee sj into their respective fixed and random effects with Equation 7. Note that to simplify notation, I drop the subscript s from as, es, Xs and βs, since the genetic effects and the environmental effects will always refer to social tendency. Edge weight on the logit scale can be calculated as,
Since all components are additive, we can now separate the additive genetic and environmental effects of the focal individual from the indirect effects coming from the interactees—the indirect genetic and indirect environmental effects. Equation 14 can be rewritten as,
From the additive components in Equation 15, I calculate their contributions to node-specific statistics, the same way as they would do for edge weights, as long as the calculation of the node-specific statistic does not require any nonlinear transformations. Since mean weighted degree is the mean of all weights of the edges connected to the focal individual i, the (direct) additive genetic effect and the (direct) environmental effect can be calculated by taking the average of their contributions to all edges individual i is involved in:
in which Ji is the number of interactees for individual i. For the indirect genetic effect and the indirect environmental effect the contributions of individual i to the weighted degree of its interactees are calculated:
in which Jj is the number of interactees for individual j. Now the variances for direct additive genetic effects, direct environmental effects, indirect additive genetic effects, the social environment and total genetic effects for the whole population can be calculated as
Heritabilities at the level of the social phenotypes can be calculated directly from the posterior distributions by dividing the additive genetic variance by the additive genetic variance plus the environmental variance for both social tendency and social governance:
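As a worked illustration of this ratio, the sketch below computes a heritability for every posterior draw of the additive genetic and environmental standard deviations (σsa and σse for social tendency; the same function applies to σga and σge) and then summarises the resulting distribution. The draws are made-up numbers for illustration only, not output from the model in this paper, and the function names are hypothetical.

```python
def heritability(sigma_a, sigma_e):
    """Additive genetic variance over additive genetic plus environmental variance."""
    var_a = sigma_a ** 2
    var_e = sigma_e ** 2
    return var_a / (var_a + var_e)


def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    low = int(position)
    high = min(low + 1, len(ordered) - 1)
    fraction = position - low
    return ordered[low] + fraction * (ordered[high] - ordered[low])


# Made-up posterior draws of (sigma_sa, sigma_se); a real fit would have thousands.
draws = [(0.5, 0.9), (0.6, 0.8), (0.4, 1.0), (0.7, 0.7), (0.55, 0.85)]

# Compute the ratio per draw, then summarise, rather than dividing posterior means.
h2_draws = [heritability(sa, se) for sa, se in draws]
h2_median = quantile(h2_draws, 0.5)
h2_interval = (quantile(h2_draws, 0.025), quantile(h2_draws, 0.975))
```

Computing the ratio within each draw keeps the posterior correlation between the two variance components, so the credible interval of the heritability reflects their joint uncertainty.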
The mean weighted degree heritability and total heritability (proportion of the total phenotypic variance attributed to direct and indirect additive genetic variance) can be calculated from the previously calculated variance components:
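The post hoc decomposition described in this section can be sketched for a single posterior iteration. The code below is one reading of the verbal description, since the equations are not reproduced in this text: each individual's additive genetic and environmental effects are weighted by its relative social governance g_i / (g_i + g_j) on every edge, averaged over its own J_i edges for the direct effects, and divided by the interactee's J_j and summed for the indirect effects. The function names, the toy network and the use of the sample variance are assumptions. The final heritability ratios are not computed, because the exact definition of the total phenotypic variance cannot be recovered from the text.

```python
from statistics import variance


def decompose(edges, g, a, e):
    """Direct and indirect contributions to mean weighted degree (logit scale)."""
    n = len(g)
    neighbours = [[] for _ in range(n)]
    for i, j in edges:
        neighbours[i].append(j)
        neighbours[j].append(i)
    parts = {key: [0.0] * n
             for key in ("direct_a", "direct_e", "indirect_a", "indirect_e")}
    for i in range(n):
        for j in neighbours[i]:
            share = g[i] / (g[i] + g[j])  # relative social governance of i
            # Contribution of i to its own mean weighted degree.
            parts["direct_a"][i] += share * a[i] / len(neighbours[i])
            parts["direct_e"][i] += share * e[i] / len(neighbours[i])
            # Contribution of i to the mean weighted degree of interactee j.
            parts["indirect_a"][i] += share * a[i] / len(neighbours[j])
            parts["indirect_e"][i] += share * e[i] / len(neighbours[j])
    return parts


def component_variances(parts):
    """Population variances of the components for one iteration."""
    social_env = [x + y for x, y in zip(parts["indirect_a"], parts["indirect_e"])]
    total_genetic = [x + y for x, y in zip(parts["direct_a"], parts["indirect_a"])]
    return {
        "direct_a": variance(parts["direct_a"]),
        "direct_e": variance(parts["direct_e"]),
        "indirect_a": variance(parts["indirect_a"]),
        "social_environment": variance(social_env),
        "total_genetic": variance(total_genetic),
    }


# Toy example: individual 0 is connected to 1 and 2; equal governance throughout.
edges = [(0, 1), (0, 2)]
parts = decompose(edges, g=[1.0, 1.0, 1.0], a=[2.0, 0.0, -2.0], e=[0.0, 0.0, 0.0])
variances = component_variances(parts)
```

Repeating this for every posterior iteration gives a posterior distribution for each variance component. The decomposition is only valid because mean weighted degree is a linear function of the edge weights on the logit scale; statistics that involve nonlinear transformations would need a different treatment.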

2.3 Empirical example

I illustrate the implementation of this model with a publicly available dataset of bottlenose dolphins Tursiops aduncus. With this dataset collected over 12 years, Wild et al. (2019b) showed that the use of sponges as foraging tools is culturally transmitted from mother to offspring. The authors compared multiple networks including social and genetic networks. Here I use their data for another purpose; I estimate the heritability of social network-derived phenotypes. I use the ‘horizontal’ social network, which contains 22,994 association strengths between 243 individuals and is based on the observation of 4,476 groups (group size range: 1–24, median 2). I use the genetic network as a relatedness matrix and sex and foraging strategy as covariates. Foraging strategy is a binary trait, with individuals being ‘spongers’ when using sponges while foraging or otherwise ‘non-spongers’. First, I built four standard quantitative genetic models predicting logit-transformed weighted degree, based on methods from Zhao et al. (2018): (a) a null model without covariates; (b) a model with sex and foraging strategy as covariates; (c) a model with sex, foraging strategy and the average of the weighted degrees of all interactees as covariates, which is a variation on maternal effects models with a trait-based approach (Kirkpatrick & Lande, 1989) in which the effect of the mother's phenotype is replaced by the mean phenotype of all interactees; and (d) an indirect genetic effects model, in which I account for genetic effects from the five conspecifics with the largest home-range overlap and weigh their contribution to the focal phenotype by their home-range overlap (see Appendix S1). Of these four models, I selected the best fitting one with leave-one-out cross validation using Pareto-smoothed importance sampling (Vehtari et al., 2017), which is an approximation of exact leave-one-out cross validation, in this case at the individual level. Next I use latent variable modelling to estimate the heritability of social tendency and governance. I add sex and foraging strategy as covariates at the node level and whether sex and foraging strategy were the same or differed for interactees at the edge level. This model cannot be compared to the other models with leave-one-out cross validation, because different dependent variables are used. Instead, I compare the variance components and heritability estimates of the best standard model to those of the latent variable model.

2.4 Validation

To test whether the latent variable model is correctly estimating variance components, I performed a simulation study based on the empirical example. Details and code of the simulations can be found in Appendix S1. In short, I simulated social tendency and governance for all individuals by drawing new values for additive genetic and environmental effects under different levels of inheritance (using the relatedness structure of the example). From these, I calculated for all edges in the social network new values for the probability to interact (Equation 6). Next I drew for each of these edges a new number of interactions from a binomial distribution, based on the simulated probabilities to interact and the numbers of opportunities to interact from the example (Equation 3). I produced three simulations for all combinations of heritability of social tendency and heritability of social governance and ran the latent variable model to estimate heritabilities (total of 36 simulations, each for 1 thread, 5,000 iterations of which 4,000 were burn-in).


3.1 Standard models

The best fitting standard quantitative genetic model is the indirect genetic effects model (Appendix S1). Heritability for mean weighted degree was 21.9% (95% credible intervals [CI] = 11.4%–31.5%; Figure 3a), while the total heritable variance (direct plus indirect additive genetic variance) was 67.7% (CI = 50.3%–82.6%; Figure 3a). Mean weighted degree was affected by both sex and foraging strategies; males had a higher weighted degree as did the non-spongers (Figure 4). The direct additive genetic effect correlated strongly with indirect additive genetic effects (ρ = 0.87; CI = 0.67–0.98). This is perhaps not surprising, because dolphins are observed in groups—rather than pairs—resulting in a lot of overlap between the interactees for individuals which have a strong association in the network.

Figure 3. Heritability estimates for (a) mean weighted degree in the best fitting standard model, (b) social tendency and governance in the latent variable model and (c) the proportions of variance in mean weighted degree explained by direct additive genetic, direct environmental, indirect additive genetic and total genetic effects in the latent variable model

Figure 4. Effect sizes for covariates in the best standard quantitative genetic model. The effect for sex indicates the difference for males relative to females and the effect of foraging strategy the difference for spongers relative to non-spongers

3.2 Latent variable model

The latent variable model shows substantial additive genetic variation for social tendency (h² = 28.7%; CI = 6.3%–50.4%; Figure 3b), but not for social governance (h² = 2.9 × 10⁻⁴%; CI = 1.6 × 10⁻⁷%–1.9 × 10⁻³%; Figure 3b). For social tendency, the effects of sex were in the same direction as for weighted degree in the standard model, with males having higher social tendency (Figure 5a). Interestingly, spongers had higher social tendency than non-spongers, which contradicts the findings of the standard model. Females and spongers had higher social governance (Figure 5b). At the level of edges, the probability of interacting was highest between same-sex individuals with the same foraging strategy. Both differing sex and differing foraging strategy decreased the probability of interacting, with the latter having the largest negative effect (Figure 5c). The lower probability of interaction between different foraging strategies might explain why spongers had lower mean weighted degree in the standard model. Spongers were in the minority (17.7%), meaning that their mean weighted degree would be disproportionately affected by edge weights between differing foraging strategies, which were lower than for edges between same foraging strategies.
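The composition argument in the last two sentences can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that every individual is equally likely to be paired with any other individual and that edge weights depend only on whether the two foraging strategies match; the two edge weights are hypothetical and only the 17.7% share of spongers comes from the text.

```python
def expected_mean_edge_weight(own_group_share, weight_same, weight_differing):
    """Expected edge weight to a random partner, given the share of partners
    that use the same foraging strategy as the focal individual."""
    return own_group_share * weight_same + (1.0 - own_group_share) * weight_differing


SPONGER_SHARE = 0.177      # proportion of spongers, from the text
WEIGHT_SAME = 0.10         # hypothetical edge weight, same strategy
WEIGHT_DIFFERING = 0.04    # hypothetical edge weight, differing strategy

sponger = expected_mean_edge_weight(SPONGER_SHARE, WEIGHT_SAME, WEIGHT_DIFFERING)
non_sponger = expected_mean_edge_weight(1.0 - SPONGER_SHARE, WEIGHT_SAME, WEIGHT_DIFFERING)

print(f"sponger: {sponger:.4f}, non-sponger: {non_sponger:.4f}")
# prints "sponger: 0.0506, non-sponger: 0.0894"
```

Even though both groups have identical underlying weights in this toy setting, the minority group ends up with the lower mean edge weight because most of its potential partners use the other strategy.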

Figure 5. Effect sizes for covariates in the latent variable quantitative genetic model for (a) social tendency, (b) social governance and (c) edges. In (a) and (b) the effect for sex indicates the difference for males relative to females and the effect of foraging strategy the difference for spongers relative to non-spongers. In (c) each combination of same or differing sexes and same or differing foraging strategies is plotted

3.3 Social environment and indirect genetic effects

Indirect effects on social tendency or weighted degree on the logit scale are substantial; the social environment accounted for 47.5% (CI = 45.3%–49.3%) of the variation. Direct heritability was relatively low (CI = 4.2%–23.6%), as was the indirect heritability (CI = 4.1%–22.3%; Figure 3c). Total additive genetic variance was very similar to the additive genetic variance of social tendency (τ² = 28.7%; CI = 8.4%–45.7%). Together with the fact that the social environment accounts for about half the total variance, this indicates that in this example population structure did not have much effect on the variance estimates.

3.4 Validation

For social tendency, the latent variable model estimated heritability quite accurately: the median estimates are close to the simulated values, which all lie well within the 95% credible intervals, though the posterior distributions are quite wide (Figure 6). For social governance the model estimated heritability well for a heritability of zero and for high heritability (0.8; Figure 6a,b,c,j,k,l), but not for intermediate heritabilities (0.3 and 0.6). For intermediate heritabilities the model is overly conservative, estimating heritability to be close to 0 (Figure 6d,e,f,g,h,i).
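The comparison between simulated and estimated heritabilities amounts to summarising each posterior by its median and central 95% interval and checking whether the simulated value is covered. A minimal sketch follows; the helper names and the evenly spaced stand-in "posterior" are hypothetical, and in practice the draws would come from the fitted Stan model.

```python
import statistics


def summarise_posterior(draws):
    """Median and central 95% interval of a list of posterior draws."""
    # With n=40 the cut points fall at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return statistics.median(draws), cuts[0], cuts[-1]


def covers(draws, simulated_value):
    """True if the simulated value lies within the central 95% interval."""
    _, lower, upper = summarise_posterior(draws)
    return lower <= simulated_value <= upper


# Stand-in for posterior draws of a heritability (evenly spaced on [0, 1]).
draws = [i / 1000 for i in range(1001)]
median, lower, upper = summarise_posterior(draws)
```

A posterior can cover the simulated value and still be uninformative if the interval is very wide, so coverage is best read alongside the interval width.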

Figure 6. Posterior distributions for heritability of social tendency (x-axes) and social governance (y-axes) for validation simulations. Lighter colour means higher density. Red dashed lines indicate the anticipated heritabilities, while the red crosses indicate the achieved heritabilities for the three replicates per heritability combination. Distributions are the combined posterior distributions of the three replicates


4 Discussion

Here I showed that latent variable modelling can be used to estimate heritability and indirect genetic effects for phenotypes derived from social networks. Unlike standard quantitative genetic models, this approach accounts for dependencies between individuals' network positions and for their uncertainties. Rather than taking node-specific statistics as dependent variables, edge weights were used. The relative contributions of individuals to these edge weights were estimated and used for estimating indirect effects, such as indirect genetic effects and social environment effects. This method takes indirect effects into account and, if necessary, spatial and temporal effects as well. I show with an example that, because the model estimates different aspects of social behaviour, further biological insights are gained. Below I summarise the advantages and disadvantages of quantitative genetic latent variable models in particular and of the Bayesian modelling framework in general, and finally give some recommendations for using this methodology.

4.1 Latent variable modelling

As pointed out before, latent variable modelling accounts for some important dependencies in social network data, allows indirect effects to be estimated and provides additional insights into the underlying biological phenomena. Here I estimated indirect effects for mean weighted degree as social phenotype. However, other descriptors of network position might better capture the social phenotype of interest. For instance, modelling centrality indices would give insight into the genetic basis of the role individuals can play in the spread of information or disease. Estimating indirect effects for social phenotypes other than mean weighted degree is feasible, but the assumption that components (genetic, environmental, social environmental effects etc.) are additive restricts this method to social network measures that do not require nonlinear transformations; this excludes, for example, the local clustering coefficient (Barrat et al., 2004) and disparity (Whitehead, 2008, p. 175). It must also be noted that the chosen structure of the models comes with assumptions about the underlying biological mechanisms, but by cross-validating various models, alternative hypotheses can be explored. A drawback of using the more complex latent variable models over standard quantitative genetic models is of course the higher number of degrees of freedom taken by the model. As a consequence, more data are needed to get reliable parameter estimates. Because of the many dependencies in social network data, the predictive capabilities of the models do not increase linearly with more data (Farine & Strandburg-Peshkin, 2015; Sánchez-Tójar et al., 2018). With some assumptions, models can be simplified in case the information carried by the data is not sufficient. For instance, by assuming that all individuals contribute equally to the edge weight, the social governance for all individuals can be set to one, which substantially reduces the number of degrees of freedom taken by the model. As the validation simulations show, genetic effects for social governance are difficult to estimate in the example, which would argue for simplifying the model. Alternatively, informative priors can be used to restrict parameter space to sensible values and thereby improve model convergence (Lemoine, 2019).

4.2 Heritability

With the breeder's equation, heritability can be used to predict a population's response to selection (Walsh & Lynch, 2018). Indirect genetic effects complicate these predictions as they provide an alternative route of inheritance (Wolf et al., 1998). This implies that for estimating the response to selection not only direct additive genetic variance, but also indirect additive genetic variance and their covariance should be accounted for (Bijma et al., 2007). In the case of the latent variable model, heritability is estimated at the level of social phenotypes. Heritability can therefore not be used directly to predict changes due to selection in social network structure and the position of nodes. However, since all parameters of the latent variable model can be stored in Stan, these changes can be estimated by simulating social networks after a selection event. For instance, by imposing differential fitness effects on edge weights, the distribution of social phenotypes in the population will be modified and the consequences for the social network structure can be monitored with network statistics.
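For reference, the two textbook relationships referred to above can be written out as follows. The second expression is the form given by Bijma et al. (2007) for groups of a fixed size n; that fixed group size is an assumption of this form, and in a social network, where the number of interactees varies among individuals, the corresponding expression may differ.

```latex
% Breeder's equation: response to selection R from heritability h^2
% and selection differential S.
R = h^{2} S

% Total heritable variance with direct (D) and indirect, or social (S),
% additive genetic effects in groups of fixed size n (Bijma et al., 2007),
% and its ratio to the phenotypic variance.
\sigma^{2}_{\mathrm{TBV}} = \sigma^{2}_{A_D} + 2(n-1)\,\sigma_{A_{DS}} + (n-1)^{2}\,\sigma^{2}_{A_S},
\qquad
\tau^{2} = \frac{\sigma^{2}_{\mathrm{TBV}}}{\sigma^{2}_{P}}
```

Because the indirect variance and the covariance are scaled by the number of interactees, the total heritable variance can exceed the direct additive genetic variance considerably, which is why the ratio can be much larger than the direct heritability.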

4.3 Indirect genetic and common environment effects

The presence of indirect genetic effects for social network traits is not surprising; as stated in the introduction, social network traits are measured within and shaped by a specific social context (Fuller & Hahn, 1976). By definition, interactees affect social network measures, but whether individuals truly affect each other's underlying social phenotypes remains to be explored. What can be tested is whether prior social interactions affect the current social phenotypes of individuals, making indirect genetic effects on social interactions a potential route for inheritance. In various animal species, the social environment in early life affects social phenotypes later in life (e.g. McDonald, 2007; Shimada & Sueur, 2018; Van Den Berg et al., 1999). Modelling the effects of prior social interactions on the social phenotype can be done by implementing indirect genetic effects in the quantitative genetic part of the latent variable model. In that case, indirect genetic effects should be estimated not from the social network for which the underlying social phenotypes are estimated, but from a different social network, for instance that of a previous year or from early life. Similarly, common environment effects can be accounted for by assuming spatial autocorrelation. Rather than drawing the environmental effects on the social phenotypes independently from a normal distribution, one can assume correlations based on, for instance, territory overlap. Most types of social interactions are, however, based on spatial proximity, and correcting for spatial autocorrelation therefore potentially affects the estimates of variation in social phenotypes.

4.4 Modelling framework

Using a flexible inference library like Stan has some clear benefits. It uses efficient algorithms for exploring parameter space and the model structure can be tailored to specifically suit the study (McElreath, 2016; de Villemereuil, 2019). Here I present an example estimating latent social phenotypes from dyads, but interactions can, and often will, involve more than two individuals. To analyse interactions of groups larger than two, Equation 1 can be modified by adding more individuals. If necessary, Equation 1 can be replaced by an equation which better reflects the biology of the study system. For example, Mueller et al. (2013) studied social learning of migration routes in cranes and found that deviations from a straight path were not genetically determined, but depended on the age of the oldest individual in the group. Analysing these data with the latent variable approach is possible by introducing all group members into Equation 1 (allowing for variable group sizes) and estimating social governance as a function of experience (or age). Similarly, directed networks, in which social interactions are directed from one individual towards another (e.g. aggression, food-sharing, grooming), can also be implemented by modifying Equation 1. In the presented model for undirected networks, the social phenotypes (social tendency and social governance) are the same for both interactees, but for directed networks different social phenotypes for the initiators and receivers of the social interactions can be estimated. For instance, rather than social tendency, social phenotypes could be attacking and victimizing or giving and receiving food. Edge weight would be a function of social phenotypes for the initiator (e.g. aggressiveness or probability to share food) and social phenotypes for the receiver (e.g. probability of being victimized and probability to accept food). For all individuals, social phenotypes as an initiator and receiver will be estimated based on their interactions and the interactions of their kin.
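The two extensions described above, groups of variable size and directed interactions, can be illustrated with a short sketch. The functional forms are assumptions made here for illustration (a governance-weighted mean of tendencies on the logit scale for groups, and a sum of an initiator and a receiver phenotype on the logit scale for directed edges); they are not Equation 1 itself, and all function names and trait values are hypothetical.

```python
import math


def inv_logit(x):
    """Map a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def group_association(tendencies, governances):
    """ASSUMED form: governance-weighted mean of the social tendencies of all
    group members (any group size), transformed to a probability."""
    weights = [math.exp(g) for g in governances]
    weighted_mean = sum(w * t for w, t in zip(weights, tendencies)) / sum(weights)
    return inv_logit(weighted_mean)


def directed_edge_probability(initiator_trait, receiver_trait):
    """ASSUMED form: initiator and receiver phenotypes add on the logit scale."""
    return inv_logit(initiator_trait + receiver_trait)


# Hypothetical phenotypes for two individuals: (as initiator, as receiver).
traits = {"i": (1.0, -0.5), "j": (-1.0, 0.2)}
p_i_to_j = directed_edge_probability(traits["i"][0], traits["j"][1])
p_j_to_i = directed_edge_probability(traits["j"][0], traits["i"][1])

# A group of three with equal governance reduces to the plain mean tendency.
p_group = group_association([0.5, -0.5, 0.3], [0.0, 0.0, 0.0])
```

In the directed case the two probabilities differ because each individual carries separate initiator and receiver phenotypes, whereas in the undirected case a single probability describes the dyad or group.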

4.5 Recommendations

When applied to social phenotypes from networks, quantitative genetic latent variable models offer solutions for some critical violations inherent in standard quantitative genetic models for social network data. Although latent variable models are data hungry, the example shows that they can be applied to long-term datasets. Some suggestions for model simplifications are discussed in this paper. Example code can be found in Appendix S1 (for latent variable models) and elsewhere (for linear mixed modelling: McElreath, 2016; for quantitative genetic models: Zhao et al., 2018) and can be adjusted to study-specific needs.


I am grateful to Ben Sheldon's Social Network Group (supported by ERC grant AdG 250164) and Colin Garroway for discussions on the methodology and their comments on the manuscript. I thank Julien Martin, Alfredo Sánchez-Tójar, Sebastian Sosa and Alastair Wilson for their constructive comments. The computations were in part performed on computer resources provided by the Swedish National Infrastructure for Computing (SNIC) at Lunarc at Lund University and High Performance Computing Cluster Anunna at Wageningen University & Research.


The code used here is available in Appendix S1. The data were previously published and are available in the Dryad Digital Repository https://doi.org/10.5061/dryad.sc26m6c (Wild et al., 2019a). Code and data objects used for analyses can be found in the Open Science Framework https://doi.org/10.17605/osf.io/kzcgv.