Estimating heritability of social phenotypes from social networks
Abstract
- To understand how social behaviour evolves and responds to selection, we need accurate estimates of heritability from quantitative genetic models. Recently, such models have increasingly used node-specific statistics from social networks as social phenotypes. However, parameter estimation can be problematic because social phenotypes are not independent observations, and standard models tend to ignore the uncertainty around their estimates.
- Here I present a framework that uses latent variable modelling to account for these dependencies and uncertainties. I use edge weights, rather than node-specific network statistics, as dependent variables. From these edge weights, two types of latent (i.e. unobserved) phenotypes are estimated: the individual tendency to be social (i.e. social tendency) and the relative contribution to associations (i.e. social governance). Effects of the social environment and indirect genetic effects are accounted for in the model and can be estimated post hoc. If edge weights are a proportion (e.g. the simple ratio index), their uncertainty can be accounted for by a binomial sampling process.
- I illustrate this method in Stan, a flexible Bayesian inference library, using a publicly available dataset on bottlenose dolphin networks. This method not only accounts for dependencies and uncertainties but also illuminates aspects of social evolution that are not observed with standard quantitative genetic models. For instance, an indirect genetic effects model predicts heritable variation in sociality (21.9%), whereas latent variable modelling shows heritability of social tendency (28.7%) but not of social governance (0.0%). Covariates at different levels of the model (edge and node level) highlight differences in sociality between foraging strategies and between the sexes.
- This example shows that not properly accounting for the assumptions underlying the use of social network statistics can lead to misleading conclusions. Although some violations of model assumptions are less common, others are inherent to the study of (semi)wild populations. The presented framework offers solutions for some critical assumptions and is a flexible tool that can be further developed and tailored to the needs of specific studies to ensure a proper fit to the study system.
1 INTRODUCTION
Most animals regularly engage in social interactions (Krause & Ruxton, 2002). These interactions can have profound consequences, as they may shape phenotypes and affect survival and reproduction (Frank, 1988; Wolf et al., 1998). In recent years, social network analysis has been successfully applied to animal populations, largely owing to technological advances in animal tracking (Krause et al., 2014). Typically, the nodes of the network represent individuals and the edges are a dyadic measure of social behaviour between individuals, such as association strength. From these networks, statistics can be derived that quantify aspects of the whole network, of specific clusters within the network, or of edges or nodes (Croft et al., 2008; Whitehead, 2008). Node-specific statistics quantify aspects of individual social phenotypes and can be used to describe phenotypic variation in sociality within populations (Wey et al., 2008; Wilson et al., 2012). In many species, individuals consistently differ in network-derived social phenotypes (e.g. birds: Aplin et al., 2015; Hillemann et al., 2019; Plaza et al., 2019; mammals: Blaszczyk, 2018; fish: Jacoby et al., 2014; Krause et al., 2017; insects: Formica et al., 2017). This raises the question of whether social phenotypes have a genetic basis. Given the current interest in understanding how social behaviour evolved and is maintained (Ward & Webster, 2016), there is a drive to estimate and understand the genetic variance and heritability of these social phenotypes.
A critical issue with quantitative genetic analyses of social behaviour is the lack of independence of observations: social behaviour is only expressed in the presence of other individuals (Fuller & Hahn, 1976; Moore et al., 1997). This has two major consequences. First, a behaviour expressed by a focal individual is also affected by the phenotype or identity of the interactee. Heritability can therefore operate via both a direct and an indirect route, and indirect genetic effects are expected to be the norm (Moore et al., 1997). If these are not accounted for, quantitative genetic models will fail to estimate heritability properly and will give misleading insight into the importance of the social environment (Bijma et al., 2007). Second, the measured social behaviour is an expression of the social phenotypes of both individuals. It is therefore not exclusively a phenotypic measurement of the focal individual, but also, to some degree, of its interactees (Fuller & Hahn, 1976). A non-random subset of interactees will therefore bias the social phenotypes estimated from network edges. Other dependencies are also important. For instance, social interactions require individuals to be close in space, leading to correlations between the spatial distribution and the perceived social behaviour of individuals (Radersma et al., 2017). This is particularly problematic if genotypes are not randomly distributed in space, as occurs with limited natal dispersal, territoriality or in any kin-structured population.
Various studies have attempted to estimate genetic variation in, and the heritability of, social phenotypes derived from social networks. Studies of humans (Fowler et al., 2009), yellow-bellied marmots (Marmota flaviventris; Lea et al., 2010) and rhesus macaques (Macaca mulatta; Brent et al., 2013) show considerable heritability for some social network metrics, describing both affiliative (average heritability ranging from 0.11 to 0.84) and aggressive (average heritability ranging from 0.11 to 0.66) behaviours. The human study is relatively little affected by the above-mentioned dependency issues. The authors used friendship networks that were directed and based on listings constructed outside the social context, which reduces the risk that observations are biased by the expression of others' phenotypes. However, real indirect effects might still be present. The studies of marmots and macaques potentially suffer from dependency issues. In both studies, group composition was not random (relatedness was higher within social groups; Brent et al., 2013; Wey & Blumstein, 2010). This made interactees a non-random subset of the population, which creates a risk of overestimating heritability. Overestimation can occur because related individuals experience a similar social environment (namely, their kin; see Appendix S1, pp. 42–47 for a simulation demonstrating this), a phenomenon that has been reported for individuals sharing their physical environment (e.g. Stopher et al., 2012; Van Der Jeugd & McCleery, 2002). The marmot and macaque studies estimated heritability with quantitative genetic models that did not account for indirect genetic effects. They did, however, partly account for non-random interactees by including a group-specific random effect in the models (both studies were based on multiple networks for different social groups).
Though not yet implemented for social phenotypes from networks, alternative modelling techniques are available to account for indirect effects. One class of quantitative genetic models accounts for the social environment and indirect genetic effects for fixed numbers of interactees and allows all individuals within one group to affect each other to the same degree (Bijma, 2014). These models are particularly useful in experimental settings, for instance when animals are kept in pens or cages. The phenotypes of interactees are introduced into the models as covariates. Parallels can also be drawn with some maternal, paternal and parental effects models (Kirkpatrick & Lande, 1989), in which the phenotype of the mother, the father or both parents is added to the model as a covariate (McAdam et al., 2014). Rather than assuming equal indirect genetic contributions from all interactees (or group members), the contributions can also be weighted. For instance, in trees (Cappa & Cantet, 2008; Costa E Silva et al., 2013) and territorial mammals (Fisher et al., 2019), the indirect genetic effects of a fixed number of neighbours were weighted by their distance, with neighbours closer in space assumed to have a larger effect on the phenotype of the focal individual. Instead of a spatial distance matrix, a social network matrix can also be used. In that case, the social network is used to understand how the social environment affects a particular trait. This differs from the focus of this paper, which is estimating the contributions of genes and the environment to the social traits that structure the network. Moore et al. (1997) introduced models with reciprocal effects between the phenotype of the focal individual and its interactees, which are more appropriate for social phenotypes from networks. They suggested that, with multiple and variable numbers of interactees, each interaction can be treated as a repeated measure.
These models can be very difficult to fit for social phenotypes from networks, due to the highly correlative nature of neighbouring node-specific statistics (see Appendix S1).
Here I show how latent variable modelling can be used to estimate genetic variation in, and the heritability of, social phenotypes while accounting for the dependencies and uncertainties mentioned above. In latent variable modelling, observed data are used to estimate underlying unobserved (i.e. latent) variables. In this case, social interactions are used to infer the underlying social traits of the interacting individuals. This relies on an assumed relationship between the measured interactions and the inferred social traits. Because individuals interact with multiple conspecifics, the contribution of each individual to an interaction can be estimated, so the contributions of the two interactees need not be assumed equal. I first present the basic latent variable model and then discuss extensions of the model that account for edge- and node-specific covariates and for spatial and temporal effects. I illustrate this modelling framework with an example analysis in Stan, a Bayesian general-purpose inference library (Carpenter et al., 2017), using a publicly available dataset on bottlenose dolphins (Wild et al., 2019a). I also compare latent variable modelling with standard quantitative genetic models. This example shows how heritability and genetic variation in unobserved social phenotypes are estimated. It also shows how these social phenotypes relate to node-specific network metrics, indirect genetic effects and other effects of the social environment. Finally, the example showcases how latent variable models provide additional insights into social behaviour, such as sex-specific social behaviour, which cannot be elucidated with standard quantitative genetic models.
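The assumed relationship between observed interactions and latent traits can be made concrete with a short sketch. The link used below is an illustrative assumption, not a reproduction of the paper's Equation 1. The logit probability that two individuals are seen together is taken to be the mean of their social tendencies, weighted by their social governance. Observed association counts then follow a binomial distribution, with k associations out of n opportunities. The names `interaction_prob` and `log_likelihood` are hypothetical.

```python
import math


def interaction_prob(tau_i, tau_j, gamma_i, gamma_j):
    """Hypothetical link: the logit probability that i and j are seen together is
    the governance-weighted mean of their social tendencies (gammas must be > 0)."""
    eta = (gamma_i * tau_i + gamma_j * tau_j) / (gamma_i + gamma_j)
    return 1.0 / (1.0 + math.exp(-eta))


def log_likelihood(tau, gamma, dyads):
    """Binomial log-likelihood of observed association counts.

    tau, gamma: latent social tendency and social governance per individual.
    dyads: iterable of (i, j, k, n), meaning i and j were seen together k times
           out of n opportunities (e.g. the numerator and denominator of the SRI).
    """
    total = 0.0
    for i, j, k, n in dyads:
        p = interaction_prob(tau[i], tau[j], gamma[i], gamma[j])
        total += math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)
    return total
```

Setting every governance value to one reduces the link to the plain mean of the two tendencies. This matches the simplification for data-poor settings discussed later. In a Bayesian fit, this likelihood would be combined with priors on the latent traits and their genetic and environmental components.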
2 MATERIALS AND METHODS
In animal social networks, nodes represent individuals and edges represent associations between them. Although other metrics are conceivable, edges often represent a ratio between the number of instances two individuals interacted and the number of instances either one or both individuals were observed. Different variations of these ratios exist, of which the simple ratio index (SRI) and half-weight index (HWI) are the most commonly used. For more information and considerations on which index to use, see Croft et al. (2008), Whitehead (2008) or Hoppitt and Farine (2018). Edges between different pairs of individuals are assumed to be independent of each other. This assumption is typically violated, however, because social interactions are often also affected by others in close proximity (see Discussion). From these edge weights, node-specific network measures can be calculated and used as social phenotypes (Figure 1a; Farine & Whitehead, 2015; Whitehead, 2008). These social phenotypes can be treated as dependent variables in quantitative genetic models to estimate their heritabilities (Figure 1b). Indirect genetic effects can be accounted for when the number of interactees is small (Moore et al., 1997) or fixed (Bijma, 2014) and all interactees are representative of the whole population (Fuller & Hahn, 1976). Maternal or parental effects models are good examples of this (McAdam et al., 2014), as are models that account for the effects of cage or pen mates (Bijma, 2014). With large numbers of interactees, these methods become impractical. Other dependencies, such as spatial effects, and the uncertainty in edge weights should also be accounted for.
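The two indices can be sketched from "gambit of the group" sightings. Here x is the number of sightings with both individuals together, and y_A and y_B are the numbers of sightings with only one of them. The sketch assumes each sighting is its own sampling period, so the count y_AB (both seen in the same period but not together) is zero. Under that assumption, SRI = x / (x + y_A + y_B) and HWI = x / (x + (y_A + y_B) / 2). The function name is illustrative.

```python
from itertools import combinations


def association_indices(groups):
    """SRI and HWI for every pair of individuals.

    groups: list of sets, each holding the IDs seen together in one sighting.
    Assumes every sighting is its own sampling period, so y_AB = 0.
    Returns {(a, b): (sri, hwi)} with a < b.
    """
    ids = sorted(set().union(*groups))
    indices = {}
    for a, b in combinations(ids, 2):
        x = sum(1 for g in groups if a in g and b in g)        # seen together
        y_a = sum(1 for g in groups if a in g and b not in g)  # a seen without b
        y_b = sum(1 for g in groups if b in g and a not in g)  # b seen without a
        sri = x / (x + y_a + y_b)
        hwi = x / (x + 0.5 * (y_a + y_b))
        indices[(a, b)] = (sri, hwi)
    return indices


sightings = [{"a", "b"}, {"a"}, {"b", "c"}, {"a", "b", "c"}]
for pair, (sri, hwi) in association_indices(sightings).items():
    print(pair, round(sri, 3), round(hwi, 3))
```

The SRI numerator and denominator (x and x + y_A + y_B) can also serve as the successes and trials when the uncertainty of an edge weight is modelled with a binomial sampling process.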
2.1 Latent variable modelling
2.2 Indirect genetic and social environment effects
Edge weights are decomposed into the social phenotypes of two individuals, which are in turn decomposed into genetic and environmental contributions. The model therefore gives no immediate estimates of indirect effects, such as the effects of the social environment and indirect genetic effects on the social phenotypes. To gain insight into the level of influence on others' social behaviour, and to estimate the relative contribution of indirect genetic effects to phenotypes, I make use of the additive nature of the components into which edge weights were decomposed. First, I take all edges of a focal individual. I treat its own genetic and environmental contributions to the edge weights as direct effects, and its contributions to the social phenotypes of the interactees as indirect effects. Next, I calculate node-specific statistics for each component separately to estimate their relative contributions. Here I calculate the contributions for mean weighted degree, because it relates clearly to the edge weights (Figure 2). It is also one of the simplest and most biologically straightforward statistics, and many social network metrics are highly correlated with it. Other node-specific statistics (such as centrality or disparity statistics) can be calculated as well, but care must be taken when these statistics involve nonlinear transformations. Another important consideration is that edge weight is transformed to the logit scale. This eases further analysis, because the distribution of edge weights becomes approximately normal, but it has consequences for the interpretation of the results.
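The bookkeeping behind this decomposition can be sketched in a few lines. The sketch rests on two assumptions. First, each individual's contribution to a logit edge weight does not depend on its partner, as would hold if all individuals had equal social governance. Second, the logit weight of a dyad is the sum of both individuals' contributions over components such as "genetic" and "environment". Under these assumptions, the mean logit weight of a focal individual's edges splits exactly into two parts: its own contribution (direct) and the average contribution of its interactees (indirect). The function names are hypothetical.

```python
from statistics import pvariance


def mean_degree_components(contrib, edges):
    """Split each individual's mean logit edge weight into direct and indirect parts.

    contrib: dict mapping a component name (e.g. "genetic") to a list with every
             individual's contribution to the logit edge weights.
    edges:   list of (i, j) dyads; every individual must appear in at least one.
    """
    n = len(next(iter(contrib.values())))
    partners = [[] for _ in range(n)]
    for i, j in edges:
        partners[i].append(j)
        partners[j].append(i)
    parts = {}
    for name, values in contrib.items():
        # Direct: the focal individual's own contribution, identical on all its edges.
        parts[name + "_direct"] = list(values)
        # Indirect: the average contribution of the focal individual's interactees.
        parts[name + "_indirect"] = [
            sum(values[j] for j in partners[i]) / len(partners[i]) for i in range(n)
        ]
    return parts


def variance_shares(parts):
    """Variance of each part relative to the variance of their sum
    (covariances between parts are not apportioned)."""
    n = len(next(iter(parts.values())))
    total = [sum(p[i] for p in parts.values()) for i in range(n)]
    v = pvariance(total)
    return {name: pvariance(p) / v for name, p in parts.items()}


parts = mean_degree_components(
    {"genetic": [1.0, 2.0, 3.0], "environment": [0.0, 0.0, 0.0]},
    [(0, 1), (1, 2), (0, 2)],
)
shares = variance_shares(parts)
```

In this tiny, fully connected example, the direct and indirect parts are negatively correlated, so their variance shares do not sum to one. Covariances between parts would need to be reported separately.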
2.3 Empirical example
I illustrate the implementation of this model with a publicly available dataset of bottlenose dolphins (Tursiops aduncus). With this dataset, collected over 12 years, Wild et al. (2019b) showed that the use of sponges as foraging tools is culturally transmitted from mother to offspring. The authors compared multiple networks, including social and genetic networks. Here I use their data for another purpose: to estimate the heritability of social network-derived phenotypes. I use the "horizontal" social network, which contains 22,994 association strengths between 243 individuals and is based on the observation of 4,476 groups (group size range: 1–24, median 2). I use the genetic network as a relatedness matrix, and sex and foraging strategy as covariates. Foraging strategy is a binary trait: individuals are "spongers" if they use sponges while foraging and "non-spongers" otherwise. First, I built four standard quantitative genetic models predicting logit-transformed weighted degree, based on methods from Zhao et al. (2018):
(a) A null model without covariates.
(b) A model with sex and foraging strategy as covariates.
(c) A model with sex, foraging strategy and the average weighted degree of all interactees as covariates. This model is a variation on maternal effects models with a trait-based approach (Kirkpatrick & Lande, 1989), in which the effect of the mother's phenotype is replaced by the mean phenotype of all interactees.
(d) An indirect genetic effects model, which accounts for genetic effects from the five conspecifics with the largest home-range overlap and weights their contribution to the focal phenotype by their home-range overlap (see Appendix S1).
Of these four models, I selected the best-fitting one with leave-one-out cross-validation using Pareto-smoothed importance sampling (Vehtari et al., 2017). This is an approximation of exact leave-one-out cross-validation, applied here at the individual level.
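The response and covariates of the standard models can be sketched as follows, under three stated assumptions:

- Mean weighted degree is the mean edge weight to all other individuals in the network, with absent dyads counting as zero.
- The "interactees" in model (c) are the individuals with a non-zero edge weight.
- The weights in model (d) are the raw home-range overlap values.

The function names are hypothetical. The analysis itself used the methods of Zhao et al. (2018) and the code in Appendix S1.

```python
import math


def logit(p):
    """Log-odds of a proportion; math.log raises ValueError if p is 0."""
    return math.log(p / (1.0 - p))


def standard_model_inputs(weights, ids):
    """Response and interactee covariate for the standard models.

    weights: dict mapping frozenset({a, b}) to the association index of that dyad;
             dyads missing from the dict have weight 0.
    ids:     all individuals in the network.
    Returns (logit mean weighted degree, mean weighted degree of interactees),
    each as a dict keyed by individual.
    """
    def w(a, b):
        return weights.get(frozenset((a, b)), 0.0)

    # Mean weighted degree: mean edge weight to all other individuals (assumption).
    mwd = {a: sum(w(a, b) for b in ids if b != a) / (len(ids) - 1) for a in ids}
    response = {a: logit(mwd[a]) for a in ids}

    covariate = {}
    for a in ids:
        partners = [b for b in ids if b != a and w(a, b) > 0]  # interactees (assumption)
        covariate[a] = sum(mwd[b] for b in partners) / len(partners) if partners else math.nan
    return response, covariate


def top_overlap_neighbours(overlap, focal, k=5):
    """The k individuals with the largest home-range overlap with `focal`, paired
    with that overlap, which is used here as the weight of their indirect effect."""
    others = [(b, o) for b, o in overlap[focal].items() if b != focal]
    return sorted(others, key=lambda pair: pair[1], reverse=True)[:k]
```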
Next, I use latent variable modelling to estimate the heritability of social tendency and social governance. At the node level, I add sex and foraging strategy as covariates. At the edge level, I add whether the two interactees had the same or a different sex and the same or a different foraging strategy. This model cannot be compared with the other models using leave-one-out cross-validation, because it uses a different dependent variable. Instead, I compare the variance components and heritability estimates of the best standard model with those of the latent variable model.
2.4 Validation
To test whether the latent variable model correctly estimates variance components, I performed a simulation study based on the empirical example. Details and code of the simulations can be found in Appendix S1. In short, I simulated social tendency and governance for all individuals by drawing new values for additive genetic and environmental effects under different levels of heritability, using the relatedness structure of the example. From these, I calculated new values of the probability to interact for all edges in the social network (Equation 6). Next, for each of these edges, I drew a new number of interactions from a binomial distribution, based on the simulated probabilities to interact and the numbers of opportunities to interact from the example (Equation 3). I produced three simulations for each combination of heritability of social tendency and heritability of social governance, and ran the latent variable model to estimate heritabilities (36 simulations in total, each run on 1 thread for 5,000 iterations, of which 4,000 were burn-in).
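A compressed sketch of one simulation replicate follows. It makes three assumptions that are not taken from the paper:

- Each latent trait is the sum of an additive genetic effect and an independent environmental effect. The genetic effect is drawn with covariance proportional to a relatedness matrix, via a Cholesky factor.
- Social governance is kept positive by exponentiating its latent value.
- The probability to interact uses a governance-weighted mean of social tendencies on the logit scale, as a stand-in for Equation 6.

The relatedness matrix must be positive definite; an empirical genetic network may need a small adjustment to satisfy this. All names and values are illustrative.

```python
import math
import random


def cholesky(m):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)  # ValueError if the pivot is negative
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L


def simulate_trait(relatedness, h2, total_var, rng):
    """Latent trait = additive genetic effect + environmental effect.

    Genetic effects have covariance h2 * total_var * relatedness; environmental
    effects are independent with variance (1 - h2) * total_var.
    """
    n = len(relatedness)
    L = cholesky(relatedness)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    sd_a = math.sqrt(h2 * total_var)
    sd_e = math.sqrt((1.0 - h2) * total_var)
    a = [sd_a * sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
    e = [rng.gauss(0.0, sd_e) for _ in range(n)]
    return [ai + ei for ai, ei in zip(a, e)]


def simulate_counts(tau, gamma, opportunities, rng):
    """Binomial number of associations for each dyad, given its number of opportunities."""
    counts = {}
    for (i, j), n in opportunities.items():
        # Hypothetical link: governance-weighted mean of the two social tendencies.
        eta = (gamma[i] * tau[i] + gamma[j] * tau[j]) / (gamma[i] + gamma[j])
        p = 1.0 / (1.0 + math.exp(-eta))
        counts[(i, j)] = sum(rng.random() < p for _ in range(n))
    return counts


# Toy relatedness matrix: two full siblings and one unrelated individual.
rng = random.Random(1)
A = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
tau = simulate_trait(A, h2=0.3, total_var=1.0, rng=rng)
# Governance must be positive, so it is simulated on the log scale (an assumption).
gamma = [math.exp(g) for g in simulate_trait(A, h2=0.0, total_var=0.25, rng=rng)]
opportunities = {(0, 1): 20, (0, 2): 15, (1, 2): 30}
counts = simulate_counts(tau, gamma, opportunities, rng)
```

Repeating this for every combination of heritability values, and refitting the latent variable model to each simulated set of counts, yields the kind of validation described above.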
3 RESULTS
3.1 Standard models
The best-fitting standard quantitative genetic model was the indirect genetic effects model (Appendix S1). Heritability of mean weighted degree was 21.9% (95% credible interval [CI] = 11.4%–31.5%; Figure 3a), while the total heritable variance (direct plus indirect additive genetic variance) was 67.7% (CI = 50.3%–82.6%; Figure 3a). Mean weighted degree was affected by both sex and foraging strategy: males and non-spongers had a higher weighted degree (Figure 4). The direct additive genetic effects correlated strongly with the indirect additive genetic effects (ρ = 0.87; CI = 0.67–0.98). This is perhaps not surprising. Dolphins are observed in groups rather than pairs, so individuals with a strong association in the network share many of their interactees.
3.2 Latent variable model
The latent variable model shows substantial additive genetic variation for social tendency (h² = 28.7%; CI = 6.3%–50.4%; Figure 3b), but not for social governance (h² = 2.9 × 10⁻⁴%; CI = 1.6 × 10⁻⁷%–1.9 × 10⁻³%; Figure 3b). For social tendency, the effect of sex was in the same direction as for weighted degree in the standard model, with males having higher social tendency (Figure 5a). Interestingly, spongers had higher social tendency than non-spongers, which contradicts the findings of the standard model. Females and spongers had higher social governance (Figure 5b). At the edge level, the probability of interacting was highest between individuals of the same sex with the same foraging strategy. Both a difference in sex and a difference in foraging strategy decreased the probability of interacting, with the latter having the largest negative effect (Figure 5c). This lower probability of interaction between foraging strategies might explain why spongers had a lower weighted degree in the standard model. Spongers were in the minority (17.7%). Their mean weighted degree would therefore be disproportionately affected by edges between differing foraging strategies, which had lower weights than edges between individuals with the same strategy.
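The minority argument can be checked with simple arithmetic. The sketch below makes a hypothetical assumption: all individuals have the same social tendency, and every dyad has edge weight W_SAME when both individuals share a foraging strategy and W_DIFF otherwise. Only the 17.7% sponger share comes from the data. Ignoring the exclusion of the focal individual itself, a sponger's potential partners are mostly non-spongers, which pulls its mean weighted degree towards W_DIFF.

```python
def expected_mean_degree(share_same, w_same, w_diff):
    """Mean weighted degree when a fraction share_same of all potential partners
    shares the focal individual's foraging strategy."""
    return share_same * w_same + (1.0 - share_same) * w_diff


SPONGER_SHARE = 0.177        # proportion of spongers in the example population
W_SAME, W_DIFF = 0.10, 0.02  # hypothetical edge weights (same vs different strategy)

sponger = expected_mean_degree(SPONGER_SHARE, W_SAME, W_DIFF)
non_sponger = expected_mean_degree(1.0 - SPONGER_SHARE, W_SAME, W_DIFF)
print(f"sponger: {sponger:.4f}, non-sponger: {non_sponger:.4f}")
# prints "sponger: 0.0342, non-sponger: 0.0858"
```

Even with identical social tendencies, spongers end up with a lower mean weighted degree. A node-level model would attribute this difference to foraging strategy itself.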

3.3 Social environment and indirect genetic effects
Indirect effects on social tendency, or weighted degree on the logit scale, were substantial: the social environment accounted for 47.5% (CI = 45.3%–49.3%) of the variation. (Direct) heritability was relatively low (CI = 4.2%–23.6%), as was indirect heritability (CI = 4.1%–22.3%; Figure 3c). Total additive genetic variance (τ² = 28.7%; CI = 8.4%–45.7%) was very similar to the additive genetic variance of social tendency. This indicates, together with the fact that the social environment accounts for about half of the total variance, that population structure had little effect on the variance estimates in this example.
3.4 Validation
For social tendency, the latent variable model estimated heritability accurately: the median estimates were close to the simulated values, and all simulated values fell well within the 95% credible intervals, although the posterior distributions were quite wide (Figure 6). For social governance, the model estimated heritability well when the simulated heritability was zero or high (0.8; Figure 6a,b,c,j,k,l), but not when it was intermediate (0.3 and 0.6). For intermediate heritability, the model was overly conservative, estimating heritability to be close to 0 (Figure 6d,e,f,g,h,i).
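The validation criteria used here can be computed from posterior draws as sketched below. The criteria are that the posterior median lies close to the simulated value and that the simulated value falls inside the 95% credible interval. The sketch assumes only two variance components, additive genetic (V_A) and environmental (V_E), so that h² = V_A / (V_A + V_E) for each draw. A model with further components would add them to the denominator. The function names are illustrative.

```python
from statistics import median, quantiles


def h2_draws(var_a, var_e):
    """Heritability for each posterior draw, assuming V_P = V_A + V_E."""
    return [a / (a + e) for a, e in zip(var_a, var_e)]


def summarise(draws, true_value=None):
    """Posterior median and central 95% credible interval of a list of draws.

    If true_value (e.g. the simulated heritability) is given, also report
    whether it falls inside the interval.
    """
    cuts = quantiles(draws, n=40, method="inclusive")  # cut points at 2.5%, 5%, ..., 97.5%
    lower, upper = cuts[0], cuts[-1]
    inside = None if true_value is None else lower <= true_value <= upper
    return median(draws), lower, upper, inside


draws = h2_draws([0.2, 0.3, 0.25, 0.35], [0.8, 0.7, 0.75, 0.65])
print(summarise(draws, true_value=0.3))
```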

4 DISCUSSION
Here I showed that latent variable modelling can be used to estimate heritability and indirect genetic effects for phenotypes derived from social networks. Unlike standard quantitative genetic models, it accounts for dependencies between individuals' network positions and for the uncertainty in those positions. Rather than taking node-specific statistics as dependent variables, I used edge weights. The relative contributions of individuals to these edge weights were estimated and used to estimate indirect effects, such as indirect genetic effects and social environment effects. If necessary, the method can also account for spatial and temporal effects. The example shows that, because the model estimates different aspects of social behaviour, it yields further biological insights. Below I summarise the advantages and disadvantages of quantitative genetic latent variable models in particular and of a Bayesian modelling framework in general. I finish with some recommendations for using this methodology.
4.1 Latent variable modelling
As pointed out above, latent variable modelling accounts for some important dependencies in social network data, allows indirect effects to be estimated and can provide additional insights into the underlying biological phenomena. Here I estimated indirect effects for mean weighted degree as the social phenotype. However, other descriptors of network position might better capture the social phenotype of interest. For instance, modelling centrality indices would give insight into the genetic basis of the role individuals can play in the spread of information or disease. Estimating indirect effects for social phenotypes other than mean weighted degree is feasible. However, the method assumes that the components (genetic, environmental, social environmental effects etc.) are additive, which restricts it to social network measures that do not require nonlinear transformations. This rules out, for example, the local clustering coefficient (Barrat et al., 2004) and disparity (Whitehead, 2008, p. 175). It must also be noted that the chosen model structure comes with assumptions about the underlying biological mechanisms, but alternative hypotheses can be explored by cross-validating various models. A drawback of latent variable models compared with standard quantitative genetic models is, of course, the larger number of parameters to estimate. As a consequence, more data are needed to obtain reliable parameter estimates. Because of the many dependencies in social network data, the predictive capabilities of the models do not increase linearly with more data (Farine & Strandburg-Peshkin, 2015; Sánchez-Tójar et al., 2018). With some assumptions, models can be simplified when the data do not carry sufficient information. For instance, if all individuals are assumed to contribute equally to edge weights, social governance can be set to one for all individuals, which substantially reduces the number of parameters to estimate.
As the validation simulations show, genetic effects for social governance were difficult to estimate in the example, which argues for simplifying the model. Alternatively, informative priors can be used to restrict parameter space to sensible values, thereby improving model convergence (Lemoine, 2019).
4.2 Heritability
With the breeder's equation, heritability can be used to predict a population's response to selection (Walsh & Lynch, 2018). Indirect genetic effects complicate these predictions, as they provide an alternative route of inheritance (Wolf et al., 1998). Predicting the response to selection therefore requires accounting not only for direct additive genetic variance, but also for indirect additive genetic variance and the covariance between the two (Bijma et al., 2007). In the latent variable model, heritability is estimated at the level of the social phenotypes. It therefore cannot be used directly to predict selection-induced changes in social network structure or in the positions of nodes. However, since all parameters of the latent variable model can be stored in Stan, these changes can be estimated by simulating social networks after a selection event. For instance, after imposing differential fitness effects on edge weights, the resulting change in the distribution of social phenotypes can be tracked. Its consequences for social network structure can then be monitored with network statistics.
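For reference, the breeder's equation predicts the response to selection R from the heritability h² and the selection differential S. Bijma et al. (2007) treated the simple case in which each individual interacts equally with the n − 1 other members of its group. There, the heritable variance that determines the potential response is the variance of total breeding values (TBV). This variance combines the direct (A_D) and indirect (A_S) additive genetic variances and their covariance. In networks with weighted and variable numbers of interactees, as in this study, interaction weights would replace n − 1. The block below is a sketch of the principle only.

```latex
R = h^2 S, \qquad
\sigma^2_{\mathrm{TBV}} = \sigma^2_{A_D} + 2(n-1)\,\sigma_{A_D,A_S} + (n-1)^2\,\sigma^2_{A_S}
```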
4.3 Indirect genetic and common environment effects
The presence of indirect genetic effects for social network traits is not surprising. As stated in the Introduction, social network traits are measured within, and shaped by, a specific social context (Fuller & Hahn, 1976). By definition, interactees affect social network measures, but whether individuals truly affect each other's underlying social phenotypes remains to be explored. What can be tested is whether prior social interactions affect the current social phenotypes of individuals, which would make indirect genetic effects on social interactions a potential route of inheritance. In various animal species, the social environment in early life affects social phenotypes later in life (e.g. McDonald, 2007; Shimada & Sueur, 2018; Van Den Berg et al., 1999). The effects of prior social interactions on the social phenotype can be modelled by implementing indirect genetic effects in the quantitative genetic part of the latent variable model. In that case, indirect genetic effects should be estimated from a different social network than the one from which the underlying social phenotypes are estimated, for instance the social network of a previous year or from early life. Similarly, common environment effects can be accounted for by assuming spatial autocorrelation. Rather than drawing the environmental effects on the social phenotypes independently from a normal distribution, one can assume correlations based on, for instance, territory overlap. However, most types of social interaction depend on spatial proximity, so correcting for spatial autocorrelation may also affect the estimates of variation in social phenotypes.
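A minimal sketch of spatially autocorrelated environmental effects follows. Instead of independent normal draws, environmental effects are drawn from a multivariate normal distribution whose correlation matrix is built from territory overlap. The construction (correlation = rho × overlap off the diagonal), the parameter `rho` and the overlap values are all hypothetical. `rho` must be small enough for the matrix to stay positive definite.

```python
import math
import random


def cholesky(m):
    """Lower-triangular Cholesky factor; may raise ValueError or ZeroDivisionError
    if m is not positive definite."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L


def overlap_correlation(overlap, rho):
    """Hypothetical correlation matrix: 1 on the diagonal, rho * overlap elsewhere."""
    n = len(overlap)
    return [[1.0 if i == j else rho * overlap[i][j] for j in range(n)] for i in range(n)]


def draw_environment(corr, sd, rng):
    """One draw of environmental effects with covariance sd**2 * corr."""
    L = cholesky(corr)
    z = [rng.gauss(0.0, 1.0) for _ in corr]
    return [sd * sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(corr))]


# Territory overlap between three individuals (hypothetical values).
overlap = [[1.0, 0.6, 0.1], [0.6, 1.0, 0.0], [0.1, 0.0, 1.0]]
corr = overlap_correlation(overlap, rho=0.5)
env = draw_environment(corr, sd=0.4, rng=random.Random(3))
```

In a full model, these correlated draws would replace the independent environmental effects on social tendency or governance.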
4.4 Modelling framework
Using a flexible inference library like Stan has some clear benefits. It uses efficient algorithms for exploring parameter space, and the model structure can be tailored to suit the study (McElreath, 2016; de Villemereuil, 2019). Here I present an example estimating latent social phenotypes from dyads, but interactions can, and often will, involve more than two individuals. To analyse interactions in groups larger than two, Equation 1 can be modified by adding more individuals. If necessary, Equation 1 can be replaced by an equation that better reflects the biology of the study system. For example, Mueller et al. (2013) studied social learning of migration routes in cranes. They found that deviations from a straight path were not genetically determined, but depended on the age of the oldest individual in the group. These data could be analysed with the latent variable approach by introducing all group members into Equation 1 (allowing for variable group sizes) and estimating social governance as a function of experience (or age). Similarly, directed networks, in which social interactions are directed from one individual towards another (e.g. aggression, food sharing, grooming), can be implemented by modifying Equation 1. In the presented model for undirected networks, both interactees are described by the same social phenotypes (social tendency and social governance). For directed networks, different social phenotypes can be estimated for the initiators and the receivers of social interactions. For instance, instead of social tendency, the social phenotypes could be attacking and being victimized, or giving and receiving food. Edge weight would then be a function of the social phenotypes of the initiator (e.g. aggressiveness or probability to share food) and those of the receiver (e.g. probability of being victimized or probability to accept food).
For each individual, social phenotypes as initiator and as receiver would then be estimated from its own interactions and those of its kin.
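Both extensions can be sketched as changes to the link between latent phenotypes and the probability of interacting. The forms below are illustrative assumptions, not a reproduction of Equation 1:

- Groups: the logit probability that the individuals are seen together is the governance-weighted mean of all members' tendencies, for any group size.
- Directed networks: the logit probability that i directs a behaviour at j is the sum of i's initiator phenotype and j's receiver phenotype, so p(i→j) generally differs from p(j→i).

All names and values are hypothetical.

```python
import math


def sigmoid(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def group_prob(tendency, governance):
    """Hypothetical group link: the logit probability that these individuals are
    seen together is the governance-weighted mean of their social tendencies."""
    eta = sum(g * t for g, t in zip(governance, tendency)) / sum(governance)
    return sigmoid(eta)


def directed_prob(initiating, receiving, i, j):
    """Hypothetical directed link: probability that i directs the behaviour at j,
    from i's initiator phenotype plus j's receiver phenotype on the logit scale."""
    return sigmoid(initiating[i] + receiving[j])


# Two individuals: 0 is a keen initiator, 1 is a frequent receiver (hypothetical values).
initiating = [1.0, -1.0]
receiving = [0.0, 0.5]
p01 = directed_prob(initiating, receiving, 0, 1)  # sigmoid(1.5)
p10 = directed_prob(initiating, receiving, 1, 0)  # sigmoid(-1.0)
```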
4.5 Recommendations
When applied to social phenotypes from networks, quantitative genetic latent variable models offer solutions for some critical assumption violations inherent to standard quantitative genetic models of social network data. Although latent variable models are data hungry, the example shows that they can be applied to long-term datasets. Some suggestions for model simplification are discussed in this paper. Example code can be found in Appendix S1 (for latent variable models) and elsewhere (for linear mixed models: McElreath, 2016; for quantitative genetic models: Zhao et al., 2018), and can be adjusted to study-specific needs.
ACKNOWLEDGEMENTS
I am grateful to Ben Sheldon's Social Network Group (supported by ERC grant AdG 250164) and Colin Garroway for discussions on the methodology and their comments on the manuscript. I thank Julien Martin, Alfredo Sánchez-Tójar, Sebastian Sosa and Alastair Wilson for their constructive comments. The computations were in part performed on computer resources provided by the Swedish National Infrastructure for Computing (SNIC) at Lunarc at Lund University and High Performance Computing Cluster Anunna at Wageningen University & Research.
Open Research
DATA AVAILABILITY STATEMENT
The code used here is available in Appendix S1. The data were previously published and are available in the Dryad Digital Repository https://doi.org/10.5061/dryad.sc26m6c (Wild et al., 2019a). Code and data objects used for analyses can be found in the Open Science Framework https://doi.org/10.17605/osf.io/kzcgv.