Detecting and quantifying social transmission using network-based diffusion analysis.

Although social learning capabilities are taxonomically widespread, demonstrating that freely interacting animals (whether wild or captive) rely on social learning has proved remarkably challenging. Network-based diffusion analysis (NBDA) offers a means for detecting social learning using observational data on freely interacting groups. Its core assumption is that if a target behaviour is socially transmitted, then its spread should follow the connections in a social network that reflects social learning opportunities. Here, we provide a comprehensive guide for using NBDA. We first introduce its underlying mathematical framework and present the types of questions that NBDA can address. We then guide researchers through the process of selecting an appropriate social network for their research question; determining which NBDA variant should be used; and incorporating other variables that may impact asocial and social learning. Finally, we discuss how to interpret an NBDA model's output and provide practical recommendations for model selection. Throughout, we highlight extensions to the basic NBDA framework, including incorporation of dynamic networks to capture changes in social relationships during a diffusion and using a multi-network NBDA to estimate information flow across multiple types of social relationship. Alongside this information, we provide worked examples and tutorials demonstrating how to perform analyses using the newly developed NBDA package written in the R programming language.

disentangling social from asocial influences on learning in contexts where animals are free to interact (or not) with each other and with their environment. Experimental methods such as the two-action and control design (Whiten & Mesoudi, 2008) remain the 'gold standard' for distinguishing social from non-social learning, but these can be difficult to implement in wild populations (though see van de Waal, Borgeaud, & Whiten, 2013). Often, only observational data are available, motivating the development of alternative methods for inferring social learning.
Network-based diffusion analysis (NBDA) is a means to infer social transmission of information in the wild (Franz & Nunn, 2009; Hoppitt, Boogert, & Laland, 2010). NBDA follows the assumption that individuals that frequently interact are more likely to learn from one another (Coussi-Korbel & Fragaszy, 1995). Thus, social transmission is inferred if the spread of an innovation follows a social network that is thought to reflect social learning opportunities (Hoppitt, 2017). Rather than assume a given behaviour diffuses entirely through either social or asocial processes, NBDA estimates the strength of social learning relative to asocial learning, thereby facilitating evaluation of how different factors (e.g. genetic, phenotypic and ecological) impact both processes (Hoppitt, Boogert, et al., 2010; Wild et al., 2019). Although originally developed to study the transmission of innovations, NBDA may also prove useful in disentangling social versus non-social influences on disease spread (Silk et al., 2017).
Here, we aim to provide a comprehensive and up-to-date resource for researchers interested in using NBDA, and to illustrate the use of the newly developed nbda package for R. In the Supporting Information, we provide tutorials showing how to implement these analyses using the nbda package.
We first introduce the mathematical framework underlying the basic NBDA model in order to provide readers an intuitive sense of how the model operates (Section 2) before presenting the types of research questions that NBDA can address and discussing which network types are most suitable for these questions (Section 3). Next, readers are guided through the process of selecting an appropriate NBDA variant (Section 4.1) and we illustrate how NBDA can be extended to model multiple diffusions (whether multiple behavioural patterns spreading within a single group or a single innovation spreading through multiple groups; Section 4.2). In Section 5, we demonstrate how individual-level variables, such as sex or age, can be incorporated into an NBDA to evaluate their effects on asocial and/or social learning. We then discuss in Section 6 how to interpret the various parameters and their associated confidence intervals in an NBDA. Section 7 extends NBDA to quantify social transmission effects simultaneously across multiple network types (e.g. affiliative, agonistic and proximity-based). Next, we show how to compare the relative support for alternative NBDA models (Section 8.1) and how multi-model inference can be used to incorporate information across multiple candidate models (Section 8.2). Finally, we discuss additional considerations regarding the use of NBDA and highlight potential future extensions (Section 9).

2 | THE BASIC NBDA MODEL
Understanding the basic NBDA model is key to understanding and interpreting the various forms of NBDA and its extensions, so we first present the model's mathematical formulation in its most fundamental form and explain it in some detail. A glossary of key terms is provided in Box 1. The basic NBDA model can be expressed as

$$\lambda_i(t) = \lambda_0(t)\,\bigl(1 - z_i(t)\bigr)\left(s \sum_{j=1}^{N} a_{ij}\, z_j(t) + 1\right), \qquad (1)$$

where $\lambda_i(t)$ is the rate at which individual i acquires the target behaviour as a function of time, $\lambda_0(t)$ is a baseline rate function, $z_i(t)$ is the 'status' of individual i at time t (1 = informed; 0 = naïve), N is the number of individuals in the population and $a_{ij}$ is a non-negative value indicating the connection strength from j to i in a social network. The key model output is the relative strength of social transmission, s, the value of which is estimated when the model is fitted to the data (much as the slope parameter is estimated when fitting a linear regression, except here maximum likelihood is used instead of least squares; see Supporting Information Section 1). This model can be expanded in various ways (such as by including individual-level variables that modify asocial and/or social learning), as we describe and define in the Sections below.
In NBDA, the term 'rate' is used in the sense of the 'hazard rate' in a survival analysis (Moore, 2016), that is, the current expected number of acquisition events per unit time. Within the term $s \sum_{j=1}^{N} a_{ij} z_j(t) + 1$, the 1 represents asocial learning, which occurs at the baseline rate $\lambda_0(t)$, whereas the rate of acquisition by social transmission is assumed to be proportional to $\sum_{j=1}^{N} a_{ij} z_j(t)$, the total network connection of i to informed individuals at time t, multiplied by the baseline rate function, $\lambda_0(t)$. Consequently, s estimates the rate of transmission per unit connection relative to the rate of asocial learning of the target behaviour. Note that alternative parameterizations of s are possible, which we discuss in Section 3 of the Supporting Information. Finally, $(1 - z_i(t))$ ensures that only naïve individuals can learn, since when i is informed, $z_i(t) = 1$, meaning $(1 - z_i(t)) = 0$ and $\lambda_i(t) = 0$.
Although s is normally estimated from the data, it is also possible to constrain s to certain values. In particular, models can be fit in which s is constrained to 0 (i.e. no social transmission), meaning the rate of acquisition is determined by the rate of asocial learning alone.
We refer to such models as 'asocial learning models' or 'asocial' models, which should be taken as shorthand for a model with asocial learning only, since asocial learning also occurs when s > 0.
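As a minimal illustration of how the basic model operates, the rate in Equation 1 can be computed directly. The sketch below is written in plain Python rather than with the nbda package; the function name, the three-individual network and the value s = 2 are hypothetical choices made only for this example.

```python
def acquisition_rate(i, s, network, status, baseline_rate=1.0):
    """Rate at which individual i acquires the behaviour under the basic model.

    network[i][j] is the connection strength from j to i (a_ij);
    status[j] is 1 if j is informed and 0 if naive (z_j);
    baseline_rate is the value of the baseline rate function at the current time.
    """
    social_exposure = sum(network[i][j] * status[j] for j in range(len(status)))
    return baseline_rate * (1 - status[i]) * (s * social_exposure + 1)


# Hypothetical network of three individuals; individual 0 is already informed.
network = [
    [0.0, 0.5, 0.1],
    [0.5, 0.0, 0.2],
    [0.1, 0.2, 0.0],
]
status = [1, 0, 0]

# With social transmission (s = 2) and with s constrained to 0 (asocial model).
rates_social = [acquisition_rate(i, 2.0, network, status) for i in range(3)]
rates_asocial = [acquisition_rate(i, 0.0, network, status) for i in range(3)]
print(rates_social)
print(rates_asocial)
```

In this example the informed individual has a rate of zero, the naïve individual with the stronger connection to it has the higher rate when s > 0, and both naïve individuals share the baseline rate when s = 0.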

3 | DIFFERENT TYPES OF NETWORKS
The social network is the key predictor variable in an NBDA. In principle, one can use any social network type that specifies a nonnegative connection in each direction for each dyad. However, the exact meaning of the s parameter can vary depending on the type of network used and the appropriateness of different network types may depend on the research question (Hoppitt, 2017). There are generally two primary aims when employing NBDA: to evaluate the strength of evidence for social transmission (Section 3.1) and to elucidate the typical pathways of information flow (Section 3.2).

3.1 | Networks for detecting and quantifying social transmission
When the goal of an NBDA is simply to detect and quantify the impact of social transmission, a researcher can choose among many types of social network. Several excellent resources provide guidance on constructing empirical social networks (Croft, James, & Krause, 2008; Farine & Whitehead, 2015; Whitehead, 2009).
Perhaps the most obvious choice for NBDA is an association network, where $a_{ij}$ estimates the proportion of time i associates with j. Ideally, one would assume that individuals can only socially learn from one another when associating. For this assumption to be reasonable, the criterion for association must be specified at an appropriate spatial scale. Individuals recorded as associating must be within observation distance, whereas individuals recorded as not associating must tend to be at a distance at which observation is impossible or unlikely (Hoppitt, 2017). It is important to note that what constitutes a reasonable distance over which social transmission is likely to occur will depend on the sensory modalities involved (see Section 3.5 for further discussion). For example, in a study investigating the spread of lobtail feeding through a humpback whale Megaptera novaeangliae population, whales needed to be within two body lengths to be recorded as 'associating' (Allen et al., 2013).
As the study was conducted over an area of approximately 1,000 square miles and lobtail feeding occurs during foraging bouts that require close coordination among whales, the aforementioned assumption seems reasonable. In such cases, s can be interpreted as the social transmission rate from an informed to naïve individual during periods when they are associating, relative to the rate of asocial learning (Hoppitt, 2017). In contrast, other studies (Boogert, Nightingale, Hoppitt, & Laland, 2014; Boogert, Reader, Hoppitt, & Laland, 2008; Hasenjager & Dugatkin, 2017) have used a criterion based on proximity (e.g. nearest neighbour) within small enclosures of a few square meters, such that dyads observed as not associating may still be within observation distance. We refer to the former as 'large-scale association networks' and the latter as 'small-scale association networks'.

BOX 1 Glossary
Asocial (or individual) learning: learning through trial-and-error or personal sampling. In NBDA, this refers to learning the target behaviour independently of others, that is, not through social transmission.
Asocial model: an NBDA model in which the target behaviour is never learned through social transmission, that is, learning is always asocial learning.
Diffusion data: data detailing the spread of a target behaviour pattern through an animal population or group.
Homogeneous network: in NBDA, a network in which typically all dyads are connected, and all connections are set to 1.
Individual-level variable (ILV): a variable that varies among individuals and is included in an NBDA for its potential effect on asocial and/or social learning rates.
Network-based diffusion analysis (NBDA): a statistical method for quantifying the influence of social transmission, mediated by one or more social networks, in the spread of a target behaviour through a group or population.
Order-of-acquisition diffusion analysis (OADA): an NBDA variant that takes as data the order in which individuals acquired a target behaviour (usually inferred from the time at which they first perform it).
Social learning: learning that is facilitated by observation of, or interaction with, another individual or its products (Hoppitt & Laland, 2013, after Heyes, 1994). Social learning can (but does not always) result in the social transmission of behaviour.
Social network: a mathematical description of social structure, in which nodes (usually representing individuals) are linked by connections indicating some form of social relationship. It is formally represented as an adjacency matrix (Farine & Whitehead, 2015).
Social transmission: occurs when the prior acquisition of a behavioural trait T by one individual A, when expressed either directly in the performance of T or in some other behaviour associated with T, exerts a lasting positive causal influence on the rate at which another individual B acquires and/or performs T (Hoppitt & Laland, 2013).
Time-of-acquisition diffusion analysis (TADA): an NBDA variant that takes as data the time at which individuals acquired a target behaviour (usually inferred from the time at which they first perform it).
When using small-scale association networks, s may not be interpretable in the same specific manner as for large-scale association networks. That is, s may not necessarily provide the rate of social transmission during periods in which individuals are able to observe knowledgeable individuals. Rather, a small-scale association network represents the hypothesis that individuals are more likely to learn from demonstrators that they tend to be found near to than from those that are more spatially distant (see Section 3.2 for further discussion). Further considerations regarding the construction and use of proximity-based networks can be found in Franks, Ruxton, and James (2010) and Farine (2015).
An alternative network type is an observation network, where $a_{ij}$ represents the number of opportunities i has had to observe j performing the target behaviour. Such a network is perhaps the most direct method for detecting and quantifying social transmission. If an observation network is to be used, it makes sense to use a dynamic (time-varying) version, so we delay further discussion until Section 3.4.

3.2 | Networks for establishing the typical pathways of information transfer
Another aim of an NBDA may be to elucidate the typical pathways of diffusion by comparing the fit of alternative NBDA models that include different networks (Franz & Nunn, 2009; Hoppitt, 2017). In this sense, networks represent hypotheses about how information is expected to spread. The result of this process would suggest the types of relationship that are important in providing the opportunity and/or motivation to observe and learn from others. For example, a study on ravens Corvus corax found that a social network based on affiliative interactions (e.g. allopreening, food sharing) predicted the spread of a novel foraging behaviour better than networks based on aggressive interactions or proximity (Kulahci et al., 2016).
Information theoretic approaches can be used to compare the support for alternative models (Section 8); whichever network best approximates the true transmission pathway(s) is likely to be favoured (Hoppitt, 2017).
Several types of networks might be included in such an analysis, including large- and small-scale association networks, interaction networks and theoretically derived model networks (Hoppitt, 2017).
For a large-scale association network, s can be interpreted as the rate of social transmission from informed to naïve individuals (relative to the asocial learning rate) during periods in which the latter can observe the former. A small-scale association network represents the more general hypothesis that individuals in close proximity will tend to learn from one another more often than those that are more spatially distant. Support for interaction networks would suggest that a particular interaction type (e.g. allopreening, trophallaxis) predicts the rate at which individuals learn from one another. Finally, theoretically derived networks can be constructed that represent hypothesized transmission pathways. For example, if transmission is predicted to occur primarily between individuals who have previously interacted, a researcher could include a model network in which familiar individuals are socially connected ($a_{ij} = a_{ji} = 1$) and unfamiliar dyads are not ($a_{ij} = a_{ji} = 0$) (Atton, Galef, Hoppitt, Webster, & Laland, 2014).
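A theoretically derived model network of this kind is simple to construct. The following Python sketch builds a binary familiarity network from group labels; the function name and the labels ("tank1", "tank2") are hypothetical and serve only to illustrate the idea.

```python
def familiarity_network(groups):
    """Build a binary model network: 1 for familiar dyads, 0 otherwise.

    groups[i] is a (hypothetical) label for the familiarity group of
    individual i; individuals sharing a label are treated as familiar.
    Self-connections are set to 0.
    """
    n = len(groups)
    return [
        [1 if i != j and groups[i] == groups[j] else 0 for j in range(n)]
        for i in range(n)
    ]


# Five individuals drawn from two hypothetical familiarity groups.
groups = ["tank1", "tank1", "tank2", "tank2", "tank2"]
model_network = familiarity_network(groups)
for row in model_network:
    print(row)
```

The resulting matrix is symmetric, so the connection from j to i equals the connection from i to j, as in the example described above.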
The estimate of s yielded from small-scale association networks, interaction networks or model networks is more general and abstract than for large-scale association networks: for the former, s estimates the social transmission rate from informed to naïve individuals per unit of network connection, relative to the asocial learning rate. In such cases, s may be difficult to interpret biologically and may not be comparable across networks measured on different scales (e.g. grooming networks vs. proximity networks), making it difficult to gauge the importance of social transmission. A solution is to convert the estimate of s into the estimated proportion of learning events that occurred by social transmission as opposed to asocial learning (see Section 6.5).
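To give a sense of what such a conversion involves, the sketch below assumes the basic model, under which a learner with total connection x to informed individuals acquires the behaviour socially at a relative rate of s·x and asocially at a relative rate of 1. The probability that a given event was social is then s·x / (s·x + 1). This is a simplified illustration rather than the procedure of Section 6.5; the function names, the exposure values and the value of s are hypothetical.

```python
def prob_social(s, exposure):
    """Probability that one acquisition event occurred by social transmission,
    given the learner's total connection to informed individuals (exposure)."""
    return (s * exposure) / (s * exposure + 1)


def proportion_social(s, exposures):
    """Expected proportion of acquisition events due to social transmission."""
    return sum(prob_social(s, x) for x in exposures) / len(exposures)


# Hypothetical exposures of four learners at the time each acquired the behaviour.
exposures = [0.0, 0.5, 1.0, 1.5]
estimated_s = 2.0  # hypothetical estimate of s
print(proportion_social(estimated_s, exposures))
```

Because the result is a proportion, it does not depend on the scale on which the network connections were measured, unlike s itself.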

3.3 | Including transmission weights
The standard NBDA model implicitly assumes that all individuals perform the target behaviour at a similar rate once they have learned it.
However, some individuals may perform the behaviour more often, thus providing more opportunities for social transmission than those that perform it less frequently. If a researcher has a measure of the rate at which individuals performed the target behaviour during the diffusion, this information can be included in the model as transmission weights, $W_j$, by replacing $a_{ij}$ with $W_j a_{ij}$. The rate of transmission is then assumed to be proportional to the rate of performance. For further guidance about including transmission weights, see Supporting Information Section 4.
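The replacement of each connection by its weighted counterpart can be sketched as follows. This is a plain Python illustration, not the nbda package interface; the function name, the network and the weights are hypothetical.

```python
def apply_transmission_weights(network, weights):
    """Return a new network in which each a_ij is replaced by W_j * a_ij.

    network[i][j] is the connection from j to i, so the weight applied to
    each entry is that of the potential demonstrator j (the column index).
    """
    return [
        [weights[j] * a_ij for j, a_ij in enumerate(row)]
        for row in network
    ]


# Hypothetical fully connected network of three individuals.
network = [
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 0],
]
# Hypothetical performance rates: individual 0 performs most often.
weights = [2.0, 1.0, 0.5]

weighted_network = apply_transmission_weights(network, weights)
for row in weighted_network:
    print(row)
```

Connections from the frequent performer (column 0) are doubled, whereas connections from the infrequent performer (column 2) are halved.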

3.4 | Dynamic networks and observation networks
The basic NBDA model (Equation 1) assumes that the social network does not change during the diffusion, that is, that it is a 'static network'. However, under some circumstances, network structure may undergo substantial changes during the diffusion process, for example, due to demographic processes or shifting dominance ranks. By extending the basic NBDA model so that it can incorporate a time-varying network, we can include such temporal changes in the analysis (Hobaiter et al., 2014). Such time-varying networks are referred to in the NBDA literature as 'dynamic'; use of this term does not imply constantly changing networks nor any specific processes (e.g. feedback loops) driving network changes. Dynamic networks are included simply by replacing $a_{ij}$ (i.e. the connection from individual j to i) with $a_{ij}(t)$, the connection from individual j to i at time t:

$$\lambda_i(t) = \lambda_0(t)\,\bigl(1 - z_i(t)\bigr)\left(s \sum_{j=1}^{N} a_{ij}(t)\, z_j(t) + 1\right). \qquad (2)$$

We advise caution when considering whether to include an association or interaction network as a dynamic network in an NBDA.
If the network is broken down into time periods that are too small, apparent changes in network structure may simply be the result of sampling error. In addition, by breaking up network data into smaller chunks, estimates of connection strength may become less precise.
Therefore, we suggest that dynamic association or interaction networks only be used if there are sufficient data in each time period to ensure precise estimates of network connections (Hoppitt & Farine, 2018). Further information and guidance on the use of dynamic network approaches can be found in Blonder, Wey, Dornhaus, James, and Sih (2012), Hobson, Avery, and Wright (2013) and Farine (2018).
Conversely, when using observation networks (which, as a reminder, directly reflect opportunities i has had to observe j performing the target behaviour) it usually makes sense to use a dynamic network. If one wishes to detect and quantify social transmission, then ideally, one would like a complete record of when the target behaviour was performed, by whom, and who observed each performance. It is possible to obtain data close to this level of resolution for many captive populations or in cases where the target behaviour is only performed in a specific location (or locations) that can be monitored closely. For example, Hobaiter et al. (2014) used NBDA to analyse the diffusion of moss sponging (using pieces of moss to obtain water from holes in trees) in chimpanzees Pan troglodytes. As the initial spread of sponging was documented at only a single water hole, use of a dynamic observation network was appropriate. Researchers might obtain a similar level of resolution using artificial foraging tasks that can be closely monitored (e.g. van de Waal, Renevey, Favre, & Bshary, 2010), or when information transfer is largely restricted to particular locations, such as the honeybee 'dancefloor'.
In a dynamic observation network, $a_{ij}(t)$ is the number of times i has observed j perform the target behaviour prior to t. In practice, it will usually be sufficient to specify the network only at the times at which each acquisition event occurred. In the corresponding static observation network, $a_{ij}$ gives the total number of times i observed j up until the time at which i learned the behaviour. This latter network does not incorporate information on the actual time course of observation and acquisition events, which is tantamount to assuming that all observations were made prior to the start of the experiment, before any individuals had even learned the behaviour. To illustrate why this matters, imagine a group of three chimpanzees (A, B and C) learning moss sponging by social transmission (see Figure 1). Chimp A learns how to sponge first, and is observed performing it three times by B, after which B learns this behaviour. Next, C observes A perform moss sponging four times and then learns the behaviour. The static observation network (taken from Event 3 in Figure 1) would represent this pattern as $a_{A,B} = 3$ and $a_{A,C} = 4$, and would predict that C will learn before B. In reality, we would expect B to learn first, as predicted by the dynamic observation network.
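The chimpanzee example can be worked through numerically. The sketch below compares the probability that B, rather than C, is the next to learn once A is informed, using relative rates of the form s × (observations of informed individuals) + 1. It is written in plain Python, the value s = 1 is hypothetical, and matrices are indexed so that network[i][j] is the number of times i has observed j.

```python
def next_learner_probabilities(s, network, status):
    """Probability that each individual is the next to learn.

    network[i][j] is the number of times i has observed j;
    status[j] is 1 if j is informed and 0 if naive.
    Each naive individual's relative rate is s * exposure + 1.
    """
    rates = []
    for i, row in enumerate(network):
        exposure = sum(a_ij * z_j for a_ij, z_j in zip(row, status))
        rates.append((1 - status[i]) * (s * exposure + 1))
    total = sum(rates)
    return [rate / total for rate in rates]


# Individuals are indexed A = 0, B = 1, C = 2; only A is informed.
status = [1, 0, 0]

# Static network: observation counts pooled over the whole diffusion.
static_network = [[0, 0, 0], [3, 0, 0], [4, 0, 0]]

# Dynamic network just before the second acquisition event:
# B has watched A three times, C has not yet watched anyone.
dynamic_network = [[0, 0, 0], [3, 0, 0], [0, 0, 0]]

s = 1.0  # hypothetical strength of social transmission
p_static = next_learner_probabilities(s, static_network, status)
p_dynamic = next_learner_probabilities(s, dynamic_network, status)
print(p_static)
print(p_dynamic)
```

Under these assumptions the static network gives C the higher probability of learning next (5/9 versus 4/9 for B), whereas the dynamic network gives B a probability of 0.8, matching the order in which the events actually occurred.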
Use of a dynamic observation network has the advantage that it can infer social learning if the chance order in which individuals observe the behaviour predicts the order of diffusion, even if there is little or no underlying social network structure. In other words, if the behaviour is socially transmitted, we would expect that individuals that happen to observe the behaviour earlier or more often than others should also tend to learn sooner than them, even if there is no underlying tendency for some dyads to associate more than others. Unfortunately, s cannot be interpreted here as the rate of social transmission from informed to naïve individuals during periods of association (Hoppitt, 2017) and so may not be comparable across studies. We suggest that researchers obtain an estimate of the proportion of learning events that occurred by social transmission as an interpretable measure of its strength (see Section 6.5).
It will often not make sense to include an observation network alongside association, interaction or model networks in a model comparison meant to establish the typical pathways of information transfer. Such an analysis aims to find a network that best approximates opportunities for observation and social learning. The observation network bypasses this approximation since it directly quantifies these opportunities. One may test whether a network that receives strong support in the NBDA correlates with the observation network. If so, this suggests that support for the former derives at least in part from the fact that it predicts the likelihood of observing the behaviour. However, a strong predictive network might also reflect differences in the effect on learning per observation, for example, observations of higher-ranking individuals might carry more weight. Canteloup, Hoppitt, and van de Waal (2020) show how NBDA can be extended to test for such effects.

FIGURE 1 An example showing the predictive power of a hypothetical dynamic observation network whereby three individuals (A, B and C) learn to perform a particular behaviour. Network connections indicate the number of times a naïve individual observed an informed individual perform the target behaviour prior to acquiring it themselves.
It may also make sense to compare models with different observation networks representing different types of observations (see Box 3) to determine which network (or combination of networks) best explains the diffusion data (see Section 7). See Hobaiter et al. (2014) and Hoppitt (2017) for further recommendations on using a dynamic social network in an NBDA.

3.5 | Non-visual social learning and learning from products
Thus far we have assumed that social transmission of novel behaviour occurs when one individual observes another performing it.
The term 'observes' should not be taken to mean restricted to the visual modality, but rather should be interpreted in a broad sense, where behaviour can be observed in any modality. Familiar examples include species that learn vocalizations by listening to others, such as whale song (Noad, Cato, Bryden, Jenner, & Jenner, 2000).
The recommendations provided above should therefore be considered in light of the modality in question. For instance, a large-scale association network needs to reflect the scale over which social learning can occur; auditory cues, for example, may travel much further than visual ones.
It is also well documented that behaviour can be transmitted when a naïve individual encounters the products of an informed individual's performance of that behaviour (Leadbeater & Chittka, 2008;Terkel, 1995). For example, bumblebees Bombus terrestris learn to rob nectar from the bases of flowers by first encountering and using holes created by previous robbers before eventually learning to create their own (Leadbeater & Chittka, 2008). In such cases, a network's predictive power in an NBDA will depend on the extent to which it approximates i's opportunities to encounter the products of j's behaviour. To date, we are aware of no uses of NBDA targeted at behaviour transmitted through product learning, but this remains a potential application.

4 | DIFFUSION DATA AND TYPES OF NBDA
In the context of NBDA, diffusion data refers to the pattern of spread of the target behaviour and provides the response variable for the analysis. There are two main NBDA variants: order-of-acquisition diffusion analysis (OADA), which takes as data the order in which individuals acquired the target behaviour, and time-of-acquisition diffusion analysis (TADA), which uses the times of acquisition of the target behaviour. TADA can be further subdivided into versions that treat time as either a continuous variable (continuous TADA or 'cTADA') or as a discrete variable split into units (discrete TADA or 'dTADA'). We first explain how to decide between the different variants.

4.1 | OADA, cTADA or dTADA?
The original form of NBDA was the dTADA (Franz & Nunn, 2009), with the OADA and cTADA being proposed soon afterwards (Hoppitt, Boogert, et al., 2010). All forms can be expressed as shown in Equations 1 and 2. Choice of OADA versus cTADA versus dTADA depends on the diffusion data available and the assumptions one is willing to make about how the rate of learning changes over time (i.e. the shape of the baseline rate function, $\lambda_0(t)$). We discuss the latter issue first.
OADA makes no assumptions about the shape of $\lambda_0(t)$, but only assumes that this function is the same for every individual in the diffusion (to understand why, see the Supporting Information Section 1). In contrast, TADA requires a researcher to make assumptions about the form of $\lambda_0(t)$, and fit parameters controlling its shape. When these assumptions are met, TADA offers more statistical power than OADA (Hoppitt, Boogert, et al., 2010), particularly when the network is densely connected with little variation in connection strength. Indeed, when the network is completely homogeneous (i.e. all possible connections exist and are of equal strength), OADA cannot distinguish social transmission from asocial learning since all orders of acquisition would be equally likely in both models. Conversely, homogeneous networks pose less of an issue when using TADA. This is because TADA is also sensitive to accelerating learning rates, which can result from social transmission due to increasing numbers of informed individuals (though purely asocial processes can also produce such a pattern; see below).
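The point about homogeneous networks can be checked with a small calculation. The sketch below assumes that the likelihood of an observed order is the product, over acquisition events, of the learner's rate divided by the summed rates of all individuals that are still naïve; because the baseline rate is shared by all individuals, it cancels from this ratio and is omitted. The code is a plain Python illustration, not the nbda package, and the networks, orders and values of s are hypothetical.

```python
def oada_order_likelihood(s, network, order):
    """Likelihood of an observed order of acquisition.

    network[i][j] is the connection from j to i; order lists individuals
    in the sequence in which they acquired the behaviour.
    """
    n = len(network)
    status = [0] * n
    likelihood = 1.0
    for learner in order:
        # Relative rate of each naive individual (baseline rate cancels).
        rates = [
            (1 - status[i]) * (s * sum(network[i][j] * status[j] for j in range(n)) + 1)
            for i in range(n)
        ]
        likelihood *= rates[learner] / sum(rates)
        status[learner] = 1
    return likelihood


homogeneous = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
chain = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
order = [0, 1, 2, 3]

# Homogeneous network: the likelihood is the same whatever the value of s.
hom_asocial = oada_order_likelihood(0.0, homogeneous, order)
hom_social = oada_order_likelihood(5.0, homogeneous, order)

# Structured network: an order that follows the chain is favoured when s > 0.
chain_asocial = oada_order_likelihood(0.0, chain, order)
chain_social = oada_order_likelihood(5.0, chain, order)
print(hom_asocial, hom_social, chain_asocial, chain_social)
```

With the homogeneous network every order of four individuals has likelihood 1/24 regardless of s, so the order carries no information about social transmission. With the chain network, the same order is more likely under s = 5 than under s = 0.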
In the simplest case, one can fit a TADA that assumes a constant baseline learning rate, $\lambda_0(t) = \lambda_0$, with an extra parameter, $\lambda_0$, fitted to the data (Franz & Nunn, 2009; Hoppitt, Boogert, et al., 2010).
However, this assumption may often not hold. For example, individuals might initially exhibit neophobic responses towards a learning task, but as neophobia fades over time, asocial learning rates should increase. Such circumstances can cause a spurious positive result for social transmission in a TADA (Hoppitt, Kandler, Kendal, & Laland, 2010). To understand why, let us assume a constant baseline learning rate and no social transmission. Under these conditions, the rate at which new individuals solve the task should decrease over time as the pool of potential learners becomes depleted. Conversely, social transmission will increase the rate at which individuals solve the task, at least initially, as individuals' likelihood of learning increases with the number of informed associates. However, if the baseline learning rate is not constant, but instead increases over time, it too will predict an increased solving rate that may be mistaken for a social transmission effect. On the other hand, if $\lambda_0(t)$ decreases over time, perhaps because the resources necessary to learn the behaviour begin to deplete, this can counter social transmission's acceleratory effect, thereby reducing the power of TADA to detect it.
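The contrast between a depleting pool of learners and an accelerating social process can be made concrete. The sketch below computes the total rate at which new individuals are expected to solve the task as a function of the number already informed. It assumes a constant baseline rate and, for simplicity, a homogeneous network with unit connections, so each naïve individual's rate is baseline × (s × number informed + 1). The function name, group size and value of s are hypothetical.

```python
def total_solving_rate(n_informed, n_total, s, baseline_rate=1.0):
    """Summed acquisition rate across all naive individuals.

    Assumes a homogeneous network with connections of 1 for all dyads,
    so every naive individual is connected to each informed individual.
    """
    n_naive = n_total - n_informed
    return n_naive * baseline_rate * (s * n_informed + 1)


group_size = 10
asocial = [total_solving_rate(k, group_size, s=0.0) for k in range(group_size)]
social = [total_solving_rate(k, group_size, s=0.5) for k in range(group_size)]
print(asocial)
print(social)
```

Without social transmission the total rate falls steadily as individuals leave the naïve pool. With s = 0.5 it first rises, peaking when four individuals are informed, before depletion of the pool takes over. A baseline rate that increases over time would produce a similar early rise without any social transmission, which is why the two can be confused.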
Fortunately, TADA can be modified to have a non-constant baseline rate. Although any positive function can be specified for $\lambda_0(t)$, the nbda package has two built-in functions that will be sufficient in most cases. One corresponds to a gamma distribution of latencies under asocial learning (Hoppitt, Kandler, et al., 2010) and the other to a Weibull distribution. If instead $\lambda_0(t)$ fluctuates unpredictably, this can badly reduce the power of TADA, but OADA will remain unaffected (Hoppitt, Boogert, et al., 2010). For example, if a field experiment is conducted in which a population is presented with a foraging task, there may be many factors influencing the rate at which individuals in the population solve the task at any given time, such as weather conditions, predation risk or diurnal rhythms. In principle, if all the variables causing fluctuations in the baseline acquisition rate can be identified and included in the model (see Section 5), TADA could still be appropriate. However, OADA is a far easier option (but see Supporting Information Section 6).
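For orientation, a Weibull baseline function of the kind mentioned in this section has a simple closed-form hazard. The expression below uses the standard textbook parameterization with shape k and scale θ; these symbols are introduced here for illustration only, and the parameterization used internally by the nbda package may differ.

```latex
\lambda_0(t) = \frac{k}{\theta}\left(\frac{t}{\theta}\right)^{k-1},
\qquad k > 0,\ \theta > 0 .
```

When k = 1 this reduces to the constant baseline rate 1/θ, whereas k > 1 gives a baseline rate that increases systematically over time and k < 1 one that decreases. The hazard of a gamma distribution of latencies has no comparably simple closed form.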
So, what does this mean for a researcher choosing between OADA, cTADA and dTADA? If only data on the order in which individuals acquired the behaviour are available, then OADA must be used (Figure 2). However, if data on precise times of acquisitions are available, there is a choice between OADA and cTADA. If it is likely that $\lambda_0(t)$ fluctuates unpredictably, then OADA is again to be preferred. However, if the researcher is confident that the baseline rate function can be assumed to be constant or can be modelled as a potentially systematically increasing or decreasing function, then cTADA is to be initially preferred, since it offers more statistical power under these circumstances. In such cases, we recommend that models with both constant and Weibull (and/or gamma) baseline functions be fitted, and the best fitting baseline function be used to generate parameter estimates (see Section 8.2). However, if very different results are obtained from models with different baseline functions (e.g. strong support for asocial learning vs. strong support for social transmission), it suggests that the analysis is dominated by the time course of events as opposed to the pattern of diffusion through the network (to understand why, see Supporting Information Section 5). Such a situation is presented in Tutorial 7 in the Supporting Information. In such cases, we recommend that researchers switch to OADA since it is invariant to the shape of $\lambda_0(t)$, and sensitive only to the diffusion pattern through the network. We summarize the above recommendations in Figure 2.

FIGURE 2 Flowchart for selecting the appropriate network-based diffusion analysis model
In other cases, some information on acquisition times may be available, but exact times are not known, for example, if the population is sampled periodically, giving a temporal snapshot of who is informed at any given time. The researcher then knows only the time period in which each individual was first observed performing the behaviour. The natural choice here is a dTADA, though if sampling periods are sufficiently frequent, it may be possible to resolve the order of acquisition, enabling the use of OADA (a few ties can be accommodated in OADA, see Supporting Information Section 7). In such cases, the same reasoning as described above for choosing between OADA and cTADA can be applied (Figure 2). We might also have inexact acquisition times if there is observation error in the recorded time of acquisition, which can inflate the false positive error rate for social transmission in both dTADA and cTADA (Franz & Nunn, 2010).
However, by using a dTADA with a sufficiently long time unit, this problem may be alleviated (see Franz & Nunn, 2010 for further guidance).
If TADA is chosen, it is important that the times entered into the model are cumulative times that include only those periods during which it was possible for learning to occur. For example, imagine a foraging task presented to a group of animals at 09:00-10:00 hr each day. If individual A learns to solve the task 5 min into the session on the second day, it would be attributed 65 min as its time of acquisition, since A could only solve the task when the task was available to be solved.
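The cumulative-time calculation in this example can be sketched in a few lines. This is a minimal illustration in Python rather than part of the nbda package; the function name `cumulative_minutes` and the calendar dates are hypothetical.

```python
from datetime import datetime

def cumulative_minutes(sessions, acquisition):
    """Minutes of task availability elapsed up to `acquisition`.

    sessions: chronologically ordered (start, end) datetimes during
    which learning was possible (hypothetical input format).
    """
    total = 0.0
    for start, end in sessions:
        if acquisition >= end:
            # The whole session elapsed before the acquisition.
            total += (end - start).total_seconds() / 60
        elif acquisition > start:
            # The acquisition happened part-way through this session.
            total += (acquisition - start).total_seconds() / 60
            break
        else:
            break
    return total

# Task available 09:00-10:00 on two consecutive days (dates are arbitrary).
sessions = [
    (datetime(2020, 1, 1, 9, 0), datetime(2020, 1, 1, 10, 0)),
    (datetime(2020, 1, 2, 9, 0), datetime(2020, 1, 2, 10, 0)),
]
# Individual A solves the task 5 min into the second session.
time_A = cumulative_minutes(sessions, datetime(2020, 1, 2, 9, 5))
print(time_A)  # 65.0
```

Time outside the sessions contributes nothing, so A is assigned 60 min from day one plus 5 min from day two.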
Note that in a TADA, evidence for a model with social transmission over an asocial model supports the presence of social transmission but does not necessarily constitute evidence that transmission follows the provided network. We therefore recommend that researchers include an additional model (or set of models) in which the social network is replaced with a homogeneous network (connections of 1 for all dyads). If the homogeneous network is favoured over the measured social network (see Section 8), it implies either that transmission occurs homogeneously within the group, or, more likely, that the measured network differs substantially from the real transmission pathways (Whalen & Hoppitt, 2016). See Tutorial 4.3 in the Supporting Information for an example of this approach. It is important to note that when making these comparisons, a fully connected homogeneous network may not necessarily be appropriate.
For example, certain pairs may never have the opportunity to interact due to non-overlapping home ranges or demographic processes, and so should not be connected in the homogeneous network (see Tutorial 6.6 in the Supporting Information); a similar issue is discussed further in the context of multiple diffusions in Section 4.2.
In the nbda package, the process for including homogeneous networks in the analysis is the same as for any other network, meaning users have the flexibility to construct a homogeneous network that appropriately reflects their study system (e.g. assigning values of 0 to dyads that could never have interacted). For example, in long-term studies, it may be that some individuals die or emigrate from a population before others are born or immigrate into it. It is possible to incorporate such demographic information into an NBDA by specifying, for each learning event, which individuals are present in the population (see Tutorial 1.5 in the Supporting Information) and constructing dynamic networks (including homogeneous networks) that capture this population turnover (Section 3.4). In a similar fashion, other types of networks can be used to capture non-social effects on a diffusion that may otherwise be mistaken for social transmission. For example, to test whether a diffusion may be explained by patterns of spatial overlap rather than social transmission, a space-use network could be included (alongside the social and homogeneous networks) where dyads that share space are connected in the network and those that do not are left unconnected (see Tutorial 6.6 in the Supporting Information, and Wild et al., 2019 for an alternative method for disentangling social and spatial effects in an NBDA).
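As an illustration of a homogeneous network with structural zeros, the sketch below builds a 0/1 matrix in Python. It does not use the nbda package; the helper names and the residency periods are hypothetical and serve only to show the idea of leaving dyads unconnected when they could never have interacted.

```python
def homogeneous_network(ids, could_interact=None):
    """Return a 0/1 matrix with 1 for every dyad allowed to interact."""
    n = len(ids)
    net = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # no self-connections
            if could_interact is None or could_interact(ids[i], ids[j]):
                net[i][j] = 1
    return net

# Hypothetical first and last years in which each individual was present.
residency = {"A": (2010, 2014), "B": (2012, 2018), "C": (2016, 2020)}

def overlapped(a, b):
    """True if the residency periods of a and b overlap."""
    start_a, end_a = residency[a]
    start_b, end_b = residency[b]
    return start_a <= end_b and start_b <= end_a

ids = ["A", "B", "C"]
fully_connected = homogeneous_network(ids)        # every dyad set to 1
restricted = homogeneous_network(ids, overlapped) # A and C never co-occurred
```

Here A and C are left unconnected in `restricted` because A left the population before C arrived.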
For useful guidance on the use of null models in animal social network analysis, see Farine and Whitehead (2015) and Farine (2017).

| Modelling multiple diffusions
Thus far we have assumed that the researcher has data from a single diffusion, that is, the spread of a single behaviour through a single population or group. A researcher can also combine data from multiple diffusions (e.g. the same foraging task presented to different groups) into a single NBDA model, thereby increasing the power to detect social transmission. It also enables researchers to ask whether the strength of social transmission varies across different groups, contexts or learning tasks. There are several ways this can be done. Let us first extend the NBDA model from Equation 2 to multiple diffusions: Here, subscript l denotes the diffusion number (i.e. λ il (t) is the rate of acquisition for individual i in diffusion l).
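As a sketch of the form this extension takes, assume that Equation 2 has the standard NBDA form λ_i(t) = λ_0(t)(1 − z_i(t))(s Σ_j a_ij z_j(t) + 1), where z_i(t) is the status of individual i (1 = informed, 0 = naïve) and a_ij is the network connection from j to i. Under that assumption, adding the diffusion subscript l to each term gives:

```latex
\lambda_{il}(t) = \lambda_{0l}(t)\,\bigl(1 - z_{il}(t)\bigr)
  \Bigl( s \sum_{j} a_{ijl}\, z_{jl}(t) + 1 \Bigr)
```

In this form s is shared across diffusions, while the baseline rate λ_0l(t) may differ among them.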
The first option is to fit an OADA in which the shape of the baseline rate, λ ol (t), is unspecified and allowed to vary among diffusions. In this case, the analysis is sensitive only to the acquisition order within each diffusion. However, this approach may result in lower statistical power to detect social transmission, as it ignores information on the relative timing of learning events across groups that may be used to infer social learning. For instance, imagine a study consisting of three diffusions in which everyone in group 1 learns in the first 5 min, everyone in group 2 learns in the middle of the experiment, and everyone in group 3 learns at the end. Assuming there is no reason to suspect that groups systematically differ in their asocial learning rate, this pattern is consistent with innovations arising at different times in each group and rapidly spreading via social learning. It would be ignored, however, by the OADA described above, which focuses only on learning order within groups. This shortcoming is addressed by the alternative options below, but if the baseline learning rate is thought to fluctuate unpredictably both within and across groups (Section 4.1), then the above form of OADA may be preferred.
Alternatively, a researcher could use a TADA if the assumptions are reasonable (see Section 4.1). Here, the nbda package assumes that the shape of the baseline function is the same in all diffusions, λ ol (t) = λ o (t), but one can control for the possibility of a different asocial learning rate in each group by including a 'group' individual-level variable (see Section 5). However, as described in Section 4.1, TADA requires selecting a baseline rate function. If the results are not robust to this choice, then OADA may be preferred (Figure 2).
A compromise between the above options is to assume that λ ol (t) = λ o (t) (i.e. how learning rates change over time is assumed to be the same across all diffusions), but to leave λ o (t) unspecified. This assumption may be warranted, for example, if the individuals within each diffusion were raised and tested under near-identical laboratory conditions. To do this, we can fit an OADA in which all diffusions are treated as a single diffusion. Thus, the acquisition order is specified across all diffusions, but individuals from different diffusions are not connected in the network. We refer to this model as a 'stratified OADA'. As with a TADA, a 'group' individual-level variable can be included to control for the possibility that groups differ in their asocial learning rate.
In a multiple diffusion analysis using TADA or stratified OADA, comparing a network-based social learning model to an asocial model tests for evidence of social transmission, but does not specifically test whether transmission follows the network within each group. For example, if everyone in each group is as likely to learn from one group mate as another, then the network provided to the analysis is likely to reasonably approximate the pathway of learning due to the zero connections between individuals in different groups.
To specifically test whether the diffusion follows the social network within each group, an alternative model must be fit in which connections within each group are set to 1 and, if using a stratified OADA, connections between groups are set to 0. We term this the 'group network' as it identifies co-membership in groups (and therefore the potential for transmission) without assuming any underlying structure to within-group relationships. If the social network provides a substantially better fit than the group network (see Section 8.1), this suggests that the social network approximates the learning pathways within each group. If instead the group network is favoured over both the asocial and social network models, then there remains evidence of social transmission, but either transmission is operating homogeneously within groups or the real transmission pathways substantially differ from the social networks included for each group (see Tutorials 6.5 and 7.5 in the Supporting Information for examples of how to implement this approach). Our previous recommendations regarding construction of homogeneous networks also apply to group networks (see Section 4.1).
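The two networks compared here can be sketched as block-diagonal matrices. The Python below is an illustrative construction, not nbda package code; the function names and the example connection strengths are hypothetical.

```python
def block_diagonal(networks):
    """Combine per-group matrices; between-group connections stay 0."""
    size = sum(len(m) for m in networks)
    combined = [[0.0] * size for _ in range(size)]
    offset = 0
    for m in networks:
        for i, row in enumerate(m):
            for j, value in enumerate(row):
                combined[offset + i][offset + j] = value
        offset += len(m)
    return combined

def group_network(group_sizes):
    """Connections of 1 within each group, 0 between groups."""
    blocks = [
        [[0.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
        for n in group_sizes
    ]
    return block_diagonal(blocks)

# Hypothetical measured social networks for two groups (2 and 3 individuals).
group1 = [[0.0, 0.5], [0.5, 0.0]]
group2 = [[0.0, 0.2, 0.0], [0.2, 0.0, 0.9], [0.0, 0.9, 0.0]]

social = block_diagonal([group1, group2])  # network-based model
group = group_network([2, 3])              # 'group network' comparison
```

Both matrices share the same zero pattern between groups, so a difference in fit between them reflects only the within-group structure.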
So far, we have assumed that researchers are analysing multiple diffusions on different sets of individuals. Alternatively, it could be that individuals are present in more than one diffusion, for example, if different foraging tasks are presented to the same group. In such cases, each individual's rate of acquisition may be correlated across diffusions. For example, an especially exploratory individual may be among the first to solve each task that is presented to its group. This can be accounted for by including random effects in the NBDA that model among-individual variation in acquisition rates. The nbda package allows this to be done in an OADA using the coxme package (Therneau, 2018), following the technique described by Hoppitt, Boogert, et al. (2010).

| Seeded demonstrators
In many diffusion studies, some individuals start the diffusion already informed, often because they are trained to perform the target behaviour and 'seeded' in the diffusion. Such individuals are accounted for in an NBDA by simply setting their status, z j (t), to 1 for all t > 0. The nbda package allows such information to be incorporated (see Tutorial 1 in the Supporting Information).

| INDIVIDUAL-LEVEL VARIABLES
Network-based diffusion analysis can be expanded to include other predictor variables that might influence the rate of social transmission and/or asocial learning, termed 'individual-level variables' (ILVs; Hoppitt, Boogert, et al., 2010). We expand Equation 2 to include the effects of V continuous or binary ILVs as follows: where x k,i is the value of the kth variable for individual i, β k is the coefficient of the effect of variable k on asocial learning, and γ k is the coefficient of the effect of variable k on social transmission (see Section 6.1 for how these coefficients can be interpreted).
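A sketch of the form such a model takes, assuming that Equation 2 has the standard NBDA form and that the ILVs enter on the log scale through two linear predictors, B_i for asocial learning and Γ_i for social transmission (both symbols are assumptions here, chosen to match the coefficients β k and γ k defined above):

```latex
\lambda_i(t) = \lambda_0(t)\,\bigl(1 - z_i(t)\bigr)
  \Bigl( s\, e^{\Gamma_i} \sum_{j} a_{ij}\, z_j(t) + e^{B_i} \Bigr),
\qquad
B_i = \sum_{k=1}^{V} \beta_k x_{k,i},
\qquad
\Gamma_i = \sum_{k=1}^{V} \gamma_k x_{k,i}
```

Setting all x k,i to zero recovers the model without ILVs, which is why s is interpreted relative to the asocial learning rate of an individual with all ILVs equal to zero.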

| Why include ILVs?
The most obvious reason to include ILVs in an NBDA is if the researcher is interested in the effect those variables may have on asocial and/or social learning (Box 2). For example, age shapes reliance on social information in capuchin monkeys Cebus capucinus (Barrett, McElreath, & Perry, 2017), whereas female guppies Poecilia reticulata solve foraging-related tasks quicker than males (Reader & Laland, 2000). Alternatively, one may wish to include a potentially confounding variable that might cause a spurious social transmission effect. This can occur if a variable is both correlated with the network and affects asocial learning (Hoppitt, Boogert, et al., 2010). For example, older individuals may tend to associate with one another and be more likely to acquire a novel foraging trait through asocial learning. Hoppitt, Boogert, et al. (2010) showed that such confounds could be statistically controlled for by including the relevant ILV in the NBDA.

| Additive, multiplicative and unconstrained models
When NBDA was first extended to include ILVs, two variants were proposed (Hoppitt, Boogert, et al., 2010). The additive model assumed that all ILVs affected only the asocial learning rate, Γ i = 0, whereas the multiplicative model assumed that all ILVs influenced both asocial learning and social transmission, and did so by the same amount, that is, β k = γ k for all k. However, as an ILV may have different effects on social transmission and asocial learning, we generally prefer fitting an 'unconstrained' model (Hoppitt & Laland, 2013) in which β k and γ k are estimated independently.
Nonetheless, for some variables it may make sense to assume a priori that they operate only on asocial learning (γ k = 0), only on social transmission (β k = 0), or that they affect asocial learning and social transmission by the same amount (β k = γ k ). The nbda package allows the user to specify which of these options should be used for each ILV.

| Entering ILVs
s is estimated relative to the baseline asocial learning rate, which is the rate of asocial learning when all ILVs in the model are set to zero.
As such, ILVs should be entered in a way that makes interpretation of s most meaningful.
Turning to sex, we find that females are estimated to be e^19.84 = 4.13 × 10^8 times faster than males at asocial learning! If we examine the profile log-likelihood, we find that it is very asymmetric (Box 2 Figure). In fact, it approaches an asymptote as the estimated coefficient for males' asocial learning rate moves towards negative infinity. This is because only females ever learned with zero network con-

| Continuous variables
For continuous variables (e.g. body size, mass), we recommend centring (subtract the mean) such that all such variables have a mean of zero. In this way, the baseline asocial learning rate is set to the mean of all continuous variables. Dividing each variable by its standard deviation such that it is fully standardized (mean = 0, SD = 1) is also advisable since it improves the probability of model convergence.
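Centring and standardizing can be sketched as follows. This is a plain-Python illustration with hypothetical body-mass values; it assumes the sample standard deviation (n − 1 denominator) is the one intended.

```python
from statistics import mean, stdev

def standardize(values):
    """Centre on the mean and scale by the sample SD (n - 1 denominator)."""
    m = mean(values)
    sd = stdev(values)
    return [(v - m) / sd for v in values]

# Hypothetical body masses for five individuals.
body_mass = [12.0, 15.5, 9.8, 14.1, 11.6]
body_mass_std = standardize(body_mass)

# After standardizing, 0 corresponds to an individual of average mass,
# so the baseline asocial learning rate refers to that individual.
```

It is worth storing the original SD, since it is needed later to transform coefficients back to the original scale.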

| Binary variables
For two-level factors, such as sex, the variable can be coded as 0/1 (e.g. males = 0, females = 1) such that the estimated effect β k or γ k gives the difference on the log scale between the two levels (see Section 6.1). This means that the baseline asocial learning rate will be set to whichever factor level is set to zero. It may also be necessary to re-code binary variables once the analysis has been run to obtain interpretable estimates of s (see Section 6.4).

| Factors
Categorical variables with F > 2 levels can be broken down into F − 1 indicator variables (as in a standard regression analysis). For example, if we have four age classes (adults, sub-adults, juveniles and infants), this could be broken down into a variable 'inf' which takes the value 1 for infants and 0 for adults/sub-adults/juveniles, a variable 'juv' which takes the value 1 for juveniles and 0 for adults/sub-adults/infants, and a variable 'sub' which takes the value 1 for sub-adults and 0 for adults/juveniles/infants. Adults then become the reference level (inf = 0, juv = 0, sub = 0) to which infants (inf = 1, juv = 0, sub = 0), juveniles (inf = 0, juv = 1, sub = 0) and sub-adults (inf = 0, juv = 0, sub = 1) are compared. Whichever factor level is set as the reference is also the baseline rate of asocial learning. In our example, s is estimated relative to the adult asocial learning rate.
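The indicator coding described above can be sketched in Python as follows; the variable names mirror the 'inf', 'juv' and 'sub' indicators in the text, while the list of individuals is hypothetical.

```python
# Hypothetical age class of each individual.
age_class = ["adult", "infant", "juvenile", "sub-adult", "adult"]

# F - 1 = 3 indicator variables; 'adult' is the reference level.
indicators = {"inf": "infant", "juv": "juvenile", "sub": "sub-adult"}

coded = [
    {name: int(age == level) for name, level in indicators.items()}
    for age in age_class
]

for age, row in zip(age_class, coded):
    print(age, row)
```

Adults receive 0 on all three indicators, so they define the baseline asocial learning rate against which s is estimated.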
Again, it may be necessary to re-code factors once the analysis has been run to obtain interpretable estimates of s (see Section 6.4).
ILV coefficients are estimated on the log scale (Moore, 2016). Therefore, e^(β k) estimates the multiplicative effect of a one-unit increase in x k on the rate of asocial learning, and e^(γ k) estimates the multiplicative effect of a one-unit increase in x k on the social learning rate (i.e. incoming social transmission). If the variable has been standardized, estimates give the effect of a one SD increase in x k . One can transform the effect back to the original scale by dividing β k and γ k by the SD of the unstandardized variable.

| Time-varying ILVs
For example, imagine that we have an ILV 'age', which had a SD of 10 years. We standardized the variable and obtained the estimates β age = 1.5 and γ age = −0.8. We can therefore estimate that for an increase in age of 1 SD (10 years

| Binary variables
For binary variables coded as 1/0, e^(β k) estimates the ratio of asocial learning rates between the two levels. Likewise, e^(γ k) estimates the ratio of social learning rates between the two levels. For example, imagine we have an ILV 'sex' with 0 = male and 1 = female. We get β sex = 1.8 and γ sex = −1.2. Therefore, females are an estimated e^1.8 = 6.05× faster than males at asocial learning and an estimated e^−1.2 = 0.30× as fast as males at social learning. Alternatively, we can reverse the sign of the γ sex coefficient: males are an estimated e^1.2 = 3.32× faster than females at social learning.

| Factors
Coefficients can be interpreted in the same way as binary variables in a pairwise manner. For our example in Section 5.3, imagine that we got β inf = −0.22, β juv = 0.74 and β sub = 0.32. Juveniles are an estimated e^0.74 = 2.10× faster at asocial learning than adults, whereas sub-adults are an estimated e^0.32 = 1.38× faster than adults at asocial learning. Conversely, infants are estimated to be only e^−0.22 = 0.8× as fast as adults at asocial learning. To get the estimated difference between two of the non-reference (i.e. non-adult) age classes, we back-transform the difference between their coefficients. For example, if we are interested in comparing asocial learning rates between juveniles and sub-adults, we find that juveniles are an estimated e^(β juv − β sub) = e^(0.74−0.32) = 1.52× faster at asocial learning than sub-adults.
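The back-transformations in the binary and factor examples can be checked with a few lines of Python. The coefficient values are those quoted in the text; the variable names are illustrative only.

```python
from math import exp

# Binary ILV 'sex' (0 = male, 1 = female), coefficients from the text.
beta_sex, gamma_sex = 1.8, -1.2
female_vs_male_asocial = exp(beta_sex)   # about 6.05
female_vs_male_social = exp(gamma_sex)   # about 0.30
male_vs_female_social = exp(-gamma_sex)  # about 3.32

# Factor 'age class' with adults as the reference level.
beta = {"inf": -0.22, "juv": 0.74, "sub": 0.32}
vs_adults = {name: exp(b) for name, b in beta.items()}

# Contrast between two non-reference levels: back-transform the
# difference between their coefficients.
juv_vs_sub = exp(beta["juv"] - beta["sub"])  # about 1.52

print(round(female_vs_male_asocial, 2), round(juv_vs_sub, 2))
```

The same pattern applies to the γ coefficients when comparing social learning rates between factor levels.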

| Social transmission (s)
In general, s is the social transmission rate per unit network connection, relative to the baseline asocial learning rate, but may have a more specific interpretation depending on the network used (see Section 3). The baseline rate of asocial learning is obtained by setting all ILVs to zero (see Section 5.3). For example, imagine that we have a large-scale association network (see Section 3.1), a continuous ILV 'age' centred on zero, and a binary variable 'sex', coded as males = 0, females = 1 and we obtain an estimate of s = 3.2. We can conclude that the rate of social transmission from informed to naïve individuals during periods when they are associating was estimated at 3.2× the baseline asocial learning rate (i.e. the asocial learning rate for a male of average age).

| Obtaining and interpreting confidence intervals
Confidence intervals (CIs) give a plausible range for the real value of a parameter; that is, an X% CI is expected to contain a parameter's true value on X% of occasions. CIs should therefore be obtained, reported and interpreted for any parameters of interest, including s. A common way to obtain CIs (e.g. in a generalized linear model) is to take the maximum likelihood estimate ± 1.96× the standard error, referred to as Wald confidence intervals. However, Wald CIs can be misleading if the uncertainty in the value of a parameter is asymmetrical, as is often the case for parameters in an NBDA. In particular, for s, there is often more certainty in the lower limit than in the upper limit.
A preferred approach for such a scenario is to use the established profile likelihood technique (Morgan, 2010), which provides CIs reflecting any asymmetry in the certainty in a parameter (Figure 3).
The profile log-likelihood is the −log-likelihood for a specified value of the parameter of interest, once all other parameters in the model have been optimized. If a value, v, for the parameter has a profile log-likelihood that is within 1.92 units of the minimum, then v falls within the 95% CI; this is because the 95% profile CI contains all values that would not be rejected at the 5% level in a likelihood ratio test (see Section 8.1). To find the 95% CI, researchers must plot the profile log-likelihood and locate the values at which it crosses the line 1.92 units above the minimum (Figure 3).
Although it is tempting to classify an estimate of s as 'low' or 'high', in many cases this will be difficult. Researchers can instead transform the upper and lower limits of the 95% CI into upper and lower estimates of the percentage of events that occurred by social transmission (see Section 6.5).
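The profile likelihood technique can be illustrated on a toy model. The sketch below is not an NBDA likelihood: it assumes exponentially distributed latencies with rate λ·e^(β x), treats β as the parameter of interest and λ as the nuisance parameter, and scans a grid of β values. All data and names are hypothetical.

```python
from math import exp, log

def profile_nll(beta, times, x):
    """-log-likelihood at `beta` with the baseline rate optimized out."""
    n = len(times)
    exposure = sum(t * exp(beta * xi) for t, xi in zip(times, x))
    lam_hat = n / exposure  # closed-form MLE of the baseline rate given beta
    loglik = n * log(lam_hat) + beta * sum(x) - lam_hat * exposure
    return -loglik

def profile_ci(times, x, grid, threshold=1.92):
    """Grid estimate and 95% profile CI (values within `threshold` of the minimum)."""
    nll = [profile_nll(b, times, x) for b in grid]
    best = min(nll)
    inside = [b for b, v in zip(grid, nll) if v - best <= threshold]
    estimate = grid[nll.index(best)]
    return estimate, min(inside), max(inside)

# Hypothetical latencies and a binary covariate.
times = [2.0, 3.5, 1.2, 8.0, 9.5, 6.3, 12.1, 7.7]
x = [1, 1, 1, 0, 0, 0, 0, 1]
grid = [i / 100 for i in range(-300, 501)]  # beta from -3 to 5 in steps of 0.01

mle, lower, upper = profile_ci(times, x, grid)
print(mle, lower, upper)
```

The interval is read directly from the curve, so it need not be symmetric around the estimate, unlike a Wald interval. If either limit coincides with the edge of the grid, the grid should be widened.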
Confidence intervals for the effects of ILVs can be interpreted in an analogous manner, but parameter values should first be back-transformed as described in Section 6.1, after which, the point of no effect is e 0 = 1. CIs for ILVs could also potentially include values in either direction (i.e. > and/or <1).

FIGURE 3 Profile log-likelihood plot for obtaining confidence intervals for parameters in which there is asymmetry in the uncertainty regarding their values. The profile log-likelihood is the −log-likelihood for a specified value of the target parameter once all other parameters in the model have been optimized. The lowest point of the curve (A) corresponds to the profile log-likelihood for the parameter value obtained from the fitted model. The dashed line indicates 1.92 units above this minimum −log-likelihood. The values at which the curve crosses this dashed line indicate the lower (B) and upper (C) values for the 95% CI. Here, the estimate from the fitted model is 1.54 (95% CI: 0.40, 6.61).

| Dealing with large estimates for s
Note that sometimes very large estimates of s can be obtained, especially in an OADA, which can seem difficult to interpret.
This also usually means that we cannot find an upper limit for the 95% CI for s (Section 6.3). There are two main reasons for such large estimates: either an ILV has a very large positive coefficient or the diffusion follows the network very closely (as in Box 3). Further information about why this occurs and steps that researchers can take can be found in the Supporting Information (Section 8).

| Estimating the percentage of events occurring by social transmission
For some network types, interpreting s in an intuitive manner is challenging (Section 3), making it difficult to evaluate the importance of

BOX 3 Testing for social transmission across multiple pathways
A target behaviour may be socially transmitted across multiple pathways (i.e. network types), but at different rates in each. To test for this, one can input multiple networks into an NBDA and estimate a separate social transmission rate (s) for each. For example, honeybees Apis mellifera can learn about foraging opportunities through multiple forms of interaction. Waggle dances performed by successful foragers provide the location of profitable foraging sites to naïve bees, while chemosensory information about currently available food sources (e.g. floral odour) is obtained during trophallactic food-sharing and by antennating other foragers (Cholé et al., 2019; Grüter & Farina, 2009). To assess the relative importance of these transmission pathways during recruitment of foragers to a novel foraging site, we recorded all interactions that occurred within the hive between demonstrator bees trained to collect food from a feeding station and a cohort of potential recruits that had never before visited that site. The order in which naïve bees successfully located the site was also recorded. To capture the temporal ordering of in-hive interactions between demonstrators and recruits, all three networks (dance-following interactions, trophallaxes and antennation) were input as dynamic networks (Section 3.4). Box 3 Table provides the relative support for a set of candidate models. Comparing models 2 and 3, either with a likelihood ratio test (χ² = 11.12, df = 2, p = 0.004) or based on AICc, reveals that estimating s separately for each network type is favoured over assuming a uniform transmission rate across all networks. However, model 1, which includes only the dynamic dance-following network, is clearly favoured (w 1 = 0.96), meaning there is little uncertainty regarding the best model out of those considered here.
That the temporal ordering of dance-following interactions is key is revealed by model 1 receiving e^(25.48/2) = 341,124× the support of model 5 (which used the corresponding static dance-following network). Model 4, which assumed that discovering the feeding station occurred through independent search alone, received virtually no support. Model 1 yielded a large estimate of s = 9.94 × 10^7, most likely because the order of recruitment followed the dance-following network extremely closely (Section 6.4). Converting this value into %ST estimates that following dances for the feeding station accounted for 100% (95% CI: 91.2%, +∞) of the 16 recruitment events. The code for these analyses is found in the Supporting Information.
where i is the individual that learned during event e, and t e is the time at which event e occurred. This is the predicted relative social transmission rate divided by the predicted total relative learning rate for i at

| MULTI-NETWORK NBDA
The approaches described in Equations 1-5 assume that social transmission follows a single pathway, represented by a single network (or a single type of network when modelling multiple diffusions; Section 4.2). An alternative approach is to allow for the possibility that social transmission might follow multiple pathways within a group and do so at different rates (for a worked example, see Box 3).
This situation can be modelled using a multi-network NBDA, expanding Equation 5 as follows: where a n,ij (t) is the connection from j to i in network n at time t, and s n is the transmission rate per unit connection in network n (relative to the rate of asocial learning).
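A sketch of the form this expansion takes, assuming that Equation 5 is the ILV model with linear predictors B_i (asocial learning) and Γ_i (social transmission) on the log scale, and that each network n contributes its own weighted sum of connections to informed individuals:

```latex
\lambda_i(t) = \lambda_0(t)\,\bigl(1 - z_i(t)\bigr)
  \Bigl( e^{\Gamma_i} \sum_{n} s_n \sum_{j} a_{n,ij}(t)\, z_j(t) + e^{B_i} \Bigr)
```

With a single network this reduces to the single-network model, and constraining s_1 = s_2 or s_1 = 0 gives the nested models discussed next.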
This model can be compared with those in which some or all of the s parameters are constrained. For example, comparison with a model in which s 1 = s 2 tests for a difference in transmission rate between network 1 and network 2. We could also consider models in which there is no transmission in a specific network, e.g. s 1 = 0, to test for evidence of social transmission on a specific pathway.
We can also estimate the percentage of events occurring by social transmission via a specific network n, %ST n (see Section 6.5). We first expand Equation 6 to calculate the probability that each event occurred due to social transmission via network n: We then take the mean value of p n,e across all events to obtain %ST n .
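The calculation of %ST n can be sketched as follows. The Python below assumes a model without ILVs, so that for each event the relative social transmission rate via network n is s n multiplied by the learner's total connection to informed individuals in that network, and the relative asocial rate is 1. The function name, input format and numbers are hypothetical.

```python
def percent_st(event_connections, s):
    """%ST for each network.

    event_connections: one entry per learning event; each entry lists,
    per network, the learner's summed connection to informed individuals
    at the time of the event.
    s: estimated transmission rate for each network.
    """
    n_networks = len(s)
    totals = [0.0] * n_networks
    for conn in event_connections:
        social = [s_n * c for s_n, c in zip(s, conn)]
        denominator = sum(social) + 1.0  # + 1 for asocial learning
        for n in range(n_networks):
            totals[n] += social[n] / denominator  # p_{n,e}
    n_events = len(event_connections)
    return [100.0 * t / n_events for t in totals]

# Hypothetical estimates for two networks and three learning events.
s = [2.0, 0.5]
event_connections = [
    [0.0, 0.0],  # first learner: no informed connections
    [1.0, 2.0],
    [0.5, 0.0],
]
pct = percent_st(event_connections, s)
print(pct)
```

The first event contributes zero to both networks because the learner had no connections to informed individuals, so it is attributed entirely to asocial learning.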
See Farine, Aplin, et al. (2015) for further discussion on how to quantify the influence of each network in a multi-network NBDA.
Another potential use of multi-network NBDA is to break down association or observation networks into different pathways to test for transmission biases. For example, to test for a rank bias in transmission we might break down an association network into two networks: network 1 containing the links from higher to lower ranks (and 0 connections elsewhere), and network 2 containing links from lower to higher ranks. We can then compare this model with one in which s 1 = s 2 in order to test for a rank bias; that is, are individuals more (or less) likely to learn from those with higher rank than from those with lower rank? Hoppitt (2017)

| Model comparison
In the preceding sections, we have alluded to several different situations where the fit of two or more NBDA models must be compared in order to assess the evidence for competing hypotheses, including:
a. Comparing a network-based model of social transmission to a model with a homogeneous network (Section 4.1) or a group network (Section 4.2).
b. Comparing models with different networks (Section 3.2) or different combinations of networks (Section 7) to ascertain which best approximates the pathways of transmission.
c. Comparing multi-network models with models in which some or all s n are constrained (e.g. s 1 = s 2 , or s 1 = 0; Section 7).
In some cases, the models to be compared are nested, meaning one model is a special case of the other, with constraints imposed on one or more parameters (this is true for c above unless different baseline rate functions are fitted in each model). When models are nested, one can use a likelihood ratio test (LRT) to obtain a P value quantifying the evidence against the null hypothesis represented in the constrained model (Morgan, 2010).
Models with lower values of Akaike's information criterion corrected for sample size (AICc) are those that explain the data better after penalizing for the number of parameters in the model. The penalty imposed is not arbitrary; it is chosen such that the difference in AICc between any two models fitted to the same data estimates the difference in Kullback-Leibler (K-L) information. In turn, K-L information measures the extent to which the predicted distribution for the response variable differs from its true distribution. In other words, it estimates the information that is lost when moving from the true distribution to the model. Consequently, AICc provides a theoretically well justified measure of the relative fit of two or more models. We can transform the difference in AICc between two models (ΔAICc) to obtain the relative support for the two models, e^(ΔAICc/2). This value quantifies the ratio of probabilities that each model is the one with the best K-L information (termed the 'best K-L model').
For example, imagine that we fit models with a proximity network (AICc = 382) and a network quantifying grooming interactions (AICc = 372.5). These data suggest that the grooming network better approximates the pathways of transmission than the proximity network, but how certain of this result can we be? It might just be a chance result of sampling error. The ΔAICc between these two models is 9.5, giving a relative support of e^(9.5/2) = 115.6. This means that the grooming network is 115.6× more likely to be a closer approximation to the transmission pathways than the proximity network, which we would take to be very strong support in favour of the grooming network.
If a researcher has several candidate models, they can list them in increasing order of AICc to show the order of preference in model fit (Box 3). They can then calculate the Akaike weight for each model as a measure of its support. To do this, one first calculates the AICc difference between each model, i, and the best model, Δ i = AICc i − AICc best . The Akaike weight for model i is then w i = e^(−Δ i /2) / Σ j e^(−Δ j /2), where the sum is taken over all models in the set, and can be interpreted as the probability that model i is the best K-L model in the set, accounting for sampling error.
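Relative support and Akaike weights can be computed in a few lines. The sketch below uses the standard Akaike weight formula and works from AICc differences only, taking the ΔAICc of 9.5 from the two-network example; the function name is illustrative and is not an nbda package function.

```python
from math import exp

def akaike_weights(aicc_values):
    """Akaike weights from a list of AICc values (or AICc differences)."""
    best = min(aicc_values)
    relative = [exp(-(a - best) / 2) for a in aicc_values]
    total = sum(relative)
    return [r / total for r in relative]

# Two models whose AICc values differ by 9.5: the better model has a
# difference of 0 from itself, the other a difference of 9.5.
deltas = [0.0, 9.5]

relative_support = exp(9.5 / 2)   # about 115.6
weights = akaike_weights(deltas)

print(round(relative_support, 1), [round(w, 4) for w in weights])
```

Because weights depend only on differences in AICc, the ratio of the two weights equals the relative support.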

| Multi-model inference
If there are ILVs to consider in addition to our competing hypotheses about social transmission, this complicates model selection.
We could simply include all ILVs in all candidate models but requiring these models to fit additional parameters may decrease the precision of our estimates for s. Ideally, we only want to include variables for which there is evidence of an effect on asocial and/ or social learning. The traditional approach would be to select the combination of ILVs that provides the best model fit and base our inferences on that model. With modern computing power, it would even be feasible to fit all possible combinations of ILVs and select the lowest AICc as our best model. However, this approach inherently assumes we are certain that the best-supported model really is the best one (in the sense of minimizing K-L information loss). As we saw in Section 8.1, there is often substantial uncertainty due to sampling error over which model really is the best; this model selection uncertainty is quantified by the Akaike weight (Burnham & Anderson, 2002;Burnham et al., 2011).
Multi-model inference is a set of tools (available in the nbda package) allowing us to account for model selection uncertainty when we make our inferences. The first such tool allows us to quantify the overall strength of evidence for a particular network (or combination of networks) by calculating the total Akaike weight for that network (otherwise simply known as 'support' for that network). This is done by simply summing the Akaike weights, Σw i , for all the models that contain the network. This value can be thought of as the probability that the best K-L model is one that includes the network of interest.
We can obtain the support for all the networks (or network combinations) we are considering as an overall measure of the extent to which each one approximates the pathways of transmission. For this to be a fair comparison, there must be the same number of models for each network. However, if the same combinations of ILVs are considered for each network, this condition will be met. Support can also be obtained for an effect of each ILV on asocial and social learning rates in an analogous manner. We can also compare the overall fit of models with different baseline rate functions, or particular combinations of baseline function and network(s).
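The summation of Akaike weights into 'support' can be sketched as follows. This is a minimal Python illustration, not the NBDA package's own routine; the network names, ILV combinations and weights are hypothetical. The same set of ILV combinations is fitted for each network, so each network appears in the same number of models, as the comparison requires.

```python
from collections import Counter, defaultdict

# Each entry: (network, ILV combination, Akaike weight). All values are hypothetical.
models = [
    ("association", "no_ILV", 0.45),
    ("association", "age", 0.30),
    ("grooming", "no_ILV", 0.15),
    ("grooming", "age", 0.10),
]

def total_support(models, field):
    """Sum Akaike weights over all models sharing a value of the given field
    (field 0 = network, field 1 = ILV combination)."""
    totals = defaultdict(float)
    for model in models:
        totals[model[field]] += model[2]
    return dict(totals)

# Check that every network is represented by the same number of models
models_per_network = Counter(network for network, _, _ in models)
balanced = len(set(models_per_network.values())) == 1

network_support = total_support(models, 0)  # support for each network
ilv_support = total_support(models, 1)      # support for each ILV combination

print("balanced model set:", balanced)
print("network support:", network_support)
print("ILV support:", ilv_support)
```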
The question remains as to whether we can validly use $\sum w_i$ to measure the level of support for asocial models versus social models (i.e. models with a social transmission component). This depends on the set of models that we are considering. As described in Section 5.2, previous NBDA approaches compared sets of additive and multiplicative models. Under these circumstances, a fair comparison between asocial and social models is sometimes possible (to understand why, see Supporting Information Section 9).
However, this may not be the case when using unconstrained models, which we argued were preferable to the previous additive [...]
For s parameters, we recommend obtaining model-averaged estimates (MAEs) and unconditional standard errors (USEs) that are conditional on the relevant network(s) being present in the model. If a large number of networks are considered, then any given s parameter will be absent from the vast majority of models in the set, and MAEs and USEs will be misleading. Conditioning on the subset of models that contain a specific network rescales the Akaike weights so that they sum to 1 within the subset, and then carries out multi-model inference using only those models. This is equivalent to asking 'given that the best K-L model contains network n, what is our best estimate of s?' The MAE for an s parameter can still be misleading if there are some models in the set for which s is estimated to be arbitrarily large (see Section 6.4). Even if these models have a tiny Akaike weight, they can still badly skew the estimate of s. In such cases, we suggest obtaining the model-weighted median for s instead, as an estimate that is robust to extreme estimates with low Akaike weight.
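The effect of conditioning, and the sensitivity of the MAE to a single extreme estimate, can be seen in the minimal Python sketch below. The estimates and weights are hypothetical, and the weighted median is computed under one common definition (the smallest estimate at which the cumulative weight reaches 0.5), which is an assumption here and may differ in detail from the NBDA package's calculation.

```python
def conditional_weights(weights):
    """Rescale Akaike weights so they sum to 1 within a subset of models."""
    total = sum(weights)
    return [w / total for w in weights]

def model_averaged_estimate(estimates, weights):
    """Weighted mean of the estimates; weights are assumed to sum to 1."""
    return sum(w * e for w, e in zip(weights, estimates))

def weighted_median(estimates, weights):
    """Smallest estimate at which the cumulative weight reaches 0.5
    (an assumed definition; weights should sum to 1)."""
    cumulative = 0.0
    pairs = sorted(zip(estimates, weights))
    for estimate, weight in pairs:
        cumulative += weight
        if cumulative >= 0.5:
            return estimate
    return pairs[-1][0]

# Hypothetical subset: the four models that contain network n.
# The last model has an arbitrarily large estimate of s but a tiny weight.
s_estimates = [2.0, 2.4, 1.8, 1e6]
raw_weights = [0.30, 0.20, 0.099, 0.001]  # sum to 0.6 within the full model set

cond_w = conditional_weights(raw_weights)
mae_s = model_averaged_estimate(s_estimates, cond_w)
median_s = weighted_median(s_estimates, cond_w)

print(f"conditional MAE of s: {mae_s:.2f}")
print(f"model-weighted median of s: {median_s:.2f}")
```

In this example a model with a conditional weight below 0.2% dominates the MAE, whereas the weighted median stays within the range of the well-supported estimates.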
Unconditional standard errors provide a useful way of calculating unconditional 95% CIs for parameters that account for model selection uncertainty: one simply calculates MAE ± 1.96 × USE.
However, these CIs can be misleading when the profile likelihood is asymmetrical for the same reason Wald CIs can be (see Section 6.3).
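The interval calculation above can be sketched in Python as follows. The USE here uses the estimator given by Burnham and Anderson (2002), in which each model's sampling variance is inflated by the squared deviation of its estimate from the MAE; whether the NBDA package computes it in exactly this way is an assumption. The estimates, standard errors and weights are hypothetical, and the weights are taken to be conditional on the parameter being in the model (so they sum to 1).

```python
import math

def mae_and_use(estimates, std_errors, weights):
    """Model-averaged estimate (MAE) and unconditional standard error (USE).

    USE follows the Burnham & Anderson (2002) form (assumed here):
        USE = sum_i w_i * sqrt(SE_i^2 + (estimate_i - MAE)^2)
    """
    mae = sum(w * e for w, e in zip(weights, estimates))
    use = sum(
        w * math.sqrt(se ** 2 + (e - mae) ** 2)
        for w, e, se in zip(weights, estimates, std_errors)
    )
    return mae, use

def unconditional_ci(mae, use, z=1.96):
    """Symmetric 95% interval, MAE +/- 1.96 x USE."""
    return mae - z * use, mae + z * use

# Hypothetical ILV effect estimates from three models containing the ILV
estimates = [0.50, 0.70, 0.60]
std_errors = [0.20, 0.25, 0.22]
weights = [0.5, 0.3, 0.2]  # conditional Akaike weights, summing to 1

mae, use = mae_and_use(estimates, std_errors, weights)
lower, upper = unconditional_ci(mae, use)
print(f"MAE = {mae:.3f}, USE = {use:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the interval is symmetric about the MAE by construction, it inherits the limitation noted above whenever the profile likelihood is asymmetrical.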
Burnham and Anderson (2002) suggest a method for inflating 95% profile likelihood intervals (Section 6.3) to account for model selection uncertainty. However, this approach is not always reliable for NBDA. Instead, we recommend obtaining the 95% CI conditional on the best model containing a parameter.
Since there is usually particular interest in determining how strong, at a minimum, social transmission is, we recommend assessing the robustness of the lower limit of the 95% CI to model selection uncertainty. This can be done by obtaining the 95% lower limit for s, and the corresponding estimate of %ST, from every model containing the relevant s parameter. For example, if all values are >0, then the evidence for social transmission is robust to model selection uncertainty. We also suggest providing a model-averaged version of the value of %ST corresponding to the 95% lower limit, as a lower plausible limit on the importance of social transmission after accounting for model selection uncertainty (see Tutorial 7 in the Supporting Information).

| FURTHER EXTENSIONS AND CONSIDERATIONS
[...] (c) controlling for variation in exposure to a learning task that may generate a spurious social transmission effect; and (d) Bayesian approaches to NBDA.
As a final note, although NBDA provides a flexible approach for inferring social transmission from observational data, it may not be appropriate in all circumstances. Experience-weighted attraction (EWA) models provide an alternative approach that focuses on a slightly different question (McElreath et al., 2008). Whereas NBDA is primarily concerned with the spread of innovations through groups or populations, EWA models instead evaluate how animals use social information to decide amongst two or more alternative behaviours. For example, Barrett et al. (2017) employed EWA models to evaluate social information use strategies of capuchin monkeys C. capucinus deciding amongst several extractive foraging techniques. In addition, in its current form NBDA assumes that transmission rates are proportional to individuals' connectedness to informed demonstrators (referred to as a 'simple contagion').
However, some behaviours may instead spread via 'complex contagions', where adopting a trait may depend less on connection strength to informed individuals and more on the relative proportion of informed and uninformed associates (Firth, 2020). Nonetheless, NBDA could be modified to model such cases by changing Equation 1 to reflect different transmission rules underpinning such complex contagions.

DATA AVAILABILITY STATEMENT
Tutorials, code and data to perform all analyses are provided in the Supporting Information. These materials are also available from the Dryad Digital Repository https://doi.org/10.5061/dryad.280gb5mnj.