Common datastream permutations of animal social network data are not appropriate for hypothesis testing using regression models

Social network methods have become a key tool for describing, modelling, and testing hypotheses about the social structures of animals. However, due to the non-independence of network data and the presence of confounds, specialized statistical techniques are often needed to test hypotheses in these networks. Datastream permutations, originally developed to test the null hypothesis of random social structure, have become a popular tool for testing a wide array of null hypotheses. In particular, they have been used to test whether exogenous factors are related to network structure by interfacing these permutations with regression models. Here, we show that these datastream permutations typically do not represent the null hypothesis of interest to researchers interfacing animal social network analysis with regression modelling, and use simulations to demonstrate the potential pitfalls of using this methodology. Our simulations show that utilizing common datastream permutations to test the coefficients of regression models can lead to extremely high type I (false-positive) error rates (> 30%) in the presence of non-random social structure. The magnitude of this problem is primarily dependent on the degree of non-randomness within the social structure and the intensity of sampling. We strongly recommend against utilizing datastream permutations to test regression models in animal social networks. We suggest that a potential solution may be found in regarding the problems of non-independence of network data and unreliability of observations as separate problems with distinct solutions.


Introduction
response variable, these procedures change the distribution of Y, instead of breaking relationships between the variables. If the network has non-random social structure, even structure entirely unrelated to X, then we will typically see a reduction in the variance of Y as we permute the raw data. When Y has a larger variance in the observed data than in the permutations, more extreme values of β are more likely to occur in the observed data, even if the null hypothesis is true. This procedure is therefore likely to result in much higher rates of type I (false-positive) error than is acceptable (Figure 1).

Changes in variance between the observed and permuted data are more than just a technical issue. There is a fundamental problem with this approach when it comes to testing hypotheses using regression models. When researchers fit regression models to predict network properties from exogenous variables, the null hypothesis they will be testing against can be stated as "the variation in network structure is not related to the exogenous variable." This, however, is not the null hypothesis tested by the commonly used datastream permutation methods. Rather, the null hypothesis that is proposed by these datastream permutations could be stated as "the degree of variation in network structure and its relationship to the exogenous variable are both due to random interactions of individuals within constraints." The researcher cannot disentangle the null hypothesis of no relationship between the network and the predictor from the null hypothesis of random social structure. In other words, a significant result from this procedure could be due to a relationship between the predictor and the network, or because individuals do not interact at random, whether or not the true social structure is related to the predictor.
This fundamental mismatch between the null hypothesis of interest and that tested by the datastream permutation algorithm makes tests of regression models using this procedure nearly uninterpretable.

Here, we demonstrate the problems that occur when combining datastream permutations of animal social network data with regression using two simulated scenarios. In these scenarios, we generate datasets with simple non-random social structure. We then introduce a random exogenous variable that has no relationship to social structure, and test for a relationship between network structure and this variable with linear models, using datastream permutations to determine statistical significance. We show that even in the absence of any true relationship between exogenous variables and social structure, datastream permutations are highly prone to producing significant p-values when social structure is non-random. We caution against using these datastream permutations to test the coefficients of regression models, and we discuss possible solutions and alternative methods for regression analysis in social networks.

Methods
General framework

To illustrate the problems with using datastream permutations to test the coefficients of regression models, we carried out simulations across two different scenarios, reflecting common research questions in animal social network analysis. The first scenario simulates a case in which researchers are interested in whether a dyadic covariate (e.g. kinship or phenotypic similarity) influences the strength of social bonds, which we will refer to as a case of "dyadic regression". The second scenario simulates a case in which researchers are interested in how a quantitative individual trait (e.g. age or personality) influences individual network position, which we refer to as "nodal regression."

While the methods of network generation differ slightly for each scenario, the general steps are the same. For each simulation, we perform 200 runs, with varying parameter values (Table 1).

Dyadic regression

In our first simulation, we investigate the case in which the researcher is interested in the influence of a dyadic predictor (such as similarity in phenotype or kinship) on the rates at which dyads associate or interact. Our simulation framework is heavily inspired by those of Whitehead & James (2015) and Farine & Whitehead (2015). We simulate a population of N individuals, and assign each dyad an association probability pij from a beta distribution with mean μ and precision ϕ (α = μϕ, β = (1 − μ)ϕ). By assigning association probabilities in this way, we create non-random social preferences in the network, and thus larger variance in edge weights than would be expected given random association (Whitehead et al., 2005).

We then simulate t sampling periods. For simplicity, individuals are sighted in each sampling period with a constant probability o, and associations between dyads where both individuals are sighted occur with probability pij.
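As a concrete illustration, the generative process described above can be sketched as follows. This is a minimal sketch, not the authors' code: the function name and the parameter values passed at the bottom are ours, chosen for illustration rather than taken from Table 1.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_associations(n, mu, phi, t, o):
    """Simulate t sampling periods for a population of n individuals.

    Each dyad receives an association probability p_ij drawn from a
    beta distribution with mean mu and precision phi (alpha = mu*phi,
    beta = (1 - mu)*phi); individuals are sighted with probability o,
    and dyads in which both members are sighted associate with p_ij.
    """
    alpha, beta = mu * phi, (1 - mu) * phi
    p = np.zeros((n, n))
    iu = np.triu_indices(n, k=1)
    p[iu] = rng.beta(alpha, beta, size=len(iu[0]))
    p = p + p.T  # symmetric dyadic association probabilities

    X = np.zeros((n, n), dtype=int)  # periods in which i and j associated
    D = np.zeros((n, n), dtype=int)  # periods in which i or j was observed
    for _ in range(t):
        seen = rng.random(n) < o
        either = seen[:, None] | seen[None, :]
        both = seen[:, None] & seen[None, :]
        assoc = np.triu(both & (rng.random((n, n)) < p), k=1)
        assoc = assoc + assoc.T  # symmetrize the period's associations
        X += assoc
        D += either
    return X, D

X, D = simulate_associations(n=20, mu=0.1, phi=2.0, t=50, o=0.7)
```

The counts X and D are exactly the quantities needed to compute the simple-ratio index for each dyad.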
We then build the observed association network by calculating dyadic simple-ratio indices:

SRIij = Xij / Dij

where Xij is the total number of sampling periods in which i and j were observed associating, and Dij is the total number of periods in which either i or j was observed (including periods where they were observed, but did not associate with any individuals).

We then assign each individual a trait value from a uniform distribution (0,1). We do not need to specify what this trait represents for our simulation, but it could represent any quantitative trait used as a predictor in social network studies (age, personality, cognitive ability, dominance rank, parasite load, etc.). Note that the trait value is generated after the observations of association and has no influence on any network property.

We then fit the linear model:

SRIij = β0 + β1 simij + εij

where simij is the dyadic similarity of i and j's trait values, and save the estimate of β1. We compare this coefficient to a null model generated using the sampling period permutation method proposed by Whitehead (1999). There are several algorithms available to perform these swaps. We use the "trial swap" procedure described by Miklós & Podani (2004) and suggested for social network studies by Krause et al. (2009). For each trial, this procedure chooses an arbitrary 2 × 2 submatrix of the lower triangle within a random sampling period. If a swap is possible, it is performed (and symmetrized); otherwise the matrix stays in its current state. Steps in which the matrix is not changed are referred to as "waiting steps." This algorithm is ideal because it ensures that the Markov chain samples the possible matrices uniformly, while other algorithms that do not include waiting steps exhibit biases in their sampling of the possible matrices (Miklós & Podani, 2004).
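A single trial of this kind of swap can be sketched as a degree-preserving double edge swap on one period's symmetric binary association matrix. This is a simplified stand-in for the exact 2 × 2 submatrix formulation of Miklós & Podani (2004), and the function name is ours:

```python
import numpy as np

def trial_swap_step(A, rng):
    """Attempt one 'trial swap' on a symmetric binary association
    matrix A for a single sampling period. Picks four distinct
    individuals a, b, c, d; if associations (a,b) and (c,d) exist
    while (a,c) and (b,d) do not, the pairs are rewired. This keeps
    each individual's number of associations in the period fixed.
    Returns True if a swap occurred, False for a 'waiting step'.
    """
    n = A.shape[0]
    a, b, c, d = rng.choice(n, size=4, replace=False)
    if A[a, b] and A[c, d] and not A[a, c] and not A[b, d]:
        A[a, b] = A[b, a] = A[c, d] = A[d, c] = 0
        A[a, c] = A[c, a] = A[b, d] = A[d, b] = 1
        return True
    return False

# Run a short chain on a toy period matrix with three associations
rng = np.random.default_rng(0)
A = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (2, 3), (4, 5)]:
    A[i, j] = A[j, i] = 1
before = A.sum(axis=0).copy()
for _ in range(500):
    trial_swap_step(A, rng)
after = A.sum(axis=0)
```

Failed trials leave the matrix unchanged, which is what produces the waiting steps needed for uniform sampling of the possible matrices.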
We generate 10,000 permuted datasets for each simulation, with 1,000 trial swaps between each permutation, and re-fit our linear model to each permuted dataset, recording the coefficient. We then use this distribution of coefficients to calculate the p-value of the linear model's coefficient. Across the 200 runs, we vary the parameters of the simulation by drawing μ, ϕ, N, o, and t randomly using Latin hypercube sampling (Table 1).

Nodal regression

This gives the simulation the property that individuals with higher assigned gregariousness scores tend to be seen in larger groups, and vice versa. This leads to non-random differences in gregariousness (and thus weighted degree) between individuals. We then calculate the association network, again using the SRI:

SRIij = Xij / (Xij + Yi + Yj)

where Xij is the number of groups in which the dyad was seen together, and Yi and Yj are the number of groups in which only i or only j was seen, respectively. After calculating the network, we determine each individual's weighted degree. We again generate a trait value for each individual at random from a uniform distribution on (0,1) and fit the linear model:

degreei = β0 + β1 traiti + εi

and again save the estimate of β1. We compare this coefficient to random coefficients fit to networks generated using the group-based permutation procedure proposed by Bejder et al. (1998). This procedure again sequentially permutes the observed dataset, while maintaining the size of each group and the number of groups per individual. We again use the trial swap method to perform these permutations, generating 10,000 permuted datasets with 1,000 trials per permutation, and deriving p-values in the same way as above. We vary the parameters of this simulation by using Latin hypercube sampling to draw values of N, M, G, and σ (see Table 1 for ranges).
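The p-value derivation used in both scenarios reduces to comparing the observed coefficient against the distribution of coefficients from permuted datasets. A sketch (the function name is ours; we include the observed coefficient in the reference set, a common convention for permutation tests):

```python
import numpy as np

def permutation_pvalue(beta_obs, beta_perm):
    """Two-tailed permutation p-value: the proportion of coefficients
    from permuted datasets at least as extreme as the observed
    coefficient. Including the observed value in the reference set
    means the p-value can never be exactly zero."""
    beta_perm = np.asarray(beta_perm, dtype=float)
    n_extreme = np.sum(np.abs(beta_perm) >= abs(beta_obs))
    return (n_extreme + 1) / (len(beta_perm) + 1)

p = permutation_pvalue(0.8, [0.1, -0.2, 0.9, -0.85, 0.05])  # → 0.5
```

In our simulations, beta_perm would contain the 10,000 coefficients re-fitted to the permuted datasets.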
Analysis

We use the outputs of the simulations primarily to derive overall type I error rates for both scenarios, calculated as the proportion of runs in which a p-value less than 0.05 was obtained. We further investigate the sensitivity of these results to non-random social structure, sampling effort, and population size. Previous work suggests that the sensitivity of datastream permutation techniques is highly dependent on variation in social structure and sampling intensity (Whitehead, 2008). We use binomial generalized linear models to summarize how population size, response variance, and sampling intensity influence the probability of false positives. We further analyse these relationships qualitatively using conditional probability plots.
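The overall error rate reduces to a simple proportion over runs. A minimal sketch (the function name is ours):

```python
import numpy as np

def type1_error_rate(pvals, alpha=0.05):
    """Proportion of simulation runs yielding p < alpha. Because the
    simulated trait is unrelated to the network by construction, any
    value well above alpha indicates inflated false positives."""
    return float(np.mean(np.asarray(pvals) < alpha))

# Toy example with four hypothetical per-run p-values
rate = type1_error_rate([0.01, 0.20, 0.04, 0.64])  # → 0.5
```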

Results

Dyadic regression
The overall type I error rate for the dyadic regression case was high, with 35% of runs giving false positive results (70 out of 200 runs). Sensitivity analysis suggested that the most important factors influencing the type I error rate in our simulations were the average number of sightings per individual and the variance of association probabilities. As the average number of sightings increased, so did the false positive rate (β = 0.012 ± 0.004, z = 3.149, p = 0.002, Figure 2a). Similarly, networks with higher variance in edge weights experienced higher type I error rates (β = 8.35 ± 8.93, z = 2.37, p = 0.02, Figure 2b). There was a less clear, but statistically significant, relationship between network size and type I error rates, with larger networks typically having lower type I error rates (β = −0.014 ± 0.007, z = −2.02, p = 0.04, Figure 2c).

Nodal regression
The nodal regression case resulted in even higher type I error rates than the dyadic case, with almost half of runs giving false positive results (95 out of 200 runs; 47.5%). The rate of type I errors was strongly influenced by the variance in weighted degree; as the standard deviation of the response increased, so too did the false positive rate (β = 1.18 ± 0.50, z = 2.34, p = 0.019, Figure 3a). In contrast, as the size of the network increased, the false positive rate decreased, although it never approached the target false positive rate of 0.05 in our simulations (β = −0.02 ± 0.01, z = −2.89, p = 0.004, Figure 3c). In this simulation, the number of sightings per individual did not appear to significantly influence the type I error rate (β = 0.018 ± 0.013, z = 1.43, p = 0.153, Figure 3b). This may be because, in networks with few groupings but high sightings per individual, there were fewer possible permutations of the observed network, and therefore the permuted networks were more similar to the original network.

Discussion

These two simple simulated scenarios show that the commonly used datastream permutation procedures for animal social network data produce extremely high, and thus unacceptable, false-positive rates when applied to regression models. This is because datastream permutations do not generate appropriate null distributions for testing the significance of model coefficients. We therefore strongly warn against using this procedure.

We now turn to some potential solutions to this problem that may still facilitate inference in these situations. This is not intended to be a comprehensive guide to hypothesis testing in social networks.

We feel that these methods have the potential to address the current issue that we have identified, and we strongly encourage new work to explore and validate these approaches.
It is important to note that the methods we propose are only useful if the question of interest is about the structure of social affinity, rather than the empirical pattern of encounters between individuals. If, instead, researchers are interested in the actual rates of contact (as is the case in disease research and studies of social learning), this approach may not be appropriate. Extensions of recent work using hidden state modelling may be more appropriate for disentangling true association patterns when detections are potentially biased or imperfect (Gimenez et al., 2019).

Building better null models

The problems we have identified here arise because the commonly used null models for animal societies do not generate datasets representing the null hypothesis of interest in a regression setting. These models were specifically designed to test the null hypothesis of random social structure, not the null hypothesis that aspects of social structure are unrelated to exogenous factors. An obvious way forward would be the development of permutation procedures that generate datasets that correctly represent the relevant null hypothesis. In the case of dyadic regression, these datasets would maintain the structure of the data (e.g. sightings per individual, associations per sampling period, spatial patterns of observations), randomise the identities of associated individuals, and simultaneously preserve the variance in edge weights. In the case of nodal regression, permuted datasets would maintain the same (or at least a similar) distribution of individual centrality within the network, in addition to structural confounds such as the size of groups, sightings per individual, and timing of sightings. The design of such procedures is far from trivial, and is beyond the scope of this paper, but we suspect that the development of algorithms that simultaneously maintain aspects of data structure and features of the social system will be an important area of methodological research going forward.

Conclusion
The development of permutation techniques that control for sampling biases while maintaining temporal, spatial, and structural aspects of the raw data is an important advance in the study of animal social systems, and we suspect that these procedures will remain a key tool for hypothesis testing in ecology and evolution. However, a lack of consideration of whether the null hypothesis being tested matches the null model generated by datastream permutations has led to unwarranted application of these techniques, particularly in the context of hypothesis testing using regression models.

We recommend that researchers think critically and carefully about the null hypothesis they wish to test using social network data, and ensure that the null model they specify does in fact represent that hypothesis. We suspect that in most cases, the null hypothesis of random social structure will clearly not be appropriate, and therefore traditional datastream permutations will not be a viable approach. We hope that our discussion of this issue and the results of our simulations will lead to a reconsideration of how researchers employ null models when analysing animal social networks, promote further research and discussion in this area, and lead to the development of procedures that correctly specify null hypotheses and allow robust inference in animal social network studies.