Inferring multispecies distributional aggregation level from limited line transect‐derived biodiversity data

Ecologists have generally concluded that species distributions are not random (e.g. they are aggregate), based on single‐species studies that applied single‐species–based statistical methods, like the negative binomial model. Although it is common knowledge that some specific species in an ecological community present aggregate distributions, this does not necessarily imply that the entire community presents an aggregate distribution. Studying community‐level distributional aggregation patterns requires different statistical methods. Herein, by utilizing a novel conspecific‐encounter index derived from a multiple species Markov transition model that accounts for the non‐independent sampling of consecutive individuals along line transects, we were able to show that tree assemblages in tropical forest ecosystems can present a strong signal of extensive distributional interspersion. This interesting result is not unexpected, given the fact that neighbouring tree individuals in highly diverse tropical forests are usually of different species, resulting in strong niche packing or interspersed patterns. In contrast, for the amphibian assemblages surveyed from southwestern China and central‐south Vietnam, the conspecific‐encounter index was found to be consistently high, implying that amphibian communities tend to be highly aggregate in space. In conclusion, using the conspecific‐encounter index derived from the Markov non‐independent sampling model, we provide a clear definition of community‐level distributional aggregation as an interspersed or cluster‐like distribution of different species. This definition is not idiosyncratic, as it coincides with the definition of the contagion index used in landscape ecology. To this end, the model used in this paper establishes a framework explicitly linking community ecology and landscape ecology from a multi‐object perspective.


| INTRODUCTION
Species distributions are not random in nature. Diverse distributional patterns are commonly found in field practice, including regular, aggregate and random distributions. Nearly all previous studies evaluating distributional patterns of species were conducted at the single-species level (He & Gaston, 2000; Zillio & He, 2010), largely ignoring the joint effects of multiple species and the associated joint distribution patterns. Single-species studies of distributional aggregation are therefore unlikely to evaluate accurately how multiple species interact to produce joint distributional aggregation patterns from a whole-community perspective. This is because community-level biodiversity patterns can present nonlinear and second-order interactions that might not be revealed using single-species metrics or indices (Chen, Shen, Condit, & Hubbell, 2018; Katabuchi et al., 2017). Statistical methods that quantify multispecies distributional aggregation patterns at the community level are therefore urgently needed, particularly when the sampling and collection of field biodiversity data are limited.
Compared to other methods like quadrat-based sampling, the line transect method can save both time and labour. Moreover, statistical methods for estimating the population size of species have been well established for line transect sampling (Burnham et al., 1980). However, to date, no statistical methods have been built for inferring community-level diversity and distributional aggregation patterns based on the limited biodiversity data collected from line transects.
There are indeed some metrics available for quantifying spatial distributional aggregation of species in early ecological literature (Clark & Evans, 1954; Green, 1966; Lloyd, 1967; Morisita, 1959; O'Neill et al., 1988; Pielou, 1959). However, most of them are established at the single-species level, and many are calculated using the biodiversity data derived from quadrat-based sampling (Green, 1966; Lloyd, 1967; Morisita, 1959). Some metrics are developed based on sampling distances between neighbouring individuals (Clark & Evans, 1954; Mountford, 1961; Pielou, 1959). However, this may increase the sampling workload (or at least sampling time), as ecologists have to record the geographic coordinates of each individual. Thus, no biodiversity index is currently available for quantifying multispecies distributional aggregation patterns in the context of sequential sampling of organisms along line transects without knowing sampling distances or coordinates.
In this study, we introduce a multispecies Markov transition model to account for the non-independent sampling of consecutive individuals along line transects. On the basis of this Markov model, we develop a novel conspecific-encounter index that allows ecologists to evaluate multispecies distributional aggregation patterns at the community level, based on the limited biodiversity data sampled from line transects. By conducting extensive numerical simulations and empirical tests using the proposed index, we show that multispecies distributional patterns in highly diverse tropical forest communities are not as aggregate as would be inferred from the limited information that some specific species in the target communities present highly aggregate distributions. In other words, even though we may observe that some species are aggregately distributed, distributions of different species at the community level tend to overlap with each other, possibly implying a strong signal of dense niche packing or overlap (May, 1974; Rappoldt & Hogeweg, 1980).

| A Markov model for sequential sampling of individuals and the conspecific-encounter index ν
We utilized a Markov sampling model (Solow, 2000) to establish a non-independent sampling framework for line transect sampling of individuals of species in field practice. To be specific, suppose the true relative abundances of S species in a metacommunity are given by $p_1, p_2, \ldots, p_S$, with $\sum_{i=1}^{S} p_i = 1$. Moreover, suppose ecologists consecutively sample m individuals one by one from the metacommunity, and record the sampling sequence as $Z_k$, k = 1, 2, …, m (representing the species label of the kth sampled individual). The underlying probability model of the sampling process is that the first individual is randomly sampled according to the species' relative abundances, that is,

$$P(Z_1 = i) = p_i, \quad i = 1, 2, \ldots, S, \tag{1}$$

and subsequently sampled individuals follow the transition probabilities of a first-order Markov chain (Solow, 2000) as

$$P(Z_k = j \mid Z_{k-1} = i) = \begin{cases} (1 - \pi)p_i + \pi, & j = i, \\ (1 - \pi)p_j, & j \neq i. \end{cases} \tag{2}$$

The aim of Equation 2 is to model the relationship between any two adjacent individuals of the line transect sample. Specifically, given that the (k − 1)th individual belongs to species i (i.e. $Z_{k-1} = i$), the next individual observed (the kth individual) in the line transect sample can be either from species i (i.e. $Z_k = i$) with probability $(1 - \pi)p_i + \pi$ or from another species j (i.e. $Z_k = j$) with probability $(1 - \pi)p_j$, where species j can be any species except species i. Equation 2 defines a first-order Markov probability model because the transition probabilities from any given state sum to 1 and the kth state depends only on the (k − 1)th state.
Note that parameter π, with a value ranging from 0 to 1, measures the degree of non-independence between any two adjacent individuals (whether of the same or of different species) in the sampling model of Equations 1 and 2. This is because, if π = 0, the sampling procedure leads to an independent sample of individuals (i.e. the species identity of the next individual sampled is fully determined by the relative abundances of species). In contrast, if π = 1, the sampling procedure will only sample individuals from a single species in the metacommunity. A high π value implies that an individual from a different species is unlikely to be encountered next.
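The sampling model described above can be simulated in a few lines. The following is a minimal sketch, not the authors' code: it assumes that the transition rule is equivalent to repeating the previous species with probability π and otherwise drawing a fresh label from the relative abundances, which yields probability (1 − π)p_i + π for the same species and (1 − π)p_j for any other species j. The function name `simulate_transect` and its arguments are hypothetical.

```python
import random

def simulate_transect(p, pi, m, seed=None):
    """Draw m species labels (0..S-1) from the first-order Markov sampling model.

    p  : list of relative abundances (assumed to sum to 1)
    pi : non-independence parameter in [0, 1]
    m  : number of individuals to sample
    """
    rng = random.Random(seed)
    species = list(range(len(p)))
    # First individual: drawn according to the relative abundances.
    labels = [rng.choices(species, weights=p)[0]]
    for _ in range(m - 1):
        if rng.random() < pi:
            # With probability pi, the next individual repeats the previous species.
            labels.append(labels[-1])
        else:
            # Otherwise, draw independently from the relative abundances.
            labels.append(rng.choices(species, weights=p)[0])
    return labels

# Illustrative values only (not taken from the study).
print(simulate_transect([0.5, 0.3, 0.2], pi=0.6, m=15, seed=1))
```

With π = 1 every label equals the first one, and with π = 0 the labels are independent draws, matching the two limiting cases discussed in the text.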
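Given a line transect sample of sequential species labels $Z_k$s, we define the conspecific-encounter index as the proportion of adjacent pairs of sampled individuals that belong to the same species:

$$\nu = \frac{1}{m - 1}\sum_{k=2}^{m} I(Z_k = Z_{k-1}), \tag{3}$$

where I(·) is the indicator function. A simple hypothetical example demonstrating the calculation of the new index ν for quantifying the multispecies distributional aggregation is shown in Figure 1.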
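As a concrete illustration, the worked example in the Figure 1 caption (sequence 111222333 giving 6/8 = 0.75) indicates that ν is the fraction of adjacent pairs of individuals that are conspecific. A minimal sketch under that reading follows; the function name `conspecific_encounter_index` is hypothetical.

```python
def conspecific_encounter_index(labels):
    """Fraction of adjacent pairs in the sampling sequence that share a species label."""
    m = len(labels)
    if m < 2:
        raise ValueError("at least two sampled individuals are required")
    # Count positions k where the k-th label equals the (k-1)-th label.
    same = sum(1 for prev, curr in zip(labels, labels[1:]) if prev == curr)
    return same / (m - 1)

# The two hypothetical sequences from Figure 1.
print(conspecific_encounter_index("111222333"))  # 0.75
print(conspecific_encounter_index("112132323"))  # 0.125
```

The function accepts any ordered sequence of labels (a string, or a list of species names), so field data recorded in encounter order can be passed in directly.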
Actually, the index ν is an unbiased estimate of the probability of observing any two adjacently sampled individuals that belong to the same species in the metacommunity when using data from a line transect sample of m individuals. Note that this index has an immediate relationship with parameter π as $E(\nu) = \pi + (1 - \pi)\sum_{i=1}^{S} p_i^2$, but the proposed index ν has a more immediate meaning for our research purpose than the latter. Additionally, as to the research aim in Solow (2000) of estimating the sample coverage that is crucial to species richness estimations, he suggested that ν could serve as a good surrogate of π when applied to the Markov sampling model. In contrast, an application of index ν to our topic is much more straightforward and important since it not only involves no parameter estimations (which usually makes the index estimation complicated, thus inflating variation) but also implicitly takes the parameters (π and species relative abundances, $p_i$s) of the Markov model into account.
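The relationship between the expectation of ν and π can be recovered in two short steps. The sketch below assumes that ν is the proportion of adjacent conspecific pairs (as the Figure 1 example suggests) and uses only the transition probabilities stated for the Markov model. The first line shows that the relative abundances are preserved from one sampled individual to the next, so every $Z_k$ has the same marginal distribution as $Z_1$. The second line then gives the probability that two adjacent individuals are conspecific.

```latex
\begin{align}
P(Z_k = j) &= \sum_{i=1}^{S} p_i \bigl[(1-\pi)\,p_j + \pi\,\mathbb{1}(i = j)\bigr]
            = (1-\pi)\,p_j + \pi\,p_j = p_j, \\
P(Z_k = Z_{k-1}) &= \sum_{i=1}^{S} p_i \bigl[(1-\pi)\,p_i + \pi\bigr]
            = \pi + (1-\pi)\sum_{i=1}^{S} p_i^{2}.
\end{align}
```

Because ν averages m − 1 indicators that each have this probability, its expectation equals $\pi + (1-\pi)\sum_{i=1}^{S} p_i^2$, whatever the sample size.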
For constructing an interval estimation of the conspecific-encounter index, we also propose a variance estimator of the index as follows: where the sample size m requires a constraint as m ≥ 3 and

Detailed derivation of Equation 4 is given in the Supporting Information. Using the result in Equation 4, a 95% confidence interval of the conspecific-encounter index is of the form $\nu \pm 1.96\sqrt{\widehat{\mathrm{Var}}(\nu)}$.
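The interval construction can be sketched as a small helper. This is an illustration only: the variance estimate must come from Equation 4, which is not reproduced here, so it is passed in by the caller. The function name `wald_interval` and the numbers in the usage line are hypothetical placeholders, not results from the study.

```python
import math

def wald_interval(nu, var_nu, z=1.96):
    """Return (lower, upper) for nu +/- z * sqrt(var_nu).

    var_nu is assumed to be a variance estimate supplied by the caller
    (e.g. from the paper's Equation 4); it is not computed here.
    """
    if var_nu < 0:
        raise ValueError("variance estimate must be non-negative")
    half_width = z * math.sqrt(var_nu)
    return nu - half_width, nu + half_width

# Placeholder inputs for illustration only.
lower, upper = wald_interval(0.75, 0.01)
print(round(lower, 3), round(upper, 3))  # 0.554 0.946
```

Since ν is a proportion, an interval computed this way can extend below 0 or above 1 for small samples; how to handle that is left to the analyst.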
We also evaluated the robustness of the conspecific-encounter index and the above variance estimator in cases where the first-order Markov property is violated. Details are presented in the Supporting Information.

| Numerical simulation
We utilized a Poisson cluster process to simulate spatial distributions of individuals for each of 100 hypothetical species over a 200 × 200 lattice (Chen, 2013, 2014; Plotkin et al., 2000). In brief, we created a random number of parental points for a species using a Poisson process with intensity ρ, which is assumed to be the same for each species with a value of 3. Then, each parental point produces a random number of offspring points (treated as species individuals), drawn independently from any appropriate probability model (e.g. Poisson distribution). Here, the offspring numbers were drawn from a geometric distribution with parameter 0.01. The positions of offspring points, denoted by (x, y)s, relative to their parental point were randomly generated by a bivariate normal distribution with the probability density function

$$f(x, y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{(x - x_p)^2 + (y - y_p)^2}{2\sigma^2}\right),$$

where $(x_p, y_p)$ is the mean vector of the normal distribution, and it represents the location of the parental point that was randomly distributed over the study grid cells. Parameter $\sigma^2$ is determined by the variance-covariance matrix of the distribution and controls the dispersion of the data points. In our study, to generate a species' distribution with a more aggregate (or a less aggregate) pattern, we set $\sigma^2$ to a fixed value of 50 (or 500). As a simple demonstration of the simulation, distributional patterns of two hypothetical species in highly aggregate or less aggregate communities are visually presented and compared in Figure 2. This process may be more formally named as the Thomas cluster process (Baddeley & Turner, 2005; Thomas, 1949) with slight modifications (i.e. the density of offspring of each cluster follows a geometric distribution) or the Neyman-Scott cluster process (Baddeley, Rubak, & Turner, 2015).
FIGURE 1 Hypothetical demonstration of how the multispecies distributional aggregation pattern at the community level influences the ν value. For simplicity, the line transect is shown as a shaded band, and the distributional pattern of each of three hypothetical species (circles: species 1; diamonds: species 2; squares: species 3) is aggregate. The red arrow indicates the sampling order. Individuals of different species do not overlap and tend to avoid each other in (a), while they overlap to some extent in (b). Values of ν presented in the upper right corner were calculated from hypothetical data and used for demonstration. In (a), the ordered biodiversity data collected from the line transect were 111222333, the sample size was 9, and the ν value was 6/8 = 0.75. In contrast, in (b), the sequentially sampled individuals' labels were 112132323, and the ν value therefore became 1/8 = 0.125.
In addition to the Thomas cluster process used here, to test the general relationship between diverse spatial distribution patterns of species and the proposed conspecific-encounter index, we also simulated four alternative clustering or regular spatial point models using the R package spatstat (Baddeley & Turner, 2005). Detailed introductions to these models are presented in the Supporting Information.
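The modified Thomas cluster simulation described in this section can be sketched with the Python standard library. This is an illustrative approximation rather than the authors' implementation, and it rests on several assumptions not stated in the text: ρ = 3 is read as the mean number of parental points per species over the whole study area, parents are placed uniformly at random, the geometric distribution counts trials up to the first success (support 1, 2, ...), coordinates are continuous rather than snapped to lattice cells, and offspring falling outside the 200 × 200 area are discarded. All function names are hypothetical.

```python
import math
import random

def poisson(rng, lam):
    """Poisson draw by Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def geometric(rng, q):
    """Geometric draw on {1, 2, ...} by inverse transform (0 < q < 1)."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return max(1, math.ceil(math.log(u) / math.log(1.0 - q)))

def simulate_species(rng, size=200.0, rho=3.0, q=0.01, sigma2=50.0):
    """Return a list of (x, y) individuals for one species."""
    sd = math.sqrt(sigma2)
    points = []
    for _ in range(poisson(rng, rho)):          # number of parental points
        xp, yp = rng.uniform(0, size), rng.uniform(0, size)
        for _ in range(geometric(rng, q)):      # offspring per parent
            # Isotropic bivariate normal displacement around the parent.
            x, y = rng.gauss(xp, sd), rng.gauss(yp, sd)
            if 0 <= x < size and 0 <= y < size:  # assumption: drop points outside the area
                points.append((x, y))
    return points

rng = random.Random(42)
# More aggregate community: sigma2 = 50; use sigma2 = 500 for the less aggregate one.
community = {s: simulate_species(rng, sigma2=50.0) for s in range(100)}
print(sum(len(pts) for pts in community.values()), "individuals simulated")
```

A species can receive zero parental points under these assumptions and would then be absent from the simulated community; whether the original simulation allowed this is not stated in the text.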

| Empirical tests
We utilized mapping data from a Barro Colorado Island (BCI) forest plot in Panama to illustrate the proposed conspecific-encounter index. The BCI forest plot has an area of 50 ha (1,000 m × 500 m) and was initially established by Stephen Hubbell and Robin Foster in 1980 (Condit, 1998; Condit, Hubbell, & Foster, 1996; Condit et al., 2002, 2012; Hubbell et al., 1999). In the present study, we only used 2005 census data. All trees with a diameter at breast height of ≥10 mm were used in this study.
We used line transects to conduct subsequent sampling of individuals in the BCI forest plot and the two simulated hypothetical communities for the following analyses. In detail, a line transect with a fixed width (e.g. 1 m here) is randomly placed in the community

| Field application
To demonstrate the field application of the proposed index ν, Details about the simulation procedure, results and discussion were presented in the Supporting Information. What we can confirm here was that the combination of multiple line transects had similar estimation of the index with respect to single line transects ( Figures S4 and S5 in the Supporting Information), and accordingly, the validity of using combined multiple line transect data in the analyses was supported.

| RESULTS
In the numerical tests, for the more aggregate hypothetical community in which distributions of species are highly aggregated and tend to present clustering patterns (σ² = 50), the conspecific-encounter index, ν, was very high as expected (Figure 2a; Table 1). In contrast, for the less aggregated community in which species' distributions are widespread (σ² = 500; Figure 2b), the conspecific-encounter index, ν, had very low values (less than and close to 0.05; Table 1). The distribution of ν values over the replicates further confirmed these results, regardless of how the sample size varied (Figure 4). Furthermore, the mean and median of the fitted values were coincident and did not change when the sample size increased (Figure 4). The cases for alternative clustering processes (Matern and Cauchy cluster processes) presented similar results (Figure S1): the conspecific-encounter index ν will decrease when increasing the scale parameter (increasing clustering degree). The regular processes (Strauss and Strauss hard processes) will have a reverse relationship (Figure S2). That is, increasing the inhibition strength (the resultant distribution of species will become more regular) will result in low conspecific-encounter index ν (Figure S2).
For the empirical test using the BCI dataset, the value of the conspecific-encounter index ν was around 0.1 (Table 1). The distribution of ν values over the replicates further confirmed this result, regardless of how the sample size varied (Figure 4). Also, like the results for the hypothetical communities above, the mean and median of the fitted values were coincident and did not change when the sample size increased (Figure 4).
FIGURE 3 Random placement of a line transect (grey-shaded band) in a target area in which the community contains three hypothetical species (denoted by circles, squares and diamonds)
The observed number of species for the less aggregated community increased much more quickly with the sample size than that for the more aggregated one (Table 1). Although more species are expected to be observed as the sample size increases, the rate of increase can vary with the aggregation pattern of individuals, even when the compared communities have the same species richness.
For the field application, it was observed that, in three national parks of central-south Vietnam and the Minya Konka of southwest China, the conspecific-encounter index ν calculated from the surveyed amphibian data using the sequential line transect sampling was consistently high, being larger than or equal to 0.5 in most cases (

| DISCUSSION
Ecologists commonly assume that ecological communities should present aggregate distributions, especially in tropical forests (Plotkin et al., 2000; Umaña et al., 2017; Zillio & He, 2010). Such a belief is based on observations of highly aggregate distributions of some specific species found within a community (Shao et al., 2018; Zillio & He, 2010), in which each species is independently studied without considering interspecific interactions. Here, using line transect sampling along with a novel statistical model, we challenge this intuition and present a counterintuitive observation: ecological communities, at least for tropical tree species in the BCI forest plot, might not be as aggregate in distribution as we expected (Table 1). In contrast, values of the conspecific-encounter index for the surveyed amphibian assemblages were consistently found to be high (Table 2). For the BCI forest plot, the distributions of different tree species tend to intersperse with each other. Such an observation on the BCI tree community has been confirmed previously using a negative multinomial model (Chen et al., 2018). Why is this community-level evaluation of distributional aggregation in tree assemblages of the BCI plot so surprising and contradictory to our traditional understanding? The reasons can be multifaceted.
A key reason is that if some species present wide-ranging distributions, other competing species may tend to be aggregately distributed so as to reduce interspecific interactions and avoid coexisting with other species. The consequence of biotic interactions is that the distributional pattern of each species differs within an ecological community: some species have very aggregate distributions, while others may have random or even regular distributions. As a result, the community-level joint distributional pattern of all species will very likely be random.
Another reason may be that the relatively small extent of a studied local forest plot has relatively homogeneous environmental conditions, in which environmental covariates do not change remarkably (even though the responses of each specific species to the homogeneous environmental conditions may vary, that is, some species grow quickly and tend to actively disperse across the studied area, while other species may grow slowly with a fairly low dispersal ability). The joint response of all species distributed in a relatively homogeneous landscape, therefore, will present a somewhat random distributional pattern. As a result, the value of the conspecific-encounter index, ν, becomes very low. That is, it is very easy to encounter individuals of different species when travelling across the ecological community, implying that the distributions of different species within the community are interlaced over space.
The above discussion is restricted to plant communities. However, for animal communities, at least for the amphibian assemblages surveyed in the field, individuals of the same species tend to group in space (Table 2). This observation implies that amphibian assemblages are aggregately distributed, which is understandable given that this taxonomic group has limited dispersal capacity and a few adults can produce many tadpoles in the same habitat mosaic (Fei, 1999; Fei, Ye, & Jiang, 2012; Gonçalves, Honrado, Vicente, & Civantos, 2016; Semlitsch, 2008; Smith & Green, 2005 to the interspersed distributions (or inversely, non-overlapping distributional clusters) of different species (Figure 1). That is, when the community is filled with species that tend to avoid each other by forming distinct spatial patches or clusters, the expected value of the conspecific-encounter index would be high, because it is very likely that two consecutive individuals from the same species will be observed. Therefore, the patch-like (or alternatively, non-interspersing) distributions of species in an ecological community will greatly contribute to the community-level distributional pattern (Figure 1). This suggests that the community-level distributional pattern quantifies the degree of spatial interspersion of different species. Correspondingly, community-level distributional aggregation quantifies the spatial non-overlapping clustering of species. This conclusion is not idiosyncratic in ecology. Actually, the definition of an aggregation index in landscape ecology (also called a contagion index) is also related to the patch-like distribution of different landscape classes (He, DeZonia, & Mladenoff, 2000; McGarigal & Marks, 1995; O'Neill et al., 1988): when different landscape classes have patch- or cluster-like distributions and tend to avoid each other, the aggregation index is expected to be high.
Finally and notably, the definition of community-level distributional aggregation or clustering discussed above makes sense with regard to the multiple species case. As an extreme case, if an ecological community has a single species and its distribution over the studied domain is very even or random, then computation of the conspecific-encounter index, ν, could be as high as 1.
This calculated value (implying that the distribution of the species is highly aggregate) contradicts the true situation (i.e. the actual distribution of the species is even or random as just mentioned). Therefore, we argue that the conspecific-encounter index, ν, is meaningful only under multispecies situations (i.e. species number ≥2). In contrast, a single-species distribution over a target area represents a single distributional cluster (regardless of whether its distribution is densely aggregate, even or random), with no evidence of distributional interspersion. Given that the assessment of interspersed distribution requires the number of compared species to be at least two so as to have a contrast over different species, the conspecific-encounter index ν is most applicable to measure multispecies distributional aggregation patterns at the community level.
The above discussion also implies that the interspersion-minimizing or patch-like distribution of species within an ecological community is an indicator that intraspecific biotic interactions are prevailing over interspecific biotic interactions. To this end, the conspecific-encounter index, ν, can also reflect the relative impor-
In conclusion, the Markov model for fitting sequential sampling of individuals along line transects used in the present study establishes an elegant link between community ecology and landscape ecology. By using the conspecific-encounter index to quantify correlated distributions of neighbouring individuals of species, it was found that community-level distributional patterns actually reflect interspersed or cluster-like distributions of different species (Figure 1).

ACKNOWLEDGEMENTS
The authors thank the editor Dr. Andres Baselga and two anonymous reviewers for their constructive comments that greatly helped improve the manuscript; and the Center for Tropical Forest Science (CTFS) for generously providing the BCI plot data. This re- Tao Nguyen Thien for the support, and Anh Luong Mai and Yen Nguyen Thi for the assistance in the fieldwork. The authors declare that there is no conflict of interest.

AUTHORS' CONTRIBUTIONS
Y.C., T.-J.S., R.C. and S.P.H. conceived the initial ideas, contributed the research data and tools; Y.C. wrote a preliminary draft and conducted numerical simulations; T.-J.S. derived the model and analysed the data; Y.C. and J.P. designed the field survey protocol and examined the field data; H.V.C. and S.C. conducted the field sampling and handled the raw data. All authors contributed critically to the drafts and gave final approval for publication.

DATA ACCESSIBILITY
The dataset (BCI) we used in the paper is from the Center for Tropical Forest Science (CTFS). The terms and conditions prevent us from archiving these data in a publicly accessible repository. However, anyone can apply for using the BCI dataset of CTFS after filling out an application form which can be found here: http://ctfs.si.edu/webatlas/datasets/bci/bciformtest_download.php. The Chinese and Vietnamese amphibian datasets we used in the section "Field application" are archived in and downloadable from an online repository (https://doi.org/10.5281/zenodo.2647164).