Modelling mobile agent‐based ecosystem services using kernel‐weighted predictors

Agriculture benefits from ecosystem services provided by mobile agents, such as biological pest control by natural enemies and pollination by bees. However, methods that can generate spatially explicit predictions and maps of these ecosystem services based on empirical data are still scarce. Here we propose a generic statistical model to derive kernel functions to characterize the spatial distribution of ecosystem services provided by mobile agents. The model is similar in spirit to a generalized linear model, and uses data of landscape composition and ecosystem services assessed at target sites to estimate parameters of the kernel. The approach is tested in a simulation study and illustrated by an empirical case study on parasitism rates of the diamondback moth Plutella xylostella. The simulation study shows that the scale parameter of the exponential power kernel can be estimated with limited bias, whereas estimation of the shape parameter is difficult. For the case study the model provides biologically relevant estimates for the kernel associated with parasitism of P. xylostella. These estimates can be used to generate ecosystem service maps for existing or planned landscapes. The case study reveals that predictions can be sensitive to the parameter values for the width and shape of the kernel, and to the link function used in the statistical model. In the last two decades numerous empirical studies assessed ecosystem services at target sites and related these to the surrounding landscape. Our method can take advantage of these data by estimating underlying kernels that can be used to map the spatial distribution of ecosystem services. However, empirical data that can discriminate between alternative kernel shapes remain critical.


| INTRODUCTION
Ecosystems provide a range of goods and services that are essential for human well-being (Daily, 1997). Some ecosystem services, such as carbon sequestration and soil conservation, are closely associated with habitats and show little spatial dynamics or interlinkages at the landscape level. In contrast, ecosystem services that are based on mobile organisms, such as pollinators, natural enemies of pests and seed dispersers, are highly dynamic in space and time, and the strength of the ecosystem service depends on factors operating at the landscape level (Bianchi, Booij, & Tscharntke, 2006;Kremen et al., 2007).
With growing concerns about the deterioration of ecosystem services (Millennium Ecosystem Assessment, 2005), there is a need for methods that can quantify and map ecosystem services at relevant spatial scales to support land-use management and planning for ecosystem service conservation.
In the last two decades, many studies have assessed the intensity of mobile agent-based ecosystem services, typically by placing sentinels in a range of landscapes, and regressing the measured ecosystem service intensity against landscape variables observed in concentric rings or circles of varying width around the points of measurement (Thies & Tscharntke, 1999). This regression approach has been widely used to identify source habitats for service-providing organisms, and to assess the spatial extent of the associated agent-based ecosystem services (e.g. Bianchi, Goedhart, & Baveco, 2008;Haenke, Scheid, Schaefer, Tscharntke, & Thies, 2009;Holzschuh, Dormann, Tscharntke, & Steffan-Dewenter, 2013;Schmidt & Tscharntke, 2005;Thies, Steffan-Dewenter, & Tscharntke, 2003). We will refer to this method as spatial indexed regression (SIR).
However, SIR has important shortcomings (Miguet, Fahrig, & Lavigne, 2017). First, the rings are discrete but the agent movement process is continuous, which represents a fundamental conceptual mismatch. Second, the chosen radii of concentric rings are arbitrary and small perturbations of the radii can affect significance of regression coefficients and therefore the conclusion of a study. Third, the method is inevitably very parameter rich: for every habitat variable of interest a parameter needs to be estimated for each ring or circle around the target site. Fourth, regressions for subsequent rings are assumed independent which they are not. Fifth, an analysis per ring has reduced power as compared to an overall analysis of landscape effects on ecosystem service provisioning. Moreover, as a result of limitations 1, 2 and 4, the SIR approach is not well suited for making spatial maps of services.
Here we propose a generalization of the SIR approach which overcomes these limitations while using the same empirical sentinel data.
The method assumes that service-providing mobile agents disperse from source habitats to a target location. Furthermore, it is assumed that there is a dose-response relationship between agent abundance at the target and the ecosystem service provided. The dispersal process of mobile agents is modelled using a two-dimensional distribution, or kernel for short, such as the bivariate normal distribution. Such kernels have few parameters and have been widely used to characterize spatial movement processes in ecology (Bullock et al., 2017;Clark, Silman, Kern, Macklin, & HilleRisLambers, 1999;Snäll, O'Hara, & Arjas, 2007).
The approach outlined here is based on the following premises: (1) ecosystem service providing agents are colonizing the surrounding landscape from source habitats such as forests; (2) service-providing agents are radiating out from the source habitats following a kernel which is a decreasing function of the distance between the source and the target (cf. Miguet et al., 2017); (3) the abundance of agents is assumed to be proportional to the area of source habitat; and (4) there is a continuous dose-response relationship between the abundance of agents at a target site and the ecosystem service they provide at the target. Note that premises (1), (3) and (4) are also implicitly made in the SIR approach.
It follows from (1) and (2) that the abundance of the ecosystem service providing agents at a given target site is a weighted sum of contributions from different sources in the surrounding landscape where the weights are defined by the kernel and the distance between source and target. In this way the spatial structure of service-providing agents is captured into a single predictor, i.e. the kernel-weighted sum of contributions. In most cases the abundance of mobile agents at sources remains unknown, whereas the land use is known, and therefore premise (3) is required. In the sequel, the weighted sum of contributions will either be called abundance of agents or, whenever more appropriate, amount of source habitat. Conceptually, there is a proportionality factor between this distance-weighted amount of source habitat at the target and the abundance of the agents at the target.
This proportionality factor represents the density of agents provided per unit area of source habitat. In practice, this proportionality factor is not identifiable if the abundance of agents at the target is not measured, and in the sequel, we set it therefore to one. In contrast to the SIR approach where separate regressions are conducted for each ring or circle, premise (4) states that there is only a single dose-response regression model, which links the abundance of agents at the target site to the ecosystem service at the target site. Fitting this model to ecosystem service data yields estimates of the kernel (which describes the way agents are radiating out from a source), and the parameters of the dose-response model (which translates the abundance of agents at a target site to a measured ecosystem service). The kernel model and the dose-response model are fitted simultaneously.
The purpose of this methodology is twofold. First, to identify source habitats in the landscape that are associated with ecosystem services provided by mobile agents, and to assess the spatial scale at which these agents influence ecosystem services. This objective is congruent with the SIR methodology. Second, to map the estimated spatial distribution of the resulting ecosystem services across the landscape by projecting the kernel and dose-response model around source habitats. While the mapping procedure reflects the methods proposed by Lonsdorf et al. (2009) for pollination services and Jonsson et al. (2014) for biological control, our methodology requires a relatively low number of parameters that can directly be estimated from empirical data. The purpose of the paper was to outline the modelling methodology, analyse its performance using a simulation study, and provide a proof of concept by applying it to a case study.

| MATERIALS AND METHODS
We first introduce kernel functions for the spatial distribution of ecosystem service providing agents around patches of a single type of source habitat. We then introduce the dose-response model that links the abundance of agents at the target to the delivery of ecosystem services at the target. Then we explain how the kernel and dose-response model are simultaneously fitted to data, and finally we outline how the resulting model can be used to map ecosystem services. Two applications are provided for illustration: a simulation study that explores whether the model parameters are identifiable from simulated data, and an empirical study based on the ecosystem service of parasitization of caterpillars by parasitoid wasps.

| Abundance of ecosystem service providing agents arriving in a target site
We make the simplifying assumption that the mobile agents associated with the ecosystem service move randomly without a preference for direction and independently of the structure of the landscape. This implies that the spatial distribution of mobile agents around a point source can be described with a kernel f(r) that is only a function of the distance r to the source. The kernel f(r) thus represents in a single dimension the "proportion" of agents active at a distance r from the source (details in Appendix S1).
Mobile agents at the target site may originate from multiple source habitat patches at different distances from the target site. To obtain a mathematical expression for the total abundance of agents arriving at a target consider a target site which is placed at the origin (r = 0).
Define a(r, θ) as the abundance of the ecosystem service providing agents at a location (r, θ), i.e. at a distance r and angle θ from the target location. Note that a(r, θ) is in fact a density, i.e. the number of agents per unit area, which we conveniently refer to as abundance.
The abundance of the agents from this point source which arrives at the target site is then given by the product a(r, θ) f(r). The total abundance of agents T arriving at a target site is then the integral over all contributions from source habitats: The extra term r in the integral is due to the transformation from Cartesian coordinates to polar coordinates (Nathan, Klein, Robledo-Arnuncio, & Revilla, 2012) (Appendix S1).
In practical applications, it is usually not feasible to fully specify the agent abundance function a(r, θ). However, the area of source habitat (e.g. forest) in a series of non-overlapping rings can usually be obtained from geographical information systems and this area is used as a proxy for the number of agents provided. We approximate the total agent abundance T in Equation 1 in the following way.
Denote r 0 = 0, r 1 , r 2 ,…, r n as a set of ordered radii which define n rings around a target, and define A j as the total amount of source habitat in ring j. Further define the mean function value F j of the kernel f(r) in ring j as The total abundance of agents T at the target site (Equation 1) can then be approximated by the summation over all rings of the product of A j and F j (details in Appendix S1): Obviously, the approximation becomes worse when the rings become wider and their number decreases. Moreover, when the rings do not cover the complete range where f(·) and a(·) are both positive, the total amount T will be underestimated. Here, we elaborate the model using one type of source habitat, but extensions with multiple types of source habitats are possible.

| The exponential power kernel
The shape of the kernel f(r) is important because kernels with fat tails provide higher agent abundances at locations farther from source habitats than thin-tailed kernels. It is therefore convenient to have a family of kernels with an extra shape parameter. Such a family is given by the exponential power kernel, see, e.g. Clark et al. (1999) and Snäll et al. (2007): in which Γ(·) is the gamma function, α is a scale parameter and c is a shape parameter. This flexible kernel includes some familiar ones as special cases. For c = 2 a rotationally symmetric bivariate normal distribution with variance α 2 /2 in both directions is obtained. For c = 1 the kernel is exponentially shaped and has fatter tails than for c = 2. For c = 0.5 a square root exponential kernel is defined which has fat tails (Kot, Lewis, & van den Driessche, 1996). Online Appendix S2 provides the mean value F j in Equation 2 for the exponential power kernel. An alternative parameterization of this kernel employs, instead of α, the mean displacement μ of the associated marginal one dimensional probability density (online Appendix S2). This parameterization is used in the simulation study.
Online Appendix S3 shows, for some special cases of the exponential kernel and using artificial landscapes, that the approximation of the total T in Equation 3 is quite satisfactory when 20 or more rings are used.

| Linking the abundance of mobile agents to the ecosystem service
A dose-response model is required to describe the link between the abundance T of the providers and the measured ecosystem service at the target site. This dose-response model will usually take the form of a generalized linear regression model (McCullagh & Nelder, 1989;Zuur, Ieno, Walker, Saveliev, & Smith, 2009). Suppose the ecosystem service measurement follows some statistical probability distribution with mean λ, and there is interest in the relation between λ and the abundance T. In general the dose-response model has the form is a link function. For models of proportion (e.g. biological control effect or efficacy of pollination compared to "full pollination") a logit link would be appropriate, while a log link or identity link may apply for other services (e.g. pollinator visits, pollen load or fruit weight per plant).
For a set of ecosystem service measurements, the parameters α, c, β 0 and β 1 in Equation 5 can be estimated by means of maximum likelihood. This is an iterative fitting process because estimates for the beta's depend on the kernel parameters and vice versa. For fixed values of the kernel parameters α and c the predictor T(α, c) can be calculated for every target, and the regression parameters β 0 and β 1 can be estimated with standard software for generalized linear models. In case the values of α and/or c are unknown, which will usually be the case, a general maximization routine, or a grid search for α and c, can be used to optimize the likelihood.
Goedhart, Lof, Bianchi, Baveco, and van der Werf (2018) provides r code to fit the logistic regression model for the case study, for a fixed value of c.

| Prediction of the spatial distribution of the ecosystem service for new landscapes
The spatial distribution of the ecosystem service, more precisely the mean λ in Equation 5 Two-dimensional convolution based on the Fast Fourier Transform (FFT) of A k and f(·) offers a computationally efficient alternative (Bracewell, 2000). Goedhart et al. (2018) provides r code to map the parasitism rate for the case study, using the logistic and exponential link function.

| Testing model performance in a simulation study
The performance of the logistic regression model for parasitism rates using the exponential power kernel was evaluated by means of a simulation study in which we determined bias and precision of the estimated parameters. The simulation study employs 30 landscapes and the amount of source habitat at each landscape is defined by a Beta distribution. The exponential power kernel was used with shape parameter c = 0.5, 1 and 2, and mean displacement parameter μ = 1.
True parasitism rates ranged between 13% and 95%. Full details of the setup of the simulation study are given in Appendix S4. Bianchi et al. (2008) investigated how landscape composition affects parasitism rates of larvae of the diamond black moth, Plutella xylostella, in 22 organic Brussels sprout fields across the Netherlands. When the caterpillars are parasitized, one or more parasitoids will emerge from the pupa, and the host is killed. Hence, parasitization is an example of the ecosystem service of biological control. Observed parasitism rates ranged between 4% and 94% and were related to landscape variables measured in circles with radius 0.15, 0.5, 1 and 5 km. Univariate analysis, using a generalized linear mixed model with the binomial distribution and a logistic link, indicated that parasitism rates were positively related to area of forest, forest edge and road verges at different radii.

| Analysis of an empirical case study
The dataset is re-analysed here, assuming that the number of parasitized P. xylostella per field follows a quasi-binomial distribution with parasitism probability π, and the total number of recovered larvae per field was taken as binomial totals. The quasi-binomial distribution was used because the data are heavily overdispersed with respect to the binomial distribution, and therefore we used maximum quasi-likelihood rather than maximum likelihood (McCullagh & Nelder, 1989). In addition to the logit link, an exponential link, given by π = 1 − exp(−β 1 T(α, c)), which is also known as the one-hit model (Turner, 1975), was used and the results were compared. Note that the exponential model lacks (5) g(λ) = β 0 +β 1 ⋅ T(α, c)

| Simulation study
Simultaneous estimation of the shape parameter c and the mean displacement μ, or equivalently the scale parameter α, of the exponential power kernel was difficult (Appendix S4). This was particularly evidenced by wide confidence intervals for c. Therefore, in a second simulation study the shape parameter c was fixed. Simulated data for c = 0.5, 1 and 2 were analysed for models with fixed values c = 0.5, 1 and 2, respectively, to investigate the effect of misspecification of c on the estimates of the kernel displacement parameter μ, and the regression parameters β 0 and β 1 . When the simulated data were analysed with the true value of c, the mean estimates were close to their true values with limited bias (Appendix S4). Misspecification of c, however, gave relatively large bias for the true value c = 0.5, and small bias for the true value c = 2.

| Empirical case study
Re-analysis of the dataset of parasitism rates of larvae of P. xylostella  Table 1).
The estimates for β 0 and β 1 were relatively independent of the value of c, and so was the mean displacement parameter μ for area of forest.
However, when using area of forest edge as predictor, the estimate of μ for c = 0.5 was twice as large as for c = 2. For area of forest the estimate of the mean displacement μ was such that the kernel extended well beyond the 5 km limit. This was evidenced by cumulative proba-

| DISCUSSION
We provided a proof of concept for using sentinel data to estimate kernels that characterize the spatial extent of ecosystem services provided by mobile organisms moving and foraging in a landscape with spatially scattered source habitats. The kernel model represents the result of movement and dispersal of mobile agents from source habitat to a single target where the ecosystem service has been measured.
The kernel approach provides more informative results than the traditional method of SIR. First, the estimated kernel is continuous in space as opposed to the discrete regression coefficients per landscape sector, and selection of the most significant ring or circle is not necessary in the kernel approach since the areas in all rings are weighted by means of the kernel. Second, the kernel approach allows the evaluation of kernels with different shapes, thus not only providing information on the distance over which ecosystem services decline, but also on the shape of the decline with distance. Third, the resulting kernels can easily be projected on maps which can help in management and planning for ecosystem service conservation in existing and future landscapes.
The SIR approach has been used to make inferences on the spatial scale of ecosystem services provided by mobile agents. Landscape variables are typically assessed at scales ranging from about 100 m to 3-5 km radius (e.g. Bianchi et al., 2008;Haenke et al., 2009;Holzschuh et al., 2013;Schmidt & Tscharntke, 2005;Thies et al., 2003), with a notable exception of O'Rourke, Rienzo-Stack, and Power (2011) who considered a scale of 20 km radius. These studies highlight that it is not uncommon to find the strongest correlations at the largest spatial T A B L E 1 Mean deviance and parameter estimates for the logistic kernel model applied to the case study on parasitism rates of Plutella xylostella (Bianchi et al., 2008), for fixed values of the kernel shape parameter c = 0.5, c = 1 and c = 2. Parameters α and μ relate to the mean of the kernel and β 0 and β 1 are regression parameters. The model was fitted for two choices of predictors: area of forest and area of forest edge F I G U R E 2 Predicted spatial distribution, for the central area depicted in Figure 1, of parasitism rates of larvae of Plutella xylostella expressed as percentages for the case study, with area of forest edge as predictor, and the logistic and exponential link with fixed shape parameters c = 0.5 and c = 2 scale, suggesting that functional spatial scales may extend beyond the maximum scale in the analysis (O'Rourke et al., 2011;Schmidt, Thies, Nentwig, & Tscharntke, 2008). Re-analysis of the case study dataset of P. xylostella parasitism using the kernel approach indicated that the spatial extent of biocontrol services associated with area of forest may well exceed 5 km, while in the original analysis, using four landscape sectors up 2 km from the target, the highest significance was found for forest area at a scale of 500 m radius (Bianchi et al., 2008). This 10fold difference in spatial scale highlights the sensitivity of the analysis method used, and sets an agenda for further work on the analyses of the functional spatial scale of ecosystem service providers.
Our approach uses measurements of ecosystem services in landscape replicates, differing in the amount and spatial distribution of source habitats, to estimate the spatial distribution of mobile agentbased ecosystem services around source habitats. In the conceptual framework, an estimate of the total abundance of agents T at the target is calculated from the kernel. However, this is only an intermediate result that is not further analysed because, in landscape scale studies which focus on the quantification of ecosystem services, the abundance of agents at target sites is rarely recorded (e.g. Bianchi et al., 2008). However, when measurements of the mobile agents are available, these can be directly modelled as a function of the distance to sources. Examples of this approach are pollen grain and fungal spore dispersal (Shaw, Harwood, Wilkinson, & Elliott, 2006), pollinating bee visitation rates (Olsson, Bolin, Smith, & Lonsdorf, 2015) and abundance of wild bees (Kennedy et al., 2013).
Recently, Miguet et al. (2017) used a similar approach to the one proposed here, combining kernel modelling with generalized linear regression, to estimate the abundance of mobile agents in landscapes.
They use the nonlinear link function primarily to add additional flexibility to the kernel modelling, whereas in our approach the link function reflects a mechanistic description of the relationship between the abundance of agents and the service provided. While the conceptual underpinnings differ, the mathematical elaboration is similar. Further work is needed to compare these approaches in more depth, and to determine the best framework for the analysis of sentinel data.
Our method assumes that the kernel is spatial invariant and that intervening habitats do not interfere with the movement of the agents.
These assumption are also made in the conventional SIR approach and in the framework of Miguet et al. (2017). The assumption of spatial invariance, i.e. an identical kernel for every target location, is widely used in geostatistics and is the starting point of any spatial analysis. Local effects, such as crop type or management practices at the target site (Kennedy et al., 2013), or heterogeneity of the landscape around the target (Kennedy et al., 2013;Olsson et al., 2015) can be explored by adding these as extra covariates or factors to the regression model. Moreover additional, possibly intervening, habitats, e.g. tree rows blocking movement or attractive habitats arresting the agents (Wratten et al., 2003), can also be added to the regression model. For instance, a logistic model with two source habitats x 1 and x 2 then becomes logit(π) = β 0 + β 1 T 1 (x 1 ; α 1 , c 1 ) + β 2 T 2 (x 2 ; α 2 , c 2 ) in which every source habitat type can have its own kernel parameters α and c. This may result in negative regression coefficients representing disservices.
In our experience, however, a typical dataset does not contain enough landscape replicates (usually around 20) to estimate so many parameters. Hence, the idea of adding further complexity will require a major boost in data collection effort, beyond what is currently standard.
Moreover, adding additional habitats as explanatory variables would still not account for the interaction between habitats as a result of their spatial positioning with respect to the target. While the assumption of a spatially invariant kernel simplifies reality, the assessment of kernels currently seems the most promising way to integrate the "action at a distance" of resource habitats as pioneered in the SIR approach. Our model can be fitted in turn for single habitat types as predictors. Model selection using AIC or BIC can then be used to select the habitat type with the greatest explanatory power (Bolker, 2008).
The methodology presented here has similarities to regression with a functional predictor (Ramsay & Silverman, 2005) which, in our case, would amount to using ∑ j δ j A j as a predictor instead of Equation 3. In functional regression the δ j parameters are estimated using a penalty on the roughness of the δ's resulting in a series of smooth estimates.
Since we are assuming that mobile agents are radiating out of sources, giving less agents further away, it is more elegant to replace the δ j 's by the kernel based decreasing values F j of Equation 2, which are based on the mechanistic dispersal kernel, and to estimate the few parameters of the smooth kernel instead. Heaton and Gelfand (2011) used kernel-averaged predictors, assuming a spatial process for both the predictor and the response. In our case a spatial process for the predictor is not necessary because the amount of source habitat in every ring is known. Moreover, a spatial process for the response is not relevant for our data because the parasitism rate was observed at locations which were far apart, as is typically the case in the SIR approach.
Application of the kernel method resulted in some practical guidelines. First, the precise radii of the landscape sector rings are not critical, although the approximation in Equation 3 becomes worse when the number of rings decreases, and care needs to be taken that the largest ring exceeds the spatial scale of the ecosystem service of interest. The use of twenty rings or more with sufficient radii near the target site seemed to suffice. Rather than using rings, the model can also be fitted using a regular square grid thereby replacing Equation 3 by Equation 6. Second, the simultaneous fitting of the kernel parameters α and c was cumbersome in both the simulation (30 target sites) and the case study (22 target sites). The simulation with truly binomially distributed response showed that confidence intervals for the shape parameter c were very wide, and this will worsen when there is overdispersion relative to the binomial distribution. Fitting exponential power kernels with shape parameter c = 0.5, 1 and 2 to mark-recapture data of the parasitoid Diadegma semiclausum also revealed that models with different c values received similar support from the data (Bianchi, Schellhorn, & van der Werf, 2009), confirming that c is hard to estimate. Therefore, it will often be necessary to choose a fixed value for c in advance. Clark et al. (1999) also assumed a fixed value for c and then estimated α. For our purpose, the exponential power kernels with c = 0.5, 1 and 2 provided enough variation in shape. A list of alternative kernels is provided by Bullock et al. (2017) who found that the exponential power kernel was one of two best performing kernels when fitted to plant dispersal data. Third, in this paper we used the example of parasitism rates with a binomial or quasibinomial distribution. However, our method is flexible, and any linear or generalized linear model can be used in combination with kernel-averaged landscape variables. Fourth, the kernel approach allows addition of extra predictors and/or a second kernel-weighted habitat predictor. The latter was tested for the case study by fitting a logistic regression model with the two habitats area of forest and area of forest edge, again for fixed values of c = 0.5, 1 and 2.
However, the mean deviance of this double kernel model was only slightly smaller than those for the single kernel models, indicating no relevant improvement of fit for the case study.
The ecosystem service kernels can be projected on any map that contains data on source habitat. On the one hand this is convenient as the empirically derived kernels can be employed at areas outside the original study area, assisting in management and planning of areas of interest where no data are available. On the other hand, this also holds the risk that kernels are applied to regions for which they are not representative. For instance, key ecosystem service providing agents may show different responses in regions that differ in climate, management or vegetation. The comparison of ecosystem service kernels derived across regions could provide useful information on the variation in the spatial extent of ecosystem services and highlight to what extent kernels are region-specific. As a general rule, the use of kernels in the same ecophysical region as where the experimental data originate seems reasonable.

ACKNOWLEDGEMENTS
Work reported in this paper was financially supported by the Dutch

DATA ACCESSIBILITY
Data and r code for the analysis of the empirical case study on parasitism rates of larvae of the diamond black moth are available from the Dryad Digital Repository https://doi.org/10.5061/dryad.3v590 (Goedhart et al., 2018).