assignR: An R package for isotope‐based geographic assignment
The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13426
Abstract
- Methods for inferring geographic origin from the stable isotope composition of animal tissues are widely used in movement ecology, but few computational tools and standards for data interpretation are available.
- We introduce the assignR R package, which provides a structured, flexible toolkit for isotope‐based migration data analysis and interpretation using a widely adopted semi‐parametric Bayesian inversion method.
- assignR bundles data resources and functions that support data interpretation, hypothesis‐testing and quality assessment, allowing end‐to‐end data analysis with only a few lines of code. Tools for post hoc analysis offer robust, standardized methods for aggregating information from multiple individuals, assignment of individuals to a sub‐region of the study area and comparison of potential regions of origin using odds ratios. Assessment tools quantify the quality and power of the isotopic assignments and can be used to test prototype study designs.
- The assignR package should increase the accessibility of isotopic geolocation methods. assignR supports flexible data sources and analysis decisions, making it suitable for a wide range of applications, but also promotes standardization that will help foster increased consistency and comparability among studies and a more holistic understanding of animal migration. Lastly, assignR can help make isotope‐based geolocation research more efficient by helping researchers plan projects to be optimally aligned with their research questions.
1 INTRODUCTION
Environmental isotope ratios (e.g. ²H/¹H, ¹³C/¹²C and ⁸⁷Sr/⁸⁶Sr) vary systematically across space due to natural isotope‐discriminating processes (West, Bowen, Dawson, & Tu, 2010). Because animals assimilate these isotopes from their environment, often with systematic offsets, the isotopic composition of animal body tissues can be used to retrospectively infer the geographic origin or movement history of individuals (Chamberlain et al., 1997; Hobson & Wassenaar, 1997; Wassenaar & Hobson, 1998). For example, hydrogen isotope ratios (reported as δ²H = R_sample/R_standard − 1, where R = [²H]/[¹H] and the standard is Vienna Standard Mean Ocean Water) vary along continental‐scale gradients due to isotope effects in the water cycle. Animal tissue δ²H values are compared with spatial models representing the distribution of isotopes in the environment, termed isoscapes (Bowen, 2010), to constrain the geographic region within which the tissue was grown. Such isotopic geolocation applications have proven useful in fields ranging from migration biology to anthropology and forensic science (Hobson & Wassenaar, 2018).
Given the proliferation of isotopic geolocation applications, there is a need to improve access to these methods, promote best practices and facilitate comparison among studies. The recently introduced R package IsoriX (Courtiol et al., 2019) offers a set of relevant software tools focused largely on mixed‐model approaches to isoscape generation and analysis. However, the most critical challenges for many users in the ecology and wildlife fields remain unaddressed, including streamlined access to public datasets, accessible implementations of widely adopted methods, and tools that facilitate interpretation of results (i.e. hypothesis testing, inference and assessment of uncertainty). Here we introduce the assignR R package, which includes data and software addressing each of these challenges.
2 THE R PACKAGE assignR
The assignR package is available on CRAN or at https://github.com/SPATIAL‐Lab/assignR. Vignettes published at both locations provide examples. The package consists of four components: data resources, tools for producing probability‐of‐origin maps, tools for interpretation of those maps and quality assessment tools. In a typical application, users can either import their own models (isoscapes) and data (isotope values for tissues from animals of known geographic origin) or extract data from the assignR package; use these to develop probability maps for one or more samples of unknown origin; assign samples to part of the study area or test hypotheses about sample origins; and/or use the quality tools to assess and optimize study elements (Figure 1).

2.1 Data resources
Three types of data products are bundled with assignR. The first is environmental isoscapes: statistically modelled H and O isotope values for amount‐weighted, growing‐season precipitation at 5 arc‐minute resolution, together with the 1σ uncertainty associated with these values (d2h_world and d18o_world).
The second is a spatial points data frame containing H and O isotope values of known origin bird feather, human hair and butterfly wing samples (knownOrig) compiled from the published literature (a list of data sources is maintained in the manual page). Metadata include the tissue growth location, taxonomy, citation and optional age class and quality codes. The function subOrigData extracts subsets from knownOrig and returns a spatial points object for use in other assignR functions. The third data type is spatial objects used in package examples, currently a continental boundary map for North America (naMap).
Most assignR analyses will start with the first two types of information—environmental isoscapes to represent the spatial pattern of variation and known origin tissue data to calibrate the relationship between environmental and tissue values. The knownOrig data are provided to support exploratory work and allow users to conduct analyses where collecting their own reference data is not possible. Users can substitute their own, application‐specific data (e.g. an externally created isoscape, known origin dataset or species distribution map) for any of the above data products (Figure 1).
2.2 Probabilistic analysis
(1)
is their covariance, estimated by randomly drawing values from the environmental isoscape distribution at each known origin sample location and iteratively fitting the rescaling function (100×), then calculating the covariance of the simulated isoscape and rescaling model residuals.
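The covariance estimate described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the calRaster implementation: the function name isoscape_residual_covariance is hypothetical, the rescaling function is assumed to be an ordinary least‐squares straight line, and "simulated isoscape residuals" is read as the deviation of each simulated isoscape draw from the isoscape mean, which is only one possible reading.

```python
import numpy as np


def isoscape_residual_covariance(iso_mean, iso_sd, tissue, n_iter=100, seed=None):
    """Hypothetical helper: Monte Carlo covariance between simulated isoscape
    deviations and rescaling-model residuals at known-origin sites.

    Assumptions (not taken from assignR): a straight-line OLS rescaling
    function, and "simulated isoscape residual" read as
    (simulated isoscape value - isoscape mean).
    """
    rng = np.random.default_rng(seed)
    iso_mean = np.asarray(iso_mean, dtype=float)
    iso_sd = np.asarray(iso_sd, dtype=float)
    tissue = np.asarray(tissue, dtype=float)
    covs = np.empty(n_iter)
    for k in range(n_iter):
        # One random draw from the isoscape distribution at each sample location
        iso_sim = rng.normal(iso_mean, iso_sd)
        # Refit the (assumed linear) rescaling function tissue = a + b * isoscape
        b, a = np.polyfit(iso_sim, tissue, 1)
        resid = tissue - (a + b * iso_sim)
        # OLS residuals are orthogonal to iso_sim itself, so use its deviation
        iso_dev = iso_sim - iso_mean
        covs[k] = np.cov(iso_dev, resid)[0, 1]
    # Average the per-iteration covariance estimates
    return covs.mean()
```

Because least‐squares residuals are uncorrelated with the regressor by construction, the sketch takes the covariance against the isoscape deviations rather than the raw simulated values; how the estimate enters the tissue isoscape uncertainty (Equation 1) is not reproduced here.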
(2)
(3)
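The roles of the prior, conditional and marginal probabilities in the Bayesian inversion (Equation 2) can be illustrated with a short Python sketch of a per‐cell posterior for one sample. This is not the pdRaster code: the function name posterior_of_origin is hypothetical, the conditional probability is assumed to be normal with the tissue isoscape mean and 1σ at each cell, and the prior defaults to uniform.

```python
import numpy as np


def posterior_of_origin(sample_value, tissue_mean, tissue_sd, prior=None):
    """Hypothetical sketch: posterior probability of origin over grid cells.

    tissue_mean, tissue_sd: tissue isoscape mean and 1-sigma per cell
    prior: optional prior probability per cell (uniform if None)
    """
    mu = np.asarray(tissue_mean, dtype=float)
    sd = np.asarray(tissue_sd, dtype=float)
    if prior is None:
        prior = np.ones_like(mu)
    # Conditional probability of the observed value in each cell
    # (normal likelihood is an assumption of this sketch)
    like = np.exp(-0.5 * ((sample_value - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    unnorm = like * np.asarray(prior, dtype=float)
    # Dividing by the marginal probability (sum over the domain) normalizes to 1
    return unnorm / unnorm.sum()
```

Setting a cell's prior to zero, for example outside a species range map, removes it from the posterior entirely.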
2.3 Assignment and interpretation
(4)
The jointP and unionP functions aggregate results from multiple samples and output a single posterior probability grid. jointP calculates the probability that all samples within the pdRaster object originated from cell j, whereas unionP returns the probability that any sample came from j. The former is the product of the individual sample probabilities and the latter is their sum; in each case the result is rescaled after aggregation so that it sums to unity across the study area.
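A minimal Python sketch of the two aggregation rules as described in the paragraph above, operating on an array of per‐sample posteriors with shape (samples, cells). The names joint_p and union_p are hypothetical, and the union rule follows the textual description (sum, then rescale) rather than any particular package code.

```python
import numpy as np


def joint_p(posteriors):
    """Hypothetical: probability that all samples originated in each cell."""
    # Product across samples (axis 0), then rescale to sum to 1
    p = np.prod(np.asarray(posteriors, dtype=float), axis=0)
    return p / p.sum()


def union_p(posteriors):
    """Hypothetical: aggregation per the text (sum across samples, then rescale)."""
    p = np.sum(np.asarray(posteriors, dtype=float), axis=0)
    return p / p.sum()
```

For two samples with posteriors [0.5, 0.3, 0.2] and [0.2, 0.6, 0.2], joint_p gives [0.3125, 0.5625, 0.125] and union_p gives [0.35, 0.45, 0.2].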
The qtlRaster function assigns samples to a geographic sub‐region of the study area based on the posterior probabilities and a user‐specified probability or area threshold. For a probability threshold, qtlRaster identifies the smallest area within which the posterior probabilities sum to the threshold value. For an area threshold, qtlRaster finds the specified fraction of the domain that contains the highest aggregate posterior probability. In neither case is the returned area constrained to be contiguous. qtlRaster can be used on objects returned from either pdRaster or jointP/unionP and returns a Boolean raster object with one layer for each input probability surface (Figure 2b).
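The two qtlRaster threshold types can be sketched in Python as follows. The function names are hypothetical; the sketch assumes equal‐area grid cells, which is only an approximation for latitude–longitude grids such as the 5 arc‐minute isoscapes, and breaks ties arbitrarily.

```python
import numpy as np


def prob_threshold_region(post, threshold):
    """Hypothetical: smallest set of cells whose probabilities sum to >= threshold."""
    post = np.asarray(post, dtype=float)
    flat = post.ravel()
    order = np.argsort(flat)[::-1]  # most to least probable; ties broken arbitrarily
    if threshold <= 0:
        n_keep = 0
    else:
        # Index of the first cumulative sum reaching the threshold, plus one
        n_keep = int(np.searchsorted(np.cumsum(flat[order]), threshold)) + 1
    mask = np.zeros(flat.size, dtype=bool)
    mask[order[:n_keep]] = True
    return mask.reshape(post.shape)


def area_threshold_region(post, fraction):
    """Hypothetical: the given fraction of cells (assumed equal-area) with highest probability."""
    post = np.asarray(post, dtype=float)
    flat = post.ravel()
    n_keep = int(round(fraction * flat.size))
    order = np.argsort(flat)[::-1]
    mask = np.zeros(flat.size, dtype=bool)
    mask[order[:n_keep]] = True
    return mask.reshape(post.shape)
```

Neither mask is constrained to be contiguous. Under the equal‐area assumption, the proportion of the study area excluded at a given probability threshold is 1 − mask.mean().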
2.4 Quality assessment
The QA function integrates with other assignR tools to evaluate the quality of assignments. The user provides an environmental isoscape and known origin dataset and can set optional parameters (e.g. number of iterations n, spatial mask). QA randomly splits the known origin data into calibration and validation subsets and runs calRaster using calibration samples and pdRaster using the validation subset. It then calls qtlRaster iteratively to obtain area‐ and probability‐thresholded assignment surfaces at threshold values from 0 to 1, repeating the procedure n times. Results are summarized by calculating (a) the posterior probability at the location of origin for each validation sample, (b) the proportion of validation sample origins contained in the assignment region for each qtlRaster threshold type and value and (c) the proportion of the study area contained within the assignment region for each qtlRaster probability threshold. It returns an object of type QA, a list including metadata and data frames containing the summary statistics described above.
A plot method is implemented for QA objects that presents four quality assessment plots. The first shows the proportion of the study area excluded from the assignment as a function of probability threshold (Figure 3a). This illustrates how evenly the posterior probabilities are distributed across the domain, with more uneven distributions implying potential for more granular, geographically specific assignments.

The second plot shows how the proportion of validation samples assigned correctly varies as a function of the qtlRaster probability threshold (Figure 3b). If the prior, conditional and marginal probabilities in Equation 2 are specified correctly, each grid cell value in the pdRaster output represents the probability that the sample originated at that location. Similarly, the sum of probabilities for any group of cells represents the probability that the sample originated within the group. Averaging across a large number of validation samples, then, the fraction of samples correctly assigned should be equal to the probability threshold value used. Deviations from this 1:1 scaling identify bias in one or more of the terms in Equation 2.
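The expected 1:1 behaviour can be checked with a small synthetic experiment in Python. In this sketch (all names hypothetical), posteriors are drawn at random and each true origin is sampled from its own posterior, so the probabilities are correct by construction. The fraction of samples whose origin falls inside the probability‐threshold region should then track the threshold, up to sampling noise and a slight overshoot because regions are built from whole cells.

```python
import numpy as np


def prob_region(post, threshold):
    """Hypothetical: smallest set of cells whose probabilities sum to >= threshold."""
    order = np.argsort(post)[::-1]
    if threshold <= 0:
        n_keep = 0
    else:
        n_keep = int(np.searchsorted(np.cumsum(post[order]), threshold)) + 1
    mask = np.zeros(post.size, dtype=bool)
    mask[order[:n_keep]] = True
    return mask


def fraction_correct(posteriors, true_cells, thresholds):
    """For each threshold, share of samples whose true origin lies in the region."""
    out = []
    for t in thresholds:
        hits = [prob_region(p, t)[c] for p, c in zip(posteriors, true_cells)]
        out.append(np.mean(hits))
    return np.array(out)


# Synthetic, well-calibrated case: origins drawn from the posteriors themselves
rng = np.random.default_rng(42)
n_samples, n_cells = 2000, 500
posteriors = rng.dirichlet(np.full(n_cells, 0.5), size=n_samples)
true_cells = np.array([rng.choice(n_cells, p=p) for p in posteriors])
thresholds = np.linspace(0.1, 0.9, 9)
observed = fraction_correct(posteriors, true_cells, thresholds)
```

Replacing true_cells with origins drawn from a different distribution than the posteriors would show up as a systematic departure of observed from thresholds, which is the kind of bias the second QA plot is designed to reveal.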
The third plot shows the proportion of validation samples assigned correctly as a function of area threshold (Figure 3c). This metric reflects both properties plotted previously, the scaling of probability with area and any bias in the estimated probabilities, and so provides an integrated assessment of the sensitivity of the analysis. A fourth plot (not shown) shows the distribution of odds ratios for the known origins of the validation samples relative to random, summarizing the weight of evidence for the true locations of origin.
3 RESULTS AND DISCUSSION
The assignR package decomposes isotope‐based geolocation into a set of discrete, structured steps. As such, it has the potential to streamline and standardize components of the workflow that are usually developed de novo for each application. For example, many studies have developed precipitation‐tissue calibration functions, but approaches vary in how precipitation isoscape and calibration function uncertainties are propagated to the tissue isoscape (e.g. Hobson, Doward, Kardynal, & McNeil, 2018; Vander Zanden et al., 2018). The calRaster function implements a standardized, verifiable approach to error propagation suitable for the most common H‐isotope geolocation workflows.
Analysis and interpretation of posterior probability surfaces also broadly lack standardization (e.g. Arizaga et al., 2016; Vander Zanden et al., 2018). To some degree this reflects the different goals and questions of specific projects, but adopting common approaches to operations such as assignment, aggregation and comparison of regions would streamline many analyses and enable clearer comparison among studies. The assignR tools are far from comprehensive, but they provide solutions for each of these operations. For example, with one line of code the assignment surface for a hypothetical loggerhead shrike sample with δ²H = −110‰ can be used to calculate an odds ratio for origin in Utah versus New Mexico (20:1, Figure 2a) or to assign the individual to the smallest area containing 90% of the total posterior probability (Figure 2b).
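An odds ratio comparing two candidate regions can be sketched in Python as below. The definition used here, the ratio of the odds p/(1 − p) of origin within each region, is an assumption for illustration; the function name and region masks are hypothetical, and the package's own odds ratio calculation may differ in detail.

```python
import numpy as np


def region_odds_ratio(post, region_a, region_b):
    """Hypothetical: odds ratio for origin in region A versus region B.

    post: posterior probability per cell (sums to 1)
    region_a, region_b: boolean masks with the same shape as post
    """
    post = np.asarray(post, dtype=float)
    # Posterior probability of origin within each region
    p_a = post[region_a].sum()
    p_b = post[region_b].sum()
    # Odds p / (1 - p) for each region (assumed definition), then their ratio
    return (p_a / (1 - p_a)) / (p_b / (1 - p_b))
```

For example, if region A holds half of the posterior probability and region B one tenth, the odds are 1 and 1/9, giving an odds ratio of 9:1.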
The QA tool is another novel resource for the isotope geolocation field. Results and metrics produced by QA should help researchers assess assignment strength during data analysis, as described above, but also support methodological comparisons. We illustrate this with 200 synthetic tissue samples generated by adding Gaussian noise (1σ = 10‰) to precipitation isoscape mean predictions from 200 random locations across North America. We ran QA twice, the first time setting precipitation isoscape uncertainty to zero for all grid cells and the second using the native d2h_world uncertainty. Because the synthetic samples were generated directly from the mean precipitation isoscape values, the first analysis correctly represents the contribution of precipitation isoscape uncertainty to the tissue isoscape (none). The second analysis adds extraneous uncertainty that, if uncorrected, would smooth the distribution of posterior probabilities. The results (Figure 3) show that assignR produces nearly identical assignments for both analyses, confirming that the correction for the covariance of isoscape and calibration function residuals effectively compensated for the inflated uncertainty introduced in the second analysis.
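The synthetic sample design described above can be sketched in Python. The isoscape grid is a hypothetical array of mean δ²H values with NaN outside the study domain, the function name is hypothetical, and drawing cells uniformly at random without replacement is an assumption about how the 200 locations were chosen.

```python
import numpy as np


def synthetic_samples(iso_mean, n=200, noise_sd=10.0, seed=None):
    """Hypothetical: draw n random grid cells and add Gaussian noise to the isoscape mean."""
    rng = np.random.default_rng(seed)
    iso_mean = np.asarray(iso_mean, dtype=float)
    # Flat indices of cells inside the domain (NaN marks cells outside it)
    valid = np.flatnonzero(~np.isnan(iso_mean))
    # Random locations, sampled without replacement (an assumption)
    cells = rng.choice(valid, size=n, replace=False)
    # Tissue values = isoscape mean at each location + N(0, noise_sd) noise
    values = iso_mean.ravel()[cells] + rng.normal(0.0, noise_sd, size=n)
    return cells, values
```

For the two QA runs, the same samples would be paired once with an all‐zero uncertainty grid (np.zeros_like(iso_mean)) and once with the native isoscape uncertainty.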
Such method comparisons may aid project design. Before starting work, an investigator could use published, known origin data to prototype their project, running simulations to identify which isotope markers are likely to best constrain geographic origin within their study system and how to optimize collection of known origin samples. Once known origin data have been collected, QA could be used to compare alternative environmental isoscapes and optimize data analysis.
4 FUTURE WORK
A ubiquitous challenge in isotope geolocation is the development of accurate and appropriate tissue isoscapes, usually dependent on high‐quality data from known origin samples. The data in knownOrig are a sample of potentially useful data, but cover only two isotope systems (²H/¹H and ¹⁸O/¹⁶O) and suffer from methodological and standardization issues (e.g. Soto, Koehler, Wassenaar, & Hobson, 2017). We are currently reconciling disparate standardization approaches and compiling data for other isotope systems to increase the utility of this compilation. A second focus is extending assignR to support analysis with multiple isotopic tracers (e.g. Van Wilgenburg & Hobson, 2010). assignR is an open source project, and the development team welcomes ideas and contributions from the community via the project's GitHub repository.
ACKNOWLEDGEMENTS
We thank those who contributed data, and the participants in our project workshops and the SPATIAL short course who helped test and improve assignR.
AUTHORS' CONTRIBUTIONS
G.J.B., H.B.V.Z. and M.B.W. conceived the project; C.M. and G.J.B. wrote the assignR code, conducted the analysis and led the writing; all authors contributed to project design and writing.
Open Research
DATA AVAILABILITY STATEMENT
Scripts and data used to conduct the analyses and prepare figures are available at https://github.com/SPATIAL‐Lab/assignR (Bowen, Ma, Vander Zanden, & Wunder, 2020) and within the assignR package.




