Volume 15, Issue 5 p. 978-993
RESEARCH ARTICLE
Open Access

Automated detection of an insect-induced keystone vegetation phenotype using airborne LiDAR

Zhengyang Wang

Corresponding Author

Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA

College of Life Sciences, Sichuan University, Chengdu, China

Insect Renaissance Group, Boston, Massachusetts, USA

Correspondence

Zhengyang Wang

Email: [email protected]

Naomi E. Pierce

Email: [email protected]

Andrew B. Davies

Email: [email protected]

Robert Huben

Insect Renaissance Group, Boston, Massachusetts, USA

Peter B. Boucher

Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA

Chase Van Amburg

Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA

Jimmy Zeng

Insect Renaissance Group, Boston, Massachusetts, USA

Nina Chung

Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA

Jocelyn Wang

Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA

Jeffrey King

Insect Renaissance Group, Boston, Massachusetts, USA

Richard J. Knecht

Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA

Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts, USA

Ivy Ng'iru

Mpala Research Centre, Nanyuki, Laikipia, Kenya

School of Biosciences, Cardiff University, Cardiff, UK

UK Centre for Ecology and Hydrology, Wallingford, UK

Augustine Baraza

Mpala Research Centre, Nanyuki, Laikipia, Kenya

Christopher C. M. Baker

Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA

Dino J. Martins

Mpala Research Centre, Nanyuki, Laikipia, Kenya

Turkana Basin Institute, Stony Brook University, Stony Brook, New York, USA

Naomi E. Pierce

Corresponding Author

Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA

Museum of Comparative Zoology, Harvard University, Cambridge, Massachusetts, USA

Andrew B. Davies

Corresponding Author

Department of Organismic and Evolutionary Biology, Harvard University, Cambridge, Massachusetts, USA

First published: 12 March 2024
Handling Editor: Paul Galpern

Abstract

  1. Ecologists, foresters and conservation practitioners need ‘biodiversity scanners’ to effectively inventory biodiversity, audit conservation progress and track changes in ecosystem function. Quantifying biological diversity using remote sensing methods remains challenging, especially for small invertebrates. However, insect aggregations can drastically alter landscapes and vegetation, and these ‘extended phenotypes’ could serve as environmental landmarks of insect presence in remotely sensed data.
  2. To test the feasibility of this approach, we studied symbiotic ants that alter the canopy shape of whistling thorn acacias (Acacia [syn. Vachellia] drepanolobium), a keystone tree species of the black cotton soils of east African savannas. We demonstrate a protocol for using light detection and ranging (LiDAR) data to collect, prepare (including a customizable tree-segmentation algorithm) and apply a convolutional neural network-based classification for the detection of ant-inhabited acacia tree phenotypic variations. Applying this protocol enabled us to effectively detect intra-specific tree phenotypic variation induced by insects.
  3. Surveying ant occupancy across 16 ha and 9680 acacia trees took 1000 work hours, whereas surveyed patterns of ant distribution were replicated by our trained classifier using only an hour-long airborne LiDAR collection time.
  4. We suggest that large-scale surveys of insect occupancy (including insect-vectored disease) can be automated through a combination of airborne LiDAR and machine learning.

1 INTRODUCTION

Ecologists, foresters and conservation practitioners need ‘biodiversity scanners’—timely, cost-effective landscape observation technology to inventory biodiversity, audit conservation progress and track changes in ecosystem function (Bush et al., 2017; Jetz et al., 2019; Ji et al., 2022). One promising candidate is the use of light detection and ranging (LiDAR) scanners to provide high-resolution scans of terrain and vegetation structure across vast, difficult-to-access landscapes (Boucher et al., 2023; Kellner et al., 2019). LiDAR scanners emit pulses of light into the survey environment and use temporal differences in received light reflection to reconstruct surrounding objects as a collection of point clouds in three-dimensional space (Gatziolis & Andersen, 2008). Such LiDAR scans generate landscape-level measurement of terrain and vegetation structure, such as tree canopy height, cover and density (Atkins et al., 2018; Terryn et al., 2023). When combined with ground data from animals (e.g. GPS tracking, camera traps, field observations), new insights on patterns of animal behaviour and distribution can be deduced (Vierling et al., 2008), including movement strategies (Evans et al., 2020), nest site selection (Davies et al., 2019) and predator–prey interactions (e.g. landscape of fear, Davies et al., 2021). From a community ecology perspective, indices of structural complexity have been shown to correlate with occupancy and species richness of bats, birds and invertebrate species (see Goetz et al., 2007 as one of the earliest examples; see review by Davies & Asner, 2014) in similar ways to how remotely sensed spectral diversity can be used as a proxy for plant species diversity (see review by Wang & Gamon, 2019). However, establishing correlations between structural or spectral diversity and biological diversity is not the same as direct species detection. Directly quantifying biological diversity using LiDAR, therefore, remains a challenging task.

Direct detection of invertebrates with remote sensing is especially difficult. Most insects are small and cannot be distinguished in even high-resolution (i.e. <10 cm) remote sensing data, for example those collected using unoccupied aerial vehicles (UAVs) or terrestrial LiDAR (but see Rhodes et al., 2022 for a review of indirect remote sensing methods for insects, and Fisher et al., 2021 or Menz et al., 2022 for direct telemetry-based methods). However, understanding the presence and ecosystem functioning of these ‘little things that run the world’ (Wilson, 1987) is an integral part of understanding landscape and agricultural health (Dangles & Casas, 2019; Losey & Vaughan, 2006). Large (social or eusocial) insect aggregations can drastically alter landscapes and vegetation (Bentz & Endreson, 2004; Propastin, 2013; Schowalter, 2017), and these ‘extended phenotypes’ (Dawkins, 1983) could serve as environmental landmarks of insect presence in LiDAR data.

Perhaps the only prominent example of using LiDAR (or other remote sensing techniques) to detect insects' extended phenotype to-date is the identification of termite mounds in African savannas (e.g. Davies et al., 2014; Hockridge et al., 2022; Levick et al., 2010), with the mound structure being consistent enough for machine learning algorithms to detect them in LiDAR-derived digital terrain models (Brodrick et al., 2019; Davies et al., 2020). Just like termites leaving their mark on the land surface, many insect species (and often insect-vectored parasites) induce characteristic phenotypic changes in their vegetative host or habitat, which could possibly be observed with LiDAR: common examples include large hives constructed by bees (Cook et al., 2018), silk webs of spiders, ants and caterpillars that span across canopies (Avilés & Tufiño, 1998; Devarajan, 2016; Fitzgerald & Willer, 1983) and Taphrina fungi-induced ‘witches' broom’ (Christita et al., 2022). Other examples are wasps inducing large galls on tree branches (e.g. oak apples, whose tannin-rich extracts are used to make ink, Stone & Schönrogge, 2003), and Myrmelachista schumanni ants using formic acid to poison all plants except their host trees, creating monoculture ‘Devil's gardens’ amid the Amazonian rainforest (Frederickson et al., 2005). Moreover, insect or insect-vectored pestilence (e.g. Dendrolimus caterpillars, fall webworms, pine wood nematodes) can completely defoliate plant canopies (Day & Leather, 1997; Ji et al., 2011; Speight & Wylie, 2001), which will likely be visible in LiDAR point clouds. If such vegetation alterations are (1) morphologically indicative of unique insect symbiont species and (2) detectable in LiDAR scans, machine learning methods could be utilized to identify them (Brodrick et al., 2019).

To test the feasibility of such an approach, we chose the symbiotic ants of the whistling thorn acacia (Vachellia [syn. Acacia] drepanolobium, hereafter acacia) as our study system. Acacia is a keystone tree species of the black cotton soils of east African savannas, forming monoculture stands that fix nitrogen and carbon and provide food for large browsers such as elephants (Loxodonta africana) and giraffes (Giraffa spp.) (Boyle et al., 2019; Palmer & Young, 2017). One such monoculture stand at Mpala Research Centre (MRC), Laikipia, Kenya, has been intensely studied for decades (Figure 1a,b). In this landscape, each acacia tree typically recruits one of four species of symbiotic ants: Crematogaster sjostedti, C. mimosae, C. nigriceps or Tetraponera penzigi. Three of the four ant species (C. mimosae, C. nigriceps and T. penzigi) are obligate mutualists, nesting only in acacias and defending their host trees against enemies such as pathogens and herbivores in return for housing and nectar (Palmer & Brody, 2007; Stanton et al., 1999); the fourth species, C. sjostedti, is free-living, typically nesting in the soil or rotten logs but regularly patrolling nearby acacias (Palmer & Brody, 2013). Each species of ant engages in territorial warfare against the three other species, with a competitive hierarchy among the four species (Palmer et al., 2000). An equilibrium is thought to be maintained because of a trade-off whereby less competitive species have adaptive advantages in dispersal, colonization, defensive behaviour and energy requirements (Palmer, 2003; Palmer et al., 2013). Since a single acacia tree lives much longer than a colony of any of the ant species, trees experience considerable turnover in their ant occupants over their lifetimes. Simulations have shown that a tree that has been occupied by all four ant species has a higher fitness than one occupied by fewer species (Palmer et al., 2010). Changes in community composition could be indicative of drought-induced vegetation die-off, fire-induced succession events or invasive species incursion (e.g. invasive Pheidole megacephala ants out-compete symbiotic ants, leading to increased herbivore and pathogen damage and loss in above-ground carbon sequestration in the region; see Milligan et al., 2021, 2022). Therefore, monitoring the ant occupancy of each tree can be used as a means of assessing ecosystem stability (Câmara et al., 2019; Houadria & Menzel, 2017; Vandermeer et al., 2008). However, censusing the ant occupancy of each of the acacia trees in MRC is time-consuming, as each tree needs to be observed in the field to determine its ant occupant.

FIGURE 1. Ant-induced canopy shape variations in acacia (Acacia drepanolobium) monoculture in Laikipia, Kenya. (a) The study plot (25 transects × 41 rows, 41 ha) on black cotton soil, above which UAV LiDAR data were collected. Blue-shaded quadrats represent ground-survey areas where the ant occupancy of each tree was recorded (16 ha, 410 quadrats). Non-shaded quadrats are areas where training data for the CNN classifier were collected. (b) Aerial view of the acacia monoculture. Each tree contains one species of ant occupant (photo credit: CCMB). (c) Acacia trees occupied by Crematogaster nigriceps (our detection target) show stumped and clustered canopies. LiDAR point clouds on the right are exemplars of training input labelled as C. nigriceps-occupied trees (photo credit: NC). (d) Acacia trees occupied by Crematogaster mimosae, the most abundant of the four ant species in the plot (photo credit: DJM). LiDAR point clouds on the right are exemplars of training input labelled as C. mimosae-occupied trees. In both (c) and (d), each training image is normalized for tree height, with points colourized to indicate proximity to the viewer.

Field observations have suggested that certain acacia canopy shapes are indicative of their associated ant species. In particular, workers of C. nigriceps actively trim their host trees to reduce canopy contact with neighbouring trees occupied by other (more aggressive) ant species, resulting in a stumped and clustered canopy architecture (Stanton et al., 1999). The trimming behaviour reduces acacia flower production to the extent of sterilizing the host trees (Young et al., 1996)—blurring the line between a mutualist and a parasite. We hypothesized that the distinctive and relatively consistent phenotypic difference between trees occupied by C. nigriceps and those occupied by the other three ant species could be captured in LiDAR point clouds (Figure 1c,d), which would allow us to algorithmically identify C. nigriceps-occupied acacias using machine learning models.

Convolutional neural networks (CNNs), a class of deep learning models for image and video machine learning, have increasingly been used to identify biophysical features of ecological interest, such as termite mounds (Davies et al., 2020), coral reefs (Arsad et al., 2023) and oil palm plantations (Li et al., 2019), from large remotely sensed datasets. Moreover, CNN-based classifiers have been used to identify different species of trees from LiDAR point clouds of forests in the Mediterranean (Allen et al., 2022), temperate Europe and the United States (Seidel et al., 2021) and subtropical Asia (Zou et al., 2017). The overall classification accuracy of these studies ranges from 80% to 96%, but they have all used segmented, pre-labelled point clouds as training data. To our knowledge, (1) there has not been a demonstrably feasible protocol covering the complete process from the collection of LiDAR data, to the labelling of training data, to the construction of a CNN classifier and (2) the effectiveness of bespoke in silico high-accuracy tree classification models has not been verified in a field setting. Furthermore, (3) while existing CNN classifiers focus on tree species delineation, an arguably more challenging application would be to classify intra-species phenotypic variation, such as within trees of A. drepanolobium occupied by different ant symbionts.

Here, we demonstrate a protocol for using LiDAR data to collect, prepare (including a customizable tree-segmentation algorithm), train and apply a CNN-based classifier for the detection of C. nigriceps-occupied, within-species acacia phenotypic variation. Crucially, to validate the ecological relevance of our methods, we surveyed 16 ha of acacia savanna for patterns of C. nigriceps occupancy and compared our ground-truth data with that generated from LiDAR-based automated detection. After describing our pipeline and evaluating its effectiveness for recapitulating ground-based ecosystem inferences, we suggest how large-scale surveys of insect occupancy (or insect-vectored disease) can be automated through a combination of airborne LiDAR and machine learning.

2 MATERIALS AND METHODS

2.1 Overview of methodology

Field work was conducted under Government of Kenya NACOSTI permits (NACOSTI/P/22/18988, NACOSTI/P/21/12633). All UAV flights were conducted with permission from the Kenya Civil Aviation Authority (KCAA/OPS/2117/4).

We studied the distribution of C. nigriceps with field surveys across 16 ha of savanna acacia monoculture (Figure 1a, areas in blue shade) as described in Section 2.2. We then examined whether the distribution patterns of C. nigriceps obtained from these ground surveys could be recapitulated by our airborne LiDAR analysis pipeline, which included LiDAR data collection and processing (Section 2.3, Figure 2a,b), tree segmentation (Sections 2.4 and 2.5, Figure 2c) and CNN classifier training (Sections 2.6 and 2.7, Figure 2d). Finally, the analyses comparing community dynamics between ground surveys and classifier predictions are described in Section 2.8.

FIGURE 2. Pipeline for automated detection of insect-induced vegetation phenotype from airborne LiDAR data. (a) A top-down view of a part of the canopy height model generated from LiDAR data. Colours indicate tree height, with yellow representing taller trees. An exemplar ‘tree patch’ of closely located trees with overlapping canopies (yellow shade) is selected for tree segmentation. (b) Point cloud of the selected tree patch. (c) After implementing the customized tree segmentation, point clouds representing each individual tree in the patch are identified and shown in distinct colours. (d) Each individual tree point cloud was input into the CNN classifier. Dark colours indicate trees predicted as occupied by Crematogaster nigriceps.

Ground-survey results (Supporting Information S1), with code and LiDAR point clouds for CNN training (Supporting Information S2), are deposited at Zenodo (Wang, 2023, https://doi.org/10.5281/zenodo.8175736). Our script for tree segmentation, with user instructions, is deposited on GitHub (Huben, 2024, https://doi.org/10.5281/zenodo.10570318).

2.2 Ground survey of ant occupancy

Our study area, the western portion of the Forest Global Earth Observatory (ForestGEO) plot at Mpala Research Centre, on the Laikipia Plateau in central Kenya (0°17′22″ N; 36°51′56″ E; mean annual temperature = 17.7°C; mean annual precipitation = 842 mm), is an A. drepanolobium savanna monoculture, located at 1800 m above sea level on black cotton soil west of an escarpment (the eastern side of the escarpment, at 1600 m, is red soil with more diverse vegetation composition). The site is divided into 25 transects and 41 rows, forming a total of 1025 quadrats, each 20 m × 20 m (Figure 1a, quadrats delineated by lines). Acacia density increases from the escarpment edge (row 41) to the western quadrats (row 1), with as many as 90 trees per quadrat on the western edge. Each acacia is typically occupied by only one of the four ant symbionts at any one time (see Section 1).

From June to July 2022, we surveyed the ant occupancy of each acacia in transects 10–17, as well as in transects 1 and 25 (a total of 410 quadrats, Figure 1a, areas in blue shade; hereafter referred to as ‘ground survey area’ or ‘survey area’). It was logistically infeasible to survey all 1025 quadrats. Our survey scheme sampled a gradient of acacia density from east to west (rows 1–41), and covered both the northernmost and southernmost transects. We disturbed each acacia by gently tapping its branches and recorded the species of the alarmed ants defending their host tree. Overlap between tree canopies prevented us from geotagging every surveyed tree (Figure S1A,B).
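The quadrat counts and areas quoted above follow directly from the plot layout. A quick arithmetic check, with variable names that are illustrative only and not taken from the study's code:

```python
# Plot layout as described in the text.
N_TRANSECTS = 25
N_ROWS = 41
QUADRAT_SIDE_M = 20

total_quadrats = N_TRANSECTS * N_ROWS
quadrat_area_ha = QUADRAT_SIDE_M ** 2 / 10_000  # 1 ha = 10,000 m^2

# Surveyed transects: 10-17 plus the two outermost transects, 1 and 25.
surveyed_transects = list(range(10, 18)) + [1, 25]
surveyed_quadrats = len(surveyed_transects) * N_ROWS

plot_area_ha = total_quadrats * quadrat_area_ha
survey_area_ha = surveyed_quadrats * quadrat_area_ha

print(total_quadrats, round(plot_area_ha, 1))
print(surveyed_quadrats, round(survey_area_ha, 1))
```

The survey area therefore covers 40% of the quadrats in the plot, which the text rounds to 16 ha.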

2.3 UAV LiDAR data collection and processing

We collected high-resolution LiDAR data over the study site using the Harvard Animal Landscape Observatory (HALO) in January 2022. HALO is equipped with a Riegl VUX-1LR LiDAR sensor (Phoenix LiDAR Systems, Austin, Texas, USA), which for this study was flown using a Freefly Alta-X rotary-wing unoccupied aerial vehicle (UAV; Freefly Systems, Woodinville, Washington, USA). Multiple flights of ~15–20 min each were performed to cover the site. The UAV was flown 50 m above-ground level at 8 m/s in a serpentine flight pattern with a line speed of 114 lines/s and an 820 kHz pulse rate. Flight trajectories were corrected during post-processing using GPS data from a nearby mobile base station. LiDAR data were then denoised, classified (following Axelsson, 2000) and aligned using the Terrasolid software suite (Terrasolid Ltd, Espoo, Finland). The average point density was ~300 points/m² after denoising. We created a digital terrain model (DTM) of the site using a triangulated model of ground points. The height above-ground was then computed for each point based on its vertical distance to the DTM, and a canopy height model (CHM) was constructed using the maximum height of each point at a 0.25 m resolution.
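The last two steps (height above ground, then a maximum-height raster at 0.25 m) can be sketched in a few lines. This is a minimal illustration rather than the Terrasolid workflow used in the study: the `ground_z` callable is a hypothetical stand-in for a DTM lookup, and the grid is held in a plain dictionary.

```python
import math

def canopy_height_model(points, ground_z, cell=0.25):
    """Rasterise (x, y, z) points into a {(col, row): max height} grid.

    ground_z(x, y) is a hypothetical stand-in for the DTM lookup.
    """
    chm = {}
    for x, y, z in points:
        height = z - ground_z(x, y)  # height above ground
        key = (math.floor(x / cell), math.floor(y / cell))
        if height > chm.get(key, float("-inf")):
            chm[key] = height  # keep the tallest return in each cell
    return chm

# Toy example: flat ground at 1800 m elevation.
pts = [(0.10, 0.10, 1802.0), (0.20, 0.05, 1803.5), (0.30, 0.10, 1801.0)]
chm = canopy_height_model(pts, ground_z=lambda x, y: 1800.0)
```

The first two toy points fall in the same 0.25 m cell, so only the taller one contributes to the CHM.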

2.4 CHM-based segmentation

To provide an initial segmentation of trees, we used the watershed transform propagation algorithm of Meyer and Beucher (1990), implemented as the SegmentCrowns function in the R package ForestTools (Plowright & Roussel, 2021), with a linear variable radius window function of 0.3 times tree height (Popescu & Wynne, 2004) and no restriction on minimum canopy height in tree location detection. The segmented crowns were then converted to polygons to delineate an individual .las file for point clouds of each tree.
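To illustrate what a variable-radius window of 0.3 times tree height means in practice, the sketch below marks a CHM cell as a treetop when no other cell within its height-dependent window is at least as tall. It covers only the treetop (marker) step, not the watershed itself, and it is not the ForestTools implementation. The minimum window of 1.5 cells (so that immediate neighbours are always checked) is an assumption made here.

```python
import math

def find_treetops(chm, cell=0.25, win_fun=lambda h: 0.3 * h,
                  min_radius_cells=1.5):
    """Return grid cells that are strictly highest within their own window.

    chm maps (col, row) -> canopy height in metres. The window radius
    grows linearly with the height of the candidate cell.
    """
    tops = []
    for (c, r), h in chm.items():
        # Window radius in cell units; the lower bound is an assumption.
        radius = max(win_fun(h) / cell, min_radius_cells)
        reach = math.floor(radius)
        is_top = True
        for dc in range(-reach, reach + 1):
            for dr in range(-reach, reach + 1):
                if (dc, dr) == (0, 0) or math.hypot(dc, dr) > radius:
                    continue  # skip the cell itself and cells outside the circle
                if chm.get((c + dc, r + dr), 0.0) >= h:
                    is_top = False
        if is_top:
            tops.append((c, r))
    return sorted(tops)

# One row of CHM cells containing a 2 m tree and a 3 m tree.
heights = [1.0, 2.0, 1.0, 0.5, 0.5, 0.5, 1.5, 3.0, 1.5]
toy_chm = {(i, 0): h for i, h in enumerate(heights)}
tops = find_treetops(toy_chm)
```

Because the window scales with height, a tall tree suppresses candidate tops over a wider area than a short one.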

2.5 Tree segmentation

The CHM-based segmentation in the previous step isolated point clouds representing acacia trees, but introduced two sources of error in delineating the boundaries of individual trees: a single tree with branching canopies could be split into multiple ‘trees’; multiple closely located trees, especially those with overlapping canopies, could be interpreted as a single ‘tree’ (Figure S2A,B). Since both error types resulted from incorrect assignment of canopy boundaries, we merged all neighbouring ‘trees’ in our CHM-based results into ‘tree patches’ (merge command in PDAL; PDAL Contributors, 2020) and devised a new algorithm for tree segmentation within each tree patch.

Instead of segmenting point clouds into trees using canopy height models (e.g. Dalponte & Coomes, 2016; Meyer & Beucher, 1990) or from a top-down clustering of point clouds (e.g. Kaartinen et al., 2012; Li et al., 2012), we devised an algorithm, conceptually similar to Burt et al. (2019), that first identifies the available tree stems. Our algorithm started with the highest point in the patch and identified a path (i.e. stem) within a 20 cm diameter of points that extended to the ground (-dr argument for DESCENDENTS_RADIUS). This descending path was extended by 30 cm during each iteration (-dh argument for DESCENDENTS_HEIGHT). If a path (i.e. a stem) was found, point clouds within 70 cm of the stem were excluded (-fsdr argument for FOUND_STEM_DISABLE_RADIUS). Once all the stems in a patch had been found, the non-stem point clouds were iteratively ‘grown back’ in a 20 cm radius, 40 cm high cylinder from points that belonged to the tree (-gr and -gh arguments for GROW_RADIUS and GROW_HEIGHT). This process was iterated until no more points belonging to the tree could be found. All other unclassified points were then assigned to a tree by proximity. Our tree-segmentation code is deposited on Github (Huben, 2024, https://doi.org/10.5281/zenodo.10570318).
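The stem-finding step can be sketched as a greedy descent from the highest point. This is a simplified illustration of the idea only, not the deposited script: the default values mirror the 20 cm and 30 cm settings described above (treated here as a horizontal radius and a vertical step), while `ground_tol`, the height below which the path counts as having reached the ground, is a hypothetical parameter introduced for this example.

```python
import math

def find_stem(points, descend_radius=0.20, descend_height=0.30,
              ground_tol=0.30):
    """Trace a path from the highest point down to the ground, or return None.

    points are (x, y, height_above_ground) tuples. At each step the path
    moves to the lowest point within `descend_radius` horizontally and at
    most `descend_height` below the current point.
    """
    current = max(points, key=lambda p: p[2])
    path = [current]
    while current[2] > ground_tol:
        candidates = [
            p for p in points
            if 0 < current[2] - p[2] <= descend_height
            and math.hypot(p[0] - current[0], p[1] - current[1]) <= descend_radius
        ]
        if not candidates:
            return None  # the path breaks before reaching the ground
        current = min(candidates, key=lambda p: p[2])
        path.append(current)
    return path

# A vertical stem sampled every 12.5 cm, and a crown fragment with no stem.
stem = [(0.0, 0.0, i * 0.125) for i in range(17)]
hanging = [(5.0, 5.0, z) for z in (2.0, 1.875, 1.75, 1.625)]
stem_path = find_stem(stem)
hanging_path = find_stem(hanging)
```

In a full segmentation, points near each found stem would then be masked before searching for the next stem, and the remaining canopy points grown back onto the stems as described above.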

2.6 Training data collection

In July 2022, we surveyed the ant occupancy of trees outside of the transects where we collected ground-survey data previously (Figure 1a, areas without blue shade), and recorded their coordinates at sub-1 m accuracy using a Trimble GeoXT Differential GPS (dGPS). In areas where trees were sparsely distributed, this level of accuracy allowed for matching geotagged tree occupancy information with the LiDAR data (Figure S1C). During the survey, we loaded CHM polygon shapefiles (see Section 2.4) onto the dGPS screen to ensure that real-time geotags landed on (or were in close proximity to) their corresponding tree (Figure S1D,E). These LiDAR point clouds, represented as polygons, were matched with occupancy information and used as labelled training data.

2.7 Ant occupancy classification

Classification of 3D point clouds is often achieved through projection-based classification of 2D images (reviewed in Goyal et al., 2021). We therefore transformed 3D point clouds of segmented trees with associated ant occupancy labels (see Sections 2.5 and 2.6) into 2D projections following a protocol similar to that employed by Goyal et al. (2021) and Allen et al. (2022). For each point cloud representation of an acacia tree, we selected three horizontal projections at 0, 60 and 120 degree angles. Each image was presented in 256 × 256 pixels that precisely cropped the highest point of the tree canopy (transforms.Resize function in PyTorch; Paszke et al., 2019). In other words, tree height was not represented in the classifier input. The pixel value of the image scaled linearly with a point's proximity to the viewer along the y-axis (Figure 1c,d; see Table 1 for other representation schemes tested). We used a 70%–15%–15% training-validation-testing split (while keeping images projected from the same point cloud in the same split). Since point clouds with labels of C. nigriceps occupancy were rare in our dataset, we augmented C. nigriceps data in our training and validation pool by generating projections at −10, 10, 50, 70, 110 and 130 degrees to account for training imbalance (Beery et al., 2018; Huang et al., 2016).
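A side-view projection of this kind can be sketched without any imaging library. The function below rotates a tree's points about the vertical axis, flattens them along the viewing (y) axis, and encodes proximity as pixel brightness. It is a hedged illustration rather than the study's rendering code: the scaling rule (fit the larger of crown width and tree height to the image, treetop on the first row) and the convention that the viewer sits on the negative-y side are assumptions made for this example.

```python
import math

def project_tree(points, angle_deg, size=256):
    """Render (x, y, z) points as a size x size greyscale side view.

    Pixel values run from 1 (farthest from the viewer) to 255 (nearest);
    0 is background. Row 0 holds the highest point of the tree.
    """
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    # Rotate about the vertical axis through the horizontal centroid.
    rotated = [
        ((x - cx) * cos_a - (y - cy) * sin_a,
         (x - cx) * sin_a + (y - cy) * cos_a,
         z)
        for x, y, z in points
    ]
    xs, ys, zs = zip(*rotated)
    x_min, y_max, z_max = min(xs), max(ys), max(zs)
    span = max(max(xs) - x_min, z_max - min(zs)) or 1.0
    depth = (y_max - min(ys)) or 1.0
    image = [[0] * size for _ in range(size)]
    for x, y, z in rotated:
        col = min(size - 1, int((x - x_min) / span * (size - 1)))
        row = min(size - 1, int((z_max - z) / span * (size - 1)))
        value = 1 + round(254 * (y_max - y) / depth)  # nearer is brighter
        image[row][col] = max(image[row][col], value)
    return image

# Toy tree: a base point and a small cross-shaped crown at 2 m.
toy_tree = [(0, 0, 0), (0, 0, 2), (1, 0, 2), (-1, 0, 2), (0, 1, 2), (0, -1, 2)]
views = {angle: project_tree(toy_tree, angle) for angle in (0, 60, 120)}
```

Augmenting the rare class then amounts to calling the same function at additional angles, such as the −10, 10, 50, 70, 110 and 130 degree views mentioned above.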

TABLE 1. Comparison of 3D point cloud representation schemes used as convolutional neural network (CNN) model inputs.
Height representation Proximity representation Rare class augmentation Overall accuracy Rare class accuracy Common class accuracy
Present study
Selected model 0.74 0.31 0.91
Colour gradient Transparency 0.8 0.4 0.9
Colour gradient Transparency Training split 0.79 0.48 0.87
Colour gradient Training split 0.82 0.42 0.91
Image at bottom Colour gradient Training split 0.77 0.15 0.9
Image at centre Colour gradient Training split 0.75 0.47 0.82
Colour gradient Training and validation split 0.82 0.67 0.87
Other studies
Allen et al. (2022) Colour gradient 0.81 0.62 0.84
Seidel et al. (2021) Training and validation split 0.86 0.63 0.94
  • Note: When representing 3D point clouds as multiple horizontal view 2D images, considerations include whether to represent object height, proximity of each point to the viewer and to augment input images for rare classes. Rare and common classes in this study refer to classification results of C. nigriceps and non-C. nigriceps-occupied trees, while they refer to the classification results of the rarest and most common tree species in other studies. Classification accuracy refers to the percentage of predictions that are correct.

We started classification training with a pretrained ResNet18 CNN network (models.resnet18() option in torchvision, Marcel & Rodriguez, 2010). The pretrained network contained 11.6 million parameters trained on 1000 categories of images from ImageNet 1K (Deng et al., 2009), with accuracy ranging from 69% to 89% per category (He et al., 2015). We reset the final layer (the last of the fully connected classification layers) of the ResNet18 network down to two output features: C. nigriceps and non-C. nigriceps (nn.Linear function in PyTorch). Within the non-C. nigriceps class, C. mimosae, C. sjostedti and T. penzigi were not further distinguished. For each image input into ResNet18, after feature extraction and classification, we computed the cross-entropy loss (negative log-likelihood of model output probabilities) as our minimization target (Good, 1952, CrossEntropyLoss function in PyTorch). The gradient (i.e. change in network weights) computed based on the loss function was back-propagated across the network (backward function in PyTorch). We implemented a stochastic gradient descent algorithm as our optimizer to fine-tune the weights of the network (Ruder, 2017, optim.SGD function in PyTorch). For hyperparameters, we decayed our learning rate (set at 0.001) every seven epochs by a factor of 0.1 (StepLR(optimizer_ft, step_size=7, gamma=0.1) in PyTorch) to help converge on a local minimum (You et al., 2019). We retained the iteration of model weights that provided the highest accuracy in the validation split. To generate a prediction, the trained network took an image (from the 15% test split) as input, and its output was decided by choosing the category in the final layer of the network (C. nigriceps or non-C. nigriceps) with the highest ‘probability’ (i.e. the element with the highest value in the first row of the output tensor; torch.max(outputs, 1) in PyTorch). Final training accuracy was reported based on the result of the 15% testing split.
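The step-decay schedule, the cross-entropy loss and the arg-max prediction rule described above can each be written out in a few lines. The sketch below uses plain Python to make the arithmetic explicit; it mirrors the behaviour of the PyTorch components named in the text but does not call them, and the class ordering (index 0 for C. nigriceps) is an assumption made for illustration.

```python
import math

def step_lr(epoch, base_lr=0.001, step_size=7, gamma=0.1):
    """Learning rate in a given epoch under a step-decay schedule."""
    return base_lr * gamma ** (epoch // step_size)

def cross_entropy(logits, target):
    """Negative log-likelihood of the target class after a softmax."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]

def predict(logits):
    """Index of the highest-scoring class (assumed: 0 = C. nigriceps)."""
    return max(range(len(logits)), key=lambda i: logits[i])

schedule = [step_lr(e) for e in (0, 6, 7, 14)]
loss_correct = cross_entropy([2.0, -1.0], target=0)
loss_wrong = cross_entropy([2.0, -1.0], target=1)
```

With these settings the learning rate stays at 0.001 for epochs 0 to 6, drops tenfold at epoch 7 and tenfold again at epoch 14, and the loss is smaller when the higher-scoring class is the true one.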

The trained classifier was then applied to predict every segmented tree point cloud that occurred in areas where we had ascertained ant occupancy during the ground surveys. We removed point clouds less than 1.2 m in maximum height (such short saplings were likewise not recorded during ground surveys) and those with fewer than 100 points. For each tree point cloud representation, we generated eight 2D projections at equidistant angles, and analysed the classification output of each projection. We designated a tree point cloud as C. nigriceps if it had no less than 50% of its 2D projections classified as such (i.e. majority ensembled voting; Dietterich, 2000).
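The filtering and voting rules can be expressed compactly. In this sketch, which is not the study's code, a point cloud is dropped if it fails either the height or the point-count criterion (one reading of the filter described above), and the eight viewing angles are assumed to be spread evenly over a full 360 degree rotation.

```python
def keep_tree(points, min_height=1.2, min_points=100):
    """Keep point clouds that are tall enough and dense enough.

    points are (x, y, height_above_ground) tuples.
    """
    max_height = max(z for _, _, z in points)
    return max_height >= min_height and len(points) >= min_points

def is_nigriceps(projection_labels, threshold=0.5):
    """Vote over per-projection labels (True = classified as C. nigriceps)."""
    votes = sum(1 for label in projection_labels if label)
    return votes / len(projection_labels) >= threshold

# Eight equidistant viewing angles (assumed to span a full rotation).
angles = [i * 360 / 8 for i in range(8)]

tie_vote = is_nigriceps([True] * 4 + [False] * 4)
minority_vote = is_nigriceps([True] * 3 + [False] * 5)
```

Note that under a "no less than 50%" rule, a four-to-four tie is resolved in favour of C. nigriceps.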

2.8 Ecological validation

To evaluate the consistency between predicted patterns of C. nigriceps occupancy (as derived in Section 2.7) and that of the ground survey (as derived in Section 2.2), we first conducted dimensionality reduction analyses of quadrat-level species composition. Specifically, we built a site × species matrix (410 quadrats with the count of C. nigriceps and non-C. nigriceps of each quadrat) for ground-survey and classifier-predicted data. We then performed a principal component analysis (PCA; princomp function in R; R Core Team, 2021) and principal coordinate analysis (PCoA; pcoa function in R package ape; Paradis & Schliep, 2018) on the Bray–Curtis dissimilarity matrices of ground-survey and classifier-predicted site × species matrices (Bray & Curtis, 1957). To investigate whether changes in quadrat-level species composition are reflected visually in such dimensionality reduction analyses, we simulated a scenario in which each Crematogaster nigriceps-occupied tree in the 200 quadrats on the western side of the plot (which is more densely vegetated) has an 80% chance of turnover to other species that do not produce distinct canopy shapes. We generated PCoA visualizations before and after the simulation using both ground-survey and classifier-predicted data.
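For readers unfamiliar with the two building blocks of this analysis, the sketch below shows the Bray–Curtis dissimilarity between two quadrats and one way to implement the 80% turnover simulation. Both functions are illustrative assumptions rather than the study's R code; each quadrat is represented as a pair (C. nigriceps count, non-C. nigriceps count), and two empty quadrats are given a dissimilarity of zero by convention.

```python
import random

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # convention chosen here for two empty quadrats
    return sum(abs(x - y) for x, y in zip(a, b)) / total

def simulate_turnover(quadrat, p_turnover=0.8, rng=random):
    """Switch each C. nigriceps tree to another species with probability p."""
    nigriceps, other = quadrat
    switched = sum(1 for _ in range(nigriceps) if rng.random() < p_turnover)
    return (nigriceps - switched, other + switched)

rng = random.Random(0)  # fixed seed so the example is reproducible
before = (10, 20)
after = simulate_turnover(before, rng=rng)
shift = bray_curtis(before, after)
```

Turnover leaves the total tree count of a quadrat unchanged, so any shift in the ordination reflects composition alone.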

Next, we tallied the total number of trees, the number of C. nigriceps-occupied trees and the proportion of trees occupied by C. nigriceps in each quadrat and calculated the correlation of each of these counts between the ground survey and the classifier-predicted data. We calculated both the Pearson correlation coefficients between the numerical count per quadrat (which does not account for the spatial positioning of the quadrats, cor.test function in R; R Core Team, 2021) and a Mantel correlation between the Bray–Curtis dissimilarity matrices among quadrats (Mantel, 1967; mantel.rtest function in R package ade4, Dray & Dufour, 2007). To account for spatial autocorrelation, we generated spatial weighting matrices (SWMs) based on the grid design of our plot (implemented using the listw.candidates function in the R package adespatial, Dray et al., 2023), and from these SWMs further selected a subset of Moran's Eigenvector Maps (MEMs, Dray et al., 2006, implemented using the listw.select function in the R package adespatial), following the optimization protocol of Bauman et al. (2018). We conducted variance partitioning to identify the proportion of marginal variance in the ground-survey distribution patterns uniquely explained by classifier-generated matrices versus MEMs (implemented using the varpart function in the R package vegan, Oksanen et al., 2022, set up as described in Peres-Neto et al., 2006). Finally, we simulated sets of null hypotheses, in which the counts of C. nigriceps-occupied trees per quadrat were randomly assigned as a proportion of total trees per quadrat (we followed the proportion of C. nigriceps recorded in the ground survey). We then calculated the correlation between the null hypothesis simulation and the ground surveys. We repeated the variance partitioning analysis using simulated distribution patterns as response variables.
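The per-quadrat comparison and the null model can be sketched as follows. This is an illustrative stand-in for the R functions cited above, not the study's code. The null model is interpreted here as labelling each tree in a quadrat as C. nigriceps independently with the plot-wide proportion from the ground survey (1044 of 9680 trees), which is an assumption about how the random assignment was carried out.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def null_counts(tree_counts, proportion, rng):
    """Draw a random C. nigriceps count for each quadrat under the null."""
    return [sum(1 for _ in range(n) if rng.random() < proportion)
            for n in tree_counts]

ground_proportion = 1044 / 9680  # plot-wide proportion from the ground survey
tree_counts = [50, 10, 0, 32]    # toy per-quadrat tree counts
simulated = null_counts(tree_counts, ground_proportion, random.Random(1))
```

Repeating the draw many times and correlating each simulated set with the ground survey gives a reference distribution against which the classifier's correlation can be judged.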

3 RESULTS

3.1 Ground survey of ant occupancy

We surveyed 410 quadrats for ant occupancy, eight of which had labelling mismatches and were dropped from subsequent analysis (Figure 3; blank squares in ‘total tree count’ represent quadrats dropped from analysis). Among a total of 9680 acacias assessed, 1044 (10.8%) were occupied by C. nigriceps (the target of our classification). There were 4847 (50.1%) trees occupied by C. mimosae, 2734 (28.2%) by C. sjostedti and 1054 (10.9%) by T. penzigi. Quadrats on the western side of the plot (lower row numbers) had higher tree densities, while there was no change in tree density along the north–south (transect) gradient (Figure 4; Figure S3A). Surveying 410 quadrats took our eight-person team approximately 1000 work hours (125 work hours per person).

FIGURE 3. Distributional correlation between ground survey and classifier predictions across 16 ha (410 quadrats, only surveyed transects shown). (a) Significant correlation between the per quadrat number of trees counted in the ground survey and the number of trees counted with the tree segmentation (Pearson coefficient = 0.86, Mantel correlation = 0.73, p < 0.001 for both). (b) Significant correlation between the per quadrat number of C. nigriceps-occupied trees recorded in the ground survey and that predicted by the CNN classifier (Pearson coefficient = 0.41, Mantel correlation = 0.31, p < 0.001 for both).
FIGURE 4. Consistency of principal coordinate analysis (PCoA) between ground survey and classifier predictions across 410 quadrats. (a) Each dot represents the ant community composition of a single quadrat, shown on the first and second principal coordinate axes. Colours indicate the total number of trees in each quadrat. (b) Same as (a), but coloured by each quadrat's row number. See Figure S3 for the same figure coloured by C. nigriceps proportion and transect numbers. See Figure S4 for similar results from principal component analysis (PCA). (c–e) Trends of tree count, C. nigriceps count and C. nigriceps proportion from the western to the eastern side of the plot (low to high row numbers) are consistent between ground survey and classifier predictions (same signs for the estimates), although the significance of the linear coefficients differs. Each dot represents a quadrat, and the red line indicates a linear regression fit.

3.2 UAV LiDAR data collection and processing

HALO flights obtained LiDAR data across 84.46 ha, covering the entire western extent of (and beyond) the ForestGEO plot. At a realized density of 276.85 pulses per m², we obtained an average of 401.30 points per m² at an average of 0.04 m spacing among points.

3.3 Tree segmentation

The CHM-based segmentation resulted in 43,709 ‘trees’ (coverage extended beyond the ForestGEO plot), of which 18,674 were within or bordering our ForestGEO quadrats and were selected for further processing (Figure 1a, delineated by white lines). We obtained 9885 ‘tree patches’ after merging neighbouring ‘trees’, 2750 of which were in the area that had been ground surveyed (Figure 1a, areas in blue shade). Our largest ‘tree patch’ contained 990,273 points spanning six quadrats (merged from 107 neighbouring ‘trees’) and yielded 258 trees after customized segmentation. On average, after tree segmentation, each patch was split into 3.09 trees (SD = 7.95). After maximum height and point density filtering, we recovered a total of 8364 trees (compared with 9680 trees in the ground survey).

3.4 Training data

Outside the survey area, we collected a total of 812 dGPS geotagged tree locations associated with their ant occupant (Figure 1a, areas without blue shade). After LiDAR data processing and tree segmentation, 674 (83%) of these points could be unequivocally associated with a point cloud representation of a tree. In total, we obtained 127 point clouds representing C. nigriceps (target species) and 547 point clouds representing non-C. nigriceps (447 C. mimosae, 82 C. sjostedti and 18 T. penzigi).

3.5 Ant occupancy classification

Each C. nigriceps-labelled point cloud generated nine horizontal projections, and each non-C. nigriceps-labelled point cloud generated three horizontal projections. After a random 70%–15%–15% training-validation-testing split, our input training split consisted of 756 images labelled C. nigriceps and 1107 images labelled non-C. nigriceps; the validation split consisted of 162 images labelled C. nigriceps and 234 images labelled non-C. nigriceps; the testing split consisted of 57 images labelled C. nigriceps and 225 images labelled non-C. nigriceps (see Supporting Information S2).

The input representation scheme with the highest overall accuracy was one similar to that described in Goyal et al. (2021) and Allen et al. (2022) (Table 1). On the 15% testing split, this model achieved 82% overall accuracy and 66.7% accuracy at detecting images of C. nigriceps trees. We applied the model to post-segmentation point clouds representing trees in the surveyed area (Section 3.3, a total of 8364 trees input as 66,912 horizontal view images): 665 (8.0%) segmented point clouds were identified as C. nigriceps. In comparison, 10.8% of a total of 9680 trees in the ground survey were recorded as C. nigriceps-occupied.
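The difference between overall accuracy and accuracy on the rare class can be made concrete with a small worked example. The prevalence figures below come directly from the text; the confusion-matrix counts do not. They are hypothetical values chosen only so that the resulting percentages round to the reported 82% and 66.7% on a test split of 57 and 225 images, and reading "accuracy at detecting" as per-class recall is likewise an assumption.

```python
def overall_accuracy(confusion):
    """Fraction of all images whose predicted label matches the true label.
    confusion[true_label][predicted_label] holds image counts."""
    correct = sum(confusion[label][label] for label in confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total

def per_class_recall(confusion, label):
    """Fraction of images of one true class that were labelled correctly."""
    row = confusion[label]
    return row[label] / sum(row.values())

# HYPOTHETICAL counts, not reported in the article: picked so that the
# row totals match the test split (57 and 225 images) and the two
# accuracies round to the reported values.
test_confusion = {
    "nigriceps": {"nigriceps": 38, "other": 19},
    "other": {"nigriceps": 32, "other": 193},
}
overall = overall_accuracy(test_confusion)
recall = per_class_recall(test_confusion, "nigriceps")

# Figures taken from the text
images_per_tree = 66912 // 8364        # horizontal views per segmented tree
predicted_prevalence = 665 / 8364      # classifier-predicted C. nigriceps share
ground_prevalence = 1044 / 9680        # ground-survey C. nigriceps share
```

Because the non-target class dominates the test split, overall accuracy is driven mostly by how well the common class is recognized, which is why the rare-class figure is reported separately.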

3.6 Ecological validation

Dimensionality reduction analyses (both PCA and PCoA) of species composition using ground-survey and classifier-predicted data showed near-identical patterns: quadrat similarity was driven by total tree count (Figure 4a) and the proportion of C. nigriceps in each quadrat (Figure S3B). Quadrats with lower row numbers on the western side of the plot were more similar to each other, whereas quadrats with higher row numbers were compositionally more distinct (Figure 4b). There was no north–south distribution pattern (Figure S3A; see Figure S4 for PCA results). In the simulated ant species turnover scenario, a decrease in plot diversity (due to the replacement of rare C. nigriceps) was reflected as a thinning of PCoA spread on the second principal coordinate axis in the dimensionality reduction analysis using ground-survey data (Figure S5A). This change was mirrored when we conducted the same simulation on classifier-generated data (Figure S5B).

The total number of trees across the 410 quadrats generated by the tree segmentation showed a significant correlation with counts obtained from the ground survey (Pearson coefficient = 0.86, Mantel correlation = 0.73, p < 0.001 for both comparisons). The number of C. nigriceps-occupied trees predicted by the classification showed a lower, but significant, correlation with counts from the ground survey (Pearson coefficient = 0.41, Mantel correlation = 0.31, p < 0.001 for both comparisons), as did the proportion of C. nigriceps trees per quadrat (Pearson coefficient = 0.34, Mantel correlation = 0.30, p < 0.001 for both comparisons) (Figure 3). In contrast, the randomly simulated datasets with C. nigriceps as 8% (the proportion of C. nigriceps predicted by the classifier) and 10% (the proportion of C. nigriceps recorded in the ground survey) of the total tree composition did not achieve a high level of correlation with ground-survey counts (Table 2). In the variance partitioning analysis, 11% of the variance in ground-survey species composition was uniquely explained by the classifier-generated predictions; 7% of the variance was uniquely explained by spatial autocorrelation; and a total of 60% of the variance was explained jointly by the classifier prediction and spatial autocorrelation (Table 2). The fraction of variance uniquely explained by the simulated distribution data was lower than that explained by the classifier, at 8%–9%.

TABLE 2. Correlation and variance partitioning of the classifier and simulation-generated predictions from the ground survey.
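| | Segmentation and classifier: Pearson | Segmentation and classifier: Mantel | Simulation, 8% C. nigriceps: Pearson | Simulation, 8% C. nigriceps: Mantel | Simulation, 10% C. nigriceps: Pearson | Simulation, 10% C. nigriceps: Mantel |
| --- | --- | --- | --- | --- | --- | --- |
| Total tree count | 0.86*** | 0.73*** | | | | |
| C. nigriceps count | 0.41*** | 0.31*** | 0.22*** | 0.12** | 0.15*** | 0.03 |
| C. nigriceps proportion | 0.34*** | 0.30*** | −0.04 | 0.01 | 0.01 | 0.04 |

| | Segmentation and classifier: Environmental | Segmentation and classifier: Spatial | Simulation, 8% C. nigriceps: Environmental | Simulation, 8% C. nigriceps: Spatial | Simulation, 10% C. nigriceps: Environmental | Simulation, 10% C. nigriceps: Spatial |
| --- | --- | --- | --- | --- | --- | --- |
| Marginal variance fraction | 0.11 | 0.07 | 0.08 | 0.07 | 0.09 | 0.08 |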
  • Note: Simulation values of 8% and 10% were derived from the percentage of total C. nigriceps proportion predicted by the classifier and the percentage of C. nigriceps recorded in the ground survey. Top panel: Pearson correlation coefficients were calculated from numerical counts per quadrat, while Mantel correlations were calculated from Bray–Curtis dissimilarity matrices among quadrats (p-values: * < 0.05; ** < 0.01; *** < 0.001). Bottom panel: proportion of variance in ground-survey distribution patterns uniquely explained by classifier-generated (or simulated) environmental matrices versus those generated by subsets of Moran's Eigenvector Maps corresponding to plot spatial designs.
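The logic behind the bottom panel can be sketched for the simplest possible case, with one "environmental" predictor (for example, classifier-predicted counts) and one spatial predictor. This is only an illustration of how unique and shared fractions are obtained from adjusted R² values: the analysis in the article used sets of Moran's Eigenvector Maps as the spatial component and the varpart function in vegan, whereas the single spatial variable, the function names and the data below are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def r2_two_predictors(y, x1, x2):
    """R-squared of y regressed on x1 and x2, from pairwise correlations."""
    ry1, ry2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (ry1 ** 2 + ry2 ** 2 - 2 * ry1 * ry2 * r12) / (1 - r12 ** 2)

def adjusted(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def partition(y, env, space):
    """Split variation in y into fractions unique to env, unique to space,
    shared between them, and residual."""
    n = len(y)
    r_env = adjusted(pearson(y, env) ** 2, n, 1)
    r_space = adjusted(pearson(y, space) ** 2, n, 1)
    r_both = adjusted(r2_two_predictors(y, env, space), n, 2)
    return {
        "unique_env": r_both - r_space,
        "unique_space": r_both - r_env,
        "shared": r_env + r_space - r_both,
        "residual": 1 - r_both,
    }

# Hypothetical quadrat-level data
ground = [12, 9, 4, 2, 7, 1, 5, 10]      # response: ground-survey counts
classifier = [10, 8, 5, 1, 6, 2, 4, 9]   # "environmental" predictor
row_number = [1, 2, 3, 4, 5, 6, 7, 8]    # stand-in spatial predictor
fractions = partition(ground, classifier, row_number)
```

The "unique" fractions correspond to the values reported in the bottom panel: the part of the ground-survey pattern that one predictor set explains after the other has been accounted for.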

4 DISCUSSION

We present a protocol for detecting insect-induced intra-specific vegetation phenotypic variation using airborne LiDAR. We tested the validity of our ecological inferences on 16 ha (410 quadrats) of savanna containing 9680 acacia trees, and showed that the distribution pattern of a relatively rare ant symbiont can be predicted by a custom-trained convolutional neural network (CNN) classifier based on tree morphology. Our claim to the effectiveness of the CNN classifier at detecting ecological dynamics rests principally not on the absolute accuracy of individual tree phenotype classification, but rather on the demonstrated high congruence of beta-diversity patterns between ground survey and classifier-generated predictions, even after accounting for spatial autocorrelation (Table 2). For practical management purposes, such congruence can be visualized as similarity between dimensionality reduction plots, and we show that under simulated environmental disturbance scenarios, visible changes in intra-specific phenotypic variation can be tracked through our automated monitoring pipeline (Figure 4; Figures S3–S5).

We suggest that similar procedures can be adapted to enable the identification of other biologically relevant ‘extended phenotypes’ using LiDAR or other remotely sensed data, increasing the efficiency of monitoring ecological dynamics. We first discuss the effectiveness of our pipeline as applied to the A. drepanolobium-ant symbiosis system (Section 4.1). We then discuss some of the limitations of our procedure and plans for future improvements (Section 4.2). Finally, we provide examples of how our protocol can be applied to other large-scale ecosystem monitoring endeavours (Section 4.3).

4.1 Evaluating a classifier to detect intra-specific phenotypic variation

Consistent with previous results (Palmer et al., 2000) at the same site, Crematogaster nigriceps occupied only 10% of all trees in our survey area. When collecting training data outside our ground-survey plot, we biased our search towards finding C. nigriceps-occupied trees, with the resulting phenotype accounting for 18% of our training data. At this proportion, C. nigriceps detection still constitutes a ‘rare class classification’ problem: without sufficient training data, accuracy for classifying rare classes is low (Beery et al., 2018; Huang et al., 2016). In the context of classifying tree species from LiDAR data, small amounts of training data usually result in the rarest tree species having the lowest classification accuracy. For example, in classifying five tree species in a Mediterranean forest (Allen et al., 2022), Pinus pinaster, the rarest species, comprised 5% of the total tree data and achieved 62% accuracy, compared with an 81% total accuracy for the dataset. Similarly, Seidel et al. (2021) classified seven species of trees from labelled LiDAR data across temperate Europe and the United States: Quercus rubra (red oak) was the rarest class, constituting 14% of the data and achieving 63% accuracy compared with an average dataset accuracy of 86%. Our 66.7% accuracy for C. nigriceps, among an overall accuracy of 82%, is therefore comparable to these previous studies (Table 1), suggesting a generalizable limitation in rare class detection accuracy. However, our study represents the first attempt to distinguish intra-specific phenotypic variation, rather than tree species, using a custom-trained classifier.

The above-mentioned model accuracies were estimated based on the 15% ‘testing split’ of all training data collected. In practical ecological and conservation applications, ecologists and managers are typically less concerned with a model's reported accuracy than its ability to recapitulate ecosystem dynamics. Ground-truthing 16 ha (410 quadrats) of C. nigriceps occupancy took us 1000 work hours, while a UAV survey took less than an hour. A model that would allow us to detect patterns of species composition change based on future UAV flights would be immensely convenient for tracking the ecosystem health of this savanna, as changes in ant occupancy are closely linked to savanna productivity, herbivore activity and carbon cycling (Milligan et al., 2021; Palmer et al., 2008, 2010).

Ant occupancy in the 16-ha (410 quadrats) ground-survey area was tallied at the quadrat level (rather than at the level of individual trees, due to limited dGPS accuracy; see Figure S1); correspondingly, we tallied our model prediction of C. nigriceps-occupied trees by quadrats. In effect, this approach treats each quadrat as an ecological community, and we therefore compared measured beta-diversity patterns (an informative index of ecological dynamics and conservation planning; see Socolar et al., 2016) between the ground surveys and those predicted by our CNN classifier, finding a high level of congruence in the total tree count, C. nigriceps spatial distribution, and C. nigriceps proportion. Our direct approaches to calculating the ‘correlation’ between ground-survey and classifier-predicted beta-diversity patterns have limitations: (1) Pearson correlation treats each quadrat as independent and does not account for spatial autocorrelation; (2) while the Mantel test is a valid method for demonstrating the degree of difference between multivariate dissimilarity matrices (in our case, Bray–Curtis dissimilarity matrices), it does not reveal correlation between geographical distributional patterns per se (see Legendre et al., 2015 for caution against the use of the Mantel test in spatial analysis). Indirectly, a variance partitioning approach with spatial MEMs as a proxy for autocorrelation (Peres-Neto et al., 2006) explains the unique contribution of the classifier-generated distribution pattern (conditioned on spatial pattern) towards the ground-surveyed distribution pattern; in this context, we interpret a high partitioned contribution as high congruence. In our study, a combination of all three approaches shows congruence between the ground survey and modelled predictions, even after accounting for spatial autocorrelation; this finding attests to the validity of our classification approach (Figure 4b; Table 2). Although the predicted proportion of C. nigriceps-occupied trees per quadrat was significantly correlated with the ground-survey results, these correlations had smaller coefficients compared with the C. nigriceps count correlations, likely because of error introduced from both the total tree count (acquired through tree segmentation) and the C. nigriceps counts (acquired through tree classification). Importantly, however, these spatial correlations could not be derived from simply randomly simulating a proportion of trees occupied by C. nigriceps (Table 2, ‘null hypothesis’). Strictly speaking, even our choices of simulating C. nigriceps at 8% or 10% occurrence are biologically informed, which might account for the high (and sometimes significant) correlation between our simulated patterns and those from the ground survey. To achieve an even higher correlation with ground-survey beta-diversity patterns than these simulations did, our classifier must have correctly identified a high proportion of the targeted phenotype.

Although we were able to replicate ecologically meaningful patterns with our model classification, we acknowledge that a map with a 0.41 correlation to ground-survey counts (e.g. Figure 3b) is not very helpful for pinpointing C. nigriceps in the field (but see our suggestions for improving model accuracy in Section 4.2). Rather than static ‘treasure maps’, our approach is best suited to detect dynamic ecological community changes along spatial or temporal axes. Using dimensionality reduction analysis, ground-surveyed communities were arrayed along the axis of total tree count and proportion of C. nigriceps per community (Figure 4a; Figure S3B). Quadrats on the western side of the plot were more similar to each other than those on the eastern side (Figure 4b, lower row numbers cluster more closely), while there were no distinguishable patterns along the north–south axis (Figure S3A, no pattern by transect number). These patterns likely reflect a soil nutritional gradient, which is present along the east–west but not the north–south axis of the plot (Childers, 2021; Palmer & Young, 2017). Soil nutrition either directly drives ant species composition and tree use (Farji-Brener & Werenkraut, 2017; Kenfack et al., 2021; Wagner & Fleur Nicklen, 2010), or affects herbivore use of the landscape, which drives the patterning of ant occupancy (Kaspari, 2020; Palmer et al., 2008). Regardless of the reasons behind this distribution pattern, it is almost identically recapitulated by our classifier-predicted datasets (Figure 4a,b). This successful recapitulation of patterns suggests that our model was sensitive enough to replicate patterns of species distribution at a community level, even though individual-level prediction accuracy left considerable room for improvement. In our simulated turnover scenario (reflecting a loss of focus species due to invasive ant encroachment; see next paragraph), the dimensionality reduction results of all quadrats change visibly; the same pattern of change was mirrored when the simulation was applied to classifier-generated data (Figure S5). These results suggest that if a different plot were surveyed or if the pattern of C. nigriceps distribution of the current plot changed (e.g. due to drought-induced vegetation die-off, fire-induced succession events or invasive species incursion), we can be confident that observable community changes would be reflected in the predictions of our LiDAR-derived classifier.

Such an application of our protocol is timely and relevant: an invasive ant species, the big-headed ant (Pheidole megacephala), has established an invasion front less than 2 km from our plot. Pheidole megacephala out-competes all four mutualist ant species and does not defend acacias from browsers, leading to increased herbivore and pathogen damage and loss in above-ground carbon sequestration (Milligan et al., 2021, 2022). Such shifting invasion fronts are difficult to detect on the ground but could be delineated by monitoring the retreat of C. nigriceps-occupied trees with future LiDAR surveys.

4.2 Limitations and future improvements

Due to logistical constraints, our collection of on-the-ground ant occupancy data (Sections 2.2 and 2.6) lagged 6 months behind UAV data collection. During this interval, it is possible (but unlikely) that ant occupancy on acacias could have changed (no data on ant turnover rate on host trees have been published, but Palmer et al., 2010 estimated that a 20-year-old acacia tree would have experienced 3–7 turnover events). Such potential turnover would decrease the correlation between the ground-surveyed C. nigriceps distribution and that predicted by the classifier (although it is also unclear how long C. nigriceps-induced structural change would persist after a turnover event).

Collecting more training data of C. nigriceps-occupied trees will likely increase the accuracy of our model (Huang et al., 2016), which would involve either geotagging more C. nigriceps-occupied trees or finding creative ways of bootstrapping inputs from graphic engines (e.g. Beery et al., 2018; Das et al., 2021). Better representation schemes for input point clouds are another option to explore: in reducing 3D point clouds to multi-aspect 2D representations, structural information is lost. In contrast, classifiers that directly input 3D point clouds can outperform image-based representation schemes (Budei et al., 2018; Lin et al., 2023). Lastly, given sufficient computational resources, hyperparameters of machine learning, such as those determining the network learning rate and its stepwise decay, could be optimized in a Bayesian framework (e.g. Frazier, 2018). While these enhancements would quantitatively improve the accuracy of our model, ant occupancy is one of many factors in predicting acacia canopy topology. Tree age, soil type, climate, herbivory and competition also influence tree shapes; it remains challenging to predict all intra-species phenotypic variation with a single variable (Auger & Shipley, 2013; Dube et al., 2014; Jiménez & Díaz-Delgado, 2015).

At our field site, C. nigriceps is one of four ant species found on A. drepanolobium. Although C. nigriceps-occupied trees exhibit the most distinguishable phenotypic changes, the other three ant species might also be identifiable with high-resolution LiDAR data. For example: (1) C. sjostedti, the most competitive of the four ant species, tends to occupy older, taller trees (Palmer & Young, 2017). While our current input representation scheme is height-independent (see Section 2.7), tree height could be used to identify C. sjostedti. (2) Ant species that aggressively trim and/or consume acacia leaves (i.e. C. nigriceps and T. penzigi) make their host trees less verdant, which would likely be detected with infrared or RGB colour sensors (Madawy et al., 2019). (3) Tetraponera penzigi is the best disperser among the four ant species (Stanton et al., 2002). Spatial variables such as distance to the nearest neighbouring tree could be useful for predicting T. penzigi occupancy. We emphasize that a nuanced appreciation of ant life history and behavioural ecology is the foundation of such data-driven ecological inferences.

4.3 Future applications

Vegetation, when interacting with insects, exhibits phenotypic plasticity (reviewed in Ananthakrishnan, 2000; Ashra & Nair, 2022). Our study demonstrates that these intra-specific phenotypic variations can be identified in ultra-high-resolution (10 cm pixels) LiDAR data and used to reliably infer the occupancy of small invertebrates. Modern-day agriculture and forestry typically consist of large-scale monoculture plantations, where monitoring insect-vectored infections is a major concern. Our approach could be extended to these settings to provide rapid assessment of agricultural and forestry health at reduced costs compared with ground-based surveys. For example, the hemlock woolly adelgid (Adelges tsugae) is a major forestry pest in the northeastern United States, defoliating the midstorey and understorey of eastern hemlock trees (Tsuga canadensis) before signs of infection occur in the forest canopy. This damage makes quantification of infection difficult from satellite images, costly from the ground, but convenient using 3D-LiDAR point clouds (Boucher et al., 2020). Following the protocol outlined in our study, a ‘woolly adelgid infestation classifier’ could be trained on topologies of infested trees and then applied to surveil pestilence using UAV LiDAR flights. More broadly, the most economically important tree crop diseases, such as wilt, rust, powdery mildew and anthracnose, exhibit consistent, identifiable phenotypic changes in canopy shape or colour in their early stages on pine, rubber, cacao and palms before defoliation. A combination of high-resolution LiDAR data, CNN training and ground-survey data collection holds promise for identifying these diseases at large scales (Abdulridha et al., 2020; Liu et al., 2021; Moriya et al., 2021; Yu et al., 2021; Zeng et al., 2022).

AUTHOR CONTRIBUTIONS

Zhengyang Wang, Christopher C. M. Baker, Dino J. Martins, Naomi E. Pierce and Andrew B. Davies conceived the ideas for this project. Zhengyang Wang, Robert Huben, Peter B. Boucher, Chase Van Amburg, Jimmy Zeng and Andrew B. Davies designed and developed the methodology. Zhengyang Wang, Peter B. Boucher, Chase Van Amburg, Nina Chung, Jocelyn Wang, Jeffrey King, Richard J. Knecht, Ivy Ng'iru, Augustine Baraza, Naomi E. Pierce and Andrew B. Davies collected the data. Zhengyang Wang, Robert Huben, Peter B. Boucher, Chase Van Amburg and Jimmy Zeng analysed the data. Zhengyang Wang led the writing of the manuscript. All authors contributed critically to the drafts and gave final approval for publication.

ACKNOWLEDGEMENTS

NEP and ABD's research groups provided valuable feedback on the manuscript. This work was supported by the William F. Milton Fund of Harvard University. CVA was supported by the Harvard Museum of Comparative Zoology Grant-In-Aid of Undergraduate Research fund. NC was supported by the Harvard College Research Program and a Harvard Center for African Studies summer travel grant. JW was supported by a Herchel Smith Undergraduate Science Research Program grant from Harvard University. NEP, DJM and RJK were supported by a grant from the Putnam Expeditionary Fund of the MCZ (to NEP). We thank Jeff Blossom from the Harvard Center for Geographic Analysis for assistance with dGPS data collection and processing, Pat Milligan for discussions on invasive ants, and Weilin Meng and Allison King for discussions on machine learning methods. We thank the Government of Kenya (NACOSTI/P/22/18988, NACOSTI/P/21/12633) and Mpala Research Centre for permission to perform the study, and the Kenyan Civil Aviation Authority for authorization to perform the UAV surveys (KCAA/OPS/2117/4). We thank the staff at Mpala Research Centre, especially Jackson Ekwam Etele, Wilson Mureyian, Steven Gitau Githuthu, Godfrey Amoni, Godfrey Gitimu, Patrick Kamukunji, Aimee Gaitho, Cosmas Nzomo and Peter Lochomin, for their assistance.

    CONFLICT OF INTEREST STATEMENT

    The authors have no conflict of interests.

    STATEMENT ON INCLUSION

    Our study brings together authors from four continents, including scientists based in Kenya, where the field work was carried out. All authors were engaged early on with the research and study design to ensure that the diverse sets of perspectives they represent were considered from the onset.

    PEER REVIEW

    The peer review history for this article is available at https://www.webofscience.com/api/gateway/wos/peer-review/10.1111/2041-210X.14298.

    DATA AVAILABILITY STATEMENT

    Ground-survey results (Supporting Information S1), with code and LiDAR point clouds for CNN training (Supporting Information S2), are deposited at Zenodo (Wang, 2023, https://doi.org/10.5281/zenodo.8175736). Our script for tree segmentation, with user instructions, is deposited on GitHub (Huben, 2024, https://doi.org/10.5281/zenodo.10570318).