Experimental evidence that novel land management interventions inspired by history enhance biodiversity

1. To address biodiversity declines within semi-natural habitats, land management must cater for diverse taxonomic groups. Integrating our understanding of the ecological requirements of priority (rare, scarce or threatened) species through ‘biodiversity auditing’, with that of the intensity and complexity of historical land use, encourages novel forms of management. Experimental confirmation is needed to establish whether this enhances biodiversity conservation relative to routine management.

4. Assemblage composition (pooling non-priority and priority species) varied between subtreatments for plants, ants, true bugs, spiders, ground, rove and other beetles; but only 1-year-old fallowed deep-cultivation increased priority richness across multiple taxa.

5. Treatments produced similar biodiversity responses across various dry grassland 'habitats' that differed in plant composition, allowing simplified management guidance.

6. Synthesis and applications. Our landscape-scale experiment confirmed the considerable biodiversity value of interventions inspired by history and informed by systematic multi-taxa analysis of ecological requirements across priority biota. Since assemblage composition varied between subtreatments, providing heterogeneity in management will support the widest suite of species. Crucially, the intended recipients responded most strongly, suggesting biodiversity audits could successfully inform interventions within other systems.

KEYWORDS
biodiversity audit, cultural landscape, dry grassland, ground disturbance, landscape-scale conservation, lowland heathland, multi-taxa, semi-natural habitat

| INTRODUCTION
Across Europe, conservation tends to focus on semi-natural habitats shaped by a long history of human management (EC, 1992; Ratcliffe, 1977), yet priority plants (Hülber et al., 2017) and invertebrates (Seibold et al., 2019) continue to disappear from such areas.
Within surviving habitat remnants, land management often mimics elements of historic (pre-industrial c. 1200-1750) practices on the assumption that this will support assemblages that persisted through human activity (Fuller et al., 2017; Wright et al., 2012), or is justified by reference to the ecology of a taxonomically biased species subset (Clark & May, 2002; Griffiths & Dos Santos, 2012).
Many interpretations of 'traditional' management are incomplete and potentially suboptimal for threatened biodiversity, but the promotion of alternative novel approaches requires supporting evidence. A new emphasis on 'rewilding' (Pettorelli et al., 2018) and a progressive shift from biodiversity conservation for the intrinsic value of species, to ecosystems and the goods and services they provide (Mace, 2014), further increase the need for approaches which can quantify and predict biodiversity responses to landscape-scale interventions. Where biodiversity is well-characterised and autecological knowledge is strong (e.g. in much of Europe), the biodiversity audit approach (a bioregional process in which biodiversity records are collated and priority species with shared autecological requirements are subsequently grouped into cross-taxa 'management guilds') provides an objective way of informing and optimising the conservation benefits of management interventions. However, while recent audits support the importance of historical management to priority biota, there is a pressing need for experimental confirmation involving multiple taxa.
Crucially, perceptions and implementation of 'traditional management' tend to be simplified and homogenised (e.g. the notion of conservation grazing, Fuller et al., 2017). Historical management was, in reality, characterised by repeated biomass removal and physical disturbance through complex multi-layered land use that often overlapped grazing with other forms of resource harvest and varied spatio-temporally within sites and across landscapes (Fuller et al., 2017; Linnell et al., 2015). Synthesising autecological knowledge through biodiversity audits and a detailed understanding of historic land-use complexities (Fuller et al., 2017; Linnell et al., 2015) both inspire novel interventions (hereafter, 'enhanced management') that emphasise physical disturbance, grazing, nutrient removal, spatio-temporal variability, early successional habitats and structural complexity (Fuller et al., 2017). This might involve near-accurate replication of specific pre-industrial practices (e.g. coppicing, Merckx et al., 2012) or the use of wild or domestic herbivores to create and maintain dynamic mosaics (consistent with some principles of rewilding, Van Klink & WallisDeVries, 2018); but in other circumstances it may be appropriate to adopt new approaches that provide the resources needed by the widest range of species, particularly priority species not helped by routine management.
Despite recent calls for strategies that deploy novel forms of enhanced management (Fuller et al., 2017; Linnell et al., 2015), this approach is untested. First, it is unclear whether target priority species are able to colonise newly established suitable habitats (Thomas, 1994) and whether the benefits of management are offset by negative impacts on species intolerant of the intervention. Second, because knowledge of land-use history is not exact, and modern techniques offer interventions that differ from historical methods, experiments are needed to establish whether treatment detail matters. In many cases accelerated succession (from increased rates of nitrogen deposition, Tipping et al., 2019) may reduce the duration and pattern of colonisation of suitable micro-habitats, such that more severe longer lived interventions may be beneficial (Härdtle et al., 2006; Pedley et al., 2013). Third, conservation advocacy would be streamlined if evidence supports consistent interventions across habitats that share similar ecological processes despite differing in plant composition. Last, most tests of intervention efficacy within semi-natural habitats focus either on vegetation structure as a proxy for biodiversity, or on single species or a limited subset of taxa (e.g. Lepidoptera, Goodenough & Sharp, 2016; or birds, Żmihorski et al., 2016). Given that semi-natural habitats are especially valued for their diverse assemblages (Ratcliffe, 1977), such studies are unlikely to be good substitutes for robust multi-taxa experiments.
Here, we test the multi-taxa consequences of enhanced management interventions across an extensive, semi-natural mosaic (3,850-ha) of calcareous and acidic dry grasslands of varying age and long-established lowland heathland (hereafter collectively 'grassland').
Grassland conservation practices emphasise the role of grazing (Bakker et al., 1983; Wells, 1969), but the needs of many priority species may be better met by temporally and spatially dynamic physical disturbance (e.g. Denton, 2013). A biodiversity audit of the Breckland region of Eastern England (characterised by low rainfall, sandy soils and internationally important grassland habitats, Dolman et al., 2012) suggests 61% of the 629 priority species associated with dry-open habitats require heavy physical disturbance, fallows or habitat juxtaposition. Such management was characteristic of pre-industrial landscapes generally (Fuller et al., 2017) and particularly in Breckland, where grassland habitats were disturbed by infrequent cultivation of long-rotation fallows, rabbit farming and resource extraction (Bailey, 1989; Dolman & Sutherland, 1991).
This combination of autecology and history justifies creating mosaics of cultivations that vary in disturbance intensity and fallow age. However, because previous ground disturbance experiments have examined homogeneous, even-aged interventions (e.g. Hawkes, Smart, Brown, Jones, Lane, et al., 2019; Pedley et al., 2013; Pywell et al., 2007), the multi-taxa consequences of enhanced management are unclear.
To examine the effects of such management on grassland biodiversity, we conducted a well-replicated, landscape-scale experiment. We tested two contrasting ground disturbance treatments that provided structural complexity: shallow-cultivation with a rotary rotavator and deep-cultivation with an agricultural plough, which may disrupt vegetation less severely or more drastically, respectively, than historic cultivation by oxen-drawn plough. Treatments were built up over 3 years to create complexes comprising subtreatments that varied in time since cultivation and disturbance frequency (single or repeated cultivation; Figure 1). We quantified responses across nine taxonomic groups, separately for non-priority and priority species, comparing treatments to areas of grassland managed with light grazing and limited or no ground disturbance. To test the efficacy of treatments based on autecological synthesis and historic land use, we compared responses to shallow- and deep-cultivation at the 'complex level' (pooling across subtreatments), examining species richness (hereafter 'richness') and composition (quantified as the percentage of the species pool supported within, or unique to, each treatment) in contrast to controls. To test whether biodiversity auditing usefully predicted responses to treatment interventions, we examined whether those species associated with dry-open habitats whose autecology indicates an association with physical disturbance (the intended recipients) responded more strongly to treatment.
Last, to refine management recommendations, we examined responses to differing subtreatments within complexes, and whether efficacy differed with grassland type and composition.

| Experimental treatments
In early 2015, forty 2-ha (100 × 200 m) ground disturbance plots (20 shallow-cultivated using a rotary rotavator; 20 deep-cultivated using an agricultural plough) and twenty-one 4-ha (≈200 × 200 m) control plots were established in grassland mostly excluding, but sometimes close to, scattered trees or scrub (for details, see . Treatments were repeated during each of the next two winters, again cultivating 2-ha (100 × 200 m), but with half overlapping a central repeatedly treated subplot and half first-time cultivation, to form a 4-ha treatment complex by 2017.

FIGURE 1
Development of a treatment complex over three successive winters to the final 4-ha complex (in 2017), comprising four 1-ha subtreatments: CR, repeatedly cultivated (brown); C1, first-time cultivated (light brown); F1, 1-year-old fallow (light grey); F2, 2-year-old fallow (grey). Also shown is a single 4-ha control plot (C, green) of which the central 1-ha (white outline) was sampled. See Supporting Information Figure S3 for mean vegetation height and bare ground extent in subtreatments and controls.

Less-intensive vegetation disturbance such as cutting and removal was not considered, as structural effects are ephemeral (Dolman & Sutherland, 1994) with little benefit for priority dry-open habitat species (Pedley et al., 2013).
Studies in the inner part of STANTA were precluded by potential unexploded ordnance (see Figure S1 in Supporting Information), but otherwise, treatments and controls were randomly distributed across the study area, stratified across four non-randomly distributed grassland strata (following Hawkes, Smart, Brown, Jones, Lane, et al., 2019), defined by soil type, age since cultivation and plant composition as:  Table S1).

| Responses to treatment
Responses to treatment were assessed in 2017. Invertebrates were sampled across all 40 treatment complexes and 21 control plots, and vascular plants (hereafter 'plants') across 32 complexes (16 shallow-cultivated and 16 deep-cultivated) and 16 controls (randomly selected, constrained to strata). Invertebrate trapping intensity across each of the four 1-ha subtreatments per complex and one central 1-ha plot within each control (hereafter 'samples') was consistent (see below). Greater sampling intensity per treatment complex than per control was accounted for subsequently by rarefaction (see below). deployed once in a central 15 × 15 m grid, for 3 consecutive days, between 1 July and 26 August. If efficiency was less than half the maximum trap-days per plot, the array was repeated, after which 96% of pitfall trap and 94% of pan trap deployments were active for the whole exposure period (see Appendix S1 for details). Plant incidence was sampled between 10 April and 7 July, from 16 quadrats (1 × 1 m) distributed evenly (11-14 m apart) along two parallel 100-m transects (30-33 m apart), giving frequency (0-16) per species. Data were pooled across pitfall-months and sampling methods (pitfalls, pan traps, quadrats) giving one composite sample (n = 21 control plots; and n = 160 subtreatment plots, nested within cultivated complexes).

Pitfall traps sampled seven invertebrate groups [spiders
Most sampled taxa were identified to species level; the few unresolved plants (0.3%), spiders (<0.1%), ground, rove and other beetles (<0.1%, 3.5%, 1.0%), true bugs (3.7%), and bees and wasps (0.9%) were not considered further. Species were considered as conservation priorities when classified as either: Threatened (IUCN Critically Endangered, Endangered and Vulnerable) or Near Threatened in Great Britain, or Nationally Rare, Nationally Scarce, or earlier designations of Red Data Book or Nationally Notable (see Table S3 for sources); remaining species were considered 'non-priority'.

| Analysing richness and composition at the complex level
For each taxonomic group (separately for non-priority and priority species), we used sample completeness curves (derived from sample-based rarefactions, rescaled to the number of individuals, using the Mao Tau function) from the package iNEXT (Chao et al., 2014) to estimate sample coverage (a measure of sampling efficiency) at the observed sample size for each subtreatment category, complex category (shallow- and deep-cultivation, pooling across subtreatment) and controls. Next, cumulative richness (pooled across all nine taxa and per taxon, separately for non-priority and priority species) of both treatment complex categories and controls (hereafter, collectively 'regimes') was examined by sample-based rarefaction. As the number of individuals sampled differed among regimes, setting the base sample size (BSS) to the largest sample size was inappropriate; following Chao et al. (2014), richness was instead compared at a BSS of twice the smallest sample size. Consistent with other studies (e.g. Schall et al., 2018), we also set the BSS to the smallest sample size (classical rarefaction) to ensure findings are robust. Richness estimates were considered to differ between regimes when pairwise 95% CIs obtained from 200 bootstrapping replications did not overlap (Chao et al., 2014); following convention, no post hoc correction for multiple comparisons was applied. To avoid unreliable extrapolation, we did not analyse any taxonomic groups where the number of observed species, from any of the three regimes, was less than three (judged separately for non-priority and priority species). Last, because the overall (cross-taxa) non-priority and priority comparisons considered eight complexes (four deep-cultivated and four shallow-cultivated) and five controls that lacked plant data, we tested whether removing these samples entirely from both analyses altered inference. All analyses were carried out in R (R Core Team, 2015).
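The idea of comparing richness at a common sample size can be illustrated with a minimal sketch. It uses classical individual-based rarefaction (Hurlbert's analytic formula) at the smallest sample size, not the iNEXT interpolation, extrapolation and bootstrapped CIs applied in the study, and it is written in Python purely for illustration (the analyses themselves were run in R). The function name `rarefied_richness` and the abundance vectors are hypothetical.

```python
from math import comb


def rarefied_richness(abundances, m):
    """Expected number of species in a random subsample of m individuals.

    Classical individual-based rarefaction (Hurlbert's formula):
    E[S_m] = sum_i (1 - C(N - N_i, m) / C(N, m)).
    """
    counts = [n for n in abundances if n > 0]
    total = sum(counts)
    if not 0 <= m <= total:
        raise ValueError("m must lie between 0 and the number of individuals sampled")
    denom = comb(total, m)
    # comb(a, m) is 0 when m > a, so species that cannot be missed contribute 1.
    return sum(1 - comb(total - n, m) / denom for n in counts)


# Hypothetical abundance vectors (individuals per species), for illustration only.
regimes = {
    "control": [30, 10, 5, 3, 2],
    "shallow": [20, 15, 10, 8, 5, 2],
    "deep": [25, 20, 12, 9, 6, 4, 3, 1],
}

# Base sample size set to the smallest regime total (classical rarefaction).
base = min(sum(v) for v in regimes.values())
rarefied = {name: rarefied_richness(v, base) for name, v in regimes.items()}
```

Because every regime is evaluated at the same number of individuals, differences in `rarefied` reflect richness rather than sampling effort; judging whether they differ would still require confidence intervals, which this sketch omits.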
Irrespective of relative richness, treatment complexes and controls may support distinctive assemblages, or unique species not recorded in any other regime. We quantified the percentage of the species pool supported within, and the percentage unique to, each regime, separately for non-priority and priority species, for the overall assemblage and each taxon, using Euler diagrams (in package

Note, because multiple subtreatment samples could be selected from the same complex (mean 11 ± 1 SD separate complexes per treatment, per iteration), analyses were slightly biased in favour of the controls which always resampled 16 distinct locations (thus potentially having greater gamma diversity).
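The two composition measures (percentage of the species pool supported within a regime, and percentage unique to it) reduce to simple set operations. The following Python sketch shows one way to compute them for a single set of samples; the function name `pool_percentages` and the species labels are hypothetical, and the resampling iterations used in the study to equalise sample numbers are not reproduced here.

```python
def pool_percentages(regime_species):
    """Return {regime: (pct_of_pool_present, pct_of_pool_unique)}.

    regime_species maps each regime name to the set of species recorded in it.
    The species pool is the union of species across all regimes.
    """
    pool = set().union(*regime_species.values())
    result = {}
    for name, species in regime_species.items():
        # Species recorded in any other regime.
        others = set().union(
            *(s for other, s in regime_species.items() if other != name)
        )
        unique = species - others
        result[name] = (
            100 * len(species) / len(pool),
            100 * len(unique) / len(pool),
        )
    return result


# Hypothetical species lists, for illustration only.
example = {
    "control": {"a", "b", "c"},
    "shallow": {"b", "c", "d", "e"},
    "deep": {"c", "d", "e", "f", "g", "h"},
}
percentages = pool_percentages(example)
```

In this toy example the pool holds eight species, so the deep-cultivated regime supports 75% of the pool and 37.5% of the pool is unique to it.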

| Analysing responses of habitat and management guilds
To establish whether the treatment complexes increased the richness of the intended recipients, we used two existing autecological classifications. First, using the online tool 'Pantheon' (Heaver et al., 2017), we classified non-priority and priority invertebrate species associated with dry-open habitats onto a composite ecological gradient of increasing physical disturbance intensity (hereafter 'habitat guilds'): from 'tall swards and scrub', through 'short swards without exposed sand' (hereafter, 'short swards'), to 'short sward with exposed sand' (hereafter, 'short swards and bare ground'). Second, we used an earlier biodiversity audit  to classify the same dry-open priority invertebrate species into management guilds requiring 'no', 'light', or 'heavy' ground disturbance and either 'no/light' or 'heavy' grazing. Species with unknown or undifferentiated structural or management requirements and five species considered by Dolman et al. (2012) to be associated with wet/shaded grassland were excluded (Table S2).
Although priority plants were also classified by biodiversity auditing, analysis was restricted to invertebrates for consistency with Pantheon habitat guilds. For each guild, we compared overall invertebrate richness between regimes using rarefaction, separately for non-priority (habitat guilds) and priority (habitat or management guilds) species. As grazing was light across the study landscape, we predicted that species associated with heavy ground disturbance would respond more strongly to the treatment, irrespective of their grazing classification.

| Analysing richness and composition at the subtreatment level
Richness was compared between subtreatments (nine levels: four shallow-cultivated subtreatments, four deep-cultivated subtreatments, plus control) using GLMMs, separately for non-priority and priority species within each taxonomic group (though omitting any excluded from the complex rarefactions). Strata was included as a fixed effect, but with 'young' and 'calcareous' grasslands a priori merged as 'calcareous grassland' owing to the similarity of their vegetation structure (Hawkes, Smart, Brown, Jones, Lane, et al., 2019) and plant composition (Table S1), thus reducing model complexity. To determine whether treatment efficacy varied between strata, we examined a subtreatment × strata interaction. Plot (for controls) or complex (subtreatments) identity was included as a random effect to control for non-independence of subtreatment samples within complexes. To account for slight variation in trap success, the total number of pitfall trap-days (pooled across sampling rounds) and pan trap-days were included as separate random effects for each invertebrate group sampled using that method (both were included for true bugs and bees and wasps). For each GLMM the appropriate error term (Poisson or negative binomial) was selected by examining the ratio of deviance/residual degrees of freedom of full (global) models. Candidate models comprising three possible variable combinations (subtreatment, strata and the subtreatment × strata interaction; additive subtreatment and strata effects without the interaction term; subtreatment alone) were examined using the package lme4 (Bates et al., 2017). The top-ranked model was considered 'best' if ΔAICc > 2 (Akaike's information criterion corrected for small sample size) relative to the next-ranked model (Burnham & Anderson, 2002); for competing models within 2 ΔAICc the most parsimonious was selected as additional variables lacked strong support (Burnham & Anderson, 2002).
Next, where strata was retained in the selected model, we merged strata levels with similar coefficients if this did not reduce model performance (ΔAICc ≤ 2). Last, the fixed effect of subtreatment was considered to be supported if performance of the selected model deteriorated (ΔAICc > 2) upon its removal; in these cases, category means were compared by Tukey's pairwise comparison using the package multcomp (Hothorn et al., 2008). Spatial autocorrelation of model residuals was examined by Moran's I using the package ape (Paradis et al., 2004).
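The model selection rule described above can be sketched as follows. This is an illustrative Python outline, not the R workflow used in the study: it assumes each candidate model is summarised by its log-likelihood and number of estimated parameters, and the function names `aicc` and `select_model`, the log-likelihoods and the parameter counts are all hypothetical. The sample size of 181 corresponds to the 21 control and 160 subtreatment samples.

```python
def aicc(log_lik, k, n):
    """Akaike's information criterion corrected for small sample size."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n - k - 1 <= 0")
    return 2 * k - 2 * log_lik + (2 * k * (k + 1)) / (n - k - 1)


def select_model(candidates, n, threshold=2.0):
    """Pick a model from {name: (log_lik, k)} using the delta-AICc rule.

    Models within `threshold` AICc units of the top-ranked model are treated
    as competing; among those, the one with the fewest parameters is chosen
    (ties broken by lower AICc). Returns (chosen_name, {name: delta_aicc}).
    """
    scores = {name: aicc(ll, k, n) for name, (ll, k) in candidates.items()}
    best = min(scores.values())
    deltas = {name: score - best for name, score in scores.items()}
    competing = [name for name, delta in deltas.items() if delta <= threshold]
    chosen = min(competing, key=lambda name: (candidates[name][1], scores[name]))
    return chosen, deltas


# Hypothetical log-likelihoods and parameter counts, for illustration only.
candidates = {
    "subtreatment * strata": (-400.0, 28),
    "subtreatment + strata": (-405.0, 12),
    "subtreatment": (-406.5, 10),
}
chosen, deltas = select_model(candidates, n=181)
```

Here the additive model falls within 2 AICc units of the subtreatment-only model, so the simpler subtreatment-only model is retained, mirroring the preference for parsimony among competing models.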
Assemblage composition of subtreatments and the influence of strata were examined by Redundancy Analyses (RDA, using Euclidean distance measures), separately for each taxonomic group (pooling non-priority and priority species), with square-root transformed species matrices (that provided better-fitting models than Hellinger transformation) and downweighting of rare species (to reduce the influence of particularly abundant or rare species) using the vegan package (Oksanen et al., 2018). Samples with fewer than 10 observations for that taxonomic group were omitted to avoid over-representing localities where the assemblage was poorly characterised. To determine whether the fixed effects of subtreatment and strata were important to species composition, we used backwards stepwise selection from the full RDA model (using the 'ordistep' function, Oksanen et al., 2018) with 1,000 permutations (p < 0.05, based on ANOVA-like tests).

Table S3 for species and Table S4 for numbers sampled per invertebrate group). Priority (but not non-priority) ants, true flies and plants were excluded from separate taxonomic analyses because fewer than three species were observed on controls (ants and plants) or shallow-cultivated complexes (ants and true flies). Pitfall traps sampled more true bug species, while pan traps sampled more bee and wasp species (the only taxonomic groups sampled with both trapping methods; Figure S4).

| Richness at the complex level
Sampling completeness estimates for treatment complexes and controls exceeded 90% of estimated total non-priority and priority species richness for every taxon with the sole exception of priority bees and wasps on the controls (80%; Table S5a). For non-priority species, overall richness was greater on both treatments (deep-cultivated: 610 species, 95% CI 599-620; shallow-cultivated: 553, 543-564) than controls (445,, and deep-cultivation supported more species than shallow-cultivation (Figure 2a). For separate taxa, richness was greater on treatments than controls for eight (deep-cultivation: other beetles; rove beetles; ground beetles; true bugs; bees and wasps; ants; true flies; and plants) and four (shallow-cultivation: rove beetles; ground beetles; true bugs; and plants) of the nine groups. Deep-cultivation supported greater richness than shallow-cultivation for five groups (other beetles; true bugs; bees and wasps; true flies; plants).
Lowering the BSS to the smallest sample size affected inferences for four of 17 rarefaction analyses (Table S6); in three, non-significant differences became significant (owing to narrower CIs): inference at twice the lowest sample size is therefore conservative.
Inference from the overall cross-taxa analyses was not affected when complexes and control plots lacking plant data were excluded (Figure S5). For the two groups sampled by two trapping methods, removing the least effective method meant deep-cultivated complexes no longer supported more priority true bugs than controls, but did not affect inference for bees and wasps and non-priority true bugs (Figure S6).

| Responses of habitat and management guilds
Of the 707 non-priority and 170 priority invertebrate species, 551 non-priority and 135 priority species were associated with dry-open habitats according to Pantheon (75 and 14 were associated with wet/shaded habitats, while broad ecological requirements of 81 and 21 were unknown); of these, 518 (94%) and 123 (91%) were classified among the three habitat guilds (Table S3). For non-priority species, richness of the 346 'tall sward and scrub' associated species was greater on both complex treatments than controls, richness of the 94 'short sward' species was similar across both treatments and controls, and for the 78 'short sward and bare ground' species richness was greater for one treatment (deep-cultivation) than controls (Figure 3). For priority species, richness of the 35 'tall sward and scrub'-associated species was greater on one treatment (shallow-cultivation) than controls, richness of the 33 'short sward' species was again similar across both treatments and controls, and for the 55 'short sward and bare ground' species richness was nearly three times greater on both treatments than on controls.
Of the 135 dry-open priority invertebrate species, 105 (78%) were classified into five management guilds (Table S3). Response to both complex treatments was progressively greater for management guilds with more intense requirements (Figure 3): for the 15 priority species classified as 'no ground disturbance and no/light grazing', richness was similar across both treatments and controls (indicating a lack of treatment penalty); for those classified as 'no ground disturbance and heavy grazing' (17 species) or 'light ground disturbance and no/light grazing' (15 species), richness was greater for shallow-cultivation or both treatments (respectively) than controls; for those classified as 'heavy ground disturbance and no/light grazing' (33 species), richness on both treatments was double that on controls, while for those classified as 'heavy ground disturbance and heavy grazing' (25 species), richness on both treatments was three times that on controls. Lowering the BSS to the smallest sample size affected inference for one of the six habitat guild analyses and none of the five management guild analyses (Table S6).

FIGURE 2
Richness and composition of non-priority and priority species in shallow- or deep-cultivated treatment complexes (the enhanced management) and controls, shown for all species pooled and separately for each of nine taxonomic groups. Rarefaction plots (a) contrast richness between regimes through sample-based rarefaction (rescaled to numbers of individuals) using 20 complexes per treatment and 21 control plots (for pooled species and invertebrate groups) or 16 complexes per treatment and 16 controls (for plants). Symbols and solid lines denote observed and interpolated richness, respectively, shading represents 95% CI bounds, and the vertical dashed line denotes the base sample size (twice the smallest sample size) where richness was compared. Eulers (b) show the mean and 95% CI of total species richness across regimes (below each panel), and the percentage of this pool recorded within (outer bold values) and unique to (internal white values) each regime, based on 200 resampling iterations each comprising 16 subplots per treatment and all 16 control plots. Separately for both responses (percentage of the species pool and unique species), regimes that share a superscript do not differ (following pairwise comparisons with Bonferroni correction). For mean percentage overlap between each regime, see Table S9. For plants, true flies and ants, limited numbers of priority species prohibited separate examination.

| Richness at the subtreatment level
Subtreatments differed in vegetation structure; vegetation height was similar across the first-time and repeatedly cultivated subtreatments, irrespective of cultivation method (shallow- or deep-cultivation; Figure S3), but the deep-cultivations contained more abundant bare ground than the shallow-cultivations. Vegetation height and cover recovered after fallowing, and more quickly after shallow-cultivation, while deep-cultivation still retained abundant bare ground as 1-year-old fallows. Ewe abundance in May 2017 did not vary between controls and subtreatments (Figure S2), but appeared to increase on the subtreatments through the summer (R. W. Hawkes, pers. obs.).
Sample completeness was high (>90%) across all eight individual subtreatments, for non-priority and priority rove and ground beetles, and non-priority other beetles, spiders, true bugs, bees and wasps, ants and plants (Table S5b). Where sampling completeness was less than 90% it remained strong (e.g. 80%-90% for priority spiders and true bugs in one subtreatment, for priority other beetles in two subtreatments, and for priority bees and wasps across five subtreatments; and 78%-88% for non-priority true flies across all deep-cultivated subtreatments).
For non-priority species, richness was greater on all eight subtreatments, compared to controls, for rove beetles, ground beetles, and bees and wasps (Figure 4). Non-priority other beetle richness was greater on two subtreatments (1-year-old-fallow deep-cultivated and repeatedly shallow-cultivated), non-priority plant richness was greater on one subtreatment (1-year-old-fallowed deep-cultivated), and non-priority true bug richness was lower on two subtreatments (repeatedly and first-time deep-cultivated). Non-priority spider, ant and true fly richness did not differ between the sub-  Table S7).

FIGURE 3
Response to enhanced grassland management of multi-taxa invertebrate guilds with differing habitat association and management requirements. Left panels consider three habitat guilds (from the Pantheon database) ranked along a composite gradient of increasing disturbance intensity: from tall swards and scrub, through short sward, to short sward with bare ground, separately for non-priority and priority species. Right panels (priority species only) consider biodiversity auditing classification in relation to independent gradients of grazing and ground disturbance intensity. For each habitat or management guild, sample-based rarefaction (rescaled to numbers of sampled individuals) contrasts richness between shallow- (n = 20) or deep-cultivated (n = 20) treatment complexes and controls (n = 21). Symbols and solid lines denote observed and interpolated richness, respectively, and shading represents 95% CI bounds. The vertical dashed line denotes the base sample size (twice the smallest sample size), where richness was compared.

For the priority species, richness was greater on the 1-year-old-fallowed deep-cultivated subtreatment, compared to controls, for other beetles, ground beetles, true bugs and bees and wasps (Figure 4). For priority other beetles and bees and wasps, richness was also greater on the repeatedly shallow-cultivated subtreatment and the 2-year-old fallowed subtreatments (both shallow- and deep-cultivated), respectively. Priority rove beetle and spider richness did not differ between any of the subtreatment categories and controls. For priority ground beetles and other beetles, richness was greater on calcareous and intermediate-aged grassland (pooled) than ancient-acid grassland (again, interactions between subtreatment and strata were not supported, Table S7).
Residuals from six of the 15 richness models (non-priority and priority spiders, non-priority ground beetles, priority rove beetles, non-priority true bugs and non-priority ants) were significantly, though weakly, spatially autocorrelated (Moran's I = 0.03, −0.02, 0.08, 0.08, 0.04 and 0.06 respectively), suggesting some variation attributable to geographic factor(s) not considered in the models.
Nevertheless, we consider inference for subtreatment effects to be robust, as treatments and controls were distributed randomly.
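For readers unfamiliar with the statistic, Moran's I can be computed from a set of residuals and plot coordinates as in the Python sketch below. It assumes inverse-distance spatial weights, which is an assumption of this example and not a detail confirmed by the text; the function name `morans_i` and the toy values are hypothetical, and no significance test is included.

```python
from math import hypot


def morans_i(values, coords):
    """Moran's I for `values` at 2-D `coords`, using inverse-distance weights.

    I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2,
    where w_ij = 1 / distance(i, j) for i != j and W is the sum of all w_ij.
    """
    n = len(values)
    if n != len(coords) or n < 2:
        raise ValueError("need at least two values with matching coordinates")
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("values have zero variance")
    num = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            if dist == 0:
                raise ValueError("coincident coordinates give infinite weights")
            w = 1.0 / dist
            weight_sum += w
            num += w * dev[i] * dev[j]
    return (n / weight_sum) * num / denom


# Toy residuals at four equally spaced points, for illustration only.
points = [(0, 0), (1, 0), (2, 0), (3, 0)]
gradient = morans_i([1, 2, 3, 4], points)      # neighbours tend to be similar
alternating = morans_i([1, -1, 1, -1], points)  # neighbours tend to differ
```

Values near zero, such as those reported above, indicate little spatial structure in the residuals, whereas strongly negative values (as in the alternating example) or strongly positive values indicate dispersion or clustering respectively.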

| Composition at the subtreatment level
Redundancy analyses ( Figure 5) supported differences in composition among subtreatment and strata categories for other beetles, rove beetles, ground beetles, spiders, true bugs, ants and plants (bees and wasps and true flies were omitted from this analysis). RDA models explained between 25% (other beetles) and 43% (ground beetles, Figure 5) of overall variance in sample composition. For F I G U R E 4 Richness of non-priority and priority species, within each of nine taxonomic groups, across repeatedly cultivated (CR), first-time cultivated (C1), 1-year-old fallow (F1) and 2-year-old fallow (F2) subtreatments, within shallowand deep-cultivated complexes, and in untreated controls (C). Subtreatments and controls were compared by GLMMs that controlled for the fixed effect of strata (when retained during model simplification, see Table S10 for model coefficients), with strata merged either as 'calcareous and intermediate-aged grassland' (left offset) relative to ancient-acid grassland (right offset; 'CI vs. A'), or intermediate-aged grassland (left offset) relative to 'calcareous and ancient-acid grassland' (right offset; 'I vs. CA'), or with strata excluded from the model (indicated by *). Symbols denote predicted richness, bars denote 95% CIs and open circles denote individual data points. Subtreatment means that share a superscript (homogenous subsets, a-e) did not differ significantly (Tukey's pairwise comparisons p > 0.05). Where no pairwise comparisons are reported, the effect of treatment was not supported. For plants, true flies and ants, limited numbers of priority species prohibited separate examination spiders, ground beetles, rove beetles, true bugs and ants, subtreatment explained more variance than strata; for other beetles and plants, subtreatment and strata explained similar variance (Table S8).
For the three beetle groups, the composition differed markedly between the repeatedly/first-time-cultivated subtreatments (both shallow- and deep-cultivated) and controls, while the 1- and then 2-year-old-fallowed subtreatments converged towards controls (but did not overlap). Spiders showed a similar pattern, but with less convergence towards the controls. For plants and true bugs, the four deep-cultivated subtreatments and the repeatedly shallow-cultivated subtreatment differed markedly from controls, while the remaining three shallow-cultivated subtreatments converged towards controls (but remained distinct). Ants showed a less clear pattern, but the first-

| DISCUSSION
Through one of the largest multi-taxa land management experiments yet conducted in a European grassland, we quantified consequences of management interventions inspired by a priori knowledge of the intensity and complexity of historic land use, and informed by systematic, cross-taxa analysis of priority species and their ecological requirements. Sampling over 130,000 invertebrates and using 28,000 observations of plants showed that using ground disturbance to create treatment complexes increased the overall richness of both non-priority and priority species, while complexes held more unique priority species than controls. Within treatment complexes, the 1-year-old deep-cultivated fallow subtreatment increased priority species richness across multiple taxa, but assemblage composition varied between all subtreatments. Our results demonstrate that providing the full subtreatment complement, through different establishment methods, will cater for the needs of the widest range of species.
FIGURE 5 Redundancy analysis (RDA) relating assemblage composition to subtreatment and strata, separately for each of seven taxa (pooling non-priority and priority species). See Table S8 for model details. For each model, the sample size is reported at the top left of the panel; the overall percentage variance explained (constrained in bold type; unconstrained in parentheses), adjusted R² and the constrained variance explained by each axis (adjacent to each axis) are also shown. Open circles show individual samples. Red points denote the centroid of each subtreatment and strata category. For bees and wasps and true flies, limited numbers of sampled individuals prohibited separate examination.

Both treatment complexes increased structural complexity and supported a greater overall richness of non-priority and priority species than controls, consistent with the well-established benefits of habitat heterogeneity (Stein et al., 2014). More surprising was the magnitude of response-especially of priority species-which nearly doubled in richness with the complex treatment. Although priority species responded particularly well to the 1-year-old deep-cultivated fallows (but not the shallow-cultivated equivalent), the shallow- and deep-cultivated complexes were equally effective at enhancing overall priority species richness, potentially because responses to each establishment method varied between taxa. For example, priority ground beetles responded more strongly to the barer deep-cultivated complexes, probably because some important ruderal food plants (e.g. Chenopodium album) were more abundant on this treatment (Table S3; most priority ground beetles were granivores). Priority true bugs, by contrast, responded more strongly to the more vegetated shallow-cultivated complexes, possibly because some species were less tolerant of the more intensive deep-cultivated treatment (subtle differences in management can drastically alter leafhopper communities, Biedermann et al., 2005).
Interestingly, priority bee and wasp richness was similar across regimes; it is likely that many of these species utilised the bare-open subtreatments on the treatment complexes for nesting, and the fallowed subtreatments and controls for foraging. Pan traps may have also sampled large numbers of wide-ranging individuals.
Within treatment complexes, composition was distinct between individual subtreatments for nearly every taxonomic group considered, suggesting the efficacy of the complex was not attributable to one subtreatment per se but the full subtreatment complement.
Juxtaposition of cultivation subtreatments within complexes may have further increased richness by providing, in close proximity, the contrasting micro-habitats needed by species whose requirements vary during their life cycle. Irrespective of this, the provision of additional structural complexity by overlapping and abutting treatments was simpler and less costly than creating an equivalent set of independent, isolated, subtreatment plots.
At the complex level, treatment effects on richness were greater for priority than for non-priority spiders, ground beetles, other beetles and true bugs (shallow-cultivation only), perhaps because for these groups species associated with 'short sward' or 'short sward and bare ground' comprised a greater proportion of the priority than of the non-priority species (Figure S7). As complexes were optimised to inferred cross-taxa requirements of the largest number of priority species, it is encouraging that benefits from treatments were much greater for priorities than for non-priorities across most groups, but also appeared to avoid penalty to species not classified as requiring physical disturbance.
Across most taxonomic groups, both complex treatments and controls supported a similar proportion of unique non-priority and priority species. Supported by our analysis of assemblage composition at the subtreatment level, this demonstrates that, while complexes may increase richness across most taxonomic groups, no single establishment method can deliver the resource requirements of the whole species pool (a central finding of Fuller et al., 2017). To cater for the broadest range of species, efforts to implement enhanced management should adopt different establishment methods to create complex nested heterogeneity, while also retaining some untreated habitat; however, further work would be required to optimise the relative extent of treated to untreated habitat. While the present study focused on maximising cross-taxa richness of priority species, as this is an appropriate goal for conserving biodiversity, further research would be needed to determine the consequence of these treatments for the phylogenetic or functional diversity that may be relevant to other ecosystem services.
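The notion of species unique to a regime, and of cumulative richness across regimes, can be made concrete with a minimal sketch using set operations. The species labels and regime names below are hypothetical placeholders, not data from this study, and the function names are chosen for illustration only.

```python
def unique_species(assemblages):
    """Map each regime to the set of species recorded in no other regime."""
    unique = {}
    for regime, species in assemblages.items():
        # Pool the species lists of every other regime
        others = set().union(
            *(s for r, s in assemblages.items() if r != regime)
        )
        unique[regime] = set(species) - others
    return unique

def cumulative_richness(assemblages):
    """Number of distinct species recorded across all regimes combined."""
    return len(set().union(*assemblages.values()))

# Hypothetical species lists for three management regimes
assemblages = {
    "shallow_complex": {"sp_a", "sp_b", "sp_c"},
    "deep_complex": {"sp_b", "sp_d"},
    "control": {"sp_c", "sp_e"},
}

unique = unique_species(assemblages)
proportion_unique = {
    regime: len(unique[regime]) / len(species)
    for regime, species in assemblages.items()
}
total = cumulative_richness(assemblages)
```

Because each regime in this toy example holds at least one species found nowhere else, dropping any regime lowers cumulative richness, which is the logic behind combining establishment methods while retaining untreated habitat.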
The mosaic of grasslands on different soils supports contrasting plant assemblages, including calcareous grassland, acid grassland or lowland heathland (Rodwell, 1991, 1992), that differed in composition for most invertebrate groups. Nevertheless, they were subject to similar historic land use (Fuller et al., 2017;Sheail, 1979) and are characterised by similar ecological processes of nutrient limitation, drought and physical disturbance (Dolman & Sutherland, 1992).
Crucially, treatment efficacy for biodiversity never varied between strata, indicating results are applicable to a wide range of dry grassland habitats, regardless of compositional differences. Similarly, across fen habitat and the superficially very different wet grassland-ditch complexes of grazing marsh habitat, biodiversity audit analysis of priority species' requirements showed similar functional dependence on littoral margins, undulating topography, early and late succession and extended management rotations. Together, these findings challenge a long-held paradigm that differences in plant composition between functionally similar habitats predicate differences in conservation practice.
Based on a bioregional analysis of priority species requirements and knowledge of land-use history, we predicted that dry-open habitat species associated with heavy disturbance would respond more strongly to treatment than those associated with little or no disturbance. For priority invertebrate species, those a priori thought to be associated with the heaviest forms of physical disturbance responded most strongly to treatment, regardless of whether these were classified along a single composite physical disturbance gradient (habitat guilds) or on independent gradients of grazing intensity and ground disturbance (management guilds). This confirms the importance of prevailing historic management to extant priority species (Fuller et al., 2017) and the success of the Biodiversity Audit approach in targeting enhanced management interventions appropriate to the ecological requirements of priority species. For non-priority invertebrates, those apparently associated with short swards and bare ground responded positively to deep-cultivations, while those associated with tall swards and scrub responded positively to both treatments. This again reflects the structural heterogeneity of treatment complexes, which provided open, short swards on recent cultivations and 1-year-old fallows, and provided taller swards on the 2-year-old fallows. While this experiment focused on mechanical interventions, other more natural approaches-such as the use of wild boar Sus scrofa or large herbivores-can promote dynamic mosaics with resulting benefits for some priority species and taxa (De Schaetzen et al., 2018;Garrido et al., 2019) and may be equally effective for priority species associated with heavy forms of disturbance. Fuller et al. (2017) showed that a better appreciation of the complexity and intensity of historical management, combined with knowledge of priority species requirements, encourages novel forms of enhanced intervention within cultural landscapes.
Through an unprecedented landscape-scale biodiversity experiment, we confirm that restoring structural complexity and nested heterogeneity to grassland, irrespective of fine-scale differences in vegetation structure and composition, both increased non-priority species richness and, crucially, doubled priority species richness. To maximise cumulative richness, complexity should be created through a range of establishment methods, as shallow- or deep-cultivation each supported unique species. Additionally, within the complexes, the full subtreatment complement is needed to support the widest suite of species (1- and 2-year-old fallows, repeat and first-time cultivations), as assemblage composition varied with subtreatment. This management should be implemented in such a way that treatment complexes are surrounded by untreated habitat, which supports its own distinct assemblage and unique species.

| CONCLUSIONS
Responses to treatments varied considerably across taxa, for example, differing markedly between priority ground beetles and rove beetles at the complex level; this emphasises the value of multi-taxa sampling when evaluating the biodiversity consequence of management interventions (e.g. Vessby et al., 2002;Yong et al., 2020).
While taxonomic surrogacy may therefore be inadvisable, multi-taxa species groups defined by their ecological associations (habitat-and management-guilds) usefully predicted responses to interventions.
Thus, systematic analysis of the relative frequency of species with contrasting resource requirements, across the full complement of priority species, can also inform management strategies and prescriptions in other biogeographical regions, without the need for costly multi-taxa experiments. Where biodiversity is well-characterised and autecological knowledge is strong (e.g. much of Europe), we recommend such regional Biodiversity Audits, synthesised with a detailed understanding of historic land use, to better inform conservation interventions. These findings could be extrapolated to other biogeographical regions that comprise similar semi-natural habitats but lack equivalent levels of biodiversity data and autecological knowledge.

ACKNOWLEDGEMENTS
The Royal Society for the Protection of Birds and Natural England

DATA AVAILABILITY STATEMENT
Data available via the Dryad Digital Repository https://doi.org/10.5061/dryad.f7m0cfxv0 (Hawkes et al., 2020).