Designing forest biodiversity experiments: general considerations illustrated by a new large experiment in subtropical China

Biodiversity–ecosystem functioning (BEF) experiments address ecosystem‐level consequences of species loss by comparing communities of high species richness with communities from which species have been gradually eliminated. BEF experiments originally started with microcosms in the laboratory and with grassland ecosystems. A new frontier in experimental BEF research is manipulating tree diversity in forest ecosystems, compelling researchers to think big and comprehensively. We present and discuss some of the major issues to be considered in the design of BEF experiments with trees and illustrate these with a new forest biodiversity experiment established in subtropical China (Xingangshan, Jiangxi Province) in 2009/2010. Using a pool of 40 tree species, extinction scenarios were simulated with tree richness levels of 1, 2, 4, 8 and 16 species on a total of 566 plots of 25·8 × 25·8 m each. The goal of this experiment is to estimate effects of tree and shrub species richness on carbon storage and soil erosion; therefore, the experiment was established on sloped terrain. The following important design choices were made: (i) establishing many small rather than fewer larger plots, (ii) using high planting density and random mixing of species rather than lower planting density and patchwise mixing of species, (iii) establishing a map of the initial ‘ecoscape’ to characterize site heterogeneity before the onset of biodiversity effects and (iv) manipulating tree species richness not only in random but also in trait‐oriented extinction scenarios. Data management and analysis are particularly challenging in BEF experiments with their hierarchical designs nesting individuals within species populations within plots within species compositions.
Statistical analysis best proceeds by partitioning these random terms into fixed‐term contrasts, for example, species composition into contrasts for species richness and the presence of particular functional groups, which can then be tested against the remaining random variation among compositions. We conclude that forest BEF experiments provide exciting and timely research options. They especially require careful thinking to allow multiple disciplines to measure and analyse data jointly and effectively. Achieving specific research goals and synergy with previous experiments involves trade‐offs between different designs and requires manifold design decisions.
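The idea of testing a fixed-term contrast (here, log2 species richness) against the remaining random variation among species compositions can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' analysis: it averages plot values within each composition and regresses these means on log2 richness, which matches a mixed-model test only for roughly balanced designs. The function name `richness_contrast_test` is hypothetical.

```python
import numpy as np
from scipy import stats


def richness_contrast_test(richness, composition, y):
    """F-test of a log2(richness) contrast against among-composition variation.

    Plot values are averaged within each species composition first, so the
    richness contrast is tested against the remaining variation among
    compositions rather than against plot-to-plot noise. This shortcut is
    only equivalent to a mixed-model test for (roughly) balanced designs.
    Returns (slope per doubling of richness, F value, p value).
    """
    richness = np.asarray(richness, dtype=float)
    composition = np.asarray(composition)
    y = np.asarray(y, dtype=float)

    comps = np.unique(composition)
    comp_mean = np.array([y[composition == c].mean() for c in comps])
    # richness is constant within a composition by definition
    comp_rich = np.array([richness[composition == c][0] for c in comps])

    X = np.column_stack([np.ones(len(comps)), np.log2(comp_rich)])
    coef, *_ = np.linalg.lstsq(X, comp_mean, rcond=None)
    fitted = X @ coef

    df_resid = len(comps) - 2
    ss_contrast = np.sum((fitted - comp_mean.mean()) ** 2)  # 1 df
    ms_resid = np.sum((comp_mean - fitted) ** 2) / df_resid  # among-composition error
    f_value = ss_contrast / ms_resid
    return coef[1], f_value, stats.f.sf(f_value, 1, df_resid)
```

Inputs are one entry per plot: its richness level, a composition label and the measured response. Averaging first makes explicit that the replication unit for a richness effect is the composition, not the plot.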


Introduction
Positive effects of biodiversity on the functioning of ecosystems have been observed in numerous experiments (Loreau, Naeem & Inchausti 2002; Hooper et al. 2005; Balvanera et al. 2006; Worm et al. 2006; Duffy 2009). Starting from the first experiments in climate chambers (Naeem et al. 1994) and on grassland field plots (Leadley & Körner 1996; Tilman, Wedin & Knops 1996), experiments have become more and more sophisticated, often in constructive response to criticism (e.g. Grime 1997; Huston 1997; Schmid et al. 2002). Most biodiversity-ecosystem functioning (BEF) experiments have employed small model systems with fast-growing primary producers, in particular herbaceous plants (for reviews see Loreau, Naeem & Inchausti 2002; Hooper et al. 2005; Scherer-Lorenzen et al. 2007; Cardinale et al. 2011). However, considering the large contribution of forests to ecosystem services such as carbon storage, climate regulation, water filtration or erosion control at the global scale (Durieux, Machado & Laurent 2003; Bala et al. 2007; Quijas et al. 2012), it is important to test whether the results obtained for the simpler systems of smaller and short-lived organisms can be extrapolated to forest ecosystems harbouring the largest and longest-lived plant species on land.
Recent studies and meta-analyses using forest plots from sample surveys in established forests indeed found significant correlations between tree species richness and ecosystem properties (e.g. standing biomass and associated diversity of faunistic groups) and processes (e.g. litter decomposition, herbivory, productivity; Gamfeldt et al. 2013; Scherer-Lorenzen 2013; Chisholm et al. 2013). Despite a potential publication bias, tree diversity might thus indeed play a critical role for ecosystem functioning. However, as with all observational studies, it is not clear whether these correlations reflect causal relationships, in which direction causality works, or whether additional 'third' variables are involved, for example, stand age or tree density (Marquard et al. 2009a). In recent years, a number of new BEF experiments have, therefore, been initiated with plots deliberately planted with different tree species richness and composition (Scherer-Lorenzen et al. 2005a; Nadrowski, Wirth & Scherer-Lorenzen 2010; Verheyen & Scherer-Lorenzen 2012). Here, we discuss the major issues that we encountered when designing a new forest BEF experiment in subtropical China (referred to as BEF-China). We draw from experience with previous experiments and explain our own design choices to provide a case-study example. While we are not striving for a complete review of forest BEF experiments, we aim to provide guidelines to assist others in designing and establishing further forest BEF experiments.

Major questions addressed by BEF experiments
Major questions addressed by BEF experiments so far have been whether random species loss, mostly of plants, can negatively affect ecosystem functioning, in particular primary productivity, nutrient cycling and the diversity and abundance of other trophic groups (Balvanera et al. 2006). This has most often been addressed by assembling experimental communities with different species numbers. In other words, the scenario simulates the random extinction of species from a local species pool. Extinction is simulated by leaving species out when sowing or planting the experimental communities, usually at constant total densities of individuals per community. From these major BEF questions, numerous additional questions can be derived, which will influence the particular choice of the experimental design.
These additional questions include the mechanisms through which biodiversity may enhance ecosystem functioning, in particular whether average performance across all species is higher at higher diversity (the complementarity effect, Loreau & Hector 2001) or whether diverse mixtures have a higher chance of containing high-performing species (the sampling or selection effect, Huston 1997; Loreau & Hector 2001). Examining this question requires growing all species in monocultures, an aspect not considered in early biodiversity experiments. When all species are available in monoculture, the additive partitioning of Loreau & Hector (2001) provides a statistical means to separate complementarity and selection effects, but further design and measurement aspects need to be considered to analyse the mechanisms underlying these statistical effects, for example, resource partitioning between species (see e.g. von Felten et al. 2009).
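To make concrete why monocultures of every species are needed, the additive partitioning of Loreau & Hector (2001) can be written out directly: the net effect ΔY = Y_O − Y_E splits exactly into a complementarity term N·mean(ΔRY)·mean(M) and a selection term N·cov(ΔRY, M), where M_i are monoculture yields and ΔRY_i are deviations of observed from expected relative yields. The sketch below assumes expected relative yields equal to the planted proportions (1/N in a substitutive design with equal numbers per species); the function name is our own.

```python
import numpy as np


def additive_partition(mono_yield, mix_yield, expected_ry=None):
    """Additive partitioning of the net biodiversity effect (Loreau & Hector 2001).

    mono_yield  : monoculture yield M_i of each species
    mix_yield   : observed yield Y_O,i of each species in the mixture
    expected_ry : expected relative yields RY_E,i (default: 1/N, i.e. equal
                  planted proportions; an assumption of this sketch)
    Returns (net_effect, complementarity, selection); net = compl. + selection.
    """
    M = np.asarray(mono_yield, dtype=float)
    Y_obs = np.asarray(mix_yield, dtype=float)
    N = M.size
    RY_E = np.full(N, 1.0 / N) if expected_ry is None else np.asarray(expected_ry, dtype=float)

    RY_O = Y_obs / M        # observed relative yields
    dRY = RY_O - RY_E       # deviation from expected relative yield
    net = Y_obs.sum() - np.sum(RY_E * M)          # delta Y = Y_O - Y_E
    complementarity = N * dRY.mean() * M.mean()
    # population covariance (divided by N), as in the original formulation
    selection = N * np.mean((dRY - dRY.mean()) * (M - M.mean()))
    return net, complementarity, selection
```

For a two-species mixture with monoculture yields 10 and 20 and mixture yields 6 and 14, the net effect of 5 splits into a complementarity effect of 4·5 and a selection effect of 0·5. Without the monoculture values M_i, neither term can be computed.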
Other additional questions concern the particular aspect of biodiversity causing an effect, including whether it is species richness per se or the functional diversity of a community that matters (Hooper et al. 2005). The latter would be expected if biodiversity effects are related to functional differences among species, for example, regarding resource uptake, as has been shown for the combination of legumes and grasses in grassland systems (Spehn et al. 2002). To disentangle species richness and functional diversity effects, special designs can be envisaged that vary functional diversity within species richness levels and vice versa, yet this is difficult to achieve in a balanced way (see e.g. Le Roux et al. 2013; Table A1). Furthermore, differences among plant species in attracting pests and pathogens may cause positive biodiversity effects (Petermann et al. 2008; Schnitzer et al. 2011). In the German BIOTREE experiment, Hantsch et al. (2013) demonstrated a negative effect of tree richness on the pathogen load of common powdery mildew species on Quercus petraea. Finally, a rarely tested question is whether genetic diversity within species affects ecosystem functioning (e.g. Crutsinger et al. 2006; Moreira & Mooney 2013).
Further questions address the type of species loss simulated in biodiversity experiments. Whereas initial BEF experiments (Naeem et al. 1994; Leadley & Körner 1996) simulated random extinction scenarios by nesting all species compositions of lower diversity within compositions of the next higher diversity, most new experiments no longer nest compositions [but see Bell et al. (2005) with a new random partitions design (Bell et al. 2009)]. This unfortunately reduces the power to disentangle effects of diversity and composition. In any case, random extinction scenarios are the best choice in the absence of information about drivers of extinction, or if drivers do not lead to biased extinction of species with particular contributions to ecosystem functioning. However, if such biased extinction does occur, appropriate nonrandom extinction scenarios should be used (see e.g. Schläpfer, Pfisterer & Schmid 2005). These may reflect the preferential loss of species with particular traits and thus particular contributions to ecosystem functioning. As a cautionary note, when sufficient information about extinction drivers is lacking, or when several, perhaps diffuse, extinction drivers interact with each other, nonrandom extinction scenarios focusing on a wrong or a single major cause of extinction may be even more unrealistic than random extinction scenarios.
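The difference between nested random and trait-oriented (nonrandom) extinction scenarios can be illustrated with two small generators. This is a hypothetical sketch, not the BEF-China assignment procedure: the random scenario simply removes a random half of the species at each step (assuming a pool size that is a power of two), and the trait-ordered scenario assumes that species with the highest trait values (e.g. SLA or a rarity score) are lost first. Both function names are our own.

```python
import random


def nested_random_extinction(pool, rng):
    """Nested random extinction: each composition is contained in the next-higher one.

    Starting from the full pool, a random half of the species is removed at
    each step, e.g. 16 -> 8 -> 4 -> 2 -> 1. Assumes the pool size is a power of two.
    """
    current = list(pool)
    scenario = [sorted(current)]
    while len(current) > 1:
        current = rng.sample(current, len(current) // 2)
        scenario.append(sorted(current))
    return scenario


def trait_ordered_extinction(trait, levels):
    """Nonrandom extinction: species with the highest trait values are lost first.

    trait  : dict mapping species -> trait value (hypothetical direction of loss)
    levels : richness levels to retain, e.g. [16, 8, 4, 2, 1]
    """
    ranked = sorted(trait, key=trait.get)  # ascending: low values survive longest
    return [sorted(ranked[:n]) for n in levels]
```

Because each random composition is a subset of the one above it, diversity and composition effects can be separated more cleanly; the trait-ordered version produces only one composition per level, which is why several starting pools would be needed in practice.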
Finally, there are questions concerning the context in which BEF experiments are conducted. Biodiversity effects may differ between ecosystem types such as aquatic vs. terrestrial (Cardinale et al. 2006) or grasslands vs. forests (for which we address design questions here), between homogeneous and heterogeneous environmental conditions (Wacker et al. 2008), between low- and high-nutrient environments (Weigelt et al. 2009), and between different ecosystem functions such as productivity vs. erosion control. Different contexts may require additional treatments or measurements in BEF experiments. In particular, it is increasingly recognized that biodiversity effects may be most relevant when multiple ecosystem functions are considered (e.g. Hector & Bagchi 2007; Zavaleta et al. 2010; Isbell et al. 2011; Pasari et al. 2013). This finding might be explained by trade-offs between ecosystem functions that may be more severe in low- than in high-diversity ecosystems.

Using forest experiments in BEF research
Forests are globally important ecosystems because of their wide geographical cover and their prominent standing biomass, thus provisioning unique ecosystem goods, such as timber, food, fuel or medicinal plants, and delivering services for humans, such as carbon sequestration, soil erosion control, water retention and purification, nutrient provision, local climate regulation, global climate change mitigation and cultural services (Quijas et al. 2012). Understanding the role of biodiversity for forest ecosystems renders forest BEF experiments highly relevant.
Despite a number of disadvantages compared with smaller, short-lived organisms such as microbes and annual or perennial plants (the most commonly used test organisms in BEF experiments), trees also have some advantages when used in BEF experiments. First, and as already discussed in detail by Scherer-Lorenzen, Körner & Schulze (2005b) and Scherer-Lorenzen et al. (2007), tree diversity experiments allow for working at the level of single individuals to quantify mechanisms underlying BEF relationships, such as the role of demographic processes in plant-plant interactions, albeit at a constant and thus somewhat unnatural spacing between individuals (in the absence of mortality). For example, net biodiversity effects observed at the community level could be traced back to enhanced per capita growth of single individuals (Potvin & Dutilleul 2009). Individual-based approaches also allow for determining the role of intraspecific trait plasticity for complementarity effects, or elucidating the role of individual vs. population fluctuations for ecosystem stability. Second, as species have different growth rates, interactions between different species are likely to change with age. Because biodiversity effects become more pronounced with time in grassland experiments (Marquard et al. 2009b; Reich et al. 2012), we may expect even stronger effects with trees. However, BEF experiments with trees contain a single cohort of individuals, whereas age-structured populations and communities can develop in BEF experiments with smaller organisms, including grassland perennials. Third, the large sizes of tree individuals allow for a more detailed study of interactions with the local topographic, microclimatic and edaphic environment, for which we here use the term 'ecoscape', and which can potentially be both co-driver and response of biodiversity effects in forest BEF experiments.
In addition to some of the above-mentioned disadvantages of using trees and shrubs in BEF experiments, they pose the following more specific challenges. Most important of all, they need large areas and long experimental time spans. Related to the latter is the potentially later onset of direct tree-tree interactions between individuals (whereas indirect interactions, e.g. via pathogens, may start early), due to large planting distances and slow growth compared with other plants and smaller, short-lived organisms.

A short manual for forest BEF experiments
The slow growth, large size and longevity of trees affect the appropriate plot size, planting density and mixing pattern for observing processes and functions that are close to those of natural mature forest stands. Furthermore, the large spatial scale causes high variation in initial microclimatic and edaphic conditions within plots and across the whole experimental site. This variation needs to be recorded as a snapshot of the initial ecoscape, against which later changes in soil microclimate, soil properties and matter fluxes at the plant-soil boundary can be compared.
Biodiversity experiments rely on a species pool from which the different species combinations are then sampled to create communities of different species richness. This sampling can be performed in many ways. Here, we focus on the distinction between simulated random extinction scenarios, which at the same time ascertain that all species occur at all diversity levels in the same number of plots, and nonrandom extinction scenarios that would occur if known extinction drivers differentially affected species, for example rare species or species with particular traits.
We illustrate the design issues related to the different aspects marked above in bold with examples from the BEF-China experiment established in subtropical China in 2009/2010. Using a pool of 40 tree species, planted on 566 plots over a net area of 38·4 ha, we manipulated species richness and composition as well as genetic diversity to study their effects on a range of ecosystem functions, including primary productivity, carbon and nutrient cycling, soil processes and abundance and biodiversity of other trophic groups. A sloped experimental site was chosen to assess biodiversity effects on soil erosion as a particularly relevant ecosystem function provided by forests in subtropical China and elsewhere (Wang et al. 2005; Geißler et al. 2013). Table 1 provides a decision table to illustrate the decisions to be made when designing a BEF experiment with trees as well as the decisions taken in the specific case of BEF-China. The next sections follow the order of rows in Table 1. As specific questions in other forest BEF experiments are best met with particular designs, the decisions to be made there may follow different routes.

Plot size
The decision on plot size is crucial because the relationships between tree diversity and ecosystem functions such as productivity have been found to be scale-dependent in non-experimental studies (Chisholm et al. 2013). As a rule of thumb, to prevent edge effects, Scherer-Lorenzen et al. (2005a) suggested a plot side length of about twice the maximum final tree height, usually corresponding to a plot area of 0·5–1 ha. However, the only two forest BEF experiments using plots of this size or even larger, the Borneo experiment and the BIOTREE experiment in Germany (Scherer-Lorenzen et al. 2007; Fig. 1), could only be set up in the context of large afforestation projects. Such contexts may not always be available, or they may impose restrictions on other design choices. While such large plot sizes help to reduce edge effects when plots are surrounded by open land, they may not be required in a matrix of established forest or of adjacent experimental forest plots. However, arranging plots of different species composition next to each other will create more subtle edge effects between treatments. Thus, trees along the plot margins should be treated differently from central trees, and ecosystem functions should be measured in the centre of plots, as we do in our BEF-China experiment. Interestingly, in the only grassland biodiversity experiment addressing plot size, biodiversity effects did not differ at all between grassland plots of 3·5 × 3·5 and 20 × 20 m, and the ecosystem function of plant productivity could be measured successfully at even smaller scales (Roscher et al. 2005).
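The plot-size rule of thumb and the practice of measuring only central trees can be expressed as two small helpers. This is an illustrative sketch: the factor of two follows the rule of thumb above, while the 35 m tree height and the two-row buffer are hypothetical values chosen for the example. The 20 × 20 grid is what 400 seedlings per 1-mu plot at 1·29 m spacing implies.

```python
def min_plot_side(max_tree_height_m, factor=2.0):
    """Rule of thumb: plot side length of about twice the final tree height."""
    return factor * max_tree_height_m


def core_positions(n_rows, n_cols, buffer_rows):
    """Planting positions at least `buffer_rows` rows away from every plot edge.

    Trees outside this core are treated as edge trees; plot-level
    functions would be measured on the core trees only.
    """
    return [
        (r, c)
        for r in range(buffer_rows, n_rows - buffer_rows)
        for c in range(buffer_rows, n_cols - buffer_rows)
    ]
```

With a hypothetical final height of 35 m, `min_plot_side(35)` gives a 70 m side, i.e. 0·49 ha, at the lower end of the 0·5–1 ha range. On a 20 × 20 grid with a two-row buffer, 256 of the 400 trees remain as core trees.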
For statistical reasons, we strongly advocate minimizing plot size in favour of a greater number of plots that can be better managed (e.g. weeded). More replicates increase the statistical power to detect plot-level biodiversity effects (Table 1, no. 1). The power analysis in Fig. 2 shows the probability of detecting the biodiversity effect on basal area that we measured in our comparative study plots (CSPs) in a natural forest near the experimental site (Gutianshan National Nature Reserve, hereafter GNNR; Bruelheide et al. 2011). A statistical power of 0·8 is achieved with 35 plots. Thus, 35 plots (e.g. of one of our random extinction scenarios, Table 2, second line) provide us with an 80% chance to detect a biodiversity effect comparable with that found in our CSPs.
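The simulation-based power analysis summarized in Fig. 2 (simulate many data sets for a given sample size, analyse each, count significant results) can be sketched as below. This is a hedged illustration, not the analysis behind Fig. 2: it assumes a linear effect of log2 richness on the response, normal residuals and a balanced allocation of plots to the richness levels 1 to 16. The slope, intercept and residual SD in the usage example are placeholders, not the CSP estimates.

```python
import numpy as np
from scipy import stats

RICHNESS_LEVELS = np.array([1, 2, 4, 8, 16])


def simulated_power(slope, intercept, resid_sd, n_plots, n_sim=1000, alpha=0.05, seed=0):
    """Proportion of simulated data sets in which the richness effect is significant.

    Each data set has n_plots plots allocated evenly to the richness levels,
    a response intercept + slope * log2(richness) + normal noise, and is
    analysed by simple linear regression on log2(richness).
    """
    if n_plots < 3:
        raise ValueError("need at least three plots for a regression test")
    rng = np.random.default_rng(seed)
    # balanced allocation: cycle through the richness levels
    x = np.log2(np.resize(RICHNESS_LEVELS, n_plots)).astype(float)
    hits = 0
    for _ in range(n_sim):
        y = intercept + slope * x + rng.normal(0.0, resid_sd, size=n_plots)
        if stats.linregress(x, y).pvalue < alpha:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    # placeholder effect size and noise, for illustration only
    for n in (10, 35, 100):
        print(n, simulated_power(slope=0.5, intercept=10.0, resid_sd=1.5, n_plots=n))
```

Scanning `n_plots` over a range and reading off the smallest value with a power of at least 0·8 reproduces the logic of the 80% threshold in Fig. 2 under these assumptions.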
It is desirable that the plot size in forest BEF experiments be large enough to accommodate, potentially, all species at the highest diversity level with at least one mature individual each. Whether this is eventually achieved is, of course, not a choice left to the researcher, because differential mortality and potential extinction of species from plots are normal and accepted features of BEF experiments (e.g. with a strong selection effect, few species may eventually remain in an originally diverse mixture; what matters is that these few species 'were selected' from an originally large number of species). Nevertheless, extinctions will be less likely if initial population sizes of species are not too small (Outbor 1993). The desirable minimum plot size thus increases linearly with the highest diversity level included in a BEF experiment and with the reciprocal of planting density, that is, the square of the distance between trees.
In BEF-China, we calculated the desirable minimum plot size such that, 130 years after establishment, a single plot could still hold one mature individual of each of 16 species in the most extreme case. With an average mature canopy size of 36 m², as expected for 130-year-old trees in subtropical forests of the study region (Bruelheide et al. 2011 and A.C. Lang, pers. comm.), the minimum plot size would then be 576 m², which we increased to 667 m², that is, 25·82 × 25·82 m, the traditional Chinese areal unit of 1 mu (Table 1, no. 1). This size will allow some species in diverse plots to have clearly above-average canopy sizes (whereas others will have below-average canopy sizes). Our CSPs mentioned previously also have a size of 1 mu, appropriate for recording plot-level forest functions such as tree growth, rainfall characteristics (Geißler et al. 2013) and herbivory (Schuldt et al. 2012). However, because we also aimed at larger population sizes per tree species (a minimum of four individuals per species per plot) and wanted to allow for a nested shrub diversity treatment, we assembled some of our 1-mu plots with identical tree species composition into parcels of 2 × 2 plots, yielding 4-mu (0·267 ha) 'super-plots' (see first line in Table 2). Given a total area of 50 ha available at our two experimental sites, a plot size of 1 mu allowed us to establish 566 plots (Table 1, no. 2).
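The plot-size arithmetic above (species × individuals per species × area per tree) can be checked in a few lines. This sketch takes 1 mu as 1/15 ha and the 36 m² mature canopy from the text; reading the "four individuals per species" target as four mature individuals is our assumption, made only to show that it leads to the 4-mu super-plots. The same formula works for initial planting if the canopy area is replaced by the square of the planting distance.

```python
import math

MU_M2 = 10_000.0 / 15.0  # 1 mu = 1/15 ha, about 666.7 m²


def min_plot_area(n_species, area_per_tree_m2, individuals_per_species=1):
    """Area needed so that each species can hold the given number of trees."""
    return n_species * individuals_per_species * area_per_tree_m2


def mu_units_needed(area_m2):
    """Smallest whole number of 1-mu plots that covers the required area."""
    return math.ceil(area_m2 / MU_M2)


if __name__ == "__main__":
    one_each = min_plot_area(16, 36.0)        # one mature tree per species
    four_each = min_plot_area(16, 36.0, 4)    # four mature trees per species (assumed)
    print(one_each, mu_units_needed(one_each), round(math.sqrt(MU_M2), 2))
    print(four_each, mu_units_needed(four_each))
```

Under these assumptions, 576 m² fits into one 25·82 × 25·82 m plot, and 2304 m² needs four 1-mu plots, matching the 2 × 2 super-plot layout.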

Planting density
It is common practice in forestry to plant higher densities of saplings than the desired final tree density, anticipating losses and allowing for selection (Nyland 2002; Scherer-Lorenzen  (Both et al. 2012;Lang et al. 2012). While higher initial densities imply higher costs for raising and planting more seedlings, they increase the flexibility to maintain desired species richness and evenness, in particular in the case of irregular natural mortality of seedlings. Higher initial densities also allow for more rapid canopy closure, which facilitates suppression of weeds, commonly the largest management effort in biodiversity experiments (Marquard et al. 2009b; see also Weiner et al. 2010).
Planting distance between individual saplings in the BEF-China experiment was 1·29 m in horizontal projection (Fig. 3a), which is rather small compared with other forest BEF experiments (Scherer-Lorenzen et al. 2005a). This corresponds to 400 seedlings per 1 mu, or c. 6000 seedlings per ha (Table 1, no. 3). As all plots in our experiment were planted at the same community density, population sizes per species decrease inversely with increasing species richness. This substitutive design (de Wit 1960) is commonly used in BEF research. It avoids a problem of additive designs, in which community density increases with diversity: additive designs lead to larger apparent effects of diversity on ecosystem functioning than substitutive designs do, but these larger effects are due to the confounding of community density and diversity (Balvanera et al. 2006). To maintain our desired plot compositions during the phase of early establishment, herbaceous and nonplanted woody species have been removed from all plots twice per year in weeding campaigns. Originally, we considered an experimental duration of more than 100 years appropriate to allow all species to equilibrate with associated soil microbial communities (e.g. Visser 1995), herbivores and predators (e.g. Watt, Hunter & Stork 1997) and to capture the most relevant forest dynamics (e.g. Pretzsch & Schütze 2009). However, based on our experiment, we can now confirm that interactions between tree species start even in the first years after planting. For example, crowns of the fast-growing species began to touch each other in the third year. This is also supported by observations of crown interactions between the initially fast-growing deciduous and some of the initially slow-growing evergreen broadleaved species in native forest remnants of the study region (von Oheimb et al. 2011).
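The planting-density figures and the contrast between substitutive and additive designs can be reproduced with simple arithmetic. This sketch assumes one tree at the centre of each cell of a square planting grid; the function names are illustrative only.

```python
import math


def trees_on_square_grid(side_m, spacing_m):
    """Planting positions on a square plot, one tree per grid cell (assumed layout)."""
    per_row = math.floor(side_m / spacing_m)
    return per_row ** 2


def seedlings_per_hectare(spacing_m):
    """Planting density implied by a square grid spacing (horizontal projection)."""
    return 10_000.0 / spacing_m ** 2


def substitutive_population_size(total_per_plot, richness):
    """Individuals per species when community density is held constant."""
    if total_per_plot % richness:
        raise ValueError("total density is not divisible by species richness")
    return total_per_plot // richness


def additive_total_density(per_species, richness):
    """Community density in an additive design, where it grows with richness."""
    return per_species * richness
```

A 1-mu plot (side about 25·82 m) at 1·29 m spacing holds 20 × 20 = 400 trees, and the spacing implies about 6009 seedlings per ha. In the substitutive design, richness levels 1, 2, 4, 8 and 16 give 400, 200, 100, 50 and 25 individuals per species; in an additive design with, say, 25 individuals per species, community density would instead rise from 25 to 400 trees, confounding density with diversity.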

Mixing pattern
The spatial arrangement of individuals and species may affect species interactions (Pacala & Deutschman 1995;Stoll & Prati 2001;. While regular mixing of species maximizes interspecific interactions, patchwise mixing minimizes them, at least during early stages of stand development (Scherer-Lorenzen et al. 2005a). These two approaches define the endpoints of a design trade-off between optimizing early detection of biodiversity effects and long-term maintenance of biodiversity gradients. Weak competitors among the species may quickly be out-competed by strong ones in regularly mixed plots (e.g. ABABABAB, where each letter represents an individual of species A or B), reducing 'realized' species richness. This may be prevented by patchwise planting (e.g. AAAABBBB) to ensure survival of weak competitors with minimum tending efforts (Saha et al. 2012). For example, patchwise mixing of species was used in the BIOTREE experiment in Germany, where patch size was defined as the mature crown area of the species with the largest crown requirement (Scherer-Lorenzen et al. 2005a). However, patchwise mixing has the disadvantage that most neighbour interactions between trees are initially intraspecific, even in diverse plots.
In BEF-China, we first considered using patchwise vs. individual mixing of species as an additional split-plot treatment but then decided against it to reduce the complexity of the design. Instead, we assigned species randomly to individual planting positions (e.g. ABBABAAB) on a regular square grid within plots (Table 1, no. 4). With this, we aimed to avoid the potential disadvantages of both regular individual mixing and patchwise mixing. In addition, random mixing has the advantage that local neighbourhood diversity varies to some degree, which is more natural and allows a richer analysis of the response of individual trees to diversity than the other two planting schemes. For example, the regular mixing of six species in  led to a maximum local diversity of only three species. With random mixing, local neighbourhoods range from clumped, that is, monospecific patches, to very diverse patches. In our case, the most diverse possible neighbourhood would include eight different species in the four closest and four diagonal neighbours around a single target tree.
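How random mixing produces varying neighbourhood diversity, compared with a regular pattern, can be explored with a short simulation. This is an illustrative sketch: it assumes equal numbers per species on a 20 × 20 grid and counts species among the eight surrounding trees (four closest and four diagonal). The diagonal-stripe `regular_layout` is one hypothetical regular scheme, not the one used in any particular experiment.

```python
import random
from collections import Counter


def random_layout(species, n_rows, n_cols, rng):
    """Assign species at random to grid positions, in equal numbers per species."""
    n = n_rows * n_cols
    if n % len(species):
        raise ValueError("grid size must be divisible by the number of species")
    cells = [sp for sp in species for _ in range(n // len(species))]
    rng.shuffle(cells)
    return [cells[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


def regular_layout(species, n_rows, n_cols):
    """Regular mixing in diagonal stripes (a checkerboard for two species)."""
    k = len(species)
    return [[species[(r + c) % k] for c in range(n_cols)] for r in range(n_rows)]


def neighbourhood_richness(layout, r, c):
    """Number of species among the (up to) 8 neighbours of the tree at (r, c)."""
    n_rows, n_cols = len(layout), len(layout[0])
    neighbours = {
        layout[i][j]
        for i in range(max(r - 1, 0), min(r + 2, n_rows))
        for j in range(max(c - 1, 0), min(c + 2, n_cols))
        if (i, j) != (r, c)
    }
    return len(neighbours)


if __name__ == "__main__":
    rng = random.Random(1)
    species = [f"sp{i:02d}" for i in range(16)]
    layout = random_layout(species, 20, 20, rng)
    interior = [neighbourhood_richness(layout, r, c) for r in range(1, 19) for c in range(1, 19)]
    print(sorted(Counter(interior).items()))  # distribution of neighbourhood richness
```

In a regular pattern every interior tree sees the same neighbourhood richness, whereas the random layout yields a spread of values up to the maximum of eight, which is what enables neighbourhood-level analyses of individual tree responses.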

Topography
Generally, BEF experiments, including those involving trees, have been established on flat land (Fig. 1, but see Paul et al. 2012). This facilitates manual work. Moreover, ecosystem variables expressed per horizontal area do not need to be corrected for slope. In many regions around the world, flat land is more fertile than sloped land, due to alluvial deposits, and is thus used for agriculture, whereas forests are allowed to grow on slopes, where in addition to providing timber they help to prevent landslides, rockslides, avalanches and erosion. However, establishing forest BEF experiments on slopes poses additional challenges, such as dealing with topographic heterogeneity within and among plots, substantially more manual work (Fig. 3b) and issues of slope correction, that is, whether ecosystem variables (including the density of individuals and of species, standing crop, litter decomposition, stocks of elements, nutrient turnover and erosion) should be measured per surface area or per area of horizontal projection.

Fig. 2. Power analysis for detecting the biodiversity effect on basal area we measured in our Comparative Study Plots. The analysis was based on the 12 youngest plots with a stand age of <60 years since the last logging event in the Gutianshan National Nature Reserve close to the experimental site. Using the effect size and variance of these plots, we simulated 1000 data sets for sample sizes from 5 to 200 plots. Each of these data sets was then analysed for a biodiversity effect. We counted the number of analyses in which the biodiversity effect given in the simulation was significantly different from zero at the 5% level. The statistical power (black line) is the proportion of significant analyses. The horizontal red lines indicate the thresholds of 80% and 100% statistical power. The vertical lines indicate the sample sizes of three main aspects of the experiment: the trait-oriented extinction scenarios using specific leaf area (SLA) and rarity (SLA/rarity; lines 5 and 6 in Table 2), the first random extinction scenario on large plots (VIP, Very Intensively Studied Plots, line 1 in Table 2), and the complete sample size of the random extinction scenarios, including replicates and the 2nd and 3rd random extinction scenarios (Random; lines 1–4 in Table 2). Broken lines indicate the sample size of one of the experimental sites (A or B), solid lines the sample size for both experimental sites combined. The intersection of the black power curve with the sample sizes for the different aspects of the experiment gives the statistical power of that part of the experiment. The trait-oriented scenarios SLA/rarity exceed a statistical power of 80% when the two sites A and B are combined, while in the VIP random extinction scenario on the large plots, the statistical power of 80% is already achieved with the sample size of one site alone.
In BEF-China, we established our plots on sloped terrain (average slope 27·5° for site A and 31° for site B; Table 1, no. 5), accepting the additional topographic heterogeneity across spatial scales, which we address in the next section ('Ecoscape'). We plan to apply slope corrections and will express ecosystem variables per area of horizontal projection, to be consistent with geographical maps. However, we did encounter a problem when measuring root biomass and productivity with soil cylinders and in-growth cores; here, we decided to change from vertical positioning to positioning perpendicular to the slope.
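The slope corrections mentioned above reduce to a cosine factor if one assumes a planar slope of uniform inclination, with distances measured along the fall line. The sketch below is a minimal illustration under that assumption; real terrain with varying slope and aspect would need a terrain model.

```python
import math


def surface_to_horizontal(area_surface_m2, slope_deg):
    """Project an area measured on the slope surface onto the horizontal plane."""
    return area_surface_m2 * math.cos(math.radians(slope_deg))


def horizontal_to_slope_distance(d_horizontal_m, slope_deg):
    """Distance along the fall line that corresponds to a horizontal distance."""
    return d_horizontal_m / math.cos(math.radians(slope_deg))


def per_surface_to_per_horizontal(value_per_surface_m2, slope_deg):
    """Convert a per-surface-area quantity (e.g. litter mass per m²) to per horizontal m²."""
    return value_per_surface_m2 / math.cos(math.radians(slope_deg))
```

At the average slope of site A (27·5°), the 1·29 m horizontal planting distance corresponds to about 1·45 m along the fall line, and a quantity measured per m² of slope surface is about 13% larger when expressed per m² of horizontal projection.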

Ecoscape
To cope with large-scale spatial heterogeneity of experimental sites, it is generally recommended to block experimental plots (Dutilleul 1993). Blocking is also used to avoid accidental clumping of similar treatments, which may occur in completely randomized designs (Hurlbert 1984). However, when there are many different treatment combinations, blocks would have to be very large to contain each treatment combination at least once, which makes it difficult to delineate blocks. In such cases, it can be preferable to use a completely randomized design (Table 1, no. 6). Even without blocking, accidental spatial clustering of replicates of the same treatment combination can be avoided by constraining the complete randomization of plots with a minimum distance that must be maintained between replicates. Furthermore, blocks can still be defined a posteriori to account for spatial heterogeneity at the between-plot scale. Alternatively, environmental covariates can be measured and used to adjust for environmental heterogeneity across a study site, an approach that we refer to here as 'ecoscape' and explain in detail in the following two paragraphs.
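A completely randomized allocation constrained by a minimum distance between replicates (BEF-China used 100 m) can be sketched with simple rejection sampling: shuffle the treatment labels and accept the allocation only if no two replicates of the same treatment are too close. This is our illustrative choice of algorithm, not necessarily the procedure used in BEF-China; the function names are hypothetical.

```python
import math
import random


def satisfies_min_distance(positions, labels, min_dist):
    """True if all replicates of the same treatment are at least min_dist apart."""
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if labels[i] == labels[j] and math.dist(positions[i], positions[j]) < min_dist:
                return False
    return True


def constrained_randomization(positions, treatments, min_dist, rng, max_tries=10_000):
    """Randomly allocate treatments to plot positions under a minimum-distance rule.

    positions  : list of (x, y) plot centres in metres (tuples)
    treatments : one treatment label per plot; replicates repeat a label
    Uses rejection sampling: reshuffle until the rule is satisfied.
    """
    if len(positions) != len(treatments):
        raise ValueError("need exactly one treatment per position")
    labels = list(treatments)
    for _ in range(max_tries):
        rng.shuffle(labels)
        if satisfies_min_distance(positions, labels, min_dist):
            return dict(zip(positions, labels))
    raise RuntimeError("no valid allocation found; relax min_dist or increase max_tries")
```

Whole-layout rejection sampling works for small designs, but with hundreds of plots a valid shuffle becomes vanishingly rare; a sequential allocation or a swap-based search would then be needed.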
The reason to adjust for environmental heterogeneity in BEF experiments is that it may interfere with biodiversity effects on tree growth. For example, in the Sardinilla forest BEF experiment in Panama, even comparatively low environmental heterogeneity had strong effects on productivity and plant mortality (Healy, Gotelli & Potvin 2008). If environmental heterogeneity has a low-dimensional spatial pattern, it can usually be accounted for by using spatial coordinates and functions of them as covariates (see e.g. Le Roux et al. 2013 for a grassland BEF experiment). However, if this is not the case, the environmental heterogeneity should be measured directly to characterize the ecoscape of the site at which an experiment is established. It is then possible to use the environmental measures defining the ecoscape as covariables in the statistical analysis of biodiversity effects. When using an ecoscape approach, it is important to recognize that biodiversity itself can and will influence the ecoscape as the experiment progresses. However, how biodiversity affects the ecoscape can only be quantified if the initial ecoscape has been measured. While it is necessary to control for environmental covariates when testing biodiversity effects, it is also important to be able to generalize BEF relationships across different sites varying in topography, geology, soils and climate (Table 1, no. 8). For the greatest possible generalization, each experiment should be replicated at different sites in a region  and with different species pools (Wacker et al. 2009).

Table 2. The total number of 566 (= 271 + 295) refers to 1-mu plots. For the selection of species compositions and monocultures (i.e. the first random extinction scenario, referred to as species pool 1 at each site), four such plots were arranged in a 2 × 2 super-plot (first line in each of the tables). Within these super-plots, the four plots ranged in shrub richness from 0 to 2 to 4 to 8 species. The second line shows the replicates of the species compositions of these scenarios. The third and fourth lines refer to the random extinction scenarios starting from species pools 2 and 3, respectively, at each site (see Table A1). The fifth and sixth lines show the trait-oriented (nonrandom) extinction scenarios for the specific leaf area (SLA) and rarity scenario, respectively (same at the two sites). The seventh line shows the plots used for commercial plantation species, with five plots each per site for Cunninghamia lanceolata and Pinus massoniana. The eighth line in the table for site B shows the plots for the genetic diversity scenarios, which were only established at site B. Finally, there are 3 free succession plots at each site. In consequence, the total numbers of 1-mu plots are 271 and 295 for sites A and B, respectively, not considering 0·25-mu plots for shrub monocultures.
On the very heterogeneous slopes of our BEF-China experiment, and given the many diversity levels, blocking was not possible. We therefore used a completely randomized design with the constraint that replicates of the same treatment combination were located at least 100 m apart. We are now characterizing the ecoscape of the experimental sites (Table 1, no. 7) by taking initial measurements of environmental variables, such as soil erosion, evaporation and plant water availability, in a grid across the study plots, and by measuring the density and other features of the herbaceous vegetation within plots (Both et al. 2012). We will then be able to relate these environmental variables to spatial covariates such as coordinates, altitude, slope and aspect (Yang et al. 2013). Later, this will allow us to analyse how diversity treatments modify the ecoscape over time. Finally, to increase generality, we established plots at two sites, A and B (Fig. 4), with two separate species pools that overlap by only eight of the 40 tree species in our BEF-China experiment.
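A minimum-distance rule of this kind can be illustrated with rejection sampling: draw a complete randomization and discard it whenever two replicates of the same treatment combination fall closer than the threshold. The sketch below is not the procedure used to lay out BEF-China; the function name, the planar (x, y) plot coordinates and the toy transect are illustrative assumptions. Because every accepted layout comes from an unconstrained shuffle, all layouts that satisfy the rule are equally likely.

```python
import math
import random


def constrained_randomization(positions, treatments, min_dist, max_tries=10_000, seed=None):
    """Completely randomize treatments over plots, rejecting any layout in
    which two replicates of the same treatment lie closer than min_dist.

    positions  -- list of (x, y) plot centres in metres (hypothetical input)
    treatments -- one label per plot; repeated labels are replicates
    Returns a list with the treatment assigned to each position.
    """
    if len(positions) != len(treatments):
        raise ValueError("need exactly one treatment per plot")
    rng = random.Random(seed)
    for _ in range(max_tries):
        layout = list(treatments)
        rng.shuffle(layout)  # unconstrained complete randomization
        placed = {}  # treatment -> positions already holding a replicate
        valid = True
        for pos, trt in zip(positions, layout):
            if any(math.dist(pos, p) < min_dist for p in placed.get(trt, [])):
                valid = False  # reject the whole draw, start again
                break
            placed.setdefault(trt, []).append(pos)
        if valid:
            return layout
    raise RuntimeError("no valid layout found; relax min_dist or raise max_tries")


# Toy example: 12 plots 10 m apart on a transect, 6 treatments x 2 replicates;
# replicates must be at least 15 m apart (never in neighbouring plots).
plots = [(10.0 * i, 0.0) for i in range(12)]
labels = [t for t in "ABCDEF" for _ in range(2)]
layout = constrained_randomization(plots, labels, min_dist=15.0, seed=1)
print(layout)
```

With many treatment combinations and a strict threshold such as 100 m, almost every draw may be rejected, so a real layout might need a smarter search (for example, swapping plots that violate the rule), at the cost of exact uniformity over valid layouts.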

Species pool
A crucial aspect of BEF experiments is the species pool, the total set of species from which individual species are chosen to create experimental communities. It should represent the full tree community that could naturally occur at a site to enable generalization with respect to the natural ecosystem. However, many biodiversity experiments start with an already reduced local species richness at the highest diversity level. For example, the six and 28 species used in a tropical forest BEF experiment in Panama (Potvin & Dutilleul 2009; C. Potvin, pers. comm.) or the 16 dipterocarp species used in a similar experiment in Borneo represent a small proportion of the species and functional diversity typically occurring in adjacent natural forests. In other cases, the opposite occurs: the five species used in a boreal forest BEF experiment in Finland (Vehviläinen & Koricheva 2006) and the 16 species used in a temperate forest BEF experiment in Germany (Scherer-Lorenzen et al. 2007) probably represent a higher species and functional diversity than typically occurs in adjacent natural forests.
Besides reflecting natural species assemblages, large species pools also allow higher species and functional diversity to be planted per plot. Assuming that species differ in their traits because of limiting-similarity constraints (MacArthur & Levins 1967; Pacala & Tilman 1994; Weiher & Keddy 1999), larger species pools should cover larger portions of the multivariate trait space. Inconveniently, however, larger species pools also increase the number of required plots. Moreover, to assess species-specific contributions to the functioning of mixtures, all species should also be studied in monocultures (Loreau & Hector 2001). When a large species pool is used, the number of monoculture plots can easily reach or exceed the number of mixture plots. Thus, one of the greatest challenges for large forest BEF experiments is raising sufficient numbers of individuals of all desired species. This is a tremendous logistic endeavour in countries with high tree species richness, where often only a small proportion of species is used in forestry and the majority of other species cannot be acquired from commercial nurseries (Yang et al. 2013).
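A short calculation makes the monoculture burden concrete. Assuming a nested design in which a pool of 2^m species is split into non-overlapping halves down to monocultures, with one plot per composition and no replication (hypothetical simplifications for illustration), a 16-species pool already needs one more monoculture plot than mixture plots:

```python
def plots_per_level(pool_size):
    """Compositions per richness level when a pool of 2**m species is split
    into non-overlapping halves down to monocultures (one plot each, no replicates)."""
    if pool_size < 1 or pool_size & (pool_size - 1):
        raise ValueError("pool_size must be a power of two")
    counts = {}
    richness = pool_size
    while richness >= 1:
        counts[richness] = pool_size // richness  # non-overlapping compositions
        richness //= 2
    return counts


counts = plots_per_level(16)
monocultures = counts[1]
mixtures = sum(n for r, n in counts.items() if r > 1)
print(counts)                  # {16: 1, 8: 2, 4: 4, 2: 8, 1: 16}
print(monocultures, mixtures)  # 16 15
```

In general such a halving design needs n monoculture plots and n - 1 mixture plots for a pool of n species, so monocultures always slightly outnumber mixtures.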
In BEF-China, we included 40 native broad-leaved tree species and 18 shrub species in the entire species pool, which is high compared with other BEF experiments (Table 1, no. 9; Appendix Table A1). In addition, we added two commonly used commercial coniferous plantation species for monoculture comparisons (Cunninghamia lanceolata and Pinus massoniana). We were able to grow more than 1 million seedlings of our target species in two specifically contracted nurseries (Fig. 3c). At the time of planting, all seedlings were of approximately the same age, between 1 and 2 years, and had a minimum height of 20 cm to allow metal labels to be attached and to ensure their maintenance. To avoid the problem of increasing overlap of species compositions with increasing species richness (variance-reduction effect of Huston 1997; see following section), we used random extinction scenarios in which three (overlapping) pools of 16 tree species at each site were divided into nonoverlapping communities at lower diversities (see following section; Table 2, Table A1). In addition, we included an extra-high richness level of 24 tree species, combining species from the different pools (Table 1, no. 10). While we did not consider functional diversity as an additional design variable, we ensured that our tree species represented a wide range of families. This increases the chance that high species richness levels also reflect high functional diversity levels, but has the disadvantage that our design cannot distinguish between the two. Nevertheless, because we are assessing traits for all species in our experiment, we will be able to use functional richness measures instead of species richness in forthcoming analyses. Furthermore, our design, like other BEF experimental designs, always allows us to contrast communities in which particular functional groups of species, such as deciduous or evergreen trees, are present with communities in which they are absent.
The 18 shrub species were used to create shrub-richness levels of 2, 4 or 8 species from a pool of 10 shrub species per site. These were factorially crossed with tree species richness in the 4-mu super-plots (mentioned in the section 'Plot size' above) for one of the three random extinction scenarios at each site (yielding two random scenarios with a shrub diversity treatment in total) and for the 24-tree-species communities. The shrubs were planted in the interspaces between the trees at the same density (400 individuals per mu) and thus had the same distance from each other as the trees (1.29 m) and a distance of 0.91 m to the nearest tree neighbour. In the end, our plots with the highest diversity, 24 tree plus eight shrub species per 1 mu, attained a richness level comparable with the mean richness of natural forests near the experimental site, where we observed a mean of 23.4 tree and 18.4 shrub species per 30 × 30 m CSP.
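The planting distances follow from simple geometry. The sketch assumes (our inference, consistent with the stated numbers) that the 400 trees of a 25.8 × 25.8 m plot stand on a square 20 × 20 grid and that each shrub sits at the centre of a grid cell:

```python
import math

PLOT_SIDE_M = 25.8     # side length of a 1-mu plot (from the text)
TREES_PER_PLOT = 400   # planting density per mu (from the text)

trees_per_row = math.isqrt(TREES_PER_PLOT)      # assumed square 20 x 20 grid
spacing = PLOT_SIDE_M / trees_per_row           # distance between neighbouring trees
shrub_to_tree = spacing * math.sqrt(2) / 2      # cell centre to nearest grid corner

print(round(spacing, 2), round(shrub_to_tree, 2))  # 1.29 0.91
```

Because the shrubs form a second grid offset by half a cell in both directions, neighbouring shrubs are also one grid spacing apart, matching the statement that shrubs and trees share the same within-layer distance.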

Random extinction scenario
As the fundamental question of BEF experiments is whether random species loss reduces ecosystem functioning, a central design question has always been how such species loss is best achieved (Lamont 1995; Allison 1999; Mikola, Salonen & Setala 2002; Scherer-Lorenzen et al. 2005a; Balvanera et al. 2006; Schmid et al. 2008; Table 1, no. 11).
The first issue is how to create a gradient of diversity, which can be done by sowing or planting defined numbers of species into plots or by deleting species from existing diverse communities. In most BEF experiments, species loss has been simulated by using predefined levels of diversity, for example 1, 2, 4, 8 and 16 species. The alternative method, experimental species removal from established communities (Díaz et al. 2003), may appear tempting, as it would allow working with mature trees from the start. Unfortunately, however, this approach has several severe shortcomings. The major one is that the resulting biodiversity is confounded with disturbance and community density. To compensate, trees would also need to be removed from high-diversity treatments to reach the same low community density as in low-diversity plots, which in turn would lead to unnaturally low densities. Additionally, many ecosystem processes, including below-ground ones, may continue to be determined by the original tree diversity over long periods, so that removal would control standing diversity but not the associated ecosystem processes. One very obvious example concerns root biomass, decomposition and other root-related processes, because roots cannot be reliably extracted.
The second and perhaps more important issue in determining how species loss alters ecosystem functioning is how to establish experimental replicates. Early BEF experiments used only one random extinction series, with community replicates of this series (e.g. Naeem et al. 1994; Niklaus et al. 2001). However, exact replicates of a single extinction series offer no way to distinguish plant diversity effects per se from effects of the particular community and its component species. To distinguish these effects, multiple random extinction series are needed that provide different species compositions at each plant diversity level (Givnish 1994). Therefore, in BEF-China, we used three partially overlapping sets of 16 species each at each site (six sets in total), drawn from the total pool of 40 tree species (Table A1). Each species set was then used to create a separate random extinction scenario, for a total of six random extinction scenarios across the two sites.
While replication at the extinction scenario level is required to separate diversity effects from effects of particular communities, replication of particular species compositions and monocultures is necessary to separate these effects from between-plot variation (Table 1, no. 14). For example, to statistically test transgressive overyielding of mixtures compared with monocultures, replicated monocultures and mixtures must be available. In BEF-China, to replicate species compositions without doubling the required labour, we established a full replicate of one of the three random extinction scenarios at each site. The replicated random extinction scenario was thus established once on a 4-mu super-plot (allowing the shrub-richness treatment to be applied among the four plots of this super-plot) and once on a separate 1-mu plot (Table 2).
The next issue in establishing a gradient of species loss is how best to draw random communities from the total species pool for each extinction scenario. The total number of different k-species mixtures that can be drawn from a pool of n species is given by the binomial coefficient

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}.$$

For a pool of 16 species, the numbers of different 1-, 2-, 4-, 8- and 16-species compositions are 16, 120, 1820, 12,870 and 1, respectively. Realizing all of them on 1-mu plots would require a total area of about 10 km². Typically, this is beyond the scope of any single BEF experiment. Fortunately, testing for diversity effects does not require all species combinations, only a reasonable number of representative compositions per diversity level.
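The composition counts and the area estimate can be checked with Python's `math.comb`. The plot area of 25.8 × 25.8 m per 1-mu plot comes from the text; ignoring paths and buffer strips between plots is a simplifying assumption.

```python
from math import comb

POOL = 16
PLOT_AREA_M2 = 25.8 * 25.8  # one 1-mu plot, about 666 m^2

levels = [1, 2, 4, 8, 16]
n_compositions = {k: comb(POOL, k) for k in levels}  # binomial coefficient C(16, k)
total_plots = sum(n_compositions.values())
area_km2 = total_plots * PLOT_AREA_M2 / 1e6           # m^2 -> km^2

print(n_compositions)                  # {1: 16, 2: 120, 4: 1820, 8: 12870, 16: 1}
print(total_plots, round(area_km2, 1)) # 14827 9.9
```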
Such subsets can be chosen in a nested way, by successively leaving out species from more diverse to less diverse communities, for example along an extinction scenario of 16-8-4-2-1 species per plot (Table 1, no. 12). In BEF experimental designs without such purposeful nesting (fully randomly drawn designs), the number of diversity levels and plots per level in which a species occurs differs widely between species. This causes nonorthogonality between the presence or absence of particular species and diversity levels (see e.g. Roscher et al. 2004) and prevents analyses of species-specific responses to biodiversity. Furthermore, fully randomly drawn, non-nested designs have increasingly overlapping species compositions at higher diversity levels, leading to the so-called variance-reduction effect (Huston 1997; Bell et al. 2009).
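The variance-reduction effect can be made explicit. When two k-species compositions are drawn independently at random from an n-species pool, the number of species they share follows a hypergeometric distribution with mean k²/n, so the expected shared fraction k/n grows with richness. The Monte Carlo sketch below (function name and simulation settings are illustrative) checks this for a 16-species pool; by contrast, compositions at the same level of one nested broken-stick scenario share no species at all.

```python
import random


def mean_overlap(pool_size, k, n_pairs=20_000, seed=0):
    """Monte Carlo mean number of species shared by two independently drawn
    random k-species compositions from the same pool (fully random design)."""
    rng = random.Random(seed)
    pool = range(pool_size)
    shared = 0
    for _ in range(n_pairs):
        a = set(rng.sample(pool, k))
        b = set(rng.sample(pool, k))
        shared += len(a & b)
    return shared / n_pairs


POOL = 16
for k in (2, 4, 8, 16):
    expected = k * k / POOL  # hypergeometric mean overlap
    print(f"k={k:2d}  expected={expected:5.2f}  simulated={mean_overlap(POOL, k):5.2f}  "
          f"shared fraction={k / POOL:.2f}")
```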
In BEF-China, we ensured that all species are equally represented at each diversity level (Table 2) by using a 'broken-stick' design. In this design (Fig. 5a), the starting species composition was randomly partitioned into nonoverlapping fractions (Bell et al. 2005, 2009; Salles et al. 2009). At each of the two sites of BEF-China, the three partly overlapping starting compositions (species sets; Table 2, Table A1) of 16 species each were randomly broken down into nonoverlapping eight-species compositions (Table 1, no. 13). To include every community of lower diversity as a subset of a community of higher diversity, we continued the partitioning for the eight-species compositions and so on, thus obtaining unique random extinction scenarios down to the monoculture of every species (Fig. 5a).
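The recursive partitioning can be sketched as follows. This is an illustrative reconstruction rather than the script used to design BEF-China; the species labels and the function name are hypothetical, and the set size is assumed to be a power of two, as with the 16-species sets.

```python
import random


def broken_stick(species, seed=None):
    """Recursively split a species set into random, non-overlapping halves.

    Returns {richness: [compositions]}. Every composition is nested in exactly
    one composition of twice its richness, and each species appears exactly
    once at every richness level.
    """
    n = len(species)
    if n < 1 or n & (n - 1):
        raise ValueError("number of species must be a power of two")
    rng = random.Random(seed)
    scenario = {n: [sorted(species)]}
    current = [list(species)]
    while len(current[0]) > 1:
        halves = []
        for comp in current:
            shuffled = comp[:]
            rng.shuffle(shuffled)  # random order, then cut in the middle
            mid = len(shuffled) // 2
            halves += [sorted(shuffled[:mid]), sorted(shuffled[mid:])]
        scenario[len(halves[0])] = halves
        current = halves
    return scenario


species = [f"sp{i:02d}" for i in range(1, 17)]  # hypothetical 16-species set
scenario = broken_stick(species, seed=42)
for richness, comps in scenario.items():
    print(richness, comps)
```

Running the function on each of several partly overlapping starting sets, with different seeds, would yield one independent random extinction scenario per set.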

Nonrandom or trait-oriented extinction scenarios
Most BEF experiments so far have used random extinction scenarios because species differences in extinction proneness predicted under different future environmental scenarios are largely unknown (Schmid & Hector 2004; but see Schläpfer, Pfisterer & Schmid 2005). However, random extinction scenarios may underestimate (Zavaleta & Hulvey 2004) or overestimate (Schläpfer, Pfisterer & Schmid 2005) the effects of real-world biodiversity loss on ecosystem functioning, because extinction in the real world might be biased towards species with particular features (e.g. Grime 2002; Lepš 2004; Schmid & Hector 2004; Solan et al. 2004). For example, Grime (2002) pointed out that stress-tolerant species, characterized by slow growth and long life span, may be more prone to extinction with increasing human disturbance. Therefore, in BEF-China, we added two nonrandom (in other words 'trait-oriented', 'directed', 'biased' or 'informed') extinction scenarios (Table 2, no. 11), one based on local rarity and one on specific leaf area (SLA). Eliminating species sequentially from the rarest to the second-most common can be considered a simulation of what might happen during habitat fragmentation and reduction, two ongoing processes in Chinese subtropical forests (Wang, Kent & Fang 2007). We expect that at least some ecosystem functions may be affected differently by the preferential loss of rare tree species than by random species loss. For example, Schuldt et al. (2012) showed that herbivory on tree saplings in the GNNR increased with increasing local commonness of tree species.
Specific leaf area was chosen as the key trait for the second nonrandom extinction scenario because it reflects the leaf economics spectrum (LES), from long-lived, nutrient-conserving leaves (low SLA) to short-lived ones with high nutrient contents and high net photosynthetic capacity (high SLA; Reich et al. 1995; Wright et al. 2004). Moreover, Osnas et al. (2013) demonstrated that many mass-based leaf traits, such as photosynthetic capacity (Amax), dark respiration rate (Rdark) and nitrogen and phosphorus content, covary with the LES mainly because of their proportionality to leaf area. Community-weighted mean SLA typically decreases during succession in these subtropical forests as fast-growing species are replaced by more slowly growing species. We expect that with proceeding succession or increasing temperature or dryness through climate change, species with high SLA (i.e. deciduous ones) should go extinct first. Information on rarity and SLA was obtained from the previously mentioned 27 CSPs in the nearby GNNR, first based on expert knowledge (Fang Teng, pers. comm.) and later corroborated by extensive trait measurements.
The trait-oriented extinction scenarios were derived from starting compositions of the 20 most common species or the 20 species with the lowest SLA, assembled from the total pool of 24 species at each of the two sites. Rare species or species with high SLA, respectively, were sequentially eliminated to yield decreasing diversity levels of 16-, 8-, 4- and 2-species mixtures (Fig. 5b), the monocultures already being available from the random extinction series (Table 2). Three different species compositions were constructed at each diversity level from a set of species that shared the same degree of commonness or magnitude of SLA, respectively. In total, the rarity and the SLA scenario each comprised 24 different 1-mu plots at each of the two sites (Table 2). With this sample size (48 plots per site), the experiment has a statistical power of 0.9 (Fig. 2) to detect an existing biodiversity effect comparable with that found in our CSPs. In addition, we will be able to compare the strength of diversity effects between random and trait-oriented scenarios.
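The ordering principle behind the trait-oriented scenarios can be sketched as nested species sets in which species are lost in trait order. This is a deliberate simplification: the BEF-China scenarios also built three different compositions per diversity level from species with similar commonness or SLA, which the sketch omits. The trait values, species names and function name below are hypothetical.

```python
def directed_extinction(traits, levels=(16, 8, 4, 2), lose_high=True):
    """Nested species sets in which species are lost in trait order.

    traits    -- {species: trait value}, e.g. SLA or local abundance
    lose_high -- True: species with the highest values are lost first (SLA
                 scenario); False: lowest values are lost first (rarity
                 scenario, with abundance as the trait).
    Returns {richness: sorted species list} for each requested level.
    """
    # Rank species so that those retained longest come first.
    ranked = sorted(traits, key=traits.get, reverse=not lose_high)
    if max(levels) > len(ranked):
        raise ValueError("more species requested than available")
    return {k: sorted(ranked[:k]) for k in levels}


# Hypothetical SLA values (m^2 per kg) for eight species.
sla = {"sp_a": 9.5, "sp_b": 12.0, "sp_c": 15.1, "sp_d": 18.3,
       "sp_e": 21.0, "sp_f": 24.7, "sp_g": 27.2, "sp_h": 30.9}
sets = directed_extinction(sla, levels=(8, 4, 2))
print(sets)
```

Unlike the broken-stick scenarios, each lower level here is a single, deterministic subset of the level above, which is what makes the scenario 'directed'.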

Genetic diversity within tree species and communities
Genetic variation within tree species can strongly affect variation in tree performance and tree responses to different environments (Whitham et al. 2003). However, most BEF experiments have focused on manipulating species richness and ignored population-genetic variation within species (Balvanera et al. 2006), implicitly assuming that it is equally distributed among plots. With some additional initial effort, it would often be possible to consider population-genetic composition and diversity, at least to some degree (Table 1, no. 15).
In our BEF-China experiment, we included trees of known maternal seed families at one of our two sites (site B). We can thus test whether the population-genetic composition of tree populations affects the response to different species diversity levels or environmental variables (e.g. our ecoscape), including variables affected by species diversity itself, such as changed light or moisture levels. If this is the case, as suggested for example by different responses of different seed families, it implies that species diversity may affect the evolution of the constituent species (Lipowsky, Schmid & Roscher 2011).
We also address whether the population-genetic diversity of the tree populations in the experimental communities affects tree performance, for example because lower genetic diversity leads to higher herbivore or pathogen loads (Henery 2011) and altered multitrophic interactions (Moreira & Mooney 2013). To this end, we manipulated the genetic diversity of 24 experimental plots at site B by establishing them, for each species, with trees from either a single or several seed families. The resulting factorial combination of species diversity (1 or 4 species) and genetic diversity (1 or 4 seed families per species) will allow us to examine, for example, whether effects of plant species diversity are reduced in genetically depauperate communities.

Measurements and data
The design of forest BEF experiments implies that data can be obtained at several levels of interest, mainly the plot, individual-tree and within-tree levels. Examples are measurements of soil lipids at the plot level (Wu et al. 2012), growth rates of individual trees (X.F. Li et al. in preparation) or leaf toughness (Schuldt et al. 2012) of individual leaves belonging to those trees. Addressing interdisciplinary questions at the level of the ecosystem requires adequate combination and aggregation of such data, which show the long tail typical of extensive data set compilations (Heidorn 2008): few data sets with very large amounts of similarly structured data (e.g. from climate loggers) and many smaller data sets with single or few measures of plots, species per plot or trees per plot. As measurements are taken by different researchers or research teams based in different scientific laboratories across China and Europe, it is crucial to manage naming conventions simply, reliably and online. Therefore, building upon the Ecological Metadata Language (Fegraus et al. 2005; Michener & Jones 2012), we developed a web application tailored to the needs of BEF-China that manages naming conventions within primary data. Our application is open source, developed in concert with an R package to access data (BEF-China Research Group 2013), and can also be adopted by other collaborative research platforms.
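Aggregating measurements from leaves to trees, populations and plots is a repeated group-and-average operation. The pure-Python sketch below uses hypothetical records, field order and function names; it is not part of the BEF-China data portal or its R package. Averaging population means gives every species in a plot equal weight, whereas averaging all trees directly would weight species by their number of measured individuals.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical leaf-level records: (plot, species, tree, leaf toughness)
leaves = [
    ("P01", "sp_a", "t1", 1.2), ("P01", "sp_a", "t1", 1.4),
    ("P01", "sp_a", "t2", 1.0), ("P01", "sp_b", "t3", 2.0),
    ("P02", "sp_a", "t4", 1.6), ("P02", "sp_a", "t4", 1.8),
]


def aggregate(records, key_len):
    """Average the last field over groups defined by the first key_len fields."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[:key_len]].append(rec[-1])
    return [(*key, mean(vals)) for key, vals in groups.items()]


trees = aggregate(leaves, 3)           # leaf -> individual tree
populations = aggregate(trees, 2)      # tree -> species population within plot
plot_means = aggregate(populations, 1) # population -> plot (species weighted equally)
print(plot_means)
```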

Statistical analyses of BEF experiments
The design aspects discussed above also have consequences for the statistical analyses of BEF experiments (Bell et al. 2009; Hector, von Felten & Schmid 2010). Basically, data from forest BEF experiments can be analysed at three levels: that of the plot, the population and the individual plant. Statistically speaking, the population level corresponds to the combination 'plot identity × species identity' and the individual level to the combination 'plot identity × species identity × plant identity'. However, because monocultures contain only one species per plot, care has to be taken when analysing species responses at these levels. We therefore recommend separate contrasts between species within monocultures and within mixtures, allowing them to be tested at the corresponding error levels (plot for monocultures and population for mixtures). Analyses at the population and individual level may include all variation from the plot level by using a single random term for plot identity; this can be convenient when focusing on within-plot effects.
The above procedure is possible because the random term for plot identity in a BEF experiment contains all information that can be used for the plot-level analysis; that is, it also includes variation among plots due to species diversity or species composition. To address the effects of diversity or composition, we therefore aim to use contrasts within the plot term. In the case of replicated species compositions, the first contrast is the one between the random term species composition, or 'community', and the remainder, that is, residual plot variation (equivalent to plot identity within community). The random term community can then be further split into fixed terms of interest and a remainder, that is, residual community variation. Typically, such fixed-term contrasts are monocultures vs. mixtures, (log-)linear species richness and species richness as a multilevel factor; these three may also be incorporated into the analysis as sequential contrasts, one nested within the other.
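The partitioning of the plot term can be illustrated with a plot-level sums-of-squares decomposition. The sketch below (hypothetical function name and data) splits plot variation into a log2-richness contrast, the remaining among-composition variation and plots within compositions, and tests richness against the among-composition residual rather than against plot residuals. It is a fixed-effects, sequential (type I) sketch that is most straightforward for balanced designs; unbalanced data or further random terms would normally call for mixed-model software, and a p-value would come from an F distribution (for example `scipy.stats.f.sf`), omitted here to keep the block dependency-free.

```python
import math
from collections import defaultdict
from statistics import mean


def richness_contrast(plots):
    """Sequential partition of plot-level variation.

    plots -- list of (composition id, species richness, response), one per plot.
    Returns sums of squares, degrees of freedom and the F ratio for log2
    richness tested against residual among-composition variation.
    """
    y = [v for _, _, v in plots]
    x = [math.log2(r) for _, r, _ in plots]
    n = len(y)
    y_bar, x_bar = mean(y), mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_richness = sxy ** 2 / sxx  # regression SS of the log2-richness contrast

    groups = defaultdict(list)
    for comp, _, v in plots:
        groups[comp].append(v)
    # Plots within compositions (replicates of the same community).
    ss_within = sum(sum((v - mean(vs)) ** 2 for v in vs) for vs in groups.values())
    # Among-composition variation not explained by richness.
    ss_comp_resid = ss_total - ss_richness - ss_within

    df_comp_resid = len(groups) - 2
    df_within = n - len(groups)
    f_richness = ss_richness / (ss_comp_resid / df_comp_resid)
    return {"SS_richness": ss_richness, "SS_comp_resid": ss_comp_resid,
            "SS_within": ss_within, "df": (1, df_comp_resid, df_within),
            "F_richness": f_richness}


# Hypothetical data: six compositions (two per richness level), two plots each.
data = [("C1", 1, 4.0), ("C1", 1, 4.4), ("C2", 1, 5.0), ("C2", 1, 5.2),
        ("C3", 2, 5.5), ("C3", 2, 6.1), ("C4", 2, 6.8), ("C4", 2, 6.4),
        ("C5", 4, 7.0), ("C5", 4, 7.6), ("C6", 4, 8.1), ("C6", 4, 7.9)]
print(richness_contrast(data))
```

Further contrasts, such as monocultures vs. mixtures fitted before log2 richness, could be inserted in the same sequential way, each taking one degree of freedom from the among-composition term.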
Other fixed-term contrasts may be crossed with these, for example the presence within communities of particular functional groups or species (e.g. Yang et al. 2013). Additional levels enter the statistical analyses when repeated measures are made on plots, populations or individual plants. These include measures of the same dependent variable at different time points or in different vertical positions as well as measures of different variables. Sometimes repeated measures can be aggregated into single-value measures of change (e.g. slope over time), variation (e.g. stability as the inverse of the coefficient of variation) or multifunctionality (Zavaleta et al. 2010; Isbell et al. 2011).
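Two of the single-value summaries mentioned above are easy to compute from a repeated-measures series: an ordinary least-squares slope over time and stability as the mean divided by the standard deviation (the inverse of the coefficient of variation). The function names and the series below are hypothetical.

```python
from statistics import mean, stdev


def slope(times, values):
    """Ordinary least-squares slope of values on time (change per time unit)."""
    t_bar, v_bar = mean(times), mean(values)
    num = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def stability(values):
    """Temporal stability: mean divided by sample standard deviation (1 / CV)."""
    return mean(values) / stdev(values)


years = [2010, 2011, 2012, 2013, 2014]
basal_area = [0.8, 1.5, 2.1, 2.9, 3.6]   # hypothetical plot basal area, m^2 per ha
litterfall = [2.1, 2.6, 2.3, 2.8, 2.4]   # hypothetical annual litterfall, t per ha
print(slope(years, basal_area), stability(litterfall))
```

Stability is most meaningful for series without a strong trend; for a steadily growing quantity such as basal area, the trend itself inflates the standard deviation, which is why the slope is the more natural summary there.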

Outlook
With our short manual on forest BEF experiments, we hope to contribute to the continued development of this new type of research in ecology and environmental studies. Forest BEF experiments provide exciting and timely research options that require interdisciplinary collaboration and laborious set-ups. Achieving specific research goals and synergy with previous experiments requires manifold design decisions and calls for investing appropriate time and expertise when designing new experiments. Because of the complexity of issues related to the multifaceted aspects of biodiversity, the large effort in setting up and managing an experiment and the involvement of multiple research teams from different fields of expertise, careful planning and transparent decisions should be the guiding principles.
We do not suggest that all future forest BEF experiments should follow or build on the approach we used in BEF-China. Rather, it will be important that a diversity of approaches is followed to answer the diversity of questions facing us today in the context of global change and biodiversity loss. With our short manual, we aim to indicate which issues typically come up in BEF experiments and which reasoning can be involved in choosing between different design, measurement and analysis options. Ideally, experiments such as BEF-China should be interdisciplinary platforms that complement each other in generating information and knowledge to tackle global problems, providing scientists with the opportunity to 'think big'. BEF-China aims to be such an open research platform and encourages researchers worldwide to participate.