A comprehensive and comparative evaluation of primers for metabarcoding eDNA from fish
Abstract
- Accurate assessments of fish species diversity and community composition are essential for understanding fish ecology and for conservation management. Environmental DNA (eDNA) metabarcoding has become an integral method for monitoring fish species. The accuracy and efficacy of eDNA metabarcoding rely heavily on the choice of primers used for PCR amplification. A wide selection of metabarcoding primers for fish has been developed. However, no comprehensive, comparative evaluation exists of how they amplify and taxonomically classify a rich diversity of fish species, which hinders informed decisions about their suitability for different study systems.
- Here we reviewed the literature and compiled a list of 22 primer sets used for eDNA-based metabarcoding of teleost fish. We compared their performance using in silico PCR, followed by in vitro metabarcoding of eDNA from waterbodies in Beijing, which harbour a high number of freshwater fish species.
- We found that the primers showed considerable differences in the amplified taxonomic ranges and proportions, fish taxa richness, species discrimination power and fish community compositions, both in silico and in vitro. The number of fish taxa detected from eDNA by the primer sets varied from 0 to 66. Primers targeting the 12S rRNA gene generally detected greater fish diversity than those targeting the 16S rRNA or COI genes, while primers targeting the cytochrome b gene amplified the fewest fish taxa in vitro.
- Regarding target genes, 12S primers generally outperformed other primers in terms of amplified fish diversity. The results of in silico PCR and in vitro tests were not always in agreement, suggesting that primer choice for biodiversity surveys should not be based solely on in silico evaluation. The use of different primers can qualitatively and quantitatively affect the detected biodiversity and these effects should be considered in experimental design and data interpretation. These results will assist with primer selection for eDNA-based fish surveys, and consequently support conservation of freshwater biodiversity.
Abstract (Chinese-language version)
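- Accurate assessment of fish species diversity and community composition is essential for understanding fish ecology and for conservation management. Environmental DNA (eDNA) combined with metabarcoding is an emerging method for detecting fish. Among the many factors that affect the accuracy and effectiveness of eDNA-based detection, the choice of PCR primers and barcode regions is critical. As eDNA methods have developed, different researchers have designed a variety of universal primers for fish metabarcoding. No standardised experimental workflow exists for these primers, and they have not been comprehensively evaluated with eDNA samples of high species diversity. It is therefore difficult to compare how well different primers amplify and how well the corresponding metabarcodes detect fish diversity. This hinders the selection of the most suitable primers for fish research in different ecosystems.
- In this study, we first screened the literature to obtain 22 primer pairs used in eDNA metabarcoding studies of teleost fish. We then compared their performance comprehensively using computer-simulated (in silico) PCR and in vitro metabarcoding of eDNA extracted from waterbodies in Beijing.
- Both the in silico and in vitro results showed clear differences among these primers. They differed in the range and proportion of amplified taxonomic groups, detected fish species richness, species resolution and fish community composition. In the eDNA metabarcoding analysis, the number of fish taxa detected by the primer pairs ranged from 0 to 66. Most primers targeting the mitochondrial 12S region detected higher fish diversity than those targeting the 16S and COI regions, whereas primers targeting the Cytb region amplified the fewest fish taxa.
- In summary, with respect to target genes, primers targeting the 12S region generally amplified fish diversity better than primers targeting other regions. The in silico and in vitro evaluations of the primers differed, so primer selection should not rely solely on in silico results. The use of different primers can affect the detected biodiversity both qualitatively and quantitatively, and these effects should be fully considered in experimental design and data interpretation. The results of this study provide a basis for primer selection in eDNA-based fish diversity surveys and technical support for biodiversity conservation in freshwater ecosystems.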
1 INTRODUCTION
With over 34,000 species described to date (http://www.fishbase.org/, accessed in November 2019), fish comprise more than half of all vertebrates and possess vital ecological and economic value (Nelson, 2006). However, a large proportion of the world's fish species is rapidly disappearing. The causes are many, including habitat disruption, overexploitation, climate change, pollution, infectious diseases and invasions by non-native species (Jelks et al., 2008; Jeppesen et al., 2010; Jones et al., 2004; Reid et al., 2019). Effective conservation and management of fish diversity rely on a deep understanding of the ecology and dynamics of fish communities. This is only possible if fish community assemblages can be accurately and efficiently assessed in freshwater and marine ecosystems (Rees et al., 2014). Surveying large aquatic environments is particularly challenging and often requires specialised equipment and extensive fieldwork. Traditional fish surveys are generally capture based (e.g. netting, cage-trapping and electrofishing). These methods are invasive for the biological community under study, which runs counter to the very aim of biodiversity conservation. Furthermore, morphology-based species identification is error-prone for closely related taxa and early life stages. It also requires substantial taxonomic expertise, which is currently in great shortage (Wheeler et al., 2004).
Environmental DNA (eDNA)-based species detection has revolutionised the manner in which biodiversity is surveyed and has proven to be a highly effective, efficient, economical and non-invasive approach to biomonitoring (Bohmann et al., 2014; Deiner et al., 2017; Taberlet et al., 2012). Coupled with metabarcoding (i.e. the simultaneous amplification of DNA from multiple organisms using universal PCR primers) and high-throughput sequencing technologies, eDNA methods have demonstrated great success in biodiversity surveys by revealing comparable or higher species richness and uncovering hidden biodiversity at a fraction of the cost of traditional surveys, thus being increasingly integrated into aquatic biomonitoring (Miya et al., 2015; Shaw et al., 2016; Valentini et al., 2016).
Since eDNA-based biomonitoring is still a novel and rapidly advancing field, many aspects of eDNA methods await validation and adaptation to specific systems and research questions (Diaz-Ferguson & Moyer, 2014; Goldberg et al., 2016). Among the existing technical issues, the choice of PCR primers for amplifying target sequences (i.e. barcodes) is possibly one of the most influential. It largely determines the detection probability of specific species (species-specific detection) or of taxonomic groups (community surveys; Alberdi et al., 2018; Freeland, 2017). For species-specific detection, inefficient primer binding to DNA of the target species can lead to a lack of amplification, causing false-negative identifications. Poor primer specificity can lead to amplification of non-target sequences, causing false-positive identifications.

For community surveys, universal primers are applied to amplify DNA from a group of taxonomically close organisms that share conserved primer-binding sequences. The amplified barcodes, in turn, should contain sites that vary between species, allowing taxonomic assignment (Valentini et al., 2009). Ideally, universal primers should possess high specificity for the target group, broad intra-group coverage and even amplification across species. They should also have high discriminatory power (i.e. sufficient sequence variation between amplicons from different species) to ensure comprehensive and accurate species identification (Coissac et al., 2012; Riaz et al., 2011). Environmental samples often contain only trace amounts of highly degraded DNA from the study organisms, so a short barcode (usually <200 bp) is likely to have a higher PCR success rate (Bylemans et al., 2018; Freeland, 2017). Finally, the effectiveness of metabarcoding-based biodiversity analyses depends heavily on the completeness and accuracy of the relevant reference databases. Species that are missing from the reference cannot be taxonomically assigned.
These criteria for metabarcoding primers may not all be fulfilled by a single primer set. They are high taxonomic specificity and coverage, low species bias, high taxonomic resolving power and high-quality reference databases. For instance, longer barcodes may contain more interspecific variation and thus allow more robust species identification, but they may be prone to higher amplification failure rates (Bylemans et al., 2018). Similarly, highly degenerate primers can provide superior taxonomic coverage, but likely at the expense of taxonomic specificity (i.e. with more non-target amplification). Therefore, primer characteristics and potential biases should be evaluated prior to full-scale surveys. The most suitable primers should then be chosen for each study system to ensure effective profiling of the community of interest.
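The trade-off between degeneracy and specificity becomes concrete when one counts how many distinct oligonucleotides a degenerate primer encodes. The sketch below is illustrative only and is not part of the evaluation pipeline used in this study. It applies the standard IUPAC ambiguity codes to primer sequences from Table 1. It deliberately rejects inosine (I), which is a single universal base rather than a mixture and so does not fit this simple multiplication.

```python
# Standard IUPAC nucleotide ambiguity codes and the bases each one represents.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Return the number of distinct oligonucleotides encoded by a primer.

    Each ambiguity code multiplies the count by the number of bases it
    represents; e.g. one R (A/G) and one Y (C/T) give 2 x 2 = 4 variants.
    """
    count = 1
    for base in primer.upper():
        if base not in IUPAC:
            raise ValueError(f"unsupported symbol {base!r} (e.g. inosine 'I')")
        count *= len(IUPAC[base])
    return count


if __name__ == "__main__":
    primers = {  # sequences copied from Table 1
        "Pr02_Tele01 forward": "ACACCGCCCGTCACTCT",
        "Pr17_L14912 reverse": "GGTGGCKCCTCAGAAGGACATTTGKCCYCA",
        "Pr20_L14735c2 reverse": "GCDCCTCARAATGAYATTTGTCCTCA",
        "Pr21_PS1 reverse": "ACGCCACCGAGCCARAARCTYATRTTRTTYATTCG",
    }
    for name, seq in primers.items():
        print(f"{name}: {degeneracy(seq)} variants")
```

This reports 1, 8, 12 and 64 variants respectively. Each additional variant can bind a slightly different template, which broadens coverage but also widens the range of non-target sequences that may be amplified.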
Fish communities play key roles in aquatic ecosystem function and in the fishing industry. It is therefore not surprising that much effort has been devoted to investigating freshwater and marine fish assemblages using eDNA metabarcoding (e.g. Valentini et al., 2016; Yamamoto et al., 2017). Different research groups have developed a number of universal primers for fish community surveys, each successful in describing local fish diversity. However, it is difficult to compare the amplification performance and detection power of these primers across studies. Most studies have conducted eDNA-based fish community surveys using a single primer set, without prior evaluation of potential primer bias for the study communities (e.g. Balasingham et al., 2017; Kelly et al., 2014). Some studies have compared the performance of several universal primer sets, but only by in silico PCR without subsequent in vitro evaluation (e.g. Valentini et al., 2016). More recently, a few studies have tested more than one primer pair using eDNA metabarcoding. These studies evaluated only a small number (2–6) of primer sets, mostly with samples of relatively low fish diversity (Bylemans et al., 2018; Collins et al., 2019; Evans et al., 2016, 2017; Hänfling et al., 2016; Shaw et al., 2016). For researchers who are new to the field and wish to survey fish diversity with eDNA-based methods, choosing from dozens of published primers can be overwhelming and largely guesswork. Primer amplification efficacy and coverage can vary considerably with the complexity and taxonomic composition of the study system (Bellemain et al., 2010; Clarke et al., 2014). A thorough evaluation of primer efficacy and bias therefore requires a standardised, comparative test of a comprehensive panel of primers on a taxonomically rich DNA pool. Such a test would greatly aid primer selection for fish biodiversity research.
The overall objective of the present study was to evaluate and compare several properties of metabarcoding primers for eDNA-based fish community surveys: amplification efficiency, taxonomic specificity and coverage, and species resolution power. We aimed to provide guidance for researchers interested in investigating fish diversity using eDNA. We first searched Web of Science for published primer sets that had been used in eDNA-based fish metabarcoding. We then evaluated their performance by in silico PCR against all standard sequences in the EMBL database. To evaluate primer performance in vitro, we conducted metabarcoding analysis with these primers on pooled eDNA extracted from over 100 waterbodies in Beijing, which collectively harbour almost 100 native and introduced freshwater fish species (Zhang et al., 2011). We compared primer performance in terms of taxonomic specificity, fish coverage and species resolution in both the in silico PCR and the in vitro metabarcoding analysis. On this basis, we make recommendations for primer selection for eDNA-based fish biodiversity surveys.
2 MATERIALS AND METHODS
2.1 Literature review and primer search
To compile a comprehensive list of primers currently used in eDNA-based fish community studies, we searched the Web of Science Core Collection on 1 November 2018 using the search topic: fish AND (community OR diversity) AND ‘eDNA’ AND (barcoding OR metabarcoding). This search returned 66 papers. We screened them for primers described as targeting teleost fish in general (Class Actinopterygii or Infraclass Teleostei) or higher taxonomic groups that include fish (e.g. vertebrates). The aim of the present study was to compare primer performance for metabarcoding analysis of teleost-dominated freshwater fish communities. We therefore excluded primers described as specific to elasmobranchs (sharks and rays; e.g. primers in Bakker et al., 2017; Chang et al., 2017).
2.2 In silico PCR analysis
To construct a sequence database for in silico PCR evaluation, we first downloaded all standard sequences from the EMBL-European Nucleotide Archive (embl_r138; ftp://ftp.ebi.ac.uk/pub/databases/embl/release/std/). On 14 January 2019, we also retrieved taxonomic information from the NCBI (ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz). We then used these sequences and taxonomic information, together with the OBICONVERT command in the OBITools software package (Boyer et al., 2016), to construct a sequence database organised by taxonomic group. We evaluated the amplification coverage and taxonomic resolution of the fish metabarcoding primers using in silico PCR implemented in the ECOPCR program (Ficetola et al., 2010), with the following settings:

(a) an amplicon length of 15‒500 bp (excluding primers); and
(b) a maximum of three mismatches between each primer and the target sequence, with no mismatches in the last two nucleotides at the 3ʹ end of the primer (Bellemain et al., 2010; Epp et al., 2012).

The in silico PCR was first conducted against all taxa to assess the taxonomic specificity of the primers. The database contains different numbers of sequences for each species. To avoid bias from this variation, we recorded the number of unique NCBI taxonomy IDs (TaxIDs; each representing a distinct taxon) for the amplified taxa, rather than the number of sequences. To further evaluate the primers' taxonomic coverage and species resolution for fish, we conducted in silico PCR for Actinopterygii (NCBI TaxID 7898). All sequences within the tested fish taxa were retained. We used the ECOTAXSTAT command to calculate the number of amplified species, and the ECOTAXSPECIFICITY command to assess the resolving power of the primers at the species level (both in the OBITools software package; Boyer et al., 2016). A species was considered unambiguously assigned if its sequences differed by at least 1 bp from the amplified sequences of all other species, regardless of amplicon size.
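The primer-matching constraints applied in ECOPCR above can be sketched as a naive scan over a single sequence. These are at most three mismatches per primer, none in the two 3ʹ-terminal nucleotides, and an insert of 15–500 bp. This is a hypothetical, simplified illustration and not ECOPCR's algorithm or interface. It scans one strand only, reports every forward/reverse site pair it finds, does not handle inosine and ignores taxonomy. The template in the usage block is synthetic.

```python
# Standard IUPAC ambiguity codes (primer side); template bases are compared literally.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement, preserving IUPAC ambiguity codes."""
    return seq.translate(COMPLEMENT)[::-1]


def binds(primer: str, site: str, max_mm: int = 3, exact_3p: int = 2) -> bool:
    """Primer and site are both written 5'->3' on the same strand."""
    mism = [base not in IUPAC[p] for p, base in zip(primer, site)]
    if exact_3p and any(mism[-exact_3p:]):  # no mismatch allowed at the 3' end
        return False
    return sum(mism) <= max_mm


def in_silico_pcr(template, fwd, rev, max_mm=3, min_len=15, max_len=500):
    """Return inserts (primers excluded) amplified from one template strand."""
    template = template.upper()
    amplicons = []
    for i in range(len(template) - len(fwd) + 1):
        if not binds(fwd, template[i:i + len(fwd)], max_mm):
            continue
        start = i + len(fwd)
        last_j = min(start + max_len, len(template) - len(rev))
        for j in range(start + min_len, last_j + 1):
            # The reverse primer anneals to this strand, so its binding site
            # is the reverse complement of the template window.
            if binds(rev, revcomp(template[j:j + len(rev)]), max_mm):
                amplicons.append(template[start:j])
    return amplicons


if __name__ == "__main__":
    fwd, rev = "ACACCGCCCGTCACTCT", "CTTCCGGTACACTTACCATG"  # Pr02_Tele01, Table 1
    toy = fwd + "ACGT" * 10 + revcomp(rev)  # synthetic template, not a real 12S sequence
    print(in_silico_pcr(toy, fwd, rev))
```

The 3ʹ rule is checked before the mismatch count. A template whose forward site differs from the primer only at its final base is therefore rejected, even though it has just one mismatch in total.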
We used the Krona program (command ktImportTaxonomy) to visualise the taxonomic compositions of the sequences amplified by in silico PCR (Ondov et al., 2011).
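The species-level criterion from the in silico PCR analysis can be expressed in a few lines: a species counts as unambiguously assigned only if none of its amplified sequences is identical to a sequence amplified from another species. The sketch below is our own illustrative rendering of that criterion, not the ECOTAXSPECIFICITY code. The species names and sequences in the usage block are toy data.

```python
from collections import defaultdict


def resolved_species(amplicons):
    """Return species whose amplicons are never shared with another species.

    `amplicons` maps a species name to the set of sequences amplified from it.
    Sequences of different lengths are automatically different, so the 1-bp
    criterion holds regardless of amplicon size.
    """
    owners = defaultdict(set)  # sequence -> species it was amplified from
    for species, seqs in amplicons.items():
        for seq in seqs:
            owners[seq].add(species)
    return {
        species
        for species, seqs in amplicons.items()
        if seqs and all(owners[seq] == {species} for seq in seqs)
    }


if __name__ == "__main__":
    toy = {  # hypothetical amplicons, not real barcodes
        "Cyprinus carpio": {"ACGTTGCA", "ACGTTGCT"},
        "Carassius auratus": {"ACGATGCA"},
        "Carassius gibelio": {"ACGATGCA", "ACGATGGA"},
    }
    ok = resolved_species(toy)
    print(sorted(ok), f"{len(ok) / len(toy):.1%} of amplified species resolved")
```

Here only the first species is resolved. The two Carassius entries share one amplicon, so neither can be assigned unambiguously, even though one of them also has a unique variant.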
2.3 Water sampling
To evaluate the ability of the primers to amplify actual fish DNA in vitro, we performed metabarcoding analysis using eDNA extracted from water sampled in Beijing (39°26′–41°03′N, 115°25′–117°30′E; area 16,800 km²). Beijing is rich in freshwater ecosystems, including more than 100 rivers and over 120 ponds, lakes and reservoirs (Pan, 2004). Its fish species are typical of the temperate zone of North China. Historical specimen records since the 1920s and more recent netting surveys have documented a total of 93 native and introduced fish species in the Beijing area, belonging to 13 orders, 23 families and 73 genera (Zhang et al., 2011). With 61 species, Cypriniformes is the largest order, within which Cyprinidae (51 species) and Cobitidae (10 species) are the predominant families. Perciformes (10 species) and Siluriformes (five species) are the next largest orders. Historical records compared with recent survey results indicate that native fish species have declined considerably since the mid-20th century. Introduced species, in contrast, currently dominate many waterbodies in Beijing (Zhang et al., 2011).
We collected water samples from 104 freshwater sites distributed throughout Beijing from April to June 2018. These comprised 55 river sites, 40 ponds and lakes, six reservoirs and three wetlands. At each site, three 1-L water samples were collected in disposable 1-L plastic bottles, 5–10 cm below the surface and approximately 0.5 m from the shore. Water samples were transported on ice to the laboratory at Peking University and filtered within 12 h.
2.4 eDNA preparation, PCR and sequencing
Water samples were filtered through 1.2-µm pore size mixed cellulose ester (MCE) membrane filters (47 mm diameter; Merck Millipore Ltd., Tullagreen, Carrigtwohill Co.). Filtrations were carried out in a clean room dedicated to eDNA processing. All equipment and utensils were cleaned with a 10% bleach solution before sample processing and between samples. Samples were processed in small batches (samples from two to eight sites per batch) to reduce cross-contamination. To control for contamination during filtration, a filtration blank (1 L ddH2O) was filtered before the samples from each site. Filters were stored in individual sterile microcentrifuge tubes at −20°C until DNA extraction. DNA was extracted from each filter using the DNeasy Blood and Tissue Kit (Qiagen GmbH) as described previously (Zhang et al., 2020). The pellets at the final step were rehydrated in 100 µl AE buffer (Qiagen). Equal volumes of DNA extracted from each water sample were pooled and used as template in the in vitro PCR.
Pr10_L2513 showed poor amplification in the in silico PCR and was excluded; the remaining 21 primer sets were evaluated by in vitro PCR and metabarcoding analysis. For each primer set, we carried out 18 replicate PCRs with the eDNA extract and six negative control PCRs containing all reagents except eDNA. Human DNA is often abundant in urban environments and can constitute a large proportion of PCR products from eDNA. Two primer sets (Pr01_12SV5 and Pr02_Tele01) were therefore each designed with a blocking oligonucleotide to inhibit the amplification of human DNA (Shehzad et al., 2012; Valentini et al., 2016; Table S1). For these two primer sets, we carried out 18 eDNA PCRs and six negative control PCRs both with and without the blocking oligos.

A ‘CC’ followed by a unique 7-nucleotide sequence tag was added to both the forward and reverse primers in each PCR. This allowed the different PCR products to be identified during sequencing data analysis (Coissac, 2012). Each uniquely tagged primer pair was used in three PCRs to ensure a sufficient quantity of PCR product for sequencing. Each primer set thus used six different tags for its 18 eDNA PCRs and two different tags for its six negative control PCRs. Each PCR was conducted in a total volume of 25 µl, containing:
- 6 µl pooled eDNA extract (diluted 10-fold to reduce PCR inhibitors);
- 0.2 µM each of forward and reverse primers;
- 4 µM blocking oligo (where applicable); and
- 0.4 µg/µl BSA in 1 × Premix Ex Taq (Takara Bio Inc.), which has 4.5-fold higher fidelity than standard Taq.

We first tested amplification of each primer set using two programs: the PCR program described in the original design paper (summarised in Table S1) and a general program used for eDNA amplification (see below).
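The tagging and replication scheme above can be sketched as a small plate-layout helper. For illustration only, it assumes that the same 7-nt tag is attached to both primers of a pair. The tag sequences in the usage block are hypothetical placeholders, not the tags used in this study.

```python
PCRS_PER_TAG = 3  # each uniquely tagged primer pair was used in three PCRs


def tagged_pair(tag: str, fwd: str, rev: str) -> tuple[str, str]:
    """Prefix 'CC' plus a 7-nt tag to both primers (5'->3').

    Assumption: the same tag is used on the forward and reverse primer.
    """
    if len(tag) != 7:
        raise ValueError("tags must be 7 nt long")
    return "CC" + tag + fwd, "CC" + tag + rev


def pcr_plan(tags, fwd, rev, n_edna=18, n_control=6):
    """List (sample type, tag, tagged forward, tagged reverse) for every PCR."""
    n_edna_tags = n_edna // PCRS_PER_TAG        # 18 / 3 = 6 tags
    n_control_tags = n_control // PCRS_PER_TAG  # 6 / 3 = 2 tags
    needed = n_edna_tags + n_control_tags
    unique_tags = list(dict.fromkeys(tags))     # de-duplicate, keep order
    if len(unique_tags) < needed:
        raise ValueError(f"need {needed} distinct tags")
    plan = []
    for k, tag in enumerate(unique_tags[:needed]):
        kind = "eDNA" if k < n_edna_tags else "negative control"
        f, r = tagged_pair(tag, fwd, rev)
        plan.extend([(kind, tag, f, r)] * PCRS_PER_TAG)
    return plan


if __name__ == "__main__":
    hypothetical_tags = ["ACACACA", "CACACAC", "GTGTGTG", "TGTGTGT",
                         "AGAGAGA", "GAGAGAG", "CTCTCTC", "TCTCTCT"]
    # Pr02_Tele01 primers from Table 1.
    plan = pcr_plan(hypothetical_tags, "ACACCGCCCGTCACTCT", "CTTCCGGTACACTTACCATG")
    for row in plan[:3]:
        print(row)
```

The helper yields 24 PCRs per primer set: 18 eDNA reactions over six tags and six controls over two tags, matching the counts given in the text.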
Agarose gel electrophoresis indicated that our general program consistently yielded stronger amplification than the original programs (data not shown). We therefore used the general program for all primer sets. It comprised an initial denaturation at 95°C for 10 min; 45 cycles of 95°C for 30 s, the annealing temperature (Ta) for 30 s and 72°C for 1 min; and a final elongation at 72°C for 10 min. To determine the optimal Ta for each primer set, we ran 6‒7 replicate PCRs with the general program, each at a different Ta. The Ta values formed a gradient spanning 10‒20°C, centred on the Ta used in the original design paper. The gradient PCR experiment was carried out twice to check the consistency of amplification patterns. PCR products were visualised by agarose gel electrophoresis, and the Ta giving the best amplification was chosen (Table 1).
| Primer code | Original name | Target groupᵃ | Target gene | Design paper | Amplicon size (bp)ᵇ | Ta (°C)ᶜ | Forward primer (5′–3′) | Reverse primer (5′–3′) |
|---|---|---|---|---|---|---|---|---|
| Pr01_12SV5 | 12SV5 | Vertebrata | 12S | Riaz et al. (2011) | 99 (5.99) | 60 | TAGAACAGGCTCCTCTAG | TTAGATACCCCACTATGC |
| Pr02_Tele01 | Teleo | Teleostei | 12S | Valentini et al. (2016) | 63 (9.40) | 60 | ACACCGCCCGTCACTCT | CTTCCGGTACACTTACCATG |
| Pr03_Tele02 | Tele02 | Teleostei | 12S | Taberlet et al. (2018) | 167 (6.62) | 63 | AAACTCGTGCCAGCCACC | GGGTATCTAATCCCAGTTTG |
| Pr04_MiFish-U | MiFish-U | Teleostei | 12S | Miya et al. (2015) | 171 (5.67) | 60 | GTCGGTAAAACTCGTGCCAGC | CATAGTGGGGTATCTAATCCCAGTTTG |
| Pr05_12SF1/R1 | 12SF1/R1 | Vertebrata | 12S | Riaz et al. (2011) | 106 (5.38) | 60 | ACTGGGATTAGATACCCC | TAGAACAGGCTCCTCTAG |
| Pr06_Ac12S | Ac12S | Actinopterygii | 12S | Evans et al. (2016) | 389 (8.15) | 65 | ACTGGGATTAGATACCCCACTATG | GAGAGTGACGGGCGGTGT |
| Pr07_AcMDB | AcMDB07 | Actinopterygii | 12S | Bylemans et al. (2018) | 281 (7.63) | 60 | GCCTATATACCGCCGTCG | GTACACTTACCATGTTACGACTT |
| Pr08_Fish16S | Fish16S | Fish | 16S | Shaw et al. (2016) | 68 (8.27) | 60 | GGTCGCCCCAACCRAAG | CGAGAAGACCCTWTGGAGCTTIAG |
| Pr09_16SF/D | Fish16SF/D-2R | Fish | 16S | DiBattista et al. (2017) | 203 (18.65) | 60 | GACCCTATGGAGCTTTAGAC | CGCTGTTATCCCTADRGTAACT |
| Pr10_L2513 | L2513/H2714 | Vertebrata | 16S | Kitano et al. (2007) | 202 (6.93) | 60 | GCCTGTTTACCAAAAACATCAC | CTCCATAGGGTCTTCTCGTCTT |
| Pr11_Ve16S | Ve16S | Vertebrata | 16S | Evans et al. (2016) | 312 (25.91) | 63 | CGAGAAGACCCTATGGAGCTTA | AATCGTTGAACAAACGAACC |
| Pr12_Ac16S | Ac16S | Actinopterygii | 16S | Evans et al. (2016) | 336 (14.47) | 63 | CCTTTTGCATCATGATTTAGC | CAGGTGGCTGCTTTTAGGC |
| Pr13_Vert-16S | Vert-16S | Vertebrata | 16S | Vences et al. (2016) | 264 (34.49) | 65 | AGACGAGAAGACCCYdTGGAGCTT | GATCCAACATCGAGGTCGTAA |
| Pr14_FishCB | FishCBL/CBR | Fish | Cytb | Thomsen et al. (2012) | 90 (0.28) | 60 | TCCTTTTGAGGCGCTACAGT | GGAATGCGAAGAATCGTGTT |
| Pr15_Fish2b | Fish2bCBR/CBL | Fish | Cytb | Thomsen et al. (2012) | 40 (0.03) | 60 | GATGGCGTAGGCAAACAAGA | ACAACTTCACCCCTGCAAAC |
| Pr16_Fish2deg | Fish2degCBL/CBR | Fish | Cytb | Thomsen et al. (2012) | 40 (0.02) | 58 | ACAACTTCACCCCTGCRAAY | GATGGCGTAGGCAAATAGGA |
| Pr17_L14912 | L14912/H15149 | Teleostei | Cytb | Miya and Nishida (2000) | 235 (0.47) | 54 | TTCCTAGCCATACAYTAYAC | GGTGGCKCCTCAGAAGGACATTTGKCCYCA |
| Pr18_L14841 | L14841/H15149 | Vertebrata | Cytb | Kocher et al. (1989) | 307 | 53 | AAAAAGCTTCCATCCAACATCTCAGCATGATGAAA | AAACTGCAGCCCCTCAGAATGATATTTGTCCTCA |
| Pr19_L14735c | L14735/H15149c | Vertebrata | Cytb | Burgener and Hübner (1998) | 413 (5.85) | 57 | AAAAACCACCGTTGTTATTCAACTA | GCCCCTCAGAATGATATTTGTCCTCA |
| Pr20_L14735c2 | L14735/H15149c2 | Vertebrata | Cytb | Hänfling et al. (2016) | 413 (5.84) | 57 | AAAAACCACCGTTGTTATTCAACTA | GCDCCTCARAATGAYATTTGTCCTCA |
| Pr21_PS1 | PS1 | Fish | COI | Balasingham et al. (2018) | 247 | 52 | ACCTGCCTGCCGTATTTGGYGCYTGRGCCGGRATAGT | ACGCCACCGAGCCARAARCTYATRTTRTTYATTCG |
| Pr22_Minibar | Uni-Minibar | Eukaryota | COI | Meusnier et al. (2008) | 127 (4.29) | 53 | TCCACTAATCACAARGATATTGGTAC | GAAAATCATAATGAAGGCATGAGC |
- ᵃ Target group as described in the original design paper.
- ᵇ Mean and standard deviation (in parentheses) of the amplicon size (excluding primers), estimated by the in silico PCR analysis. Pr18_L14841 and Pr21_PS1 yielded no sequences in our in silico PCR, so their mean amplicon sizes were retrieved from the original design papers.
- ᶜ Optimal annealing temperature determined in the present study.
PCR products of each primer set were pooled separately for sequencing library construction. The 24 PCR products (18 eDNA PCRs and six negative controls) of each primer set were mixed in equal volumes and purified using an EasyPure PCR Purification Kit (TransGen Biotech). DNA concentrations were measured on a NanoDrop 2000 spectrophotometer (Thermo Scientific). Further quality and quantity examination, library preparation and paired-end sequencing were carried out by the BGI Sequencing Service in Wuhan, China. All sequencing libraries were constructed using PCR-free adaptor ligation and BGI's own index system. In total, 23 Illumina sequencing libraries were constructed: 21 amplified using the different primer sets and two additional libraries amplified using primers with blocking oligos. Libraries containing PCR products under 220 bp (including primers) were sequenced on an Illumina HiSeq X Ten platform (Illumina Inc.). Libraries containing PCR products over 220 bp were sequenced on an Illumina HiSeq 2500 platform (Table S1).

We used two sequencing platforms because of their different trade-offs. The HiSeq X Ten is an ultra-high-throughput sequencer that generates data at considerably lower time and cost than other HiSeq platforms. However, its maximum read length is ~150 nucleotides (maximum paired-end sequence length ~300 bp). The HiSeq 2500, in contrast, can generate reads of up to 250 nucleotides from each end (maximum paired-end sequence length ~500 bp), but its per-base cost at BGI is over eight times that of the X Ten. Library preparation protocols were identical for the two sequencing platforms, except for the size of the target DNA.
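The platform choice follows from simple length arithmetic. The PCR product length (insert plus both primers, as in the text) is compared with the 220-bp cut-off. The expected overlap of the paired reads is twice the read length minus the product length. The sketch below uses the mean in silico insert sizes and primer sequences from Table 1. The text does not say whether the ‘CC’ + tag prefixes count towards the product length, so they are left out here as an assumption.

```python
CUTOFF_BP = 220                                     # threshold stated in the text
READ_LEN = {"HiSeq X Ten": 150, "HiSeq 2500": 250}  # max read length per end

# Mean insert size (bp, primers excluded) and primers (5'->3') from Table 1.
PRIMER_SETS = {
    "Pr02_Tele01": (63, "ACACCGCCCGTCACTCT", "CTTCCGGTACACTTACCATG"),
    "Pr04_MiFish-U": (171, "GTCGGTAAAACTCGTGCCAGC", "CATAGTGGGGTATCTAATCCCAGTTTG"),
    "Pr06_Ac12S": (389, "ACTGGGATTAGATACCCCACTATG", "GAGAGTGACGGGCGGTGT"),
}


def product_length(insert_bp: int, fwd: str, rev: str) -> int:
    """PCR product length including primers (tags not counted: an assumption)."""
    return insert_bp + len(fwd) + len(rev)


def choose_platform(product_bp: int) -> str:
    """Products under the cut-off go to the X Ten, the rest to the HiSeq 2500."""
    return "HiSeq X Ten" if product_bp < CUTOFF_BP else "HiSeq 2500"


def read_overlap(product_bp: int, platform: str) -> int:
    """Overlap of forward and reverse reads; must be > 0 for read pairing."""
    return 2 * READ_LEN[platform] - product_bp


if __name__ == "__main__":
    for name, (insert_bp, fwd, rev) in PRIMER_SETS.items():
        size = product_length(insert_bp, fwd, rev)
        platform = choose_platform(size)
        print(f"{name}: {size} bp -> {platform}, overlap {read_overlap(size, platform)} bp")
```

For Pr04_MiFish-U, the 219-bp product sits just under the cut-off and leaves about 81 bp of read overlap on the X Ten. Pr06_Ac12S (431 bp) would not be covered at all by 2 × 150-bp reads, and therefore needs the longer HiSeq 2500 reads (69 bp overlap).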
2.5 Bioinformatic processing of sequencing data
The raw sequencing reads were analysed using the OBITools software package (Boyer et al., 2016), which is commonly used for bioinformatic filtering of metabarcoding data (Bylemans et al., 2018; Valentini et al., 2016; West et al., 2020). The processing steps were as follows:

1. The ILLUMINAPAIREDEND program aligned forward and reverse reads, and the OBIGREP program removed aligned sequences with low (<40) quality scores.
2. The NGSFILTER program identified sequences with no mismatches in tags and at most two mismatches in primers, which were retained for further analysis.
3. The OBIUNIQ program merged identical sequences.
4. The OBIGREP program removed sequences shorter than 15 bp or with a total count of fewer than 10 reads in the sequence library.
5. The OBICLEAN program detected and removed putative PCR and sequencing errors. OBICLEAN identifies putative chimeras and erroneous sequences, such as indel or substitution errors, based on sequence abundance and similarity to the most common sequences (Boyer et al., 2016). We used an abundance ratio of 0.5 and a sequence difference of 1 to assign each sequence within a PCR the status ‘head’, ‘internal’ or ‘singleton’. ‘Internal’ sequences were discarded because they most likely represent errors (De Barba et al., 2014).
6. To eliminate sequences likely introduced by cross-contamination or tag jumps (Schnell et al., 2015), we took the larger of a sequence's counts in the two negative control PCRs of a primer set and subtracted it from that sequence's count in each eDNA PCR.
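Two of the count-based steps above are easy to express directly on sequence-count tables: removing short or rare sequences, and subtracting negative-control counts. The sketch below is a hypothetical Python rendering of those two rules, not OBITools code. It does not reproduce read pairing, demultiplexing or OBICLEAN error removal. The input layout (a dict of per-PCR sequence counts) is an assumption made for illustration.

```python
from collections import Counter


def drop_short_and_rare(pcr_counts, min_len=15, min_total=10):
    """Remove sequences shorter than `min_len` or with fewer than `min_total`
    reads summed over the whole library.

    `pcr_counts` maps a PCR identifier to a {sequence: read count} dict.
    """
    totals = Counter()
    for counts in pcr_counts.values():
        totals.update(counts)  # adds each sequence's reads to its library total
    keep = {s for s, n in totals.items() if len(s) >= min_len and n >= min_total}
    return {pcr: {s: n for s, n in counts.items() if s in keep}
            for pcr, counts in pcr_counts.items()}


def subtract_controls(edna_counts, control_counts):
    """Subtract each sequence's largest count across the negative controls
    from its count in every eDNA PCR; non-positive results are dropped."""
    cleaned = {}
    for pcr, counts in edna_counts.items():
        kept = {}
        for seq, n in counts.items():
            background = max((c.get(seq, 0) for c in control_counts), default=0)
            if n > background:
                kept[seq] = n - background
        cleaned[pcr] = kept
    return cleaned


if __name__ == "__main__":
    a, b = "ACGT" * 5, "TTGA" * 5  # toy 20-nt sequences
    edna = {"tag1": {a: 100, b: 5, "ACG": 50}, "tag2": {a: 80, b: 4}}
    filtered = drop_short_and_rare(edna)  # drops "ACG" (short) and b (9 reads)
    print(subtract_controls(filtered, [{a: 3}, {a: 7}]))
```

Taking the maximum over the controls, rather than their sum or mean, mirrors the wording of the text. It removes the worst-case background observed for a sequence without penalising it twice.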
Each unique sequence was searched with BLAST against the GenBank nucleotide database and assigned to a broad taxonomic group based on the hit with the highest max score. We considered the following groups: mammal (excluding human), bird, reptile, amphibian, fish, other chordate, invertebrate, protist, plant, fungus, non-eukaryote, other root (sequences not belonging to the other groups) and human.
Fish taxonomic assignments were further refined using records of indigenous and introduced species from the study area (Xiong et al., 2015; Zhang et al., 2011, 2016). To increase the accuracy of taxonomic assignments, we excluded sequences with <97% identity to their best-matching reference sequence. A unique molecular operational taxonomic unit (MOTU; Blaxter, 2004) was assigned to each fish sequence using the following criteria:

(a) if the query sequence matched a single species with the max score, that species was assigned;
(b) if the query matched more than one species with the max score, the assignment was based on knowledge of species distributions. We assigned the lowest taxonomic level that included all locally occurring and introduced species with the highest identity scores, which was a single species when only one such species remained.

Local species records may be incomplete, so to avoid misidentification we kept taxonomic assignments conservative. We excluded only species with no recorded occurrence anywhere in North China. Sequences with the same taxonomic assignment were combined for each primer set.
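Rules (a) and (b) amount to a lowest-common-taxon assignment restricted to regionally recorded species. The sketch below encodes that logic under stated assumptions. The input layout, function names and regional species list are hypothetical. The text does not say what happens when none of the tied species occurs in North China, and the sketch keeps all candidates in that case. The lineages in the test are illustrative only.

```python
RANKS = ("order", "family", "genus", "species")


def lowest_common_taxon(lineages):
    """Return (rank, name) of the lowest rank shared by all lineages."""
    common = None
    for i, rank in enumerate(RANKS):
        names = {lineage[i] for lineage in lineages}
        if len(names) != 1:
            break
        common = (rank, names.pop())
    return common


def assign_motu(hits, lineage, regional_species, min_identity=97.0):
    """Assign one query sequence to a taxon following rules (a) and (b).

    hits: list of (species, percent identity, max score) BLAST hits.
    lineage: species -> (order, family, genus, species) tuple.
    regional_species: species recorded anywhere in the region (hypothetical list).
    """
    if not hits:
        return None
    best = max(score for _, _, score in hits)
    top = [(sp, ident) for sp, ident, score in hits if score == best]
    if max(ident for _, ident in top) < min_identity:
        return None  # below the 97% identity threshold
    candidates = {sp for sp, _ in top}
    if len(candidates) > 1:
        # Rule (b): drop species absent from the region. If none remain,
        # keep all candidates (an assumption; the text does not cover this).
        candidates = (candidates & regional_species) or candidates
    return lowest_common_taxon([lineage[sp] for sp in candidates])
```

A single best-scoring species returns `("species", name)`. Ties among regional species of one genus collapse to `("genus", name)`, which matches the conservative assignment described above.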
2.6 Statistical analysis of sequencing data
To evaluate the effect of sequencing depth on the number of detected fish taxa, rarefaction curves were constructed for each primer set with increasing sequencing depth (i.e. number of sequence reads). These curves were computed using the RARECURVE function in the R package vegan (Oksanen et al., 2019). To obtain equal sequencing depth for all primers, the sequence reads of each uniquely tagged PCR were rarefied to 48,000 (the minimum number of reads per tag was 48,308) using the R package GUniFrac (Chen et al., 2012).
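Rarefying to a common depth amounts to drawing reads at random, without replacement, until the target depth is reached. The sketch below shows that idea in Python. It requires Python 3.9+, for the `counts` argument of `random.sample`. It is not the GUniFrac or vegan implementation, and the read counts in the test are made up.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=None):
    """Subsample `depth` reads without replacement from {taxon: reads}."""
    total = sum(counts.values())
    if total < depth:
        raise ValueError(f"only {total} reads available, need {depth}")
    rng = random.Random(seed)
    taxa = list(counts)
    drawn = rng.sample(taxa, k=depth, counts=[counts[t] for t in taxa])
    return dict(Counter(drawn))


def rarefaction_curve(counts, depths, seed=None):
    """Number of taxa observed at each subsampling depth (one draw per depth)."""
    return [(d, len(rarefy(counts, d, seed))) for d in depths]
```

Because every tagged PCR is reduced to the same number of reads, differences in detected taxa between primer sets no longer reflect unequal sequencing effort.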
Fish diversity data detected by eDNA metabarcoding were summarised for each primer set in two ways. Qualitatively, they formed a binary presence (i.e. a taxon detected in one or more replicate PCRs) or absence (i.e. a taxon not detected in any replicate PCR) matrix. Quantitatively, they were expressed as proportional read abundances (i.e. the percentage of reads for each fish taxon, averaged across replicate PCRs). Dissimilarities between the fish diversity detected with different primers were visualised by non-metric multidimensional scaling (NMDS). We used the presence/absence-based Jaccard coefficient for the qualitative data and the proportional read abundance-based Bray–Curtis coefficient for the quantitative data. Both coefficients range from 0 to 1, where 0 indicates identical community compositions and 1 indicates that the communities share no taxa. The coefficients were calculated with the VEGDIST command and NMDS was conducted with the METAMDS command, both in vegan. All statistical analyses were performed in R.
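The two dissimilarity measures can be written out explicitly. For presence/absence data, the Jaccard form below equals what vegan's VEGDIST returns with `method = "jaccard", binary = TRUE`. The Bray–Curtis form is the standard sum of absolute differences divided by the sum of abundances. The function names are our own, and the NMDS ordination itself is not reproduced here.

```python
def jaccard(present_a, present_b):
    """Binary Jaccard dissimilarity: 1 - shared taxa / all taxa detected."""
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


def bray_curtis(abund_a, abund_b):
    """Bray–Curtis dissimilarity between two {taxon: proportion} profiles."""
    taxa = set(abund_a) | set(abund_b)
    diff = sum(abs(abund_a.get(t, 0.0) - abund_b.get(t, 0.0)) for t in taxa)
    total = sum(abund_a.get(t, 0.0) + abund_b.get(t, 0.0) for t in taxa)
    return diff / total if total else 0.0
```

Two primer sets that detect {a, b} and {b, c} respectively share one of three taxa, giving a Jaccard dissimilarity of 2/3. Profiles of {a: 0.5, b: 0.5} and {a: 0.5, c: 0.5} give a Bray–Curtis dissimilarity of 0.5.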
3 RESULTS
3.1 Literature review and primer search
From the literature search, we identified a total of 22 primer sets (coded Pr01‒Pr22) that have been used in metabarcoding studies of freshwater or marine fish. The primers, amplicon lengths and references for their original design and applications in metabarcoding analysis are shown in Table 1. Detailed primer information and amplification conditions are summarised in Table S1. Thirteen of the primer sets were designed to target fish in general (fish, Actinopterygii or Teleostei), and nine were originally designed for vertebrates or all eukaryotes (Table 1). The target sequences of all primer sets reside within four mitochondrial genes: 12S rRNA (‘12S’; n = 7), 16S rRNA (‘16S’; n = 6), cytochrome b (‘Cytb’; n = 7) and cytochrome c oxidase subunit I (‘COI’; n = 2).
Locations of the target sequences also varied among primer sets. While 12S, 16S and Cytb primer sets were scattered along most of the gene lengths, the two COI primer sets both targeted the 5' end of the gene (Figure 1). Some of the primer sets shared primer binding sites (with slightly different primer sizes or degenerate nucleotides) and targeted almost identical sequences, such as Pr01_12SV5 and Pr05_12SF1/R1, Pr03_Tele02 and Pr04_MiFish-U, Pr15_Fish2b and Pr16_Fish2deg and Pr19_L14735c and Pr20_L14735c2. Target sequences of some primer sets overlapped or nested within each other, such as Pr01_12SV5/Pr05_12SF1/R1 and Pr06_Ac12S; Pr08_Fish16S, Pr09_16SF/D, Pr13_Vert-16S and Pr11_Ve16S; Pr17_L14912, Pr18_L14841 and Pr19_L14735c/Pr20_L14735c2; and Pr22_Minibar and Pr21_PS1 (Figure 1). We included all 22 primer sets in the in silico PCR analysis.

3.2 In silico PCR analysis
The in silico PCR analysis against all standard sequences in the EMBL database revealed considerable differences between the primers. They differed in taxonomic specificity and coverage, recovered species richness and taxonomic resolving power. Pr18_L14841 and Pr21_PS1 amplified no sequences, possibly owing to their exceptionally long primers. For the other 20 primer sets, the number of amplified taxa ranged from 947 to 61,757 (Table S2 and interactive Krona plots, Zhang et al., 2020). All 20 primer sets amplified both fish and non-fish taxa, but the proportion of fish taxa (all fish species included) varied considerably, from 1.4% (Pr10_L2513) to 83.7% (Pr08_Fish16S; Figure 2). The proportions of fish taxa were relatively consistent among 12S (range: 35.5%‒49.5%) and Cytb (42.7%‒62.5%) primers, but varied markedly among 16S primers (1.4%‒83.7%). Sequences of mammals, birds, reptiles, invertebrates and prokaryotes were frequently amplified, and some primers predominantly amplified non-fish taxa. For instance, Pr10_L2513 mainly amplified mammalian taxa (81.5%). Invertebrate taxa accounted for large proportions of the amplicons of Pr13_Vert-16S (47.8%) and Pr22_Minibar (57.2%; Figure 2).

To evaluate primer amplification performance for fish, in silico PCR was also performed against standard sequences of Actinopterygii species. A total of 28,539 Actinopterygii species were recovered from the database. Apart from Pr18_L14841 and Pr21_PS1, which amplified no sequences, the primers amplified between 22 (Pr10_L2513) and 10,830 (Pr13_Vert-16S) species, and the number of unambiguously identified species ranged from 18 (Pr10_L2513) to 7,081 (Pr13_Vert-16S), representing 22.9%‒83.6% of the amplified Actinopterygii sequences (Figure 3; Table S3).
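The "unambiguously identified species" metric can be approximated as follows, under the assumption (made here for illustration, not taken from Section 2) that a species counts as resolved only if none of its amplified barcodes is identical to a barcode of another species. The sketch takes a hypothetical mapping from species to amplified barcode sequences, such as the output of an in silico PCR, and reports the number of amplified species, the number resolved and their ratio.

```python
from collections import defaultdict


def species_resolution(amplicons: dict[str, set[str]]) -> tuple[int, int, float]:
    """Return (amplified species, unambiguously identified species, fraction).

    `amplicons` maps a species name to the set of barcode sequences amplified
    from its reference records. A species is counted as unambiguously
    identified when none of its barcodes is shared with any other species
    (an assumed criterion for this sketch).
    """
    # Record which species carry each distinct barcode sequence.
    owners: dict[str, set[str]] = defaultdict(set)
    for species, seqs in amplicons.items():
        for seq in seqs:
            owners[seq].add(species)

    amplified = [sp for sp, seqs in amplicons.items() if seqs]
    resolved = [sp for sp in amplified
                if all(len(owners[seq]) == 1 for seq in amplicons[sp])]
    fraction = len(resolved) / len(amplified) if amplified else 0.0
    return len(amplified), len(resolved), fraction


# Hypothetical toy data: sp_A and sp_B share a barcode; sp_E was not amplified.
toy = {
    "sp_A": {"ACGTTG"},
    "sp_B": {"ACGTTG"},
    "sp_C": {"GGTTCA"},
    "sp_D": {"TTAACG", "TTAACC"},
    "sp_E": set(),
}
print(species_resolution(toy))  # (4, 2, 0.5)
```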

The fish amplicon sizes of the primers, estimated by in silico PCR (except for Pr18 and Pr21, whose amplicon sizes were taken from the original design papers), also varied widely. Average amplicon lengths ranged from 40 to 412 bp: six primer sets (Pr01, Pr02, Pr08, Pr14, Pr15 and Pr16) amplified fragments shorter than 100 bp, four (Pr03, Pr04, Pr05 and Pr22) amplified fragments of approximately 100–200 bp, six (Pr07, Pr09, Pr10, Pr13, Pr17 and Pr21) amplified fragments of 200–300 bp and six (Pr06, Pr11, Pr12, Pr18, Pr19 and Pr20) amplified fragments of 300–420 bp (Table 1).
3.3 eDNA metabarcoding analysis
Sequencing results were successfully obtained from all primer sets tested by in vitro PCR and metabarcoding analysis. Approximately 1.06‒4.05 Gb of raw data were generated from each of the 23 sequencing libraries (see Section 2). Following sequence pairing, clustering and quality filtering, 356,162‒10,954,329 sequence reads were retained per library, with an average of 447,441 ± 402,824 reads (mean ± SD; range: 59,360‒1,696,670) per uniquely tagged PCR containing eDNA template. The number of unique sequences varied from 58 to 2,990 among libraries. Reads from negative control PCRs amounted on average to 2% of those from the corresponding eDNA PCRs, but this proportion varied considerably across libraries (<1.0 × 10⁻⁶% to 22.8%). Most of these reads were of non-fish origin: fish reads in the negative controls amounted on average to only 0.02% (range: <1.0 × 10⁻⁶% to 0.1%) of those in the corresponding eDNA PCRs. Read counts after each filtering step are summarised for each sequencing library in Table 2 and Table S4, and read data for individual PCRs are available from the Dryad Digital Repository (Zhang et al., 2020).
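Table 2. Sequence read counts after each filtering step of the sequencing libraries

Primer set | illuminapairedend | ngsfilter | obiuniq (seq.) | obiclean (seq.) | obiclean (reads) | Average reads per PCR | Fish seq.ᵃ | Fish taxaᵇ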
---|---|---|---|---|---|---|---|---|
Pr01_12SV5 | 6,813,530 | 5,968,766 | 151,654 | 1,578 | 4,359,954 | 713,498 | 1,162 | 52 |
Pr01_12SV5-blk | 6,521,380 | 5,649,616 | 172,508 | 1,840 | 4,066,989 | 674,863 | 1,155 | 50 |
Pr02_Tele01 | 5,676,000 | 4,233,160 | 48,056 | 692 | 3,482,684 | 565,270 | 601 | 50 |
Pr02_Tele01-blk | 4,797,908 | 3,479,559 | 5,005 | 511 | 2,861,212 | 476,298 | 487 | 46 |
Pr03_Tele02 | 9,661,414 | 8,365,152 | 699,635 | 2,633 | 4,995,621 | 832,331 | 2,452 | 56 |
Pr04_MiFish-U | 7,549,436 | 6,020,313 | 429,961 | 2,898 | 3,622,312 | 603,543 | 2,808 | 62 |
Pr05_12SF1/R1 | 4,941,575 | 4,377,441 | 136,856 | 1,469 | 3,068,448 | 499,465 | 1,177 | 53 |
Pr06_Ac12S | 2,114,179 | 1,695,303 | 886,819 | 2,732 | 412,185 | 66,598 | 2,608 | 66 |
Pr07_AcMDB | 2,114,158 | 1,624,734 | 488,463 | 2,990 | 678,232 | 112,994 | 2,914 | 66 |
Pr08_Fish16S | 9,699,109 | 8,518,926 | 70,528 | 728 | 6,933,469 | 1,142,364 | 605 | 51 |
Pr09_16SF/D | 2,114,147 | 1,822,099 | 93,785 | 320 | 1,152,637 | 192,058 | 297 | 16 |
Pr11_Ve16S | 2,114,144 | 1,802,625 | 460,976 | 1,879 | 704,030 | 117,338 | 1,806 | 45 |
Pr12_Ac16S | 2,114,177 | 1,745,236 | 395,516 | 1,135 | 606,751 | 101,125 | 1,120 | 34 |
Pr13_Vert-16S | 2,114,152 | 873,698 | 217,280 | 1,371 | 356,162 | 59,360 | 1,268 | 51 |
Pr14_FishCB | 3,833,686 | 3,538,200 | 25,158 | 58 | 2,960,388 | 493,395 | 56 | 11 |
Pr15_Fish2b | 13,511,011 | 12,443,896 | 28,841 | 468 | 10,954,329 | 1,696,670 | 51 | 12 |
Pr16_Fish2deg | 4,540,291 | 4,098,848 | 14,573 | 258 | 3,443,501 | 543,332 | 52 | 30 |
Pr17_L14912 | 2,114,140 | 1,678,769 | 180,245 | 1,348 | 1,025,852 | 170,971 | 748 | 36 |
Pr18_L14841 | 2,114,163 | 1,592,968 | 277,459 | 442 | 739,083 | 123,181 | 348 | 24 |
Pr19_L14735c | 2,114,172 | 1,620,669 | 696,721 | 1,051 | 431,259 | 71,877 | 1,042 | 28 |
Pr20_L14735c2 | 2,114,240 | 1,681,971 | 741,346 | 695 | 404,242 | 67,374 | 679 | 31 |
Pr21_PS1 | 2,114,166 | 1,606,006 | 149,175 | 1,268 | 1,035,289 | 172,514 | 1,082 | 51 |
Pr22_Minibar | 8,859,336 | 7,118,022 | 351,347 | 2,574 | 4,769,179 | 794,731 | 1 | 0 |
Note
- ‘-blk’ indicates the addition of human DNA-blocking oligonucleotides to the PCR.
- ᵃ All unique fish sequences retained after OBITools processing.
- ᵇ Fish taxa identified after BLAST searches, merging of identical taxa and rarefaction of all PCRs to an identical number of reads.
We first compared the taxonomic coverage of the primer sets across all taxa. Human sequences were analysed separately from other mammalian sequences to examine their amplification and the effect of blocking oligos (see Section 2). Fish sequences were the most abundant (over 50%) in the amplicons of most primers; the exceptions were Pr18_L14841 and Pr22_Minibar, which predominantly amplified non-fish taxa and for which fish sequences accounted for only 29% and <0.001%, respectively (Table S5). Non-fish vertebrate (mammal, bird, reptile and amphibian) sequences were often amplified but at low proportions. On average, human DNA accounted for no more than 1% of the PCR products of most primers, the exceptions being Pr03_Tele02 (11.0%) and Pr18_L14841 (44.9%). Adding human DNA-blocking oligos to the PCRs significantly reduced the mean proportion of human sequences in the PCR products, from 0.265% to 0.0268% with Pr01_12SV5 and from 3.43% to 0.0107% with Pr02_Tele01 (Figure 4; Table S5).
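For scale, the mean proportions reported above correspond to the following approximate fold reductions in human reads (simple ratios of the reported means, computed here for illustration):

$$
\frac{0.265\%}{0.0268\%} \approx 9.9 \quad (\text{Pr01\_12SV5}), \qquad \frac{3.43\%}{0.0107\%} \approx 321 \quad (\text{Pr02\_Tele01})
$$

That is, roughly a tenfold reduction for Pr01_12SV5 and a reduction of more than 300-fold for Pr02_Tele01.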

After combining the results from all tested primer sets, we detected 202 fish taxa: 164 species, 24 genera, two subfamilies, five families, two suborders and five taxa above the order level. The detected species included 50 species previously documented in Beijing (Zhang et al., 2011), 70 non-local Chinese species and 44 foreign species. Limited sampling of aquatic habitats, incomplete reference databases for local species and the rapid disappearance of native fish may account for the smaller number of native species recovered by our eDNA method compared with historical surveys. The newly detected species most likely represent recent intentional and accidental introductions associated with the aquarium trade, aquaculture, biological control, fishery transport and water facility construction (Lin et al., 2015; Xiong et al., 2015).
Rarefaction curves of the number of detected taxa against sequence reads showed that our sequencing depths were sufficient to recover the amplified taxa for all primer sets (Figure S1). To standardise sequencing depth across primers, we rarefied all uniquely tagged PCRs to an identical number of reads before analysing fish taxa (see Section 2). No sequence amplified by Pr22_Minibar matched a fish sequence with ≥97% identity, so this primer set was excluded from further analysis. The other 20 primer sets amplified between 11 (Pr14_FishCB) and 66 (Pr06_Ac12S and Pr07_AcMDB) fish taxa, considering all taxonomic levels within Actinopterygii (Figure 5a; Table 2; Table S6). Primers targeting 12S regions showed good overall detection of fish, with all seven tested primer sets amplifying 46‒66 fish taxa. Cytb primers generally detected fewer fish taxa than other primers: Pr14_FishCB and Pr15_Fish2b amplified only 11 and 12 fish taxa, respectively, and the highest number of fish taxa amplified by any Cytb primer set was only 36 (Pr17_L14912). Species-level assignments accounted for 63.6%‒87.1% of all fish taxa detected by each primer set (Figure 5a; Table S7).
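Rarefying uniquely tagged PCRs to an identical number of reads can be done by random subsampling without replacement, as in the minimal sketch below. The exact procedure used in the study is given in Section 2; the function names, the seed handling and the choice of the shallowest PCR as the common depth are assumptions for illustration, and the read counts in the example are hypothetical.

```python
import random
from collections import Counter


def rarefy(counts: dict[str, int], depth: int, seed: int = 0) -> dict[str, int]:
    """Subsample `depth` reads without replacement from a taxon -> read-count table.

    Taxa that receive no reads in the subsample are omitted from the result.
    Requires Python 3.9+ for the `counts` argument of random.sample.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds the {total} reads available")
    taxa = list(counts)
    rng = random.Random(seed)
    drawn = rng.sample(taxa, k=depth, counts=[counts[t] for t in taxa])
    return dict(Counter(drawn))


def rarefy_all(pcrs: dict[str, dict[str, int]], seed: int = 0) -> dict[str, dict[str, int]]:
    """Rarefy every PCR to the read depth of the shallowest one (assumed rule)."""
    depth = min(sum(c.values()) for c in pcrs.values())
    return {name: rarefy(c, depth, seed) for name, c in pcrs.items()}


# Hypothetical read counts for two uniquely tagged PCRs.
pcrs = {
    "PCR_1": {"taxon_A": 900, "taxon_B": 90, "taxon_C": 10},
    "PCR_2": {"taxon_A": 150, "taxon_B": 50},
}
rarefied = rarefy_all(pcrs)
print({name: sum(c.values()) for name, c in rarefied.items()})  # {'PCR_1': 200, 'PCR_2': 200}
print({name: len(c) for name, c in rarefied.items()})  # taxa detected at equal depth
```

Counting the taxa that remain after rarefaction gives richness values that are comparable across PCRs because they rest on the same number of reads.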

When fish taxa were grouped by order, Cypriniformes accounted for the largest share (48.0%‒81.8%) of taxa for every primer set except Pr09_16SF/D (0%; Figure 5b). Gobiiformes was overall the second most abundant order. Other frequently detected orders included Siluriformes, Anabantiformes and Synbranchiformes (Figure 5b; Table S7). Proportional read abundances of fish taxa were highly similar across replicate PCRs (Figure S2), indicating that each primer set amplified consistently.
Different primer sets generated different profiles of fish diversity from the same eDNA template, both qualitatively and quantitatively, as shown by the presence/absence-based Jaccard and proportional read-based Bray–Curtis dissimilarity matrices (Figure 6; see also Figures S3 and S4 for the proportions of unique and shared taxa detected by each pair of primer sets). Primers targeting different genes showed no clear separation in either the qualitative or the quantitative plot, and there was considerable variation among primers targeting the same gene (although the 12S primers showed lower variation in detected fish diversity than the 16S and Cytb primers). Primers targeting highly similar sequences yielded fish diversities that were similar both qualitatively and quantitatively (e.g. Pr01_12SV5 and Pr05_12SF1/R1; Pr11_Ve16S and Pr13_Vert-16S), similar qualitatively but not quantitatively (Pr03_Tele02 and Pr04_MiFish-U) or dissimilar both qualitatively and quantitatively (Pr15_Fish2b and Pr16_Fish2deg; Pr19_L14735c and Pr20_L14735c2).
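For reference, the two dissimilarity indices can be computed from per-primer detections as below: Jaccard on the presence/absence of taxa, and Bray–Curtis on proportional read abundances. This is a plain-Python sketch of the standard formulas, not the software used in the study, and the taxon names and read counts are hypothetical. The example reproduces, in miniature, the pattern reported for Pr03_Tele02 and Pr04_MiFish-U: identical taxon lists (Jaccard dissimilarity 0) but very different read proportions.

```python
def jaccard_dissimilarity(a: set[str], b: set[str]) -> float:
    """1 - |A & B| / |A | B| for presence/absence taxon lists."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def to_proportions(counts: dict[str, int]) -> dict[str, float]:
    """Convert read counts to proportional read abundances."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(x: dict[str, float], y: dict[str, float]) -> float:
    """sum |x_i - y_i| / sum (x_i + y_i), taken over the union of taxa."""
    taxa = set(x) | set(y)
    num = sum(abs(x.get(t, 0.0) - y.get(t, 0.0)) for t in taxa)
    den = sum(x.get(t, 0.0) + y.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0


# Hypothetical profiles from two primer sets: same taxa, very different read proportions.
primer_1 = {"taxon_A": 900, "taxon_B": 100}
primer_2 = {"taxon_A": 100, "taxon_B": 900}
print(jaccard_dissimilarity(set(primer_1), set(primer_2)))  # 0.0
print(round(bray_curtis(to_proportions(primer_1), to_proportions(primer_2)), 3))  # 0.8
```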

4 DISCUSSION
Given the critical importance of metabarcoding primers in eDNA-based biodiversity detection, a priori understanding of primer characteristics and bias is pivotal for an informed choice of barcode and interpretation of sequence data. However, despite the increasingly wide application of eDNA metabarcoding in fish community surveys of diverse ecosystems (e.g. Cilleros et al., 2019; Hänfling et al., 2016; Miya et al., 2015; Valentini et al., 2016), very little is known regarding the influence of primer choice on survey results, either qualitatively or quantitatively. Our study presents the most extensive evaluation of published fish metabarcoding primers to date, allowing direct comparisons between the amplification performance of 22 primer sets both in silico and in vitro.
4.1 Primer performance: Taxonomic specificity
High taxonomic specificity for the target group is one of the key criteria when choosing metabarcoding primers. Inadequate specificity leads to excessive amplification of non-target sequences, which can swamp the desired taxa and waste sequencing throughput (Collins et al., 2019). Our results reveal that almost all fish primers amplified sequences of non-fish organisms, and some even amplified them predominantly (e.g. Pr10_L2513 and Pr13_Vert-16S in the in silico tests, and Pr22_Minibar in both the in silico and in vitro tests). Other vertebrates, including mammals, birds, reptiles and amphibians, often accounted for relatively large proportions of the in silico PCR products. Amplification of these groups may be undesirable in studies focused strictly on fish communities, but it can still provide valuable information on species diversity at the ecosystem level (Port et al., 2016).
Proportions of non-fish vertebrate sequences were generally lower in the in vitro PCR products than in the in silico PCR products across primers. The only exception was human DNA, which was present in large quantities in the in vitro PCR products of several primer sets. We extracted eDNA from water samples collected in urban environments in Beijing, which inevitably receive various discharges containing human DNA. Amplification of human sequences from environmental samples by metabarcoding primers has been observed previously (Kelly et al., 2014; Miya et al., 2015); hence, several general fish primers have been designed together with a blocking oligo to reduce human DNA amplification. Our in vitro PCRs using these primer sets show that the blocking oligos were highly effective in reducing the proportion of human sequences in the amplicons (Figure 4). However, adding blocking oligos slightly reduced the number of detected fish taxa for some primers (52 vs. 50 for Pr01_12SV5 and 50 vs. 46 for Pr02_Tele01; Figure 5), suggesting that, besides suppressing human DNA, blocking oligos may also hinder the binding or amplification of some desired sequences. We therefore suggest that, for primers that primarily amplify the target group but also amplify a small amount of human DNA, increasing sequencing depth may be preferable to adding blocking oligos as a way to compensate for non-target amplification.
4.2 Primer performance: Fish coverage and species resolution
Broad taxonomic coverage within the target group and high species-level assignment power are prerequisites for generating comprehensive and accurate biodiversity data with metabarcoding primers. Among barcode genes, 12S appears to be a highly effective target: all seven 12S primer sets detected more than 45 fish taxa in eDNA metabarcoding analysis, and the six primer sets that recovered the most fish taxa were all 12S primers (Figure 5). Among the 12S primer sets, Pr06_Ac12S and Pr07_AcMDB performed best in our study system in terms of total fish taxa detected and species-level assignment. Pr04_MiFish-U and Pr03_Tele02 also showed outstanding detection of fish diversity. Pr01_12SV5 and Pr05_12SF1/R1 recovered slightly fewer taxa than these four primer sets but still outperformed most other primers. The 16S primer sets varied considerably in performance, detecting low to moderate numbers of fish taxa. Cytb primers generally recovered fewer fish taxa than the 12S and 16S primers, indicating that they may be less effective for fish surveys in Beijing waters. Despite being the classic barcode gene (Hebert et al., 2003), COI appears to be a less common target of fish metabarcoding primers than the other mitochondrial genes. Of the two general fish COI primer sets that we found in the literature, Pr22_Minibar amplified only a small number of fish sequences in silico and failed to recover any fish sequences in vitro, whereas Pr21_PS1 detected 51 fish taxa in eDNA metabarcoding analysis, tied for seventh among all primer sets with Pr08_Fish16S and Pr13_Vert-16S. Therefore, although the abundance of COI sequences in reference databases is expected to offer higher taxonomic coverage than other mitochondrial DNA regions, the available fish metabarcoding primers for COI did not appear to amplify many local species effectively.
It has also been shown that COI primers recover fewer fish species than Pr04_MiFish-U using eDNA from freshwater and seawater samples collected around the English Channel and North Sea (Collins et al., 2019). The authors of this study suggest that the highly variable sequence within the standard COI barcode region may hinder the design of fish-specific degenerate primers for shorter amplicons, limiting the potential of COI in eDNA-based biodiversity detection.
Nevertheless, it should be noted that the community composition and complexity can be vastly different among geographic regions and different ecosystems, and primer performance in one study may not be completely transferable to another.
4.3 Comparison between in silico and in vitro PCR results
As discussed previously, in vitro PCR results may not mirror those of in silico PCR. In terms of taxonomic specificity, non-fish vertebrate sequences made up large proportions of the amplicons in the in silico analysis of most primers, whereas their proportions were considerably smaller in the in vitro PCR products, most likely because non-fish vertebrate eDNA is less abundant in water than fish eDNA. In terms of the total number of amplified fish taxa and species-level resolution, the in vitro results show both consistencies and discrepancies with the in silico results (compare Figures 3 and 5a). For example, Pr14_FishCB, Pr15_Fish2b, Pr19_L14735c, Pr20_L14735c2 and Pr22_Minibar performed relatively poorly in both the in silico and in vitro analyses; however, Pr08_Fish16S, Pr09_16SF/D and Pr13_Vert-16S, the three primer sets that amplified the most abundant fish sequences in silico, detected only low to moderate levels of fish diversity in eDNA metabarcoding analysis. The incongruence between in silico and in vitro results may have multiple causes, including differences in taxonomic composition between the reference databases and the actual biological community of the study system, the disparity between simulation conditions and primer binding thermodynamics in real PCR, and the quality and quantity of eDNA templates in in vitro PCR. Because systematic in vitro testing of many primers is costly, time-consuming and not always feasible, in silico evaluation has been suggested as a useful alternative to bench testing. Our results suggest that in silico analysis can provide initial, tentative assessments of primer specificity and taxonomic coverage, but in vitro PCR evaluation is indispensable for fully understanding amplification performance and choosing reliable primers.
4.4 Primer performance: Barcode size
Because eDNA is present in low quantities and is often degraded, primers for short barcodes (typically <200 bp) are generally believed to offer greater amplification success. Examination of eDNA size distribution in water has shown that shorter mitochondrial DNA fragments are more abundant than longer ones (Bylemans et al., 2018), lending support to this notion. However, studies also show that longer eDNA fragments exist and can be amplified to provide more sequence information (Bylemans et al., 2018; Deiner et al., 2017). Yet few empirical studies have systematically investigated the effect of barcode size on metabarcoding outcomes using actual environmental samples. The two best-performing primer sets in our metabarcoding analysis, Pr06_Ac12S and Pr07_AcMDB, both target longer fragments (approximately 300‒400 bp), yet yielded the highest numbers of fish taxa and species-level assignments. Several other studies have also used long (500‒650 bp) barcodes to detect species successfully from eDNA (Deiner et al., 2015; Egan et al., 2013). Long barcodes may suffer from lower template concentrations, but the additional sequence information they contain may offer greater taxonomic discriminatory power. Furthermore, the biodiversity complexity of the study system and the completeness of reference databases can complicate the effect of barcode size on taxonomic assignment: longer barcodes are better able to discern closely related species but are also more likely to lack reference sequences. As reference databases are rapidly expanding, we expect amplicon size to become an even weaker constraint on primer performance than DNA binding characteristics.
4.5 Recommendations
Given the strong impact of primers on metabarcoding results, a priori evaluation of primer performance is pivotal in estimating the potential influences introduced by primer bias and choosing suitable primers to generate a comprehensive and reliable biodiversity archive. Although in silico PCR can be employed for the initial assessment of primer coverage and specificity, in vitro PCR using eDNA templates should be performed to understand primer performance in an actual metabarcoding study. The suitability of primers is dependent upon the biodiversity composition of the group of interest and the study ecosystem. Since different metabarcoding primers may show divergent taxonomic ranges in amplification, multiple primer sets can be used in combination to increase taxonomic coverage and species detection probability (Evans et al., 2017; Miya et al., 2015; Shaw et al., 2016). For example, to gain a comprehensive overview of marine fish communities, primers that effectively amplify Chondrichthyes species from environmental samples should also be included, in addition to those targeting Actinopterygii species. Furthermore, efficient recovery of biodiversity and accurate taxonomic assignments from in silico evaluation and empirical metabarcoding applications rely on the completeness and sequence quality of the corresponding databases (Bylemans et al., 2018; Elbrecht & Leese, 2017). Hence, the construction of high-quality reference databases of local biological communities should be a priority in DNA-based biodiversity surveillance.
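When per-primer lists of detected taxa from in vitro tests are available, one simple way to choose a complementary combination is a greedy set-cover heuristic: repeatedly add the primer set that contributes the most taxa not yet detected. This is a swapped-in illustrative technique, not a procedure used in this study; it does not guarantee an optimal combination, and it ignores read depth, cost and detection probability. The primer and taxon names below are hypothetical.

```python
def greedy_primer_combination(
    detections: dict[str, set[str]], max_primers: int
) -> tuple[list[str], set[str]]:
    """Greedy set-cover heuristic: repeatedly add the primer set that detects
    the most taxa not yet covered. Not guaranteed to find the optimal combination."""
    chosen: list[str] = []
    covered: set[str] = set()
    remaining = dict(detections)
    for _ in range(max_primers):
        if not remaining:
            break
        # Most new taxa first; ties broken alphabetically by primer name.
        best = min(remaining, key=lambda p: (-len(remaining[p] - covered), p))
        new_taxa = remaining.pop(best) - covered
        if not new_taxa:
            break  # no remaining primer set adds anything
        chosen.append(best)
        covered |= new_taxa
    return chosen, covered


# Hypothetical detections from three primer sets.
detections = {
    "primer_X": {"t1", "t2", "t3", "t4"},
    "primer_Y": {"t1", "t2", "t3"},
    "primer_Z": {"t5", "t6"},
}
chosen, covered = greedy_primer_combination(detections, max_primers=2)
print(chosen, len(covered))  # ['primer_X', 'primer_Z'] 6
```

Although primer_Y detects more taxa than primer_Z on its own, it adds nothing once primer_X is chosen, which is why combinations are better judged by complementarity than by the richness each primer set recovers individually.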
ACKNOWLEDGEMENTS
We wish to thank Yitao Zheng, Meixi Lin, Yiyan Wang, Qi Lu and Weiran Wang for their help with water sampling. Funding for the present research was provided by the Second Tibetan Plateau Scientific Expedition and Research Program (STEP) to J.Z. (Grant No. 2019QZKK0304) and to M.Y. (Grant No. 2019QZKK0503), the National Science and Technology Basic Resources Survey Program of China to M.Y. (2019FY101700), and the State Key Laboratory of Freshwater Ecology and Biotechnology (Grant No. 2020FB10) to M.Y.
CONFLICT OF INTEREST
None declared.
AUTHORS' CONTRIBUTIONS
M.Y. and S.Z. conceived and designed the study; S.Z. collected water samples, conducted the experiments, analysed the data and prepared the tables and figures with assistance of M.Y.; M.Y. wrote the manuscript with contributions from S.Z. and J.Z. All authors contributed to the critical assessment of the manuscript and approved publication.
Open Research
Peer Review
The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13485.
DATA AVAILABILITY STATEMENT
Raw Illumina sequence data (fastq format) from in vitro metabarcoding analysis are available from NCBI's SRA database BioProject ID: PRJNA655901 (https://www.ncbi.nlm.nih.gov/sra/PRJNA655901). Interactive Krona plots and associated files of in silico PCR analyses and per sample sequence reads data of in vitro metabarcoding are available from the Dryad Digital Repository https://doi.org/10.5061/dryad.hdr7sqvfw (Zhang et al., 2020).