Volume 11, Issue 12 p. 1609-1625
RESEARCH ARTICLE

A comprehensive and comparative evaluation of primers for metabarcoding eDNA from fish

Shan Zhang

School of Life Sciences, Peking University, Beijing, China

Institute of Ecology, College of Urban and Environmental Sciences, Peking University, Beijing, China

Jindong Zhao

School of Life Sciences, Peking University, Beijing, China

Institute of Ecology, College of Urban and Environmental Sciences, Peking University, Beijing, China

State Key Laboratory of Freshwater Ecology and Biotechnology, Institute of Hydrobiology, Chinese Academy of Sciences, Wuhan, China

Meng Yao

Corresponding Author

School of Life Sciences, Peking University, Beijing, China

Institute of Ecology, College of Urban and Environmental Sciences, Peking University, Beijing, China

Correspondence

Meng Yao

Email: [email protected]; [email protected]

First published: 13 September 2020

Abstract


  1. Accurate assessments of fish species diversity and community composition are essential for understanding fish ecology and for conservation management. Environmental DNA (eDNA) metabarcoding has become an increasingly widely used method for monitoring fish species. The accuracy and efficacy of eDNA metabarcoding rely heavily on the choice of primers used for PCR amplification. A wide selection of metabarcoding primers for fish has been developed; however, no comprehensive comparative evaluation of their amplification performance and taxonomic classification across a rich diversity of fish species has been conducted, which hinders informed decisions regarding their suitability for different study systems.
  2. Here we reviewed the literature and compiled a list of 22 primer sets for eDNA-based metabarcoding analysis of teleost fish, and compared their performance using in silico PCR followed by in vitro metabarcoding analysis of eDNA from waterbodies in Beijing, which harbour a high number of freshwater fish species.
  3. We found that the primers showed considerable differences in the amplified taxonomic ranges and proportions, fish taxon richness, species discrimination power and fish community compositions, both in silico and in vitro. The number of fish taxa detected from eDNA by the primer sets varied from 0 to 66. Primers targeting the 12S rRNA gene generally detected greater fish diversity than those targeting the 16S rRNA or COI genes, while primers targeting the cytochrome b gene amplified the fewest fish taxa in vitro.
  4. Regarding target genes, 12S primers generally outperformed other primers in terms of amplified fish diversity. The results of in silico PCR and in vitro tests were not always in agreement, suggesting that primer choice for biodiversity surveys should not be based solely on in silico evaluation. The use of different primers can qualitatively and quantitatively affect the detected biodiversity and these effects should be considered in experimental design and data interpretation. These results will assist with primer selection for eDNA-based fish surveys, and consequently support conservation of freshwater biodiversity.

Abstract (Chinese version)

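  1. Accurate assessment of fish species diversity and community composition is crucial for understanding fish ecology and for conservation management. Environmental DNA (eDNA) combined with metabarcoding is an emerging method for fish monitoring. Among the many factors that affect the accuracy and effectiveness of eDNA-based detection, the choice of PCR primers and barcode regions is critical. As eDNA methods have developed, different researchers have designed a variety of universal fish metabarcoding primers. However, because standardised experimental workflows and eDNA samples with rich species diversity for comprehensively evaluating these primers have been lacking, it is difficult to compare the amplification performance of different primers and the ability of the corresponding metabarcodes to detect fish diversity, which hinders the selection of optimal primers for fish research in different ecosystems.
  2. In this study, we first screened the literature to identify 22 primer pairs used in eDNA metabarcoding studies of teleost fish, and then comprehensively compared their performance using computer-simulated in silico PCR and in vitro metabarcoding analysis of eDNA extracted from waterbodies in Beijing.
  3. Both the in silico and in vitro results showed clear differences among these primers in the range and proportions of amplified taxonomic groups, detected fish species richness, species resolution and fish community composition. In the eDNA metabarcoding analysis, the number of fish taxa detected by the 22 primer pairs ranged from 0 to 66. Most primers targeting the mitochondrial 12S region detected higher fish diversity than primers targeting the 16S and COI regions, whereas primers targeting the Cytb region amplified the fewest fish taxa.
  4. In summary, with respect to target genes, primers targeting the 12S region generally outperformed primers targeting other regions in amplifying fish diversity. The results of the in silico and in vitro primer evaluations differed; primer selection should therefore not be based solely on in silico results. The use of different primers can qualitatively and quantitatively affect the detected biodiversity, and these effects should be fully considered in experimental design and data interpretation. This study provides a basis for primer selection in eDNA-based fish diversity surveys and technical support for biodiversity conservation in freshwater ecosystems.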

1 INTRODUCTION

With over 34,000 species described to date (http://www.fishbase.org/, accessed in November 2019), fish comprise more than half of all vertebrates and possess vital ecological and economic value (Nelson, 2006). However, a large proportion of the world's fish species are rapidly disappearing due to a myriad of factors including habitat disruption, overexploitation, climate change, pollution, infectious diseases and invasion by non-native species (Jelks et al., 2008; Jeppesen et al., 2010; Jones et al., 2004; Reid et al., 2019). Effective conservation and management of fish diversity rely on a deep understanding of the ecology and dynamics of fish communities, which is only possible if fish community assemblages can be accurately and efficiently assessed in freshwater and marine ecosystems (Rees et al., 2014). Surveying large aquatic environments is particularly challenging and often requires specialised equipment and extensive fieldwork. Traditional fish surveys are generally capture based (e.g. netting, cage-trapping and electrofishing) and are invasive to the biological communities under study, which runs counter to the very goal of biodiversity conservation. Furthermore, morphology-based species identification is error-prone for closely related taxa and early life stages, and requires substantial taxonomic expertise that is currently in short supply (Wheeler et al., 2004).

Environmental DNA (eDNA)-based species detection has revolutionised the manner in which biodiversity is surveyed and has proven to be a highly effective, efficient, economical and non-invasive approach to biomonitoring (Bohmann et al., 2014; Deiner et al., 2017; Taberlet et al., 2012). Coupled with metabarcoding (i.e. the simultaneous amplification of DNA from multiple organisms using universal PCR primers) and high-throughput sequencing technologies, eDNA methods have demonstrated great success in biodiversity surveys by revealing comparable or higher species richness and uncovering hidden biodiversity at a fraction of the cost of traditional surveys, thus being increasingly integrated into aquatic biomonitoring (Miya et al., 2015; Shaw et al., 2016; Valentini et al., 2016).

Since eDNA-based biomonitoring is still a novel and rapidly advancing field, many aspects of eDNA methods await validation and adaptation to specific systems and research questions (Diaz-Ferguson & Moyer, 2014; Goldberg et al., 2016). Among the existing technical issues, the choice of PCR primers for amplifying target sequences (i.e. barcodes) is possibly one of the most influential factors in determining the probability of detecting specific species (species-specific detection) or taxonomic groups (community surveys; Alberdi et al., 2018; Freeland, 2017). For species-specific detection, inefficient primer binding to DNA of the target species can lead to a lack of amplification or to amplification of non-target sequences, causing false negative or false positive identifications respectively. For community surveys, universal primers are applied to amplify DNA from a group of taxonomically related organisms that share conserved primer binding sites, while the amplified barcodes should contain sites that vary between species, allowing taxonomic assignment (Valentini et al., 2009). Ideally, universal primers should possess high specificity for the target group, broad intra-group coverage and even amplification across species, in addition to high discriminatory power (i.e. sufficient sequence variation between amplicons from different species) to ensure comprehensive and accurate species identification (Coissac et al., 2012; Riaz et al., 2011). Moreover, for environmental samples, which often contain trace amounts of highly degraded DNA from the study organisms, a short barcode (usually <200 bp) is likely to have a higher PCR success rate (Bylemans et al., 2018; Freeland, 2017). Furthermore, the effectiveness of metabarcoding-based biodiversity analyses depends heavily on the completeness and accuracy of the relevant reference databases, since an incomplete reference database prevents taxonomic assignment of species whose sequences are lacking.
These criteria for metabarcoding primers, that is, high taxonomic specificity and coverage, low species bias, high taxonomic resolving power and high-quality reference databases, may not all be fulfilled simultaneously by a single primer set. For instance, longer barcodes may exhibit greater variability between species, allowing more robust species identification, but may be prone to higher amplification failure rates (Bylemans et al., 2018). Additionally, highly degenerate primers can achieve superior taxonomic coverage, likely at the expense of taxonomic specificity, thereby increasing non-target amplification. Therefore, primer characteristics and potential biases should be evaluated prior to full-scale surveys, and the most suitable primers should be chosen for each study system to ensure effective profiling of the community of interest.

Given the key roles that fish communities play in aquatic ecosystem function and the fishing industry, it is not surprising that much effort has been devoted to investigating freshwater and marine fish assemblages using eDNA metabarcoding methods (e.g. Valentini et al., 2016; Yamamoto et al., 2017). Different research groups have developed a number of universal primers for fish community surveys, each of which has successfully described local fish diversity; however, it is difficult to compare the amplification performance and detection power of these primers across studies. Most studies have conducted eDNA-based fish community surveys using a single primer set without prior evaluation of potential primer bias for the study communities (e.g. Balasingham et al., 2017; Kelly et al., 2014). Some studies have compared the performance of several universal primer sets, but only by in silico PCR without subsequent in vitro evaluation (e.g. Valentini et al., 2016). More recently, a few studies have tested more than one primer pair using eDNA metabarcoding analysis, but only a small number (2–6) of primer sets were evaluated, mostly using samples of relatively low fish diversity (Bylemans et al., 2018; Collins et al., 2019; Evans et al., 2016, 2017; Hänfling et al., 2016; Shaw et al., 2016). For researchers who are new to the field and interested in using eDNA-based methods to survey fish diversity, choosing from dozens of published primers can be overwhelming and largely guesswork. Since primer amplification efficacy and coverage can vary considerably with the biodiversity complexity and taxonomic composition of the study system (Bellemain et al., 2010; Clarke et al., 2014), a standardised, comparative test of a comprehensive panel of primers using a taxonomically rich DNA pool is necessary for a thorough evaluation of primer efficacy and bias, and would greatly aid primer selection for fish biodiversity research.

The overall objective of the present study was to evaluate and compare the amplification efficiency, taxonomic specificity and coverage, and species resolution power of metabarcoding primers for eDNA-based fish community surveys, with a view to providing guidance for researchers who wish to investigate fish diversity using eDNA methods. We first searched Web of Science for published primer sets that had been used in eDNA-based fish metabarcoding, and then evaluated their performance by in silico PCR against all standard sequences in the EMBL database. To evaluate primer performance in vitro, we conducted metabarcoding analysis using these primers and pooled eDNA extracted from over 100 waterbodies in Beijing, which collectively harbour almost 100 native and introduced freshwater fish species (Zhang et al., 2011). We compared primer performance in terms of taxonomic specificity, fish coverage and species resolution via in silico PCR and in vitro metabarcoding analysis, and on this basis make recommendations for primer selection in eDNA-based fish biodiversity surveys.

2 MATERIALS AND METHODS

2.1 Literature review and primer search

To compile a comprehensive list of primers currently used in eDNA-based fish community studies, we searched the Web of Science Core Collection on 1 November 2018 using the search topic: fish AND (community OR diversity) AND ‘eDNA’ AND (barcoding OR metabarcoding). This search returned 66 papers, which we screened for primers described as targeting teleost fish in general (Class Actinopterygii or Infraclass Teleostei) or higher taxonomic groups that include fish (e.g. vertebrates). Since the aim of the present study was to compare primer performance for metabarcoding analysis of teleost-dominated freshwater fish communities, we excluded primers described as specific to elasmobranchs (sharks and rays; e.g. primers in Bakker et al., 2017; Chang et al., 2017).

2.2 In silico PCR analysis

To construct a sequence database for in silico PCR evaluation, we first downloaded all standard sequences from the EMBL-European Nucleotide Archive (embl_r138; ftp://ftp.ebi.ac.uk/pub/databases/embl/release/std/) and retrieved taxonomic information from the NCBI (ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz) on 14 January 2019. We then built a sequence database organised by taxonomic group from these sequences and the taxonomic information, using the OBICONVERT command in the OBITools software package (Boyer et al., 2016). We evaluated the amplification coverage and taxonomic resolution of the fish metabarcoding primers by in silico PCR implemented in the ECOPCR program (Ficetola et al., 2010) with the following settings: (a) amplicon length of 15‒500 bp (excluding primers) and (b) a maximum of three mismatches between each primer and the target sequence, with no mismatches in the last two nucleotides at the 3ʹ end of the primer (Bellemain et al., 2010; Epp et al., 2012). The in silico PCR was first conducted against all taxa to assess the taxonomic specificity of the primers. To avoid bias from the unequal numbers of sequences per species in the database, we recorded the number of unique NCBI taxonomy IDs (TaxIDs; each representing a distinct taxon) of the amplified taxa rather than the number of sequences. To further evaluate the primers' taxonomic coverage and species resolution for fish, we conducted in silico PCR for Actinopterygii (NCBI TaxID 7898). All sequences within the tested fish taxa were retained, and we used the ECOTAXSTAT command to calculate the number of amplified species and the ECOTAXSPECIFICITY command (both in the OBITools software package; Boyer et al., 2016) to assess the resolving power of the primers at the species level. A species was considered unambiguously assigned if its sequences differed from all other amplified sequences by at least 1 bp, regardless of the amplicon size.
We used the Krona program (command ktImportTaxonomy) to visualise the taxonomic compositions of the sequences amplified by in silico PCR (Ondov et al., 2011).
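The matching rules given for ECOPCR above can be made concrete with a short sketch. The Python below is not ECOPCR's implementation; it is a minimal illustration that scans one template strand for forward-primer sites and for reverse-primer sites on the reverse complement, allows at most three IUPAC-aware mismatches per primer but none in the two 3′-terminal nucleotides, and keeps amplicons of 15‒500 bp excluding primers. The function names (`binds`, `in_silico_pcr`), the single-strand scan and the treatment of non-ACGT template bases as mismatches are illustrative simplifications.

```python
# Sketch of ecoPCR-style primer matching (illustrative, not the ECOPCR code).

# IUPAC primer code -> template bases it can pair with (primer written 5'->3')
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq: str) -> str:
    # Non-ACGT template characters become "N", which never counts as a match
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(seq))


def binds(primer: str, site: str, max_mm: int = 3, strict_3prime: int = 2) -> bool:
    """True if `site` (same orientation as the primer) satisfies the mismatch rules."""
    mismatches = [k for k, (p, b) in enumerate(zip(primer, site)) if b not in IUPAC[p]]
    # The primer's 3' end is its last character: those positions must match exactly
    return len(mismatches) <= max_mm and all(
        k < len(primer) - strict_3prime for k in mismatches)


def in_silico_pcr(template: str, fwd: str, rev: str,
                  min_len: int = 15, max_len: int = 500, max_mm: int = 3) -> list:
    """Return top-strand amplicons (primers excluded) for a forward/reverse primer pair."""
    template = template.upper()
    n = len(template)
    # Top-strand positions just after each forward-primer site
    fwd_ends = [i + len(fwd) for i in range(n - len(fwd) + 1)
                if binds(fwd, template[i:i + len(fwd)], max_mm)]
    # The reverse primer matches the reverse complement; rc[j:j+m] corresponds
    # to top-strand positions [n-j-m, n-j), so n-j-m is where the amplicon ends
    rc = revcomp(template)
    m = len(rev)
    rev_starts = [n - j - m for j in range(n - m + 1)
                  if binds(rev, rc[j:j + m], max_mm)]
    return [template[f:r] for f in fwd_ends for r in rev_starts
            if min_len <= r - f <= max_len]
```

With the Pr02_Tele01 primers from Table 1 flanking a hypothetical 20-bp insert, `in_silico_pcr(fwd + insert + revcomp(rev), fwd, rev)` returns the insert; three mismatches at the 5′ end of the forward site are tolerated, whereas a single mismatch in its last nucleotide yields no amplicon.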

2.3 Water sampling

To evaluate the ability of the primers to amplify actual fish DNA in vitro, we performed metabarcoding analysis using eDNA extracted from water sampled in Beijing (39°26′–41°03′N, 115°25′–117°30′E; area 16,800 km²). Beijing is rich in freshwater ecosystems, including more than 100 rivers and over 120 ponds, lakes and reservoirs (Pan, 2004). Fish species in Beijing are typical of those found in temperate zones of North China. Historical specimen records since the 1920s and more recent netting surveys have described a total of 93 native and introduced fish species in the Beijing area, belonging to 13 orders, 23 families and 73 genera (Zhang et al., 2011). With 61 species, Cypriniformes is the largest order, within which Cyprinidae (51 species) and Cobitidae (10 species) are the predominant families. Perciformes (10 species) and Siluriformes (five species) are the next largest fish orders in Beijing. Comparison of historical records with recent survey results indicates that native fish species have declined considerably since the mid-20th century, whereas introduced species currently dominate many waterbodies in Beijing (Zhang et al., 2011).

We collected water samples from 104 freshwater sites distributed throughout Beijing, including 55 river sites, 40 ponds and lakes, six reservoirs and three wetlands, from April to June 2018. At each site, three 1-L water samples were collected from 5 to 10 cm below the surface and approximately 0.5 m from the shore using disposable 1-L plastic bottles. Water samples were transported on ice to the laboratory at Peking University and filtered within 12 hr.

2.4 eDNA preparation, PCR and sequencing

Water samples were filtered through 1.2-µm pore size mixed cellulose ester (MCE) membrane filters (47 mm diameter; Merck Millipore Ltd., Tullagreen, Carrigtwohill Co.). Filtrations were carried out in a clean room used exclusively for eDNA processing. All equipment and utensils were cleaned with a 10% bleach solution before sample processing and between samples. Samples were processed in small batches (samples from 2 to 8 sites per batch) to reduce cross-contamination. To control for contamination during filtration, a filtration blank of 1 L ddH2O was filtered before the samples from each site. Filters were stored in individual sterile microcentrifuge tubes at −20°C until DNA extraction. DNA was extracted from each filter using the DNeasy Blood and Tissue Kit (Qiagen GmbH) as described previously (Zhang et al., 2020). The pellets at the final step were rehydrated with 100 µl AE buffer (Qiagen). Equal volumes of DNA extracted from each water sample were pooled and used as the template for in vitro PCR.

With the exception of Pr10_L2513, which showed poor amplification in the in silico PCR, the remaining 21 primer sets were evaluated by in vitro PCR and metabarcoding analysis. For each primer set, we carried out 18 replicate PCRs with the eDNA extract and six negative control PCRs containing all reagents except eDNA. Since human DNA is often abundant in urban environments and can constitute a large proportion of PCR products from eDNA, two primer sets (Pr01_12SV5 and Pr02_Tele01) were each designed with a blocking oligonucleotide to inhibit the amplification of human DNA (Shehzad et al., 2012; Valentini et al., 2016; Table S1). For these two primer sets, we carried out 18 eDNA PCRs and six negative control PCRs both with and without the blocking oligos. A ‘CC’ followed by a unique 7-nucleotide sequence tag was added to the 5′ end of both the forward and reverse primers in each PCR to enable identification of the different PCR products during sequencing data analysis (Coissac, 2012). Each uniquely tagged primer pair was used in three PCRs (i.e. six different tags for the eDNA PCRs and two different tags for the negative control PCRs of each primer set) to ensure that the quantity of PCR products was sufficient for sequencing. Each PCR was conducted in a total volume of 25 µl, including 6 µl pooled eDNA extract (diluted 10-fold to reduce PCR inhibitors), 0.2 µM each of forward and reverse primers, 4 µM blocking oligo (where applicable) and 0.4 µg/µl BSA in 1 × Premix Ex Taq (Takara Bio Inc.), which has 4.5-fold higher fidelity than standard Taq. We first tested amplification of each primer set using both the PCR program described in the original design paper (summarised in Table S1) and a general program used for eDNA amplification (see below).
Agarose gel electrophoresis showed that our general program consistently yielded stronger amplification than the original programs (data not shown); we therefore used this general program for all primer sets. It comprised an initial denaturation at 95°C for 10 min; 45 cycles of 95°C for 30 s, the annealing temperature (Ta) for 30 s and 72°C for 1 min; and a final elongation at 72°C for 10 min. We determined the optimal Ta for each primer set by running 6‒7 PCRs with the general program across a Ta gradient spanning 10‒20°C and centred on the Ta used in the original design paper. The gradient PCR experiment was carried out twice to check the consistency of amplification patterns. PCR products were visualised by agarose gel electrophoresis, and the Ta giving the best amplification was chosen (Table 1).
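The tagging scheme described above can be illustrated with a minimal sketch. The code builds a tagged primer ('CC' + 7-nt tag + primer) and assigns a read to a PCR by an exact tag match with up to two primer mismatches, the thresholds given for NGSFILTER in Section 2.5. It is not NGSFILTER: it checks only the forward end of the read, ignores indels and degenerate primer bases, and the tags used in the example are hypothetical.

```python
def tagged_primer(tag: str, primer: str, spacer: str = "CC") -> str:
    """5' spacer + 7-nt sample tag + primer sequence."""
    if len(tag) != 7:
        raise ValueError("tags are 7 nt long in this protocol")
    return spacer + tag + primer


def assign_read(read: str, tags: list, primer: str,
                spacer: str = "CC", max_primer_mm: int = 2):
    """Return the tag identifying `read`, or None if no tag/primer combination fits.

    The tag must match exactly; the primer may differ by up to `max_primer_mm`
    substitutions (Hamming distance; a non-degenerate primer is assumed).
    """
    for tag in tags:
        prefix = spacer + tag
        if not read.startswith(prefix):
            continue
        site = read[len(prefix):len(prefix) + len(primer)]
        if len(site) == len(primer):
            mismatches = sum(a != b for a, b in zip(primer, site))
            if mismatches <= max_primer_mm:
                return tag
    return None
```

For example, a read starting with `tagged_primer("TGCATGC", primer)` is assigned to that (hypothetical) tag even with two substitutions in the primer region, but not with three, and not if a single tag base differs.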

Table 1. Summary of 22 fish metabarcoding primer sets retrieved from the literature and analysed in the present study

| Primer code | Original name | Target groupᵃ | Target gene | Design paper | Amplicon size (bp)ᵇ | Ta (°C)ᶜ | Forward primer (5′–3′) | Reverse primer (5′–3′) |
|---|---|---|---|---|---|---|---|---|
| Pr01_12SV5 | 12SV5 | Vertebrata | 12S | Riaz et al. (2011) | 99 (5.99) | 60 | TAGAACAGGCTCCTCTAG | TTAGATACCCCACTATGC |
| Pr02_Tele01 | Teleo | Teleostei | 12S | Valentini et al. (2016) | 63 (9.40) | 60 | ACACCGCCCGTCACTCT | CTTCCGGTACACTTACCATG |
| Pr03_Tele02 | Tele02 | Teleostei | 12S | Taberlet et al. (2018) | 167 (6.62) | 63 | AAACTCGTGCCAGCCACC | GGGTATCTAATCCCAGTTTG |
| Pr04_MiFish-U | MiFish-U | Teleostei | 12S | Miya et al. (2015) | 171 (5.67) | 60 | GTCGGTAAAACTCGTGCCAGC | CATAGTGGGGTATCTAATCCCAGTTTG |
| Pr05_12SF1/R1 | 12SF1/R1 | Vertebrata | 12S | Riaz et al. (2011) | 106 (5.38) | 60 | ACTGGGATTAGATACCCC | TAGAACAGGCTCCTCTAG |
| Pr06_Ac12S | Ac12S | Actinopterygii | 12S | Evans et al. (2016) | 389 (8.15) | 65 | ACTGGGATTAGATACCCCACTATG | GAGAGTGACGGGCGGTGT |
| Pr07_AcMDB | AcMDB07 | Actinopterygii | 12S | Bylemans et al. (2018) | 281 (7.63) | 60 | GCCTATATACCGCCGTCG | GTACACTTACCATGTTACGACTT |
| Pr08_Fish16S | Fish16S | Fish | 16S | Shaw et al. (2016) | 68 (8.27) | 60 | GGTCGCCCCAACCRAAG | CGAGAAGACCCTWTGGAGCTTIAG |
| Pr09_16SF/D | Fish16SF/D-2R | Fish | 16S | DiBattista et al. (2017) | 203 (18.65) | 60 | GACCCTATGGAGCTTTAGAC | CGCTGTTATCCCTADRGTAACT |
| Pr10_L2513 | L2513/H2714 | Vertebrata | 16S | Kitano et al. (2007) | 202 (6.93) | 60 | GCCTGTTTACCAAAAACATCAC | CTCCATAGGGTCTTCTCGTCTT |
| Pr11_Ve16S | Ve16S | Vertebrata | 16S | Evans et al. (2016) | 312 (25.91) | 63 | CGAGAAGACCCTATGGAGCTTA | AATCGTTGAACAAACGAACC |
| Pr12_Ac16S | Ac16S | Actinopterygii | 16S | Evans et al. (2016) | 336 (14.47) | 63 | CCTTTTGCATCATGATTTAGC | CAGGTGGCTGCTTTTAGGC |
| Pr13_Vert-16S | Vert-16S | Vertebrata | 16S | Vences et al. (2016) | 264 (34.49) | 65 | AGACGAGAAGACCCYdTGGAGCTT | GATCCAACATCGAGGTCGTAA |
| Pr14_FishCB | FishCBL/CBR | Fish | Cytb | Thomsen et al. (2012) | 90 (0.28) | 60 | TCCTTTTGAGGCGCTACAGT | GGAATGCGAAGAATCGTGTT |
| Pr15_Fish2b | Fish2bCBR/CBL | Fish | Cytb | Thomsen et al. (2012) | 40 (0.03) | 60 | GATGGCGTAGGCAAACAAGA | ACAACTTCACCCCTGCAAAC |
| Pr16_Fish2deg | Fish2degCBL/CBR | Fish | Cytb | Thomsen et al. (2012) | 40 (0.02) | 58 | ACAACTTCACCCCTGCRAAY | GATGGCGTAGGCAAATAGGA |
| Pr17_L14912 | L14912/H15149 | Teleostei | Cytb | Miya and Nishida (2000) | 235 (0.47) | 54 | TTCCTAGCCATACAYTAYAC | GGTGGCKCCTCAGAAGGACATTTGKCCYCA |
| Pr18_L14841 | L14841/H15149 | Vertebrata | Cytb | Kocher et al. (1989) | 307 | 53 | AAAAAGCTTCCATCCAACATCTCAGCATGATGAAA | AAACTGCAGCCCCTCAGAATGATATTTGTCCTCA |
| Pr19_L14735c | L14735/H15149c | Vertebrata | Cytb | Burgener and Hübner (1998) | 413 (5.85) | 57 | AAAAACCACCGTTGTTATTCAACTA | GCCCCTCAGAATGATATTTGTCCTCA |
| Pr20_L14735c2 | L14735/H15149c2 | Vertebrata | Cytb | Hänfling et al. (2016) | 413 (5.84) | 57 | AAAAACCACCGTTGTTATTCAACTA | GCDCCTCARAATGAYATTTGTCCTCA |
| Pr21_PS1 | PS1 | Fish | COI | Balasingham et al. (2018) | 247 | 52 | ACCTGCCTGCCGTATTTGGYGCYTGRGCCGGRATAGT | ACGCCACCGAGCCARAARCTYATRTTRTTYATTCG |
| Pr22_Minibar | Uni-Minibar | Eukaryota | COI | Meusnier et al. (2008) | 127 (4.29) | 53 | TCCACTAATCACAARGATATTGGTAC | GAAAATCATAATGAAGGCATGAGC |

  • ᵃ Target group as described in the original design paper.
  • ᵇ Mean and standard deviation (in parentheses) of the amplicon size (excluding primers) estimated by the in silico PCR analysis, except for Pr18_L14841 and Pr21_PS1, which yielded no sequences in our in silico PCR; their mean amplicon sizes were taken from the original design papers.
  • ᶜ Optimal annealing temperature determined in the present study.
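Primer degeneracy, raised in the Introduction as a trade-off between taxonomic coverage and specificity, can be quantified directly from the sequences in Table 1 as the number of distinct oligonucleotides a degenerate primer represents. The sketch below assumes standard IUPAC codes and treats inosine (I) as a single base analogue that adds no sequence variants; the function names are illustrative.

```python
from itertools import product
from math import prod

# IUPAC code -> bases it stands for; inosine (I) is one molecule, so it adds no variants
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
    "I": "I",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a (possibly degenerate) primer."""
    return prod(len(IUPAC[base]) for base in primer)


def expand(primer: str) -> list:
    """All concrete sequences represented by a degenerate primer."""
    return ["".join(variant) for variant in product(*(IUPAC[base] for base in primer))]
```

For example, `degeneracy` gives 4 for the Pr16_Fish2deg forward primer, 8 for the Pr17_L14912 reverse primer and 16 for the Pr21_PS1 forward primer; one of the four variants returned by `expand` for Pr16_Fish2deg (ACAACTTCACCCCTGCAAAC) is identical to the non-degenerate Pr15_Fish2b reverse primer.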

PCR products of each primer set were pooled separately for sequencing library construction. The 24 PCR products (18 eDNA PCRs and six negative controls) of each primer set were mixed in equal volumes and purified using an EasyPure PCR Purification Kit (TransGen Biotech), and DNA concentrations were measured on a NanoDrop 2000 spectrophotometer (Thermo Scientific). Further quality and quantity examination, library preparation and paired-end sequencing were carried out by the BGI Sequencing Service in Wuhan, China. All sequencing libraries were constructed using PCR-free adaptor ligation and BGI's own index system. In total, 23 Illumina sequencing libraries were constructed: 21 amplified using the different primer sets and two additional libraries amplified using primers with blocking oligos. Libraries containing PCR products under 220 bp (including primers) were sequenced on an Illumina HiSeq X Ten platform (Illumina Inc.), and libraries containing PCR products over 220 bp were sequenced on an Illumina HiSeq 2500 platform (Table S1). We used two sequencing platforms because the HiSeq X Ten is an ultra-high-throughput sequencer that generates data at considerably lower time and cost than other HiSeq platforms, but its maximum read length is restricted to ~150 nucleotides (maximum length of a paired-end sequence ~300 bp). The HiSeq 2500, in contrast, can generate reads of up to 250 nucleotides from each end (maximum length of a paired-end sequence ~500 bp), but its per-base cost at BGI is over eight times that of the X Ten. Library preparation protocols were identical for the two sequencing platforms, except for the size of the target DNA.
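Why amplicon length dictated the platform can be checked with simple arithmetic: two reads of length L taken from opposite ends of a product of length P overlap by 2L − P nucleotides, and joining them into one sequence requires a positive overlap. The sketch below assumes that P counts the amplicon plus both primers but ignores the 9-nt 'CC' + tag additions on each end (which would add a further 18 nt); the 20-nt minimum overlap in `can_join` is a hypothetical threshold, not a value from the study.

```python
def paired_end_overlap(product_len: int, read_len: int) -> int:
    """Overlap (nt) between forward and reverse reads covering one product.

    Negative values mean the two reads do not meet, so they cannot be joined.
    """
    return 2 * read_len - product_len


def can_join(product_len: int, read_len: int, min_overlap: int = 20) -> bool:
    # min_overlap is a hypothetical threshold chosen for illustration
    return paired_end_overlap(product_len, read_len) >= min_overlap
```

For a 220-bp product, 150-nt X Ten reads overlap by 80 nt. For Pr19_L14735c (a 413-bp amplicon plus 25- and 26-nt primers, i.e. 464 bp), 150-nt reads fall 164 nt short of spanning the product, whereas 250-nt HiSeq 2500 reads overlap by 36 nt.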

2.5 Bioinformatic processing of sequencing data

The raw sequencing reads were analysed using the OBITools software package (Boyer et al., 2016), which is commonly used for bioinformatic filtering of metabarcoding data (Bylemans et al., 2018; Valentini et al., 2016; West et al., 2020). We used the ILLUMINAPAIREDEND program to align forward and reverse reads, and the OBIGREP program to remove aligned sequences with low (<40) quality scores. Sequences with no mismatches in tags and a maximum of two mismatches in primers were identified using the NGSFILTER program and retained for further analysis. The OBIUNIQ program was used to dereplicate identical sequences. Sequences shorter than 15 bp or with a total count in the sequence library of <10 were removed using the OBIGREP program. Putative PCR and sequencing errors were detected and removed using the OBICLEAN program, which identifies putative chimeras and erroneous sequences (e.g. indel or substitution errors) based on sequence abundance and similarity to the most common sequences (Boyer et al., 2016). We used an abundance ratio of 0.5 and a sequence difference of 1 to assign the status of ‘head’, ‘internal’ or ‘singleton’ to each sequence within a PCR, and discarded ‘internal’ sequences, as they most likely represented errors (De Barba et al., 2014). To eliminate sequences likely introduced by cross-contamination or tag jumps (Schnell et al., 2015), the larger count of each sequence in the two negative control PCRs of a primer set was subtracted from that sequence's count in each eDNA PCR.
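Two of these filters, the library-wide length/count filter and the negative-control subtraction, can be sketched as follows. This is a simplified stand-in, not OBITools code: it assumes read counts are held in nested dictionaries keyed by PCR tag and then by sequence, counts control PCRs towards the library total (the text does not say otherwise), and skips the OBICLEAN step that sits between the two filters in the actual pipeline. The function name is hypothetical.

```python
from collections import defaultdict


def filter_library(edna: dict, controls: dict, min_len: int = 15, min_total: int = 10) -> dict:
    """edna, controls: {pcr_tag: {sequence: read_count}} for one primer set's library."""
    # 1. Total count of each sequence across the whole library (eDNA + control PCRs)
    totals = defaultdict(int)
    for pcr in list(edna.values()) + list(controls.values()):
        for seq, count in pcr.items():
            totals[seq] += count
    keep = {s for s, t in totals.items() if len(s) >= min_len and t >= min_total}

    # 2. Largest count of each sequence in any negative-control PCR
    max_blank = defaultdict(int)
    for pcr in controls.values():
        for seq, count in pcr.items():
            max_blank[seq] = max(max_blank[seq], count)

    # 3. Subtract that count from every eDNA PCR and drop non-positive results
    cleaned = {}
    for tag, pcr in edna.items():
        cleaned[tag] = {s: c - max_blank[s] for s, c in pcr.items()
                        if s in keep and c - max_blank[s] > 0}
    return cleaned
```

A sequence seen 8 times in one control PCR and 5 times in the other therefore loses 8 reads in every eDNA PCR, and disappears from any PCR where it had 8 reads or fewer.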

Each unique sequence was searched against the GenBank nucleotide database using BLAST and assigned to a broad taxonomic group according to its highest-scoring match (max score). We considered the following groups: mammal (excluding human), bird, reptile, amphibian, fish, other chordate, invertebrate, protist, plant, fungi, non-eukaryote, other root (sequences not belonging to the other groups) and human.

Fish taxonomic assignments were further refined using records of indigenous and introduced species in the study area (Xiong et al., 2015; Zhang et al., 2011, 2016). To increase the accuracy of taxonomic assignments, we excluded sequences whose best BLAST match had <97% identity. A unique molecular operational taxonomic unit (MOTU; Blaxter, 2004) was assigned to each fish sequence using the following criteria: (a) if the query sequence matched a single species with the max score, that species was assigned; (b) if the query matched more than one species with the max score, it was assigned, based on knowledge of species distributions, to the lowest taxonomic level that included all locally occurring and introduced species with the highest identity scores. To avoid misidentification of taxa due to insufficient local species records, we kept taxonomic assignments conservative and only excluded species with no known occurrence anywhere in North China. Sequences with the same taxonomic assignments were combined for each primer set.
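The assignment rules above can be sketched as a small function. It is an illustration under stated assumptions, not the authors' code: BLAST hits are given as (species, percent identity, score) tuples, lineages as (order, family, genus, species) tuples, and the regional species list in the example is hypothetical. When none of the top-scoring species is regional, the sketch falls back to the lowest shared rank of all top hits, a case the text does not specify.

```python
def lowest_common_rank(species, lineages):
    """Deepest rank shared by all lineages (tuples ordered from order to species)."""
    shared = None
    for ranks in zip(*(lineages[s] for s in species)):
        if len(set(ranks)) != 1:
            break
        shared = ranks[0]
    return shared


def assign_motu(hits, lineages, regional, min_identity=97.0):
    """hits: (species, percent_identity, score) tuples for one query sequence."""
    if not hits:
        return None
    best = max(score for _, _, score in hits)
    top = [(sp, ident) for sp, ident, score in hits if score == best]
    # Exclude sequences whose best match falls below the identity cut-off
    if max(ident for _, ident in top) < min_identity:
        return None
    species = sorted({sp for sp, _ in top})
    local = [sp for sp in species if sp in regional]
    # Assumption: if no top hit is regional, fall back to all top hits
    candidates = local or species
    if len(candidates) == 1:
        return candidates[0]  # rule (a), or (b) when only one candidate occurs locally
    return lowest_common_rank(candidates, lineages)  # rule (b)
```

With a hypothetical regional list containing *Carassius auratus*, *C. gibelio* and *Cyprinus carpio*, a query matching the two *Carassius* species equally well is assigned to the genus *Carassius*, whereas a tie between *C. auratus* and a non-regional *Carassius* species resolves to *C. auratus*.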

2.6 Statistical analysis of sequencing data

To evaluate the effects of sequencing depth on the number of detected fish taxa, rarefaction curves were constructed for each primer set with increasing sequencing depth (i.e. number of sequence reads) using the RARECURVE function in the R package vegan (Oksanen et al., 2019). To obtain an equal sequencing depth for all primers, the sequence reads of each uniquely tagged PCR were rarefied to 48,000 (the minimum number of reads per tag was 48,308) using the R package GUniFrac (Chen et al., 2012).
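Both operations in this paragraph can be stated explicitly. The sketch below computes the expected number of taxa in a random subsample of n reads (the standard rarefaction expectation, evaluated with log-gamma so that read counts in the tens of thousands do not produce enormous integers) and draws one random subsample without replacement to a fixed depth such as 48,000 reads. It is an illustration, not the vegan or GUniFrac code, and the function names are hypothetical.

```python
import random
from collections import Counter
from math import exp, lgamma


def expected_richness(counts, n):
    """Expected number of taxa in a random subsample of n reads drawn without
    replacement from a PCR with the given per-taxon read counts."""
    total_reads = sum(counts)
    if not 0 <= n <= total_reads:
        raise ValueError("subsample size must be between 0 and the total read count")
    richness = 0.0
    for c in counts:
        if c <= 0:
            continue
        if total_reads - c < n:
            p_absent = 0.0  # any subsample of n reads must include this taxon
        else:
            # C(N - c, n) / C(N, n): probability that the taxon is missed entirely
            p_absent = exp(lgamma(total_reads - c + 1) + lgamma(total_reads - n + 1)
                           - lgamma(total_reads - c - n + 1) - lgamma(total_reads + 1))
        richness += 1.0 - p_absent
    return richness


def rarefy_counts(counts, depth, seed=None):
    """One random subsample of `depth` reads without replacement; counts is {taxon: reads}."""
    pool = [taxon for taxon, c in counts.items() for _ in range(c)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads")
    return dict(Counter(random.Random(seed).sample(pool, depth)))
```

A rarefaction curve for one PCR is then `[expected_richness(list(reads.values()), n) for n in depths]` over increasing depths, and `rarefy_counts(reads, 48000)` produces one equal-depth table.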

Fish diversity data detected in eDNA metabarcoding analysis were summarised for each primer set both qualitatively, as a binary presence (i.e. a taxon detected in one or more replicate PCRs) or absence (i.e. a taxon not detected in any of the replicate PCRs) matrix, and quantitatively, as proportional read abundances (i.e. percentage of reads for each fish taxon, averaged across replicate PCRs). Dissimilarities between the qualitative and quantitative fish diversity detected with different primers were visualised by non-metric multidimensional scaling (NMDS) analysis using the presence/absence-based Jaccard coefficient and the proportional read abundance-based Bray–Curtis coefficient respectively. Both the Jaccard and Bray–Curtis coefficients range from 0 to 1, with a value of 0 indicating completely identical community compositions and a value of 1 indicating that the communities share no common taxa. Calculation of the coefficients and NMDS analysis were conducted using the VEGDIST and METAMDS commands in vegan respectively. All statistical analyses were performed in R.
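For reference, the two dissimilarity coefficients can be written in a few lines: Jaccard dissimilarity on presence/absence sets, 1 − |A ∩ B| / |A ∪ B|, and Bray–Curtis on proportional read abundances, Σ|xᵢ − yᵢ| / Σ(xᵢ + yᵢ). The sketch is illustrative (hypothetical function names and toy data) and does not reproduce VEGDIST's options.

```python
def jaccard_dissimilarity(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two presence/absence sets of taxa."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty communities are treated as identical
    return 1.0 - len(a & b) / len(union)


def bray_curtis(x, y):
    """sum|x_i - y_i| / sum(x_i + y_i) for {taxon: proportional abundance} dicts."""
    taxa = set(x) | set(y)
    num = sum(abs(x.get(t, 0.0) - y.get(t, 0.0)) for t in taxa)
    den = sum(x.get(t, 0.0) + y.get(t, 0.0) for t in taxa)
    return num / den if den else 0.0
```

Both functions return 0 for identical communities and 1 for communities with no taxa in common, matching the ranges described above; for example, two primer sets sharing one of three detected taxa have a Jaccard dissimilarity of 2/3.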

3 RESULTS

3.1 Literature review and primer search

From the literature search, we identified a total of 22 primer sets (coded Pr01‒Pr22) that have been used in metabarcoding studies of freshwater or marine fish. The primers, amplicon lengths and references for their original design and applications in metabarcoding analysis are shown in Table 1. A summary of the detailed primer information and amplification conditions is provided in Table S1. These primers were designed to target fish in general (Actinopterygii or Teleostei; 13/22) or originally designed for all eukaryotes or vertebrates (9/22; Table 1). The target sequences of the primers all resided within four genes of mitochondrial DNA, including 12S rRNA (‘12S’; n = 7), 16S rRNA (‘16S’; n = 6), cytochrome b (‘Cytb’; n = 7) and cytochrome c oxidase I (‘COI’; n = 2).

The locations of the target sequences also varied among primer sets. While the 12S, 16S and Cytb primer sets were distributed along most of the length of their respective genes, both COI primer sets targeted the 5' end of the gene (Figure 1). Some primer sets shared primer binding sites (with slightly different primer lengths or degenerate nucleotides) and targeted almost identical sequences, such as Pr01_12SV5 and Pr05_12SF1/R1; Pr03_Tele02 and Pr04_MiFish-U; Pr15_Fish2b and Pr16_Fish2deg; and Pr19_L14735c and Pr20_L14735c2. The target sequences of some primer sets overlapped or were nested within each other, such as Pr01_12SV5/Pr05_12SF1/R1 and Pr06_Ac12S; Pr08_Fish16S, Pr09_16SF/D, Pr13_Vert-16S and Pr11_Ve16S; Pr17_L14912, Pr18_L14841 and Pr19_L14735c/Pr20_L14735c2; and Pr22_Minibar and Pr21_PS1 (Figure 1). We included all 22 primer sets in the in silico PCR analysis.

Figure 1. Locations of the 22 fish metabarcoding primer pairs and amplicons on the target mitochondrial genes: (a) 12S, (b) 16S, (c) Cytb and (d) COI. Gene sequences of the common carp Cyprinus carpio (GenBank Acc. No. MK088487) were used as templates. Note that amplicon sizes may vary among fish species.

3.2 In silico PCR analysis

In silico PCR against all standard sequences in the EMBL database revealed considerable differences among primers in taxonomic specificity and coverage, recovered species richness and taxonomic resolution. Pr18_L14841 and Pr21_PS1 amplified no sequences, possibly because of their exceptionally long primers. For the other 20 primer sets, the number of amplified taxa ranged from 947 to 61,757 (Table S2 and interactive Krona plots; Zhang et al., 2020). All 20 primer sets amplified both fish and non-fish taxa, but the proportion of fish taxa (all fish species included) varied considerably, from 1.4% (Pr10_L2513) to 83.7% (Pr08_Fish16S; Figure 2). The overall proportions of fish sequences were relatively consistent among 12S (range: 35.5%‒49.5%) and Cytb (42.7%‒62.5%) primers but varied markedly among 16S primers (1.4%‒83.7%). Sequences of mammals, birds, reptiles, invertebrates and prokaryotes were frequently amplified, and some primers amplified mainly non-fish taxa. For instance, Pr10_L2513 mainly amplified mammalian taxa (81.5%), and invertebrate taxa accounted for large proportions of the amplicons of Pr13_Vert-16S (47.8%) and Pr22_Minibar (57.2%; Figure 2).
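The core idea of in silico PCR can be sketched as matching each primer, including its degenerate IUPAC positions, against a reference sequence with a small mismatch allowance and returning the region between the two primer sites. The Python sketch below is a deliberately simplified, hypothetical illustration rather than the tool used in this study: it searches only one strand, takes the first forward-primer site found, uses an arbitrary default of two mismatches, and ignores the 3'-end weighting and melting-temperature considerations that dedicated in silico PCR software may apply.

```python
from typing import Optional

# IUPAC codes: the set of template bases each primer position accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement, preserving degenerate IUPAC codes."""
    return seq.translate(COMPLEMENT)[::-1]


def mismatches(primer: str, site: str) -> int:
    """Number of positions where the template base is not allowed by the primer."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, site))


def find_site(primer: str, template: str, max_mm: int, start: int = 0) -> Optional[int]:
    """Index of the first window at or after `start` with <= max_mm mismatches."""
    for i in range(start, len(template) - len(primer) + 1):
        if mismatches(primer, template[i:i + len(primer)]) <= max_mm:
            return i
    return None


def in_silico_pcr(template: str, fwd: str, rev: str, max_mm: int = 2,
                  min_len: int = 10, max_len: int = 1000) -> Optional[str]:
    """Return the amplicon insert (primers trimmed), or None if no product.

    The reverse primer is written 5'->3' on the opposite strand, so its reverse
    complement is searched for downstream of the forward primer site.
    """
    template = template.upper()
    f = find_site(fwd, template, max_mm)
    if f is None:
        return None
    r = find_site(revcomp(rev), template, max_mm, start=f + len(fwd))
    if r is None:
        return None
    insert = template[f + len(fwd):r]
    return insert if min_len <= len(insert) <= max_len else None
```

On a toy template built from `TTT`, `GGTACC`, twelve `A`s, `GATCGA` and `TTT`, the hypothetical primers `GGYACC` (forward, with a degenerate Y) and `TCGATC` (reverse) yield the 12-bp poly-A insert, while a template lacking both sites yields no product.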

Figure 2. Taxonomic distributions of sequences amplified in the in silico PCR analysis using the fish metabarcoding primer sets against all standard sequences in the EMBL database. Taxonomic designations follow the NCBI taxonomy, and ‘Other’ indicates sequences not assigned to known taxonomic groups.

To evaluate amplification performance for fish specifically, in silico PCR was also performed against standard sequences of Actinopterygii species, of which 28,539 were recovered from the database. Except for Pr18_L14841 and Pr21_PS1, which amplified no sequences, the primers amplified 22 (Pr10_L2513) to 10,830 (Pr13_Vert-16S) species, and the number of unambiguously identified species ranged from 18 (Pr10_L2513) to 7,081 (Pr13_Vert-16S), representing 22.9%‒83.6% of the amplified Actinopterygii species (Figure 3; Table S3).
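Taxonomic resolution, as reported above, is the share of amplified species that can be unambiguously identified. One simple way to operationalise this, assumed here and not necessarily the criterion used in the analysis, is to count a species as resolved only when none of its amplicon sequences is identical to an amplicon of another species. The function names and toy data below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def resolved_species(amplicons: Dict[str, Set[str]]) -> Set[str]:
    """Species whose amplicon sequences are all unique to that species.

    `amplicons` maps each amplified species to the set of amplicon sequences
    recovered for it from the reference database.
    """
    owners: Dict[str, Set[str]] = defaultdict(set)
    for species, seqs in amplicons.items():
        for seq in seqs:
            owners[seq].add(species)
    return {
        species for species, seqs in amplicons.items()
        if seqs and all(len(owners[seq]) == 1 for seq in seqs)
    }


def resolution_rate(amplicons: Dict[str, Set[str]]) -> float:
    """Fraction of amplified species that are unambiguously identified."""
    amplified = [sp for sp, seqs in amplicons.items() if seqs]
    if not amplified:
        return 0.0
    return len(resolved_species(amplicons)) / len(amplified)
```

Applied to the counts above for Pr13_Vert-16S, 7,081 resolved out of 10,830 amplified species corresponds to a resolution rate of roughly 65%.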

Figure 3. Taxonomic coverage and species resolution within Actinopterygii estimated by the in silico analysis for the fish metabarcoding primer sets. The total number of species amplified and the number of distinct species that could be unambiguously assigned are shown.

Fish amplicon sizes, estimated by in silico PCR (except for Pr18 and Pr21, whose amplicon sizes were taken from the original design papers), also varied widely. Average amplicon lengths ranged from 40 to 412 bp: six primer sets (Pr01, Pr02, Pr08, Pr14, Pr15 and Pr16) amplified fragments shorter than 100 bp, four (Pr03, Pr04, Pr05 and Pr22) fragments of approximately 100–200 bp, six (Pr07, Pr09, Pr10, Pr13, Pr17 and Pr21) fragments of 200–300 bp and six (Pr06, Pr11, Pr12, Pr18, Pr19 and Pr20) fragments of 300–420 bp (Table 1).

3.3 eDNA metabarcoding analysis

Sequencing data were successfully obtained for all primer sets tested in the in vitro PCR and metabarcoding analysis. Approximately 1.06‒4.05 Gb of raw data were generated from each of the 23 sequencing libraries (see Section 2). After read pairing, clustering and quality filtering, 356,162‒10,954,329 sequence reads were retained per library, with an average of 447,441 ± 402,824 reads (mean ± SD; range: 59,360‒1,696,670) per uniquely tagged PCR containing eDNA template. The number of unique sequences per library ranged from 58 to 2,990. Read counts of the negative control PCRs averaged 2% of those of the corresponding eDNA PCRs but varied considerably across libraries (<1.0E−6‒22.8%). Most of these reads were of non-fish origin: fish reads in the negative controls averaged 0.02% (range: <1.0E−6‒0.1%) of those in the corresponding eDNA PCRs. Read counts after each filtering step are summarised for each sequencing library in Table 2 and Table S4, and detailed read data for individual PCRs are available from the Dryad Digital Repository (Zhang et al., 2020).

Table 2. Summary of Illumina read counts and unique sequences following each bioinformatics filtering step
Primer set | illuminapairedend reads | ngsfilter reads | obiuniq seqs | obiclean seqs | obiclean reads | Average reads per PCR | Fish seq (a) | Fish taxa (b)
Pr01_12SV5 | 6,813,530 | 5,968,766 | 151,654 | 1,578 | 4,359,954 | 713,498 | 1,162 | 52
Pr01_12SV5-blk | 6,521,380 | 5,649,616 | 172,508 | 1,840 | 4,066,989 | 674,863 | 1,155 | 50
Pr02_Tele01 | 5,676,000 | 4,233,160 | 48,056 | 692 | 3,482,684 | 565,270 | 601 | 50
Pr02_Tele01-blk | 4,797,908 | 3,479,559 | 5,005 | 511 | 2,861,212 | 476,298 | 487 | 46
Pr03_Tele02 | 9,661,414 | 8,365,152 | 699,635 | 2,633 | 4,995,621 | 832,331 | 2,452 | 56
Pr04_MiFish-U | 7,549,436 | 6,020,313 | 429,961 | 2,898 | 3,622,312 | 603,543 | 2,808 | 62
Pr05_12SF1/R1 | 4,941,575 | 4,377,441 | 136,856 | 1,469 | 3,068,448 | 499,465 | 1,177 | 53
Pr06_Ac12S | 2,114,179 | 1,695,303 | 886,819 | 2,732 | 412,185 | 66,598 | 2,608 | 66
Pr07_AcMDB | 2,114,158 | 1,624,734 | 488,463 | 2,990 | 678,232 | 112,994 | 2,914 | 66
Pr08_Fish16S | 9,699,109 | 8,518,926 | 70,528 | 728 | 6,933,469 | 1,142,364 | 605 | 51
Pr09_16SF/D | 2,114,147 | 1,822,099 | 93,785 | 320 | 1,152,637 | 192,058 | 297 | 16
Pr11_Ve16S | 2,114,144 | 1,802,625 | 460,976 | 1,879 | 704,030 | 117,338 | 1,806 | 45
Pr12_Ac16S | 2,114,177 | 1,745,236 | 395,516 | 1,135 | 606,751 | 101,125 | 1,120 | 34
Pr13_Vert-16S | 2,114,152 | 873,698 | 217,280 | 1,371 | 356,162 | 59,360 | 1,268 | 51
Pr14_FishCB | 3,833,686 | 3,538,200 | 25,158 | 58 | 2,960,388 | 493,395 | 56 | 11
Pr15_Fish2b | 13,511,011 | 12,443,896 | 28,841 | 468 | 10,954,329 | 1,696,670 | 51 | 12
Pr16_Fish2deg | 4,540,291 | 4,098,848 | 14,573 | 258 | 3,443,501 | 543,332 | 52 | 30
Pr17_L14912 | 2,114,140 | 1,678,769 | 180,245 | 1,348 | 1,025,852 | 170,971 | 748 | 36
Pr18_L14841 | 2,114,163 | 1,592,968 | 277,459 | 442 | 739,083 | 123,181 | 348 | 24
Pr19_L14735c | 2,114,172 | 1,620,669 | 696,721 | 1,051 | 431,259 | 71,877 | 1,042 | 28
Pr20_L14735c2 | 2,114,240 | 1,681,971 | 741,346 | 695 | 404,242 | 67,374 | 679 | 31
Pr21_PS1 | 2,114,166 | 1,606,006 | 149,175 | 1,268 | 1,035,289 | 172,514 | 1,082 | 51
Pr22_Minibar | 8,859,336 | 7,118,022 | 351,347 | 2,574 | 4,769,179 | 794,731 | 1 | 0

Note

  • ‘-blk’ indicates the addition of human DNA-blocking oligonucleotides to the PCR.
  • (a) All unique fish sequences yielded by OBITools.
  • (b) Fish taxa identified after BLAST searches, merging of identical taxa and rarefaction of all PCRs to the same number of reads.

We first compared the taxonomic coverage of the primer sets across all taxa, analysing human sequences separately from other mammalian sequences to examine their amplification and the effect of blocking oligos (see Section 2). Fish sequences made up the majority (over 50%) of the amplicons of most primers. The exceptions were Pr18_L14841 and Pr22_Minibar, which predominantly amplified non-fish taxa; fish sequences accounted for only 29% and <0.001% of their amplicons, respectively (Table S5). Non-fish vertebrate (mammal, bird, reptile and amphibian) sequences were often amplified but at low proportions. On average, human DNA accounted for no more than 1% of the PCR products of most primers, the exceptions being Pr03_Tele02 (11.0%) and Pr18_L14841 (44.9%). Adding human DNA-blocking oligos to the PCRs significantly reduced the mean proportion of human sequences in the PCR products, from 0.265% to 0.0268% with Pr01_12SV5 and from 3.43% to 0.0107% with Pr02_Tele01 (Figure 4; Table S5).

Figure 4. Taxonomic distributions of sequences amplified using the fish metabarcoding primer sets in the in vitro metabarcoding analysis of eDNA from Beijing water samples. ‘Other’ indicates sequences not assigned to known taxonomic groups. Percentages represent the mean values of six uniquely tagged PCRs for each primer set. ‘-blk’ indicates the addition of human DNA-blocking oligonucleotides to the PCR.

Combining the results of all tested primer sets, we detected 202 fish taxa: 164 species, 24 genera, two subfamilies, five families, two suborders and five taxa above the order level. The detected species comprised 50 species previously documented in Beijing (Zhang et al., 2011), 70 non-local Chinese species and 44 foreign species. Limited sampling of aquatic habitats, incomplete reference databases for local species and the rapid disappearance of native fish may explain why our eDNA survey recovered fewer native species than historical surveys. The newly detected species most likely represent recent intentional and accidental introductions through the aquarium trade, aquaculture and biological control, as well as through fishery transport and water facility construction (Lin et al., 2015; Xiong et al., 2015).

Rarefaction curves of the number of detected taxa against increasing sequence reads showed that our sequencing depths were sufficient to recover the amplified taxa for all primer sets (Figure S1). To standardise sequencing depth across primers, we rarefied all uniquely tagged PCRs to the same number of reads before analysing fish taxa (see Section 2). No sequence amplified with Pr22_Minibar matched a fish sequence with ≥97% identity, so this primer set was excluded from further analysis. The remaining 20 primer sets amplified between 11 (Pr14_FishCB) and 66 (Pr06_Ac12S and Pr07_AcMDB) fish taxa, considering all taxonomic levels within Actinopterygii (Figure 5a; Table 2; Table S6). Primers targeting 12S showed good overall detection of fish: all seven 12S primer sets amplified 46‒66 fish taxa. Cytb primers generally detected fewer fish taxa than other primers; Pr14_FishCB and Pr15_Fish2b amplified only 11 and 12 fish taxa, respectively, and the highest number of fish taxa amplified by any Cytb primer set was only 36 (Pr17_L14912). Species-level assignments accounted for 63.6%‒87.1% of all fish taxa detected by each primer set (Figure 5a; Table S7).
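Rarefying to identical read counts, as described above, can be sketched as random subsampling of reads without replacement down to the smallest library size. This Python sketch is an illustrative assumption, not the pipeline used in the study; the input format (a taxon-to-read-count dict per PCR) and the function names are hypothetical. Expanding counts into a list of individual reads keeps the code simple but uses memory proportional to library size.

```python
import random
from collections import Counter
from typing import Dict


def rarefy(counts: Dict[str, int], depth: int, rng: random.Random) -> Dict[str, int]:
    """Subsample `depth` reads without replacement from one PCR's taxon counts."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds library size {total}")
    # One list entry per read (simple, but memory grows with library size).
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(reads, depth)))


def rarefy_all(pcrs: Dict[str, Dict[str, int]], seed: int = 0) -> Dict[str, Dict[str, int]]:
    """Rarefy every PCR to the read count of the smallest one."""
    rng = random.Random(seed)
    depth = min(sum(c.values()) for c in pcrs.values())
    return {name: rarefy(c, depth, rng) for name, c in pcrs.items()}
```

Because subsampling is random, taxon counts (and occasionally the presence of rare taxa) can differ between runs; fixing the seed makes a run reproducible.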

Figure 5. (a) Distribution of taxonomic classification levels of fish taxa and (b) number of taxa by fish order detected using the primer sets in the in vitro metabarcoding analysis. ‘-blk’ indicates the addition of human DNA-blocking oligonucleotides to the PCR.

When fish taxa were grouped by order, Cypriniformes consistently accounted for the largest share of taxa (48.0%‒81.8%) for all primers except Pr09_16SF/D (0%; Figure 5b). Gobiiformes was overall the second most abundant order. Other frequently detected orders included Siluriformes, Anabantiformes and Synbranchiformes (Figure 5b; Table S7). Proportional read abundances of fish taxa were highly similar across replicate PCRs (Figure S2), indicating consistent amplification performance of each primer set.

Different primer sets generated different profiles of fish diversity from the same eDNA template, both qualitatively and quantitatively, as shown by the presence/absence-based Jaccard and proportional read-based Bray–Curtis matrices (Figure 6; see also Figures S3 and S4 for the proportions of unique and overlapping taxa detected by each pair of primer sets). Primers targeting different genes showed no clear separation in either the qualitative or the quantitative plot, and primers targeting the same gene varied considerably, although the 12S primers showed lower variation in detected fish diversity than the 16S and Cytb primers. Primers targeting highly similar sequences detected fish diversities that were similar both qualitatively and quantitatively (e.g. Pr01_12SV5 and Pr05_12SF1/R1; Pr11_Ve16S and Pr13_Vert-16S), similar qualitatively but not quantitatively (Pr03_Tele02 and Pr04_MiFish-U), or dissimilar both qualitatively and quantitatively (Pr15_Fish2b and Pr16_Fish2deg; Pr19_L14735c and Pr20_L14735c2).
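The pairwise proportions of unique and overlapping taxa mentioned above can be computed from the presence/absence sets of two primer sets. In this hypothetical Python sketch, each proportion is taken relative to the pooled taxa of the pair; Figures S3 and S4 may use a different denominator, so it should be read as one possible definition rather than the one used in the study. Primer labels and taxa in the example are made up.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Overlap = Tuple[float, float, float]


def overlap(a: Set[str], b: Set[str]) -> Overlap:
    """(unique to a, unique to b, shared) as fractions of the pooled taxa."""
    pooled = a | b
    if not pooled:
        return (0.0, 0.0, 0.0)
    n = len(pooled)
    return (len(a - b) / n, len(b - a) / n, len(a & b) / n)


def pairwise_overlap(detected: Dict[str, Set[str]]) -> Dict[Tuple[str, str], Overlap]:
    """Overlap proportions for every pair of primer sets (names sorted)."""
    return {
        (p, q): overlap(detected[p], detected[q])
        for p, q in combinations(sorted(detected), 2)
    }
```

For instance, if primer_A detects taxa {x, y} and primer_B detects {y, z, w}, a quarter of the pooled taxa are unique to primer_A, half are unique to primer_B and a quarter are shared.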

Figure 6. Non-metric multidimensional scaling (NMDS) plots showing similarities of the fish communities detected using different primer sets in the in vitro metabarcoding analysis, based on the presence/absence-based Jaccard coefficient (qualitative) and the abundance-based Bray–Curtis coefficient (quantitative) of fish taxa.

4 DISCUSSION

Given the critical importance of metabarcoding primers in eDNA-based biodiversity detection, a priori understanding of primer characteristics and bias is pivotal for an informed choice of barcode and interpretation of sequence data. However, despite the increasingly wide application of eDNA metabarcoding in fish community surveys of diverse ecosystems (e.g. Cilleros et al., 2019; Hänfling et al., 2016; Miya et al., 2015; Valentini et al., 2016), very little is known regarding the influence of primer choice on survey results, either qualitatively or quantitatively. Our study presents the most extensive evaluation of published fish metabarcoding primers to date, allowing direct comparisons between the amplification performance of 22 primer sets both in silico and in vitro.

4.1 Primer performance: Taxonomic specificity

High taxonomic specificity for the target group is one of the crucial criteria for choosing metabarcoding primers. Inadequate specificity leads to excessive amplification of non-target sequences, which swamps the desired taxa and wastes sequencing throughput (Collins et al., 2019). Our results reveal that almost all fish primers amplified sequences of non-fish organisms, and some amplified mainly non-fish sequences (e.g. Pr10_L2513 and Pr13_Vert-16S in the in silico tests, and Pr22_Minibar in both the in silico and in vitro tests). Other vertebrates, including mammals, birds, reptiles and amphibians, often accounted for relatively large proportions of the in silico PCR products. Amplification of these groups may be undesirable in studies focused strictly on fish communities but can still provide valuable information on species diversity at the ecosystem level (Port et al., 2016).

Proportions of non-fish vertebrate sequences were generally lower in the in vitro PCR products than in the in silico PCR products across primers. The only exception was human DNA, which was present in large quantities in the in vitro PCR products of several primer sets. We extracted eDNA from water samples collected in urban environments in Beijing, which inevitably receive discharges containing human DNA. Amplification of human sequences from environmental samples by metabarcoding primers has been observed previously (Kelly et al., 2014; Miya et al., 2015), and several general fish primers have therefore been designed to be used with a blocking oligo that reduces human DNA amplification. Our in vitro PCRs using these primer sets show that the blocking oligos were highly effective in reducing the proportion of human sequences in the amplicons (Figure 4). However, adding blocking oligos slightly reduced the number of detected fish taxa for some primers (52 vs. 50 for Pr01_12SV5 and 50 vs. 46 for Pr02_Tele01; Figure 5), suggesting that, in addition to human DNA, blocking oligos may also inhibit primer binding to, or amplification of, certain desired sequences. We therefore suggest that, for primers that primarily amplify the target group but also amplify a small amount of human DNA, increasing sequencing depth may be preferable to adding blocking oligos as a way to compensate for non-target amplification.

4.2 Primer performance: Fish coverage and species resolution

Broad taxonomic coverage within the target group and high power of species-level assignment are prerequisites for generating comprehensive and accurate biodiversity data with metabarcoding primers. Among barcode genes, 12S appears to be a highly effective target: all seven 12S primer sets detected more than 45 fish taxa in the eDNA metabarcoding analysis, and the six primer sets that recovered the most fish taxa were all 12S primers (Figure 5). Among the 12S primer sets, Pr06_Ac12S and Pr07_AcMDB performed best in our study system in terms of total fish taxa detected and species-level assignment, and Pr04_MiFish-U and Pr03_Tele02 also showed outstanding detection of fish diversity. Pr01_12SV5 and Pr05_12SF1/R1 recovered slightly fewer taxa than these four primer sets but still outperformed most other primers. The 16S primer sets varied considerably in performance, detecting low to moderate numbers of fish taxa. Cytb primers generally recovered fewer fish taxa than the 12S and 16S primers, indicating that they may be less effective for fish surveys in Beijing waters. Despite being the classic animal barcode gene (Hebert et al., 2003), COI appears to be a less common target of fish metabarcoding primers than the other mitochondrial genes. Of the two general fish COI primer sets we found in the literature, Pr22_Minibar amplified only a small number of fish sequences in silico and failed to recover any fish sequences in vitro, whereas Pr21_PS1 detected 51 fish taxa in the eDNA metabarcoding analysis, tying for seventh among all primers with Pr08_Fish16S and Pr13_Vert-16S (Table 2). Therefore, although the abundance of COI sequences in reference databases would be expected to provide higher taxonomic coverage than other mitochondrial DNA regions, the available fish metabarcoding primers for COI did not appear to amplify many local species effectively.
Collins et al. (2019) likewise found that COI primers recovered fewer fish species than Pr04_MiFish-U from eDNA in freshwater and seawater samples collected around the English Channel and North Sea. They suggested that the high sequence variability within the standard COI barcode region may hinder the design of fish-specific degenerate primers for shorter amplicons, limiting the potential of COI for eDNA-based biodiversity detection.

Nevertheless, community composition and complexity can differ greatly among geographic regions and ecosystems, so primer performance observed in one study may not be fully transferable to another.

4.3 Comparison between in silico and in vitro PCR results

As discussed previously, in vitro PCR results may not mirror those of in silico PCR. In terms of taxonomic specificity, non-fish vertebrate sequences made up large proportions of the amplicons of most primers in the in silico analysis but considerably smaller proportions of the in vitro PCR products, most likely because non-fish vertebrate eDNA is scarce in water relative to fish eDNA. In terms of the total number of amplified fish taxa and species-level resolution, the in vitro results show both consistencies and discrepancies with the in silico results (compare Figures 3 and 5a). For example, Pr14_FishCB, Pr15_Fish2b, Pr19_L14735c, Pr20_L14735c2 and Pr22_Minibar performed relatively poorly in both the in silico and in vitro analyses, whereas Pr08_Fish16S, Pr09_16SF/D and Pr13_Vert-16S, the three primer sets that amplified the most fish sequences in silico, detected only low to moderate fish diversity in the eDNA metabarcoding analysis. The incongruence between the in silico and in vitro results may have multiple causes, including differences in taxonomic composition between the reference databases and the actual biological community of the study system, the disparity between simulation conditions and primer-binding thermodynamics in real PCR, and the quality and quantity of eDNA templates in in vitro PCR. Because systematic in vitro analysis of many primers is costly, time-consuming and not always feasible, in silico evaluation has been proposed as a useful alternative to bench testing. Our results suggest that in silico analysis can provide initial, tentative assessments of primer specificity and taxonomic coverage, but in vitro PCR evaluation is indispensable for fully understanding amplification performance and choosing reliable primers.

4.4 Primer performance: Barcode size

Because eDNA is typically present in low quantities and degraded, primers for short (typically <200 bp) barcode sequences are generally believed to offer greater amplification success. Examination of the size distribution of eDNA in water has shown that shorter mitochondrial DNA fragments are more abundant than longer ones (Bylemans et al., 2018), lending support to this notion. However, studies have also shown that longer eDNA fragments exist and can be amplified to provide more sequence information (Bylemans et al., 2018; Deiner et al., 2017). Yet few empirical studies have systematically investigated the effect of barcode size on metabarcoding outcomes using actual environmental samples. The two best-performing primer sets in our metabarcoding analysis, Pr06_Ac12S and Pr07_AcMDB, both target relatively long fragments (approximately 200‒420 bp; Table 1), yet provided the highest numbers of fish taxa and species-level assignments. Several other studies have also used long (500‒650 bp) barcodes to detect species from eDNA successfully (Deiner et al., 2015; Egan et al., 2013). Long barcodes may suffer from lower template concentrations but may offer greater taxonomic discriminatory power. Furthermore, the biodiversity complexity of the study system and the completeness of reference databases can complicate the effect of barcode size on taxonomic assignment: longer barcodes are better able to discern closely related species but are also at greater risk of lacking reference sequences. As reference databases expand rapidly, we expect primer performance to be constrained less by amplicon size and more by primer-binding characteristics.

4.5 Recommendations

Given the strong impact of primers on metabarcoding results, a priori evaluation of primer performance is pivotal in estimating the potential influences introduced by primer bias and choosing suitable primers to generate a comprehensive and reliable biodiversity archive. Although in silico PCR can be employed for the initial assessment of primer coverage and specificity, in vitro PCR using eDNA templates should be performed to understand primer performance in an actual metabarcoding study. The suitability of primers is dependent upon the biodiversity composition of the group of interest and the study ecosystem. Since different metabarcoding primers may show divergent taxonomic ranges in amplification, multiple primer sets can be used in combination to increase taxonomic coverage and species detection probability (Evans et al., 2017; Miya et al., 2015; Shaw et al., 2016). For example, to gain a comprehensive overview of marine fish communities, primers that effectively amplify Chondrichthyes species from environmental samples should also be included, in addition to those targeting Actinopterygii species. Furthermore, efficient recovery of biodiversity and accurate taxonomic assignments from in silico evaluation and empirical metabarcoding applications rely on the completeness and sequence quality of the corresponding databases (Bylemans et al., 2018; Elbrecht & Leese, 2017). Hence, the construction of high-quality reference databases of local biological communities should be a priority in DNA-based biodiversity surveillance.
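One way to act on the recommendation to combine primer sets is a greedy selection that repeatedly adds the primer set contributing the most not-yet-detected taxa. The Python sketch below is a hypothetical illustration and not a procedure used in this study; greedy set cover is a heuristic that does not guarantee the optimal combination, and the per-primer taxon sets would have to come from in vitro data for the study system in question.

```python
from typing import Dict, List, Set


def greedy_primer_panel(detected: Dict[str, Set[str]], k: int) -> List[str]:
    """Pick up to k primer sets, each time adding the one with the most new taxa.

    Ties are broken alphabetically by primer name. Stops early when no
    remaining primer set adds a new taxon.
    """
    chosen: List[str] = []
    covered: Set[str] = set()
    for _ in range(k):
        candidates = [p for p in sorted(detected) if p not in chosen]
        if not candidates:
            break
        # max() keeps the first maximal candidate, i.e. the alphabetically first.
        best = max(candidates, key=lambda p: len(detected[p] - covered))
        if not detected[best] - covered:
            break
        chosen.append(best)
        covered |= detected[best]
    return chosen
```

With hypothetical detections P1 = {a, b, c}, P2 = {c, d}, P3 = {a, b} and P4 = {e}, the panel grows as P1, then P2, then P4; P3 is never added because it contributes no new taxa.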

ACKNOWLEDGEMENTS

We wish to thank Yitao Zheng, Meixi Lin, Yiyan Wang, Qi Lu and Weiran Wang for their help with water sampling. Funding for the present research was provided by the Second Tibetan Plateau Scientific Expedition and Research Program (STEP) to J.Z. (Grant No. 2019QZKK0304) and to M.Y. (Grant No. 2019QZKK0503), the National Science and Technology Basic Resources Survey Program of China to M.Y. (2019FY101700), and the State Key Laboratory of Freshwater Ecology and Biotechnology (Grant No. 2020FB10) to M.Y.

CONFLICT OF INTEREST

None declared.

AUTHORS' CONTRIBUTIONS

M.Y. and S.Z. conceived and designed the study; S.Z. collected water samples, conducted the experiments, analysed the data and prepared the tables and figures with assistance from M.Y.; M.Y. wrote the manuscript with contributions from S.Z. and J.Z. All authors contributed to the critical assessment of the manuscript and approved publication.

PEER REVIEW

The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13485.

DATA AVAILABILITY STATEMENT

Raw Illumina sequence data (fastq format) from the in vitro metabarcoding analysis are available from NCBI's SRA database under BioProject ID PRJNA655901 (https://www.ncbi.nlm.nih.gov/sra/PRJNA655901). Interactive Krona plots and associated files from the in silico PCR analyses, and per-sample sequence read data from the in vitro metabarcoding, are available from the Dryad Digital Repository at https://doi.org/10.5061/dryad.hdr7sqvfw (Zhang et al., 2020).