Primer Validation
Corresponding author: Till-Hendrik Macher ( till-hendrik.macher@uni-due.de ) Academic editor: Emre Keskin
© 2023 Till-Hendrik Macher, Robin Schütz, Atakan Yildiz, Arne J. Beermann, Florian Leese.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Macher T-H, Schütz R, Yildiz A, Beermann AJ, Leese F (2023) Evaluating five primer pairs for environmental DNA metabarcoding of Central European fish species based on mock communities. Metabarcoding and Metagenomics 7: e103856. https://doi.org/10.3897/mbmg.7.103856
Environmental DNA (eDNA) metabarcoding has become a powerful tool for examining fish communities. Prior to the introduction of eDNA-based assessments into regulatory monitoring contexts (e.g., EU Water Framework Directive), there is a demand for methodological standardization. To ensure methodical accuracy and to meet regulatory standards, various sampling, laboratory and bioinformatic workflows have been established. However, a crucial prerequisite for comprehensive fish monitoring is the choice of suitable primer pairs to accurately identify the fishes present in a given water body. Various fish-specific primer pairs targeting different genetic marker regions were published over the past decade. However, a dedicated study to evaluate the performance of frequently applied fish primer pairs to assess Central European fish species has not yet been conducted. Therefore, we created an artificial 'mock' community composed of DNA from 45 Central European fish species and examined the detection ability and reproducibility of five primer pairs. Our study highlights the effect of primer choice and bioinformatic filtering on the outcome of eDNA metabarcoding results. From the five primer pairs evaluated in our study the tele02 (12S gene) primer pair was the best choice for eDNA metabarcoding of Central European freshwater fish. Also, the MiFish-U (12S) and SeaDNA-mid (COI) primer pairs displayed good detection ability and reproducibility. However, less specific primer pairs (i.e., targeting vertebrates) were found to be less reliable and generated high numbers of false-positive and false-negative detections. Our study illustrates how the careful selection of primer pairs and bioinformatic pipelines can make eDNA metabarcoding a more reliable tool for fish monitoring.
Keywords: 12S, 16S, biomonitoring, COI, eDNA
Environmental DNA (eDNA) metabarcoding has become a valuable tool for monitoring fish species in different habitats (
Mock community metabarcoding is an efficient in vitro approach to test the performance of primer pairs using an artificially composed DNA mixture representing the expected target community (
In this study, we addressed this issue and evaluated five commonly used fish eDNA metabarcoding primer pairs targeting three different barcode marker regions (12S, 16S, and cytochrome c oxidase subunit I gene) by testing their performance on an artificial community ('mock') composed of DNA from 45 Central European fish species. Specifically, we examined the detection ability and reproducibility of the five primer pairs, quantified their false-positive and false-negative detection rates, and investigated primer-specific biases. Finally, we conclude with a primer pair recommendation for eDNA metabarcoding approaches targeting fish in routine monitoring campaigns of the ichthyofauna in Central Europe.
Mucus samples of 66 specimens (45 species) were collected by fish bioassessment experts during electrofishing campaigns in autumn 2020 at five sites across Germany, covering both the Rhine and the Danube catchment. Each mucus sample was collected individually using sterile swabs (FLOQ Swab 80 mm, minitip, without medium, sterile sleeve; COPAN, Italy). All fish were handled as efficiently as possible outside the water to keep the stress to a minimum, while a sterile swab was moved across the specimens’ flank. Swabs were placed back into the sleeve without further preservative and sealed. After field work, samples were stored at 4 °C until delivery to the University of Duisburg-Essen. Upon arrival the swabs were stored at -20 °C overnight followed by DNA extraction the next day.
Swab tips were clipped off at the handle and placed in a sterile 1.5 mL Eppendorf tube before 1 mL TNES buffer and 15 µL Proteinase K (300 U/mL, 7BioScience, Neuenburg am Rhein, Germany) were added to the sample. Samples were incubated at 55 °C and shaken at 1000 rpm for 3 h on an Eppendorf ThermoMixer C (Eppendorf AG, Hamburg, Germany). Subsequently, DNA was extracted using an adapted NucleoMag tissue kit (Macherey Nagel, Düren, Germany; Suppl. material
Two fish mock communities were created using the extracted fish swab DNA. Both mock communities contained DNA of the same 45 fish species (Suppl. material
Both mock communities were assessed using five different published primer pairs (Table
In total, 60 1st-step PCR amplifications were conducted, including five replicates for each mock community (MC1 and MC2) and two negative PCR controls for each of the five primer pairs. The reaction volume was 25 µL, consisting of 12.5 µL Multiplex Mastermix (Qiagen Multiplex PCR Plus Kit, Qiagen, Hilden, Germany), 7 µL PCR-grade water, 2.5 µL CoralLoad dye, 0.5 µL forward primer, 0.5 µL reverse primer (10 µM each), and 2 µL of DNA template. The 1st-step PCR included the following steps: 5 min at 95 °C for initial denaturation, followed by 10 cycles of 30 s at 95 °C, 90 s at decreasing annealing temperature (starting from annealing temperature +10 °C), and 30 s at 72 °C, followed by 25 cycles of 30 s at 95 °C, 90 s at the respective annealing temperature (see Table
In the 2nd-step PCR, Illumina sequencing adapters with a dual twin-indexing system were added (
Name | Gene | Primer pair | Forward sequence (5’-3’) | Reverse sequence (5’-3’) | Annealing temp. | Target length | Publication |
---|---|---|---|---|---|---|---|
tele02 | 12S | tele02_fw/tele02_rv | AAACTCGTGCCAGCCACC | GGGTATCTAATCCCAGTTTG | 52 °C | ~ 167 bp | |
MiFish-U | 12S | MiFish-U_fw/MiFish-U_rv | GTCGGTAAAACTCGTGCCAGC | CATAGTGGGGTATCTAATCCCAGTTTG | 59 °C | ~ 170 bp | |
SeaDNA-mid | COI | coi.175f/coi.345r | GGAGGCTTTGGMAAYTGRYT | TAGAGGRGGGTARACWGTYCA | 53 °C | ~ 130 bp | |
12SV5 | 12S | 12S‐V5f/12S‐V5r | ACTGGGATTAGATACCCC | TAGAACAGGCTCCTCTAG | 52 °C | ~ 106 bp | |
LH16S | 16S | L2513/H2714 | GCCTGTTTACCAAAAACATCA | CTCCATAGGGTCTTCTCGTCTT | 55 °C | ~ 220 bp | |
Raw reads were received as demultiplexed fastq files. All samples were processed with the APSCALE-GUI pipeline v1.1.6 (
The taxonomic assignment of each OTU was filtered using APSCALE-GUI (Fig.
Analyses were performed using custom python scripts (Suppl. material
First, the relative read abundances (%) for the target species (i.e., fish) and non-target species (bycatch) present in the mock were calculated for each primer pair. Additionally, the relative OTU proportions of target and bycatch species were calculated. Also, the proportions of species-level OTUs assigned to the four flags (F1–F4) and supported species were calculated. All three analyses were displayed as bar charts.
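The read-share calculation described above can be sketched as follows. This is a minimal illustration, not the authors' script: the function name, the input layout (one read count per OTU plus a target/bycatch label), and the toy numbers are all assumptions.

```python
def relative_read_abundance(otu_reads, is_target):
    """Return the percentage of reads assigned to target and bycatch OTUs.

    otu_reads: dict mapping OTU id -> read count (hypothetical layout)
    is_target: dict mapping OTU id -> True for fish, False for bycatch
    """
    totals = {"target": 0, "bycatch": 0}
    for otu, reads in otu_reads.items():
        key = "target" if is_target[otu] else "bycatch"
        totals[key] += reads
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No reads at all: report zero shares instead of dividing by zero.
        return {"target": 0.0, "bycatch": 0.0}
    return {k: 100.0 * v / grand_total for k, v in totals.items()}


# Toy data for one primer pair (invented numbers, not from the study).
otu_reads = {"OTU_1": 900, "OTU_2": 50, "OTU_3": 50}
is_target = {"OTU_1": True, "OTU_2": True, "OTU_3": False}
shares = relative_read_abundance(otu_reads, is_target)
```

The relative OTU proportions follow the same pattern if every OTU is given a count of 1 instead of its read number.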
Second, to assess each primer pair’s detection abilities, Venn diagrams comparing the detected species of each primer pair to the original fish mock community composition were created in TTT. Additional Venn diagrams were created to compare the pre-adjusted TaXon tables.
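The comparison underlying such Venn diagrams reduces to set operations between the detected species and the known mock composition. The sketch below is illustrative only; the function name and the three-species example are assumptions and do not reproduce the TTT implementation.

```python
def compare_to_mock(detected, mock):
    """Classify species detections against the known mock composition."""
    detected, mock = set(detected), set(mock)
    return {
        "true_positive": detected & mock,    # in the mock and detected
        "false_positive": detected - mock,   # detected but not in the mock
        "false_negative": mock - detected,   # in the mock but not detected
    }


# Toy example with three mock species (illustrative subset only).
mock = {"Abramis brama", "Esox lucius", "Cottus gobio"}
detected = {"Abramis brama", "Esox lucius", "Leuciscus aspius"}
result = compare_to_mock(detected, mock)
```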
Third, the log-transformed number of reads and the log-transformed DNA concentration (ng/µL) were plotted and Pearson coefficients were calculated. Also, the log-transformed number of reads per species of MC2 was plotted against the log-transformed reads per species of MC1 and a Pearson coefficient was calculated.
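A Pearson coefficient on log-transformed values can be sketched in plain Python as shown below. Two assumptions are made that the text does not specify: base-10 logarithms are used (the coefficient does not depend on the base), and species with zero reads or zero concentration are dropped because their logarithm is undefined. No p-value is computed here.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def log_pearson(reads, concentrations):
    """Correlate log10(reads) with log10(concentration).

    Pairs containing a zero are skipped (an assumption of this sketch).
    """
    pairs = [
        (math.log10(r), math.log10(c))
        for r, c in zip(reads, concentrations)
        if r > 0 and c > 0
    ]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Toy data (invented): the last species has no reads and is skipped.
reads = [10, 100, 1000, 0]
concentrations = [0.1, 1.0, 10.0, 5.0]
r = log_pearson(reads, concentrations)
```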
Additionally, oversplitting rates (i.e., number of additional OTUs) were calculated for all species and each primer pair. Also, PCR replicates were investigated by calculating the mean, minimum, and maximum Jaccard index of all five technical replicates per primer pair.
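Both measures can be sketched briefly. The example assumes that the Jaccard index is computed for every pair of technical replicates and then summarised, which the text does not state explicitly; function names and toy data are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two species sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def replicate_jaccard_summary(replicates):
    """Mean, minimum and maximum pairwise Jaccard index of replicates."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates")
    scores = [jaccard(a, b) for a, b in combinations(replicates, 2)]
    return {
        "mean": sum(scores) / len(scores),
        "min": min(scores),
        "max": max(scores),
    }


def oversplitting(otu_to_species):
    """Number of additional OTUs (beyond the first) per species."""
    counts = {}
    for species in otu_to_species.values():
        counts[species] = counts.get(species, 0) + 1
    return {sp: n - 1 for sp, n in counts.items() if n > 1}


# Toy data (invented): three replicates and three OTUs.
replicates = [{"A", "B", "C"}, {"A", "B", "C"}, {"A", "B"}]
summary = replicate_jaccard_summary(replicates)
extra_otus = oversplitting(
    {"OTU_1": "Esox lucius", "OTU_2": "Esox lucius", "OTU_3": "Tinca tinca"}
)
```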
To estimate the completeness of our fish mock community, we downloaded all freshwater fish species reported from Germany and their occurrence categories per fishbase.org (categories: “endemic”, “introduced”, “native”, “not established”, “questionable”, and “stray”).
To investigate the reference sequence coverage for the fish species present in the mock community, the three Midori2 databases used for taxonomic assignment were searched for the respective species and all their records were extracted into three separate fasta files per species (COI, lrRNA, and srRNA). Subsequently, cutadapt was used to search all four possible combinations (considering reverse complements) of each primer pair. An error rate of 0.3 (-e 3) was allowed and the primers were required to be linked (forward…reverse). Here, each detected barcode was only counted once (in the four possible combinations). Consequently, for each of the three markers used, the overall number of available reference sequences, the number of matches per primer pair, and its respective proportion were calculated for all fish species in the mock community. Since in some cases only the target fragment without the primer binding site was uploaded as reference sequence, cutadapt consequently reported false-negative results to a certain degree. Thus, we manually checked all cases for which at least one reference sequence was found per species and marker, but no primer pair match was observed.
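The idea of a linked primer search can be illustrated with a simplified in silico check. This sketch is not cutadapt and does not reproduce its alignment or error model: it allows only substitutions (no indels), uses an absolute mismatch count per primer, and searches the reference and its reverse complement for the forward primer followed downstream by the reverse-complemented reverse primer. All function names are hypothetical; the primer sequences are those of the tele02 pair from the primer table.

```python
# IUPAC nucleotide codes: bases that each primer symbol may pair with.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
_COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq):
    """Reverse complement, including IUPAC ambiguity codes."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_primer(primer, seq, max_mismatches=0, start=0):
    """Index of the first window matching the primer, or -1 if none."""
    k = len(primer)
    for i in range(start, len(seq) - k + 1):
        mismatches = sum(seq[i + j] not in IUPAC[primer[j]] for j in range(k))
        if mismatches <= max_mismatches:
            return i
    return -1


def has_linked_match(fwd, rev, reference, max_mismatches=0):
    """True if fwd ... revcomp(rev) occurs in order on either strand."""
    rev_rc = revcomp(rev)
    reference = reference.upper()
    for strand in (reference, revcomp(reference)):
        i = find_primer(fwd, strand, max_mismatches)
        if i == -1:
            continue
        # The reverse primer site must lie downstream of the forward site.
        if find_primer(rev_rc, strand, max_mismatches, i + len(fwd)) != -1:
            return True
    return False


# tele02 primers and an artificial reference built around them.
fwd, rev = "AAACTCGTGCCAGCCACC", "GGGTATCTAATCCCAGTTTG"
reference = "TT" + fwd + "ACGT" * 5 + revcomp(rev) + "GG"
found = has_linked_match(fwd, rev, reference)
```

A reference that was uploaded without its primer binding sites would fail this check even though the barcode itself is present, which mirrors the false-negative matches that made manual inspection necessary.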
According to the fishbase database, 123 freshwater fish species are reported from Germany. Here we manually added the round goby (Neogobius melanostomus) and the rainbow trout (Oncorhynchus mykiss), as they are both non-native species reported from Germany, but were not present in the fishbase list. Consequently, our fish mock community of 45 Central European freshwater fish species represents about 36.6% of fish reported from Germany (Suppl. material
DNA was successfully extracted from 66 swabs, yielding an average DNA concentration of 5.73 ng/µL, ranging from 0.05 ng/µL to 26.3 ng/µL (Suppl. material
After removal of bycatch taxa and curation of ambiguous taxonomic assignments, the 12SV5 primer pair included the most species (45), followed by LH16S (40), tele02 (39), MiFish-U (37), and SeaDNA-mid (36). In comparison to the original mock community fish species composition, the tele02 dataset showed the highest congruence (2 false-positive species, 37 true-positive, and 8 false-negative), followed by MiFish-U (2, 35, 10) and SeaDNA-mid (3, 33, 12). Both the 12SV5 (18, 27, 18) and LH16S primer pairs (17, 23, 22) were less congruent with the original mock community composition (Fig.
A Overall number of fish species and the respective number of OTUs (in brackets) per family detected in the mock community for each primer pair. B Number of false-positive (n/) and false-negative (/n) fish species detections compared to the original fish mock community composition.
Comparison of the fish mock community species composition to the detected species with each primer pair for both the adjusted (large Venn diagrams) and the pre-adjusted datasets (small Venn diagrams). All species declared as false-positive detections are listed on the left-hand side of the respective Venn diagram.
As a measure of primer bias, the standard deviation of relative read abundances was calculated across primer pairs for each species. The standard deviation varied among species, ranging from < 0.01% (Barbatula barbatula, Leucaspius delineatus, Neogobius melanostomus, Phoxinus phoxinus, and Romanogobio albipinnatus) to a maximum of 7.5% (Pungitius pungitius; Table
Relative read abundances (%) for all detected fish and lamprey species of all five primer pairs, A) including all species present in the mock community (i.e., true positive species) and B) all non-target species (i.e., false-positive species). For each species the number of positive detections (occurrences) and the standard deviation (STDEV) were calculated.
True positive species | tele02 | MiFish-U | 12SV5 | SeaDNA-mid | LH16S | Occurrences | STDEV | False positive species | tele02 | MiFish-U | 12SV5 | SeaDNA-mid | LH16S | Occurrences | STDEV |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
A | | | | | | | | B | | | | | | | |
Anguilla anguilla | 0.103 | 0.095 | 0.119 | 0.002 | 0.147 | 5 | 0.1 | Leuciscus aspius | 0.512 | 0.556 | 0.091 | 0.016 | 0 | 4 | 0.3 |
Silurus glanis | 0.129 | 0.02 | 0.164 | 0.15 | 0.064 | 5 | 0.1 | Pungitius platygaster | 0 | 0 | 0.008 | 0 | 0.002 | 2 | 0.0 |
Barbus barbus | 0.306 | 0.341 | 0.544 | 0.775 | 0.635 | 5 | 0.2 | Umbra pygmaea | 0 | 0 | 1.002 | 0.005 | 0 | 2 | 0.7 |
Thymallus thymallus | 0.466 | 0.507 | 0.731 | 0.033 | 1.208 | 5 | 0.4 | Acrossocheilus monticola | 0 | 0 | 0.013 | 0 | 0 | 1 | |
Tinca tinca | 3.682 | 4.099 | 4.393 | 4.852 | 4.523 | 5 | 0.4 | Alburnoides freyhofi | 0 | 0 | 0 | 0 | 37.686 | 1 | |
Gymnocephalus cernua | 2.243 | 2.241 | 2.37 | 0.357 | 1.108 | 5 | 0.9 | Alburnus tarichi | 0 | 0 | 0 | 0 | 0.146 | 1 | |
Ctenopharyngodon idella | 1.15 | 0.153 | 1.39 | 2.576 | 0.667 | 5 | 0.9 | Aphyocypris moltrechti | 0 | 0 | 0 | 0 | 0.35 | 1 | |
Lota lota | 0.264 | 0.253 | 0.329 | 3.746 | 0.191 | 5 | 1.6 | Ballerus sapa | 0 | 0 | 0.922 | 0 | 0 | 1 | |
Gobio gobio | 0.179 | 0.192 | 0.24 | 3.816 | 0.141 | 5 | 1.6 | Brachymystax lenok | 0 | 0.005 | 0 | 0 | 0 | 1 | |
Rhodeus sericeus /amarus | 1.46 | 1.412 | 1.544 | 5.838 | 0.73 | 5 | 2.1 | Carassius auratus | 0 | 0 | 0.031 | 0 | 0 | 1 | |
Carassius carassius | 2.366 | 2.539 | 3.167 | 7.149 | 2.121 | 5 | 2.1 | Chondrostoma prespense | 0 | 0 | 2.083 | 0 | 0 | 1 | |
Rutilus rutilus | 2.502 | 2.558 | 0.353 | 1.746 | 6.3 | 5 | 2.2 | Chrosomus erythrogaster | 0 | 0 | 0.028 | 0 | 0 | 1 | |
Esox lucius | 6.232 | 5.7 | 5.131 | 0.049 | 0.432 | 5 | 3.0 | Cirrhinus microlepis | 0 | 0 | 0.017 | 0 | 0 | 1 | |
Perca fluviatilis | 1.485 | 1.321 | 12.65 | 0.18 | 1.174 | 5 | 5.2 | Cottus perifretum | 0 | 0 | 0 | 0 | 0.964 | 1 | |
Proterorhinus semilunaris | 0.267 | 0.007 | 0.007 | 7.917 | 11.719 | 5 | 5.5 | Dionda episcopa | 0 | 0 | 0.118 | 0 | 0 | 1 | |
Hucho hucho | 2.067 | 2.288 | 2.494 | 17.357 | 3.042 | 5 | 6.7 | Gymnocypris dobula | 0 | 0 | 0 | 0 | 0.05 | 1 | |
Pungitius pungitius | 0.491 | 0.58 | 0.677 | 17.491 | 0.839 | 5 | 7.5 | Labiobarbus leptocheilus | 0 | 0 | 0 | 0 | 0.02 | 1 | |
Phoxinus phoxinus | 0.003 | 0.002 | 0.003 | 0 | 0.005 | 4 | 0.0 | Lampetra planeri | 0 | 0 | 0.004 | 0 | 0 | 1 | |
Barbatula barbatula | 0.003 | 0.004 | 0 | 0.006 | 0.002 | 4 | 0.0 | Margariscus margarita | 0 | 0 | 0 | 0 | 0.24 | 1 | |
Oncorhynchus mykiss | 0.149 | 0.192 | 0.294 | 0.54 | 0 | 4 | 0.2 | Microphysogobio elongatus | 0 | 0 | 0.007 | 0 | 0 | 1 | |
Cyprinus carpio | 0.372 | 0.363 | 0.506 | 0.019 | 0 | 4 | 0.2 | Micropterus dolomieu | 0 | 0 | 0 | 0 | 0.284 | 1 | |
Gasterosteus aculeatus | 0.197 | 0.218 | 0 | 0.732 | 0.25 | 4 | 0.3 | Mylopharyngodon piceus | 0 | 0 | 0.057 | 0 | 0 | 1 | |
Squalius cephalus | 1.072 | 1.09 | 0.09 | 0.007 | 0 | 4 | 0.6 | Naso brachycentron | 0 | 0 | 0 | 0 | 0.687 | 1 | |
Pseudorasbora parva | 0.214 | 0.212 | 0 | 1.972 | 0.206 | 4 | 0.9 | Notemigonus crysoleucas | 0 | 0 | 0.159 | 0 | 0 | 1 | |
Chondrostoma nasus | 2.333 | 2.508 | 0.047 | 0.408 | 0 | 4 | 1.3 | Parahucho perryi | 0.007 | 0 | 0 | 0 | 0 | 1 | |
Blicca bjoerkna | 5.043 | 5.176 | 0.161 | 1.269 | 0 | 4 | 2.6 | Percocypris tchangi | 0 | 0 | 0.046 | 0 | 0 | 1 | |
Abramis brama | 19.839 | 19.776 | 25.78 | 17.748 | 0 | 4 | 3.5 | Pogonichthys macrolepidotus | 0 | 0 | 0 | 0 | 0.226 | 1 | |
Alburnus alburnus | 6.532 | 6.837 | 10.588 | 1.189 | 0 | 4 | 3.9 | Pseudorasbora interrupta | 0 | 0 | 0.393 | 0 | 0 | 1 | |
Sander lucioperca | 8.895 | 8.623 | 0 | 0.054 | 3.856 | 4 | 4.2 | Rutilus virgo | 0 | 0 | 0 | 0.522 | 0 | 1 | |
Romanogobio albipinnatus | 0.003 | 0.005 | 0.006 | 0 | 0 | 3 | 0.0 | Sander canadensis | 0 | 0 | 0 | 0 | 0.01 | 1 | |
Neogobius melanostomus | 0.003 | 0 | 0.004 | 0.026 | 0 | 3 | 0.0 | Sander vitreus | 0 | 0 | 0 | 0 | 0.059 | 1 | |
Salmo trutta | 0.01 | 0.01 | 0 | 0.2 | 0 | 3 | 0.1 | Squalidus argentatus | 0 | 0 | 0 | 0 | 0.034 | 1 | |
Misgurnus fossilis | 0.039 | 0.038 | 0 | 0.821 | 0 | 3 | 0.5 | Squalidus gracilis | 0 | 0 | 0.091 | 0 | 0 | 1 | |
Cottus rhenanus | 1.065 | 1.029 | 0 | 0.005 | 0 | 3 | 0.6 | Squaliobarbus curriculus | 0 | 0 | 0.071 | 0 | 0 | 1 | |
Leucaspius delineatus | 0.002 | 0.001 | 0 | 0 | 0 | 2 | 0.0 | Stichaeus punctatus | 0 | 0 | 0 | 0 | 14.238 | 1 | |
Zingel streber | 0 | 0 | 0 | 0.158 | 0.333 | 2 | 0.1 | Thymallus arcticus | 0 | 0 | 0 | 0 | 0.157 | 1 | |
Leuciscus idus | 0 | 0.131 | 0 | 0 | 0 | 1 | | Xenocypris argentea | 0 | 0 | 0 | 0 | 0.047 | 1 | |
Leuciscus leuciscus | 0.053 | 0 | 0 | 0 | 0 | 1 | |||||||||
Scardinius erythrophthalmus | 0.07 | 0 | 0 | 0 | 0 | 1 | |||||||||
Cottus gobio | 0 | 0 | 0 | 0 | 0 | 0 | |||||||||
Gymnocephalus schraetser | 0 | 0 | 0 | 0 | 0 | 0 | |||||||||
Lampetra fluviatilis | 0 | 0 | 0 | 0 | 0 | 0 | |||||||||
Rutilus pigus | 0 | 0 | 0 | 0 | 0 | 0 | |||||||||
Umbra krameri | 0 | 0 | 0 | 0 | 0 | 0 | |||||||||
Zingel zingel | 0 | 0 | 0 | 0 | 0 | 0 |
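The Occurrences and STDEV columns of the table above can be recomputed with a short sketch. It rests on an assumption inferred from spot checks of individual rows rather than from a stated method: the tabulated values are consistent with a sample standard deviation taken over the non-zero detections only, which would also explain why species detected by a single primer pair have no STDEV entry. The function name is hypothetical.

```python
import statistics


def detection_stats(abundances):
    """Return (occurrences, sample standard deviation of non-zero values).

    The standard deviation is None when fewer than two primer pairs
    detected the species, because it is then undefined.
    """
    detected = [value for value in abundances if value > 0]
    sd = statistics.stdev(detected) if len(detected) >= 2 else None
    return len(detected), sd


# Relative read abundances (%) in the column order of the table:
# tele02, MiFish-U, 12SV5, SeaDNA-mid, LH16S.
pungitius = detection_stats([0.491, 0.58, 0.677, 17.491, 0.839])
abramis = detection_stats([19.839, 19.776, 25.78, 17.748, 0])
leuciscus_idus = detection_stats([0, 0.131, 0, 0, 0])

print(pungitius[0], round(pungitius[1], 1))  # 5 7.5
print(abramis[0], round(abramis[1], 1))      # 4 3.5
print(leuciscus_idus)                        # (1, None)
```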
In total, 48 cases of oversplitting (in our case species with more than one OTU assigned) were observed (Suppl. material
PCR replicates were highly consistent for all investigated primer pairs. The 12SV5 primer pair showed the highest reproducibility (mean Jaccard similarity of 0.99), followed by LH16S (0.98), SeaDNA-mid (0.96), tele02 (0.96), and MiFish-U (0.95). No correlation between log-transformed input DNA concentration (ng/µL) and log-transformed reads of the second mock community (MC2) was found for any primer pair (Pearson correlation between 0.12 and 0.16, p≥0.05; Suppl. material
The Midori2 v249 reference database assessment showed that for COI markers most reference sequences are available (2313), followed by the lrRNA (809), and srRNA (379) markers (Table
Assessment of the Midori2 reference database (v249) coverage for the species present in the fish mock community. The overall number of reference sequences per marker region (COI, lrRNA for 16S, and srRNA for 12S) and the number and percentage of matching reference sequences for each primer pair are shown.
Species | COI references | SeaDNA-mid | SeaDNA-mid (%) | lrRNA references | LH16S | LH16S (%) | srRNA references | 12SV5 | 12SV5 (%) | MiFish-U | MiFish-U (%) | tele02 | tele02 (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Abramis brama | 34 | 34 | 100 | 6 | 6 | 100 | 7 | 6 | 85.71 | 6 | 85.71 | 7 | 100 |
Alburnus alburnus | 59 | 59 | 100 | 9 | 9 | 100 | 3 | 3 | 100 | 2 | 66.67 | 2 | 66.67 |
Anguilla anguilla | 169 | 168 | 99.41 | 165 | 146 | 88.48 | 26 | 25 | 96.15 | 23 | 88.46 | 24 | 92.31 |
Barbatula barbatula | 48 | 47 | 97.92 | 10 | 10 | 100 | 7 | 6 | 85.71 | 6 | 85.71 | 7 | 100 |
Barbus barbus | 25 | 25 | 100 | 4 | 4 | 100 | 4 | 3 | 75 | 2 | 50 | 3 | 75 |
Blicca bjoerkna | 25 | 25 | 100 | 9 | 9 | 100 | 3 | 3 | 100 | 3 | 100 | 3 | 100 |
Carassius carassius | 23 | 23 | 100 | 8 | 7 | 87.5 | 5 | 5 | 100 | 4 | 80 | 4 | 80 |
Chondrostoma nasus | 35 | 34 | 97.14 | 4 | 4 | 100 | 3 | 2 | 66.67 | 2 | 66.67 | 2 | 66.67 |
Cottus gobio | 27 | 27 | 100 | 5 | 4 | 80 | 2 | 2 | 100 | 2 | 100 | 2 | 100 |
Cottus rhenanus | 10 | 10 | 100 | 1 | 1 | 100 | 1 | 1 | 100 | 1 | 100 | 1 | 100 |
Ctenopharyngodon idella | 57 | 56 | 98.25 | 23 | 23 | 100 | 12 | 9 | 75 | 11 | 91.67 | 12 | 100 |
Cyprinus carpio | 261 | 246 | 94.25 | 110 | 98 | 89.09 | 72 | 62 | 86.11 | 50 | 69.44 | 54 | 75 |
Esox lucius | 65 | 65 | 100 | 21 | 21 | 100 | 12 | 11 | 91.67 | 4 | 33.33 | 4 | 33.33 |
Gasterosteus aculeatus | 93 | 92 | 98.92 | 23 | 20 | 86.96 | 23 | 18 | 78.26 | 11 | 47.83 | 16 | 69.57 |
Gobio gobio | 26 | 26 | 100 | 11 | 10 | 90.91 | 4 | 3 | 75 | 3 | 75 | 4 | 100 |
Gymnocephalus cernua | 40 | 40 | 100 | 9 | 9 | 100 | 6 | 6 | 100 | 4 | 66.67 | 4 | 66.67 |
Gymnocephalus schraetser | 5 | 5 | 100 | 1 | 1 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Hucho hucho | 12 | 12 | 100 | 3 | 3 | 100 | 1 | 1 | 100 | 1 | 100 | 1 | 100 |
Hucho taimen | 20 | 20 | 100 | 6 | 6 | 100 | 1 | 1 | 100 | 1 | 100 | 1 | 100 |
Lampetra fluviatilis | 22 | 21 | 95.45 | 7 | 7 | 100 | 1 | 1 | 100 | 1 | 100 | 1 | 100 |
Leucaspius delineatus | 25 | 24 | 96 | 6 | 6 | 100 | 1 | 1 | 100 | 1 | 100 | 1 | 100 |
Leuciscus idus | 28 | 28 | 100 | 7 | 7 | 100 | 4 | 3 | 75 | 2 | 50 | 3 | 75 |
Leuciscus leuciscus | 59 | 57 | 96.61 | 29 | 29 | 100 | 3 | 1 | 33.33 | 1 | 33.33 | 3 | 100 |
Lota lota | 44 | 42 | 95.45 | 15 | 14 | 93.33 | 11 | 10 | 90.91 | 9 | 81.82 | 9 | 81.82 |
Misgurnus fossilis | 12 | 12 | 100 | 2 | 1 | 50 | 1 | 0 | 0 | 0 | 0 | 1 | 100 |
Neogobius melanostomus | 45 | 45 | 100 | 5 | 5 | 100 | 7 | 6 | 85.71 | 5 | 71.43 | 5 | 71.43 |
Oncorhynchus mykiss | 124 | 123 | 99.19 | 36 | 31 | 86.11 | 23 | 18 | 78.26 | 16 | 69.57 | 17 | 73.91 |
Perca fluviatilis | 62 | 62 | 100 | 35 | 35 | 100 | 13 | 11 | 84.62 | 7 | 53.85 | 8 | 61.54 |
Phoxinus phoxinus | 177 | 176 | 99.44 | 17 | 17 | 100 | 13 | 11 | 84.62 | 11 | 84.62 | 13 | 100 |
Oxyeleotris marmorata | 17 | 17 | 100 | 5 | 5 | 100 | 6 | 5 | 83.33 | 4 | 66.67 | 5 | 83.33 |
Pseudorasbora parva | 108 | 108 | 100 | 37 | 35 | 94.59 | 13 | 9 | 69.23 | 9 | 69.23 | 13 | 100 |
Pungitius pungitius | 61 | 56 | 91.8 | 40 | 40 | 100 | 21 | 20 | 95.24 | 15 | 71.43 | 16 | 76.19 |
Rhodeus amarus | 27 | 27 | 100 | 4 | 4 | 100 | 1 | 1 | 100 | 1 | 100 | 1 | 100 |
Rhodeus sericeus | 10 | 10 | 100 | 1 | 1 | 100 | 3 | 2 | 66.67 | 1 | 33.33 | 2 | 66.67 |
Romanogobio albipinnatus | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 100 | 1 | 100 | 1 | 100 |
Rutilus pigus | 2 | 2 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Rutilus rutilus | 56 | 55 | 98.21 | 12 | 12 | 100 | 7 | 6 | 85.71 | 3 | 42.86 | 3 | 42.86 |
Salmo trutta | 143 | 142 | 99.3 | 57 | 52 | 91.23 | 19 | 17 | 89.47 | 14 | 73.68 | 16 | 84.21 |
Sander lucioperca | 28 | 28 | 100 | 7 | 7 | 100 | 5 | 3 | 60 | 5 | 100 | 5 | 100 |
Scardinius erythrophthalmus | 30 | 30 | 100 | 8 | 7 | 87.5 | 5 | 4 | 80 | 3 | 60 | 4 | 80 |
Silurus glanis | 25 | 25 | 100 | 6 | 6 | 100 | 2 | 2 | 100 | 2 | 100 | 2 | 100 |
Squalius cephalus | 62 | 60 | 96.77 | 8 | 8 | 100 | 4 | 4 | 100 | 3 | 75 | 3 | 75 |
Thymallus thymallus | 49 | 49 | 100 | 22 | 20 | 90.91 | 16 | 15 | 93.75 | 15 | 93.75 | 15 | 93.75 |
Tinca tinca | 48 | 47 | 97.92 | 9 | 9 | 100 | 6 | 5 | 83.33 | 4 | 66.67 | 5 | 83.33 |
Umbra krameri | 2 | 2 | 100 | 1 | 1 | 100 | 1 | 1 | 100 | 0 | 0 | 0 | 0 |
Zingel streber | 9 | 9 | 100 | 3 | 3 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Zingel zingel | 4 | 4 | 100 | 2 | 2 | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Our primer evaluation based on mock communities of 45 European freshwater fish species confirmed the previously reported high detection ability for two primer pairs (MiFish-U and tele02) belonging to the MiFish primer group (
Overall, all primer pairs generated highly reproducible taxa lists among the PCR replicates for the mock fish communities. However, this might not be achieved for environmental samples in which a lower reproducibility is expected. Therefore, sufficient field and laboratory replicates to maximize species detection and minimize stochastic sampling effects are recommended (
Independent fish mucus samples used for DNA extraction are likely to contain different proportions of target fish species DNA and other, non-target DNA (e.g., of microbes). For this reason alone, differences in relative read abundances between the 45 species of the mock community were to be expected. In contrast, different read proportions between primer pairs within one species were unexpected, as all marker regions used are mitochondrial and hence present in equal copy numbers. However, several species showed disproportionately high read abundances for one of the primer pairs, such as Perca fluviatilis (12SV5: 12.65% versus an average of 1.04% for the other primer pairs), Hucho hucho (SeaDNA-mid: 17.36% versus 2.5%), or Pungitius pungitius (SeaDNA-mid: 17.49% versus 0.65%). The observed differences in read proportion hint at different primer binding efficiencies for the tested primer pairs and species present in the mock community. As a consequence, the primer choice can have direct implications when using read abundances as a proxy for fish biomass (
The Midori2 database is a curated version of the larger GenBank database and can be used as a reliable source for taxonomic assignment of fish OTUs. All species present in the mock community have reference sequences available for at least one genetic marker. However, seven species were not detected at all.
Amongst these was Cottus gobio, a common fish species in Central Europe for which 34 reference sequences covering all three investigated markers are deposited in Midori2 v249. Although a taxonomic assignment would have been possible, no primer pair detected C. gobio in the mock communities. Since this species is frequently detected with eDNA metabarcoding from various sites and samples (Macher et al. unpublished data; tele02 primer pair), we cannot exclude the possibility that the C. gobio sample itself caused the false-negative detection. It might not have contained C. gobio DNA in sufficient concentration, or sampling or laboratory errors may have occurred, such as specimen misidentification, an inaccurately taken swab, or DNA degradation.
The striped ruffe (Gymnocephalus schraetser) only has 13 reference sequences available in the Midori2 database, none of which is a 12S sequence. Consequently, the lack of a reference for the 12S marker prevents a species-level assignment for the tele02, MiFish-U, and 12SV5 primer pairs. However, all 12S datasets included OTUs assigned to Gymnocephalus that were trimmed to genus level because their similarity to the reference fell below the threshold (< 97%). While no primer pair was able to detect G. schraetser unambiguously, the SeaDNA-mid COI dataset contained one ambiguous OTU assigned to G. schraetser/cernua. Thus, it remains unclear whether the striped ruffe can be distinguished from G. cernua using eDNA primer pairs.
Furthermore, various species are known to be indistinguishable with the short target fragment lengths used for eDNA metabarcoding. Particularly the two common lamprey species Lampetra fluviatilis and L. planeri could not be distinguished with any of the used primer pairs. The species status of these two ‘sister species’ has puzzled scientists for decades and while a genome-wide divergence can be observed (
The zingel (Zingel zingel) was not detected by any primer pair despite the availability of COI and lrRNA (16S) reference sequences in the Midori2 database. The closely related Danube streber (Zingel streber) has various COI and 16S reference sequences available and was detected by the SeaDNA-mid and LH16S primer pairs. Thus, the most likely explanation for the absence of Zingel zingel is an error in sampling or laboratory handling that led to sample failure.
In several instances, the distinction between true-positive, false-positive, and false-negative detections was very narrow. For several species, we observed misidentification with closely related species, which resulted in false-positive and false-negative assignments in single cases. For example, a species that was not detected by any primer pair is Rutilus pigus, the Danube roach. This species is closely related to the cactus roach (R. virgo) which was once considered a subspecies (Rutilus pigus subsp. virgo (Heckel, 1852)) and occurs in the same habitats. However, since molecular data showed that R. pigus and R. virgo are separate species (
For the European mudminnow (Umbra krameri), only four reference sequences (for 12S, COI, 16S) are available in the Midori2 v249 database and it was not detected by any primer pair in our study. According to the Midori2 database assessment, the SeaDNA-mid, LH16S, and 12SV5 primer pairs could have detected U. krameri, as suitable reference sequences are available. Here, the SeaDNA-mid and 12SV5 primer pairs false-positively detected the closely related species Umbra pygmaea, and the tele02, MiFish-U, and LH16S primer pairs detected Umbra limi/pygmaea. Both U. limi (Central mudminnow) and U. pygmaea (Eastern mudminnow) are native to North America, and particularly the latter has been introduced to Western and Central Europe. One explanation for the incorrect assignments could be a misidentification of the specimen from which the mucus sample was taken. If so, the specimen identified as European mudminnow was in fact an invasive Eastern mudminnow. This case should be further investigated since the European mudminnow is listed as ‘vulnerable’ (IUCN Red List of Threatened Species in 2010) and should ideally be distinguishable from the invasive Eastern mudminnow with eDNA metabarcoding.
Furthermore, we observed several cases of “difficult” taxonomic assignments. Here, particularly OTUs assigned to the genera Hucho, Sander and Leuciscus caused ambiguities. The Danube salmon (Hucho hucho) was initially only detected by the SeaDNA-mid and LH16S primer pairs. The three 12S primer pairs faced ambiguities caused by hits to the Sichuan taimen (Hucho bleekeri) and the Siberian taimen (Hucho taimen), which all share identical 12S sequences. However, since the Danube salmon is the only present species of the genus Hucho in Central Europe, H. bleekeri and H. taimen were ruled out for the tele02, MiFish-U and 12SV5 primer pairs. Similarly, the pikeperch (Sander lucioperca) is geographically clearly separated from the sauger (S. canadensis), but the two species are not genetically distinguishable with the investigated markers, leading to flag 2 ambiguities (“Two species, one genus”). In this case, however, based on the current distribution ranges, one can account for this ambiguity, similarly to the Danube salmon. Nevertheless, if one of the Hucho or Sander species were to be introduced to Central Europe, not all primer pairs could distinguish the native species, which could be of concern for invasive species monitoring. The common dace (Leuciscus leuciscus) and ide (L. idus), however, are highly prone to causing flag 1 ambiguities. This can be caused by several reasons: for instance, species of the family Leuciscidae are known to commonly hybridize, such as the bleak (Alburnus alburnus) and chub (Leuciscus cephalus) (
While in this study we used the Midori2 database, which is a curated version of the GenBank database, another widely used reference library for mitochondrial sequences is the MitoFish database (
The detection of false positives is of particular concern since it drastically reduces the robustness of taxa lists. The less specific vertebrate primer pairs in particular were prone to producing comparably high numbers of false-positive assignments. Here, 12SV5 and LH16S were the only datasets that included marine fish taxa, which were not present in the mock community of Central European freshwater fish. Since no marine samples have been processed in this laboratory, cross-contamination can be ruled out. The most likely explanation for these false-positive assignments is the placement of target fragments in conserved regions to amplify a broader taxonomic range (e.g., vertebrates). However, this ultimately decreases the taxonomic resolution for specific taxa within that group (e.g., fish species). For the primer pairs investigated here, the number of substitutions is most likely too low for reliable fish identification, owing either to the short fragment length (12SV5 primer pair; 106 bp) or to the fragment location (LH16S primer pair).
Furthermore, incorrect assignments of closely related species were observed for the less specific vertebrate primer pairs 12SV5 and LH16S. These included the Asian Chondrostoma prespense instead of C. nasus, the North American Thymallus arcticus instead of T. thymallus, or Pungitius platygaster instead of P. pungitius. Again, the conserved regions amplified by the 12SV5 and LH16S primer pairs could have led to these false-positive assignments. Particularly phylogenetically ‘young’ species that have not been separated long and e.g., share mitochondrial haplotypes (
However, the tele02, MiFish-U and SeaDNA-mid primer pairs also showed false-positive assignments. Even though the asp (Leuciscus aspius) was not included in the mock community, it was detected by all three primer pairs. Since it was consistently detected by the tele02 (2 OTUs, 98% similarity to reference sequence, 8578 reads, 10/10 samples), MiFish-U (2 OTUs, 98%, 7246 reads, 10/10 samples), and SeaDNA-mid primer pairs (1 OTU, 100%, 156 reads, 9/10 samples), the most likely explanation for the detection of L. aspius is a misidentification during sampling (e.g., confusion with another closely related cyprinid species). Another explanation is that the mucus of one species can contain DNA of another species, for example eDNA traces from other fish that were present during sampling. Another case of false-positive detection is the Japanese huchen (Parahucho perryi), which was detected in low read abundances by the tele02 primer pair (1 OTU, 98%, 114 reads, 9/10 samples). The Japanese huchen is not recorded from Central Europe but is related to both the huchen (Hucho hucho) and brown trout (Salmo trutta), which were both present in the mock community. The most likely explanation is that this false-positive assignment originates from huchen or brown trout DNA that is amplified by the tele02 primer pair and subsequently misassigned. The low read abundance observed in this dataset and its occurrence in combination with the brown trout in other eDNA metabarcoding datasets using the tele02 primer pair (Macher et al. unpublished data) hint towards a systematic false-positive detection of the Japanese huchen in the presence of the brown trout. A similar case is the detection of the Asian sharp-snouted lenok (Brachymystax lenok) with the MiFish-U primer pair, which is a salmonid species related to trout.
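The comparison underlying these statements (detected taxa versus the known mock community composition) can be sketched in a few lines of Python. This is a minimal illustration and not the scripts used in this study: the function name `classify_detections` is hypothetical, the mock community is reduced to a small subset, and the list of detected species is a toy example assembled from the cases discussed above, with one hypothetical missed species added to show a false negative.

```python
def classify_detections(expected, detected):
    """Compare detected species against the known mock community composition."""
    expected, detected = set(expected), set(detected)
    return {
        "true_positive": sorted(expected & detected),   # in mock community and detected
        "false_positive": sorted(detected - expected),  # detected but not in mock community
        "false_negative": sorted(expected - detected),  # in mock community but missed
    }

# Illustrative subset of the mock community (the full community comprised 45 species).
mock_subset = {"Hucho hucho", "Salmo trutta", "Leuciscus leuciscus"}

# Toy detection list (assumption): L. leuciscus is treated as missed for illustration only.
toy_detected = {"Hucho hucho", "Salmo trutta", "Leuciscus aspius", "Parahucho perryi"}

result = classify_detections(mock_subset, toy_detected)
for category, species in result.items():
    print(category, species)
```

In this toy case, the asp and the Japanese huchen end up in the false-positive category because neither is part of the mock community subset.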
While most ambiguous taxonomic assignments and false-positive detections can be easily corrected using further information (e.g., species distribution), primer pairs that are not prone to false-positive assignments, such as the tele02, MiFish-U and the SeaDNA-mid primer pairs, are to be preferred over the less specific 12SV5 and LH16S primer pairs when investigating fish communities based on eDNA metabarcoding.
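The correction by species distribution mentioned above can be expressed as a simple rule: an ambiguous assignment is resolved only if exactly one of the candidate species occurs in the study region. The following Python sketch illustrates this rule under stated assumptions; the function name `resolve_by_distribution` and the short regional checklist are hypothetical and serve only as an example, not as the curation procedure applied in this study.

```python
def resolve_by_distribution(candidates, regional_species):
    """Return the single candidate occurring in the region, or None if still ambiguous."""
    present = [species for species in candidates if species in regional_species]
    return present[0] if len(present) == 1 else None

# Hypothetical excerpt of a Central European checklist (assumption for illustration).
central_europe = {"Hucho hucho", "Sander lucioperca", "Leuciscus leuciscus", "Leuciscus idus"}

# Candidate sets taken from the ambiguous cases discussed in the text.
hucho = resolve_by_distribution(["Hucho hucho", "Hucho bleekeri", "Hucho taimen"], central_europe)
sander = resolve_by_distribution(["Sander lucioperca", "Sander canadensis"], central_europe)
leuciscus = resolve_by_distribution(["Leuciscus leuciscus", "Leuciscus idus"], central_europe)

print(hucho, sander, leuciscus)
```

The Hucho and Sander cases resolve to the native species, whereas the dace and ide remain ambiguous because both occur in the region, which mirrors why such cases require a marker with higher resolution rather than distribution data.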
In conclusion, our study highlights how the choice of primer has a major effect on the outcome of eDNA metabarcoding analysis. Among the investigated primer pairs, the tele02 primer pair was the best choice for eDNA metabarcoding of Central European freshwater fish, showing the highest detection ability and good reproducibility with the fewest false-positive and false-negative detections. We also observed that gaps in reference libraries can still lead to false-negative detections and thus should be addressed. Through careful selection of the primer pair, laboratory protocol, and bioinformatic pipeline, eDNA metabarcoding is becoming an increasingly reliable tool for fish monitoring.
The collection of mucus samples is not categorized as an animal experiment and did not require further authorisation. All sampling events were coordinated with local authorities. Fish specimens were solely caught during sampling events for monitoring campaigns and were handled by experts.
We thank Christoph Feick (LFU Bayern), Falko Wagner (IGF Jena), Christine Mosch (LAVES Niedersachsen), Franziska Neumann (LUNG MV), Gunnar Jacobs (EGLV) and all the collectors involved in the collection of the fish mucus samples. We thank all leeselab members who participated in the journal club and provided valuable feedback on the manuscript.
The authors have declared that no competing interests exist.
No ethical statement was reported.
This study was conducted as part of the GeDNA project, funded by the German Federal Environment Agency (Umweltbundesamt, FKZ 3719 24 2040).
Till-Hendrik Macher: Conceptualization, Methodology, Formal analysis, Investigation, Visualization, Writing – original draft, Writing – review & editing; Robin Schütz: Conceptualization, Methodology, Formal analysis, Investigation, Writing – original draft, Writing – review & editing; Atakan Yildiz: Methodology, Writing – review & editing; Arne J. Beermann: Conceptualization, Validation, Supervision, Writing – review & editing; Florian Leese: Conceptualization, Resources, Supervision, Project administration, Funding acquisition, Writing – review & editing.
Till-Hendrik Macher https://orcid.org/0000-0001-6164-9557
Robin Schütz https://orcid.org/0000-0002-4349-6697
Atakan Yildiz https://orcid.org/0009-0007-0333-9877
Arne J. Beermann https://orcid.org/0000-0003-0403-0322
Florian Leese https://orcid.org/0000-0002-5465-913X
The raw data were deposited at the European Nucleotide Archive (https://www.ebi.ac.uk/ena/browser/home) under the accession number PRJEB60937.
Pairwise comparison of the log-transformed reads of the non-normalized mock community (MC1) compared to the DNA concentration (ng/µl) of each species
Data type: png
Pairwise comparison of the log-transformed reads of the non-normalized mock community (MC1) compared to log-transformed reads of the normalized mock community (MC2) of each species
Data type: png
Sampled specimens and their respective species assignment collected for the fish mock community, extraction date, collection site, and concentration after DNA extraction
Data type: xlsx
List of all species reported from Germany, their occurrence status, and their presence in the mock community (data from fishbase.org)
Data type: xlsx
List of all ambiguous assignments
Data type: xlsx
List of oversplitting rates per primer pair for each detected species
Data type: xlsx
Protocol for the adapted NucleoMag Tissue Kit
Data type: docx
Unmodified TaXon tables of each primer pair
Data type: zip
Processed TaXon tables of each primer pair (subtracted negative controls and filtered for fish and lamprey taxa OTUs)
Data type: zip
Processed and manually curated TaXon tables of each primer pair
Data type: zip
Python scripts used in this study
Data type: py