Methods
Corresponding authors: Barbara R. Leite (barbaradrl.bio@gmail.com), Filipe O. Costa (fcosta@bio.uminho.pt). Academic editor: Owen S. Wangensteen
© 2021 Barbara R. Leite, Pedro E. Vieira, Jesús S. Troncoso, Filipe O. Costa.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Leite BR, Vieira PE, Troncoso JS, Costa FO (2021) Comparing species detection success between molecular markers in DNA metabarcoding of coastal macroinvertebrates. Metabarcoding and Metagenomics 5: e70063. https://doi.org/10.3897/mbmg.5.70063
DNA metabarcoding has great potential to improve marine biomonitoring programs by providing a rapid and accurate assessment of species composition in zoobenthic communities. However, some methodological improvements are still required, especially regarding failed detections, primer efficiency and incompleteness of databases. Here we assessed the efficiency of two different marker loci (COI and 18S) and three primer pairs in marine species detection through DNA metabarcoding of the macrozoobenthic communities colonizing three types of artificial substrates (slate, PVC and granite), sampled after 3 to 15 months of deployment. To accurately compare detection success between markers, we also compared the representativeness of the detected species in public databases and reviewed the reliability of the taxonomic assignments. Globally, we recorded extensive complementarity in the species detected by each marker, with 69% of the species exclusively detected by either 18S or COI. Individually, each of the three primer pairs recovered, at most, 52% of all species detected in the samples, and each also showed a different ability to amplify specific taxonomic groups. Most of the detected species have reliable reference sequences in their respective databases (82% for COI and 72% for 18S), meaning that when a species was detected by one marker and not by the other, it was most likely due to faulty amplification rather than to a lack of matching sequences in the database. Overall, the results showed the impact of the marker and primer applied on species detection ability and indicated that, currently, if only a single marker or primer pair is employed in marine zoobenthos metabarcoding, a fair portion of the diversity may be overlooked.
COI, 18S, DNA metabarcoding, marine macrozoobenthic diversity, primer efficiency, taxonomic discrimination
DNA metabarcoding is the taxonomic identification of organisms present in a bulk or environmental sample through the use of DNA amplification of standard regions of a genome (i.e. DNA barcodes) coupled with high-throughput sequencing (HTS) (
Metabarcoding allows for comparison across studies; however, the harmonization and standardization of protocols are still far from being established (
Targeting marine species is especially challenging due to the broad taxonomic and phylogenetic composition of marine communities, and the choice of marker usually depends on the target taxa. However, the balance between the range of taxonomic coverage and the taxonomic discrimination ability should be considered in the choice of target genomic region and/or primer pairs, since it may affect the number of species and the taxonomic groups detected (
PCR-based methodologies are highly influenced by amplification biases thereby encouraging the use of several primer pairs in different metabarcoding studies (
In DNA metabarcoding studies, multiple sets of primers amplifying different molecular markers have been used to target a broad range of taxonomic groups in different marine communities. However, the majority of studies used a single primer pair or a single-marker strategy (
Although DNA metabarcoding studies aim to provide species level assignments (
Considering the importance of choice of marker and primer to improve taxonomic coverage and discrimination of DNA metabarcoding, we investigated the impact of these factors on the composition and structure of marine macrozoobenthic communities. We selected two different primer pairs targeting COI and one targeting 18S rDNA V4 region, to compare their ability to detect macroinvertebrates at species level and to evaluate the benefits of the use of two molecular markers on species recovery success. We also conducted an assessment of the availability of reference sequences for all species detected in the study, in order to compare the representativeness of species, identifying the existence of gaps in both databases (BOLD for COI and SILVA for 18S rRNA gene) and attempting to infer the reasons for failed detections.
This study was conducted in Ría de Vigo, a semi-enclosed, densely populated bay on the NW coast of Spain that hosts important, busy harbours and is consequently affected by several human activities (e.g. sewage runoff or harvesting) (
In December 2016, four replicates (flat panels 10 × 10 cm) of three different types of artificial substrates – slate, polyvinyl chloride (PVC) and granite – were randomly deployed on the dock of Toralla Island (42°12'2.267"N, 8°48'4.187"W) at approximately 1.5 m depth (Suppl. material
We extracted the DNA from the bulk biomass using DNA extraction procedures adapted from
The production of amplicon libraries and the HTS were carried out at Genoinseq (Cantanhede, Portugal). A preliminary assessment of primer amplification efficiency of COI was conducted in a subset of samples to test four primer pairs that have been previously used in DNA metabarcoding studies (more details in Suppl. material
Primer pairs and respective thermal cycling conditions used in this study to amplify marine macroinvertebrate communities. F – forward; R – reverse; bp – base pairs.
Marker | Primer combination and length | Direction (5’-3’) | Reference | PCR thermal cycling conditions |
---|---|---|---|---|
COI | LCO1490/Ill_C_R (325 bp) | (F) GGTCAACAAATCATAAAGATATTGG; (R) GGIGGRTAIACIGTTCAICC | | (1) 94 °C (5 min); (2) 35 cycles: 94 °C (30 s), 52 °C (90 s), 68 °C (60 s); (3) 68 °C (10 min). |
COI | mlCOIintF/LoboR1 (313 bp) | (F) GGWACWGGWTGAACWGTWTAYCCYCC; (R) TAAACYTCWGGRTGWCCRAARAAYCA | | (1) 95 °C (3 min); (2) 35 cycles: 98 °C (20 s), 60 °C (30 s), 72 °C (30 s); (3) 72 °C (5 min). |
18S | TAReuk454FWD1/TAReukREV3 (400 bp) | (F) CCAGCASCYGCGGTAATTCC; (R) ACTTTCGTTCTTGATYRA | | (1) 95 °C (3 min); (2) 10 cycles: 98 °C (20 s), 57 °C (30 s), 72 °C (30 s); (3) 25 cycles: 98 °C (20 s), 47 °C (30 s), 72 °C (30 s); (4) 72 °C (5 min). |
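The primers in the table differ in how many ambiguous positions they carry, which is one way to describe how broadly a primer is designed to anneal. The following is a minimal sketch, not part of the study's workflow, that counts the number of distinct oligonucleotide variants implied by the standard IUPAC ambiguity codes in each sequence. The function name `primer_profile` is hypothetical, and inosine ("I") is counted separately on the assumption that it acts as a single universal base rather than a mixture of bases.

```python
# Number of bases represented by each standard IUPAC nucleotide code.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_profile(sequence):
    """Return (degeneracy, inosine_count) for a primer sequence.

    Assumption: inosine ("I") is a single universal base, so it is
    tallied separately and does not multiply the variant count.
    """
    degeneracy = 1
    inosines = 0
    for base in sequence.upper():
        if base == "I":
            inosines += 1
        else:
            degeneracy *= IUPAC_DEGENERACY[base]
    return degeneracy, inosines


# Primer sequences copied from the table above.
PRIMERS = {
    "LCO1490": "GGTCAACAAATCATAAAGATATTGG",
    "Ill_C_R": "GGIGGRTAIACIGTTCAICC",
    "mlCOIintF": "GGWACWGGWTGAACWGTWTAYCCYCC",
    "LoboR1": "TAAACYTCWGGRTGWCCRAARAAYCA",
    "TAReuk454FWD1": "CCAGCASCYGCGGTAATTCC",
    "TAReukREV3": "ACTTTCGTTCTTGATYRA",
}

for name, seq in PRIMERS.items():
    variants, inosines = primer_profile(seq)
    print(f"{name}: {variants} variant(s), {inosines} inosine(s)")
```

Degeneracy computed this way is only a descriptive property of the primer sequence; it does not by itself predict amplification success for any taxon.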
PCR reactions were performed using the KAPA HiFi HotStart PCR Kit for the COI primer pair without inosines (mlCOIintF/LoboR1) and for the 18S V4 region primer pair. PCR amplification reactions contained 0.3 µM of each primer and 50 ng of template DNA in the case of COI amplification and 12.5 ng for 18S V4 amplification, in a total volume of 25 µL. For the other COI primer pair (LCO1490/Ill_C_R), PCR reactions were performed using 1× Advantage 2 Polymerase Mix (Clontech, Mountain View, CA, USA), 0.2 µM of each PCR primer and 25 ng of DNA template in a total volume of 25 µL. Second PCR reactions added indexes and sequencing adapters to both ends of the amplified target region (MiSeq Reagent Kit v3 – 600-cycle) according to manufacturer’s recommendations (
Negative and positive controls were included in PCR amplification. As positive controls, we used a DNA extract previously tested successfully for PCR. Success of PCR amplification was checked by electrophoresis. No amplification was detected in any of the negative controls from DNA extraction or PCR.
Amplification with the primer pair mlCOIintF/LoboR1 failed for the mobile fauna sample from the granite substrate after 3 months of deployment; consequently, this sample was not considered for further analysis.
Raw reads in fastq format generated by MiSeq sequencing were quality-filtered with PRINSEQ version 0.20.4 (
The usable reads were then processed in two pipelines of public databases: a) COI reads were submitted to mBrave – Multiplex Barcode Research and Visualization Environment (www.mbrave.net;
In mBrave, since primer sequences had previously been removed in mothur, only trimming by length was applied (maximum 313 bp for mlCOIintF/LoboR1 and 325 bp for LCO1490/Ill_C_R; minimum 150 bp) and only reads with a minimum quality value (QV) higher than 10 were kept. This filtering step allowed for a maximum of 25% of nucleotides with QV <20 and a maximum of 25% of nucleotides with QV <10. Reads were then taxonomically assigned at species level using a 97% similarity threshold against the BOLD database, which includes several publicly available reference libraries for marine invertebrates of the northeast Atlantic (e.g.
Output fasta files produced in mothur for the 18S marker were then processed by the amplicon analysis pipeline of the SILVA project (SILVAngs 1.4;
For both markers, only reads with a match at species level were used for further analysis, and taxonomic assignments supported by fewer than 8 sequences were discarded. Any read that matched a non-metazoan taxon was also excluded. The validity of the species names was verified in the World Register of Marine Species (WoRMS) database (
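The three filtering rules described in this paragraph (species-level match, metazoan only, at least 8 supporting sequences) can be sketched as follows. This is an illustrative outline under stated assumptions, not the mBrave or SILVAngs implementation: the input format (one `(species, is_metazoan)` pair per read, with `None` for reads lacking a species-level match) and the function name `filter_assignments` are hypothetical, and the read counts in the example are invented.

```python
from collections import Counter

MIN_READS = 8  # assignments supported by fewer reads are discarded


def filter_assignments(read_hits, min_reads=MIN_READS):
    """Count reads per species and keep well-supported metazoan species.

    read_hits: iterable of (species_name_or_None, is_metazoan) pairs,
    one per read (hypothetical input format).
    """
    counts = Counter(
        species
        for species, is_metazoan in read_hits
        if species is not None and is_metazoan
    )
    return {species: n for species, n in counts.items() if n >= min_reads}


# Toy data: read counts are invented for illustration only.
toy_reads = (
    [("Asterocarpa humilis", True)] * 12    # kept: metazoan, 12 reads
    + [("Vorticeros auriculatum", True)] * 7  # dropped: fewer than 8 reads
    + [("Ulva lactuca", False)] * 30        # dropped: not a metazoan
    + [(None, True)] * 5                    # dropped: no species-level match
)

kept = filter_assignments(toy_reads)
print(kept)
```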
Incongruences in genetic databases are an ongoing problem that can affect taxonomic assignments (
We then assessed the presence of representative sequences of all the species detected in the present study in BOLD and SILVA. Failed detection by one marker may simply have occurred because that particular species was not present in the respective reference database. However, if a species was present in both databases, but was only detected by one marker, that would be an indication of probable PCR amplification failure of the marker that failed detection. All the available COI sequences matching the detected species names were mined from BOLD using BAGS (
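The reasoning in this paragraph amounts to a simple decision rule, sketched below under stated assumptions. For every species missed by a marker, the species is checked against that marker's reference database: absence suggests a database gap, presence suggests probable amplification failure. The function name `classify_missed_detections` and the dictionary layout are hypothetical; the two species in the example are the ones cited later in the Discussion.

```python
def classify_missed_detections(detected, in_database):
    """Label each (species, marker) miss with its most likely cause.

    detected:    {marker: set of species detected with that marker}
    in_database: {marker: set of species with reference sequences in
                  that marker's database}
    """
    all_species = set().union(*detected.values())
    report = {}
    for marker, found in detected.items():
        for species in sorted(all_species - found):
            if species in in_database[marker]:
                reason = "probable amplification failure"
            else:
                reason = "reference database gap"
            report[(species, marker)] = reason
    return report


# Both species were detected by 18S V4 only; only A. humilis has COI
# reference sequences in BOLD.
detected = {
    "COI": set(),
    "18S": {"Vorticeros auriculatum", "Asterocarpa humilis"},
}
in_database = {
    "COI": {"Asterocarpa humilis"},
    "18S": {"Vorticeros auriculatum", "Asterocarpa humilis"},
}

for (species, marker), reason in classify_missed_detections(
    detected, in_database
).items():
    print(f"{species} missed by {marker}: {reason}")
```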
The proportion of species with overlapping or exclusive detections by each primer pair and marker was determined for all substrates and sampling time combinations, using Venn diagrams (http://www.venndiagrams.net/). For each primer pair the distribution of species among high-rank taxonomic groups (e.g. order or phyla) was displayed through barplots (GraphPad Software, Inc.).
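The quantities shown in a two-set Venn diagram can be derived with plain set operations. The sketch below is only an illustration of that calculation, with a hypothetical function name (`detection_overlap`) and placeholder species labels; it is not the tool used in the study.

```python
def detection_overlap(set_a, set_b):
    """Summarize shared and exclusive detections between two species sets."""
    total = set_a | set_b
    shared = set_a & set_b
    only_a = set_a - set_b
    only_b = set_b - set_a
    pct_exclusive = 100 * (len(only_a) + len(only_b)) / len(total)
    return {
        "total": len(total),
        "shared": len(shared),
        "only_a": len(only_a),
        "only_b": len(only_b),
        "pct_exclusive": pct_exclusive,
    }


# Placeholder species labels, for illustration only.
coi_species = {"sp1", "sp2", "sp3"}
s18_species = {"sp3", "sp4"}

summary = detection_overlap(coi_species, s18_species)
print(summary)
```

In this toy case three of the four species are detected by a single marker, so 75% of detections are exclusive.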
To identify clusters of species-level identifications in the dataset, we applied k-means, an unsupervised machine learning method that groups data without prior categories, to the presence/absence matrix of all species detected by each primer pair and marker. The optimal number of clusters was determined with elbow (fviz_nbclust function, method = “wss”) and silhouette (silhouette function) analyses. The analyses were performed (kmeans function) and visualized (fviz_cluster function) in R with the packages “cluster” (
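To make the elbow criterion concrete, the following is a minimal plain-Python sketch of k-means (Lloyd's algorithm) applied to a toy presence/absence matrix, reporting the within-cluster sum of squares (WSS) for several values of k. It is not the R workflow named above: the function names, the random-restart scheme and the toy matrix are all assumptions made for illustration, and the silhouette analysis is not reproduced.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means; returns (labels, wss) of the best of n_init restarts."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        # Start from k randomly chosen rows as initial centres.
        centers = [list(r) for r in rng.sample(rows, k)]
        labels = None
        for _ in range(max_iter):
            new_labels = [
                min(range(k), key=lambda c: squared_distance(row, centers[c]))
                for row in rows
            ]
            if new_labels == labels:
                break  # assignments are stable: converged
            labels = new_labels
            for c in range(k):
                members = [r for r, lab in zip(rows, labels) if lab == c]
                if members:  # keep the old centre if a cluster empties
                    centers[c] = [sum(col) / len(members) for col in zip(*members)]
        wss = sum(squared_distance(r, centers[lab]) for r, lab in zip(rows, labels))
        if best is None or wss < best[1]:
            best = (labels, wss)
    return best


# Toy presence/absence matrix: rows are samples, columns are species.
# The first three rows mimic COI samples, the last three 18S samples.
matrix = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 1, 1],
]

# Elbow analysis: WSS for increasing k; look for the sharpest drop.
for k in range(1, 5):
    labels, wss = kmeans(matrix, k)
    print(f"k={k}: WSS={wss:.2f}, labels={labels}")
```

In the toy matrix the largest drop in WSS occurs between k = 1 and k = 2, which is the pattern the elbow criterion looks for.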
High-throughput sequencing of marine macroinvertebrate samples, for both markers and three primer pairs, generated a total of 2,956,328 raw reads. Following bioinformatic processing, a total of 2,356,818 reads were retained (Table
Total number of sequences generated in Illumina MiSeq high-throughput sequencing (raw reads), retained along processing steps of the bioinformatics pipeline (primers removal, demultiplex and quality filter), and assigned to taxonomic groups for each primer pair (mlCOIintF/LoboR1; LCO1490/Ill_C_R; TAReuk454FWD1/TAReukREV3).
Processing step | mlCOIintF/LoboR1 (reads) | mlCOIintF/LoboR1 (%) | LCO1490/Ill_C_R (reads) | LCO1490/Ill_C_R (%) | TAReuk454FWD1/TAReukREV3 (reads) | TAReuk454FWD1/TAReukREV3 (%) |
---|---|---|---|---|---|---|
Raw reads | 1110851 | 100.00% | 945639 | 100.00% | 899838 | 100.00% |
First quality-filter* | 953733 | 85.86% | 808234 | 85.47% | 594851 | 66.11% |
After filtering** | 953704 | 85.85% | 798645 | 84.46% | 581220 | 64.59% |
Usable sequences*** | 869015 | 78.23% | 587794 | 62.16% | 411782 | 45.76% |
Metazoa | 655097 | 68.69% | 579857 | 61.32% | 218416 | 24.27% |
No taxonomic match**** | 41641 | 3.75% | 99367 | 10.51% | 193366 | 21.49% |
<8 sequences***** | 287 | 0.03% | 281 | 0.03% | 986 | 0.11% |
Species level taxonomic assignment | 613169 | 55.20% | 480209 | 50.78% | 217430 | 24.16% |
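As a worked example of how the percentages in the table relate to the read counts, the short sketch below expresses the reads assigned at species level as a share of the raw reads for each primer pair. The counts are copied from the table; the helper name `percent_of_raw` is hypothetical.

```python
def percent_of_raw(count, raw):
    """Express a read count as a percentage of the raw reads, to 2 decimals."""
    return round(100 * count / raw, 2)


# Read counts copied from the table above.
RAW_READS = {
    "mlCOIintF/LoboR1": 1110851,
    "LCO1490/Ill_C_R": 945639,
    "TAReuk454FWD1/TAReukREV3": 899838,
}
SPECIES_LEVEL_READS = {
    "mlCOIintF/LoboR1": 613169,
    "LCO1490/Ill_C_R": 480209,
    "TAReuk454FWD1/TAReukREV3": 217430,
}

for primer, raw in RAW_READS.items():
    share = percent_of_raw(SPECIES_LEVEL_READS[primer], raw)
    print(f"{primer}: {share:.2f}% of raw reads assigned at species level")

# Total raw reads across the three primer pairs.
print(sum(RAW_READS.values()))
```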
From the three types of artificial substrates sampled at four different deployment periods (12 samples), the three primer pairs were able to identify a total of 161 species, distributed among 9 taxonomic groups: Annelida, Bryozoa, Crustacea, Echinodermata, Hydrozoa, Mollusca, Nemertea, Platyhelminthes and Tunicata (species names and the associated taxonomic group displayed in Suppl. material
The applied primers also differed in their efficiency to recover particular taxonomic groups (Fig.
Taxonomic profile of the marine macroinvertebrate species detected in the substrates by each primer pair.
A higher species richness was detected consistently at seven months for all primer pairs in the three substrates. The species detected by each primer pair, and also the taxonomic groups, were different between primers and markers in the four sampling times and between substrates (Suppl. material
If the combined number of species detected by the two COI primer pairs is considered, the 18S V4 region retrieved fewer taxa than COI (77 vs 107 species, respectively). Both elbow and silhouette analyses retrieved two as the optimal value of k (i.e. two clusters; Suppl. material
Best fitting number of clusters (k=2) using the unsupervised machine learning k-means for the combined identifications of marine macrozoobenthic species using both COI-primers (COI – red) and the 18S-primer (18S – blue), in the four sampling times (3M – 3 months; 7M – 7 months; 10M – 10 months; 15M – 15 months) and the three artificial substrates (Sla – slate; PVC; Gra – granite).
The two molecular markers and the three primer pairs used were highly complementary in their ability to detect marine macroinvertebrate species (Fig.
Considering the total of 161 marine macroinvertebrate species detected by combining the results of 18S V4 and the two COI primer pairs, we evaluated the taxonomic coverage in the respective databases, namely mBrave for COI and SILVA for 18S V4. As many as 18% of the species still lack representative COI sequences and 28% lack sequences of the V4 region of the 18S rRNA gene (Fig.
Availability of reference sequences of COI and 18S V4 for each taxonomic group of marine macroinvertebrate species detected with the three primer pairs from COI and 18S genes. Barcode coverage with at least one sequence per species (black bar).
Since taxonomic assignment can be affected by incongruences in genetic databases, a manual inspection of the taxonomic assignments may be advisable to compare results more accurately. Overall, the vast majority of the species assignments (94%) appear to have a high level of certainty (Suppl. material
DNA metabarcoding-based biomonitoring of aquatic communities would benefit from the establishment of standardized approaches (
Together, the three primer pairs used in this study enabled the detection of a fair number of macrozoobenthic species. Quantitatively, the 18S V4 region and the two COI primer pairs displayed similar ability to detect marine macroinvertebrate species. However, considering the high complementarity in the species recovered between these markers, the choice of the best performing primer was not obvious. Similar COI – 18S comparisons described in the literature report somewhat distinct and even contradictory results. For example, DNA metabarcoding studies using mock zooplankton communities demonstrated different taxonomic recovery ability between COI and 18S: whereas one of the studies detected similar patterns of species detection ability among markers (
The substantial complementarity observed between the two molecular markers, with each single marker capturing at best approximately 66% of the detected species in a sample, revealed that, by using only one marker, a fair portion of the marine macroinvertebrate species may fail detection. The two markers detected different communities, raising serious concerns for monitoring studies, since the biodiversity detected will be different and many species may be overlooked due to methodological steps alone. For example, while isopods were only detected by COI, platyhelminthes and tunicates were exclusively detected by 18S V4. Although few metabarcoding studies compared the performance of molecular markers on species recovery (
Regarding the two primer pairs from the COI barcode region, although we observed a similar number of species globally detected by each primer pair (84 vs 63 species), they diverged qualitatively in the species detected (41% of the species exclusively detected with mlCOIintF/LoboR1 and 21% with LCO1490/Ill_C_R); hence, this should be the main criterion to consider in order to maximize the scope of species detection. The efficiency of different COI-primers in macroinvertebrate assessment has already been compared in previous studies (
The three primer pairs used in this study were able to detect marine macroinvertebrate species in every sampling time, all of them consistently pointing to a higher species diversity after seven months of deployment of the substrates. These results highlight the benefit of applying a multiple primer pair and multi-locus strategy for ecological assessments of marine species, since if we had used only one primer pair or marker we would have failed to detect important macrobenthic taxa, and the taxonomic composition of the community could have appeared substantially different. Temporal and seasonal changes in a community could affect the potential of DNA-based species monitoring, especially when methodological bias originating from amplification procedures (choice of marker loci and primer pairs) could influence ecological interpretations (
We performed an assessment of the availability of representative sequences for all species detected in this study in each of the reference databases employed, namely BOLD and SILVA, respectively for COI and 18S V4 markers. This enabled us to verify whether the detection of a species by one marker, and not by the other, could be attributed to gaps in the library of the latter or, if no gap was found, could be ascribed to faulty amplification. A sizable but minor proportion of gaps was recorded for both markers (18% for COI and 28% for 18S V4). The incompleteness and possible inaccuracies of databases may explain some of the species detected exclusively by one marker, as, for example, the flatworm Vorticeros auriculatum, a species detected by 18S V4 that does not have representatives in BOLD. On the other hand, some of the detected species with reference sequences in both databases were only detected by one marker (e.g. the tunicate Asterocarpa humilis, undetected with COI despite having representative sequences in BOLD). Considering the complete 18S rRNA gene, the target region we selected should not be the main reason for failed detections, since V4 is reported to have high amplification success (
Globally, these results highlight the influence of marker and primer pair complementarity on the ability to record marine macrozoobenthic species through metabarcoding. For future high-throughput assessments using DNA metabarcoding approaches, we recommend combining molecular markers and, if possible, multiple primer pairs, to increase the power of species detections and the accuracy of biodiversity assessments, thereby yielding more comprehensive and reliable results for marine macroinvertebrate monitoring.
Conceptualization: B.R.L., J.S.T. and F.O.C.; Methodology: B.R.L. and P.E.V.; Formal analysis: B.R.L. and P.E.V.; Data curation: B.R.L. and P.E.V.; Writing – original draft: B.R.L. and F.O.C.; Writing – review and editing: B.R.L., P.E.V., J.S.T. and F.O.C.; Visualization: B.R.L. and P.E.V.; Supervision: J.S.T. and F.O.C. All authors have read and agreed to the published version of the manuscript.
This work was supported by the project ATLANTIDA – Platform for the monitoring of the North Atlantic Ocean and tools for the sustainable exploitation of the marine resources, with the reference NORTE-01-0145-FEDER-000040, co-financed by the European Regional Development Fund (ERDF), through Programa Operacional Regional do Norte (NORTE 2020). BRL benefitted from an FCT fellowship PD/BD/127994/2016. The authors would like to thank Sofia Duarte (University of Minho) for the availability and support during practical stages of the research.
Table S1, Figures S1–S5
Data type: zip archive (docx file with descriptions and image files)
Explanation note: Table S1. Primer pairs used to test the efficiency of COI-5P to amplify and assess marine macroinvertebrate species in the preliminary study performed. Figure S1. Sampling set-up: substrates suspended horizontally and deployed in December 2016 at Toralla Island (NW Iberian Peninsula). Figure S2. Number of marine macroinvertebrate species detected in each substrate and sampling time by each of the four primer pairs used in the first screening of primer performance. Figure S3. Shared and unique marine macroinvertebrate species detected by the four COI-5P primer pairs. Figure S4. Taxonomic composition of marine macroinvertebrate communities for each primer pair in each substrate type and sampling time. Figure S5. Optimal number of clusters determined by silhouette and elbow analysis retrieved from k-means.
Tables S2, S3
Data type: Taxonomic classification, Species occurrence
Explanation note: Table S2. Number of sequences (raw, usable and submitted to species level taxonomic assignment) for each primer-pair (mlCOIintF/LoboR1; LCO1490/Ill_C_R; TAReuk454FWD1/TAReukREV3), in the three substrates (slate, PVC and granite) and sampling times (3, 7, 10 and 15 months). Table S3. Summary of range of similarity, samples detected and notes on species assignments accuracy for the species detected using the three primer pairs (mlCOIintF/LoboR1, LCO1490/Ill_C_R, TAReuk454FWD1/TAReukREV3).