Research Article |
Corresponding author: Laure Van den Bulcke ( laure.vandenbulcke@ilvo.vlaanderen.be ) Academic editor: Emre Keskin
© 2021 Laure Van den Bulcke, Annelies De Backer, Bart Ampe, Sara Maes, Jan Wittoeck, Willem Waegeman, Kris Hostens, Sofie Derycke.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Van den Bulcke L, De Backer A, Ampe B, Maes S, Wittoeck J, Waegeman W, Hostens K, Derycke S (2021) Towards harmonization of DNA metabarcoding for monitoring marine macrobenthos: the effect of technical replicates and pooled DNA extractions on species detection. Metabarcoding and Metagenomics 5: e71107. https://doi.org/10.3897/mbmg.5.71107
|
DNA-based monitoring methods are potentially faster and cheaper compared to traditional morphological benthic identification. DNA metabarcoding involves various methodological choices which can introduce bias leading to a different outcome in biodiversity patterns. Therefore, it is important to harmonize DNA metabarcoding protocols to allow comparison across studies and this requires a good understanding of the effect of methodological choices on diversity estimates. This study investigated the impact of DNA and PCR replicates on the detection of macrobenthos species in locations with high, medium and low diversity. Our results show that two to three DNA replicates were needed in locations with a high and medium diversity to detect at least 80% of the species found in the six DNA replicates, while three to four replicates were needed in the location with low diversity. In contrast to general belief, larger body size or higher abundance of the species in a sample did not increase its detection prevalence among DNA replicates. However, rare species were less consistently detected across all DNA replicates of the location with high diversity compared to locations with less diversity. Our results further show that pooling of DNA replicates did not significantly alter diversity patterns, although a small number of rare species was lost. Finally, our results confirm high variation in species detection between PCR replicates, especially for the detection of rare species. These results contribute to create reliable, time and cost efficient metabarcoding protocols for the characterization of macrobenthos.
biological replicates, Bulk DNA, diversity, DNA replicates, Macrobenthos, Metabarcoding protocol, North Sea, PCR replicates, pooling, technical replicates
Marine and coastal ecosystems are highly valuable environments as they deliver many ecosystem services (
DNA metabarcoding of macrobenthos uses bulk DNA, extracted from the organisms in the samples. Through PCR, a portion of the mitochondrial COI gene is amplified using specific primers and amplification conditions, followed by next generation sequencing (
A recent literature review on DNA metabarcoding for marine macrobenthos showed the large variation in methodological steps between different studies (
Another important step in the metabarcoding protocol that can introduce bias is PCR amplification. PCR is a stochastic process, because a molecule will be either replicated or not based on whether the primers were able to sufficiently bind during each PCR cycle. The impact of primer choice on species detection is well known (
In this study, we evaluated the effect of DNA and PCR replicates, as well as the effect of pooling DNA replicates before PCR amplification on alpha and beta diversity of macrobenthos. We considered that the impact of technical replication may be linked to the diversity of the sampled communities and therefore sampled three macrobenthic communities in the Belgian part of the North Sea, characterized by low, medium and high diversity (
Sampling was carried out at three different locations in the Belgian Part of the North Sea (BPNS), covering macrobenthic communities with low, medium and high diversity (
The samples were further processed as described in
DNA Replicates
Of each biological replicate, six subsamples of 2 ml were taken from the bulk solution for DNA extraction, resulting in 18 subsamples per location (Fig.
Experimental set-up. Three locations, with a low, medium or high diversity, were sampled and in each location, three Van Veen grab samples (biological replicates A, B and C) were taken. To test the effect of technical replicates on the alpha and beta diversity, six subsamples of each biological replicate were taken for DNA extractions (DNA replicates 1 to 6). In two locations (LOW and HIGH), three separate PCR amplifications were sequenced of one DNA extraction (PCR replicates 1 to 3). To test the effect of pooling multiple DNA replicates, three combinations of two, four or six DNA replicates were pooled before amplification (Pooled DNA replicates for each number of pooled DNA extracts).
Pooled DNA extractions
Pooling can decrease the processing time of the DNA metabarcoding protocol. To investigate the effect of pooling on the detected diversity with metabarcoding, two, four, and six DNA extracts were pooled before the PCR step. Each level of pooling was performed in triplicate with different randomly chosen DNA extracts from the DNA replicates experiment. This was done for the morphologically identified replicate from each location (ZVL-A, 840-C, 120-B). In total, nine pooled DNA extractions for each location were made (Fig.
PCR replicates
To assess variation between PCR replicates, the three PCR products of a subset of samples were processed individually for the index PCR and sequencing, under the same conditions as described above. PCR products of one DNA extract of each biological replicate (A6, B6 and C6) in the lowest and highest diversity locations (ZVL and 120) were used. This resulted in an additional 18 samples (Fig.
The sequencing datasets and corresponding metadata generated for this study can be found as BioProject in the online NCBI repository under accession number PRJNA683870.
The quality of demultiplexed reads was checked with MultiQC (
To take into account the different sequencing depth when comparing alpha diversity, samples of DNA replicates and pooled DNA extractions were rarified at 24,000 reads, while samples of PCR replicates were rarified at 12,000 reads. These numbers were a tradeoff between reaching the plateau of the rarefaction curves and removing a minimum of samples. Samples with lower number of reads were discarded from the dataset, resulting in four removed samples in total: three samples of DNA replicates (ZVL-A-1, ZVL-A-2, 120-C-6) and one sample of PCR replicates (120-C-3).
To assess whether number of ASVs and number of species were significantly different between locations with low, medium and high diversity, nested ANOVA’s were conducted with factors diversity (levels: Low, Medium and High) and biological replicate (levels: A, B and C) nested within diversity, separately for the DNA or PCR replicate datasets. Assumptions for the ANOVA test were checked by making plots and performing a Levene test and a Shapiro test for investigating the homogeneity of the variances and the normality of the data, respectively. When the assumption for normality was not met, data were transformed with the reciprocal transformation 1/x. In case the Shapiro test remained significant after transformation, normality was assumed when the values in the center of the Quantile-Quantile plot fitted within the 95% confidence interval.
To investigate the effect of DNA and PCR replicates on ASV and species richness, the ASVs and species uniquely found in one replicate and shared between all replicates were investigated using upset plots with the R package UpSet R v1.4.0 (
To investigate the effect of pooled DNA extractions on ASV and species richness, differences in the number of ASVs and species between the pooled DNA extractions were tested with a two-way ANOVA with factors diversity (levels: High, Medium and Low) and number of pooled DNA extracts (levels: 2, 4 or 6). Assumptions for the ANOVA test (homogeneity of the variances and normality of the data) were checked by making plots and performing a Levene test and a Shapiro test, respectively. When the assumption for the normality was not met, data were transformed with the reciprocal transformation 1/x. When this transformation did not give a normal distribution, the tests were still applied if the values fitted within the 95% confidence interval at the center of the Quantile-Quantile plot. Furthermore, we assessed the observed number of species and the mean number of expected species, each for two, four and six pooled DNA extractions. The observed number of species represents the sequencing results of the pooled DNA extractions, while the mean number of expected species is calculated as the mean number of species observed in the different separate DNA extractions that were used for pooling. For each diversity level, a two-way ANOVA with factors number of pooled DNA extracts (levels: 2, 4 or 6) and number of species (levels: Observed or Expected) was conducted. Assumptions for the ANOVA’s were checked as explained above. To investigate the unique and shared species between the separate and pooled DNA extractions, upset plots were made using the R package UpSetR v1.4.0. The number of shared and unique species in each location were calculated in percentages and visualized in barplots.
To investigate variation in community composition between biological and DNA replicates, non-metric multidimensional scaling (nMDS) plots based on Jaccard (
One biological replicate from each location (120-B, 840-C, ZVL-A) was identified morphologically up to species level. Respectively 3, 10 and 39 species were identified in the sample (0.1 m²) with low (ZVL-A), medium (840-C) and high (120-B) diversity. A detailed list of all species found in each of the samples is presented in Suppl. material
After filtering, the number of reads for the different technical replicates ranged between 215 and 225,722. A detailed table with the number of reads after each filtering step for each replicate can be found in Suppl. material
DNA replicates
The three locations were chosen based on their macrobenthos species diversity. As expected, the nested ANOVA showed that the number of ASVs and the number of assigned species differed significantly between locations with different diversity, and increased from the locations with low (ZVL), medium (840) and high (120) diversity (Table
Despite the fact that DNA was extracted from the same biological sample, the majority of ASVs (62%–86%) were uniquely found in one DNA replicate (Fig.
Output ANOVA’s. Results of the ANOVA’s for alpha diversity (ASVs and species level). Crossed factors are indicated with an asterisk, while brackets are used for nested factors. The output of the Levene test and the Shapiro test are shown in the second and third column, respectively.
Homogeneity | Normality | Df | SumsOfSqs | F Value | Pr(>F) | |
---|---|---|---|---|---|---|
ASVs | ||||||
DNA replicates | ||||||
Diversity | 0.061 | 0.001 | 2 | 73688 | 60.135 | 4.712e-13 |
Diversity(Biological_replicate) | 6 | 52849 | 14.376 | 7.874e-09 | ||
Residuals | 42 | 25733 | ||||
Pooled DNA extracts | ||||||
Diversity | 0.452 | 0.060 | 2 | 0.0046143 | 40.297 | 2.253e-07 |
Nr_pooled | 2 | 0.0000096 | 0.084 | 0.920 | ||
Diversity*Nr_pooled | 4 | 0.0000364 | 0.159 | 0.956 | ||
Residuals | 18 | 0.0010306 | ||||
PCR replicates | ||||||
Diversity | 0.626 | 0.053 | 1 | 0.00248999 | 47.430 | 2.632e-05 |
Diversity(Biological_replicates) | 4 | 0.00037546 | 1.788 | 0.202 | ||
Residuals | 11 | 0.00057748 | ||||
Species | ||||||
DNA replicates | ||||||
Diversity | 0.341 | 0.008 | 2 | 2955.99 | 168.022 | <2e-16 |
Diversity(Biological_replicate) | 6 | 115.15 | 2.182 | 0.0639 | ||
Residuals | 42 | 369.45 | ||||
Pooled DNA extracts | ||||||
Diversity | 0.526 | 1.689e-05 | 2 | 2064.30 | 149.828 | 6.023e-12 |
Nr_pooled | 2 | 2.30 | 0.167 | 0.848 | ||
Diversity*Nr_pooled | 4 | 50.37 | 1.828 | 0.167 | ||
Residuals | 18 | 124.00 | ||||
Pooled expected vs observed in HIGH diverse location | ||||||
Nr_pooled | 0.702 | 5.816e-04 | 2 | 34.670 | 1.7451 | 0.224 |
Method | 1 | 9.400 | 2.9597 | 0.116 | ||
Nr_pooled * Method | 2 | 12.600 | 0.6342 | 0.550 | ||
Residuals | 10 | 99.333 | ||||
Pooled expected vs observed in MEDIUM diverse location | ||||||
Nr_pooled | 0.977 | 0.0512 | 2 | 0.000188 | 0.257 | 0.778 |
Method | 1 | 0.000227 | 0.620 | 0.449 | ||
Nr_pooled * Method | 2 | 0.000597 | 0.814 | 0.470 | ||
Resiudals | 10 | 0.00367 | ||||
Pooled expected vs observed in LOW diverse location | ||||||
Nr_pooled | 0.359 | 0.0243 | 2 | 3.925 | 1.204 | 0.344 |
Method | 1 | 0.453 | 0.278 | 0.611 | ||
Nr_pooled * Method | 2 | 7.630 | 2.341 | 0.152 | ||
Residuals | 10 | 14.667 | ||||
PCR replicates | ||||||
Diversity | 0.385 | 0.00377 | 1 | 1210.04 | 339.841 | 1.279e-09 |
Diversity(Biological_replicates) | 4 | 137.26 | 9.638 | 0.001 | ||
Residuals | 11 | 39.17 |
Variation of ASVs and species in DNA replicates. The percentage of unique and shared ASVs (a) or species (b) between all six DNA replicates are shown for locations with low, medium or high diversity. These percentages were calculated based on UpSet plots of the different DNA replicates in biological replicates.
Irrespective of diversity, species uniquely found in one DNA replicate or shared by all six DNA replicates show similar distribution of the size classes and abundance classes (Suppl. material
Accumulation plots, to investigate the number of DNA replicates capturing most of the diversity, illustrate that the number of ASVs was still increasing for nearly all biological replicates, even when six DNA replicates were taken. The plateau was reached in only one location, more precisely in two biological replicates (A and C) of the location with medium diversity (840) (Suppl. material
Finally, we assessed the impact of DNA replicates on detecting the number of species that were identified morphologically. For the location with high, medium and low diversity, 31%, 30% and 33% of the morphologically identified species were consistently detected in all six DNA replicates. When looking at the total found species across six DNA replicates (calculated as the union of the six replicates), the detection rate was increased to 49% and 50% of the total number of morphologically identified species for the location with high and medium diversity, while the same percentage was found for the location with low diversity (33%) (Suppl. material
Pooled DNA extractions
To investigate the effect of pooling DNA extracts on the detected diversity with metabarcoding, two, four and six DNA extracts of one biological replicate in the three different locations were pooled before PCR amplification. No significant interaction effect between the diversity and the number of pooled DNA extracts was detected (Table
Between the separate and the two, four and six pooled DNA extracts, most (68%–88%) of the ASVs were unique (Fig.
Variation of ASVs and species in number of pooled DNA extracts. The percentage of unique and shared ASVs (a) or species (b) between the separate and pooled DNA extracts are shown for the different locations. These percentages are calculated based on UpSet plots. Therefore, sets with the different replicates were made for the separate DNA extracts (1,2,3,4,5,6) and pooled DNA extracts: two (2a,2b,2c), four (4a,4b,4c) and six (6a,6b,6c). (c) The unique number of species in the separate and pooled DNA extracts (2, 4 or 6) for each location.
PCR replicates
The number of shared ASVs between the different PCR replicates originating from the same DNA extracts was low and ranged between 9%–38%, while 52%–88% of the ASVs were unique for one of the three PCR replicates (Fig.
Variation of ASVs and species in PCR replicates. The average percentage of unique and shared ASVs (a) or species (b) between all three PCR replicates are shown for locations with low or high diversity. These percentages were calculated based on UpSet plots of the different PCR replicates in biological replicates and used to calculate the means of the different biological replicates in the different stations.
Finally, we assessed the effect of taking multiple PCR replicates on the detection of morphologically identified species. In the location with high diversity, 39 species were morphologically identified, of which 17 species were also found with metabarcoding. Most of them, 15 species, were found in each PCR replicate. In the location with low diversity, three species were morphological identified, of which one species was also found with metabarcoding. This species, Limecola balthica, was present in all three PCR replicates (Suppl. material
Accumulation plots show that no plateau was reached in the number of ASVs, even when three PCR replicates were taken (Suppl. material
DNA replicates
PERMANOVA shows a significant effect of the diversity, as well as of the biological replicates on the community composition (Table
Output PERMANOVA’s. Results of the PERMANOVA’s for beta diversity on species level. Crossed factors are indicated with an asterisk, while brackets are used for nested factors. The output of the permdisp is shown in the second column.
DNA replicates | Permdisp | Df | SumsOfSqs | F.Model | R2 | Pr(>F) |
---|---|---|---|---|---|---|
Diversity | 1e-04 | 2 | 5.5435 | 32.425 | 0.44624 | 1e-04 |
Diversity(Biological_replicate) | 4e-04 | 6 | 3.2889 | 6.4125 | 0.26475 | 1e-04 |
Residuals | 42 | 3.5902 | 0.28901 | |||
Total | 50 | 12.4227 | 1.00000 | |||
Pooled DNA extractions | ||||||
Diversity | 1e-04 | 2 | 6.3483 | 29.0437 | 0.59102 | 0.0001 |
Nr_pooled | 0.198 | 3 | 0.4361 | 1.3302 | 0.04060 | 0.1959 |
Diversity*nr_pooled | 6 | 0.5689 | 0.8677 | 0.5297 | 0.6325 | |
Residuals | 31 | 3.3879 | 0.31541 | |||
Total | 42 | 10.7413 | 1.00000 |
nMDS plot for DNA replicates, based on Jaccard dissimilarity index. Community composition between DNA replicates of the different biological replicates (marked with different symbols) and between the different locations with low, medium or high diversity (marked with different colors) are shown.
Pooled DNA extractions
The nMDS plot using the Jaccard dissimilarity index shows no separate clusters for the separate DNA extracts and two, four or six pooled DNA extracts (Fig.
nMDS plot for pooled DNA extractions, based on Jaccard dissimilarity index. Community composition between separate DNA replicates and pooled DNA extractions (marked with different symbols) in the different locations with low, medium or high diversity (marked with different colors) are shown.
Furthermore, 59% of the variation is explained by the stations, and only 4% by the number of pooled DNA extracts (Table
DNA metabarcoding is a promising method to study diversity, providing that the impact of methodological choices on detection of macrobenthic species is known. Here, we provide empirical evidence on the importance of DNA and PCR replication for species detection in bulk DNA samples to contribute to a metabarcoding protocol that is robust and reliable, but also time and cost-efficient. First, our data showed that a huge portion of the amplicon sequence variants (ASVs) were unique for a replicate, but most of these unique ASVs did not receive taxonomic assignment and represented very few reads, indicating that these ASVs did not reflect the genetic diversity in the sample, but were most likely generated during the metabarcoding process. Second, our results showed that there was substantial variation between DNA and PCR replicates, but this variation was lower than the differences between biological replicates. Finally, when pooling two, four or six DNA extractions, a similar number of species and similar species composition were detected as in the separate DNA extractions.
In this study, the dada2 protocol was used for processing of the raw reads, generating ASVs. A high number of unique ASVs was observed for a specific sample, even when these ASVs were obtained from the same homogenous ‘soup’ (DNA replicates) or DNA extract (PCR replicates). Most of these unique ASVs did not get a good taxonomic assignment with the used reference database, as just a small part (13%–41% for the different biological replicates) was assigned at phylum or higher level.
Our study allows to rule out some of these explanations. First, if the unassigned ASVs would represent unknown biological diversity, we would expect to find these ASVs in multiple DNA replicates and especially in multiple PCR replicates. Between DNA replicates, a high number of ASVs (62%–86%) were uniquely found in one replicate (Fig.
The need for DNA replication in sediment samples has already been demonstrated for several groups (
Next to homogenization, larger and more abundant species have been assumed to be more easily detected due to the higher contribution of tissue in the ‘soup’, in contrast to smaller and rarer species (
To detect as many species as possible in the homogenous soup, multiple DNA replicates should be taken. One DNA replicate contained 45%–70% of the observed diversity, in line with the results of
The results of this study illustrate the need for DNA replicates, but if these replicates could be pooled, time and costs could be decreased. Our statistical results show that the number of species as well as the community composition observed in pooled extractions compared to separate DNA extractions were not significantly different, illustrating that pooling does not alter alpha and beta diversity patterns. However, a small number of species (9%–38%) is no longer detected in the pooled extractions compared to separate extracts (Fig.
Despite the small differences observed when pooling DNA extracts, pooling provides more robust outcomes and can compensate for inefficient DNA extraction. The metabarcoding protocol and the decision of whether or not to pool depends on the research question. For example, in environmental impact assessments, the most abundant species are important to detect, as these have the highest impact on the community patterns. In this case, pooling is suggested as it can decrease time and costs. When the protocol would be used for mapping the diversity in a certain area, the rare species are also important to detect. In this case, DNA replicates are better analyzed separately or when pooled should be sequenced with sufficient depth, to increase the chance of detecting as many species as possible.
In this study, a high variation between the PCR replicates was detected for bulk DNA samples. Several plateaus were reached in the different species accumulation curves, but moderate numbers of species were shared between all PCR replicates. A first explanation for the variation between the PCR replicates could be the low abundance of multiple species, as most species (91%) found in only one replicate had low abundances (<50 reads). PCR amplification can introduce bias in the metabarcoding protocol, mainly due to the stochasticity of the process (
A second possibility why these species are not found in all three PCR replicates could be the primer efficiency. However, the used primer, the Leray primers set, has been shown to amplify most macrobenthic species of the North Sea (
Third, the PCR amplification conditions are important. Since the same cycling conditions are used for all PCR replicates, it seems unlikely that this would introduce variation between PCR replicates. Also the same PCR volume (25 µl) was used for the separate PCR replicates, but using small or bigger volumes can have an impact. Typically, 10 µL to 50 µL is used (
In this study, variation between PCR replicates was observed, particularly due to the stochasticity and primer bias of the PCR amplification, similar as found in a study on marine artificial mock communities by
Metabarcoding and morphological identification showed comparable results for the most abundant species. High abundances were found for 52% of the species detected by both methods. However, approximately only half the number of species detected by morphological identification were also picked up by metabarcoding. First, this can be explained by the different resolution of both methods. With the morphological identification, 15 specimens did not have a taxonomic assignment to species level, but only to genus, family or class level, while the genetic method was able to assign eight out of these 15 specimens to species level (Suppl. material
This study highlighted the importance of technical replicates. First, taking into account the financial constraints and additional time needed with every DNA extraction, our data show that limited gain is achieved when conducting more than three DNA replicates for macrobenthic metabarcoding studies of medium and high diverse locations. However, when sampling in a low diverse environment, taking more replicates should be considered. Second, for environmental impact assessments, the protocol can be made more time and cost efficient by pooling the separate DNA extracts without a valuable loss in species detection. Third, substantial variation between PCR replicates was observed in this study. As numerous papers still lack PCR replicates, we want to emphasize the importance of taking at least three PCR replicates, the maximum tested replicates in this study, when investigating species diversity. These results contribute to create time and cost efficient metabarcoding protocols for environmental impact assessments, so yielding reliable results.
The sequencing datasets and corresponding metadata generated for this study can be found as BioProject under accession number PRJNA683870 in the online NCBI repository https://www.ncbi.nlm.nih.gov/. The R script to reproduce the findings of this study can be found at https://gitlab.com/Lvandenbulcke/dna-metabarcoding-technical-replicates-and-pooling.
We thank Hans Hillewaert and Joran Vanhollebeke for their contribution to the reference database and the bioinformatics pipeline, respectively. Ship Time RV Belgica was provided by BELSPO and RBINS-OD Nature. We thank the crew for the logistic support during the sampling campaign. Funding for this research was received through the Sand Fund of the Federal Public Service Economy and GEANS – Genetic tools for Ecosystem health Assessment in the North Sea region, an Interreg project supported by the North Sea Programme of the European Regional Development Fund of the European Union. Willem Waegeman received funding from the “Onderzoeksprogramma Artificielë Intelligentie (AI) Vlaanderen Fund of the Flemish Government.
Figures S1–S14
Data type: zip. archiv (PNG. and JPG. images)
Explanation note: ESM Figure S1. Map of the Belgian Part of the North Sea (BPNS) with the three sampling locations. ESM Figure S2. Multivariate homogeneity of group dispersions (variances). ESM Figure S3. Mean number of reads and ASVs. ESM Figure S4. Taxonomic assignment of ASVs. ESM Figure S5. Mean number of ASVs and species. ESM Figure S6. Size and abundance distribution. ESM Figure S7. Accumulation plots for DNA replicates on ASV level. ESM Figure S8. Accumulation plots for DNA replicates on species level. ESM Figure S9. UpSet plots for comparison of morphological identified samples for DNA replicates. ESM Figure S10. Observed vs expected number of species in different number of pooled DNA extracts. ESM Figure S11. Upset plots for comparison of morphological identified samples for PCR replicates. ESM Figure S12. accumulation plots based on ASVs for PCR replicates. ESM Figure S13. accumulation curves based on species for PCR replicates. ESM Figure S14. nMDS plot for DNA replicates (a) and pooled DNA extractions (b) based on Bray-Curtis.
Tables S1–S8
Data type: zip. archiv (PDF. files)
Explanation note: ESM Table S1. Used primerset. ESM Table