Research Article
Corresponding author: Jarno Turunen (jarno.turunen@ely-keskus.fi)
Academic editor: Chloe Robinson
© 2021 Jarno Turunen, Heikki Mykrä, Vasco Elbrecht, Dirk Steinke, Thomas Braukmann, Jukka Aroviita.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Turunen J, Mykrä H, Elbrecht V, Steinke D, Braukmann T, Aroviita J (2021) The power of metabarcoding: Can we improve bioassessment and biodiversity surveys of stream macroinvertebrate communities? Metabarcoding and Metagenomics 5: e68938. https://doi.org/10.3897/mbmg.5.68938
Most stream bioassessment and biodiversity surveys are currently based on morphological identification of communities. However, DNA metabarcoding is emerging as a fast and cost-effective alternative for species identification. We compared both methods in a survey of benthic macroinvertebrate communities across 36 stream sites in northern Finland. We identified 291 taxa, of which 62% were identified only by DNA metabarcoding. DNA metabarcoding produced extensive species-level inventories for groups (Oligochaeta, Chironomidae, Simuliidae, Limoniidae and Limnephilidae) for which morphological identification was not feasible due to the high level of expertise needed. Metabarcoding also provided more insightful taxonomic information on the occurrence of three red-listed vulnerable or data-deficient species, the discovery of two likely cryptic and potentially new species to Finland and species information of insect genera at an early larval stage that could not be separated morphologically. However, it systematically failed to detect the occurrence of gastropods that were easily identified morphologically. The impact of mining on community structure could only be shown using DNA metabarcoding data, which suggests that the finer taxonomic detail can improve detection of subtle impacts. Both methods generally exhibited similar strength of community-environment relationships, but DNA metabarcoding showed better performance with presence/absence data than with relative DNA sequence abundances. Our results suggest that DNA metabarcoding holds promise for future anthropogenic impact assessments, although, in our case, the performance did not improve much over morphological species identification. The key advantage of DNA metabarcoding lies in efficient biodiversity surveys, taxonomical studies and applications in conservation biology.
biomonitoring, Fennoscandia, mining, molecular data, species identification, taxonomic resolution
Freshwater biomonitoring and biodiversity assessments are traditionally based on morphological identification of species and specimen count data. Particularly, benthic macroinvertebrates are used to assess the ecological state of streams and lakes. Such bioassessment programmes are often limited to genus or family level taxonomic resolution (e.g.
The past two decades saw the rapid development of DNA-based approaches to species identification. Particularly, systems based on the analysis of sequence variation in short, standardised gene regions (i.e. DNA barcodes) were shown to be very effective in species discrimination (
Nevertheless, a key criterion for the adoption of DNA-based methods as state-of-the-art procedures in bioassessment and biodiversity surveys remains their reliability. There are known cases of false-negatives (a species is present in the sample, but its DNA is not detected either due to primer bias, insufficient specimen biomass, DNA degradation or incomplete barcode libraries) and false-positives (a species is not present in the sample, but is detected either due to the presence of trace DNA in, for example, gut content, cross-contamination or analytical errors;
For this study, we conducted a bioassessment survey of macroinvertebrates in subarctic streams close to mining operations in northern Finland. Macroinvertebrates were identified by using both a standard morphology-based protocol and DNA metabarcoding. We explored differences in detecting different taxa for both methods and were particularly interested in the potential of metabarcoding to gain additional taxonomic information for the study region as we expected that it would provide a more comprehensive assessment of local biodiversity than morphological identifications. Furthermore, we anticipated that a more detailed level of identification would help to better understand community-environment relationships and provide higher sensitivity to potential mining impacts.
Our study area is located in the Lapland region of northern Finland (between 66°N and 70°N). Streams in this region belong to northern boreal and subarctic ecoregions. The area is sparsely populated and the main anthropogenic pressure for freshwater ecosystems stems from forestry and mining operations. There is an increasing interest in the extraction of mineral deposits in the area with potentially negative impacts on freshwater biodiversity (
The key minerals extracted from the mining sites are gold (Kittilä, Kevitsa, Pahtavaara, Saattopora-Pahtavuoma), nickel (Kevitsa), copper (Kevitsa, Saattopora-Pahtavuoma) and chromium (Kemi). In Sokli, the key mineral is phosphate. The mines are mostly combinations of open pit and underground extraction sites. The main impacts on streams at all the mining-impacted study sites are elevated concentrations of nitrogen and sulphate which originate from the ore extraction and enrichment process, as well as use of explosives and rock tailings. Some sites also show elevated concentrations of arsenic, antimony and ions such as chloride salts (Kittilä, Kemi, Kevitsa). The impacts of this type of mining on stream biodiversity are poorly known; however, a recent study suggests that they are subtle (
We made several measurements in both the habitat and the catchment of each stream site (Suppl. material
Benthic macroinvertebrates were sampled in autumn 2017 by taking four 30-second kick-net samples (mesh size 0.5 mm, area of disturbed stream bed = 1.2 m²) from swiftly flowing riffle sections at each site. The protocol followed the sampling guidelines for Finnish National Water Framework Directive (WFD) monitoring (
Morphological identification of macroinvertebrates followed the requirements for national WFD bioassessments (
For DNA extraction, sampling lots comprising all morphologically identified specimens were dried overnight at 40 °C. Large specimens (e.g. a dytiscid beetle) were removed from samples and a single leg of the specimen added back to the sample to prevent DNA of such a single specimen dominating the sample (
Metabarcoding was done using a two-step fusion primer strategy (
Raw sequencing data were quality-checked using FastQC v.0.11.8 and then processed using the R-based JAMP v.0.58 pipeline (https://github.com/VascoElbrecht/JAMP), which mostly relies on usearch v.11.0.667 (
For a given taxon, identifications by both morphology and DNA metabarcoding included cases resolved at a higher or lower taxonomic level than species. Therefore, to avoid overlapping higher-level taxa, we harmonised the taxonomy in each sample separately for both identification methods as follows. Higher-level identifications (e.g. family or genus) were assigned to the corresponding lower-level identifications (e.g. genus or species) in the sample in proportion to the numerical abundances (number of specimens in morphological identifications and number of sequences in DNA metabarcoding) of those lower-level identifications. If lower-level identifications were not present in a sample for a given taxon, the higher-level identifications were left as is.
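The proportional reassignment described above can be illustrated with a minimal Python sketch. It is not the authors' script: the function name `harmonise`, the `parent_of` lookup and the toy counts are hypothetical, and the sketch assumes that a higher-level identification is split among the nearest lower-level identifications present in the same sample.

```python
from collections import defaultdict


def harmonise(sample, parent_of):
    """Redistribute higher-level counts among lower-level taxa in one sample.

    sample:    dict mapping taxon name -> abundance (specimens or reads).
    parent_of: dict mapping a taxon -> its next higher-level taxon
               (hypothetical lookup; assumed to contain no cycles).
    """
    def ancestors(taxon):
        # Walk up the taxonomy from the given taxon.
        while taxon in parent_of:
            taxon = parent_of[taxon]
            yield taxon

    # Attach every taxon to its nearest ancestor that is present in the sample.
    children = defaultdict(list)
    for taxon in sample:
        nearest = next((a for a in ancestors(taxon) if a in sample), None)
        if nearest is not None:
            children[nearest].append(taxon)

    result = {taxon: float(count) for taxon, count in sample.items()}
    # Process the highest taxonomic levels first so that counts cascade down.
    for parent in sorted(children, key=lambda t: sum(1 for _ in ancestors(t))):
        kids = children[parent]
        total = sum(result[k] for k in kids)
        if total == 0:
            continue  # nothing to base the ratios on; leave the parent as is
        for k in kids:
            result[k] += result[parent] * result[k] / total
        del result[parent]
    return result


# Toy sample (invented counts): 10 specimens identified only to genus Baetis.
sample = {"Baetis": 10, "Baetis rhodani": 30, "Baetis niger": 10, "Simuliidae": 5}
parent_of = {"Baetis rhodani": "Baetis", "Baetis niger": "Baetis"}
harmonised = harmonise(sample, parent_of)
print(harmonised)
```

In this toy sample, the genus-level count is split 3:1 between the two species, while Simuliidae, which has no lower-level identification in the sample, is left unchanged. The total abundance of the sample is preserved.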
We compared both identification methods, using taxon accumulation curves, taxon occurrence information, multivariate analyses of community composition and community-environment-relationships, as well as statistical tests for potential effects of mining on biota. We explored community composition using Non-metric Multidimensional Scaling (NMDS) ordination. Two-dimensional NMDS ordinations were based on Bray-Curtis distances of log-transformed abundance data (morphological identifications) and proportional DNA sequence abundance data, as well as presence-absence data. Adding further dimensions did not considerably optimise stress. We correlated community composition (NMDS scores) with main environmental gradients by using Principal Component Analysis (PCA) to reduce multidimensionality of the environmental data to a few interpretable principal components (gradients) and subsequently fitted those to the NMDS scores. We used a Permutational Multivariate Analysis of Variance (PERMANOVA;
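As an illustration of the three data treatments that feed the ordinations, the following Python sketch computes the Bray-Curtis dissimilarity between two sites from log-transformed counts, proportional read abundances and presence-absence data. It is a sketch under stated assumptions rather than the analysis code used in the study: the helper names are hypothetical, the site vectors are invented, and the log(x + 1) form of the transformation is an assumption because the exact transformation is not specified here.

```python
import math


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator if denominator else 0.0


def log_transform(counts):
    # Assumption: log(x + 1), so that zero counts stay at zero.
    return [math.log(c + 1) for c in counts]


def proportions(reads):
    # Relative sequence abundance of each taxon within a sample.
    total = sum(reads)
    return [r / total for r in reads] if total else [0.0] * len(reads)


def presence_absence(values):
    return [1 if v > 0 else 0 for v in values]


# Invented counts for four taxa at two sites (same taxon order in both).
site_a = [120, 30, 0, 5]
site_b = [80, 0, 12, 5]

print("log abundance:   ", bray_curtis(log_transform(site_a), log_transform(site_b)))
print("proportions:     ", bray_curtis(proportions(site_a), proportions(site_b)))
print("presence-absence:", bray_curtis(presence_absence(site_a), presence_absence(site_b)))
```

With presence-absence data, every taxon carries the same weight, so a dissimilarity driven by a few dominant taxa under abundance weighting can change considerably. This is one reason why the two data types can give different community-environment relationships.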
MiSeq sequencing generated 15,760,049 sequences. Raw data are available under the SRA accession PRJNA547646. Eleven of the 12 negative controls showed fewer than 500 reads and one negative control had around 3,000 reads. The macroinvertebrate samples had an average sequencing depth of 149,278 reads (SD = 17,732). During bioinformatic processing and quality filtering, an average of 11.1% of the reads were discarded in each mining sample (SD = 2.4%). The 10 Lepidoptera species, added as positive controls, did appear in other samples, almost exclusively along the column in which the respective sample was added (Fig. S1). This might be caused by the decreased sequence quality in read two of Illumina sequencing or tag switching on the flow cell related to the second read.
We collected a total of 33,355 specimens from all 36 stream sites. The average number of specimens per sample was 927 (range: 201–2557). Overall, morphological analysis revealed 113 taxa, while metabarcoding unveiled 250 taxa (Fig.
Cumulative number of taxa (mean ± 95% confidence interval) for metabarcoding (blue solid line) and morphologically identified (red dashed line) data.
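A taxon accumulation curve of this kind can be approximated by adding sites in random order many times and summarising the cumulative taxon count at each step. The Python sketch below is illustrative only: the function name, the number of permutations and the toy site lists are hypothetical, and the interval is taken as the mean ± 1.96 standard deviations across permutations, which is an assumption about how the confidence band was derived.

```python
import random
import statistics


def accumulation_curve(site_taxa, permutations=200, seed=1):
    """Return (mean, lower, upper) cumulative taxon counts for 1..n sites.

    site_taxa: list of sets, each holding the taxa observed at one site.
    """
    rng = random.Random(seed)
    n = len(site_taxa)
    counts_per_step = [[] for _ in range(n)]
    for _ in range(permutations):
        order = rng.sample(site_taxa, n)  # random site order
        seen = set()
        for step, taxa in enumerate(order):
            seen |= taxa
            counts_per_step[step].append(len(seen))

    curve = []
    for counts in counts_per_step:
        mean = statistics.mean(counts)
        # Assumed interval: 1.96 standard deviations across permutations.
        half = 1.96 * statistics.stdev(counts) if len(counts) > 1 else 0.0
        curve.append((mean, mean - half, mean + half))
    return curve


# Toy data: three sites with partly overlapping taxa.
sites = [{"a", "b"}, {"b", "c"}, {"c", "d", "e"}]
for n_sites, (mean, low, high) in enumerate(accumulation_curve(sites), start=1):
    print(n_sites, round(mean, 2), round(low, 2), round(high, 2))
```

A curve that keeps rising at the last step, as reported for the metabarcoding data, indicates that additional sites would still be expected to add new taxa.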
Metabarcoding revealed more unique taxonomic units (181) than did morphological identification (40). Approximately half of the taxa detected only by metabarcoding were from groups that the routine protocol identifies only to family or a coarser taxonomic level: Oligochaeta (16 species, genera or families detected only through metabarcoding), Arachnida (11 families or species), Chironomidae (72 species or genera), Simuliidae (11 species), other Dipteran families (seven species) and Limnephilidae (Trichoptera, seven species). Even after excluding these taxonomic groups, metabarcoding still identified more unique taxa (57 taxa not identified by morphology) than did morphological identification (22 taxa not identified by metabarcoding). Overall, metabarcoding detected 81% of the taxa that were detected by morphology, whereas morphological identification detected 54% of the taxa detected by metabarcoding (excluding the taxonomic groups that were not morphologically identified to lower levels).
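The shared and unique taxon counts, and the percentage of one method's taxa that the other method also detected, are simple set operations on the two taxon lists. The following Python sketch shows the arithmetic with placeholder taxon names; the function name and the data are hypothetical and do not reproduce the study's taxon lists.

```python
def compare_methods(morph, dna):
    """Summarise the overlap between two sets of detected taxa."""
    shared = morph & dna
    return {
        "shared": len(shared),
        "only_morphology": len(morph - dna),
        "only_dna": len(dna - morph),
        # Share of morphology taxa that metabarcoding also detected.
        "dna_pct_of_morph": 100 * len(shared) / len(morph) if morph else 0.0,
        # Share of metabarcoding taxa that morphology also detected.
        "morph_pct_of_dna": 100 * len(shared) / len(dna) if dna else 0.0,
    }


# Placeholder taxon lists (not data from the study).
morph_taxa = {"taxon 1", "taxon 2", "taxon 3", "taxon 4"}
dna_taxa = {"taxon 2", "taxon 3", "taxon 4", "taxon 5", "taxon 6", "taxon 7"}

summary = compare_methods(morph_taxa, dna_taxa)
print(summary)
```

The two percentages are not symmetric because they share a numerator but have different denominators, which is why a method that finds many more taxa overall can recover most of the other method's taxa while the reverse is not true.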
In many cases, metabarcoding outperformed morphological identification by producing otherwise not possible species level information (Table
Number of occurrences and mean abundances per site for taxa that were exclusively detected either by morphological identification or metabarcoding. Taxonomic resolution of the datasets was adjusted to national target level (
Taxon | Order | Number of occurrences (Morphology or DNA) | Mean abundance (No. of specimens or Sequence %)
---|---|---|---
Radix peregra | Mollusca | 9 | 6
Lymnaea stagnalis | Mollusca | 1 | 1
Valvata sp. | Mollusca | 2 | 6.5
Stagnicola sp. | Mollusca | 2 | 1
Dytiscus circumcinctus | Coleoptera | 2 | 1
Dixidae | Diptera | 1 | 0.18
Clinocera sp. | Diptera | 2 | 1.5
Hemerodromia adulatoria | Diptera | 3 | 0.08
Tipula sp. | Diptera | 1 | 2
Heptagenia pulla | Ephemeroptera | 21 | 0.66
Kageronia fuscogrisea | Ephemeroptera | 2 | 2
Siphlonurus alternatus | Ephemeroptera | 2 | 0.12
Sigara striata | Heteroptera | 1 | 1
Sialis lutaria | Megaloptera | 2 | 3
Somatochlora alpestris | Odonata | 1 | 5.7
Somatochlora metallica | Odonata | 1 | 1
Capnia atra | Plecoptera | 2 | 0.07
Nemurella pictetii | Plecoptera | 1 | 0.44
Apatania crymophila | Trichoptera | 4 | 0.12
Apatania wallengreni | Trichoptera | 9 | 2.1
Micrasema gelidum | Trichoptera | 9 | 8.7
Micrasema primoricum | Trichoptera | 9 | 0.35
Ceraclea nigronervosa | Trichoptera | 1 | 1
Neureclipsis bimaculata | Trichoptera | 3 | 2
Sericostoma flavicorne | Trichoptera | 15 | 5.6
Metabarcoding also had some shortcomings in species detection. Notably, it often failed to detect Gastropoda: for example, it detected Gyraulus sp. at only five of the nine sites where morphology recorded it, and it failed to detect any of the 14 occurrences of four other gastropod taxa (Table
When datasets were harmonised to the taxonomic resolution required for national WFD bioassessments, both methods produced highly comparable results, with 89 shared taxa, only 14 taxa identified solely by metabarcoding and only 18 identified exclusively by morphology (Fig.
The PCA reduced the environmental data to four components which together explain 73% of the variation in the environmental variables. The first component (PC1) represents a gradient of mining influence on water chemistry, specifically inorganic nitrogen, sulphate and metal and non-metal concentrations explaining most of the variation (38%, Suppl. material
The community composition (NMDS), based on both morphological and metabarcoding data, showed significant relationships to the environment, but the magnitude varied between the two datasets depending on whether abundance or presence-absence data were used. The relationship of community composition and the mining pollution gradient (PC1) was stronger with morphological identification (R² = 0.54, P < 0.001) than with metabarcoding (R² = 0.46, P < 0.001) (Fig.
Venn diagrams showing the shared and unique number of taxa by methodology when the datasets are harmonised to the same taxonomic resolution of national WFD bioassessment. a) The whole macroinvertebrate community and b) Ephemeroptera, Plecoptera and Trichoptera (EPT) taxa.
The correlation of community dissimilarity with environmental distance was substantially weaker with metabarcoding data (rs = 0.50) than with morphological identifications (rs = 0.72) when abundance data were used. The results were also different in terms of environmental variables selected as only water iron (Fe) concentration was selected as the best variable using metabarcoding (Table
Summary of the BIO-ENV analysis and the best sets of environmental variables used to construct environmental dissimilarity matrices. Spearman (rs) correlations describe the strength of correlation between community dissimilarity and environmental distances matrices. p/a = Presence-absence data.
Dataset | rs | Best subset of environmental variables |
---|---|---|
DNA metabarcoding, abundance (%) | 0.50 | Fe |
DNA metabarcoding, p/a | 0.74 | Current velocity, Shade, Bryophyte cover, Mn, Tot N, Tot P, Fe, Ca |
Morphology, log(abundance) | 0.72 | Current velocity, Bryophyte cover, Tot N, Fe, Mn |
Morphology, p/a | 0.71 | Current velocity, Shade, Bryophyte cover, Tot N, Fe, Mn |
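The logic behind a BIO-ENV-style search can be sketched in Python as follows: for every subset of environmental variables, build a between-site environmental distance matrix and compute its Spearman rank correlation with the community dissimilarities, then keep the subset with the highest correlation. This is a simplified illustration, not the procedure or software used in the study. All function names and the toy data are hypothetical, and the use of Euclidean distances on already standardised variables is an assumption.

```python
import math
from itertools import combinations


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))


def env_distances(env, variables):
    """Euclidean distances for all site pairs, using the chosen variables.

    env: dict variable name -> list of site values (assumed standardised).
    """
    n_sites = len(next(iter(env.values())))
    return [
        math.sqrt(sum((env[v][i] - env[v][j]) ** 2 for v in variables))
        for i, j in combinations(range(n_sites), 2)
    ]


def best_subset(community_dissim, env):
    """Return (rs, subset) with the highest Spearman correlation.

    community_dissim: pairwise dissimilarities in the same site-pair order
    as itertools.combinations(range(n_sites), 2).
    """
    best = (-2.0, ())
    for size in range(1, len(env) + 1):
        for subset in combinations(env, size):
            rs = spearman(community_dissim, env_distances(env, subset))
            if rs > best[0]:
                best = (rs, subset)
    return best


# Toy example: four sites, community change follows the invented "Fe" gradient.
env = {"Fe": [0, 1, 2, 3], "noise": [5, 1, 4, 2]}
community_dissim = [1, 2, 3, 1, 2, 1]  # site pairs (0,1), (0,2), (0,3), (1,2), (1,3), (2,3)
rs, subset = best_subset(community_dissim, env)
print(round(rs, 2), subset)
```

Because the search is exhaustive, the number of subsets doubles with each additional variable, so practical implementations usually cap the subset size.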
Distribution of the study sites in a two-dimensional NMDS ordination space, based on macroinvertebrate identification by morphology (a and c) and metabarcoding (b and d) using abundance data (a and b) or presence/absence data (c and d) across the six mining areas. The vectors represent Principal Components (PC) that were significantly related to the NMDS axis scores. The colours represent mining areas.
The mining impact on the streams was subtle across the surveyed mining areas (see also
The rapid development of new DNA sequencing techniques and the increase in taxonomic coverage in reference sequence databases has made DNA metabarcoding an attractive tool for biodiversity surveys and bioassessments (e.g.
Metabarcoding generated more information for unique taxa than did morphological identification. In fact, in contrast to the morphological approach that uses coarser taxonomic resolution, the cumulative number of taxa found with metabarcoding did not reach an asymptote with increasing number of studied sites. This indicates that much of the regional benthic macroinvertebrate diversity has not yet been observed at the taxonomic resolution that can be achieved by metabarcoding.
Much of this unique taxonomic information comes from groups such as Chironomidae, Simuliidae, Limnephilidae and Oligochaeta, which could be expected as Fennoscandian DNA reference libraries are quite comprehensive for these groups. In our routine morphology-based surveys, they are also never identified to species level.
In our study, metabarcoding was particularly useful in resolving morphological identifications at genus level (especially beetles and stoneflies). Cryptic diversity in morphologically-inseparable species complexes is a common feature of many benthic macroinvertebrate groups (e.g.
It has been suggested that better resolved taxonomy results in stronger estimates of community-environment relationships and better detection of environmental impacts (
Metabarcoding did not detect all taxa found in morphological identifications. In particular, several molluscs, which we were able to identify by morphology, were missing from the DNA dataset. This could be the result of primer bias during PCR amplification (especially for Mollusca; see
Our results show that bulk metabarcoding can provide comparable or superior results to traditional morphology-based species identification for stream bioassessment and biodiversity surveys. The benefits were most apparent for species groups that are tedious to identify and require high taxonomic expertise. However, the fact that DNA sequence abundances did not correlate with species abundances or biomass hampers the use of ecological metrics that rely on those. Primer bias or incomplete DNA reference libraries, by contrast, do not seem to pose a major roadblock if the goal of a study is impact assessment rather than a complete biodiversity survey. Given the fast growth of DNA reference libraries (
We thank Satu Maaria Karjalainen for insightful project management, study planning and site selection. We would also like to thank Minna Kuoppala, Kirsti Leinonen, Katri E. Tolonen and Mika Visuri for resilient assistance in fieldwork. We also thank K. E. Tolonen for initiating the present study and M. Kuoppala for drawing the map. This study was funded by the evaluation and management project of the cumulative environmental effects of the mining cluster in Lapland (A72032) through the EU’s Northern Finland Regional Development Fund and FRESHABIT LIFE IP (LIFE14 IPE/FI/23). This work was also supported by the Canada First Research Excellence Fund and represents a contribution to the University of Guelph’s “Food from Thought” programme.
Table S1
Data type: Environmental variables
Explanation note: Table S1. Mean and range of environmental characteristics of the 36 study sites in northern Finland.
Table S2
Data type: Bioinformatic information
Explanation note: Table S2. Bioinformatic information of DNA extraction including sample layout and in-line tagging.
Table S3
Data type: Read counts and taxonomy of OTU sequences
Explanation note: Table S3. Raw OTU table showing read counts, as well as taxonomy and OTU sequences for each sample. In addition, processed and quality filtered OTU tables with taxonomy assigned are included.
Table S4
Data type: Occurrences of taxa
Explanation note: Table S4. List of taxa observed by metabarcoding (DNA meta) and by routine morphological identification (Morpho) across 36 stream site samples in northern Finland. Taxon frequency occurrence (% of samples) is given for both methods. Mean relative abundance of reads for DNA metabarcoding and mean number of specimens for morphological identification across samples in which each taxon was present are also given.
Table S5
Data type: Results of PCA of environmental variables
Explanation note: Table S5. Eigenvalues, percent explained variance and varimax rotated principal component loadings of the environmental variables. Loadings > 0.5 are in bold.
Scripts S1
Data type: R-scripts
Explanation note: Commands for data processing are available as supporting information.