Research Article
Corresponding author: Bonnie Bailet (bb.bailet@hotmail.fr)
Academic editor: Hugo de Boer
© 2019 Bonnie Bailet, Agnes Bouchez, Alain Franc, Jean-Marc Frigerio, François Keck, Satu-Maaria Karjalainen, Frederic Rimet, Susanne Schneider, Maria Kahlert.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Bailet B, Bouchez A, Franc A, Frigerio J-M, Keck F, Karjalainen S-M, Rimet F, Schneider S, Kahlert M (2019) Molecular versus morphological data for benthic diatoms biomonitoring in Northern Europe freshwater and consequences for ecological status. Metabarcoding and Metagenomics 3: e34002. https://doi.org/10.3897/mbmg.3.34002
Diatoms are known to be efficient bioindicators for water quality assessment because of their rapid response to environmental pressures and their omnipresence in water bodies. The identification of benthic diatom communities in the biofilm, coupled with quality indices such as the Indice de polluosensibilité spécifique (IPS), can be used for biomonitoring purposes in freshwater. However, the morphological identification and counting of diatom species under the microscope is time-consuming and requires extensive expertise to deal with a constantly evolving taxonomy. In response, a molecular-based and potentially more cost-effective method has been developed, coupling high-throughput sequencing and DNA metabarcoding. The method has already been tested for water quality assessment with diatoms in Central Europe. In this study, we applied both the traditional and molecular methods to 180 biofilm samples from Northern Europe (rivers and lakes of Fennoscandia and Iceland). The DNA metabarcoding data were obtained for two different DNA markers, the 18S-V4 and rbcL barcodes, with the NucleoSpin Soil kit for DNA extraction and sequencing on an Ion Torrent PGM platform. We assessed the ability of the molecular method to produce species inventories, IPS scores and ecological status classes comparable to the ones generated by the traditional morphology-based approach. The two methods generated correlated but significantly different IPS scores and ecological status assessments. The observed deviations are explained by presence/absence and abundance discrepancies in the species inventories, mainly due to the incompleteness of the barcode reference databases, primer bias and strictness of the bioinformatic pipeline. Abundance discrepancies are less common than presence/absence discrepancies but have a greater effect on the ecological assessment.
Missing species in the reference databases are mostly acidophilic benthic diatom species, typical of the low pH waters of Northern Europe. The two different DNA markers also generated significantly different ecological status assessments. The use of the 18S-V4 marker generates more species inventory discrepancies, but achieves an ecological assessment more similar to the traditional morphology-based method. Further development of the metabarcoding method is needed for its use in environmental assessment. For its application in Northern Europe, completion and curation of reference databases are necessary, as well as evaluation of the currently available bioinformatics pipelines. New indices, fitted for environmental biomonitoring, should also be developed directly from molecular data.
Metabarcoding, environmental assessment, 18S-V4, rbcL, Bacillariophyta, water quality
Diatom communities are excellent bioindicators of water quality because of their rapid response to environmental changes, such as eutrophication and pollution (
High-throughput sequencing based metabarcoding provides an alternative for diatom monitoring (
In the scope of developing diatom metabarcoding for Fennoscandia, we need to compare the results of metabarcoding quality assessment to the classical morphological approach in use. Currently, both Finland and Sweden are using diatoms for routine environmental assessment of lakes and streams in the framework of the WFD (
The present study compares the diatom metabarcoding approach for environmental assessment in Fennoscandia with the established morphological approach on a large scale, using 180 samples of benthic biofilm from both lakes and rivers, covering a broad environmental gradient across Finland, Sweden, Norway and Iceland. We aimed to compare the calculation of a diatom index and the subsequent assessment of ecological status, as well as the qualitative and quantitative species identification of benthic diatoms.
Previous studies (
The water bodies of Fennoscandia (Finland, Sweden and Norway) are different from the ones of France and Central Europe. Fennoscandian freshwaters have, on average, lower pH values, higher amounts of humic substances and lower nutrient concentrations than freshwaters in Central and Southern Europe (
Another challenge that could affect the results given by molecular analysis is the choice of the DNA marker. The two main markers, currently used for diatoms in Europe, are the 18S-V4 region (SSU rRNA) and rbcL (from the chloroplast genome). The rbcL-region has proved to be more efficient for diatom communities from temperate and tropical rivers of French territories (
Finally, the suitability of the molecular approach strongly depends on the quality of the generated taxonomic assignments (
In the scope of this first application of metabarcoding for environmental assessment of Fennoscandia’s water bodies using diatom communities, we aim to test four hypotheses:
1. Ecological status assessment of Fennoscandia lakes and rivers based on diatom indices will be similar using either metabarcoding or morphological methods for diatom identification.
2. Due to its detection power towards rare and cryptic species, the metabarcoding approach will detect species missed by the traditional approach.
3. However, because of the dependence on reference databases that are potentially incomplete regarding the Fennoscandia diatom flora, the metabarcoding approach will not cover all taxonomic diversity. Some species will be missing and specific genera will be less covered than others.
4. The use of two different DNA markers, rbcL and 18S-V4, will lead to dissimilarities in the ecological assessment caused by differences in the taxonomic coverage of the associated reference databases and by primer specificity.
A total of 180 environmental samples were collected from benthic biofilms in 65 streams and 43 lakes in Fennoscandia (Sweden, Finland, Norway) and two sites in Iceland (Fig.
Water chemistry characteristics of the 110 sites included in this study.
Variable | Unit | Mean | Range |
---|---|---|---|
Alkalinity (Alk) | mEq/l | 0.543 | 0.01–6.03 |
Conductivity (Kond) | mS/m | 12 | 0.5–131 |
pH | | 6.7 | 4.6–8.6 |
Total organic Carbon (TOC) | mg/l | 12 | 0.9–35 |
Total Nitrogen (TotN) | µg/l | 777 | 7–3801 |
Total Phosphorus (TotP) | µg/l | 39 | 0.2–433 |
All samples were collected in autumn from submerged hard substrate following the European standard for diatom sampling (EN 13946:2014,
Preparation, identification and counting for the morphological diatom analysis were performed using European and Swedish standards (SS-EN 13946:2014; SS-EN 14407:2014; (
Water samples from Fennoscandia can have high humic acid and TOC concentrations, so the DNA was extracted from 4–8 ml of sample using the NucleoSpin Soil Kit (Macherey-Nagel), following the recommendations of the manufacturer with one modification: the centrifugation time for washing the silica membrane was changed from 30 seconds to 1 minute for each of the 4 washing steps. DNA quality of the samples was assessed by spectrophotometry using the 260/280 nm ratio on the NanoDrop ND-1000 (Thermo Fisher Scientific) in order to estimate the water dilution factor needed to achieve 25 ng DNA/μl for PCR. For the PCR amplification, we targeted two different markers: a 312 bp barcode on the rbcL plastid gene using the modified Diat_rbcL_708F and R3 primer pair, with increased degeneracy to match a broader diversity of diatoms (
To achieve sufficient DNA concentration, PCR amplification of each sample was performed in triplicate. Each PCR mix was composed of 7.5 μl of DNA extract (for the rbcL marker) or 2 μl of DNA extract (for the 18S-V4 marker), 2.5 μl of 10× buffer, 2 μl of dNTP (2.5 mM), 1.25 μl of the respective forward primer mix (10 pmol/μl) and 1.25 μl of the respective reverse primer mix (10 pmol/μl), 1.25 μl of BSA (10 mg/ml), 0.15 μl of TaKaRa LA Taq polymerase and completed with molecular biology grade water. For the rbcL marker, the PCR reaction conditions were 5 min at 95 °C for initial denaturation, followed by 35 cycles of denaturation at 95 °C for 1 min, annealing at 54 °C for 1 min and extension at 72 °C for 1 min. For the 18S-V4 marker, the PCR reaction conditions were 2 min at 94 °C for initial denaturation, followed by 35 cycles of denaturation at 94 °C for 45 sec, annealing at 50 °C for 45 sec and extension at 72 °C for 1 min. The product of the PCR amplification was tested by electrophoresis on 1.5% agarose gel before further purification of the DNA.
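As a quick check of the recipe above, the volume of water per reaction can be worked out by difference. The sketch below assumes a 25 μl final reaction volume, which is not stated in the text but is what 2.5 μl of 10× buffer implies; the function and variable names are illustrative only.

```python
# Reagent volumes per reaction as listed in the text, stored as integer
# hundredths of a microlitre to avoid floating-point rounding noise.
COMMON_REAGENTS = {
    "10x buffer": 250,
    "dNTP (2.5 mM)": 200,
    "forward primer mix": 125,
    "reverse primer mix": 125,
    "BSA (10 mg/ml)": 125,
    "Taq polymerase": 15,
}
DNA_EXTRACT = {"rbcL": 750, "18S-V4": 200}

# ASSUMPTION: 25 ul final volume, inferred from 2.5 ul of 10x buffer.
FINAL_VOLUME = 2500


def water_volume_ul(marker):
    """Return the water volume (ul) needed to complete one reaction."""
    used = DNA_EXTRACT[marker] + sum(COMMON_REAGENTS.values())
    return (FINAL_VOLUME - used) / 100


for marker in DNA_EXTRACT:
    print(marker, water_volume_ul(marker), "ul water")
```

Under that assumption, the rbcL reactions need less water than the 18S-V4 reactions because they receive more DNA extract.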
The triplicate PCR products of each sample were pooled in one 1.5 ml DNA LoBind tube (Eppendorf) and purified with Agencourt AMPure beads (Beckman Coulter) following the manufacturer’s instructions but with an adjusted volume ratio of 1.5× beads/DNA. Repair of amplicon fragments, ligation of tags to amplicons and library preparation were done using the NEBNext Fast DNA Library Prep Set for Ion Torrent (New England Biolabs) as described in (
The sequencing platform provides a fastq file for each sample after demultiplexing and removal of the tag sequences. The first filtering step excluded reads that were too short or too long, so that only reads of 300–315 bp were kept for the rbcL marker and reads of 320–340 bp for the 18S-V4 marker.
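The length filter described above can be sketched in a few lines. This is a minimal illustration rather than the pipeline actually used: it assumes plain four-line FASTQ records and treats both window bounds as inclusive, and all function names are hypothetical.

```python
# Accepted read-length windows (bp) per marker, as given in the text.
# ASSUMPTION: both bounds are inclusive.
LENGTH_WINDOWS = {"rbcL": (300, 315), "18S-V4": (320, 340)}


def parse_fastq(text):
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        yield lines[i], lines[i + 1], lines[i + 3]


def filter_by_length(records, marker):
    """Keep only reads whose length falls inside the marker's window."""
    lo, hi = LENGTH_WINDOWS[marker]
    return [rec for rec in records if lo <= len(rec[1]) <= hi]


def make_record(name, length):
    """Build a synthetic FASTQ record of the requested length (demo only)."""
    return f"@{name}\n{'A' * length}\n+\n{'I' * length}\n"


demo = make_record("short", 250) + make_record("ok", 312) + make_record("long", 330)
kept = filter_by_length(parse_fastq(demo), "rbcL")
print([header for header, _, _ in kept])
```

With the synthetic reads above, only the 312 bp read passes the rbcL window, while the 330 bp read would pass the 18S-V4 window instead.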
Diatoms molecular inventories were obtained with the R-syst::diatom database (
In this study, to compare valve counts and read numbers, we used two comparable abundance thresholds following
We used the diatom inventories produced by the morphological and molecular methods to assess the ecological status class of our studied lakes and streams, as required by the WFD, by calculating the intercalibrated diatom index IPS (
To test whether the metabarcoding method returns comparable results regarding ecological status class when using benthic diatom communities, we first tested whether the ecological status classes were significantly different between the two identification methods with a Friedman’s test (
To understand the causes of possible deviations between the morphological and metabarcoding methods and to focus future development studies, we used multiple methods. We began by analysing which environmental variables (amongst Alk, TOC, TotP, TotN, Kond and pH, cf. Table
Code | Meaning |
---|---|
MOL | Species found by molecular method only |
H | Species not in the Fennoscandia taxa list |
MOR | Species found with morphological method only |
ND | Species represented in the DNA database but no identification |
NR | Species not in the respective DNA database |
AB1 | Species found with both techniques but with higher abundance in morphological inventory |
AB2 | Species found with both techniques but with higher abundance in molecular inventory |
GM | Species found by both techniques with the same abundance |
G | Taxonomic identification stopped at the genus level |
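The codes in the table above can be read as a small decision procedure applied to each species in each sample. The sketch below is one possible reading, not the authors' implementation: it assumes that a species found only by morphology is reported as ND or NR depending on the reference database, that "same abundance" means a difference within a user-chosen tolerance (the text does not say how this was decided), and it leaves out the genus-level prefix G. All names and the example values are hypothetical.

```python
def mismatch_code(morph, mol, in_reference_db, tolerance, in_regional_list=True):
    """Return a match/mismatch code for one species in one sample.

    morph, mol: relative abundances (0 means not detected).
    tolerance:  ASSUMED threshold below which abundances count as equal.
    """
    if morph == 0 and mol == 0:
        raise ValueError("species absent from both inventories")
    if mol > 0 and not in_regional_list:
        return "H"    # detected by DNA but not in the regional taxa list
    if mol == 0:
        # Found by morphology only (MOR), refined by database status.
        return "ND" if in_reference_db else "NR"
    if morph == 0:
        return "MOL"  # found by the molecular method only
    if abs(morph - mol) <= tolerance:
        return "GM"   # good match in presence and abundance
    return "AB1" if morph > mol else "AB2"


# Hypothetical relative abundances for illustration.
print(mismatch_code(0.30, 0.10, True, tolerance=0.05))
print(mismatch_code(0.20, 0.00, False, tolerance=0.05))
```

Making the tolerance an explicit argument keeps the one unstated choice visible, since the share of AB1, AB2 and GM codes depends directly on it.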
Additionally, we carried out a SIMPER analysis (
Finally, each analysis mentioned previously was run on two datasets (one with the rbcL marker results and one with the 18S-V4 marker results) in order to compare the effect of the DNA marker choice. To evaluate the comparability with the established environmental assessment, we compared the ecological status classes derived from the different markers directly and also analysed the extent of the gaps in the respective databases and their impact on IPS calculation and species assignment.
We performed the statistical analyses using the R 3.3.1 software (
The IPS scores calculated from the morphological inventories ranged from 12 to 20. Those calculated from the molecular inventories ranged from 7.9 to 20 for the 18S marker and from 7.8 to 20 for the rbcL marker.
We found that, even if both the established and the molecular method indicated the same trend regarding the assessment of water quality (Fig.
IPS scores correlation. The two axes show the IPS scores values of the samples assessed by the molecular (y-axis) or morphological method (x-axis). Increasing IPS scores values show increasingly good ecological status (IPS ≥ 17.5 = high, IPS ≤ 8 = bad).
Percentages of overestimation and underestimation of ecological status class in the samples.
Marker | Overestimate (1 class) | Exact | Underestimate (1 class) | Underestimate (2 classes) | Underestimate (3 classes) | Underestimate (4 classes) |
---|---|---|---|---|---|---|
rbcL | 6% | 38% | 45% | 8% | 1% | 2% |
18S | 11% | 49% | 34% | 5% | 1% | 0% |
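The class shifts tabulated above can be derived by converting each IPS score to a status class and subtracting the class indices. The sketch below is illustrative only: the text gives the boundaries for the best class (IPS ≥ 17.5) and the worst class (IPS ≤ 8), whereas the intermediate boundaries of 14.5 and 11 are assumptions made here, and the score pairs are invented.

```python
from collections import Counter


def status_index(ips):
    """Map an IPS score (1-20) to a class index, 0 = worst and 4 = best."""
    if ips <= 8.0:    # stated in the text: IPS <= 8 is the worst class
        return 0
    if ips >= 17.5:   # stated in the text: IPS >= 17.5 is the best class
        return 4
    if ips >= 14.5:   # ASSUMED boundary, not given in the text
        return 3
    if ips >= 11.0:   # ASSUMED boundary, not given in the text
        return 2
    return 1


def class_shift(ips_molecular, ips_morphological):
    """Positive = molecular overestimates the class, negative = underestimates."""
    return status_index(ips_molecular) - status_index(ips_morphological)


# Hypothetical (molecular, morphological) IPS pairs for three samples.
pairs = [(18.0, 16.0), (15.0, 15.5), (9.0, 15.0)]
shifts = Counter(class_shift(mol, morph) for mol, morph in pairs)
print(dict(shifts))
```

Dividing each count by the number of samples would give percentages of the kind reported in the table.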
Overall, 56% of the rbcL samples and 40% of the 18S samples were associated with an ecological status lower than the one found using morphological assessment (Table
In general, IPS scores calculated on taxa lists generated from molecular markers were correlated with those calculated with the morphological method (18S marker: r² = 0.29, p < 0.001 and rbcL marker: r² = 0.38, p < 0.001, R 3.3.2 package pls) (Fig.
The PLS regression showed that TotP (p < 0.01) and conductivity (p < 0.01) were significant predictors of the deviation of IPS scores from morphological to 18S communities (Table
Estimates and p-values of the PLS regression on environmental variables and the deviations (Δ) of IPS scores between morphological and molecular assessments.
Variable | Δ IPS 18S marker/morphology: estimate | Δ IPS 18S marker/morphology: p-value | Δ IPS rbcL marker/morphology: estimate | Δ IPS rbcL marker/morphology: p-value |
---|---|---|---|---|
(Intercept) | -0.9338 | 0.66 | 2.14 | 0.25 |
Conductivity | -0.0339 | p < 0.01 | -0.0225 | p < 0.05 |
pH | 0.4339 | 0.14 | 0.0028 | 0.99 |
TOC | -0.0064 | 0.85 | 0.0077 | 0.79 |
TotP | -0.0137 | p < 0.01 | -0.0063 | 0.17 |
TotN | 0.0004 | 0.23 | 0.0003 | 0.28 |
In total, the morphological analysis identified 585 species within 87 genera across the four countries included in the analyses. Species richness per sample varied from 4 to 103 species, being lowest in the Norwegian samples and highest in Swedish samples. The average number of species per sample was 62 in Swedish sites, 63 in Finnish sites and 49 and 21 in Icelandic and Norwegian sites, respectively. The dominant genera across all samples were Achnanthidium, Eunotia and Fragilaria and the dominant species were Achnanthidium minutissimum, Tabellaria flocculosa, Fragilaria gracilis and Eunotia incisa.
The Shannon scores of the morphological and molecular taxa inventories were significantly different (Student’s paired-sample t-test, p < 0.001 for both the 18S and rbcL markers). As expected, the two reference databases lack a substantial number of the species used in Fennoscandian water quality assessment, with only 15.4% of all Fennoscandian taxa represented in the 18S database and 17.8% in the rbcL database. Many genera had only a few species represented in the reference databases, especially Achnanthidium, Eunotia, Gomphonema, Navicula and Nitzschia. However, in our own study, 70% of the species found by the morphological method were represented in the reference databases.
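For reference, the Shannon score compared above can be computed from any inventory of counts, whether valves or reads. The sketch below assumes the natural logarithm (the text does not state the base), and the two inventories are invented for illustration.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("inventory is empty")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical inventories for one sample: valve counts and read counts.
valve_counts = [200, 120, 50, 30]
read_counts = [9000, 500, 300, 150, 50]

print(round(shannon_index(valve_counts), 3))
print(round(shannon_index(read_counts), 3))
```

Because the index depends only on proportions, inventories built from about 400 valves and from thousands of reads can be compared on the same scale, although taxa detected by only one method still shift the result.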
When calculating the probability for a correct identification, including also the comparison of abundance, on average, 5% of all taxa showed a good match between the molecular and morphological techniques (Fig.
Average probability, per analysis, of a correct match or mismatch for a species identification and abundance assessment when comparing molecular and morphological methods. The code G represents a reference barcode stopping at the genus level; the yellow portion represents a species (H) or a genus (G-H) not expected in Fennoscandia; the orange portion represents the case when no DNA was detected for a species (ND) or a genus (G-ND) despite having a reference barcode in the database; the red portion represents the case when the database lacked a reference barcode for a species (NR) or a genus (G-NR). The green portion represents a good match both in presence and abundance of species, between the morphological and molecular assessments.
Another, less common, mismatch was an abundance discrepancy (AB1 and AB2). This mismatch happens when a species is found by both techniques but in significantly different abundances. In the case of a higher abundance in morphological than in molecular species lists (AB1), 44% of the concerned species were found to be causing the error with both markers. Similarly, 25% of the species found in higher abundance in the molecular species list (AB2) were common to the two markers. The abundance mismatch thus seems to be linked to specific taxa (e.g. Achnanthidium minutissimum).
The hypothesis that the molecular technique achieves better taxonomic coverage is supported by a substantial number of species, most of them rare, detected only with the metabarcoding technique (MOL mismatch): 56 species and 8 genera with the 18S marker, representing 27% of our diatom communities, and 122 species and 7 genera with the rbcL marker, representing 38% of our diatom communities. Furthermore, some species detected only by the molecular technique were not included in the taxonomy used for morphological identification in Fennoscandia (H, in yellow). More precisely, 8% of the identifications for the rbcL marker (52 taxa amongst 34 genera) and 6% for the 18S marker (58 taxa amongst 45 genera) were not included.
To track the origin (link to specific sites and ecology) of the observed discrepancies in the species lists, we used Canonical Correspondence Analysis (CCA) on the occurrences across the 180 samples, for each mismatch code against our five environmental parameters. The CCA showed that low pH and high TOC explained most of the variability observed when a species is not identified with the molecular technique due to a gap in the DNA reference database (NR), in agreement with the fact that many of the species lacking from the databases are acidophilic diatoms. None of the tested environmental variables explained the occurrence of abundance mismatches (AB1 or AB2) with either of the markers. The occurrence of a good match between molecular and morphological methods (GM code) was correlated with a high pH. We found no differences between the two studied DNA markers regarding how environmental variables could explain the different types of matches and mismatches between molecular and morphological species lists.
The SIMPER analysis highlighted the species that are most likely to contribute to the IPS deviation. When using the 18S or rbcL marker, Achnanthidium minutissimum, Eolimna minima, Amphora pediculus, Rhoicosphenia abbreviata, Nitzschia dissipata and Eunotia incisa were the main species contributing to an overestimation of the IPS values with molecular assessment compared to the morphological assessment. In the case of underestimation of the IPS values with molecular assessment, the main contributing species were Achnanthidium minutissimum, Tabellaria flocculosa, Fragilaria gracilis, Aulacoseira ambigua, Cocconeis placentula, Staurosira pinnata and Fragilaria capucina for both markers, as well as Eunotia incisa for the rbcL and Eunotia minor for the 18S marker.
When looking back at the mismatch codes, the majority of these species were represented in both reference databases, but showed significant discrepancy in their relative abundance when assessed by morphological or by molecular techniques (AB1 and AB2). Achnanthidium minutissimum was found with a higher abundance in the morphological assessments when using either marker but also, in some cases, found by molecular assessment only. Fragilaria gracilis was found with higher abundance with the morphological method than when using metabarcoding with the rbcL marker. With the 18S marker, it was only identified below the threshold we used for our analysis (< 10 reads). Amphora pediculus was found with a higher abundance in the morphological method than by metabarcoding with the rbcL marker but with higher abundance in metabarcoding when using the 18S marker. Tabellaria flocculosa was always found with higher abundance with the morphological method when using the rbcL marker and was not found at all by the molecular technique when using the 18S marker (despite being represented in the reference database). Only Eunotia incisa, amongst the species most important for the IPS deviations, was actually missing from the reference databases, which is why it could not be found by the molecular method.
The gaps in the two reference databases were overlapping: out of 416 species and 409 species missing across 74 genera for the 18S and the rbcL databases, respectively, 388 species were missing in both databases (across 69 genera). The IPS scores obtained with the two markers were correlated (r² = 0.30, p < 0.001, Fig.
Correlation between the IPS scores obtained with the 18S and the rbcL markers. The boundaries for the ecological status classes defined by the IPS are indicated by the coloured squares (red=very bad, orange=bad, yellow=moderate, green=good, blue=very good). Red dots highlight “very bad” ecological status samples when using the rbcL marker and red triangles when using the 18S marker. Orange dots and triangles highlight “bad” ecological status samples using the rbcL and 18S markers, respectively.
Contrary to our first hypothesis, we found significant differences between the ecological status classes generated from the morphological and the molecular assessments. We could confirm our second hypothesis as we found some taxa with the metabarcoding approach which were not detected by the morphological assessment. On the other hand, there were even more taxa which were not found by the metabarcoding approach, even if the morphological method detected them, which confirmed our third hypothesis. Finally, we found differences between the two markers used, leading to discrepancies in the ecological assessment and confirming our fourth hypothesis.
The linear relationship of the ecological assessment results of both methods found in our study confirms that metabarcoding has the potential to be used for biomonitoring, as previously shown by other studies (
The ecological status class boundary between “good” and “moderate” ecological status is especially important for decision-makers, since the WFD defines that water bodies below this “good” ecological status are in need of remediation. Consequently, the discrepancies in ecological status class are the major concern when applying metabarcoding for monitoring purposes. The IPS index and, in turn, ecological status classes, are based on the presence and abundance of species of the diatom community.
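To make the dependence on presence and abundance concrete, the following sketch computes an IPS-style score. The text does not spell out the formula; the sketch assumes the commonly cited Zelinka and Marvan weighted average, in which each taxon carries a sensitivity value s (1 to 5) and an indicator value v (1 to 3), weighted by its abundance a, with the result rescaled to the 1 to 20 range as 4.75x − 3.75. The species values used here are invented.

```python
def ips_score(taxa):
    """IPS-style score from (abundance, sensitivity, indicator_value) tuples.

    ASSUMED formula: 4.75 * sum(a*s*v) / sum(a*v) - 3.75, on a 1-20 scale.
    Taxa without index values (or absent from an inventory) are simply not
    passed in, so they do not contribute to the score.
    """
    taxa = list(taxa)
    weighted = sum(a * s * v for a, s, v in taxa)
    weights = sum(a * v for a, s, v in taxa)
    if weights == 0:
        raise ValueError("no taxa with index values")
    return 4.75 * (weighted / weights) - 3.75


# Hypothetical community: one sensitive and one tolerant taxon.
morphological = [(0.5, 5, 1), (0.5, 1, 1)]
# Same community when the sensitive taxon is missed by the other method.
molecular = [(0.5, 1, 1)]

print(ips_score(morphological), ips_score(molecular))
```

In this toy case, losing a single sensitive taxon moves the score from the middle of the scale to its minimum, which illustrates why both missing species and shifted abundances can change the resulting status class.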
In this study, we found evidence of differences both in species presence and in species abundance in the species lists derived from the two methods, as well as their impact on the IPS index calculations.
Even with the current effort to complete and curate the diatom reference databases, many species are still lacking barcode information.
Incorrect identification of the reference barcodes can, of course, lead to important discrepancies, hence the importance of continuous database verification and curation. However, even though the database is constantly updated and curated, some taxonomic discrepancies were also detected when using Rsyst::diatom. Some taxonomic names differed from the ones used for the same taxon in morphological assessment, leading to an artificial mismatch between morphological and metabarcoding species inventories. Sometimes, taxonomic information was given at different taxonomic levels, which required the merging of taxa for comparison (e.g. Ulnaria ulna var. acus and Ulnaria acus). The Rsyst::diatom version v5 for the rbcL barcode also included some sequences without a reliable taxonomic name assigned (e.g. “Nitzschia aff. dissipata”), although those mainly concerned reference barcodes included in Rsyst from the NCBI nucleotide database. Sequences assigned to such reference barcodes were removed from our dataset to avoid incorrect identification. Completion and curation of the rbcL marker database are now being undertaken at European level and the latest Rsyst::diatom version v7 (in February 2018, renamed “Diat.barcode”) was released with most of those taxonomic discrepancies removed. On the other hand, the curation of the Rsyst::diatom database for the 18S-V4 barcode is still only partial.
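The merging of taxa mentioned above amounts to mapping both inventories onto a shared set of names before comparing them. The sketch below shows the idea with a one-entry synonym table; the table, the choice of which name to keep and the abundances are all hypothetical and do not reproduce the taxonomy actually applied.

```python
# HYPOTHETICAL synonym table: maps a name used in one inventory to the
# name chosen for comparison. Which name is kept is arbitrary here.
SYNONYMS = {"Ulnaria ulna var. acus": "Ulnaria acus"}


def harmonise(inventory, synonyms=SYNONYMS):
    """Return a new inventory with synonymous taxa merged and summed."""
    merged = {}
    for name, abundance in inventory.items():
        accepted = synonyms.get(name, name)
        merged[accepted] = merged.get(accepted, 0) + abundance
    return merged


# Invented inventories for one sample (valve counts and read counts).
morphological = {"Ulnaria ulna var. acus": 12, "Eunotia incisa": 30}
molecular = {"Ulnaria acus": 40}

shared = set(harmonise(morphological)) & set(harmonise(molecular))
print(sorted(shared))
```

Without the harmonisation step the two inventories above would share no taxon at all, which is the kind of artificial mismatch described in the text.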
Primer bias is often found to be a major source of variation (
Another well-known limitation in using metabarcoding for ecological assessment is the clustering method used before the taxonomic assignment, often leading to massive loss of genetic information (
Finally, the limitation of taxonomic resolution (when identification at species level was not possible) can also lower the quality of the ecological assessment using the IPS index because the genus level includes the pooling of closely related species, some of which can exhibit very different ecological preferences (
Naturally, taxonomic discrepancies can also arise from the morphological assessment, with the possible misidentification of small forms, as well as omission of rare species. For example, the silica skeleton of Entomoneis sp. is easily dissolved in the routine process of morphological diatom identification using oxidised samples. This omission of species in morphological identification creates species list discrepancies when comparing with the metabarcoding method. Moreover, a higher number of identified taxa with metabarcoding is expected because of the high number of sequences taken into account compared with a count of only 400 valves with light microscopy (
We found that, even if abundance discrepancies occurred less often than presence/absence disparities, they affect the ecological assessment much more than any other type of mismatch. Especially when using the rbcL marker, the species mainly responsible for overestimation and underestimation of the IPS scores were found with significantly higher abundance in one or the other type of assessment (e.g. the species Achnanthidium minutissimum and Tabellaria flocculosa).
One explanation for a mismatch in relative abundance might be the known problem that the number of sequences generated by HTS does not directly correspond to the number of specimens or biomass (
This specific limitation of the metabarcoding method has few known solutions. However, the SIMPER analysis which we performed to assess which species accounted for most of the observed IPS scores deviations, highlighted several problematic species such as Achnanthidium minutissimum, Cyclotella meneghiniana, Nitzschia palea and Ulnaria ulna. These species are known for significant abundance discrepancies when assessed by morphological or metabarcoding methods: they were also found to be either under-represented or over-represented in the study by
Another source of abundance discrepancy is the possible assignment to a similar reference barcode when the correct taxon is not represented in the database. As mentioned previously, closely related species can have different ecological preferences, such as shown for Halamphora veneta and H. oligotraphenta by
Additionally, cryptic diversity can create abundance disparities coupled with presence/absence discrepancies: where limited morphological identification under a light microscope may result in assignment to a single taxon, metabarcoding works at a finer taxonomic level and can split the specimens into several taxa. In that case, we obtain a higher richness of species, at lower abundance, with the molecular technique. A similar problem arises if identification by one method stops at genus level while the other method splits the identification into several species. With the morphological assessment under a light microscope, a genus level taxonomic identification may result from a lack of taxonomic expertise or from specimens that are too small. When assessed with the metabarcoding method, a limited taxonomic level may be due to a lack of higher level reference barcodes or to low primer specificity. The former case was more frequent in our dataset but still rare.
The DNA barcodes rbcL and 18S-V4 were chosen in this study because of their power to discriminate diatom communities, covering the three major diatom divisions and for their balance between variability and conservation of the primer binding sites (
When looking at the species lists, the Shannon scores show greater deviations from those calculated on morphological communities when using the 18S marker than when using the rbcL marker. The presence of green algae barcodes in the 18S dataset, which are completely absent from the rbcL dataset, is most likely responsible for that trend, and a greater difference was found between the Shannon scores of the two markers in the molecular analysis than between the two methods. The rbcL marker also had a better proportion of “good match” between the species lists generated by the morphological and by the molecular methods.
The inverse trend is observed for the ecological assessment: even though the results were very similar between the two markers and, in both cases, significantly correlated to the morphological method, the 18S marker achieves more exact ecological status classes than the rbcL marker, as well as less underestimation. Furthermore, a greater deviation was found between the IPS scores calculated with the rbcL marker and the morphological communities than between the IPS scores calculated on the 18S marker and morphological communities. Additionally, the IPS scores generated by the two markers were significantly different but less than when compared to the morphological assessment, highlighting that both markers produce more similar results than expected.
As mentioned before, the abundance discrepancies were present when using the 18S marker but less important than with the rbcL marker. Indeed, the majority of the species strongly contributing to the IPS deviations with the 18S were actually reflecting presence/absence discrepancies rather than abundances. Contrary to the rbcL marker, which exhibits a clear correlation between the species biovolume and the gene copy number (
While this study confirms the strong impact of the available reference barcodes, in both quantity and quality, as well as some marker-specific difficulties, we cannot clearly recommend one marker over the other. The 18S-V4 seems to have promising efficiency in ecological assessment and covered more of the Fennoscandian taxa morphologically identified at the time this study was undertaken, whereas the rbcL marker generated species lists more similar to the ones generated by the morphological approach. Indeed, the rbcL had more good matches between species lists and better-correlated Shannon scores. However, its abundance discrepancies affected the IPS calculation more than the 18S ones.
Additionally, part of our results has facilitated the recent curation work on the rbcL marker reference barcodes, which greatly improved the quality of the database. No similar curation has yet been done for the 18S marker.
Overall, our findings that the metabarcoding method in general is also suitable for Northern European conditions are promising. However, based on our results, we are convinced that there is a need for further development of this method for use in environmental assessment in Fennoscandia. The limitations of both techniques are multiple and correlated, making them difficult to isolate and properly quantify. Still, based on our results, we would recommend focusing first on the completion and maintenance of the reference databases, adding important missing species and carefully curating them to remove ambiguous barcodes and widen their use to broader ecosystems. Next, the abundance discrepancies were not the most common error source but clearly the one that most affected the ecological assessment. Thus, it would be interesting to adapt the
Additionally, the great diversity of bioinformatics pipelines currently available for diatom metabarcoding poses another challenge. Which pipelines are currently being used by the different research groups dealing with diatom metabarcoding development, and the way they affect the molecular data and ecological assessment, need to be evaluated, perhaps as a first step toward a standardisation of the molecular process.
Finally, the current Fennoscandian way to set an ecological status class is based on the morphological method and the next step should be to focus on the integration of metabarcoding data into current ecological assessment methods, as recommended by
This project was funded by Stiftelsen Oscar och Lili Lamms Minne (http://www.stiftelsenlamm.a.se/) and by The Swedish Agency for Marine and Water Management. We would like to thank Jón S. Ólafsson, from the Marine and Freshwater Research Institute (Iceland), for the collection of data from Iceland and for his helpful feedback. We also thank the Norwegian Institute for Water Research (NIVA), which provided the water chemistry data for all Norwegian samples. We would like to thank Teofana Chonova, Meline Corniquel and Sonia Lacroix who performed the preparation of DNA libraries for all samples at the molecular laboratory of INRA CARRTEL in Thonon (France) and The Plateforme Genome Transcriptome of INRA BIOGECO in Pierroton (France) for the HTS sequencing. Computer time for this study was provided by the computing facilities MCIA (Mésocentre de Calcul Intensif Aquitain) of the Université de Bordeaux and of the Université de Pau et des Pays de l’Adour and using the PlaFRIM experimental testbed, supported by Inria, CNRS (LABRI and IMB), Université de Bordeaux, Bordeaux INP and Conseil Régional d’Aquitaine (see https://www.plafrim.fr/). This work was supported by the European COST-Action DNAqua Net (CA15219).