How can integrated morphotaxonomy- and metabarcoding-based diatom assemblage analyses best contribute to the ecological assessment of streams?

Environmental conditions, such as nutrient concentrations, salinity, elevation etc., shape diatom assemblages of periphytic biofilms. These assemblages respond rapidly to environmental changes, a fact which makes diatoms valuable bioindicators. Hence, freshwater biomonitoring programmes currently use diatom indices (e.g. EU Water Framework Directive WFD). To date, microscopy-based assessments require high taxonomic expertise for diatom identification at the species level. High-throughput technologies now provide cost-effective identification approaches that are promising, complementary or alternative tools for bioassessment. The suitability of the metabarcoding method is evaluated for the first time in the Cyprus streams WFD monitoring network, an eastern Mediterranean country with many endemic species and results are compared to the results acquired from the morphotaxonomic analysis. Morphotaxonomic identification was conducted microscopically, using the most updated taxonomic concepts, literature and online resources. At the same time, DNA metabarcoding involved the use of the rbcL 312 bp barcode, high-throughput sequencing and bioinformatic analysis. The ecological status was calculated using the IPS Index. Results show a positive correlation between morpho-taxonomic and molecular IPS scores. Discrepancies between the two methodologies are related to the limitations of both techniques. This study confirmed that Fistulifera saprophila can have a crucial role in key differences observed, as it negatively influences IPS scores and microscopy methods frequently overlook it. Importantly, gaps in the DNA barcoding reference databases lead to a positive overestimation in IPS scores. Overall, we conclude that DNA metabarcoding offsets the morphotaxonomic methodology for the ecological quality assessment of freshwaters.


Introduction
Freshwater ecosystems are transient environments, featuring hotspots of biodiversity and supporting about 10% of the total known species on the earth (Strayer and Dudgeon 2010). Due to the extensive urbanisation and advancement in human activities in recent decades, these ecosystems are subjected to environmental degradation and impacted by anthropogenic pressures (Hupało et al. 2021). As a result, these freshwater biodiversity hotspots are declining. The Mediterranean region fosters a unique combination of geography, geological history and climate, making it an excellent example of a freshwater biodiversity hotspot. This is estimated to result from a distinctly cool and wet season, followed by a warm and dry season influenced by a sequence of regular and often extreme flooding and drying periods and relatively rare snowfall. Additionally, it is characterised by intense human activities. Mediterranean islands are no exception, being the most threatened ecosystems within the region, having intermittent and harsh-intermittent rivers highly impacted by human activities (Hupało et al. 2021). Although the Mediterranean is one of the most studied and well-known regions globally, little is known about the organisms inhabiting freshwater ecosystems. Studies focus mainly on vertebrates, macroinvertebrate orders and macrophytes of the northern and north-western domain (Figueroa et al. 2012).
Diatoms are unicellular photosynthetic autotrophs (microalgae) inhabiting a huge variety of habitats, such as oceans, rivers, wetlands and even the soil environment. They are the chief primary producers and key contributors in food webs, biogeochemical cycles and carbon fixation (Pérez-Burillo et al. 2020). Interestingly, an estimated 100,000 species of diatoms are believed to exist, with newly-discovered diatom species present in unique areas worldwide (Mann and Vanormelingen 2013). Diatom assemblages have selective growth preferences. They bloom at specific environmental conditions (salinity, temperature etc.) with short-generation periods (Smol and Stoermer 2010). Therefore, they respond rapidly to environmental changes that can influence their habitat, such as nutrient concentration, organic matter and water contamination (Hering et al. 2006).
For several decades, scientists have assessed the effect of pollution on diatom assemblages. Sixty years ago, scientists started assessing the influence of pollution on diatom assemblages and provided occurrence probabilities of microalgal species in water quality classes. During the late 20 th century, scientists managed to incorporate the long-term monitoring of the diatom assemblages as a means of bioassessment (Butcher 1947;Zelinka and Marvan 1961;McCormick and Cairns 1994). In 2000, the Water Framework Directive (WFD) incorporated the concept of reference conditions and the adoption of phytobenthos as one of the obligatory Biological Quality Elements (BQEs) (European Comission 2000) considering that changes in diatom assemblages can be a valuable monitoring and assessment tool. Therefore, diatoms are used by the Member States to meet their legislative obligations towards the WFD (European Comission 2000).
To date, the quality of water-bodies worldwide is assessed by applying different diatom indices (Dalu et al. 2016). Such indices include the Biological Diatom Index (BDI), the Eutrophication Pollution Index (EPI), the Trophic Diatom Index (TDI), the Specific Polluosensitivity Index (IPS), the Sladecek Index (SLA) and the Saprobic Index of Rott (ROT), each of which has different sensitivities and investigate various environmental drives towards pollution (Rimet et al. 2005;Dalu et al. 2016;Pawlowski et al. 2018).
In Cyprus, the IPS Index (CEMAGREF 1982) is used since it was more suitable for the pressures that could affect the water quality in Cyprus' streams (WDD 2008) and the official assessment method's class boundary values, based on the morphotaxonomy. The IPS Index correlates parameters related to organic pollution, ionic strength and eutrophication and, therefore, provides an overall estimate of water quality and is used by several European countries, such as Spain, Belgium, Estonia, Luxembourg, Sweden, Bulgaria, Greece and Portugal (WDD 2014;European Comission 2018;Masouras et al. 2021). It is worth mentioning that most of these indices and assessment concepts were developed elsewhere in Europe. Therefore their transfer to Cyprus and the eastern Mediterranean countries should be done with caution due to the combination of harsh climate and geographical isolation (Cantonati et al. 2020). In 2014, several Mediterranean countries performed an intercallibration exercise to provide a common framework for the successful comparison of diatom assessments of river quality across the Mediterranean European region (Almeida et al. 2014).
Diatom taxonomic classification involves morphological identification under the microscope by categorising the siliceous frustules of diatoms (Pandey et al. 2017). This is a time-consuming procedure that requires an accurate morphological identification at species or sub-species level. Therefore, it requires high taxonomic expertise (Kahlert et al. 2009). At the same time, such practices are subjected to biases and are susceptible to human error. New strategies have been examined to develop high-throughput, cost-effective approaches (Hering et al. 2018;Kelly et al. 2018;. Recently, DNA metabarcoding, combined with high throughput sequencing and bioinformatics, is an alternative approach to morpho-taxonomic identification (Hering et al. 2018;Pawlowski et al. 2018;Rivera et al. 2018;Tapolczai et al. 2019). DNA metabarcoding involves identifying the microbial diversity from environmental samples by analysing relatively conserved small gene sequences (i.e. barcodes).
The start of Cyprus streams' bioassessment using diatoms dates back to 2005(Cantonati et al. 2020) and the Water Development Department (WDD) of Cyprus has incorporated it into routine assessments since 2010. The routine assessment involves the use of the morpho-taxonomic identification method for the calculation of IPS. To date, DNA metabarcoding has been assessed mainly by the west-northern Mediterranean and northern European countries (Vasselon et al. 2017b;Kelly et al. 2018;Bailet et al. 2019;Feio et al. 2020;Pérez-Burillo et al. 2020). Additionally, extensive studies have been performed throughout Mediterranean marine environments regarding diatom biodiversity using DNA metabarcoding (Malviya et al. 2016;Kaleli et al. 2020;Vijver et al. 2020). Therefore, this study compares, for the first time, the diatom diversity and IPS scores obtained using high-throughput DNA sequencing and the morpho-taxonomic identification approach for an eastern Mediterranean country, like Cyprus, with a known population of endemic diatom species (Cantonati et al. 2016(Cantonati et al. , 2018. Here, the study uses the results of the metabarcoding analysis that utilised a small 312 bp fragment of the rbcL, which encodes the Ribulose-1,5-bisphosphate carboxylase/oxygenase. This gene shows alternating highly conserved and polymorphic regions, which are key requirements for a successful genetic identification to the species level (Kermarrec et al. 2013). We compared both methods to: a) assess whether the two methodologies are complementary or they can replace one another and b) test which methodologies are more suitable within Cyprus' routine bioassessment protocol using the IPS Index.

Study area and sampling methodology.
Sampling stations were selected following the annual schedule of the WDD during May and June 2018. Biofilm samples were collected from 32 river sites in 11 catchments. Permanently submerged cobbles making up an area of about 100 cm 2 were randomly selected from the main stream channel in moderate flow conditions and a minimum of 5 cm water depth. The cobbles were gently rubbed with a brush at the parts exposed to the river flow. The dislodged material was preserved in formaldehyde (3%) for morphological or ethanol (70%) for the molecular analysis (CEN 2018). Samples were stored in a cool, dark area until laboratory analysis, as previously described by Pissaridou et al. (2021).

DNA extraction and molecular analysis
DNA extraction was performed using a commercial kit (NucleoSpin Soil Kit, MACHEREY-NAGEL) according to the manufacturer's instructions. Two technical samples were evaluated. Briefly, 2 ml of suspended diatom biofilm were centrifuged for 30 min at 13,000 rpm. The pellet was then lysed and the DNA was purified following the Kit's protocol. The DNA quality was assessed spectrophotometrically (NanoDrop 1000, Thermo Fisher Scientific). The rbcL barcode was amplified by PCR using diatom-specific rbcL primers and the HiFi KAPA Taq Mix (KAPA Biosciences) to a final volume of 25 ml. Samples were then sent to GenoToul Genomics and Transcriptomics platform (GeT-PlaGe, Auzeville, France) for Illumina MiSeq sequencing.

Bioinformatic analysis
Raw Fastq sequences were analysed using Mothur software (version 1.41.3) (Schloss et al. 2009). Sequences were filtered for quality and length (criteria: minimum length 280 bp, one mismatch allowed and homopolymers less than 8 bp), dereplicated and aligned to the Diat. barcode database version 7.1 . Then, chimera detection and removal were performed. Taxonomic affiliation of reads was achieved using the Wang Method at a 75% confidence threshold. Lastly, a distance matrix was created to allow the clustering of reads into operational taxonomical units (OTUs). The clustering method was the Furthest-Neighbour, where sequences with more than or equal to 95% similarity were clustered in one OTU. OTUs singletons were removed. Then all samples were normalised to the lowest read abundance. The method is described in detail in Pissaridou et al. (2021).

Morpho-taxonomic analysis
The diatom biofilm samples underwent processing using the hydrogen peroxide and potassium dichromate method, following the European standard EN 13946 (DIN EN 13946 2014), before permanent slide mounting. Briefly, 5 to 10 ml of diatom samples were centrifuged. Samples were then cleaned by repeated rinsing with 30% hydrogen peroxide (H 2 O 2 ) and hydrochloric acid (HCl) following final decantation with distilled water. Next, samples were air-dried and mounted on microscope glass slides using Naphrax (Brunel Microscopes Ltd). At least 400 valves were identified for each sample and counted under an optical microscope Zeiss Axioskop 2, equipped with phase contrast and a digital camera Axiocam (Carl Zeiss JSC, Milan, Italy). The species identification was performed to the lowest taxonomic level possible (species or intraspecific rank) using various keys, with Cantonati et al. (2017) as the primary reference and recent scientific publications for specific taxa. The results were harmonised according to the "Final harmonisation of diatom names" list (Feio 2011). A complete species list and number of valves per species were created for each sample, along with each species' relative abundance.

Data analysis
Correction factor: The Correction Factor (CF) was applied to account for the rbcL copy number in relation to the biovolume of cells. The application was performed as described by  according to the information supplied in the Diat.barcode database version 7.1 . IPS score calculation:

Statistical analysis:
Pearson's Correlation was performed using GraphPad PRISM version 9 for Windows, GraphPad Software, (La Jolla, California, USA). Pie charts and Venn Diagrams were performed in Excel (Microsoft). Plots were created using R (R Core Team 2020).

DNA metabarcoding diatom identification
The DNA metabarcoding analysis identified a total of four classes, 13 orders, 28 families, 53 genera, 106 species and 136 strains amongst 1,553,538 reads for the 32 stations studied.  observed that the molecular approach results overestimate the presence of high biovolume species, which may have higher rbcL copies per cell. The metabarcoding approach performed in this study was also able to observe such changes in abundance, as expressed as OTU copy numbers (Fig. 1). Consequently, a Correction Factor (CF) was applied to take into account the biovolume of diatom frustules to make DNA metabarcoding counts fit better with those estimated by microscopic counts (Charles et al. 2021). This removed species with high biovolume from the top ten most abundant species, like Ulnaria ulna, Cocconeis euglypta and Cymbella cymbiformis and, instead, intro-duced species like Fistulifera saprophila, Planothidium frequentissimum, Planothidium lanceolatum and Mayamaea permitis with low biovolume (Fig. 1). This correction allows for a better alignment of the diatom assemblages from the molecular approach and subsequently a higher correlation between the two methodologies.
The ten most abundant species of the morpho-taxonomic analysis are represented in Figure 2. Cocconeis euglypta, Achnanthidium jackii and A. minutissimum are the most abundant diatoms. This was also the case in the molecular analysis (Fig. 1), as A. minutissimum was detected second in line. A. jackii was not detected in the molecular data as, in the reference database, it is considered a variant of A. minutissimum and is, therefore, represented by the same barcode ).

Comparison of the morpho-taxonomic and molecular identification of diatoms
To evaluate the performance of the two methods, we compared the species identified in both cases. Figure 3A shows that both methodologies identified 39 common species, corresponding to 18% (Suppl. material 1: Table S1). These 39 species represent a large part of the community in terms of biomass since the 39 species include several of the most frequent and abundant taxa in the Cyprus diatom assemblages. Interestingly, 99 species were identified only by the morpho-taxonomic identification and 67 species only by the molecular identification method (Fig. 3A).
The number of species observed with each technique was used to calculate the IPS index for the ecological status evaluation of each station (CEMAGREF 1982). The IPS scores, obtained with the two methodologies, show significant positive correlation (p < 0.0001, r = 0.6745, Fig. 3B). More than half of the samples shared the same quality class.
At the same time, some discrepancies between the two methodologies were observed. Twelve stations show substantial differences between the two approaches (Fig. 3B,  Fig. 4, Suppl. material 2). The molecular approach calculated lower ecological status values than the morpho-taxonomic approach for six stations (Fig. 3B). Interestingly, four out of six stations have Fistulifera saprophila as the most abundant species. As detected by the molecular analysis, one station has Amphora pediculus as the most abundant with Fistulifera saprophila following and one station has Achnanthidium minutissimum and Ulnaria ulna as the most abundant species.
On the other hand, the molecular approach calculated higher IPS values than the morpho-taxonomic approach for six more stations (Suppl. material 2). These differences could be due to the lack of barcoding data for diatom species at these stations, creating gaps in comparing the two methodologies. The molecular approach identifies a higher abundance of Planothidium victorii, Hantzschia unclassified and Achnanthes unclassified. For the same stations, higher abundance of Achnanthidium jackii, Achnanthidium minutissimum, Cocconeis euglypta, Cyclostephanos cf. invisitatus, Amphora pediculus and Amphora micra have been identified   Table S1 gives the number of valves and number of reads recorded for the 39 common species; B) IPS correlation of the morpho-taxonomic and molecular identification methodologies. Pearson Correlation was performed indicating a significant correlation (r = 0.6745, p < 0.0001. The boundaries of the ecological status classes are marked: red (Bad), orange (Poor), yellow (Moderate), green (Good), blue (High). Dashed lines: Good/moderate boundary, Red dashed circle: morphological classifies as good status, while molecular classifies below good status. Blue cirlce: morphological classifies as below good status, while molecular classifies good status.
using the morpho-taxonomic approach (data not shown) (Pérez-Burillo et al. 2020). As previously noted, Cyclostephanos cf. invisitatus and Amphora micra are diatoms species whose barcode is not yet available in online barcode databases.

Discussion
Thus far, several studies have shown disagreements between molecular and morphological datasets due to the inherent biases of both methods (Keck et al. 2021). The most apparent reason for these discrepancies is the incompleteness of molecular reference databases since barcoding studies focused so far on particular river types and geographical regions, including mostly north-west Mediterranean countries and perennial rivers. (Pawlowski et al. 2018).
Here, as presented by recent studies, the identification of more species by the morphological approach than by DNA metabarcoding, in many cases, does not significantly impact the IPS assignment (Vasselon et al. 2017a; Keck et al. 2018;Rivera et al. 2018;Mortágua et al. 2019). Additionally, we have demonstrated how the CF's application to the molecular approach is important in comparing the two methodologies. The CF accounts for the rbcL copy number concerning the biovolume of cells and, therefore, is considered to obtain DNA sequence proportions closer to cell proportions obtained through microscopy and account for large diatom frustules having several rbcL copies (Vasselon et al. 2019). Both methodologies have their limitations, including the classification biases in species complexes and oversimplification requiring high scientific expertise for morphology-based approaches and the gaps of DNA barcoding data in existing references databases in molecular-based approaches (Jahn et al. 2017;Pinseel et al. 2017). Kelly et al. (2020) suggested that, as microscopy data also have some biases and limitations, it would be better to treat molecular data at their face value to integrate them into indicator metrics. Indeed, the differences in size reflected by the rbcL data (Vasselon et al. 2018) may give a more accurate indication of the relative contribution of each taxon to diatom productivity than the microscopy data, which could lead to novel bioindicator metrics, better reflecting the ecosystem functioning.
In this paper, some discrepancies in IPS scores appear when comparing both approaches. The microscopy can sometimes underestimate small diatom species with low sensitivity values, thereby having a significant negative impact on the IPS scores estimated with DNA metabarcoding (Jahn et al. 2017; Pinseel et al. 2017). Interestingly, literature suggests that small species like Fistulifera saprophila and Mayamaea permitis are often overlooked/ underestimated by the morphological approach (Kahlert et al. 2009;Pérez-Burillo et al. 2020). In this study, only F. saprophila was detected exclusively by the molecular approach, whereas M. permitis was detected by both methods (Suppl. material 1). Both species become prominent in eutrophic and polluted environments and consequently lower the IPS Index scores (Rott et al. 1998;Rimet et al. 2005;Kahlert et al. 2009;Pérez-Burillo et al. 2020). This explains why six of the twelve differing stations have lower IPS scores when estimated through the molecular approach (Suppl. material 2). F. saprophila valves are known to be prone to dissolution during laboratory preparation and this could explain why this species is usually overlooked in the morphological identification procedures (Zgrundo et al. 2013;Cantonati et al. 2017Cantonati et al. , 2018Kelly et al. 2020). Our results agree with the observations of Pérez-Burillo et al. (2020) that Fistulifera saprophila was often the species responsible for sites becoming critical, i.e. having a low IPS score. The same study identified A. minutissimum as the diatom species contributing positively to the IPS scores (Pérez-Burillo et al. 2020).
On the other hand, gaps in the reference databases may be unable to identify some other taxa of low sensitivity value, with a significant positive impact on the IPS scores estimated with DNA metabarcoding Bailet et al. 2020). Regarding the eastern Mediterranean countries, there is a lack of sufficient DNA barcoding studies to date. Most recent DNA barcoding and metabarcoding studies focus on the west-northern Mediterranean and northern European countries (Vasselon et al. 2017b;Kelly et al. 2018;Bailet et al. 2019;Feio et al. 2020;Pérez-Burillo et al. 2020). When considering the eastern Mediterranean region, most studies focus on the marine environment (Malviya et al. 2016;Kaleli et al. 2020;Vijver et al. 2020) and morphological identification of endemic species, such as Navicula veronensis sp. nov. found in low elevation carbonate springs (Cantonati et al. 2016), Ulnaria monodii found in streams with relatively warm waters and Ulnaria acuscypriacus sp. nov. having tolerance to brackish conditions (Cantonati et al. 2018). Therefore, the gaps, observed here in the comparison of the two methodologies, are more or less expected.
Even though the morpho-taxonomic approach is able to identify more species than the DNA metabarcoding method, the DNA metabarcoding approach is still considered an effective method for diatom assemblage identification as it offers high-throughput, cost-effective and accurate analysis provided that the online barcode databases are up to date (Vasselon et al. 2017a;Keck et al. 2018;Rivera et al. 2018;Mortágua et al. 2019).

Conclusions
To conclude, this study has investigated diatom samples from the Cyprus stream monitoring network and the correlation of the two methodologies currently in play for stream-quality bioassessments, based on phytobenthos/diatoms. For the first time, we compared and contrasted DNA metabarcoding to the traditional morphological, ecological assessment for river biomonitoring in Cyprus, a typical eastern Mediterranean country. Our results are in line with what has already been shown in literature. The limitations of both techniques can offset each other nicely when using spatial datasets. Despite the differences between the two methods and the taxonomic inventories they produced, the IPS scores from the molecular data were well correlated with those from morpho-taxonomic data when a CF is applied to bring molecular data in line with the expectations, based on microscopy data. As Kelly et al. (2020) stated: "HTS is not a simple transaction in which LM is replaced by a better technology, but requires a deeper level of institutional transformation than had hitherto been anticipated".
Future work can investigate how the two methodologies could contribute to an integrated assessment of different Mediterranean rivers with different hydrological impacts and pollution (Charles et al. 2021). Additionally, it is worth studying the temporal observation of rivers to examine whether a potential shift to metabarcoding alters perceptions of status at an individual site level, as previously demonstrated by Kelly et al. (2020). Lastly, there is a parallel need of: 1) an enrichment of DNA reference databases for the diatom species that are primarily present and endemic to southern-eastern Mediterranean rivers and 2) development of so-called taxonomy-free approaches to overcome these reference gaps, as recently tested on diatoms (Apothéloz-Perret-Gentil et al. 2020; Tapolczai et al. 2021).