Research Article
Corresponding author: Jonathan Warren (jonathan.warren@environment-agency.gov.uk)
Academic editor: Tiina Laamanen
© 2024 Jonathan Warren, Sean Butler, Nick Evens, Laura Hunt, Martyn Kelly, Lindsay Newbold, Daniel S. Read, Joe D. Taylor, Kerry Walsh.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Warren J, Butler S, Evens N, Hunt L, Kelly M, Newbold L, Read DS, Taylor JD, Walsh K (2024) Influence of storage time on the stability of diatom assemblages using DNA from riverine biofilm samples. Metabarcoding and Metagenomics 8: e129227. https://doi.org/10.3897/mbmg.8.129227
DNA sequencing of diatom assemblages from biofilms has already been used to assess the ecological status of freshwaters in the UK. However, recent work using DNA data from these biofilms suggests that alternative metrics, which capture broader taxonomic and functional information and demonstrate the importance of microbial biofilms, could be useful. Exploring this potential requires the analysis of large numbers of samples across time and space. Sample archives could meet this need, but the compositional stability of microbial communities in biofilm samples stored for more than one year is uncertain.
This study compared changes in diatom assemblage structure using metabarcoding analysis of river biofilm samples before and after storage at -20 °C in an RNAlater-based nucleic acid preservative. We found minimal changes in the diatom assemblages of samples stored for up to three years. Slight differences observed in certain groups resulted in four samples changing ecological status. However, the overall differences were not significant across replicates, suggesting that any genuine differences in assemblages are likely masked by sub-sampling, PCR, or primer biases. These findings are similar to those of other studies examining variation between analysts and sequencing instruments, and indicate that the diatom assemblages in the archived biofilm samples are stable. This gives greater confidence that archived samples can be used for further research, including exploring broader microbial taxa and their responses to environmental change, potentially leading to the development of reliable microbial metrics for integration into biomonitoring programs.
Biomonitoring, diatom assemblage, metabarcoding, microbiome sample preservation, surveillance
In the United Kingdom, environmental regulators, including the Environment Agency of England, employ various methods using biological indicators to assess the ecological status of lakes, rivers, and estuaries. One method involves the use of diatoms from biofilms in rivers and lakes as proxies for wider phytobenthos (
In addition to the diatom assemblage, biofilms support a diverse microbial community embedded within a slimy matrix, which promotes their growth and survival (
The integration of microbial diversity and functional indicators into biomonitoring for the assessment of anthropogenic pressures has been advocated for many years, and has recently gained momentum (
Currently, data and models are needed to identify reliable microbial bioindicators (
This study aimed to assess the stability of river biofilm samples stored in preservative in a freezer for up to three years, using diatom assemblages as a proxy for the integrity of the overall microbial community in the biofilms. We assessed the suitability of reusing existing samples to generate large DNA datasets, which could enable the characterisation of the microbial response to environmental change.
To assess the impact of storage time on diatom assemblages, samples that had been collected and analysed in 2019 and 2020 for routine monitoring, and then stored in concentrated form in preservative, were reanalysed in December 2022. The reanalysis was performed twice so that differences due to storage could be disentangled from the expected stochastic differences due to sub-sampling.
River biofilm samples collected in 2019 (n=50) and 2020 (n=14) as part of the Environment Agency’s routine monitoring of diatoms in rivers for ecological status assessments were used in this study. Samples were collected between April and November and analysed between September and January of the corresponding year. Samples were selected for reanalysis based on the total number of reads passing quality control in the routine analysis (>50,000 reads), regardless of taxonomic assignment, and to cover a broad spread of geographic regions across England. A total of 64 frozen biofilm samples were selected for reanalysis in late 2022, following storage in preservative for 2 or 3 years after the initial analysis (Fig.
Sample collection was performed as previously described in
All samples were processed, according to
After storage, the samples were thawed for reanalysis by aliquoting two 0.1 mL sub-samples (RA and RB; Fig.
Experimental design: samples were collected and processed in 2019 (n=50) and 2020 (n=14) (OG). Post-storage samples were reanalysed in 2022 using two replicate sub-samples from each archived sample (RA and RB). Note that two samples originally taken in 2019 failed quality control in group RB and were removed from the study.
Raw sequence reads were imported into the QIIME2 environment (v2022.8.3) (
ASV abundance and taxonomy data were imported into R (v4.1.2) using qiime2R (v0.99.6) and phyloseq (v1.38.0), and all data were visualised using the ggplot2 (v3.4.2), ggvenn (v0.1.10), gghighlight (v0.4.0), and ggally (v2.1.2) packages (
The presence and absence of taxa were compared at both genus and species levels and visualised using ggvenn. The differential abundance of taxa was assessed using DESeq2 (v1.34.0), and relevant differences were visualised using ggplot and gghighlight (
The raw sequence files were deposited at the European Nucleotide Archive under the accession number PRJEB76460.
Two sub-samples failed quality control due to poor read depth (their rarefaction curves did not plateau); as a result, all sub-samples from the two affected samples were discarded. In total, 186 sub-samples from 62 samples passed quality control and were used for statistical analysis.
Across all samples, 12.9 million reads passed quality control. The number of reads per sample ranged from 4,980 to 258,229, with a median frequency of 64,151. A total of 5,138 unique ASVs were detected, and all were assumed to be phytobenthic algae; of these, 1,403 could be assigned to the species level, accounting for 59.3% of all reads, and a further 2,358 to the genus level, accounting for 80.8% of all reads in total, making these suitable levels with which to assess taxon detection (Table
Taxonomic level | Number of unique values at rank | Cumulative ASVs assigned at level | Number of reads assigned to level or better | Percentage of total reads |
---|---|---|---|---|
ASVs | 5,138 | NA | 12,905,482 | 100 |
Species | 185 | 1,403 | 7,653,091 | 59.3 |
Genus | 81 | 2,358 | 10,428,673 | 80.8 |
Family | 38 | 2,491 | 10,809,236 | 83.8 |
Order | 21 | 2,563 | 10,902,280 | 84.5 |
Class | 8 | 2,793 | 11,202,988 | 86.8 |
Phylum | 5 | 3,673 | 11,973,295 | 92.8 |
Kingdom | 2 | 3,762 | 12,166,798 | 94.3 |
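The percentage column can be reproduced directly from the read counts. The following is a minimal sketch of that arithmetic using the totals in the table above; the variable and function names are illustrative and not part of the original analysis.

```python
# Read totals copied from the table above.
TOTAL_READS = 12_905_482

reads_assigned_to_level_or_better = {
    "Species": 7_653_091,
    "Genus": 10_428_673,
    "Family": 10_809_236,
    "Order": 10_902_280,
    "Class": 11_202_988,
    "Phylum": 11_973_295,
    "Kingdom": 12_166_798,
}


def percent_of_total(reads, total=TOTAL_READS):
    """Return the share of all reads as a percentage, rounded to one decimal."""
    return round(100 * reads / total, 1)


percentages = {
    rank: percent_of_total(reads)
    for rank, reads in reads_assigned_to_level_or_better.items()
}

for rank, pct in percentages.items():
    print(f"{rank}: {pct}% of reads assigned at this level or better")
```

Because each row counts reads assigned at that rank or a finer one, the percentages can only increase from species to kingdom.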
At the ASV level, differences in diatom assemblages were mostly due to differences in the sample sites (PERMANOVA, R2=0.383, p=0.001). Differences in storage conditions between the original and repeated sub-samples accounted for < 1% of the difference in diatom assemblages and were not statistically significant (PERMANOVA, R2=0.005, p=0.212, full model output in Suppl. material
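To illustrate how a PERMANOVA R2 and p-value of this kind are obtained, the sketch below implements a simplified one-factor PERMANOVA in plain Python. It is not the model used in this study, which included both site and storage terms; the Bray-Curtis dissimilarity, the toy counts, the group labels, and all function names are assumptions made for illustration only.

```python
import random
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0


def sums_of_squares(dist, labels):
    """Total and within-group sums of squares from a distance matrix."""
    n = len(labels)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for group in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == group]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    return ss_total, ss_within


def permanova(counts, labels, n_perm=999, seed=1):
    """One-factor PERMANOVA: returns (R2, pseudo-F, permutation p-value)."""
    n = len(counts)
    n_groups = len(set(labels))
    dist = [[bray_curtis(counts[i], counts[j]) for j in range(n)] for i in range(n)]

    def statistics(labs):
        ss_total, ss_within = sums_of_squares(dist, labs)
        ss_among = ss_total - ss_within
        pseudo_f = (ss_among / (n_groups - 1)) / (ss_within / (n - n_groups))
        return ss_among / ss_total, pseudo_f

    r2, f_observed = statistics(labels)

    # Shuffle the group labels to build the null distribution of pseudo-F.
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistics(shuffled)[1] >= f_observed:
            hits += 1
    return r2, f_observed, (hits + 1) / (n_perm + 1)


# Hypothetical counts for three taxa in six sub-samples from two sites.
toy_counts = [[10, 0, 1], [9, 1, 0], [11, 0, 2], [0, 10, 5], [1, 9, 6], [0, 11, 4]]
toy_labels = ["site_1", "site_1", "site_1", "site_2", "site_2", "site_2"]

r2, pseudo_f, p_value = permanova(toy_counts, toy_labels)
print(f"R2={r2:.3f}, pseudo-F={pseudo_f:.1f}, p={p_value:.3f}")
```

R2 is the fraction of the total sum of squared dissimilarities explained by the grouping, which is how a term such as storage can account for less than 1% of the variation. With only six toy samples, few distinct relabellings exist, so the permutation p-value cannot become very small regardless of how well the groups separate.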
The abundance of taxa at the genus and species levels was not significantly different between the original and post-storage sub-samples. In terms of log2 fold change, no species or genera were detected at a significantly higher abundance post-storage (Fig.
Volcano plot showing the fold change in abundance of genera and species between the original and RA sample analyses. The horizontal dotted line denotes significance (p<0.05), and the vertical lines denote the magnitude of difference, set at the equivalent of a 4-fold difference. Points to the right denote taxa more abundant in the original analysis, and points to the left denote taxa more abundant in the repeat analysis. Similar pairwise comparisons were made between all three groups (OG, RA, and RB), which also showed no significant difference (see Suppl. material
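The two thresholds in the volcano plot can be expressed as a simple rule: a 4-fold difference corresponds to a log2 fold change of ±2, and a taxon is highlighted only if it also falls below the p-value threshold. The sketch below assumes that positive log2 fold changes sit to the right (more abundant in the original analysis); the function name and the example values are hypothetical and are not results from this study.

```python
import math

P_THRESHOLD = 0.05
FOLD_THRESHOLD = 4.0
# A 4-fold difference is a log2 fold change of 2 in either direction.
LOG2_FOLD_THRESHOLD = math.log2(FOLD_THRESHOLD)


def classify_taxon(log2_fold_change, p_value):
    """Place one taxon in a region of the volcano plot."""
    if p_value >= P_THRESHOLD or abs(log2_fold_change) < LOG2_FOLD_THRESHOLD:
        return "not highlighted"
    if log2_fold_change > 0:
        return "more abundant in original"
    return "more abundant in repeat"


# Hypothetical (log2 fold change, p-value) pairs for illustration only.
examples = [(2.5, 0.01), (-3.0, 0.20), (0.4, 0.001), (-2.2, 0.03)]
for log2_fc, p in examples:
    print(log2_fc, p, classify_taxon(log2_fc, p))
```

Requiring both conditions means that a large fold change with weak statistical support, or a well-supported but small change, is not treated as a relevant difference.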
Most genera were detected in all replicate groups (84.0%; Fig.
Uniquely detected genera between sample replicates. Numbers in parentheses represent the number of samples in which the genera were detected.
Genus (* indicates non-diatom) | Reads in OG | Reads in RA | Reads in RB |
---|---|---|---|
Nupela | 0 (0) | 131 (3) | 106 (3) |
Bacillaria | 0 (0) | 98 (1) | 25 (1) |
Stenopterobia | 0 (0) | 0 (0) | 26 (1) |
Chaetoceros | 0 (0) | 0 (0) | 117 (1) |
Gedaniella | 0 (0) | 9 (1) | 0 (0) |
Placoneis | 0 (0) | 3 (1) | 0 (0) |
Quercus* | 0 (0) | 34 (2) | 29 (1) |
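The detection pattern of each genus in the table above can be derived by treating any non-zero read count as a detection and grouping genera by the set of replicate groups in which they appear. This is a minimal sketch of that presence/absence logic using the read counts from the table; the data structure and function name are illustrative.

```python
# Read counts per replicate group, copied from the table above.
reads_by_genus = {
    "Nupela": {"OG": 0, "RA": 131, "RB": 106},
    "Bacillaria": {"OG": 0, "RA": 98, "RB": 25},
    "Stenopterobia": {"OG": 0, "RA": 0, "RB": 26},
    "Chaetoceros": {"OG": 0, "RA": 0, "RB": 117},
    "Gedaniella": {"OG": 0, "RA": 9, "RB": 0},
    "Placoneis": {"OG": 0, "RA": 3, "RB": 0},
    "Quercus": {"OG": 0, "RA": 34, "RB": 29},  # marked as non-diatom in the table
}


def detected_in(reads_by_group):
    """Return the set of replicate groups with at least one read."""
    return frozenset(group for group, reads in reads_by_group.items() if reads > 0)


# Group genera by the combination of replicate groups in which they occur.
patterns = {}
for genus, reads in reads_by_genus.items():
    patterns.setdefault(detected_in(reads), []).append(genus)

for groups, genera in patterns.items():
    print(sorted(groups), genera)
```

Each key of `patterns` corresponds to one region of a three-set Venn diagram; none of the genera listed in this table were detected in the original (OG) analysis.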
Venn diagrams of genera (A) and species (B) common and unique to each sample type. OG is the original, and RA and RB are sub-samples analysed post-storage.
Similarly, at the species level, 69.7% of species were detected in all replicate groups. However, 11 species were detected only in the original samples and 24 were detected only in the post-storage sub-samples (RA and/or RB; Fig.
TDI values generated from the diatom assemblages of the original and post-storage samples were generally similar. The mean difference in TDI before and after storage was 3.18, st. dev=7.33 (OG vs RB: 4.03, st. dev=9.22). The TDI values of all three replicate groups were highly and significantly correlated (Pearson’s correlation coefficients ranged between 0.892 and 0.980; p<0.001 for all comparisons) (Fig.
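The comparison of TDI scores between replicate groups rests on two simple quantities: the difference between paired scores and Pearson's correlation coefficient. The sketch below computes both for hypothetical paired TDI values; the numbers are invented for illustration, and the use of the absolute rather than signed difference is an assumption.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def mean_absolute_difference(x, y):
    """Mean of the absolute paired differences (assumed definition)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)


# Hypothetical TDI scores for five samples, before and after storage.
tdi_original = [32.0, 45.5, 58.0, 61.5, 74.0]
tdi_repeat = [30.5, 47.0, 55.0, 66.0, 73.5]

r = pearson_r(tdi_original, tdi_repeat)
difference = mean_absolute_difference(tdi_original, tdi_repeat)
print(f"Pearson r={r:.3f}, mean absolute difference={difference:.2f}")
```

A high correlation shows that samples keep their relative ranking, while the paired difference indicates how far individual scores move; both are needed because a score can shift across an ecological status class boundary even when the correlation is strong.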
Correlogram of TDI scores showing differences between sample types and Pearson’s correlation between scores. All Pearson correlations were highly significant (p<0.001). Histograms show the distribution of TDI scores across replicates.
When assigning ecological status classifications using TDI scores, 75.8% of samples were assigned to the same class across all replicates (79.0% of samples were the same or one class different). When comparing any two of the replicate groups, the OG and RA groups had the highest agreement, with 90.4% of samples assigned to the same class, whereas RA and RB had the highest agreement at the same class or one class difference at 98.4% (see Table
Matrix showing differences in the number of samples assigned to each ecological status class (bad, poor, moderate, good, high) between replicates. The samples assigned to the same class by two replicate tests are highlighted in green. Samples that were different by one class are highlighted in yellow. Samples in which the difference between replicates is greater than one class are highlighted in bold.
Replicate | Class | OG: B | OG: P | OG: M | OG: G | OG: H | RA: B | RA: P | RA: M | RA: G | RA: H |
---|---|---|---|---|---|---|---|---|---|---|---|
RA | B | 0 | 0 | 0 | 0 | 0 | | | | | |
RA | P | 0 | 6 | 1 | 1 | 0 | | | | | |
RA | M | 0 | 0 | 9 | 1 | 0 | | | | | |
RA | G | 0 | 0 | 1 | 10 | 0 | | | | | |
RA | H | 0 | 1 | 1 | 0 | 31 | | | | | |
RB | B | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
RB | P | 0 | 4 | 4 | 2 | 0 | 0 | 6 | 3 | 1 | 0 |
RB | M | 0 | 2 | 5 | 1 | 0 | 0 | 2 | 5 | 1 | 0 |
RB | G | 0 | 0 | 2 | 8 | 0 | 0 | 0 | 2 | 8 | 0 |
RB | H | 0 | 1 | 1 | 2 | 31 | 0 | 0 | 0 | 1 | 33 |
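Agreement between two replicate groups can be read off such a matrix by summing the diagonal (same class) and, for the looser criterion, the cells one step off the diagonal. The sketch below does this for the RB versus RA block of the matrix above; the function name and layout are illustrative.

```python
CLASSES = ["B", "P", "M", "G", "H"]  # bad, poor, moderate, good, high

# Rows are the RB class and columns are the RA class, copied from the matrix above.
rb_vs_ra = [
    [0, 0, 0, 0, 0],
    [0, 6, 3, 1, 0],
    [0, 2, 5, 1, 0],
    [0, 0, 2, 8, 0],
    [0, 0, 0, 1, 33],
]


def agreement(matrix, max_class_gap):
    """Count samples whose two classes differ by at most max_class_gap."""
    total = sum(sum(row) for row in matrix)
    agreeing = sum(
        count
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if abs(i - j) <= max_class_gap
    )
    return agreeing, total


same_class, total = agreement(rb_vs_ra, max_class_gap=0)
within_one, _ = agreement(rb_vs_ra, max_class_gap=1)

print(f"Same class: {same_class}/{total} ({100 * same_class / total:.1f}%)")
print(f"Same or one class different: {within_one}/{total} ({100 * within_one / total:.1f}%)")
```

For this pair of replicate groups, 61 of 62 samples fall on or next to the diagonal, which corresponds to the 98.4% agreement reported for RA and RB.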
When comparing the TDI and the derived ecological status classifications, there were four clear outliers: S09, S43, S51, and S55. In one instance, RB (S09) was an outlier, and in the other three instances, the OG replicate was an outlier.
The relative abundance of diatoms between outlier sample replicates was compared using bar plots at the order level (Suppl. material
This study building on the work of
The results of this study show that storing biofilm samples in preservative at -20 °C for up to 3 years resulted in no significant difference in diatom assemblage diversity at the ASV level (PERMANOVA, R2=0.005, p=0.212). Samples stored for 2 and 3 years were also compared, and the difference between them was likewise non-significant (PERMANOVA, R2=0.011, p=0.216). Similar findings were reported by
Other studies have compared other microbial groups and storage conditions in similarly dense microbial sample types, reporting varying differences in the impact on the observed community.
In the present study, when comparing individual taxonomic groups across replicate groups, there were no significant differences. At the species and genus levels, there were no statistically significant differences in the relative abundance. Similarly, although there was uneven detection of some taxa at the species and genus levels between replicates, these were only observed at low levels and in a few samples. No other similar studies compared detection but
When comparing ecological status metrics, this study found highly significant and strong correlations in TDI values between replicates of the same original sample (Pearson’s correlation coefficients ranged from 0.892 to 0.980 with all comparisons p<0.0001) which is similar to the trends observed in specific pollution-sensitivity index (SPI) values by
We speculate that these non-significant but observable differences between sub-sample assemblages and at the taxon level are caused by sub-sampling of the original biofilm samples and/or stochastic variation in the communities exacerbated by PCR amplification. Subtle differences caused by sample and PCR variation have been widely reported in the literature and are common caveats of routine monitoring data (
Overall, our findings suggest that any storage-related differences in diatom assemblages are likely masked by variation due to sub-sampling, inter-analyst, and inter-instrument biases, all of which existed between the original and replicate analyses. As a result, diatom assemblages from biofilm samples stored frozen in nucleic acid preservative were not affected by storage for up to three years and are suitable for use in further research. We do, however, suggest caution when extrapolating the results to the wider microbial community, because research on the stability of bacterial communities by 16S metabarcoding is contradictory and difficult to compare with the samples used in this study due to differences in sample type and storage conditions (
This study provides further evidence supporting RNAlater-based preservation of freshwater biofilms as an alternative to ethanol, one of the standard recommended methods (
This study determined the stability of diatom assemblages in biofilm samples analysed before and after storage in a preservative for up to three years using metabarcoding. Differences in diatom assemblages before and after storage were minimal and likely due to bias in sub-sampling of the original samples, as taxa varied equally before and after storage, and the (non-significant) differences in beta-diversity were similar to those previously observed when assessing between-analyst and between-instrument variation. This suggests that diatom assemblages are well preserved within biofilm samples for up to 3 years if stored as pellets in a preservative at -20 °C.
The authors have declared that no competing interests exist.
No ethical statement was reported.
This work was funded by the Environment Agency under the research project SC220036. The views expressed in this paper are the authors’ and do not necessarily represent those of the Environment Agency. DSR was supported by NERC grant NE/X015947/1. JDT was supported by NERC grant NE/X012204/1.
Jonathan Warren: Conceptualization, methodology, formal analysis, writing – original draft, visualization, funding acquisition; Kerry Walsh: Conceptualization, writing – original draft, funding acquisition; Laura Hunt: Writing – original draft, review and editing; Sean Butler: Investigation, resources, writing – review and editing; Nick Evens: Resources, writing – review and editing; Joe Taylor: Formal analysis, writing – review and editing; Lindsay Newbold: Formal analysis, writing – review and editing; Dan Read: Writing – review and editing; Martyn Kelly: Writing – review and editing.
Jonathan Warren https://orcid.org/0000-0003-3381-3852
Sean Butler https://orcid.org/0009-0003-4484-5339
Laura Hunt https://orcid.org/0000-0002-4600-5689
Lindsay Newbold https://orcid.org/0000-0001-8895-1406
Daniel S. Read https://orcid.org/0000-0001-8546-5154
Joe D. Taylor https://orcid.org/0000-0003-0095-0869
Kerry Walsh https://orcid.org/0000-0001-8619-8895
All of the data that support the findings of this study are available in the main text or Supplementary Information.
Extended methods
Data type: pdf
PERMANOVA output for ‘site’ and ‘frozen’
Data type: pdf
Explanation note: Variable ‘frozen’ indicated whether the sample was analysed as part of the original analysis or the repeat. Variable ‘site’ is which original sample and sampling site each replicate is from and is included in the models to account for the expected variation between different locations.
Comparison of differential log2fold abundance between each replicate group
Data type: pdf
Distribution of uniquely detected species between sample replicates
Data type: pdf
Explanation note: Numbers in parentheses represent number of samples where genera were detected.
Taxonomic bar plots of diatom taxa at the order level of the 4 outlier TDI samples 09, 43, 51, and 55
Data type: pdf
Taxonomic bar plots of diatom taxa at the order level of all samples
Data type: pdf
Merged data and metadata
Data type: xlsx