Potential for cross-contamination of diatom DNA samples when using toothbrushes

The use of toothbrushes and similar devices for sampling diatoms from hard surfaces is a well-established approach. Toothbrushes are routinely cleaned and reused when sampling for analysis by light microscopy. This paper looks at the scale of contamination encountered when this technique is used to sample diatoms for metabarcoding analyses, as well as at the scale of contamination to be expected if stream water, rather than distilled water, is used to wash diatoms from stones. Although some contamination attributable to toothbrushes was detected, read numbers were low and had no effect on index calculation or ecological status estimates. However, if the primary focus of a study is to thoroughly document diversity in a sample, then even this small level of contamination may be unacceptable and more stringent measures may be required.


Introduction
Although DNA-based technologies have the potential to improve ecological assessment (Hänfling et al. 2016; Bista et al. 2017, 2018; Blackman 2017; Valentin et al. 2019; Kelly et al. 2020; Di Muri et al. 2020), regulators are still concerned about these technologies, particularly when they are replacing established methods. Challenges associated with metabarcoding methods include differences in the mode of quantification compared to traditional approaches (Vasselon et al. 2018), the optimisation of bioinformatics pipelines (Baillet et al. 2020), the curation of barcode libraries (Rimet et al. 2021) and the lack of robust models to quantify uncertainties in DNA-based workflows. These technical challenges are being addressed by the research community, but some of the more subtle challenges and bottlenecks to implementing these technologies are less widely recognised. These include logistical challenges associated with upscaling these new methods into large-scale monitoring programmes and how they can be integrated into existing organisational infrastructures. Whilst there is a need to strive for the most robust, scientifically credible method, this must be balanced against a need from the user community for pragmatic, sustainable and cost-effective methods. If we fail to recognise this, it will impede uptake of methods by the end-user communities.
The standard method for sampling diatoms for ecological status and water quality assessment in Europe and beyond is to brush or scrape the upper surface of a hard substratum (Kelly et al. 1998; CEN 2014; Charles et al. 2020). Most workers use a toothbrush for this purpose, often reusing the same toothbrush at several sites and using stream water to rinse the biofilm off the stones and into containers. Kelly and Zgrundo (2013) showed that the scale of contamination associated with this approach was low and was unlikely to have a significant effect on ecological status assessments when diatoms were analysed by light microscopy.
By contrast, sampling for molecular ecology studies typically uses disposable, sterile equipment (see, for example, Bista et al. 2017). However, such approaches generate large quantities of non-biodegradable waste or require the transportation and use of environmentally unfriendly chemicals (e.g. bleach) for sterilisation in the field, as well as requiring samplers to carry pure water in the field. The issue of non-biodegradable plastic waste and disposal of used sterilising solutions becomes a more significant issue when sampling is scaled up for nationwide assessment programmes. Asking whether such strict attention to contamination is as relevant when sampling phytobenthos communities as when sampling water for eDNA, therefore, has a number of potential benefits, including lower cost and reduced waste.
This study investigates the scale of contamination introduced by the reuse of toothbrushes at different sites and the effect of using stream water, rather than pure water, to rinse biofilms from surfaces. It has the same basic design as the study by Kelly and Zgrundo (2013) on contamination picked up during light microscopic studies: two locations with very different ecological profiles were sampled with toothbrushes, some of which were previously unused and some of which had already been used at the other site. As the two sites have very few diatom species in common, this design should make it easy to pick out any contamination.

Study design
Details of the sites are given in Table 1. Both are located in southern England: the River Nadder is a chalk stream in Wiltshire which is classified as having moderate ecological status, with macrophytes and phosphorus driving the classification (fish are at good status, invertebrates are high status and other chemical parameters are all high status). Ober Water, in contrast, is a stream in the New Forest (Hampshire) with softer (but still circumneutral) water and which is classified as being at good ecological status, with macrophytes and phytobenthos and all chemical parameters at high status. Further information on both sites can be found at: https://environment.data.gov.uk/catchment-planning/, https://environment.data.gov.uk/ecology-fish/ and https://environment.data.gov.uk/water-quality/view/landing.
Five samples were collected at each site for each of the following three treatments:
• brand new toothbrushes using distilled water to wash the biofilm into sample bottles;
• brand new toothbrushes using stream water from site to wash the biofilm into sample bottles; and
• toothbrushes previously used at the other site, along with stream water from the sampling site.
In addition, three control samples (5 ml each) were collected:
• one using just distilled water; and
• one each using river water from the two sites.
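The sampling design above can be enumerated as a short sketch, which makes the expected sample count explicit (2 sites × 3 treatments × 5 replicates, plus 3 controls). The treatment labels and the dictionary layout are illustrative assumptions and are not taken from the study's own data files.

```python
from itertools import product

# Labels below are illustrative; the paper does not prescribe field names.
SITES = ["River Nadder", "Ober Water"]
TREATMENTS = [
    "new toothbrush, distilled water",
    "new toothbrush, stream water",
    "used toothbrush (other site), stream water",
]
REPLICATES = 5


def build_design():
    """Return (samples, controls) as lists of dicts describing each sample."""
    samples = [
        {"site": site, "treatment": treatment, "replicate": rep}
        for site, treatment in product(SITES, TREATMENTS)
        for rep in range(1, REPLICATES + 1)
    ]
    # One distilled-water control plus one river-water control per site.
    controls = [{"site": None, "treatment": "control, distilled water"}]
    controls += [{"site": site, "treatment": "control, river water"} for site in SITES]
    return samples, controls


samples, controls = build_design()
print(len(samples), "biofilm samples and", len(controls), "controls")
```

Listing the design in this way gives a simple check that every site and treatment combination has its five replicates before any analysis is run.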

Sampling and analysis of benthic diatoms
Sampling involved brushing the upper surface of five cobble-sized stones and collecting the suspension in a tray. Using a new, but non-sterile, Pasteur pipette, 5 ml of the suspension of biofilm and water was transferred to a sterile 15 ml centrifuge tube containing 5 ml nucleic acid preservative, consisting of 3.5 M ammonium sulphate, 17 mM sodium citrate and 13 mM ethylenediaminetetraacetic acid (EDTA). Samples were then transferred to the laboratory in a cool box and frozen at -30 °C prior to DNA extraction. DNA extraction, amplification and analysis followed the methods described in Kelly et al. (2020).

Data analysis
Non-metric multidimensional scaling (NMDS: McCune and Grace 2002) was used to investigate the structure of the metabarcoding datasets, using the vegan package (Oksanen et al. 2007) in R (R Core Team 2017).
The Trophic Diatom Index (TDI5NGS) was calculated following Kelly et al. (2020) using the R package darleq3, available at https://github.com/nsj3/darleq3. When evaluating the scale of variation in TDI5NGS, we used data from the Environment Agency (2018) which showed the average level of variation measured at a site on a single day. Kruskal-Wallis tests were implemented using base functions in R.
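For readers unfamiliar with the Kruskal-Wallis test, the following is a minimal standard-library sketch of the rank-based H statistic with tie correction and a chi-squared approximation for the p value. It is not the code used in the study, which relied on base R; the function name and the index values are hypothetical, and the sketch is restricted to two or three groups so that the chi-squared tail has a closed form.

```python
import math
from itertools import chain


def kruskal_wallis(*groups):
    """Return (H, p) for 2 or 3 groups using midranks and a tie correction."""
    k = len(groups)
    if k not in (2, 3):
        raise ValueError("this sketch only handles 2 or 3 groups")
    values = sorted(chain.from_iterable(groups))
    n = len(values)

    # Assign the average rank (midrank) to each run of tied values.
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and values[j] == values[i]:
            j += 1
        t = j - i                             # size of this tie group
        rank_of[values[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j

    rank_sums = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sums - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h = max(h / correction, 0.0)

    # Chi-squared upper tail: closed forms for 1 and 2 degrees of freedom.
    p = math.exp(-h / 2) if k == 3 else math.erfc(math.sqrt(h / 2))
    return h, p


# Hypothetical index values for three treatments at one site (not study data).
distilled = [41.2, 39.8, 40.5, 42.1, 40.9]
stream = [40.1, 41.7, 39.5, 42.4, 41.0]
used_brush = [38.2, 37.9, 39.1, 36.8, 38.7]
h_stat, p_value = kruskal_wallis(distilled, stream, used_brush)
print(f"H = {h_stat:.2f}, p = {p_value:.3f}")
```

With five replicates per treatment the chi-squared approximation is only approximate, so results close to the significance threshold deserve cautious interpretation.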

Results
Non-metric multidimensional scaling (NMDS) of the dataset yielded an ordination with very low stress (0.0684), with a clear separation between the two sites along axis 1 (Figure 1). However, some samples from Ober Water which were scrubbed using toothbrushes previously used at the Nadder site ("Ober used") had lower scores on axis 1 than those scrubbed with clean toothbrushes ("Ober new"), suggesting some contamination from the River Nadder. By contrast, there was very little difference in the positions of samples collected using unused toothbrushes ("Nadder new") and toothbrushes previously used to sample the Ober ("Nadder used"). There was also very little difference in the position of samples rinsed with distilled water ("Nadder dw", "Ober dw") rather than river water ("Nadder clean", "Ober clean"). The river water control samples ("Nadder control" and "Ober control" in Figure 1) are distinct both from each other and from the biofilm samples along axis 2.
If there is a significant amount of contamination, then taxa that are abundant at one site should be present in raised numbers in samples collected using dirty equipment at the other site, but rare in the others. Although significant effects were observed for several taxa, the scale of the effect was generally small, particularly for samples from the River Nadder where the increased representation in samples collected with contaminated toothbrushes exceeded 1% only for Achnanthidium minutissimum (Figure 2). The scale of the increase was greater in Ober Water samples, with a median increase for Melosira varians of about 2%, but with one replicate having an increase >10% relative to the sample collected with clean equipment.
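The comparison described above can be sketched as follows: read counts are converted to relative abundance (percentage of total reads) and the median for one taxon is compared between samples taken with used and with new toothbrushes. This is one plausible reading of the calculation rather than the study's own procedure, and the function names and read counts are hypothetical.

```python
from statistics import median


def relative_abundance(read_counts):
    """Convert a {taxon: reads} dict to {taxon: percentage of total reads}."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: 100.0 * reads / total for taxon, reads in read_counts.items()}


def median_increase(taxon, used_samples, new_samples):
    """Median % of `taxon` in used-brush samples minus that in new-brush samples."""
    used = median(relative_abundance(s).get(taxon, 0.0) for s in used_samples)
    new = median(relative_abundance(s).get(taxon, 0.0) for s in new_samples)
    return used - new


# Hypothetical read counts for three replicates per treatment (not study data).
used_brush = [
    {"Melosira varians": 30, "other taxa": 970},
    {"Melosira varians": 20, "other taxa": 980},
    {"Melosira varians": 50, "other taxa": 950},
]
new_brush = [
    {"Melosira varians": 10, "other taxa": 990},
    {"other taxa": 1000},
    {"Melosira varians": 20, "other taxa": 980},
]
increase = median_increase("Melosira varians", used_brush, new_brush)
print(f"median increase: {increase:.1f} percentage points")
```

Expressing the difference in percentage points keeps it directly comparable with the 1% and 2% thresholds quoted in the text.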
A similar approach was adopted to look at possible contamination from stream water. The relative abundance of the most abundant taxa in the stream water sample from each stream was compared with the samples washed with stream and distilled water from that location.
In the case of the River Nadder, the stream water was dominated by planktonic diatoms (65% of total reads). Three of these (Stephanodiscus hantzschii, Cyclostephanos invisitatus and Discostella sp.) were all elevated with respect to the distilled water sample (Figure 3), but only in relatively small numbers (that is, still < 1% in the worst case). Differences between treatments for S. hantzschii and C. invisitatus were both significant (Kruskal-Wallis tests: p = 0.009 and 0.016, respectively).
There were almost no planktonic diatoms in the Ober Water stream water; however, the composition of the sample was quite different to that of biofilm samples, with a greater proportion of nutrient-rich taxa. There was, despite this, no significant increase in proportions of these taxa in the biofilm when stream water was used to wash the stones (Figure 4, Kruskal-Wallis tests: all p > 0.3).
There is no significant difference between treatments when TDI5NGS scores are calculated on samples from the River Nadder (Kruskal-Wallis test: p = 0.14). By contrast (and counterintuitively), TDI5NGS is significantly lower in samples from Ober Water which were removed with toothbrushes formerly used in the more enriched River Nadder (Figure 5: Kruskal-Wallis test: p = 0.021). Despite this, the scale of variation observed in each stream still lies within the range of variation expected to occur at a site on a single day.

Discussion
The results of this study highlight a potential for toothbrushes to retain traces of the diatom assemblage (and, presumably, other constituents of stream biofilms), even after the routine cleaning procedure (washing bristles vigorously in the stream and rubbing against waders: Kelly et al. 1998). The scale of this contamination is relatively low but is, nonetheless, present. Based on experience at a number of locations, sampling a thick biofilm where there are entangling filamentous algae and then using the same toothbrush at a subsequent site with a very thin biofilm is more likely to lead to problems than the reverse situation. Similarly, given that the TDI is based on a weighted average equation where taxa tolerant to nutrient enrichment have higher scores than those associated with low nutrients (Kelly et al. 2008b;Kelly et al. 2020), sampling a 'clean' site after a visit to a 'polluted' one is more likely to result in problems than the other way around.
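The sensitivity of a weighted average index to contamination can be illustrated with a minimal sketch. The taxon scores and abundances below are hypothetical and are not the TDI5NGS scores; any rescaling applied by the published index is omitted. Taxa without a score (for example, planktonic taxa) are simply excluded from the calculation.

```python
def weighted_average_index(abundances, scores):
    """Abundance-weighted mean score over the taxa that have a score."""
    scored = [taxon for taxon in abundances if taxon in scores]
    total = sum(abundances[taxon] for taxon in scored)
    if total <= 0:
        raise ValueError("no scored taxa with non-zero abundance")
    return sum(abundances[taxon] * scores[taxon] for taxon in scored) / total


# Hypothetical scores: higher values indicate tolerance of nutrient enrichment.
HYPOTHETICAL_SCORES = {"low-nutrient taxon": 1.0, "nutrient-tolerant taxon": 5.0}

clean_site = {"low-nutrient taxon": 100.0, "planktonic taxon": 10.0}
# The same site after 2% of the scored reads are replaced by a contaminant taxon.
contaminated = {
    "low-nutrient taxon": 98.0,
    "nutrient-tolerant taxon": 2.0,
    "planktonic taxon": 10.0,
}

shift = (weighted_average_index(contaminated, HYPOTHETICAL_SCORES)
         - weighted_average_index(clean_site, HYPOTHETICAL_SCORES))
print(f"shift in index: {shift:.2f}")
```

In this form the shift equals the contaminant's share of scored reads multiplied by the gap between its score and the uncontaminated value, so a small number of reads from a taxon with a very different score moves the index more than the same number of reads from a taxon with a similar score.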
Contamination from the stream water used to wash the samples appears to be less of a problem. In the case of the River Nadder, planktonic taxa dominated the suspended diatom assemblage. Planktonic taxa do not contribute to the TDI5NGS score and so should have no effect on the final index value. However, several of these have large cells with multiple chloroplasts and there may be issues when sampling coincides with a plankton bloom (see Vasselon et al. 2018) as large numbers of these may reduce the sequencing depth of the target benthic taxa. The risk is small, but should not be ruled out entirely.
However, our results also show that contamination, both from dirty equipment and from upstream sources of DNA, does influence the composition of assemblages and, therefore, it is reasonable to assume that it may affect the final assessment in a few cases. Earlier studies, using morphological identification by light microscopy, had found variation of up to 7 TDI units between replicate samples collected on the same day (Kelly et al. 2008a), which far exceeds the significant differences observed between samples collected with "clean" and "dirty" toothbrushes in this study. Contamination from toothbrushes is, therefore, an additional source of variation that can and should be controlled, rather than a threat to the integrity of existing protocols. With this in mind, a pragmatic and precautionary approach for routine monitoring would be to use clean water wherever possible (tap water is used in the UK) along with a clean (but not necessarily new) toothbrush for each sample taken. At the end of each sampling trip, all toothbrushes should be cleaned in bleach or with hydrogen peroxide to avoid contamination on future occasions. A more stringent approach to contamination, however, may be appropriate in the future when data are not processed using the current generation of assessment tools, based on weighted average equations. Where the primary focus is the thorough documentation of diversity at a site, the introduction of even a small number of taxa may give misleading results. In such situations, a more stringent approach to contamination should be followed, with new equipment and distilled water used for each sample.