Supplementary material 5 from: Weigand AM, Macher J-N (2018) A DNA metabarcoding protocol for hyporheic freshwater meiofauna: Evaluating highly degenerate COI primers and replication strategy. Metabarcoding and Metagenomics 2: e26869. https://doi.org/10.3897/mbmg.2.26869

The hyporheic zone, i.e. the ecotone between surface water and the groundwater, is a rarely studied freshwater ecosystem. Hyporheic taxa are often meiofaunal (<1 mm) in size and difficult to identify based on morphology. Metabarcoding approaches are promising for the study of these environments and taxa, but it is yet unclear if commonly applied metabarcoding primers and replication strategies can be used. In this study, we took sediment cores from two near natural upstream (NNU) and two ecologically improved downstream (EID) sites in the Boye catchment (Emscher River, Germany), metabarcoding their meiofaunal communities. We evaluated the usability of a commonly used, highly degenerate COI primer pair (BF2/BR2) and tested how sequencing three PCR replicates per sample and removing MOTUs present in only one out of three replicates impacts the inferred community composition. A total of 22,514 MOTUs were detected, of which only 263 were identified as Metazoa. Our results highlight the gaps in reference databases for meiofaunal taxa and the potential problems of using highly degenerate primers for studying samples containing a high number of non-metazoan taxa. Alpha diversity was higher in EID sites and showed higher community similarity when compared to NNU sites. Beta diversity analyses showed that removing MOTUs detected in only one out of three replicates per site greatly increased community similarity in samples. Sequencing three sample replicates and removing rare MOTUs is seen as a good compromise between retaining too many false-positives and introducing too many false-negatives. We conclude that metabarcoding hyporheic communities using highly degenerate COI primers can provide valuable first insights into the diversity of these ecosystems and highlight some potential application scenarios.


Introduction
DNA barcoding has become a standard technique in ecological and biodiversity studies (Hebert et al. 2003, Macher et al. 2016, Valentini et al. 2009). The technique has been enhanced by the possibility to generate and analyse millions of sequences on high-throughput sequencing machines. The use of DNA metabarcoding  is especially beneficial for studies on whole communities and taxa that are difficult to iden-tify based on their morphology. DNA metabarcoding has been tested and is used for the study of various habitats and different taxonomic groups (Arribas et al. 2016, Gibson et al. 2015, Yu et al. 2012 and recent studies have shown that metabarcoding can be used for standardised bioassessments in aquatic ecosystems (Cordier et al. 2017, Vasselon et al. 2017. In this context, a recently formed consortium (EU COST Action CA15219 'DNAqua-Net') comprising researchers, stakeholders, water managers and entrepreneurs has the goal to streamline the use of molecular methods for water quality assessment and to include them in international programmes such as the EU Water Framework Directive and EU Marine Strategy Framework Directive (Leese et al. , 2018. To date, the assessment of stream biodiversity and quality often relies on the study of presence, absence and abundance of diatoms, macrophytes, fish and macrozoobenthic invertebrates (MZB), i.e. invertebrates living on or in the riverbed (Birk et al. 2012). While this approach has been shown to be effective for biomonitoring, it neglects the fact that streams and rivers are complex ecosystems harbouring a wide range of microhabitats and taxonomic groups. Several faunistic groups such as the hyporheic community inhabiting the soil below the riverbed, i.e. the ecotone between the riverine surface water and the shallow groundwater, are largely neglected in current stream ecosystem biomonitoring. Hyporheic habitats harbour great biodiversity, have important ecological functions and thereby also provide crucial ecosystem services: They serve as i) a buffer for stressors in the water by filtering, storing and thus removing stressors from the water column, ii) a retreat for animals during droughts and other unfavourable environmental conditions at the surface, leading to a higher resilience of freshwater communities, iii) a spawning area for fish and invertebrates and iv) a nursery habitat for larvae , Lawrence et al. 2013, Stubbington et al. 2016. Further, hyporheic communities are known to react sensitively to stressors such as pollution, while also showing high pioneer settlement rates in newly available habitats (Beier andTraunspurger 2001, Eisendle-Flöckner et al. 2013). This means that insights into the biodiversity and ecology of hyporheic communities can potentially complement current stream quality assessments by providing information on the neglected biodiversity in the streambed and surface-subsurface hydrological connections (Lawrence et al. 2013).
So far, the hyporheic freshwater fauna has been scarcely studied mainly due to difficulties in sampling (mostly pumping and digging; Fraser and Williams 1997) and the small size of many taxa, making sample sorting and subsequent species identification a challenge (Giere 2008, Hakenkamp andPalmer 2000). As species identification has been shown to be often inaccurate even for relatively well known macrozoobenthic taxa (Haase et al. 2010), this inaccuracy is expected to be even more severe for the often difficult to identify meiofaunal (i.e. <1 mm body length) taxa inhabiting the hyporheic zone. However, this problem can be tackled with metabarcoding approaches, which often allow identification of taxa when genetic information has been deposited in a reference database such as the Barcode of Life Data System (BOLD, Ratnasingham and Hebert 2007). Studies on marine ecosystems have shown that molecular approaches can be successfully applied to identify soil fauna (Creer et al. 2010, Fonseca et al. 2014, Guardiola et al. 2015. We here present and evaluate a DNA metabarcoding protocol for the assessment of hyporheic freshwater com-munities, targeting the meiofauna (i.e. organisms <1 mm but >42 µm) of streambed sediments (i.e. the hyporheic zone). Further, we test the performance of a highly degenerate primer set, amplifying a fragment of the mitochondrial cytochrome c oxidase subunit I (COI) gene for the assessment of hyporheic communities and investigate the effect of PCR replication strategy on the inferred biodiversity parameters and community composition.
Testing primer performance and replication strategy is of high importance, as even though it is widely accepted that the mitochondrial COI marker is the gene of choice for metabarcoding of freshwater invertebrates, no consensus has yet been reached regarding primer sets and the number of PCR replicates needed for reliable detection of taxa. Further, to the best of our knowledge, no studies have tested the performance of highly degenerate COI primers for metabarcoding hyporheic freshwater communities and replication strategy for studying these habitats has also not been tested.
Different primers have been used and evaluated for metabarcoding of freshwater invertebrates (Andújar et al. 2018, Blackman et al. 2017) and the BF2/BR2 combination introduced by  has been found to reliably amplify a wide range of taxonomic groups, including taxa typically inhabiting the hyporheic zone, such as mites, dipteran larvae and oligochaetes. One prominent exception is the Nematoda, which is often not picked up by COI primers despite being abundant in soil habitats (Creer et al. 2010, Avó et al. 2017. However, recent studies have found that highly degenerate COI primers can amplify a great number of non-metazoan taxa when applied to samples containing a high amount of such taxa (Haene et al. 2017, Macher and Leese 2017, Wangensteen et al. 2018. As this is usually the case in sediment samples taken from the hyporheic zone, testing the performance of highly degenerate primers is an urgent task for planning further studies. Different replication strategies for metabarcoding studies have been tested (Alberdi et al. 2017, Ficetola et al. 2015 and some studies argue that sequencing depth and sample number can be more important than replication (Ficetola et al. 2015, Smith andPeay 2014). Sequencing PCR replicates allows for the identification and subsequent removal of potential false-positives by retaining only taxa that are found in multiple replicates per sample. However, replicates can fail to sequence or can be influenced by stochastic effects such as primer and PCR bias as well as random sample picking during sequencing (Alberdi et al. 2017). These effects can affect the abundance of specific PCR templates and (subsequent) sequencing reads and can hence lead to false-negatives due to the detection of taxa in one replicate, but not in the other. Therefore, sequencing more than two replicates and considering only taxa present in the majority of replicates might increase the robustness of assessment results (Alberdi et al. 2017). Testing replication strategy when applying highly degenerate COI primers to hyporheic communities is important, as the expected large number of amplified non-metazoan taxa might impact results due to their competition for sequencing reads with the metazoan target taxa.
We here present and evaluate a DNA metabarcoding protocol for the assessment of meiofaunal hyporheic freshwater communities. Further, we test the performance of a highly degenerate primer set amplifying a fragment of the mitochondrial cytochrome c oxidase subunit I (COI) gene for the assessment of hyporheic communities and investigate the effect of PCR replication strategy on the inferred community composition. The here presented metabarcoding protocol is further discussed in the context of potential application scenarios.

Sampling procedure
Sediment was collected at five lotic freshwater sites in the Boye tributary of the Emscher River catchment (Ruhr Metropolitan Region, North Rhine-Westphalia, Germany), which is one of Europe's largest river management and restoration projects (Perini 2016, Semrau andHurck 2012). The sampling sites can be classified into two categories: near-natural upstream regions (NNU; three sites: NNU_58, 61, 101) and ecologically-improved downstream regions with connection to a near-natural upstream section (EID: two sites: EID_27, 57) (Coordinates: Suppl. material 4: Table S1; see also Winking et al. 2014). The two EID-sites represent ecologically-improved sewage channels. All five localities were visited on 15 October 2015. The NNU-site 'Scheidgensbach' (NNU_61) was found to be water-depleted, but was still sampled to investigate the applicability of our DNA-based bioassessment method for intermittent freshwaters. Each stream was sampled by taking nine subsamples, which were taken in different microhabitats over an area of 10 metres to account for fine-scale patterns in local hyporheic community structures. Each subsampling was performed by using a disposable plastic syringe with removed tip as the corer (diameter: 2.8 cm; height: 7.5 cm; volume: 50 mL). The syringe was pushed into the sediment without the plunger until the 50 mL label was below the sediment surface. The plunger was then placed on the syringe, thus creating an underinflation in the syringe. The now closed syringe was pulled up and the collected sediment immediately stored in three times the amount of 96% ethanol. The uppermost 2 cm of sediment were discarded to prevent sampling of benthic taxa. This procedure was repeated nine times, sampling the different microhabitats present at a site. A new syringe was used for each sampling site. All sediment cores taken at a given site were pooled. Sediment at the sampling sites was mostly very fine and contained high amounts of organic material (Suppl. material 2: Figure S1).

Extraction of hyporheic meiofauna from sediment and DNA isolation
To extract hyporheic meiofaunal organisms from the sediment samples, the sediment was sieved through 1 mm mesh size in the laboratory, using high-pressure water to flush sediment through sieves. Similar sieving methods have been tested before (Wangensteen and Turon 2017). The >1 mm fraction was collected and stored in 96% ethanol, but was not further processed. The <1 mm fraction was sieved through a second sieve with 42 µm mesh size. This step was repeated multiple times, as some samples contained a high amount of fine sediments leading to fast clogging of the 42 µm sieve. The <42 µm fraction was discarded, whereas the >42 µm (and <1 mm) fraction was stored in 96% ethanol and used for subsequent DNA extraction. When a sample was split into several sub-samples for sieving, the retained material from all sub-samples was pooled and vortexed. All sieves were sterilised by bleaching overnight before processing a new sample.
The following protocol for genomic DNA extraction from lotic freshwater sediments is available as a stepby-step laboratory handout (Suppl. material 1). For each sample, ethanol was removed by evaporation over a period of three days at room temperature (RT) under a sterile flow hood. The dried sediment was weighed and, if a sample contained >10 g sediment, it was fractionated and placed into several Petri dishes. Cell lysis was achieved by adding 25 mL of a TNES-Proteinase K buffer (39:1 parts by volume; 3000 U/mL Proteinase K) to each Petri dish containing a maximum of 10 g sediment. Sediment and lysis buffer were homogeneously mixed by multiple pipetting. The dish was covered with a lid and closed with parafilm. Samples were incubated for ~5 h at RT and were mixed every hour by pipetting. The solution was transferred into a Falcon-tube and centrifuged for 1 min at 2500 xg (RT). Supernatant was transferred into a new Falcon-tube and used for DNA extraction via a salt precipitation protocol (Sunnucks and Hales 1996). Per 1 mL solution, 350 µL 5M NaCl was added and the sample was vortexed. Then, the sample was centrifuged for 5 minutes at 2500 xg (RT) and supernatant was transferred into another Falcon-tube. Per 1 mL solution, 1 mL of icecold 100% ethanol was added. In case of higher volumes, the sample was split into subsamples. Each (sub)sample was centrifuged for 15 min at 3000 xg at 4 °C. Supernatant was discarded and 4 mL of 70% ethanol was added to the pellet. Each (sub)sample was again centrifuged for 15 min at 3000 xg at 4 °C. Ethanol was removed and the pellet air-dried by evaporation. Pellets were resuspended in 10 mL lab grade water. When samples were split into subsamples, the very same 10 mL were used for resuspension of all pellets belonging to one sample, effectively pooling all subsamples and keeping the final volume to a minimum. The PowerMax DNA Isolation Kit (MO BIO Laboratories, QIAGEN, Germany) was used for the final extraction of DNA from sediments and for removal of PCR inhibitors.
We also tried to extract DNA according to the standard PowerMax protocol, i.e. without the additional steps described above. However, this resulted in minimal DNA yield and we thus developed the protocol described here.
The resuspended pellets (10 mL volume) were transferred into the provided BeadTubes and extraction was carried out as described in the kit manual, with a final elution volume of 5 mL. The eluted DNA was finally concentrated by another salt precipitation as described above and the DNA pellet was resuspended in 50 µL PCR grade water.

COI amplification and sequencing
Extracted DNA was quantified on a Fragment Analyzer with the Standard Sensitivity Genomic Kit (Advanced Analytical, Oak Tree, USA) and a volume containing 15 ng of DNA was used for PCR. A two-step PCR protocol was followed: For the first PCR, untailed primers BF2 and BR2  were used for amplification of a 421 bp fragment of COI using Illustra PuReTaq Ready-to-go PCR beads (GE Healthcare, Little Chalfont, UK). Initial denaturation was performed for 3 minutes at 94 °C, followed by 25 cycles at 94 °C for 30 seconds, 50 °C for 30 seconds, 72 °C for two minutes, followed by a final elongation at 72 °C for five minutes. For the second PCR, 1 µL of the product was used with individually tagged BF2 / BR2 primers (combinations: Suppl. material 5: Table S2). The PCR was conducted as described above, but limited to 15 cycles. Three independent PCRs were run for each sample. A negative control was included in all PCR steps and DNA concentrations in negative controls were checked on gels and the Fragment Analyzer using the NGS High Sensitivity Kit. As no DNA was observed in the negative controls, these were not sequenced. PCR products were cleaned using the MinElute PCR Purification Kit (QIAGEN, Germany), quantified on the Fragment Analyzer using the NGS High Sensitivity Kit, size selected using SpriSelect beads (left sided size selection, ratio 0.74×; Beckman Coulter, Brea, USA) and equimolarly pooled before being sent for sequencing on one run of the Illumina MiSeq platform (v2 Kit, 2× 250bp reads) at GATC Biotech (Constance, Germany).

Data quality filtering and MOTU clustering
All raw reads were processed with R scripts as in Elbrecht et al. (2016). Reads were demultiplexed and paired-end reads were merged using USEARCH (v.8.1.1756; Edgar 2010) with the maximum error rate set to fastq_maxee = 1. Primers were removed with cutadapt (v.1.9, Martin 2011). Sequences were dereplicated, singletons were removed and the UPARSE pipeline (Edgar 2013) was used for clustering at 97% similarity, which is a common approach in invertebrate metabarcoding studies relying on the barcoding gap approach (Bista et al. 2017). Subsequently, only Molecular Operational Taxonomic Units (MOTUs) that had read abundances over 0.001% per replicate (corresponding to >1 read in replicates with >100,000 reads) were retained, while all other MOTUs were discarded. This step has been shown to be a suitable alternative to rarefaction (Elbrecht and Leese 2015). MEGAN (v.6.8.18, Huson et al. 2007) was used to assign Linnaean taxonomy to the generated MOTUs by comparing sequences against a custom-made database containing all dereplicated COI sequences of <5000 bp length available from the NCBI GenBank (Benson et al. 2017) (Gen-Bank sequences downloaded on 10 September 2017; ME-GAN settings: -evalue 1e-60, -max_target_seqs 10). The following thresholds were used for identification: species (97% identity), genus (95%), family (93%), order (90%) and kingdom (85%). Metazoan MOTUs were then further checked against the Barcode of Life Data System (BOLD, Ratnasingham and Hebert 2007). In case of incongruent taxonomic classifications, the highest shared taxonomic level was used (e.g. family if different genera within the same family were proposed). MOTUs identified as "Metazoa", but referring to "Invertebrate Environmental Samples" tagged with "Eukaryota; Metazoa; environmental samples" in NCBI and MOTUs with strongly conflicting taxonomic assignments in NCBI and BOLD (e.g. identified as fungi or bacteria in one database, but as Metazoa in the other) were excluded from the final list of metazoan MOTUs, as were five pseudogenes of the Homo sapiens COI gene (pseudogene MTCO1P8, 13, 28, 49 and 51 with ≥99% sequence identity each). The taxonomy of oligochaete MOTUs was further manually resolved using the recently published Swiss database of Vivien et al. (2017).

Diversity analyses
Metazoan species richness was determined using the R package vegan (Oksanen et al. 2007), which was also applied to calculate beta diversity (Sørensen dissimilarity) across sampling sites and between replicates within single sampling sites. All calculations were done for Metazoa and the two most abundant taxonomic groups within the metazoan dataset, i.e. Oligochaeta and Diptera. Analyses were run on two datasets: first for the dataset including all MOTUs found per sampling site and second, for the dataset including only MOTUs found in at least two out of three replicates. All diversity estimates were calculated excluding the water-depleted site NNU_61. The Venn diagram creator of the University of Ghent (http=//bioinformatics.psb.ugent.be/ webtools/Venn/) was used to visualise the number of metazoan MOTUs shared between sampling sites and number of MOTUs shared between replicates within sampling sites.

Results
No DNA was observed in the negative controls processed together with samples and thus, no contamination was suspected. A total of 14,128,336 read pairs were ob-tained after sequencing, retaining 2,313,243 high quality reads after bioinformatic processing and quality filtering (Suppl. material 6: Table S3).

Alpha diversity
Alpha diversity of all metazoan taxa was highest in the ecologically improved downstream (EID) sites, with 150 MOTUs found in EID_57 and 134 MOTUs in EID_27 when analysing the full dataset, i.e. all MOTUs found at a sampling site, respectively, 99 metazoan MOTUs in EID_57 and 68 MOTUs in EID_27, when only MOTUs  found in two out of three replicates were analysed. Both near natural upstream (NNU) sites demonstrated a considerably lower number of MOTUs than the EID sites (full dataset: 82 for NNU_101, respectively, 65 for NNU_58; 2-out-of-3 dataset: both 44 MOTUs) (Figure 3). The same diversity patterns were found for Oligochaeta and Diptera MOTUs (Figure 4, all values: Table 2).

Beta diversity
When all MOTUs found at a sampling site were analysed, Sørensen dissimilarity within sampling sites ranged from 0.15 (Diptera, site NNU_58) to 0.31 (Metazoa and Diptera, site 27). In contrast, when only MOTUs found in at least two of three replicates per site were included in the analyses, communities in replicates were much more similar, with Sørensen dissimilarity ranging from 0.02 (Oligochaeta, sites 27 and 57) to 0.11 (Oligochaeta, site 58 and Diptera, site 27) (all values: Table 3).

Degenerate COI primers for metabarcoding of hyporheic communities
The applied metabarcoding approach detected taxa characteristic for the hyporheic community (Pacioglu 2010), with water mites, platyhelminthes and nematodes comprising prominent exceptions. However, these taxa were observed in preliminary morphological studies performed in the course of two B.Sc. theses at the same sites and using the same sampling technique, seven weeks prior to DNA metabarcoding sampling (Frie 2016, Tonscheidt 2016) (Suppl. material 9: Table S6).
Several possible explanations exist for their non-detection/absence: first, the lack of detection could be attributed to the failure of the BF2/BR2 primer pair to effectively amplify some taxonomic groups, as has been found in previous studies (Creer et al. 2010;, which also seems the most plausible explanation. Second, these taxa might have been absent from the sampling sites at the time of sampling for the metabarcoding study. However, although we cannot be fully sure that all three groups were present and sampled during the DNA metabarcoding fieldwork, the preliminary morphological sampling strongly supports this assumption. Third, there is a lack of public COI reference data for these taxa, hampering molecular species identification by automated comparison with available databases. However, if taxa were amplified, identification should have been possible at least to a higher taxonomic level, e.g. order. As this was not the case, it seems likely that these taxa were not amplified at all.
In addition to the detection of meiofaunal organisms, many macrozoobenthic organisms and several taxa not present in the preliminary morphological dataset were detected in the metabarcoding dataset. The detection of some taxa can likely be attributed to free DNA, cells or small body fragments of these taxa being present in the sampled sediments, e.g. Homo sapiens, Gasterosteus aculeatus, Agelastica alni, Elodes minuta, Pediobius saulius  and Psocoptera. This corresponds to findings in previous studies showing that cells or DNA of metazoans can be extracted and amplified from soil samples (Bienert et al. 2012, Epp et al. 2012. Other taxa might have been present in the sediment as eggs. Prominent groups only identified by DNA metabarcoding were Cnidaria, Trichoptera, Plecoptera and Isopoda.
The most apparent advantage of the developed DNA metabarcoding approach for hyporheic communities is the detection of a large number of species that might not be found when applying standard kick-net sampling and omitting size sorting of samples to prevent over-representation of taxa with a high biomass per specimen . However, only a fraction (17.5%) of all detected MOTUs could be assigned to a taxonomic name and only a very small fraction (1.2%) of the detected MOTUs could be reliably identified as Metazoa. When analysing the read abundance per taxonomic group, these differences were less pronounced, but still evident, as only 33.6% of all quality filtered reads belonged to the identified metazoan taxa. This ratio might be increased by applying flotation techniques that select for meiofaunal organisms, thus removing non-target taxa such as bacteria attached to sediment from the sample (Andújar et al. 2018, Burgess 2001, Haenel et al. 2017. The phenomenon that highly degenerate COI primers amplify a multitude of potential non-target taxa has been described before (Macher and Leese 2017, Siddall et al. 2009, Wangensteen et al. 2018) and designing primers targeting specific taxa of interest might be beneficial for future studies. Yet, the majority of studies applying the BF2/BR2 primer combination investigated (size-sorted) freshwater invertebrate bulk samples. In those studies, the proportion of target taxa (i.e. macroinvertebrates) and non-target organisms (e.g. bacteria from the cuticula, in gut contents or attached to the sediment) is much more beneficial for obtaining a high proportion of target MOTUs and sequence reads, compared to the size-sorted meiofaunal extraction approach applied here, focusing on the size fraction of <1 mm and >42 µm. Designing new primers targeting meiofaunal taxa commonly found in the hyporheic zone is promising, as currently available, highly degenerate COI primers have been designed for macroinvertebrate taxa (BF2/BR2: Elbrecht and Leese 2017) or marine invertebrates (Leray et al. 2013, Wangensteen et al. 2018. Therefore, they are more likely to miss taxa and lead to an underestimation of meiofaunal diversity, as seems partly the case in our study.
Further, the use of degenerate primers can help to detect and identify metazoan and non-metazoan MOTUs potentially suitable for biomonitoring and ecological studies. However, COI might not be the ideal marker gene for studying some metazoan (Creer et al. 2010) and especially non-metazoan taxa, as questions remain regarding its diagnostic power for species-level identification (Pawlowski et al. 2012). In addition, reference libraries for non-metazoan taxa contain mostly other genes, e.g. 18S or ITS (Kõljalg et al. 2005, Quast et al. 2012. Therefore, further research is needed in order to assess the usability of COI-based MOTUs for ecosystem assessment. Machine-learning approaches as developed by Cordier et al. (2017) seem promising as they allow the use of MOTUs with unknown taxonomy and unknown ecological traits for ecosystem assessment. Yet, it must be highlighted that (Linnaean) taxonomy-free MOTU-based diversity analyses are influenced by the underlying data quality filtering settings, MOTU clustering approaches and post-clustering algorithms (e.g. see Coissac et al. 2012, Boyer et al. 2014, Mahé et al. 2015, Frøslev et al. 2017.

Site heterogeneity and metazoan community composition
Beta diversity and, as such, site heterogeneity were considerably high for Metazoa, but also when investigating Diptera and Oligochaeta only. In total, 70% of all identified metazoan taxa (126/180) were exclusively present at a single sampling site each, despite pooling nine sediment cores at each sampling site. Thereby, the finding that beta diversity estimates between sites increased when only MOTUs found in at least two out of three replicates were included in analyses could hint to a real biological phenomenon: taxa abundant at one sampling site might be rare at another site, therefore being removed by restrictive MOTU filtering. Another explanation would be the presence of taxa due to contamination or tag switching, which is a common phenomenon on Illumina sequencing platforms (Schnell et al. 2015). In the latter case, removing rare MOTUs would increase the reliability of study results, as the retained community would more accurately depict the real community present at a sampling site.
The findings that EID sites harboured more taxa than NNU sites and that communities in EID sites were more similar than communities in NNU sites could be explained by different phenomena. It might show that upstream sites are more isolated from each other, therefore harbouring more distinct metazoan communities. In addition, very small streams in upstream sites tend to dry out during times of low rainfall, therefore communities in these streams might be adapted to more extreme conditions and might more frequently face community fluctuations. In contrast, communities in downstream sites are expected to be more often connected, receive drifting organisms from different upstream regions and rarely become dry and hence, communities could be more constant and similar in EID sites. The finding that freshwater communities are more different in upstream and generally isolated sites has been reported before (Altermatt et al. 2013, Finn andPoff 2005). However, the small number of sites sampled in this study does not allow answering the question if the same patterns are consistently found when studying hyporheic communities and further research is needed. Nevertheless, it seems obvious that, in particular, oligochaetes and dipterans drive the observed patterns in our study.

The impact of replication strategy on inferred community composition
The results of our study highlight that replication strategy and removal of MOTUs present in only one PCR replicate can greatly impact the inferred diversity within and across sampling sites. Although the dataset analysed in this study is small and does not allow drawing definite conclusions, the results suggest that replication and removal of rare MOTUs found only in one out of three replicates could be beneficial for future studies and monitoring of hyporheic communities. As previously outlined in Alberdi et al. (2017), this approach is a compromise between retaining potential false-positives and introducing potential false-negatives. Analyses of community similarity between replicates showed that the removal of rare MOTUs leads to greatly increased similarity between replicates and therefore, false-positives are expected to be efficiently removed from the dataset. However, it is possible that the number of false-negatives increases and that some false-positives with a high read abundance remain in the dataset. Our results further suggest that for studies focusing on comparing (hyporheic meiofaunal) biodiversity in larger areas and across multiple sampling sites, PCR replication might not be that important, as beta diversity estimates remained largely independent of replication and removal of rare MOTUs. However, dissimilarity between sampling sites increased when only MOTUs present in at least two out of three replicates were considered. Therefore, in future studies, the best replication and filtering approach should be chosen based on the research question/application scenario, available financial resources and sequencing depth.

Future application scenarios
Hyporheic communities of stream ecosystems are often highly diverse. At the same time, taxonomic keys for genus and/or species level identification are widely lacking (Robertson et al. 2000). DNA-based approaches such as DNA metabarcoding have the potential for circumventing this shortcoming (Creer et al. 2010) by referring to the digitally preserved taxonomic knowledge stored in DNA barcode reference libraries. Given the steady progress made in the setup of those libraries (Ratnasingham and Hebert 2007), it can be expected that many taxa will be reliably identifiable on species level in the future -in case libraries are well curated. Even if species level identification is not possible, identification of MOTUs can be used to infer the ecology of taxa (Macher et al. 2016) and therefore, metabarcoding approaches hold great promises for the assessment of ecosystems and study of taxa that are difficult to identify based on their morphology. Hence, metabarcoding of hyporheic freshwater communities has great potential for diverse application scenarios such as the following: Stream restoration management: The hyporheic zone is often rapidly colonised by meiofaunal organisms (Eisendle-Flöckner et al. 2013). Therefore, using metabarcoding to characterise the hyporheic community can potentially provide information on the ecological status of restored streams soon after restoration and before the arrival of classical bioindicator taxa such as Ephemeroptera, Trichoptera and Plecoptera (EPT taxa).
Streambed health assessment: Studies have shown that substrate characteristics are highly important for ecological processes in streams and freshwater communities are strongly influenced by substrate characteristics (Duan et al. 2008, Radwell and Brown 2006. When the ecological demands of taxa are known, metabarcoding hyporheic communities can potentially provide information on the ecological quality and function of the streambed. This might be particularly attractive in the context of monitoring streambed clogging, which can reduce accessibility and oxygenation of the hyporheic zone and subsequently leads to the loss of fish spawning grounds and habitats for species such as lampreys (Kemp et al. 2011). In particular, a high diversity of oligochaetes, as observed in the current study, might be an indication for intense streambed clogging (Bo et al. 2007). Improvement of existing indices: As hyporheic fauna composition largely depends on small-scale environmental factors such as streambed substratum, a high resolution for ecological health assessments of streams could be reached. Some existing indices use information on certain taxa to assess ecosystem quality, e.g. the Oligochaete Index of Sediment Bioindication (IOBS; Rosso et al. 1994) and the Nematode Species at Risk Index (NemaSPEAR; Höss et al. 2011). Metabarcoding of hyporheic freshwater communities using taxon-specific primers could help to increase the accuracy of such indices by allowing for species level identification of otherwise difficult to identify taxa, as proposed by, for example, Vivien et al. (2015Vivien et al. ( , 2017. Assessment of intermittent stream and spring ecosystems: Standard assessment protocols for streams require the presence of water. However, intermittent rivers are common in many arid parts of the world and the hyporheic zone is an important refuge for aquatic taxa during droughts (Robson et al. 2011, Vander Vorste et al. 2016. Metabarcoding the hyporheic community might therefore help monitoring and assessing the ecological status of intermittent rivers throughout the year (Stubbington et al. 2018). The same applies for springs (i.e. ecotones between surface waters and groundwater aquifers), which can move in their longitudinal position over the year and depending on aquifer discharge conditions. All those aspects are currently tackled in the two EU COST Actions 'DNAqua-Net -Developing new genetic tools for bioassessment of aquatic ecosystems in Europe' (Leese et al. , 2018 and 'SMIRES -Science and Management of Intermittent Rivers and Ephemeral Streams' ).

Conclusions
Highly degenerate COI primers are known to amplify a multitude of non-metazoan organisms, especially when applied to samples with a low ratio of metazoan to non-metazoan taxa (Macher and Leese 2017, Siddall et al. 2009, Wangensteen et al. 2018). Our study indicates that this is also the case for hyporheic communities when using the BF2/BR2 primer pair for the investigation of stream meiofauna. Therefore, in future studies, appropriate primers should be chosen based on the expected and targeted taxonomic groups and depending on the research questions/application scenarios at hand. While the utilisation of highly degenerate primers has the benefit of amplifying a wide range of taxonomic groups, the disadvantage is that many reads can be "lost" to non-target, non-metazoan taxa that might be out of focus (such as bacteria). However, for widely understudied ecosystems like the hyporheic zone, using highly degenerate primers allows first insights into species diversity and biomonitoring can potentially benefit from including so far understudied and neglected non-metazoan taxa. Based on the initial results obtained by metabarcoding with highly degenerate primers, less degenerate primers can be chosen or designed for future studies, targeting taxonomic groups of interest.

Data resources
Raw data is available from the Short Read Archive (SRA), accession number SRR7623732.

Author contributions
AMW and JNM designed the study, performed sampling, analysed the data and wrote the manuscript. Table S5: Metazoa MOTU list with 2-out-of-3 replicate strategy and excluding site 61 Authors: Alexander M. Weigand, Jan-Niklas Macher Copyright notice: This dataset is made available under the Open Database License (http://opendatacommons.org/licenses/odbl/1.0/). The Open Database License (ODbL) is a license agreement intended to allow users to freely share, modify, and use this Dataset while maintaining this same freedom for others, provided that the original source and author(s) are credited. Link: https://doi.org/10.3897/mbmg.2.26869.suppl8

Supplementary material 8
Supplementary material 9 Table S6: Taxa identified based on morphology Authors: Alexander M. Weigand, Jan-Niklas Macher Copyright notice: This dataset is made available under the Open Database License (http://opendatacommons.org/licenses/odbl/1.0/). The Open Database License (ODbL) is a license agreement intended to allow users to freely share, modify, and use this Dataset while maintaining this same freedom for others, provided that the original source and author(s) are credited.