Choice of DNA extraction method affects DNA metabarcoding of unsorted invertebrate bulk samples

Characterisation of freshwater benthic biodiversity using DNA metabarcoding may allow more cost-effective environmental assessments than the current morphological-based assessment methods. DNA metabarcoding methods where sorting or pre-sorting of samples are avoided altogether are especially interesting, since the time between sampling and taxonomic identification is reduced. Due to the presence of non-target material like plants and sediments in crude samples, DNA extraction protocols become important for maximising DNA recovery and sample replicability. We sampled freshwater invertebrates from six river and lake sites and extracted DNA from homogenised bulk samples in quadruplicate subsamples, using a published method and two commercially available kits: HotSHOT approach, Qiagen DNeasy Blood & Tissue Kit and Qiagen DNeasy PowerPlant Pro Kit. The performance of the selected extraction methods was evaluated by measuring DNA yield and applying DNA metabarcoding to see if the choice of DNA extraction method affects DNA yield and metazoan diversity results. The PowerPlant Kit extractions resulted in the highest DNA yield and a strong significant correlation between sample weight and DNA yield, while the DNA yields of the Blood & Tissue Kit and HotSHOT method did not correlate with the sample weights. Metazoan diversity measures were more repeatable in samples extracted with the PowerPlant Kit compared to those extracted with the HotSHOT method or the Blood & Tissue Kit. Subsampling using Blood & Tissue Kit and HotSHOT extraction failed to describe the same community in the lake samples. Our study exemplifies that the choice of DNA extraction protocol influences the DNA yield as well as the subsequent community analysis. Based on our results, low specimen abundance samples will likely provide more stable results if specimens are sorted prior to DNA extraction and DNA metabarcoding, but the repeatability of the DNA extraction and DNA metabarcoding results was close to ideal in high specimen abundance samples.


Introduction
Modern biomonitoring and water quality assessments of freshwater ecosystems rely on standardised sampling and identification of benthic macroinvertebrates (Bonada et al. 2006).Samples are usually collected using kick-nets or surber-nets in lotic environments, while different ways of grab-or core sampling typically are used in deeper lentic environments (European Community 2000).Sufficient sampling effort is important to obtain an appropriate representation of the fauna (Bongard et al. 2011;references therein).In most cases, the effort needed to sample a large proportion of the community is substantial, resulting in massive volumes of samples.For an undisturbed boreal highland river in Norway, for instance, it is estimated that > 3000 specimens of Ephemeroptera, Plecoptera and Trichoptera (EPT) must be sampled in order to cover 95% of the EPT diversity (Bongard et al. 2011).This corresponds to eight-ten minutes of active kick-sampling.Therefore, the following sorting and morphological identification of specimens is very time-consuming since samples are very large and especially if non-EPT taxa are included.Subsampling is often implemented to expedite the sample handling and identifications rely on taxonomic expertise and knowledge (e.g.availability of taxonomic keys for a specific group and location); therefore, the same samples can produce different taxonomic lists depending on the identifier and the subsample examined (Haase et al. 2010).In this situation, approaches based on DNA metabarcoding promise more efficient and standardised sample processing (Hajibabaei et al. 2011, Yu et al. 2012), especially since large or replicate samples from the same location will not require significant additional effort.
In DNA metabarcoding, short homologous DNA fragments -e.g.part of the cytochrome c oxidase I (COI) gene -from bulk samples are sequenced in parallel and the resulting reads are matched against a reference library to identify the taxa present in the samples (Hajibabaei et al. 2011, Taberlet et al. 2012).The approach has proven to give more detailed taxon lists than morphological identification (e.g.Ji et al. 2013, Gibson et al. 2015, Emilson et al. 2017) and several studies have demonstrated the suitability of the method in monitoring of freshwater macroinvertebrates (Hajibabaei et al. 2011, Carew et al. 2013, Elbrecht et al. 2017b).Although some methodological limitations, such as estimation of biomass (Elbrecht and Leese 2015), detection of rare species (Leray and Knowlton 2017) and incomplete reference libraries (Hajibabaei et al. 2016;Hering et al. 2018) still exist, DNA metabarcoding based monitoring of freshwater macroinvertebrates has attracted great interest and effort has been made towards large-scale implementation (Baird andHajibabaei 2012, Hajibabaei et al. 2016) i.e. in accordance with the EU Water Framework Directive (Leese et al. 2016;Leese et al. 2018).
Efficient sample homogenisation and DNA extraction is a prerequisite for any successful DNA-based study.Conventionally, DNA has been extracted using desalting and organic solvents like phenol-chloroform (Tan and Yiap 2009).Commercial solid-phase (spin-column based) methods make use of the finding that DNA binds to a silica membrane in the presence of a high concentration of chaotropic salts (Boom et al. 1990) and non-target substances are rinsed off while the target DNA is bound to the silica membrane.Although more expensive, the spin-column methods have advantages compared to the earlier 'artisanal' and often toxic DNA extraction methods, since the former are standardised, easy to perform and can save time.Therefore, many freshwater DNA me-tabarcoding studies are performed using commercial spin column kits for DNA extraction (e.g.Carew et al. 2013;Vivien et al. 2016;Blackman et al. 2017;Emilson et al. 2017), although some research groups have used salt extraction (Elbrecht et al. 2017a;Elbrecht et al. 2017b).However, DNA may be made readily available for downstream analyses by simply using alkaline lysis and neutralisation in a single tube (Truett et al. 2000).Applying such a simple DNA extraction method to DNA metabarcoding samples would further facilitate the reduction of sample processing time.
Extensive scientific literature on comparisons of DNA extraction methods exists, each piece of literature concentrating on different target taxa and environments and showing differences in DNA yield and PCR success amongst methods (e.g.Miller et al. 1999;Fredricks et al. 2005;Mahmoudi et al. 2011;Psifidi et al. 2015;Mäki et al. 2017;Schiebelhut et al. 2017).High-quality DNA was best achieved using bead-based homogenisation (Miller et al. 1999;Mahmoudi et al. 2011) and best PCR success was achieved with efficient inhibitor removal (Miller et al. 1999;Whitehouse and Hottel 2007;Mahmoudi et al. 2011;Psifidi et al. 2015).However, performance may vary by taxon, for instance in the case of fungi and macroinvertebrates (Fredricks et al. 2005;Schiebelhut et al. 2017).Although no universal best method for DNA extraction exists, optimal methods might be available in literature or be developed based on existing approaches for specific applications.
Most freshwater DNA metabarcoding studies rely on separating biomass of organisms from the non-target organic and inorganic material and on grinding the organisms before adding lysis buffer (e.g.Gibson et al. 2015;Elbrecht et al. 2017b;Emilson et al. 2017).Separating target biomass reduces sample size before grinding and efficiently removes the main contributors to PCR inhibition, the phenolic compounds (Wilson 1997).In addition, there are some benefits of sorting by specimen size before DNA extraction for metabarcoding (Elbrecht et al. 2017a).However, there is still the issue of time used for separating the target biomass of samples, especially for sufficiently large samples (Bongard et al. 2011), including all macroinvertebrate groups.
In various soil studies, DNA has been successfully extracted directly, without first separating organisms (Ogram et al. 1987;Van Elsas et al. 1997;Delmont et al. 2011;Taberlet et al. 2012).The use of such unsorted bulk samples for DNA extraction of macroinvertebrates is appealing due to the dramatic reduction in sample processing time (i.e.no specimen sorting), but only a few freshwater macroinvertebrate studies have used such techniques (Hajibabaei et al. 2011, Gardham et al. 2014, Dowle et al. 2016, Andújar et al. 2018).The sample homogenisation and DNA extraction steps become crucial when using unsorted bulk samples because uneven specimen distribution in the sample and presence of non-target organic matter and derived PCR inhibitors might influence the DNA metabarcoding success.
Here, we compared the success of DNA metabarcoding for assessing a set of unsorted freshwater benthic macroinvertebrate samples, processed using different DNA extraction methods.Rather than being an exhaustive comparison, we chose to compare methods that have low toxicity, are easy to use and require little handling time.Thus, no conventional time-consuming or toxic desalting and organic solvent-based methods were included.

Material and methods
Samples were collected from three sites in Norway: River Atna at Skranglehaugen (10 August 2015, 61.98186°N, 09.80454°E, 1117 m above sea level) and at Vollen (10 August 2015, 61.98471°N, 10.02823°E, 720 m above sea level), as well as Lake Jonsvatn at Jonsborg (28 September 2015, 63.39569°N, 10.55370°E, 150 m above sea level).River Atna originates in the Rondane National Park in Central Norway and is a well-documented Nordic freshwater ecosystem (Aagaard et al. 2004).Lake Jonsvatn, near Trondheim, is a moderately large lake (area 15 km 2 , maximum depth 97 m) and is the main source for drinking water for the city of Trondheim.
Four minute benthic kick samples were collected from the River Atna (Bongard et al. 2011) and 0.01-m 2 Van Veen grab samples from the benthos of the Lake Jonsvatn.The Van Veen samples were collected from 0.25, 2, 7.5 and 15 m depth, from the shoreline to approximately 50 m from the shore.Samples are called Skranglehaugen, Vollen, 15 m, 7.5 m, 2 m and 0.25 m hereafter (Fig. 1).Excess plant material was carefully removed from the samples on site and 96% ethanol was added.The next day, the ethanol was changed to fresh 96% ethanol and the samples were stored at +4 °C.After storing the samples three to four months (until 16 December 2015), excess ethanol was carefully collected from the samples and the remaining material including all plant material, animals, sediment and ethanol was homogenised, using a hand-held blender.Between each sample, the blender was sterilised by submerging it first in deionised water, then in a 5% bleach solution for 15 min.After 15 min, the bleach was washed away from the blender by submerging it in deionised water then in double-deionised water.Finally, the blender was dried under UV light for 30 min.From each of these homogenised samples, excess ethanol was removed and twelve 27.6-130.5-mg(mean 69.4 mg, standard deviation 25.8 mg; Fig. 2) subsamples per sample were collected into microcentrifuge tubes, using forceps sterilised in the same manner as the blender.Samples were shaken vigorously in between subsampling.
The subsamples were randomly divided into three groups exposed to three different DNA extraction protocols (Fig. 1).Prior to extraction using the DNeasy Blood and Tissue Kit (Qiagen GmbH, Hilden, Germany), the subsamples were homogenised, using Tissue Lyser (Qiagen) with 19.0/5 frequency for 2 min and incubated in 20 µl of proteinase K overnight at 56 °C on a rocking platform (200 rpm).The PowerPlant Pro DNA Isolation Kit (MO BIO Laboratories, Carlsbad, CA, USA) was used according to manufacturer's protocol (currently named Qiagen DNeasy PowerPlant Pro Kit).The third DNA extraction protocol was based on the HotSHOT approach (Truett et al. 2000), which is a cheap and quick DNA extraction method based on lysis of the tissues in an alkaline sodium hydroxide and neutralisation with an acidic Tris buffer.The extracted DNA was quantified with Qubit 2.0 (Invitrogen, CA, USA), using the dsDNA HS Assay Kit according to the manufacturer's protocol.
We amplified three fragments of the mitochondrial COI gene.The full barcode region (approximately 648 bp) was amplified, using the standard Folmer et al. (1994) LCO1490 (GGTCAACAAATCATAAAGATATTGG) and HC02198 (TAAACTTCAGGGTGACCAAAAAATCA) primers.The approximately 239bp long F230 fragment at the 5' end of the barcode region was amplified, using the standard Folmer LCO1490 primer and the 230_R (Gibson et al. 2015) reverse primer (CTTATRTTRTTTATICGIG-GRAAIGC).The approximately 314bp long BE fragment at the 3' end of the barcode region was amplified, using the forward primer B CCIGAYATRGCITTYCCICG (Hajibabaei et al. 2011) and the reverse primer R5 GTRATIGCIC-CIGCIARIACIGG (Gibson et al. 2014).The first-round PCR amplifications were undertaken once with attached Illumina forward (TCGTCGGCAGCGTCAGATGTG-TATAAGAGACAG) and reverse (GTCTCGTGGGCTC-GGAGATGTGTATAAGAGACAG) adapters.
The first-round PCRs (Suppl.material 1: Table 1) had a final volume of 25μl containing 2 μl DNA template, 17.8 μl molecular biology grade water, 2.5 μl 10× reaction buffer (200 mM Tris HCl, 500 mM KCl, pH 8.4), 1 μl MgCl 2 (50 mM), 0.5 μl dNTPs mix (10 mM), 0.5 μl forward primer (10 mM), 0.5 μl reverse primer (10 mM) and 0.2 μl Invitrogen's Platinum Taq polymerase (5 U/μl; Thermo Fisher Scientific, Waltham, MA, USA).Slightly modified PCRs were necessary for 15 m (all HotSHOT subsamples, F230 and BE Blood & Tissue Kit subsamples and one F230 PowerPlant Kit subsample), 7.5 m (all HotSHOT) and 2 m (all HotSHOT) samples.These reactions contained increased concentrations of forward primer (1 μl), reverse primer (1 μl) and Taq (0.5 μl).All PCRs included negative control reactions (no DNA template).The PCR conditions for the F230 and BE fragments were, with a heated lid, 94 °C for 5 min, followed by a total of 35 cycles of 94 °C for 40 s, 46 °C for 1 min and 72 °C for 20 s and a final extension at 72 °C for 2 min and hold at 10 °C.The modified PCRs were amplified, using 40 cycles.The PCR conditions for the full barcode region were, with heated lid, at 94 °C for 1 min, followed by 5 cycles of 94 °C for 40 s, 45 °C for 40 s and 72 °C for 1 min, 35 cycles of 94 °C for 40 s, 51 °C for 40 s and 72 °C for 1 min and a final extension at 72 °C for 5 min and hold at 10 °C.The DNA template was 1:50 diluted due to PCR inhibition (except some subsamples; see Suppl.material 1: Table 1).Samples that were amplified with increased conditions were purified, using Qiagen's noised, using a sequence identity threshold of 98% in the command -cluster_otus.The above steps were undertaken for each subsample and the resulting operational taxonomic unit (OTU) fasta files were pooled to F230, BE, Folmer forward and Folmer reverse files, using merge.files in mothur.Then, the resulting pooled OTU files were de-replicated, using -derep_fulllength and the final OTUs (each fragment separately) were clustered, using -clus-ter_otus at 97% similarity level.The abundance of each OTU in each sample was searched, using -usearch_global against the pooled OTU-files.All OTUs that had less than 6 reads/sample were removed.Also, OTUs present only or predominantly in three negative samples were removed (42 F230 and BE OTUs, 38 full barcode forward OTUs and 12 full barcode reverse OTUs).
The OTUs were assigned taxonomically in two steps.First, they were searched against the NCBI non-redundant nucleotide database, using the BLAST 2.3.0+(Zhang et al. 2000) (F230 and BE fragments searched 17 November 2016 and forward and reverse fragments of the full barcode region 7 April 2017).Taxonomic assignment of an OTU was undertaken, using the lowest common ancestor algorithm in MEGAN 6.5.10 (minimum bit score 100, top percentage 8.0 and minimum support 1; Huson et al. 2016).The Metazoa OTUs were translated to amino acids and aligned, using Mafft online tool (Katoh and Standley 2013) to remove all non-COI OTUs and OTUs with stop codons in the mitochondrial invertebrate code.Secondly, the Metazoa OTUs were assigned taxonomically using the BOLD v.4 identification engine and Species Level Barcode Records (Ratnasingham and Hebert 2007; F230 and BE fragments 24 November 2016 and forward and reverse fragments of the full barcode region 21 April 2017; > 97% matches to a species in BOLD were considered as an assignment).The F230, BE and the forward and reverse full barcode fragment OTU tables were combined, but only OTUs that matched with an invertebrate species in the BOLD v.4 Species Level Barcode Records were kept (889 OTUs).All OTUs assigned to the same taxonomic name were merged leaving 305 invertebrate species.The different DNA extraction methods were compared, using both non-normalised and normalised species level reads (36991 reads/sample).
MinElute PCR purification Kit according to manufacturer's instructions.PCR products were visualised on a 1.5% agarose gel to check the amplification success and a subset of samples was quantified using PicoGreen (Thermo Fisher Scientific) according to the manufacturer's protocol.In the second-round PCRs, the Illumina tailed amplicons were dual indexed, using Nextera XT Indices (FC-131-1002, Illumina, San Diego, CA, USA) in a reduced-cycle PCR according to the manufacturer's protocol.The amplification reactions contained the same reagent concentrations as above with three modifications: 3 µl of amplicon, 1 µl of each primer and 0.25 µl of Platinum Taq (Invitrogen).Indexed amplicons were pooled into a library and sequenced on a standard flow cell with three extraction controls, using the 600-cycle V3 Illumina MiSeq sequencing kit (MS-102-3003).
The resulting raw amplicon reads (available in the ENA SRA repository with study name PRJEB26589) were processed, using mothur v.1.36.1 (Schloss et al. 2009, see Suppl.material 5: Methods for details).The forward and reverse reads of the F230 and BE fragments were merged keeping only the reads with no mismatch in primer sequence, using the command make.contigs.The forward and reverse fragments overlapped on average with 282 bases (F230 reads) and 238 bases (BE reads).All merged F230 reads shorter than 250 bases and longer than 340 bases and all merged BE reads shorter than 330 and longer than 375 bases, as well as merged reads with ambiguous bases were removed, using the command screen.seqs.Primer sequences were removed, using the command trim.seqs.The full barcode region reads were processed separately since the forward and reverse reads could not be merged reliably (only few bases overlapping).All reads with mismatches in primer sequences, with ambiguous bases, with homopolymers longer than 12 bases and shorter than 200 bases were removed; primer sequences were removed from the retained reads and the reads were truncated when the average quality score decreased to under 30 in 30 bases sliding window, using the command trim.seqs.After the quality control, the resulting good quality reads were processed further, using usearch v8.1.1831_win32 (Edgar 2013).Exact duplicates were removed, using the command -derep_fulllength and the reads were de-To find statistically significant differences (p < 0.05) in DNA yield and in metazoan diversity amongst sites, oneway Kruskal-Wallis followed by pair-wise comparisons was used.To find significant differences in subsample size, in DNA yield and in metazoan diversity amongst different DNA extraction methods, two-way ANOVA followed by Tukey's pair-wise comparisons was used.Partitioned pair-wise betadiversity values (β sim and β sne , Baselga 2010) were calculated using the R-package betapart (Baselga et al. 2018, R Core Team 2016) to see how much of the variation in species richness was due to nestedness (β sne , subsamples with less species include a subset of the species in subsamples with more species) and turnover (β sim , change in species composition).To study repeatability of the DNA extraction methods, similarity in invertebrate community composition was analysed by fitting the bivariate Poisson-lognormal species abundance distribution to pairs of subsamples (Engen et al. 2002, Engen et al. 2011, Diserud et al. 2013), using the R-package poilog (Grøtan and Engen 2015, R Core Team 2016).Species abundances from a community were assumed to follow a lognormal distribution and the sampling process to be described by Poisson sampling.For a species with a given abundance, the expected number of reads observed will then equal the abundance scaled by the sampling intensity.When considering two community subsamples simultaneously, the log abundances in the pair of communities were assumed to follow the binormal distribution, with the estimated correlation from this bivariate Poisson-log-normal species abundance distribution serving as a robust measure of similarity between subsamples.This approach utilises all the species abundance information, accounts for the sampling process and can deal with over-dispersion relative to the Poisson, that is, the variance in the observed log abundances is greater than the mean (Engen et al. 2002).When community subsamples are grouped according to the DNA extraction method, the dissimilarity between pairs of subsamples, after Poisson sampling has been accounted for, can be attributed to over-dispersion and the extraction method.Assuming the variance component caused by different sampling sites to be the same for all extraction methods, the different methods can be evaluated by how consistently they represent the community structure, as quantified by the bivariate correlation.If the dissimilarity between subsamples is large for a given extraction method, this method's representation of the community structure will vary more than for an extraction method with lower dissimilarity.

Results
Total DNA was extracted from 28-131 mg subsamples, four subsamples/extraction method from each site (Figs 1-2).The weights of subsamples were similar overall (Fig. 2a; two-way ANOVA, extraction factor F 2,5 = 0.55, site factor F 2,5 = 1.04, p > 0.05) in spite of a crossover interaction (two-way ANOVA, interaction fac- tor F 2,5 = 3.60, p < 0.01), which was due to HotSHOT extraction subsamples being significantly heavier in the 2 m samples and significantly lighter in the 0.25 m samples compared to the PowerPlant Kit subsamples (Tukey's HSD on extraction at each site, p < 0.05).The DNA yield correlated weakly with the subsample weights overall (Spearman's ρ = 0.31, p < 0.01).This correlation could be attributed to a significant and strong correlation of the PowerPlant Kit subsample weight and DNA yield (Spearman's ρ = 0.77, p < 0.001).Weights of Blood & Tissue Kit and HotSHOT extraction subsamples did not correlate with the DNA yield (Spearman's ρ, p > 0.05).All HotSHOT extraction, 75% of Blood & Tissue Kit and 25% of PowerPlant Kit subsamples suffered from PCR inhibition as only diluted templates worked in PCR (Suppl.material 1: Table 1).There was no correlation between the amount of initial material extracted and inhibition.
The DNA yield was lower in 15 m, 7.5 m and 2 m samples (42-495 ng) than in 0.25 m, Skranglehaugen and Vollen samples (169-4280 ng; Kruskal-Wallis χ 2 =50.99, p < 0.001, pair-wise Mann-Whitney with sequentially Bonferroni corrected p < 0.05; Fig. 2b).The DNA yield of PowerPlant Kit was significantly higher than that of HotSHOT extraction in the 15 m and 0.25 m samples and significantly higher than that of Blood & Tissue Kit in the 15 m, 7.5 m and 0.25 m samples; also the DNA yield of HotSHOT extraction was significantly higher than that of Blood & Tissue Kit in the 15 m and Skranglehaugen samples (two-way ANOVA, interaction factor F 2,5 = 3.99, Tukey's HSD on interaction, p < 0.001).Overall, the highest DNA yield was achieved with the PowerPlant Kit, also when the subsample weight was taken into consideration (Suppl.material 4: Figure 1a).
A total of 10.3, 9.8 and 1.3 million reads were generated from the F230, BE and full barcode fragment, respectively (Suppl.material 3: Table 3).After merging and quality filtering, 3.9 and 3.0 million good-quality paired-end F230 and BE reads as well as 0.5 million good-quality full barcode fragments remained, which were clustered into 2741, 4832 and 2289 OTUs at 97% similarity level, respectively.From those OTUs, 1426, 1261 and 1430 OTUs were affiliated with Metazoa in an NCBI BLAST search and from those, 616, 489 and 588 OTUs were taxonomically assigned to a metazoan species in BOLD (> 97% matches to a species in BOLD).After merging the taxonomic information from all three fragments, 305 OTUs were taxonomically assigned to an invertebrate taxon (named DNA species hereafter), which were used for comparing the extraction methods.The majority of DNA species were Insecta (78%).Clitellata (8%), Maxillopoda (4%) and Branchiopoda (2%) were the next richest groups.
The number of DNA species was the highest using PowerPlant Kit in the 15 m and 2 m samples, using Blood & Tissue Kit in the 7.5 m and Vollen samples and using HotSHOT extraction in the Skranglehaugen samples (Fig. 3; two-way ANOVA, interaction factor F 2,5 = 5.09, p < 0.001, Tukey's HSD on extraction at each site p < 0.05) and, thus, the number of DNA species did not give a clear indication of which extraction method performed best overall.
The portion of shared DNA species amongst the extraction methods was lower for the 15 m, 7.5 m and 2 m samples (Fig. 4a-c) than for the 0.25 m Skranglehaugen and Vollen samples (Fig. 4g-i).The beta diversity (measured as Sørensen dissimilarity) of the Blood & Tissue Kit was significantly higher than that of the HotShot extraction in the 15 m, 7.5 m and 2 m samples and significantly higher than that of the PowerPlant Kit in the 15 m, 7.5 m, 2 m and 0.25 m samples (Fig. 4d-f, j-l; mean beta diversity value ± 95% confidence interval higher).The beta diversity of the HotShot extraction samples was also significantly higher than that of the PowerPlant Kit samples in the 7.5 m, 2 m and 0.25 m samples.The significantly lower beta diversity values of the PowerPlant Kit samples than those of Blood & Tissue Kit and HotShot samples indicate less change in community composition using PowerPlant Kit in the lake site.The beta diversity values were similar amongst the extraction methods in the river site (Fig. 4k-l).However, partitioning of Sørensen dissimilarity showed that the change in community composition due to nestedness of PowerPlant Kit samples was significantly higher than that of the Blood & Tissue Kit and HotShot extraction in Vollen sample (Fig. 4l; mean β sne value ± 95% confidence interval higher).This indicates that the PowerPlant Kit subsamples were more subsets of each other and less affected by species replacement than the Blood & Tissue Kit and HotShot extraction subsamples.Overall, the PowerPlant Kit performed best in the lake site and showed indications of better repeatability in the river site.
When evaluating repeatability of extraction methods in more detail, based on the bivariate Poisson-lognormal correlations, the PowerPlant Kit outperformed both Blood & Tissue Kit and HotSHOT extraction in most sites (Fig. 5; mean bivariate correlation value ± 95% confidence interval higher than that with other methods at each site except 15 m where Blood & Tissue Kit and PowerPlant Kit confidence intervals overlap).The mean bivariate correlation value was higher than 0.80 in all except the 15 m samples using PowerPlant Kit, while using Blood & Tissue Kit and HotSHOT extraction, it was higher than 0.80 only in river samples.This indicates that the Blood & Tissue Kit and HotSHOT extraction subsamples failed to give a consistent community composition in the lake samples.

Discussion
In this study, we compared three DNA extraction methods on unsorted bulk samples of lentic and lotic macroinvertebrates.DNA metabarcoding, based on unsorted samples, holds great potential for large scale monitoring of freshwaters because of reduced sample processing time, especially for samples with a high invertebrate to debris ratio.Our results show that the DNA extraction method employed influences the success of DNA metabarcoding for assessing macroinvertebrate biodiversity and is dependent on environmental factors such as the benthic substrate, type of vegetation and specimen abundance.
Our sampling sites differed markedly in their physical and biological composition.The Vollen and Skranglehaugen sites are characterised by abundant boreal and alpine species living in variable lotic microhabitats (Aagaard et al. 2004), which were thoroughly sampled with our four-minute kick-samples.The kick-samples resulted in two to three litres of sampled material, including both macroinvertebrates and some living plant material.Our sampling of the lake Jonsvatn depth gradient was more cursory as four 0.01-m 2 Van Veen grab samples in ca.50 m line represented each depth.Thus, different microhabitats were not as extensively sampled as in the river.In addition, the grab samples included dead plant material and excess sediments as well as some living plant material.We took parallel samples at each site for sorting and morphological identification (Majaneva et al. in prep.)and, as an example, a lake grab sample at 15 m included only 17 specimens compared to over 10000 specimens in the river kick-samples.The lentic samples included fewer macroinvertebrates (Figs 3,4) and less DNA (Fig. 2) and they had lower consistency of the subsamples (Fig. 5) than the lotic samples, leading to uncertainty of the me-tabarcoding-based community composition results for the lentic site.Consequently, more stable results may be achieved if such low-specimen lentic site samples were sorted prior to DNA extraction and DNA metabarcoding, as is the usual practice currently undertaken in DNA metabarcoding studies.
All DNA extraction methods tested might have been affected by PCR inhibition because of the presence of non-target organic matter in our bulk samples.However, the lotic samples suffered less than the lentic samples from inhibition, probably because the plant material present in the lotic samples was mainly bryophytes that have a relatively low amount of phenolic compounds (Maksimova et al. 2013), while the lentic samples included remains of vascular plants.There are several ways to deal with inhibition after DNA extraction, but previous studies have shown that species-specific detectability is reduced by 25% using column purification and 52% using dilution (McKee et al. 2015, Buxton et al. 2017).Thus, it is preferable to maximise inhibitor removal already in the DNA extraction step.In our test, the PowerPlant Kit, which includes a specific inhibitor removal step, performed best as it suffered least from PCR inhibition and yielded most DNA in strong correlation with the subsample weight (Fig. 2).The high yield observed in our results is in contrast to previous studies where kits that included inhibitor removal steps had lower yields than kits without this step (Mahmoudi et al. 2011;Eichmiller et al. 2016).Obviously, it is possible that the PowerPlant Kit performs better for extracting plant DNA present in the samples and this DNA can contribute to the higher yield.
Some samples were particularly difficult in terms of PCR inhibitors.In the 15 m lake sample, where also the PowerPlant Kit failed to produce DNA extracts unaffected by PCR inhibition, each DNA extraction method resulted in some close to zero and negative bivariate correlation values (Fig. 3d).Correlation close to zero indicates that the communities are independent in structure, while a negative correlation show that dominant species in one community tend to be rare in the other and vice versa (Diserud et al. 2013).An obvious reason for the low consistency in the lentic subsamples is the lower relative and total number of target specimens compared to the lotic kick samples and subsequent inferior target specimen homogenisation of the lentic samples.In such situations, the time spent on sorting small samples is beneficial compared to using unsorted samples for DNA metabarcoding.Another factor that could have improved the consistency in our lentic subsamples is using DNA extraction developed for a greater amount of starting material, e.g.> 1 g instead of 100 mg (Ranjard et al. 2003;Zinger et al. 2016), although some of such extractions methods have been found unreliable for PCR (Mahmoudi et al. 2011).In any case, using homogenisation of bulk samples when abundance and diversity is low is questionable as it is easy to envisage a situation where the ecological status is incorrectly inferred due to chance.This may also apply for other low abundance and diversity samples, for reliability of abundance data (Elbrecht and Leese 2015).Based on our results, no DNA extraction method was best in terms of number of species (Fig. 3).PowerPlant Kit subsamples showed significantly better repeatability both example if using environmental DNA samples to monitor invertebrate biodiversity.
Often only presence/absence data are used for community studies based on DNA metabarcoding due to the low using incidence based sample nestedness (Fig. 4) and abundance based bivariate correlation values (Fig. 5), but as the overall dissimilarity amongst subsamples was high and proportion of shared species amongst the extraction methods was low in the low-abundance lentic samples (Fig. 4), we do not recommend homogenising low-abundance bulk samples prior to sorting.
Based on our results, all tested DNA extraction methods appear suitable for large, high-abundance unsorted samples, particularly from lotic systems.Smaller samples with fewer specimens are less time-consuming to sort and likely provide more stable results if PCR-inhibiting substances are physically removed prior to DNA extraction, although DNA extraction with the inhibitor removal step performed significantly better in the case of low-abundance samples.In conclusion, we urge for continued testing of homogenised unsorted bulk samples and comparison with results from presorted samples since time spent on sorting samples might be more efficiently used for monitoring more localities using DNA metabarcoding.

Figure 1 .
Figure 1.Experimental setup.The 4-minute kick samples were collected from two sites along the River Atna and the Van Veen grab samples were collected from a depth gradient in the Lake Jonsvatn.From each sample, 12 subsamples were taken and DNA was extracted using three extraction methods (four subsamples per extraction method); HotSHOT extraction, Qiagen DNeasy Blood & Tissue Kit and MO BIO PowerPlant Pro Kit.An extraction blank was done for each extraction method.

Figure 2 .
Figure 2. Subsample sizes (a) and DNA yields (b).Two-way ANOVA followed by Tukey's HSD was used to test differences amongst the extraction methods and F-statistic value (F) and significance (p) are given.The small letters denote significantly different extraction methods at each sampling site based on site and extraction interaction factors.

Figure 3 .
Figure 3. Number of DNA species in each subsample.Twoway ANOVA followed by Tukey's HSD was used to test differences amongst the extraction methods and F-statistic value (F) and significance (p) are given.The small letters denote significantly different extraction methods at each sampling site based on site and extraction interaction factors.

Figure 4 .
Figure 4.The number of shared DNA species amongst extraction methods illustrated using Venn diagrams (a-c; g-i) and pairwise beta diversity values within extraction methods at each site measured as Sørensen dissimilarity (β sor , which is partitioned to nestedness, β sne and species turnover, β sim ) (d-f; j-l), lines denote group means and 95% confidence intervals for β sor and β sne .The small letters denote significantly different extraction methods at each sampling site based on non-overlapping 95% confidence intervals for β sor and β sne .

Figure 5 .
Figure 5. Bivariate correlation values, using the full dataset.Circles denote values for 6 pairs of samples within each extraction method, lines denote group means and 95% confidence intervals and all 66 bivariate correlation values within the sampling site are summarised in the box plots.The small letters denote significantly different extraction methods at each sampling site based on non-overlapping 95% confidence intervals.