Internal transcribed spacer primer evaluation for vascular plant metabarcoding

The unprecedented ongoing biodiversity decline necessitates scalable means of monitoring in order to fully understand the underlying causes. DNA metabarcoding has the potential to provide a powerful tool for accurate and rapid biodiversity monitoring. Unfortunately, in many cases, a lack of universal standards undermines the widespread application of metabarcoding. One of the most important considerations in metabarcoding of plants, aside from selecting a potent barcode marker, is primer choice. Our study evaluates published ITS primers in silico and in vitro, through mock communities and presents newly designed primers. We were able to show that a large proportion of previously available ITS primers have unfavourable attributes. Our combined results support the recommendation of the introduced primers ITS-3p62plF1 and ITS-4unR1 as the best current universal plant specific ITS2 primer combination. We also found that PCR optimisation, such as the addition of 5% DMSO, is essential to obtain meaningful results in ITS2 metabarcoding. Finally, we conclude that continuous quality assurance is indispensable for reliable metabarcoding results.


Introduction
Globally, one million species are threatened by extinction in the near future and 68% of monitored populations are declining (IPBES 2019; Grooten et al. 2020). An estimated two out of five plant species are threatened with extinction (Antonelli et al. 2020). Accurate monitoring is vital to understand and alleviate the driving forces behind the unprecedented biodiversity decline (IPBES 2019; Grooten et al. 2020). The term metabarcoding, which describes the analysis of complex DNA samples with the aim of taxonomic identification, has the potential to provide a scale and accuracy in biodiversity surveys that was previously unattainable for many taxonomic groups (Deiner et al. 2017;Ruppert et al. 2019). However, the increase in technical complexity (Piper et al. 2019), compared to most other monitoring methods (Marsh and Trenham 2008;Prosekov et al. 2020), also implies a higher susceptibility to errors and therefore requires stringent quality control (Deiner et al. 2017;Ruppert et al. 2019;Thalinger et al. 2020). The objective validation of metabarcoding methods can most efficiently be implemented through mock communities, since the composition of environmental DNA (eDNA) samples is unknown (Bjørnsgaard et al. 2017;Elbrecht and Leese 2017;Smith et al. 2017;Zhang et al. 2018;Braukmann et al. 2019;Thalinger et al. 2020). eDNA studies can be performed either on an amplicon basis (metabarcoding) or on a genome basis (metagenomics). Although genomic methods bypass most PCR biases (Porter and Hajibabaei 2018;Piper et al. 2019), their efficiency currently is not on par with metabarcoding Ruppert et al. 2019). In metabarcoding studies, the internal transcribed spacer 2 (ITS2) is widely used because it has a high success rate in species-level identification across the plant kingdom (see Kolter and Gemeinholzer 2020 for a detailed discussion of different plant markers). ITS2 also has one of the largest number of reference sequences in public DNA sequence libraries amongst the most common plant barcode markers (Kolter and Gemeinholzer 2020). Universal criticism, based on the multi-copy nature of ITS2, can be countered by the fact that Song et al. (2012) discovered that 97% of all ITS2 variants in their analysis could only be found within a single species. Song et al. (2012) furthermore reported that intra-genomic distances between variants are smaller than intra-specific or inter-specific distances. Therefore, ITS2 remains to be an important tool in metabarcoding, phylogeny and many other applications (Kay et al. 2006;Feliner and Rosselló 2007;Cheng et al. 2016;Alanagreh et al. 2017;Liu et al. 2019).
Aside from marker choice, primer choice has repeatedly been identified as one of the key factors to facilitate accurate recovery of taxa in a sample (Krehenwinkel et al. 2017;Elbrecht et al. 2019;Hajibabaei et al. 2019;Kelly et al. 2019;Piñol et al. 2019;Li et al. 2020). However, despite the availability of multiple ITS primer sets (White et al. 1990;Gu et al. 2013;Cheng et al. 2016;Moorhouse-Gann et al. 2018;Tremblay et al. 2019), we have identified persistent problems with the amplification success of ITS in multiple recent large-scale barcoding studies (Braukmann et al. 2017;Gill et al. 2019;Jones et al. 2021). The lack of a side-by-side evaluation of current ITS primers makes it impossible to identify and solve the underlying issues of low ITS amplification success rates, compared to other barcode markers.
Our study evaluates ITS primers, based on an in silico and in vitro analysis. The in vitro analysis, performed by using two mock communities, aims to compare the uniformity of amplification achieved with different primers and identify primer-specific amplification biases. The in silico analysis identified mismatches of common ITS primers in Spermatophyta (Cycadopsida, Gnetopsida, Pinopsida, Liliopsida and Magnoliopsida) and led to the design of five new ITS primers with improved universality. Our primer design was focused towards Spermatophyta, as this plant taxon is well represented in public sequence repositories compared to other plant taxa. However, we also reported mismatches in Bryophyta, Fungi, Polypodiopsida and Lycopodiopsida if adequate taxonomic representation were available. Furthermore, due to the high guanine-cytosine (GC) content in a substantial number of ITS2 sequences, we investigated the impact of dimethyl sulphoxide (DMSO) on mock community species retrieval success.

De novo primer design
Spermatophyta sequences containing the ITS region, used as a template to generate primers, were downloaded from GenBank in April 2018 as described in Kolter and Gemeinholzer (2020). Degenerate consensus ITS sequences on family level of the nrDNA LSU (large subunit of the nuclear ribosomal DNA) and nrDNA SSU (small subunit of the nuclear ribosomal DNA) regions were used to identify conserved flanking regions and, subsequently, suitable primer locations in a single step.
To screen for potential primer sequences in the 5.8S nrDNA region, a consensus sequence of all Spermatophyta plant sequences was established (Suppl. material 1: Suppl. file 1). All nucleotides with more than 2% abundance at a specific position were taken into consideration and represented by the respective IUPAC code. Following this, all 24-mers, extracted from the aforementioned single consensus sequence with a maximum of eight IUPAC ambiguity codes and ending in a C or G, were identified as primer candidates. Primer candidates with hairpin structures with an average melting temperature above 50 °C and those with a GC content below 40% or above 80% were filtered out. Primer candidates forming self-dimers or hetero-dimers with any reverse primer (created in this work), with a Gibbs free energy (∆G) higher than one fourth of the maximum Gibbs free energy were discarded or modified. The remaining primer candidates were aligned with 5.8S nrDNA consensus sequences on family level and mismatches were manually resolved by adding a degenerate nucleotide code, if possible. Some mismatches were specific for certain plant families and, therefore, could not be resolved without overly inflating the overall primer degeneracy (Suppl. material 1: Suppl. file 2). The three forward primers have been named in accordance with White et al. (1990) followed by the position (p) within the 5.8S nrDNA region, the specificity (pl = plant, un = universal), the orientation (F = forward, R = reverse) and a revision number (e.g. ITS-3p53plF1). Previously published primers located in the 5.8S nrDNA region played no role in primer positioning or design.
We selected 22 frequently used ITS primers from literature to be analysed alongside the five newly designed primers (Table 1). Primers were validated with Tracheophyta sequences downloaded from GenBank in September 2020 and subsequently processed as described in Kolter and Gemeinholzer (2020). Additional sequence alignment filter steps on family level included the removal of columns with more than 95% gap characters and alignment columns that were supported by less than three species. The sequence coverage of the LSU and SSU region varied from 22,574 to 63,024 (SSU) and from 25,845 to 90,319 (LSU) due to the fact that regions located upstream of the most common ITS primers were less covered in GenBank (Appendix 1). Primer BEL-3, designed by Chiou et al. (2007) and referred to as S3R by Chen et al. (2010), was not evaluated in silico as its distance to the ITS2 region resulted in very poor sequence availability. To evaluate the specificity towards Fungi and Bryophyta, sequences were extracted from Cheng et al. (2016). The specificity analyses of Polypodiopsida and Lycopodiopsida has been limited to the 5.8S nrDNA region (for a detailed sequence list with taxonomic information, see Suppl. material 1: Suppl. file 1).
Primers were compared to the DNA sequences using the R packages ShortRead and Biostrings (Morgan et al. 2009;Pagès et al. 2020). To give each species the same weight in primer evaluation, the sum of mismatches per sequence has been divided by the number of sequences of that respective species. Adding these numbers up on a family level and dividing them by the total number of species per respective family, resulted in a mismatch score, given in percentages (Suppl. material 1: Suppl. file 3). Only mismatch scores above 30%, per primer position, were reported (for an unfiltered list, see Suppl. material 1: Suppl. file 2). Figures were created using ggplot (Wickham 2016). Higher taxonomic names (above the rank of family) have been retrieved by the R package rgbif from the Global Biodiversity Information Facility (GBIF) backbone taxonomy and are meant to be descriptive only (Chamberlain and Boettiger 2017).

Mock community design
We extracted DNA from 58 herbarium specimen by the use of silica-coated ferric beads and the tissue protocol by Sellers et al. (2018). The two mock communities (mix 1, mix 2) were constructed by considering: 1) DNA concen-tration, 2) individual amplification success, 3) taxonomic diversity and 4) inclusion of samples with high GC content. We did not use known mismatches as a criterion for inclusion or exclusion. Non-Spermatophyta species were added to investigate primer specificity. Mix 1 (Appendix 4) was created from 23 different Spermatophyta plus four Lycopodiopsida species (Selaginella kraussiana, Selaginella denticulata, Huperzia carinata and Equisetum arvense) and two fungal species (Aspergillus chevalieri and Pseudogymnoasus pannorum). Mix 2 (Appendix 5) was created from 20 different Spermatophyta plant species plus two fungal species (Talaromyces wortmannii and Aureobasidium pullulans). The mixtures contained equal amounts of DNA from each species (1 ng), as determined by Qubit v.4.0 (Invitrogen, dsDNA HS Assay Kit Q32854). This resulted in two mock communities with a taxonomic spread of 16 Spermatophyta plant orders and a difference in GC content of ~25% (Appendices 5 and 6). This intentionally resulted in mock communities with a wide variety of ITS copy numbers per species.

Sequencing primer design
Primers contained a part of the Illumina TruSeq read primer in addition to the target primer to act as a linker between PCR number one (target amplification) and PCR number two (Illumina indexed adapter being added), which results in the following forward primer: 5'-CAGACGTGTGCTCTTCCGATCT [optional spacer] [target primer]-3' (reverse: 5'-CTACACGACGCTCTTC-CGATCT [optional spacer] [target primer]-3'). A spacer that is non-complementary to the target sequence was added to: 1) prevent more than three identical consecutive nucleotides, 2) stop the TruSeq sequence from interfering with primer binding if it showed the potential to be partially complementary to the target sequence and, 3) to act as a mini-barcode with a length of 3 bp to facilitate pooling of samples with otherwise identical primers after the first round of PCR (termed internal index by Glenn et al. (2019)). The second round of PCR targeted the primer linker added before, which also acts as a part of the read primer in the upcoming sequencing reaction and added the barcoded Illumina indexes (i7 and i5) and the p5 and p7 sequencing adapters (Appendix 3).

PCR optimisations
Due to the relatively high GC content of ITS2 amplicons, we optimised PCR conditions in a pre-trial and concluded that the additive DMSO at a concentration of 5% enables amplification of ITS2 from genomic plant templates (Suppl. material 1: Suppl. file 1). To assess the impact of DMSO on mixed PCR templates, like the mock com-munities used in this study, we compared 0% DMSO to 5% DMSO using the ITS-3p62plF1 + ITS-4unR1 primer combination. All other parameters were identical to the protocols mentioned earlier.

Sequence data analysis
Sequencing data was processed with R (Suppl. material 1: Suppl. file 3) and VSEARCH (Rognes et al. 2016; R Core Team 2020). All filtering steps were applied to individual reads instead of read pairs (paired forward and reverse reads) to retain reads in which one of the sequencing directions did not produce any or not enough data. Such reads were granted permission to bypass the merge step if they had a minimum length of 150 bp after primer removal and quality control. In case both reads passed the quality filters and length requirement, but could not be merged, the longer read was retained and the corresponding paired read was discarded. The first step of filtering raw reads was performed by R and included the removal of sequences showing errors in the primer sequence or sequences that were shorter than 30 nucleotides. Subsequent filtering steps by vsearch removed all reads with any undetermined nucleotides (N) and truncated the reads after the continuously accumulated chance of an erroneously assigned nucleotide reached one (maximum expected errors). The merge step allowed only reads to pass (for exceptions, see above) which showed a minimum overlap of 20 nucleotides with a maximum of five differences and which produced a merged sequence with a minimum length of 60 nucleotides. Merged reads were deduplicated by vsearch at 100% identity in a reversible manner and identified by SINTAX (Edgar 2016) using a database by Ankenbrand et al. (2015). Obvious misidentifications (i.e. due to missing reference sequences) were manually corrected. Mock communities were analysed on a genus level.

Sequence data derived metrics
To assess the successful detection of taxa in the mock communities, we calculated its read abundance for each taxon and primer combination. We define the read abundance for each taxon in each replicate as the proportion of reads for a given taxon relative to 1000 reads. If the median read abundance of all replicates of one taxon of a specific primer combination was above 0.1 and the taxon was detected (≥ one read) in all replicates of the respective primer combination, the taxon was classified as present (Table 3; Appendices 4 and 5). We set a minimum of more than one read in 10,000 per replicate as the detection threshold as, based on the read depth, this requires a taxon to be not represented by singletons only. We furthermore calculated a retrieved taxa score by including partial detections (taxon not detected in all replicates). The most optimal outcome would be for all taxa within one mock community to be detected in all replicates. The retrieved taxa score expresses how many of those detec-tions, in relation to the maximum number of possible detections (i.e. mix 1: 21 * 9 replicates), were successful by a specific primer combination.
The required read depth for each primer combination to detect all but one taxon with a confidence of 95% within a mock community was calculated separately for mix 1 and mix 2 in multiple steps. First, instead of assigning an arbitrary penalty score to missing values (taxon not detected in some replicates), we limited the calculation to taxa which could be detected in all replicates in all primer combinations (Appendices 4 and 5). Second, we subsampled the reads of each replicate for each primer combination in increments of 100 reads, each 100 times. If the specific taxon was found (≥ one read) in at least 95 of the 100 repeated subsamples, it was scored as detected. For example, in one case, we subsampled the reads of replicate three out of nine (mix 1) for the primer combination ITS-3p62plF1 + ITS-4unR1 100 times at a depth of 3000 reads. In this example, Solanum citrullifolium was detected in 88 out of the 100 subsamples and, therefore, was scored as "not detected". Third, the lowest number of subsampled reads required to achieve the aforementioned detection rate of 95% for each replicate and each taxon per primer combination was averaged per taxon. We finally took the 95 th percentile of the aforementioned average values per primer combination to estimate the number of reads required to detect all taxa in that respective primer combination with a confidence of 95%, except one (give outliers less weight), which was allowed a lower confidence score (Table 3).
Due to the sample size of the mock communities, the 95 th percentile was roughly equivalent to the average value of the two taxa requiring the most reads. As the removal of singletons is desirable in some experimental setups, we cloned the workflow mentioned before with the exception that two reads of a specific taxon had to be detected in 95 out of 100 subsamples per primer combination (Table 3). If the number of reads of any primer combination were not sufficient to achieve 95% detection probability for some difficult to detect taxa with low read abundances, we linearly interpolated the read abundances to increase the sample size.

In silico primer evaluation
We tested a total of 26 primers (Table 1), located in three distinct regions (SSU, 5.8S nrDNA and LSU). Primer melting temperatures of ITS primers analysed in this study ranged from 55.4 °C to 73.0 °C (Table 2). With four exceptions (ITS1, ITS-D, ITS-p3 and 58SPL), the hairpin melting temperature was below the melting temperature of the primer itself ( Table 2). Some of the primer variants of primers featuring ambiguous nucleotides (i.e. Uni-PlantF and ITS-3p53plF1), also form hairpin structures with a melting temperature higher than their respective melting temperature (Table 2). However, our results indicate that these primer variants do not match any plant template in our database (Supp. file 5). We listed the plant families that are potentially negatively impacted by hairpin structures with a melting temperature higher than 50 °C (Suppl. material 1: Suppl. file 6). GC primer content ranged from 36% to 80% with a 3' terminal GC stretch of zero to five ( Table 2). The UniPlantR primer variant (CCCGCCTGACCTGGGGTCGC) that matches with the most (~72%) plant template sequences in our database, has a GC content of 80% (Suppl. material 1: Suppl. file 5). The primer lengths varied between 19 bp and 24 bp and resulted in a maximum ∆G between -44.32 and -35.49 kcal per mole ( Table 2). The self-dimer ∆G ranged from -13.74 to -3.61 kcal per mole ( Table 2). The smaller the ratio between maximum ∆G and self-dimer ∆G, the less likely it is that the primer forms troublesome dimers. The number of mono nucleotide and dinucleotide repeats ranged from two to four (Table 2). We identified four primer hotspots within the 5.8S nrDNA region (Fig.  1). Their central motifs are approximately located at the positions: 5-20 bp (ITS-D, ITS-p3, ITS-p2, ITS-2plR1), 30-50 bp (ITS3, ITS-u3, ITS-3p34unF1), 60-75 bp (ITS-S2F, UniPlantF, ITS-3p53plF1, ITS-3p62plF1) and 100-108 bp (ITS-u2, 58SPL).
The total amplicon length of each primer combination can be calculated by adding the primer lengths, the distance to the ITS region of interest and the length of the Note: The top three most optimal values in each column were highlighted (bold) until three values were chosen and the next higher or lower value is different from the previous one. Lycopodiopsida read proportions were not highlighted, as it depends on the particular study whether they are negatively (non-target taxon) or positively (target taxon) evaluated. A taxon was defined as missing if the read abundance were lower than 1 in 10000 reads or if it was not represented in all technical PCR replicates. The retrieved taxa score, in contrast, includes taxa that were represented in less than all technical PCR replicates. The required read depth only considers taxa that could be fully recovered (present in all replicates) in all primer combinations. Values were rounded, but never to zero, if at least one read could be detected. Figure 1. Position of previously published and newly introduced ITS primers in the 5.8S nrDNA region (5' → 3'). Primer positions may be shifted by ± 2 bp in comparison to previously published alignments due to different annotation software being used to identify the 5.8S region. Primers usually used to amplify the ITS1 region (Table 1) have been reverse-complemented to fit the 5' → 3' plus strand orientation of the figure. A black arrowhead indicates the direction of amplification of the original primer sequence (Table 1). Primers introduced in this work are marked by an asterisk.

In silico primer evaluation of the SSU region
Regardless of their position, the primers located in the SSU region, flanking the ITS1 region (ITS1, ITS-A, ITS-u1, ITS-p5), have five or fewer mismatches with the exception of ITS5 (Tables 1 and 2; Suppl. material 1: Suppl. file 2 and Suppl. material 1: Suppl. file 3). However, a stable hairpin structure, with a melting temperature above the annealing temperature, is formed by a 7 bp long self-complementary stretch of the ITS1 primer (Table 2).
In silico primer evaluation of the 5.8S nrDNA region (reverse) The ITS-2plR1 primer displays the lowest number of mismatches (  (Table 2), features a stretch of eight self-complementary bases located near the 3' end and, therefore, bears a high risk of producing unwanted by-products.
In silico primer evaluation of the 5.8S nrDNA region (forward) The primers located in the 5.8S nrDNA region that are usually used to serve as forward primers to amplify the ITS2 region can be split into two major groups by the number of Spermatophyta families with mismatches (Table 2). Group one (ITS3, ITS-D, ITS-S2F, ITS-u3 and ITS-p3) does not match in a minimum of 22 families, while group two (Uni-PlantF, ITS-3p* and 58SPL) has a maximum of 11 mismatched families (

Mock community sequence analysis
The sequencing yielded 3,196,249 paired raw reads (NCBI BioProject PRJNA740294). The filtering step that removed most reads on average (~30%) was to require an exact primer fit at the start of the sequence. A total of 6.5% of all reads failed to merge. As the overall read quality was very high, other filters (combined) removed, on average, less than 5% of the total reads. This results in a total of 2,128,039 merged reads to enter the analysis. There was no contamination detected in the blanks that could have affected the results. Rarefaction curves show a flat slope, except for the primer combination ITS-p3 + ITS-p4, as it yielded less reads than the other primer combinations (Suppl. material 1: Suppl. file 4). All primer combinations are represented by nine replicates, with the exception of ITS-S2F + BEL-3 which is represented by three replicates only. There is a negative Spearman correlation of -0.7 and -0.67 between the GC content and the number of reads recovered for each taxon in mix 1 and mix 2, respectively (p < 0.01, df = 16). This is expected, as species with a relatively high GC content were intentionally included (Appendices 4 and 5). The GC content of taxa ranged from 52% to 77% in mix 1 and from 52% to 79% in mix 2 (Appendices 4 and 5). We recovered no correlation between the number of reads for a specific taxon and the number of mismatches to its respective primer pair. The number of reads from unexpected taxa (reads from species that were not fungi and not included in the mock communities) were negligible with less than 1 in 1,000 reads.

Mock community primer tests
Philodendron angustisectum could not be detected by any primer combination targeting the ITS2 region in mix 1 and Sassafras albidum could not be detected in mix 2, most probably due to DNA degradation. Both will be excluded from the following analysis. The number of missed taxa per mix ranges from 3 to 9 out of 21 for mix 1 and from 3 to 6 out of 18 for mix 2 ( Table 3). The number of retrieved taxa is highest with the ITS-3p62plF1 + ITS-4unR1 primer combination and lowest with ITS3 + ITS4 (Table 3). The same pattern in reverse was observed for the required read depth for both analyses, with and without singletons (Table 3). Primer combinations (mix 1 and mix 2) that contain the reverse primer UniPlantR generally recovered very low reads of Liliopsida, especially Gagea graeca, which becomes nearly undetectable with this primer (Appendices 4 and 5). The 58SPL primer has a similar number of mismatches compared to 3p62plF1 (Table 2). Yet, in direct comparison, we find the key metrics clearly in favour of the 3p62plF1 primer (Table 3). Taking a closer look, the taxa showing a read abundance greater than 200 in the 3p62plF1 + ITS-4unR1 primer combination are increasing in read abundance in the 58SPL + ITS-4unR1 primer combination, while all other taxa, with the exception of Ericales, are decreasing in read abundance (Appendices 4 and 5).
The number of fungi sequences varies from < 1% to 80% of all reads in mix 1 and from < 1% to 88% in mix 2 (Table 3). Abundances are associated with the average number of mismatches in fungi (Tables 2 and 3). The primer combinations UniPlantF + UniPlantR, ITS-3p62plF1 + UniPlantR, ITS-p3 + ITS-p4 and ITS-S2F + BEL-3 are all expected to have three or more mismatch-es in fungi which resulted in less than 1% of reads to be classified as fungi (Tables 2 and 3). The primer combinations UniPlantF + ITS-4unR1, ITS-u3 + ITS-u4 and ITS3 + ITS4 have, on average, less than one mismatch in fungi and produce fungal read abundances of 41% to 87% (Tables 2 and 3). The Lycopodiopsida in mix 1 are represented by 0 to 345 reads per 1,000 reads (Table 3; Appendices 4 and 5).

Mock community PCR optimisations
Addition of 5% DMSO to the PCR mix has a threefold effect: 1) In most cases, it reduces the number of reads necessary to detect the species with a GC content of ≥ 62% (Fig. 2). 2) It reduces the number of taxa that have at least one failed replicate from nine to three for mix 1 and from six to three in mix 2 (Fig. 2). 3) It enables Lindera (mix 1) and Asimina (mix 2) to be detected (Fig. 2). Taxa like Figure 2. Impact of DMSO on mock community representation. The detection chance (colour) per genus (black lines) was tracked per replicate (horizontal coloured lines) by subsampling the reads 100 times randomly in steps of 100 from 100 to 10000 (x-axis). The detection chance was defined as the number of subsamples where the respective genus could be detected by at least one read. Smilax (mix 1) and Quercus (mix 2) that did not show a substantial improvement compared to 0% DMSO, had an overall very low number of reads (Appendices 4 and 5).

Selecting and refining: one possible approach to plant metabarcoding studies
Although universal ITS primers have been proposed in the past (White et al. 1990;Blattner 1999;Cheng et al. 2016;Moorhouse-Gann et al. 2018), the increasing availability of publicly available ITS sequences allows for an improvement in universality that was not previously possible. This is demonstrated by comparing the number of sequences (Cycadopsida, Gnetopsida, Pinopsida, Liliopsida and Magnoliopsida) used for primer design by Cheng et al. (2016) and this study (55,700 and 187,522 5.8S nrDNA sequences, respectively). Moorhouse-Gann et al. (2018) used a geographically restricted (Mauritius and United Kingdom) database of less than 10,000 5.8S nrDNA sequences for primer design.
A plant-specific primer with a low number of mismatches, weak secondary structures and a uniform amplification of a complex DNA mixture has the highest chance to deliver representative results when used in combination with an unknown eDNA sample (Tables 2  and 3). The detailed mismatch lists in this study furthermore allow primers to be customised to fit the exact needs of a given study (Suppl. material 1: Suppl. file 2 and Suppl. material 1: Suppl. file 3). Especially studies including Orchidaceae may find that their unique sequence characteristics warrant extra attention (Suppl. material 1: Suppl. file 3). With the help of the family-level alignments (Suppl. material 1: Suppl. file 3), an orchid-specific primer can be synthesised and added to the degenerate primer mix in a relative quantity of 1/n (n = total number of primer variations of the respective degenerate primer).

In silico primer evaluation
Finding the balance between the elimination of primer mismatches and the number of total primer variants was one of the main design goals of this study. The five primers we generated (ITS-2plR1, ITS-3p* and ITS-4unR1) achieved less mismatches than most of the previously published ITS primers (Table 2). This results are a significant improvement, as it is rather difficult to predict whether or not a mismatch is critically affecting subsequent amplification or may be of minor importance. The impact of primer mismatches on the overall assay success are determined by parameters that include, but are not limited to, salt concentration, annealing temperature, mismatch position, mispaired nucleotide and template concentrations (Ayyadevara et al. 2000;Waterfall et al. 2002;Sipos et al. 2007;Wu et al. 2009;Lefever et al. 2013;Rejali et al. 2018). Studies by Lefever et al. (2013) show that the impact of (non-3'-terminal) primer mismatches on primer melting temperature is hard to predict, ranging from 0-8 °C for one mismatch and 2-20 °C for two mismatches. Despite the fact that PCR performance has been reported to decrease fundamentally in severity if the mismatch is located more than 8 bp (Rejali et al. 2018) or more than 12 bp (Wu et al. 2009) away from the 3' terminal position, multiple studies suggest that mismatches towards the 5' end of primers may not be completely neutral (Sipos et al. 2007;Boyle et al. 2009;Lefever et al. 2013). For these reasons, up-to-date primer development with a minimal number of mismatches is a prerequisite for any successful DNA-based environmental assay, as it increases the number of true positive detections.
Although the study by Cheng et al. (2016) shows a drastic reduction of mismatches of more than 80% in Angiosperms in the ITS-u3 primer versus the ITS3 primer, our results indicate only a reduction of approximately 15% (Table 2). This can be attributed to the way mismatches have been counted. This study counts mismatches and summarises them on a family level, while Cheng et al. (2016) counts mismatches on a sequence-based level. This leads to a scenario where 15% of the included Angiosperm plant families represent 50% of the informative nature of the analysis. We believe that our method of analysis is better suited for a wide breadth of applications, as families with little researched taxa are given the same weight as plant families with a better sequence coverage.
An additional advantage of our method is that it reveals mismatch patterns more easily. If a mismatch occurs consistently in a large number of families and if some of these families are only represented by few sequences, the underlying pattern becomes very hard to catch, if the analysis is based on sequence level only. This can be seen in the ITS-p4 and ITS-u2 primer (Suppl. material 1: Suppl. file 3). The majority of mismatches in these primers could have been eliminated by introducing a single ambiguous nucleotide (ITS-p4: 10G > K and ITS-u2: 10G > R). Making three modifications in the ITS-p3 primer (i.e. 8C>Y, 10G > R, 14C > Y) eliminates most of its mismatches. However, unfortunately, this would introduce a hairpin structure with a melting temperature above the primer melting temperature (Suppl. material 1: Suppl. file 5). In addition to the positioning of the ITS-p3 primer, which adds 140 bp of minimally informative nucleotides to the amplicon length, this limits the potential of the ITS-p3 primer.
The threshold we report of a 30% frequency of mismatch in the corresponding plant family prevents outdated taxonomic assignments (i.e. a genus is placed in the wrong family) and misidentified sequences from introducing noise, which would result in primer design with an unnecessarily high number of ambiguous nucleotides. However, there are rare cases of mismatches of the same nucleotide and at the same primer position that occur at very low frequency within a noticeable number of plant families (Suppl. material 1: Suppl. file 2). One example shown in our data is the UniPlantF and the (partially over-lapping) ITS-3p62plF1 primer (Suppl. material 1: Suppl. file 2). For this reason, although Moorhouse-Gann et al. (2018) introduced an ambiguous nucleotide at primer position 12G > R, we did not replace 18G > R in the ITS-3p62plF1 primer due to the low prevalence of the mismatch as well as this modification resulting in the introduction of a hairpin structure. Due to the high melting temperature of the hairpin structure, which is close to the primer annealing temperature, there is a high risk that the gained universality will not translate into increased PCR success in the targeted plant families. Furthermore, this modification would introduce additional seven new primer variants that do not match any Spermatophyta template (Suppl. material 1: Suppl. file 5). There is also a geographical aspect, as the affected plant families generally have their centre of diversity in tropical regions (Köppen-Geiger climate classification: A). This illustrates that even primers optimised for universality require careful evaluation to find the best possible match between primer characteristics and the target of the respective study.
Strictly eliminating all mismatches by replacing each mismatch by an ambiguous nucleotide increases the count of total primer variations exponentially. Every introduced ambiguous nucleotide comes with a risk of negatively impacting the primer performance (e.g. primer dimers). An alternative for primers with ambiguous nucleotides that have already been optimised as far as possible, considering a reasonable number of total primer variations and still have mismatches in a few plant families, would be to add only those primer variations that lead to the correction of the respective mismatch. Our results allow researchers to supplement existing primers with fixed (without ambiguous nucleotides) primers, targeting specific mismatches of plant families to tailor the primer mixture to their individual needs (Suppl. material 1: Suppl. file 3).
A modification of the ITS-4unR1 primer (11T>D) allows flexibility to either be truly universal or exclude certain taxa, at least partially wind-pollinated plant families. This could be useful in insect pollination studies to reduce the number of reads generated by NGS from some Pinus species (e.g. Pinus sylvestris).
Like previous studies (Cheng et al. 2016;Moorhouse-Gann et al. 2018), this analysis is limited by the quantity of publicly available of ITS sequences and a future re-evaluation of the primers introduced here will be necessary to verify their universality. Filling the taxonomic gaps in public sequence repositories is of high importance, as reliable biodiversity monitoring via metabarcoding cannot be achieved without a complete regional reference database. Currently, there is no plant barcode marker available which covers more than 25% of all plant species, known or unknown (Corlett 2016;Kolter and Gemeinholzer 2020).

Mock community primer test
Due to PCR stochasticity (Kebschull and Zador 2015) and nrDNA copy number variation between 150 and > 100,000 per genome (Prokopowich et al. 2003;Wang et al. 2019) sequencing results of the mock community should be treated qualitatively and not quantitatively. Therefore, we scored the number of missed taxa instead of their relative abundance (read counts) as a measure of primer fitness. The minimally-required read depth provides an indication of how evenly amplification occurred (Table 3). We purposefully did not correct for the differences in nrDNA copy numbers to create mock communities in which some taxa are under-represented by at least one order of magnitude.
Having these aforementioned restrictions in mind, selection of the best suited primers is, in general, the most important tool in minimising additional biases during PCR and library preparation (Schirmer et al. 2015). Although previous studies have focused on optimising existing ITS primers or generating new ones (Morgan et al. 2009;Cheng et al. 2016;Moorhouse-Gann et al. 2018), none of them evaluated primer performance using a mock community. We effectively address this knowledge gap by the first primer efficacy assessment, based on two plant mock communities. The importance of this approach is underlined by the fact that some primers selectively disfavour a certain set of taxa, even without any mismatch being present. Other possible explanations for primer specific taxon bias are secondary structures or stretches with high or low GC content close or at the primer binding site and different binding characteristics of the variants of a primer with ambiguous nucleotides. This may be the case for Gagea graeca (Appendices 4 and 5), as it is nearly absent from sequencing runs using the UniPlantR primer, which has no mismatch in Gagea graeca. In addition to its bias, the mismatches of the UniPlantR primer in other Liliopsida families warrant caution when using this primer in combination with unknown eDNA samples (Suppl. material 1: Suppl. file 3).
One possible explanation for the mixed performance of the 58SPL primer could be that the relatively high primer hairpin melting temperature makes a large proportion of the 58SPL primer unavailable to anneal to its intended template sequence, disfavouring amplicons with a rare prevalence (Table 2). If this is the case, increasing the primer concentration or raising the annealing temperature might alleviate this issue.
The ITS3 + ITS4 primer combination, included in this study, was originally designed to amplify fungi (White et al. 1990). Due to its confirmed high preference towards fungi, as well as the high missed taxa rate, we discourage the use of this primer pair for plant metabarcoding. In spite of this, the ITS3 + ITS4 primer combination still holds its place in recent literature (Gresty et al. 2018;Kamo et al. 2018;Besse 2021;Câmara et al. 2021).
The ITS-u3 + ITS-u4, ITS-p3 + ITS-p4 and ITS-S2F + BEL-3 primer combinations have some performance metrics in their favour (Table 3). Despite this, the relatively high number of plant families with mismatches, as shown by the in silico analysis (Table 2), warrants careful evaluation before using them with eDNA. These results emphasise that, despite the usefulness of mock communities, in silico evaluation can provide valuable additional information on primer universality. While the universality of aforementioned primers could be improved by adding additional ambiguities, the overlap with already existing or already optimised primers, introduced in this paper, indicates that most of them can be replaced with updated versions (Fig. 1).
To our knowledge, this is the first mock community-based primer comparison in the context of metabarcoding in the plant kingdom. Although our mock communities only reflect a fraction of the genetic diversity within plants, we have demonstrated that there are differences between different ITS primer combinations and that these differences are not necessarily based on primer mismatches. The differences not originating from primer mismatches cannot be detected by in silico analyses only, further illustrating the need for mock community studies to verify the results of metabarcoding programmes. In contrast, the universality of the primers can be better assessed by the more comprehensive evaluations made during the in silico analysis. Should the need arise, in essence, we recommend an integrative approach to evaluate primers by combining in silico and mock community analyses. The composition of the mock community should ideally be connected to the respective study area. Considering that we thoroughly screened the whole 5.8S nrDNA region for potential primer sequences, we advise using one of the primers presented in this paper as a starting point for further refinement, if needed.

Mock community PCR optimisations
Varadharajan and Parani (2021) mentions 65% GC content as the threshold for which additives are impacting the PCR success dramatically; similarly, our results indicate 62% GC as the threshold where DMSO starts to improve the sequencing result of a diverse mock community (Fig. 2). In accordance with Varadharajan and Parani (2021), we also show that concentrations below 5% DMSO did not suffice to maximise amplification success (Suppl. material 1: Suppl. file 1). Of the ITS2 plant sequences available in public repositories in 2018, 28.5% had a GC content of 65% or higher (Kolter and Gemeinholzer 2020). The lack of PCR optimisation for high GC levels may invalidate the main findings of previous studies, as plants with high GC levels are eliminated from the amplicon pool during PCR.

Conclusion
In metabarcoding, regardless of the marker used, we point out that it is strongly recommended to integrate mock communities into the workflow to provide additional quality control (Thalinger et al. 2020). Based on the results of our in silico and mock community analyses, we recommend ITS-3p62plF1 in combination with ITS-4unR1 (possibly modified) for amplification of the ITS2 region. The UniPlantF + ITS-p4 (modified) needs additional validation, but shows promise to also improve future ITS2 metabarcoding studies. If sequence length is decisive for primer selection, the SPL58 primer, in combination with the UniPlantR primer, offers the shortest possible ITS2 amplicon. However, secondary structures and mismatches could negatively affect PCR efficiency and universality.
The past has shown that ITS-based studies have struggled with amplification success (Braukmann et al. 2017;Gill et al. 2019;Jones et al. 2021). However, this work eliminates the most pressing issues, namely the lack of PCR optimisations and the lack of a comprehensive primer evaluation. In contrast to other plant markers, the ITS region now combines high informative and the potential for high amplification success. Note: * This GC content is the average of Equisetum (73% GC), Huperzia (72%GC) and Selaginella (57% GC). # Equal parts of Arbutus unedo and Arbutus andrachne. The ITS-S2F + BEL-3 primer combination yielded only 3 replicates. The ITS-p3 + ITS-p4 primer combination yielded less reads than the other combinations. The highest three read abundance values in each column, including fungi, are highlighted (bold). Absence of a taxon in one or more replicates is represented by a grid pattern where each white cell represents one replicate the respective taxa could not be detected in. Philodendron angustisectum could not be detected.

Appendix 4
Appendix 5 Note: The ITS-S2F + BEL-3 primer combination yielded only 3 replicates. The ITS-p3 + ITS-p4 primer combination yielded less reads than the other combinations. The highest three read abundance values in each column, excluding fungi, are highlighted (bold). Absence of a taxon in one or more replicates is represented by a grid pattern where each white cell represents one replicate the respective taxa could not be detected in. Sassafras albidum could not be detected.  Appendix 7