Corresponding author: Carol A. Stepien (cstepien@uw.edu). Academic editor: Bernd Hänfling
© 2020 Matthew R. Snyder, Carol A. Stepien.
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation:
Snyder MR, Stepien CA (2020) Increasing confidence for discerning species and population compositions from metabarcoding assays of environmental samples: case studies of fishes in the Laurentian Great Lakes and Wabash River. Metabarcoding and Metagenomics 4: e53455. https://doi.org/10.3897/mbmg.4.53455
Community composition data are essential for conservation management, facilitating identification of rare native and invasive species, along with abundant ones. However, traditional capture-based morphological surveys require considerable taxonomic expertise, are time consuming and expensive, can kill rare taxa and damage habitats, and often are prone to false negatives. Alternatively, metabarcoding assays can be used to assess the genetic identity and compositions of entire communities from environmental samples, comprising a more sensitive, less damaging, and relatively time- and cost-efficient approach. However, there is a trade-off between the stringency of bioinformatic filtering needed to remove false positives and the potential for false negatives. The present investigation thus evaluated the use of four mitochondrial (mt) DNA metabarcoding assays and a customized bioinformatic pipeline to increase confidence in species identifications by removing false positives, while achieving high detection probability. Positive controls were used to calculate sequencing error, and results that fell below those cutoff values were removed, unless found with multiple assays. The performance of this approach was tested to discern and identify North American freshwater fishes using lab experiments (mock communities and aquarium experiments) and processing of a bulk ichthyoplankton sample. The method then was applied to field environmental (e) DNA water samples taken concomitant with electrofishing surveys and morphological identifications. This protocol detected 100% of species present in concomitant electrofishing surveys in the Wabash River and an additional 21 that were absent from traditional sampling.
Using single 1 L water samples collected from just four locations, the metabarcoding assays detected 73% of the total fish species recorded during four months of an extensive electrofishing survey in the Maumee River, along with an additional nine species. In both rivers, total fish species diversity was best resolved when all four metabarcoding assays were used together, which identified 35 additional species missed by electrofishing. Ecological distinction and diversity levels among the fish communities also were better resolved with the metabarcoding assays than with morphological sampling and identifications, especially using all four assays together. At the population level, metabarcoding analyses targeting the invasive round goby Neogobius melanostomus and the silver carp Hypophthalmichthys molitrix identified all population haplotype variants found using Sanger sequencing of morphologically sampled fish, along with additional intra-specific diversity, meriting further investigation. Overall findings demonstrated that the use of multiple metabarcoding assays and custom bioinformatics that filter potential error from true positive detections improves confidence in evaluating biodiversity.
community composition, cytochrome b, eDNA, Great Lakes, population variation, species detection, species diversity, 12S RNA
Assessments of species compositions and diversities of biological communities are fundamental for understanding their ecology (
Metabarcoding assays employing high-throughput sequencing (HTS) can be used for species identifications and calculations of community diversity, and are more sensitive, less damaging, and relatively time- and cost-efficient than are morphological determinations from capture-based surveys (
PCR inhibition is a challenge in some environmental samples, leading to amplification failure or false negatives (
Our research objectives were to: (1) test the use of multiple metabarcoding assays and an associated bioinformatic pipeline, which combined results from primer sets to reduce possible sources of error and increase confidence, and (2) evaluate the efficiency and accuracy of this approach in field and laboratory experiments. For (2), we compared the results with those from traditional capture-based field sampling of fishes, morphological identifications, and population genetic Sanger sequencing of individuals.
We tested the performance of our metabarcoding assays and bioinformatic pipeline with mock communities, laboratory aquarium experiments, and processing of an ichthyoplankton sample to evaluate sensitivity for assessing inter- and intra-specific diversity (Suppl. material 1). We applied this metabarcoding protocol to eDNA water samples from two large rivers (Figs
Experimental design schematic, depicting Experiment Series A and B, brief methods summary for each experiment in the Series, the aspect of metabarcoding assays tested, and assays applied. * silver carp haplotypic diversity was assessed by
Map showing sample sites in the Wabash River (WAB), Maumee River (MAU), Detroit River (larval fish sample; DRL), Lake St. Clair (LSC), and Lake Erie Islands (LEI) (for Experiment Series A and B). At selected sites, morphological surveys (*) or traditional population genetics sampling and data collection (†) were conducted and compared to eDNA metabarcoding assay results. Wabash River (WAB) and Lake St. Clair (LSC) locations were in too close proximity to be depicted separately (geographic coordinates are in Suppl. material 1: Table S1). Field locations were mapped using STEPMAP (stepmap.com, which holds no copyright on data or layers presented).
All fishes were collected by our lab under Ohio Department of Natural Resources (ODNR) permit #17-159, Michigan Department of Natural Resources permits, or by collaborators with their permits (see Acknowledgements). All native fishes except those used for the mock communities (Suppl. material 1) were released alive and in apparent good health immediately in the sampling area. Invasive fishes (which cannot be legally re-released) were anesthetized and sacrificed under the approved University of Toledo IACUC #205400, “Genetic studies for fishery management” (to CAS and laboratory members) using an overdose of 250 mg/L tricaine methanesulfonate (MS222; Argent Chemical Laboratories, Redmond, WA). Taxonomy and nomenclature followed www.Fishbase.org.
Three metabarcoding assays designed by our lab (
Primers used for our metabarcoding assays. Table indicates primer element function, primer name, direction (Dir; F = forward, R = reverse), and sequences for each primer element. Length of region amplified (NTs; variable for 12S RNA MiFish, for which a mean is given) and annealing temperatures (TA) are provided for target-specific primers. Primer topology was 5′–Illumina sequencing adapter, spacer insert, target-specific primer–3′. Spacer inserts were from
Function | Name | Dir | Sequence 5′–3′ | NTs | TA |
---|---|---|---|---|---|
Target specific | FishCytb | F | GCCTACGCYATYCTHCGMTCHATYCC | 154 | 50 °C |
Target specific | FishCytb | R | GGGTGTTCNACNGGYATNCCNCCAATTCA | | |
Target specific | CarpCytb | F | KRTGAAAYTTYGGMTCYCTHCTAGG | 136 | 54 °C |
Target specific | CarpCytb | R | AARAAGAATGATGCYCCRTTRGC | | |
Target specific | GobyCytb | F | AACVCAYCCVCTVCTWAAAATYGC | 167 | 50 °C |
Target specific | GobyCytb | R | AGTCANCCRAARTTWACRTCWCGRC | | |
Target specific | MiFish | F | GTCGGTAAAACTCGTGCCAGC | ~172 | 65 °C |
Target specific | MiFish | R | CATAGTGGGGTATCTAATCCCAGTTTG | | |
Adapter | Illumina seq | F | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG | | |
Adapter | Illumina seq | R | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG | | |
Spacer inserts | e | F | TCCTATG | | |
Spacer inserts | e | R | CGTACTAGATGTACGA | | |
Spacer inserts | f | F | ATGCTACAGT | | |
Spacer inserts | f | R | TCACTAGCTGACGC | | |
Spacer inserts | g | F | CGAGGCTACAACTC | | |
Spacer inserts | g | R | GAGTAGCTGA | | |
Spacer inserts | h | F | GATACGATCTCGCACTC | | |
Spacer inserts | h | R | ATCGGCT | | |
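The primer topology stated in the table caption (Illumina sequencing adapter, then spacer insert, then target-specific primer, read 5′ to 3′) can be illustrated with a short Python sketch. The sequences are copied from the table. The dictionary layout and the function name `build_primer` are hypothetical conveniences for illustration and are not taken from the published pipeline. Only two assays and two spacer inserts are included for brevity.

```python
# Sequences copied from the primer table; container names are illustrative only.
ADAPTER = {
    "F": "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG",
    "R": "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG",
}
SPACER = {
    "e": {"F": "TCCTATG", "R": "CGTACTAGATGTACGA"},
    "f": {"F": "ATGCTACAGT", "R": "TCACTAGCTGACGC"},
}
TARGET = {
    "FishCytb": {
        "F": "GCCTACGCYATYCTHCGMTCHATYCC",
        "R": "GGGTGTTCNACNGGYATNCCNCCAATTCA",
    },
    "MiFish": {
        "F": "GTCGGTAAAACTCGTGCCAGC",
        "R": "CATAGTGGGGTATCTAATCCCAGTTTG",
    },
}


def build_primer(assay: str, spacer: str, direction: str) -> str:
    """Concatenate adapter + spacer insert + target-specific primer (5' to 3')."""
    return ADAPTER[direction] + SPACER[spacer][direction] + TARGET[assay][direction]


if __name__ == "__main__":
    # Full-length forward primer for the FishCytb assay with spacer insert "e".
    print(build_primer("FishCytb", "e", "F"))
```

Because each spacer insert has a different length and sequence, the same target-specific primer yields four distinguishable full-length primers, one per spacer.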
All primer sets included the Illumina sequencing adapters and four unique spacer inserts, designated e–h, at the 5’ end (Table
To compare among the assays and with results from traditional morphological identifications of capture-based samples, several eDNA water samples were collected concomitant with conventional surveys (Figs
The Ohio Environmental Protection Agency (OEPA) conducted 44 electrofishing surveys at 22 sites in the Maumee River, OH (a western Lake Erie, Great Lakes tributary) in June–September 2012, from which all fishes were identified, counted, and weighed (g) by them (
To evaluate population genetic compositions using metabarcoding assays of eDNA water samples, 1 L of water was collected 10 cm below the surface (site LSC 2 in Fig.
Water samples from turbid habitats often clog filters (
We used our previously published custom bioinformatic pipeline (
Trimmed reads were merged in DADA2 (
Unique ASVs were subjected to the Basic Local Alignment Search Tool (BLAST; https://blast.ncbi.nlm.nih.gov/Blast.cgi) from the command line against custom databases, to obtain the top 500 results per ASV. The custom database consisted of all cyt b or 12S RNA (MiFish) sequences on GenBank for fishes from the Great Lakes, plus predicted future invasive species (
Our cyt b reference database was robust, containing sequences for > 95% of Great Lakes fishes and 100% of the present and predicted invasive species (see
BLAST results (i.e., “hits”) with < 90% query cover or identity were removed. All unique species hits per ASV per assay that passed this filter and had the lowest expectation (e) value (best match) were combined in a list of potential taxa. A lowest common ancestor approach was used for taxonomic assignments for sequences that did not match a single species (i.e., if all hits with the lowest e value were the same genus, then taxonomic assignment was to that level).
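The filtering and lowest-common-ancestor steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' scripts (which are deposited in Dryad): the `Hit` record, the function name `assign_taxon`, and the example values are hypothetical, the genus is taken as the first word of the binomial, and assignment above the genus level (e.g., to family) is left out because it would require a taxonomy lookup.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Hit:
    """One BLAST hit for an ASV (hypothetical record layout)."""
    species: str        # binomial, e.g. "Carpiodes cyprinus"
    query_cover: float  # percent
    identity: float     # percent
    evalue: float


def assign_taxon(hits: List[Hit], min_pct: float = 90.0) -> Optional[str]:
    """Return a species, a genus, or None for one ASV."""
    # Step 1: remove hits with < 90% query cover or identity.
    kept = [h for h in hits if h.query_cover >= min_pct and h.identity >= min_pct]
    if not kept:
        return None
    # Step 2: keep only the hits sharing the lowest e-value (best match).
    best = min(h.evalue for h in kept)
    species = {h.species for h in kept if h.evalue == best}
    if len(species) == 1:
        return next(iter(species))
    # Step 3: lowest common ancestor, limited here to the genus level.
    genera = {name.split()[0] for name in species}
    if len(genera) == 1:
        return next(iter(genera))
    return None  # a higher rank would need a taxonomy table (not shown)


if __name__ == "__main__":
    example = [
        Hit("Carpiodes cyprinus", 100.0, 99.0, 1e-60),
        Hit("Carpiodes velifer", 100.0, 99.0, 1e-60),
        Hit("Ictiobus bubalus", 85.0, 99.0, 1e-70),  # fails the query-cover filter
    ]
    print(assign_taxon(example))  # Carpiodes
```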
For metabarcoding assay results, species incidences were scored as valid when they were greater than the calculated error cutoff from the positive controls (see “DNA capture, extraction, and library preparation” above and Suppl. material 1) in a single assay or occurred in multiple assays (hereafter termed the multi-assay approach). This approach aimed to reduce false positives and simultaneously compensate for potential primer set biases. We compared those results to the use of 0.1% as the cutoff (the known rate of index-hopping on MiSeq;
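The multi-assay scoring rule can be sketched in Python as shown here. This is an illustration only: the function name, the data layout (read proportions per species per assay for one sample), the placeholder species labels, and all numeric values are hypothetical, and the actual cutoffs in the study were calculated from positive controls rather than fixed as below.

```python
from typing import Dict, Set


def score_detections(
    proportions: Dict[str, Dict[str, float]],
    cutoffs: Dict[str, float],
) -> Set[str]:
    """Return species scored as valid detections in one sample.

    proportions: {assay: {species: proportion of reads in that assay}}
    cutoffs:     {assay: calculated error cutoff for that assay}
    A species is valid if it exceeds the cutoff in at least one assay,
    or if it occurs (at any non-zero proportion) in two or more assays.
    """
    assays_seen: Dict[str, Set[str]] = {}
    above_cutoff: Set[str] = set()
    for assay, per_species in proportions.items():
        for species, prop in per_species.items():
            if prop <= 0:
                continue
            assays_seen.setdefault(species, set()).add(assay)
            if prop > cutoffs[assay]:
                above_cutoff.add(species)
    return {
        species
        for species, assays in assays_seen.items()
        if species in above_cutoff or len(assays) >= 2
    }


if __name__ == "__main__":
    sample = {
        "FishCytb": {"species A": 0.02, "species B": 0.0004, "species C": 0.0003},
        "MiFish": {"species B": 0.0002},
    }
    cutoffs = {"FishCytb": 0.001, "MiFish": 0.001}  # hypothetical values
    print(sorted(score_detections(sample, cutoffs)))  # ['species A', 'species B']
```

In this example, species B falls below the cutoff in both assays but is retained because it occurs in two assays, whereas species C is removed because it is below the cutoff and occurs in only one.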
Results from our metabarcoding assays were compared with their concomitant morphological identifications from traditional capture-based surveys (i.e., electrofishing transects A1 and A2, or Sanger sequencing of sampled fish in B1 and B2). Species appearing unique to the metabarcoding assays and/or capture-based surveys were evaluated from individual samples and sampling regions. Some morphological or metabarcoding identifications were restricted to the genus or family levels (see). False negatives were determined using a relaxed detection criterion, which recorded a species as being present when distinguished at the genus level in either the morphological or metabarcoding results. Species richness values were compared between the morphological and the metabarcoding approaches (and among the individual and multi-metabarcoding assays) using t-tests in R (
Habitat differentiation was assessed with Non-metric Multi-Dimensional Scaling (NMDS) using Bray-Curtis dissimilarity in VEGAN (
Numbers and proportions of population haplotypes were calculated from HTS reads and traditional Sanger sequencing of individuals. FST and exact tests of population differentiation in Arlequin (
No PCR amplifications occurred in the 0 hr control samples from the goby aquarium experiments (C3) or in any of our negative extractions, centrifugations, no-template PCR, indexing, or clean-up controls (see Suppl. material 1). A total of 27,961,011 sequence reads were obtained among all libraries (mean per sample per assay ± SE = 229,189 ± 18,645; Suppl. material 1: Table S3). A mean of 204,320 ± 17,738 reads per sample per assay was successfully trimmed. DADA2 merged a mean proportion of 0.80 ± 0.01 of trimmed reads, with a mean of 75.4 ± 11.4 ASVs per sample per assay. Of those, a mean of 23 ± 1.7 had BLAST hits to our fish databases that passed the identity and query cover filter of > 90% (mean query cover = 99.83 ± 0.01%, mean identity = 99.14 ± 0.02%). Single species identifications included 2,342 ASVs (89% of total 2,631 hits passing the filter), and 59, 103, 43, and 75 hits were resolved to the genus level for the FishCytb, CarpCytb, GobyCytb, and 12S RNA MiFish assays, respectively. These included 14 genera, primarily: Carassius (13% of the overall genus level hits), Carpiodes (19%), and Ictiobus (39%), for which either morphological identifications and/or another one of our metabarcoding assays resolved a congener to species. Nine 12S RNA hits were resolvable only to the family level, of which six were Cyprinidae, and three were Catostomidae; all were discarded. Not all eDNA samples led to successful libraries for every assay, presumably due to primer-specific inhibition (dashes in Table
Laboratory experiments and analyses of the larval fish sample showed that our assays and accompanying bioinformatic pipeline were highly sensitive, with few false negatives and high detection probability (Suppl. material 1). Several false negatives, i.e., taxa discerned by morphological identifications but not with single metabarcoding assays using the calculated error cutoff, were positive when 0.1% was used for filtering (N = 13) or when all ASVs were accepted (N = 48). Our multi-assay approach detected more species than did the single assays (see “Community comparisons” below). Using the multi-assay approach, just one false negative (with the calculated error cutoff) was positive when all ASVs were accepted. When ASVs above 0.1% were accepted, several index-hops were apparent, including for the Black Sea sprat Clupeonella cultriventris, a possible future invader of the Great Lakes that has not been documented in North America (
Morphological identifications from capture-based surveys did not completely overlap the metabarcoding assay results. For the total of all samples, our multi-assay metabarcoding approach detected more taxa than did morphological determinations (Table
Experiment A1: Morphology discerned 18 fish species belonging to five families, from electrofishing surveys in the Wabash River. Our metabarcoding assays identified all of those (100%) to species, along with an additional 20 species (Fig.
Sample diversity based on morphological identifications versus metabarcoding results (from Experiment Series A and B). (A) Species richness from morphological surveys and metabarcoding assays. (B) Number of species (richness) discerned with morphology and species uniquely found with metabarcoding results (morphological “false negatives”). Proportion of false negatives in metabarcoding results (in parentheses). Regional samples were combined, respectively, for the Maumee (1–4) and Wabash (1–2) rivers. For the Maumee River, “all” indicates results for all species in summer 2012 electrofishing surveys regardless of whether their concomitant eDNA data were processed.
A | Richness | |||||
---|---|---|---|---|---|---|
Location | Morphology | Multi-assay | FishCytb | CarpCytb | GobyCytb | 12S RNA MiFish |
Maumee River 1 | 13 | 38 | 19 | 14 | 18 | 18 |
Maumee River 2 | 22 | 25 | 15 | 12 | 9 | 7 |
Maumee River 3 | 23 | 26 | 18 | 10 | 6 | 16 |
Maumee River 4 | 23 | 28 | 17 | 6 | 15 | 19 |
Maumee River 1–4 | 33 | 42 | 26 | 20 | 24 | 31 |
Lake St. Clair 1 | – | 16 | 6 | 7 | 8 | 8 |
Lake St. Clair 2 | – | 16 | – | 10 | 12 | 12 |
Lake St. Clair 3 | – | 16 | 9 | 11 | 9 | – |
Lake St. Clair all | – | 23 | 6 | 16 | 16 | 8 |
Lake Erie Islands | – | 14 | 7 | 8 | – | 9 |
Wabash River 1 | 13 | 30 | 8 | 14 | – | 21 |
Wabash River 2 | 12 | 29 | 14 | 10 | 5 | 16 |
Wabash River 3 | – | 27 | 17 | 11 | 9 | 11 |
Wabash River 1–2 | 18 | 37 | 22 | 20 | 10 | 30 |
B | Richness | Unique to metabarcoding assays (false negatives) | ||||
Location | Morphology | Multi-assay | FishCytb | CarpCytb | GobyCytb | MiFish |
Maumee River 1 | 13 | 20 (0.00) | 1 (0.23) | 1 (0.38) | 3 (0.23) | 7 (0.31) |
Maumee River 2 | 22 | 6 (0.41) | 3 (0.45) | 3 (0.73) | 1 (0.73) | 1 (0.77) |
Maumee River 3 | 23 | 4 (0.26) | 0 (0.57) | 1 (0.70) | 0 (0.78) | 3 (0.52) |
Maumee River 4 | 23 | 10 (0.30) | 1 (0.61) | 1 (0.83) | 4 (0.65) | 6 (0.57) |
Maumee River 1–4 | 33 | 18 (0.12) | 2 (0.36) | 4 (0.67) | 6 (0.58) | 9 (0.39) |
Maumee River all | 59 | 9 (0.27) | 0 (0.56) | 2 (0.76) | 3 (0.64) | 6 (0.54) |
Wabash River 1 | 13 | 16 (0.23) | 0 (0.54) | 5 (0.54) | – | 13 (0.46) |
Wabash River 2 | 12 | 14 (0.08) | 7 (0.67) | 1 (0.50) | 2 (0.83) | 7 (0.17) |
Wabash River 1–2 | 18 | 21 (0.00) | 9 (0.33) | 5 (0.39) | 2 (0.83) | 14 (0.22) |
Families (and numbers of fish species, in parentheses) detected with morphology (Morph), metabarcoding assays, or using both methods (for Experiment Series A). Samples taken concomitant with electrofishing surveys were combined (Maumee River 1–4, Wabash River 1–2). Maumee River all: comparison of four eDNA water samples to 44 electrofishing transects from 22 sites in the Maumee River, June–September 2012.
Experiment A2: 33 species belonging to 11 families were detected with electrofishing surveys conducted concomitant with eDNA water sampling from four sites in the Maumee River. Our metabarcoding analyses detected 29 (88%) of those, along with an additional 19 species. A total of 59 species belonging to 12 families were collected among all 44 morphological surveys from 22 Maumee River sites across four months of intensive sampling by the OEPA during summer 2012. Our metabarcoding assays discerned 43 (73%) of those species and an additional nine species from just single 1 L water samples at only four of the sites (corresponding to 9% of the OEPA’s surveys, and 18% of their total number of sampling sites).
Only four of the fish species from the Maumee River electrofishing surveys conducted concomitant with our four single eDNA water samples were not detected with metabarcoding assays – northern hogsucker Hypentelium nigricans, longnose gar Lepisosteus osseus (the sole false negative in our high diversity aquarium experiments; Suppl. material 1: Experiment C2), stonecat Noturus flavus, and white crappie Pomoxis annularis (Fig.
As expected, species results from the single assays did not completely overlap. Of the 347 individual species detections across all samples, 111 (32%) occurred in single assays. Twenty-one (6%) of the detections were scored as positive according to the multi-assay criteria, meaning that their hits fell below the cutoff values for multiple assays in the same sample. Mean proportions of false negatives from single assays in samples taken concomitant with electrofishing surveys were 0.48 ± 0.04. When all samples from a single region were combined, this value fell to 0.34 ± 0.04. The multi-assay approach had significantly fewer false negatives after SBC than did the single assays (p < 0.004 for all). Mean proportions of false negatives using the multi-assay approach were 0.17 ± 0.05 for the individual sampling sites, and 0.09 ± 0.03 when all samples from each region were combined. Six of the common false negatives from the 12S RNA assay were due to species lacking reference 12S RNA sequences in GenBank (i.e., quillback Carpiodes cyprinus, highfin carpsucker Ca. velifer, shorthead redhorse Moxostoma macrolepidotum, ghost shiner Notropis buchanani, and white crappie).
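The false-negative proportions reported here and in the richness table are consistent with dividing the number of morphologically identified species that the assays missed by the total number of morphologically identified species. The Python sketch below illustrates that calculation, including one reading of the relaxed genus-level criterion. The function names are hypothetical, and the 29 shared species are represented by placeholder labels because only the four undetected Maumee River species are named in the text.

```python
from typing import Set


def is_detected(species: str, assay_taxa: Set[str]) -> bool:
    """Relaxed criterion: exact species match, or a genus-only assignment."""
    genus = species.split()[0]
    return species in assay_taxa or genus in assay_taxa


def false_negative_proportion(morph_species: Set[str], assay_taxa: Set[str]) -> float:
    """Proportion of morphologically identified species missed by the assays."""
    missed = [s for s in morph_species if not is_detected(s, assay_taxa)]
    return len(missed) / len(morph_species)


if __name__ == "__main__":
    # The four Maumee River species caught by electrofishing but not detected
    # with the metabarcoding assays at the four concomitantly sampled sites.
    missed = {
        "Hypentelium nigricans",
        "Lepisosteus osseus",
        "Noturus flavus",
        "Pomoxis annularis",
    }
    # Placeholder labels standing in for the 29 species found by both methods.
    shared = {f"Placeholder{i} sp" for i in range(29)}
    morphology = missed | shared          # 33 species
    assays = shared | {"Extra1 sp"}       # plus a taxon unique to metabarcoding
    print(round(false_negative_proportion(morphology, assays), 2))  # 0.12
```

The same arithmetic applied to the full summer survey (59 species, 43 detected) gives 16/59, or about 0.27.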
Complete (100%) detection from metabarcoding assays occurred for Experiment A2 in the MAU 1 sample, which had the lowest morphological species richness (Table
A mean of 4.6 ± 1.0 taxa in the single assays or 14.5 ± 5.3 using the multi-assays were undetected in the concomitant morphological samples. Two unlikely false positives occurred with the 12S RNA MiFish assay. There were several apparent matches to the non-native blacktip jumprock Moxostoma cervinum in the Wabash River, likely due to most Moxostoma being absent from the 12S RNA database – six of the seven species known from the watershed (
The metabarcoding assays found every invasive species collected in the morphological capture-based surveys, including: silver carp, common carp Cyprinus carpio, flathead catfish Pylodictis olivaris, round goby, and white perch (Suppl. material 1: Table S4). Ghost shiner eDNA was not detected from two Maumee River sites (A2) where it was physically collected, but was found in metabarcoding assay results from another sample in the region. Both of those false negatives occurred at < 0.1% of the total fish biomass. Our assays identified more samples that contained invasive species. For example, just one electrofishing transect in the Wabash River (A1) caught silver carp, yet every metabarcode sample detected that species in at least one assay. Our assays detected invasive grass carp in the Maumee (A2) and Wabash rivers (A1), where it is known to occur but was not caught. Tubenose goby was not captured in our Maumee River samples (A2), but was present in the eDNA metabarcoding results (and is known to occur there).
Some species were detected in just one of the geographic regions we sampled. In the Wabash River (Experiment A1), these were: blue sucker Cycleptus elongatus and invasive silver carp Hypophthalmichthys molitrix, identified with both morphology and metabarcoding assays, and gravel chub Erimystax x-punctatus and mooneye Hiodon tergisus, identified by eDNA metabarcoding assays alone. Species detected in the Maumee River samples alone (Experiment A2) were: pumpkinseed Lepomis gibbosus, orangespotted sunfish Lepomis humilis, invasive ghost shiner, spotted sucker Minytrema melanops, and common logperch Percina caprodes with both morphology and metabarcoding assays, and black crappie Pomoxis nigromaculatus, spoonhead sculpin Cottus ricei, orangethroat darter Etheostoma spectabile, and invasive tubenose goby solely with metabarcoding assays. We surveyed Lakes St. Clair (Experiment A3) and Erie (Experiment A4) with eDNA metabarcoding assays alone. Black bullhead Ameiurus melas was found solely in the Lake Erie Islands sample, and invasive chum salmon Oncorhynchus keta solely in Lake St. Clair (where it has been introduced for sport fishing).
Species richness values obtained from single samples, as well as for regional analyses, were higher using the multi-assay approach than with any single metabarcoding assay or morphology (Table
NMDS plots discerned more discrete groupings of regional samples with multi-assays than with single assays (Fig.
Some samples did not cluster by geographic region in the dendrograms when single assay results were used together with morphological identifications (Fig.
Non-metric multi-dimensional scaling plot based on binary Bray-Curtis dissimilarity for presence/absence of species detected by metabarcoding assays and morphological capture-based methods (when both were conducted concomitantly; Experiment Series A).
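For presence/absence data, Bray-Curtis dissimilarity reduces to the Sørensen form, (b + c) / (2a + b + c), where a is the number of species shared by two samples and b and c are the numbers unique to each. The sketch below is a plain-Python illustration of that quantity only. It is not the VEGAN implementation used in the analyses, the function name and species labels are hypothetical, and returning 0.0 for two empty samples is a convention chosen here.

```python
from typing import Set


def binary_bray_curtis(sample_a: Set[str], sample_b: Set[str]) -> float:
    """Bray-Curtis dissimilarity on presence/absence data (Sorensen form)."""
    if not sample_a and not sample_b:
        return 0.0  # convention for this sketch; undefined in the formula
    shared = len(sample_a & sample_b)
    return 1.0 - (2.0 * shared) / (len(sample_a) + len(sample_b))


if __name__ == "__main__":
    site_1 = {"species A", "species B", "species C"}
    site_2 = {"species B", "species C", "species D"}
    # Two shared species, one unique to each site: (1 + 1) / (2*2 + 1 + 1)
    print(round(binary_bray_curtis(site_1, site_2), 3))  # 0.333
```

A matrix of these pairwise values among samples is the kind of input that an NMDS ordination takes.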
Dendrogram of relationships among metabarcoding and morphological samples, using binary distances and Ward’s D2 agglomeration method (Experiment Series A). (A) Results from individual metabarcoding assays and morphological data. Fish = FishCytb, Carp = CarpCytb, Goby = GobyCytb, MiFish = 12S RNA, Morph = morphological sampling. (B) Results from the multi-assay approach (Multi) and morphological (Morph) data. See Fig.
Aquarium experiments using round gobies possessing haplotypes RG 1, 8, and 57 showed no false negatives, but some false positives fell above the error cutoff (Suppl. material 1: Experiment C3). Traditional Sanger sequencing of tissue samples identified three round goby haplotypes in Lake St. Clair (Experiment B1): “RG 1” (78% of individuals), “8” (12%), and “57” (10%) (Fig.
Sanger sequencing discerned three silver carp haplotypes that were physically sampled in the Wabash River (Experiment B2: designated as “SC A”, “B”, and “H”), constituting 49%, 48%, and 3% of that population sampled at a separate time (
Population genetic haplotypic diversity assessed with metabarcoding assays versus traditional DNA sequencing (Experiment Series B). Round goby (RG) in Lake St. Clair (LSC2: surface, LSC3: benthos) and silver carp (SC) haplotypes in the Wabash River (WAB) assessed with traditional population genetic sequencing (Trad) and the GobyCytb and CarpCytb metabarcoding assays. New cytochrome b haplotypes (N) not described from either species to date, and having sequence frequencies <1% are unlabeled, for visual clarity.
Our multi-assay metabarcoding approach and accompanying bioinformatic pipeline demonstrated high detection probability that was better than or similar to traditional morphological sampling, with few false negatives and additional species discerned despite considerably less sampling effort. The custom bioinformatic pipeline improved overall sequence run quality and removed apparent index-hops and/or cross-contamination by using spacer inserts, which served as indices for the initial amplifications. Sequencing error was calculated using positive controls of marine species that could not live in this freshwater environment. ASVs whose proportions were below the error cutoff were removed unless they occurred in multiple markers. Proportions of sequence reads showed weak but positive correlations to species biomass. Metabarcoding results for the round goby and the silver carp assays identified all of the haplotypic variation found with traditional population genetics Sanger sequencing. Additional “new” haplotypes found with metabarcoding assays may have resulted from sequencing error not removed by our bioinformatic pipeline, meriting further testing.
We evaluated various frequency-based filters to eliminate index-hops and/or cross-contamination from positive controls or other samples (since the likelihood that sequencing error would result in a BLAST hit to a different species was low). Given the large number of samples that can be pooled on a HTS run, such sources of error could result in false positives (
Our results showed that the cyt b assays revealed strong positive relationships between input concentrations of DNA and the proportions of sequence reads in mock communities (Suppl. material 1: Experiment C1). eDNA water samples showed weaker, but most often positive relationships, depending on the assay used. These relationships likely were affected by environmental conditions, such as eDNA transport and settling rates in water (
Elucidating overlap in community diversity between metabarcoding assays and traditional morphological capture-based survey methods was an important goal of our Experiment Series A. Other investigations have shown wide variation in species overlap between these approaches, ranging from 25% (
The trend for higher detection efficiency across broader geographic regions or watersheds compared to findings at individual sampling sites is common (
Sequence identifications have been shown to be related to stringency of bioinformatic filtering (
Unsurprisingly, more intensive sampling regimes increased the total diversity obtained and improved overlap between metabarcoding assays and capture-based surveys (
Traditional capture or visual surveys can be thwarted by physical or environmental conditions (
The 100% detection efficiency of our metabarcoding assays at MAU 1 may have been due to eDNA samples resolving a larger spatial extent than capture-based methods, particularly in large lotic systems (
Invasive species discovered solely with our metabarcoding assays all were within their known geographic ranges, except for round goby in the Wabash River (A1). The round goby also was identified from eDNA in nearby bait shops (
eDNA metabarcoding assays often have described greater diversity in habitats compared to morphological identifications from capture-based sampling (see
Our multi-assay approach and most of the single metabarcoding assays differentiated taxonomic compositions and diversity levels among geographic regions more effectively than did morphological capture-based surveys. This finding is in concert with results from other metabarcoding investigations (
Our goby aquarium experiments (Suppl. material 1: Experiment C3) and additional studies have demonstrated the potential of metabarcoding assays to assess population-level diversity (
Aquarium eDNA experiments by
To our knowledge, few investigations have examined whether haplotype identities and their frequencies from traditional Sanger sequencing of tissue samples matched those found with metabarcoding assays in the environment.
Haplotypic diversities of populations targeted for metabarcoding assays must be evaluated in order to design primers that are best able to differentiate them. In theory, metabarcoding assays have the ability to distinguish more variation and/or more accurately assess population genetic diversity, due to the much larger numbers of individuals that may be screened, as for thousands of dreissenid mussel larvae by
Our metabarcoding bioinformatic pipeline revealed high species-level discrimination and low false positive probability employing the multi-assay approach. This was achieved by using multiple primer sets to alleviate false negatives stemming from possible bias, and applying stringent bioinformatic filtering to reduce any false positives from cross-contamination, index-hopping, or sequencing error. Results from these primer sets were combined so that a detection was accepted when it exceeded the calculated error cutoff in a single assay or occurred in multiple assays. The multi-assay approach discerned nearly all of the diversity sampled over much more extensive traditional electrofishing surveys, and yielded an appreciable number of additional species. We found that our multi- and single assays alike better differentiated among ecological regions and their communities than did morphological identifications from conventional sampling.
Future work likewise should employ a library preparation and bioinformatic pipeline that reduces error using indexing of the initial PCR, and removal of cross-contamination and index-hopping. Such research also should assess effectiveness using positive controls and/or mock communities, for every assay on every run. Multiple values for frequency-based filtering should be evaluated with these positive controls, mock communities, and test samples to determine which performs best. More intensive eDNA water sampling at each location likely would improve the performance of the multi-assay metabarcoding results presented here. Metabarcoding reads potentially could be used as a proxy for proportional taxon abundances within the system, but results are marker dependent and should be interpreted with caution. Current technological limitations may render population genetic analyses using metabarcoding data problematic, but as technology improves, error incidences decline, and longer sequence read lengths become more feasible, this application shows promise.
Scripts for the bioinformatic pipeline and custom BLAST databases are in the Dryad database (https://doi.org/10.5061/dryad.7m0cfxprx). FASTQ files for all samples sequenced are in the NCBI Sequence Read Archive (BioProject Accession: PRJNA625378). Suppl. material 1 contains additional details, including further experimental results.
This is contribution 4967 from the NOAA Pacific Marine Environmental Laboratory (PMEL) and 2020-1061 from the University of Washington’s Joint Institute for the Study of the Atmosphere and Ocean (JISAO). We thank Nathaniel Marshall for aiding in library preparation and bioinformatic pipeline development; Thomas Blomquist, Carson Prichard, and James Willey for early consultation and work on primer development; Edward Roseman for collecting and identifying the species in the larval fish sample; Yuriy Kvach, Matthew Neilson, and Mariusz Sapota for collecting unestablished potential invaders for use in mock communities; Mark Pyron for conducting electrofishing surveys in the Wabash River; David Altfater and the entire Ohio Environmental Protection Agency stream sampling crew for conducting the electrofishing surveys and collecting water samples in the Maumee River; and Keith Wernert for obtaining permission for us to sample display aquarium 2 and providing the census of fishes within. The laboratory of C.A.S. provided logistical support, especially Shane Yerga-Woolwine and Anna Elz. Additional support for grant and project management was provided by Thomas Ackerman, Frederick Averick, David Butterfield, Frank Calzonetti, Kevin Czajkowski, Timothy Fisher, Darlene Funches, Anna Izzi, Daryl Moorhead, and Scott McBride. Jonathan Bossenbroek, Kerry Naish, and William Von Sigler provided valuable comments on an early version of the manuscript.
The research was funded by USEPA grants GL-00E01149-0 (to C.A.S. and W. Von Sigler) and GL-00E01898 (to C.A.S. and Kevin Czajkowski); the latter was partially subawarded from the University of Toledo to the University of Washington Joint Institute for the Study of the Atmosphere and Ocean (JISAO) for research at the new Genetics and Genomics Group of C.A.S., NOAA Pacific Marine Environmental Laboratory (PMEL). Additional support was provided by NOAA OAR (Oceanic and Atmospheric Research) ‘Omics funding to the Genetics and Genomics Group led by C.A.S. M.R.S. was partially supported by a graduate student fellowship from the University of Toledo (2016–2019), conducted under the advisement of C.A.S., and by JISAO (2019).