Research Article
Corresponding author: Devin N. Jones (dnjones@usgs.gov). Academic editor: Bernd Hänfling
© 2024 Devin N. Jones, Ben C. Augustine, Patrick Hutchins, James M. Birch, Kevan M. Yamahara, Scott Jensen, Rodney Richardson, Regina Trott, James R. Campbell, Elliott Barnhart, Adam J. Sepulveda.
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation:
Jones DN, Augustine BC, Hutchins P, Birch JM, Yamahara KM, Jensen S, Richardson R, Trott R, Campbell JR, Barnhart E, Sepulveda AJ (2024) Autonomous samplers and environmental DNA metabarcoding: sampling day and primer choice have greatest impact on fish detection probabilities. Metabarcoding and Metagenomics 8: e122375. https://doi.org/10.3897/mbmg.8.122375
Unprecedented rates of loss in biodiversity and ecosystem function necessitate the use of rapid, efficacious, and cost-effective biomonitoring tools. The combination of autonomous samplers and high-throughput sequencing (i.e., “metabarcoding”) of environmental DNA (eDNA) samples enables characterization of entire communities at high frequency and can be an important tool for conservation and management, allowing researchers to track fluctuations in biodiversity. We deployed two autonomous samplers at two U.S. Geological Survey streamgage sites in the upper Snake River (Wyoming and Idaho, USA) to collect eDNA samples from July–September 2021 and 2022 to characterize fish diversity. We used a probabilistic approach to evaluate the effects of water temperature, water discharge, filter pore size, water volume filtered, number of samples collected, timing, and primers on the probability of detecting eDNA from fish species known to be present. We detected eDNA from 13 of 15 species present in these areas of the Snake River. Overall, we did not find evidence that filter pore size, water volume filtered, water discharge, or water temperature affected the probability of detecting fish species’ eDNA. By contrast, primers and sampling day affected fish detection probabilities, indicating that primer choice and sampling day can lead to over- or underestimates of species diversity. These results suggest that users should consider sampling on non-consecutive days and selecting the primer set that maximizes species detections.
Autonomous samplers, biomonitoring, eDNA, metabarcoding, USGS streamgage
Biodiversity monitoring informs species conservation, ecosystem service management, and progress toward major national and international policy initiatives (e.g.,
Environmental DNA (eDNA) metabarcoding analysis is a biomonitoring approach that allows biological communities to be described from short sections of DNA found in easy-to-collect environmental samples. The approach is accurate, rapid, cost-effective, non-invasive and can be consistently and broadly applied (
The amount of eDNA available to be sampled at a site is temporally variable because it is a product of dynamic biological and environmental interactions. Organisms release complex mixtures of DNA into the environment at inconsistent rates and amounts that are associated with taxonomy, size, activity, and phenology, among other factors (
Autonomous eDNA sampling instrumentation presents an opportunity to overcome human resource limitations for ecosystem monitoring over time and space. Autonomous eDNA instruments, like Monterey Bay Aquarium Research Institute’s environmental sample processor (ESP) platforms (Moss Landing, CA, USA), can collect and preserve dozens of eDNA water samples without a human operator (e.g.,
Understanding of appropriate temporal sampling frequencies for eDNA biomonitoring is limited relative to many other steps of the eDNA workflow (
To gain insight into temporal sampling tradeoffs for eDNA biomonitoring programs, we leveraged existing eDNA samples that were collected for invasive species surveillance by autonomous samplers at sub-daily frequencies for three months, from summer (July) to autumn (September), in 2021 and 2022. The autonomous samplers were deployed at U.S. Geological Survey (USGS) streamgages immediately downstream of two reservoirs in the Snake River (Idaho and Wyoming, USA). These waters contain fish species known to exhibit variation in daily and seasonal activities, such as cutthroat trout (Oncorhynchus clarkii) that reproduce in the summer and lake trout (Salvelinus namaycush) that reproduce in the fall (
We leveraged existing eDNA samples associated with a separate project that deployed Monterey Bay Aquarium Research Institute’s second-generation ESPs for targeted eDNA surveillance of aquatic invasive species in the upper Snake River (Wyoming and Idaho, USA). While our sample set and analyses are limited by the study design of this separate project, the high frequency of sample collection provides a unique opportunity to address knowledge gaps in eDNA metabarcoding study design. Second-generation ESPs (as described in
Map of the upper Snake River in Wyoming and Idaho, USA. ESP locations are shown with pink circles. Green lines denote national parks (Yellowstone National Park and Grand Teton National Park). Black lines indicate state borders.
We used these eDNA samples collected by ESPs at streamgage sites to infer fish community composition at the two upstream reservoirs. Previous research found that streamgage sites on large rivers, including those in this study, are adequately located for biosurveillance of upstream reservoirs (
Fish species detected by each primer set for 2021 and 2022 samples. An (M) next to the species name indicates that genetic material from this species was included in our mock community samples. In each primer-by-year column, JL indicates that a species’ DNA was detected at the Jackson Lake site, PR indicates that it was detected at the Palisades Reservoir site, and M indicates that it was detected in the mock community samples.
Species Name | Common Name | MiFish (2021) | MiFish (2022) | AcMDB (2022) |
---|---|---|---|---|
Catostomus ardens (M) | Utah sucker | JL, PR, M | ||
Catostomus discobolus | Bluehead sucker | JL, PR | PR | |
Catostomus platyrhynchus | Mountain sucker | |||
Cottus bairdii (M) | Mottled sculpin | JL, PR, M | JL, PR, M | M |
Cottus beldingii | Paiute sculpin | |||
Gila atraria (M) | Utah chub | JL, PR, M | ||
Oncorhynchus clarkii | Cutthroat trout | JL, PR | JL, PR | JL, PR |
Oncorhynchus mykiss (M) | Rainbow trout | JL, PR, M | JL, PR, M | JL, PR, M |
Oncorhynchus nerka (M) | Kokanee | JL, PR | JL, PR, M | JL, M |
Prosopium williamsoni (M) | Mountain whitefish | JL, PR, M | JL, PR, M | JL, PR, M |
Rhinichthys cataractae | Longnose dace | JL | JL, PR | PR |
Rhinichthys osculus (M) | Speckled dace | JL, PR | PR, M | JL, PR, M |
Richardsonius balteatus (M) | Redside shiner | JL, PR, M | JL, PR, M | JL, PR, M |
Salmo trutta (M) | Brown trout | JL, PR | JL, PR, M | JL, PR, M |
Salvelinus namaycush (M) | Lake trout | JL, PR, M | JL, PR, M | M |
Water samples were filtered at daily or sub-daily time increments (Suppl. material
We used available data from the U.S. Geological Survey’s National Water Information System to describe the discharge (daily mean, m³/s;
We conducted an initial assessment of the effects of filter pore size on fish eDNA detections by using a subset of ESP samples from 2021 that were collected using 0.45, 1.2, and 5 µm mixed cellulose ester (MCE) Millipore filters (Sigma-Aldrich, Darmstadt, Germany). Because the ESPs cannot collect multiple samples simultaneously, samples for the filter pore size comparison were collected and filtered consecutively. The ESP collected this subset of samples in order of increasing pore size (0.45 µm, 1.2 µm, then 5 µm), repeatedly at 3-hr intervals across four days in September 2021, for a total of 18 samples (three replicates of each filter pore size per site). Samples for the filter pore size comparison were collected on September 10–12, 2021, at the Jackson Lake site and on September 12–14, 2021, at the Palisades Reservoir site. We did not use samples from 2022 to assess the effect of filter pore size on fish eDNA detections because 2022 samples were collected with a different filter type of a single pore size (see ‘Sample frequency and timing’ below).
For samples collected in 2021, we amplified and sequenced DNA using the MiFish-U primer set (hereafter referred to as ‘MiFish’;
To assess the optimal number of samples needed to obtain reliable estimates of detection and the impact of sub-daily, daily, and seasonal timing on expected detections, we used both the 2021 and 2022 datasets. However, these datasets were analyzed separately because of methodological differences associated with the study design of the initial project; 2021 samples used MCE filters of variable pore size whereas 2022 samples used polyethersulfone (PES) filters (Sterlitech, Kent, WA, USA) of the same pore size (1.2 µm only).
Upon retrieval from the ESPs in mid-September, filters were placed into Qiagen Investigator Lyse and Spin Baskets (Qiagen, Germany) containing 180 µL of Qiagen buffer ATL and 20 µL of proteinase K. Samples were then transported back to the U.S. Geological Survey Northern Rocky Mountain Science Center (NOROCK) in Bozeman, Montana, where they were incubated at 56 °C for twelve hours. DNA extraction was carried out with Qiagen DNeasy Blood and Tissue kits according to the manufacturer’s instructions. The final elution volume was 100 µL for 2021 samples and 50 µL for 2022 samples; we reduced the elution volume for 2022 because DNA concentrations and quality were low in the 2021 samples. DNA extracts were stored at -80 °C until further processing.
For samples collected in 2021, each sample was amplified in triplicate with untailed MiFish primers and triplicates were pooled and shipped frozen to the University of Maryland Center for Environmental Science Appalachian Laboratory. Sample DNA extracts were prepared for Illumina sequencing using the methods of
Samples collected in 2022 were amplified in triplicate with two primer sets (MiFish and AcMDB) in simplex, after which each replicate amplicon was re-amplified using tailed versions of the same primers, with tags on both the forward and reverse primers (sequences given in Suppl. material
In addition to the aforementioned negative field controls, each 96-well PCR plate contained the following controls, which were carried through tailed and untailed amplifications and sequencing: a no-template control using 1 µL of sterile water per reaction; 1 µL of pooled extraction blanks (one extraction blank for every batch of 47 field samples, included in sequencing for the 2022 samples only); and one (2021 samples) or three (2022 samples) replicate wells per plate of a synthetic mock community composed of 10³ 12S gene copies of each of ten fish species (see Suppl. material
To generate updated reference sequence databases for each genetic marker, we downloaded all available 12S sequence data from NCBI on March 27th, 2023, and used MetaCurator (v1.0.1;
We aligned eDNA sequence data against these reference databases using VSEARCH v2.8.1 (
Contaminants were identified and proportionally removed with the R package microDecon (
We modeled the data at the species-by-sample level rather than as the number of species detected per sample, an approach that may be useful for future eDNA metabarcoding studies. Our approach is similar to that of
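$$\operatorname{logit}(p_i) = \beta_{0,\mathrm{species}_i} + \beta_1 \times \mathrm{volume}_i + \beta_{2,\mathrm{filter}_i} + \eta_{\mathrm{species}_i,\mathrm{site}_i,\mathrm{month}_i} + \zeta_{\mathrm{day}_i} \tag{1}$$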
where $\beta_{0,\mathrm{species}_i}$ is the intercept for the species of observation $i$, $\mathrm{volume}_i$ is the sample volume of observation $i$, and $\mathrm{filter}_i$ is the filter pore size of observation $i$, with filter sizes 1.2 µm and 5 µm treated as offsets from 0.45 µm (i.e., $\beta_{2,1} = 0$). Next, $\eta_{\mathrm{species}_i,\mathrm{site}_i,\mathrm{month}_i}$ is the species-by-site-by-month effect of observation $i$ and $\zeta_{\mathrm{day}_i}$ is the day effect of observation $i$. We modeled both species-by-site-by-month effects and day effects with non-centered random intercepts (
For 2022, our linear predictor was
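$$\operatorname{logit}(p_i) = \beta_{0,\mathrm{species}_i,\mathrm{primer}_i} + \beta_1 \times \mathrm{volume}_i + \eta_{\mathrm{species}_i,\mathrm{site}_i,\mathrm{month}_i} + \zeta_{\mathrm{day}_i} \tag{2}$$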
where $\beta_{0,\mathrm{species}_i,\mathrm{primer}_i}$ is the intercept for the species-by-primer combination of observation $i$ and all other terms are the same as in the 2021 model.
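To make the structure of Eqs. (1) and (2) concrete, the minimal Python sketch below computes one observation's detection probability from the linear predictor. It is not the authors' NIMBLE model: priors and MCMC fitting are omitted, all parameter values are hypothetical, and the function and argument names (detection_prob, z_eta, sigma_zeta, and so on) are illustrative only.

```python
import math


def inv_logit(x: float) -> float:
    """Inverse of the logit link: maps a linear predictor onto (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def detection_prob(beta0: float, beta1: float, volume: float, beta2_filter: float,
                   sigma_eta: float, z_eta: float,
                   sigma_zeta: float, z_zeta: float) -> float:
    """Detection probability for one observation, shaped like Eqs. (1)-(2).

    beta0:        species intercept (2021) or species-by-primer intercept (2022).
    beta2_filter: filter offset; 0.0 for the 0.45 um reference filter, and 0.0 for
                  the 2022 model, which has no filter term.
    Random intercepts are non-centered: each effect is a standard-normal draw (z)
    multiplied by its standard deviation (sigma).
    """
    eta = sigma_eta * z_eta      # species-by-site-by-month effect
    zeta = sigma_zeta * z_zeta   # day effect, shared by every species sampled that day
    return inv_logit(beta0 + beta1 * volume + beta2_filter + eta + zeta)


# Hypothetical values: a positive day draw raises p for this species
# (and, because zeta is shared, for every other species on that day).
p_avg_day = detection_prob(-0.5, 0.1, 1.0, 0.0, 0.8, 0.3, 0.6, 0.0)
p_good_day = detection_prob(-0.5, 0.1, 1.0, 0.0, 0.8, 0.3, 0.6, 1.5)
```

Writing each random effect as a scaled standard-normal draw, rather than sampling it directly from a normal with unknown variance, is a common reparameterization for hierarchical models that often improves MCMC mixing.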
For both models, we derived the expected number of species detected per sample, assuming independence across species, as a function of all factor levels in the model for each year. In 2021, we derived the cumulative detection probability for species $s$ at site $j$ in month $k$ and filter pore size $f$ for $n = 1, \dots, 10$ hypothetical samples following
$$p^{(n)}_{s,j,k,f} = 1 - \left(1 - p_{s,j,k,f}\right)^{n} \tag{3}$$
where $p_{s,j,k,f}$ is the detection probability of species $s$ at site $j$ in month $k$ for filter size $f$. Then, we computed the expected number of species detected as a function of $n$ samples for these factor levels following
$$\lambda^{(n)}_{j,k,f} = \sum_{s=1}^{N_{\mathrm{species}}} p^{(n)}_{s,j,k,f} \tag{4}$$
In 2022, we derived the expected number of species detected per sample in the same manner, except these computations were stratified by primer, site, and month instead of filter, site, and month. Again, this approach is analogous to that of
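Equations (3) and (4) translate directly into code. The sketch below assumes a single set of per-sample detection probabilities for one site-by-month-by-filter (or, for 2022, site-by-month-by-primer) cell; the probabilities are hypothetical. In a Bayesian fit, one would presumably repeat this calculation for every MCMC draw to obtain point and interval estimates.

```python
from typing import List, Sequence


def cumulative_detection(p: float, n: int) -> float:
    """Eq. (3): probability that a species is detected at least once in n samples,
    treating samples as independent given p."""
    return 1.0 - (1.0 - p) ** n


def expected_species(ps: Sequence[float], n: int) -> float:
    """Eq. (4): expected number of species detected in n samples,
    summing cumulative detection probabilities (independence across species)."""
    return sum(cumulative_detection(p, n) for p in ps)


def accumulation_curve(ps: Sequence[float], max_n: int = 10) -> List[float]:
    """Expected number of species detected for n = 1, ..., max_n hypothetical samples."""
    return [expected_species(ps, n) for n in range(1, max_n + 1)]


# Hypothetical per-sample detection probabilities for four species in one cell.
example_ps = [0.9, 0.6, 0.3, 0.05]
curve = accumulation_curve(example_ps)
```

With one sample, the expected count is simply the sum of the per-sample probabilities (1.85 here); it then rises toward, but never reaches, the number of species in the list.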
We fit both models via Markov chain Monte Carlo (MCMC) using the NIMBLE software (
We sequenced a total of 176 field samples, with one to six samples collected per day, along with eight mock community samples and 20 negative controls (including field, extraction, PCR, and sequencing blanks). In 2021, we sequenced 45 samples from Jackson Lake and 45 samples from Palisades Reservoir collected over 12 days. In 2022, we sequenced 46 samples from Jackson Lake and 40 samples from Palisades Reservoir, each with both the MiFish and AcMDB primer sets, collected over 16 days (breakdown of samples collected at each site per day listed in Suppl. material
The expected number of fish species whose eDNA was detected overlapped among the three filter pore sizes, providing no evidence that filter pore size affected results (Fig.
Species accumulation curves showing point and interval estimates of the expected number of species detected from eDNA as a function of the number of samples collected at each site in each month in 2021. Colors indicate the three filter pore sizes tested. Bars represent 95% credible intervals.
Together, the two primer sets detected eDNA from 13 of the 15 fish species known to occur (Table
The MiFish primer set detected eDNA from three species (C. bairdii, G. atraria, and Salvelinus namaycush) that AcMDB did not detect in field samples. Notably, the AcMDB primers did detect C. bairdii and S. namaycush in the mock communities. By contrast, the AcMDB primer set detected one species (C. ardens) in the field samples that the MiFish primer set did not. Using the 2022 samples, we found that primer choice resulted in species-specific differences in fish detection probabilities. Five fish species were detected with both primer sets; six species were detected with only one primer set (Fig.
We found that detection probabilities varied by day but lacked a consistent or obvious temporal pattern. Days with 95% credible intervals that do not overlap zero indicate that eDNA detection probabilities for all species were higher or lower on those days than on the average day, implying that eDNA from more or fewer species will be detected on those days (Fig.
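The day-level classification described above can be sketched as follows, assuming one has a set of posterior draws of the day effect ζ for each day (for example, extracted from MCMC output). The draws in the example are invented for illustration, and the equal-tailed interval used here is an assumption; other interval definitions (such as highest-density intervals) would be computed differently.

```python
from typing import Dict, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty sequence."""
    xs = sorted(values)
    if len(xs) == 1:
        return xs[0]
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def flag_days(day_draws: Dict[int, Sequence[float]], level: float = 95.0) -> Dict[int, str]:
    """Label each day by whether its equal-tailed credible interval for zeta excludes zero."""
    tail = (100.0 - level) / 2.0
    flags = {}
    for day, draws in day_draws.items():
        lower = percentile(draws, tail)
        upper = percentile(draws, 100.0 - tail)
        if lower > 0:
            flags[day] = "higher than average"
        elif upper < 0:
            flags[day] = "lower than average"
        else:
            flags[day] = "not distinguishable from average"
    return flags


# Hypothetical posterior draws of the day effect for three days.
example_draws = {
    1: [0.2, 0.3, 0.4, 0.5],
    2: [-0.5, -0.4, -0.3, -0.2],
    3: [-0.3, 0.1, 0.2, -0.1],
}
example_flags = flag_days(example_draws)
```

Because ζ is shared by all species on a given day, a day flagged as higher or lower shifts the detection probability of every species in the same direction on the logit scale.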
In light of global declines in biodiversity, it is important to understand how to integrate eDNA sampling into robust monitoring frameworks. Biomonitoring programs must balance tradeoffs in study design choices, such as sampling frequency and duration, because human and funding resources are limited. Combining eDNA metabarcoding with autonomous samplers into a biomonitoring program provides a new way to reduce these limitations, but study design tradeoffs must still be considered and optimized to minimize the misrepresentation of communities. Here, we considered how sampling frequency and timing, filter pore size, and primer choice influence inferences of fish community composition. Taken together, results from this study show that, of the factors we considered, primer choice and sampling day had the greatest impact on fish species eDNA detections.
High-frequency samples collected autonomously across months at USGS streamgage sites detected eDNA from 13 of 15 fish species known to occur in the two upstream reservoirs. Though these data provide an imperfect representation of the known fish assemblage, our results underscore the relatively minimal field effort needed for eDNA-based biomonitoring when using autonomous samplers. Traditional assessments can require several biological technicians employing multiple, potentially destructive methods (e.g., gill nets and electrofishing) at multiple sites and depths and can still result in biased findings. Comparable manual eDNA sampling would have required dozens of site visits. Our assessment required approximately six hours of field time at each of two sites for ESP deployment and retrieval. Technological advancements and standardization are likely to accelerate workflows such that eDNA autonomous platform-based programs can respond to the scale of biomonitoring needed in our rapidly changing world. An outstanding example of this promise is the use of uncrewed surface vessels fitted with autonomous platforms that were able to collect eDNA samples over a 4200-km, 29-day transit in the northeastern Pacific Ocean (
Technological advancements like autonomous platforms cannot be used to their full potential in biomonitoring programs if genetic reference libraries are insufficient. Inadequate genetic reference library coverage likely limited our ability to provide a complete picture of Jackson Lake and Palisades Reservoir fish assemblages. Of the two species whose eDNA we did not detect, one had no 12S sequences available and the other had only six. The use of multiple primers did not resolve this issue. The problem of incomplete reference libraries is nearly ubiquitous in eDNA biomonitoring efforts (
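Gaps like these can be screened for before sequencing by counting reference records per target species. The following is a minimal sketch, assuming a FASTA file whose headers take the hypothetical form `>ACCESSION Genus species ...`; real reference databases format headers differently, and the taxa and records in the example are placeholders, not real database entries.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_reference_sequences(fasta_lines: Iterable[str],
                              target_species: List[str]) -> Dict[str, int]:
    """Count FASTA records whose header names each target species.

    Assumes (hypothetically) headers like '>ACCESSION Genus species description'.
    """
    counts: Counter = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        fields = line[1:].split()
        if len(fields) < 3:
            continue  # header too short to contain a binomial
        binomial = f"{fields[1]} {fields[2]}"
        if binomial in target_species:
            counts[binomial] += 1
    # Report zero explicitly for targets with no reference sequences at all.
    return {sp: counts.get(sp, 0) for sp in target_species}


# Toy records with placeholder taxon names.
toy_fasta = """>SEQ1 Fishus alpha 12S
ACGT
>SEQ2 Fishus alpha 12S
ACGA
>SEQ3 Fishus beta 12S
ACGG
>SEQ4 Otherus delta 12S
ACCC""".splitlines()

coverage = count_reference_sequences(toy_fasta, ["Fishus alpha", "Fishus beta", "Fishus gamma"])
```

Species that return zero, or only a handful of records, are the ones most likely to go undetected regardless of primer choice.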
Our results emphasize that primer choice is one of the most influential considerations in eDNA metabarcoding study design, as has been demonstrated many times over (e.g.,
Implementation of autonomous platforms facilitates high-frequency eDNA sampling, which can provide insight into changes in an organism’s ecology. Indeed, previous research has shown that sub-daily measurements of eDNA concentrations may reflect changes in daily behavior and activity throughout a photoperiod (e.g.,
Sub-daily samples showed that adequate coverage of fish species was reached relatively quickly. For both sites and seasons, the expected number of fish species detected via eDNA reached an asymptote after three samples. Collecting more than three samples per day did not increase the probability of detecting more fish species’ eDNA and therefore may not be worth the additional collection and analysis costs. Autonomous platforms can collect and store only a limited number of samples before they must be serviced. For example, the Dartmouth Ocean Technologies eDNA Sampler can collect nine samples and the Woods Hole Oceanographic Large Volume eDNA sampler can collect 12 samples (Table
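Under the cumulative detection formula in Eq. (3), each additional sample adds p(1 − p)^(n−1) to a species' cumulative detection probability, so gains shrink geometrically and accumulation curves flatten. The single-species sketch below illustrates this; the detection probability and target are hypothetical, and the calculation ignores day effects and uncertainty in p.

```python
def cumulative_detection(p: float, n: int) -> float:
    """Probability of at least one detection in n independent samples (cf. Eq. 3)."""
    return 1.0 - (1.0 - p) ** n


def samples_needed(p: float, target: float, max_n: int = 100) -> int:
    """Smallest n whose cumulative detection probability reaches target.

    Returns -1 if the target is not reached within max_n samples (e.g., p == 0).
    """
    for n in range(1, max_n + 1):
        if cumulative_detection(p, n) >= target:
            return n
    return -1


def marginal_gain(p: float, n: int) -> float:
    """Increase in cumulative detection probability from adding the n-th sample."""
    return cumulative_detection(p, n) - cumulative_detection(p, n - 1)
```

For example, with p = 0.5, five samples are needed to exceed a cumulative detection probability of 0.95, and the fourth sample adds only 0.0625.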
We found that spreading eDNA samples across days rather than within days is more critical to minimizing misrepresentation of communities. While the probability of detecting eDNA from all fish species was consistent on most days, five days clearly had lower or higher detection probabilities, and thus observed species diversity, compared to the average day. In fact, at the Palisades Reservoir streamgage, there were two consecutive days on which eDNA samples showed large swings in detection probability and diversity relative to the average: day 208 had below-average diversity, whereas on day 209 replicates averaged higher-than-expected detection probabilities. Similar to
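One way to see why spreading samples across days helps is to compare the expected probability of detecting a species at least once when n samples share a single day's effect versus when they come from n different days. The sketch assumes a day effect on the logit scale, like ζ in Eqs. (1) and (2), a toy set of equally likely day effects, and independence among samples given the day; all values and function names are hypothetical.

```python
import math
from typing import Sequence


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def same_day_detection(base_logit: float, day_effects: Sequence[float], n: int) -> float:
    """Expected P(detected at least once) when all n samples come from one day.

    The day is drawn uniformly from day_effects; samples within a day share its effect.
    """
    probs = [inv_logit(base_logit + d) for d in day_effects]
    return sum(1.0 - (1.0 - p) ** n for p in probs) / len(probs)


def spread_days_detection(base_logit: float, day_effects: Sequence[float], n: int) -> float:
    """Expected P(detected at least once) when the n samples come from n independently
    drawn days (each drawn uniformly, with replacement, from day_effects)."""
    probs = [inv_logit(base_logit + d) for d in day_effects]
    mean_miss = sum(1.0 - p for p in probs) / len(probs)
    return 1.0 - mean_miss ** n


toy_effects = [-2.0, 0.0, 2.0]  # hypothetical poor, average, and good days
same = same_day_detection(-1.0, toy_effects, 3)
spread = spread_days_detection(-1.0, toy_effects, 3)
```

Because x^n is convex on [0, 1], Jensen's inequality gives E[(1 − p)^n] ≥ (E[1 − p])^n, so in this toy setting the spread-out design is never worse than the same-day design, with equality only for n = 1 or when day effects do not vary.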
In contrast to other research assessing eDNA detections at a longer time scale (e.g.,
When using eDNA sampling to detect species of interest, it is important to consider which specific supplies (e.g., filter type and pore size) are needed to optimize the likelihood of detection. We found that the effect of pore size was minimal relative to other factors such as primer choice and sampling frequency. Previous research has shown mixed results where filter pore size for different filter types (i.e., MCE, Cellulose Nitrate, and Glass Fiber) may (
Metabarcoding of eDNA samples is becoming more commonly used and applied toward species detections and may be useful for future management (
In summary, we demonstrate the usefulness of autonomous eDNA samplers to describe fish communities at USGS streamgages downstream of reservoirs. We showed that, of the factors we considered, primer choice and sampling day had the greatest impact on species eDNA detection probabilities. Autonomous platforms, like the ESP, are particularly useful in that they can collect many eDNA samples. One current limitation of many autonomous platforms is their inability to collect multiple replicates at the same time. Additional replicates collected in parallel, either autonomously or manually, could account for the impact of time of day on species detections. When autonomous platforms are used in biodiversity monitoring programs and budgets are limited, spreading out samples over the course of several days rather than taking several samples in a short period of time may be optimal.
We would like to thank Dr. Donald Zaroban (College of Idaho), Jesse McCane (Idaho Fish and Game), and Matthew Campbell (Idaho Fish and Game) for providing tissue samples for mitochondrial sequencing. Additionally, we would like to thank Dr. Masayuki Ushio, Dr. Nathan Griffiths, and one anonymous reviewer for their thoughtful feedback, which helped improve the manuscript.
The authors have declared that no competing interests exist.
No ethical statement was reported.
This project is one among a set of coordinated projects funded in part by the Bipartisan Infrastructure Law through the U.S. Department of the Interior to advance a nationally coordinated Early Detection and Rapid Response Framework. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.
Conceptualization: DNJ, EB, PH, AJS. Data curation: PH, AJS, DNJ, BCA, RT, RR. Formal analysis: BCA, DNJ, RR. Funding acquisition: AJS, EB. Investigation: PH, AJS, SJ, EB, KMY, DNJ, JMB. Methodology: BCA, DNJ, PH, AJS. Project administration: AJS, PH, JRC, DNJ. Resources: EB, RT, JRC, AJS, JMB, KMY, SJ, RR. Software: JMB, RR, SJ, KMY, RT. Supervision: EB, AJS. Validation: PH, BCA, DNJ. Visualization: DNJ, BCA. Writing - original draft: DNJ, AJS. Writing - review and editing: AJS, EB, DNJ, BCA, JMB, PH, KMY, SJ, RT, JRC, RR.
Devin N. Jones https://orcid.org/0000-0001-9215-2930
Ben C. Augustine https://orcid.org/0000-0001-6935-6361
Patrick Hutchins https://orcid.org/0000-0001-5232-0821
James M. Birch https://orcid.org/0000-0002-6080-8955
Kevan M. Yamahara https://orcid.org/0000-0003-3344-0283
Rodney Richardson https://orcid.org/0000-0002-4443-1705
James R. Campbell https://orcid.org/0000-0002-2760-3149
Elliott Barnhart https://orcid.org/0000-0002-8788-8393
Adam J. Sepulveda https://orcid.org/0000-0001-7621-7028
Data and code are available at https://doi.org/10.5066/P1XKA9NV (
Additional detail for methodology as well as additional figures
Data type: docx
Sequence data for mock communities used in metabarcoding
Data type: fasta
Summaries of reads removed during the decontamination process
Data type: xlsx