Katy E. Klymus, Jacoby D. Baker§, Cathryn L. Abbott|, Rachel J. Brown, Joseph M. Craine#, Zachary Gold¤, Margaret E. Hunter«, Mark D. Johnson»˄, Devin N. Jones˅, Michelle J. Jungbluth¦, Sean P. Jungbluth¦, Yer Lorˀ, Aaron P. Maloyˁ, Christopher M. Merkesˀ, Rachel Noble, Nastassia V. Patin, Adam J. Sepulveda˅, Stephen F. Spearˀ, Joshua A. Steele, Miwa Takahashi, Alison W. Watts, Susanna Theroux
‡ U.S. Geological Survey, Columbia Environmental Research Center, Columbia, United States of America
§ Monterey Bay Aquarium Research Institute, Moss Landing, United States of America
| Pacific Biological Station, Fisheries and Oceans Canada, British Columbia, Canada
¶ U.S. Fish and Wildlife, Whitney Genetics Laboratory, Onalaska, United States of America
# Jonah Ventures, Boulder, United States of America
¤ NOAA Pacific Marine Environmental Laboratory, Seattle, United States of America
« U.S. Geological Survey, Wetland and Aquatic Research Center, Gainesville, United States of America
» Engineer Research and Development Center, Champaign, United States of America
˄ University of Illinois at Urbana-Champaign, Champaign, United States of America
˅ U.S. Geological Survey, Northern Rocky Mountain Science Center, Bozeman, United States of America
¦ San Francisco State University, Tiburon, United States of America
ˀ U.S. Geological Survey, Upper Midwest Environmental Sciences Center, La Crosse, United States of America
ˁ U.S. Fish and Wildlife Service, Northeast Fishery Center, Lamar, United States of America
₵ University of North Carolina Chapel Hill, Chapel Hill, United States of America
ℓ University of California, La Jolla, United States of America
₰ Southern California Coastal Water Research Project, Costa Mesa, United States of America
₱ Indian Oceans Marine Research Centre, Crawley, Australia
₳ University of New Hampshire, Durham, United States of America
Environmental DNA (eDNA) and RNA (eRNA) metabarcoding has become a popular tool for assessing biodiversity from environmental samples, but inconsistent documentation of methods, data and metadata makes results difficult to reproduce and synthesise. A working group of scientists has collaborated to produce a set of minimum reporting guidelines for the constituent steps of metabarcoding workflows, from the physical layout of laboratories through to data archiving. We emphasise how reporting the suite of data and metadata should adhere to findable, accessible, interoperable and reusable (FAIR) data standards, thereby providing context for evaluating and understanding study results. An overview of the documentation considerations for each workflow step is presented and then summarised in a checklist that can accompany a published study or report. Ensuring workflows are transparent and documented is critical to reproducible research and should allow for more efficient uptake of metabarcoding data into management decision-making.

eDNA, eRNA, metadata, quality control, FAIR data standards, reproducibility


Environmental DNA (eDNA) methods have become increasingly popular in the past two decades due to various advantages compared to traditional methods (Darling and Mahon 2011; Schenekar 2023). In particular, eDNA methods have been shown to have enhanced sensitivity compared to traditional methods (Lodge et al. 2012; Furlan et al. 2016; Qu and Stewart 2019; Fediajevaite et al. 2021), enabling non-invasive and non-destructive monitoring for target taxa. The high scalability of eDNA methods allows increased spatial and temporal scale monitoring, refining the resolution of distributional data (Cristescu and Hebert 2018; Garlapati et al. 2019; Veilleux et al. 2021; Morisette et al. 2021). In response to its popularity, eDNA research has become a field of its own with annual conference sessions, workshops and societies across the globe (US: Stepien et al. 2022; Stepien et al. 2023, UK: Handley et al. 2023, Australia/New Zealand: Southern eDNA Society, Japan: eDNA Society). However, the adoption of eDNA tools for real-world applications has been hindered by the lack of standardised sampling, analytical, and data reporting methods. This results in eDNA publications that contain highly inconsistent documentation of methods and results (Nicholson et al. 2020; Shea et al. 2023; Takahashi et al. 2023). Readers and peer-reviewers alike are thus faced with having to interrogate the quality and comparability of eDNA data across publications without a consistent framework for doing so.

There is momentum in the eDNA community to develop both formal standards (e.g. Gagné et al. (2021); Abbott et al. (2023)) and less formal working guidelines for species-specific and targeted eDNA approaches (e.g. Borchardt et al. (2021); Bruce et al. (2021); De Brauwer et al. (2022a)). Targeted eDNA approaches leverage quantitative or digital polymerase chain reaction (qPCR and dPCR, respectively), often for invasive and endangered species monitoring programmes (e.g. Lodge et al. (2012); Doi et al. (2017); Hunter et al. (2018); Postaire et al. (2020); Dimond et al. (2022); Nolan et al. (2023)) and have received some attention related to consistent reporting (e.g. Abbott et al. (2021)). Environmental DNA metabarcoding is used for multi-species or community level surveys and there are multiple review papers that have established a foundation of best practices for sample collection and processing, including sample volume, filter type, extraction method and primer selection (e.g. Deiner et al. (2017); Lear et al. (2018); Minamoto et al. (2021); Bruce et al. (2021); Patin and Goodwin (2023)). However, since multiple workflows can satisfy these best practices, adequate reporting is essential to ensure the data can be traced to the methods used to generate them.

There is currently a lack of guidance on metabarcoding data reporting for publication. For metabarcoding data to be findable, accessible, interoperable and reusable (FAIR, Wilkinson et al. (2016)), adequate documentation of each step in the workflow as well as sample-related data and metadata are required (Shea et al. 2023). Minimum information reporting requirements published 15 years ago for qPCR data (Bustin et al. 2009) can be used as a model; here we suggest researchers and editors who publish environmental metabarcoding studies follow minimum guidelines of methods and results reporting. These guidelines cover the full range of processes in the metabarcoding workflow from field sample collection to accessibility of final datasets (Fig. 1).

Figure 1.

Diagram of an environmental metabarcoding study workflow, including the field (blue), wet laboratory (green) and dry laboratory (orange) components. Created with

The metabarcoding workflow is similar across a variety of starting sample types often used in biodiversity research, some of which are not typically referred to as eDNA. Here we follow Ruppert et al. (2019) by including samples collected either from environmental matrices (e.g. water, soil, air) or from bulk, organismal sampling techniques (e.g. plankton tow nets, Malaise traps). We also address methods relevant to eRNA metabarcoding workflows. Thus, we use the term environmental nucleic acid (eNA) when a process is similar for both (Littlefair et al. 2022; Bunholi et al. 2023). Finally, we provide a checklist (Table 1, Suppl. material 1: file S1) and example entries (Suppl. material 1: file S2) for data and metadata reporting as a common resource for both authors and reviewers of eNA metabarcoding studies. This checklist indicates what details would be beneficial if reported in a metabarcoding study to support open science principles. We do not make recommendations on specific methodological approaches as these will vary depending on study objectives and are addressed by existing published guidelines, papers and standards. For instance, we discuss a number of controls that, if used, should be reported, but we do not make recommendations on which controls to use, as controls will vary depending on the study's objectives. The checklist provided offers a practical tool to help the community consistently and clearly report critical workflow elements, which, as an important quality measure, will expedite the adoption of eNA data for decision support.

Table 1.

A checklist for reporting data and metadata associated with environmental metabarcoding studies. Requirements include: ‘Report’ - steps should be reported for FAIR (findable, accessible, interoperable and reusable) data practices, ‘If applicable’ - report if component is relevant to study, ‘Report for eRNA metabarcoding’ - reporting of additional steps specific to eRNA studies, ‘Optional’ - can be reported. Reporting of these data and metadata will maximise study reproducibility and FAIR data practices. See Suppl. material 1: file S2 for a checklist that includes examples.

| Step | Reporting Requirements | Reported |
|---|---|---|
| Methods - Laboratory Space | | |
| General laboratory space layout | Report | |
| Contamination mitigation efforts in the laboratory | Report | |
| Methods - Metabarcoding Assay | | |
| Target gene name | Report | |
| Target amplicon length | Report | |
| Target taxa | Report | |
| Primers, sequence, reference, modifications to published primers | Report | |
| Assay validation | Report | |
| Methods - Environmental Sample Collection | | |
| Number of field samples | Report | |
| Definition of field sample | Report | |
| Number of field sample replicates | If Applicable | |
| Sampling dates | Report | |
| Sampling times | If Applicable | |
| Sampling locations/Geographic coordinates and geodetic datum used/Non-disclosure statement | Report | |
| Capture methods and materials | Report | |
| Sample processing method | Report | |
| Volume/Mass of sample collected | Report | |
| Volume/Mass of sample processed | Report | |
| Sample preservation method | Report | |
| Sample storage conditions | Report | |
| Sample storage duration | Report | |
| Environmental parameters | If Applicable | |
| Contamination mitigation efforts in the field | Report | |
| Positive and Negative (Site/Field/Process) controls | Report | |
| Methods - Nucleic Acid Extraction | | |
| Extraction method (if using a commercial kit, provide name of kit and manufacturer) | Report | |
| Changes/modification to a published method or kit | Report | |
| Amount of sample extracted | Report | |
| Extraction storage conditions and duration | Report | |
| Nucleic acid quantification for each sample and method used to quantify | If Applicable | |
| Any subsequent clean-up methods | If Applicable | |
| Negative extraction controls | Report | |
| Positive extraction controls | If Applicable | |
| Methods - Inhibition Detection and Mitigation | | |
| PCR inhibition detection and mitigation steps | If Applicable | |
| Methods - PCR Amplification and Library Preparation | | |
| Library preparation method | Report | |
| Number of PCR replicates per sample | If Applicable | |
| Are PCR replicates pooled and indexed or indexed separately | If Applicable | |
| Thermal cycling instrument and manufacturer | Optional | |
| Thermal cycling conditions (temperatures, time at temperature, # cycles, annealing temperature(s)) | Report | |
| Master mix composition: Final reaction volume (μl) | Report | |
| Master mix composition: name and manufacturer | Report | |
| Master mix composition: Taq concentration (X) | Report | |
| Master mix composition: Final concentration of each primer (forward and reverse) | Report | |
| Master mix composition: Volume of water added (μl) | Report | |
| Master mix composition: Volume (μl) or concentration of template NA | Report | |
| Master mix composition: Any additives (manufacturers and volumes (μl) or concentrations) | If Applicable | |
| Amplicon visualisation: Method used | If Applicable | |
| PCR clean-up methods | Report | |
| Size selection methods | If Applicable | |
| Size selection: Instrumentation and manufacturer | If Applicable | |
| Normalisation: Method used | If Applicable | |
| PCR controls (Positive PCR, Mock Community, Negative PCR or No-template) and indicate whether or not controls were also sequenced | Report | |
| Methods - PCR Amplification and Library Preparation (Reverse Transcription) | | |
| Reverse transcriptase reaction kit and manufacturer | Report for eRNA metabarcoding | |
| Reverse transcriptase reaction conditions (primers, cycling conditions, controls, amount of template) | Report for eRNA metabarcoding | |
| RNA enrichment methods | Report for eRNA metabarcoding | |
| DNA contamination evaluation and mitigation methods | Report for eRNA metabarcoding | |
| Methods - Sequencing | | |
| Sequencing instrument/platform | Report | |
| Sequencing chemistry kit and manufacturer | Report | |
| Sequencing quality control steps | Report | |
| PhiX and percentage (If using Illumina platform) | If Applicable | |
| Methods - Bioinformatics and Reference Database | | |
| Database creation: Source of sequences and steps to identify locus of interest | Report | |
| Database creation: Method for sequence curation | Report | |
| Database creation: Link to database or repository | Report | |
| Primer removal (trimming): programme, version, parameters | Report | |
| QC programme: programme, version, parameters | Report | |
| Read pair merging: programme, version, parameters | Report | |
| Chimera removal: programme, version, parameters | Report | |
| Clustering: OTUs or ASVs (and thresholds) | Report | |
| Additional filtering: removal of singletons or other methods | If Applicable | |
| Additional filtering: decontamination using sequenced controls | If Applicable | |
| Taxonomic assignment method | Report | |
| Taxonomic assignment parameters (and thresholds) | Report | |
| Read normalisation: methods | If Applicable | |
| Results - Sequencing Summary Statistics | | |
| Total number of raw sequence reads produced | Report | |
| Total number of reads assigned to MIDs (i.e. tags, indices, barcodes) | Report | |
| Total number of reads that made it through bioinformatic filtering | Report | |
| Total number of reads used for final/subsequent analyses | Report | |
| Total number of OTUs or ASVs assigned to taxa (and to what level of taxonomy) | Report | |
| Total number of OTUs or ASVs unassigned | Optional | |
| Average number of reads per sample | Optional | |
| Minimum and maximum number of reads per sample | Optional | |
| Results from Controls (Negative/Positive/Mock Communities) | Report | |
| Results - Data Archiving and Availability | | |
| Software and code archiving | If Applicable | |
| Raw sequence data archiving | Report | |
| Processed data archiving | If Applicable | |
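
As an illustration of how the checklist in Table 1 could be made machine-checkable, the following Python sketch encodes a small subset of its rows and lists the items that still need to be reported. The class, the `missing_items` helper and the example study record are hypothetical and are not part of the supplementary checklist files.

```python
from dataclasses import dataclass

# Requirement levels as named in Table 1.
REPORT = "Report"
IF_APPLICABLE = "If Applicable"
OPTIONAL = "Optional"


@dataclass(frozen=True)
class ChecklistItem:
    section: str
    step: str
    requirement: str


# Illustrative subset of Table 1 only (not the full checklist).
CHECKLIST = [
    ChecklistItem("Metabarcoding Assay", "Target gene name", REPORT),
    ChecklistItem("Metabarcoding Assay", "Target amplicon length", REPORT),
    ChecklistItem("Environmental Sample Collection", "Sampling dates", REPORT),
    ChecklistItem("Environmental Sample Collection", "Sampling times", IF_APPLICABLE),
    ChecklistItem("PCR Amplification and Library Preparation",
                  "Thermal cycling instrument and manufacturer", OPTIONAL),
]


def missing_items(reported, not_applicable=()):
    """Return the checklist steps that still need to be reported.

    `reported` maps step names to the text supplied by the authors.
    `not_applicable` lists 'If Applicable' steps declared irrelevant to the study.
    'Optional' steps are never flagged.
    """
    missing = []
    for item in CHECKLIST:
        if item.requirement == OPTIONAL:
            continue
        if item.requirement == IF_APPLICABLE and item.step in not_applicable:
            continue
        if not reported.get(item.step, "").strip():
            missing.append(item.step)
    return missing


# Hypothetical study record, used only to exercise the function.
study = {
    "Target gene name": "12S rRNA",
    "Sampling dates": "2023-06-01 to 2023-06-15",
}
print(missing_items(study, not_applicable=["Sampling times"]))
```

A check of this kind could be run by authors before submission or by reviewers against a completed checklist; it only tests for presence of an entry, not for its quality.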

General quality control reporting

Several elements should be considered and evaluated before beginning a metabarcoding study (for in-depth review, see Bruce et al. (2021); De Brauwer et al. (2022a)). Clean laboratory practices and metabarcoding assay design sit outside the metabarcoding workflow itself but are crucial, as are positive and negative controls for evaluating the risk of poor assay optimisation, contamination or inhibition. Here, we discuss recommendations for reporting metadata associated with general quality control.

Laboratory spaces and practices

A key consideration is the prevention of contamination when handling eNA samples. This is particularly important in the laboratory setting where PCR is central to library preparation steps. As PCR leads to the amplification of billions of amplicons, samples, reagents, consumables and benchtops can easily become sources of contamination (Persing 1991; Aslanzadeh 2004; Willerslev and Cooper 2005). For a thorough review of clean laboratory procedures for eNA work in general, see Goldberg et al. (2016) and Patin and Goodwin (2023). It is important to describe the laboratory environment in which samples will be processed, extracted and sequenced, to enable an assessment of potential contamination risks.

The methods sections of published eNA metabarcoding studies should detail specific measures used to prevent laboratory-based contamination, such as: (1) whether a unidirectional workflow was used for the wet-laboratory steps (i.e. physically separate laboratory spaces for pre- and post-PCR such that products of later steps are not introduced into spaces from earlier steps); (2) laboratory cleaning protocols and reagents (e.g. sodium hypochlorite solution); (3) workspace-designated equipment and consumables; and (4) other laboratory-based contamination prevention measures including positive air pressure, HEPA-filtered air and UV-treatment of workspaces and/or consumables. We acknowledge that not all of these measures will be employed, but we point to ones that should be reported if used. This level of detail is intended to increase transparency and overall confidence in the handling of highly sensitive eNA samples.

Metabarcoding assay used

Numerous metabarcoding assays (here we define assay as a molecular analysis) have been developed targeting a range of organisms from environmental samples (Takahashi et al. 2023). When reporting on the assay used, include the target gene region, primer sequences (if newly developed) or a citation to the primer source (if previously published), any modifications made to an already published assay, expected target amplicon length and taxonomic coverage. As phylogeographic variation in target taxa may lead to primer bias or failed amplification in some species, additional validation of an assay (new or already published) is often conducted (Thalinger et al. 2021). Results of any assay validation conducted in the study should be reported. Validation of the assay should be thoroughly documented, including any in silico, in vitro and in situ testing, as well as results from tests demonstrating primer specificity to the targeted group (Taberlet et al. 2018; Thalinger et al. 2021; De Brauwer et al. 2023). For further guidance on assay development, we point readers to Thalinger et al. (2021) who developed a validation scale for species-specific assays, as well as to recent guidelines for metabarcoding assays developed by the Southern eDNA Society (De Brauwer et al. 2022b, 2023).
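
To make the idea of in silico testing more concrete, the sketch below checks whether a degenerate primer pair matches a template sequence exactly and returns the predicted amplicon, so that the expected amplicon length can be reported. It is a simplified illustration only: it allows no mismatches, takes the first match of each primer, and the primer and template sequences are invented for the example. Dedicated in silico PCR tools would normally be used for real assay validation.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Reverse complement of a sequence that may contain IUPAC codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Translate a (possibly degenerate) primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer.upper())


def in_silico_amplicon(template, forward, reverse):
    """Return the predicted amplicon (primers included), or None if either
    primer has no exact match on the given strand of the template."""
    template = template.upper()
    fwd = re.search(primer_regex(forward), template)
    if fwd is None:
        return None
    downstream = template[fwd.end():]
    rev = re.search(primer_regex(reverse_complement(reverse)), downstream)
    if rev is None:
        return None
    return template[fwd.start(): fwd.end() + rev.end()]


# Invented toy sequences, for illustration only.
template = "GGACGCTAAAATTTTGATCCCC"
amplicon = in_silico_amplicon(template, forward="ACGYT", reverse="GGRTC")
print(amplicon, len(amplicon))
```

Reporting the primer sequences together with the reference sequences tested in this way lets readers repeat the in silico step themselves.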


Control samples are used throughout the metabarcoding workflow and detailed reporting of all control samples used is vital. Positive controls are generally included to ensure any run or reaction failures or anomalies (i.e. unrelated to the samples themselves) are detected (Yeh et al. 2018; Gold et al. 2022), whereas negative controls detect contamination (Borchardt et al. 2021). For eRNA studies, a negative reverse transcriptase control should be included to test for genomic DNA.

Table 2 and Fig. 2 list various controls used in a metabarcoding workflow. Not all of these will be used in every study and additional controls may be used. For instance, positive extraction controls can be used to assess extraction efficiency, but are often not used in general metabarcoding studies. Regardless, information on the number and type of controls should be provided. Controls can be sequenced to assist with interpreting final results; if they are, all relevant data and associated metadata should be reported. It is also critical to report any corrections made to final datasets using results of sequenced controls. Sepulveda et al. (2020), Bruce et al. (2021) and Takahashi et al. (2023) provide guidance on integration of controls into project design.
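
Where sequenced controls are used to correct a dataset, the correction itself is part of the method and can be reported explicitly. The sketch below shows one simple, hypothetical approach: for each ASV, the maximum read count seen in any negative control is subtracted from every sample, and the number of reads removed is tallied so it can be stated in the results. This particular rule is an assumption chosen for illustration and is not a recommendation of these guidelines; the sample names and counts are invented.

```python
def subtract_negative_controls(sample_counts, control_counts):
    """Subtract per-ASV maximum negative-control reads from each sample.

    sample_counts:  {sample_id: {asv_id: reads}}
    control_counts: {control_id: {asv_id: reads}}
    Returns (corrected_counts, reads_removed_per_asv).
    """
    # Highest read count observed for each ASV across all negative controls.
    max_in_controls = {}
    for counts in control_counts.values():
        for asv, reads in counts.items():
            max_in_controls[asv] = max(max_in_controls.get(asv, 0), reads)

    corrected, removed = {}, {}
    for sample, counts in sample_counts.items():
        corrected[sample] = {}
        for asv, reads in counts.items():
            kept = max(reads - max_in_controls.get(asv, 0), 0)
            corrected[sample][asv] = kept
            removed[asv] = removed.get(asv, 0) + (reads - kept)
    return corrected, removed


# Invented example counts.
samples = {
    "S1": {"ASV1": 100, "ASV2": 5},
    "S2": {"ASV1": 40, "ASV2": 0},
}
controls = {
    "PCR_negative": {"ASV2": 8},
    "extraction_negative": {"ASV1": 10, "ASV2": 2},
}
corrected, removed = subtract_negative_controls(samples, controls)
print(corrected)
print(removed)
```

Whatever rule is applied, stating it alongside the reads removed per ASV allows readers to reproduce the corrected dataset from the archived raw data.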

Figure 2.

Chart of the different elements of the environmental metabarcoding workflow showing the control samples involved at each step.

Table 2.

Table of controls that can be used throughout the metabarcoding workflow. Positive controls confirm a process is working and negative controls ensure any contamination is detected. NA refers to nucleic acid (i.e. DNA or RNA).

| Control | How it’s created | Why it’s used |
|---|---|---|
| Site Positive | Collection of an environmental sample at a site where the target species is known to be present | Confirms assay viability in an environmental sample |
| Extraction Positive | A laboratory contrived sample to which NA or tissue is added during the NA extraction process | Introduces target NA to monitor NA extraction efficiency; used for laboratory quality control; not generally used in studies |
| Internal Positive Control (IPC) | A known concentration of synthetic or natural NA added to the PCR | Amplification of IPC DNA above expected cycle threshold (qPCR) suggests samples are inhibited; not generally used in metabarcoding, but see text |
| PCR Positive | A laboratory contrived sample to which synthetic or natural NA is added during PCR setup; can include mock community samples | Introduces target material to monitor for PCR success |
| Site Negative | Collection of an environmental sample at a site where the target species is known to be absent | Monitors potential non-specific amplification of the assay from the environment |
| Field Negative | Collection of a blank sample that follows field collection protocols | Monitors potential contamination in field collection |
| Process Negative | A sample added during processing that lacks target NA | Monitors potential contamination during sample processing |
| Extraction Negative | A laboratory contrived sample that only includes reagents used during the NA extraction process | Monitors potential contamination during extraction process |
| PCR Negative (No-template control; NTC) | A laboratory contrived sample that includes only PCR reagents, with molecular grade water replacing the NA input | Monitors potential contamination during PCR and sequencing |
| Negative Reverse Transcription Control | A sample of RNA extract carried through subsequent steps, but without reverse transcriptase | Tests for DNA contamination in the RNA extract |

Metabarcoding workflow reporting

Sample collection, processing and preservation

Reporting on sample collection for eNA metabarcoding studies allows readers to evaluate the adequacy of the sampling design in the context of the study goals and limitations. A field sample should be defined by the authors. For instance, it could be a singular collection obtained during a single sampling event (i.e. a sample collected at a single location at a single point in time) or a composite sample in which material from multiple locations or times has been combined. Studies may also take replicate field samples given the inevitable stochasticity of whether the target molecule is captured by the sampling if present in the environment. Methods used for capturing samples should be reported, along with sample volume or mass, the number of field samples, field sample replicates and field control samples. In addition, if any pre-extraction processing steps are used after sample collection, the amount of sample processed and methods used (e.g. pre-filtration and/or filtration steps) should be reported. Preservation of samples, storage and duration prior to extraction should be reported as well.

Any contamination mitigation efforts made in the field, including collection of field, site and process controls should also be reported. For recommendations on sample collection and design, see Deiner et al. (2017), Dickie et al. (2018), Bowers et al. (2021), Bruce et al. (2021), Minamoto et al. (2021), De Brauwer et al. (2022a) and Pawlowski et al. (2022).

Detailed information describing the sampling date and locations for all samples should be included. Names of the geographical location of sampling can be reported; however, to ensure reproducibility, geographic coordinates are preferred. The latitude and longitude should be included, as well as the geodetic datum used following Geographic Information Standards. Exceptions for non-disclosure of geographic coordinates are acceptable for privacy of information, cultural concerns or security reasons necessitating exact locations be withheld. Environmental parameters (e.g. temperature, salinity, pH, habitat type) should be reported, if taken, either in a table, supplemental material or within the main text. For further guidance on reporting sampling data, see standards and guidelines developed through Darwin Core (Wieczorek et al. 2012) and the Global Biodiversity Information Facility (GBIF) (Abarenkov et al. 2023).
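
A sampling record of this kind can be checked for completeness before submission. The sketch below uses Darwin Core term names (`eventID`, `eventDate`, `decimalLatitude`, `decimalLongitude`, `geodeticDatum`) for a single field sample; the validation rules, the `coordinates_withheld` flag and the example values are assumptions made for illustration rather than requirements of any standard.

```python
from datetime import date

COORDINATE_TERMS = ("decimalLatitude", "decimalLongitude", "geodeticDatum")
REQUIRED_TERMS = ("eventID", "eventDate") + COORDINATE_TERMS


def validate_sampling_record(record, coordinates_withheld=False):
    """Return a list of problems found in one sampling record.

    Set coordinates_withheld=True when a non-disclosure statement replaces
    exact coordinates (e.g. for privacy, cultural or security reasons).
    """
    problems = []
    for term in REQUIRED_TERMS:
        if coordinates_withheld and term in COORDINATE_TERMS:
            continue
        if record.get(term) in (None, ""):
            problems.append(f"missing {term}")

    if record.get("eventDate"):
        try:
            date.fromisoformat(record["eventDate"])
        except ValueError:
            problems.append("eventDate is not an ISO 8601 date (YYYY-MM-DD)")

    if not coordinates_withheld:
        lat = record.get("decimalLatitude")
        lon = record.get("decimalLongitude")
        if isinstance(lat, (int, float)) and not -90 <= lat <= 90:
            problems.append("decimalLatitude outside -90..90")
        if isinstance(lon, (int, float)) and not -180 <= lon <= 180:
            problems.append("decimalLongitude outside -180..180")
    return problems


# Invented example record.
record = {
    "eventID": "site01_rep1",
    "eventDate": "2023-06-01",
    "decimalLatitude": 38.95,
    "decimalLongitude": -92.33,
    "geodeticDatum": "WGS84",
}
print(validate_sampling_record(record))
```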

Nucleic acid extraction

Extraction protocols are selected based on the sample, tissue or matrix targeted and what steps are required to disrupt biological membranes and release nucleic acids into solution for subsequent purification. Thus, the extraction method should be chosen based on its efficiency for the particular sample type and/or target taxa and should be reported (Djurhuus et al. 2017; Pawlowski et al. 2022). When reporting commercial kit-based extractions, the kit name and manufacturer should be included. Sample volume or mass used for extraction should be reported along with any modifications to commercial kit-based protocols or published methods. In addition, any positive and negative controls included in each extraction batch to ensure quality control of this step should be reported. Quantifying eNA post-extraction is commonly done; however, this measure will reflect total DNA concentration and not just the targeted DNA. For some substrates (e.g. water), the extraction may not yield quantifiable amounts of eNA, but will still be suitable for PCR and downstream steps. Information on how eNA extracts were stored before further use, including buffers or ethanol used and storage temperature and duration, should be reported.

Inhibition mitigation and testing

Environmental samples are prone to containing molecular compounds and trace metals that can inhibit enzymatic reactions such as PCR, thereby reducing amplification efficiency (Wilson 1997; Schrader et al. 2012; Lance and Gaun 2020; Sidstedt et al. 2020). During nucleic acid (NA) extraction, some of these inhibitory compounds may be removed, but NA dilution (McKee et al. 2015) or additional purification steps may be required for further removal (Hunter et al. 2018). An additional way to mitigate the effects of inhibitory compounds is the use of additives to the PCR such as dimethyl sulphoxide (DMSO) and Bovine Serum Albumin (BSA) (Kreader 1996; Farell and Alexandre 2012). The use of internal positive controls (IPCs) can detect PCR inhibition in a sample prior to library preparation. This is common in qPCR studies of eDNA samples and less common in metabarcoding studies; however, there are examples of the latter (e.g. Shirazi et al. (2021) used qPCR to perform inhibition testing on all extractions in their DNA metabarcoding study). Furthermore, the U.S. Fish and Wildlife Service tests for inhibition in all metabarcoding samples collected for the Early Detection and Monitoring programme in the Great Lakes Basin (USFWS 2024). As PCR inhibition can severely reduce amplification, subsequent sequencing and data analysis can be affected; thus, any information on PCR inhibition detection and mitigation should be reported if used.
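
When an IPC is used, inhibition is typically inferred from a delay in the IPC quantification cycle (Cq) in a sample relative to an uninhibited reference reaction. The sketch below flags samples on that basis. The shift of 3 cycles used as the default cut-off is an assumption for illustration; studies should report the threshold they actually applied. The Cq values are invented.

```python
def flag_inhibition(sample_ipc_cq, reference_ipc_cq, max_delta_cq=3.0):
    """Flag samples whose IPC amplifies late relative to a reference.

    sample_ipc_cq:    {sample_id: IPC Cq, or None if the IPC did not amplify}
    reference_ipc_cq: IPC Cq in an uninhibited reaction (e.g. no-template control)
    max_delta_cq:     delay (in cycles) at or above which a sample is flagged;
                      the default is an illustrative assumption.
    Returns {sample_id: delta Cq}, with None for complete IPC failure.
    """
    flagged = {}
    for sample, cq in sample_ipc_cq.items():
        if cq is None:
            # IPC failed to amplify at all: treat as inhibited.
            flagged[sample] = None
            continue
        delta = cq - reference_ipc_cq
        if delta >= max_delta_cq:
            flagged[sample] = round(delta, 2)
    return flagged


# Invented Cq values.
ipc_cq = {"S1": 27.1, "S2": 31.4, "S3": None}
print(flag_inhibition(ipc_cq, reference_ipc_cq=27.0))
```

Reporting the flagged samples together with the mitigation applied (e.g. dilution or additional clean-up) documents both the detection and the response.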

Library preparation

The metabarcoding workflow requires that extracted NA samples go through a series of processing steps called library preparation before high-throughput sequencing. Libraries consist of pooled amplicon sequences that have been modified to allow for simultaneous sequencing of high numbers of samples (and subsequent demultiplexing). These modifications include addition of: sequencing primer regions; adaptors compatible with the sequencing platform; and multiplex identifiers (MIDs) that allow sequences to be assigned to individual samples (GBIF DNA derived data – Extension (2024)). These MIDs are often referred to as “tags”, “indices” or “barcodes” in the literature (see Deiner et al. (2017)). Below we discuss the steps of library preparation (PCR, Reverse Transcription for eRNA, Post-PCR and Normalisation) and make reporting recommendations. For a more detailed explanation of steps, see Taberlet et al. (2018), Bruce et al. (2021) and Bohmann et al. (2022).
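
The role of MIDs in demultiplexing can be illustrated with a minimal sketch. It assumes, purely for illustration, that each read begins with an inline MID of fixed length and that only exact matches are accepted; on many platforms indices are instead read separately and demultiplexing is performed by the instrument or vendor software. The MID sequences, sample names and reads are invented.

```python
def demultiplex(reads, mid_to_sample):
    """Assign reads to samples by an exact match of the leading MID.

    reads:         iterable of read sequences that start with an inline MID
    mid_to_sample: {MID sequence: sample_id}; all MIDs must share one length
    Returns (assigned, unassigned), where assigned maps each sample_id to
    its reads with the MID removed.
    """
    lengths = {len(mid) for mid in mid_to_sample}
    if len(lengths) != 1:
        raise ValueError("all MIDs must have the same length")
    mid_length = lengths.pop()

    assigned = {sample: [] for sample in mid_to_sample.values()}
    unassigned = []
    for read in reads:
        mid, insert = read[:mid_length], read[mid_length:]
        sample = mid_to_sample.get(mid)
        if sample is None:
            unassigned.append(read)
        else:
            assigned[sample].append(insert)
    return assigned, unassigned


# Invented MIDs and reads.
mids = {"ACGT": "sample_A", "TGCA": "sample_B"}
reads = ["ACGTGGGCCC", "TGCAAATTT", "NNNNGGG"]
assigned, unassigned = demultiplex(reads, mids)
print(assigned)
print(len(unassigned), "read(s) not assigned to any MID")
```

The counts produced at this step correspond to the "total number of reads assigned to MIDs" entry in the checklist.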


There are multiple methodologies for library preparation and the chosen method should be reported. The three predominant strategies when using Illumina instrumentation are a one-step PCR-based, two-step PCR-based or a ligation-based approach; for a detailed overview of each strategy, see Taberlet et al. (2018) and Bohmann et al. (2022). Regardless of method, the main steps in library preparation include amplification of target NA and addition of required sequencing modifications to resulting amplicons. Different PCR chemistries combined with varying thermocycling conditions may produce significantly different results due to varying amplification biases, species drop-outs or off-target amplification (Gohl et al. 2016; Gold et al. 2023; Shelton et al. 2023) and, thus, must be documented. If a kit is used during library preparation, the name and manufacturer should be reported as well as any deviations from the manufacturer’s protocol.

For all PCR steps in library preparation, chemistries and conditions should be reported. These include: total reaction volume; final concentration of primers; concentrations of master mix components and polymerase; volume of any additives; and amount of template DNA. For master mixes and polymerases, the name and manufacturer should be included. The type of thermal cycler and manufacturer should be reported as well as thermal cycling conditions used. These conditions include the temperature and time at temperature for: the initial denaturation, denaturation, annealing, extension and final extension steps, as well as the number of cycles performed and if annealing temperature varied (e.g. touchdown PCR; Korbie and Mattick 2008).
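
The reported reaction composition can be derived directly from a bench recipe. The sketch below computes the final concentration of each component (stock concentration × volume added / final reaction volume) and the volume of water needed to reach the final volume. The function name, the recipe and all numerical values are hypothetical and serve only as a worked example.

```python
def reaction_summary(final_volume_ul, components, template_ul):
    """Summarise a PCR recipe for reporting.

    components: {name: (stock_concentration, volume_added_ul, unit)}
    Returns (water_ul, {name: (final_concentration, unit)}).
    """
    used_ul = template_ul + sum(vol for _, vol, _ in components.values())
    water_ul = final_volume_ul - used_ul
    if water_ul < 0:
        raise ValueError("component volumes exceed the final reaction volume")
    # Dilution: final concentration = stock concentration * volume / final volume
    final = {
        name: (stock * vol / final_volume_ul, unit)
        for name, (stock, vol, unit) in components.items()
    }
    return water_ul, final


# Hypothetical 25 ul reaction.
recipe = {
    "master mix": (2.0, 12.5, "X"),
    "forward primer": (10.0, 1.0, "uM"),
    "reverse primer": (10.0, 1.0, "uM"),
}
water_ul, final = reaction_summary(25.0, recipe, template_ul=2.0)
print(f"water: {water_ul} ul")
for name, (conc, unit) in final.items():
    print(f"{name}: {conc} {unit}")
```

For this hypothetical recipe, the master mix is at 1X, each primer at 0.4 µM and 8.5 µl of water is added, which are the kinds of values the checklist asks authors to state.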

Studies indicate multiple PCR replicates of the same sample are beneficial for environmental metabarcoding studies as they can influence species detection and richness estimates (Ficetola et al. 2015; Alberdi et al. 2018; Shirazi et al. 2021; Van den Bulcke et al. 2021). Other studies indicate that pooling multiple samples (biological replicates) may be more effective at increasing species detection and richness (Beentjes et al. 2019; Macher et al. 2021; Stauffer et al. 2021). The use of such technical or biological replicates should be described as well as whether they are indexed separately (i.e. given different MIDs) or pooled.

Controls should be included and reported in library preparation to assess potential for cross contamination of samples during all PCR steps. Using negative PCR controls allows the user to identify any laboratory contamination and positive PCR controls verify the assay is working. Mock community samples consist of known concentrations of DNA from multiple species and can provide information on amplification efficacy and bias as well as any contamination during library preparation and sequencing (Hänfling et al. 2016; Parada et al. 2016; Yeh et al. 2018; Marinchel et al. 2023).

Reverse transcription (additional step for eRNA)

For eRNA metabarcoding studies, reporting on steps from sample collection through sequencing and data reporting is the same as for eDNA; however, eRNA requires a couple of additional steps in the library preparation phase. Specifically, conversion of RNA to complementary DNA (cDNA) by reverse transcription and post-extraction enrichment of specific RNAs should be reported (Bustin and Nolan 2004; Zhao et al. 2014; Telzrow et al. 2021). For the reverse transcription reaction, the same details as described above for PCR should be reported. A concern for eRNA studies is the presence of genomic DNA contamination (see Li et al. (2022)); therefore, it is important to report detection of contaminating genomic DNA and mitigation steps taken (Laurell et al. 2012; Padhi et al. 2016; Hashemipetroudi et al. 2018; Verwilt et al. 2020).


Throughout the library preparation process, sample visualisation is often done to assess the size, distribution and quantity of PCR products, and template clean-ups are often performed to remove unwanted components. Authors should report methods for DNA product verification as well as where in the process it was done and the type, model and manufacturer of instrumentation used.

One of the primary functions of post-PCR library preparation is to ensure amplified PCR products represent mostly the target sequences; this is done by PCR clean-up and size selection approaches. Size selection can remove non-target DNA products, primer dimers and leftover primers (Zizka et al. 2019). Size selection is important because long metabarcoding primers can be prone to forming dimers (Peng et al. 2015) and assays may amplify non-target DNA (Collins et al. 2019). Without it, library normalisation can be biased because the final concentration is normalised to non-target sequences, which can prevent detection of the targeted taxonomic group and create challenges for sequencing (Alberdi et al. 2018). There are several approaches for conducting size selection, with some of the most common including the use of magnetic beads, Pippin Prep Instrumentation (Sage Science Inc., Beverly, Massachusetts, USA) and agarose gel extraction. If size selection is performed, authors should report the approach used, associated protocols, settings and parameters and manufacturers of instrumentation used. If additional PCR clean-ups are conducted, information on the methods or kits should be reported.

Library normalisation

A typical last step before sequencing a metabarcoding library is to normalise the concentration of each sample so each is represented equally in the final pooled library sequenced and will have similar read depths (Rohland and Reich 2012). Note, however, that negative control samples included in the library will have low or no quantifiable DNA and, thus, cannot be added at equal concentration (see Bruce et al. (2021) for more detail). The two main approaches used for normalisation include: 1) using DNA binding/magnetic beads to normalise and purify amplicons to a single concentration without the need to quantify; or 2) quantifying each sample and then diluting to the desired volume. Quantification can be achieved with a number of methods (see Bruce et al. (2021)). It is important to report both the normalisation method used and how sample concentration was estimated, if applicable.
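The quantify-and-dilute approach can be illustrated with the standard dilution relation C1·V1 = C2·V2. The following is a minimal sketch only; the function name, units (ng/µL and µL) and the decision to use low-concentration samples undiluted are assumptions made for illustration, not a prescribed protocol.

```python
def dilution_volumes(conc_ng_per_ul, target_conc_ng_per_ul, final_volume_ul):
    """Return (sample_volume_ul, diluent_volume_ul) using C1 * V1 = C2 * V2.

    Hypothetical helper: names and units are illustrative assumptions.
    """
    if conc_ng_per_ul <= 0:
        # Negative controls often have no quantifiable DNA and cannot be
        # normalised by dilution; they need a separately reported rule.
        raise ValueError("concentration must be positive")
    if conc_ng_per_ul < target_conc_ng_per_ul:
        # Sample is already below the target concentration: use it undiluted.
        return final_volume_ul, 0.0
    sample_volume = target_conc_ng_per_ul * final_volume_ul / conc_ng_per_ul
    return sample_volume, final_volume_ul - sample_volume


# A 20 ng/uL library diluted to 4 ng/uL in a 10 uL final volume.
print(dilution_volumes(20.0, 4.0, 10.0))  # (2.0, 8.0)
```

Whatever rule is applied to samples below the target concentration (including negative controls) is exactly the kind of detail that should be stated alongside the quantification method.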


Sequencing-by-synthesis (SBS) instruments have become the dominant instrumentation for amplicon sequencing (Bohmann et al. 2022). Platform choice depends on the number of samples, desired read depth per sample and amplicon length. Although Illumina is the most commonly used sequencing platform as of the time of writing, other technologies are appearing on the market. New platforms from Element Biosciences (San Diego, California, USA) and Singular Genomics (San Diego, California, USA) are designed to compete with Illumina NextSeq output levels. Long-read sequencing platforms are also available and have grown in popularity (e.g. Pacific Biosciences, Menlo Park, California, USA; Oxford Nanopore Technologies, Oxford, United Kingdom).

When using short-read platforms, including Illumina and Element G4 instruments, manufacturers recommend spiking in a PhiX sequencing control (Illumina, Inc., San Diego, California, USA). For ‘low diversity’ libraries (such as those associated with amplicon metabarcoding), PhiX can improve sequencing quality control and provide a measure of overall run performance. If using PhiX, the percentage concentration used should be reported.

If sequencing is outsourced, information from the external facility on what quality control is performed should be obtained before releasing data. For example, most facilities will demultiplex sequence data into sample-specific files and remove sequencing adapter sequences; they may also remove PhiX reads. As with all automated laboratory steps, the instrument platform, sequencing chemistry kit and sequencing quality control steps should be reported.

Reference database and bioinformatics

The process by which raw sequence data are converted to taxon observations and/or counts has many steps, each of which can impact results (e.g. Furlan et al. 2020; Brandt et al. 2021; De Wolfe and Wright 2023). There are currently dozens of bioinformatic tools and pipelines available (see Hakimzadeh et al. 2023 for review) and, while no pipeline is considered ‘best’, users should report the bioinformatic programmes, their versions and the steps run within each to provide transparency about bioinformatic processing. We strongly encourage full reporting of bioinformatic approaches used to process and analyse eNA sequences including, but not limited to, software, scripts, parameters, configuration files and variable files to allow for the repeatability of the bioinformatic analyses. Deiner et al. (2017) explain and review these various bioinformatic processes in detail, which we also outline below.

Database source and/or curation

The comprehensiveness (taxonomic breadth) and curation of reference sequence databases (Richardson et al. 2020; Gold et al. 2021; Dziedzic et al. 2023; Jeunen et al. 2023) are vital to the accurate taxonomic assignment of metabarcoding sequences (Keck et al. 2023). Thus, it is critical to include detailed information on the generation and curation of custom databases used to ensure reproducibility of the taxonomic classification (Curd et al. 2024). Such details should include: where reference sequences were acquired (GenBank, BOLD etc.); steps used to select only the locus of interest (e.g. in silico PCR, key word search etc.); curation methodology applied (e.g. geographic, lowest common ancestor, dereplication etc.); and a link to a data repository containing the custom reference databases used. If a custom reference database containing sequences not publicly available was used, researchers should clearly indicate this. Alternatively, many studies use large and minimally curated databases such as GenBank (Benson et al. 2013). In either case, the method, database and any database curation should be reported.

Bioinformatic processing

Multiple steps are part of the bioinformatic workflow used to convert raw sequence data into biologically analysable data (i.e. a table of sequence read numbers and associated taxonomic assignments). These steps are outlined below and include: Primer removal, Sequence quality control, Read merging, Chimera removal, OTU or ASV creation, Taxonomic assignment, Additional data filtering, and Normalisation of read data. Software workflows that encompass multiple steps exist and include MOTHUR (Schloss et al. 2009), DADA2 (Callahan et al. 2016), QIIME2 (Bolyen et al. 2019), DNAFLOW (Mousavi-Derazmahalleh et al. 2021), iMETA (Liu et al. 2023) and REVAMP (McAllister et al. 2023). If such a programme is used, the version and input parameters for each step should be reported.

Primer removal

Reads produced by sequencing platforms have fixed start positions defined by primers and include the gene region of interest. Failing to remove primers may interfere with taxonomic assignments. Some commonly used programmes for this include CUTADAPT (Martin 2011), TRIMMOMATIC (Bolger et al. 2014) and ATROPOS (Didion et al. 2017). The name and version of the programme, along with parameters and thresholds used, should be reported for all primer removal and trimming steps.
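To make the reportable parameters concrete, the sketch below removes a 5′ primer that may contain IUPAC degenerate bases. It is an illustration only: it requires an exact match at the read start, whereas dedicated programmes such as those named above tolerate mismatches and handle additional cases. The function name and the example primer are hypothetical.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def trim_5prime_primer(read, primer):
    """Return the read without its 5' primer, or None if the primer is absent.

    Assumption: the primer must match exactly (allowing only its own
    degenerate positions) and must start at the first base of the read.
    """
    pattern = "".join(IUPAC[base] for base in primer.upper())
    match = re.match(pattern, read.upper())
    if match is None:
        return None  # reads lacking the primer are discarded by the caller
    return read[match.end():]


# Hypothetical 7-base degenerate primer, used only for illustration.
print(trim_5prime_primer("ACGTGCTTTT", "ACGTRYN"))  # TTT
print(trim_5prime_primer("TTTTGCTTTT", "ACGTRYN"))  # None
```

Even in this small example, the outcome depends on choices (exact versus tolerant matching, whether untrimmed reads are kept) that a reader could not infer unless they are reported.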

Sequence quality control

After primer removal, sequence reads can be filtered initially based on minimum Q-score values, which are generated by the sequencing instrument for every nucleotide. Bioinformatic programmes like CUTADAPT and TRIMMOMATIC can be used to set quality score thresholds. The quality control programme used and associated parameters should be reported.
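As a brief illustration of what a Q-score threshold means, the sketch below decodes a FASTQ quality string and applies two common styles of filter: a minimum per-base Q-score and a maximum number of expected errors. It assumes the Phred+33 encoding; the function names and default thresholds are illustrative assumptions, not recommendations.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Q-scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def passes_min_quality(quality_string, min_q=20, offset=33):
    """True if every base meets the minimum Q-score."""
    return all(q >= min_q for q in phred_scores(quality_string, offset))


def expected_errors(quality_string, offset=33):
    """Sum of per-base error probabilities, where P(error) = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores(quality_string, offset))


print(phred_scores("I5+"))        # [40, 20, 10]
print(passes_min_quality("II5"))  # True
print(passes_min_quality("I5+"))  # False
```

The two filters can retain different reads from the same run, which is why both the programme and the threshold type should be stated.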

Read merging

When paired-end sequencing is performed, forward and reverse reads can be merged to generate a complete amplicon sequence. Off-target amplification and unremoved primer dimers can result in amplicons with different lengths than expected, which can be bioinformatically filtered out by setting a length threshold. Any reads not passing the threshold will subsequently fail to merge and be removed from the dataset; thus, read type (e.g. forward, reverse, unmerged paired, merged paired), software and parameters used (e.g. minimum nucleotide overlap or number of mismatches allowed) should be reported.
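The role of the overlap and mismatch parameters can be seen in the following simplified sketch of paired-end merging. It is not how any particular merging programme works: it ignores quality scores, keeps the forward read's bases in the overlap and accepts the longest qualifying overlap. All names and defaults are assumptions for illustration.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(forward, reverse, min_overlap=10, max_mismatches=0):
    """Merge a read pair on the longest acceptable overlap, else return None."""
    rev_rc = reverse_complement(reverse)
    longest = min(len(forward), len(rev_rc))
    # Try the longest candidate overlap first and shrink towards min_overlap.
    for overlap in range(longest, min_overlap - 1, -1):
        tail = forward[-overlap:]
        head = rev_rc[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatches:
            # Forward-read bases are kept in the overlapping region.
            return forward + rev_rc[overlap:]
    return None  # pair fails to merge and would be dropped


amplicon = "ACGTTGCAAGGCTTAACCGGTATCG"           # made-up 25 nt amplicon
fwd = amplicon[:18]                               # forward read
rev = reverse_complement(amplicon[-18:])          # reverse read
print(merge_pair(fwd, rev, min_overlap=5) == amplicon)  # True
```

With `min_overlap` raised above 11 in this example the same pair would fail to merge, showing how an unreported parameter can change which reads reach downstream analysis.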

Chimera removal

Chimeras form during PCR and occur when two different parent DNA strands anneal together to create PCR artefacts that do not exist in nature. Chimeras can be difficult to identify (Ashelford et al. 2005) and may result in inaccurate estimates of diversity. Not all chimera products can be removed by size selection; however, many bioinformatic programmes and pipelines can identify chimeras using either de novo or reference-based methods. The programme used for chimera removal and cut-off scores should be reported.

OTU or ASV creation

Operational taxonomic units (OTUs) or amplicon sequence variants (ASVs) are created from sequencing reads, which reduces the overall size of datasets and computational power needed to analyse them (Schloss and Westcott 2011; Callahan et al. 2016). OTUs are created by clustering sequences with similar identity (using a threshold chosen by the user, often at 95–99%) (Westcott and Schloss 2015). The creation of ASVs uses denoising methods (rather than dissimilarity thresholds) that detect and remove sequencing errors and distinguish sequence variants differing by as little as one nucleotide (Callahan et al. 2017). The choice of ASVs vs. OTUs should be reported as it can have a significant impact on alpha and beta diversity metrics (Chiarello et al. 2022). For an in-depth review of OTU clustering methods, see Westcott and Schloss (2015).
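The effect of the clustering threshold can be shown with a toy greedy, abundance-sorted clustering routine. This is a deliberately simplified sketch: identity is computed position by position and therefore assumes equal-length sequences, and each sequence joins the first centroid that meets the threshold. Real clustering programmes use pairwise alignment and other heuristics; the function names are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length sequences."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(abundances, threshold=0.97):
    """Cluster sequences into OTUs.

    abundances: dict mapping sequence -> read count.
    Returns a dict mapping each centroid sequence -> total reads in its OTU.
    """
    otus = {}
    # Most abundant sequences are considered first and become centroids.
    ordered = sorted(abundances.items(), key=lambda kv: (-kv[1], kv[0]))
    for seq, count in ordered:
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += count
                break
        else:
            otus[seq] = count  # no centroid was similar enough
    return otus


reads = {
    "ACGTACGTACGTACGTACGT": 100,
    "ACGTACGTACGTACGTACGA": 10,   # one mismatch in 20 bases (95% identity)
    "TTTTTTTTTTTTTTTTTTTT": 5,
}
print(len(greedy_otus(reads, threshold=0.95)))  # 2
print(len(greedy_otus(reads, threshold=0.97)))  # 3
```

The same reads yield two OTUs at 95% and three at 97%, so richness estimates are not comparable across studies unless the threshold and method are reported.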

Taxonomic assignment

There are many methods to assign taxonomy to OTUs/ASVs. The most well-known and longest supported method is the sequence similarity-based top BLAST hit approach (Altschul et al. 1997). This uses both global and local alignments to directly compare query sequences to sequences in a reference database. Another method, sequence composition-based, assigns taxonomy based on k-mer frequencies in the query and reference database sequences. The most widely used sequence composition classifier is the Ribosomal Database Project (RDP) classifier (Wang et al. 2007), a naïve Bayes-based classifier. The RDP classifier can be trained for any marker and reference sets already exist for the COI animal barcode marker as well as the prokaryote 16S, fungal ITS and LSU rDNA regions (Wang et al. 2007; Cole et al. 2009; Liu et al. 2012; Porter et al. 2014; Porter and Hajibabaei 2018). Other methods of taxonomy assignment include phylogeny-based methods (Pipes and Nielsen 2024) and probabilistic-based methods such as multinomial regression (Somervuo et al. 2016). See Porter and Hajibabaei (2018) and Hleap et al. (2021) for a list of software programmes and methods commonly used.

Taxonomic assignment of some reads may not be resolvable to species level for a number of reasons. When percentage identity parameters fall below a user-defined threshold or if the reference database is sparsely populated, a single OTU/ASV might be assigned to multiple species. Furthermore, higher taxonomic levels such as genus or family may be targeted for assignment rather than the species level. When reporting on taxonomy assignment, include methods used, threshold parameters, the reference database(s) and the taxonomic ranks assigned to each OTU/ASV. For more information on taxonomic assignment, we refer readers to Hleap et al. (2021) and Keck et al. (2023) for best practices and how to handle problems that can arise in this step.
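One common way to handle an OTU/ASV that matches several species equally well is to report the lowest common ancestor (LCA) of the matching references. The sketch below shows the idea under simple assumptions: each lineage is a tuple ordered from the highest rank to the lowest, all lineages use the same ranks, and the function name is hypothetical.

```python
def lowest_common_ancestor(lineages):
    """Return the shared leading part of several lineages as a tuple.

    lineages: list of tuples ordered from highest rank to lowest,
    e.g. (family, genus, species). Assumes identical rank structure.
    """
    if not lineages:
        return ()
    shared = []
    for names in zip(*lineages):
        if len(set(names)) != 1:
            break  # lineages diverge at this rank
        shared.append(names[0])
    return tuple(shared)


hits = [
    ("Salmonidae", "Salmo", "Salmo salar"),
    ("Salmonidae", "Salmo", "Salmo trutta"),
]
print(lowest_common_ancestor(hits))  # ('Salmonidae', 'Salmo')

hits.append(("Salmonidae", "Oncorhynchus", "Oncorhynchus mykiss"))
print(lowest_common_ancestor(hits))  # ('Salmonidae',)
```

Which hits are passed to such a routine depends on the identity threshold and reference database, so the assigned rank is only interpretable when those are reported.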

Additional data filtering

Bioinformatic decontamination, not to be confused with laboratory decontamination, conducts post-processing quality control of OTU/ASV tables. No consensus exists on how to filter data using sequenced PCR controls. This process may employ a range of steps, including cut-offs based on sample sequencing read depth, OTU/ASV prevalence across samples, OTU/ASV proportional abundance and reads identified in the positive and negative PCR controls (De Barba et al. 2014; Hänfling et al. 2016; Kelly et al. 2018; Gold et al. 2022). Any methods, thresholds and justification for removal of reads, OTU/ASVs or samples should be reported.
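Because no consensus method exists, the following sketch shows only one of several possible rules: for each OTU/ASV, subtract the maximum read count observed in any negative control from every field sample. The table layout (nested dictionaries), the function name and the rule itself are assumptions chosen for illustration.

```python
def subtract_negative_controls(table, negative_controls):
    """Subtract per-OTU maximum negative-control reads from each sample.

    table: dict mapping sample name -> dict mapping OTU/ASV id -> reads.
    negative_controls: names of the samples in `table` that are controls.
    Returns a new table without the control samples; counts floor at zero.
    """
    max_in_controls = {}
    for control in negative_controls:
        for otu, reads in table[control].items():
            max_in_controls[otu] = max(max_in_controls.get(otu, 0), reads)

    cleaned = {}
    for sample, counts in table.items():
        if sample in negative_controls:
            continue
        cleaned[sample] = {
            otu: max(reads - max_in_controls.get(otu, 0), 0)
            for otu, reads in counts.items()
        }
    return cleaned


table = {
    "site1": {"otu1": 500, "otu2": 3},
    "site2": {"otu1": 40, "otu2": 0},
    "neg1": {"otu1": 5, "otu2": 4},
}
print(subtract_negative_controls(table, ["neg1"]))
# {'site1': {'otu1': 495, 'otu2': 0}, 'site2': {'otu1': 35, 'otu2': 0}}
```

A different rule (for example, removing any OTU/ASV detected in a control) would give a different table from the same data, which is why the method, thresholds and justification need to be stated.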

Normalisation of read data

Normalisation of read data is the process by which reads are scaled or transformed to allow more accurate comparisons across samples. Normalising high throughput sequencing read data can help account for biases that occur during PCR and sequencing (Weiss et al. 2017). Given the compositional nature of metabarcoding, the number of reads per sample may be highly variable within a sequencing run and not representative of true biological variation (Gloor et al. 2017; Willis 2019; Schloss 2024). There are many ways of normalising read data (e.g. relative abundance, eDNA index, rarefaction, Hellinger transformation) and authors may choose to employ different normalisation steps across different analyses. Therefore, specific normalisation methods and programme versions should be reported.
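Three of the transformations named above can be sketched in a few lines. The definitions used here are assumptions stated for clarity: relative abundance is reads divided by the sample total, the Hellinger transformation is the square root of relative abundance, and the eDNA index is taken to be each taxon's proportion in a sample divided by that taxon's maximum proportion across samples. Function names and the table layout are hypothetical.

```python
import math


def relative_abundance(counts):
    """Scale one sample's read counts to proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: reads / total for taxon, reads in counts.items()}


def hellinger(counts):
    """Square root of relative abundance for one sample."""
    return {t: math.sqrt(p) for t, p in relative_abundance(counts).items()}


def edna_index(table):
    """Per-taxon proportion scaled by its maximum proportion across samples.

    table: dict mapping sample -> dict mapping taxon -> reads.
    """
    proportions = {s: relative_abundance(c) for s, c in table.items()}
    taxa = sorted({t for c in table.values() for t in c})
    max_prop = {t: max(p.get(t, 0.0) for p in proportions.values()) for t in taxa}
    return {
        s: {t: (p.get(t, 0.0) / max_prop[t] if max_prop[t] > 0 else 0.0)
            for t in taxa}
        for s, p in proportions.items()
    }


table = {"s1": {"a": 75, "b": 25}, "s2": {"a": 25, "b": 75}}
print(relative_abundance(table["s1"]))  # {'a': 0.75, 'b': 0.25}
print(hellinger(table["s2"])["a"])      # 0.5
print(edna_index(table)["s1"]["a"])     # 1.0
```

Each transformation places the same counts on a different scale, so results from different analyses are only comparable when the method applied to each is named.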

Sequencing summary statistics

Initial summary metrics from a sequencing run are often unreported, but are helpful to evaluate sequencing efficiency and interpret metabarcoding results. They also indicate whether library preparation methods were highly effective or if changes might be necessary for future studies. Relevant summary metrics to report are: total number of raw reads; number or percentage of reads assigned to MIDs; total number or percentage of reads that passed bioinformatic filter thresholds (i.e. quality control, trimming and merging); number or percentage of reads assigned to taxonomy; the taxonomic level of assignments (i.e. species level, family level, phylum level); total number or percentage of reads unassigned to taxonomy; and total number of reads used in the final analysis or any subsequent analyses. Additionally, reporting per-sample metrics such as average number of reads per sample and minimum and maximum number of reads per sample allows an evaluation of both sequencing depth and evenness across samples.
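The per-sample metrics listed above are straightforward to compute once read counts per sample are available. The sketch below is illustrative; the input layout (a dictionary of sample name to read count) and the function names are assumptions.

```python
def per_sample_read_summary(reads_per_sample):
    """Summarise sequencing depth and evenness across samples.

    reads_per_sample: dict mapping sample name -> number of reads.
    """
    counts = list(reads_per_sample.values())
    if not counts:
        raise ValueError("no samples supplied")
    return {
        "samples": len(counts),
        "total_reads": sum(counts),
        "mean_reads": sum(counts) / len(counts),
        "min_reads": min(counts),
        "max_reads": max(counts),
    }


def percent_retained(reads_in, reads_out):
    """Percentage of reads remaining after a processing step."""
    return 100.0 * reads_out / reads_in


summary = per_sample_read_summary({"s1": 1000, "s2": 3000, "s3": 2000})
print(summary["mean_reads"], summary["min_reads"], summary["max_reads"])
# 2000.0 1000 3000
print(percent_retained(6000, 4500))  # 75.0
```

Computing `percent_retained` after each bioinformatic step gives the percentages of reads passing quality control, merging and taxonomic assignment in a form that can be tabulated directly.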

The practice of sequencing control samples (both negative and positive) is highly recommended in the literature as deviations from expected results may indicate issues that should be addressed or accounted for either computationally or in data interpretation (Bell et al. 2017; Zinger et al. 2019; van der Loos and Nijland 2020). Reporting results from sequencing of negative controls provides transparency about potential contamination in sample processing. Likewise, read numbers from positive controls or mock communities can validate laboratory and bioinformatic procedures, as well as indicate possible false positive detections in field samples (Hänfling et al. 2016; Hleap et al. 2021). As discussed in the General Quality Control Reporting section, we acknowledge that the types and numbers of control samples taken depend on a study’s goals and constraints; regardless, reporting on the use of controls, sequencing results from controls and how these reads were accounted for in the bioinformatic decontamination step provide valuable quality control information on a study.

Data archiving and availability

Alongside peer-reviewed publications, data sharing is a core pillar of the sciences. Raw sequence data should be deposited in the International Nucleotide Sequence Database Collaboration (INSDC) nucleotide sequence archives (e.g. NCBI, ENA, DDBJ). Data and metadata should adhere to established standards such as the Minimum Information about a MARKer gene Sequence (MIMARKS) and Minimum Information about any (x) Sequence (MIxS) specifications developed by the Genomic Standards Consortium (Yilmaz et al. 2011). The use of MIxS checklists significantly improves the availability of methods and metadata and consistency of vocabulary in archived datasets, allowing interoperability and reusability for future studies (Hassenrück et al. 2021). Processed data can also be archived using similar data repository structures (e.g. Dryad, Zenodo, Figshare) to ensure all analytical steps, especially taxonomic assignment, can be reproduced (Deiner et al. 2017). Such efforts are becoming standard in scientific publication with the proliferation of open-source code sharing platforms (e.g. GitLab, SourceForge, GitHub).

Although significant bottlenecks to achieving completely FAIR metabarcoding data practices remain, at minimum, all data (i.e. OTU/ASV table, sequencing data, metadata), methods and code used to generate, analyse and interpret metabarcoding data should be provided to open-access repositories to support open science principles and enhance trust and reproducibility. Metabarcoding studies generate valuable biodiversity data and FAIR practices will allow biodiversity monitoring at increased speeds and scales, which is needed given current global biodiversity loss. All efforts should be made to adhere to the growing consensus to serve biodiversity data in large international biodiversity repositories (e.g. Global Biodiversity Information Facility (GBIF Data Standards) and Ocean Biodiversity Information System (OBIS)). In particular, we point users to Djurhuus et al. (2017), Abarenkov et al. (2023) and Silliman et al. (2023), whose articles provide all data needed to fully reproduce the study and ensure resultant biodiversity datasets follow FAIR principles. For an overview and examples of where and how to archive different datasets and metadata, see the NOAA Omics Study Data Management Guide (2024). For downloadable eDNA survey data templates, see NOAA Omics Study Data Templates (2024) that incorporate MIMARKS criteria (Yilmaz et al. 2011) and Better Biomolecular Ocean Practices protocol templates (BeBOP-OBON 2024), based on Minimum Information about an Omics Protocol (MIOP, Samuel et al. 2021).


The advent of DNA metabarcoding has transformed our ability to census and assess biological communities. With this new capacity for generating biological data at increasing sensitivity and scale comes a deluge of environmental DNA research datasets; hence, it is important that we pause and take stock of what minimum metadata should accompany environmental metabarcoding publications. Here, we identified a suite of sampling, analytical and data archiving information that should be included in publications to meet FAIR data standards and provide context for eNA results to be repeatable and interpretable. We recommend authors report these in the manuscript, supplemental materials or online resources linked to the publication (e.g. GitHub). This is crucial for the use and reuse of eNA data in global scale biomonitoring efforts (Berry et al. 2021; Chavez et al. 2021). Furthermore, as eNA metabarcoding methods become more routinely adopted by experts and non-experts alike, users must be able to adequately evaluate and communicate methods and data.

We recognise that the generation and curation of metabarcoding data is time- and labour-intensive and that analyses require substantial computational resources and bioinformatic expertise. This can severely limit the ability of the metabarcoding community to process data quickly and efficiently into actionable biodiversity information (Shea et al. 2023), which only adds emphasis to the need addressed in this study to maximise the usefulness of all metabarcoding datasets generated by ensuring complete and transparent reporting. The MIEM guidelines provided here on minimum information for reporting of environmental metabarcoding data parallel several similar publications by the Genomic Standards Consortium on minimum information requirements for various types of genomic data (e.g. Bowers et al. 2017; Samuel et al. 2021). Future work could support the development of additional resources to ensure truly FAIR metabarcoding data, including: 1) programmatic tools to facilitate ease of data management; 2) international metabarcoding standards via the appropriate International Organization for Standardization committee (i.e. ISO/TC 147/SC 5/WG 13); and 3) consensus on best practices for data, methods and software archiving, linking and sharing. As the field continues to develop and rapidly advance, these proposed minimum reporting guidelines may be refined or updated with additional parameters. This enhanced reporting will allow for improved assessment of eNA studies during peer-review and interpretation of information for natural resource decisions.


The authors thank their respective research groups, collaborators and the eDNA community whose collective experiences led to the idea for this manuscript. We also thank Dr. Freya Rowland for assistance with figures and Dr. Brittany Perrotta for helpful comments on this paper. Any use of trade, firm or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. The findings and conclusions in this article are those of the authors and do not necessarily represent the views of the U.S. Fish and Wildlife Service. This publication represents NOAA Pacific Marine Environmental Laboratory Contribution No. 5641.

No funding was reported.

Author contributions

Conceptualization: Katy Klymus, Zachary Gold, Susanna Theroux. Visualization: Jacoby Baker, Katy Klymus. Writing - original draft: Katy Klymus, Jacoby Baker, Cathryn Abbott, Rachel Brown, Joseph Craine, Zachary Gold, Margaret Hunter, Mark Johnson, Devin Jones, Michelle Jungbluth, Sean Jungbluth, Yer Lor, Aaron Maloy, Christopher Merkes, Rachel Noble, Nastassia Patin, Adam J Sepulveda, Stephen Spear, Joshua Steele, Miwa Takahashi, Alison Watts, Susanna Theroux. Writing - review & editing: Susanna Theroux, Cathryn Abbott, Katy Klymus. Supervision/Administration: Katy Klymus, Susanna Theroux.


