Review Article
Corresponding author: Wiebke Sickel ( wiebke.sickel@thuenen.de ) Academic editor: Alfried Vogler
© 2023 Wiebke Sickel, Vera Zizka, Alice Scherges, Sarah J. Bourlat, Petra Dieker.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Sickel W, Zizka V, Scherges A, Bourlat SJ, Dieker P (2023) Abundance estimation with DNA metabarcoding – recent advancements for terrestrial arthropods. Metabarcoding and Metagenomics 7: e112290. https://doi.org/10.3897/mbmg.7.112290
Biodiversity is declining at alarming rates worldwide and large-scale monitoring is urgently needed to understand changes and their drivers. While classical taxonomic identification of species is time- and labour-intensive, combining it with DNA-based methods could scale up monitoring activities to achieve larger spatial coverage and increased sampling effort. However, challenges remain for DNA-based methods when the number of individuals per species and/or biomass estimates are required. Several methodological advancements aim to improve the potential of DNA metabarcoding for abundance analysis, but they still need further evaluation. Here, we discuss laboratory and some bioinformatic adjustments to DNA metabarcoding workflows with regard to their potential for estimating species abundance from arthropod community samples. Our review includes pre-laboratory processing methods such as specimen photography, laboratory methods such as the use of spike-in DNA as an internal standard, and bioinformatic advancements such as correction factors. We conclude that specimen photography coupled with DNA metabarcoding currently holds the greatest potential for estimating the number of individuals per species and biomass, but that approaches such as spike-ins and correction factors are promising methods worth pursuing further.
abundance, biodiversity monitoring, COI, insects, metabarcoding, spike-ins
Biodiversity is declining at alarming rates worldwide (
DNA-based approaches offer a promising alternative to arthropod diversity surveys and monitoring (
However, implementation in policy-mandated monitoring programmes is still hampered (
Several factors within the metabarcoding workflow affect extraction of abundance data (
A meta-analysis targeting 22 DNA metabarcoding studies revealed a weak relationship between biomass and generated read counts, with a large degree of uncertainty (
A variety of different approaches have emerged recently that can help improve abundance and biomass estimates from metabarcoding data, including species-specific correction factors applied to read counts, spike-ins, primer optimisation or multi-locus metabarcoding (e.g.
Here, we review potential methods that can improve abundance and biomass estimation in arthropod whole organism community (WOC) samples. Considering the variety of approaches and applications, we aim to formulate general recommendations for DNA metabarcoding workflows in arthropod monitoring. In addition, we explore approaches from metabarcoding studies targeting e.g. aquatic samples that have so far not been applied to terrestrial arthropods and their trophic interactions.
We performed an online literature search in Google Scholar and EBSCO Discovery Service on 17 January 2022 using the keywords [(quant*) AND (insect) AND (metabarcod*) AND (DNA)] and included only peer reviewed publications in English. Although the search term specifically targeted insects, we use the more general term “terrestrial arthropods” throughout the text. Additionally, some publications were added to the list based on the authors’ expertise.
We included studies that applied DNA metabarcoding to terrestrial arthropods as target organisms and/or in relation to their trophic interactions within ecosystems (e.g. pollination and food web studies), as these topics are strongly connected and play an important role in monitoring schemes (e.g. ecosystem services of pollination or natural pest control). Under these criteria, we included WOC and tissue samples, covering pollen, gut contents and faeces, as well as eDNA metabarcoding approaches such as extraction from soil and sample fixative. We excluded studies that applied individual-based DNA barcoding, next generation sequencing (NGS) barcoding, PCR-free approaches or long-read sequencing methodologies, as we wanted to focus specifically on metabarcoding. PCR-free approaches are, however, briefly discussed in an outlook section.
Based on 113 publications matching our search criteria (Suppl. material
| Category | Approach | Sample types | Quantitative? |
|---|---|---|---|
| Semi-quantitative metrics | FOO/POO | all | semi-quantitative |
|  | RRA | all | semi-quantitative |
|  | rarefaction | all | semi-quantitative |
|  | transformation | all | semi-quantitative |
| Reducing read abundance biases | correction factors via algorithm | WOC samples | yes; virtual specimen counts |
|  | correction factors via mock communities | all | yes |
|  | spike-ins | all | yes; relative abundance |
|  | primer optimisation | all | no |
|  | multi-locus metabarcoding | all | yes |
| Combination of methods | general | all; depending on approach | yes; depending on approach |
|  | photography and body measurements of single specimens | WOC samples | yes; number of individuals per species, biomass |
Abbreviations:
ASV amplicon sequence variant
ddPCR droplet digital PCR
eDNA environmental DNA
FOO / POO frequency of occurrence / percent of occurrence
NGS / HTS next generation sequencing / high-throughput sequencing
qPCR quantitative PCR
RRA relative read abundance
UMI unique molecular identifier
WOC samples whole organism community samples
Reviewing the literature, we identified three main methods to estimate species abundance with metabarcoding (Table
Semi-quantitative metrics. A ASV table as the outcome of a DNA metabarcoding experiment; rows are samples, columns are ASVs and numbers are raw read counts. Semi-quantitative metrics can be derived from the ASV table, e.g. frequency and percentage of occurrence, bipartite networks and relative read abundance. B Frequency and percentage of occurrence derived from the ASV table. Frequency of occurrence reduces the ASV table to presence/absence data, indicated by the presence or absence of a rectangle (left); summarised over all samples, percentage of occurrence can be an informative metric of abundance in a system (right). C Bipartite networks derived from the ASV table. Samples and ASVs are nodes and edges indicate the presence of an ASV in a sample (left); summarised over all samples, link strength can be an informative metric of abundance in a system (right). D Relative read abundance derived from the ASV table. Relative read abundance for an individual sample is determined by dividing the raw read count of each ASV by the total read count of that sample (left); summarised over all samples, mean relative read abundance can be an informative metric of abundance in a system (right). Abbreviations: S – sample, ASV – amplicon sequence variant, RRA – relative read abundance; ASVs are colour-coded and refer to ASVs from (A); artwork: Alice Scherges.
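The occurrence and read-abundance metrics in panels B and D can be computed directly from an ASV table. The Python sketch below uses a made-up three-sample table; all function names and numbers are hypothetical. POO is computed here following one common convention (occurrences of an ASV divided by the occurrences of all ASVs, so that POO sums to 100%), which may differ from the definition used in individual studies.

```python
# Toy ASV table (hypothetical numbers): sample -> {ASV: raw read count}
asv_table = {
    "S1": {"ASV1": 120, "ASV2": 0, "ASV3": 80},
    "S2": {"ASV1": 0, "ASV2": 50, "ASV3": 150},
    "S3": {"ASV1": 30, "ASV2": 0, "ASV3": 10},
}


def all_asvs(table):
    """Sorted list of every ASV that appears in the table."""
    return sorted({asv for counts in table.values() for asv in counts})


def frequency_of_occurrence(table, min_reads=1):
    """FOO: fraction of samples in which each ASV is present (>= min_reads)."""
    n_samples = len(table)
    return {
        asv: sum(counts.get(asv, 0) >= min_reads for counts in table.values()) / n_samples
        for asv in all_asvs(table)
    }


def percent_of_occurrence(table, min_reads=1):
    """POO: occurrences of each ASV divided by occurrences of all ASVs, in percent."""
    occurrences = {
        asv: sum(counts.get(asv, 0) >= min_reads for counts in table.values())
        for asv in all_asvs(table)
    }
    total = sum(occurrences.values())
    return {asv: 100.0 * n / total for asv, n in occurrences.items()}


def relative_read_abundance(table):
    """RRA: raw reads of each ASV divided by the total reads of its sample."""
    rra = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        rra[sample] = {asv: (n / total if total else 0.0) for asv, n in counts.items()}
    return rra


def mean_rra(table):
    """Mean RRA of each ASV across all samples."""
    rra = relative_read_abundance(table)
    return {
        asv: sum(sample.get(asv, 0.0) for sample in rra.values()) / len(rra)
        for asv in all_asvs(table)
    }


print(frequency_of_occurrence(asv_table))
print(percent_of_occurrence(asv_table))
print(mean_rra(asv_table))
```

The `min_reads` threshold is an illustrative parameter: in practice, a minimum read count is often required before an ASV is scored as present, and the choice directly affects FOO and POO.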
Reducing read abundance biases. A Processing mock communities (bottle) of defined composition allows taxon-specific correction factors to be determined, which can then be applied to correct the relative read abundance of samples with unknown composition, indicated by a red line. Correction factors can only be determined for taxa included in the mock community. B Correction factors can be determined using iterative algorithms and a guess-and-test approach based on a morphological reference data set (not shown). The correction factors can be applied to correct the relative read abundance of samples with unknown composition, indicated by a red line. Correction factors can only be determined for samples that show good agreement in terms of taxa detected between the reference and the DNA metabarcoding data set. C Adding spike-ins, e.g. a defined amount of genomic DNA, to all samples and co-amplifying and co-sequencing the reference material allows raw read counts to be corrected by simply dividing the read counts assigned to taxa (blue and brown bars) by the read counts assigned to the spike-in (red bars). Abbreviations: RRA – relative read abundance, S – sample; artwork: Alice Scherges.
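Panel A can be illustrated with a minimal Python sketch, assuming that the correction factor of a taxon is its observed relative read abundance in the mock community divided by its expected proportion (here, the proportion of input DNA; a mock defined by specimen counts or biomass would use those instead). The taxa, amounts and function names are hypothetical. Taxa absent from the mock community cannot be corrected and are reported separately, in line with the limitation stated in the caption.

```python
# Hypothetical mock community: equal DNA input (ng) per taxon and observed reads
mock_input_ng = {"TaxonA": 1.0, "TaxonB": 1.0, "TaxonC": 1.0}
mock_reads = {"TaxonA": 600, "TaxonB": 300, "TaxonC": 100}


def correction_factors(reads, expected):
    """Factor = observed relative read abundance / expected proportion in the mock."""
    total_reads = sum(reads.values())
    total_expected = sum(expected.values())
    return {
        taxon: (reads[taxon] / total_reads) / (expected[taxon] / total_expected)
        for taxon in expected
    }


def apply_correction(sample_reads, factors):
    """Divide reads by taxon-specific factors and renormalise to proportions.

    Taxa without a factor (not in the mock community) cannot be corrected;
    they are returned separately instead of being mixed into the result.
    """
    adjusted = {t: n / factors[t] for t, n in sample_reads.items() if t in factors}
    uncorrected = sorted(t for t in sample_reads if t not in factors)
    total = sum(adjusted.values())
    return {t: v / total for t, v in adjusted.items()}, uncorrected


cf = correction_factors(mock_reads, mock_input_ng)
sample = {"TaxonA": 900, "TaxonB": 450, "TaxonC": 150, "TaxonD": 40}
corrected, skipped = apply_correction(sample, cf)
print(cf)
print(corrected, skipped)
```

Here TaxonA is over-represented in the reads (factor about 1.8) and TaxonC under-represented (about 0.3), so a sample whose raw reads follow the same skew is corrected back to roughly equal proportions.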
DNA metabarcoding is widely used to assess presence/absence in complex sample mixtures. Whilst this can be informative for some ecological assessments, including biodiversity measures (e.g. alpha diversity), interaction analyses (e.g. multi-trophic networks, food web structures, plant-pollinator interactions) require some form of (semi-)quantitative data. There are different approaches to conduct semi-quantitative analysis of DNA metabarcoding data (Fig.
Mock community experiments have shown a positive correlation of read counts per species with genomic template DNA concentration in pollen and WOC samples (
It may be possible to extend correction factors to closely related taxa based on phylogenetic relatedness, assuming that related taxa show similar read count skews. In microbial analyses (
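A minimal Python sketch of this idea follows, assuming that taxonomic rank is used as a stand-in for phylogenetic relatedness and that a missing factor is borrowed as the geometric mean of the measured factors within the same genus, falling back to the family. The species names, taxonomy and fallback order are hypothetical; a real implementation might instead use distances on a phylogenetic tree.

```python
import math

# Hypothetical correction factors measured in a mock community
measured_cf = {"sp1": 2.0, "sp2": 0.5, "sp3": 4.0}

# Hypothetical taxonomy, used here as a stand-in for phylogenetic relatedness
lineage = {
    "sp1": {"genus": "G1", "family": "F1"},
    "sp2": {"genus": "G1", "family": "F1"},
    "sp3": {"genus": "G2", "family": "F1"},
    "spX": {"genus": "G1", "family": "F1"},
    "spY": {"genus": "G9", "family": "F1"},
    "spZ": {"genus": "G8", "family": "F2"},
}


def geometric_mean(values):
    """Geometric mean, suited to ratio-type correction factors."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def borrow_factor(taxon, lineage, measured, ranks=("genus", "family")):
    """Return (factor, source) for a taxon, borrowing from measured relatives.

    Walks up the given ranks and uses the geometric mean of all measured factors
    sharing that rank; returns (None, None) if no relative was measured.
    """
    if taxon in measured:
        return measured[taxon], "measured"
    for rank in ranks:
        group = lineage[taxon][rank]
        relatives = [cf for t, cf in measured.items() if lineage[t][rank] == group]
        if relatives:
            return geometric_mean(relatives), rank
    return None, None


for taxon in ("sp1", "spX", "spY", "spZ"):
    print(taxon, borrow_factor(taxon, lineage, measured_cf))
```

The geometric mean is used because correction factors are ratios: a factor of 2 and a factor of 0.5 are equally strong skews in opposite directions and should average to 1, not 1.25.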
Correction factors can also be calculated using an iterative algorithm which mitigates data skews due to copy number variations of the target gene (
The algorithm can only be applied to samples with high concordance between morphological and DNA-based taxonomies, but it is a promising approach, as the predicted numbers of individuals per species were highly correlated with actual count data (
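To make the guess-and-test principle concrete, the following Python sketch fits per-taxon correction factors against reference samples that have both read counts and morphological specimen counts. It is a toy multiplicative update written for illustration, not the published algorithm: starting from factors of 1, it predicts "virtual specimen counts" by sharing the known number of individuals in proportion to reads divided by factor, then raises the factor of over-predicted taxa and lowers that of under-predicted ones. All names and data are hypothetical.

```python
import math


def predict_counts(sample_reads, factors, n_individuals):
    """'Virtual specimen counts': share n_individuals in proportion to reads / factor."""
    weights = {t: n / factors[t] for t, n in sample_reads.items() if t in factors}
    total = sum(weights.values())
    return {t: n_individuals * w / total for t, w in weights.items()}


def fit_factors(reads, counts, n_iter=50):
    """Toy guess-and-test fit of per-taxon correction factors.

    reads, counts: parallel lists of {taxon: value} for reference samples that have
    both metabarcoding read counts and morphological specimen counts.
    """
    taxa = sorted({t for sample in counts for t in sample})
    factors = {t: 1.0 for t in taxa}  # initial guess: no read bias
    for _ in range(n_iter):
        log_ratios = {t: [] for t in taxa}
        for r, m in zip(reads, counts):
            predicted = predict_counts(r, factors, sum(m.values()))
            for t, observed in m.items():
                if observed > 0 and predicted.get(t, 0) > 0:
                    log_ratios[t].append(math.log(predicted[t] / observed))
        # test: over-predicted taxa get a larger factor, under-predicted a smaller one
        for t, lr in log_ratios.items():
            if lr:
                factors[t] *= math.exp(sum(lr) / len(lr))
        # factors are only identifiable up to a common scale; fix geometric mean at 1
        scale = math.exp(sum(math.log(f) for f in factors.values()) / len(factors))
        factors = {t: f / scale for t, f in factors.items()}
    return factors


# Hypothetical reference data: reads = specimens x a taxon-specific bias (4, 1, 0.25)
counts = [{"A": 10, "B": 5, "C": 20}, {"A": 2, "B": 8, "C": 4}]
reads = [{"A": 40, "B": 5, "C": 5}, {"A": 8, "B": 8, "C": 1}]
factors = fit_factors(reads, counts)
print(factors)
print(predict_counts(reads[0], factors, 35))
```

The sketch also shows why concordance between the two taxonomies matters: only taxa detected by both methods contribute to the fit, so poor agreement leaves factors unconstrained.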
Spike-ins (Fig.
The use of spike-ins is not restricted by sample type, but it slightly increases effort and costs, because spiking samples is an additional, albeit minimal, step in the laboratory workflow that then has to be integrated into the bioinformatic workflow. It should be noted that spike-in correction does not remove biases between species within a sample (
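The spike-in correction described for Fig. panel C (dividing taxon reads by spike-in reads) can be sketched in a few lines of Python. The sample values, the `SPIKE` label and the `spike_amount` scaling parameter are hypothetical.

```python
def spike_in_normalise(sample_reads, spike_id, spike_amount=1.0):
    """Express taxon reads in units of the spike-in (reads_taxon / reads_spike * amount)."""
    spike_reads = sample_reads.get(spike_id, 0)
    if spike_reads == 0:
        raise ValueError(f"no reads assigned to spike-in {spike_id!r}")
    return {
        taxon: spike_amount * n / spike_reads
        for taxon, n in sample_reads.items()
        if taxon != spike_id
    }


# Hypothetical samples with identical raw reads but different spike-in recovery
s1 = {"TaxonA": 1000, "TaxonB": 500, "SPIKE": 250}
s2 = {"TaxonA": 1000, "TaxonB": 500, "SPIKE": 1000}
print(spike_in_normalise(s1, "SPIKE"))  # {'TaxonA': 4.0, 'TaxonB': 2.0}
print(spike_in_normalise(s2, "SPIKE"))  # {'TaxonA': 1.0, 'TaxonB': 0.5}
```

Identical raw read counts translate into a fourfold difference between samples once scaled by the spike-in, which is what makes between-sample comparisons possible. The ratio of TaxonA to TaxonB, however, stays at 2 in both samples: as noted above, the spike-in does not remove amplification biases between species within a sample.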
A variety of studies have shown that primer design is essential to the success of DNA metabarcoding studies, both in terms of taxon recovery and read abundance biases (
Different genetic markers suffer from different taxonomic biases and thus some studies employ several different loci for the same organismal group, which is referred to as multi-marker (
Locus-specific biases can be mitigated by using rank order abundance or median-based proportional abundance summarised over all loci, as has been demonstrated in pollen DNA metabarcoding (
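Median-based proportional abundance across loci can be sketched in Python as follows. The assumptions are that a taxon missing from a locus counts as 0 at that locus, that the medians are renormalised to sum to 1, and that the rank order is read off the resulting consensus; the locus names and read counts are hypothetical, and published pipelines may handle these steps differently.

```python
from statistics import median


def proportions(reads):
    """Per-locus proportional abundance of each taxon."""
    total = sum(reads.values())
    return {taxon: n / total for taxon, n in reads.items()}


def median_proportional_abundance(loci):
    """Median of per-locus proportions, renormalised to sum to 1.

    loci: {locus: {taxon: reads}}. A taxon missing from a locus counts as 0 there,
    so a single locus that over- or under-amplifies a taxon has limited influence.
    """
    props = [proportions(reads) for reads in loci.values()]
    taxa = sorted({t for p in props for t in p})
    med = {t: median(p.get(t, 0.0) for p in props) for t in taxa}
    total = sum(med.values())
    return {t: v / total for t, v in med.items()}


def rank_order(abundance):
    """Taxa sorted from most to least abundant."""
    return sorted(abundance, key=abundance.get, reverse=True)


# Hypothetical read counts for three loci; locus2 strongly over-amplifies taxon B
loci = {
    "locus1": {"A": 500, "B": 300, "C": 200},
    "locus2": {"A": 100, "B": 600, "C": 300},
    "locus3": {"A": 450, "B": 350, "C": 200},
}
consensus = median_proportional_abundance(loci)
print(consensus)
print(rank_order(consensus))
```

With a mean instead of a median, the single deviating locus would pull taxon B ahead of taxon A; the median keeps the ordering supported by the majority of loci.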
For pollen samples, as no single universal plant barcode exists (CBOL Plant Working Group 2009;
Some studies combine DNA metabarcoding with other methodologies. In such designs, DNA metabarcoding may be used to obtain a comprehensive list of the detected taxa, whilst abundance estimates (e.g. number of individuals per species, DNA copy number) and/or biomass estimates are obtained with another methodology. One common example is the complementary morphological analysis of gut content remains, pollen grains or arthropod specimens (
One noteworthy approach of method combination is the photographic documentation of specimens from WOC samples before analysing them with DNA metabarcoding. This combined approach enables individual counts, body size measurements and thereby biomass estimation (
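Body length measured from specimen images is commonly converted to mass with a power-law length–mass relationship, mass = a × length^b. The Python sketch below assumes such a relationship and uses hypothetical coefficients per broad group (as might come from preliminary image-based identification); real analyses would take group-specific coefficients from published regressions.

```python
# Hypothetical length-mass coefficients per broad group (mass_mg = a * length_mm ** b);
# real values must come from published group-specific regressions.
COEFFICIENTS = {"Diptera": (0.1, 2.0), "Hymenoptera": (0.2, 2.0)}


def estimate_mass_mg(length_mm, a, b):
    """Power-law length-mass relationship."""
    return a * length_mm ** b


def summarise_specimens(specimens):
    """specimens: list of (broad_group, body_length_mm) measured from images.

    Returns {group: (number of individuals, summed estimated mass in mg)}.
    """
    summary = {}
    for group, length in specimens:
        a, b = COEFFICIENTS[group]
        n, mass = summary.get(group, (0, 0.0))
        summary[group] = (n + 1, mass + estimate_mass_mg(length, a, b))
    return summary


specimens = [("Diptera", 2.0), ("Diptera", 3.0), ("Hymenoptera", 5.0)]
print(summarise_specimens(specimens))
```

Counts and biomass obtained this way refer to broad groups only; species-level resolution would come from the subsequent metabarcoding of the pooled sample.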
The available literature has revealed that the majority of (terrestrial) arthropod DNA metabarcoding studies do not sufficiently address the matter of estimating the number of individuals per species and/or biomass (Suppl. material
Additionally, the collected literature focused on approaches that apply to the sample processing stage of metabarcoding workflows. Studies on the effect of bioinformatic and data analysis strategies on abundance and biomass estimation are strongly underrepresented (Suppl. material
As expected, there is a variety of adjustments aimed at improving abundance and biomass estimation via DNA metabarcoding (Suppl. material
Currently, the most promising approach is to combine DNA metabarcoding with specimen photography, which would ideally be automated (
Recommended workflow for biodiversity assessments with bulk samples and DNA metabarcoding that yields count and biomass data with species-level taxonomic identifications. A Specimens from a bulk sample (bottle) are first processed individually. B Processing includes specimen photography (camera), specimen counts (abacus), body size measurements (caliper) and biomass estimation (scales). Ideally, this is done automatically (green robot icon) and involves automated image recognition to obtain preliminary identifications at broad taxonomic levels. C Specimens are then recombined into a community sample, a spike-in is added and DNA is extracted (microcentrifuge tube). D DNA metabarcoding delivers species-level identifications and raw read counts (ASV table), which are corrected using the spike-in. E Image data are compiled into a taxon list containing count, size and biomass data (taxon list). F Image data and DNA metabarcoding data are combined using machine learning approaches (data assembly, orange robot icon) to obtain a data set that contains species-level identities along with count data and biomass estimates (taxa bubbles). Abbreviations: ASV – amplicon sequence variant; artwork: Alice Scherges.
Regardless of application or sample type, general recommendations for every metabarcoding workflow are to use appropriate positive controls, i.e. mock communities (
For eDNA, obtaining count data is extremely difficult. Since eDNA dynamics (
Additionally, (e)DNA-based analyses open up new avenues that move away from the traditional estimation of numbers of individuals per species or biomass. One such avenue worth pursuing is the more sensitive detection of parasitism and invasive species (
In the following, we explore selected approaches from the wider literature that were outside the scope of the present review but have high potential for future implementation in monitoring programmes. Novel data analysis pipelines are constantly being developed, and some focus on integrating uncertainties associated with the dynamics of DNA in the environment (
When grouping sequencing reads as ASVs instead of molecular operational taxonomic units, DNA metabarcoding can potentially deliver conservative abundance estimates in the sense of “minimum census estimates”, similar to those obtained from non-invasive sampling of hair and faeces (
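The "minimum census" idea can be sketched by counting the distinct ASVs (haplotypes) assigned to each species in a sample, assuming that every retained ASV stems from at least one different individual. Individuals sharing a haplotype make the estimate conservative, whereas residual sequencing errors or NUMTs would inflate it, which is why a read threshold or denoising step is needed. The threshold, ASV names and species assignments below are hypothetical.

```python
from collections import defaultdict


def minimum_census(asv_reads, asv_to_species, min_reads=1):
    """Number of distinct ASVs (haplotypes) per species as a minimum individual count.

    asv_reads: {ASV: reads} for one sample; asv_to_species: taxonomic assignment.
    """
    haplotypes = defaultdict(set)
    for asv, reads in asv_reads.items():
        if reads >= min_reads and asv in asv_to_species:
            haplotypes[asv_to_species[asv]].add(asv)
    return {species: len(asvs) for species, asvs in haplotypes.items()}


# Hypothetical sample
asv_reads = {"ASV1": 300, "ASV2": 45, "ASV3": 0, "ASV4": 120, "ASV5": 2}
asv_to_species = {"ASV1": "sp1", "ASV2": "sp1", "ASV3": "sp1", "ASV4": "sp2", "ASV5": "sp2"}
print(minimum_census(asv_reads, asv_to_species, min_reads=10))  # {'sp1': 2, 'sp2': 1}
```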
There is an urgent need to shift away from a purely morpho-taxonomic approach and related indicators for long-term arthropod monitoring, towards an integrative framework, in which morphological and molecular biological methodologies are applied in parallel. This requires the development and implementation of novel proxies and indicators to indirectly assess species abundance based on genetic data. One possible approach is to apply Hill numbers to DNA-based and morpho-taxonomic assessments alike, as this improves comparability and they can even be applied to (phylo-)genetic data (
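Hill numbers express diversity as an effective number of taxa, ^qD = (Σ p_i^q)^(1/(1−q)), with the limit exp(−Σ p_i ln p_i) at q = 1. A minimal Python sketch follows; the input can be specimen counts from morpho-taxonomy or read-based proportions from metabarcoding, which is what makes the metric applicable to both. The example community is hypothetical. Note that for q > 0, read-based inputs inherit any amplification bias, whereas q = 0 depends on presence/absence only.

```python
import math


def hill_number(abundances, q):
    """Hill number (effective number of taxa) of order q.

    abundances: non-negative values, e.g. specimen counts or read-based proportions.
    q = 0: richness; q = 1: exp(Shannon entropy); q = 2: inverse Simpson.
    """
    total = sum(abundances)
    p = [a / total for a in abundances if a > 0]
    if math.isclose(q, 1.0):
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


uneven = [70, 10, 10, 10]  # hypothetical community dominated by one taxon
for q in (0, 1, 2):
    print(q, round(hill_number(uneven, q), 3))
```

Increasing q gives more weight to dominant taxa, so the effective number of taxa drops from 4 at q = 0 towards about 1.9 at q = 2 for this uneven community, while a perfectly even community of four taxa scores 4 at every order.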
PCR-free methods represent a further alternative (
Even though there are many details to consider when applying DNA metabarcoding to arthropod monitoring, pollen and food web analyses, we were able to make some general recommendations. Generally, DNA metabarcoding should always be optimised for maximum taxon recovery and minimal amplification biases. The processing of adequate positive and negative controls is essential. Incorporating appropriate biological and technical replicates reduces the impact of certain methodological biases.
DNA metabarcoding as a rapid tool to obtain species occurrences is a very promising method for large-scale monitoring activities, especially when abundance estimates are not required. When combining DNA metabarcoding with specimen photography and body size measurements, the number of individuals per species and biomass can also be assessed.
Going forward, new DNA-based metrics that report (relative) abundances based on genetic units rather than on individually processed specimens offer a way to address the most central questions in arthropod monitoring, as these rarely require absolute measures of abundance. Detecting and assessing trends in monitoring relates more to within- and between-sample comparisons across spatial and temporal scales, which can be achieved with metabarcoding. Additionally, DNA metabarcoding facilitates the time- and cost-efficient assessment of ecosystem services through pollen and food web analyses.
There are still many challenges to face until metabarcoding data can deliver robust abundance and biomass estimations. Currently, sorting and individual handling of specimens from WOC samples is unavoidable to obtain such data. However, it is important to apply both classical morpho-taxonomy and molecular biological approaches in parallel, which will allow the management and analysis of the large amounts of data generated by monitoring programmes in a timely and cost-effective manner. Thus, despite its limitations, DNA metabarcoding can and should be incorporated as an additional tool in routine arthropod monitoring to increase sample sizes and cover a broader range of taxonomic groups.
The authors have declared that no competing interests exist.
No ethical statement was reported.
The presented study is part of the joint project "Monitoring of biodiversity in agricultural landscapes" (MonViA) that has been funded by the German Federal Ministry of Food and Agriculture.
WS and PD devised the study. WS performed the literature review and drafted the first version of the manuscript. WS, PD, VZ and SJB were substantially involved in subsequent drafts. AS created the figures. All authors agreed to the final version of the manuscript.
Wiebke Sickel https://orcid.org/0000-0002-0038-1478
Vera Zizka https://orcid.org/0000-0001-8486-8883
Alice Scherges https://orcid.org/0009-0002-0824-7991
Sarah J. Bourlat https://orcid.org/0000-0003-0218-0298
Petra Dieker https://orcid.org/0000-0003-3468-4810
All of the data that support the findings of this study are available in the main text or Supplementary Information.
Background information
Data type: docx
Explanation note: Background information on DNA metabarcoding and methodological adjustments to improve abundance and biomass estimates.
Literature collection
Data type: xlsx
Explanation note: Relevant publications were identified in two steps: 1) via an online literature search in Google Scholar and EBSCO Discovery Service; keywords: ((quant*) AND (insect) AND (metabarcod*) AND (DNA)), including only peer-reviewed publications in English, results were screened for suitability based on title and abstract; 2) addition of publications based on the authors’ expertise; included studies that applied metabarcoding with arthropods and/or in relation to their trophic interactions within ecosystems (e.g. pollination, food web studies); excluded topics were: individual-based DNA barcoding and NGS barcoding, long-read sequencing methodology, mito-/metagenomics and genome skimming.
Categories assessed in the literature review
Data type: xlsx
Explanation note: For each category, possible parameters are given, together with examples and more detailed explanation where appropriate and necessary.
Evaluation of methodological approaches
Data type: xlsx