Research Article
Corresponding author: Daniel Marquina (daniel.marquina@nrm.se)
Academic editor: Florian Leese
© 2022 Daniel Marquina, Tomas Roslin, Piotr Łukasik, Fredrik Ronquist.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Marquina D, Roslin T, Łukasik P, Ronquist F (2022) Evaluation of non-destructive DNA extraction protocols for insect metabarcoding: gentler and shorter is better. Metabarcoding and Metagenomics 6: e78871. https://doi.org/10.3897/mbmg.6.78871
DNA metabarcoding can accelerate research on insect diversity, as it is cheap and fast compared to manual sorting and identification. Most metabarcoding protocols require homogenisation of the sample, preventing further work on the specimens. Mild digestion of the tissue by incubation in a lysis buffer has been proposed as an alternative, and, although some mild lysis protocols have already been presented, they have so far not been evaluated against each other. Here, we analyse how two mild lysis buffers (one more aggressive, one gentler in terms of tissue degradation), two different incubation times, and two DNA purification methods (a manual precipitation and an automated protocol) affect the accuracy of retrieving the true composition of mock communities using two mitochondrial markers (COI and 16S). We found that protocol-specific variation in concentration and purity of the DNA extracts produced had little effect on the recovery of species. However, the two lysis treatments differed in quantification of species abundances. Digestion in the gentler buffer and for a shorter time yielded better representation of original sample composition. Digestion in a more aggressive buffer or longer incubation time yielded lower alpha diversity values and increased differences between metabarcoding results and the true species-abundance distribution. We conclude that the details of non-destructive protocols can have a significant effect on metabarcoding performance. A short and mild lysis treatment appears the best choice for recovering the true composition of the sample. This not only improves accuracy, but also comes with a faster processing time than the other treatments.
DNA extraction, insects, metabarcoding, non-destructive, taxonomy
In the current scenario of global change and dramatic decline in insect biomass and diversity (
Although single-specimen HTS barcoding – the generation of large numbers of DNA barcodes from individual DNA extractions and PCR amplification – is gaining momentum (
One alternative to homogenisation is to use only parts of every specimen in the sample, normally a leg, for DNA extraction (
A compromise between tissue homogenisation and analysis of the DNA leaked into preservative ethanol is then to incubate the sample in a digestion buffer that is only moderately aggressive, i.e. one with only modest effects on specimen tissues. Such mild lysis methods could potentially retrieve the DNA from the insects efficiently, while preserving the morphological features that are needed to identify or describe the species. Potentially, differences in lysis efficiency could also give the DNA of small soft-bodied insects a higher representation in the pool than homogenisation does, facilitating their discovery. Mild lysis methods may also allow additional genetic analyses of the specimens at a later stage, if desired. Furthermore, mild lysis protocols may be faster, less labour-intensive, and require fewer steps and fewer instruments than destructive methods. The composition of such buffers is, with some exceptions, standardised: they consist of a salt (to help precipitate DNA and to separate it from proteins bound to it), a detergent (to break cell membranes and bind to hydrophobic compounds), an inactive pH stabiliser, and a digestion enzyme (proteinase K). Depending on the buffer, it may also contain chelants (compounds that sequester the metal ions needed to activate enzymes) or metal salts to activate these enzymes. In addition, the buffers can also contain other compounds that have proteolytic activity without enzymatic intervention, such as dithiothreitol (DTT). It is the presence and concentration of these ingredients that will determine how aggressive the lysis is for the tissue and to what extent it will disrupt the morphological integrity of the specimen.
Mild lysis buffers have often been used in previous metabarcoding studies (e.g.
Given this consistency in previous outcomes, what is currently missing is a comparison of the performance of different mild lysis protocols. This is the topic we address in this paper. Specifically, we investigate how different parameters of the lysis process (buffer type, digestion time, and purification method) affect the accuracy of metabarcoding in estimating the composition of the original insect sample, as well as the morphological preservation of the original samples.
To assess the impact of alternative choices in mild lysis protocols, we tested: 1) two different non-commercial buffers already used in previous studies, one being more chemically aggressive than the other; 2) two different incubation times; and 3) two different ways of purifying the DNA from the lysate (a manual and a robot-automated process). We then measured the performance of each method by comparing the metabarcoding results to the true composition of mock insect communities. Specifically, we focused on species detection, alpha diversity, and retrieval of the true species-abundance distribution (in terms of individual counts or biomass).
A total of 23 terrestrial arthropod species (including 21 insects, a collembolan and a crustacean) were obtained as live specimens from either standardised cultures of commercial suppliers, donations by other laboratories and the v (
The number of individuals of each species was always the same regardless of the community type. For example, Drosophila melanogaster was represented by six individuals in all ten community types, while D. yakuba was represented by three individuals in all community types except Community H, from which D. yakuba was excluded. The total number of insects per community ranged from 70 to 74. Since our specific interest was in the impact of species properties on the detectability of species, rather than of individual variation in body size, the average weight per specimen was computed by recording the dry weight of ten individuals per species. We then selected all individuals of a species from the cultures to be of the same size. As a result, the study is well suited to detecting effects of species averages, whereas it provides little information on the effects of added variation in individual size. The proportions of each species in the communities in terms of weight and numbers were recorded as abundance in biomass or number, respectively.
Reference DNA barcode sequences were constructed using one additional individual of each species and one individual of the bumblebee Bombus pascuorum (used as a control for quantifying index swapping during the sequencing run, see below) as follows. DNA was extracted from all individuals using the KingFisher Cell and Tissue DNA kit on a KingFisher Duo robot (Thermo Fisher Scientific), except for Encarsia formosa, Folsomia candida and Tuberculatus annulatus, which were processed using QIAamp DNA Micro kit (Qiagen) following the manufacturer’s protocol. The entire barcoding region of COI (658 bp) and a fragment of 450–490 bp (depending on the species) close to the 5’ end of the mitochondrial 16S rRNA gene were amplified and sequenced. COI was amplified using the primers jgLCO1490-jgHCO2198 (
The mock communities were subjected to four different lysis treatments, resulting from the combination of two digestion buffers (referred to as B1 or Gentle, and B2 or Aggressive) and two incubation times (Fig.
Schematic overview of the experimental design and treatments. Each community (A–J) was represented by four initial replicates (four tubes containing the same mix of species, each of them with the same number of specimens). Each replicate was incubated in one buffer (B1 or B2) for 2 h 30’ (LT1) or 5 h (LT2). Then, DNA from each lysate (B1–LT1, B1–LT2, etc.) was extracted using both a manual salt saturation protocol (P1) and silica-coated magnetic beads in a robot (P2). Thus, for each original community (A–J), eight DNA extracts were obtained, one from each combination of purification method, buffer and lysis time.
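The factorial design described in this caption can be enumerated with a short sketch. This is only an illustration: the label format (community, then buffer, lysis time and purification method) mirrors the one used in the figure captions of this article, while the function and variable names are hypothetical.

```python
from itertools import product

# Factor levels as named in the text.
COMMUNITIES = "ABCDEFGHIJ"        # ten community types, A-J
BUFFERS = ("B1", "B2")            # gentle vs. aggressive buffer
LYSIS_TIMES = ("LT1", "LT2")      # 2 h 30 min vs. 5 h
PURIFICATIONS = ("P1", "P2")      # manual salt saturation vs. robot


def extract_labels(community):
    """Return the eight extract labels for one community type."""
    return [
        f"{community}-{buffer}.{time}.{purification}"
        for buffer, time, purification in product(
            BUFFERS, LYSIS_TIMES, PURIFICATIONS
        )
    ]


def all_extracts():
    """Return the labels of every extract in the full design."""
    return [label for c in COMMUNITIES for label in extract_labels(c)]
```

Under these assumptions, `extract_labels("C")` lists the eight extracts of community type C, and `all_extracts()` covers the 10 × 8 = 80 extracts of the whole experiment.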
Of the reagents used, sodium dodecyl sulphate (SDS) is a surfactant that breaks cells by disrupting the membrane, and both proteinase K and DTT have proteolytic activity. Thus, having a higher concentration of these compounds, buffer B2 is expected to produce a more aggressive digestion than buffer B1. In addition, buffer B1 contains EDTA, a chelant that inactivates metal-dependent enzymes, such as nucleases and some proteases, while buffer B2 contains Ca2+, which activates such enzymes.
In terms of incubation times, samples were split into two different treatments: 2 hours and 30 minutes vs. 5 hours (referred to as LT1 and LT2). When combined with the different lysis buffers, this resulted in four different treatments, each representing a unique combination of buffer and lysis time (B1–LT1, B1–LT2, B2–LT1, B2–LT2). In each case, each community replicate was incubated in 20 mL of buffer at 56 °C with slight agitation in an orbital shaker. Once the incubation time was over, the lysate was decanted and collected for DNA extraction. The insects were rinsed with molecular biology-grade water, then with clean 70% ethanol, and finally stored in 80% ethanol. The insects remained inside the tubes at all times.
DNA from each lysate was extracted using two purification methods (referred to as P1 and P2). In short, the protocols differed as follows (see Suppl. material
A 321 bp fragment of COI was amplified with a modified version of the primer pair BF2-BR1 (
The detailed bioinformatic pipeline with commands and options can be found in Suppl. material
Subsequently, the reads were dereplicated and chimeras were filtered out using the uchime_denovo function in VSEARCH v2.7.1 (
All data analyses were performed in R v.3.3.3 (R Core Team 2017). We first visualised the differences in the estimated community composition produced by different methods, using non-metric multidimensional scaling (NMDS), based on Bray-Curtis dissimilarity (functions vegdist and metaMDS from package ‘vegan’ (
As the experimental set-up did not allow for an analysis of the correlation between real abundances and read abundances, we then investigated how the lysis buffers and the digestion times performed in recovering composition-related metrics from the samples. We compared the difference in alpha diversity (Shannon index, H’) between the original mock communities and the estimates obtained through metabarcoding (function diversity from package ‘vegan’). We also calculated the Kullback-Leibler Divergence between the observed community and the original, known composition of the mock community (function KLD from package ‘LaplacesDemon’ (
Regardless of the digestion treatment, insects were recovered in a good state and maintained exoskeletal integrity as well as colour features. We observed no effect of incubation time, but those insects from communities digested with buffer B2 presented a faint red-brownish tone and a slightly higher transparency after the lysis step (Fig.
Examples of the mock communities after digestion. Insects in the community type E incubated in lysis buffer B1 for 5 h (A) are slightly discoloured (effect of the storage in ethanol), but the morphology and the colouration patterns are well preserved. Insects in the community type E incubated in lysis buffer B2 for 5 h (B) have a faint reddish tone (Acheta domesticus specimen inside the rectangle) and the colour of some of the small individuals has slightly faded (Aphidoletes aphidimyza specimen inside the circle), but the colours and morphology of most specimens are still reasonably well preserved.
The MiSeq run produced a total of 5,112,064 sequences of COI and 9,700,315 of 16S, of which 4,467,494 (reads/sample = 16,682 ± 14,797 (mean ± s.d.)) and 9,264,057 (reads/sample = 35,710 ± 23,509 (mean ± s.d.)), respectively, passed the quality filters. With COI metabarcoding, we recovered all 23 species, but no reads were obtained from sample C2.2.1, so this sample was excluded from all subsequent analyses. With 16S, we recovered neither Porcellionides pruinosus nor any of the Formica species. MOTU tables with species identification, abundance in each sample and representative sequences are provided in the Suppl. material
The concentration of the DNA extracts ranged from 4.6 to 371.5 ng/µL (Suppl. material
All three factors and their interaction had significant effects on the DNA concentration of the extracts (Suppl. material
Several factors had a significant effect on the purity of the extract (ratio A260/A280; Suppl. material
Concentration and purity of the DNA extracts from different extraction methods. DNA concentration (upper panel) clearly increases with buffer aggressiveness and incubation time using the manual salt saturation purification protocol. With the robot purification protocol, the increase due to incubation time is less clear, but the effect of lysis buffer can still be appreciated. Note that the starting input volume of lysate is 7.5 mL for the manual purification method and 225 µL (approximately 33 times smaller) for the robot, while elution volume is 150 µL in both cases. Purity of the DNA extract (lower panel) is higher for the manual purification and the longer incubation times, regardless of the lysis buffer.
These differences had no significant effect on the number of species recovered with 16S. However, for COI, the Buffer and Purification effects were both significant, albeit small. Specifically, the number of species recovered was slightly higher for buffer B2 and for the salt saturation protocol. For COI, the buffer affected the mean number of species recovered (Suppl. material
In terms of community composition, the metabarcoding results did not resemble the mock communities in either specimen abundance or biomass, and they all clustered together in the NMDS plot (Suppl. material
Representation of sequencing reads relative to biomass per sample. Relative representation is calculated as the log-ratio between relative read abundance and relative abundance in biomass of each species in each replicate. A higher log-ratio indicates that the species is over-represented in the metabarcoding dataset, while a lower value indicates that the species is under-represented in relation to its relative abundance in biomass in the mock community. Each community’s replicates are indicated in the following order: B1.LT1.P1, B1.LT1.P2, B1.LT2.P1, B1.LT2.P2, B2.LT1.P1, B2.LT1.P2, B2.LT2.P1, and B2.LT2.P2. No reads were recovered for COI from sample C-B2.LT2.P1.
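The log-ratio described in this caption can be sketched as follows. The caption does not state the base of the logarithm or how species without reads were handled, so the natural logarithm and the `None` placeholder for undefined ratios are assumptions, as are all function names.

```python
import math


def relative(values):
    """Convert a {species: amount} mapping into proportions summing to 1."""
    total = sum(values.values())
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return {species: amount / total for species, amount in values.items()}


def log_ratio_representation(reads, biomass):
    """Log-ratio of relative read abundance to relative biomass per species.

    Positive values mean over-representation in the reads, negative values
    under-representation. Species with no reads (or no biomass) get None,
    because the log-ratio is undefined there (assumed handling).
    """
    rel_reads = relative(reads)
    rel_biomass = relative(biomass)
    result = {}
    for species, share in rel_biomass.items():
        read_share = rel_reads.get(species, 0.0)
        if read_share > 0 and share > 0:
            result[species] = math.log(read_share / share)
        else:
            result[species] = None
    return result
```

For instance, a species that makes up 25% of the biomass but 50% of the reads gets a log-ratio of ln 2, roughly 0.69, under these assumptions.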
Regarding alpha diversity, replicates incubated in buffer B1 returned values of the Shannon Index (H’) with a smaller decrease compared to those of the mock community based both on biomass and specimen numbers, irrespective of whether they had been incubated for LT1 or LT2 (Fig.
Estimated decrease in alpha diversity, measured as Shannon Index (H’), for different incubation treatments and markers. A short and gentle lysis (B1, LT1) recovers diversity values closer to the actual values of the original sample measured in biomass with the 16S marker (blue) and an increase in lysis time and chemical aggressiveness of the buffer returns more distant values. With COI (orange), this effect is dependent only on the lysis buffer. The decrease in alpha diversity compared to the mock samples based on number of individuals is greater than based on biomass, but they reproduce the same pattern. The black line indicates H’(mock community) = H’(metabarcoding sample).
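As a minimal sketch of the quantity plotted here, the Shannon index and its decrease relative to the mock community can be computed as below. The natural logarithm is used, matching the default of the `diversity` function in ‘vegan’; the Python function names are hypothetical and this is not the authors' R code.

```python
import math


def shannon(abundances):
    """Shannon index H' (natural log) of a list of abundances."""
    total = sum(abundances)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    # Zero abundances contribute nothing to H'.
    proportions = [a / total for a in abundances if a > 0]
    return -sum(p * math.log(p) for p in proportions)


def shannon_decrease(mock, observed):
    """H'(mock community) minus H'(metabarcoding sample).

    Zero means the metabarcoding sample reproduces the diversity of the
    mock community; larger values mean a larger loss of diversity.
    """
    return shannon(mock) - shannon(observed)
```

Here `mock` would hold biomass or specimen counts per species, and `observed` the read counts per species for one extract.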
The values of the Kullback-Leibler Divergences (i.e. the amount of information that is needed to transform the relative abundance distribution of species obtained with metabarcoding data into the original distribution of the mock communities) for the four treatments (two buffers, two incubation times) were quite similar regardless of whether community composition was based on biomass or specimen number (Fig.
Kullback-Leibler Divergences between the true community composition and the metabarcoding estimates of it. Community composition is measured in terms of biomass (left) or the number of specimens (right). Data are shown both for the 16S marker (blue) and the COI marker (orange). For the 16S marker, the divergence between the metabarcoding and the original sample increases with buffer aggressiveness and incubation time, while for the COI marker, the divergence is only affected by an increase in buffer aggressiveness.
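The divergence shown in this figure can be illustrated with a small sketch. The direction of the divergence (true composition as P, metabarcoding estimate as Q) and the optional pseudocount are assumptions made for this example; the text does not say which of the values returned by `KLD` in ‘LaplacesDemon’ was reported.

```python
import math


def normalise(values, pseudocount=0.0):
    """Turn raw abundances into proportions, optionally adding a pseudocount."""
    shifted = [v + pseudocount for v in values]
    total = sum(shifted)
    if total <= 0:
        raise ValueError("total abundance must be positive")
    return [v / total for v in shifted]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats.

    p and q are proportions over the same species, in the same order.
    Returns infinity if q is zero for a species present in p.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # species absent from p contribute nothing
        if q_i == 0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total
```

A call such as `kl_divergence(normalise(biomass), normalise(reads, pseudocount=1))` would then compare one extract to its mock community; the pseudocount keeps the result finite when a species present in the mock community receives no reads.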
For single specimens, DNA extraction protocols that preserve the morphology of the insects have been used for more than a decade (
As far as we could judge, all lysis protocols applied here left insect morphology essentially intact. Much of this beneficial outcome may be due to the limited lysis time used, since the digestion step of each protocol examined here lasted no more than 5 hours. The incubation times are, thus, much shorter than those used in other protocols for terrestrial insect samples, which range from around 14 hours of lysis to up to 72 hours (
The high level of morphological preservation achieved here is encouraging. From a taxonomist's perspective, it allows the later description of new species from the material treated. As an exciting scenario, we may then apply bulk metabarcoding to generate taxonomic lists of contents for large sets of bulk samples. Such lists may then be offered to expert taxonomists, allowing them to direct their input to those samples offering the highest reward in terms of new and interesting species to examine. This is a quantum leap from the tedious manual sorting of mass samples, where the main effort typically goes into dealing with the most abundant and typically less interesting taxa. Such tasks represent the poorest possible use of skilled taxonomists, who tend to be in short supply.
All methods of DNA extraction tested here yielded DNA of sufficient concentration and quality for successful PCR and sequencing. DNA concentration was consistently higher for the replicates whose lysates had been purified with the salt saturation method than for those in which the extraction was done using silica-coated magnetic beads in a robot. This is not unexpected, as the salt saturation method started from a lysate volume of 7.5 mL, whereas the robot-based method started from only 225 µL, with both methods ending at a final elution volume of 150 µL. Thus, it is likely that the difference in initial quantity of DNA is reflected in the final concentration. In addition, it is important to note that the amount of beads in the reaction was the same for all four treatments, which might explain why the longer lysis times did not increase the DNA concentration more. In any case, the results are intuitive: a more chemically aggressive buffer, a longer digestion time and a larger input volume will all increase the concentration of DNA in the extract.
In terms of sample purity, the overall values achieved were good. Compared to the A260/A280 ratio considered ideal (1.8–2.0), we obtained readings of 1.5–1.9. In three of the four lysis treatments, the replicates purified with the salt saturation method produced a higher A260/A280 ratio, but those corresponding to buffer B2 and short incubation time showed the opposite relation. This could possibly be explained if one assumes that the higher concentration of proteinase K and the presence of DTT and Ca2+ in buffer B2 released more proteins to the lysate than buffer B1, but that the short incubation time was not enough to hydrolyse these proteins completely. However, in general, a longer incubation time produced DNA extracts with higher purity, as did manual purification with the salt saturation method. Although significant, these differences had only a small effect on species recovery. This differs from a previous study, in which the salt saturation method was shown to provide metabarcoding data with higher species richness than those provided by commercial kits (
In our study, all analyses were based on communities of known composition. In terms of species recovery, the COI marker was able to detect all 23 species we used in the communities, albeit not all species were detected in all the samples where they were present. For instance, small and delicate insects like Tuberculatus annulatus (Hemiptera) and Chrysoperla carnea larvae (Neuroptera) yielded low read abundances in most samples and appeared missing from many of those that were subjected to lysis with buffer B2. The 16S marker failed to detect the isopod Porcellionides pruinosus and the two species of Formica (Hymenoptera). The absence of P. pruinosus is not surprising, as the 16S primers used in this study had low degeneracy and were designed to target only insects (
Importantly, the current study was explicitly aimed at evaluating the effect of the extraction protocol on the detectability of species against a community background of standardised complexity. Our communities were varied by excluding a single species amongst 23, whereas we did not vary the background complexity from highly species-poor to highly species-rich samples. As variation in the latter dimension provides an important aspect of natural communities, its effect should be the target of future studies. What we do see is that species detection rates, even against a standardised background, will never reach 100%. This pattern matches that reported by other studies. When using communities of known composition, both
Accurate abundance estimation is currently one of the main research fronts in metabarcoding. Early studies suggested that metabarcoding was unsuitable for quantification, and that accurate abundance estimates might only be achieved through shotgun sequencing using mitochondrial metagenomics (e.g.
Importantly, the current study focuses on the impact of a single step in sample processing: that of the lysis phase of DNA extraction. We find some factors that clearly contribute to a poor overall relationship between species abundance and read abundance. For instance, the failure of the metabarcoding data to estimate the specimen counts of the different species appears to be due, to a large extent, to the over-representation, at least in some treatments, of DNA from large species that were represented by only one or a few specimens (see, for example, Calliphora, Acheta and Locusta in Suppl. material
In terms of other future improvements, it is quite plausible that mild lysis protocols can be further optimised to more accurately represent the contents of the sample processed. In our experiment, a moderately aggressive lysis buffer and a short incubation time yielded alpha diversity estimates significantly closer to those of the actual communities than a more destructive buffer or longer incubation times did. This shows that some ecological insight can still be drawn using this method, even when precise estimates of species abundance distributions cannot be obtained. The pattern likely reflects the simple relation between body surface and volume. During the early part of the lysis, both large and small insects presumably release DNA from their tissues in contact with the buffer, at a rate proportional to the exposed surface (roughly proportional to the square of the body size). As the incubation time increases, the digestion will continue towards the internal tissues and, thus, the released DNA will be proportional to the volume of the individual (roughly proportional to the cube of the body size) (
As a final caveat, we would like to re-emphasise that our results are based on a series of mock communities, the complexity of which is drastically lower than many real samples from Malaise traps or other efficient insect traps. In the future, experiments similar to the ones here conducted should thus be aimed at varying other aspects of community context, including significantly more diverse samples (
We have shown that non-destructive DNA extraction of mixed samples of terrestrial insects can provide DNA highly suitable for metabarcoding, while, at the same time, preserving the morphology of the individuals in good condition. Furthermore, our results indicate that a short and mild digestion followed by automated and commercially available DNA purification methods produces metabarcoding datasets that reliably retrieve most of the species in the original sample, while also providing closer approximations in measures linked to the relative abundance of the species. Metabarcoding can, thus, provide much help in the process of species discovery and description, and free up the expertise of taxonomists for the tasks where it matters the most.
DM, TR, PL and FR conceived and designed the study; DM prepared the communities, conducted the experiment, analysed the data, prepared figures and tables and wrote the first draft of the manuscript. All authors contributed critically to subsequent versions of the manuscript and gave final approval for publication.
Raw sequencing reads assigned to samples can be accessed freely at https://zenodo.org/record/6559343#.YoYBR5NBw-R.
We are indebted to Brandon Cooper (University of Montana) and Laura Van Dijk (Stockholm University) for very generously providing us with numerous specimens of some of the species in our mock communities. We are also thankful to all members of the Ronquist lab and the team of the Insect Biome Atlas (https://www.insectbiomeatlas.com/) for the fruitful discussions during the design and analysis of the study. We thank Owen Wangensteen for his valuable comments on the manuscript. This project was supported by the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement no. 642241 (BIG4 project, https://big4-project.eu) and by the Knut and Alice Wallenberg Foundation (KAW 2017.088). TR was further supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 856506; ERC-synergy project LIFEPLAN). Research reported in this publication was supported by the National Institute of General Medical Sciences of the NIH of the US under award number R35GM124701 to Brandon S. Cooper (insect cultures).
Community types
Data type: excel file
Explanation note: Species composition of the ten different mock community types, with information on the abundance and biomass per species.
COI reference library
Data type: FASTA file
Explanation note: Reference fasta file for the COI barcodes of each species.
16S reference library
Data type: FASTA file
Explanation note: Reference fasta file for the 16S barcodes of each species.
Bioinformatic pipeline
Data type: pdf file
Explanation note: Detailed bioinformatic pipeline followed, specifying software used (with references), commands, and options for each step.
COI MOTU table
Data type: excel file
Explanation note: Metabarcoding dataset from the COI marker, with taxonomy of the species and reads/sample information.
16S MOTU table
Data type: excel file
Explanation note: Metabarcoding dataset from the 16S marker, with taxonomy of the species and reads/sample information.
Tables and figures
Data type: docx file
Explanation note: Supplementary tables (S1-S14) and figures (S1-S4).
Lysis buffers and purification protocols
Data type: docx file
Explanation note: Detailed recipe for the two buffers used in the experiments and protocol for the manual purification method (P1).