Research Article
Corresponding author: Kelly A. Meiklejohn ( kameikle@ncsu.edu ) Academic editor: Birgit Gemeinholzer
© 2022 Madison A. Moore, Melissa K.R. Scheible, James B. Robertson, Kelly A. Meiklejohn.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Moore MA, Scheible MK, Robertson JB, Meiklejohn KA (2022) Assessing the lysis of diverse pollen from bulk environmental samples for DNA metabarcoding. Metabarcoding and Metagenomics 6: e89753. https://doi.org/10.3897/mbmg.6.89753
Pollen is ubiquitous year-round in bulk environmental samples and can provide useful information on previous and current plant communities. Characterization of pollen has traditionally been completed based on morphology, requiring significant time and expertise. DNA metabarcoding is a promising approach for characterizing pollen from bulk environmental samples, but accuracy hinges on successful lysis of pollen grains to free template DNA. In this study, we assessed the lysis of morphologically and taxonomically diverse pollen from one of the most common bulk environmental sample types for DNA metabarcoding, surface soil. To achieve this, a four-species artificial pollen mixture was spiked into surface soils collected from Colorado, North Carolina, and Pennsylvania, and subsequently subjected to DNA extraction using both the PowerSoil and PowerSoil Pro Kits (Qiagen) with a heated incubation (either 65 °C or 90 °C). Amplification and Illumina sequencing of the internal transcribed spacer subunit 2 (ITS2) was completed in duplicate for each sample (total n, 76), and the resulting sequencing reads were taxonomically identified using GenBank. The PowerSoil Pro Kit statistically outperformed the PowerSoil Kit for total DNA yield. With either kit, the incubation temperature (65 °C or 90 °C) had no impact on the recovery of DNA, plant amplicon sequence variants (ASVs), or total plant ITS2 reads. This study highlighted that lysis of pollen in bulk environmental samples is feasible using commercially available kits, and downstream DNA metabarcoding can be used to accurately characterize pollen DNA from such sample types.
DNA extraction, DNA metabarcoding, internal transcribed spacer 2, pollen, surface soils
Seed plants, which account for >90% of land plants, produce pollen grains which vary in shape, size, aperture, and morphology (
The current “gold standard” for identifying pollen is via microscopic examination of grain morphology where distinguishing features can permit genus-level identification (
With advances in sequencing technologies, researchers have assessed the reliability and accuracy of DNA-based approaches for characterizing pollen (
DNA metabarcoding offers several advantages over traditional morphological identification of pollen from bulk environmental samples. 1) Increased taxonomic resolution, as most land plants can be identified to genus level at minimum, but often down to the species level (70–90% of cases [
Numerous studies have successfully implemented DNA metabarcoding to characterize pollen from diverse sample types, including ancient sediments, soil, insects and air filters (e.g.,
To address this pivotal gap, this study focused on assessing the lysis of morphologically (i.e., size, shape, aperture) and taxonomically diverse pollen for one of the most common bulk environmental sample types for DNA metabarcoding, surface soil. To achieve this, surface soil collected from three states in the continental U.S. representing various geological and climate features was spiked with a known four-taxa artificial pollen mixture and subjected to DNA isolation using two commercially available soil extraction kits. The impact of heated incubation (65 °C or 90 °C) on the lysis of pollen was assessed for each sample using both kits in duplicate (total n, 76). The internal transcribed spacer subunit 2 (ITS2) was amplified in duplicate for each sample, with duplicates pooled prior to library preparation and sequencing using Illumina chemistry. Downstream data analysis focused on assessing variation in the recovery of the baseline plant community along with known spiked pollen taxa, to identify the optimal method for pollen lysis.
Mixed corbiculae pollen granules collected from North America were used in this study. In a sterile biosafety cabinet, corbiculae pollen granules were initially sorted by eye into groups according to color. A second round of sorting was performed under a stereomicroscope (Fisher Scientific, Hampton, NH) to confirm that each group contained pollen of the same color and possessed similar morphological features. Each group of colored corbiculae pollen was subsequently treated as a different species. The following steps were completed to carefully remove the nectar, sugars, wax and other compounds associated with corbiculae pollen without lysing individual grains: 1) 1 mL of sterile water was added to ~1 cm3 corbiculae pollen in a 2 mL microcentrifuge tube, 2) the tube was incubated at 600 rpm for 30 minutes at room temperature, and 3) excess liquid was removed using a pipette and washed corbiculae pollen was allowed to air dry at room temperature in a fume hood for 21 days prior to storage at -20 °C until use. To confirm the taxonomic identity of the corbiculae pollen (n, 4 colored groups), the following was completed: 1) a subsample was ground using a disposable mortar and pestle and the DNA subsequently isolated following the manufacturer's protocol for the Qiagen DNeasy Plant Mini Kit (Qiagen), 2) a 590 bp region of rbcL was amplified and bidirectionally Sanger sequenced as outlined in
Approximately 100 g of surface soil (top 1–10 cm) were collected during October 2019 from three locations with differing geology and climate: 1) Erie, Colorado, 2) Cary, North Carolina, and 3) Laboratory, Pennsylvania. Each sample was initially collected into a plastic zip-lock bag. Once the samples reached the laboratory, they were immediately transferred into separate sterile food-grade foil tins and allowed to air dry inside a fume hood. Once dry, each soil sample was sieved three times through a sterile food-grade metal kitchen sieve, in order to remove any large debris (i.e., plant and insect fragments) and homogenize the soil prior to downstream analysis.
To determine which isolation method could robustly lyse pollen, the artificial pollen mixture contained pollen with diverse morphological features (i.e., size, shape, aperture) from four different orders. A four-taxa artificial pollen mixture consisting of both dry granular (P. tremuloides [Salicaceae] and Z. mays [Poaceae]) and dry corbiculae pollen (Symphyotrichum spp. [Asteraceae] and Trifolium spp. [Fabaceae]) was created. To assess sensitivity (or limit of detection), the relative abundance of pollen from each taxon varied in the artificial mixture as follows: approximately 0.4% – Poaceae, 9.6% – Salicaceae, 40.5% – Asteraceae, and 49.5% – Fabaceae. The weight of pollen added to the mixture from each taxon, determined using an analytical scale (AG104, Mettler Toledo, Columbus, OH), was as follows: P. tremuloides – 0.395 g, Z. mays – 0.0179 g, Symphyotrichum spp. – 1.67 g, and Trifolium spp. – 2.045 g. The artificial pollen mixture was spiked into separate 5 g subsamples of each surface soil 1) at a concentration (i.e., number of grains/g of soil) which mimicked the reported naturally occurring concentration of pollen in soils collected from North Carolina (
Reported naturally occurring concentrations of pollen in soils from North Carolina (NC), Pennsylvania (PA) and Colorado (CO) used to calculate the weight of artificial pollen mixture to be spiked into subsamples. * denotes that the weight (g) of a single Zea mays pollen grain (2.47E-07;
Location | Pollen concentration (grains/cm3) | Pollen weight/gram* | Weight (g) artificial pollen mixture spiked into 5 g soil subsample: spiked normal | Weight (g) artificial pollen mixture spiked into 5 g soil subsample: spiked partial |
---|---|---|---|---|
Erie, CO | 200,000 ( | 0.049 | 0.25 | 0.049 |
Cary, NC | 117,500 ( | 0.029 | 0.15 | 0.029 |
Laboratory, PA | 6,000 ( | 0.0015 | 0.0074 | 0.0015 |
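The mixture composition and the spike weights in the table above follow from simple arithmetic, which the minimal sketch below reproduces. Two points are assumptions made for illustration rather than statements from the methods: the reported concentrations (grains/cm3) are treated as grains per gram of soil, and the partial spike is taken to be one fifth of the normal spike, as the tabulated values suggest. All function and variable names are hypothetical.

```python
# Weighed pollen masses (g) that make up the artificial mixture, as given in the text.
MIXTURE_WEIGHTS_G = {
    "Zea mays": 0.0179,
    "Populus tremuloides": 0.395,
    "Symphyotrichum spp.": 1.67,
    "Trifolium spp.": 2.045,
}

# Weight (g) of a single Zea mays pollen grain, as quoted in the table caption.
GRAIN_WEIGHT_G = 2.47e-07
SUBSAMPLE_G = 5          # mass of each soil subsample (g)
PARTIAL_FRACTION = 0.2   # assumption: partial spike = one fifth of the normal spike

# Reported pollen concentrations; treated here as grains per gram of soil (assumption).
CONCENTRATIONS = {"Erie, CO": 200_000, "Cary, NC": 117_500, "Laboratory, PA": 6_000}


def mixture_percentages(weights):
    """Return each taxon's share of the mixture by weight, in percent."""
    total = sum(weights.values())
    return {taxon: 100 * w / total for taxon, w in weights.items()}


def spike_weights(grains_per_gram):
    """Return (pollen weight per gram of soil, normal spike, partial spike) in grams."""
    per_gram = grains_per_gram * GRAIN_WEIGHT_G
    normal = per_gram * SUBSAMPLE_G
    partial = normal * PARTIAL_FRACTION
    return per_gram, normal, partial


for taxon, pct in mixture_percentages(MIXTURE_WEIGHTS_G).items():
    print(f"{taxon}: {pct:.1f}% of mixture by weight")

for site, conc in CONCENTRATIONS.items():
    per_gram, normal, partial = spike_weights(conc)
    print(f"{site}: {per_gram:.2g} g/g, normal {normal:.2g} g, partial {partial:.2g} g")
```

Rounded to two significant figures, the computed weights agree with the tabulated values, and the weight shares round to the approximate relative abundances stated for the mixture.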
Pollen present in both unspiked (baseline sample) and spiked (both normal and partial) North Carolina, Pennsylvania and Colorado soils were lysed and the DNA subsequently isolated using two different commercial soil DNA isolation kits: DNeasy® PowerSoil® Kit (Qiagen) and the DNeasy® PowerSoil® Pro Kit (Qiagen). These kits were chosen for use in this study, as they 1) are reported by the manufacturer to yield highly pure DNA, 2) use a patented Inhibitor Removal Technology to remove compounds which negatively impact downstream DNA analysis (i.e., humic acid associated with soil), and 3) have been used in some other studies for isolating pollen from diverse sample types (
Estimated number of pollen grains for each of the four taxa (rounded to the nearest single grain) spiked into 5 g subsamples of soil from Colorado (CO), North Carolina (NC) and Pennsylvania (PA). Number given in parentheses indicates estimated number of grains in a 100 mg subsample used for DNA isolations, assuming complete homogenization after the pollen spike. * denotes commercially purchased dry granular pollen, ^ denotes washed corbiculae pollen.
Taxon | Spiked normal, CO | Spiked normal, NC | Spiked normal, PA | Spiked partial, CO | Spiked partial, NC | Spiked partial, PA |
---|---|---|---|---|---|---|
Zea mays * | 54 (1) | 32 (0) | 2 (0) | 11 (0) | 6 (0) | 0 (0) |
Populus tremuloides * | 1,184 (24) | 704 (14) | 38 (1) | 237 (5) | 141 (3) | 8 (0) |
Symphyotrichum spp. ^ | 5,009 (100) | 2,978 (60) | 159 (3) | 1,002 (20) | 596 (12) | 32 (1) |
Trifolium spp. ^ | 6,131 (123) | 3,646 (73) | 195 (4) | 1,226 (25) | 729 (15) | 39 (1) |
A ~350 bp fragment of the nuclear ITS2 was chosen for DNA metabarcoding in this study. The rationale behind only using a single nuclear marker (as opposed to a combination of nuclear and plastid markers commonly used in plant DNA metabarcoding) was three-fold: 1) internal testing demonstrated that the primers chosen for use can successfully amplify DNA from the four taxa in the artificial pollen mixture if present (results not shown), 2) ITS2 sequences for the four taxa included in the artificial pollen mixture are in GenBank for comparisons, and 3) there are more overall ITS2 sequences available for comparison on GenBank than trnL (~84,000 vs 59,000, respectively [as of 20/7/2022]). Amplification of ITS2 was completed using ITS2F (5’- ATGCGATACTTGGTGTGAAT -3’;
Raw sequence data were processed and analyzed on the NC State University High Performance Cluster as follows: 1) Cutadapt (v2.10) (
To strike a balance with respect to informational content and ease of interpretation, resulting statistical analyses focused on families in which both the total number of reads and ASVs were >1%. A total of nine families met both of these criteria: Asteraceae (daisies, sunflowers), Brassicaceae (mustards, cabbages), Caryophyllaceae (carnations), Fabaceae (legumes, peas, beans), Juglandaceae (walnuts), Poaceae (grasses), Rosaceae (roses), Salicaceae (willows, poplar) and Ulmaceae (elms) (Suppl. material
Principal components analyses were conducted to examine the discriminatory ability of ASV read abundances among kit, method and location, with data then being plotted against the first two principal components. One observation was removed from 5-family considerations due to excessive influence on the model. For examining variability between duplicates, log-scale differences for each pair of duplicates were calculated for each of the nine key families. When examining whether a) pollen spikes were successful and b) the resulting sequence reads were recovered in the expected ratios, the difference between the averaged spiked duplicates (both partial and normal) and the averaged unspiked duplicates was obtained for the four target taxa at the genus level (i.e., ASVs assigned to Populus, Zea, Symphyotrichum or Trifolium). If reads for the spiked taxa increased, the one-sample Hotelling’s T2 with 2 numerator and 10 denominator degrees of freedom was used to compare the isometric log-transformed observed ratios to the expected ratios. Zea was not considered for this analysis as all spiked samples saw reduced Zea compared to the unspiked. Statistical t-tests and Pearson correlation analyses were completed using JMP Pro, Version 16.0.0 (SAS Institute Inc., Cary, NC). The MVTests (
The final dataset of ASVs used in analyses are available in FigShare (10.6084/m9.figshare.20377146).
When using the PowerSoil Pro Kit for DNA isolations, significantly (p <0.0001 [t(41)=-7.62]) higher DNA quantities were obtained over the original PowerSoil Kit regardless of the incubation temperature; 28.5 ± 15.8 ng/µL vs. 7.08 ± 5.25 ng/µL, respectively. The incubation temperature used did not significantly impact the DNA yield with either kit; p = 0.242 (t(31)=-1.19; PowerSoil) and p = 0.634 (t(27)=-0.48; PowerSoil Pro) (Suppl. material
In this study, we observed an overall weak positive correlation between DNA yield and resulting library yield (r = 0.46). Notably, the DNA quantification method used in this study (Qubit™ HS DNA Assay Kit) quantifies all double stranded DNA present in the sample, regardless of source or length. While the quantity of only plant DNA isolated from each sample would have provided a more accurate and useful comparison, no commercially available plant-specific DNA quantification kits currently exist. After processing data through the DNA metabarcoding pipeline, a total of 5,746 ASVs encompassing 2,286,926 reads were recovered across all samples. When ASVs which a) did not match to sequences derived from a plant specimen (kingdom, Viridiplantae; n, 4,172), and b) were present in the reagent blanks (n, 28) were excluded, a total of 1,574 ASVs encompassing 878,771 reads across all samples remained for downstream statistical analyses. Notably, the vast majority of excluded ASVs were those for which taxonomic classification even at the highest level (superkingdom) was not obtained (given as ‘unknown’ in taxize output; n, 3,667). The average (± standard deviation) of the total ASVs was 75.9 ± 31.7 (range 2–149), which related to 12,205 ± 5,100 (range 33–33,467) reads per sample (Suppl. material
To assess the overall effect of extraction kit (PowerSoil or PowerSoil Pro kit) and incubation temperature (65 °C or 90 °C) on the plant community recovered, soils not spiked (i.e., baseline samples) with the four-taxa artificial pollen mixture (n, 24 [12 samples in duplicate]) were initially evaluated. At a broad level, no statistical difference was noted in the number of total reads (p = 0.1904 [t(21)=1.35]) or ASVs (p = 0.1102 [t(21)=1.66]) recovered between the two kits (PowerSoil or PowerSoil Pro). Incubation temperature did not have a statistical impact on the recovery of total reads (p = 0.666 [t(10)=-0.445] and p = 0.428 [t(10)=-0.825] for PowerSoil and PowerSoil Pro, respectively) or total ASVs (p = 0.249 [t(9)=-1.233] and p = 0.944 [t(10)=0.072] for PowerSoil and PowerSoil Pro, respectively) with either kit. To compare the differences in taxonomic composition between kits, methods and locations, a PCA using the ITS2 ASV read counts was completed (Fig.
Principal component analysis of read counts for amplicon sequence variants belonging to one of nine key plant families in unspiked (baseline) soil samples collected from Colorado (CO; circles), North Carolina (NC; squares) and Pennsylvania (PA; diamonds) determined by ITS2 DNA metabarcoding. The outline color of the shape denotes the kit (blue = PS [PowerSoil]; or orange = PSP [PowerSoil Pro]) and the color intensity denotes incubation temperature (light, 65 °C; dark, 90 °C). Axes represent the first and second principal components with percent variance explained in parentheses.
A comparison between the two extraction methods was also completed using the spiked soils (partial and normal; n, 48) by focusing only on key families which were not included in the artificial pollen mixture (n, 5; Brassicaceae, Caryophyllaceae, Juglandaceae, Rosaceae, Ulmaceae). After excluding reads assigned to one of the families of the spiked pollen taxa (Asteraceae, Fabaceae, Poaceae and Salicaceae; ~83% of total reads), 128,965 reads were available for comparisons. No statistical difference was noted in the number of total reads (p = 0.3145 [t(68.3)=1.01]) or total ASVs (p = 0.0916 [t(61.0)=1.71]) recovered for the five key families between the two kits. When comparing the kits for each soil location separately, a statistical difference in the number of total reads and ASVs was only observed for the Pennsylvania soil samples (p = 0.0266 [t(21.0)=2.38] and p = 0.0028 [t(21.1)=3.37], respectively). To compare differences between kits, methods and locations, a PCA using ASV read counts for the five families was completed (Fig.
Principal component analysis of read counts for amplicon sequence variants belonging to five key plant families in spiked soil samples (partial and normal) collected from Colorado (CO; circles), North Carolina (NC; squares) and Pennsylvania (PA; diamonds) determined by ITS2 DNA metabarcoding. The outline color of the shape denotes the kit (blue = PS [PowerSoil]; or orange = PSP [PowerSoil Pro]), fill color of the shape denotes spike level (blue = partial; orange = normal) and the color intensity denotes incubation temperature (light, 65 °C; dark, 90 °C). Axes represent the first and second principal components with percent variance explained in parentheses.
To compare the efficiency of the two different extraction kits on lysing the pollen spiked into the soil samples, ASVs which returned a high-quality match to the genus of one of the four spiked pollen taxa were identified. A total of 151 ASVs across all samples were assigned to these four genera with breakdown as follows: Zea (Poaceae) – n, 0; Symphyotrichum (Asteraceae) – n, 6; Populus (Salicaceae) – n, 21; and Trifolium (Fabaceae) – n, 124. While the three detected genera represent only 9.6% of total ASVs, they encompass 46.3% of total reads (406,904). A statistically significant difference in the total number of reads assigned to the spiked genera was observed between unspiked and partially spiked samples (p < 0.0001 [t(29.3)=-10.2]), along with unspiked and normal spiked samples (p < 0.0001 [t(26.9)=-7.66]).
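The shares quoted above can be checked against the totals reported after filtering (1,574 plant ASVs and 878,771 plant reads). The sketch below is only an arithmetic check on the numbers given in the text; the variable names are hypothetical.

```python
# ASVs assigned to each spiked genus, as reported in the text.
ASVS_BY_GENUS = {"Zea": 0, "Symphyotrichum": 6, "Populus": 21, "Trifolium": 124}

# Totals remaining after removing non-plant ASVs and reagent-blank ASVs, as reported.
TOTAL_PLANT_ASVS = 1_574
TOTAL_PLANT_READS = 878_771

# Reads assigned to the spiked genera, as reported.
SPIKED_GENERA_READS = 406_904

spiked_asvs = sum(ASVS_BY_GENUS.values())
asv_share = 100 * spiked_asvs / TOTAL_PLANT_ASVS
read_share = 100 * SPIKED_GENERA_READS / TOTAL_PLANT_READS

print(f"Spiked-genus ASVs: {spiked_asvs} ({asv_share:.1f}% of plant ASVs)")
print(f"Spiked-genus reads: {SPIKED_GENERA_READS} ({read_share:.1f}% of plant reads)")
```

A small fraction of ASVs carrying nearly half of the reads is consistent with the spiked taxa dominating the sequenced plant DNA.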
The design of this study allows the limit of detection to be evaluated based on the compositional differences between each of the spiked taxa in the artificial mixture. For the comparisons described herein, we are using read count as a proxy for taxa abundance, given numerous previous pollen DNA metabarcoding studies have reported a positive correlation between sequence reads and relative abundance (
Across all spiked samples (n, 48), the proportions of reads assigned to the remaining three genera were as follows: 0.04% – Symphyotrichum, 28.04% – Populus, and 71.92% – Trifolium. These results do not correspond with the proportions of each of these species spiked into the artificial pollen mixture. The spiked samples did not consistently have higher reads for each of the known spiked genera; only eight normal spiked and six partially spiked samples had an increase in reads when compared to the appropriate unspiked sample. In the cases where there was an increase in read count for any of the known spiked genera (except Zea), the proportions were significantly different from those expected for both the normal spiked samples (p < 0.001; T2 = 1101.5 on 2 and 10 degrees of freedom) and partially spiked samples (p < 0.001; T2 = 270.2 on 2 and 10 degrees of freedom). The recovered proportions for those samples are given in Table
Observed spiked-in proportions of three genera included in the artificial pollen mixture, based on read counts (Zea excluded). The expected proportions for each genus are given in the column headers (adjusted for the exclusion of Zea). Only samples for which there was an increase in read count when compared to the appropriate unspiked samples are reported. Abbreviations are as follows: CO, Colorado; NC, North Carolina; PA, Pennsylvania; PS, PowerSoil kit; PSP, PowerSoil Pro kit.
Spike level | State | Kit | Temp (°C) | Populus (9.6%) | Symphyotrichum (40.7%) | Trifolium (49.7%) |
---|---|---|---|---|---|---|
Spiked normal | CO | PS | 65 | 5.2 | 6.9 | 87.9 |
Spiked normal | NC | PS | 65 | 16.7 | 19.3 | 63.9 |
Spiked normal | NC | PS | 90 | 38.5 | 60 | 1.5 |
Spiked normal | NC | PSP | 65 | 30.1 | 30.2 | 39.7 |
Spiked normal | PA | PS | 65 | 31.7 | 39.1 | 29.3 |
Spiked normal | PA | PS | 90 | 35.9 | 35.7 | 28.4 |
Spiked normal | PA | PSP | 65 | 16.4 | 18.3 | 65.3 |
Spiked normal | PA | PSP | 90 | 22.1 | 20.7 | 57.2 |
Spiked partial | CO | PS | 65 | 9.2 | 0.1 | 90.7 |
Spiked partial | NC | PS | 65 | 28.9 | 65.7 | 5.4 |
Spiked partial | PA | PS | 65 | 19.6 | 34.9 | 45.5 |
Spiked partial | PA | PS | 90 | 36.9 | 39.8 | 23.3 |
Spiked partial | PA | PSP | 65 | 36.4 | 31.4 | 32.2 |
Spiked partial | PA | PSP | 90 | 42.1 | 35.5 | 22.4 |
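For readers unfamiliar with the comparison of observed and expected proportions, the following is a minimal sketch of an isometric log-ratio (ilr) transform followed by a one-sample Hotelling's T2 statistic, applied to the normal-spike rows tabulated above. It is an illustration only, not the authors' analysis: the ilr basis is one standard choice, the function names are hypothetical, and because the tabulated percentages are rounded and the row count differs from the sample size implied by the reported degrees of freedom, it is not expected to reproduce the reported T2 values.

```python
import math

# Mixture percentages for Populus, Symphyotrichum and Trifolium before excluding Zea.
MIXTURE_PCT = (9.6, 40.5, 49.5)

# Observed percentages (Populus, Symphyotrichum, Trifolium), normal-spike rows of the table.
NORMAL_SPIKE_ROWS = [
    (5.2, 6.9, 87.9), (16.7, 19.3, 63.9), (38.5, 60.0, 1.5), (30.1, 30.2, 39.7),
    (31.7, 39.1, 29.3), (35.9, 35.7, 28.4), (16.4, 18.3, 65.3), (22.1, 20.7, 57.2),
]


def renormalize(parts):
    """Rescale parts so they sum to 100 (used here after dropping Zea)."""
    total = sum(parts)
    return tuple(100 * p / total for p in parts)


def ilr(parts):
    """Map a 3-part composition to 2 ilr coordinates (one standard orthonormal basis)."""
    x1, x2, x3 = parts
    z1 = math.sqrt(1 / 2) * math.log(x1 / x2)
    z2 = math.sqrt(2 / 3) * math.log(math.sqrt(x1 * x2) / x3)
    return (z1, z2)


def hotelling_t2(samples, target):
    """One-sample Hotelling's T2 for 2-D samples; returns (T2, F statistic, (df1, df2))."""
    n, p = len(samples), 2
    means = [sum(s[i] for s in samples) / n for i in range(p)]

    def cov(i, j):
        # Unbiased sample covariance between coordinates i and j.
        return sum((s[i] - means[i]) * (s[j] - means[j]) for s in samples) / (n - 1)

    s11, s12, s22 = cov(0, 0), cov(0, 1), cov(1, 1)
    det = s11 * s22 - s12 ** 2
    d1, d2 = means[0] - target[0], means[1] - target[1]
    # Quadratic form d' S^-1 d, using the closed-form inverse of a 2x2 matrix.
    quad = (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det
    t2 = n * quad
    f_stat = (n - p) / (p * (n - 1)) * t2
    return t2, f_stat, (p, n - p)


EXPECTED = renormalize(MIXTURE_PCT)
observed_ilr = [ilr(row) for row in NORMAL_SPIKE_ROWS]
t2, f_stat, df = hotelling_t2(observed_ilr, ilr(EXPECTED))
print(f"Expected (%): {[round(v, 1) for v in EXPECTED]}")
print(f"T2 = {t2:.2f}, F = {f_stat:.2f} on {df[0]} and {df[1]} degrees of freedom")
```

Because the ilr transform depends only on ratios between parts, it does not matter whether the inputs are percentages or fractions, and the T2 statistic is unchanged under any other orthonormal ilr basis.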
When examining the logarithmic fold change in read count for spiked taxa between spiked and unspiked sample pairs (Fig.
Logarithmic fold change in read count for the spiked taxa (Populus, Symphyotrichum, Trifolium) between spiked (partial and normal) and unspiked pairs. Data are separated by incubation temperature (65 °C top, 90 °C bottom). Abbreviations are as follows: PS, PowerSoil kit; PSP, PowerSoil Pro kit.
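As an illustration of the quantity plotted, the sketch below computes a logarithmic fold change for a spiked and unspiked pair. The base-2 logarithm, the pseudocount of 1 (used to avoid dividing by zero when a genus has no reads), the function name and the read counts are all assumptions made for this example; the source does not state the base or any pseudocount, and the counts are not study data.

```python
import math

PSEUDOCOUNT = 1  # assumption: added to both counts so zero-read samples stay defined


def log2_fold_change(spiked_reads, unspiked_reads, pseudocount=PSEUDOCOUNT):
    """Return log2 of the spiked-to-unspiked read ratio for one genus."""
    return math.log2((spiked_reads + pseudocount) / (unspiked_reads + pseudocount))


# Hypothetical read counts as (spiked, unspiked) for a single sample pair.
example_pairs = {
    "Populus": (4_000, 500),
    "Symphyotrichum": (12, 0),
    "Trifolium": (9_000, 11_000),
}

for genus, (spiked, unspiked) in example_pairs.items():
    lfc = log2_fold_change(spiked, unspiked)
    direction = "increase" if lfc > 0 else "decrease or no change"
    print(f"{genus}: log2 fold change = {lfc:.2f} ({direction})")
```

A positive value indicates more reads for that genus in the spiked sample than in its unspiked counterpart, and a negative value indicates fewer.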
This study focused on assessing the lysis of morphologically and taxonomically diverse pollen from one of the most common bulk environmental sample types for DNA metabarcoding, surface soil. To achieve this, an artificial pollen mixture was spiked into surface soils from North Carolina, Colorado and Pennsylvania and the DNA subsequently isolated using two commercially available soil extraction kits widely used by the scientific community. The PowerSoil Pro Kit statistically outperformed the PowerSoil Kit based on total DNA yields. For either kit, the incubation temperature (65 °C or 90 °C) had no impact on the recovery of DNA, ASVs, or total reads. A statistically significant increase in the total number of reads for the spiked pollen species was observed with both kits, which confirmed five key findings of this study: 1) pollen was successfully spiked into soil samples, 2) grain lysis releasing high-quality DNA was achieved using both kits and methods (i.e., different incubation temperatures), 3) the DNA contained within dry and corbiculae pollen was of sufficient quality and quantity to permit amplification and sequencing of ITS2, 4) the primer pair used permits the recovery of ITS2 from broad taxonomic groups, and 5) the components and chemicals associated with soil samples did not negatively impact the isolation of DNA from pollen grains using either kit or method. Future studies should assess whether the PowerSoil Pro Kit is appropriate for lysing pollen from other bulk environmental sample types, such as dust and feces, for downstream DNA metabarcoding.
The authors declare no conflict of interest.
We thank Traci Carlson and Emma Timpano for collecting surface soils, along with Dr. Rebecca Irwin (NC State University) for providing corbiculae pollen. We also thank Dr. Christopher Bernhardt (USGS) for feedback on the design of this study, Teresa Tiedge for assistance with sequence data analysis, and Khushi Patel for assistance with statistical analyses. Seed funding was provided by the NC State University College of Veterinary Medicine.
Table S1
Data type: PDF file
Explanation note: Table S1. Raw data on the total number of reads and ASVs across all samples for each of the 54 plant families identified. * denotes families in which both the % of total reads and % of total ASVs were above 1% and subsequently included in downstream statistical analyses (data from remaining 45 families were combined as ‘Other’). Abbreviations are as follows: ASV, amplicon sequence variant.
Table S2
Data type: PDF file
Explanation note: Table S2. Raw sample data for key metrics in the DNA metabarcoding wet laboratory and bioinformatics processing. Abbreviations are as follows: PS, DNeasy® PowerSoil® kit; PSP, DNeasy® PowerSoil® Pro kit; ASV, amplicon sequence variant.