Research Article |
Corresponding author: Jon Lapeyra Martin ( jon_lapeyra@hotmail.com ) Academic editor: Thorsten Stoeck
© 2022 Jon Lapeyra Martin, Ioulia Santi, Paraskevi Pitta, Uwe John, Nathalie Gypens.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Martin JL, Santi I, Pitta P, John U, Gypens N (2022) Towards quantitative metabarcoding of eukaryotic plankton: an approach to improve 18S rRNA gene copy number bias. Metabarcoding and Metagenomics 6: e85794. https://doi.org/10.3897/mbmg.6.85794
|
Plankton metabarcoding is increasingly implemented in marine ecosystem assessments and is more cost-efficient and less time-consuming than monitoring based on microscopy (morphological). 18S rRNA gene is the most widely used marker for groups’ and species’ detection and classification within marine eukaryotic microorganisms. These datasets have commonly relied on the acquisition of organismal abundances directly from the number of DNA sequences (i.e. reads). Besides the inherent technical biases in metabarcoding, the largely varying 18S rRNA gene copy numbers (GCN) among marine protists (ranging from tens to thousands) is one of the most important biological biases for species quantification. In this work, we present a gene copy number correction factor (CF) for four marine planktonic groups: Bacillariophyta, Dinoflagellata, Ciliophora miscellaneous and flagellated cells. On the basis of the theoretical assumption that ‘1 read’ is equivalent to ‘1 GCN’, we used the GCN median values per plankton group to calculate the corrected cell number and biomass relative abundances. The species-specific absolute GCN per cell were obtained from various studies published in the literature. We contributed to the development of a species-specific 18S rRNA GCN database proposed by previous authors. To assess the efficiency of the correction factor we compared the metabarcoding, morphological and corrected relative abundances (in cell number and biomass) of 15 surface water samples collected in the Belgian Coastal Zone. Results showed that the application of the correction factor over metabarcoding results enables us to significantly improve the estimates of cell abundances for Dinoflagellata, Ciliophora and flagellated cells, but not for Bacillariophyta. This is likely to due to large biovolume plasticity in diatoms not corresponding to genome size and gene copy numbers. C-biomass relative abundance estimations directly from amplicon reads were only improved for Dinoflagellata and Ciliophora. The method is still facing biases related to the low number of species GCN assessed. Nevertheless, the increase of species in the GCN database may lead to the refinement of the proposed correction factor.
18S rRNA copy number, correction factor, plankton, protists, quantitative metabarcoding
During the last decades, unicellular eukaryotic plankton has been used as an indicator of ecosystem change due to its rapid response to environmental variations (e.g.
The small ribosomal subunit (SSU) 18S rRNA gene is the most widely used marker for the detection and classification of aquatic eukaryotic protists. The different gene regions such as V9 (
Quantification of marine protists based on ASV/OTU relative abundances has been largely discussed in the literature (
In the case of prokaryotes, phylogeny-based approaches have been applied to estimate 16S rRNA GCN and potentially correct this bias (
Some approaches have been lately developed to assess and/or mitigate quantification bias in metabarcoding: making use of control material in fish (
Therefore, assuming that the variation in relative abundances of different organisms and/or taxa can be partly attributed to the natural inherent differences in 18S rRNA GCN, the present study attempts to investigate the use of a GCN correction factor (GCN-CF) to mitigate the quantitative bias (cell number and biomass) in DNA metabarcoding approaches applied to marine protistan plankton surveys. We therefore compared the quantitative discrepancy between microscopic and molecular methods aiming to answer the following question: can we improve the estimations of cell and biomass relative abundances from metabarcoding reads if we consider the taxa-specific 18S rRNA GCN?
In order to answer this question (1) we compiled marine unicellular eukaryotic plankton data available in published literature of 18S rRNA GCN, (2) we created a taxa-specific GCN-CF from the GCN database (GCN database) and (3) we applied the GCN-CF to the metabarcodes of 15 environmental DNA (eDNA) samples collected in the Belgian Coastal Zone (BCZ). Corrected metabarcodes were compared to their corresponding microscopical counts and biomass estimations in relative abundances.
We focused on four single cell marine eukaryotic plankton groups (protists) commonly determined and enumerated under the microscope (
We built up a species-specific 18S rRNA GCN database for marine protists that contained species specific GCN per cell values, along with the estimated cellular biovolume and carbon content (C-content), which were acquired from several studies published in the literature and are listed in Suppl. material
From the GCN database, we calculated the mean, median, and standard deviations GCN per cell for each of the four established groups: Bacillariophyta, Dinoflagellata, Ciliophora and flagellated cells. Median GCN cell-1 values were used due to data limitation and presence of outliers. These values per group were used as key components for the correction factors and development of the mathematical equations.
The equation (i) estimates the corrected metabarcoding cell relative abundances (MTB_CFcell) for each defined plankton group (g) within a single sample. Metabarcoding number of reads (expressed in GCN) are divided by the corresponding plankton group CF (GCN cell-1), and the following division by the total sum of the four groups estimates ultimately the proportional contribution to the community (%).
The equation (ii) estimates the cellular GCN:C-content ratios (CC ratio) for each plankton group (values reported in Table
Plankton group specific correction factors (expressed in gene copy number per cell) cellular gene copy number (GCN) and GCN:C-content ratios (CC ratio) obtained with the application of equation (ii) for the Belgian Coastal Zone microscopy sample set.
Plankton group | Correction Factor (GCN cell-1) | CC ratios (GCN:pg C) |
---|---|---|
Bacillariophyta (Stramenopiles) | 166 | 4.41 |
Ciliophora (Alveolata) | 71710 | 3.26 |
Dinoflagellata (Alveolata) | 4919 | 27.17 |
Flagellated cells | 5.23 | 0.94 |
g = (Bacillariophyta, Dinoflagellata, Ciliophora, Flagellated cells)
(i)
(ii)
(iii)
The relative abundances of the uncorrected metabarcoding results and the corrected ones from equation (i) and (iii), MTB_CFcell and MTB_CFbio respectively, were directly compared to MCP. This comparison was performed for the fifteen environmental samples.
Seawater samples were collected at 3 m depth using 4 L Niskin bottles connected to a CTD sensor (Sea-bird SBE25). The monitoring in the Belgian Coastal Zone (
The DNA samples for the study of the protistan community were collected vacuum filtering 500–800 mL of water (from Niskin) through 0.22 µm polycarbonate filters (47 mm) and storing the samples immediately at -20 °C. Total DNA was extracted from filters using NucleoSpin Soil extraction Kit (Macherey-Nagel, Düren, Germany) following manufacturer’s protocol. For a maximum efficiency of the extraction from the filters, a sample lysis step was added using 10 mL cryotubes (using a high velocity bead beater for 10 min). Up to three filters were pooled and used for DNA extraction when no sufficient biomass was found on a single filter. Standard polymerase chain reactions (PCR) were performed to amplify the universal eukaryote SSU 18S rRNA gene. Primers TAReuk454FWD1 (5′-CCAGCASCYGCGGTAATTCC-3′), TAReukREV3 (5′-ACTTTCGTTCTTGATYRA-3′) were used to target the V4 region of the 18S rRNA gene (
The reads were denoised and merged with DADA2, v1.16. (
Seawater samples from St. 330 (100 mL) were fixed with Lugol’s solution (1% final concentration), stored in the dark until microscope analysis using Utermöhl-type sedimentation (
The cellular C-biomass was calculated based on biometric factors determined for each species and cell density. Mean species-specific biometric values were obtained from published data (
For each sample, proportional cell and C-biomass abundances from the microscopy dataset were calculated (percentage of the total cells and C-biomass per defined plankton group) and directly compared to the corrected and non-corrected metabarcoding data (i.e., relative abundance of each taxonomic group, estimated as the percentage of sequencing reads assigned to a taxonomic group compared to the total reads per sample). The microscopy results dataset used for this study is available in Suppl. material
All statistical analyses, data processing and plotting were performed using R version 4.0.2 (
The 18S rRNA GCN database comprised a total of 65 species’ absolute GCNs per cell, biovolumes and estimated biomasses (see Suppl. material
(A) 18S rRNA gene copy number per cell belonging to the major single celled eukaryotic plankton groups: Bacillariophyta (Stramenopiles), Dinoflagellata (Alveolata), Ciliophora (Alveolata) and flagellated cells. See Supplementary Table
Regarding the cellular carbon content, a significant positive correlation (R2 = 0.79; p < 0.001) was found in the GCN database dataset between the gene copy numbers and the estimated C-biomass of the single cell eukaryotes. When the same analysis was performed taking into consideration the taxonomic groups established (Fig.
The comparison of the data produced by the three datasets used in this study, sorted by taxonomic groups, is shown in Suppl. material
The cell relative abundances for the described plankton groups belonging to the 15 samples set from the BCZ are displayed in Fig.
Comparison between relative abundances (%) of the cell counts from microscopy (MCP; blue), reads from DNA metabarcoding (MTB; yellow) and corrected read abundance percentages (MTB_CFcell; green) per sample. Right panels: box plots summarizing the comparison percentages of cell counts (MCP), metabarcoding reads abundance (MTB) and the corrected (MTB_CFcell) out of the 15 marine water samples per plankton group: (A) Bacillariophyta (Stramenopiles), (C) Dinoflagellata, (B) Ciliophora and (D) flagellated cells. The vertical line inside the box plots represents the median, the top and the bottom hinges correspond to the interquartile range, and the whiskers show the minimum and maximum non-outlier values. Note the scale difference between plankton groups.
Bacillariophyta relative abundances from microscopy ranged between 1.4 – 61.9%, with an average (avg.) of 16.3 ± 14.7% throughout the total set (Fig.
When the correction factor was implemented, the estimation of relative abundances decreased considerably (avg. = 2.7 ± 2.6%) for Bacillariophyta (Fig.
Distributions of the two beta regression analyses, performed to assess the efficiency of the CF to improve cell relative abundances, were significant (p < 0.01) and detailed results are shown in Suppl. material
Equation (ii) was applied to calculate the cellular GCN:C-content ratios for the species present in the microscopy dataset of the BCZ and are reported in Table
The comparison of the biomass estimates (in relative abundances) from microscopy together with metabarcoding values and the average effect of the correction approach (MTB-CFbio) are displayed in Fig.
Boxplots summarizing the comparison of the 15 marine water samples biomass proportions from inverted microscopy (MCP), with DNA metabarcoding number of reads (MTB) and the corrected C-estimation values from metabarcoding reads (MTB_CFbio) using median cellular C-content values per taxum from microscopy dataset. (A) Bacillariophyta, (B) Dinoflagellata, (C) Ciliophora, and (D) flagellated cells. The vertical line inside the box plots represents the median, the top and the bottom hinges correspond to the interquartile range, and the whiskers show the minimum and maximum non-outlier values.
Equation (iii) was applied using CC ratios to estimate C-biomass proportions from metabarcoding reads (MTB_CFbio), which included median cellular C-content per taxum values from the microscopy dataset. Results showed that Bacillariophyta relative biomass corrections were underestimated from 42.3 ± 23.9% to 11.1 ± 9.2% (median) when compared to microscopy (Fig.
The two beta regression analyses were performed to assess the effect of the correction factor to improve the C-biomass relative abundances. The first one compared microscopy and MTB. The second model compared MTB-CFbio using microscopy taxa specific median C-biomass dataset values (Suppl. material
In metabarcoding, previous attempts to control biases have primarily focused on correcting single technical biases such as fixation of the sample, DNA-extraction, group-specific primer choice, or PCR (
However, to the authors’ knowledge, the only CF application in a plankton metabarcoding (freshwater) survey was carried out by
In addition, as previously proposed by various authors (
Cell number
Semi quantification by metabarcoding across groups has proven to be problematic in eukaryotes larger than 2 μm (Santoferrara et al. 2019) and good correlations between the proportion of reads and absolute species abundances in published literature are rare (
Our results regarding the contrast of the two methods — microscopy (cells) and metabarcoding (sequencing reads) — over the set of fifteen environmental samples from the Belgian Coastal Zone (Fig.
When the CF was applied, cell relative abundances were strongly improved in both alveolate groups and flagellated cells (Fig.
This variability was not so obvious in the other groups studied. In accordance with the GCN database results (Fig.
C-Biomass
There is a realization of the fact that number of reads is not generally well suited to determine absolute abundances (
As regards the entire marine plankton community,
The application of the CF to estimate biomass using GCN:C-content ratios, which included GCN and median C-content per taxum from the microscopy dataset, only improved the biomass estimations in two out of the four groups studied: Dinoflagellata and Ciliophora. This might be happening due to dinoflagellates that showed the highest GCN:C-content ratio and therefore the application of the correction factor resulted in the biggest impact correcting the relative abundance estimates.
The main weaknesses behind the use of GCN:C-biomass ratios for estimating biomass relative abundances from metabarcoding reads presented in this study are certainly attributed to the limitation of the number of species representing each of the plankton groups described. Particularly in the case of Ciliophora. Since there was a single ciliate species identified on our microscopy dataset, we were aware of the misrepresentation of the C-content values for the taxum Ciliophora. However, we decided to include this group in our biomass estimation analysis (application of equations ii and iii) in order to be able to perform the generalized linear modeling based on beta distribution for the entire community and assess the general effect of the correction factor.
In addition, there are more sources of bias related to the method that should be considered. For example, the estimations of C-content proportions in microscopy might be partially caused by errors in cell size measurements (and use of mean species-specific values), which are cubed to give volumes, and the effect of fixatives, which may cause shrinking or swelling of cells (
DNA metabarcoding still has practical limitations (inherent to the techniques) that our GCN correction factor did not address. Some technical biases are more important than others, and amplification by PCR together with primer biases are one of the largest sources (
Two major limitations were encountered as regards the GCN database and the application of the correction factors to estimate cell and C-biomass relative abundances from metabarcoding reads. First, due to the fact that few marine protists’ 18S rRNA GCNs have been assessed (
Secondly, the GCN values used for the application of equations (i), (ii) and (iii) (see Table
Traditionally, trend analyses and dynamics of autotrophic plankton biomass have been often based on chlorophyll-a (Chl-a) pigment concentration by fluorimetry (Strickland, Parsons, 2002). To date, combination methods such as cell counting or quantitative PCR remain the only means to estimate absolute abundances (
At the present time, community relative abundances are useful and reliable in the context of ecological interpretations (
In the current study, we present a promising mean to measure more accurately the relative abundances of the defined marine protistan plankton groups. Our findings highlighted the need to account for these taxonomic differences in the 18S rRNA gene copy number in marine eukaryotic community studies and we proved that these might largely impact the estimates of relative abundances. We believe that the major disproportions given by biological biases in DNA-barcoding in plankton surveys may be strongly reduced using a gene copy number correction factor that can partially be explained by the differences among plankton groups.
However, the development of GCN-CF can be challenging depending on the taxum/specimen studied, as it requires finding a clear relationship between DNA reads and taxum/specimen proportions. This might be hardly possible due to the accumulation of quantification biases (e.g. cell density, cell biomass, genome size, gene copy number, intra-genomic polymorphisms) in certain marine plankton group and taxa.
The applied CF did not account for technical biases, but it greatly improved the relationship between DNA sequence read abundances and cell percentages by the traditional morphological microscopy for three out of the four groups studied, helping to normalize severe disproportions. The use of the simple and time-cost effective method presented could open a window in the meta-omics era where DNA-barcoding becomes a predominant technique to assess the major taxonomic groups both qualitatively and quantitatively. Since we are persuaded that the one-to-one relationship between 18 rRNA amplicon reads and cells/C-biomass is no longer acceptable to depict protistan plankton communities, the here presented approach not only improves the quantification approach in plankton metabarcoding, but beyond that, it opens up many new opportunities and challenges: data from eDNA metabarcoding could be easily compared and combined with the output of other approaches such as mathematical modeling of the lower trophic levels in aquatic ecosystem (often C-biomass based), which is an aspiration for many biologists.
Further investigation is needed, not only as regards the development of the eukaryotic 18S rRNA gene copy number database (which may lead to the refinement of the proposed method) but also regarding the sample set used to apply the CF, which should aim for a wider temporal and geographical scope. Accounting for these factors will help to develop correction factors to improve estimates of community abundances.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
JLM contributed to data acquisition and analysis, data interpretation, preparation of figures and drafting of the manuscript. NG contributed to conceptualization, resources, supervision, project administration, writing (review and editing) and funding acquisition. IS contributed to conceptualization, preparation of figures, data analysis and drafting of the manuscript. UJ and VP contributed to resources, supervision, writing (review and editing). All authors reviewed the manuscript.
This research was supported by the MixITiN (www.mixotroph.org) project, which received funding from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Actions (MSCA) grant agreement No 766327. JLM was granted with MSCA funded ITN-ETN MixITiN Early Stage Researchers (ESRs) support; JLM, NG received financial support from the Fonds David et Alice Van Buuren. UJ was financially and logistically supported through the POF IV, topic 6 and subtopic 2 research program of the Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research.
We recall with affection and gratitude the dedicated technical assistance received from the crew of RV Simon Stevin and Heincke. We would like to express special thanks to the staff of the VLIZ (Flanders Marine Institute, the Hellenic Centre for Marine Research (HCMR) and Nancy Kühne, the eco-evolutionary genomics group, and AWI (Alfred Wegener Institute) for their valuable support provided.
Supplementary Data 1
Data type: morphological, genomic
Explanation note: Gene copy number dataset containing 18S rRNA copy number, biovolume and C-content of species.
Supplementary Data 2
Data type: genomic, occurences
Explanation note: ASV counts per sample of the 15 sample set (time-series) from the Belgial Coastal Zone.
Supplementary Data 3
Data type: occurences, morphological
Explanation note: Summary of the species identified in microscopy dataset, biovolumes and biomass estimations.
Tables S1–S4, Figures S1–S4
Data type: Figures, tables
Explanation note: Supplementary tables and figures.