Research Article
Corresponding author: Thorsten Stoeck ( stoeck@rptu.de ) Academic editor: Alexander Probst
© 2024 Thorsten Stoeck, Sven Nicolai Katzenmeier, Hans-Werner Breiner, Verena Rubel.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Stoeck T, Katzenmeier SN, Breiner H-W, Rubel V (2024) Nanopore duplex sequencing as an alternative to Illumina MiSeq sequencing for eDNA-based biomonitoring of coastal aquaculture impacts. Metabarcoding and Metagenomics 8: e121817. https://doi.org/10.3897/mbmg.8.121817
Oxford Nanopore Technologies has recently launched a duplex sequencing strategy and announced an improved error rate of a similar order of magnitude to that of Illumina sequencing. We therefore conducted a pilot study to assess whether Nanopore duplex sequencing has the potential to be used in eDNA-based marine biomonitoring. Specifically, we investigated bacterial communities of sediment samples collected from Atlantic salmon (Salmo salar) aquaculture installations and compared the ecological trends obtained from short Illumina (V3-V4 region of the 16S rRNA gene) and long Nanopore (full-length 16S rRNA gene) sequence reads. The obtained duplex rate of Nanopore amplicon reads with a Phred score ≥ 30 was 36%, notably higher than in previous reports from bacterial genome sequencing. When inferring alpha- and beta-diversity from Illumina ASVs and Nanopore OTUs, we found highly congruent ecological patterns. Only when ASVs and OTUs were collated across taxonomic ranks did the beta-diversity analyses of the Illumina data change slightly, owing to the difficulty of assigning taxonomy to short sequence reads. While the two sequence datasets agreed well at family rank, genus assignments of Illumina data were problematic, resulting in greater disagreement between the two protocols. Our data provide evidence that eDNA-based monitoring of aquaculture-related environmental impacts could equally well be conducted with the improved Nanopore duplex sequencing. We discuss to what extent eDNA-based biomonitoring could benefit from long-read information.
Aquaculture impacts, eDNA-based biomonitoring, metabarcoding, microbial communities, Nanopore sequencing, 16S rRNA gene
Traditional marine ecosystem biomonitoring in research and in practice (compliance monitoring) relies on the morphology-based identification of macroinvertebrate indicators. This is expensive, requires a high level of taxonomic expertise and, above all, takes more time to produce results than is needed for active management (
A reliable identification of bacterial species, based on 16S rRNA genes, requires the (nearly) complete sequence information included in this taxonomic marker gene (
One way to optimise the sequencing step in eDNA-based biomonitoring protocols would therefore be to integrate a long-read technology that can compete with Illumina sequencing in accuracy and massive data production. One possibility that has been explored for taxonomic profiling of microbial communities is Nanopore sequencing (e.g.
Very recently, a further advancement in Nanopore sequencing was duplex sequencing, for which ONT reported an accuracy of > 99.9% (Q30) in combination with super-accuracy base-calling models and a duplex rate of > 70% when sequencing Escherichia coli (
To the best of our knowledge, no peer-reviewed publication has put ONT’s Q30 sequencing chemistry to the test. However, prior research employing Q20+ chemistry has already compared the Illumina MiSeq platform with ONT sequencing. It revealed that the longer read lengths can offset diminished sequence quality, enabling a comparable identification of bacterial communities at higher taxonomic levels between the two platforms (
In this pilot study, we therefore analysed samples collected along an organic enrichment gradient from two salmon farms in Scotland. Using the same eDNA extracts from these samples, we analysed the bacterial community structures with both Illumina MiSeq and the new Nanopore Q30 chemistry with duplex reads. The aim of this proof-of-principle study was to compare measures of alpha- and beta-diversity in natural fish-farming related bacterial communities which are of relevance in biomonitoring.
Samples were collected from two salmon farm locations in Scotland, namely DUN located close to Oban and LIS located in Loch Linnhe. Sampling was conducted during the mid-production and peak-production period, respectively. Sediment was collected at three sites along a transect from an outer cage edge (CE) to a reference site (REF) in the direction of the prevailing current flow, located ca. 800 m (DUN) and 525 m (LIS) distant from the CE. An intermediate impact zone (Allowable Zone of Effect, AZE) was located at ca. 100 m (DUN) and 109 m (LIS) distance from the cage edge. This sampling design followed a decreasing organic enrichment gradient from the CE towards the REF sites, resulting from the deposits of feed and fish faeces on the sea floor (
Schematic representation of the sampling design. Benthic sampling of the two salmon aquaculture installations (DUN and LIS) occurred along a transect from the outermost cage edge of each farm, towards a reference site (REF). Distances of the Allowable Zone of Effect (AZE) and the REF sampling sites are determined according to compliance monitoring regulations separately for each farm prior to stocking. These distances depend on different parameters, such as strength of current and seabed topology. Distances of reference sites are chosen to be outside the influence of seabed depositions originating from aquaculture activities. The inset image (lower left corner) shows the RV Seol Mara of SAMS, Oban, while sampling the CE of DUN farm.
Following our previously described protocol (
As DNA metabarcodes for Illumina MiSeq sequencing, we obtained the ca. 450 bp long hypervariable V3-V4 region of the bacterial 16S rRNA gene using the Bact_341F (5’-CCTACGGGNGGCWGCAG-3’) and the Bact_805R (5’-GACTACHVGGGTATCTAATCC-3’) primer pair (
For Nanopore sequencing, we obtained the full length 16S rRNA gene using primer pair Bac27f (5’-AGAGTTTGATCMTGGCTCAG-3’) (
In both cases, V3-V4 region and full-length 16S rRNA gene, we obtained three technical PCR replicates for each of the two grab replicates per sampling site. Resulting PCR products were then purified using Qiagen’s MinElute PCR purification kit. All six PCR replicates of one sampling site (2 sediment grabs × 3 technical PCR replicates per grab) were then pooled in equal amounts of DNA prior to library constructions.
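The degenerate positions in the primers listed above (N, W, H, V, M) follow the IUPAC nucleotide code. As a minimal sketch, and not part of the published workflow, the following Python snippet shows how such primers can be expanded into regular expressions, for example to locate primer sites in reads. The function names are hypothetical; only the primer sequences are taken from the text.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}

# Primer sequences (5'->3') as given in the text.
PRIMERS = {
    "Bact_341F": "CCTACGGGNGGCWGCAG",
    "Bact_805R": "GACTACHVGGGTATCTAATCC",
    "Bac27f": "AGAGTTTGATCMTGGCTCAG",
}


def primer_to_regex(primer):
    """Compile a degenerate primer into a regex over A, C, G and T."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def degeneracy(primer):
    """Number of distinct non-degenerate sequences the primer represents."""
    n = 1
    for base in primer.upper():
        n *= len(IUPAC[base].strip("[]"))
    return n


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, degeneracy(seq), primer_to_regex(seq).pattern)
```

Under this expansion, Bact_341F represents 8 sequences, Bact_805R 9 and Bac27f 2.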
Illumina: From the resulting PCR products, sequencing libraries were constructed using the NEB Next® Ultra™ DNA Library Prep Kit for Illumina (NEB, USA). The quality of the libraries was assessed with an Agilent Bioanalyzer 2100 system. V3-V4 libraries were sequenced on an Illumina MiSeq platform, generating 2 × 300-bp paired-end reads. Illumina short-read amplicons are available in NCBI’s BioProject database under BioProject ID PRJNA768445.
Nanopore: Libraries for Nanopore sequencing were constructed using the Duplex-enabled Native Barcoding Kit 24 V14 (SQK-NBD114.24) (Oxford Nanopore Technologies, ONT), following the manufacturer’s manual. Sequencing of libraries (15 ng DNA total, measured with a Quantus Fluorometer (Promega)) was performed on a MinION sequencing device (MIN-101B, ONT) and an R10.4.1 flow cell (FLO-MIN114, ONT). Nanopore sequence reads were deposited in NCBI’s BioProject database under BioProject ID PRJNA1076812.
Sequence data processing of Illumina R1 and R2 output files and of Nanopore pod5 output files was conducted on the high-performance computing cluster (HPC) of the RPTU Kaiserslautern-Landau.
Illumina: Raw sequence reads were quality filtered and trimmed by executing the dada2 (divisive amplicon denoising algorithm) workflow (
Nanopore: Pod5 files were subjected to duplex base calling using ONT’s Dorado duplex base-caller version 0.5.3 with the super-accuracy model (dna_r10.4.1_e8.2_400bps_sup@v4.2.0, https://github.com/nanoporetech/dorado) (
Taxonomic classification of Illumina-derived ASVs and Nanopore-derived OTUs was conducted using the 16S BugSeq pipeline (
For pattern matching between the Illumina-derived ASV-to-sample matrix and the Nanopore-derived OTU-to-sample matrix, we calculated standard measures of diversity. Data analyses were conducted in R v. 4.0.5 using the packages vegan (
For a more in-depth taxonomic comparison, we compared the top ten (in terms of sequence reads) bacterial families and genera obtained from both sequence datasets.
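A top-ten comparison of this kind can be sketched in a few lines of Python. The snippet below is illustrative only: it assumes read counts per taxon and sample are available as nested dictionaries, and all names and counts in it are hypothetical placeholders rather than data or code from this study.

```python
from collections import Counter


def top_taxa(counts_per_sample, n=10):
    """Pool read counts over all samples and return the n most abundant
    taxa as (taxon, relative abundance) pairs, in descending order."""
    pooled = Counter()
    for sample_counts in counts_per_sample.values():
        pooled.update(sample_counts)  # adds counts taxon by taxon
    total = sum(pooled.values())
    if total == 0:
        return []
    return [(taxon, reads / total) for taxon, reads in pooled.most_common(n)]


def shared_taxa(top_a, top_b):
    """Taxa that occur in both top lists."""
    return {taxon for taxon, _ in top_a} & {taxon for taxon, _ in top_b}


if __name__ == "__main__":
    # Placeholder counts, NOT data from the study.
    nanopore = {"S1": {"fam_A": 50, "fam_B": 30, "fam_C": 5},
                "S2": {"fam_A": 20, "fam_C": 40}}
    illumina = {"S1": {"fam_A": 10, "fam_D": 60},
                "S2": {"fam_C": 25, "fam_D": 15}}
    top_n = top_taxa(nanopore, n=2)
    top_i = top_taxa(illumina, n=2)
    print(top_n, top_i, shared_taxa(top_n, top_i))
```

Pooling before ranking corresponds to a ranking "combined over all samples"; ranking per sample would require calling the function on one sample at a time.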
The evolution of Nanopore sequence data from raw reads (pod5) per sample to high quality reads which could be taxonomically assigned to the domain Bacteria is shown in Table
In case of the Illumina dataset (Table
This table shows the loss of Nanopore sequence reads from the original pod5 output to the per-sample high-quality reads with taxonomic classification to the domain Bacteria.
Sample | n total reads | n simplex reads | n duplex reads | Duplex reads per barcode | n reads passing length filter (1400-1600 bp) | n reads ≥ Phred30 | n reads after taxonomic assignment** |
---|---|---|---|---|---|---|---|
LIS_CE | 5,199,645* | 3,243,488* | 1,956,157* | 27,654 | 20,566 | 12,954 | 11,512 |
LIS_AZE | | | | 103,534 | 81,262 | 32,188 | 28,234 |
LIS_REF | | | | 36,259 | 28,529 | 12,938 | 10,986 |
DUN_CE | | | | 77,133 | 61,412 | 26,329 | 23,687 |
DUN_AZE | | | | 141,389 | 104,983 | 45,583 | 41,580 |
DUN_REF | | | | 52,399 | 35,399 | 17,549 | 15,495 |
Control | | | | 280 | 16 | 8 | 0 |
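The two filtering steps reported in the table (length window of 1400-1600 bp, then Phred score ≥ 30) can be illustrated with a short Python sketch. This is not the authors' pipeline: the function names are hypothetical, and it is an assumption here that the per-read quality is the mean Phred score derived from the mean per-base error probability, which is a common convention but is not specified in the text above.

```python
import math


def mean_phred(quality_string, offset=33):
    """Mean read quality computed via the mean per-base error probability
    (assumed convention; Phred+33 encoded quality string)."""
    if not quality_string:
        return 0.0
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(probs) / len(probs))


def passes_filters(seq, qual, min_len=1400, max_len=1600, min_q=30.0):
    """True if the read lies in the length window and meets the quality cut-off."""
    return min_len <= len(seq) <= max_len and mean_phred(qual) >= min_q


def filter_fastq(lines, **thresholds):
    """Yield (header, sequence, quality) for four-line FASTQ records that pass."""
    for header, seq, _plus, qual in zip(*[iter(lines)] * 4):
        seq, qual = seq.strip(), qual.strip()
        if passes_filters(seq, qual, **thresholds):
            yield header.strip(), seq, qual


if __name__ == "__main__":
    # Synthetic reads for illustration only.
    demo = ["@read1", "A" * 1500, "+", "I" * 1500,   # Q40, 1500 bp: kept
            "@read2", "A" * 900, "+", "I" * 900]     # too short: dropped
    print([header for header, _, _ in filter_fastq(demo)])
```

Averaging error probabilities rather than Phred values penalises reads with a few very poor bases, so a read can fail this filter even when the arithmetic mean of its Phred values exceeds 30.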
This table shows the loss of Illumina sequence reads from the original Illumina R1/R2 fastq output to the per-sample high-quality reads with taxonomic classification to the domain Bacteria.
Sample | R1/R2 output reads | Paired reads with Phred score ≥ 30 | n reads with taxonomic classification* | n reads with taxonomic classification, singletons omitted** |
---|---|---|---|---|
LIS_CE | 187,682 | 154,039 | 102,574 | 102,519 |
LIS_AZE | 182,486 | 139,241 | 70,186 | 70,019 |
LIS_REF | 155,450 | 127,725 | 60,712 | 60,607 |
DUN_CE | 146,900 | 114,526 | 69,473 | 69,406 |
DUN_AZE | 185,666 | 137,589 | 81,482 | 81,361 |
DUN_REF | 180,007 | 133,086 | 64,324 | 64,124 |
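To make the read loss in this table easier to compare across samples, the counts can be expressed as fractions of the raw R1/R2 output. The following Python sketch does this with the values copied from the table; the variable and function names are hypothetical.

```python
# Read counts per sample, copied from the table above:
# (R1/R2 output, paired reads with Phred >= 30,
#  reads with taxonomic classification, same with singletons omitted)
ILLUMINA_READS = {
    "LIS_CE": (187682, 154039, 102574, 102519),
    "LIS_AZE": (182486, 139241, 70186, 70019),
    "LIS_REF": (155450, 127725, 60712, 60607),
    "DUN_CE": (146900, 114526, 69473, 69406),
    "DUN_AZE": (185666, 137589, 81482, 81361),
    "DUN_REF": (180007, 133086, 64324, 64124),
}


def retention(counts):
    """Fraction of the raw output retained after each processing step."""
    raw = counts[0]
    return [count / raw for count in counts]


if __name__ == "__main__":
    for sample, counts in ILLUMINA_READS.items():
        steps = ", ".join(f"{100 * frac:.1f}%" for frac in retention(counts))
        print(f"{sample}: {steps}")
```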
The AZTI Marine Biotic Index (AMBI) obtained from the LIS samples in the framework of a compliance monitoring survey showed that environmental quality was similarly high at LIS_REF (AMBI: 2.0 for replicate 1 and 1.8 for replicate 2) and LIS_AZE (AMBI: 2.3 for replicate 1 and 2.2 for replicate 2), while LIS_CE was notably more impacted (AMBI: 5.7 for replicate 1 and 5.8 for replicate 2).
The relative trend in Shannon diversity was largely congruent between the Illumina-derived ASV and the Nanopore-derived OTU datasets (Table
The Nanopore and the Illumina datasets showed full agreement when matching their abundance-based Bray-Curtis beta-diversity patterns on an OTU and ASV basis, respectively (Fig.
When collating the OTU and ASV matrices on the taxonomic ranks family, genus and species, the pattern described above for ASVs and OTUs remained the same in case of the Nanopore dataset (Figs
While in case of the Nanopore dataset, sufficient bacterial species with high read abundances (> 0.5%) were identified (between 28 in samples LIS_CE and DUN_CE and 39 OTUs in sample DUN_REF, Table
The beta-diversity results obtained for the incidence-based Jaccard index were identical to the ones described above for the abundance-based Bray-Curtis index (Suppl. material
Hierarchical clustering (based on the Bray-Curtis index as a measure of beta diversity) of Nanopore-obtained amplicons (left panel) and Illumina-obtained amplicons (right panel) for the two salmon aquaculture installations DUN and LIS. The middle panel shows the (taxonomic) units on which the beta-diversity analyses are based. Colour and shape coding of samples helps visualisation and interpretation of data. AZE = Allowable Zone of Effect; CE = Cage Edge; REF = unimpacted Reference site.
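For readers less familiar with the two beta-diversity measures used here, the following Python sketch computes the abundance-based Bray-Curtis dissimilarity and the incidence-based Jaccard distance between samples. It is a plain-Python illustration under the assumption that each sample is a dictionary of taxon counts; it is not the R code used in the study, and the function names and example counts are hypothetical.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    denominator = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return numerator / denominator if denominator else 0.0


def jaccard(a, b):
    """Jaccard distance on presence/absence: 1 - |A and B| / |A or B|."""
    present_a = {t for t, n in a.items() if n > 0}
    present_b = {t for t, n in b.items() if n > 0}
    union = present_a | present_b
    return 1 - len(present_a & present_b) / len(union) if union else 0.0


def pairwise(samples, metric):
    """Distances for every unordered pair of samples, keyed by sample names."""
    return {(x, y): metric(samples[x], samples[y])
            for x, y in combinations(sorted(samples), 2)}


if __name__ == "__main__":
    # Placeholder counts, NOT data from the study.
    samples = {
        "CE": {"otu1": 80, "otu2": 20},
        "AZE": {"otu1": 30, "otu2": 40, "otu3": 30},
        "REF": {"otu2": 10, "otu3": 60, "otu4": 30},
    }
    print(pairwise(samples, bray_curtis))
    print(pairwise(samples, jaccard))
```

The resulting pairwise distances are the input a hierarchical clustering routine would need; which linkage method to use is a separate choice that this sketch does not make.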
On average, 86% (± 2.8%) of the high-quality Nanopore amplicon reads could be assigned to the taxonomic rank of bacterial family, whereas this was only 52% (± 7.8%) of the high-quality Illumina dataset (Table
Seven out of the ten most abundant Nanopore-derived families (combined over all six samples) were also amongst the ten most abundant bacterial families of the Illumina dataset, albeit with distinct relative abundances (Fig.
While Sedimenticolaceae and Sandarinaceae were also present in the Illumina dataset, albeit not amongst the ten most abundant families, Thiotrichaceae entirely escaped Illumina detection (Fig.
Comparing the ten genera with the most abundant reads in both sequence datasets (Fig.
With the exception of Desulfosarcina, none of the other five Illumina-derived top ten genera was recorded with the Nanopore dataset. Only one genus (Thiogranum) which had a sequence read abundance of > 0.5% in the Nanopore dataset escaped detection with the Illumina dataset (Fig.
Proportions of Nanopore and Illumina sequence reads that could be assigned to the taxonomic ranks family, genus and species. Proportional numbers refer to the proportion of all reads of an individual sample that could be assigned to each of these taxonomic ranks regardless of the sequence read abundance assigned to an individual taxon and to taxa which had a read abundance of at least 0.5% (= ”high abundant” taxa).
Sequencing platform | Sample | % reads assigned to rank family | % of families which have a read abundance of > 0.5% | % reads assigned to rank genus | % of genera which have a read abundance of > 0.5% | % reads assigned to rank species | % of species which have a read abundance of > 0.5% |
---|---|---|---|---|---|---|---|
Nanopore | LIS_CE | 88 | 84 | 81 | 69 | 59 | 72 |
Nanopore | LIS_AZE | 86 | 54 | 82 | 49 | 71 | 46 |
Nanopore | LIS_REF | 83 | 75 | 83 | 65 | 68 | 76 |
Nanopore | DUN_CE | 89 | 65 | 84 | 55 | 59 | 49 |
Nanopore | DUN_AZE | 90 | 48 | 85 | 46 | 62 | 34 |
Nanopore | DUN_REF | 83 | 49 | 80 | 43 | 72 | 66 |
Illumina | LIS_CE | 63 | 9 | 61 | 6 | 19 | 3 |
Illumina | LIS_AZE | 47 | 13 | 45 | 5 | 12 | 1 |
Illumina | LIS_REF | 44 | 12 | 42 | 5 | 11 | 0 |
Illumina | DUN_CE | 58 | 11 | 56 | 6 | 18 | 2 |
Illumina | DUN_AZE | 56 | 11 | 53 | 5 | 18 | 2 |
Illumina | DUN_REF | 45 | 13 | 43 | 4 | 12 | 1 |
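The per-platform averages quoted in the text can be recomputed from the family-rank column of this table. The Python sketch below does so with the standard library; because the table reports values rounded to whole percentages, and because the text does not say whether a sample or population standard deviation was used, the output need not match the published figures to the decimal. The variable names are hypothetical.

```python
from statistics import mean, pstdev, stdev

# "% reads assigned to rank family", copied from the table above
# (sample order: LIS_CE, LIS_AZE, LIS_REF, DUN_CE, DUN_AZE, DUN_REF).
FAMILY_ASSIGNED = {
    "Nanopore": [88, 86, 83, 89, 90, 83],
    "Illumina": [63, 47, 44, 58, 56, 45],
}


def summarise(values):
    """Return (mean, sample SD, population SD) for a list of percentages."""
    return mean(values), stdev(values), pstdev(values)


if __name__ == "__main__":
    for platform, values in FAMILY_ASSIGNED.items():
        m, sd_sample, sd_pop = summarise(values)
        print(f"{platform}: mean {m:.1f}%, "
              f"sample SD {sd_sample:.1f}, population SD {sd_pop:.1f}")
```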
The ten most abundant (in terms of sequence reads) bacterial families and their relative abundances obtained from the Nanopore and Illumina datasets.
Bacterial families which were detected exclusively with Nanopore or with Illumina sequencing. The violin-and-box-plots show the relative abundances. Violins (coloured areas) show the relative abundance distribution across the individual samples. Boxes show median, 25%- and 75%-quartiles and min-max values.
The ten most abundant (in terms of sequence reads) bacterial genera and their relative abundances obtained from the Nanopore and Illumina datasets.
Bacterial genera which were detected exclusively with Nanopore or with Illumina sequencing. The violin-and-box-plots show the relative abundances. Violins (coloured areas) show the relative abundance distribution across the individual samples. Boxes show median, 25%- and 75%-quartiles and min-max values.
Our results demonstrate a notably improved data quality of Nanopore sequences compared to previous studies (e.g.
The proportion of duplex reads obtained in our study was 36% (Table
In addition, while
Both sequencing protocols produced largely consistent alpha- and beta-diversity results. This agrees with other studies which also observed consistent ecological trends in 16S rRNA gene data obtained from the two sequencing platforms (e.g.
Both LIS_REF and LIS_AZE had AMBI values, obtained from macroinvertebrate-based compliance monitoring of the LIS farm, which indicated a good ecological status (
When collating sequence data across the taxonomic levels family and genus, the branching of samples within the Illumina-derived high-impact clusters changed compared to all other beta-diversity analyses. Short hypervariable 16S rRNA gene regions often lack a differentiation between genera of the same family and, likewise, between species of the same genus (
The low taxonomic resolution of short Illumina reads is a further disadvantage with regard to the effect of sequencing errors on taxonomic assignment accuracy. An inherent weakness of the MiSeq platform is the accumulation of substitution errors (
Different protocols for sequence generation may distort relative abundance comparisons, in particular due to PCR primer bias and stochasticity (
In addition, differences in relative read abundances typically affect ecological trends inferred from Nanopore and Illumina sequencing of the same bacterial (mock and natural) communities (e.g.
Nanopore amplicon duplex sequencing is a very promising alternative to Illumina sequencing for analysing bacterial community structures in sediments collected from salmon aquaculture installations. Ecological trends inferred from Nanopore- and Illumina-derived bacterial OTUs/ASVs are highly similar. Taxonomy-based ecological analyses of Nanopore 16S rRNA gene data are more reliable than those obtained with short Illumina reads of hypervariable 16S rRNA gene regions. This is in particular due to: (i) the higher robustness of the long Nanopore reads to errors compared to the short Illumina reads when the same Phred quality filter is applied and (ii) the high taxonomic resolution power of the (near) complete 16S rRNA gene. The results of this pilot study provide confidence to move to the next step: the assessment of the performance, robustness and accuracy of Nanopore amplicon duplex sequencing in compliance monitoring of aquaculture installations. As accomplished previously to assess the performance of Illumina sequencing in compliance monitoring (
The R scripts used in this study were deposited in GitHub (https://github.com/verubel/CoastMon/tree/main/Nanopore_vs_Illumina).
The authors thank Thomas A. Wilding, the crew of the R/V Seol Mara and Gail Twigg (SAMS, Oban, Scotland) for collecting samples at the DUN site and Jason Dobson (Scottish Sea Farms Limited) for sample collection at the LIS site. Thanks also to Sheena Gallie and Kate MacKichan (both formerly Scottish Sea Farms Limited) and Iain Berrill of Salmon Scotland for participating in this project and enabling the sampling of the salmon farms. The authors thank both reviewers for their constructive and helpful comments to improve an earlier version of this manuscript.
The authors have declared that no competing interests exist.
No ethical statement was reported.
The research leading to these results received funding from the Deutsche Forschungsgemeinschaft (DFG grant STO414/15-2).
Conceptualization: TS. Data curation: TS. Formal analysis: TS, SNK, HWB, VR. Funding acquisition: TS. Investigation: TS. Methodology: TS. Project administration: TS. Validation: VR, TS, SNK. Visualization: SNK. Writing - original draft: TS. Writing - review and editing: HWB, VR, SNK.
Thorsten Stoeck https://orcid.org/0000-0001-5180-5659
Verena Rubel https://orcid.org/0000-0001-9630-9050
All of the data that support the findings of this study are available in the main text or Supplementary Information.
Incidence-based Jaccard diversity index
Data type: pptx
Explanation note: Incidence-based Jaccard diversity index for all samples included in the study, based on OTUs (Nanopore) and ASVs (Illumina), as well as on the taxonomic ranks bacterial family, genus and species (the latter for Nanopore data only).