Software Description
Corresponding author: Daniela Beisser (daniela.beisser@uni-due.de)
Academic editor: Thorsten Stoeck
© 2023 Aman Deep, Dana Bludau, Marius Welzel, Sandra Clemens, Dominik Heider, Jens Boenigk, Daniela Beisser.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Deep A, Bludau D, Welzel M, Clemens S, Heider D, Boenigk J, Beisser D (2023) Natrix2 – Improved amplicon workflow with novel Oxford Nanopore Technologies support and enhancements in clustering, classification and taxonomic databases. Metabarcoding and Metagenomics 7: e109389. https://doi.org/10.3897/mbmg.7.109389
Sequencing of amplified DNA is the first step towards the generation of Amplicon Sequence Variants (ASVs) or Operational Taxonomic Units (OTUs) for biodiversity assessment and comparative analyses of environmental communities and microbiomes. Rapid advances in sequencing technologies have led to the growing use of third-generation long-read approaches in recent years. These sequence data come with longer reads, higher error rates and altered sequencing chemistry. Likewise, methods for amplicon classification and reference databases have progressed, leading to a wider range of taxonomic applications and higher classification accuracy. Natrix, a user-friendly and reducible workflow solution, enables processing of prokaryotic and eukaryotic environmental Illumina sequences using 16S or 18S marker genes. Here, we present an updated version of the pipeline, Natrix2, which incorporates VSEARCH as an alternative clustering method with better performance for 16S metabarcoding approaches and mothur for taxonomic classification against further databases, including PR2, UNITE and SILVA. Additionally, Natrix2 handles Nanopore reads, which entails initial error correction and refinement of reads using Medaka and Racon before their taxonomic classification.
Amplicon sequencing, Amplicon Sequence Variants, community profiling, metabarcoding, microbiome, Operational Taxonomic Units, Snakemake workflow, ultra-long reads
Analyzing nucleotide sequences of specific prokaryotic or eukaryotic DNA regions is fundamental to an advanced understanding of their biodiversity and biogeography. Amplicon sequencing of marker genes extracted from environmental samples can answer questions concerning the presence, absence and even (relative) abundance of specific species, or community composition. Driven by constantly increasing demands, sequencing has developed rapidly in recent decades: from cost- and time-intensive Sanger sequencing, through high-throughput sequencing such as Illumina technologies, to the latest real-time sequencing platform from Oxford Nanopore Technologies (ONT). Regardless of the sequencing technology, raw sequencing reads need to be processed in multiple steps and clustered into taxonomically assigned sequence representatives for further analysis. Despite numerous available tools for each step, there are only a few all-in-one, user-friendly workflows (
For Illumina amplicon data, Natrix is one of the few efficient workflows for read processing, OTU or ASV clustering and taxonomic assignment of amplicon sequencing reads, with an adjustable workflow system (
However, sequencing platforms are under constant development, so adaptations to new sequencing technologies are required. One of the latest technologies, Nanopore, is capable of producing read lengths of more than 800,000 base pairs (
Natrix2 was thus extended to meet the above-mentioned demands. On the one hand, it now includes pipeline options exclusively for Nanopore sequences: the automatic identification, reorientation and trimming of Nanopore reads were integrated, as well as Nanopore-specific error correction and clustering. On the other hand, clustering and taxonomic classification were improved for Illumina sequences, providing further clustering options and additional databases for other marker genes. General improvements include the restructuring of input and output files, error checking, and a detailed description and how-to of a complete workflow, including example sequences and configuration files, on GitHub (https://github.com/dbeisser/Natrix2).
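To illustrate what reorientation and trimming of reads involve in general, the following minimal Python sketch brings a read into forward orientation and removes the primers. It is not the Natrix2 implementation: the function names, the primer sequences in the usage example and the use of exact string matching are assumptions made for illustration only. Real Nanopore reads contain sequencing errors, so dedicated tools rely on error-tolerant primer matching instead.

```python
from typing import Optional

# Illustrative sketch only; not taken from the Natrix2 code base.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def orient_and_trim(read: str, fwd_primer: str, rev_primer: str) -> Optional[str]:
    """Return the amplicon in forward orientation without primers.

    Both the read and its reverse complement are searched for the
    forward primer followed by the reverse-complemented reverse primer.
    Exact matching is a simplification (assumption); None is returned
    when the primer pair is found in neither orientation.
    """
    rev_primer_rc = reverse_complement(rev_primer)
    for candidate in (read, reverse_complement(read)):
        start = candidate.find(fwd_primer)
        end = candidate.rfind(rev_primer_rc)
        if start != -1 and end != -1 and start + len(fwd_primer) <= end:
            return candidate[start + len(fwd_primer):end]
    return None
```

A read sequenced from the opposite strand yields the same trimmed amplicon as its forward counterpart, for example with the hypothetical primers `GATTACA` and `CCGGTTAA`:

```python
forward_read = "AA" + "GATTACA" + "TTTTGGGGCCCC" + "TTAACCGG" + "CC"
reverse_read = reverse_complement(forward_read)
print(orient_and_trim(forward_read, "GATTACA", "CCGGTTAA"))
print(orient_and_trim(reverse_read, "GATTACA", "CCGGTTAA"))
```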
In the new version of Natrix, Natrix2, four major improvements have been integrated compared to the previous version (Fig.
Schematic representation of the Natrix2 workflow. The processing of two split samples using AmpliconDuo is depicted. The color scheme represents the main steps, dashed lines outline the OTU and dotted edges outline the ASV variant of the workflow. Stars depict updates to the original Natrix workflow. Details on the ONT part are depicted in Fig.
As an alternative to the already contained Swarm clustering algorithm (
In addition to BLAST searches used in the previous version of Natrix, the ‘classify.seqs’ function from the open-source mothur package was added to assign a taxonomy from a specific database defined in the configuration file (
As the first version of Natrix was designed for Illumina sequencing reads only, support for processing of Nanopore long-reads was added (Fig.
Schematic diagram of processing Nanopore reads with Natrix2 for OTU generation and taxonomic assignment. The color scheme represents the main steps of this variant of the workflow (created with BioRender.com).
With the upgraded version of Natrix, processing of Nanopore short and long sequencing reads, including orientation, trimming, clustering and error correction, is possible. In addition, Illumina and Nanopore reads can now be taxonomically assigned via mothur, and the accuracy of OTU clustering is enhanced via mumu post-clustering. Optionally, VSEARCH can now be used for clustering Illumina reads. The addition of PR2 and UNITE as new databases makes Natrix2 a reliable tool for diverse metabarcoding approaches and enables processing of sequences originating from other organismic groups, such as fungi, metazoa and plants, or from further marker genes, such as ITS.
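As a rough illustration of the idea behind identity-based OTU clustering, the sketch below implements a toy greedy, centroid-based clustering in Python. It only mirrors the general principle used by tools such as VSEARCH (sequences are processed in order of decreasing abundance and join the first centroid they match at or above an identity threshold). The edit-distance-based identity, the function names and the default threshold of 0.97 are assumptions for this example, not Natrix2 or VSEARCH settings.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        curr = [i]
        for j, char_b in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                      # deletion
                            curr[j - 1] + 1,                  # insertion
                            prev[j - 1] + (char_a != char_b)))  # substitution
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Simplified identity: 1 - edit distance / length of the longer sequence."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def greedy_cluster(abundances: Dict[str, int],
                   threshold: float = 0.97) -> Dict[str, List[str]]:
    """Map each centroid sequence to the list of sequences in its cluster.

    Sequences are visited by decreasing abundance (ties broken
    alphabetically); a sequence joins the first centroid it matches
    at >= threshold identity, otherwise it founds a new cluster.
    """
    clusters: Dict[str, List[str]] = {}
    for seq in sorted(abundances, key=lambda s: (-abundances[s], s)):
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid].append(seq)
                break
        else:
            clusters[seq] = [seq]
    return clusters
```

Production tools use heuristic pre-filtering and optimized pairwise alignment instead of a full edit-distance computation against every centroid, so this sketch is suitable only for very small examples.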
Title: Natrix2 – Improved amplicon workflow with novel Oxford Nanopore Technologies support and enhancements in clustering, classification and taxonomic databases.
Study area description: Amplicon sequence analysis.
Download page: https://github.com/dbeisser/Natrix2.
Programming language: Snakemake, Python, R, Bash.
Licence: MIT Licence.
We acknowledge support by the Open Access Publication Fund of the University of Duisburg-Essen.
The authors have declared that no competing interests exist.
No ethical statement was reported.
This study was performed as part of the Collaborative Research Center (CRC) RESIST and analyses were performed by Project A04 (AD and DBe), funded by the German Research Foundation (DFG) – CRC 1439/1; project number 426547801.
Conceptualization: MW, DH, JB, DBe. Formal analysis: SC, DBl, AD. Methodology: AD, DBl, SC, DBe. Supervision: JB, DBe. Validation: AD. Visualization: AD, DBl. Writing – original draft: AD, DBl, DBe. Writing – review and editing: DBl, DH, JB, AD, SC, MW, DBe.
Aman Deep https://orcid.org/0000-0001-7321-864X
Dana Bludau https://orcid.org/0009-0003-3982-3178
Marius Welzel https://orcid.org/0000-0002-4946-2156
Sandra Clemens https://orcid.org/0000-0002-9710-1152
Dominik Heider https://orcid.org/0000-0002-3108-8311
Jens Boenigk https://orcid.org/0000-0001-8858-8889
Daniela Beisser https://orcid.org/0000-0002-0679-6631
All of the data that support the findings of this study are available in the main text.