Research Article
Corresponding author: Mohammed Ahmed ( mohammed.ahmed@nrm.se ) Corresponding author: Melanie Sapp ( melanie.sapp@uni-duesseldorf.de ) Academic editor: Carmelo Andújar
© 2019 Mohammed Ahmed, Matthew Alan Back, Thomas Prior, Gerrit Karssen, Rebecca Lawson, Ian Adams, Melanie Sapp.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Ahmed M, Back MA, Prior T, Karssen G, Lawson R, Adams I, Sapp M (2019) Metabarcoding of soil nematodes: the importance of taxonomic coverage and availability of reference sequences in choosing suitable marker(s). Metabarcoding and Metagenomics 3: e36408. https://doi.org/10.3897/mbmg.3.36408
For many organisms, there is agreement on the specific genomic region used for developing barcode markers. With nematodes, however, it has been found that the COI region designated for most animals lacks the taxonomic coverage (ability to amplify a diverse group of taxa) required of a metabarcoding marker. For that reason, studies on metabarcoding of nematodes have so far relied primarily on regions within the highly conserved 18S ribosomal DNA. Two popular markers within this region are the ones flanked by the primer pairs NF1-18Sr2b and SSUF04-SSUR22. The NF1-18Sr2b primer pair, especially, has been criticized as not being specific enough for nematodes, leading to suggestions for other candidate markers, while the SSUF04-SSUR22 region has hardly been tested on soil nematodes. The current study aimed to evaluate these two markers against alternative ones within the 28S rDNA and the COI region for their suitability for nematode metabarcoding. The results showed that the NF1-18Sr2b marker could offer wide coverage and good resolution for characterizing soil nematodes. Sufficient availability of reference sequences for this region was found to be a significant factor in this marker outperforming the other markers, particularly the 18S-based SSUF04-SSUR22 marker. None of the other tested regions matched this marker in terms of the proportion of the taxa recovered. The COI-based marker had the lowest number of taxa recovered, owing to the poor performance of its primers and the insufficient number of reference sequences in public databases. In summary, this study highlights how dependent the success of metabarcoding is on the availability of a good reference sequence collection for the marker of choice as well as on its taxonomic coverage.
barcoding, taxonomy, DNA marker, nuclear, mitochondrial, reference database, primer
Fundamental to any DNA sequence-based identification method is the choice of barcode marker(s) (
Within the mitochondrial DNA, the cytochrome c oxidase subunit I (COI) protein coding gene has been the most widely used region, especially for DNA-barcoding of animals. Most studies involving insects and birds have utilized a region of this gene (
Despite its success as a barcode marker for most animals, attempts to utilize COI for some nematodes have not been successful for a number of reasons (
Besides being used for discriminating species of certain genera of nematodes (
The most widely used markers for metabarcoding to date have been ones associated with the nuclear ribosomal RNA gene repeats (rDNA) (
The 18S rDNA-based markers, like all markers mentioned here, have certain limitations. Aside from the fact that some 18S rDNA markers lack the resolution to distinguish certain species of nematodes, the primers used for amplification are often not specific. Using the primer pair described by
Two regions within the 18S rDNA are commonly used in metabarcoding studies involving nematodes. The first is a region amplified using the primer sets NF1-18Sr2b as used by
The taxa represented in the mock community were obtained either from pure cultures or from soil samples collected within the grounds of Fera Science Ltd. in Sand Hutton, York, UK (54.015514, -0.970281). For Meloidogyne hapla and Globodera rostochiensis, pure cultures of second-stage juveniles were used. For Steinernema carpocapsae, dauer larvae were used. For cultures of Trichodorus primitivus, Ditylenchus dipsaci and Laimaphelenchus penardi, adult females and/or males were used. Adult stages appropriate for reliable identification were used for all taxa that were not kept in culture but obtained from soil samples. For most of the taxa, the Whitehead tray method (
Three replicates of artificial assemblages of nematodes were used as mock communities. For each replicate 23 different genera of known abundances were placed in Eppendorf tubes containing 20 µl of molecular grade water (MGW). The mock communities were assembled to consist of taxa spanning as much diversity across the phylum as possible. In total, 19 different families belonging to six orders within the phylum Nematoda were represented (Table
Nematode taxa included in the mock community, their families and abundances. Classifications are based on
Family | Species | GenBank accession no. | Abundance |
---|---|---|---|
Alaimidae | Alaimus sp. | MG994936 | 2 |
Trichodoridae | Trichodorus primitivus | MG994943 | 1 |
Tripylidae | Tripyla glomerans | MG994928 | 2 |
Longidoridae | Longidorus caespiticola | MG994935 | 1 |
Longidoridae | Xiphinema diversicaudatum | MG994934 | 1 |
Aporcelaimidae | Aporcelaimellus sp. | MG994940 | 1 |
Mononchidae | Prionchulus punctatus | MG994945 | 2 |
Anatonchidae | Anatonchus tridentatus | MG994941 | 1 |
Plectidae | Anaplectus sp. | MG994930 | 1 |
Plectidae | Plectus sp. | MG993558 | 2 |
Neodiplogasteridae | Pristionchus sp. | MG994929 | 3 |
Anguinidae | Ditylenchus dipsaci | MG994937 | 3 |
Rhabditidae | Rhabditis sp. | MG994944 | 3 |
Steinernematidae | Steinernema carpocapsae | MG994932 | 12 |
Cephalobidae | Acrobeles sp. | MG994931 | 1 |
Cephalobidae | Acrobeloides sp. | Failed | 2 |
Tylenchidae | Tylenchus sp. | Too short | 3 |
Aphelenchoididae | Laimaphelenchus penardi | Not sequenced | 8 |
Aphelenchoididae | Aphelenchoides sp. | MG994938 | 2 |
Hemicycliophoridae | Hemicycliophora sp. | MG994927 | 3 |
Criconematidae | Criconema sp. | Failed | 1 |
Heteroderidae | Globodera rostochiensis | MG994942 | 10 |
Meloidogynidae | Meloidogyne hapla | Not sequenced | 7 |
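The relative abundances that are later compared with read frequencies follow directly from the counts in the table above. A minimal Python sketch of that arithmetic is given here; the dictionary name `abundance` and the use of genus names as keys are illustrative choices, not taken from the study's own scripts.

```python
# Individuals per genus in one mock community replicate (counts from the table above).
abundance = {
    "Alaimus": 2, "Trichodorus": 1, "Tripyla": 2, "Longidorus": 1,
    "Xiphinema": 1, "Aporcelaimellus": 1, "Prionchulus": 2, "Anatonchus": 1,
    "Anaplectus": 1, "Plectus": 2, "Pristionchus": 3, "Ditylenchus": 3,
    "Rhabditis": 3, "Steinernema": 12, "Acrobeles": 1, "Acrobeloides": 2,
    "Tylenchus": 3, "Laimaphelenchus": 8, "Aphelenchoides": 2,
    "Hemicycliophora": 3, "Criconema": 1, "Globodera": 10, "Meloidogyne": 7,
}

# Relative abundance = individuals of a genus / all individuals in the replicate.
total = sum(abundance.values())
relative_abundance = {genus: n / total for genus, n in abundance.items()}

print(len(abundance), "genera,", total, "individuals per replicate")
for genus, share in sorted(relative_abundance.items(), key=lambda kv: -kv[1])[:3]:
    print(f"{genus}: {share:.3f}")
```

The three most abundant genera (Steinernema, Globodera and Laimaphelenchus) together make up 30 of the 72 individuals, so the community is far from even.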
DNA extraction
Extractions of DNA from the mock community replicates and the single specimens were performed using the Qiagen DNeasy Blood and Tissue Kit (Qiagen, Manchester, UK). All samples (single-specimen samples and the three mock community replicates) were placed in 1.5 ml microcentrifuge tubes containing 20 µl of MGW. The tubes were topped up to 180 µl by adding 160 µl of Qiagen ATL buffer, followed by 20 µl proteinase K before being incubated overnight at 56 °C. The lysed samples were further processed to obtain pure DNA according to the manufacturer’s instructions for genomic DNA extraction.
Molecular identification of single specimens using Sanger Sequencing
Sequences of single specimens for 21 of the taxa represented in the mock community were analyzed separately using the Sanger sequencing method for confirmation of their identities based on three distinct genomic regions. Each specimen was picked into a separate Eppendorf tube and sequences of three different regions were analyzed. These regions were a nearly complete 18S rDNA region, the D2-D3 segment of the 28S rDNA region and the COI region. Meloidogyne hapla and Laimaphelenchus sp. had previously been studied and identified and so did not require molecular confirmation.
Amplification of single specimen samples. For the 18S rDNA, an approximately 1800 bp long region was amplified as two overlapping fragments using two primer sets 988F-1912R and 1813F-2646R for the first and second fragments respectively (
The polymerase chain reaction (PCR) amplification of both fragments of the 18S rDNA region was carried out in 25 µl reactions containing 5 µl template DNA, 12.5 µl of 2× BIO-X-ACT short mix (Bioline Reagents Limited, London), 0.25 µM of each primer, namely 988F (5ʹ-CTCAAAGATTAAGCCATGC-3ʹ) and 1912R (5ʹ-TTTACGGTCAGAACTAGGG-3ʹ) for the first fragment and 1813F (5ʹ-CTGCGTGAGAGGTGAAAT-3ʹ) and 2646R (5ʹ-GCTACCTGTTACGACTTTT-3ʹ) for the second fragment, and 6.3 µl MGW. The PCR conditions were 5 min at 95 °C; 5 cycles of 94 °C for 30 s, 45 °C for 30 s and 72 °C for 30 s; 35 cycles of 94 °C for 30 s, 54 °C for 30 s and 72 °C for 30 s; and a final extension for 5 min at 72 °C.
The D2-D3 segment of the 28S rDNA region was amplified using the primers D2Af and D3Br (
The 400 bp region of the COI gene was amplified using the JB3-JB5GED primers (
The PCR amplicons were purified using the QIAquick PCR Purification Kit (Qiagen) before being sent to Eurofins Genomics (https://eurofinsgenomics.eu) for sequencing using the same primers used for the PCR. The sequences obtained for single specimens are available from GenBank (
Analysis of Sanger Sequence data from single specimen samples. Sequences were received as both ABI and SEQ files. Both sequence file formats were visualized using BioEdit Sequence Alignment Editor (
Amplification and Library Preparation of Mock Community samples
For each target barcode marker, four separate PCRs were set up, one for each of the three replicates plus a blank sample spiked with MGW. The 5ʹ ends of each of the primers were tailed with Nextera adapter sequences (Table
Primers used for amplification of the target barcode markers. The Illumina overhang adapter (underlined in the original) is the leading segment of each sequence, separated from the locus-specific primer by a space.
Primer | Sequence (from 5ʹ end) | Source |
---|---|---|
Nex_NF1 | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG GGTGGTGCATGGCCGTTCTTAGTT | |
Nex_18Sr2b | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG TACAAAGGGCAGGGACGTAAT | |
Nex_SSUF04 | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG GCTTGTCTCAAAGATTAAGCC | |
Nex_SSUR22 | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG GCCTGCTGCCTTCCTTGGA | |
Nex_D3FA | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG GACCCGTCTTGAAACACGGA | |
Nex_D3BR | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG CGGAAGGAACCAGCTACTA | |
Nex_JB3 | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG TTTTTTGGGCATCCTGAGGTTTAT | |
Nex_JB5GED | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG AGCACCTAAACTTAAAACATARTGRAARTG | |
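Each tailed primer in the table is an Illumina overhang adapter followed by the locus-specific primer. The sketch below, which is an illustration and not part of the study's pipeline, strips the overhang from three of the primers and counts how many concrete oligonucleotides a degenerate primer such as JB5GED stands for. The function names are hypothetical.

```python
# Illumina Nextera overhang adapters that prefix the tailed primers in the table.
FWD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REV_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"

# Three of the tailed primers, copied from the table without the separating space.
TAILED_PRIMERS = {
    "Nex_NF1": "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGTGGTGCATGGCCGTTCTTAGTT",
    "Nex_18Sr2b": "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTACAAAGGGCAGGGACGTAAT",
    "Nex_JB5GED": "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGAGCACCTAAACTTAAAACATARTGRAARTG",
}

# Number of bases each IUPAC nucleotide code can represent.
IUPAC_OPTIONS = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}


def locus_specific_part(tailed: str) -> str:
    """Strip the overhang adapter and return the locus-specific primer."""
    for overhang in (FWD_OVERHANG, REV_OVERHANG):
        if tailed.startswith(overhang):
            return tailed[len(overhang):]
    raise ValueError("no known overhang adapter at the 5' end")


def degeneracy(primer: str) -> int:
    """Number of distinct sequences encoded by a (possibly degenerate) primer."""
    count = 1
    for base in primer:
        count *= IUPAC_OPTIONS[base]
    return count


for name, tailed in TAILED_PRIMERS.items():
    core = locus_specific_part(tailed)
    print(name, core, "degeneracy:", degeneracy(core))
```

JB5GED contains three R (A or G) positions, so it represents eight sequences, whereas NF1 and 18Sr2b are non-degenerate.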
Following the initial PCR reaction, the amplicons were all purified using Ampure XP Beads (Beckman Coulter, Inc. USA). The purified products were quantified using a Qubit® Fluorometer (Thermo Fisher Scientific, Wilmington, DE, USA). This was then followed by an index PCR where unique dual indices and the Illumina sequencing adapters were attached to each amplicon using Nextera XT index primers (Illumina, San Diego, CA, USA) for amplification (Illumina’s 16S Metagenomic Sequencing Library Preparation protocol). PCR was performed in 50 µl reactions containing 5 µl each of Nextera XT Index primers 1 and 2, 5 µl of template DNA, 1× HF buffer, 0.2 mM dNTPs, 1 mM MgCl2, 0.5 U Phusion polymerase and 22 µl MGW. The PCR programme was set at 98 °C for 3 min, 8 cycles of 98 °C for 30 s, 55 °C for 30 s, 72 °C for 30 s and a final extension step at 72 °C for 5 min. A list of samples and the combination of indexes used are provided (Suppl. material
The indexed products were purified using Ampure XP Beads, quantified and pooled according to their molarity. After that, the pooled sample was run on an Agilent 2200 TapeStation system (Agilent Technologies, Santa Clara, CA, USA) to verify the size of the pooled amplicons. The pool was quantified and diluted to 4 nM concentration. The library was sequenced on an Illumina MiSeq using 2× 300 cycles V3 run kit.
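Pooling by molarity and diluting to 4 nM both rest on two standard conversions: mass concentration to molar concentration for double-stranded DNA (assuming an average of 660 g/mol per base pair) and C1·V1 = C2·V2 for dilution. The sketch below illustrates them; the concentration, fragment length and final volume in the usage lines are hypothetical values, not measurements from this study.

```python
def ng_per_ul_to_nM(conc_ng_per_ul: float, mean_length_bp: float) -> float:
    """Convert a dsDNA concentration to nM, assuming 660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660.0 * mean_length_bp)


def dilution_volumes(stock_nM: float, target_nM: float, final_volume_ul: float):
    """Return (stock volume, diluent volume) in µl from C1*V1 = C2*V2."""
    if target_nM > stock_nM:
        raise ValueError("target concentration exceeds stock concentration")
    stock_ul = target_nM * final_volume_ul / stock_nM
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical pooled library: 6.6 ng/µl with a mean fragment size of 500 bp.
pool_nM = ng_per_ul_to_nM(6.6, 500)
stock_ul, water_ul = dilution_volumes(pool_nM, target_nM=4.0, final_volume_ul=50.0)
print(f"pool = {pool_nM:.1f} nM; mix {stock_ul:.1f} µl pool with {water_ul:.1f} µl diluent")
```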
Analysis of NGS data from mock community samples
Sequence analyses were performed using USEARCH version 8.1.1861 (
OTUs were assigned taxonomy based on the utax method within USEARCH. Details on the reference databases are described below. As an alternative to the utax approach, OTUs of each marker were assigned taxonomy using BLAST (
Proportion of reads assigned to nematode OTUs
Based on the results of the taxonomy assignments from the three approaches, the proportion of filtered reads assigned to nematodes in relation to the total number of reads was determined for each marker. The accuracy of the taxonomic assignment of each marker was determined by comparing the exactness with which each marker recovered the taxa in the mock community at the species or genus level. For sampled taxa whose species identities were known, accurate species identification was expected whereas for the rest, accurate genus identification was expected.
Reference databases for taxonomy
The reference library for assigning taxonomies to the OTUs generated from the two 18S rDNA markers was obtained from the Protists Ribosomal Reference database, PR2 v 4.72 (
For the 28S rDNA, reference sequences were obtained from the SILVA ribosomal RNA gene database (
A search through the BOLD database for nematode COI sequences revealed that only nine of the taxa included in the mock community had sequences available for comparison. Similar to the 28S rDNA references, sequences of nematode COI were obtained from GenBank (on 25th January 2018) using a command within the statistical assignment package (SAP 1.9.8) (
Availability of reference sequences in public databases
For each of the four markers, sequences within four different public databases were used to determine how many nematode sequences actually contain the entire length of region covered by the respective marker. Sequences were downloaded from NCBI nucleotide, SILVA, PR2 and BOLD databases (Table
Databases from which nematode sequences were downloaded for each of the four markers.
Database | NF1-18Sr2b | SSUF04-SSUR22 | D3Af-D3Br | JB3-JB5GED |
---|---|---|---|---|
NCBI nucleotide (18S, 28S rDNA, COI) | X | X | X | X |
SILVA (18S, 28S rDNA) | X | X | X | |
PR2 (18S rDNA) | X | X | ||
BOLD (COI) | X |
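One plausible way to decide whether a downloaded reference sequence contains the entire marker region is to look for both primer binding sites in it. The sketch below is an assumption about how such a check could be done, not a description of the procedure used in this study; it searches for the forward primer and the reverse complement of the reverse primer, honouring IUPAC degenerate bases and an optional mismatch allowance. All function names are hypothetical and the test sequence is synthetic.

```python
# Bases allowed at each IUPAC code in a primer.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_primer(primer: str, seq: str, max_mismatches: int = 0) -> int:
    """Return the first 0-based position where the primer matches, or -1."""
    for start in range(len(seq) - len(primer) + 1):
        mismatches = 0
        for p, s in zip(primer, seq[start:start + len(primer)]):
            if s not in IUPAC[p]:
                mismatches += 1
                if mismatches > max_mismatches:
                    break
        else:  # inner loop finished without exceeding the mismatch limit
            return start
    return -1


def spans_marker(seq: str, fwd: str, rev: str, max_mismatches: int = 0) -> bool:
    """True if seq holds the forward site upstream of the reverse site."""
    seq = seq.upper()
    f = find_primer(fwd, seq, max_mismatches)
    r = find_primer(reverse_complement(rev), seq, max_mismatches)
    return f != -1 and r != -1 and f < r


NF1 = "GGTGGTGCATGGCCGTTCTTAGTT"
R18SR2B = "TACAAAGGGCAGGGACGTAAT"

# Synthetic sequence: flank + NF1 site + filler + 18Sr2b site + flank.
full_length = "ACGT" + NF1 + "A" * 50 + reverse_complement(R18SR2B) + "TTGA"
truncated = full_length[:60]  # loses the reverse primer site

print(spans_marker(full_length, NF1, R18SR2B))
print(spans_marker(truncated, NF1, R18SR2B))
```

A reference that fails this test cannot be aligned over the full amplicon, which is the situation described for many 18S rDNA entries that lack the 5ʹ end.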
Taxonomic coverage and abundance prediction
The markers were compared on the basis of how well they predicted the mock community both qualitatively and quantitatively. The qualitative prediction was based on how many taxa in the mock community were recovered, while the quantitative prediction was based on the coefficient of determination (r²) of the linear regression between the average relative read frequencies and the relative abundances. Similarities in the abundance estimates between the replicates were also shown by the standard deviations of their relative read frequencies and their correlation coefficients.
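For a simple linear regression with one predictor, the coefficient of determination equals the squared Pearson correlation between the two variables. A minimal sketch of that calculation follows; the function name `r_squared` and the example vectors are hypothetical and do not reproduce the study's data.

```python
def r_squared(x, y):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r squared is undefined when a variable is constant")
    return sxy * sxy / (sxx * syy)


# Hypothetical relative abundances and mean relative read frequencies for four taxa.
relative_abundance = [0.10, 0.20, 0.30, 0.40]
relative_reads = [0.10, 0.30, 0.20, 0.40]
print(f"r squared = {r_squared(relative_abundance, relative_reads):.2f}")
```

A value near 1 would mean that read frequencies track the number of individuals closely; a value near 0 means they carry little quantitative information.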
BLAST searches were performed on the sequences of all three genomic regions that were analysed using the Sanger method against NCBI nucleotide database. This confirmed the morphological identifications of almost all the individuals. The only individuals whose identities could not be confirmed were the morphologically identified Criconema and Acrobeloides sp. (Table
Confirmed identities of individuals included in the mock communities. Sequences of the three DNA regions were analysed using the Sanger method. X denotes positive identification.
Samples | Morphology | 18S region | 28S region | COI region |
---|---|---|---|---|
Specimen 1 | Hemicycliophora sp. | X | ||
Specimen 2 | Ditylenchus dipsaci | X | X | |
Specimen 3 | Aporcelaimellus sp. | X | X | |
Specimen 4 | Anatonchus tridentatus | X | X | |
Specimen 5 | Globodera rostochiensis | X | X | |
Specimen 6 | Trichodorus primitivus | X | X | |
Specimen 7 | Rhabditis sp. | X | X | |
Specimen 8 | Prionchulus punctatus | X | X | |
Specimen 9 | Criconema sp. | |||
Specimen 10 | Tripyla sp. | X | X | |
Specimen 11 | Pristionchus sp. | X | X | |
Specimen 12 | Anaplectus sp. | X | ||
Specimen 13 | Acrobeles sp. | X | ||
Specimen 14 | Acrobeloides sp. | |||
Specimen 15 | Steinernema carpocapsae | X | X | |
Specimen 16 | Plectus sp. | X | X | X |
Specimen 17 | Xiphinema diversicaudatum | X | X | |
Specimen 18 | Longidorus caespiticola | X | X | |
Specimen 19 | Alaimus sp. | X | X | |
Specimen 20 | Tylenchus sp. | X | ||
Specimen 21 | Aphelenchoides sp. | X | X |
The sequence reads were demultiplexed by the MiSeq Reporter software (MiSeq® Reporter Software Guide, Illumina, Inc., San Diego, CA, USA; Document # 15042295 v05) using default settings (allowing one mismatch in the indexes). Blank samples only yielded sequences of fungi and streptophyta. A summary of the number of reads generated for each marker from each of the three replicates is presented in Table
Number of sequence reads generated for each of the markers across the three mock community replicates with standard error of means of the replicate samples.
Samples | NF1-18Sr2b | SSUF04-SSUR22 | D3Af-D3Br | JB3-JB5GED |
---|---|---|---|---|
Replicate 1 (MC1) | 2,483,453 | 3,162,379 | 3,897,994 | 1,236,201 |
Replicate 2 (MC2) | 2,349,364 | 2,790,363 | 4,228,233 | 2,160,885 |
Replicate 3 (MC3) | 2,435,278 | 1,953,138 | 4,309,817 | 1,204,900 |
Standard error of mean | 39,216 | 357,585 | 125,899 | 377,501 |
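The standard error of the mean in the last row is the sample standard deviation of the three replicate read counts divided by the square root of the number of replicates. The sketch below applies that formula to the counts listed in the table; the variable and function names are illustrative.

```python
from math import sqrt
from statistics import stdev

# Read counts per replicate (MC1, MC2, MC3), copied from the table above.
reads = {
    "NF1-18Sr2b": [2_483_453, 2_349_364, 2_435_278],
    "SSUF04-SSUR22": [3_162_379, 2_790_363, 1_953_138],
    "D3Af-D3Br": [3_897_994, 4_228_233, 4_309_817],
    "JB3-JB5GED": [1_236_201, 2_160_885, 1_204_900],
}


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / sqrt(len(values))


for marker, counts in reads.items():
    print(f"{marker}: {standard_error(counts):,.0f}")
```

With the counts as listed, this reproduces the tabulated standard errors for NF1-18Sr2b, SSUF04-SSUR22 and D3Af-D3Br to the nearest read. For JB3-JB5GED the same formula gives a smaller value (about 313,575) than the 377,501 shown, which may reflect a transcription difference in one of the table entries.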
The JB3-JB5GED marker was the only region for which more than half (57%) of the paired reads were successfully merged. The marker with the lowest percentage of merged reads was SSUF04-SSUR22 (38%). Despite the low percentage of merged reads recovered for NF1-18Sr2b and D3Af-D3Br, the percentage of reads that passed the quality filtering step was much higher than that of the JB3-JB5GED marker (Table
Marker | Reads successfully merged (%) | Merged reads passing filtering (%) | Number of OTUs | Chimeric sequences |
---|---|---|---|---|
NF1-18Sr2b | 48.3 | 96.7 | 138 | 5,677 |
SSUF04-SSUR22 | 38.0 | 88.9 | 161 | 6,813 |
D3Af-D3Br | 43.1 | 98.4 | 144 | 3,295 |
JB3-JB5GED | 57.0 | 64.3 | 69 | 1,830 |
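The two percentage columns can be combined to estimate what share of the original read pairs survives both steps. This sketch assumes the percentages compose multiplicatively, that is, the first column is a percentage of all read pairs and the second a percentage of the merged reads, as the column headers suggest; the names used are illustrative.

```python
# (reads successfully merged %, merged reads passing filtering %) from the table above.
stats = {
    "NF1-18Sr2b": (48.3, 96.7),
    "SSUF04-SSUR22": (38.0, 88.9),
    "D3Af-D3Br": (43.1, 98.4),
    "JB3-JB5GED": (57.0, 64.3),
}


def retained_fraction(merged_pct: float, filtered_pct: float) -> float:
    """Fraction of read pairs that are both merged and pass quality filtering."""
    return (merged_pct / 100.0) * (filtered_pct / 100.0)


retained = {marker: retained_fraction(*pcts) for marker, pcts in stats.items()}
for marker, fraction in sorted(retained.items(), key=lambda kv: -kv[1]):
    print(f"{marker}: {fraction:.1%} of read pairs retained")
```

Under this assumption NF1-18Sr2b retains the largest share of its read pairs, while JB3-JB5GED, despite having the best merging rate, ends up below both NF1-18Sr2b and D3Af-D3Br because of its losses at the filtering step.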
Major differences were observed for the coverage of the 18S rDNA-based markers, namely NF1-18Sr2b and SSUF04-SSUR22 across all three 18S rDNA databases (Figure
With the utax method, only those genera assigned with support (posterior probability) values of 0.5 or higher were considered valid in this study (Table
List of taxa recovered using the utax taxonomy assignment. For NF1-18Sr2b and SSUF04-SSUR22, the PR2 database was used as reference database and for D3Af-D3Br, combined nematode sequences from NCBI nucleotide database, sequences from this study and SILVA database were used. For JB3-JB5GED, sequences from NCBI nucleotide database, sequences from this study and the BOLD database were used.
NF1-18Sr2b | SSUF04-SSUR22 | D3Af-D3Br | JB3-JB5GED |
---|---|---|---|
Alaimus sp. | Rhabditida | ||
Anaplectus sp. | Tylenchida | ||
Aphelenchoides ritzemabosi | Aphelenchoides gorganensis | ||
Ditylenchus dipsaci | |||
Globodera | Globodera | Globodera ellingtonae | |
Hemicycliophora conida | Hemicycliophora wyei | ||
Laimaphelenchus penardi | |||
Longidorus | Longidorus | Longidorus macrosoma | |
Meloidogyne hapla | Meloidogyne hapla | ||
Prionchulus | |||
Pristionchus | |||
Rhabditis | Rhabditis | Rhabditis sp. | |
Steinernema | Steinernema | ||
Trichodorus primitivus | |||
Tripyla sp. | |||
Tylenchus arcuatus | |||
Xiphinema |
Using the utax method with the PR2 database, only eight OTUs of the SSUF04-SSUR22 marker were identified as nematodes, which accounted for only five of the sampled taxa (22%). The majority of the OTUs were not given taxonomic assignments, at least not with sufficient support for them to be considered valid.
For the 28S rDNA marker (D3Af-D3Br), only 22 of the total 144 OTUs were successfully assigned nematode identities, and these accounted for eight of the sampled taxa (35%).
The COI marker (JB3-JB5GED) was the only marker for which no successful taxonomic assignments were achieved at the genus level. Only three OTUs were identified as nematodes and could only be correctly identified to the order rank. Two of the OTUs matched Rhabditida and the other one Tylenchida (according to the classification by
With the exception of a few, most of the recovered taxa occurred in all three replicates for all the markers (Suppl. material
The OTUs generated for each of the markers were used to perform a BLAST search against the NCBI Nucleotide Database on 16th July 2017. Only alignments with expect (E) values less than 0.001 were considered. The top hits were examined for matches that had complete taxonomies, and only matches with an identity ≥ 95 % were considered. Based on these criteria, all OTUs of the NF1-18Sr2b marker matched taxonomically assigned sequences in the NCBI nucleotide sequences. All sampled taxa were recovered with the BLAST method for NF1-18Sr2b marker, at least to the genus level (Table
List of taxa recovered based on BLAST searches. All searches were performed against the NCBI nucleotide database. Only taxonomic assignments that appeared in the top five hits and had similarities ≥ 95% and E-values < 0.001 were considered.
NF1-18Sr2b | SSUF04-SSUR22 | D3Af-D3Br | JB3-JB5GED |
---|---|---|---|
Alaimus sp. | Alaimus sp. | Alaimus sp. | |
Anaplectus sp. | Anaplectus sp. | ||
Anatonchus tridentatus | Anatonchus tridentatus | Anatonchus tridentatus | |
Aphelenchoides ritzemabosi | Aphelenchoides ritzemabosi | ||
Aporcelaimellus obtusicaudatus | Aporcelaimellus obtusicaudatus | ||
Acrobeles sp. | Acrobeles complexus | ||
Acrobeloides sp. | Acrobeloides sp. | ||
Criconema sp. | |||
Ditylenchus dipsaci | Ditylenchus dipsaci | Ditylenchus dipsaci | |
Globodera rostochiensis | Globodera rostochiensis | Globodera sp. | |
Hemicycliophora conida | Hemicycliophora wyei | ||
Laimaphelenchus penardi | Laimaphelenchus deconincki | ||
Longidorus caespiticola | Longidorus caespiticola | Longidorus macrosoma | |
Meloidogyne hapla | Meloidogyne hapla | Meloidogyne hapla | Meloidogyne hapla |
Plectus andrassyi | Plectus sp. | ||
Prionchulus punctatus | Prionchulus punctatus | Prionchulus sp. | |
Pristionchus lheritieri | Pristionchus lheritieri | Pristionchus lucani | |
Rhabditis cf. terricola | Rhabditis cf. terricola | Rhabditis sp. | |
Steinernema carpocapsae | Steinernema carpocapsae | Steinernema carpocapsae | Steinernema carpocapsae |
Trichodorus primitivus | Trichodorus primitivus | Trichodorus primitivus | |
Tripyla glomerans | Tripyla glomerans | Tripyla sp. | |
Tylenchus arcuatus | Tylenchus naranensis | ||
Xiphinema sp. | Xiphinema diversicaudatum | Xiphinema diversicaudatum |
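The acceptance criteria used for the BLAST assignments (top five hits, identity ≥ 95%, E-value < 0.001) can be applied programmatically. The sketch below assumes BLAST tabular output with the default twelve columns (`-outfmt 6`) and hits listed best first for each query; whether the study used this output format is not stated, and the OTU and reference identifiers in the example are made up.

```python
import csv
import io
from collections import defaultdict


def filter_blast_hits(handle, min_identity=95.0, max_evalue=1e-3, top_n=5):
    """Keep, per query, the hits among its first top_n rows that pass both thresholds.

    Columns follow BLAST -outfmt 6 defaults: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    rows_seen = defaultdict(int)
    kept = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        identity, evalue = float(row[2]), float(row[10])
        rows_seen[query] += 1
        if rows_seen[query] > top_n:
            continue  # beyond the top hits considered for this query
        if identity >= min_identity and evalue < max_evalue:
            kept[query].append((subject, identity, evalue))
    return dict(kept)


# Made-up example rows; identifiers are placeholders, not real accessions.
example = (
    "OTU1\tREF_A\t99.1\t350\t3\t0\t1\t350\t10\t359\t1e-100\t600\n"
    "OTU1\tREF_B\t94.0\t350\t21\t0\t1\t350\t10\t359\t1e-80\t500\n"
    "OTU2\tREF_C\t96.0\t40\t2\t0\t1\t40\t5\t44\t0.01\t40\n"
)
print(filter_blast_hits(io.StringIO(example)))
```

In the example only the first OTU1 hit is retained: the second falls below 95% identity, and the OTU2 hit has an E-value above the cut-off.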
The NF1-18Sr2b-based tree placed most of the OTUs together with taxonomically assigned sequences from NCBI nucleotide database within the same clades (Figure
Maximum likelihood tree of the 18S rDNA-based NF1-18Sr2b OTUs and reference sequences from NCBI nucleotide database.
Maximum likelihood tree of the 18S rDNA-based SSUF04-SSUR22 OTUs and reference sequences from NCBI nucleotide database.
Maximum likelihood tree of the 28S rDNA-based D3Af-D3Br OTUs and reference sequences from NCBI nucleotide database.
The calculation of taxonomic coverage of the markers was based on how many of the sampled taxa were recovered by at least one of the three replicates. This was based on a consensus of the results of the taxonomy assignment via utax, BLAST and the phylogenetic analysis. The NF1-18Sr2b had the highest coverage, producing 100% recovery of the sampled taxa (Table
Taxa recovered by the markers in at least one of the replicates from the three taxonomy assignment methods used. The number of X indicates the number of replicates in which the taxon was detected. RefSeq denotes the availability of reference sequences for taxonomy assignment of NGS reads.
Taxa in mock community | NF1-18Sr2b coverage | NF1-18Sr2b RefSeq | SSUF04-SSUR22 coverage | SSUF04-SSUR22 RefSeq | D3Af-D3Br coverage | D3Af-D3Br RefSeq | JB3-JB5GED coverage | JB3-JB5GED RefSeq |
---|---|---|---|---|---|---|---|---|
Alaimus sp. | X X X | Available | X X X | †Available | X X X | Available | ||
Trichodorus primitivus | X X X | Available | X X X | †Available | X X X | Available | ||
Tripyla glomerans | X X X | Available | X X X | †Available | X X X | Available | ||
Longidorus caespiticola | X X X | Available | X X X | *Available | X X X | Available | Available | |
Xiphinema diversicaudatum | X X X | Available | X X X | †Available | X X X | **Available | *Available | |
Aporcelaimellus sp. | X X X | Available | †Available | X X X | Available | |||
Prionchulus punctatus | X X X | Available | X X X | †Available | X X X | Available | ||
Anatonchus tridentatus | X X X | Available | X X X | †Available | X X X | **Available | ||
Anaplectus sp. | X X X | Available | X X X | †Available | X X X | Available | ||
Plectus sp. | X X X | Available | Available | X X X | *Available | **Available | ||
Pristionchus sp. | X X X | Available | X X X | Available | X X X | Available | ||
Ditylenchus dipsaci | X X X | Available | X X X | †Available | X X X | Available | ||
Rhabditis sp. | X X X | Available | X X X | †Available | X X X | Available | Available | |
Steinernema carpocapsae | X X X | Available | X X X | †Available | X X X | *Available | X X X | Available |
Acrobeles sp. | X X | Available | X X | Available | ||||
Acrobeloides sp. | X X X | Available | X X X | Available | X X X | Available | ||
Tylenchus sp. | X X X | Available | X X X | Available | ||||
Laimaphelenchus penardi | X X X | Available | X X X | Available | ||||
Aphelenchoides sp. | X X X | Available | X X X | Available | ||||
Hemicycliophora sp. | X X X | Available | X | Available | ||||
Criconema sp. | X X | Available | Available | |||||
Globodera rostochiensis | X X X | Available | X X X | †Available | X X X | *Available | ||
Meloidogyne hapla | X X X | Available | X X X | Available | X X X | Available | X X X | Available |
In the case of the SSUF04-SSUR22 marker, eight taxa were missing from all three assignment methods. The taxa that were recovered occurred in all three replicates. With all three methods of taxonomy assignment combined, the number of correctly assigned OTUs improved to 56. The proportion of the total reads that were accurately assigned to nematodes was 94.2%.
The 28S rDNA-based D3Af-D3Br marker assigned 70 OTUs to nematodes and recovered all taxa except Criconema in the consensus taxonomy. Amongst the recovered taxa, Hemicycliophora occurred in one of the replicates, Acrobeles in two, while the rest were found in all three replicates. The proportion of the filtered reads correctly assigned to nematodes with this marker was 95.5%.
For the COI-based JB3-JB5GED marker, even the consensus taxonomy drawn from all three assignment methods could only recover two taxa, namely Meloidogyne and Steinernema. Although the phylogenetic analysis included Longidorus in the assignment, it was discovered that OTU17 and the NCBI reference sequence KJ741245 Longidorus sp., which were clustered together had very low percentage similarity (81%), considering the 95% minimum set for the BLAST method. In general, the consensus taxonomies for all the markers were almost exactly as what the BLAST search produced. This is because all successful assignments made by utax against the references were also positive in the BLAST search against the nucleotide database, which detected even more taxa that were missing in the utax results. Even though only two genera could be recovered, a very high percentage of the filtered reads (92.8%) belonged to nematodes.
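The consensus results above translate into taxonomic coverage as the share of the 23 mock community taxa recovered in at least one replicate. A short sketch of that arithmetic, using the counts stated in the text (all taxa for NF1-18Sr2b, eight missing for SSUF04-SSUR22, all but Criconema for D3Af-D3Br, two for JB3-JB5GED); the variable names are illustrative.

```python
MOCK_TAXA = 23  # taxa placed in each mock community replicate

# Taxa recovered in the consensus taxonomy, as stated in the text.
recovered = {
    "NF1-18Sr2b": MOCK_TAXA,         # all sampled taxa
    "SSUF04-SSUR22": MOCK_TAXA - 8,  # eight taxa missing
    "D3Af-D3Br": MOCK_TAXA - 1,      # all except Criconema
    "JB3-JB5GED": 2,                 # Meloidogyne and Steinernema
}

coverage = {marker: 100.0 * n / MOCK_TAXA for marker, n in recovered.items()}
for marker, pct in coverage.items():
    print(f"{marker}: {recovered[marker]}/{MOCK_TAXA} taxa ({pct:.1f}%)")
```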
None of the four markers provided a significant correlation between relative read frequency and relative abundance of taxa in the mock community (Suppl. material
Comparison of the relative read frequencies and relative abundances of sampled taxa. Relative read frequencies are averages of the three replicates and error bars represent their standard deviations. Vertical axis represents proportion of the total number of reads or number of individuals. Blue bars represent relative read frequencies and orange bars represent relative abundance in the mock community.
Taxonomic coverage is crucial to any metabarcoding study. The ability of a marker to recover as many taxa as possible could easily be one of the main benchmarks for determining its suitability for metabarcoding. The main aim of this study was to evaluate the suitability of four widely used markers for metabarcoding of nematodes. Therefore, this discussion will focus on how the markers performed based on a consensus of all the assignment approaches rather than the differences in performance of the taxonomy assignment methods themselves. This subject is well covered in
Another marker whose poor coverage could be attributed to insufficient matching reference sequences was the 18S rDNA-based SSUF04-SSUR22. Given that this region is well conserved and that there is a large collection of reference sequences, particularly for nematodes, the failure to detect eight members of the mock community was quite surprising. However, the issue with this marker appears to be its location within the full-length 18S rDNA operon. As mentioned earlier, this marker is situated at the 5ʹ end of the 18S rDNA region, so unless a reference sequence covers the entire length of the 18S rDNA or this specific region, it is unlikely to contain the homologous region for this marker. Although it has been used in a number of metagenetic studies involving meiofauna (
The D2-D3 expansion segment of 28S rDNA region may be the region besides the 18S rDNA region that has just the right amount of conservation and variability typical of a good metabarcoding marker. The region spanning these two high variability segments has also been the focus of phylogenetic studies for various groups of soil nematodes (
Unlike the SSUF04-SSUR22, the location of the NF1-18Sr2b marker within the 18S rDNA region puts it within the flanks of most sequences used for reconstructing 18S rDNA-based phylum-wide phylogeny of nematodes (
There are several important community indices used in ecological studies that depend on absolute or relative abundance of taxa in the nematode community. These include the maturity index (
As observed from the different taxonomy assignment methods, methods usually employed in analysis pipelines such as QIIME (
In summary, for metabarcoding of nematodes, this study has demonstrated that there are many reasons to favor the NF1-18Sr2b marker as the most suitable both in terms of coverage and ease of access to reference sequences. The issue of non-specificity of this marker, whilst a problem, can mostly be avoided by extracting nematodes from soil before DNA extraction to make sure most non-targets are excluded. According to
This study demonstrates how far a well curated nematode sequence database can go to facilitate the taxonomy assignment step of the analyses. A dedicated nematode database that is well curated by taxonomy experts will help eliminate the need for any further cross-checking of uclust- or utax-based taxonomy assignments. As stated earlier, the main rDNA reference databases, PR2 and SILVA, have a number of entries with incomplete taxonomies, which makes it necessary for the assignments to be checked. This process can be time consuming, especially if there is a large number of OTUs to be checked. Building such a database may require collaborative work between nematode taxonomists and molecular biologists.
Finally, when making recommendations for appraisal and adoption of new barcode marker(s) other than the ones known and used so far, an important consideration that always has to be made is the availability of a comprehensive reference database. It will take a tremendous amount of work to develop new reference databases as comprehensive as that which exists now for the 18S or 28S rDNA region.
The authors wish to thank the European Phytosanitary Research Coordination (EUPHRESCO) for funding this research. We would also like to thank Erin Lewis and Ummey Hany for helping with the sequencing and Rachel Glover for providing advice on the bioinformatics.
Mock community sequence data for the different markers can be retrieved under study accession number PRJEB27581 (sample accession numbers ERS2593880–ERS2593883 for NF1-18Sr2b marker, ERS2593884–ERS2593887 for SSUF04–SSUR22 marker, ERS2593888–ERS2593891 for D3Af-D3Br marker and ERS2593892–ERS2593895 for JB3-JB5GED marker).
Tables S1–S6
Data type: species data
Custom Python script
Data type: source code