Primer Validation
Corresponding author: Ashrenee Govender (agovender@ori.org.za)
Academic editor: Andrew R. Mahon
© 2025 Saiesha Harpal, Johan Groeneveld, Sandi Willows-Munro, Jenny Huggett, Ashrenee Govender.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Harpal S, Groeneveld J, Willows-Munro S, Huggett J, Govender A (2025) The design and testing of taxon-specific mini-barcode markers for metabarcoding of marine zooplankton. Metabarcoding and Metagenomics 9: e161301. https://doi.org/10.3897/mbmg.9.161301
DNA metabarcoding of zooplankton samples is well suited to surveys of marine pelagic ecosystems; however, consistent amplification across zooplankton groups cannot be assumed when using “universal” miniCOI primers. Under-representation of some groups may result from primer bias or inadequate coverage in barcode reference libraries. We designed taxon-specific mini-barcode primers to improve the representation of copepods, euphausiids, chaetognaths, and hydrozoans in metabarcoding outputs of mixed zooplankton samples from the Southwest (SW) Indian Ocean. In silico analyses of downloaded sequences per group identified the most informative mini-barcode region, and barcode gap analyses confirmed that the selected regions could distinguish among species within each group. In vitro analyses of DNA extracts from individual specimens per group showed that the new mini-barcode primers outperformed the standard and universal miniCOI primers by consistently recovering higher Phred quality scores. Metabarcoding of four in situ mixed zooplankton samples collected with plankton tow nets and processed with a combination of taxon-specific and universal primers identified 220 species. Use of the primer cocktails increased the proportionate representation of the target groups by three- to five-fold compared to a previous study. Read counts were dominated by copepods and euphausiids, implying that they had the highest relative biomass in samples. We conclude that a combination of universal and taxon-specific primers in metabarcoding assays will achieve a more comprehensive assessment of biodiversity by enhancing species richness estimates across different groups.
Biodiversity surveys, marine pelagic ecosystems, metabarcoding, mini-barcode primers
Metabarcoding is a powerful tool for tracking marine zooplankton diversity as a biologically essential ocean variable (EOV) of pelagic ecosystems (
The mitochondrial cytochrome c oxidase subunit I (COI) region is the preferred marker for barcoding of marine zooplankton (
Most high-throughput sequencing (HTS) platforms used for metabarcoding (e.g., Illumina MiSeq) have limited read lengths, necessitating the use of mini-barcode primers (shorter gene fragments of 100–350 base pairs [bp]) instead of Folmer’s original barcode of ~650 bp (
Several holoplanktonic taxa (copepods, euphausiids, and chaetognaths) as well as hydrozoans (comprising both holoplanktonic and meroplanktonic species) were abundant in visual analyses of zooplankton samples collected with plankton nets off eastern South Africa (SW Indian Ocean) but were under-represented in subsequent metabarcoding outputs (
The aims of this study were to: (1) design taxon-specific mini-barcode primers from the COI gene region for Copepoda (copepods), Euphausiacea (krill), Chaetognatha (arrow worms), and Hydrozoa (medusozoan Cnidaria) using in silico methods; (2) test the performance of taxon-specific primers against the standard (~650 bp) COI and the miniCOI primers for zooplankton (
Taxon-specific mini-barcode primers were designed following
Sequences were aligned separately for each of the four datasets using Clustal X2.1 (
Based on the SWAN analysis, a total of 42 potential mini-barcode fragments were selected for each dataset. Maximum likelihood (ML) analysis was conducted on potential mini-barcode fragments (comparison trees) and full-length sequence alignments (reference trees) for each dataset using Garli 0.951 (
DNA barcode gap analyses were conducted to confirm that the selected regions would be able to distinguish between species. The K2P nucleotide substitution model in MEGA 6.0 (
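As an illustration of the distance measure used in the barcode gap analysis, the Kimura 2-parameter (K2P) distance between two aligned sequences can be computed from the proportions of transitions (P) and transversions (Q) as d = -0.5 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. The following is a minimal sketch and not the MEGA implementation; the function name `k2p_distance` and the choice to skip gapped or ambiguous sites (pairwise deletion) are assumptions made for this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences.

    Sites with a gap or an ambiguous base in either sequence are skipped
    (an assumption of this sketch, equivalent to pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1

    if compared == 0:
        raise ValueError("no comparable sites")

    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError if the sequences are too divergent
    # for the K2P correction to be defined.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition in ten sites: P = 0.1, Q = 0.
example = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
```

Applying such a function to every pair of sequences, and labelling each pair as conspecific or non-conspecific, gives the intra- and interspecific distance distributions that a barcode gap analysis compares.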
To evaluate the effectiveness of the mini-barcode primers for copepods (271 bp), euphausiids (310 bp), chaetognaths (283 bp), and hydrozoans (266 bp) (Table
Primers used in this metabarcoding study (first round PCR): each of the COI primer combinations amplifies different fragments of the COI-5P gene region. *Primers were designed for this study.
| Fragment | Primer name | Sequence (5’–3’) | Direction | Reference | Fragment size |
|---|---|---|---|---|---|
| Copepoda* | CopeMiniF | GTW ATR CCW ATT TTA ATT GGR GG | F | This study | 271 bp |
| | CopeMiniR | CCT ARA AKA GAA CTA ACT CCT GC | R | | |
| Euphausiacea* | EuphausMiniF | CGA GCT GAA YTA GGW CAM CCA GG | F | This study | 310 bp |
| | EuphausMiniR | GCT CCW GCA TGW GCA ATT CCW GC | R | | |
| Chaetognatha* | ChaetoMiniF | CCY ACT ATA ATR GGR GGG TTT GG | F | This study | 283 bp–403 bp |
| | ChaetoMiniR1 | GTA GTR ATR AAA TTW GCW GAT CC | R | | |
| | ChaetoMiniR2 | GTR ATA GCY CCT GCT ART ACA GG | R | | |
| Hydrozoa* | HydroMiniF | GCC WGT WTT AAT WGG WGG TTT TGG | F | This study | 266 bp |
| | HydroMiniR | CCC ATW ATW GAW GAA GCW CCW GC | R | | |
| COI_Leray | mlCOIintF | GGW ACW GGW TGA ACW GTW TAY CCY CC | F | | 313 bp–319 bp |
| | HCO2198 | TAA ACT TCA GGG TGA CCA AAA AAT CA | R | | |
| COI_Fish | mlCOIintF | GGW ACW GGW TGA ACW GTW TAY CCY CC | F | | 313 bp–319 bp |
| | HCO2198 | TAA ACT TCA GGG TGA CCA AAA AAT CA | R | | |
| | FishR2 | ACT TCA GGG TGA CCG AAG AAT CAG AA | R | | |
| COI_Lobster | LobsterMinibarF | GGW GAT GAY CAA ATT TAY AAT GT | F | | 230 bp |
| | LobsterMinibarR | CCW ACT CCT CTT TCT ACT ATT CC | R | | |
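The primer sequences in the table use IUPAC ambiguity codes (e.g., W = A or T, R = A or G, Y = C or T), so each listed primer represents a pool of oligonucleotides. The sketch below shows one way to count how many distinct sequences a degenerate primer encodes and to turn it into a regular expression for locating binding sites in a reference sequence. The function names `degeneracy` and `to_regex` are hypothetical helpers written for this example; they are not part of the primer design workflow described in this study.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def _clean(primer: str) -> str:
    """Remove the triplet spacing used in the table and upper-case the primer."""
    return primer.replace(" ", "").upper()


def degeneracy(primer: str) -> int:
    """Number of distinct non-degenerate sequences encoded by the primer."""
    count = 1
    for base in _clean(primer):
        count *= len(IUPAC[base])
    return count


def to_regex(primer: str) -> re.Pattern:
    """Compile the primer into a regex that matches any sequence it encodes."""
    parts = []
    for base in _clean(primer):
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else f"[{options}]")
    return re.compile("".join(parts))


cope_mini_f = "GTW ATR CCW ATT TTA ATT GGR GG"  # CopeMiniF, from the table
cope_degeneracy = degeneracy(cope_mini_f)
cope_pattern = to_regex(cope_mini_f)
```

For CopeMiniF, the four two-fold degenerate positions (W, R, W, R) give 2 × 2 × 2 × 2 = 16 sequence variants. A reverse primer would need to be reverse-complemented before searching the same strand, which this sketch does not do.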
PCR reactions (25 μl) contained 20 ng/μl genomic DNA, 12.5 μl OneTaq Quick-Load Master Mix (1X, BioLabs, New England), 0.50 μl forward and reverse primer (10 nmol/L), 7.5 μl sterile nuclease-free water, 1 μl MgCl2 (25 μmol/L), and 1 μl bovine serum albumin (BSA; 1 mg/ml). For primer cocktails, the 0.50 μl primer volume was divided by the number of primers for each forward and reverse reaction; for example, for two reverse primers, 0.25 μl was added for each. Thermal cycling for COI and the newly designed mini-barcode primers comprised an initial denaturation at 94 °C for 2 minutes, 35 cycles of denaturation at 94 °C for 30 seconds, annealing at different primer-specific temperatures for 30 seconds, and extension at 68 °C for 1 minute. The final extension was at 68 °C for 5 minutes. Annealing temperatures were 50 °C for the COI primer, 52 °C for copepods, 56 °C for both euphausiids and hydrozoans, and 53 °C for chaetognaths. A “touchdown” PCR was used for the miniCOI barcode (
PCR products were visualized on a 2% (w/v) TBE agarose gel containing SafeView™ Classic (Applied Biological Materials Inc., Cat. No. G108). Amplicon size was determined using a 100 bp molecular weight marker (BioLabs, New England). PCR bead clean-up and Sanger sequencing were performed at the Central Analytical Facilities (CAF) at the University of Stellenbosch (South Africa). Sequences were edited using Geneious Prime 2025.0.3, and the percentage of nucleotides with Phred quality scores of at least 30 was calculated for each sequence. Phred scores indicate the probability that a nucleotide has been correctly identified, serving as a standard measure of sequence quality. A Phred score of 30 corresponds to a 1 in 1000 chance of an incorrect base call and is commonly used as a threshold for high-quality sequence data. The nucleotide BLAST tool (BLASTn) on GenBank and BOLD was used for species identification, with a 97% sequence identity threshold. Sequences generated using the newly designed primers were translated into amino acid sequences using the ExPASy Translate tool (https://web.expasy.org/translate/) and screened for stop codons to verify that nuclear mitochondrial insertions (numts) were not inadvertently amplified. The resulting amino acid sequences were subsequently queried against GenBank using the protein BLAST tool (BLASTp) to confirm that only mitochondrial genes were amplified.
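The relationship between a Phred score Q and the base-calling error probability P is P = 10^(-Q/10), so Q = 30 gives P = 0.001. The short sketch below illustrates this conversion and the percentage of bases at or above Q30 for a single read. It is an illustration only; in this study the calculation was performed in Geneious Prime, and the function names and the example quality values are assumptions.

```python
def phred_to_error_probability(q: float) -> float:
    """Probability that a base with Phred score q was called incorrectly."""
    return 10 ** (-q / 10)


def percent_at_or_above(qualities, threshold: int = 30) -> float:
    """Percentage of bases whose Phred score is at least `threshold`."""
    if not qualities:
        raise ValueError("quality list is empty")
    passing = sum(1 for q in qualities if q >= threshold)
    return 100.0 * passing / len(qualities)


# Hypothetical per-base quality scores for a four-base read.
example_qualities = [40, 35, 30, 20]
example_q30 = percent_at_or_above(example_qualities)
```

With the hypothetical scores above, three of the four bases meet the Q30 threshold, giving 75%.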
To evaluate the versatility of the newly designed primers (Table
| Library | Read count | Merged reads | Total amplicon sequence variants (ASVs) | Species at 97% |
|---|---|---|---|---|
| Oblique 1 | 155538 | 23331 | 400 | 113 |
| Oblique 2 | 140476 | 24356 | 433 | 115 |
| Vertical 1 | 186622 | 35351 | 522 | 129 |
| Vertical 2 | 229448 | 43411 | 537 | 111 |
| Total across nets | 712084 | 126449 | 1456 | – |
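The totals in the table can be checked directly from the per-library values. The sketch below sums the read and merged-read counts and expresses merged reads as a percentage of the read count for each library; the variable names are chosen for this example only.

```python
# Per-library values copied from the table above.
libraries = {
    "Oblique 1": {"reads": 155538, "merged": 23331, "asvs": 400},
    "Oblique 2": {"reads": 140476, "merged": 24356, "asvs": 433},
    "Vertical 1": {"reads": 186622, "merged": 35351, "asvs": 522},
    "Vertical 2": {"reads": 229448, "merged": 43411, "asvs": 537},
}

total_reads = sum(lib["reads"] for lib in libraries.values())
total_merged = sum(lib["merged"] for lib in libraries.values())
summed_asvs = sum(lib["asvs"] for lib in libraries.values())

# Merged reads as a percentage of the read count, per library.
merged_percent = {
    name: round(100 * lib["merged"] / lib["reads"], 1)
    for name, lib in libraries.items()
}

for name, pct in merged_percent.items():
    print(f"{name}: {pct}% of reads retained after merging")
print(f"Total reads: {total_reads}, total merged reads: {total_merged}")
```

The read and merged-read columns sum to the reported totals (712,084 and 126,449). The per-library ASV counts sum to 1,892, which is more than the 1,456 ASVs reported across nets; this would be expected if some ASVs were recovered in more than one library.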
Whole zooplankton samples were preserved in 97% ethanol during sampling, with the ethanol replaced after 24 hours for long-term storage. In the laboratory, individual samples were homogenized in the 97% ethanol solution for 1 minute using a consumer blender (Milex; 1500 W at 22,000 rpm). Between samples, the blender was washed to remove residual material and rinsed with a 10% bleach solution and 70% ethanol to degrade any remaining DNA. Three subsamples were taken from each homogenate to improve diversity estimates (
PCRs were performed in triplicate to reduce the effects of stochasticity, improve accuracy, and minimize bias and amplification errors (
All primers used the same thermal cycling program: initial denaturation at 98 °C for 30 seconds, denaturation at 98 °C for 10 seconds, annealing at different primer-specific temperatures for 30 seconds, and extension at 72 °C for 30 seconds. The final extension step was carried out at 72 °C for 4 minutes. Annealing temperatures were 46 °C for malacostracans, fishes, and universal primers. The annealing temperatures for the newly designed primers were as per the primer testing section above. All PCRs included a no-template negative control. PCR products were visualized on a 2% (w/v) TBE agarose gel containing SafeView™ Classic. Amplicon size was determined using a 100 bp molecular weight marker. The triplicate PCR products for each of the seven primer sets were pooled and quantified using a Qubit 2.0 Fluorometer (Life Technologies, California, USA), and each of the seven different amplicon pools was further consolidated into a single sample per tow-net haul to create four libraries (one per individual tow) with equimolar concentrations (5 ng/μl). Each library was cleaned using 1.8× Ampure XP purification beads (Beckman Coulter, High Wycombe, UK). Index PCR was performed using the Nextera XT Index Kit (Illumina, San Diego, USA). Thereafter, libraries were cleaned using 0.6× Ampure XP purification beads (Beckman Coulter) and quantified using the Qubit dsDNA High Sensitivity assay kit on a Qubit 4.0 instrument (Life Technologies). The four libraries were sequenced on the Illumina MiSeq platform using the MiSeq Nano Reagent Kit v.2 (500 cycles), following the protocols described by
The DADA2 algorithm (
The most informative region of the COI gene was identified for each of the four groups. The smaller SWAN window sizes (120–140 bp for copepods and hydrozoans, 100–140 bp for euphausiids, and 100–130 bp for chaetognaths) had higher mean K2P distance and lower zero non-conspecific values. Larger window sizes (260–300 bp for copepods, 270–300 bp for euphausiids, 220–300 bp for chaetognaths, and 250–300 bp for hydrozoans) showed better congruence of NJ trees and generated lower K- and R–F scores when compared with reference trees. The intraspecific K2P pairwise distances ranged from 0.00 to 0.14 for copepods, 0.00 to 0.13 for euphausiids, 0.00 to 0.35 for chaetognaths, and 0.00 to 0.16 for hydrozoans. The interspecific distances ranged from 0.00 to 0.67 for copepods, 0.00 to 0.32 for euphausiids, 0.00 to 0.52 for chaetognaths, and 0.00 to 0.79 for hydrozoans. DNA barcode gap analyses based on in silico data showed minimal overlap between intra- and interspecific K2P pairwise distances in all four groups for the selected mini-barcode regions (Fig.
Frequency distribution of intra- and interspecific pairwise K2P genetic distances calculated using the selected mini-barcode regions for (a) Copepoda, (b) Euphausiacea, (c) Chaetognatha, and (d) Hydrozoa. The frequency data (Copepoda = n/120000, Euphausiacea = n/1200, Chaetognatha = n/100, Hydrozoa = n/20000) were normalized to obtain a range between 0 and 1.
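The degree of overlap between intra- and interspecific distances can also be summarized numerically. The sketch below reports the largest intraspecific distance, the smallest interspecific distance, their difference (negative when the distributions overlap), and the fraction of interspecific comparisons that fall within the intraspecific range. The function name `barcode_gap_summary` and the example distances are hypothetical and are not data from this study.

```python
def barcode_gap_summary(intra, inter):
    """Summarize the overlap between intra- and interspecific distances."""
    if not intra or not inter:
        raise ValueError("both distance lists must be non-empty")
    max_intra = max(intra)
    min_inter = min(inter)
    # Interspecific comparisons that are no larger than the largest
    # intraspecific distance cannot be separated by a distance threshold.
    overlapping = sum(1 for d in inter if d <= max_intra)
    return {
        "max_intra": max_intra,
        "min_inter": min_inter,
        "gap": min_inter - max_intra,
        "inter_overlap_fraction": overlapping / len(inter),
    }


# Made-up distances for illustration only.
toy_intra = [0.00, 0.01, 0.02]
toy_inter = [0.015, 0.10, 0.20, 0.30]
summary = barcode_gap_summary(toy_intra, toy_inter)
```

In this toy example one of four interspecific comparisons falls below the largest intraspecific distance, so the gap is negative and the overlap fraction is 0.25.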
Primers were designed within conserved regions flanking the mini-barcode regions (Table
Amplification of mini-barcode primers against COI and miniCOI primers for voucher specimens of (a) Copepoda, (b) Euphausiacea, (c) Chaetognatha, and (d) Hydrozoa. Lane 1 shows the 100 bp molecular weight marker. Lanes 2–6 (a, b) and 2–5 (c, d) display PCR products from DNA amplified using universal COI primers. Lanes 8–12 (a, b) and 7–10 (c, d) show the same samples that were amplified with miniCOI barcode primers. Lanes 14–18 (a, b) and 12–15 (c, d) show amplification with taxon-specific mini-barcode primers. All PCRs included a no-template negative control in the test gel, although this is not reflected in the gel shown above.
In total, four zooplankton community libraries were sequenced with Illumina MiSeq. Sequencing was efficient, requiring minimal filtering of forward and reverse reads during merging of the paired-end reads for all four libraries. Across the four libraries, 712,084 reads were consolidated into 126,449 merged reads, from which 1,456 ASVs were available for analysis across all groups amplified. These ASVs were assigned to species level at > 97% sequence similarity against reference sequences on BOLD or GenBank and subsequently collapsed into 220 species (Table
Species richness was greatest for copepods (80 spp.; 36%), followed by euphausiids (25 spp.; 11%), molluscs (24 spp.; 11%), fishes (23 spp.; 11%), hydrozoans (21 spp.; 10%), decapods (14 spp.; 6%), and chaetognaths (12 spp.; 5%; Fig.
The integration of DNA-based methods into aquatic monitoring practices is presently on the agenda of several national and international fora (
The final selected mini-barcodes ranged between 220 and 300 bp, with smaller window sizes performing statistically worse (higher mean K2P distances and lower congruence in NJ trees). This result is consistent with
Primer cocktails used in the in situ metabarcoding analysis identified 220 zooplankton species from four plankton net tows at 97% sequence similarity to published barcode records. Proportionately, copepods comprised 36% of the identified species, an increase from only 10% of species identifiable in a previous metabarcoding study from the same region (
Read (or sequence) counts were dominated by copepods (41%) and euphausiids (35%), implying that they had the highest relative biomass in samples. These two groups are typically the most abundant (copepods) or are often significantly larger (euphausiids) than most other marine zooplankton, with greater biomass in most oceans (
In conclusion, we designed and tested taxon-specific mini-barcode primers for key holoplanktonic groups (copepods, euphausiids, and chaetognaths) and hydrozoans (comprising both holoplanktonic and meroplanktonic species) to improve their representation in large-scale biodiversity surveys of marine pelagic ecosystems. The new mini-barcode primers consistently amplified the target COI region in each of the four groups, with high sequence quality compared to the standard COI and miniCOI primers. Combinations of taxon-specific primers used in metabarcoding analysis of bulk zooplankton samples increased the proportionate representation of these groups by 3–5 times, at a 97% sequence similarity threshold, compared to a previous study (
We gratefully acknowledge postgraduate funding for the first author (SH) from the National Research Foundation (NRF). Opinions expressed and conclusions arrived at are those of the authors and are not necessarily to be attributed to the NRF of South Africa. We thank the Department of Forestry, Fisheries and the Environment (DFFE) and the crew of the S.A. Agulhas II for sampling at sea.
The authors have declared that no competing interests exist.
No ethical statement was reported.
No use of AI was reported.
This work was supported by the National Research Foundation (NRF).
Conceptualization: AG, JG. Data curation: AG, SH, JG. Formal analysis: JG, SH, AG. Funding acquisition: AG, JG. Investigation: JG, AG. Methodology: JG, AG. Project administration: AG, JG. Resources: JH, AG. Software: AG. Supervision: SWM, AG, JG. Validation: AG, SH. Visualization: AG. Writing - original draft: SH. Writing - review and editing: AG, SWM, JG, JH.
Saiesha Harpal https://orcid.org/0000-0002-6628-1453
Johan Groeneveld https://orcid.org/0000-0002-9831-9073
Sandi Willows-Munro https://orcid.org/0000-0003-0572-369X
Jenny Huggett https://orcid.org/0000-0001-9315-8672
Ashrenee Govender https://orcid.org/0000-0002-2860-4610
All raw sequence reads used to perform analyses have been uploaded to NCBI under accession number PRJNA1274484.
Additional tables
Data type: docx
Explanation note: Table S1. Table showing primer design parameters. Table S2. List of the 220 species detected by metabarcoding, with percentage sequence similarity to barcode records on BOLD and GenBank.
Spreadsheet 1
Data type: xlsx
Spreadsheet 2
Data type: xlsx