Research Article
Corresponding author: Francesco Martoni (francesco.martoni@agriculture.vic.gov.au)
Corresponding author: Tongda Li (tongda.li@agriculture.vic.gov.au)
Academic editor: Kelly D. Goodwin
© 2024 Francesco Martoni, James Buxton, Kathryn S. Sparks, Tongda Li, Reannon L. Smith, Lea Rako, Mark J. Blacket.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Martoni F, Buxton J, Sparks KS, Li T, Smith RL, Rako L, Blacket MJ (2024) A morphological and high throughput sequencing workflow to identify Australian ants (Hymenoptera, Formicidae): a new tool for biosecurity and biodiversity. Metabarcoding and Metagenomics 8: e130531. https://doi.org/10.3897/mbmg.8.130531
Ants are often the most ubiquitous and ecologically influential components of terrestrial systems, exhibiting an exceptionally high level of endemic diversity. Simultaneously, exotic ants can be amongst the most environmentally and economically devastating biological invaders. Distinguishing between exotic and related natives is essential for early detection and to reduce the chance of establishment, but at the species-level there remains a great deal of uncertainty among several of the most diverse, widespread and ecologically dominant genera. This taxonomic impediment negatively impacts the use of molecular techniques relying on reference databases, due to the poor state of ant DNA sequences in publicly available repositories, with many species not represented at all and others incorrectly identified. As a result, biosecurity screening for targeted exotic ants and assigning ant species-level identifications for biodiversity remain two separate areas of focus in Australia.
Here we propose a workflow for the identification of ants that enables simultaneous processing of large numbers of “bulk” ant samples each containing many individuals, increasing resolution for taxonomic assignment, and generating a curated database linked to voucher specimens. We use a non-destructive DNA extraction method and compare two high throughput sequencing (HTS) metabarcoding platforms – MiSeq (Illumina) and MinION (Oxford Nanopore Technologies) – processing up to 180 bulk mixed ant samples per run. This approach allowed for the acquisition of curated DNA sequences from voucher specimens morphologically examined by expert taxonomists. This work highlights the advantages and current limitations of DNA-based identifications, the need for standardisation, as well as the importance of taxonomy-based curation of DNA databases for both biodiversity and biosecurity.
Entomological collections, metabarcoding, non-destructive, taxonomy
Ants are a group of insects of great importance for biosecurity. Invasive ants have the potential to cause substantial damage to ecosystems worldwide, especially in the Asia-Pacific Region (
During the past two decades, the DNA barcoding technique (
Ideally, an integrative taxonomic approach that includes a morphological component and multiple gene regions is considered a way to overcome the apparent shortcomings of COI barcode-based delimitation (e.g.,
On the other hand, high throughput sequencing (HTS) techniques – such as metabarcoding – are ideally placed to process large volumes of samples, to simultaneously provide identifications for both target and non-target taxa. For the past decade, most studies focusing on detection of invertebrates for biosecurity using metabarcoding have relied on short-read sequencing platforms such as Illumina MiSeq (
However, long-read instruments could improve taxonomic resolution (
Irrespective of the sequencing platform used, however, the taxonomical assignment of specimens using HTS metabarcoding is often hindered by the limited availability and quality of DNA reference sequences on public databases. Piper and colleagues (2019) highlighted a number of practical issues arising from this. In fact, sequences may not only be unavailable for a high number of taxa, but the sequences that are present on public databases may include errors and mistakes, such as barcode sequences being insufficiently annotated (
The aim of this work was to focus on Australian ants, as a biosecurity model which requires dealing not only with a large number of samples, but also with numerous native/endemic species that are often understudied or poorly represented in public sequence databases. Our focus was to develop a workflow capable of utilising and preserving specimens and DNA sequence data that is often filtered out from biosecurity procedures because it is not associated with the target exotic pests. Ultimately, this work had the aims of: i) developing an identification workflow combining high throughput molecular sequencing with morphological examinations and high-resolution photographs of voucher specimens; ii) testing the upscaling of sample volume using a short-read metabarcoding pipeline for the identification of bulk-trapped ants; iii) testing the upscaling of sample volume using a long-read in-field compatible metabarcoding pipeline for the identification of bulk trapped ants, targeting a longer, more informative genetic fragment.
A total of 180 bulk ant samples were collected at various Australian ports of entry in New South Wales and Western Australia, using baited traps or hand-collected from the ground. Insects were instantly killed with high grade ethanol (95–100%) before a preliminary morphological identification was performed by biosecurity diagnosticians at ports of entry, aiming to exclude the presence of exotic pests (Suppl. material
The DNA extraction protocol used a DNeasy Blood and Tissue kit (Qiagen, Germany) and is based upon the protocol outlined in
Polymerase Chain Reaction (PCR) amplification was conducted using primers modified by incorporating partial Illumina adapter sequences (in bold) into the primer pair fwhF2 (5’-ACACTCTTTCCCTACACGACGCTCTTCCGATCTGGDACWGGWTGAACWGTWTAYCCHCC-3’) and fwhR2n (5’-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGTRATWGCHCCDGCTARWACWGG-3’) (
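As an illustration of the primer design, the following sketch splits each fusion primer into its adapter and locus-specific parts and counts how many distinct oligonucleotides the IUPAC ambiguity codes imply. The split point is an assumption: the bold formatting that marked the adapter is not preserved in this text, so the code cuts after the `GCTCTTCCGATCT` motif shared by both primers. All function and variable names are illustrative.

```python
from math import prod

# Number of nucleotides each IUPAC code can stand for.
IUPAC_DEGENERACY = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3, "N": 4,
}

# Assumption: the adapter portion ends with this motif, shared by both primers.
ADAPTER_END = "GCTCTTCCGATCT"

# Fusion primer sequences as given in the text (5' to 3').
FUSION_PRIMERS = {
    "fwhF2": "ACACTCTTTCCCTACACGACGCTCTTCCGATCTGGDACWGGWTGAACWGTWTAYCCHCC",
    "fwhR2n": "GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTGTRATWGCHCCDGCTARWACWGG",
}


def split_fusion_primer(seq):
    """Return (adapter, locus_primer), cutting right after ADAPTER_END."""
    cut = seq.index(ADAPTER_END) + len(ADAPTER_END)
    return seq[:cut], seq[cut:]


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    return prod(IUPAC_DEGENERACY[base] for base in primer)


summary = {}
for name, seq in FUSION_PRIMERS.items():
    adapter, primer = split_fusion_primer(seq)
    summary[name] = (adapter, primer, degeneracy(primer))
    print(name, len(adapter), primer, degeneracy(primer))
```

Under this assumed split, each locus-specific primer is a pool of several hundred sequence variants, which is what allows a single primer pair to amplify a broad range of taxa.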
Three separate MiSeq libraries were prepared for sequencing. The first MiSeq metabarcoding library (Run1) was prepared using the first 90 samples (ID: 1–90) while a second library (Run2) was prepared using the remaining 90 samples (ID: 91–180). Finally, a third library (Run3) was prepared by combining the first two libraries, therefore including all 180 samples.
For each library, PCR amplicons were diluted 10 times and used as template for a second round of real time PCR (rtPCR) to attach the remainder of the Illumina sequencing adapters and unique dual indexes to each sample, using Phusion High-Fidelity DNA Polymerase (New England Biolabs, Massachusetts, USA). PCR conditions were an initial denaturation of 30 seconds at 98 °C followed by 7 cycles of denaturation at 98 °C for 10 seconds, annealing at 65 °C for 30 seconds and elongation at 72 °C for 30 seconds. Amplicons were purified and normalized using a SequalPrep Normalization Plate kit (Thermo Fisher Scientific, Massachusetts, USA). Each indexing rtPCR reaction (50 μL volume) contained 32.5 μL of ddH2O, 10 μL of 5 x Phusion HF Buffer, 1 μL of dNTP mix (10 mM), 1 μL of SYBR Green I Mix (Thermo Fisher Scientific, Massachusetts, USA) diluted 1/1000 in ddH2O, 0.5 μL Phusion DNA polymerase, 4 μL of sample-specific indexing primers (2.5 μM) and 1 μL of the diluted PCR product.
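The reaction composition above can be checked and scaled with a few lines of code. The sketch below confirms that the listed components add up to the stated 50 μL and scales the shared reagents to a plate of reactions. The 10% overage and the choice to treat indexing primers and template as per-well additions are assumptions for illustration, not details taken from the protocol.

```python
# Per-reaction volumes (uL) as listed in the text for the indexing rtPCR.
REACTION_UL = {
    "ddH2O": 32.5,
    "5x Phusion HF Buffer": 10.0,
    "dNTP mix (10 mM)": 1.0,
    "SYBR Green I (1/1000)": 1.0,
    "Phusion DNA polymerase": 0.5,
    "indexing primers (2.5 uM)": 4.0,
    "diluted PCR product": 1.0,
}

# Assumption: these sample-specific components are added per well,
# not through the shared master mix.
PER_WELL = {"indexing primers (2.5 uM)", "diluted PCR product"}


def master_mix(n_reactions, overage=0.10):
    """Scale shared components; `overage` is a hypothetical pipetting allowance."""
    factor = n_reactions * (1 + overage)
    return {
        name: round(volume * factor, 1)
        for name, volume in REACTION_UL.items()
        if name not in PER_WELL
    }


total_ul = sum(REACTION_UL.values())
mix_90 = master_mix(90)

print("Reaction volume (uL):", total_ul)
for name, volume in mix_90.items():
    print(f"{name}: {volume} uL")
```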
Library fragment size (amplicon + adapters + indexes) and absence of primer dimers were verified on an Agilent TapeStation (Agilent Technologies, California, USA) and all libraries were equimolarly pooled based on their concentrations as determined by Qubit dsDNA HS Fluorometric Quantification (ThermoFisher Scientific, Massachusetts, USA). As DNA concentrations in negative controls were too low to be measured, they were pooled at the same volume as the lowest-concentration mock community library. The final pooled libraries (Runs 1–3) were then diluted to 7 pM, spiked with 15–25% PhiX (due to the expected low diversity of the library), and sequenced on the Illumina MiSeq platform using the V3 reagent kit (2 × 250 bp reads) (Illumina, California, USA).
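Equimolar pooling from fluorometric concentrations can be sketched as follows. The conversion uses the standard approximation of 660 g/mol per base pair of double-stranded DNA. The library names, concentrations, fragment length and maximum pipetting volume are hypothetical values chosen for illustration and are not taken from this study.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (standard approximation)


def ng_per_ul_to_nM(conc_ng_ul, fragment_bp):
    """Convert a mass concentration (ng/uL) to molarity (nM)."""
    return conc_ng_ul * 1e6 / (AVG_BP_MASS * fragment_bp)


def equimolar_volumes(conc_ng_ul, fragment_bp, max_volume_ul=10.0):
    """Volume of each library so that all contribute the same number of fmol.

    The most dilute library is pooled at `max_volume_ul`; every other
    library is pooled at a proportionally smaller volume.
    """
    molar = {name: ng_per_ul_to_nM(c, fragment_bp) for name, c in conc_ng_ul.items()}
    target_fmol = min(molar.values()) * max_volume_ul  # 1 nM equals 1 fmol/uL
    return {name: target_fmol / nm for name, nm in molar.items()}


# Hypothetical Qubit readings (ng/uL) and fragment size (bp).
qubit = {"lib_A": 2.0, "lib_B": 4.0, "lib_C": 8.0}
volumes = equimolar_volumes(qubit, fragment_bp=400)

for name, vol in volumes.items():
    print(f"{name}: {vol:.2f} uL")
```

Libraries with no measurable concentration, such as negative controls, cannot be handled by this calculation, which is why a fixed volume is used for them.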
Raw sequence reads were demultiplexed using bcl2fastq v2.2.0 allowing for no mismatches to the expected index combinations. The remaining quality control steps were performed using the software R v4.0.2 (
The first 90 ant samples used in Run 1 were selected for this experiment (Suppl. material
The MinION library preparation was performed following the manufacturer’s instructions available in the Ligation Sequencing amplicons – Native Barcoding Kit 96 V14 protocol (available here: https://store.nanoporetech.com/native-barcoding-kit-96-v14.html). End-prep was performed using the NEBNext Ultra II End repair / dA-tailing Module (E7546, New England Biolabs, Massachusetts, USA), barcode ligation was performed using NEB Blunt/TA Ligase (M0367, New England Biolabs, Massachusetts, USA), and adapter ligation using the NEBNext Quick T4 DNA Ligase (E60567, New England Biolabs, Massachusetts, USA) and NEBNext Quick Ligation Reaction Buffer (B6058, New England Biolabs, Massachusetts, USA). The barcode kit used was the Native Barcoding Kit 96 V14 (SQK-NBD114.96, Oxford Nanopore Technologies, United Kingdom). The flow cell used was an R10.4.1 flow cell (FLO-MIN114, Oxford Nanopore Technologies, United Kingdom). The library was loaded on a MinION Mk1C, which ran for 72 hours. Sequence data (raw read signals) were basecalled using ONT’s Guppy basecaller (version 6.5.7, super high accuracy model), and were demultiplexed using Guppy based on the barcode sequences used by the library preparation kit (NBD114.96). The database used for taxonomical assignment of these reads was built from the COI sequences generated as outlined in the following paragraph.
Demultiplexed raw read data for the three Illumina MiSeq runs as well as for the Oxford Nanopore MinION run were uploaded to the NCBI SRA under the BioProject number PRJNA1161788 titled “Ants metabarcoding for biosecurity”.
The first 90 samples were morphologically examined to identify and count all individuals present in each sample. Samples varied from one single individual to 250 specimens, and from a single species to three. Morphological examination was conducted, and high-resolution images obtained using a Leica M205C stereo microscope with a Leica MC190 HD camera (Leica Camera, Germany). Montage images were processed through the LAS X Life Science Microscope Software platform (ver. 5.2.0.26130).
Reference photos for each species identified in this study have been uploaded to AntWeb with identification numbers ANTWEB1060404–ANTWEB1060449 (Table
Species-level identifications were invariably based upon workers, with accompanying alates rare, and based on a combination of approaches, including examination of reference specimens and literature, depending on the requirements for the group (Suppl. material
Ants preserved in the main Australian entomological collections were examined, either in person, or through the observation of high-resolution images provided by the institution. Abbreviations for collections and institutions from which material not publicly available on online platforms was examined are as follows:
MV Melbourne Museum, Melbourne, Australia.
NHMG Muséum d’histoire naturelle Genève, Genève, Switzerland.
Where morphological matches to types and descriptions were found, and no clear published evidence of an underlying species complex was found based on morphological differences, a species name was assigned. When a specimen could not be identified to species because of taxonomic uncertainty around the group, the ‘species group’ match is provided, along with a letter to designate separate morphospecies. The term ‘group’ is used here when there is evidence of known complexes under a single name, and the specimen exhibits some degree of difference from the type material. In cases where there were clear and consistent morphological differences between specimens that the literature designates as a single species, the valid species group name is accompanied by a morphospecies number (e.g., “sp. 1”, “sp. 2”, etc.). This was the case for undescribed morphological and molecular variation present in our specimens or convincingly demonstrated in the literature, suggesting a possible cryptic species, or simply an undescribed species. A representative specimen for each morphospecies was then databased and incorporated into the
DNA extractions from single ant specimens for Sanger sequencing were conducted on material that was previously non-destructively processed using the DNA extraction method outlined above. Single specimen DNA extraction was performed using either a partially destructive 5% Chelex protocol using a single ant leg (modified from
Non-destructive DNA extractions generated successful amplification for a total of 176 of the 180 samples (Suppl. material
The combined three MiSeq metabarcoding runs generated a total of 27,189,475 reads post quality control (Table
Number of total reads and per-sample reads obtained for each metabarcoding run. Individual MiSeq runs are reported in light blue, the combined MiSeq run is reported in dark blue and the MinION run is reported in green. Percentage differences in reads are calculated relative to the corresponding single MiSeq run.
Run ID | Single MiSeq run: total reads | Single MiSeq run: reads per sample | Combined MiSeq run: total reads | Combined MiSeq run: reads per sample | MinION: total reads | MinION: reads per sample |
---|---|---|---|---|---|---|
Run1 (Samples 1–90) | 8,052,623 | 89,473.58 | 4,030,514 (-49.95%) | 44,783.49 | 15,689,917 (+94.84%) | 174,332.41 |
Run2 (Samples 91–180) | 10,286,245 | 114,291.61 | 4,820,093 (-53.14%) | 53,556.59 | | |
Run3 (Samples 1–180) | | | 8,850,607 | 49,170.04 | | |
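The derived values in this table follow from simple arithmetic on the read totals. The sketch below recomputes the per-sample averages and the percentage differences relative to the single MiSeq runs, using the totals reported above. The dictionary keys and function names are illustrative, and the per-sample sample counts (90 or 180) are taken from the run descriptions.

```python
# Total reads and number of samples per run, as reported in the table.
READS = {
    "Run1_single": (8_052_623, 90),
    "Run2_single": (10_286_245, 90),
    "Run1_combined": (4_030_514, 90),
    "Run2_combined": (4_820_093, 90),
    "Run3_combined": (8_850_607, 180),
    "Run1_MinION": (15_689_917, 90),
}


def per_sample(total_reads, n_samples):
    """Average number of reads per sample."""
    return total_reads / n_samples


def pct_change(new, baseline):
    """Percentage difference of `new` relative to `baseline`."""
    return (new - baseline) / baseline * 100


averages = {run: per_sample(total, n) for run, (total, n) in READS.items()}

changes = {
    "Run1 combined vs single": pct_change(READS["Run1_combined"][0], READS["Run1_single"][0]),
    "Run2 combined vs single": pct_change(READS["Run2_combined"][0], READS["Run2_single"][0]),
    "Run1 MinION vs single": pct_change(READS["Run1_MinION"][0], READS["Run1_single"][0]),
}

for label, value in changes.items():
    print(f"{label}: {value:+.2f}%")
```

The combined-run totals for samples 1–90 and 91–180 add up to the Run3 total, which is a useful consistency check when reading the table.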
A total of 19,043,746 raw reads was obtained from the 72 h MinION run, across 88 samples (two samples failed to sequence, likely due to library preparation issues, Fig.
List of the 38 taxa identified using this integrative workflow. A total of 46 representative specimens, as well as duplicates, were deposited in the Victorian Agricultural Insect Collection (
Genus | Species | Specimen-Voucher | AntWeb ID | GenBank Acc. Number |
---|---|---|---|---|
Brachyponera | Brachyponera lutea (Mayr, 1862) | VAIC85547 | ANTWEB1060419 | PP600694 |
Camponotus | Camponotus chalceus Crawley, 1915 | VAIC85545 | ANTWEB1060417 | PP600695 |
Camponotus | Camponotus terebrans (Lowne, 1865) | VAIC85571 | ANTWEB1060443 | PP600696 |
Cardiocondyla | Cardiocondyla nuda group | VAIC85574 | ANTWEB1060446 | PP600697 |
Crematogaster | Crematogaster laeviceps Smith, F., 1858 | VAIC85575 | ANTWEB1060447 | PP600698 |
Dolichoderus | Dolichoderus ypsilon Forel, 1902 | VAIC85576 | ANTWEB1060448 | PP600699 |
Iridomyrmex | Iridomyrmex bicknelli group sp1 | VAIC85535 | ANTWEB1060407 | PP600700 |
Iridomyrmex | Iridomyrmex bicknelli group sp2 | VAIC85537 | ANTWEB1060409 | PP600701 |
Iridomyrmex | Iridomyrmex brunneus Forel, 1902 | VAIC85569 | ANTWEB1060441 | PP600702 |
Iridomyrmex | Iridomyrmex chasei group sp1 | VAIC85532 | ANTWEB1060404 | PP600703 |
Iridomyrmex | Iridomyrmex chasei group sp2 | VAIC85534 | ANTWEB1060406 | PP600704 |
Iridomyrmex | Iridomyrmex discors Forel, 1902 | VAIC85567 | ANTWEB1060439 | PP600705 |
Iridomyrmex | Iridomyrmex mjobergi Forel, 1915 | VAIC85570, VAIC85577 | ANTWEB1060442, ANTWEB1060449 | PP600706, PP600707 |
Iridomyrmex | Iridomyrmex purpureus (Smith, F., 1858) | VAIC85568 | ANTWEB1060440 | PP600708 |
Iridomyrmex | Iridomyrmex rufoniger (Lowne, 1865) | VAIC85556 | ANTWEB1060428 | PP600709 |
Iridomyrmex | Iridomyrmex suchieri Forel, 1907 | VAIC85533, VAIC85546, VAIC85554 | ANTWEB1060405, ANTWEB1060418, ANTWEB1060426 | PP600710, PP600711, PP600712 |
Melophorus | Melophorus marius group | VAIC85565 | ANTWEB1060437 | PP600713 |
Monomorium | Monomorium fieldi Forel, 1910 | VAIC85572 | ANTWEB1060444 | PP600714 |
Monomorium | Monomorium sordidum Forel, 1902 | VAIC85573 | ANTWEB1060445 | PP600715 |
Myrmecia | Myrmecia ludlowi Crawley, 1922 | VAIC85544 | ANTWEB1060416 | PP600716 |
Myrmecia | Myrmecia nigriceps Mayr, 1862 | VAIC85543 | ANTWEB1060415 | PP600717 |
Notoncus | Notoncus capitatus Forel, 1915 | VAIC85563 | ANTWEB1060435 | PP600718 |
Notoncus | Notoncus gilberti Forel, 1895 | VAIC85536 | ANTWEB1060408 | PP600719 |
Nylanderia | Nylanderia sp1 | VAIC85541 | ANTWEB1060413 | PP600720 |
Nylanderia | Nylanderia sp2 | VAIC85561 | ANTWEB1060433 | PP600721 |
Nylanderia | Nylanderia sp3 | VAIC85549 | ANTWEB1060421 | PP600722 |
Ochetellus | Ochetellus glaber (Mayr, 1862) | VAIC85553 | ANTWEB1060425 | PP600723 |
Paratrechina | Paratrechina longicornis (Latreille, 1802) | VAIC85552 | ANTWEB1060424 | PP600724 |
Pheidole | Pheidole bos group | VAIC85555, VAIC85562 | ANTWEB1060427, ANTWEB1060434 | PP600725, PP600726 |
Pheidole | Pheidole megacephala (Fabricius, 1793) | VAIC85558, VAIC85559 | ANTWEB1060430, ANTWEB1060431 | PP600727, PP600728 |
Pheidole | Pheidole sp1 | VAIC85557, VAIC85564 | ANTWEB1060429, ANTWEB1060436 | PP600729, PP600730 |
Pheidole | Pheidole vigilans (Smith, F., 1858) | VAIC85540 | ANTWEB1060412 | PP600731 |
Polyrhachis | Polyrhachis ammon (Fabricius, 1775) | VAIC85539 | ANTWEB1060411 | PP600732 |
Polyrhachis | Polyrhachis hookeri Lowne, 1865 | VAIC85542 | ANTWEB1060414 | PP600733 |
Rhytidoponera | Rhytidoponera metallica (Smith, F., 1858) | VAIC85548, VAIC85551 | ANTWEB1060420, ANTWEB1060423 | PP600734, PP600735 |
Rhytidoponera | Rhytidoponera victoriae group | VAIC85538, VAIC85560, VAIC85566 | ANTWEB1060410, ANTWEB1060432, ANTWEB1060438 | PP600736, PP600737, PP600738 |
Solenopsis | Solenopsis clarki Crawley, 1922 | -specimen destroyed- | N/A | N/A |
Tetramorium | Tetramorium caldarium (Roger, 1857) | VAIC85550 | ANTWEB1060422 | PP600739 |
Materials and references examined during the morphological identification process for each species. Material examined includes specimens examined from the Australian Museum (
ID | Material examined | Reference |
---|---|---|
Iridomyrmex chasei | Physical: | |
Rhytidoponera metallica | Physical: | |
Iridomyrmex suchieri | Physical: | |
Paratrechina longicornis | Physical: | |
Ochetellus glaber | Physical: | |
Tetramorium caldarium | Digital: CASENT0102333 (type), CASENT0003150 (type), CASENT0249091 (type), worker, Lau, Latei Tonga, Fiji. BPBM; CASENT0915081. | |
Iridomyrmex bicknelli | Physical: | |
Notoncus gilberti | Physical: MV: HYM 48141, HYM 48149. Digital: CASENT0909837 (Syntype), CASENT0909838 (type), FOCOL2219 (type). | |
Rhytidoponera victoriae group [R. ‘modesta’] | Physical: | |
Polyrhachis ammon | Physical: | |
Pheidole bos group | Physical: MV: T 11416, HYM 46135, HYM 46134, HYM 46113, HYM 46138. Digital: ANTWEB1008221 (minor, type), CASENT0908012 (minor, type), CASENT0901546 (major, type), CASENT0908011 (major, type), CASENT0908029 (major, type), CASENT0908034 (minor, type). | |
Iridomyrmex rufoniger | Physical: | |
Pheidole sp.1 | Physical: MV: HYM 46164, HYM 46163, HYM 46165. Digital: FOCOL1419 (major, type), CASENT0919760 (major, type), | |
Pheidole megacephala | Physical: | |
Nylanderia sp. 2 [braueri group] | Physical: | |
Pheidole vigilans | Physical: | |
Nylanderia [glabrior group] | Physical: | |
Polyrhachis hookeri | Physical: MV: HYM 47866. Digital: CASENT0910828 (type), CASENT0915615 (type). | |
Melophorus marius group | Digital: CASENT0280488, CASENT0280489 | |
Iridomyrmex discors | Physical: | |
Iridomyrmex purpureus | Physical: | |
Melophorus chauliodon | Digital: ANIC32-900189-2 (major, type), ANIC32-900189-1 (minor, type), ANIC32-900067-1 (minor, type), ANIC32-900067-2 (minor, type). | |
Iridomyrmex brunneus | Physical: MV: HYM 47657. Digital: ANIC32039031 (type), CASENT0907612 (type), CASENT0909533 (type). | |
Myrmecia nigriceps | Physical: MV: HYM 44465. Digital: CASENT0915833 (type). | |
Myrmecia ludlowi | Digital: CASENT0902802 (type). | |
Iridomyrmex mjobergi | Digital: CASENT0907614 (type), CASENT0909548 (type). | |
Lioponera clara (=Cerapachys princeps) | Digital: CASENT0902752 (type). | |
Brachyponera lutea | Digital: CASENT0902499 (type), CASENT0915668 (type). | |
Camponotus chalceus | Digital: CASENT0910366 (minor, type), CASENT0911753 (minor, type). | |
Camponotus terebrans | Physical: MV: HYM 49447. Digital: CASENT0911963 (type), CASENT0903534 (type), CASENT0911964 (type). | |
Monomorium fieldi | Digital: CASENT0902286 (type), CASENT0904586 (type), CASENT0908769 (type), CASENT0908773 (type), CASENT0913585 (type), CASENT0913861 (type). | |
Monomorium sordidum | Digital: CASENT0905835 (type), CASENT0908685 (type), CASENT0908686 (type). | |
Cardiocondyla nuda group | Physical: | |
Melophorus perthensis | Digital: ANTWEB1038575 (type), MCZ-ENT00303602-2 (major, type). | |
Crematogaster laeviceps | Physical: | |
Solenopsis clarki | Physical: MV: T 21909 (PARATYPE). Digital: CASENT0902365 (type), CASENT0902366 (type). | |
Dolichoderus ypsilon | Physical: MV: HYM 47453, HYM 47457. Digital: CASENT0909477 (type), ANIC32-015061 (type). | |
Nylanderia sp.3 | Physical: | |
Overall, the results obtained using MinION matched those obtained using MiSeq (Suppl. material
For most of the 176 samples that generated results, a single ant taxon was recorded. However, a total of 20 samples showed more than one species reported both by the border identification and by the MiSeq identification (Suppl. material
When comparing the MiSeq metabarcoding results with the preliminary morphological border identification, these matched or partially matched for the vast majority of samples (169 out of 212 combinations, 79.7%). We considered a match when both results showed the same species-level identification (N = 49, 23.1%; Suppl. material
On the other hand, the remaining 43 combinations (20.3%; Suppl. material
Across all 176 samples, comparison of the short fragments of the COI locus obtained here enabled identification of 64 different “operational taxonomic units” (OTUs) showing > 3% genetic variation. These OTUs, treated as putative morphospecies, informed the subsequent analyses.
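The idea of grouping sequences into OTUs at a 3% divergence threshold can be illustrated with a minimal greedy clustering sketch. This is not the method used in this study, which is not specified here. It assumes pre-aligned sequences of equal length, uses the uncorrected proportion of differing sites (p-distance), and compares each sequence only with the first member of each existing cluster. The toy sequences are artificial.

```python
def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(sequences, threshold=0.03):
    """Assign each sequence to the first OTU whose centroid is within `threshold`.

    The first sequence placed in an OTU acts as its centroid.
    """
    otus = []
    for seq in sequences:
        for otu in otus:
            if p_distance(seq, otu[0]) <= threshold:
                otu.append(seq)
                break
        else:
            otus.append([seq])
    return otus


def substitute(seq, positions):
    """Introduce a substitution at each listed position (toy mutation helper)."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    chars = list(seq)
    for i in positions:
        chars[i] = swap[chars[i]]
    return "".join(chars)


# Artificial 40-nt sequences: one and two substitutions from the reference.
reference = "ACGT" * 10
near = substitute(reference, [0])    # 1/40 = 2.5% divergence
far = substitute(reference, [0, 1])  # 2/40 = 5.0% divergence

otus = greedy_otus([reference, near, far])
print(len(otus), "OTUs")
```

In this toy case the 2.5% variant joins the reference OTU and the 5% variant founds a new one. Greedy clustering is order-dependent, so real pipelines usually sort sequences by abundance first.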
For each OTU identified using MiSeq metabarcoding (Suppl. material
Two individuals of Rhytidoponera metallica, photographed post non-destructive DNA extraction. Lateral habitus (A, B), dorsal habitus (C, D) and details of the head (E, F). Scale bars: 1 mm.
Examples of ant species photographed post non-destructive DNA extraction. Head of Myrmecia nigriceps (A), head of Iridomyrmex suchieri (B), thorax of Melophorus perthensis (C), head of Tetramorium caldarium (D), head of Rhytidoponera victoriae var modestum (E). Scale bars: 1 mm (A–C, E); 0.2 mm (D).
Tetramorium caldarium was incorrectly identified morphologically as another Tetramorium species by biosecurity diagnosticians and an Australian species in the Cardiocondyla nuda group was misidentified as an exotic species (C. minutior) by a reference barcode (Suppl. material
Pheidole sp. 1 was consistent with molecular barcodes and specimens misidentified as Pheidole proxima (see discussion below). Rhytidoponera victoriae group was identical to type specimens of Ectatomma metallicum modestum Emery, 1895, currently synonymised with Rhytidoponera victoriae (André, 1896). However, it was both molecularly and morphologically distinct from R. victoriae. The three Nylanderia species were native but could not be convincingly matched to a described species. The Australian Nylanderia have not been monographed and exhibit considerable morphological overlap between species, with two of the three species from this study falling within dubious species groups (‘braueri’ and ‘glabrior’ groups, the other an intercaste), but not close enough to type material to confirm any species-level identification (Table
The morphological identification and the generation of curated COI sequences identified by expert taxonomists enabled re-running of the metabarcoding analysis using the enhanced database. This allowed comparison of the taxonomic identification results obtained after the four main steps of this workflow: the preliminary border biosecurity ID, the metabarcoding ID generated using the publicly available database, the species-level morphological identification and the second round of metabarcoding analysis using the updated and curated database (Fig.
Increased species-level identification accuracy for the 90 samples processed using the workflow described in this work. The 90 samples were processed using MiSeq metabarcoding and a publicly available database. This highlighted a number of incorrect taxonomical assignments (in red). The morphological examination and resequencing of COI fragments from the voucher specimens preserved post non-destructive DNA extraction enabled provision of species-level identification for most of the samples (in green) and to use these curated sequences for the taxonomical assignment of the next metabarcoding analysis conducted using MinION. Parts of this figure were created with BioRender.com.
The preliminary border biosecurity identification, which was mainly aimed at determining presence/absence of exotic species, reported 41 correct species-level (Fig.
To confirm whether this enhanced result was due to the improved taxonomic resolution of the curated database, the curated sequences generated were compared to the ASVs generated using MiSeq metabarcoding and found to match. This suggested that both MiSeq and MinION could sequence the same DNA and that the improved results are due to the curated database and not to the sequencing platform.
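One simple way to check whether short ASVs agree with longer curated reference sequences is to test for exact containment in either orientation. The sketch below illustrates that idea only. The study does not state how its comparison was performed, exact matching is a deliberate simplification, and the sequences and voucher names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def matching_references(asv, references):
    """Names of references that contain the ASV in either orientation."""
    rc = reverse_complement(asv)
    return [name for name, ref in references.items() if asv in ref or rc in ref]


# Hypothetical curated references and ASVs (toy sequences).
curated = {
    "voucher_1": "TTGACCGGATTACAGGCATTAGC",
    "voucher_2": "TTGACCGGTTTACAGGCTTTAGC",
}
asv_forward = "CCGGATTACAGG"
asv_reverse = reverse_complement("CCGGTTTACAGG")

print(matching_references(asv_forward, curated))
print(matching_references(asv_reverse, curated))
```

A tolerant comparison based on alignment identity would be needed in practice to allow for sequencing errors or intraspecific variation.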
This work focused on Australian ants collected during biosecurity surveillance activities, highlighting the importance of these samples for biodiversity studies.
Australian ants represent a great model system, involving large numbers of bulk samples as well as numerous native/endemic species that are understudied and, therefore, underrepresented in public sequence databases. These are very well-known issues linked with ant biodiversity, not only in Australia but worldwide (e.g.,
Following the experiments conducted in our study, we propose an optimised workflow (Fig.
Proposed workflow for the high throughput processing of ants from biosecurity samples. Ant samples collected for biosecurity surveillance (A) are processed using a non-destructive DNA extraction (B) and sequenced using Illumina or Nanopore platforms (C), the generated ASVs/OTUs (D) are verified by examining the voucher specimens in order to identify morphospecies (E). Each morphospecies should be databased, deposited in a collection, and accompanied by high-resolution photographs (F), and a DNA sequence should be generated from each specimen (G) and uploaded into a public database in order to constantly improve the publicly available reference sequence data (Green arrow). This figure was created with BioRender.com.
To allow high-throughput processing of samples collected by biosecurity surveillance programs, we tested a non-destructive HTS approach comparing two different platforms to process large volumes of samples and determine the presence of “morphospecies” of interest. This step streamlines the first screening step allowing processing of hundreds of samples and immediate identification of priority pests, which are normally well-represented with DNA sequences available in public datasets. This fulfils the primary task of biosecurity activities, while also providing ASVs/OTUs for all other species present in the sample, including native and endemic species of ecological/taxonomical interest.
Ants collected during biosecurity surveillance activities using baited traps (Fig.
Following sequencing, the generated ASVs or OTUs should be matched with publicly available data (Fig.
Non-target specimens could be of interest for biodiversity studies in case they provide novel genetic records for poorly characterised or undescribed species. For these reasons, these specimens should be databased and imaged to retain morphological voucher specimens (Fig.
This would generate curated DNA sequences linked to a physical voucher specimen, improving publicly available database records and enabling correction/updating of records by allowing re-examination of a voucher specimen.
Ultimately, an additional advantage of this process would be its constant improvement in accuracy and turnaround times. Indeed, the more samples that are processed following this workflow, the more native species will be characterised and added to the public database, and the more precise the first taxonomic assignment of ASVs/OTUs will be. In turn, this will require fewer and fewer morphological examinations, as more and more taxa will have a voucher specimen linked to a curated DNA reference.
Integrating morphological examination, aided by high-resolution photographs, together with DNA sequences is a well-established procedure within integrative taxonomy (e.g.,
Simultaneously curating a DNA database for the Australian ants also improves future biosecurity applications in case closely related exotic species find their way into the country. In the current study three previously introduced species with a limited established range in Australia were detected: Paratrechina longicornis, Pheidole megacephala and Tetramorium caldarium (
Misidentifications both through molecular and preliminary morphological identifications for groups of importance to biosecurity (e.g., Tetramorium and Cardiocondyla) and common native species (Rhytidoponera victoriae group, Myrmecia, and mid-sized Iridomyrmex) confirm that validating suspect specimen identifications with experts remains vital, as does the introduction of a standardised approach to generating barcode data and retaining specimens for future identifications.
Further, the taxonomic uncertainty present in several groups in this study, including genera like Nylanderia that are important to distinguish from related tramp species – organisms that have been spread globally by human activities – demonstrates the importance of preserving specimens for research and biosecurity studies for future validation when taxonomic work is eventually completed. Myrmecia nigriceps belongs to the taxonomically problematic M. gulosa group, with at least two molecular species existing under M. nigriceps, as recently demonstrated by
Pheidole sp. 1 was consistent with molecular barcodes and specimens identified as Pheidole proxima Mayr, 1876, a native Australian species thought to be present in New Zealand (
Ectatomma metallicum modestum Emery, 1895 was determined to be a synonym of Rhytidoponera victoriae (André, 1896) but was later confirmed through an unpublished thesis (
It is clear that numerous ant groups likely contain several species under a single valid name. However, we have taken the approach to use current valid names as accepted by the taxonomic literature, whenever a species matched a description and there was a lack of taxonomic work to convincingly demonstrate that previous designations have failed to accommodate the variation present in the specimen. This may not necessarily ally with all previous species delimitation practices but seemed the most consistent way of standardising an approach to suit both practical biosecurity and biodiversity surveillance.
Whilst a relatively large degree of COI and morphological variation was observed for several groups, this is at most an indication of potentially promising future research directions, with further integrative taxonomic work, including an examination of the wealth of material synonymised under current valid names, required to determine whether variability extends beyond what would be expected for a species.
Both short-read and long-read sequencing can be extremely valuable techniques, and platform preference generally depends on the focus of the study. Longer reads are generally associated with better genetic resolution, which is especially useful for cryptic species and species complexes that cannot be differentiated using shorter reads (
The second aim of this work was to upscale the sample volume when using a MiSeq platform. Here we achieved processing of greater numbers of samples as a batch on the same run compared to similar insect bulk samples previously processed on a MiSeq flow cell (N = 47;
As one would expect based on the compositional nature of HTS metabarcoding (
We demonstrated that upscaling the sample volume from 90 to 180 samples did not impact the taxonomic identification of the ant species analysed in this work. Ultimately, increasing the number of samples that can be processed on a single MiSeq run not only lowers the price per sample, but also enables a faster turnaround time between sample collection and data generation.
The third aim of this work was to analyse a similar volume of samples using Oxford Nanopore’s MinION. In order to do this, we used the largest barcode kit commercially available, with the same DNA extracts used in a single MiSeq run. The two main advantages of using Oxford Nanopore technology to move from short-read to long-read sequencing are the increased genetic information that can be recovered by obtaining a longer DNA sequence, and the fact that the MinION platform is portable, in-field compatible, and does not require many thousands of dollars of investment, enabling its use in relatively simple facilities outside larger laboratories. When using long-read metabarcoding, the analysis conducted here showed that the 96-barcode kit provided by Oxford Nanopore is a valid tool to process at least 90 samples, and that it generated almost double the number of reads obtained when processing the same number of samples on the MiSeq. This suggests that the MinION platform may actually be capable of processing twice as many samples (N = 180), although this could not be tested due to the lack of any commercially available kit providing more than 96 barcodes. Furthermore, the use of longer reads may prove especially useful in instances where low-diversity insect groups are observed, since a shorter barcode may not be able to separate closely related species, while a longer gene sequence might provide sufficient information to detect the genetic diversity present. However, this was not the case for this work, where the shorter MiSeq barcode could successfully separate all species present in the bulk samples, as confirmed by the subsequent morphological examination.
One of the limitations of using the MinION platform has been the higher error rate compared to the Illumina platforms. Here, more than one species was recorded for every sample processed using MinION, even though many samples actually consisted of a single species. However, the number of reads ascribed to each species was used as a clear marker to determine which record was a true positive and which record may be the result of barcode index switching errors. For example, the species with the highest number of reads in a sample averaged ~170,000 reads, while the species with the second highest number of reads averaged ~4,855 reads. Relative abundance thresholds have been commonly adopted to mitigate the risk of false positive results due to low levels of contamination and/or index switching (
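The read-count reasoning above can be illustrated with a minimal sketch of a relative abundance filter. The 5% cut-off, the function name and the species labels are assumptions chosen for illustration only; they are not the threshold or the pipeline used in this study. The read counts are the approximate averages reported above.

```python
from typing import Dict


def filter_by_relative_abundance(
    read_counts: Dict[str, int], min_fraction: float = 0.05
) -> Dict[str, int]:
    """Keep only taxa whose share of the sample's reads reaches min_fraction.

    min_fraction = 0.05 is a hypothetical value, not the study's threshold.
    """
    total = sum(read_counts.values())
    if total == 0:
        return {}
    return {
        taxon: count
        for taxon, count in read_counts.items()
        if count / total >= min_fraction
    }


# Placeholder taxa with the approximate average read counts quoted in the text.
sample = {"Species A": 170_000, "Species B": 4_855}
kept = filter_by_relative_abundance(sample)
print(kept)
```

With these averages the second species accounts for under 3% of the sample's reads, so any cut-off between roughly 3% and 97% would separate the two records in this simplified case.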
In general, the results obtained here are extremely promising. MinION-based long-read metabarcoding not only generated longer reads (~650 bp) but also almost doubled the number of reads compared to a single run on the MiSeq. These results suggest that ant identification for biosecurity could be safely conducted using MinION platforms, and that future investment should be considered for the development and application of dual unique indexes allowing more samples to be processed on the same run. For example, doubling the number of samples (N = 180) would generate an average of approximately 87,000 reads per sample, very close to the average obtained using MiSeq, while increasing the number of samples by 50% (N = 135) would still generate approximately 115,000 reads per sample, an increase of 27% compared to the number of reads obtained with MiSeq.
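The per-sample projections in this paragraph follow from dividing a fixed run output by the number of multiplexed samples. The sketch below reproduces that arithmetic. The total read count is not stated in the text and is inferred here, as an assumption, from the quoted figure of approximately 87,000 reads per sample at N = 180. It also assumes that reads are shared evenly among samples, which is a simplification.

```python
def reads_per_sample(total_reads: int, n_samples: int) -> float:
    """Average reads per sample if a run's output is shared evenly."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return total_reads / n_samples


# Assumption: run total back-calculated from ~87,000 reads/sample at N = 180.
TOTAL_READS = 87_000 * 180

for n in (90, 135, 180):
    print(n, round(reads_per_sample(TOTAL_READS, n)))
```

Under these assumptions the projection for N = 135 is about 116,000 reads per sample, in line with the approximate figure of 115,000 given above.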
The importance of comparability and standardization in high throughput sequencing workflows has been emphasized as critical for advancing biodiversity and biosecurity studies, a topic extensively addressed in recent literature (e.g.,
For instance, consensus regarding the selection of target genetic regions for amplification, as advocated by
The nature of the sample (e.g., number of individuals, collection method, preservative used) profoundly influences subsequent analytical outcomes. For instance, whereas previous studies have encountered challenges such as primer specificity issues or incomplete species records (
The morphological identification step of this workflow is a critical component that requires standardized procedures. Morphological identification should not be subjective but grounded in thorough examination of the scientific literature and of voucher specimens housed in reference collections. Like other scientific activities, morphological identification must be replicable, with clear protocols detailing the steps leading to taxonomic assignments. Here, we have provided detailed methodologies for each morphological identification, including voucher specimen identification numbers and comprehensive literature citations outlining diagnostic criteria (Table
Finally, the curation of DNA reference databases has been extensively debated in terms of standardization requirements, and warrants a more in-depth discussion here. DNA based methods aiming to provide a species-level taxonomical identification for insect groups rely entirely on reference databases and repositories linking genetic sequences to taxonomic names (
For diverse and largely endemic biota, such as Australian ants, maintaining a well-curated reference database assumes heightened significance. Australian ants are characterized by substantial endemism, with many taxa exhibiting unique genetic profiles largely confined to the country, and some of the results obtained here could be ascribed to the issues mentioned above. For instance, some samples matched (at 100% identity) reference sequences that carried conflicting taxonomic identifications, as in the case of Iridomyrmex anceps and I. suchieri. This has been highlighted as “taxonomic mislabelling” by Keck and colleagues (2023), as identical sequences are erroneously labelled with different taxonomic names. Other sequences had no closely related record available in the database and could only be identified to family level, as “Formicidae sp. 1”. This is an example of “missing taxa” (
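The mislabelling case described above, in which identical sequences carry different names, can be screened for with a simple grouping step. The following is a minimal sketch, not part of the workflow used in this study: the function name is hypothetical and the short sequences are made-up placeholders, not real COI barcodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def find_conflicting_labels(
    records: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Return sequences that appear under more than one taxonomic name.

    Each record is a (taxon_name, sequence) pair; comparison is
    case-insensitive and requires exact (100%) sequence identity.
    """
    names_by_sequence: Dict[str, Set[str]] = defaultdict(set)
    for taxon_name, sequence in records:
        names_by_sequence[sequence.upper()].add(taxon_name)
    return {
        sequence: names
        for sequence, names in names_by_sequence.items()
        if len(names) > 1
    }


# Placeholder sequences for illustration only.
records = [
    ("Iridomyrmex anceps", "ACGTACGTAA"),
    ("Iridomyrmex suchieri", "acgtacgtaa"),
    ("Pheidole proxima", "TTGACCGTAA"),
]
conflicts = find_conflicting_labels(records)
print(conflicts)
```

A screen of this kind only flags candidate conflicts. Deciding which name, if any, is correct still depends on morphological examination of the voucher specimens.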
A total of 46 new COI DNA barcode reference sequences were generated from previously uncharacterised/unidentified specimens. These sequences provide the first genetic information for a number of ant species for which molecular-based identification has so far been challenged by their absence on public databases. As a result, physical voucher specimens linked to DNA sequences are now preserved in entomological collections, enabling future morphological re-examination of the vouchers as well as future assessments and comparisons with novel samples.
Ultimately, we think that recent developments surrounding the accessibility and increased use of metabarcoding and metagenomic techniques, as well as their reliance on public reference databases, require a shift in our way of approaching the generation of DNA sequences, from “quantity” to “quality”. With the costs of sequencing becoming more and more affordable, and the use of novel, high-throughput techniques (e.g., mega-barcoding;
The workflow proposed in this study endeavours to link each genetic record with physical voucher specimens and high-resolution images, ensuring the veracity of taxonomic identifications derived from genetic material. While databases like BOLD (
We thank Tegan Honing (DAFF) and Brendan Rodoni (Agriculture Victoria) for their assistance with managing this project. We thank the DAFF National Border Surveillance teams and management who provided the samples for the study. We thank the collections that granted access to specimens through loans, visits, or photographs of material to JTB: Australian National Insect Collection (Bonnie Koopmans, Jon Lewis), Australian Museum (Derek Smith), Museum Victoria (Simon Hinkley, Claire Keely), Western Australian Museum (Nikolai J. Tatarnic, Brian Heterick), MNHW (Manuela Vizek, Dominique Zimmermann), MNHG (Bernard Landry, Christina Lehmann Graber). Thanks also to Bonnie Blaimer (Museum für Naturkunde Berlin), Crystal A. Maier (Museum of Comparative Zoology, Harvard University), Leilani Walker (Auckland Museum), and Disna Gunawardana for insight into Australian species of interest in their collections, as well as Brian Heterick for taking the time to provide a wealth of advice beyond collection access. Many thanks to Daniel Kurek for providing specimens he collected and to Duncan Jaroslow for taking some of the photos used for this project.
The authors have declared that no competing interests exist.
No ethical statement was reported.
This project was supported through funding from the Australian Department of Agriculture, Fisheries and Forestry through the Biosecurity Innovation Program, Novel Triage Tools project (C06474 - C10632).
Conceptualization: MJB, KSS, FM. Data curation: FM, JB. Formal analysis: LR, JB, FM, TL, RLS. Funding acquisition: FM. Investigation: JB, FM. Methodology: LR, FM, RLS, JB, KSS, TL. Project administration: FM. Supervision: MJB, KSS. Visualization: FM. Writing - original draft: FM, MJB, JB. Writing - review and editing: JB, RLS, LR, KSS, FM, TL, MJB.
Francesco Martoni https://orcid.org/0000-0001-8064-4460
James Buxton https://orcid.org/0000-0002-9309-6181
Kathryn S. Sparks https://orcid.org/0009-0003-4094-1139
Tongda Li https://orcid.org/0000-0001-9430-2779
Reannon L. Smith https://orcid.org/0000-0002-3794-5900
Lea Rako https://orcid.org/0000-0002-3528-8978
Mark J. Blacket https://orcid.org/0000-0001-7864-5712
All of the data that support the findings of this study are available in the main text or Supplementary Information. COI sequences are available on GenBank with accession numbers PP600694–PP600739. Reference photos are available on AntWeb with identification numbers ANTWEB1060404–ANTWEB1060449. Metabarcoding raw read data are available on NCBI SRA under the BioProject number PRJNA1161788, titled “Ants metabarcoding for biosecurity”.
The 180 samples processed for this study, including their metadata
Data type: docx
Explanation note: Morphological ID and MinION ID. For each sample, the morphological identification provided by border diagnosticians (Border ID) is reported together with the taxonomic identification matched after the MiSeq “Run1 and Run 2” for 90 samples (MiSeq 90 samples) and the MiSeq run for 180 samples (MiSeq 180 samples). Morphological identification provided by expert taxonomists (Morphological ID) was performed only on the first 90 samples and on any additional sample containing a unique ASV reported by the MiSeq metabarcoding. Morphological identification contributed to generate the curated COI sequences that were then used for the MinION experiment. MinION analysis was performed only on the first 90 samples.