Research Article
Corresponding author: Sofia Duarte (sduarte@bio.uminho.pt)
Academic editor: Baruch Rinkevich
© 2020 Sofia Duarte, Pedro E. Vieira, Filipe O. Costa.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Duarte S, Vieira PE, Costa FO (2020) Assessment of species gaps in DNA barcode libraries of non-indigenous species (NIS) occurring in European coastal regions. Metabarcoding and Metagenomics 4: e55162. https://doi.org/10.3897/mbmg.4.55162
DNA metabarcoding has the capacity to bolster current biodiversity assessment techniques, including the early detection and monitoring of non-indigenous species (NIS). However, the success of this approach depends greatly on the availability, taxonomic coverage and reliability of reference sequences in genetic databases, whose deficiencies can compromise species identifications at the taxonomic assignment step. In this study we assessed gaps in the availability of DNA sequence data from four barcode markers (COI, 18S, rbcL and matK) for NIS occurring in European marine and coastal environments. NIS checklists were based on the EASIN and AquaNIS databases. Coverage was highest for COI in Animalia and rbcL in Plantae (up to 63% for both) and for 18S in Chromista (up to 51%), and increased greatly when only high impact species were considered (to between 82 and 89%). The results show that different markers are unevenly represented in genetic databases, implying that the parallel use of more than one marker can be complementary and may greatly increase NIS identification rates through DNA-based tools. Furthermore, based on the COI marker, approximately 30% of the species had maximum intra-specific distances higher than 3%, suggesting that many NIS may have undescribed or cryptic diversity. Although filling the gaps in reference libraries is essential to make the most of DNA-based tools, careful compilation, verification and annotation of the available sequences is fundamental to assembling large, curated and reliable reference libraries that support rigorous species identifications.
BOLD, DNA barcode markers, gap-analysis, GenBank, marine and coastal ecosystems, non-indigenous species
Marine and coastal habitats are among the most important, but also the most threatened ecosystems in the world, providing invaluable services, such as provisioning, supporting, regulation and cultural/aesthetic for human well-being (
While morphology-based identification of taxa has largely ensured the ascertainment of the current status of NIS occurring in coastal environments in Europe (
DNA-based methods, such as DNA barcoding (
Efficient and accurate species identifications through DNA barcoding or DNA metabarcoding are dependent on reliable reference sequences libraries of known taxa (
In the current study, the gaps, for the most commonly used barcode markers in DNA-based studies for Animalia (COI and 18S), Chromista (COI, 18S and rbcL) and Plantae (COI, rbcL and matK), were analysed in the genetic databases GenBank and the Barcode of Life Data System (BOLD), with a focus on NIS occurring in European coastal regions by using updated lists retrieved from the European Alien Species Information Network (EASIN) (
The lists of non-indigenous species (NIS) occurring in European marine coastal regions were compiled from the two most important databases that hold information on NIS occurring in Europe, both accessed on 23 October 2019: the European Alien Species Information Network (EASIN) (https://easin.jrc.ec.europa.eu/easin) (
The taxonomic classification and names of the NIS compiled in the lists were validated against the World Register of Marine Species (WoRMS) database (www.marinespecies.org) and AlgaeBase (https://www.algaebase.org/). Both databases adopt Cavalier-Smith’s taxonomic classification system (
For each species in the lists, and within each taxonomic group (i.e. Animalia, Chromista and Plantae), the number of sequences available in the Barcode of Life Data System (BOLD) (www.barcodinglife.org) (
The number of Barcode Index Numbers (BINs; proxy of Molecular operational taxonomic units – MOTUs) (
After removal of the records with taxonomic ranks higher than species level and replicated records, the final AquaNIS list had 1,120 species and the final EASIN checklist had 1,554 species (Fig.
Taxonomic classification. Taxonomic distribution of the species from the AquaNIS and EASIN lists. Numbers on the right of each bar represent the total number of species per phylum. For the EASIN list the species were separated into high and low/unknown impact.
Within Animalia, the best represented phyla were Arthropoda (19 and 18%), Chordata (12 and 14%) and Mollusca (11 and 19%) (data in parentheses correspond to % in the AquaNIS and EASIN lists, respectively). Within Chromista, the best represented phyla were Ochrophyta (8%) and Myzozoa (7%) in the AquaNIS list, and Ochrophyta (5%) and Foraminifera (4%) in the EASIN list. Within Plantae, the best represented phylum in both lists was Rhodophyta (13 and 7% in AquaNIS and EASIN, respectively), while the other phyla accounted for less than 5% of the total species (Fig.
For all analysed taxonomic groups (Animalia, Chromista and Plantae), a higher number of records was found on GenBank than on Public BOLD (Table
Overall barcode coverage. Overall barcode coverage for selected markers and % of singletons (i.e. species with only one representative sequence) on GenBank and Public BOLD for the major taxonomic NIS groups of the AquaNIS and EASIN lists.
| Taxonomic group | Database | No. of species | Marker | No. of records (GenBank) | No. of records (Public BOLD) | No. of barcoded species, GenBank + Public BOLD (% barcode coverage) | Singletons (%) |
|---|---|---|---|---|---|---|---|
| Animalia | AquaNIS | 739 | COI | 25,242 | 21,013 | 465 (62.9) | 9.0 |
| Animalia | AquaNIS | 739 | 18S | 1,821 | 7 | 331 (44.8) | 37.8 |
| Animalia | AquaNIS | 739 | COI or 18S | | | 500 (67.7) | |
| Animalia | AquaNIS | 739 | COI+18S | | | 296 (40.1) | |
| Animalia | EASIN | 1,180 | COI | 23,889 | 19,154 | 604 (51.2) | 11.9 |
| Animalia | EASIN | 1,180 | 18S | 1,750 | 6 | 352 (29.8) | 40.3 |
| Animalia | EASIN | 1,180 | COI or 18S | | | 650 (55.1) | |
| Animalia | EASIN | 1,180 | COI+18S | | | 306 (25.9) | |
| Chromista | AquaNIS | 186 | COI | 833 | 431 | 60 (32.3) | 18.3 |
| Chromista | AquaNIS | 186 | 18S | 1,190 | 0 | 95 (51.1) | 18.9 |
| Chromista | AquaNIS | 186 | rbcL | 623 | 224 | 56 (30.1) | 28.6 |
| Chromista | AquaNIS | 186 | COI or 18S or rbcL | | | 108 (58.1) | |
| Chromista | AquaNIS | 186 | COI+18S+rbcL | | | 30 (16.1) | |
| Chromista | EASIN | 224 | COI | 801 | 308 | 51 (22.8) | 17.6 |
| Chromista | EASIN | 224 | 18S | 1,123 | 0 | 79 (35.3) | 19.0 |
| Chromista | EASIN | 224 | rbcL | 549 | 209 | 53 (23.7) | 24.5 |
| Chromista | EASIN | 224 | COI or 18S or rbcL | | | 113 (50.4) | |
| Chromista | EASIN | 224 | COI+18S+rbcL | | | 18 (8.0) | |
| Plantae | AquaNIS | 195 | COI | 1,002 | 494 | 75 (38.5) | 18.7 |
| Plantae | AquaNIS | 195 | rbcL | 1,358 | 718 | 121 (62.1) | 21.5 |
| Plantae | AquaNIS | 195 | matK | 67 | 17 | 13 (6.7) | 23.1 |
| Plantae | AquaNIS | 195 | COI or rbcL or matK | | | 125 (64.1) | |
| Plantae | AquaNIS | 195 | COI+rbcL+matK | | | 3 (1.5) | |
| Plantae | EASIN | 150 | COI | 802 | 394 | 55 (36.7) | 12.7 |
| Plantae | EASIN | 150 | rbcL | 1,216 | 653 | 94 (62.7) | 16.0 |
| Plantae | EASIN | 150 | matK | 30 | 20 | 5 (3.3) | 0 |
| Plantae | EASIN | 150 | COI or rbcL or matK | | | 94 (62.7) | |
| Plantae | EASIN | 150 | COI+rbcL+matK | | | 0 | |
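The coverage percentages and the two-marker combinations in the table above can be reproduced with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that "COI or 18S" is the number of species with at least one of the two markers and "COI+18S" the number with both, so that the two-set inclusion-exclusion rule applies. Under that assumption the Animalia rows are internally consistent.

```python
def coverage(barcoded: int, total: int) -> float:
    """Percentage of listed species with at least one sequence (one decimal)."""
    return round(100 * barcoded / total, 1)


def either_marker(n_a: int, n_b: int, n_both: int) -> int:
    """Species covered by marker A or marker B (two-set inclusion-exclusion)."""
    return n_a + n_b - n_both


# Animalia, AquaNIS list: values copied from the table above.
total_species = 739
coi, s18, both = 465, 331, 296

either = either_marker(coi, s18, both)
print(either, coverage(either, total_species))  # species with COI or 18S
print(coverage(coi, total_species))             # COI-only coverage
```

The same check on the EASIN Animalia rows (604, 352 and 306 species out of 1,180) gives 650 species, matching the tabulated "COI or 18S" value.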
For Animalia, in both lists, the phyla with the highest number of total records, taking into account all searched markers in both genetic databases, were Arthropoda (10,863 and 10,148), Chordata (12,478 and 11,808) and Mollusca (7,146 and 6,045, for the AquaNIS and EASIN lists, respectively) (Suppl. material
Gap-analysis. Barcode coverage (%) of each searched marker in Public BOLD and GenBank for AquaNIS (left panel) and EASIN (right panel) lists, for each taxonomic group (phyla) within Animalia (A, B), Chromista (C, D) and Plantae (E, F).
For Chromista, in both lists, Ochrophyta was the phylum with the highest number of total records, taking into account all searched markers in both genetic databases (2,188 and 1,983, for the AquaNIS and EASIN lists, respectively) (Suppl. material
For Plantae, in both lists, Rhodophyta was the phylum with the highest number of total records in both genetic databases, taking into account all markers (2,362 and 1,931, for the AquaNIS and EASIN lists, respectively) (Suppl. material
Considering only the high impact species from the EASIN list, the gap was much smaller for all analysed groups and barcode markers than for the full lists (Fig.
Overall barcode coverage for high impact species. Overall barcode coverage for selected markers and % of singletons (i.e. species with only one representative sequence) on GenBank and Public BOLD for high impact species (EASIN).
| Taxonomic group | No. of species | Marker | No. of records (GenBank) | No. of records (Public BOLD) | No. of barcoded species, GenBank + Public BOLD (% barcode coverage) | Singletons (%) |
|---|---|---|---|---|---|---|
| Animalia | 118 | COI | 9,968 | 8,033 | 105 (89.0) | 3.8 |
| Animalia | 118 | 18S | 648 | 3 | 77 (65.2) | 22.1 |
| Animalia | 118 | COI or 18S | | | 110 (93.2) | |
| Animalia | 118 | COI+18S | | | 72 (61.0) | |
| Chromista | 17 | COI | 75 | 35 | 10 (58.8) | 10.0 |
| Chromista | 17 | 18S | 198 | 0 | 14 (82.3) | 7.1 |
| Chromista | 17 | rbcL | 62 | 25 | 6 (35.3) | 16.7 |
| Chromista | 17 | COI or 18S or rbcL | | | 14 (82.3) | |
| Chromista | 17 | COI+18S+rbcL | | | 5 (29.4) | |
| Plantae | 13 | COI | 84 | 30 | 7 (53.8) | 14.3 |
| Plantae | 13 | rbcL | 155 | 94 | 11 (84.6) | 18.2 |
| Plantae | 13 | matK | 1 | 5 | 1 (7.7) | 0.0 |
| Plantae | 13 | COI or rbcL or matK | | | 11 (84.6) | |
| Plantae | 13 | COI+rbcL+matK | | | 0 | |
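The singleton percentages in the table above are consistent with being expressed relative to the number of barcoded species rather than to the full list (for example, 1 of 10 barcoded Chromista species for COI gives 10.0%, and 1 of 6 for rbcL gives 16.7%). This reading is inferred from the tabulated values, not stated explicitly here. A minimal sketch under that assumption, with a hypothetical function name and invented record counts, is:

```python
def singleton_percentage(records_per_species: dict[str, int]) -> float:
    """Share (%) of barcoded species represented by exactly one sequence.

    Species with zero records are treated as not barcoded and are excluded
    from the denominator (an assumption inferred from the table).
    """
    barcoded = [n for n in records_per_species.values() if n > 0]
    if not barcoded:
        return 0.0
    singletons = sum(1 for n in barcoded if n == 1)
    return round(100 * singletons / len(barcoded), 1)


# Hypothetical record counts for one marker (placeholder species names).
example_counts = {
    "species_a": 12,
    "species_b": 1,   # singleton
    "species_c": 0,   # no sequence, so not counted as barcoded
    "species_d": 4,
    "species_e": 7,
    "species_f": 2,
    "species_g": 30,
}
print(singleton_percentage(example_counts))  # 1 singleton among 6 barcoded species
```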
Gap-analysis for high impact species. Barcode coverage (%) of each searched marker in Public BOLD and GenBank for high impact species of the EASIN list for each taxonomic group (phyla) within Animalia (A), Chromista (B) and Plantae (C).
For Animalia, the highest number of total records, considering all searched markers in both genetic databases, was found for Arthropoda, Mollusca and Chordata (2,797 to 4,595) (Suppl. material
Most of the remaining Animalia phyla containing high impact species still had a barcode coverage higher than 50%. The exceptions were Platyhelminthes for COI (25%), which was nevertheless well represented by 18S sequences (75%), and Bryozoa and Chordata for 18S (33 and 45%, respectively) (Fig.
Based on the COI marker, a total of 1,649 Barcode Index Numbers (BINs) were found for the two lists: 1,541 for Animalia, 48 for Chromista and 60 for Plantae (Fig.
Barcode Index Numbers. Number of barcoded species and number of BINs, based on the COI marker, for each taxonomic group (A) and number of species with 1 to ≥5 BINs for the total number of barcoded species found in both lists (B). On (B) the numbers above bars indicate the number of species.
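The tally summarised in panel (B), species grouped by how many BINs they hold, can be sketched as follows. The function names, species labels and BIN identifiers are all hypothetical placeholders, and the sketch assumes that each species has already been mapped to the BINs of its COI records.

```python
from collections import Counter


def bin_count_distribution(bins_by_species: dict[str, list[str]]) -> dict[str, int]:
    """Count species by number of distinct BINs: '1', '2', '3', '4' or '>=5'."""
    distribution = Counter()
    for bins in bins_by_species.values():
        n_bins = len(set(bins))
        if n_bins == 0:
            continue  # species without any BIN are not tallied
        label = ">=5" if n_bins >= 5 else str(n_bins)
        distribution[label] += 1
    return dict(distribution)


def share_with_multiple_bins(bins_by_species: dict[str, list[str]]) -> float:
    """Percentage of BIN-holding species that have more than one BIN."""
    counts = [len(set(bins)) for bins in bins_by_species.values() if bins]
    if not counts:
        return 0.0
    return round(100 * sum(n > 1 for n in counts) / len(counts), 1)


# Hypothetical input: placeholder species and BIN labels.
example_bins = {
    "species_a": ["BIN_01"],
    "species_b": ["BIN_02", "BIN_03"],
    "species_c": ["BIN_04", "BIN_05", "BIN_06", "BIN_07", "BIN_08"],
    "species_d": [],
}
print(bin_count_distribution(example_bins))
print(share_with_multiple_bins(example_bins))
```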
Our study brings two main considerations to the forefront: first, reference libraries still lack representative sequences for many NIS, with extreme cases in some groups, and second, some NIS may harbour cryptic species. Both issues may critically impair the current capability to detect and monitor NIS using molecular tools.
Although the gaps (i.e., NIS still missing barcode sequences) were similar in both lists, the proportion of missing barcodes clearly differed among taxonomic groups and among the barcode markers searched. In both lists the gap was highest for Chromista. In these lists, the dominant Chromista phyla are Foraminifera, Myzozoa and Ochrophyta, which can include very small species, such as small protists and diatoms, for which obtaining voucher specimens to generate sequences for deposition in genetic databases may be challenging. It has been reported that smaller organisms may have greater invasion opportunities in coastal ecosystems (
On the other hand, we found a lower gap for Animalia and Plantae. The gaps in BOLD and GenBank were recently analysed for the taxa frequently used in the WFD and the MSFD, under the scope of the COST Action DNAqua-Net (
Our results were somewhat discrepant from those obtained in a previous report where the gaps in BOLD and GenBank were analysed for aquatic NIS compiled from literature (
As mentioned above, for each taxonomic group the gap clearly differed among the barcode markers searched. For Animalia, most phyla were well represented with COI sequences in GenBank and BOLD, but Annelida, Ctenophora, Platyhelminthes and Porifera were better represented with 18S sequences. Within Chromista most phyla were better represented with 18S; Ochrophyta, which includes brown algae and diatoms, was an exception to this pattern, with barcode coverage being greatest for rbcL. For Plantae, most phyla were better represented with rbcL sequences. Thus, the simultaneous use of more than one marker can be complementary and may greatly increase NIS identification rates through DNA-based tools. Recent studies have highlighted the advantage of using both 18S and COI markers for invasive species detection; the 18S for detecting a much broader range of taxa and the COI for discriminating between many metazoan species (
Approximately 37% of the species displayed more than one BIN, and many of these species displayed mean and maximum intraspecific distances higher than 3%, suggesting that many NIS may harbour hidden or cryptic diversity, which may further complicate taxonomic assignment using DNA-based tools (
A closer look at the list of barcoded species with attributed BINs, in particular for COI and Animalia, indicated that many of them displayed discordant BINs (i.e. different species sharing the same BIN), possibly due to incorrect taxonomic assignments of numerous species that have been repeatedly used in databases without proper validation. A careful inspection of these BINs would be needed to check for potential artefacts such as misidentifications, incomplete taxonomy or sequences deposited under different synonyms. Incorrect species identifications could either artificially inflate or depress the number of NIS in an ecosystem, and lead to limited resources being misdirected against harmless species or to inaction against problematic ones (
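To make the 3% criterion concrete, the sketch below flags a species when the largest pairwise distance among its aligned COI sequences exceeds 0.03. It uses the uncorrected p-distance (the proportion of differing sites) as a simplifying assumption; the distance model behind the values reported here is not specified in this passage, so this is not presented as the method used. The function names and the short toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Only sites where both sequences carry A, C, G or T are compared;
    gaps and ambiguity codes are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def max_intraspecific_distance(sequences: list[str]) -> float:
    """Largest pairwise p-distance within one species (0.0 if fewer than two sequences)."""
    return max((p_distance(a, b) for a, b in combinations(sequences, 2)), default=0.0)


# Toy aligned sequences (20 sites) for one hypothetical species.
toy_sequences = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",  # differs from the first at one site
    "ACGTACGTACGTACGTACGT",
]
distance = max_intraspecific_distance(toy_sequences)
print(distance, distance > 0.03)  # a value above 0.03 would flag possible hidden diversity
```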
Although completing the gaps in reference libraries is essential to make the most of the potential of DNA-based tools in NIS surveillance in coastal ecosystems, correct species attribution (by morphology-based methods) and proper management of sequence deposition and voucher storage is vital to preserve correct connections between morphological and molecular data (
This work was supported by national funds through the Portuguese Foundation for Science and Technology (FCT, I.P.) in the scope of the project “NIS-DNA: Early detection and monitoring of non-indigenous species (NIS) in coastal ecosystems based on high-throughput sequencing tools” (PTDC/BIA-BMA/29754/2017). We are also grateful to two reviewers for comments and suggestions that improved the manuscript.
Supplementary figures and tables used to analyse the data
Data type: Lists of species, taxonomic classification, gap-analyses