Research Article
Corresponding author: Paula Gauvin (paula.gauvin@inrae.fr)
Academic editor: Alexander Weigand
© 2025 Paula Gauvin, David Eme, Isabelle Domaizon, Frédéric Rimet.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Gauvin P, Eme D, Domaizon I, Rimet F (2025) An annotated reference library for supporting DNA metabarcoding analysis of aquatic macroinvertebrates in French freshwater environments. Metabarcoding and Metagenomics 9: e137772. https://doi.org/10.3897/mbmg.9.137772
Freshwater ecosystems are increasingly threatened by human activities, leading to biodiversity loss and ecosystem degradation. Effective biodiversity monitoring, particularly through the use of aquatic macroinvertebrates as bioindicators, is crucial for assessing ecological health. While traditional morphological methods face limitations, DNA metabarcoding offers higher accuracy and efficiency in species identification using environmental DNA. However, the success of metabarcoding is contingent on the quality of reference libraries, which are often incomplete or biased. This study aimed to construct and share a comprehensive COI-based DNA barcode library for freshwater macroinvertebrates in France, specifically targeting short gene regions amplified with the fwhF2/fwhR2N primers, which are suitable for degraded DNA. A list of species occurring in French freshwater ecosystems was established from official national checklists and Alpine lake surveys. The resulting library was analysed for taxonomic completeness, barcode coverage and cryptic diversity. The checklist consisted of 2,841 species across 10 phyla, of which 56% had at least one COI-5P sequence available in the Barcode of Life Data System (BOLD). The analysis of cryptic diversity, based on Barcode Index Numbers (BINs), highlighted a potentially high rate of cryptic diversity, although it might have been overestimated due to the wide geographic origin of the sequences. Alignment challenges with the primers were identified for certain taxa, particularly amongst Coleoptera, Diptera and Malacostraca. Genetic diversity, approximated by the number of haplotypes per species, was limited for most species, with only three species having more than 100 haplotypes. Finally, this study showed that a total of 57 haplotypes were shared amongst 116 distinct species.
This work emphasises the need for expanded sequencing efforts to improve barcode coverage and highlights the pitfalls associated with the use of these primers for further DNA-based biodiversity assessment of macroinvertebrates.
COI, cryptic diversity, freshwater, fwhF2 primers, gap analysis, haplotype diversity, macroinvertebrates, metabarcoding
Freshwater ecosystems face escalating anthropogenic pressures threatening their functionality (
To establish the most up-to-date checklist of macroinvertebrate species recognised and described as occurring in freshwater habitats in metropolitan France, three checklists were compiled. The first comes from PERLA, an interactive tool managed by a French regional environmental agency (DREAL Auvergne-Rhône-Alpes) and accessible at http://www.perla.developpement-durable.gouv.fr/index.php. PERLA serves as a comprehensive national checklist for water managers and encompasses larvae, nymphs and adults across various taxonomic groups, including insects, molluscs, crustaceans and worms, amongst others, found in rivers and aquatic ecosystems. In the manuscript, we refer to this checklist as the ‘French aquatic ecosystems checklist’. The second checklist comes from macroinvertebrate surveys conducted in four Alpine lakes (Lake Geneva, Lake Annecy, Lake Bourget and Lake Tignes) from 2015 to 2022. We refer to this checklist as the ‘French Alpine lakes checklist’. The third checklist comes from a French NGO, Opie-benthos (https://www.opie-benthos.fr/opie/monde-des-insectes.html), which studies the taxonomy and diversity of freshwater insects. Opie-benthos currently gathers, curates and maintains what is probably the most up-to-date freshwater insect species list for metropolitan France. We refer to this checklist as the ‘French aquatic insect’s species list’. These three checklists were merged into a single one. When the taxonomic resolution of a taxon was limited to a rank above species (genus, family, class, phylum), we consulted the taxonomic reference of the National Inventory of Natural Heritage (INPN - Inventaire national du patrimoine naturel) (TAXREF v.17.0 2024, available at https://inpn.mnhn.fr/telechargement/referentielEspece/taxref/17.0/menu) to select the species belonging to that rank.
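The merging step described above can be sketched as follows. This is a minimal illustration and not the authors' actual pipeline: the function names `normalise` and `merge_checklists`, the normalisation rules and the assignment of the example species to each checklist are assumptions made for the example.

```python
def normalise(name: str) -> str:
    """Collapse extra whitespace and apply 'Genus species' capitalisation."""
    parts = name.split()
    if not parts:
        return ""
    return " ".join([parts[0].capitalize()] + [p.lower() for p in parts[1:]])


def merge_checklists(*checklists):
    """Return the sorted union of several species lists, without duplicates."""
    merged = set()
    for checklist in checklists:
        for name in checklist:
            cleaned = normalise(name)
            if cleaned:
                merged.add(cleaned)
    return sorted(merged)


# Hypothetical excerpts standing in for the three source checklists.
aquatic_ecosystems = ["Asellus aquaticus", "Physella acuta"]
alpine_lakes = ["asellus  aquaticus", "Agraylea multipunctata"]
aquatic_insects = ["Agraylea multipunctata"]

species_list = merge_checklists(aquatic_ecosystems, alpine_lakes, aquatic_insects)
print(species_list)
```

Using a set keeps a species only once even when it appears in several checklists or with inconsistent spacing or capitalisation. Synonyms and misspellings would still require a separate harmonization step.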
The phyla Acanthocephala and Rotifera, the classes Copepoda and Ostracoda (Arthropoda, Crustacea) and the families Succineidae (Arthropoda, Insecta, Diptera) and Hydrachnidae (Arthropoda, Arachnida, Trombidiformes), initially listed in the French aquatic ecosystems checklist, were omitted from our TAXREF v.17.0 2024 search as they are not considered freshwater macroinvertebrates according
All sequences for the species listed in the French metropolitan freshwater macroinvertebrates’ species list, whatever the gene, were downloaded between October 2023 and May 2024 from the Barcode of Life Data System v.4 (BOLD) repository (https://boldsystems.org/index.php/databases). We chose BOLD as the reference library instead of GenBank because of concerns about the latter’s unverified submission process, which frequently results in misannotated sequences (
First, we analysed the taxonomic composition of the French metropolitan freshwater macroinvertebrates’ species list. Second, to highlight gaps in sequence availability for our species list, we assessed the taxonomic coverage of barcodes at various taxonomic levels. To estimate the importance of cryptic diversity in the different taxonomic groups, we used the number of Barcode Index Numbers (BINs) per species. We also reported the number of BINs lumping sequences attributed to different species. To account for the unbalanced availability of DNA sequences (and sampling effort) amongst species, we also evaluated the correlation between the number of BINs and the number of sequences. Third, we examined the taxonomic composition of species whose COI-5P sequences did not align with the fwhF2/fwhR2N primers to reveal which species and taxonomic groups could be misrepresented in metabarcoding studies using that primer pair. Finally, to explore species haplotype diversity, we examined the number of distinct haplotypes available relative to the number of species in each phylum. The categorisation of unique barcodes, ranging from limited (< 5 haplotypes) to moderate (5–25 haplotypes) to good (> 25 haplotypes), was adopted from
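The two BIN-based summaries described above (number of BINs per species, and BINs lumping several species) can be sketched as follows. The record layout, the function name `bin_summary` and the species and BIN labels are hypothetical; the sketch only assumes that each sequence carries a species name and a BIN identifier.

```python
from collections import defaultdict


def bin_summary(records):
    """Summarise (species, bin_id) pairs, one pair per sequence.

    Returns the number of distinct BINs per species and the BINs that
    contain sequences attributed to more than one species.
    """
    bins_by_species = defaultdict(set)
    species_by_bin = defaultdict(set)
    for species, bin_id in records:
        bins_by_species[species].add(bin_id)
        species_by_bin[bin_id].add(species)
    bins_per_species = {sp: len(b) for sp, b in bins_by_species.items()}
    shared_bins = {
        b: sorted(sp) for b, sp in species_by_bin.items() if len(sp) > 1
    }
    return bins_per_species, shared_bins


# Hypothetical records: placeholder species and BIN labels.
records = [
    ("Species A", "BIN-1"),
    ("Species A", "BIN-2"),
    ("Species B", "BIN-2"),
    ("Species C", "BIN-3"),
    ("Species C", "BIN-3"),
]

bins_per_species, shared_bins = bin_summary(records)
print(bins_per_species)
print(shared_bins)
```

In this toy input, Species A is split across two BINs (a possible signal of cryptic diversity), while BIN-2 lumps sequences from two species.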
The French metropolitan freshwater macroinvertebrates’ species list is composed of 10 phyla, 16 classes, 50 orders, 222 families, 670 genera and 2841 species (Table
Summary of taxonomic composition for each phylum of the French metropolitan freshwater macroinvertebrates’ species list.
| Phylum | Class | Order | Family | Genus | Species |
|---|---|---|---|---|---|
| Annelida | 1 | 2 | 6 | 18 | 40 |
| Arthropoda | 4 | 20 | 135 | 508 | 2469 |
| Bryozoa | 2 | 2 | 6 | 7 | 11 |
| Cnidaria | 1 | 2 | 3 | 3 | 7 |
| Entoprocta | 1 | 1 | 1 | 1 | 1 |
| Mollusca | 2 | 11 | 24 | 52 | 195 |
| Nematoda | 2 | 9 | 42 | 68 | 99 |
| Nemertea | 1 | 1 | 1 | 1 | 1 |
| Platyhelminthes | 1 | 1 | 3 | 7 | 12 |
| Porifera | 1 | 1 | 1 | 5 | 6 |
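As a quick consistency check on the table above, the per-phylum species counts can be summed and converted to percentages. The short sketch below is illustrative only; the variable names are ours and the counts are copied from the Species column.

```python
# Species counts per phylum, copied from the Species column of the table.
species_per_phylum = {
    "Annelida": 40,
    "Arthropoda": 2469,
    "Bryozoa": 11,
    "Cnidaria": 7,
    "Entoprocta": 1,
    "Mollusca": 195,
    "Nematoda": 99,
    "Nemertea": 1,
    "Platyhelminthes": 12,
    "Porifera": 6,
}

total_species = sum(species_per_phylum.values())

# Share of each phylum in the checklist, as a percentage of all species.
share = {
    phylum: round(100 * n / total_species, 1)
    for phylum, n in species_per_phylum.items()
}

print(total_species)        # 2841, the total reported in the text
print(share["Arthropoda"])  # 86.9
```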
Taxonomic composition of the French metropolitan freshwater macroinvertebrates’ species list. Percentage of the number of species according to phylum (a), class within the arthropod phylum (b) and order within the insect class (c). The number above each bar represents the total number of species affiliated to each taxonomic rank.
Amongst the insects, Coleoptera dominate, followed by Diptera and Trichoptera, with 684, 630 and 527 species, respectively. The phyla with the highest number of species in our checklist match those identified in a study by
Barcoding coverage of the French metropolitan freshwater macroinvertebrates’ species list. Barcoding coverage is given, for each phylum, each class within Arthropoda and each order within Insecta, as the percentage of species with at least one public COI-5P sequence available in BOLD. The number above each bar represents the total number of species listed in the Freshwater macroinvertebrate checklist.
There are some exceptions: Nemertea have a barcode coverage of 100% and Entoprocta have 0% coverage (both phyla with only one species referenced). Mollusca (195 species) and Nematoda (99 species), which are the second and third most diverse phyla in terms of species in the French metropolitan freshwater macroinvertebrates’ species list, have the lowest barcode coverage of all phyla (41% and 5.1%, respectively). Within Arthropoda, most classes have a barcode coverage above 60%, with the highest being in the Malacostraca (65 species, 73.8%) and the lowest in the Branchiura (3 species, 66.7%). Finally, within insects, Hymenoptera (1 species), Lepidoptera (5 species), Megaloptera (3 species), Odonata (89 species) and Hemiptera (83 species) have the highest barcode coverage (> 80%). The three most species-rich insect orders (Coleoptera (684 species), Diptera (630 species) and Trichoptera (527 species)) have a barcode coverage of 73.4%, 55.9% and 67.9%, respectively (Fig.
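Barcode coverage as used here is the percentage of checklist species with at least one COI-5P sequence. A minimal sketch of that calculation is given below; the function name `barcode_coverage` and the placeholder species labels are assumptions for illustration. The toy example uses three listed species, two of them barcoded, which is the same arithmetic as the Branchiura figure (3 species, 66.7%).

```python
def barcode_coverage(checklist, barcoded):
    """Percentage of checklist species with at least one barcode.

    `checklist` holds every species of the taxon of interest and
    `barcoded` holds the species with at least one COI-5P sequence.
    Species in `barcoded` that are absent from the checklist are ignored.
    """
    listed = set(checklist)
    if not listed:
        return 0.0
    covered = listed & set(barcoded)
    return 100 * len(covered) / len(listed)


# Hypothetical class with three listed species, two of them barcoded.
checklist = ["sp1", "sp2", "sp3"]
barcoded = ["sp1", "sp3", "sp_not_in_checklist"]

coverage = round(barcode_coverage(checklist, barcoded), 1)
print(coverage)  # 66.7
```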
The 1811 species, with a BIN and at least one COI-5P sequence, encompassed 3348 unique BINs, which might indicate a high rate of cryptic diversity. On the one hand, this diversity is probably underestimated because 1689 BOLD identifiers, belonging to 410 species (of which 31 are not amongst the 1811 species), have no associated BIN; they represent 2.4% of the 69083 identifiers recovered. This can be explained by the fact that the sequences associated with these identifiers did not meet the conditions required to be evaluated for inclusion in a BIN by the algorithm used to delimit this cryptic diversity (
There was a significant correlation (R² = 0.6, p < 0.01) between the number of unique COI sequences per species and the number of BINs generated (Fig.
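The strength of such a relationship can be illustrated with a squared Pearson correlation coefficient. The text does not state how R² was obtained, so treating it as the squared Pearson coefficient is an assumption, as are the function name `r_squared` and the per-species counts below, which are invented for illustration.

```python
def r_squared(x, y):
    """Squared Pearson correlation coefficient between two sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")
    return sxy ** 2 / (sxx * syy)


# Hypothetical counts for five species (not data from the study).
unique_sequences = [3, 10, 25, 60, 120]
bins = [1, 2, 2, 5, 6]

r2 = r_squared(unique_sequences, bins)
print(round(r2, 2))
```

A value close to 1 would mean that the number of BINs is almost entirely predicted by sequencing depth, whereas a moderate value leaves room for real differences in cryptic diversity between taxa.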
However, some groups, such as Annelida and Megaloptera, show high cryptic diversity but low genetic diversity; conversely, Platyhelminthes and Lepidoptera show high genetic diversity but low cryptic diversity (Table
Number of species, sequences and BINs, and their ratios, for each phylum, each class within the arthropod phylum and each order within the insect class.
| Taxonomic rank | Total number of species | Total number of sequences | Total number of unique BINs | Mean number of sequences per BIN | Mean number of BINs per species |
|---|---|---|---|---|---|
| Phylum | | | | | |
| Annelida | 24 | 328 | 81 | 4.0 | 3.4 |
| Arthropoda | 1708 | 33547 | 3690 | 9.1 | 2.2 |
| Bryozoa | 8 | 17 | 12 | 1.4 | 1.5 |
| Cnidaria | 7 | 245 | 34 | 7.2 | 4.9 |
| Mollusca | 80 | 2761 | 244 | 11.3 | 3.1 |
| Nematoda | 5 | 89 | 7 | 12.7 | 1.4 |
| Nemertea | 1 | 6 | 2 | 3.0 | 2.0 |
| Platyhelminthes | 4 | 104 | 5 | 20.8 | 1.3 |
| Porifera | 5 | 24 | 5 | 4.8 | 1.0 |
| Class within Arthropoda | | | | | |
| Branchiopoda | 22 | 815 | 90 | 9.1 | 4.1 |
| Branchiura | 2 | 25 | 5 | 5.0 | 2.5 |
| Insecta | 1636 | 28900 | 3146 | 9.2 | 1.9 |
| Malacostraca | 48 | 3807 | 449 | 8.5 | 9.4 |
| Order within Insecta | | | | | |
| Coleoptera | 504 | 3773 | 755 | 5.0 | 1.5 |
| Diptera | 352 | 12585 | 675 | 18.6 | 1.9 |
| Ephemeroptera | 116 | 2962 | 353 | 8.4 | 3.0 |
| Hemiptera | 70 | 781 | 113 | 6.9 | 1.6 |
| Hymenoptera | 1 | 7 | 1 | 7.0 | 1.0 |
| Lepidoptera | 5 | 79 | 8 | 9.9 | 1.6 |
| Megaloptera | 3 | 21 | 8 | 2.6 | 2.7 |
| Neuroptera | 4 | 18 | 6 | 3.0 | 1.5 |
| Odonata | 86 | 1670 | 155 | 10.8 | 1.8 |
| Plecoptera | 137 | 2105 | 300 | 7.0 | 2.2 |
| Trichoptera | 358 | 4899 | 772 | 6.3 | 2.2 |
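The two ratio columns of the table above follow directly from the three count columns. The sketch below recomputes them for three rows as an illustration; the function name `table_ratios` is ours and the counts are copied from the table.

```python
def table_ratios(species, sequences, bins):
    """Return (mean sequences per BIN, mean BINs per species), 1 decimal."""
    return round(sequences / bins, 1), round(bins / species, 1)


# (species, sequences, unique BINs), copied from the table above.
rows = {
    "Annelida": (24, 328, 81),
    "Arthropoda": (1708, 33547, 3690),
    "Malacostraca": (48, 3807, 449),
}

for taxon, counts in rows.items():
    seq_per_bin, bins_per_species = table_ratios(*counts)
    print(taxon, seq_per_bin, bins_per_species)
```

The recomputed values (4.0 and 3.4 for Annelida, 9.1 and 2.2 for Arthropoda, 8.5 and 9.4 for Malacostraca) match the table.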
The Fig.
Number of species associated with COI-5P genetic sequences that cannot be aligned with the fwhF2/fwhR2N primer pair at the phylum rank (a), within the Arthropoda phylum (b) and within the Insecta and Malacostraca classes (c, c’).
Arthropoda emerged as the predominant taxonomic group facing alignment difficulties at the phylum level. Within insects (Arthropoda), Coleoptera and Diptera had the most species with alignment issues. Within Malacostraca (Arthropoda), gammarids (Gammaridae) and crayfish (Astacidea) had the highest number of species with alignment issues (Fig.
Taxonomic composition at the class rank for the species from the aligned DNA library. Each box at the phylum level represents the distribution of COI-5P haplotypes relative to the number of species.
In our reference library, for most phyla, the majority of species have fewer than five haplotypes, except cnidarians, most of whose species have 5–25 haplotypes. This pattern of low haplotype availability could be due to a lack of samples, meaning that too few sequences are available to detect genetic variability. Only three species presented more than 100 unique barcodes: one trichopteran (Agraylea multipunctata), one isopod (Asellus aquaticus) and one gastropod (Physella acuta). These findings are consistent with the observations of
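The haplotype counts and the three categories used in this study (limited: < 5, moderate: 5–25, good: > 25 haplotypes) can be sketched as follows. The sketch assumes that a haplotype is simply a distinct aligned sequence string; the function names and the toy sequences are hypothetical.

```python
from collections import defaultdict


def haplotypes_per_species(records):
    """Count distinct sequences per species from (species, sequence) pairs."""
    sequences = defaultdict(set)
    for species, sequence in records:
        sequences[species].add(sequence.upper())
    return {species: len(seqs) for species, seqs in sequences.items()}


def haplotype_category(n_haplotypes):
    """Classify a haplotype count as 'limited', 'moderate' or 'good'."""
    if n_haplotypes < 0:
        raise ValueError("haplotype count cannot be negative")
    if n_haplotypes < 5:
        return "limited"
    if n_haplotypes <= 25:
        return "moderate"
    return "good"


# Hypothetical aligned fragments (far shorter than real barcodes).
records = [
    ("Species A", "ACGT"),
    ("Species A", "acgt"),  # same haplotype, different letter case
    ("Species A", "ACGA"),
    ("Species B", "ACGT"),
]

counts = haplotypes_per_species(records)
categories = {sp: haplotype_category(n) for sp, n in counts.items()}
print(counts)
print(categories)
```

Real data would need sequences trimmed to the same aligned region before comparison, since fragments of different lengths would otherwise be counted as distinct haplotypes.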
Our study underscores a widespread absence of reference barcodes for numerous extant invertebrate species. Although barcoding offers advantages over morphological identification in biomonitoring, existing gaps in barcode libraries may hinder their effectiveness (
We would like to thank Tristan Lefébure and Maylïs Gauthier from LEHNA (CNRS) for their insights into this study. We would also like to thank the reviewers for correcting and improving this study.
The authors have declared that no competing interests exist.
No ethical statement was reported.
This study was made possible thanks to funding from the Carnot Eau & Environnement institute, the INRAE department AQUA and the pôle INRAE-OFB ECLA (ECosystèmes LAcustres).
Conceptualization: PG, FR. Data curation: PG. Formal analysis: DE, PG. Investigation: DE, PG. Methodology: PG, DE. Supervision: ID, FR. Validation: ID, DE, FR. Visualization: ID, DE, FR. Writing - original draft: PG. Writing - review and editing: DE, ID, FR.
Paula Gauvin https://orcid.org/0000-0002-9268-1856
David Eme https://orcid.org/0000-0001-8790-0412
Frédéric Rimet https://orcid.org/0000-0002-5514-869X
All of the data that support the findings of this study are available in the main text or Supplementary Information.
Taxonomic harmonization of the French metropolitan freshwater macroinvertebrates’ species list
Data type: xlsx
Explanation note: This table lists the taxa requiring taxonomic harmonization, their original names from the Freshwater macroinvertebrates checklist and their harmonized names. The reason for each harmonization is specified (synonymy, wrong spelling, subgenus).
Percentage of genetic sequences depending on the type of genes
Data type: pdf
Explanation note: Only the genes representing at least 0.1% of the sequences are shown. All other genes are grouped in the “Other genes” bar. The numbers above each bar show the total number of sequences retrieved from BOLD affiliated to each gene.
Percentage of sequences depending on the countries
Data type: pdf
Explanation note: Only the countries representing at least 0.2% of the sequences are shown. All other countries are grouped in the “Other countries (133)” bar. The numbers above each bar show the total number of sequences retrieved from BOLD affiliated to each country.
Number and names of species associated with BINs
Data type: xlsx
Explanation note: 267 BINs are shared by 410 species.
Taxonomic composition of species sharing the same COI haplotype after alignment with fwhF2/fwhR2N primers
Data type: xlsx
Explanation note: A total of 57 haplotypes shared among 116 species were identified.