Data Paper
The mitochondrial genomes of 11 aquatic macroinvertebrate species from Cyprus
Jan-Niklas Macher‡, Katerina Drakou§, Athina Papatheodoulou§, Berry van der Hoorn‡, Marlen Vasquez§
‡ Naturalis Biodiversity Center, Leiden, Netherlands
§ Cyprus University of Technology, Limassol, Cyprus
Open Access

Abstract


Aquatic macroinvertebrates are often identified based on morphology, but molecular approaches like DNA barcoding, metabarcoding and metagenomics are increasingly used for species identification. These approaches require the availability of DNA references deposited in public databases. Here we report the mitochondrial genomes of 11 aquatic macroinvertebrate species from Cyprus, a European Union island country in the Mediterranean. Only three species could be provisionally assigned a binomial species name, highlighting the current lack of molecular references for aquatic macroinvertebrates from Cyprus.

Key Words

freshwater biodiversity, Mediterranean, insects, Biodiversity Hotspot

Introduction


Aquatic macroinvertebrates are commonly used for the assessment of ecosystem quality (Hering et al. 2006; Jackson et al. 2016; Morse et al. 2007). Taxa are often identified based on morphology, but the advent of DNA sequencing technologies has led to a paradigm shift, as techniques like DNA barcoding (Hebert et al. 2003), metabarcoding (Taberlet et al. 2012) and metagenomic approaches (Crampton-Platt et al. 2016; Ji et al. 2019; Macher et al. 2018) allow the identification and study of large numbers of macroinvertebrate specimens, species and communities in a short amount of time and at reasonable cost. DNA-based identification of macroinvertebrates relies on the comparison of an obtained sequence (often the mitochondrial cytochrome c oxidase I gene) to reference sequences that are deposited in databases like the Barcode of Life Database (BOLD, Ratnasingham and Hebert 2007) or NCBI GenBank (Benson et al. 2017). Here we report the full mitochondrial genomes of 11 aquatic macroinvertebrate species from Cyprus, a European Union island country in the Mediterranean Biodiversity Hotspot (Mittermeier et al. 1999). The invertebrate fauna of Cyprus has been studied morphologically (e.g. Georghiou 1977; Malicky 1976; Soldán and Godunko 2008; Villastrigo et al. 2017); however, the Barcode of Life reference database contained fewer than 50 DNA barcodes of Cyprus freshwater invertebrates as of May 2020. We sequenced and assembled the mitochondrial genomes of two Baetidae and one Caenidae (Ephemeroptera), one Chironomidae, one Dixidae and one Simuliidae (Diptera), one Hydrachnidae (Acari), one Gomphidae and one Euphaeidae (Odonata), one Gyrinidae (Coleoptera) and one Erpobdellidae (Hirudinea) species. In addition, we report the nuclear 18S and 28S ribosomal RNA gene sequences, which are commonly used for phylogenetic studies (Whiting et al. 1997; Gao et al. 2008; Hiruta et al. 2016).

Material and methods

Specimens were collected from a perennial stream in Cyprus (coordinates: 34.769801N, 32.911568E, see Fig. 1) on 14-08-2018 using a Surber sampler (500-µm mesh bag). Specimens were identified to the family level, stored in 96% ethanol and shipped to the Naturalis Biodiversity Center, Leiden, Netherlands.

Figure 1.

A) Location of the sampling site in Cyprus. B) Photo of the sampling site.

In the laboratory, specimens were carefully rinsed with ultrapure distilled water to remove external adhesions. DNA was extracted from the whole body as follows. The specimens were cut with scissors and crushed with spatulas in 2-ml Eppendorf tubes. All equipment was sterile. DNA was extracted using the Macherey-Nagel (Düren, Germany) NucleoSpin tissue kit on the KingFisher (Waltham, USA) robotic platform, following the manufacturer’s protocol. A negative control containing ultrapure water was processed together with the tissue samples and was checked for contamination during all of the following steps. Extracted DNA was stored at -20 °C overnight until further processing. Quantity and size of the extracted DNA were checked on the QIAxcel platform (Qiagen, Hilden, Germany). For each sample, 15 ng of DNA was used for further processing. The New England Biolabs (Ipswich, USA) NEBNext kit and oligos were used for library preparation, following the manufacturer’s protocol and selecting for an average DNA fragment length of 300 bp. Fragment length and quantity of DNA in the samples were checked on the Agilent (Santa Clara, USA) Bioanalyzer. Samples were pooled in equimolar amounts before being sent for sequencing. The negative control, which did not show any signal of DNA present, was added to the final library at 10% of the total library volume. The final library was sent to BGI (Shenzhen, China) for sequencing on the NovaSeq 6000 platform (2 × 150 bp).

Raw reads were quality checked using MultiQC (Ewels et al. 2016). Fastp (Chen et al. 2018) was used for trimming of Illumina adapters and removal of reads shorter than 100 bp. Reads were assembled using megahit (Li et al. 2015) with a maximum k-mer length of 69 and otherwise default parameters. SPAdes (Bankevich et al. 2012) was used for a separate assembly to assess consistency of assembly results (all scripts: Suppl. material 1). Contigs were BLAST-searched against the NCBI GenBank nucleotide database (Benson et al. 2017). Identified putative mitochondrial genomes were annotated using the MITOS (Bernt et al. 2013) web server. Reading frames and annotations were manually checked and corrected using Geneious Prime (v. 2020.1), which was subsequently used to submit the annotated mitochondrial genomes to NCBI GenBank. Gene order and direction were compared to the available mitochondrial genomes of related species downloaded from NCBI GenBank. The obtained cytochrome c oxidase I (COI) genes were compared with sequences deposited in the NCBI GenBank and BOLD databases to check for availability of putative species-level barcodes. A match of 98% identity or higher was used as the threshold for assigning a provisional species name to a sequence (Alberdi et al. 2018). Occurrence data were compared to literature and to data available in the Global Biodiversity Information Facility (GBIF). The nuclear 18S and 28S rRNAs were identified by BLAST-searching the assembled contigs against the NCBI database. Annotation was conducted using the Geneious annotation tool with a minimum similarity of 80%, followed by manual refinement by alignment and comparison to 18S and 28S sequences of related species available in NCBI GenBank.
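The read-length filter described above can be illustrated with a minimal Python sketch. It is not the fastp implementation and not one of the scripts in Suppl. material 1; the function names and the toy reads are hypothetical, and only the 100-bp cut-off is taken from the text.

```python
from typing import Iterable, Iterator, List, Tuple

# A FASTQ record reduced to (header, sequence, quality string).
FastqRecord = Tuple[str, str, str]


def parse_fastq(lines: List[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from consecutive 4-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (line.strip() for line in lines[i:i + 4])
        yield header, seq, qual


def filter_by_length(records: Iterable[FastqRecord],
                     min_length: int = 100) -> List[FastqRecord]:
    """Keep reads with at least min_length bases; shorter reads are dropped."""
    return [rec for rec in records if len(rec[1]) >= min_length]


# Two toy reads (hypothetical): one of 150 bp and one of 60 bp.
toy_fastq = [
    "@read_long", "A" * 150, "+", "I" * 150,
    "@read_short", "C" * 60, "+", "I" * 60,
]
kept = filter_by_length(parse_fastq(toy_fastq))
print([header for header, _, _ in kept])
```

Only the 150-bp toy read passes the filter; in practice this step is performed by fastp on the paired-end read files.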

Results and discussion

Sequencing on the NovaSeq platform resulted in 43,369,218 reads (reads per sample: Suppl. material 2: Table S1). The negative control contained 952 reads (0.002% of total reads). Illumina platforms are known to produce tag switching (Owens et al. 2018; Schnell et al. 2015) and a small proportion of reads is commonly found in negative controls. As the number of reads observed in our negative control was low, we did not suspect contamination. We obtained the full mitochondrial genome for all 11 macroinvertebrate species. Coverage varied from 18-fold (Euphaeidae_Cy2020_sp. 1, Odonata) to 449-fold (Caenidae_Cy2020_sp. 1, Ephemeroptera), with an average coverage of 135-fold (see Suppl. material 2: Table S2 for coverage and assembly length). All mitochondrial genomes contained 13 protein-coding genes, 22 tRNAs, the large and small ribosomal RNA subunits and an AT-rich control region. Megahit and SPAdes assemblies resulted in identical mitochondrial contigs, the expected exception being length variation in the highly repetitive, AT-rich regions, which are difficult to assemble with short reads. Gene order and direction of all mitochondrial genomes were identical to those of related taxa available in GenBank. The 18S and 28S rRNAs were recovered for all sequenced species (information on assembly length and best hits in NCBI GenBank: see Suppl. material 2: Table S3).
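The proportion of reads in the negative control can be checked with a short calculation. The sketch below uses the two read counts reported above; the fold-coverage helper is a generic approximation (sequenced bases divided by genome length), and the numbers passed to it are hypothetical and not taken from Suppl. material 2.

```python
TOTAL_READS = 43_369_218        # total reads reported for the sequencing run
NEGATIVE_CONTROL_READS = 952    # reads assigned to the negative control


def percent_of_total(reads: int, total: int) -> float:
    """Return the share of `reads` in `total`, expressed in percent."""
    return 100.0 * reads / total


def fold_coverage(n_reads: int, read_length: int, genome_length: int) -> float:
    """Approximate mean coverage as sequenced bases divided by genome length."""
    return n_reads * read_length / genome_length


negative_control_percent = percent_of_total(NEGATIVE_CONTROL_READS, TOTAL_READS)
print(f"Negative control: {negative_control_percent:.3f}% of all reads")

# Hypothetical example: 2,000 reads of 150 bp mapped to a 16,000-bp mitogenome.
example_coverage = fold_coverage(n_reads=2_000, read_length=150, genome_length=16_000)
print(f"Example coverage: {example_coverage:.2f}-fold")
```

The first value rounds to the 0.002% stated in the text.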

Of the 11 sequenced species from Cyprus, six could be provisionally identified to the species level by comparing sequences to reference databases and using a 98% identity threshold for the COI Folmer region (Folmer et al. 1994). However, only three of the matching references in the BOLD database have a full binomial name, while the other three have preliminary names, pending formal taxonomic identification or description (see Table 1). The remaining five species from Cyprus could not be molecularly identified to species level, meaning that they represent either new, as yet undescribed species or species that have been described, but for which molecular barcodes are not available. We point out that even though a 98% identity threshold is often used for the identification of macroinvertebrate species based on the COI barcoding fragment, this threshold can be misleading as it is not always possible to resolve species based on the mitochondrial COI marker (Darschnik et al. 2019). Further, closely related species can show a lower genetic divergence in this marker, as previously shown for island endemics (Gattoliat et al. 2018). We therefore stress that the molecular identification of species should be regarded as preliminary. Further research, including a higher number of sequenced specimens and collaboration with taxonomic experts, is needed to securely identify species and deposit reference sequences with binomial names in databases.

Table 1.

BLAST results and NCBI accession numbers for the 11 assembled mitochondrial genomes. Identities of ≥ 98% in the COI Folmer region were recorded as provisional species-level matches. Accession numbers for nuclear 18S and 28S rRNA are provided in the last column.

| Species (this study) | Reference database match (% identity) | Reference BOLD ID & collection location | NCBI accession number of mitogenome | NCBI accession number of 18S and 28S rRNA |
|---|---|---|---|---|
| Baetidae_Cy2020_sp. 1 | Baetis sp. 2 ZY-2019 (99.24%) | MH827966, Israel (Yanai et al. 2018) | MT671494 | 18S: MT921242; 28S: MT921241 |
| Baetidae_Cy2020_sp. 2 | Baetis vardarensis Ikonomov, 1962 (85.32%) | - | MT671488 | 18S: MT921243; 28S: MT921240 |
| Caenidae_Cy2020_sp. 1 | Caenis (macrura) sp. MAA05 (98.31%) | BMIKU-0029, Iraq | MT671487 | 18S: MT921244; 28S: MT921239 |
| Chironomidae_Cy2020_sp. 1 | Chironomidae sp. (95.37%) | - | MT671495 | 18S: MT921245; 28S: MT921238 |
| Simuliidae_Cy2020_sp. 1 | Simulium petricolum (Rivosecchi, 1963) (98.93%) | GQ465950, United Kingdom (Day et al. 2010) | MT671497 | 18S: MT921252; 28S: MT921231 |
| Dixidae_Cy2020_sp. 1 | Dixidae sp. (99.23%) | FiDip134, Norway | MT671496 | 18S: MT921246; 28S: MT921237 |
| Euphaeidae_Cy2020_sp. 1 | Epallage fatime Charpentier, 1840 (95.72%) | - | MT671490 | 18S: MT921248; 28S: MT921235 |
| Gomphidae_Cy2020_sp. 1 | Onychogomphus forcipatus Linnaeus, 1758 (99.85%) | ZFMK-TIS-2534021, Germany | MT671493 | 18S: MT921249; 28S: MT921234 |
| Hydrachnidae_Cy2020_sp. 1 | Mideopsis roztoczensis Biesiadka & Kowalik, 1979 (99.1%) | JN018102, Norway | MT671492 | 18S: MT921251; 28S: MT921232 |
| Gyrinidae_Cy2020_sp. 1 | Gyrinus sp. (94.78%) | - | MT671491 | 18S: MT921250; 28S: MT921233 |
| Erpobdellidae_Cy2020_sp. 1 | Dina sp. (93.94%) | - | MT671489 | 18S: MT921247; 28S: MT921236 |
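
The 98% identity threshold can be applied to the best hits in Table 1 with a few lines of Python. This is an illustrative sketch only: the percent identities are transcribed from Table 1, while the dictionary and function names are hypothetical and not part of the published scripts.

```python
from typing import Dict, Tuple

IDENTITY_THRESHOLD = 98.0  # percent identity in the COI Folmer region

# Best database hit per specimen: (reference name, percent identity), from Table 1.
best_hits: Dict[str, Tuple[str, float]] = {
    "Baetidae_Cy2020_sp. 1": ("Baetis sp. 2 ZY-2019", 99.24),
    "Baetidae_Cy2020_sp. 2": ("Baetis vardarensis", 85.32),
    "Caenidae_Cy2020_sp. 1": ("Caenis (macrura) sp. MAA05", 98.31),
    "Chironomidae_Cy2020_sp. 1": ("Chironomidae sp.", 95.37),
    "Simuliidae_Cy2020_sp. 1": ("Simulium petricolum", 98.93),
    "Dixidae_Cy2020_sp. 1": ("Dixidae sp.", 99.23),
    "Euphaeidae_Cy2020_sp. 1": ("Epallage fatime", 95.72),
    "Gomphidae_Cy2020_sp. 1": ("Onychogomphus forcipatus", 99.85),
    "Hydrachnidae_Cy2020_sp. 1": ("Mideopsis roztoczensis", 99.1),
    "Gyrinidae_Cy2020_sp. 1": ("Gyrinus sp.", 94.78),
    "Erpobdellidae_Cy2020_sp. 1": ("Dina sp.", 93.94),
}


def provisional_matches(hits: Dict[str, Tuple[str, float]],
                        threshold: float = IDENTITY_THRESHOLD) -> Dict[str, str]:
    """Return specimens whose best hit reaches the identity threshold."""
    return {specimen: reference
            for specimen, (reference, identity) in hits.items()
            if identity >= threshold}


matches = provisional_matches(best_hits)
for specimen, reference in matches.items():
    print(f"{specimen} -> {reference}")
print(f"{len(matches)} of {len(best_hits)} specimens reach the threshold")
```

Six of the 11 specimens reach the threshold, in line with the text; as stressed above, such matches should still be regarded as provisional.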

Of the six species from Cyprus that could be provisionally identified to the species level using molecular reference databases, ‘Baetidae_Cy2020_sp. 1’ had the best match to ‘Baetis sp. 2 ZY-2019’, a species previously identified in Israel (Yanai et al. 2018). The mayfly ‘Caenidae_Cy2020_sp. 1’ had the closest match to ‘Caenis (macrura) sp. MAA05’, which has been reported from streams in Northern Iraq (only published in BOLD, ID: BMIKU-0029). The dipteran ‘Simuliidae_Cy2020_sp. 1’ had the best match to Simulium petricolum (Rivosecchi, 1963), which was previously reported from areas around the Mediterranean, but also from Britain (Day et al. 2010). The dipteran ‘Dixidae_Cy2020_sp. 1’ had the best match to a Dixidae species with BOLD ID ‘FiDip134’, which was previously found in Norway. The dragonfly ‘Gomphidae_Cy2020_sp. 1’ had the best match to Onychogomphus forcipatus Linnaeus, 1758, which is known from a wide range across Europe, including the Mediterranean areas (Onychogomphus forcipatus GBIF dataset, accessed on 20-06-2020). The water mite ‘Hydrachnidae_Cy2020_sp. 1’ had the closest match to Mideopsis roztoczensis Biesiadka & Kowalik, 1979, which is known from areas in central Europe as well as from the areas around the Mediterranean (Mideopsis roztoczensis GBIF dataset, accessed on 20-06-2020).

Our results highlight the need for further work to fill the current gaps in molecular reference databases for aquatic macroinvertebrates (Porter and Hajibabaei 2018; Weigand et al. 2019). Taxonomic expertise should be utilised to morphologically identify and name voucher specimens. Shotgun sequencing can be used for rapid recovery of mitochondrial genomes and nuclear markers, and adding new data to reference databases can help with species identification and phylogenetic analyses in future studies.

Data availability

All raw data have been deposited in the NCBI Sequence Read Archive (SRA), BioProject number PRJNA641878. All mitogenomes have been deposited in NCBI GenBank; accession numbers: MT671487–MT671497. All 18S and 28S rRNA sequences have been deposited in NCBI GenBank; accession numbers: 18S: MT921242–MT921252; 28S: MT921231–MT921241. All DNA extracts are stored in the Naturalis Biodiversity Center collection, Leiden, Netherlands.
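
For batch retrieval, the accession ranges given above can be expanded into individual accession numbers. The helper below is a minimal sketch with a hypothetical function name; it assumes that both ends of a range share the same letter prefix and the same number of digits, as is the case for the ranges listed here.

```python
from typing import List


def expand_accession_range(first: str, last: str) -> List[str]:
    """Expand e.g. MT671487..MT671497 into the full list of accessions."""
    prefix = first.rstrip("0123456789")          # leading letters, e.g. "MT"
    if not last.startswith(prefix) or len(first) != len(last):
        raise ValueError("range ends must share prefix and length")
    width = len(first) - len(prefix)             # number of digits to pad to
    start, end = int(first[len(prefix):]), int(last[len(prefix):])
    return [f"{prefix}{number:0{width}d}" for number in range(start, end + 1)]


mitogenomes = expand_accession_range("MT671487", "MT671497")
rrna_18s = expand_accession_range("MT921242", "MT921252")
rrna_28s = expand_accession_range("MT921231", "MT921241")
print(len(mitogenomes), len(rrna_18s), len(rrna_28s))
```

Each range expands to 11 accessions, one per sequenced species.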

Acknowledgements

We thank Elza Duijm, Frank Stokvis, Marcel Eurlings and Roland Butôt for support in the lab.

This article is based upon work from COST Action DNAqua‐Net (CA15219), supported by the COST (European Cooperation in Science and Technology) programme.

References


  • Alberdi A, Aizpurua O, Gilbert MTP, Bohmann K (2018) Scrutinizing key steps for reliable metabarcoding of environmental samples. Methods in Ecology and Evolution 9(1): 134–147.
  • Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA (2012) SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology 19(5): 455–477.
  • Bernt M, Donath A, Jühling F, Externbrink F, Florentz C, Fritzsch G, Pütz J, Middendorf M, Stadler PF (2013) MITOS: improved de novo metazoan mitochondrial genome annotation. Molecular Phylogenetics and Evolution 69(2): 313–319.
  • Darschnik S, Leese F, Weiss M, Weigand H (2019) When barcoding fails: development of diagnostic nuclear markers for the sibling caddisfly species Sericostoma personatum (Spence in Kirby & Spence, 1826) and Sericostoma flavicorne Schneider, 1845. ZooKeys 872: 57–68.
  • Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R (1994) DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Molecular marine biology and biotechnology 3(5): 294–299.
  • Gao Y, Bu Y, Luan YX (2008) Phylogenetic relationships of basal hexapods reconstructed from nearly complete 18S and 28S rRNA gene sequences. Zoological Science 25(11): 1139–1145.
  • Gattoliat JL, Rutschmann S, Monaghan MT, Sartori M (2018) From molecular hypotheses to valid species: description of three endemic species of Baetis (Ephemeroptera: Baetidae) from the Canary Islands. Arthropod Systematics & Phylogeny 76(3): 509–528.
  • Georghiou GP (1977) The Insects and Mites of Cyprus: with emphasis on species of economic importance to agriculture, forestry, man, and domestic animals. Benaki Phytopathological Institute Kiphissia, Athens, 347 pp.
  • Hebert PDN, Cywinska A, Ball SL, de Waard JR (2003) Biological identifications through DNA barcodes. Proceedings of the Royal Society of London Series B: Biological Sciences 270(1512): 313–321.
  • Hering D, Johnson RK, Kramm S, Schmutz S, Szoszkiewicz K, Verdonschot PFM (2006) Assessment of European streams with diatoms, macrophytes, macroinvertebrates and fish: a comparative metric-based analysis of organism response to stress. Freshwater Biology 51(9): 1757–1785.
  • Hiruta SF, Kobayashi N, Katoh T, Kajihara H (2016) Molecular phylogeny of cypridoid freshwater Ostracods (Crustacea: Ostracoda), inferred from 18S and 28S rDNA sequences. Zoological Science 33(2): 179–185.
  • Jackson MC, Weyl OLF, Altermatt F, Durance I, Friberg N, Dumbrell AJ, Piggott JJ, Tiegs SD, Tockner K, Krug CB, Leadley PW, Woodward G (2016) Recommendations for the next generation of global freshwater biological monitoring tools. Advances in Ecological Research: 615–636.
  • Ji Y, Huotari T, Roslin T, Schmidt NM, Wang J, Yu DW, Ovaskainen O (2019) SPIKEPIPE: A metagenomic pipeline for the accurate quantification of eukaryotic species occurrences and intraspecific abundance change using DNA barcodes or mitogenomes. Molecular Ecology Resources 20(1): 256–267.
  • Li D, Liu CM, Luo R, Sadakane K, Lam TW (2015) MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31(10): 1674–1676.
  • Macher J, Zizka VMA, Weigand AM, Leese F (2018) A simple centrifugation protocol for metagenomic studies increases mitochondrial DNA yield by two orders of magnitude. Methods in Ecology and Evolution 9(4): 1070–1074.
  • Malicky H (1976) A progress report on studies on Trichoptera of the eastern Mediterranean Islands. Proceedings of the First International Symposium on Trichoptera 1976: 71–76 Springer, Dordrecht.
  • Mittermeier RA, Myers N, Mittermeier CG, Robles Gil P (1999) Hotspots: Earth’s biologically richest and most endangered terrestrial ecoregions. CEMEX, SA, Agrupación Sierra Madre, SC, 430 pp.
  • Owens GL, Todesco M, Drummond EBM, Yeaman S, Rieseberg LH (2018) A novel post hoc method for detecting index switching finds no evidence for increased switching on the Illumina HiSeq X. Molecular Ecology Resources 18.
  • Schnell IB, Bohmann K, Gilbert MTP (2015) Tag jumps illuminated – reducing sequence-to-sample misidentifications in metabarcoding studies. Molecular Ecology Resources 15(6): 1289–1303.
  • Villastrigo A, Ribera I, Bolton DT, Velasco J, Millán A (2017) An updated list of the water beetles and bugs of Cyprus. Latissimus 40: 9–17.
  • Weigand H, Beermann AJ, Čiampor F, Costa FO, Csabai Z, Duarte S, Geiger MF, Grabowski M, Rimet F, Rulik B, Strand M, Szucsich N, Weigand AM, Willassen E, Wyler SA, Bouchez A, Borja A, Čiamporová-Zaťovičová Z, Ferreira S, Dijkstra KD, Eisendle U, Freyhof J, Gadawski P, Graf W, Haegerbaeumer A, van der Hoorn BB, Japoshvili B, Keresztes L, Keskin E, Leese F, Macher J, Mamos T, Paz G, Pešić V, Pfannkuchen DM, Pfannkuchen MA, Price BW, Rinkevich B, Teixeira MAL, Várbíró G, Ekrem T (2019) DNA barcode reference libraries for the monitoring of aquatic biota in Europe: Gap-analysis and recommendations for future work. The Science of the Total Environment 678.
  • Whiting MF, Carpenter JC, Wheeler QD, Wheeler WC (1997) The Strepsiptera problem: phylogeny of the holometabolous insect orders inferred from 18S and 28S ribosomal DNA sequences and morphology. Systematic Biology 46(1): 1–68.

Supplementary materials

Supplementary material 1 

Scripts used for Megahit and SPAdes assembly of mitochondrial genomes and nuclear 18S and 28S rRNAs

Jan-Niklas Macher, Katerina Drakou, Athina Papatheodoulou, Berry van der Hoorn, Marlen Vasquez

Data type: Scripts used for assembly

This dataset is made available under the Open Database License. The Open Database License (ODbL) is a license agreement intended to allow users to freely share, modify, and use this Dataset while maintaining this same freedom for others, provided that the original source and author(s) are credited.
Supplementary material 2 

Supplementary tables showing read numbers, coverage and length of mitochondrial genomes, and length and BLAST results of 18S and 28S rRNAs

Jan-Niklas Macher, Katerina Drakou, Athina Papatheodoulou, Berry van der Hoorn, Marlen Vasquez

Data type: Tables

This dataset is made available under the Open Database License. The Open Database License (ODbL) is a license agreement intended to allow users to freely share, modify, and use this Dataset while maintaining this same freedom for others, provided that the original source and author(s) are credited.