Environmental DNA metabarcoding is a promising method for assaying fish diversity in cenotes of the Yucatán Peninsula, Mexico

The karst aquifer of the Yucatán Peninsula (YP) in southeastern Mexico is a unique ecosystem in which water-filled sinkholes, locally known as cenotes, connect subterranean waters with the surface. This system is home to around 20 species of freshwater fishes, including several that are endemic and/or threatened. Studies on this unique ichthyofauna have been partially hampered by the technical difficulties associated with sampling these habitats, particularly submerged caves. In this proof-of-concept study, we use environmental DNA (eDNA) metabarcoding to survey the diversity of freshwater fishes associated with the YP karst aquifer by sampling six cenotes from across the Ring of Cenotes region in northwestern Yucatán, a 180-km-diameter semicircular band of abundant karst sinkholes. Through a combination of conventional sampling (direct observation, fishing) and eDNA metabarcoding, we detected eight species of freshwater fishes across the six sampled cenotes. Overall, our eDNA metabarcoding approach was effective at detecting the presence of fishes from cenote water samples, including one of the two endemic cave-dwelling fish species restricted to the subterranean section of the aquifer. Although our study was focused on detecting fishes via eDNA, we also recov ered DNA from several other vertebrate groups, particularly bats. These results suggest that the eDNA metabarcoding approach represents a promising and largely noninvasive method to assay aquatic biodiversity in these vulnerable habitats, allowing more effective, frequent, and wide-ranging surveys. Our detection of DNA from aerial and terrestrial vertebrate fauna implies that eDNA from cenotes, besides being a means to survey aquatic fauna, may also offer an effective way to quickly survey non-aquatic biodi versity associated with these persistent water bodies.


Introduction
The Neotropical realm, including Central and South America, is home to most of the world's freshwater fish species diversity (Lévêque et al. 2008;Albert and Reis 2011), as well as functional diversity (Toussaint et al. 2016). However, information about the distribution patterns of this extraordinary diversity across freshwater ecosystems in these regions remains poorly documented, hampering efforts to manage and conserve important habitats. This problem is particularly acute for isolated aquatic systems that are both difficult to sample and of-ten harbor endemic species, such as water-filled sinkholes that form in karstic landscapes.
The Yucatán Peninsula (YP) in southeastern Mexico contains one of the largest karstic aquifers on the planet (Bauer-Gottwein et al. 2011), which is characterized by the presence of thousands of water-filled sinkholeslocally known as cenotes-that connect the surface to an extensive network of submerged caves, including the world's longest known underwater cave system, Sistema Sac Actun (Coke IV 2019). This remarkable system is home to unique communities of freshwater and anchialine aquatic fauna, including several endemic species.
However, biodiversity research in cenotes beyond alpha taxonomy studies (i.e., species descriptions, first records) is lagging, and only recently has research been conducted on broader ecological and evolutionary questions (Barrientos-Villalobos and Schmitter-Soto 2019; Benítez et al. 2019;Chávez-Solís et al. 2020;Arroyave et al. 2021;Macario-González et al. 2021). Documenting diversity in cenotes is also important due to ongoing anthropogenic threats to the YP karst aquifer, including pollution (from solid waste as well as from inadequate sewage treatment and wastewater disposal), increased groundwater extraction (for urban uses, tourism, and livestock uses), saline intrusion (Ceballos et al. 2016;Kane 2016;Deng et al. 2017;Saint-Loup et al. 2018), and more recently, the construction of the mega infrastructure work Tren Maya Railway Project, a 1,525 km-long railway line that will interconnect the major cities and tourist regions of the Yucatán Peninsula in Mexico, which, according to some critics, is undergoing construction without proper environmental impact studies (Abi-Habib 2022). Another recent development involves increased tourism pressure in cenotes driven by large deposits of sargassum along Yucatán beaches (thought to be climate-related), which has resulted in increased pollution in nearby cenotes as tourists avoid the seaweed-covered beaches (Casas-Beltrán et al. 2020).
Beyond some fragmentary information about composition and distribution, knowledge of the ichthyofauna from cenotes of the YP is very limited (Schmitter-Soto 1999). While a comprehensive taxonomic synthesis and compendium (checklist) of the fishes from cenotes and submerged caves of the YP is currently lacking, a review of relevant literature suggests that around 20 species freshwater fishes (between primary, secondary, and vicarious) inhabit this system, four of which are endemic (Schmitter-Soto 1999, 2007, 2020Schmitter-Soto et al. 2002;Miller 2005;Chumba Segura and Barrientos Medina 2010;Camargo-Guerra et al. 2013;Arroyave, 2022). Such relatively low species diversity is partially explained by the system's very young geologic age (<15 ka), which effectively postdates the Last Glacial Maximum (Macario-González et al. 2021). Notably, although cenotes and caves are effectively a continuum, most species of freshwater fishes that inhabit the YP karst aquifer appear to be restricted to the cenote pool, rarely venturing into the cavern or cave zones (Arroyave et al. 2021). Only the Mexican blind brotula, Typhlias pearsei (Ophidiiformes: Dinematichthyidae), and the blind swamp eel, Ophisternon infernale (Synbranchiformes: Synbranchidae), are known to permanently reside in the aphotic zone of the aquifer, being found only in either partially or completely submerged caves (Arroyave et al. 2019b(Arroyave et al. , 2019a.
Our understanding of fish communities associated with cenotes has been hindered by the challenges inherent in sampling these habitats, particularly submerged caves. Collecting fishes from cenotes can be labor-intensive and technically challenging, and it requires the use of a variety of fishing gear and techniques aimed at effectively sampling a spectrum of microhabitats, ecologies, and behaviors. Cave-dwelling species can only be sampled by means of highly technical and specialized cave-diving techniques coupled with superb collecting skills. In addition, these conventional sampling methods run the risk of disturbing or damaging these fragile ecosystems and potentially causing unintended mortality of captured individuals.
Environmental DNA (eDNA) has quickly become a widely utilized method for surveying aquatic diversity without the need for capturing animals or observing them directly Thomsen and Willerslev 2015;Keck et al. 2017). This method assesses DNA in the water column or sediment that has been shed via feces, sloughing, gametes or other means, in order to determine species presence and diversity in a noninvasive manner. The method is therefore particularly advantageous for sampling species that are difficult to collect by means of conventional methods, and for detecting the presence of exotic and invasive species (e.g. Darling and Mahon 2011;Sepulveda et al. 2020). Collecting water samples for eDNA analysis is straightforward and does not require specialized equipment, making it suitable for sampling in remote or difficult-to-access areas. Moreover, metabarcoding methods have been shown to be as sensitive, or better than, conventional survey techniques at capturing fish diversity across a variety of habitats including lakes and coral reefs, at lower overall cost (Darling and Mahon 2011;Dejean et al. 2012;Shaw et al. 2016;Sard et al. 2019). Another advantage of eDNA metabarcoding is that the same samples can potentially provide information across multiple groups of organisms using different primer sets (for example, primers designed specifically for microbes or fishes).
The goal of this study was to test the ability of eDNA metabarcoding to detect fish species diversity in cenotes of the Yucatán peninsula karst aquifer. Specifically, we sought to assess 1) whether a simplified method of filtration and preservation of filters could be used to successfully retrieve and preserve DNA from water samples in tropical field conditions, and 2) the extent to which eDNA results match species lists generated from conventional methods of survey and capture in the same locations.

Sampling sites, water collection and filtration
To detect eDNA, we collected water samples from six cenotes across the Ring of Cenotes region in northwestern Yucatán, a 180-km-diameter semicircular band of abundant karst sinkholes (Figs 1, 2). Sites were chosen to represent a variety of different cenote types (Hall 1936), from the open type with a large area exposed on the surface and extensive allochthonous input, to closed, cave-like cenotes, located within dry caves. We took water samples prior to any other activity in the water (e.g., scuba diving, snorkeling, fishing). At each site, a total of three liters of water was collected by submerging sterile polypropylene 1L Nalgene bottles just below the water's surface. Bottles were placed in sterile bags to prevent cross-sample contamination and placed in a cooler with ice until filtration (<12 hours from collection). We also recorded presence of fish species using conventional sampling methods both by direct observation (snorkeling, scuba diving) and by specimen extraction/fishing using minnow traps, gill nets, seine nets, hooks and lines, and dip nets while cave-diving. Conventional sampling (CS) methods were not employed as part of a standardized biodiversity survey but rather as part of field collections in the context of broader research on comparative phylogeography of cenote fishes (Arroyave et al. 2021).
To filter DNA from water samples, we used a Millipore filtration system and vacuum pump with 0.45um glass filters (Whatman). Where possible, we filtered a single 1L bottle through a single filter; some 1L samples required two filters due to higher turbidity. Filters were folded using sterile forceps with the exposed filtration surface on the inside and placed in sterile Ziploc bags containing 10-12 grams of desiccant (Multisorb, Sigma-Aldritch) and were kept at room temperature until DNA extraction at the end of the fieldwork (2-6 days following filtration).

DNA extraction, library preparation and sequencing
DNA extraction was performed in a dedicated hood that had been thoroughly sterilized, in a lab where no research had ever been performed on the expected species. All reagents, pipettors, and all plastic supplies were sterilized prior to use by autoclaving and UV treatment in a Crosslinker. Only filtered tips were used. We extracted total DNA from filters using Qiagen PowerWater DNA extraction kits, including an extraction blank. Each filter was extracted separately. Following extraction, DNA from the same sites was pooled together in single tubes. DNA quantity was assessed using a Qubit 3.0 fluorometer (High Sensitivity kit).
We amplified a ~110bp fragment of the 12S marker using Ecoprimers (Riaz et al. 2011). Primers were designed with Illumina adapter overhang nucleotide sequences (in italics): Forward: 5'-TCG TCG GCA GCG TCA GAT GTG TAT AAG AGA CAG ACT GGG ATT AGA TAC CCC -3'. Reverse: 5'-GTC TCG TGG GCT CGG AGA TGT GTA TAA GAG ACA GTA GAA CAG GCT CCT CTA G -3'. Reactions were performed in triplicate with each reaction containing 12.5 ul PCR master mix, 0.5ul each primer (at initial concentration 10uM), 7.5ul PCR-grade water, and 4ul purified DNA, and the following cycling conditions: 95 °C for 3 min, 35 cycles of 95 °C for 30 s, 52 °C for 30 s, 72 °C for  Negative controls were included in each PCR replicate, and extraction blanks were also included in PCRs alongside extracted DNAs. Amplicons were visualized on an agarose gel and successful amplifications were cleaned using Ampure XP beads and prepared for sequencing on an Illumina MiSeq (V2 kit) using the AmpliconEZ protocol (Genewiz, Inc, Plainview NJ). No amplifications were detected for any negative controls and therefore these reactions were not sequenced. and underwater view from the cavern zone (c)), cenote X'baba (open type) (surface view (d) and underwater view from the cavern zone (e)), cenote Polaban (cavern type) (view from the outside (f) and showing stairway into the water below (g)), cenote Santa María (well type) (h), cenote Xel Aktun (open type) (i), and cenote Ebis (well type) (showing ladder into the water below (j)).

Generation of reference sequence data for 12S
We compiled a list of freshwater fish species with documented presence in cenotes and submerged caves of the northwestern YP and therefore expected to potentially occur in the sampled localities (Table 1). Evidence of species occurrence was primarily based on vouchered specimens collected in the system and region by JA since 2017-including those sampled during the fieldwork component of this study (Table 2)-but also based on reports in the literature (Hubbs 1936;Schmitter-Soto 1998, 2020Chumba Segura and Barrientos Medina 2010;Camargo-Guerra et al. 2013;Vega-Cendejas et al. 2013). After compiling this species list, we searched for 12S sequence data from those species in the GenBank/NCBI nucleotide database (https://www.ncbi.nlm.nih.gov/nuccore). Additional 12S sequences were generated as part of this study from available tissue samples (Table 1), in a different lab facility and institution than those where environmental DNA samples were processed. Total genomic DNA was extracted from fin clips using the Qiagen DNeasy Tissue Extraction Kit following the manufacturer's protocol. Amplification and sequencing of 12S was carried out using the primer pairs 12S229F (5′-GYCGGTAAAAYTCGTGCCAG-3′) and 12S954R (5′-YCCAAGYGCACCTTCCGGTA-3′) (Li and Ortí 2007), using the following PCR thermal profile: initial denaturation at 95 °C for 3 mins, followed by 32 cycles of denaturation at 94 °C for 30 s, annealing at 57 °C for 30 s, and extension at 72 °C for 60 s, followed by a 5-min final extension at 72 °C. DNA sequencing was carried out at Laboratorio de Secuenciación Genómica de la Biodiversidad y de la Salud (Instituto de Biología, UNAM), in-house Sanger sequencing facilities. Contig assemblage and sequence editing was performed using Geneious Prime 2020.0.4 (https://www.geneious.com).
With the exception of Astyanax altior, we were able to retrieve and/or generate 12S data from all freshwater fish species with documented presence in the study region/ system (i.e., northwestern section of the YP karst aquifer). 12S reference sequence data from Astyanax aeneus (BK013055) was used in lieu of A. altior for the purpose of detecting the latter in cenotes from our eDNA data due to their phylogenetic affinity, as A. altior was described from a YP lineage of A. aeneus sensu lato (Ornelas-García et al. 2008;Schmitter-Soto 2017). Our augmented 12S reference dataset includes 14 newly generated sequences, including eight from species not previously represented in GenBank for this marker (Table 1).

Bioinformatics and data analysis
Following sequencing, we trimmed, filtered and demultiplexed data using the OBITools pipeline Valentini et al. 2016). Briefly, after trimming adapters, we filtered out sequences <20bp and those with fewer than 10 occurrences in the dataset. OBITools program obiuniq was used for clustering identical samples. Additional filtering removed sequences with a frequency of <0.002 per taxon and library and obvious contaminants (including human and ungulate DNA). We first assigned taxonomy using OBITools program ecotag with NCBI Reference Sequence Release 223, in order to identify Table 1. List of freshwater (primary, secondary, and vicarious) fish species with documented presence in cenotes and submerged caves of the northwestern YP and therefore expected to potentially occur in the sampled cenotes, including GenBank accession numbers of 12S sequences used and generated in this study and their corresponding sources and voucher specimens when available. *Sequence from A. aeneus in lieu of A. altior. Molecular Operational Taxonomic Units (MOTUs) to species (99-100% identity), genus (90-98% identity) or family (80-90% identity). For teleost fish sequences that could not be identified to species using NCBI as a reference, we compared them with 12S sequences generated as part of this project (see above) by aligning them in Geneious Prime and assessing percent identity across the length of the 12S sequence and assigning them to species or genus based on the same thresholds shown above.

Overall diversity of fishes and matches with reference databases
The number of raw reads, reads after quality filtering, number of OTUs, and OTUs assigned to species (inclusive of non-fishes identified) resulting from the initial data processing are presented in Suppl. material 1. The complete results from the ecotag analysis, including the initially assigned taxonomy, are presented as Suppl. material 2. A comparison of species detected via conventional sampling (CS) and indirect detection via eDNA metabarcoding (eDNA) shows both overlap and some discrepancies (Table 2). Overall, all cenotes yielded fish DNA, and initial DNA concentrations and sequencing read counts averaged 3.1 ng/ul and 69,913 per site, respectively (Suppl. material 1). Because NCBI/ GenBank did not contain 12S sequences for several fish species with documented presence in the YP aquifer, we aligned unknown sequences with new 12S sequences generated as part of this study, allowing us to confirm species identity for some sequences that had imperfect or no matches with available GenBank data (Table 2). Of the fish species detected via eDNA metabarcoding, all are known to occur in cenotes of the YP. However, only a small fraction of the known freshwater fish diversity from the system/region was detected by either or both sampling methods (i.e., CS, eDNA). Of the 20 native fish species that have been documented in cenotes in the Yucatan (Arroyave 2022), only eight were detected across the sampled cenotes (  Table 2). The number of fish species detected at each individual cenote ranged from two (Ebis, Polabán) to seven (Xel Aktun) when considering both CS and eDNA approaches.
Though our study was focused on detecting fishes via eDNA, as a result of using vertebrate-specific primers for amplification of 12S we also recovered DNA from several other vertebrate groups (mammals, birds, and amphibians), particularly bats (with at least eight species) (Table 2, Suppl. material 1). Bat DNA was detected in four of the six sampled cenotes and represented the most abundant (with regard to read number) non-human vertebrate DNA at two of the six cenotes (Ebis (100%) and Santa Maria (96.6%)). Notably, most non-fish taxa detected in our eDNA survey are non-aquatic vertebrates ( Table 2).

Discussion
As a pilot, proof-of-concept study, this investigation resulted in the successful preservation, extraction and sequencing of eDNA from cenotes of the YP for the detection of fish diversity present in these habitats. Collection and preservation occurred in hot and humid field conditions, possibly resulting in low DNA yields for some samples (Suppl. material 1). Other possible causes of low yield include poor DNA preservation or low productivity/densities of populations. Despite these challenges, our approach proved useful for detecting fish and other vertebrates associated with this karst aquifer. Although the number of fish species detected at each cenote (via both CS and eDNA approaches) may appear low (2-7 species), this pattern of local fish diversity is not necessarily uncommon for cenotes of the YP. Whereas regional diversity estimates are moderate (~20 spp.), local diversity at the individual cenote level tends to be much lower (< 10 spp.) (Arroyave, pers. obs.). The reasons for this, however, are yet to be understood, as patterns of cenote fish community assemblage have not been systematically investigated.
Notably, eDNA was successful at detecting most fish species sampled via CS methods, with the blind swamp eel, O. infernale, being the only exception ( Table 2). The inability of our eDNA approach to detect O. infernale in the sampled cenotes could be related to the fact that this troglobitic species, contrary to the other cave-dwelling fish inhabiting the system (T. pearsei), is extremely rare, cryptic and benthic in nature, and reportedly with very small population sizes (Arroyave et al. 2019a), thus likely to shed less DNA in the water column when compared to pelagic and more abundant species. We investigated the possibility that mismatches between the 12S primers and sequence in O. infernale were driving our inability to detect the species via eDNA, but we found no such mismatches using a recently published complete mitochondrial genome for the species ). The relatively consistent detection via eDNA of the cavefish T. pearsei (Table 2) is particularly promising and demonstrates that water samples taken at the cenote surface and in the photic zone (particularly the case of cenote Xel Aktun; Ebis and Santa Maria being more cavernous cenotes) may be useful for detecting DNA from obligate cave-dwelling species. Overall, our results offer validation to the eDNA approach as a promising and effective method to assay aquatic biodiversity in cenotes and their associated submerged caves, permitting more frequent, and wide-ranging surveys.
Our detection of DNA from aerial and terrestrial vertebrate fauna such as bats, birds, marsupials, and canids, indicates that eDNA from water in cenotes, besides being a means to survey aquatic fauna, may also offer a useful method for quickly surveying diversity across cave sites with persistent water bodies, particularly in the case of bats, which reside in caves overhanging karstic pools. Our study detected a surprising diversity of bats (including six genera) from water samples and suggests that future work to establish eDNA as a method for assaying bat diversity could be worthwhile, given the difficulties and potential hazards involved in sampling bats directly. Likewise, the identification of additional species of mammals, birds and amphibians using the cenotes may facilitate a more complete picture of their role in local ecosystems. Our study adds to a growing number of metabarcoding studies that have documented the detection of vertebrate "bycatch", offering insights into non-fish vertebrate communities associated with aquatic habitats, and highlighting the potential of aquatic eDNA samples to characterize non-aquatic communities (Macher et al. 2021;Mariani et al. 2021;Duarte Ritter et al. 2022). Table 2. Taxonomic identifications from cenotes studied by conventional sampling (CS) (i.e., direct observation and/or fishing) and indirect detection via eDNA metabarcoding (eDNA). ND = not detected. Taxonomic identifications based on eDNA show the percent identity followed by the GenBank accession number for the best match on NCBI, including reference sequences generated as part of this study. *100% match with A. aeneus, A. mexicanus. A. altior sequence unavailable. **Match to multiple species within the genus; where one species is listed, others were eliminated based on known range. Because cenotes and the YP karst aquifer as a whole are subject to increasing anthropogenic activities, including pollution, water withdrawal, saline intrusion, and in some areas, foot traffic and infrastructure development (Lopez-Maldonado and Berkes 2017), frequent monitoring of these systems is critical for tracking the response of biological communities to these impacts. Several species of the fishes from cenotes are already threatened or near-threatened according to IUCN extinction risk assessments (Arroyave et al. 2019a, 2019b. Endemic species are particularly vulnerable to cenote degradation, given the lack of suitable alternative habitats. Because eDNA monitoring can be employed with greater regularity and across more sampling sites than conventional sampling, it is an excellent complementary method for gauging changes in community composition and diversity (Thomsen et al. 2012;Beng and Corlett 2020). Surveying for rare endemic species can be challenging using conventional methods but use of both eDNA metabarcoding as well as speciesspecific quantitative PCR for elusive endemic species such as O. infernale could yield important information about the distribution and population trajectories of such taxa. In addition, because eDNA surveys can be deployed quickly, they have also proven useful for early detections of nonnative or invasive species (Takahara et al. 2013;Mahon and Jerde 2016). Despite both anecdotal and documented accounts of introduced and invasive genera of freshwater fishes such as Oreochromis spp. (African tilapias) and Pterygoplichthys spp. (South American armored catfishes) in YP cenotes, we did not detect the presence of those or any other non-native fish species in our study.
Several previous studies have conducted explicitly quantitative assessments and comparisons of the performance of eDNA vs. CS methods in detecting species (e.g., McElroy et al. 2020;Czeglédi et al. 2021;Fediajevaite et al. 2021), based on approaches such as Lin's concordance correlation coefficient (CCC) (Lawrence and Lin 1989) or Bland-Altman analyses (Bland and Altman 1986) as a way to calibrate eDNA metabarcoding to conventional surveys. While we had insufficient data to conduct such comparative analyses in our study due to the small number of sites and the lack of standardized CS methods, examination of the relative contributions of each mechanism of detection (eDNA only, CS only, a combination of both) in our results (Table 2), suggests that eDNA outperforms CS in the detection of fishes from cenotes. Experimental designs based on standardized sampling methods should be considered in future studies for meaningful comparisons between eDNA and conventional surveys.
Collection of eDNA samples and data may directly aid conservation and enforcement efforts in regions that show impacts of anthropogenic activities on fishes and other species. While we focused on fishes in this study, one of the great advantages of this method is that the same samples can be used to evaluate species diversity across a breadth of taxa, including bacteria, zooplank-ton, and invertebrates. In consideration of our findings, we recommend that future research and applied work (by conservation and biodiversity agencies, for instance) focus on the implementation of eDNA as a tool to supplement biomonitoring, and to develop a consistent and standardized plan for sample collection and analyses in these ecosystems. Such efforts should include consideration of practical standards for field and laboratory procedures (McGee et al. 2019;Weigand et al. 2019) including decisions related to substrate (water, sediment, biofilms, etc.), filtration protocols, marker selection, protocols for sample storage and prevention of cross-sample contamination, and data analysis and storage.

Conclusions and future work
This proof-of-concept study demonstrates that eDNA metabarcoding is an effective and promising approach for assaying fish diversity in karstic aquifers and similar freshwater habitats. Furthermore, this study lays the groundwork for more comprehensive studies aimed at investigating patterns of distribution and abundance in fishes from cenotes and submerged caves. Additional sampling across seasons and more sites/replicates, as well as the use of alternative metabarcoding primers (e.g., MiFish (Miya et al. 2015)), will yield a richer picture of patterns of diversity in these ecosystems. Sampling different habitats within cenotes, for example from submerged caves, is also likely to be important to detecting cave endemics. As the potential for assessing patterns of population structure and gene flow within species using eDNA has emerged recently (Sigsgaard et al. 2016), this raises the possibility of using this method to investigate long-standing questions about connectivity between cenotes, a subject that has only recently begun to be addressed using comparative genetic data (Arroyave et al. 2021). While eDNA analysis is not a replacement for standard monitoring, our results indicate that it has great potential to contribute to our understanding of biological communities in cenotes and their response to changing anthropogenic pressures. grant (PAPIIT IA200517) and by the Consejo Nacional de Ciencia y Tecnología (CONACyT) through a "Ciencia Básica" grant (A1-S-28293).
Reference barcodes generated for 12S as part of this study have been deposited in the NCBI database (Accession #s ON364117-ON364132). Illumina (metabarcoding) files have been deposited in the NCBI Short Read Archive (BioProject ID PRJNA874083).