Data Paper
Corresponding author: Eugenia Naro-Maciel (enmaciel@nyu.edu). Academic editor: Florian Leese
© 2022 Eugenia Naro-Maciel, Melissa R. Ingala, Irena E. Werner, Brendan N. Reid, Allison M. Fitzgerald.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Naro-Maciel E, Ingala MR, Werner IE, Reid BN, Fitzgerald AM (2022) COI amplicon sequence data of environmental DNA collected from the Bronx River Estuary, New York City. Metabarcoding and Metagenomics 6: e80139. https://doi.org/10.3897/mbmg.6.80139
In this data paper, we describe environmental DNA (eDNA) cytochrome c oxidase (COI) amplicon sequence data from New York City’s Bronx River Estuary. As urban systems continue to expand, describing and monitoring their biodiversity is increasingly important for sustainability. Once polluted and overexploited, New York City’s Bronx River Estuary is undergoing revitalization and restoration. To investigate and characterize the area’s diversity, we collected and sequenced river sediment and surface water samples from Hunts Point Riverside and Soundview Parks (ntotal = 48; nsediment = 25; nwater = 23). COI analysis using universal primers mlCOIintF and jgHCO2198 detected 27,328 Amplicon Sequence Variants (ASVs) from 7,653,541 sequences, and rarefaction curves reached asymptotes, indicating sufficient sampling depth. Of these, eukaryotes represented 9,841 ASVs from 3,562,254 sequences. At the study sites over the sampling period, community composition varied by substrate (river sediment versus surface water) and with water temperature, but not pH. The three most common phyla were Bacillariophyta (diatoms), Annelida (segmented worms), and Ochrophyta (e.g. brown and golden algae). Of the eukaryotic ASVs, we identified 614 (6.2%) to species level, including several dinoflagellates linked to Harmful Algal Blooms such as Heterocapsa spp., as well as the invasive amphipod Grandidierella japonica. The analysis detected common bivalves including blue (Mytilus edulis) and ribbed (Geukensia demissa) mussels, as well as soft-shell clams (Mya arenaria), in addition to Eastern oysters (Crassostrea virginica) that are being reintroduced to the area. Fish species undergoing restoration such as river herring (Alosa pseudoharengus, A. aestivalis) were not detected, although relatively common fish including Atlantic silversides (Menidia menidia), menhaden (Brevoortia tyrannus), striped bass (Morone saxatilis), and mummichogs (Fundulus heteroclitus) were found. 
The data highlight the utility of eDNA metabarcoding for analyzing urban estuarine biodiversity and provide a baseline for future work in the area.
eDNA, MEGAN, metabarcoding, next-generation sequencing, QIIME2, river sediment, river water, urban ecology
Urbanization is increasingly disrupting ecological layouts of cities and their surroundings (
Despite having one of the world’s largest human populations and containing several key habitats such as coastal ecosystems and forests, New York City’s wildlife areas remain insufficiently characterized (
The Estuary Section of the Bronx River Watershed (Fig.
Location of the Hunts Point Riverside and Soundview Park study sites in the Bronx River Estuary (New York City, USA). Samples from each park were collected within 0.2 km. The inset shows the location of the study site (boxed) within the greater New York City metropolitan area. Map data © 2019 Google.
To appropriately characterize and manage such a complex and impacted system, biodiversity inventories and monitoring are key first steps, starting with the correct identification of organisms. Locally in the Bronx and around the world, this has traditionally been achieved through manual surveys requiring organismal capture and/or collection. While providing important information, these methods are potentially labor-intensive and costly, require specific taxonomic expertise, may fail to detect cryptic, microscopic, or elusive taxa, and could provide incorrect or incomplete information. Environmental DNA (eDNA), or DNA sequenced directly from a substrate such as water, sediment, or air, is a flourishing new, non-invasive, rapid, and standardized technology that addresses some of these shortcomings and provides extensive genetic information useful for identifying species through next-generation sequencing (
Biodiversity characterization and monitoring have substantially benefitted from the high quality next-generation bioinformatics pipelines now available to accurately analyze genetic markers with rapidly growing reference databases (
Here we expand our analysis with new COI sequences amplified from the previously analyzed environmental samples (n = 48). In traditional single-species barcoding, COI has been the standard marker for animals due to its conserved priming regions and informatively variable target segment (
We sampled benthic sediments and surface waters at Hunts Point (HP, 40.82°N, 73.88°W; nsediment = 9; nwater = 8) and Soundview (SVP, 40.81°N, 73.87°W) Parks (Fig.
We processed and extracted DNA from these environmental samples within 24 hours as previously described (
A commercial laboratory performed the polymerase chain reaction, clean-up, and sequencing procedures (MRDNA, Molecular Research LP, Shallowater, TX, USA) using previously described industry-standard procedures and controls (
We used the FASTQ Processor to extract indexes and sort forward and reverse reads (
Summary of COI sample data. Sample ID and statistics on the recovery of reads per sample after filtering, denoising, merging, and chimeric sequence removal are displayed, along with the index sequence and sequencing batch. The linker primer sequence for all samples was GGWACWGGWTGAACWGTWTAYCCYCC.
Sample | Index Sequence | input | Filtered | % input passed filter | Denoised | Merged | % of input merged | Non-chimeric | % of input non-chimeric | Batch |
---|---|---|---|---|---|---|---|---|---|---|
S.B.BRC | AATGCAGG | 404307 | 378045 | 93.5 | 363757 | 322683 | 79.81 | 305828 | 75.64 | 3 |
S.B.BRO | AATGCTAT | 224552 | 171248 | 76.26 | 169030 | 164525 | 73.27 | 160883 | 71.65 | 1 |
S.B.HP | AATGCGAC | 264262 | 197507 | 74.74 | 194996 | 187122 | 70.81 | 180612 | 68.35 | 1 |
S.C.BRC | AATGCCGT | 446701 | 419281 | 93.86 | 403346 | 359484 | 80.48 | 339110 | 75.91 | 3 |
S.C.BRO | AATTAAGC | 230633 | 172908 | 74.97 | 169692 | 163307 | 70.81 | 157334 | 68.22 | 1 |
S.C.HP | AATGTTCG | 232082 | 180166 | 77.63 | 176501 | 168019 | 72.4 | 165625 | 71.36 | 1 |
S.D.BRC | AATGCGAC | 357594 | 335024 | 93.69 | 320807 | 284262 | 79.49 | 267806 | 74.89 | 3 |
S.D.BRO | AATTATGT | 202121 | 154403 | 76.39 | 150533 | 144088 | 71.29 | 142652 | 70.58 | 1 |
S.D.HP | AATTATAA | 200934 | 150589 | 74.94 | 147721 | 141728 | 70.53 | 138902 | 69.13 | 1 |
S.E.BRC16 | AATCTATT | 292305 | 179623 | 61.45 | 161324 | 140460 | 48.05 | 119493 | 40.88 | 2 |
S.E.BRO16 | AATTTAGG | 261540 | 203142 | 77.67 | 199995 | 191698 | 73.3 | 174901 | 66.87 | 1 |
S.E.HP16 | AATTCTCA | 225186 | 170786 | 75.84 | 166593 | 158551 | 70.41 | 150573 | 66.87 | 1 |
S.F.BRC16 | AATGAGCA | 156813 | 101302 | 64.6 | 89909 | 71922 | 45.86 | 64045 | 40.84 | 2 |
S.F.BRO16 | AATTTCTA | 212778 | 160538 | 75.45 | 158567 | 153911 | 72.33 | 145613 | 68.43 | 1 |
S.F.HP16 | ACAAGGCC | 239942 | 181575 | 75.67 | 179036 | 169157 | 70.5 | 161451 | 67.29 | 1 |
S.G.BRC16 | AATGCAGG | 147928 | 95165 | 64.33 | 85196 | 68675 | 46.42 | 60917 | 41.18 | 2 |
S.G.BRO16 | ACAATAGA | 212958 | 167753 | 78.77 | 165091 | 159365 | 74.83 | 154835 | 72.71 | 1 |
S.G.HP16 | ACAATCTG | 261294 | 197228 | 75.48 | 193922 | 185907 | 71.15 | 180603 | 69.12 | 1 |
S.H.BRC16 | AATGCCGT | 176614 | 111803 | 63.3 | 100290 | 81044 | 45.89 | 71905 | 40.71 | 2 |
S.H.BRO16 | ACAATTCG | 190999 | 147290 | 77.12 | 143306 | 135326 | 70.85 | 133829 | 70.07 | 1 |
S.H.HP16 | ACACAAAT | 199858 | 153767 | 76.94 | 150265 | 142697 | 71.4 | 139658 | 69.88 | 1 |
S.I.BRC16 | AATGCGAC | 157544 | 100475 | 63.78 | 90631 | 73741 | 46.81 | 62025 | 39.37 | 2 |
S.I.BRO16 | ACACAGCG | 180039 | 138230 | 76.78 | 134493 | 127002 | 70.54 | 126083 | 70.03 | 1 |
S.I.HP16 | ACACAGGT | 251257 | 191589 | 76.25 | 188477 | 179857 | 71.58 | 176210 | 70.13 | 1 |
S.J.HP16 | ACACCCAG | 296667 | 220466 | 74.31 | 217602 | 210582 | 70.98 | 202212 | 68.16 | 1 |
W.B.BRC | AATCTATT | 539622 | 501610 | 92.96 | 493535 | 470360 | 87.16 | 448905 | 83.19 | 3 |
W.B.BRO | AATGAGCA | 246358 | 188743 | 76.61 | 185575 | 175281 | 71.15 | 168676 | 68.47 | 1 |
W.B.HP | AATCTATT | 219499 | 173898 | 79.22 | 171749 | 163737 | 74.6 | 157321 | 71.67 | 1 |
W.D.BRC | AATGAGCA | 497087 | 462883 | 93.12 | 454574 | 427476 | 86 | 403036 | 81.08 | 3 |
W.D.BRO | AATGCAGG | 233173 | 178523 | 76.56 | 174353 | 164570 | 70.58 | 160054 | 68.64 | 1 |
W.D.HP | AATGCCGT | 163774 | 123746 | 75.56 | 115874 | 105270 | 64.28 | 101538 | 62 | 1 |
W.E.BRC16 | AATGCTAT | 215382 | 153878 | 71.44 | 134734 | 124968 | 58.02 | 98006 | 45.5 | 2 |
W.E.BRO16 | ACACCGGT | 234991 | 184536 | 78.53 | 181026 | 171483 | 72.97 | 165766 | 70.54 | 1 |
W.E.HP16 | ACACCGAG | 180305 | 137446 | 76.23 | 133047 | 128092 | 71.04 | 124416 | 69 | 1 |
W.F.BRC16 | AATGTTCG | 264274 | 186824 | 70.69 | 164570 | 153668 | 58.15 | 116958 | 44.26 | 2 |
W.F.BRO16 | ACAGCGTC | 186549 | 140451 | 75.29 | 137606 | 130972 | 70.21 | 127594 | 68.4 | 1 |
W.F.HP16 | ACAGCACC | 173939 | 129442 | 74.42 | 126420 | 120337 | 69.18 | 116456 | 66.95 | 1 |
W.G.BRC16 | AATTAAGC | 213600 | 150028 | 70.24 | 131103 | 121232 | 56.76 | 89620 | 41.96 | 2 |
W.G.BRO16 | ACAGGGAT | 167412 | 126516 | 75.57 | 122513 | 114505 | 68.4 | 109860 | 65.62 | 1 |
W.G.HP16 | ACAGTCGT | 233719 | 172803 | 73.94 | 168503 | 159402 | 68.2 | 145463 | 62.24 | 1 |
W.H.BRC16 | AATTATAA | 245212 | 174443 | 71.14 | 152455 | 141996 | 57.91 | 109579 | 44.69 | 2 |
W.H.BRO16 | ACAGTTAG | 215300 | 163339 | 75.87 | 159885 | 148340 | 68.9 | 142294 | 66.09 | 1 |
W.H.HP16 | ACAGTTGC | 236955 | 181379 | 76.55 | 178385 | 168769 | 71.22 | 163511 | 69.01 | 1 |
W.I.BRC16 | AATTATGT | 246099 | 171387 | 69.64 | 149653 | 137933 | 56.05 | 105361 | 42.81 | 2 |
W.I.BRO16 | ACATGGCC | 229960 | 178352 | 77.56 | 175925 | 166024 | 72.2 | 159860 | 69.52 | 1 |
W.I.HP16 | ACATTCTC | 228584 | 177506 | 77.65 | 175016 | 166120 | 72.67 | 162118 | 70.92 | 1 |
W.J.HP16 | ACATTGAT | 210529 | 162533 | 77.2 | 159328 | 150060 | 71.28 | 146483 | 69.58 | 1 |
W.J.SVP16 | ACATTGTG | 214000 | 167062 | 78.07 | 162840 | 152186 | 71.11 | 147561 | 68.9 | 1 |
TOTALS | | 11623231 | | | | | | 7653541 | | |
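The percentage columns in the table above can be reproduced from the raw counts. The following is a minimal sketch in Python, not the authors' pipeline; the function name `percent_of_input` and the dictionary layout are illustrative assumptions. It uses the counts reported for sample S.B.BRC.

```python
# Read counts for sample S.B.BRC, copied from the table above.
# The dictionary layout is an assumption made for illustration.
stages = {
    "input": 404307,
    "filtered": 378045,
    "denoised": 363757,
    "merged": 322683,
    "non_chimeric": 305828,
}


def percent_of_input(stage_counts):
    """Express each stage's read count as a percentage of the input reads."""
    total = stage_counts["input"]
    return {name: round(100 * count / total, 2)
            for name, count in stage_counts.items()}


retained = percent_of_input(stages)
# Compare with the table columns "% input passed filter",
# "% of input merged", and "% of input non-chimeric".
print(retained["filtered"], retained["merged"], retained["non_chimeric"])
```

Applying the same function to each row makes the difference among sequencing batches easy to see: batch 2 samples retain roughly 40 to 45% of their input reads as non-chimeric sequences, whereas batch 1 and batch 3 samples retain roughly 62 to 83%.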
To assign taxonomic identity to ASVs, we searched the sequences against the NCBI database (downloaded 27 January 2022) using the blastn algorithm with default parameters in BLAST+ v.2.11.0 (
We used R v.4.0.0 (R Core Team 2021) as implemented in RStudio v. 1.4.1103 (
Next, we removed ASVs identified as Archaea (n = 1,185), Bacteria (n = 12,774), or Domain Unclassified (n = 3,526) from further analysis. We computed sequence abundance-based basic alpha diversity metrics (Observed ASVs, Shannon richness, Faith’s phylogenetic diversity, and Pielou’s evenness) using a combination of custom functions and commands from the BTOOLS v. 0.0.1 package (
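To make the abundance-based alpha diversity metrics concrete, the sketch below computes Observed ASVs, the Shannon index, and Pielou's evenness for a single sample in plain Python. It is an illustration under stated assumptions, not the authors' R code: the function names and the example counts are hypothetical, the natural logarithm is assumed, and Faith's phylogenetic diversity is omitted because it requires a phylogenetic tree.

```python
import math


def observed_asvs(counts):
    """Number of ASVs with at least one sequence in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over ASVs that are present."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S); defined here as 0.0 when fewer than two ASVs occur."""
    s = observed_asvs(counts)
    if s < 2:
        return 0.0
    return shannon(counts) / math.log(s)


# Hypothetical sequence counts per ASV for one sample (not from the dataset).
sample = [50, 30, 20, 0]
print(observed_asvs(sample), shannon(sample), pielou_evenness(sample))
```

A perfectly even sample, in which every ASV has the same count, gives an evenness of 1, while dominance by a few ASVs pushes the value toward 0.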
We then performed Principal Coordinates (PCoA) ordinations on the abundance-based Bray-Curtis distance matrix and visualized the results by plotting the ordination. 95% confidence ellipses for each site + sample type combination were produced using the stat_ellipse function in ggplot2. To test for turnover in beta diversity among sites and substrates, we performed a PERMANOVA (nperm = 1000) on the Bray-Curtis distance matrix. Because a key assumption of this test is homogeneity of dispersion, we assessed whether our samples met this condition by using the betadisper and permutest functions in VEGAN. We also tested for the effects of pH, surface water temperature, and year on community composition using a Canonical Correspondence Analysis (CCA) as implemented in VEGAN. Significance was assessed through ANOVA performed on the CCA matrix.
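The beta diversity test described above can be sketched as follows. This is a simplified, standard-library Python illustration of Bray-Curtis dissimilarity and a one-factor PERMANOVA (pseudo-F with label permutation), not the R implementation used in the study. The function names, the toy count matrix, and the group labels are all hypothetical.

```python
import random


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def distance_matrix(samples):
    """Square matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


def pseudo_f(dist, groups):
    """One-factor pseudo-F statistic computed from squared distances."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i, lab in enumerate(groups) if lab == g]
        ss_within += sum(dist[i][j] ** 2
                         for k, i in enumerate(idx)
                         for j in idx[k + 1:]) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))


def permanova(dist, groups, n_perm=1000, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical ASV counts (rows are samples), not taken from the dataset.
counts = [
    [10, 0, 5, 1], [8, 1, 6, 0], [9, 0, 4, 2],    # labelled "sediment"
    [0, 12, 1, 7], [1, 10, 0, 8], [0, 11, 2, 6],  # labelled "water"
]
groups = ["sediment"] * 3 + ["water"] * 3
f_stat, p_value = permanova(distance_matrix(counts), groups)
print(f_stat, p_value)
```

Because PERMANOVA can also respond to differences in within-group spread, a separate dispersion check such as the betadisper and permutest step described above remains necessary; this sketch does not implement it.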
A total of 48 environmental samples were successfully collected, sequenced, and analyzed for COI (nwater = 23; nsediment = 25). Following quality control and contaminant removal, 27,328 ASVs representing Archaea, Bacteria, and Eukarya were recovered from 7,653,541 sequences (Tables
COI sequence and ASV statistics of the Bronx River Estuary. Total or mean values across samples are reported and standard error is shown in parentheses.
Total samples | 48 |
Sample Sites | HP sediment (n = 9); HP water (n = 8); SVP sediment (n = 16); SVP water (n = 15) |
Total raw reads | 11,623,231 |
Total reads, passed filter | 7,653,541 |
Raw reads per sample (mean) | 242,151 (± 11,768) |
Reads per sample, passed filter (mean) | 159,449 (± 11,768) |
Percent reads passed filter | 64.1% |
Unique ASVs, pre-filter | 27,567 |
Unique ASVs, contaminants removed | 27,328 |
Total ASVs removed by DECONTAM | 239 |
We tested whether there were differences in eukaryotic community composition. There was no significant overall distinction among sites and substrates in phylogenetic diversity (Kruskal-Wallis p = 0.71; Fig.
COI diversity comparison between sediment and water samples from Hunts Point (HP) Riverside and Soundview (SVP) Parks. A) Faith’s Phylogenetic Diversity. Result of a global Kruskal-Wallis significance test is shown at the top of the plot. Letters indicate no groupings were significantly different from one another based on pairwise significance tests (p < 0.05). B) Principal Coordinates Analysis (PCoA) of Bray-Curtis distances. 95% confidence ellipses for each site + sample type combination were produced using the stat_ellipse function in ggplot2.
The analysis detected a variety of common organisms (
Eukaryotic species of special interest detected by COI from the Bronx River Estuary. C = Commonly observed; M = of Management Concern.
Class and Genus species | Common name | TYPE | COI |
---|---|---|---|
Actinopterygii | |||
Alosa pseudoharengus | River Herring/Alewife | M | |
Alosa aestivalis | River Herring | M | |
Ameiurus nebulosus | Brown bullhead | C | |
Ameiurus spp. | Bullhead catfish | C | ✓ |
Anguilla rostrata | American eel | M | ✓ |
Brevoortia tyrannus | Menhaden | C | ✓ |
Fundulus heteroclitus | Mummichog | C | ✓ |
Fundulus majalis | Striped Mummichog (or killifish) | C | |
Gobiesox strumosus | Skillet fish | C | |
Lepomis spp. | Sunfish | C | |
Menidia menidia | Atlantic silverside | C | ✓ |
Morone americana | White perch | C | ✓ |
Morone saxatilis | Striped bass | C, M | ✓ |
Perca flavescens | Yellow perch | C | ✓ |
Ascidiacea | |||
Botryllus schlosseri | Golden star tunicate | C | ✓ |
Molgula spp. | Sea grape | C | |
Perophora sagamiensis | Sea squirt | C | |
Aves | |||
Branta canadensis | Canada goose | C | |
Egretta spp. | Egrets, Herons | C | |
Larus spp. | Gulls | C | ✓ |
Bivalvia | |||
Crassostrea virginica | Eastern oyster | M | ✓ |
Euglesa casertana | Pea Clam | ||
Geukensia demissa | Ribbed mussel | C | ✓ |
Macoma petalum | Atlantic Macoma | | ✓ |
Mercenaria mercenaria | Hard or chowder clam | C | |
Mulinia lateralis | Dwarf surf clam | C | ✓ |
Mya arenaria | Soft-shell clam | C | ✓ |
Mytilus edulis | Blue mussel | C | ✓ |
Nucula proxima | Atlantic nut clam | | ✓ |
Petricolaria pholadiformis | False angelwing | C | ✓ |
Demospongiae | |||
Cliona spp. | Boring sponge | C | |
Halichondria panicea | Breadcrumb sponge | C | ✓ |
Dinophyceae | |||
Alexandrium spp. | HAB (potential) | M | |
Amphidinium carterae | HAB (potential) | M | |
Dinophysis sacculus | HAB (potential) | M | |
Gymnodinium spp. | HAB (potential) | M | |
Gyrodinium spp. | HAB (potential) | M | ✓ |
Heterocapsa rotundata | HAB (potential) | M | ✓ |
Heterocapsa triquetra | HAB (potential) | M | ✓ |
Heterocapsa spp. | HAB (potential) | M | ✓ |
Karlodinium sp. RS-24 | HAB (potential) | M | ✓ |
Margalefidinium polykrikoides | HAB (potential) | M | ✓ |
Gastropoda | |||
Corambe obscura | Obscure Corambe | | ✓ |
Crepidula fornicata | Common slipper snail | C | |
Ercolania fuscata | Sea Slug | | ✓ |
Tritia obsoleta (syn Ilyanassa obsoleta) | Eastern mudsnail | C | ✓ |
Urosalpinx cinerea | Oyster drill | C | |
Malacostraca | |||
Callinectes sapidus | Blue crab | C, M | |
Carcinus maenas | Green crab | C, M | |
Dyspanopeus sayi | Mud crab | C | |
Gammarus oceanicus | Scud amphipod | C | |
Grandidierella japonica | Invasive amphipod | M | ✓ |
Hemigrapsus sanguineus | Asian shore crab | C, M | |
Microdeutopus gryllotalpa | Slender tube maker | C | |
Pagurus longicarpus | Long-clawed hermit crab | C | |
Palaemonetes pugio | Common shore shrimp | C | |
Panopeus herbstii | Black fingered mud crab | C | |
Rhithropanopeus harrisii | White fingered mud crab | C | |
Mammalia | |||
Homo sapiens | Human | C | ✓ |
Ondatra zibethicus | Muskrat | C | |
Rattus norvegicus | Brown rat | C | ✓ |
Merostomata | |||
Limulus polyphemus | Horseshoe crab | C, M | ✓ |
Polychaeta | |||
Alitta succinea (syn Nereis succinea) | Clam worm | C | |
Amphitrite ornata | Ornate worm | | ✓ |
Capitella teleta | Thread worm | C | ✓ |
Glycera americana | Blood worm | | ✓ |
Lycastopsis pontica | Spring worm | C | |
Platynereis dumerilii | Dumeril’s clam worm | C | |
Streblospio benedicti | Ram's horn worm | C | ✓ |
Sequences from the diatom phylum Bacillariophyta were the most commonly detected at most sites and in the dataset overall (Fig.
Community profiles of eukaryotic COI Amplicon Sequence Variants (ASVs) in sediment and water samples from Hunts Point Riverside and Soundview Parks. Shown in temporal order of collection at the level of phylum; bar heights indicate relative abundance of sequences from each taxon.
Further, several key organisms being restored and monitored in the Bronx River, as well as commonly observed species, were detected (Table
All amplicon gene sequences from this study are posted on the NCBI Sequence Read Archive (SRA) under BioProject PRJNA606795. DNA extracts are stored at the American Museum of Natural History.
The authors declare no competing interests.
We are grateful to the New York University (NYU) Research Challenge Fund and NYU Liberal Studies New Faculty Scholarship and Creative Production Awards (to ENM), and to private donors through Experiment.com (to IW) for funding the research. Site access was provided by NY/NJ Baykeeper and the New York City Department of Parks and Recreation (Natural Resources Group). Our special thanks go to Michael Tessler for initial guidance, as well as student assistants Christian Bojorquez, NaVonna Turner, Sean Thomas, Jennifer Servis, Patrick Shea, Vanessa Van Deusen, and Seth Wollney. We are very thankful to two anonymous reviewers, Reviewer Lise Klunder, and our Subject Editor Florian Leese, whose helpful comments improved the manuscript.
Supplementary Data Files 1, 2
Data type: QIIME2 and R scripts
Explanation note: Scripts used for metabarcoding analysis. Document 1: QIIME2 workflow; Document 2: R script.
Tables S1, S2, Figures S1–S3
Data type: Figures and tables
Explanation note: Fig. S1. Sample-based species accumulation curves of COI Amplicon Sequence Variant (ASV) diversity by substrate type (sediment, water). Calculated using the VEGAN 2.4-3 package. Fig. S2. Eukaryotic alpha diversity comparison between sites and substrate types. Measured by COI for Observed ASVs, Shannon richness, and Pielou’s evenness. Results of a global Kruskal-Wallis significance test are shown at the top of each plot. Letters indicate groupings that were significantly different from one another based on pairwise significance tests (p < 0.05). Fig. S3. Canonical Correspondence Analysis indicating the influence of water temperature and pH on eukaryotic community composition inferred by COI. Study sites (Hunts Point (HP) and Soundview (SVP) Parks) and substrates (sediment, water) are shown as different shapes, and arrow lengths indicate the strength and direction of the influence. Table S1. Taxonomic Assignment including COI ASV identification to Domain, Kingdom, Phylum, Class, Order, Family, Genus, and/or Species. ASVs identified as putative contaminants are included at the end of the table. Table S2. Taxonomic Assignment Totals of COI ASVs (number and percentage) identified to Domain, Kingdom, Phylum, Class, Order, Family, Genus, and/or Species.