Data Paper
Corresponding author: Eugenia Naro-Maciel (enmaciel@nyu.edu). Academic editor: Florian Leese
© 2021 Melissa R. Ingala, Irena E. Werner, Allison M. Fitzgerald, Eugenia Naro-Maciel.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Ingala MR, Werner IE, Fitzgerald AM, Naro-Maciel E (2021) 18S rRNA amplicon sequence data (V1–V3) of the Bronx river estuary, New York. Metabarcoding and Metagenomics 5: e69691. https://doi.org/10.3897/mbmg.5.69691
Characterising and monitoring biological diversity to foster sustainable ecosystems is highly recommended as urban centres rapidly expand. However, much of New York City’s biodiversity remains undescribed, including in the historically degraded, but recovering Bronx River Estuary. In a pilot study to identify organisms and characterise biodiversity patterns there, 18S rRNA gene amplicons (V1–V3 region), obtained from river sediments and surface waters of Hunts Point Riverside and Soundview Parks, were sequenced. Across 48 environmental samples collected over three seasons in 2015 and 2016, following quality control and contaminant removal, 2,763 Amplicon Sequence Variants (ASVs) were identified from 1,918,463 sequences. Rarefaction analysis showed sufficient sampling depth, and community composition varied over time and by substrate at the study sites over the sampling period. Protists, plants, fungi and animals, including organisms of management concern, such as Eastern oysters (Crassostrea virginica), wildlife pathogens and groups related to Harmful Algal Blooms, were detected. The most common taxa identified in river sediments were annelid worms, nematodes and diatoms. In the water column, the most commonly observed organisms were diatoms, algae of the phylum Cryptophyceae, ciliates and dinoflagellates. The presented dataset demonstrates the reach of 18S rRNA metabarcoding for characterising biodiversity in an urban estuary.
eDNA, metabarcoding, New York City, next-generation sequencing, QIIME2, urban ecology
With the extensive modification of ecosystems by people and increasing urbanisation, the disrupted ecology of cities is garnering substantial research interest (
The once highly polluted, 23-mile-long (ca. 37 km) Bronx River is currently considered “impaired”, with pollutants including faecal coliforms, garbage, refuse, polychlorinated biphenyls (PCBs) and other toxins coming from a combination of urban and stormwater runoff, Combined Sewer Overflow (CSO) outfalls, contaminated sediments and other sources (
Sampling site map depicting greater New York City waterways. Inset: Detail map of sample area showing Soundview Park (SVP) and Hunts Point Riverside Park (HP) in bold outline. Right: site photos of the SVP and HP estuaries on the Bronx River. Map data 2019 (C) Google.
Although the first step in any ecological study is the correct identification of organisms in the focal system, there are numerous challenges in conducting biodiversity inventories (
Recently developed, non-invasive and transformative environmental DNA (eDNA) metabarcoding technology provides volumes of data for biodiversity assessment via next-generation sequencing. DNA barcoding was the first worldwide effort to document life and identify species using genetic sequences from a standard DNA segment (
Environmental DNA has numerous advantages, offers high-throughput presence, absence and relative abundance data, and can improve representation of microscopic or cryptic taxa (
One advancement that facilitates biodiversity assessment and monitoring is state-of-the-art bioinformatics pipeline development to perform quality-control and large-volume data analysis (
The Bronx River Estuary was sampled from August 2015 to September 2016, monthly from May to October during low tide (Fig.
All materials were processed within 24 hours of sampling (
All remaining lab work (amplification, purification and sequencing) was conducted in a commercial laboratory (MRDNA, Molecular Research LP, Shallowater, TX, USA) using previously described industry-standard procedures and controls (
After 2% agarose gel checks, the uniquely barcoded PCR samples were pooled in equal proportions, based on a combination of electrophoresis-based size and density estimations and DNA concentrations. The pooled samples were then purified with calibrated Ampure XP beads (Agencourt Bioscience, USA); the ratio of beads to PCR products used for purification was 0.75× as per Illumina manufacturer guidelines. Next, an Illumina DNA library was created from these purified and pooled PCR products ligated to Illumina adapters using the Illumina TruSeq DNA library preparation protocol. Finally, sequencing was performed on an Illumina MiSeq according to manufacturer’s instructions, using paired-end 2 × 300 bp v.3 chemistry (
The FASTQ Processor was used to extract barcodes and sort forward and reverse reads into distinct files (
Summary of sample data, including sample ID and statistics on the recovery of reads per sample after filtering, de-noising and chimeric sequence removal.
Sample | Input | Filtered | % input passed filter | De-noised | Non-chimeric | % of input non-chimeric |
---|---|---|---|---|---|---|
S.B.BRC | 61653 | 51307 | 83.22 | 49865 | 49355 | 80.05 |
S.B.BRO | 57043 | 46675 | 81.82 | 45055 | 43716 | 76.64 |
S.B.HP | 41172 | 34193 | 83.05 | 32911 | 31537 | 76.60 |
S.C.BRC | 65637 | 54030 | 82.32 | 52074 | 50699 | 77.24 |
S.C.BRO | 53923 | 43911 | 81.43 | 42387 | 41716 | 77.36 |
S.C.HP | 48441 | 40413 | 83.43 | 38983 | 37581 | 77.58 |
S.D.BRC | 37670 | 31600 | 83.89 | 29892 | 29647 | 78.70 |
S.D.BRO | 39984 | 32324 | 80.84 | 30829 | 30329 | 75.85 |
S.D.HP | 38926 | 32241 | 82.83 | 30923 | 29822 | 76.61 |
S.E.BRC16 | 81853 | 67172 | 82.06 | 65692 | 64195 | 78.43 |
S.E.BRO16 | 64498 | 52646 | 81.62 | 51301 | 49170 | 76.23 |
S.E.HP16 | 50365 | 41017 | 81.44 | 39681 | 38571 | 76.58 |
S.F.BRC16 | 74188 | 62748 | 84.58 | 61411 | 60144 | 81.07 |
S.F.BRO16 | 72355 | 59701 | 82.51 | 57927 | 57472 | 79.43 |
S.F.HP16 | 56688 | 47187 | 83.24 | 46094 | 44870 | 79.15 |
S.G.BRC16 | 60173 | 50378 | 83.72 | 48470 | 47934 | 79.66 |
S.G.BRO16 | 59639 | 50520 | 84.71 | 49155 | 45773 | 76.75 |
S.G.HP16 | 62125 | 49689 | 79.98 | 48429 | 45891 | 73.87 |
S.H.BRC16 | 84518 | 62017 | 73.38 | 60776 | 60186 | 71.21 |
S.H.BRO16 | 64609 | 53593 | 82.95 | 51590 | 50524 | 78.20 |
S.H.HP16 | 48136 | 40675 | 84.50 | 39217 | 37346 | 77.58 |
S.I.BRC16 | 63314 | 53030 | 83.76 | 51647 | 51166 | 80.81 |
S.I.BRO16 | 54232 | 44896 | 82.79 | 43223 | 42157 | 77.73 |
S.I.HP16 | 51530 | 40901 | 79.37 | 39682 | 37770 | 73.30 |
S.J.HP16 | 55575 | 44475 | 80.03 | 43004 | 41364 | 74.43 |
W.B.BRC | 66549 | 57336 | 86.16 | 49282 | 46759 | 70.26 |
W.B.BRO | 66937 | 57216 | 85.48 | 49679 | 47022 | 70.25 |
W.B.HP | 50568 | 43407 | 85.84 | 37591 | 35988 | 71.17 |
W.D.BRC | 17978 | 15045 | 83.69 | 14346 | 14173 | 78.84 |
W.D.BRO | 33475 | 27576 | 82.38 | 26316 | 25681 | 76.72 |
W.D.HP | 23573 | 19891 | 84.38 | 17931 | 17902 | 75.94 |
W.E.BRC16 | 39716 | 34005 | 85.62 | 32661 | 31003 | 78.06 |
W.E.BRO16 | 37689 | 32271 | 85.62 | 30832 | 29645 | 78.66 |
W.E.HP16 | 39206 | 33090 | 84.40 | 31243 | 30725 | 78.37 |
W.F.BRC16 | 37895 | 32172 | 84.90 | 30931 | 30462 | 80.39 |
W.F.BRO16 | 43923 | 36032 | 82.03 | 34993 | 34027 | 77.47 |
W.F.HP16 | 70651 | 57008 | 80.69 | 56284 | 52129 | 73.78 |
W.G.BRC16 | 50138 | 42646 | 85.06 | 41774 | 38491 | 76.77 |
W.G.BRO16 | 47758 | 40433 | 84.66 | 39051 | 36563 | 76.56 |
W.G.HP16 | 54025 | 45735 | 84.66 | 44776 | 40909 | 75.72 |
W.H.BRC16 | 41440 | 35408 | 85.44 | 34245 | 32717 | 78.95 |
W.H.BRO16 | 42214 | 34834 | 82.52 | 33777 | 32871 | 77.87 |
W.H.HP16 | 55457 | 46005 | 82.96 | 44750 | 41631 | 75.07 |
W.I.BRC16 | 47134 | 39762 | 84.36 | 37335 | 35006 | 74.27 |
W.I.BRO16 | 48683 | 40787 | 83.78 | 38592 | 35516 | 72.95 |
W.I.HP16 | 46537 | 38277 | 82.25 | 37216 | 35306 | 75.87 |
W.J.HP16 | 41706 | 34926 | 83.74 | 33496 | 32281 | 77.40 |
W.J.SVP16 | 57638 | 48829 | 84.72 | 47289 | 42721 | 74.12 |
Totals | 2509137 | | | | 1918463 | |
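The two percentage columns in the table follow directly from the read counts. As a minimal sketch (the function name `retention` is illustrative and not taken from the authors' scripts), the percentages for any row can be recomputed as follows:

```python
def retention(input_reads, filtered, non_chimeric):
    """Return (% input passed filter, % of input non-chimeric), rounded to 2 decimals."""
    pct_filtered = round(100 * filtered / input_reads, 2)
    pct_non_chimeric = round(100 * non_chimeric / input_reads, 2)
    return pct_filtered, pct_non_chimeric


# Read counts copied from the S.B.BRC and W.D.BRC rows of the table.
rows = {
    "S.B.BRC": (61653, 51307, 49355),
    "W.D.BRC": (17978, 15045, 14173),
}

for sample, counts in rows.items():
    print(sample, retention(*counts))
```

Both percentages are expressed relative to the input read count, so the second one reflects the cumulative loss across filtering, de-noising and chimera removal rather than the loss at the final step alone.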
Once taxonomic annotation was complete, R Studio v. 1.2.1335 (
The PHYLOSEQ v. 1.28.0 package was then used for basic data manipulation and visualisation, and community-level statistical analyses were performed using tools available in the VEGAN v. 2.5.5 package (
This pilot study identified key organisms, explored biodiversity patterns and established a baseline for future work in the area, but the data must be interpreted with caution considering methodological issues. In total, 48 environmental samples were successfully collected and sequenced for the 18S rRNA gene (water: n = 23; sediment: n = 25). Within these samples, protists, plants, fungi and animals encompassing 2,763 ASVs were recovered from a total of 1,918,463 post-quality-control sequences (Suppl. material
18S rRNA community profiles of Amplicon Sequence Variants (ASVs) in sediment and water samples from Hunts Point Riverside and Soundview Parks shown at the level of phylum. Bar heights show relative abundance of sequences from each taxon.
Several organisms of known occurrence, including taxa of management concern, were detected. Commonly observed species identified in this survey included soft-shell clams (Mya arenaria) and blue mussels (Mytilus edulis) (Suppl. material
In terms of alpha diversity, Soundview Park sediments were significantly higher in the observed number of ASVs compared with all other sites (Kruskal-Wallis, p = 0.05, Fig.
Alpha diversity comparison between sediment and water samples from Hunts Point Riverside and Soundview Parks computed using A) Observed ASVs and B) Shannon Diversity. Pairwise comparisons are indicated as follows: * = p < 0.05, ** = p < 0.001, ns = non-significant.
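For readers unfamiliar with the two alpha diversity measures compared here, the following minimal sketch shows how each is computed from a vector of per-ASV read counts. It is an illustration only, not the authors' R code; the natural logarithm is assumed for the Shannon index, and the read counts are hypothetical.

```python
import math


def observed_asvs(counts):
    """Number of ASVs with at least one read in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over ASVs with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical ASV read counts for two samples (not taken from the dataset).
even_sample = [25, 25, 25, 25]
skewed_sample = [97, 1, 1, 1]

print(observed_asvs(even_sample), shannon(even_sample))
print(observed_asvs(skewed_sample), shannon(skewed_sample))
```

Both hypothetical samples contain four observed ASVs, yet the evenly distributed sample has the higher Shannon value. This is why the two measures can rank the same samples differently.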
Community comparisons amongst substrates (sediment and water) and sites (Hunts Point Riverside and Soundview Parks), based on Amplicon Sequence Variants (ASVs) and using Non-metric multidimensional scaling analysis (NMDS).
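An NMDS ordination starts from a matrix of pairwise dissimilarities between samples. The text here does not state which dissimilarity was used, so the sketch below assumes Bray-Curtis, a common choice for count data. The sample names and counts are hypothetical, and the function is an illustration, not the authors' code.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    if len(a) != len(b):
        raise ValueError("count vectors must have the same length")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    # Two empty samples are treated as identical.
    return numerator / denominator if denominator else 0.0


# Hypothetical ASV counts for three samples (rows) across four ASVs (columns).
samples = {
    "sediment_1": [40, 30, 0, 5],
    "sediment_2": [35, 25, 5, 10],
    "water_1": [0, 5, 60, 20],
}

# Pairwise dissimilarity matrix, the input an NMDS routine would ordinate.
names = list(samples)
for i in names:
    print(i, [round(bray_curtis(samples[i], samples[j]), 3) for j in names])
```

Values range from 0 (identical composition) to 1 (no shared ASVs). NMDS then places samples in a low-dimensional space so that the rank order of these dissimilarities is preserved as closely as possible.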
Future metabarcoding work in the area would benefit from lessons learned during, and resources developed since, this pilot study. The high-quality, comprehensive protocols now available to standardise and ensure eDNA metabarcoding excellence should be carefully followed (
In conclusion, the 18S rRNA V1–V3 dataset presented here complements our prior study, “16S rRNA Amplicon Sequencing of Urban Prokaryotic Communities in the South Bronx River Estuary”. Future work will compare information from these two genetic regions with new data from Cytochrome Oxidase I, the standard locus for animal barcoding (
All 18S rRNA amplicon gene sequences from this study are posted on the NCBI Sequence Read Archive (SRA) under BioProject PRJNA606795 accession numbers SAMN19729835–SAMN19729882 (Table
The authors declare no competing interests.
We are grateful to the New York University Research Challenge Fund and NYU Liberal Studies New Faculty Scholarship and Creative Production Awards (to ENM) and to private donors through Experiment.com (to IW) for funding the research. Site access was provided by NY/NJ Baykeeper and the New York City Department of Parks and Recreation (Natural Resources Group). Special thanks are extended to student assistants Christian Bojorquez, NaVonna Truner, Sean Thomas, Jennifer Servis, Patrick Shea, Vanessa Van Deusen and Seth Wollney, as well as to Dr. Brendan Reid.
Scripts used for metabarcoding analysis
Data type: QIIME and R scripts (in .zip archive)
Explanation note: Document S1. QIIME2 workflow. Document S2. R script.
Figure S1, Tables S1, S2
Data type: .jpg file, .docx file and .xlsx file (in .zip archive)
Explanation note: Figure S1. Sample-based species accumulation curves of total 18S rRNA diversity by substrate type (sediment, water) for each site (Hunts Point Riverside and Soundview Parks), calculated using the Vegan 2.4-3 package for Amplicon Sequence Variants (ASVs). Table S1. Taxonomies of contaminating ASVs removed by “decontam” analysis. Table S2. Taxonomic Assignment including ASV ID to Domain (d), Phylum (p), Class (c), Order (o), Family (f), Genus (g) and Species (s), as applicable.