Data Paper
Corresponding author: Eugenia Naro-Maciel (enmaciel@nyu.edu). Academic editor: Anastasija Zaiko
© 2022 Brendan N. Reid, Jennifer A. Servis, Molly Timmers, Forest Rohwer, Eugenia Naro-Maciel.
This is an open access article distributed under the terms of the CC0 Public Domain Dedication.
Citation:
Reid BN, Servis JA, Timmers M, Rohwer F, Naro-Maciel E (2022) 18S rDNA amplicon sequence data (V1–V3) of the Palmyra Atoll National Wildlife Refuge, Central Pacific. Metabarcoding and Metagenomics 6: e78762. https://doi.org/10.3897/mbmg.6.78762
To address the global biodiversity crisis, standardized data that are rapidly obtainable through minimally invasive means are needed for documenting change and informing conservation within threatened and diverse systems, such as coral reefs. In this data paper, we describe 18S rRNA gene amplicon data (V1–V3 region) generated from samples collected to begin characterizing coral reef eukaryotic community composition at the Palmyra Atoll National Wildlife Refuge in the Central Pacific Ocean. Sixteen samples were obtained across four sample types: sediments from two sieved fractions (100–500 μm, n = 3; 500 μm-2 mm, n = 3) and sessile material scrapings (n = 3) from Autonomous Reef Monitoring Structures (ARMS) sampled in 2015, as well as seawater from 2012 (n = 7). After filtering and contaminant removal, 3,861 Amplicon Sequence Variants (ASVs) were produced from 1,062,238 reads. The rarefaction curves demonstrated adequate sampling depth, and communities grouped by sample type. The dominant orders across samples were polychaete worms (Eunicida), demosponges (Poecilosclerida), and bryozoans (Cheilostomatida). The ten most common orders in terms of relative abundance comprised ~60% of all sequences and 23% of ASVs, and included reef-building crustose coralline algae (CCA; Corallinophycidae) and stony corals (Scleractinia), two taxa associated with healthy reefs. Highlighting the need for further study, ~21% of the ASVs were identified as uncultured, incertae sedis, or not assigned to phylum or order. This data paper presents the first 18S rDNA survey at Palmyra Atoll and serves as a baseline for biodiversity assessment, monitoring, and conservation of this remote and pristine ecosystem.
ARMS, eDNA, next-generation sequencing, PANWR, QIIME2, seawater, sediment, sessile
Tropical coral reefs are among the most biologically diverse, complex, and productive of ocean ecosystems (
To address this gap, baseline biodiversity assessment using standardized methods to enable comparison among sites represents a fundamental first step (
The Palmyra Atoll National Wildlife Refuge (PANWR) in the Central Pacific. The inset shows the sampling sites: Autonomous Reef Monitoring Structure (ARMS) and water samples were collected from nine coral reef sites. ARMS samples were obtained at sites marked by stars and water samples at sites marked by circles; at site PAL-19, both ARMS and water samples were collected.
Targeted metabarcoding research at Palmyra has established baselines and patterns for fish (
The data set presented here constitutes the first broad eukaryotic biodiversity survey of coral reef systems at the Palmyra Atoll National Wildlife Refuge. Both seawater and ARMS samples were examined using 18S rDNA, a promising genetic marker offering advantages of broad amplification across eukaryotic kingdoms, a rapidly growing reference database, and wide use (
Palmyra Atoll is located in the Central Pacific Ocean approximately 1,700 km southwest of Hawai‘i, lying on the northwest end of the Northern Line Islands (Fig.
Three sites around the atoll were selected for ARMS deployment (Fig.
Once the ARMS unit was disassembled, the seawater within the tub was sieved through adjoining 2 mm and 500 μm pans and an attachable 100 μm mesh net, creating two separate sediment size fractions (100–500 μm and 500 μm-2 mm). These fractions were stored in 95% ethanol in a -20 °C freezer. Back at a NOAA lab onshore, each sediment fraction underwent a decantation process to isolate organic material from sediment particles following
At the Sackler Institute for Comparative Genomics, four replicate extractions (~ 0.25 g each) were performed using the MOBIO PowerSoil DNA Isolation Kit on each of the ARMS sediment and sessile samples. Although extraction blanks and positive controls were not used, standard decontamination, disinfection, and sterilization practices stringently in place at this dedicated molecular laboratory, including use of PCR-free areas, were assiduously followed, and in silico contaminant removal was carried out as described below. The DNA eluates from the four replicates were pooled (combined and mixed into one tube) and stored at -20 °C prior to being sent out for sequencing.
Water samples (n = 7) were collected in May 2012 from seven forereef sites < 1 m above the benthos at depths ranging from 3.7–13.7 m (Fig.
Sixteen samples, 7 from water and 9 from the three ARMS (two sediment fractions and one sessile material scraping sample per unit) (Fig.
Amplicon Sequence Variant (ASV) diversity measurements for each sample type. Included are the total number of samples, sequences retained after filtering, and the total and mean numbers of ASVs detected per sample. Number of (unique) ASVs, Shannon-Weaver and Faith’s phylogenetic diversity for each sample type, before and after rarefaction to the lowest combined number of reads per sample type (152,590), are given to the left and to the right of slashes respectively.
| | Sediment (100–500 μm) | Sediment (500 μm-2 mm) | Sessile material scrapings | Water | Total |
|---|---|---|---|---|---|
| Number of samples | 3 | 3 | 3 | 7 | 16 |
| Sample sites | ARMS: PAL 01, 17, and 19 | ARMS: PAL 01, 17, and 19 | ARMS: PAL 01, 17, and 19 | PAL 04, 05, 06, 10, 12, 19, and 25 | |
| Sampling year | 2015 | 2015 | 2015 | 2012 | |
| Number of replicates | 6 (2 per site) | 6 (2 per site) | 6 (2 per site) | 7 | 25 |
| Number of sequences after filtering | 164,799 | 232,724 | 152,590 | 512,125 | 1,062,238 |
| Total no. ASVs | 815 / 808 | 420 / 416 | 631 / 625 | 2,646 / 2,474 | |
| Unique ASVs | 445 / 442 | 143 / 140 | 338 / 336 | 2,487 / 2,322 | |
| H (Shannon-Weaver) before / after rarefaction | 3.98 / 3.97 | 2.63 / 2.63 | 3.79 / 3.79 | 5.47 / 5.46 | |
| Faith’s phylogenetic diversity before / after rarefaction | 84.98 / 84.01 | 53.31 / 52.68 | 59.82 / 58.84 | 238.10 / 215.42 | |
Forward and reverse reads extracted from MRDNA’s Fastq Processor (
Amplicon-specific reference databases were used to obtain more robust taxonomic classifications (
To improve taxonomic assignments, two alternate assignment methods were also used. For the first method, the steps above were repeated employing an alternate curated 18S database with a focus on planktonic sequences (pr2 v.4.14.0) (
The QIIME2 outputs were imported into R v.4.0.0 (
To assess if the number of sequences was appropriate for accurately estimating community composition and taxonomic diversity, rarefaction curves for each sample were produced using the R package VEGAN v. 2.5.5 (
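The idea behind a rarefaction curve can be illustrated with a minimal sketch. The analysis described here used the R package VEGAN; the Python below is only an illustrative stand-in, and the function names (`rarefy_counts`, `rarefaction_curve`) and the toy counts are hypothetical, not taken from the study. Reads are subsampled without replacement at increasing depths, and the number of distinct ASVs observed at each depth is recorded; a curve that flattens suggests that additional sequencing would recover few new ASVs.

```python
import random


def rarefy_counts(asv_counts, depth, seed=0):
    """Subsample `depth` reads without replacement from a dict of ASV -> read count."""
    total = sum(asv_counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of reads in the sample")
    # Expand the counts into one label per read, then draw without replacement.
    reads = [asv for asv, n in asv_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    rarefied = {}
    for asv in rng.sample(reads, depth):
        rarefied[asv] = rarefied.get(asv, 0) + 1
    return rarefied


def rarefaction_curve(asv_counts, depths, seed=0):
    """Return (depth, number of distinct ASVs observed) pairs."""
    return [(d, len(rarefy_counts(asv_counts, d, seed))) for d in depths]


# Toy sample with hypothetical counts (not data from this study).
toy_sample = {"ASV_1": 500, "ASV_2": 300, "ASV_3": 150, "ASV_4": 45, "ASV_5": 5}
curve = rarefaction_curve(toy_sample, depths=[10, 100, 500, 1000])
```

Expanding counts into a list of reads is simple but memory-hungry; for samples with hundreds of thousands of reads, a hypergeometric draw per ASV would be a more economical choice.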
Sequences were grouped by sample type, the total number of ASVs was calculated, and the number of taxa was estimated at the second- and fourth-highest taxonomic levels present in the SILVA138 database. These levels are roughly consistent with conventional “phylum” and “order” classifications, respectively, with the caveat that taxonomic ranks in SILVA are assigned to preserve roughly the same level of evolutionary divergence at a given rank across the tree (
The Shannon-Weaver diversity index and ACE richness estimate were calculated using VEGAN for individual samples, pooled sample types, and the collective dataset. Faith’s phylogenetic diversity (defined as the sum of all branch lengths in the phylogenetic tree for all samples in a given group) was calculated at the sample type levels and for the entire dataset with the R package PICANTE (
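As a rough illustration of the two indices, the sketch below computes the Shannon-Weaver index as H = -Σ p_i ln p_i and Faith's phylogenetic diversity as a sum of branch lengths. It is written in Python rather than with the R packages VEGAN and PICANTE used for the analysis, and it rests on stated assumptions: the tree is a hypothetical toy encoded as a child-to-parent mapping, the natural logarithm is used, and branches are summed from each detected tip up to the root.

```python
import math


def shannon(counts):
    """Shannon-Weaver index H = -sum(p_i * ln p_i) over non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def faith_pd(parent, present_tips):
    """Sum branch lengths on the paths from each present tip to the root.

    `parent` maps node -> (parent_node, branch_length); the root has no entry.
    Each branch is counted once, even when several tips share it.
    """
    visited = set()
    pd = 0.0
    for tip in present_tips:
        node = tip
        while node in parent and node not in visited:
            visited.add(node)
            up, length = parent[node]
            pd += length
            node = up
    return pd


# Hypothetical toy tree: root -> A (1.0), root -> X (0.5), X -> B (1.0), X -> C (2.0).
toy_tree = {
    "A": ("root", 1.0),
    "X": ("root", 0.5),
    "B": ("X", 1.0),
    "C": ("X", 2.0),
}
h_even = shannon([1, 1, 1, 1])          # four equally abundant ASVs
pd_bc = faith_pd(toy_tree, ["B", "C"])  # shared branch X is counted once
pd_ab = faith_pd(toy_tree, ["A", "B"])
```

For four equally abundant ASVs, H equals ln 4, the maximum for that richness. In the toy tree, the branch leading to X contributes only once to the value for B and C.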
This project generated promising baseline data for characterizing overall eukaryotic diversity of the remote and relatively pristine PANWR reef community, and for contributing to the ongoing global assessment of reef biodiversity. A total of 1,610,301 raw sequences were generated, with on average 100,000 sequences per sample (range: 54,090–190,997). After denoising and sequence filtration, 1,113,657 (70.5%) sequences remained, resulting in 3,936 ASVs. As noted above, 75 ASVs (51,419 sequences) were removed as putative contaminants, leaving 1,062,238 sequences and 3,861 ASVs for analysis (Suppl. material
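The read and ASV accounting reported here can be checked with a few lines of arithmetic. The sketch below uses only figures stated in the text and in the diversity table by sample type; the variable names are illustrative.

```python
# Figures reported in the text.
denoised_reads = 1_113_657
denoised_asvs = 3_936
contaminant_reads = 51_419
contaminant_asvs = 75

# Subtract the putative contaminants from the denoised totals.
retained_reads = denoised_reads - contaminant_reads  # 1,062,238
retained_asvs = denoised_asvs - contaminant_asvs     # 3,861

# Sequences retained per sample type, as listed in the diversity table.
reads_by_type = {
    "sediment_100_500um": 164_799,
    "sediment_500um_2mm": 232_724,
    "sessile": 152_590,
    "water": 512_125,
}
total_by_type = sum(reads_by_type.values())

# Share of the denoised reads that were flagged as putative contaminants.
contaminant_share = contaminant_reads / denoised_reads
```

The per-type counts sum to the same retained total, and the putative contaminants account for roughly 4.6% of the denoised reads.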
Rarefaction at the sample level (n = 16) indicated that sequencing depth was sufficient for characterizing biodiversity, as ASV accumulation curves reached an asymptote in each case (Fig.
Sample-based rarefaction curves of ASV diversity detected in 18S rRNA (V1–V3) gene amplicon analysis of Palmyra Atoll. Sample types include 100–500 μm sediment, 500 μm-2 mm sediment, sessile material scrapings, and reef water. Given the uneven sampling depths, for some comparative analyses samples were later rarefied to even sequencing depth. Numbers indicate sampling site as shown in Fig.
Principal coordinate analysis (PCoA) depicting distinct clusters of communities at the Palmyra Atoll National Wildlife Refuge. In addition to seawater obtained in 2012, three types of samples were collected in 2015 from the Autonomous Reef Monitoring Structures (ARMS): two separate sediment size fractions (100–500 μm and 500 μm-2 mm), and sessile material scraped from the plates. The PCoA is based on ASV presence or absence (Jaccard distance) in each of the four sample types.
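The presence/absence distance underlying such an ordination can be sketched as follows. This is a minimal Python illustration with hypothetical ASV sets, not the workflow used to produce the figure. It covers only the Jaccard distance, defined as one minus the ratio of shared ASVs to all ASVs found in either sample; the ordination step itself is not shown.

```python
def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two collections of ASV identifiers."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1.0 - len(a & b) / len(union)


def pairwise_jaccard(samples):
    """Return {(name_a, name_b): distance} for every unordered pair of samples."""
    names = sorted(samples)
    return {
        (a, b): jaccard_distance(samples[a], samples[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical ASV sets (not data from this study).
toy_samples = {
    "water": {"ASV_1", "ASV_2", "ASV_3", "ASV_4"},
    "sessile": {"ASV_3", "ASV_4", "ASV_5"},
    "sediment": {"ASV_4", "ASV_5"},
}
distances = pairwise_jaccard(toy_samples)
```

A distance of 0 means two samples share every ASV, and a distance of 1 means they share none. Because only presence or absence is used, read abundance has no influence on the result.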
Taxonomic composition consisted of 73 different phyla and 261 orders (Suppl. material
Relative ASV abundance of the ten most common orders sequenced for 18S rDNA (V1–V3). Samples were obtained from four sample types (500 μm-2 mm sediment fraction, 100–500 μm sediment fraction, sessile material scrapings, and water) collected from nine sites at Palmyra Atoll (Fig.
The most common groups across samples were polychaete worms (order Eunicida), followed by demosponges (order Poecilosclerida) and bryozoans (order Cheilostomatida). Order Podocopida (ostracod crustaceans) was the fourth most frequent, the fifth was the dinoflagellate order Syndiniales, and the sixth was the Peracarida crustaceans (a group containing amphipods and isopods, ranked as an order in the SILVA 138 taxonomy but classified as a superorder in other taxonomies). The Corallinophycidae red algae, which contain the order Corallinales (crustose coralline algae, CCA), were the seventh most common. Order Scleractinia, or stony corals, was the eighth most frequent, followed by another demosponge order (Dendroceratida) and Calanoida (a marine copepod group). Notably, sequences from two orders critical to healthy reefs (Corallinophycidae and Scleractinia) were detected in both water and sessile ARMS samples. Corallinophycid genera identified by eDNA included Amphiroa, Hydrolithon, Jania, Lithothamnion, Mesophyllum, Neogoniolithon, Porolithon, Sporolithon, and Titanoderma; the scleractinian coral genera Acropora, Favites, and Montipora were also detected (Suppl. material 3, 5: Tables S1, S3). Coral symbiotic zooxanthellae (genus Symbiodinium), which are ejected when corals bleach, were also found, mainly in water but also in the sessile and coarse sediment fractions of the ARMS samples.
Sample types differed significantly for the number of ASVs (Kruskal-Wallis p < 0.004), Shannon-Weaver diversity (Kruskal-Wallis p < 0.05), and phylogenetic diversity (Kruskal-Wallis p < 0.005). Water samples contained many ASVs assigned to orders that were not represented in the list of top ten most prevalent orders, most notably non-metazoan eukaryotes / protists, and green algae (Fig.
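For readers unfamiliar with the Kruskal-Wallis test, the sketch below computes its H statistic from ranks, including the standard correction for ties. It is a plain Python illustration on hypothetical values, not the code used for the comparisons reported here. It stops at the statistic; the p-value would come from comparing H with a chi-square distribution with k - 1 degrees of freedom for k groups.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction for a list of groups."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    # Assign each distinct value the mean of the ranks it occupies.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        t = j - i                             # size of this tie group
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j

    rank_sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_sum_term - 3 * (n_total + 1)

    correction = 1 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical per-sample values for three groups (not data from this study).
toy_groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h_stat = kruskal_wallis_h(toy_groups)
```

Because the test works on ranks rather than raw values, it does not assume normality, which suits small and unequal group sizes.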
Among the ARMS samples, the sessile material scraping and 100–500 μm sediment fractions tended to have higher diversity than the 500 μm-2 mm ones (Table
Notably, all four of the sample types contained unique ASVs, suggesting that complete characterization of reef biodiversity requires metabarcoding of a range of different sample types over time. Sampling ARMS alongside seawater provides a complementary picture of reef biodiversity that recovers many sessile components of reef structure and small metazoans that would otherwise be missed in water column sampling. In conclusion, this data set paves the way for a better understanding of eukaryotic biodiversity in this largely pristine reef system, and offers guidance for future studies, which should pay careful attention to updated protocols. These include careful use of replicates at each site, sequencing of extraction blanks and negative / positive PCR controls, and meticulous attention to potential errors in relative abundance estimates (
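The notion of ASVs unique to a sample type can be made concrete with set operations. The sketch below is a minimal Python illustration on hypothetical ASV identifiers; the function name `unique_asvs` is introduced here for illustration only. An ASV counts as unique to a sample type when it is absent from every other type.

```python
def unique_asvs(asvs_by_type):
    """Return, for each sample type, the ASVs found in no other sample type."""
    unique = {}
    for name, asvs in asvs_by_type.items():
        # Pool the ASVs detected in all other sample types.
        others = set()
        for other_name, other_asvs in asvs_by_type.items():
            if other_name != name:
                others |= set(other_asvs)
        unique[name] = set(asvs) - others
    return unique


# Hypothetical ASV sets (not data from this study).
toy_types = {
    "water": {"ASV_1", "ASV_2", "ASV_3"},
    "sessile": {"ASV_3", "ASV_4"},
    "sediment_fine": {"ASV_3", "ASV_4", "ASV_5"},
}
unique_by_type = unique_asvs(toy_types)
```

In this toy case the sessile set contributes no unique ASVs, because each of its members also occurs elsewhere.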
The 18S rDNA amplicon gene sequences from this work are posted on the NCBI Sequence Read Archive (SRA) under BioProject number PRJNA804389 (Table
Summary of sample data. The sample name (see Fig.
| Sample Name | Barcode Sequence | Site | Sample type | Sample Year | Raw # of reads | SRA Accession |
|---|---|---|---|---|---|---|
| 01A115 | CTCTGACT | 1 | Sediment (100-500μm) | 2015 | 17036 | SRS11916995 |
| 01A115b | CTCTGAGA | 1 | Sediment (100-500μm) | 2015 | 17606 | SRS11916994 |
| 01A515 | CTCTGTCA | 1 | Sediment (500μm-2mm) | 2015 | 23522 | SRS11917005 |
| 01A515b | CTCTGTGT | 1 | Sediment (500μm-2mm) | 2015 | 22814 | SRS11917012 |
| 01A15 | CTCTTCAG | 1 | Sessile | 2015 | 32664 | SRS11917011 |
| 01A15b | CTCTTCTC | 1 | Sessile | 2015 | 29761 | SRS11917013 |
| 17A115 | CTGACTCT | 17 | Sediment (100-500μm) | 2015 | 29888 | SRS11916996 |
| 17A115b | CTCTGTCA | 17 | Sediment (100-500μm) | 2015 | 58338 | SRS11916997 |
| 17A515 | CTGACTGA | 17 | Sediment (500μm-2mm) | 2015 | 49053 | SRS11916998 |
| 17A515b | CTCTGTGT | 17 | Sediment (500μm-2mm) | 2015 | 82789 | SRS11916999 |
| 17A15 | CTCTTGAC | 17 | Sessile | 2015 | 32198 | SRS11917000 |
| 17A15b | CTCTTGTG | 17 | Sessile | 2015 | 29883 | SRS11917001 |
| 19A115 | CTGAGACT | 19 | Sediment (100-500μm) | 2015 | 21791 | SRS11917003 |
| 19A115b | CTCTTGAC | 19 | Sediment (100-500μm) | 2015 | 33386 | SRS11917004 |
| 19A515 | CTGAGAGA | 19 | Sediment (500μm-2mm) | 2015 | 20800 | SRS11917002 |
| 19A515b | CTCTTGTG | 19 | Sediment (500μm-2mm) | 2015 | 35722 | SRS11917006 |
| 19A15 | CTGAACAC | 19 | Sessile | 2015 | 32291 | SRS11917007 |
| PAL.04 | CTCTGACT | 4 | Water | 2012 | 78908 | SRS11917014 |
| PAL.05 | CTCTGAGA | 5 | Water | 2012 | 76076 | SRS11917016 |
| PAL.06 | CTCTGTCA | 6 | Water | 2012 | 66775 | SRS11917015 |
| PAL.10 | CTCTGTGT | 10 | Water | 2012 | 73617 | SRS11917017 |
| PAL.12 | CTCTTCAG | 12 | Water | 2012 | 80428 | SRS11916993 |
| PAL.19 | CTCTTCTC | 19 | Water | 2012 | 67378 | SRS11917009 |
| PAL.25 | CTCTTGAC | 25 | Water | 2012 | 71342 | SRS11917010 |
The authors declare no competing interests.
Funding for this project was provided by the Professional Staff Congress of the City University of New York (to ENM) and the Lerner Gray Fund for Marine Research of the American Museum of Natural History (to JAS). All work was carried out under authorized permits to Forest Rohwer and Rusty Brainard (NOAA). We would like to thank Kevin Green and Ben Knowles at San Diego State University for the logistical processing of water samples, as well as Kerry Reardon at NOAA for ARMS collections, and Seth Wollney and Vasiliki Stergioula at CSI for assistance with bioinformatics and lab work. Finally, we extend thanks to Eleanor Sterling and George Amato for their initial advice and direction on the project. Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government. We are grateful to our Editor, Anastasija Zaiko, as well as Reviewers David Stankovic, Florian Leese, and anonymous for helpful comments.
Figure S1
Data type: jpg file
Explanation note: Venn diagram indicating the number of shared and unique ASVs (in bold) in the full dataset. Numbers 1-4 in parentheses correspond to the following sample types: 100-500 μm sediment fraction (1); 500 μm-2 mm sediment fraction (2); sessile material scrapings (3); and water (4).
Figure S2
Data type: jpg file
Explanation note: Heatmap of sequence abundances for the top 25 order-level classifications. RAD_A represents a Retarian group, and Subclade_B is a group of chlorophyte algae.
Table S1
Data type: xlsx file
Explanation note: The number of sequences for each ASV detected by sample type. When possible, assignments at the lowest taxonomic level in the SILVA database were associated with higher levels (genus and family) consistent with the NCBI’s currently accepted taxonomy. ASVs identified as putative contaminants are included at the bottom of the table (available at https://github.com/nerdbrained/palmyra_edna).
Table S2
Data type: pdf file
Explanation note: Site-level diversity statistics for either all eukaryotes, or metazoans only, in each sample type. Included are Amplicon Sequence Variants before (ASVs) and after (ASVsrare) rarefaction to the lowest sample size (30,443 reads). H = Shannon-Weaver diversity. PD = Faith’s phylogenetic diversity.
Table S3
Data type: xlsx file
Explanation note: Ranked phyla, with classes and orders, by relative abundance summed across sample types.
Table S4
Data type: xlsx file
Explanation note: Additional order-level identifications added after pr2 and BLAST analyses.