Characterising planktonic dinoflagellate diversity in Singapore using DNA metabarcoding

Dinoflagellates are traditionally identified morphologically using microscopy, which is a time-consuming and labour-intensive process. Hence, we explored DNA metabarcoding using high-throughput sequencing as a more efficient way to study planktonic dinoflagellate diversity in Singapore’s waters. From 29 minimally pre-sorted water samples collected at four locations in western Singapore, DNA was extracted, amplified and sequenced for a 313-bp fragment of the V4–V5 region in the 18S ribosomal RNA gene. Two sequencing runs generated 2,847,170 assembled paired-end reads, corresponding to 573,176 unique sequences. Sequences were clustered at 97% similarity and analysed with stringent thresholds (≥150 bp, ≥20 reads, ≥95% match to dinoflagellates), recovering 28 dinoflagellate taxa. Dinoflagellate diversity captured includes parasitic and symbiotic groups which are difficult to identify morphologically. Richness is similar between the inner and outer West Johor Strait, but variations in community structure are apparent, likely driven by environmental differences. None of the taxa detected in a recent phytoplankton bloom along the West Johor Strait have been recovered in our samples, suggesting that background communities are distinct from bloom communities. The voluminous data obtained in this study contribute baseline information for Singapore’s phytoplankton communities and prompt future research and monitoring to adopt the approach established here.


Introduction
Dinoflagellates (Alveolata: Dinophyceae) are a diverse and abundant group of unicellular protists found in both marine and freshwater environments. With more than 2,000 living species described (Gómez 2012), they are amongst the most functionally important organisms in a variety of aquatic ecosystems (Spector 1984; Taylor et al. 2008). On top of being species rich, they have highly diversified morphologies, physiologies and biochemical properties (Hackett et al. 2004; Taylor et al. 2008). Dinoflagellates can be armoured or unarmoured (naked), based on the presence or absence of thecal plates respectively (Netzel and Dürr 1984). They play diverse ecological roles as autotrophs, heterotrophs or mixotrophs, and can be endosymbionts of other organisms, notably the shallow-water reef-building corals, or host other endosymbionts themselves (Taylor et al. 2008). As part of the phytoplankton community, they are primary producers that form the base of food chains and are a large constituent of the aquatic system's food web (Spector 1984; Taylor et al. 2008).
In the marine environment, there are more than 1,500 species including both photosynthetic and non-photosynthetic types (Taylor et al. 2008, Gómez 2012).Massive proliferation and accumulation of planktonic dinoflagellates can lead to water colouration and certain species are also known to cause damage to marine ecosystems as harmful algal blooms or 'red tides' (Berdalet et al. 2016).Fish kills and other animal mortalities are common especially when the bloom involves toxin-producing dinoflagellates such as Karenia brevis (Landsberg et al. 2009).Incidents of paralytic shellfish poisoning by dinoflagellates such as Alexandrium fundyense have serious health implications higher up the food chain especially where human consumption is involved (Tan and Lee 1986;John et al. 2015).For example, in 2008, an intense bloom in the St Lawrence Estuary was found to be responsible for the death of many fish, bird and mammal species and extremely hazardous levels of paralytic shellfish toxins produced by the dinoflagellate were detected in edible mussels collected during the event (Starr et al. 2017).
The most common and earliest method of observing phytoplankton is through light microscopy, which is time consuming, labour intensive and demands a high level of taxonomic expertise (Sinigalliano et al. 2009;Medlin 2013;McNamee et al. 2016).Some species, such as those in Alexandrium Halim, 1960, are relatively featureless and furthermore belong to species complexes, rendering them challenging to identify morphologically (John et al. 2005;Anderson et al. 2012;John et al. 2014).In addition, naked dinoflagellates are also difficult to identify from preserved samples.Hence, more efficient approaches to study dinoflagellates have been developed, including the use of scanning electron microscopy (Jung et al. 2010), natural fluorescence (Karlson et al. 2010), flow cytometry (Sinigalliano et al. 2009), high performance liquid chromatography (HPLC) (Gin et al. 2006) and molecular methods (Karlson et al. 2010).
Molecular techniques exploit the genetic distinction between species to identify and quantify species and have played an integral role in our understanding of the systematic positions and evolutionary relationships amongst organisms (Karlson et al. 2010). Examples of earlier molecular methods include cloning and sequencing (Zardoya et al. 1995), as well as heteroduplex mobility assay (Oldach et al. 2000). More recently, DNA barcoding has been successful in identifying dinoflagellates (Stern et al. 2010). It uses the amplified sequence from a short, standardised DNA locus as diagnostic characters for species identification (Lin et al. 2009; Shokralla et al. 2012). Loci tested for dinoflagellates include the nuclear small ribosomal subunit (18S rRNA) (Lin et al. 2006), large ribosomal subunit (28S rRNA; D1/D2 region) (Elferink et al. 2017), internal transcribed spacers (Stern et al. 2012) and the mitochondrial cytochrome c oxidase subunit I and cytochrome b (Lin et al. 2009). The mitochondrial genes cannot be easily amplified from all dinoflagellate taxa (Stern et al. 2012) and, amongst the nuclear loci, 18S rRNA remains the most well-sequenced marker in public databases due to its widespread use in dinoflagellate systematics (Mordret et al. 2018).
Improved technologies for performing high-throughput sequencing have enabled multiplexed amplification products to be sequenced from numerous samples or from an environmental sample, in a method generally known as DNA metabarcoding, which is meeting the needs of many ecologists for rapid taxon identification (Metzker 2010;Taberlet et al. 2012).DNA extracts can be amplified using primers tagged with a short barcode which labels the amplicons according to the sampling design and all amplicons can be pooled for sequencing and data traced back to individual samples (Schnell et al. 2015).DNA metabarcoding enables thousands of samples to be sequenced simultaneously, rapidly and cost-effectively (Aylagas et al. 2016).Important ecological applications of metabarcoding include diet and biodiversity analyses, with environmental samples of various ecosystems obtained mainly from faeces (Srivathsan et al. 2015;Guillerault et al. 2017), soil (Andersen et al. 2012;Treonis et al. 2018) or water samples (Thomsen et al. 2012;Yamamoto et al. 2017).
In the marine environment, metabarcoding has been useful for studying both benthic and planktonic eukaryotic diversity (Fonseca et al. 2010, 2014; Logares et al. 2014; de Vargas et al. 2015; Guardiola et al. 2015; Le Bescot et al. 2016; Tragin et al. 2018). Studies have targeted the 18S rRNA and successfully characterised phytoplankton communities, which are more diverse than previously thought (de Vargas et al. 2015; Le Bescot et al. 2016). Small and symbiotic species have escaped traditional microscopic detection but can now be identified via metabarcoding (Le Bescot et al. 2016; Leblanc et al. 2018). Here we use this technique to study phytoplankton communities in the coastal waters of Singapore.
The earliest studies of Singapore's marine phytoplankton communities were carried out in the mid-1900s, focusing on the Singapore Strait (Tham 1953, 1973; see Lee et al. 2015). It has long been hypothesised that the succession of phytoplankton species is likely caused by the biological history of water, stability and turbulence of the water column and monsoon-driven currents (Tham 1973). There have also been new species discoveries (Holmes 1998) and new species records for Singapore (Leong et al. 2015). More recently, studies have begun to use more advanced techniques, which include the use of extracted chlorophyll measurements (Gin et al. 2000), flow cytometry (Gin et al. 2003), HPLC pigment analysis (Gin et al. 2003, 2006), spectral fluorometric characterisation (Kuwahara and Leong 2015) and also molecular methods (Tang et al. 2007; Leong et al. 2015). The consensus of these efforts is that, amongst all phytoplankton groups, diatoms have the highest abundance except during dinoflagellate blooms (Gin et al. 2006; Leong et al. 2015). Particularly in the Johor Strait, there appears to be an inverse relationship between dinoflagellate and diatom counts (Chia et al. 1988).
There have been more than 20 positively identified species of dinoflagellates reported from Singapore's waters, but those which are known in detail have been studied mainly because they are associated with harmful algal blooms (Leong et al. 2015). These toxin-producing dinoflagellate species are commonly associated with human illnesses such as paralytic shellfish poisoning and diarrhetic shellfish poisoning (Holmes and Teo 2002). Based on phylogenetic analysis and identification using the large subunit rDNA (28S rDNA), Tang et al. (2007) presented the first evidence of ichthyotoxin production by Alexandrium leei and showed that A. leei strains in Singapore were more similar to isolates from Malaysia than to a strain from Korea. More recently, Leong et al. (2015) isolated clones of the 18S rDNA that were sequenced and compared to publicly available data from the GenBank nucleotide database. With this method and morphological confirmation, they detected three new dinoflagellate records for Singapore: Karenia mikimotoi, Karlodinium cf. australe and Karlodinium cf. veneficum (Leong et al. 2015).
While more techniques are being explored, high-throughput sequencing has yet to enter the mainstream of Singapore's phytoplankton research, especially for the non-bloom baseline phytoplankton communities.Hence, the aim of this study is to take a metabarcoding approach to characterise the dinoflagellate communities at four sites along the western coast of Singapore (Fig. 1).We seek to understand the efficacy of this approach to recover dinoflagellate taxa that are typically identified microscopically and even detect species new to this locality.The data also allow a preliminary analysis of spatial and temporal variations in community composition along the West Johor Strait (WJS), to test the hypothesis that mixing between the inner and outer WJS homogenises communities along the Strait.Overall, this DNA metabarcoding study advances our understanding of dinoflagellate diversity and distribution and provides a baseline against which monitoring of phytoplankton communities in Singapore can be performed.

Study sites and sampling
A total of 29 plankton samples were collected between September and December 2015 at four sites in western Singapore (Fig. 1).Along the West Johor Strait (WJS), three replicate samples per month were collected from stations at inner and outer WJS (1.45883°, 103.7202° and 1.34037°, 103.63018°, respectively; 24 samples).Along the Singapore Strait, two samples were collected from St John's Island (SJI) and three at a station off Jurong Island (JI) (1.22247°, 103.8487° and 1.29527°, 103.68202°, respectively; 5 samples).
Collections were carried out using a 15-μm-mesh plankton net hauled vertically from 5 m depth to the water surface.From each haul, the volume of water collected was standardised using a 50-ml measuring cylinder and temporarily transferred to a plastic bottle.Each sample was filtered through an 8-µm cellulose filter paper (Whatman, Sigma-Aldrich), which was immediately wrapped in sterilised aluminium foil, snap frozen in liquid nitrogen and stored at -30 °C prior to DNA extraction.

DNA extraction, amplification and sequencing
Each filter paper was cut into two and placed in separate 1.5-ml tubes for DNA extraction using the standard CTAB (cetyltrimethylammonium bromide) and phenol-chloroform protocol (Doyle and Doyle 1987).DNA pellets were re-suspended in 100 μl of water, pooled between the two tubes and stored at -30°C until DNA amplification.
A 313-bp, V4-V5 region of the 18S rRNA locus was amplified using newly-designed forward primer, REL18S1F (5'-GTT GCG GTT AAA AAG CTC GTA GTT GGA-3') and reverse primer, REL18S1R (5'-AAC AAA TCC AAG AAT TTC ACC TCT GAC-3'), which is the reverse complement of the published Dino18SF3 designed specifically for dinoflagellates (Lin et al. 2006; see also Ki 2012).This primer combination was designed based on highly conserved priming sites that could amplify a variable fragment, using a MAFFT version 7.222 (Katoh et al. 2002;Katoh and Toh 2008;Katoh and Standley 2013) alignment of known dinoflagellate species from Singapore (Leong et al. 2015).The fragment length chosen was aimed at maximising the overlap between the two paired-end reads from the Illumina MiSeq System.A unique 8-bp barcode generated using Barcode Generator 2.8 (Comai and Howell 2009) was added to the 5' end of each primer to pool samples for multiplexed sequencing (Suppl.material 1).
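The relationship between a primer and its reverse complement, as described for REL18S1R above, can be illustrated with a minimal Python sketch. The function name `reverse_complement` is hypothetical and was not part of the study's workflow; only the primer sequence is taken from the text. The printed sequence is simply the computed reverse complement and should be checked against the published Dino18SF3 before any use.

```python
# Minimal sketch: reverse-complement a primer written 5'->3'.
# Only unambiguous bases (A, C, G, T) are handled.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (spaces ignored)."""
    cleaned = seq.replace(" ", "")
    return cleaned.translate(COMPLEMENT)[::-1]


# Reverse primer as given in the text (codon-style spacing kept for readability).
REL18S1R = "AAC AAA TCC AAG AAT TTC ACC TCT GAC"

forward_orientation = reverse_complement(REL18S1R)
print(forward_orientation)
```

Applying the function twice returns the original primer, which is a quick sanity check when preparing tagged primer lists.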
Three PCR replicates were carried out for each sample with a 25-μl reaction mixture containing 2.5 μl 10× reaction buffer, 2.0 μl dNTPs, 1.0 μl of 10μM forward and reverse primers, 0.2 μl Bioready rTaq (Bulldog Bio), 2 μl DNA diluted 5× or 10× and 16.3 μl water.For each unique primer combination, a negative control (without DNA) was also prepared.In other words, every PCR for an actual sample was accompanied by a negative control using the same pair of barcoded primers.During the analyses, MOTUs appearing in the negative controls post-filtering would be removed from their corresponding samples.The PCR protocol comprised 1 min of initial denaturation at 94 °C, followed by 35 cycles of 45 s at 94 °C, 45 s at 53 °C and 1 min at 72 °C, ending with 3 min at 72 °C.Amplified products were quantified using the Qubit dsDNA BR Assay Kit on a Qubit 3 Fluorometer (Thermo Fisher Scientific) in order to pool approximately equal amount of PCR product for each tagged amplicon.Purification of the mixed products was performed using SureClean Plus (Bioline) following the manufacturer's protocol.
The pooled PCR products, including all negative controls, were split into two for Illumina DNA library preparation and paired-end sequencing with the Illumina MiSeq System. Approximately half of a sequencing run was targeted for each library: the first run generated read lengths of 2 × 250 bp (MiSeq Reagent Kit v2), while the second generated 2 × 300 bp (MiSeq Reagent Kit v3).

Bioinformatic pipeline
Assembly of paired-end reads was performed using Paired-End reAd mergeR (PEAR) version 0.9.6 (Zhang et al. 2014), with the criteria of 100-bp minimum overlap, minimum and maximum lengths at 150 and 330 bp respectively and Phred quality threshold of 30.Amongst all the dinoflagellate sequences from GenBank that were analysed here, few indels were found internal to the priming regions, with minimum and maximum lengths of 248 and 263 bp respectively.Therefore, our filtering criterion with respect to read length would capture all dinoflagellates.
The assembled reads were analysed in OBITools version 1.2.0 (Boyer et al. 2016). A maximum of 2-bp mismatch for primer tags was used to assign sequence records to corresponding samples with ngsfilter. Sequence descriptions were then re-annotated with obiannotate and the records separated into files based on their assigned samples with obisplit. Within each sample file, strictly identical sequences were grouped together and assigned a count number with obiuniq. Amplification or sequencing errors were detected with obiclean, which classifies sequence records as either 'head', 'internal' or 'singleton', taking into account sequence similarities and record counts. The 'head' is the most common sequence amongst all sequences, while a 'singleton' sequence is one with no other variants in the amplification product (Guardiola et al. 2015); 'internal' is neither of the above and most likely corresponds to a sequencing or amplification error. Only 'head' sequences of at least 150 bp were retained for further analysis using obigrep and obisplit.
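The logic of tag-based sample assignment and dereplication can be sketched in a few lines of Python. This is not the OBITools implementation: it assumes a single tag at the 5' end of each read, compares tags by Hamming distance and rejects tied matches. The names `assign_sample` and `dereplicate`, the tags and the reads are all hypothetical.

```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def assign_sample(read: str, tags: dict, max_mismatch: int = 2):
    """Return the sample whose tag matches the read prefix within max_mismatch.

    Returns None when no tag is close enough or when the best match is tied.
    """
    scored = sorted(
        (hamming(read[:len(tag)], tag), sample)
        for tag, sample in tags.items()
        if len(read) >= len(tag)
    )
    if not scored or scored[0][0] > max_mismatch:
        return None
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # ambiguous assignment
    return scored[0][1]


def dereplicate(sequences) -> Counter:
    """Group strictly identical sequences and count them."""
    return Counter(sequences)


# Hypothetical 8-bp tags and toy reads (tag followed by a short insert).
tags = {"ACGTACGT": "sample_1", "TTGGCCAA": "sample_2"}
reads = ["ACGTACGTGGGG", "ACGTACCAGGGG", "GGGGGGGGAAAA"]

assigned = [assign_sample(r, tags) for r in reads]
# Strip the 8-bp tag and count identical inserts for sample_1.
per_sample = dereplicate(r[8:] for r, s in zip(reads, assigned) if s == "sample_1")
print(assigned, per_sample)
```

In this toy input the second read carries two tag mismatches and is still assigned, while the third read matches no tag within the limit and is discarded.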
Sequences with at least 20 reads were clustered into molecular operational taxonomic units (MOTUs) with USEARCH version 8.1.1861 (Edgar 2010) at a similarity threshold of 97% and the greedy clustering algorithm. For each cluster, the sequence with the highest number of reads was taken as the 'centroid', where counts of the remaining sequences with at most 3% dissimilarity were added to it. Chimeric sequences were also removed in this process. For taxonomic assignment, the representative sequence for each MOTU was searched against the GenBank sequence database using BLAST+ version 2.2.31 (Altschul et al. 1990). Only sequence matches with minimum identity of 95% to any dinoflagellate were retained for further analyses.
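The abundance-sorted greedy clustering described above can be sketched as follows. This is an illustration only, not USEARCH: identity is approximated here as one minus the edit distance divided by the longer sequence length, no chimera detection is performed, and the function names and toy sequences are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Approximate pairwise identity (an assumption, not USEARCH's definition)."""
    return 1 - edit_distance(a, b) / max(len(a), len(b))


def greedy_cluster(counts: dict, threshold: float = 0.97, min_reads: int = 20) -> dict:
    """Cluster sequences in order of decreasing read count.

    Each sequence joins the first centroid it matches at >= threshold;
    otherwise it becomes a new centroid. Returns centroid -> total reads.
    """
    centroids = {}
    for seq, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n < min_reads:
            continue  # sequences below the read cut-off are not clustered
        for c in centroids:
            if identity(seq, c) >= threshold:
                centroids[c] += n
                break
        else:
            centroids[seq] = n
    return centroids


# Toy data: a 100-bp sequence, a variant with two substitutions (98% identity),
# an unrelated sequence and a rare sequence below the 20-read cut-off.
base = "ACGT" * 25
near = "TT" + base[2:]
far = "G" * 100
rare = "C" * 100
motus = greedy_cluster({base: 500, near: 30, far: 40, rare: 5})
print({seq[:8]: reads for seq, reads in motus.items()})
```

The variant's reads are added to the most abundant sequence, the unrelated sequence forms its own cluster, and the rare sequence is excluded before clustering.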

Data analyses
We compared dinoflagellate MOTU identities across the PCR triplicates and only retained those that met the above criteria in at least two replicates.MOTUs appearing in the negative controls were to be removed from their corresponding sample set if they met the same criteria.Retained sequences were combined first according to their sample and then site, noting their respective number of reads for each MOTU.Sequences from all samples were combined to determine the total number of unique dinoflagellate MOTUs amongst all samples.
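The replicate-consensus rule and the negative-control subtraction can be expressed compactly. The sketch below assumes each PCR replicate has already been reduced to the set of MOTU labels that passed the filtering criteria; the function name `consensus_motus` and the labels are hypothetical.

```python
from collections import Counter


def consensus_motus(replicates, negative_control=frozenset(), min_replicates=2):
    """Keep MOTUs seen in at least min_replicates replicates,
    then drop any that also appear in the matching negative control."""
    seen = Counter()
    for rep in replicates:
        seen.update(set(rep))  # count each MOTU once per replicate
    kept = {motu for motu, k in seen.items() if k >= min_replicates}
    return kept - set(negative_control)


# Hypothetical labels for one sample amplified in triplicate.
replicates = [{"A", "B"}, {"A"}, {"A", "B", "C"}]
retained = consensus_motus(replicates, negative_control={"B"})
print(sorted(retained))
```

Here "C" is dropped because it occurs in only one replicate, and "B" is dropped because it also passes the criteria in the negative control.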
Phylogenetic analyses were carried out to determine relationships amongst the MOTUs and previously sequenced taxa. Rhodophytes Rhodella violacea (Kornmann) Wehrmeyer (GenBank accession AF168624) and Bangia atropurpurea (Mertens ex Roth) C. Agardh (AF169339) were selected as outgroups. Taxa analysed in Leong et al. (2015) as well as published sequences from GenBank that matched ≥95% to MOTUs obtained here were also included to construct a data matrix with 100 terminals. Sequences were aligned using MAFFT version 7.222 (Katoh et al. 2002; Katoh and Toh 2008; Katoh and Standley 2013). A neighbour-joining (NJ) tree was inferred using PAUP* version 4.0b10 (Swofford 2003), with branch supports assessed using 1,000 bootstrap pseudoreplicates (Felsenstein 1985). We also performed model-based maximum likelihood (ML) and Bayesian analyses. ML tree searches via RAxML version 8.0.9 (Stamatakis 2006, 2014; Stamatakis et al. 2008) were carried out with 50 replicates, default GTRGAMMA substitution model and 1,000 bootstrap replicates. For Bayesian analysis, we selected the most suitable substitution model using jModelTest 2.1.10 (Guindon and Gascuel 2003; Posada 2008; Darriba et al. 2012) based on the Akaike information criterion (AIC). MrBayes 3.2.6 (Huelsenbeck and Ronquist 2001; Ronquist and Huelsenbeck 2003; Ronquist et al. 2012) was used to generate four runs of 11 million Markov chain Monte Carlo iterations each, saving a tree every hundredth iteration. Convergence amongst runs was determined using Tracer 1.6 (Rambaut et al. 2014), which led to the removal of the first 50,001 trees as burn-in.

Results
The two Illumina MiSeq sequencing runs produced 2,847,170 assembled paired-end reads corresponding to 573,176 unique sequences. Error pruning, sequence filtering and chimera removal reduced the number of unique sequences to 268. After clustering the sequences using a 97% similarity threshold, 133 MOTUs were recovered. Filtering based on ≥95% sequence similarity to GenBank dinoflagellate sequences resulted in 28 unique MOTUs remaining (Table 1). Sequences have been deposited into GenBank (accession numbers MH234223-MH234250). The mean number of reads per sample was 46,386 (± S.E. 9,743) and the mean number of MOTUs per sample was 3.79 (± S.E. 0.45). At the site level, considering only stations with three samples collected, there were on average 4.56 (± S.E. 1.25) dinoflagellate MOTUs per sampling station (Table 1). Our negative controls contained an average of 93 sequences per sample, but none of these met the set criteria (≥150 bp, ≥20 reads, ≥95% match).
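For readers reproducing summary statistics of this kind, the mean and its standard error can be computed as sketched below. This assumes the standard error is the sample standard deviation divided by the square root of the sample size, which the text does not state explicitly; the counts shown are hypothetical and are not data from this study.

```python
import math
import statistics


def mean_and_se(values):
    """Return (mean, standard error of the mean) for a list of numbers."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed for a standard error")
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # sample SD / sqrt(n)
    return mean, se


# Hypothetical MOTU counts for four samples.
hypothetical_counts = [2, 4, 4, 6]
mean, se = mean_and_se(hypothetical_counts)
print(f"{mean:.2f} (± S.E. {se:.2f})")
```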
The phylogenetic analyses revealed deep relationships that were generally inconsistent amongst NJ, ML and Bayesian reconstructions, but these were poorly supported across all analyses. There were no topological conflicts that were supported by any of the analyses, so we discuss the well-supported nodes using the NJ tree (Fig. 2). Branch supports increased closer to the tips, thus helping to corroborate most of the BLAST matches. Of the 28 MOTUs, 14 were matched 100% to a known sequence from GenBank and most of these were strongly supported on the tree (NJ, ML bootstrap ≥90, and Bayesian posterior probability = 1; e.g. OTU61 + uncultured dinoflagellate GU819712, and OTU31 + Gonyaulax spinifera).
OTU10, OTU37 and OTU61 showed no sequence similarity to any known species but exhibited high similarity to a limited set of 'uncultured dinoflagellates' (Table 1). They were phylogenetically distinct from the rest of the dinoflagellates (Fig. 2) and considered to be associated with Syndiniales, belonging to the deep-branching marine alveolates (MALV; López-García et al. 2001; Moon-van der Staay et al. 2001; Guillou et al. 2008). Other MOTUs that were not genetically similar to any known species and were not nested within a known clade include OTU12, OTU45, OTU58, OTU96, OTU97 and OTU102. High genetic diversity was observed amongst the MOTUs nested within Gyrodinium spirale and G. fusiforme (OTU1, OTU64, OTU110, OTU123 and OTU127). Some MOTUs could be placed in a likely taxon, such as OTU20 in Gymnodinium and OTU129 in Prorocentrum, as they had high sequence similarity with and were nested within known taxa. There were also MOTUs exhibiting exact sequence matches to specific dinoflagellate species, an indication that such species were already known. These were represented by zero branch length difference between the MOTU and the known species, such as OTU23 with Amphidinium klebsii, OTU73 with Alexandrium cohorticula and OTU81 with Gyrodinium instriatum.
None of the samples recovered bloom-forming dinoflagellates such as Karlodinium and Takayama, which were detected in abundance during a recent phytoplankton bloom along the WJS in February 2014 (Lim et al. 2014;Leong et al. 2015).OTU98 was closely related to Symbiodinium sequences and was also observed during the bloom but could be part of the background community as ex hospite zooxanthellae.
The two most widespread MOTUs were OTU1 (identical sequence with G. fusiforme) and OTU20 (uncultured dinoflagellate). The former was present at both WJS sites and Jurong Island, while the latter was detected at both WJS sites and St John's Island (Table 1). None of the MOTUs were found at all sites. OTU1 was detected at WJS during every sampling and registered consistently high read counts across sites, as high as 10,000× the next most abundant taxon (i.e. September 2015 at inner WJS).
Nearly half of the MOTUs (13 of 28) were detected at each of the inner and outer WJS. There was an overlap of four MOTUs, including the abundant OTU1 (G. fusiforme), between the two WJS sites. Fewer MOTUs were found at Jurong Island (6 of 28) and St John's Island (8 of 28). Two-thirds of the former's MOTUs were unique to Jurong Island, while most of St John's Island's MOTUs (6 of 8) were found at WJS. Jurong Island and St John's Island were sampled only at one time point, so results ought to be viewed with caution. At WJS, the highest MOTU richness was observed during October 2015 (Fig. 3). Particularly for outer WJS, 12 of the 13 MOTUs were detected during that month, but apart from October, the site had very low MOTU richness.

Discussion
The two Illumina MiSeq sequencing runs generated a large number of reads that likely represent the majority of dinoflagellate individuals captured on the sample filters. Overall, the use of DNA metabarcoding on minimally sorted water samples has recovered a greater diversity of dinoflagellates in Singapore than a previous study that sequenced clone isolates amplified from water samples at comparable sites (Leong et al. 2015). As the diversity estimates here are based principally on sequence analysis, they need to be validated with other high-throughput methods such as flow cytometry and morphological examination using, for instance, scanning electron microscopy. Nevertheless, the spatial and temporal variations of community composition, shown here, form essential hypotheses for further tests of local and regional dinoflagellate distribution patterns.
After retaining sequences with ≥95% similarity to GenBank dinoflagellate sequences, only 28 dinoflagellate MOTUs have been detected. It is worth noting that the extent to which 18S rRNA dinoflagellate sequences are represented in GenBank varies amongst genera. Sequences from the genus Alexandrium are most abundant on GenBank, with approximately 500 available. This is followed by Prorocentrum with 97 sequences retrieved; all other genera are represented by fewer sequences, with some having only one sequence available. Although 18S rRNA has the largest collection of published sequences available, sequencing effort for dinoflagellates is highly skewed amongst genera. The majority of MOTUs were not assigned to dinoflagellates, indicating that the primers designed for dinoflagellates also capture other organisms. Our database searches have indeed recovered several diatoms amongst other eukaryotes. The 18S region amplified falls within the V4-V5 region that is commonly used to broadly amplify eukaryotic 18S rDNA using universal primers (Hadziavdic et al. 2014). Despite using primer sequences that appear to be specific to dinoflagellates, the PCRs have also allowed non-specific binding. Future studies could consider adjacent priming sites to reduce non-target amplification.

Table 1. Molecular operational taxonomic units of dinoflagellates recovered from four sites along the western coast of Singapore, with information about their closest GenBank matches and sequencing read counts at each site and sampling month in 2015.

As with the previous study in Singapore by Leong et al. (2015), our high-throughput sequencing recovers parasitic and symbiotic dinoflagellates which are challenging to detect using conventional microscopic sorting. The three MOTUs nested in the Syndiniales or MALV cluster (OTU10, OTU37 and OTU61) are clear outgroups of the 'core' dinoflagellate clade (Guillou et al. 2008; Taylor et al. 2008). While this and previous studies (e.g. Guillou et al. 2008; Leong et al. 2015) find Syndiniales to be monophyletic, as a whole, it may be paraphyletic (Strassert et al. 2018). Syndiniales are obligate parasites found throughout marine habitats from the water surface to sediments (Guillou et al. 2008) and are often highly represented in environmental samples (Kok et al. 2012). They have extremely diverse host ranges, associating with not only alveolates, but also copepods, cnidarians and even fish eggs (Chambouvet et al. 2008). Even if they are in low abundance, the community could quickly increase in population size and affect the abundance of its host, consequently impacting ecosystem functioning (Logares et al. 2015). They have only recently been recorded in Singapore's waters, during the 2014 bloom (Leong et al. 2015). Our results show that read counts of sequences closely related to Syndiniales are relatively high (Table 1), an indication that these marine parasites are prevalent in our waters even in non-bloom conditions.
A symbiotic dinoflagellate of the genus Symbiodinium (OTU98) has been detected at inner WJS in December 2015.That this taxon is not more widespread and even absent in samples from the reef environment of St John's Island is surprising because of its ubiquity and importance as endosymbionts of scleractinian corals (zooxanthellae; Tanzil et al. 2016), other marine invertebrates and protists (Takabayashi et al. 2012).They can exist as free-living cells that are detectable outside the hosts, sometimes in high densities and serve as symbiont sources to be taken up by juvenile corals or for uptake by adult corals faced with environmental stress (Littman et al. 2008;Takabayashi et al. 2012).While it is not impossible to identify them using microscopy alone, they are morphologically similar to other dinoflagellates and have often been misclassified (Littman et al. 2008).Therefore, DNA-based methods are useful for distinguishing Symbiodinium from other dinoflagellates.Knowing that they can be captured in our study, future research or monitoring programmes can utilise DNA metabarcoding to help track the pool of potential coral endosymbionts in Singapore waters.
By far, the most read-abundant MOTU is OTU1, which has a sequence identical to Gyrodinium fusiforme. While taxon assignments based on the 18S rRNA gene may not be precise, this species is known to be widespread and has been recorded in Indonesia and the Malacca Strait (Dodge 1982; Noor et al. 2007). Present at the inner and outer WJS every month of sampling from September to December 2015, it is one of only four shared MOTUs at the two sites. Interestingly, G. fusiforme was not detected during the 2014 Karlodinium australe bloom on the Singapore side of the channel (Leong et al. 2015). Furthermore, Gyrodinium was only one of several dinoflagellate taxa constituting a minute proportion (<0.2%) of cell densities observed on the Malaysian side (Lim et al. 2014). With the exception of the MALV and Symbiodinium, there appears to be mutual exclusivity between bloom and background dinoflagellate communities (Fig. 2). In particular, Takayama and Karlodinium, genera linked to mass fish mortality during the bloom (Leong et al. 2015), were not detected from our samples. There is a possibility that they have been excluded from the inner WJS due to limited exchange with the Singapore Strait which connects to other regions. However, as they were also absent from the outer WJS, Jurong Island and St John's Island, it is more likely that they were encysted and benthic during our sampling, although cysts have yet to be observed for these species (Bergholtz et al. 2005; Wang et al. 2011). Sequencing of benthic sediments from the WJS may help detect and identify cysts that can be matched to the vegetative forms in the water column (Godhe et al. 2002; Nagai et al. 2012; Gao et al. 2017).
Contrary to the hypothesis that mixing between the inner and outer WJS homogenises the communities along the Strait, considerable differences have been found between the two sampling stations (Fig. 3; Table 1). The West Johor Strait is a shallow channel bounded by the Causeway and Pulai River (Kazemi et al. 2014). Species compositional differences between the inner and outer WJS suggest that there are site-specific variations related to coastal hydrodynamics and physico-chemical properties. Surrounding water bodies and monsoonal patterns determine much of the local hydrodynamics and water parameters such as salinity, temperature and turbidity (Behera et al. 2013). The circulation of Singapore's waters is largely driven by two dominant monsoon seasons, the northeast (November to March) and southwest (June to September) monsoons, separated by two brief inter-monsoon periods (Behera et al. 2013). Our sampling period coincides only briefly with the two monsoons and the four-month sampling period is insufficient for making meaningful temporal comparisons. Nevertheless, the temporal variations appear to be related to flushing potential. Tidal mixing would reduce stratification and the hydrodynamic response to variation in freshwater inflows is dependent on the flushing time of the channel (Kazemi et al. 2014). The inner WJS is known to have low flushing potential with a high residence time of >70 days (Bayen et al. 2013) and is also affected by consistently high levels of run-off from the Johor rivers resulting in more eutrophic conditions (Sin et al. 2016). These factors could drive the slightly more uniform community structure and diversity over time at the inner WJS compared to the outer WJS which, being closer to the opening of the channel, would have greater tidal mixing and flushing potential.
The primary goal of all high-throughput sequencing of amplified markers is to recover accurate MOTU sequences and richness estimates from the voluminous sequence data (Nguyen et al. 2015). The number of MOTUs is dependent on the similarity threshold applied to sequence reads for delimiting taxonomic units. Sequences that are more similar than a given threshold will be grouped into the same MOTU (Blaxter 2004). The ideal similarity value used for clustering should be one that most closely approximates the diversity of a sample at the species level (Guardiola et al. 2015). At a lower similarity threshold, fewer MOTUs are recovered, and these are split into separate MOTUs when a higher similarity threshold is used (Logares et al. 2015). The absolute number of MOTUs can differ by as much as sixfold between clustering thresholds of 3% and 1% dissimilarity (Logares et al. 2015), and these differences can vary among genetic loci (Ki 2012). Here we have attempted clustering at various dissimilarity thresholds (1-10%) to identify the range over which the number of clusters obtained shows no change. The suitable cut-off distance ranged from 2% to 5% amongst our samples, so we used 3% as a conservative threshold so as not to overestimate diversity at each site.
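The threshold sweep described above can be illustrated with a minimal sketch. It assumes a toy greedy centroid clustering on equal-length sequences, which is not the clustering software used in this study; the function names (`mismatches`, `greedy_cluster`, `motu_counts`, `stable_thresholds`, `mutate`) and the toy sequences are hypothetical and serve only to show how the number of clusters can be compared across dissimilarity thresholds.

```python
def mismatches(a, b):
    """Count differing positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("toy sketch assumes equal-length, pre-aligned sequences")
    return sum(x != y for x, y in zip(a, b))


def greedy_cluster(seqs, max_dissim_pct):
    """Greedy centroid clustering (an assumed stand-in, not the authors' tool).

    A sequence joins the first centroid it differs from by at most
    max_dissim_pct percent of positions; otherwise it founds a new cluster.
    Returns the list of centroids, one per cluster.
    """
    centroids = []
    for s in seqs:
        # Integer arithmetic avoids floating-point comparisons at the cut-off.
        if not any(mismatches(s, c) * 100 <= max_dissim_pct * len(s)
                   for c in centroids):
            centroids.append(s)
    return centroids


def motu_counts(seqs, thresholds_pct):
    """Number of clusters obtained at each dissimilarity threshold (percent)."""
    return {d: len(greedy_cluster(seqs, d)) for d in thresholds_pct}


def stable_thresholds(counts):
    """Thresholds at which the cluster count equals that of the previous one."""
    ts = sorted(counts)
    return [cur for prev, cur in zip(ts, ts[1:]) if counts[prev] == counts[cur]]


def mutate(seq, positions):
    """Substitute the base at each given position (used to build toy data)."""
    swap = {"A": "C", "C": "G", "G": "T", "T": "A"}
    chars = list(seq)
    for p in positions:
        chars[p] = swap[chars[p]]
    return "".join(chars)


# Hypothetical toy data: three 100-bp sequences, so one mismatch equals 1%.
base = "ACGT" * 25
toy = [base, mutate(base, [0, 1]), mutate(base, range(10, 16))]
counts = motu_counts(toy, range(1, 11))
print(counts)
print(stable_thresholds(counts))
```

With real data, the clusters at each threshold would come from a dedicated clustering program, and only the comparison of cluster counts across thresholds would carry over from this sketch.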
The 18S rRNA gene is a commonly used and well-characterised genetic marker with a highly conserved function across all living cells. It comprises nine hypervariable regions (V1-V9) (Ki 2012; Hadziavdic et al. 2014), with V2, V4 and V9 having sufficient interspecific variability for biodiversity assessments (Hadziavdic et al. 2014). Nevertheless, all eight regions found in eukaryotes (which lack V6) have been targeted for sequencing (Ki 2012). Here, in order to place our MOTUs in the same phylogenetic context as known dinoflagellates from Singapore, we designed primers to target the hypervariable V4-V5 region, which is also internal to the ~600-bp marker used by Leong et al. (2015) and has a length of 313 bp for paired-end sequencing on the Illumina MiSeq System (Hadziavdic et al. 2014; Le Bescot et al. 2016; Searle et al. 2016). It is important to note that different 18S regions can lead to varying diversity estimates (Edgcomb and Stoeck 2012), so our results need to be verified in future by analysing other variable regions.
Downstream of the amplification and sequencing steps, we have attempted to account for possible errors associated with high-throughput sequencing and DNA metabarcoding. These measures include the omission of sequences that were <150 bp, represented by fewer than 20 reads, or not matching at least 95% to published dinoflagellate sequences. Consequently, despite high read recovery, read usability was low, with only 268 unique sequences eventually used for analysis. We also retained unique sequences only if they were detected in two of three PCR replicates, a criterion that would potentially remove signals appearing in the negative controls. While high-throughput sequencing studies such as this are efficient in producing large amounts of sequence data, it is important to note that there are known issues concerning Illumina-based DNA metabarcoding, including primer tag jumps, contamination and false positive detections (Esling et al. 2015). We have sought to minimise these effects through the detailed experimental design and by implementing rigorous criteria in our bioinformatic pipeline.
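The filtering criteria above can be summarised in a minimal sketch. It assumes that each unique sequence has already been annotated with its length, read count, best percent match to a published dinoflagellate sequence, and the number of PCR replicates in which it was detected; the `UniqueSequence` record, its field names and the function names are hypothetical, and reading "two of three replicates" as "at least two" is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class UniqueSequence:
    """Hypothetical record for one unique sequence (field names are assumed)."""
    seq_id: str
    length_bp: int
    read_count: int
    best_match_pct: float      # best percent match to a dinoflagellate reference
    replicates_detected: int   # PCR replicates (out of three) with this sequence


def passes_filters(s, min_length=150, min_reads=20,
                   min_match_pct=95.0, min_replicates=2):
    """Apply the thresholds stated in the text: >=150 bp, >=20 reads,
    >=95% match, and detection in two of three PCR replicates
    (interpreted here, as an assumption, as at least two)."""
    return (s.length_bp >= min_length
            and s.read_count >= min_reads
            and s.best_match_pct >= min_match_pct
            and s.replicates_detected >= min_replicates)


def filter_sequences(seqs, **thresholds):
    """Return only the sequences that satisfy every criterion."""
    return [s for s in seqs if passes_filters(s, **thresholds)]


# Hypothetical example records, one failing each criterion in turn.
examples = [
    UniqueSequence("keep", 313, 25, 99.0, 3),
    UniqueSequence("too_short", 120, 500, 99.0, 3),
    UniqueSequence("too_rare", 313, 19, 99.0, 3),
    UniqueSequence("poor_match", 313, 40, 94.9, 3),
    UniqueSequence("one_replicate", 313, 40, 99.0, 1),
]
retained = filter_sequences(examples)
print([s.seq_id for s in retained])
```

Keeping the thresholds as keyword arguments makes it straightforward to test how sensitive the retained set is to each criterion.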

Conclusion
Despite the stringent thresholds, we have recovered a higher diversity of planktonic dinoflagellates (28 MOTUs) than previously reported for Singapore waters (Leong et al. 2015). This was achieved within a short four-month sampling period with minimal pre-sorting and without isolation of specific organisms. The 15-μm plankton net used for capturing microplankton would have omitted even smaller dinoflagellates, since studies have suggested that many open water environments are dominated by small cells in the pico- (0.8-5 μm) and nanoplankton (5-20 μm) ranges (Gin et al. 2000; de Vargas et al. 2015). Therefore, our estimates are conservatively low. Further studies targeting greater ranges of organism size, depth and habitat are likely to detect more taxa, both known and unknown. The large and still expanding DNA sequence database, to which the MOTUs obtained here contribute, will enable more precise matches for DNA metabarcodes locally and in the region.

Figure 1. Sampling stations where plankton was collected for this study between September and December 2015. Arrow represents predominant current direction, stronger during the northeast monsoon (November to March) and weaker during the southwest monsoon (June to September).

Figure 2. Neighbour-joining (NJ) tree showing 18S rRNA sequence relationships amongst dinoflagellates in Singapore (bold) and from GenBank. The 28 molecular operational taxonomic units detected in this study are denoted by prefix 'OTU'. Dinoflagellate orders are represented by coloured branches. Bootstrap supports based on NJ and maximum likelihood (ML) methods (≥50), as well as Bayesian posterior probabilities (≥0.8), are shown as circles at the nodes.

Figure 3. Number of dinoflagellate molecular operational taxonomic units (MOTUs) recovered at each of four sites, inner West Johor Strait (Inner WJS, red), outer West Johor Strait (Outer WJS, orange), Jurong Island (JI, green) and St John's Island (SJI, blue), and in each month of sampling in 2015.