Data Paper
Corresponding author: Ľubomír Rajter ( lubomir.rajter@gmail.com ) Academic editor: Jan Pawlowski
© 2021 Ľubomír Rajter, Micah Dunthorn.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Rajter Ľ, Dunthorn M (2021) Ciliate SSU-rDNA reference alignments and trees for phylogenetic placements of metabarcoding data. Metabarcoding and Metagenomics 5: e69602. https://doi.org/10.3897/mbmg.5.69602
Although ciliates are one of the most dominant microbial eukaryotic groups in many environments, there is a lack of updated global ciliate alignments and reference trees that can be used for phylogenetic placement methods to analyze environmental metabarcoding data. Here we fill this gap by providing reference alignments and trees for those ciliate taxa with available SSU-rDNA sequences derived from identified species. Each alignment contains 478 ciliate and six outgroup taxa; the alignments differ in the masking strategy applied to alignment positions (unmasked, masked, and masked except for the hypervariable V4 region). We constrained the monophyly of the major ciliate groups based on the recently updated classification of protists and on phylogenomic data. Taxa of uncertain phylogenetic position were kept unconstrained, except for Mesodinium species, which we constrained to form a clade with the Litostomatea. These ciliate reference alignments and trees can be used to perform taxonomic assignments of metabarcoding data, discover novel ciliate clades, estimate species richness, and overlay measured ecological parameters onto the phylogenetic placements.
Ciliophora, diversity, environmental sequencing, HTS, protists, 18S
Phylogenetic placement is increasingly used to analyze environmental metabarcoding data. For example, it allows us to retain data that is deeply divergent from taxonomic reference databases (
Ciliates are an ancient clade of microbial eukaryotes that originated sometime in the Proterozoic (
There are five global ciliate alignments and trees suitable for phylogenetic placement methods. All are based on the SSU-rDNA, containing the hypervariable V4 and V9 regions that are usually amplified in metabarcoding studies of ciliates. The first was published by
Here we provide updated ciliate reference alignments and trees designed to be used in phylogenetic placement analyses of environmental metabarcoding data. To do this, we: i) created an SSU-rDNA dataset containing 478 ingroup and six outgroup sequences, ii) made three multiple sequence alignments with different masking of the ambiguously aligned nucleotide positions, and iii) constructed maximum likelihood (ML) trees with constrained monophyly of the main ciliate taxa based on the
We enriched the ciliate nuclear SSU-rDNA alignment from Dunthorn et al. (2014) with subsequently published ciliate sequences from GenBank (
Multiple sequence alignments were built with MAFFT v7.471 (
We created three multiple sequence alignments to compare the effects of various masking strategies (Suppl. material
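The difference between the three masking strategies can be illustrated with a minimal Python sketch. It assumes an alignment held as a dict of equal-length sequences, a hypothetical set of ambiguously aligned column indices, and hypothetical V4 coordinates; the actual mask and V4 boundaries used for the reference alignments are not reproduced here, and the helper names are illustrative only.

```python
# Hypothetical illustration of three masking strategies for one alignment.
# The ambiguous-column set and V4 coordinates below are made up.


def apply_mask(alignment, keep):
    """Return a new alignment with only the columns where keep[i] is True."""
    return {
        name: "".join(base for base, k in zip(seq, keep) if k)
        for name, seq in alignment.items()
    }


def mask_variants(alignment, ambiguous, v4_start, v4_end):
    """Build unmasked, masked, and masked-except-V4 variants.

    ambiguous: set of 0-based column indices judged ambiguously aligned.
    v4_start, v4_end: 0-based, half-open column range of the V4 region.
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    masked_keep = [i not in ambiguous for i in range(length)]
    # Same mask, but every V4 column is retained even if flagged as ambiguous.
    v4_keep = [k or (v4_start <= i < v4_end) for i, k in enumerate(masked_keep)]
    return {
        "unmasked": dict(alignment),
        "masked": apply_mask(alignment, masked_keep),
        "masked_except_v4": apply_mask(alignment, v4_keep),
    }


aln = {"taxonA": "ACGT-ACGTA", "taxonB": "ACGTTAC-TA"}
variants = mask_variants(aln, ambiguous={2, 4, 5, 8}, v4_start=4, v4_end=7)
for label, variant in variants.items():
    print(label, variant)
```

Keeping the V4 columns in one variant preserves the region that is usually amplified in metabarcoding studies, at the cost of retaining some ambiguously aligned positions.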
Phylogenetic trees were inferred from all three alignments using RAxML v8.2.12 (
We forced the monophyly of all ciliate taxa based on the
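Enforcing monophyly in an ML search is typically done with a constraint tree. RAxML v8 accepts a multifurcating constraint tree (option -g) that need not contain every taxon; the sketch below assumes that taxa omitted from it are free to be placed anywhere. It is a hypothetical Python illustration of assembling such a tree from a taxon-to-group table, including folding one group into another (as done here for Mesodinium and the Litostomatea), and is not necessarily how the reference trees were constrained in practice. All tip labels and the function name are made up.

```python
# Hypothetical sketch: build a multifurcating Newick constraint tree in which
# each named group must be monophyletic. Taxa mapped to None are left out,
# so a tool accepting partial constraint trees may place them freely.
from collections import defaultdict


def constraint_newick(assignments, merge=None):
    """assignments: dict tip label -> group name, or None if unconstrained.
    merge: optional dict folding one group into another before grouping."""
    merge = merge or {}
    groups = defaultdict(list)
    for taxon, group in assignments.items():
        if group is None:
            continue  # incertae sedis: not part of the constraint
        groups[merge.get(group, group)].append(taxon)
    clades = []
    for _, taxa in sorted(groups.items()):
        taxa = sorted(taxa)
        # A single-taxon group needs no extra parentheses.
        clades.append(taxa[0] if len(taxa) == 1 else "(" + ",".join(taxa) + ")")
    return "(" + ",".join(clades) + ");"


tips = {
    "Lito_sp1": "Litostomatea",
    "Lito_sp2": "Litostomatea",
    "Mesodinium_sp1": "Mesodinium",
    "Spiro_sp1": "Spirotrichea",
    "Spiro_sp2": "Spirotrichea",
    "Protocruzia_sp1": None,
}
tree = constraint_newick(tips, merge={"Mesodinium": "Litostomatea"})
print(tree)
```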
All ciliate lineages clustered into two primary groups in all our trees. The first group, corresponding to the Postciliodesmatophora clade, comprised the Karyorelictea and Heterotrichea. This branching pattern is supported by previous morphological (
The second group, corresponding to the Intramacronucleata clade, comprised the remaining major ciliate taxa. Intramacronucleata was further split into two clades corresponding to the SAL and CONthreeP groups. SAL contained the Spirotrichea, Armophorea (incl. Cariacotrichea, Muranotrichea, and Parablepharismea), and Litostomatea. Although SAL had only low statistical support in our unconstrained tree, its monophyly is highly supported in phylogenomic studies (
The CONthreeP clade is the most diverse, containing six major ciliate groups: Colpodea, Oligohymenophorea, Nassophorea, Phyllopharyngea, Plagiopylea, and Prostomatea (
We constrained monophyly of the 11 main ciliate taxa based on the
One of these problematic taxa is the Nassophorea. Recent molecular studies have inferred this taxon as non-monophyletic, because its two subtaxa, Microthoracida and Nassulida, have not grouped together (
Spirotrichea represents a species-rich ciliate taxon with some taxonomically questionable members (
The taxon Armophorea is also non-monophyletic in SSU-rDNA phylogenies (
The taxon Cariacotrichea was erected based on environmental sequences from the anoxic deep waters of the Cariaco Basin, Venezuela (
Although most of our classification followed
Despite recent advances in sequencing techniques and the growing amount of available molecular data, the phylogenetic position of some ciliate taxa is still uncertain. More data are still needed for these enigmatic lineages, which we have labelled incertae sedis for now. Below, we discuss these taxa one by one and describe their positions in our reference trees compared with the results of other phylogenetic studies.
Protocruziidae (Protocruzia adherens and Protocruzia contrax) represented a separate lineage within the Intramacronucleata in all our trees (Fig.
Phacodiniidia, represented by Phacodinium metchnikoffi sequences from Korean and Chinese populations, formed a separate lineage within the SAL clade, branching either as sister to Spirotrichea or as sister to Litostomatea, depending on the reference tree. Although Phacodiniidia represents a spirotrichean subgroup according to (
Although Askenasia sp. had a stable phylogenetic position within the Prostomatea in all our reference trees, this taxon is incertae sedis in
Cyclotrichium cyclokaryon and Pseudotrachelocerca trepida were closely related in all our trees and always clustered inside the CONthreeP group. Both taxa are also incertae sedis in
Phylogenetic placement of metabarcoding data allows researchers to ask broader evolutionary and ecological questions. However, without taxonomic expertise on specific taxa, interpreting these placements is difficult. Here we offer suggestions on some of these interpretations for ciliates using our reference alignments and trees. First, the user can determine the taxonomic position of the environmental sequences under investigation, because phylogenetic placement attaches each sequence to a branch of the reference tree. The exact phylogenetic positions for all placed sequences can be retrieved using the taxonomic assignment command in the GAPPA package (
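To make the idea concrete, the following is a simplified Python sketch of turning placements into taxonomic labels. It reads a placement file in the jplace format (placements stored under "p", column order given by "fields", query names under "n" or "nm"), sums the likelihood weight ratios (LWR) of each query per reference clade, and reports the clade with the largest sum. This is not GAPPA's algorithm; the edge-to-clade mapping, the toy data, and the function name are hypothetical, and in practice the labels would be derived from the reference taxonomy files.

```python
# Simplified, hypothetical taxonomic assignment from a jplace file.
# This is NOT GAPPA's algorithm; it only illustrates the idea of summing
# likelihood weight ratios (LWR) per reference clade.
import json
from collections import defaultdict


def assign_queries(jplace, edge_to_clade):
    """Return {query name: (best clade, summed LWR)}.

    edge_to_clade: hypothetical mapping from jplace edge numbers to clades.
    """
    fields = jplace["fields"]  # column order differs between placement tools
    i_edge = fields.index("edge_num")
    i_lwr = fields.index("like_weight_ratio")
    result = {}
    for pquery in jplace["placements"]:
        mass = defaultdict(float)
        for row in pquery["p"]:
            mass[edge_to_clade.get(row[i_edge], "unassigned")] += row[i_lwr]
        best = max(mass, key=mass.get)
        # Names are stored either as "n" (plain) or "nm" ([name, multiplicity]).
        names = pquery.get("n") or [pair[0] for pair in pquery.get("nm", [])]
        for name in names:
            result[name] = (best, round(mass[best], 3))
    return result


example = json.loads("""{
  "tree": "((A:0.1{0},B:0.1{1}):0.05{2},C:0.2{3}){4};",
  "fields": ["distal_length", "edge_num", "like_weight_ratio",
             "likelihood", "pendant_length"],
  "placements": [
    {"p": [[0.01, 0, 0.6, -100.0, 0.02], [0.02, 1, 0.3, -101.0, 0.03],
           [0.01, 3, 0.1, -103.0, 0.05]], "n": ["otu1"]},
    {"p": [[0.05, 3, 0.9, -99.0, 0.4]], "nm": [["otu2", 5]]}
  ],
  "version": 3,
  "metadata": {}
}""")
clades = {0: "Spirotrichea", 1: "Spirotrichea", 2: "Spirotrichea", 3: "Litostomatea"}
print(assign_queries(example, clades))
```

Summing LWR per clade, rather than taking only the single best edge, lets a query whose placement mass is spread over several edges of one clade still be assigned confidently to that clade.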
Second, the user can discover novel ciliate clades. Despite the long history of ciliate research, novel clades outside the main ciliate taxa may still be found in understudied environments such as anoxic deep-sea waters (
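One possible way to shortlist candidate novel lineages is sketched below in Python. It is a hypothetical heuristic, not a method used in this study: it flags queries whose best placement lies on a backbone edge between major clades, or whose pendant branch is longer than a chosen threshold. The backbone edge set, the threshold, and the toy data are illustrative assumptions, and any flagged sequence would still need confirmation by a full phylogenetic analysis.

```python
# Hypothetical heuristic for spotting candidate novel lineages in a jplace
# file: flag queries whose best placement lies on a "backbone" edge (between
# major clades) or hangs off a long pendant branch. Threshold and backbone
# edges are illustrative assumptions, not values from the reference trees.


def flag_candidates(jplace, backbone_edges, max_pendant=0.2):
    fields = jplace["fields"]
    i_edge = fields.index("edge_num")
    i_lwr = fields.index("like_weight_ratio")
    i_pend = fields.index("pendant_length")
    flagged = []
    for pquery in jplace["placements"]:
        best = max(pquery["p"], key=lambda row: row[i_lwr])  # highest-LWR placement
        if best[i_edge] in backbone_edges or best[i_pend] > max_pendant:
            names = pquery.get("n") or [pair[0] for pair in pquery.get("nm", [])]
            flagged.extend(names)
    return flagged


example = {
    "fields": ["edge_num", "likelihood", "like_weight_ratio",
               "distal_length", "pendant_length"],
    "placements": [
        {"p": [[0, -100.0, 0.95, 0.01, 0.02]], "n": ["q1"]},
        {"p": [[3, -120.0, 0.80, 0.01, 0.45]], "n": ["q2"]},
        {"p": [[2, -110.0, 0.70, 0.02, 0.05],
               [0, -111.0, 0.30, 0.01, 0.06]], "n": ["q3"]},
    ],
}
print(flag_candidates(example, backbone_edges={2, 4}))
```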
Third, the user can estimate the total number of ciliate species, both overall and within subtaxa. This is possible using the assign subcommand in the GAPPA package (
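Once each OTU has a taxonomic label, richness per subtaxon can be approximated by counting OTUs at every level of the taxonomic path. The Python sketch below assumes a hypothetical input of semicolon-separated paths with a support value per OTU (this is not GAPPA's output format) and an illustrative support threshold. OTU counts are only a proxy for species numbers and depend on how the OTUs were delimited.

```python
# Hypothetical sketch: count OTUs assigned to each taxon at every level of a
# semicolon-separated taxonomic path, as a rough proxy for richness per subtaxon.
from collections import Counter


def otus_per_taxon(assignments, min_support=0.0):
    """assignments: dict OTU -> (taxonomic path, support value)."""
    counts = Counter()
    for path, support in assignments.values():
        if support < min_support:
            continue  # skip weakly supported assignments
        ranks = path.split(";")
        for depth in range(1, len(ranks) + 1):
            counts[";".join(ranks[:depth])] += 1
    return counts


otus = {
    "otu1": ("Ciliophora;SAL;Spirotrichea", 0.95),
    "otu2": ("Ciliophora;SAL;Litostomatea", 0.80),
    "otu3": ("Ciliophora;CONthreeP;Colpodea", 0.99),
    "otu4": ("Ciliophora;SAL;Spirotrichea", 0.40),
}
for taxon, n in sorted(otus_per_taxon(otus, min_support=0.5).items()):
    print(taxon, n)
```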
Ciliate species diversity has mostly been estimated using morphological differences to define species boundaries.
Fourth, the user can correlate phylogenetic placements with measured environmental parameters using the edge correlation command in GAPPA. In this way, environmental parameters can be combined with phylogenetic placements to identify environmental factors that may influence ciliate community composition.
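The principle behind edge correlation can be illustrated with a short Python sketch: for each edge of the reference tree, the placement mass per sample is correlated (Pearson's r) with a measured parameter across samples. This is not GAPPA's implementation, and the sample names, per-edge masses, and oxygen values are hypothetical; the masses are assumed to be already normalised per sample. A strong correlation only indicates an association, not that the parameter drives community composition.

```python
# Hypothetical sketch of edge-wise correlation between placement mass and an
# environmental parameter. Not GAPPA's implementation; data are made up.
import math


def pearson(x, y):
    """Pearson correlation coefficient; NaN if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def edge_correlations(edge_masses, env):
    """edge_masses: dict sample -> dict edge -> normalised placement mass.
    env: dict sample -> measured parameter value."""
    samples = sorted(env)
    edges = sorted({e for masses in edge_masses.values() for e in masses})
    env_values = [env[s] for s in samples]
    return {
        e: pearson([edge_masses[s].get(e, 0.0) for s in samples], env_values)
        for e in edges
    }


masses = {
    "s1": {0: 0.1, 1: 0.9},
    "s2": {0: 0.2, 1: 0.8},
    "s3": {0: 0.3, 1: 0.7},
}
oxygen = {"s1": 1.0, "s2": 2.0, "s3": 3.0}
for edge, r in edge_correlations(masses, oxygen).items():
    print(edge, round(r, 3))
```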
As phylogenetic placement methods are increasingly used in metabarcoding studies, we have provided here global ciliate reference alignments and trees. We designed and formatted these reference alignments and trees specifically for ciliate diversity research and the demands of phylogenetic placement, as no such updated datasets were available. These files are easily downloadable from the online supplement (Suppl. material
We received funding from the Alexander von Humboldt Foundation to LR, and the Deutsche Forschungsgemeinschaft (grant DU1319/5-1) to MD.
File S1
Data type: pdf file
Explanation note: Description of the relationships within the main ciliate lineages in the reference trees.
File S2
Data type: multiple sequence alignment
Explanation note: Reference alignment masked except for the V4 region, in the FASTA format.
File S3
Data type: multiple sequence alignment
Explanation note: Reference alignment masked except for the V4 region, in the PHYLIP format.
File S4
Data type: NEWICK tree
Explanation note: Reference tree built from the alignment masked except for the V4 region, in the NEWICK format.
File S5
Data type: multiple sequence alignment
Explanation note: Unmasked reference alignment in the FASTA format.
File S6
Data type: multiple sequence alignment
Explanation note: Unmasked reference alignment in the PHYLIP format.
File S7
Data type: NEWICK tree
Explanation note: Reference tree built from the unmasked alignment in the NEWICK format.
File S8
Data type: multiple sequence alignment
Explanation note: Masked reference alignment in the FASTA format.
File S9
Data type: multiple sequence alignment
Explanation note: Masked reference alignment in the PHYLIP format.
File S10
Data type: NEWICK tree
Explanation note: Reference tree built from the masked alignment in the NEWICK format.
File S11
Data type: multiple sequence alignment
Explanation note: Unconstrained unmasked alignment in the FASTA format.
File S12
Data type: multiple sequence alignment
Explanation note: Unconstrained unmasked alignment in the PHYLIP format.
File S13
Data type: NEWICK tree
Explanation note: Phylogenetic tree built from the unconstrained unmasked alignment in the NEWICK format.
File S14
Data type: text file
Explanation note: Input file for the taxonomic assignment function in the GAPPA package; this file contains a tab-separated list of reference taxa with their corresponding taxonomic assignment to the main ciliate lineages or incertae sedis.
File S15
Data type: text file
Explanation note: Input file for the taxonomic assignment function in the GAPPA package; this file contains a tab-separated list of reference taxa with their corresponding taxonomic assignment.
Figure S1
Data type: figure (pdf file)
Explanation note: Reference tree built from the alignment masked except for the V4 region.
Figure S2
Data type: figure (pdf file)
Explanation note: Reference tree built from the unmasked alignment.
Figure S3
Data type: figure (pdf file)
Explanation note: Reference tree built from the masked alignment.
Figure S4
Data type: figure (pdf file)
Explanation note: Reference tree built from the unconstrained unmasked alignment.
Table S1
Data type: Word document
Explanation note: Complete list of the taxa with the sequence entries used in this study.
Table S2
Data type: Word document
Explanation note: Characterization and evolutionary models of the alignments analyzed.