Metabarcoding and Metagenomics :
Research Article
Corresponding author: Nadine Graupner (nadine.graupner@uni-due.de)
Academic editor: Thorsten Stoeck
Received: 27 Jul 2017 | Accepted: 30 Aug 2017 | Published: 20 Sep 2017
© 2017 Nadine Graupner, Jens Boenigk, Christina Bock, Manfred Jensen, Sabina Marks, Sven Rahmann, Daniela Beisser
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Graupner N, Boenigk J, Bock C, Jensen M, Marks S, Rahmann S, Beisser D (2017) Functional and phylogenetic analysis of the core transcriptome of Ochromonadales. Metabarcoding and Metagenomics 1: e19862. https://doi.org/10.3897/mbmg.1.19862
Background
Most protist lineages consist of members with diverging features, e.g. different modes of nutrition and adaptations for life in different habitat types and climatic zones. The nutritional mode is particularly variable in chrysophytes, and they are therefore an excellent model group to study the core genes and metabolic pathways of a functionally diverse lineage. The objective of our study is the identification of the joint genetic repertoire expressed in closely related chrysophytes as well as the extent of variation at the species and strain level. Therefore, we investigated the transcriptomes of six strains belonging to four species of Ochromonadales. We performed analyses at the metabolic pathway level as well as at the sequence level.
Results
We identified 1,574 core genes shared between all six investigated strains of Ochromonadales. Most of these core genes were affiliated with the primary metabolism. Phylogenetic analysis of 166 protein-coding core genes supported a close relation of Poteriospumella lacustris and Poterioochromonas malhamensis and, for more than 50% of the investigated genes, resolved the relationship of strains affiliated with the species P. lacustris. Further, we found diverging phylogenetic patterns for genes interacting with the environment.
Conclusions
In Ochromonadales, a functionally diverse lineage, the core transcriptome represents only a minor part of the individual transcriptomes. But this small fraction of genes comprises the basal metabolism essential for life in several protist lineages. Phylogenetic analyses of these genes indicate a similar degree of conservation as observed for genes coding for ribosomal proteins.
Chrysophyceae, protist, expressed sequence tags (EST), evolutionary ecology, phylotranscriptomics, core genes, metabolic pathways
Organisms and their genomes evolved under persistently fluctuating ecological and evolutionary pressures. As a consequence, eukaryotic genomes and their gene content vary considerably in size ranging from 2,000 to 35,000 genes (
Previous investigations of core genes addressed various research topics, ranging from the identification of the minimal gene set necessary for cellular life (
Further, we ask to what extent gene phylogenies reflect the history, i.e. the phylogeny, of the organism and to what extent these patterns are concealed by ecological adaptation. Recent studies (
We used six strains affiliated with four species within the order Ochromonadales (Chrysophyceae), i.e. Poteriospumella lacustris (JBC07, JBM10, JBNZ41), Poterioochromonas malhamensis (DS), Spumella vulgaris (199hm) and Pedospumella encystans (JBMS11), which were part of the overarching transcriptome study of
We focused on strains affiliated with the order Ochromonadales. For our study we used six strains affiliated with four species: Poteriospumella lacustris, Poterioochromonas malhamensis, Spumella vulgaris and Pedospumella encystans. For further information regarding origin, mode of nutrition and culturing conditions see Table
Species | Poteriospumella lacustris | Poterioochromonas malhamensis | Spumella vulgaris | Pedospumella encystans |
Strain | JBC07, JBM10, JBNZ41 | DS | 199hm | JBMS11 |
18S clade | C3 | C3 | C2 | C1 |
Geographical origin | China, Austria, New Zealand | Austria | Arctic | Austria |
Habitat origin | freshwater | freshwater | freshwater | soil |
Mode of nutrition | heterotroph | mixotroph | heterotroph | heterotroph |
Media | IB + 3g/l nutrient broth, soytone & yeast extract | IB + 3g/l nutrient broth, soytone & yeast extract | IB + bacteria (Listonella pelagia PG5) | IB + bacteria (Listonella pelagia PG5) |
Temperature | 15°C | 15°C | 15°C | 15°C |
Light:dark-cycle | 16 : 8 | 16 : 8 | 16 : 8 | 16 : 8 |
Illumination | 75 - 100 µE | 75 - 100 µE | 75 - 100 µE | 75 - 100 µE |
The transcriptome sequences were generated in an overarching study of 18 chrysophyte strains, published in
Base quality of raw sequence reads was checked using the FastQC software (v0.10.1;
Within the KEGG database genes are associated with orthologous groups and thus assigned to KEGG Orthology (KO) identifiers. In the following we use the term gene for the annotated orthologous gene of the considered transcript.
The assigned KO identifiers were used to determine the core transcriptome, constituted by the intersection of the KO identifier sets between all strains, shared transcripts between several species and exclusive transcripts of single strains using the R package Vennerable (3.0;
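The set logic described here can be illustrated with a minimal sketch. It is not the authors' R workflow: the function name `partition_kos`, the toy strain names and the KO identifiers are hypothetical, and "exclusive" is evaluated per strain for simplicity.

```python
from typing import Dict, Set, Tuple


def partition_kos(
    ko_sets: Dict[str, Set[str]]
) -> Tuple[Set[str], Set[str], Dict[str, Set[str]]]:
    """Split KO identifiers into core, partly shared and exclusive sets.

    ko_sets maps a strain name to the set of KO identifiers found in it.
    """
    n_strains = len(ko_sets)

    # Core: KOs present in every strain (intersection of all sets).
    core = set.intersection(*ko_sets.values())

    # Count in how many strains each KO occurs.
    counts: Dict[str, int] = {}
    for kos in ko_sets.values():
        for ko in kos:
            counts[ko] = counts.get(ko, 0) + 1

    # Partly shared: present in more than one strain, but not in all.
    shared = {ko for ko, c in counts.items() if 1 < c < n_strains}

    # Exclusive: present in exactly one strain.
    exclusive = {
        strain: {ko for ko in kos if counts[ko] == 1}
        for strain, kos in ko_sets.items()
    }
    return core, shared, exclusive


# Toy input (hypothetical identifiers, not data from the study).
toy = {
    "strain_A": {"K00001", "K00002", "K00003"},
    "strain_B": {"K00001", "K00002", "K00004"},
    "strain_C": {"K00001", "K00005"},
}
core, shared, exclusive = partition_kos(toy)
# core is {"K00001"}, shared is {"K00002"},
# exclusive["strain_C"] is {"K00005"}
```

Counting occurrences once and deriving all three groups from the counts avoids computing every pairwise intersection separately.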
Orthologous genes (KOs) assignable to KEGG pathways of the KEGG BRITE functional hierarchy level A, B and C (
Gene expression counts of all six strains were compared. For this purpose, transcript expression values were obtained with the tool eXpress (v1.3.1;
To create sequence alignments, the pairwise alignments to KEGG gene sequences were used as a reference. All transcripts aligning to the same gene were truncated to the minimum overlapping region of at least 100 bp and combined in a FASTA file. The sequence alignments were constructed with the MAFFT software (7.164b;
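One possible reading of the truncation step is sketched below. It assumes that each transcript's hit on the reference gene is given as a 0-based, half-open interval and that the aligned segment is ungapped; both are simplifications, and the function names are hypothetical. Only the 100 bp threshold comes from the text.

```python
from typing import List, Optional, Tuple

MIN_OVERLAP = 100  # minimum length of the common region in bp (from the text)

Interval = Tuple[int, int]  # (start, end) on the reference gene, half-open


def common_region(
    hits: List[Interval], min_len: int = MIN_OVERLAP
) -> Optional[Interval]:
    """Return the region covered by all hits, or None if it is too short."""
    if not hits:
        return None
    start = max(s for s, _ in hits)  # latest start among all hits
    end = min(e for _, e in hits)    # earliest end among all hits
    if end - start < min_len:
        return None
    return start, end


def trim_to_region(aligned_seq: str, hit: Interval, region: Interval) -> str:
    """Cut an ungapped aligned segment down to the common region."""
    hit_start, _ = hit
    reg_start, reg_end = region
    return aligned_seq[reg_start - hit_start:reg_end - hit_start]


# Toy example: three hits on the same reference gene.
hits = [(0, 300), (50, 400), (120, 260)]
region = common_region(hits)                      # (120, 260), i.e. 140 bp
trimmed = trim_to_region("A" * 350, hits[1], region)
too_short = common_region([(0, 150), (100, 300)])  # 50 bp overlap -> None
```

Genes whose common region falls below the threshold return `None` and would simply be skipped before the multiple alignment.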
Data are available at the European Nucleotide Archive (ENA) under accession number PRJEB13662.
Sequencing of the chrysophyte strains resulted in 13.8 to 19.4 million read pairs, which were assembled into 24,783 to 58,003 transcripts (Table
Overview statistics of Poteriospumella lacustris (JBC07, JBM10, JBNZ41), Poterioochromonas malhamensis (DS), Spumella vulgaris (199hm) and Pedospumella encystans (JBMS11) including transcriptome size, assembly quality and annotation.
Parameter | P. lacustris (JBC07) | P. lacustris (JBM10) | P. lacustris (JBNZ41) | P. malhamensis (DS) | S. vulgaris (199hm) | P. encystans (JBMS11) |
No. read pairs (million) | 13.8 | 19.4 | 15.8 | 18.8 | 13.9 | 14.2 |
Reads after quality control (%) | 92.5 | 91.7 | 92.4 | 94.3 | 92.4 | 93 |
No. transcripts | 24,783 | 26,330 | 27,754 | 39,537 | 58,003 | 40,532 |
N50 | 1,155 | 1,246 | 1,275 | 1,405 | 983 | 1,077 |
Remapped reads (%) | 97.3 | 97 | 91.6 | 95.1 | 89.5 | 86.1 |
Estimated no. of protein-coding genes | 20,441 | 20,515 | 21,629 | 30,189 | 38,883 | 28,497 |
No. KEGG hits (E-value < 10⁻⁵) | 6,619 | 6,784 | 7,025 | 9,556 | 10,711 | 9,893 |
No. unique KEGG orthologs | 2,248 | 2,265 | 2,265 | 2,620 | 2,694 | 2,652 |
No. KEGG orthologs assignable to pathways | 1,367 | 1,389 | 1,378 | 1,599 | 1,635 | 1,591 |
No. assigned KEGG pathways | 243 | 246 | 244 | 247 | 259 | 257 |
Estimates based on the number of Trinity components yielded a maximum of 20,441 to 38,883 genes. The transcriptomes of the three strains of Poteriospumella lacustris displayed a similar number of estimated genes (20,441 to 21,629 Trinity components) and similar functional annotations despite diverging sequencing depths. This finding and the completeness of several KEGG modules (main reaction steps in KEGG pathways), e.g. glycolysis, citrate cycle, oxidative phosphorylation and nucleotide biosynthesis, indicated a sufficient sequencing depth in all samples.
Between 45.2% and 56.7% of all predicted genes (Trinity components) could be assigned to a gene in the KEGG database. The removal of redundant hits resulted in 2,249 to 2,695 unique KEGG orthologous genes of which 1,367 to 1,635 could be assigned to 243 to 259 pathways (Table
Functional assignment of non-redundant KEGG orthologous genes of Poteriospumella lacustris (JBC07, JBM10, JBNZ41), Poterioochromonas malhamensis (DS), Spumella vulgaris (199hm) and Pedospumella encystans (JBMS11) to 31 pathway groups of the KEGG categories “Metabolism”, “Genetic Information Processing” (GenInfoPro), “Environmental Information Processing” (EnvInfoPro), “Cellular Processes” and “Organismal Systems”. A All functional hits per pathway group and strain were summarized for the whole transcriptomes, for the core transcriptome of all six strains and for the shared and exclusive genes. B The percentage proportion of genes per pathway group and strain was calculated for the core genes and the shared and exclusive genes.
A PCoA based on presence-absence data of KOs clearly separated all species (Fig.
PCoA based on all identified KEGG orthologous genes of Poteriospumella lacustris (JBC07, JBM10, JBNZ41), Poterioochromonas malhamensis (DS), Spumella vulgaris (199hm) and Pedospumella encystans (JBMS11). A PCoA based on the presence-absence of the respective genes. B PCoA based on the relative expression level of the respective genes.
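A PCoA on presence-absence data starts from a pairwise distance matrix between strains. The sketch below computes such a matrix using the Jaccard distance; the choice of distance measure is an assumption, as the text does not name one, and the strain names and KO identifiers are toy values. The ordination itself (eigendecomposition of the double-centred matrix) is left to a numerical library.

```python
from itertools import combinations
from typing import Dict, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def distance_matrix(ko_sets: Dict[str, Set[str]]) -> Dict[str, Dict[str, float]]:
    """Symmetric matrix of Jaccard distances between all strains."""
    names = sorted(ko_sets)
    dist = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = jaccard_distance(ko_sets[n], ko_sets[m])
        dist[n][m] = d
        dist[m][n] = d
    return dist


# Toy presence-absence data (hypothetical identifiers).
toy = {
    "strain_A": {"K00001", "K00002", "K00003"},
    "strain_B": {"K00001", "K00002", "K00004"},
    "strain_C": {"K00001", "K00005"},
}
dist = distance_matrix(toy)
# dist["strain_A"]["strain_B"] is 0.5 (2 shared of 4 KOs in total)
# dist["strain_A"]["strain_C"] is 0.75 (1 shared of 4 KOs in total)
```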
Genes that were responsible for the grouping of strains were present in several but not in all species (around 800 genes) or exclusive to one species (181 to 307 genes). Poteriospumella lacustris had the lowest number and Spumella vulgaris the highest number of exclusive genes. Most of these partly shared and exclusive genes were affiliated with “Signal transduction”, “Amino acid metabolism”, “Cell growth and death”, “Carbohydrate metabolism”, “Energy metabolism” and “Replication & repair” (Fig.
The pairwise comparison showed that the percentage of shared KEGG orthologous genes (Fig.
Pairwise comparison of the proportion of shared KEGG orthologous genes of Poteriospumella lacustris (JBC07, JBM10, JBNZ41), Poterioochromonas malhamensis (DS), Spumella vulgaris (199hm) and Pedospumella encystans (JBMS11). The values in each matrix element are provided as the percentage of shared KEGG orthologous genes in relation to the number of KEGG orthologous genes in the smaller transcriptome. The absolute number of shared KEGG orthologous genes is given in brackets.
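The normalisation used in this matrix, i.e. shared genes relative to the smaller of the two transcriptomes, can be written as a small helper. This is a sketch with a hypothetical function name and toy identifiers, not the code used in the study.

```python
from typing import Set, Tuple


def shared_percentage(a: Set[str], b: Set[str]) -> Tuple[float, int]:
    """Return (percentage, absolute count) of KOs shared by two strains.

    The percentage is relative to the smaller of the two KO sets, so a
    small transcriptome fully contained in a larger one scores 100%.
    """
    smaller = min(len(a), len(b))
    if smaller == 0:
        raise ValueError("both strains need at least one KO identifier")
    n_shared = len(a & b)
    return 100.0 * n_shared / smaller, n_shared


# Toy example: 3 shared KOs, smaller set has 4 entries.
small = {"K00001", "K00002", "K00003", "K00004"}
large = {"K00001", "K00002", "K00003", "K00005", "K00006", "K00007"}
percent, n_shared = shared_percentage(small, large)
# percent is 75.0 and n_shared is 3
```

Dividing by the smaller set keeps the measure from being penalised by differences in transcriptome size, which vary considerably between the strains in this study.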
Sequence comparisons based on all transcripts (instead of the annotated ones only) revealed that the majority of transcripts were exclusive to one strain. However, the strains of Poteriospumella lacustris shared approximately 50% of their transcripts with each other, which also indicates their close relation. The number of core transcripts was comparable to that of the annotated part.
The core transcriptome derived from the annotated part of the transcriptomes of the investigated strains of Ochromonadales comprised 1,574 KEGG orthologous genes. Of these, we were able to assign 1,017 to one or more KEGG pathways (Fig.
The core transcriptome comprised a large number of genes involved in basic cell metabolism: Roughly 38% of the genes of the core transcriptome were assigned to “Metabolism” and 27% of the genes were assigned to “Genetic Information Processing” (Fig.
Genes of the core transcriptome that were highly expressed in all strains were affiliated with the KEGG category “Genetic Information Processing” and coded for various ribosomal proteins of the large and small ribosomal subunits and for the elongation factor affiliated with RNA transport. Genes such as ubiquitin C, heat shock proteins, calmodulin and solute carrier proteins affiliated with various signaling pathways, as well as the F-type H+-transporting ATPase affiliated with oxidative phosphorylation, were also highly expressed.
Phylogenetic analyses were performed for 166 orthologous genes (Suppl. material
In topology A (Fig.
Main topologies (unrooted phylogenetic trees; values at the nodes indicate statistical support >50% estimated by maximum likelihood method with 1,000 replicates) of 166 investigated KEGG orthologous genes of the core transcriptome of Poteriospumella lacustris (JBC07, JBM10, JBNZ41), Poterioochromonas malhamensis (DS), Spumella vulgaris (199hm) and Pedospumella encystans (JBMS11). For each topology one gene is exemplarily illustrated. A most frequent tree topology (obtained for 76 genes), B second most frequent tree topology (obtained for 36 genes), C third most frequent tree topology (obtained for 19 genes), and D fourth most frequent tree topology (obtained for 13 genes).
The phylogenetic relation between the three strains of Poteriospumella lacustris was resolved by approximately 58% of the investigated genes; approximately 53% of these indicated the closest relationship between the strains JBC07 and JBNZ41. Further, approximately 36% of these genes indicated a close relationship between strain JBM10 and Poterioochromonas malhamensis.
The affiliation to a distinct pathway is known for 117 of the genes included in the phylogenetic analysis (Fig.
Functional assignment of core genes used for phylogeny analyses of Poteriospumella lacustris (JBC07, JBM10, JBNZ41), Poterioochromonas malhamensis (DS), Spumella vulgaris (199hm) and Pedospumella encystans (JBMS11). The KEGG category “Organismal Systems” was not illustrated as no core genes were affiliated to this category.
The comparative transcriptomic approach allowed insights into the genetic repertoire transcribed by Ochromonadales showing that the transcriptomes differed considerably between strains. Between 20,441 and 38,883 estimated protein coding genes were identified. This is in the lower and mid-range of previously reported values for gene estimates based on transcriptomes (
The majority of genes could not be affiliated with a distinct function.
Accordingly, roughly 65% of the identified orthologous genes are part of the core transcriptome in our study. A significant fraction of these core genes was affiliated with the primary metabolism, reflecting the general importance of these metabolic pathways irrespective of phylogeny, nutritional mode and origin. They were affiliated with metabolic pathways including translation and ribosomal biogenesis, transcription and protein processing as well as with pathways affiliated with “Carbohydrate metabolism”, “Lipid metabolism”, “Nucleotide metabolism”, “Amino acid metabolism” and “Signal transduction”. This corresponds well with transcriptomic studies of other taxa which revealed a similar number and affiliation of genes of the core transcriptome with metabolic pathways (prymnesiophytes 1,433 core genes:
We calculated alignment-based phylogenetic trees of the six investigated strains for 166 core genes. The analyses resulted in four different tree topologies. The most frequent and the third most frequent topology confirmed the close relation of Poterioochromonas malhamensis and Poteriospumella lacustris inferred from SSU rRNA gene sequences (both species are members of the C3-clade:
Furthermore, our analyses helped to resolve the relationship between closely related strains affiliated with one species. More than 50% of the investigated protein-coding genes showed sequence variations between the three strains JBC07, JBM10 and JBNZ41, which all belong to the species Poteriospumella lacustris. Earlier studies of these strains based on a multigene phylogeny of the protein-coding genes alpha-tubulin, beta-tubulin and actin (
The present study reveals the interplay of functionality and phylogeny of the core transcriptome of Ochromonadales. We demonstrated that the core transcriptome of Ochromonadales with its 1,574 genes represents only a small proportion of the transcriptomes but comprises the genes affiliated with the primary metabolism. We assume that roughly 1,400 genes represent the basic “active” genetic repertoire of various protist lineages. Furthermore, we performed phylogenetic analyses of 166 protein-coding core genes. Most of the investigated genes coding for ribosomal proteins or affiliated with metabolism confirmed the close relation of Poterioochromonas malhamensis and Poteriospumella lacustris known from SSU rRNA gene phylogenies. Genes interacting with the environment largely show diverging phylogenetic patterns, presumably due to a stronger impact of ecological selection pressures. Furthermore, we demonstrated the strength of comparative transcriptomics for the analysis of intraspecific and interspecific variation. Both orthologous gene content analysis (PCoA) and phylogenetic analyses of several genes led to congruent results regarding the relationships within Ochromonadales, supporting the robustness of our results.
We thank the projects DFG Projekt BO 3245/17 and BO 3245/14 for financial support. Further, we thank Susann Chamrad for technical assistance.
DFG Projekt BO 3245/17 and BO 3245/14.
The authors declare no conflict of interest.
KEGG orthologous genes of the core transcriptome of the herein investigated Ochromonadales (Poteriospumella lacustris strains JBC07, JBM10, JBNZ41; Poterioochromonas malhamensis DS; Spumella vulgaris 199hm; Pedospumella encystans JBMS11) used for phylogenetic analyses.