Software Description
Corresponding author: Alexis Canino (alexis.canino@inrae.fr). Academic editor: Mehrdad Hajibabaei
© 2021 Alexis Canino, Agnès Bouchez, Christophe Laplace-Treyture, Isabelle Domaizon, Frédéric Rimet.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Canino A, Bouchez A, Laplace-Treyture C, Domaizon I, Rimet F (2021) Phytool, a ShinyApp to homogenise taxonomy of freshwater microalgae from DNA barcodes and microscopic observations. Metabarcoding and Metagenomics 5: e74096. https://doi.org/10.3897/mbmg.5.74096
Methods for biomonitoring of freshwater phytoplankton are evolving rapidly with eDNA-based methods, which offer great complementarity with microscopy. Metabarcoding approaches have become more common in recent years, with a continuous increase in the amount of data generated. Depending on the researchers and on how they assign barcodes to species (bioinformatic pipelines and molecular reference databases), the taxonomic assignment obtained for HTS DNA reads may vary. The same is true for traditional taxonomic studies by microscopy, where classification and taxonomy are regularly adjusted.
For these reasons, which lead to non-homogeneous taxonomies, gap analyses and comparisons between studies become even more challenging, and the curation needed to find potential consensus names is time-consuming. Here, we present a web-based application (Phytool), developed as a Shiny app (RStudio), that aims to make taxonomic harmonisation easier and more efficient, using a complete and up-to-date taxonomy reference database for freshwater microalgae. Phytool allows users to homogenise and update freshwater phytoplankton taxonomic names from sequence files and data tables uploaded directly into the application. It also gathers barcodes from curated references in a user-friendly form in which specific organisms can be searched for. All the data provided are downloadable, with the possibility of applying filters to select only the required taxa and fields (e.g. specific taxonomic ranks). The main goal is to make the connection between microscopy, molecular biology and taxonomy accessible to a broad range of users through different ready-to-use functions. This study estimates that only 25% of the freshwater phytoplankton species in Phytobs are associated with a barcode. We plead for an increased effort to enrich reference databases by coupling taxonomy and molecular methods. Phytool should make this crucial work more efficient.
The application is available at https://caninuzzo.shinyapps.io/phytool_v1/
barcoding data, bioinformatics, biomonitoring, freshwater microalgae, metabarcoding, microscopy, phytoplankton, taxonomy, web-based application
Freshwater phytoplankton constitutes a key element in water biomonitoring, and surveys are required by water policies (
The rapid increase in the use of molecular techniques brings a growing number of new DNA sequences (i.e. DNA barcodes) that are generally made available online (e.g. National Center for Biotechnology Information, NCBI https://www.ncbi.nlm.nih.gov/) in libraries (e.g. GenBank,
The proposed application, Phytool, is an innovative tool that enables users to homogenise taxonomic names collected from different types of files: DNA sequences in FASTA format (meeting some conditions, see §III.2.2.1 in the Results section) for molecular biologists and simple dataframes for microscopists or taxonomists. Phytool uses the up-to-date taxonomy of freshwater microalgae as proposed in Phytobs (
Finally, Phytool scripts are open-access and gathered in a user-friendly ShinyApp interface, so that a broad range of users can run analyses easily.
Phytool is a Shiny web application, built with RStudio (v.1.3.959), using the following R packages: BiocManager (v.1.30.10); Biostrings (v.2.58.0); data.table (v.1.14.0); dplyr (v.1.0.5); DT (v.0.17); htmltools (v.0.5.1.1); markdown (v.1.1); readr (v.1.4.0); shiny (v.1.6.0); shinyalert (v.2.0.0); shinybusy (v.0.2.2); shinyjs (v.2.0.0); shinythemes (v.1.2.0); stringr (v.1.4.0); tibble (v.3.1.0).
A user-friendly interface enables users to navigate easily through the different functionalities of Phytool. A complete tutorial is available in video format, providing more details and instructions to facilitate the use of Phytool. This tutorial can also be found directly in Phytool (see “Help” buttons or “About Phytool” tab). The different tab pages in the Phytool navigation bar and how they work are discussed in more detail in the following paragraphs.
Within the Phytool application, the “Homogenise taxonomy” tab allows users to upload files from their computer in order to homogenise and update the taxonomic names they contain. The input files can be FASTA files with DNA sequences (.fasta only) or data tables (.txt; .csv); the specific requirements for each file type are given in the corresponding section on Phytool functionalities (see Results section). The reference used for taxonomic homogenisation is the data table displayed on the main page of the application (“Taxonomic browser” tab). Briefly, the process works as follows: in each row of the uploaded file, the R algorithm looks for the pattern corresponding to both the genus and species names. If the pattern is present in the reference database, the higher-rank taxonomy is replaced (where it differs from the input file) or added (where it is absent from the input file) with the matching taxonomy from the reference list. The binomial name (‘Genus species’) can also be changed if it is not the ‘currently accepted name’: for instance, if a more recent name exists (the name has changed over time) or if the name is unaccepted (i.e. nom. inval.; nom. illeg.; nom. rej.) and can be replaced by an accepted name. If the provided ‘Genus species’ is not found in the Phytool reference list, the name and its associated taxonomic ranks remain unchanged. Checkboxes then allow users: (1) to keep (or not) the ‘old’ taxonomic names when an update occurs; (2) to keep (or not) only the taxa matching the reference list during the homogenisation process (i.e. taxa present in Phytobs, which restricts the output to freshwater phytoplankton). An additional file (logfile) is also created and can be downloaded at the end of the process. It tracks the following events: ‘Genus not found’; ‘Genus_species not found’ and ‘Current accepted name change’ (see Figure
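The matching logic described above can be illustrated with a minimal Python sketch. It is not the R code used in Phytool: the reference table, the synonym map, the taxon names (which are fictional) and the rule used to tell ‘Genus not found’ from ‘Genus_species not found’ are all assumptions made for illustration.

```python
# Hypothetical reference data; names and ranks are fictional placeholders.
REFERENCE = {
    "Exemplaria viridis": {"phylum": "Phylum_A", "class": "Class_A"},
}
# Hypothetical map from unaccepted or outdated names to accepted names.
SYNONYMS = {"Oldname viridis": "Exemplaria viridis"}


def homogenise(record):
    """Return (updated_record, log_messages) for one input row.

    `record` is a dict with at least a 'Genus_species' key.
    """
    name = record["Genus_species"]
    log = []

    # Replace the name by its accepted form when one is known.
    accepted = SYNONYMS.get(name, name)
    if accepted != name:
        log.append("Current accepted name change")

    if accepted not in REFERENCE:
        # Assumed rule: report the genus if it is unknown, else the binomial.
        known_genera = {n.split()[0] for n in list(REFERENCE) + list(SYNONYMS)}
        genus = name.split()[0]
        if genus not in known_genera:
            log.append("Genus not found")
        else:
            log.append("Genus_species not found")
        return dict(record), log  # name and ranks remain unchanged

    updated = dict(record)
    updated["Genus_species"] = accepted
    # Ranks from the reference replace differing ranks and fill in absent ones.
    updated.update(REFERENCE[accepted])
    return updated, log
```

For example, `homogenise({"Genus_species": "Oldname viridis", "phylum": "Phylum_Z"})` would return the row with the accepted name and the reference ranks, together with a single log entry for the name change.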
To date, the molecular data added in Phytool v.1.0 come from curated reference barcoding libraries only. These are represented in Table
Genetic markers used for Phytool barcode libraries and their original databases.
Genetic marker | Sequence libraries used | Reference |
---|---|---|
Prokaryotic small subunit ribosomal rRNA 16S | Silva_138.1 | Quast et al. ( |
 | PR2 | Guillou et al. ( |
 | PhytoRef | |
Eukaryotic small subunit ribosomal rRNA 18S | Silva_138.1 | Quast et al. ( |
Prokaryotic large subunit ribosomal rRNA 23S | Silva_138.1 | Quast et al. ( |
 | µgreen-db | |
After being downloaded from the web, the collected sequences (FASTA format) were re-arranged (in a Linux terminal) so as to be comparable (identical FASTA format with the same taxonomic ranks). A curation process (schematic shown in Figure
The Phytool application is available online at the following address: https://caninuzzo.shinyapps.io/phytool_v1/. It allows free access with a user-friendly interface (how it works is explained in more detail below). The number of taxa per phylum and per barcode (16S, 18S, 23S) gathered in Phytool is summarised in Figure
As shown by the pie chart (Figure
Summary of the number of sequences resulting from the curation process of the different libraries and genetic markers.
Reference libraries | SSU16S: Silva 138.1 | SSU16S: PhytoRef | SSU16S: PR2 | SSU18S: Silva 138.1 | LSU23S: Silva 138.1 | LSU23S: µgreen-db |
---|---|---|---|---|---|---|
Phytoplankton sequences found | 49224 | 9190 | 5992 | 8562 | 13160 | 2326 |
Sequences kept after curation process made within the libraries | 3175 | 892 | 603 | 4191 | 613 | 732 |
Sequences kept after curation process made between the libraries | 2670 | 766 | 201 | 4191 | 253 | 687 |
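The effect of the curation steps can be quantified from the counts in the table above. The following Python sketch computes the percentage of sequences retained per library and the final number of sequences per marker. It assumes the marker-to-library grouping given in the table of genetic markers (16S: Silva, PhytoRef, PR2; 18S: Silva; 23S: Silva, µgreen-db); the function names are illustrative.

```python
# Counts copied from the table above, as
# (sequences found, kept after within-library curation, kept after between-library curation).
# The marker assigned to each library is an assumption based on the genetic-marker table.
COUNTS = {
    ("SSU16S", "Silva 138.1"): (49224, 3175, 2670),
    ("SSU16S", "PhytoRef"): (9190, 892, 766),
    ("SSU16S", "PR2"): (5992, 603, 201),
    ("SSU18S", "Silva 138.1"): (8562, 4191, 4191),
    ("LSU23S", "Silva 138.1"): (13160, 613, 253),
    ("LSU23S", "µgreen-db"): (2326, 732, 687),
}


def retained_percent(found, kept):
    """Percentage of the sequences found that remain after curation."""
    return 100.0 * kept / found


def kept_per_marker(counts):
    """Sum the final (between-library) counts for each genetic marker."""
    totals = {}
    for (marker, _library), (_found, _within, between) in counts.items():
        totals[marker] = totals.get(marker, 0) + between
    return totals


if __name__ == "__main__":
    for (marker, library), (found, _within, between) in COUNTS.items():
        print(f"{marker} {library}: {retained_percent(found, between):.1f}% retained")
    print(kept_per_marker(COUNTS))
```

Under this grouping, the final libraries contain 3637 sequences for 16S, 4191 for 18S and 940 for 23S.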
(A) Barplots representing the number of taxa by phyla in Phytobs and the different barcode libraries gathered in Phytool. (B) Pie chart showing the proportion of taxa in Phytobs having (or not) a barcode (they can also have multiple barcodes, but this information is not detailed in this Figure). (C) Proportion of identified (blue) and unidentified (pink) species constituting the three barcode reference libraries in Phytool.
The sections below describe the interactive functionalities of the Phytool application, which are available through different tabs.
III.2.1. Taxonomic browser
The “Taxonomy browser” tab enables users to display and download the different species registered in Phytool and to check whether DNA barcodes are available in the reference barcode libraries. An interactive table enables users to choose amongst different fields: the taxonomic ranks of the species, their potential synonym (i.e. another species name that is no longer accepted and refers to the currently accepted ‘Genus_species’) and the different barcodes implemented in Phytool (SSU16S; SSU18S and LSU23S). Ticking a checkbox on the left panel displays the associated column in the table; it is also possible to select rows by clicking directly on them within the table (click again to remove the selection).
The different fields are searchable in order to target species or lineages easily; finding a pattern within the complete table is also possible through the search input at the top-right of the table. Finally, the download buttons on the left panel allow the download of the complete table (with the currently selected fields) or of the current selection only (selected fields and selected rows). The second option is possible only if at least one row is selected (which makes the button clickable).
III.2.2. Homogenise taxonomy
The “Homogenise taxonomy” tab is a key functionality of Phytool, which allows users to homogenise (and update) the taxonomy in their own files. This can be done on FASTA files with DNA sequences or on data tables with taxonomy. The homogenisation process is restricted to freshwater microalgae present in Phytobs (or related species). The tab provides a Help button that gives guidelines through a video tutorial, two buttons for selecting the input file according to its format (FASTA or dataframe) and a submit button (disabled until a file is chosen). The input file must meet some prerequisites for the pattern recognition process to work. These conditions depend on the type of data uploaded (see following subsections); however, whatever the input file selected, its size should not exceed 100 MB. If the prerequisites are not met and/or the input file contains issues, the process will not work and an error message will be displayed.
III.2.2.1. Uploading DNA sequences
The sequence files should be in FASTA format, with each sequence on a single row (not spread over multiple rows, as is often the case in FASTA files). If this is not the case, the ‘rearrange FASTA format’ tool provided in Phytool can be used to convert the file into the appropriate format (more details in §III.2.4). The field delimiters in the identifier lines should be semicolons (“;”) or tabulations (“\t”). Other kinds of delimiters are not accepted, but they can easily be replaced with the ‘rearrange FASTA delimiters’ tool, also provided in Phytool (“Other tools” tab, more details in §III.2.4). Another essential point is to ensure that identifier lines end with the “Genus species” name. Finally, users need to watch for problems such as empty lines at the beginning or end of files, or inappropriate lines in FASTA files, which will lead to errors when using the application.
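These prerequisites can be checked before uploading a file. The following Python sketch is a hypothetical pre-upload check, not part of Phytool: the function name and the regular expression used to recognise a “Genus species” name (a capitalised genus followed by a lower-case epithet) are assumptions.

```python
import re

# Assumed shape of a binomial name: capitalised genus, space, lower-case epithet.
BINOMIAL = re.compile(r"[A-Z][a-z]+ [a-z-]+")


def check_fasta(lines):
    """Return a list of problems found in a FASTA file given as a list of lines."""
    problems = []
    previous_was_sequence = False
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            problems.append(f"line {number}: empty line")
            continue
        if line.startswith(">"):
            # Only ';' and tab are accepted as field delimiters.
            last_field = re.split(r"[;\t]", line[1:])[-1].strip()
            if not BINOMIAL.fullmatch(last_field):
                problems.append(
                    f"line {number}: identifier does not end with 'Genus species'"
                )
            previous_was_sequence = False
        else:
            if previous_was_sequence:
                problems.append(f"line {number}: sequence spread over several rows")
            previous_was_sequence = True
    return problems
```

A header using an unsupported delimiter (e.g. “|”) is reported by the same test, because its last field is then no longer a bare “Genus species” name.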
III.2.2.2. Uploading data tables
Prerequisites for data tables are less constraining than for FASTA files. The data table only needs to contain a field called “Genus_species” in the header, in which the algorithm will look for patterns. Field delimiters can be semicolons (“;”) or tabulations (“\t”) and can be specified when uploading the file. The table needs to be in an acceptable format (i.e. readable as a data.frame in R).
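As an illustration of these prerequisites, the following Python sketch reads a delimited table and rejects it when the “Genus_species” field is missing. The function name and error messages are hypothetical, and the sketch uses Python's standard `csv` module rather than the R functions used by Phytool.

```python
import csv
import io


def read_taxa_table(text, delimiter=";"):
    """Parse a delimited table and return its rows as a list of dicts.

    Raises ValueError if the delimiter is unsupported or if the header
    has no 'Genus_species' field.
    """
    if delimiter not in (";", "\t"):
        raise ValueError("delimiter must be ';' or a tabulation")
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames is None or "Genus_species" not in reader.fieldnames:
        raise ValueError("missing 'Genus_species' field in the header")
    return list(reader)
```

For instance, `read_taxa_table("Sample;Genus_species\nS1;Exemplaria viridis\n")` returns one row whose “Genus_species” value is the (fictional) name “Exemplaria viridis”.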
III.2.2.3. Output files
After the taxonomic homogenisation has run, two download buttons appear: one for downloading the input file with homogenised taxonomy and one for downloading a logfile of the process. Additional checkboxes let users choose the content of the output file (by default, no checkbox is selected), and the options can be combined to obtain the desired output format. Users can choose to keep homogenised taxa only; in that case, other taxa (i.e. those not matching Phytobs) will not be included in the output file. In addition to the updated taxonomic name, users can also choose to keep the initial taxonomic name, which will be provided in an additional field (e.g. Genus_species). Whatever the choices made with the checkboxes, the logfile remains the same and tracks information such as “Genus not found”, “Genus_species not found” and “Current accepted name”.
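How the two checkboxes could combine is sketched below in Python. This is an illustration only: the row structure (`original`, `updated`, `matched`), the option names and the name of the additional field (`old_Genus_species`) are hypothetical and do not come from Phytool.

```python
def build_output(rows, keep_matched_only=False, keep_old_names=False):
    """Build output rows from homogenisation results.

    Each input row is a dict with hypothetical keys:
    'original' (input name), 'updated' (name after homogenisation)
    and 'matched' (True if the taxon was found in the reference list).
    """
    output = []
    for row in rows:
        if keep_matched_only and not row["matched"]:
            continue  # drop taxa that do not match the reference list
        out = {"Genus_species": row["updated"]}
        if keep_old_names:
            # Keep the initial name in an additional field (hypothetical name).
            out["old_Genus_species"] = row["original"]
        output.append(out)
    return output
```

Because the log is produced by the homogenisation step itself, none of these options would change it; they only affect which rows and fields are written to the output file.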
III.2.3. Barcode reference libraries
The “Barcode libraries” tab displays the three barcode reference libraries with the barcodes gathered in Phytool, which are (as a reminder) the prokaryotic and eukaryotic small subunit ribosomal RNA (16S and 18S, respectively) and the prokaryotic large subunit ribosomal RNA (23S). After one of the three barcode reference libraries has been selected, the interactive table works in the same way as in the “Taxonomy browser”. Amongst the different selectable (and searchable) fields provided, the original barcode reference library in which the sequence was found is available, as well as its original ID number. The different taxonomic ranks, homogenised with Phytool, the potential synonyms and the size of the sequences (in base pairs) are also provided. Users can choose to download the complete database or just a selection.
III.2.4. Other tools
Two functions have been implemented within the ‘Other tools’ tab:
the first one, “rearrange FASTA format”, transforms a FASTA file in which each sequence is spread over multiple rows into a FASTA file in which each sequence occupies a single row. The input FASTA file (with sequences spread over multiple rows) needs to be uploaded (its size should not exceed 100 MB). The submit button then becomes clickable, the rearrangement process is launched and a download button appears to save the transformed FASTA file.
the second one, “rearrange FASTA delimiters”, modifies the delimiter present in the identifier lines (starting with “>”) of a FASTA file. After the FASTA file has been uploaded, the original delimiter (to be replaced) and the new delimiter (desired) can be provided. To use this function, follow the recommendations given for “rearrange FASTA format”.
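Both transformations are simple text operations, as the following Python sketch shows. It is an illustration of the two operations described above, not the code used in Phytool; the function names mirror the tool names, and the handling of empty lines and of sequence lines preceding the first identifier is an assumption.

```python
def rearrange_fasta_format(lines):
    """Join sequences spread over several rows so that each record becomes
    one identifier line followed by one sequence line."""
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # assumption: empty lines are skipped
        if line.startswith(">"):
            if header is not None:
                records += [header, "".join(chunks)]
            # Assumption: sequence rows before the first identifier are dropped.
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        records += [header, "".join(chunks)]
    return records


def rearrange_fasta_delimiters(lines, old, new):
    """Replace the delimiter `old` by `new` in identifier lines only."""
    return [
        line.replace(old, new) if line.startswith(">") else line
        for line in lines
    ]
```

Replacing the delimiter in identifier lines only leaves the sequence rows untouched, even if they happen to contain the same character.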
The application described in this paper is the first release of Phytool. It is an innovative tool that simplifies some routine and time-consuming computing tasks for people working on freshwater phytoplankton. It aims to provide a common base for users, allowing better comparability between studies, whatever the methodology used. Moreover, it gathers barcodes from different reference libraries, which have undergone an additional curation and can be downloaded in the format desired by users. Finally, some functionalities are provided to reformat DNA sequence files (FASTA), which can be useful, especially for non-programmers.
Although it has been thoroughly tested, some issues may still occur. In case of issues/bugs, we encourage users to report them as explained in the tab “About Phytool”, in order to improve the application.
The next release will mainly focus on enriching the barcode reference libraries through manual curation of the sequences from reference libraries that were rejected in this first release of Phytool. New barcodes will be added to Phytool in the future and will also be deposited in the NCBI library. The project in which the current application was developed focuses on the development of eDNA tools applicable to phytoplankton biomonitoring. We therefore selected specific barcodes within the two marker genes (rRNA16S and rRNA23S) that allow us to target the entire freshwater phytoplankton community. These barcodes will thus be enriched in the next releases of the application. Users who want to contribute to the enrichment process (for the same barcodes or other ones) are welcome to participate. Former versions will not be erased, but will remain accessible in order to preserve traceability (especially for the taxonomic updates, which evolve through time). New functionalities that are widely used in bioinformatics are expected to be implemented in the next releases, such as the possibility of conducting in silico PCR on a selection of sequences. Other ideas can be found in the “Future perspectives” tab, and ideas or suggestions from users are more than welcome, as Phytool is intended to be a participative web-based application to help people working on freshwater phytoplankton.
This tool was developed in the framework of the project PhytoDOM funded by the OFB (Office Français de la Biodiversité) and the pôle INRAE/OFB ECLA (ECosystèmes LAcustres).