Software Description
Corresponding author: Dominik Buchner (dominik.buchner524@googlemail.com). Academic editor: Dirk Steinke
© 2020 Dominik Buchner, Florian Leese.
This is an open access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Citation:
Buchner D, Leese F (2020) BOLDigger – a Python package to identify and organise sequences with the Barcode of Life Data systems. Metabarcoding and Metagenomics 4: e53535. https://doi.org/10.3897/mbmg.4.53535
DNA metabarcoding workflows produce hundreds to tens of thousands of Operational Taxonomic Units (OTUs) or Exact Sequence Variants (ESVs) per analysis. Most workflows require a taxonomic assignment of these sequences, which is typically done using publicly available databases. Especially, yet not exclusively, for Eumetazoan metabarcoding, the Barcode of Life Data system (BOLD) is the most comprehensive and curated reference barcode database and, therefore, typically the first choice for taxonomic assignment. While an application programming interface (API) exists to query data in large batches, it returns no information on the many important unpublished records. The alternative approach, the BOLD identification engine on the website, provides full access but is restricted to 100 sequences at once. We developed BOLDigger, a small, platform-independent software package with a graphical user interface (GUI), which aims to solve this problem by automatically sending successive requests of up to 100 sequences without exceeding the capacities of BOLD. BOLDigger can download the results of the identification engine as well as metadata for the obtained hits. Three different methods are implemented for selecting the best-fitting hit, including a new approach that combines a threshold-based assignment with the metadata.
metabarcoding, species identification, BOLD, OTUs, taxonomic assignment, database
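The batching idea described above (successive requests of at most 100 sequences, spaced out so the server is not overloaded) can be sketched in a few lines of Python. This is a minimal sketch assuming the sequences are already in memory as (ID, sequence) pairs; `submit` is a hypothetical placeholder for a single request to the identification engine, not a BOLD or BOLDigger function, and the one-second pause is an arbitrary illustrative value.

```python
import time
from typing import Callable, Iterator, List, Tuple

# Hypothetical record type: (sequence ID, DNA sequence)
Record = Tuple[str, str]

MAX_BATCH = 100  # per-request limit of the BOLD identification engine


def batches(records: List[Record], size: int = MAX_BATCH) -> Iterator[List[Record]]:
    """Yield successive slices of at most `size` records."""
    if size < 1:
        raise ValueError("size must be positive")
    for start in range(0, len(records), size):
        yield records[start:start + size]


def to_fasta(batch: List[Record]) -> str:
    """Join one batch into a single FASTA string."""
    return "\n".join(f">{name}\n{seq}" for name, seq in batch)


def submit_all(records: List[Record],
               submit: Callable[[str], str],
               pause_s: float = 1.0) -> List[str]:
    """Send the batches one after another, pausing between requests.

    `submit` is a hypothetical placeholder for whatever performs one
    request; it is not part of BOLD or BOLDigger.
    """
    results = []
    for i, batch in enumerate(batches(records)):
        if i > 0:
            time.sleep(pause_s)  # throttle successive requests
        results.append(submit(to_fasta(batch)))
    return results
```

With 250 OTUs, for instance, `batches` yields three requests of 100, 100 and 50 sequences, so no single request exceeds the limit.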
DNA metabarcoding is a cost- and time-effective method to assess species diversity of bulk or environmental samples (
The BOLD Identification System (IDS) can be used to identify an unknown query sequence via the website or the provided (fast) API by tracing and returning the nearest neighbours to the query sequence from a global alignment of all reference sequences (
Sequence similarity thresholds are used for taxonomic assignment across all domains of life (
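To illustrate how a similarity threshold limits the depth of a taxonomic assignment, the sketch below truncates a hit's taxonomy to the lowest rank its percent identity supports. The cut-off values and the rank list are illustrative assumptions chosen for this sketch; they are not stated in this article and are not necessarily the values used by BOLDigger.

```python
from typing import Dict, Optional

# Illustrative cut-offs (percent identity -> lowest supported rank).
# These numbers are assumptions for this sketch only.
THRESHOLDS = [
    (98.0, "species"),
    (95.0, "genus"),
    (90.0, "family"),
    (85.0, "order"),
]
RANKS = ["phylum", "class", "order", "family", "genus", "species"]


def deepest_rank(similarity: float) -> str:
    """Return the lowest rank a hit with this similarity may be assigned to."""
    for cutoff, rank in THRESHOLDS:
        if similarity >= cutoff:
            return rank
    return "class"  # below every cut-off: keep only coarse ranks


def truncate_taxonomy(taxonomy: Dict[str, Optional[str]],
                      similarity: float) -> Dict[str, Optional[str]]:
    """Blank out every rank below the one supported by the similarity."""
    keep_up_to = RANKS.index(deepest_rank(similarity))
    return {rank: (taxonomy.get(rank) if i <= keep_up_to else None)
            for i, rank in enumerate(RANKS)}
```

For example, a hit to *Baetis rhodani* at 96.3 % identity would, under these assumed cut-offs, be reported only down to the genus *Baetis*.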
The presented Python package BOLDigger aims to serve as an interface for species identification, for downloading additional data and for organising these data. As a platform-independent, open-source tool, it can be used to collect IDS results from BOLD, including private and early-release data. It also retrieves additional data for all public references in the dataset and implements a more reliable way to determine the top hit by combining a threshold-based approach with the additional information provided by BOLD. To improve user-friendliness, BOLDigger comes with a GUI (Fig.
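The exact selection rule is not spelled out in this section, so the following is only a hypothetical sketch of one way a threshold-based selection could be combined with metadata. Hits in the same threshold band as the best hit are pooled, the most frequent taxon wins, and ties go to the taxon backed by more hits that carry metadata. The `Hit` fields, the cut-offs and the tie-breaking rule are all assumptions, not BOLDigger's documented behaviour.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# Illustrative percent-identity cut-offs (assumption, not from the article).
CUTOFFS = [98.0, 95.0, 90.0, 85.0]


@dataclass
class Hit:
    """Hypothetical hit record; field names are illustrative only."""
    similarity: float       # percent identity to the query
    taxon: Optional[str]    # taxon name reported for the hit
    has_metadata: bool      # e.g. a public record with downloadable metadata


def band(similarity: float) -> int:
    """Index of the first cut-off reached (len(CUTOFFS) if none is reached)."""
    for i, cutoff in enumerate(CUTOFFS):
        if similarity >= cutoff:
            return i
    return len(CUTOFFS)


def pick_top_hit(hits: List[Hit]) -> Optional[str]:
    """Pick a taxon from the hits that share the best hit's threshold band."""
    named = [h for h in hits if h.taxon]
    if not named:
        return None
    best_band = band(max(h.similarity for h in named))
    pool = [h for h in named if band(h.similarity) == best_band]
    frequency = Counter(h.taxon for h in pool)
    with_metadata = Counter(h.taxon for h in pool if h.has_metadata)
    # Most frequent taxon first; ties go to the taxon with more metadata support.
    return max(frequency, key=lambda t: (frequency[t], with_metadata[t]))
```

Under this rule, a hit below the best band never competes, however many copies of it exist, and two equally frequent candidates in the top band are separated by their metadata support.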
The Python package BOLDigger (version 1.1.5) is available from the Python Package Index (PyPI) at https://pypi.org/project/boldigger/. It can be installed with the Python package installer (pip) using the command pip install boldigger. If both Python 2 and Python 3 are installed on the operating system, the pip belonging to Python 3 has to be used (pip3 install boldigger). All operating systems (Windows, Linux and macOS) are supported, as long as Python 3 is installed. After installation, BOLDigger can be started from the command line with the command boldigger. Updates can be downloaded and installed automatically with the command pip install --upgrade boldigger. Further information about installation, the current version and troubleshooting is provided on the GitHub repository page (https://github.com/DominikBuchner/BOLDigger).
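Running pip as a module of a specific interpreter (`python3 -m pip ...`) is a standard way to sidestep the pip/pip3 ambiguity when several Python versions are installed. The helper names below are hypothetical; the sketch only builds such a command for the interpreter running the script and reports the installed BOLDigger version via `importlib.metadata` (available from Python 3.8).

```python
import sys
from importlib import metadata
from typing import List, Optional


def pip_command(package: str = "boldigger", upgrade: bool = False) -> List[str]:
    """Build an install command that runs pip as a module of this interpreter."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("--upgrade")
    cmd.append(package)
    return cmd


def installed_version(package: str = "boldigger") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

Passing the result to `subprocess.run(pip_command(upgrade=True), check=True)` would then upgrade BOLDigger for exactly the interpreter that executes the script.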
BOLDigger comes with a GUI for easy operation (Fig.
BOLDigger is a platform-independent GUI software package that allows users to query metabarcoding data against the BOLD sequence database in a simple fashion. It facilitates data analysis and provides alternative approaches for the assignment of the best hit.
Title: BOLDigger – a Python package to identify and organise sequences with the Barcode of Life Data systems
Study area description: Metabarcoding, eDNA, Barcoding, Biomonitoring
Download page: https://pypi.org/project/boldigger/
Programming language: Python 3
Licence: MIT Licence
Conceived and designed the study: DB; Wrote the Python package: DB; Wrote the paper: DB, FL
We thank the leeselab members, especially Arne J. Beermann and Till-Hendrik Macher, for comments and feedback on the programme. We thank the BOLD support team for support with respect to the data mining via the IDS. Till-Hendrik Macher kindly designed the BOLDigger logo. FL is a member of and supported by COST Action DNAqua-Net (CA15219).