Biodiversity Informatics

Through our biodiversity data mobilisation activities, we encourage researchers to use Biodiversity Informatics tools and standards. We have recently opened an online forum, which you can join to discuss biodiversity informatics. The tools and standards we promote include:

 

BIODIVERSITY INFORMATICS TOOLS

 

  • INTEGRATED PUBLISHING TOOLKIT

The Integrated Publishing Toolkit (IPT) is a tool developed by GBIF for publishing biodiversity data to the GBIF network. It supports several input formats (databases, Microsoft Excel, text files) and helps map their structure to the Darwin Core standard. The IPT also lets you edit the metadata describing the dataset and merge the content and its description into a single file called a Darwin Core Archive. The Belgian Biodiversity Platform shares information on IPTs, their added value and how they work with any interested party. We also provide technical support for installing IPTs, and we can host them. For instance, we currently host the BioFresh and AntaBIS IPTs.
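To illustrate what mapping a source structure to Darwin Core involves, here is a minimal Python sketch. It is not how the IPT is implemented. The source column names and the sample row are made up for this example; the target names are Darwin Core terms.

```python
import csv
import io

# Hypothetical source headers (left) mapped to Darwin Core term names (right).
COLUMN_TO_DWC = {
    "species": "scientificName",
    "date": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
    "collector": "recordedBy",
}

# Made-up sample data; the "notes" column has no mapping and is dropped.
SOURCE = (
    "species,date,lat,lon,collector,notes\n"
    "Bufo bufo,2014-05-02,50.85,4.35,J. Janssens,roadside pond\n"
)


def translate_headers(source_text: str) -> str:
    """Rewrite a CSV so that its mapped columns carry Darwin Core term names."""
    reader = csv.DictReader(io.StringIO(source_text))
    kept = [name for name in reader.fieldnames if name in COLUMN_TO_DWC]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([COLUMN_TO_DWC[name] for name in kept])
    for row in reader:
        writer.writerow([row[name] for name in kept])
    return out.getvalue()


if __name__ == "__main__":
    print(translate_headers(SOURCE))
```

In the IPT itself this mapping is done through the web interface rather than in code; the sketch only shows the idea of pairing each source column with a standard term.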

 

  • GEOGRAPHIC TOOLS

Geographic tools allow us to customise maps to showcase biodiversity data. For instance, the map of the Belgian Data Portal was created from a map freely available on OpenStreetMap. We recommend that Belgian scientists use the following geographic tools:

  • Geographic Information Systems (GIS) are systems designed to capture, store, manipulate, analyse, manage and visualise all types of geographically referenced data. The basic principle of a GIS is to assemble different sets of spatial data, which can be treated as separate 'data layers' that can be overlaid and combined if they use the same geographic coordinate system. We prefer freely available open source GIS software such as Quantum GIS (QGIS), DIVA-GIS or Carto (cloud-based GIS).

  • Spatial extensions for database management systems let you add spatial data to your database and process its geographic content. In practice, geographic queries can be run on the spatial data through SQL (Structured Query Language) statements. These tools let you handle spatial features within your relational database, and also make it readable by GIS software, which can be used alongside it. All data published by the Belgian Biodiversity Platform are stored in the PostgreSQL database management system with the PostGIS spatial extension.
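As an illustration of a geographic query expressed in SQL, the sketch below selects the occurrences that fall inside a bounding box. It uses Python's built-in sqlite3 module with plain coordinate comparisons as a stand-in, so that it runs without a database server. With PostGIS the same question would normally be asked of a geometry column using spatial functions such as ST_MakeEnvelope and ST_Within. The table, columns and records are made up for this example.

```python
import sqlite3


def demo_connection():
    """Create an in-memory table of made-up occurrence points."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE occurrence ("
        "scientific_name TEXT, longitude REAL, latitude REAL)"
    )
    conn.executemany(
        "INSERT INTO occurrence VALUES (?, ?, ?)",
        [
            ("Bufo bufo", 4.35, 50.85),             # near Brussels
            ("Rana temporaria", 5.57, 50.63),       # near Liège
            ("Lissotriton vulgaris", 2.35, 48.86),  # near Paris
        ],
    )
    return conn


def occurrences_in_bbox(conn, min_lon, min_lat, max_lon, max_lat):
    """Return the names of occurrences located inside the bounding box."""
    rows = conn.execute(
        """
        SELECT scientific_name
        FROM occurrence
        WHERE longitude BETWEEN ? AND ?
          AND latitude BETWEEN ? AND ?
        ORDER BY scientific_name
        """,
        (min_lon, max_lon, min_lat, max_lat),
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    # Rough bounding box around Belgium (approximate values).
    print(occurrences_in_bbox(demo_connection(), 2.5, 49.5, 6.4, 51.5))
```

A rectangle test on raw coordinates is only a rough approximation; a spatial extension adds real geometry types, spatial indexes and coordinate system handling, which is why it is preferred for production data.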

 

  • DATA CLEANING TOOLS

Data cleaning is one of the steps required when handling biodiversity data; its aim is to check data quality. Tools used in the data cleaning process aim to detect omissions, typographic errors, convention errors and coherence errors. In this regard, we recommend Darwin Test, a software application that checks and validates records from tables in DarwinCore or DarwinCore Archive format. We also use and promote other freely available tools that can help you manage and clean your data, such as relational database management systems (RDBMS) and OpenRefine.

  • An RDBMS is software that helps you build, manage and use a relational database, which is a set of data distributed among formally defined tables that can be related to one another. It can be thought of as a set of cross-referenced spreadsheets in which the data can be easily accessed and reorganised without changing its initial structure. The standard language used to manipulate data within a relational database is Structured Query Language (SQL), and the fields used to link records across different tables are called ‘keys’. We tend to use relational databases and their freely available open source management systems, such as PostgreSQL and SQLite, to manipulate datasets, screen their contents and spot potential inconsistencies.
  • OpenRefine is a free and open source tool for exploring and cleaning messy datasets, or converting them to another format. The tool is powerful, well documented and easy to handle, and it can be extended with extensions and web services. For example, OpenRefine lets you use regular expressions or clustering functions to spot syntax and encoding errors and to harmonise the contents of your tables.
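The two kinds of check described above can be sketched in Python. The first function uses SQLite and a key shared by two tables to find records that point to a missing entry (a coherence error). The second groups spelling variants with a simplified "fingerprint" key, loosely modelled on the key-collision clustering offered by OpenRefine; it is an approximation, not OpenRefine's own code. All table names, column names and records are made up for this example.

```python
import re
import sqlite3
import unicodedata
from collections import defaultdict


def build_demo_db():
    """Create two small tables linked by the key taxon_id (made-up data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Key enforcement is switched off so the bad row can be loaded
        -- and then detected by a query.
        PRAGMA foreign_keys = OFF;
        CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, scientific_name TEXT);
        CREATE TABLE occurrence (
            occurrence_id TEXT PRIMARY KEY,
            taxon_id INTEGER REFERENCES taxon (taxon_id)
        );
        INSERT INTO taxon VALUES (1, 'Rana temporaria'), (2, 'Bufo bufo');
        INSERT INTO occurrence VALUES ('occ-1', 1), ('occ-2', 2), ('occ-3', 99);
        """
    )
    return conn


def orphan_occurrences(conn):
    """List occurrences whose taxon_id has no match in the taxon table."""
    rows = conn.execute(
        """
        SELECT o.occurrence_id
        FROM occurrence AS o
        LEFT JOIN taxon AS t ON t.taxon_id = o.taxon_id
        WHERE t.taxon_id IS NULL
        ORDER BY o.occurrence_id
        """
    )
    return [occurrence_id for (occurrence_id,) in rows]


def fingerprint(value: str) -> str:
    """Normalise a string so that near-identical spellings share one key."""
    ascii_text = (
        unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    )
    # Trim, lowercase, drop punctuation, then sort the unique tokens.
    tokens = re.sub(r"[^\w\s]", "", ascii_text.strip().lower()).split()
    return " ".join(sorted(set(tokens)))


def cluster_spellings(values):
    """Group values by fingerprint; keep groups with more than one spelling."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    print(orphan_occurrences(build_demo_db()))
    print(cluster_spellings(["Rana temporaria", "rana temporaria ", "Rana  temporaria.", "Bufo bufo"]))
```

Each cluster returned by the second function is a candidate for manual review, not an automatic correction: two different names can occasionally share a fingerprint.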

Note that training manuals and background information on data quality and best practices are available in the GBIF Online Resource Center and on the data quality website of the Spanish node.

 

BIODIVERSITY INFORMATICS STANDARDS

Biodiversity Informatics standards are essential to ensure interoperability between different types of data provided by different data publishers. Standards allow a variety of data to be expressed in a common ‘language’, so that all data can be integrated, understood and processed by any person or automated process without ambiguity. Biodiversity Informatics standards are in the public domain so that anyone can use them.

The Taxonomic Databases Working Group (TDWG) is the main organisation that creates standards for biodiversity data. The Belgian Biodiversity Platform participates actively in TDWG and uses DarwinCore, the reference body of TDWG standards, for all biodiversity data publication.

 

  • DarwinCore

DarwinCore is a body of Biodiversity Informatics standards developed by TDWG. It includes a glossary of terms intended to facilitate the sharing of information about biological diversity. See, for example, the DarwinCore terms describing the occurrence of an organism. Because this standard is used by many organisations, including GBIF and the Encyclopedia of Life (EOL), anyone can easily understand a dataset collected in any country by any researcher. The Belgian Biodiversity Platform helps scientists ‘translate’ their datasets to the DarwinCore format, which is required for them to be uploaded to GBIF and understood by all.
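As an illustration, an occurrence described with Darwin Core terms can be represented as a simple set of term and value pairs. The Python sketch below runs a few basic checks on such a record. The list of required terms is an assumption made for this example, the records are made up, and the date check only accepts a single full date, which is a simplification of what eventDate allows.

```python
from datetime import date

# Assumed minimum set of terms for this sketch, not an official list.
REQUIRED_TERMS = ("occurrenceID", "basisOfRecord", "scientificName", "eventDate")

# Made-up example records using Darwin Core term names as keys.
GOOD_RECORD = {
    "occurrenceID": "occ-1",
    "basisOfRecord": "HumanObservation",
    "scientificName": "Bufo bufo",
    "eventDate": "2014-05-02",
    "decimalLatitude": "50.85",
    "decimalLongitude": "4.35",
    "countryCode": "BE",
}
BAD_RECORD = {
    "occurrenceID": "occ-2",
    "scientificName": "Bufo bufo",
    "eventDate": "02/05/2014",
    "decimalLatitude": "150.85",
}


def check_occurrence(record: dict) -> list:
    """Return a list of human-readable problems found in one record."""
    problems = [f"missing {term}" for term in REQUIRED_TERMS if not record.get(term)]
    if record.get("eventDate"):
        try:
            date.fromisoformat(record["eventDate"])
        except ValueError:
            problems.append("eventDate is not an ISO 8601 date")
    lat = record.get("decimalLatitude")
    lon = record.get("decimalLongitude")
    if lat is not None and not -90 <= float(lat) <= 90:
        problems.append("decimalLatitude out of range")
    if lon is not None and not -180 <= float(lon) <= 180:
        problems.append("decimalLongitude out of range")
    return problems


if __name__ == "__main__":
    print(check_occurrence(GOOD_RECORD))
    print(check_occurrence(BAD_RECORD))
```

Because every publisher uses the same term names, a check like this can be written once and applied to datasets from any source.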

 

  • Ecological Metadata Language (EML)

EML is a metadata standard. Metadata are the description of a dataset (data about your data) and are essential to make it understandable by anyone. They tell users who created the dataset, its purpose, the sampling methods used, its taxonomic, geographic and temporal coverage, etc. The context in which a dataset was created is key information for potential users who want to know whether it suits their own needs. EML follows an XML schema, which allows both the structure and the content of the metadata to be included in a single machine-readable file. The IPT interface makes it easy to edit metadata and generates the corresponding EML file, which is combined with the data (in a DarwinCore-formatted text file) into a compressed, ready-to-share folder called a ‘DarwinCore Archive’.
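The packaging step described above can be sketched in Python as follows. This is a heavily simplified illustration, not the IPT's implementation: the EML document contains only a title, a creator and an abstract and is not validated against the schema, and the archives generated by the IPT also contain a meta.xml descriptor, which is omitted here. File names, identifiers and sample content are assumptions made for this example.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

EML_NS = "eml://ecoinformatics.org/eml-2.1.1"


def build_eml(title: str, creator_surname: str, abstract: str) -> bytes:
    """Serialise a minimal EML-style document (not schema-validated)."""
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(
        f"{{{EML_NS}}}eml",
        {"packageId": "example-dataset", "system": "example"},  # placeholder values
    )
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    creator = ET.SubElement(dataset, "creator")
    name = ET.SubElement(creator, "individualName")
    ET.SubElement(name, "surName").text = creator_surname
    para = ET.SubElement(ET.SubElement(dataset, "abstract"), "para")
    para.text = abstract
    # The xml_declaration argument requires Python 3.8 or later.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_archive(occurrence_tsv: str, eml_bytes: bytes) -> bytes:
    """Zip the data file and its metadata together, in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("occurrence.txt", occurrence_tsv)
        archive.writestr("eml.xml", eml_bytes)
    return buffer.getvalue()


if __name__ == "__main__":
    eml = build_eml(
        "Example amphibian occurrences",
        "Example",
        "Made-up records for illustration.",
    )
    data = "occurrenceID\tscientificName\nocc-1\tRana temporaria\n"
    with open("example-archive.zip", "wb") as handle:
        handle.write(build_archive(data, eml))
```

Keeping the metadata in the same compressed file as the data means that anyone who receives the archive also receives the description needed to interpret it.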

 


 

WHAT'S IN IT FOR ME?

  • Make use of biodiversity informatics tools and standards so your data can easily be shared and used by others for better research and better policy.
  • If you wish to publish your dataset, suggest it through the GBIF data mobilisation form!
    Or contact Dimitri Brosens (for data related to Flanders) or Maxime Coupremanne (for data related to Wallonia).
  • If you wish to discuss biodiversity data and biodiversity informatics matters online with other concerned parties, join us on our online forum!
  • If you would like to know more about the tools and standards listed above, refer to the inventory of other Biodiversity Informatics tools available on the GBIF website.
  • For more information, please contact Ir. André Heughebaert.