By Nawel van Lin, FAIR Data Steward at Radboudumc
The Euro-NMD Registry Hub for all neuromuscular diseases, including undiagnosed patients, will be a FAIR registry in which FAIRification and interoperability are realised. This means that it should be possible to combine health records that are written in different languages and formats and stored in separate registries. To achieve this goal, an innovative and sustainable FAIR solution called ‘CDE-in-a-Box’ [1] has been developed and deployed, and is currently being tested by the ERN technical team at the University Medical Center in Freiburg.
Innovative FAIR Solution
The ‘CDE-in-a-Box’ (Figure 1), also referred to as ‘FAIR-in-a-Box’, describes an automated FAIR transformation workflow. The ‘CDE’ part refers to the set of 16 Common Data Elements (CDEs), such as genetic diagnosis, that the European Platform on Rare Disease Registration (EU RD Platform) recommends all rare disease (RD) registries implement. This set of CDEs is a core component of a FAIRification process. The term ‘Box’ suggests that everything happens automatically in a single place; in reality, cloud-based servers communicate with each other through API calls.
How does ‘CDE-in-a-Box’ work in practice?
First of all, it makes implementing a FAIR transformation pipeline more accessible to those without Linked Data or FAIR expertise. It is innovative because all the complicated programming has already been done, and the actual FAIR transformation happens in three simple steps:
- Extract: The original data is placed in a simple CSV file and extracted through an API call.
- Transform: The CSV file containing the ontologised data automatically undergoes a series of transformations.
- Load: The output is FAIR data in RDF (Resource Description Framework), the machine-readable format that SPARQL queries operate on, which is then loaded into a triplestore.
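The three steps above can be illustrated with a toy Python sketch that uses only the standard library. It is not the CDE-in-a-Box code: the column names, the example.org IRIs and the `hasPhenotype` predicate are hypothetical, the API call of the Extract step is replaced by reading an in-memory string, and a Python set stands in for the triplestore.

```python
import csv
import io

# Hypothetical base IRI for this example; the real CDE-in-a-Box templates define their own.
BASE = "https://example.org/registry/"
HPO = "http://purl.obolibrary.org/obo/HP_"

# Hypothetical CSV export, already "ontologised": the phenotype column holds
# an HPO code rather than free text.
SAMPLE_CSV = """patient_id,phenotype_hpo
P001,0003560
P002,0001324
"""


def extract(csv_text):
    """Extract: parse the CSV export into a list of dicts (one per row).

    In CDE-in-a-Box this step is triggered through an API call; here we
    simply read a string.
    """
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: turn each row into an RDF triple, written as an N-Triples line."""
    triples = []
    for row in rows:
        patient = f"<{BASE}patient/{row['patient_id']}>"
        phenotype = f"<{HPO}{row['phenotype_hpo']}>"
        # Hypothetical predicate; the real models use SIO relations instead.
        triples.append(f"{patient} <{BASE}hasPhenotype> {phenotype} .")
    return triples


def load(triples, store):
    """Load: add the triples to an in-memory stand-in for a triplestore."""
    store.update(triples)
    return store


store = load(transform(extract(SAMPLE_CSV)), set())
for triple in sorted(store):
    print(triple)
```

Keeping the three steps as separate functions mirrors the Extract/Transform/Load split, so each stage could in principle be swapped for a real service without touching the others.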
The FAIR data are then ready to be linked and merged with other FAIR data related to the same topic or disease. Nightly updates delete outdated data and load the refreshed data without human intervention.
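One simple way to picture such a nightly update is a drop-and-reload of a registry's graph. The Python sketch below assumes that strategy; the actual refresh mechanism of CDE-in-a-Box may differ. The function `nightly_refresh`, the graph name and the example.org IRIs are hypothetical, and a dict of sets stands in for named graphs in a triplestore.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
EX = "https://example.org/registry/"  # hypothetical namespace


def patient(pid):
    """Build one N-Triples line stating that a (hypothetical) patient exists."""
    return f"<{EX}patient/{pid}> {RDF_TYPE} <{EX}Patient> ."


def nightly_refresh(store, graph, fresh_triples):
    """Drop-and-reload one graph, returning (removed, added) for logging.

    `store` maps a graph name to a set of N-Triples lines.
    """
    old = store.get(graph, set())
    fresh = set(fresh_triples)
    store[graph] = fresh  # yesterday's contents are discarded wholesale
    return old - fresh, fresh - old


# Yesterday's load contained P001 and P002; tonight's export has P001 and P003.
store = {"registry-graph": {patient("P001"), patient("P002")}}
removed, added = nightly_refresh(store, "registry-graph", [patient("P001"), patient("P003")])
print("removed:", sorted(removed))
print("added:", sorted(added))
```

Replacing the whole graph keeps the logic trivial and guarantees that deleted records disappear; the returned differences are only there to show what changed.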
What was involved in the technical development?
Through an iterative, team-based approach within the European Joint Programme on Rare Diseases (EJP-RD) [1], experts created semantically grounded data models to represent each of the CDEs, using the Semanticscience Integrated Ontology (SIO) as the core framework for representing the entities and their relationships. Within that framework, they mapped the concepts represented in the CDEs, and their possible values, onto domain ontologies such as the Human Phenotype Ontology. They then deployed an ETL (Extract-Transform-Load) pipeline via Docker Compose, in which every component is packaged as a Docker image and the components communicate over a Docker network. Together, these four Docker images are referred to as ‘CDE-in-a-Box’. Instructions for running this process, and for interacting with CDE-in-a-Box, are available in a dedicated GitHub repository [2].
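To give a flavour of what a semantically grounded model looks like, the Python sketch below chains SIO relations from a person, through a role and a process, to an output that refers to an HPO term, in the spirit of the SIO-based models described in [1]. It is a simplified illustration, not the published CDE model: the function `phenotype_triples` and the instance IRIs under example.org are hypothetical, and the SIO identifiers should be checked against the SIO release in use.

```python
SIO = "http://semanticscience.org/resource/"
HPO = "http://purl.obolibrary.org/obo/HP_"
EX = "https://example.org/registry/"  # hypothetical instance namespace

# SIO relations used in this sketch (verify against the SIO release you target).
HAS_ROLE = f"<{SIO}SIO_000228>"        # has role
IS_REALIZED_IN = f"<{SIO}SIO_000356>"  # is realized in
HAS_OUTPUT = f"<{SIO}SIO_000229>"      # has output
REFERS_TO = f"<{SIO}SIO_000628>"       # refers to


def phenotype_triples(patient_id, hpo_code):
    """Return N-Triples linking a person to an HPO term via role, process and output."""
    person = f"<{EX}person/{patient_id}>"
    role = f"<{EX}role/{patient_id}>"
    process = f"<{EX}phenotyping/{patient_id}>"
    output = f"<{EX}phenotype/{patient_id}>"
    return [
        f"{person} {HAS_ROLE} {role} .",
        f"{role} {IS_REALIZED_IN} {process} .",
        f"{process} {HAS_OUTPUT} {output} .",
        f"{output} {REFERS_TO} <{HPO}{hpo_code}> .",
    ]


# Example HPO code taken from a (hypothetical) ontologised CSV row.
for triple in phenotype_triples("P001", "0003560"):
    print(triple)
```

Routing every value through the same role/process/output chain is what lets records from different registries line up: the domain-specific part is confined to the final ontology term.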
Sustainable FAIR Approach
CDE-in-a-Box takes a sustainable approach because it is based on international standards recommended at European level by the EJP-RD. The same standards have been adopted by other registries that have implemented a FAIRification process, such as the VASCA registry for vascular anomalies [3].
Initially, the FAIR transformation solution was commissioned by the Duchenne Parent Project in the Netherlands. Because their registry, the Duchenne Data Platform (DDP), was built with interoperability in mind, it presented an ideal use case. Once the FAIR solution had been developed and deployed in collaboration with FAIR experts within the EJP-RD, the Duchenne domain owners chose to release their CDE-in-a-Box code as open source. This generosity stems from their belief in FAIR as a new paradigm for optimising data ‘visiting’, and they have accordingly pledged to support others in their own FAIR endeavours [4]. The ERN Euro-NMD core registry and DM-Scope, a French national registry for myotonic dystrophies, are both implementing this FAIR solution, and their tests are showing promising results.
Next steps
Moving forward, the EJP-RD semantic models and CDE-in-a-Box will be further updated to include disease-specific elements. This is an ongoing process involving a multidisciplinary team.
Once the FAIRification processes of the core registry and the pilot registries of ERN Euro-NMD (CRAMP, DM-Scope, DDP and SMArtCARE) have been successfully implemented, the next stage will be to assess the level of FAIRness of all participating registries. This will be discussed in more detail in an upcoming newsletter.
References
[1] Rajaram Kaliyaperumal, et al. Semantic modelling of Common Data Elements for Rare Disease registries, and a prototype workflow for their deployment over registry data. https://doi.org/10.1101/2021.07.27.21261169
[2] CDE-in-box: This repository contains software to create and deploy CDEs [Internet]. [cited 2021 Jul 6]. https://github.com/ejp-rd-vp/cde-in-box
[3] Karlijn H. J. Groenen, et al. The de novo FAIRification process of a registry for vascular anomalies. https://ojrd.biomedcentral.com/articles/10.1186/s13023-021-02004-y
[4] Nawel van Lin, et al. How Patient Organizations are Driving FAIR efforts to Facilitate Research and Health Care. https://content.iospress.com/articles/journal-of-neuromuscular-diseases/jnd210721