A new proof-of-concept was realised to demonstrate the value of querying FAIR data across two FAIR registries: the Duchenne Data Platform and the FAIR-by-design registry of ERN EURO-NMD, the European Reference Network for all neuromuscular diseases. It also marks an important step towards solving a long-standing obstacle: how to run queries across rare disease registries without exposing sensitive patient details. This was achieved through close collaboration among FAIR Data Systems S.L. (Spain), ERN EURO-NMD (Germany), Leiden University Medical Center, Radboud University Medical Center and Duchenne Parent Project (all in The Netherlands), in the context of EJP RD Work Package 12.
“We have made it easy for new FAIR registries to participate in federated queries, without deep concerns about security,” explains Peter-Bram ’t Hoen, Professor of Bioinformatics at Radboud University Medical Center and leader of the FAIRification and Interoperability work package of the EU-funded EURO-NMD project. “For patients, they can be assured that their data are being reused, in a secure manner, by scientists, clinicians or other authenticated and authorised health professionals – which is their desire.”
Methods
FAIR stands for Findable, Accessible, Interoperable and Reusable. It is the acronym for a global initiative to make data more valuable by increasing the ability of computers to find, interpret, integrate and analyse those data autonomously. An existing technology, the Git Repository Linked Data API Constructor (grlc), was extensively revised so that it could be used safely in privacy-sensitive environments. This allowed us to explore the question: “How can we take advantage of the machine-readability of FAIR data to ask questions that span multiple neuromuscular disease repositories, and yet not expose sensitive patient details?” For each database query, the technology produces a machine-accessible Web address. Calling that Web address executes the query inside the FAIR registry’s secure space and returns only the anonymous output data.
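The mechanism just described can be pictured as a small dispatcher living inside the registry: a Web address names an approved query, the query runs against data that never leave the registry, and only its result is returned. The Python sketch below is a hypothetical illustration of that idea, not grlc's code or API. The URL layout, the query name `count_patients`, the `diagnosis` parameter, the toy records and the Orphanet codes (ORPHA:98896 for Duchenne and ORPHA:98895 for Becker muscular dystrophy, worth verifying against Orphanet) are all assumptions.

```python
from typing import Callable, Dict, List, Mapping
from urllib.parse import parse_qs, urlsplit

# Toy, made-up patient-level records. In the real set-up such data stay inside
# the registry's secure space and are never sent to the caller.
PRIVATE_RECORDS: List[Dict[str, str]] = [
    {"patient_id": "p1", "diagnosis": "ORPHA:98896"},
    {"patient_id": "p2", "diagnosis": "ORPHA:98896"},
    {"patient_id": "p3", "diagnosis": "ORPHA:98895"},
]

QueryFn = Callable[[List[Dict[str, str]], Mapping[str, str]], int]


def count_patients(records: List[Dict[str, str]], params: Mapping[str, str]) -> int:
    """Hypothetical approved query: count patients with one diagnosis code."""
    code = params["diagnosis"]
    return sum(1 for record in records if record["diagnosis"] == code)


# Only queries in this (hypothetical) approved catalogue get a Web address.
APPROVED_QUERIES: Dict[str, QueryFn] = {"count_patients": count_patients}


def handle_request(url: str) -> Dict[str, object]:
    """Map a URL to an approved query, run it locally, return only its result."""
    parts = urlsplit(url)
    query_name = parts.path.rstrip("/").rsplit("/", 1)[-1]
    query = APPROVED_QUERIES.get(query_name)
    if query is None:
        raise LookupError(f"no approved query named {query_name!r}")
    # parse_qs returns a list of values per parameter; keep the first of each.
    params = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return {"query": query_name, "result": query(PRIVATE_RECORDS, params)}


if __name__ == "__main__":
    url = "https://registry.example.org/api/count_patients?diagnosis=ORPHA:98896"
    print(handle_request(url))
```

A request for a query that is not in the catalogue, such as a hypothetical `list_patients`, raises `LookupError` before touching the data, which is the role the curated, privacy-preserving query database plays in the prototype.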
The prototype leverages FAIR data by sharing a publicly available database of queries, each manually curated and filtered by experts in FAIR and neuromuscular diseases to ensure that it is both accurate and privacy-preserving. Identical queries are then executed over independent registries, so that they converge on the same questions and produce comparable answers; for example, the Duchenne Data Platform of the Duchenne Parent Project and the registry of the European Reference Network for all Neuromuscular Diseases (EURO-NMD), as illustrated below:
One shared query, for example, is the “counting query”, which returns the number of patients in the registry who have a specific diagnosis. The diagram below graphs the output of running the counting query twice, once for Duchenne and once for Becker muscular dystrophy. The only output of the query is a number; no patient information is exposed.
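Running the same counting query for two diagnoses over two independent registries can be sketched as follows. This is a hypothetical Python illustration: the `Registry` class, its `counting_query` method, the registry names and the toy diagnosis lists are invented for the example, and ORPHA:98896 and ORPHA:98895 are the Orphanet codes we believe correspond to Duchenne and Becker muscular dystrophy (verify before reuse). What it shows is that each registry exposes only an integer per diagnosis.

```python
from typing import Dict, Iterable, List

DMD = "ORPHA:98896"  # Duchenne muscular dystrophy (assumed code; verify in Orphanet)
BMD = "ORPHA:98895"  # Becker muscular dystrophy (assumed code; verify in Orphanet)


class Registry:
    """Hypothetical registry: patient data stay private, only counts leave."""

    def __init__(self, name: str, diagnoses: List[str]) -> None:
        self.name = name
        self._diagnoses = list(diagnoses)  # patient-level data, never returned

    def counting_query(self, orpha_code: str) -> int:
        """The shared counting query: returns one integer and nothing else."""
        return sum(1 for code in self._diagnoses if code == orpha_code)


def run_counting_query(
    registries: Iterable[Registry], codes: Iterable[str]
) -> Dict[str, Dict[str, int]]:
    """Execute the identical query for each code over every registry."""
    code_list = list(codes)
    return {
        registry.name: {code: registry.counting_query(code) for code in code_list}
        for registry in registries
    }


if __name__ == "__main__":
    registries = [
        Registry("duchenne-data-platform", [DMD, DMD, DMD, BMD]),  # toy data
        Registry("euro-nmd-mock", [DMD, BMD, BMD]),  # toy data
    ]
    print(run_counting_query(registries, [DMD, BMD]))
```

Because each registry returns only integers keyed by Orphanet code, the per-registry results can be plotted side by side without any record-level data crossing registry boundaries.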
Convergence of efforts
EURO-NMD replicated the prototype on their own initiative by installing their own query server, connecting it to their (currently mock) FAIR clinical registry, and loading the shared public database of approved queries. With almost no effort, they were then able to add their data to the Duchenne analytics environment. So that other registries can replicate this success, the demonstration prototype has been made publicly available at https://github.com/markwilkinson/Duchenne-daru.
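From the caller's side, adding a registry such as EURO-NMD amounts to adding one more server address: the same approved query name and parameters are sent to every server, and each one answers independently. The Python sketch below illustrates that pattern under stated assumptions and is not the interface of the Duchenne-daru prototype; the base URLs, the query name `count_patients`, the `diagnosis` parameter and the JSON response shape `{"result": n}` are all hypothetical.

```python
import json
from typing import Callable, Dict, Mapping
from urllib.parse import urlencode
from urllib.request import urlopen

Fetch = Callable[[str], int]


def build_url(base_url: str, query_name: str, params: Mapping[str, str]) -> str:
    """Compose the Web address of one approved query on one registry server."""
    return f"{base_url.rstrip('/')}/{query_name}?{urlencode(params)}"


def http_fetch(url: str) -> int:
    """Call a query address; assumes (hypothetically) a JSON reply like {"result": 3}."""
    with urlopen(url, timeout=30) as response:
        return int(json.load(response)["result"])


def federated_query(
    servers: Mapping[str, str],
    query_name: str,
    params: Mapping[str, str],
    fetch: Fetch = http_fetch,
) -> Dict[str, int]:
    """Send the identical query to every registry server and collect each answer."""
    return {
        name: fetch(build_url(base, query_name, params))
        for name, base in servers.items()
    }


if __name__ == "__main__":
    servers = {
        "duchenne-data-platform": "https://dpp.example.org/api",  # hypothetical
        "euro-nmd": "https://euro-nmd.example.org/api",  # hypothetical
    }
    params = {"diagnosis": "ORPHA:98896"}
    # Offline stand-in for http_fetch (toy numbers) so the sketch runs without a network.
    canned = {
        build_url(base, "count_patients", params): n
        for base, n in zip(servers.values(), [3, 1])
    }
    print(federated_query(servers, "count_patients", params, canned.__getitem__))
```

Keeping `fetch` injectable lets the federation logic be exercised offline; passing nothing, so that `http_fetch` is used, is the only change needed to contact real servers, provided they answer in the JSON shape assumed here.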
Conclusion
In the example above, the teams involved demonstrated the value of FAIR data simply by sharing a publicly accessible query. The same query could be executed over both registries because both sites have made their data FAIR and therefore share standards, such as Orphanet codes, that can be used to integrate data across resources. Any number of FAIR registries can (and hopefully will) participate in this initiative, fully independently and without needing guidance from experts. Although this was a simple demonstration, it provides a proof-of-concept for maximising the return on investment in data FAIRification without exposing any sensitive patient information.
Acknowledgements
Elizabeth Vroom and Mirjam Franken at Duchenne Parent Project and the Steering Committee of EURO-NMD would like to extend their appreciation to their collaborators: FAIR Data Systems S.L. (Mark D. Wilkinson, Eduardo Quemada and Alberto Camara), EURO-NMD (Adrian Tassoni and Dagmar Jäger), Radboud University Medical Center (Peter-Bram ’t Hoen and Bruna dos Santos Vieira), Leiden University Medical Center (Marco Roos) and EJP RD (Rajaram Kaliyaperumal).
For more information, please contact Nawel Lalout, FAIR Data Steward at Radboudumc (nawel.lalout@radboudumc.nl).