Vol. 2 (1): 56–65, 2019
A Generic Workflow for the Data FAIRification Process
Abstract & Keywords
Abstract: The FAIR guiding principles aim to enhance the Findability, Accessibility, Interoperability and Reusability of digital resources such as data, for both humans and machines. The process of making data FAIR ("FAIRification") can be described in multiple steps. In this paper, we describe a generic step-by-step FAIRification workflow to be performed by a multidisciplinary team guided by FAIR data stewards. The FAIRification workflow should be applicable to any type of data and has been developed and used for "Bring Your Own Data" (BYOD) workshops, as well as for the FAIRification of, e.g., rare disease resources. The steps are: 1) identify the FAIRification objective, 2) analyze data, 3) analyze metadata, 4) define a semantic model for data (4a) and metadata (4b), 5) make data (5a) and metadata (5b) linkable, 6) host FAIR data, and 7) assess FAIR data. For each step we describe how the data are processed, what expertise is required, which procedures and tools can be used, and which FAIR principles they relate to.
Keywords: FAIR data; FAIRification workflow; FAIR data stewardship; Hands-on FAIRification; FAIR dissemination
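The seven steps listed in the abstract can be captured as a simple ordered structure in which steps 4 and 5 each branch into a data part (a) and a metadata part (b). The Python sketch below is illustrative only: the `Step` class, the `FAIRIFICATION_WORKFLOW` list and the `flatten` helper are hypothetical names, not part of the published workflow, and the sketch models only the numbering of the steps, not the expertise, tools or FAIR principles associated with each one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    """One step of the FAIRification workflow as listed in the abstract."""
    number: str
    name: str
    substeps: List["Step"] = field(default_factory=list)


# Hypothetical representation: step names paraphrase the abstract; the data
# structure itself is an illustrative assumption, not the authors' model.
FAIRIFICATION_WORKFLOW: List[Step] = [
    Step("1", "Identify the FAIRification objective"),
    Step("2", "Analyze data"),
    Step("3", "Analyze metadata"),
    Step("4", "Define semantic model", [
        Step("4a", "Define semantic model for data"),
        Step("4b", "Define semantic model for metadata"),
    ]),
    Step("5", "Make linkable", [
        Step("5a", "Make data linkable"),
        Step("5b", "Make metadata linkable"),
    ]),
    Step("6", "Host FAIR data"),
    Step("7", "Assess FAIR data"),
]


def flatten(steps: List[Step]) -> List[str]:
    """Return step numbers in their numbered order, expanding sub-steps in place."""
    order: List[str] = []
    for step in steps:
        if step.substeps:
            # A parent step (4 or 5) is represented by its data and metadata sub-steps.
            order.extend(flatten(step.substeps))
        else:
            order.append(step.number)
    return order


if __name__ == "__main__":
    for number in flatten(FAIRIFICATION_WORKFLOW):
        print(number)
```

Running it prints the step numbers 1, 2, 3, 4a, 4b, 5a, 5b, 6 and 7, one per line. Keeping the data and metadata branches as explicit sub-steps mirrors the abstract's distinction between steps 4a/4b and 5a/5b.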
Acknowledgements
The work of AJ, RK, MR and MT is supported by funding from the European Union’s Horizon 2020 research and innovation programme under the EJP RD COFUND-EJP N° 825575 and from ELIXIR EXCELERATE (H2020 grant agreement number 676559). MR and MT received funding from NWO (VWData 400.17.605) and H2020-EU 824087. The work of BM and LB is funded by H2020-EU 824068 and by the GO FAIR ISCO grant of the Dutch Ministry of Science and Culture.
Article and author information
Cite As
A. Jacobsen, R. Kaliyaperumal, L.O. Bonino da Silva Santos, B. Mons, E. Schultes, M. Roos & M. Thompson. A generic workflow for the data FAIRification process. Data Intelligence 2(2020), 56–65. doi: 10.1162/dint_a_00028
Annika Jacobsen
The workflow presented in the manuscript is the result of many years of experience of all authors. A. Jacobsen and M. Thompson took the lead in writing the manuscript. All authors contributed to the writing and provided critical feedback to help shape the manuscript.
a.jacobsen@lumc.nl
Annika Jacobsen is a postdoctoral researcher in the BioSemantics group, Human Genetics Department, Leiden University Medical Center, The Netherlands. She obtained her Bachelor's and Master's degrees at the Technical University of Denmark in 2009 and 2012, respectively, and her PhD degree at the Vrije Universiteit Amsterdam in 2019. Her research focuses on creating interoperable FAIR rare disease data in order to learn more about the causes, diagnosis and treatment of rare diseases.
0000-0003-4818-2360
Rajaram Kaliyaperumal
The workflow presented in the manuscript is the result of many years of experience of all authors. All authors contributed to the writing and provided critical feedback to help shape the manuscript.
Rajaram Kaliyaperumal was born in Pondicherry, India. He received a B.Tech degree in Biomedical Engineering from Pondicherry University, India, in 2008 and an M.Sc degree in Biomedical Engineering from Linköping University, Sweden, in 2011. In 2012 he joined the Department of Computer and Information Science, Linköping University, as a software engineer, where he developed methods and tools to align and repair ontologies. In 2013 he joined the BioSemantics group in Leiden, the Netherlands, as a software developer. His current research activities include investigating the use of Semantic Web technology in the context of FAIR data and developing prototypes that demonstrate the use of FAIR data.
0000-0002-1215-167X
Luiz Olavo Bonino da Silva Santos
Luiz Olavo Bonino da Silva Santos is the International Technology Coordinator of the GO FAIR International Support and Coordination Office and Associate Professor in the BioSemantics group at the Leiden University Medical Centre in Leiden, The Netherlands. His background is in ontology-driven conceptual modelling, semantic interoperability, service-oriented computing, requirements engineering and context-aware computing. Over the last five years Luiz has been involved in many activities to realize the FAIR principles, including the development of technologies and tools that support making, publishing, indexing, searching and annotating FAIR (meta)data.
0000-0002-1164-1351
Barend Mons
Barend Mons is Professor of BioSemantics at the Human Genetics Department of Leiden University Medical Center and founder of the BioSemantics group. He was elected CODATA President in 2018. In addition to leading the group's research, Barend plays a leading role in the international development of “data stewardship” for biomedical data. For instance, he was head of node of ELIXIR-NL at the Dutch Techcentre for Life Sciences (until 2015), is Integrator Life Sciences at the Netherlands eScience Center, and is a board member of the Leiden Center of Data Science. In 2014, Barend initiated the FAIR data initiative, and in 2015 he was appointed Chair of the European Commission’s High Level Expert Group for the “European Open Science Cloud”, a position from which he retired at the end of 2016. Presently, Barend co-leads the GO FAIR initiative, an initiative to kick-start developments towards the Internet of FAIR data and services, which will also contribute to the implementation of components of the European Open Science Cloud. The BioSemantics group's contribution focuses on developing an interoperability backbone for biomedical applications in general and rare diseases in particular.
0000-0003-3934-0072
Erik Schultes
Erik Schultes is International Science Coordinator at the GO FAIR International Support and Coordination Office, where he has been working with a diverse community of stakeholders to develop FAIR data and services. Erik is also a member of the Leiden Center for Data Science at Leiden University. Erik is an evolutionary biologist with long-standing interests in data-intensive research. In addition to private consulting, he has held previous academic appointments at the University of California, Los Angeles, The Whitehead Institute for Biomedical Research at the Massachusetts Institute of Technology, Duke University, and The Santa Fe Institute.
0000-0001-8888-635X
Marco Roos
Marco Roos is an assistant professor and group leader of the BioSemantics group at the Leiden University Medical Centre (Human Genetics Department). The group is known for co-founding and advocating the FAIR data principles. His research focuses on making state-of-the-art computer science applicable to biomedical research (e-Science), particularly the application of computational knowledge discovery and linked data techniques to translational research challenges in rare human diseases. At an international level, Marco focuses on implementing the FAIR principles to create a powerful substrate and a robust worldwide infrastructure for knowledge discovery across distributed rare disease data resources.
0000-0002-8691-772X
Mark Thompson
Mark Thompson is a senior research scientist in the Biosemantics group at the Human Genetics department of Leiden University Medical Centre. He obtained a PhD in Computer Science from the University of Amsterdam in 2012. He has expertise in hardware and software architecture (co-)design, data management, data modeling, FAIR data infrastructure and computational aspects of knowledge discovery.
0000-0002-7633-1442