Published: Vol 2 (1): 40–46, 2020
Making Data and Workflows Findable for Machines
Abstract & Keywords
Abstract: Research data management currently faces a huge increase in the number of data objects, with an increasing variety of types (data types, formats) and of workflows by which objects need to be managed across their lifecycle by data infrastructures. Researchers want to shorten the workflows from data generation to analysis and publication, and the full workflow needs to become transparent to multiple stakeholders, including research administrators and funders. This poses challenges for research infrastructures and user-oriented data services: they must not only make data and workflows findable, accessible, interoperable and reusable, but also do so in a way that leverages machine support for better efficiency. One primary need to be addressed is findability, and achieving better findability benefits other aspects of data and workflow management. In this article, we describe how machine capabilities can be extended to make workflows more findable, in particular by leveraging the Digital Object Architecture, common object operations and machine learning techniques.
Keywords: Findability; Workflows; Automation; FAIR data; Data infrastructures; Data services
Article and author information
Tobias Weigel
All authors have made meaningful and valuable contributions in revising and proofreading the resulting manuscript. Tobias Weigel has led the editorial process.
Tobias Weigel works at the German Climate Computing Center (DKRZ) in the area of e-infrastructures. Tobias has worked extensively on Digital Object and Persistent Identifier services in multiple contexts, including community cyberinfrastructures (ESGF, ENVRI) and cross-disciplinary infrastructures (EUDAT, EOSC). He has co-chaired multiple working groups of the Research Data Alliance (RDA) to converge on technical recommendations in the area of identifiers, metadata and related e-infrastructure services. Tobias is an editorial board member of the CODATA Data Science Journal and a member of the RDA Technical Advisory Board. Tobias holds a PhD in computer science from the University of Hamburg.
Ulrich Schwardmann
Ulrich Schwardmann is deputy leader of the eScience working group of the GWDG, a joint compute and IT competence center of the university and the Max Planck Society, where he leads the data management activities. He has a doctoral degree in mathematics and a long-standing background in scientific computing. Ulrich Schwardmann has been working with persistent identifiers as an enabling technology for research data management for almost ten years. He is the speaker of the management board of ePIC, the Persistent Identifier Consortium for eResearch, and is the DONA-MPA System Administrator for GWDG. His current research interests include the Digital Object Interface Protocol, PID Information Types and Data Type Registration, and PID profiles and policies.
Jens Klump
Jens Klump leads the Geoscience Analytics Team in the Mineral Resources Unit of the Commonwealth Scientific and Industrial Research Organization (CSIRO). Jens’ work focuses on data in minerals exploration, investigating the digital value chain from data capture to data analysis and decision making. This value chain includes automated data and metadata capture, sensor data integration both in the field and in the laboratory, data processing workflows, and data analysis by statistical methods, machine learning and numerical modelling. Jens obtained degrees in geology and oceanography from the University of Cape Town, South Africa, and received his PhD in marine geology from the University of Bremen, Germany.
Sofiane Bendoukha
Sofiane Bendoukha is a computer scientist in the data management group of the German Climate Computing Center (DKRZ). For years, Sofiane has worked on scientific workflow management systems, service orchestration and workflow modeling. After joining DKRZ, he focused more on the development of software tools for the climate community related to the management of persistent identifiers and Handle servers in the EUDAT project. Currently, Sofiane is a deputy leader in the EOSC-HUB project. He is responsible for designing, implementing and deploying reliable and user-friendly compute services for scientists in the climate domain. Sofiane holds a PhD in computer science from the University of Hamburg, Germany.
Robert Quick
Rob Quick is the Associate Director of the Science Gateways Research Center with the Pervasive Technology Institute at Indiana University. Rob has been working on the interoperability of international cyberinfrastructure for more than 15 years. This includes holding the position of Chief Operations Officer for the Open Science Grid and managing the XSEDE Science Gateways Support Services. In recent years he has turned his focus from the interoperability of research computing infrastructure to data interoperability. This includes NSF funding (NSF #1659310 and #1839013) to create and operate the Robust Persistent Identification of Data (RPID) testbed, which provides a set of testbed services that allow researchers to implement the FAIR principles within the Digital Object Architecture. Rob holds a degree in Physics from Purdue University.
Publication records
Data Intelligence