Review Published Versions 1 Vol 3 (1): 116-135 2021
Not Ready for Convergence in Data Infrastructures
Abstract & Keywords
Abstract: Much research is dependent on Information and Communication Technologies (ICT). Researchers in different research domains have set up their own ICT systems (data labs) to support their research, from data collection (observation, experiment, simulation) through analysis (analytics, visualisation) to publication. However, too frequently the Digital Objects (DOs) upon which the research results are based are not curated and thus neither available for reproduction of the research nor utilization for other (e.g., multidisciplinary) research purposes. The key to curation is rich metadata recording not only a description of the DO and the conditions of its use but also the provenance – the trail of actions performed on the DO along the research workflow. There are increasing real-world requirements for multidisciplinary research. With DOs in domain specific ICT systems (silos), commonly with inadequate metadata, such research is hindered. Despite wide agreement on principles for achieving FAIR (findable, accessible, interoperable, and reusable) utilization of research data, current practices fall short. FAIR DOs offer a way forward. The paradoxes, barriers and possible solutions are examined. The key is persuading the researcher to adopt best practices which implies decreasing the cost (easy to use autonomic tools) and increasing the benefit (incentives such as acknowledgement and citation) while maintaining researcher independence and flexibility.
Keywords: Scientific process; Workflow; Metadata; FAIR; Scientific data; Data wrangling
Article and author information
Cite As
Jeffery, K., et al.: Not ready for convergence in data infrastructures. Data Intelligence 3(1), 116-135 (2021). doi: 10.1162/dint_a_00084
Keith Jeffery
All authors contributed ideas, text and review comments in the production of the paper. Sections 3 and 4 were collated dominantly by K. Jeffery.
keith.jeffery@keithgjefferyconsultants.co.uk
Keith Jeffery is an independent consultant working on EPOS, ENVRIplus and ENVRIFAIR as well as on advanced CLOUD computing and Virtual Research Environments. He is past IT Director at STFC with 360,000 users, 1,100 servers and 140 staff. Keith holds three honorary visiting professorships and is a Fellow of the Geological Society and the British Computer Society, a Chartered Engineer & IT Professional and an Honorary Fellow of the Irish Computer Society. Keith is past President of ERCIM and euroCRIS, and serves on international expert groups, conference boards and assessment panels. He has advised government on IT. He chaired the EC Expert Groups on GRIDs and on CLOUD Computing.
0000-0003-4053-7825
Peter Wittenburg
All authors contributed ideas, text and review comments in the production of the paper. G. Strawn contributed much to Section 1, and P. Wittenburg to the other sections.
Peter Wittenburg was Executive Director of Research Data Alliance (RDA) Europe, Member of RDA Technical Advisory Board, Scientific Coordinator of European Data Infrastructure (EUDAT) and Technical Director of the CLARIN and DOBES Research Infrastructures. He set up and led the Technical Group with about 35 experts at Max Planck Institute (MPI) for Psycholinguistics and then led the Language Archiving Group with about 25 experts. Since 2000 he has played leading roles in a variety of European (funded by the European Commission) and national projects (funded by MPS, DFG, BMBF, NWO) and ISO initiatives (ISO TC37/SC4). He won the Heinz Billing Award of the Max Planck Society (MPS) for the advancement of scientific computation in 2011 and received an honorary doctorate from the University of Tübingen in 2013.
0000-0003-3538-0106
Larry Lannom
All authors contributed ideas, text and review comments in the production of the paper.
Larry Lannom is Director of Information Services and Vice President at the Corporation for National Research Initiatives (CNRI), where he works with organizations in both the public and private sectors to develop experimental and pilot applications of advanced networking and information management technologies. Mr. Lannom’s current work is focused on CNRI’s Digital Object Architecture, which is based on the concept of the DO, a uniform approach to representing digital information across computing and application environments, both now and into the future. Mr. Lannom joined CNRI in September 1996. Prior to that, he was a Technical Director at DynCorp, Inc., where he served as an advisor on digital library research for the ISTO, CSTO, and ITO offices of the US Defense Advanced Research Projects Agency (DARPA), including initiating the Computer Science Technical Reports (CS-TR) project, DARPA’s first effort in the digital library area. In addition, he managed the development of internal information systems for DARPA. Originally trained as a librarian, his earlier work included reference book publishing and information retrieval research.
0000-0003-1254-7604
George Strawn
All authors contributed ideas, text and review comments in the production of the paper. G. Strawn contributed much to Section 1, and P. Wittenburg to the other sections.
George Strawn is currently the director of the Board on Research Data and Information at the National Academies of Sciences, Engineering, and Medicine where he focuses on OS and FAIR data. Prior to joining the Academies, Dr. Strawn was the director of the National Coordination Office (NCO) for the Networking and Information Technology Research and Development (NITRD) Program and co-chair of the NITRD interagency committee.
0000-0003-4098-0464
Claudia Biniossek
All authors contributed ideas, text and review comments in the production of the paper.
Claudia Biniossek works together with Dirk Betz on bringing together theories, methods, and data infrastructures within a transdisciplinary approach. The aim is to open new pathways of data-driven (transdisciplinary) research. To that end, they created the repositories x-science.org and x-econ.org, which specialize in data coming from experimental social sciences and economics. The purpose is to test the basic principles of human decision making in the field of experimental data coming from economics, sociology, political sciences, psychology, and neuroscience.
0000-0002-2202-7875
Dirk Betz
All authors contributed ideas, text and review comments in the production of the paper.
Dirk Betz works together with Claudia Biniossek on bringing together theories, methods, and data infrastructures within a transdisciplinary approach. The aim is to open new pathways of data-driven (transdisciplinary) research. To that end, they created the repositories x-science.org and x-econ.org, which specialize in data coming from experimental social sciences and economics. The purpose is to test the basic principles of human decision making in the field of experimental data coming from economics, sociology, political sciences, psychology, and neuroscience.
0000-0002-6411-4758
Christophe Blanchi
All authors contributed ideas, text and review comments in the production of the paper.
Christophe Blanchi is the Executive Director of the DONA Foundation in Geneva where he is responsible for its day-to-day operations, promoting and evolving the Digital Object Architecture and its related set of standards, and ensuring the consistent operation of the Global Handle Registry. Prior to joining the DONA Foundation, Christophe Blanchi was a senior research scientist at CNRI in Reston, Virginia, where he was involved in research, development, and deployment of Digital Object Architecture related technologies.
0000-0003-2277-5176
Publication records
Published: May 10, 2021
Data Intelligence