Review. Published. Vol 3 (1): 95-105, 2021
Open Science and Data Science
Abstract & Keywords
Abstract: Data Science (DS), as defined by Jim Gray, is an emerging paradigm in all research areas that helps to find non-obvious patterns of relevance in large distributed data collections. “Open Science by Design” (OSD), i.e., making artefacts such as data, metadata, models, and algorithms available and re-usable to peers and beyond as early as possible, is a prerequisite for a flourishing DS landscape. However, a few major aspects can be identified that hamper a fast transition: (1) The classical “Open Science by Publication” (OSP) is no longer sufficient since it serves different functions, leads to unacceptable delays, and is associated with high curation costs. Changing data lab practices towards OSD requires more fundamental changes than OSP. (2) The classical publication-oriented models for metrics, mainly informed by citations, will no longer work since the roles of contributors are more difficult to assess and will often change, i.e., other ways of assigning incentives and recognition need to be found. (3) The huge investments in developing DS skills and capacities by some global companies and strong countries are leading to imbalances and to fears among different stakeholders, hampering the acceptance of Open Science (OS). (4) Finally, OSD will depend on the availability of a global infrastructure fostering an integrated and interoperable data domain (“one data-domain”, as George Strawn calls it), which is still not visible due to disagreements about the key technological pillars. OS is therefore a necessity for DS, but implementing it will take much more time than we may have expected.
Keywords: Open Science by Design; Open Science by Publication; Data Science; Data infrastructure; Digital Objects; FAIR
Article and author information
Cite As
Wittenburg, P.: Open Science and Data Science. Data Intelligence 3(1), 95-105 (2021). doi: 10.1162/dint_a_00082
Peter Wittenburg
Peter Wittenburg was Executive Director of Research Data Alliance (RDA) Europe, a member of the RDA Technical Advisory Board, and Scientific Coordinator of the European Data Infrastructure (EUDAT). He set up and led the Technical Group, with about 30 experts, at the Max Planck Institute (MPI) for Psycholinguistics and then led the Language Archiving Group, with about 25 experts. Since 2000 he has played leading roles in a variety of European projects (funded by the European Commission), national projects (funded by MPS, DFG, BMBF, NWO), and ISO initiatives (ISO TC37/SC4). He won the Heinz Billing Award of the Max Planck Society (MPS) for the advancement of scientific computation in 2011 and received an honorary doctorate from the University of Tübingen in 2013.
Publication records
Published: May 10, 2021 (Version 1)
Data Intelligence