Vol. 2 (1): 78–86, 2019
FAIR Data Reuse - the Path through Data Citation
Abstract & Keywords
Abstract: One of the key goals of the FAIR guiding principles is set out by its final principles (the "R" for Reusable): to optimize data sets for reuse by both humans and machines. To do so, data providers need to implement and support consistent, machine-readable metadata describing their data sets. This can seem a daunting task for data providers, whether it is determining what level of detail to provide in the provenance metadata or deciding which shared vocabularies to use. Additionally, for existing data sets it is often unclear what steps should be taken to enable maximal, appropriate reuse. Data citation already plays an important role in making data findable and accessible, providing persistent and unique identifiers plus metadata for over 16 million data sets. In this paper, we discuss how data citation and its underlying infrastructures, in particular their associated metadata, provide an important pathway for enabling FAIR data reuse.
Keywords: FAIR data; Data citation; Research objects; Data provenance
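The abstract's point that data citation supplies a persistent identifier plus machine-readable metadata can be made concrete with a small sketch. The Python below, written for illustration only, assembles a minimal citation record for a data set, checks it against the six mandatory properties of the DataCite Metadata Schema (identifier, creator, title, publisher, publication year, resource type), and exposes the same record as a schema.org Dataset in JSON-LD. The function names, the simplified key names, and the DOI are hypothetical assumptions; a real repository would follow the DataCite schema and its registration service rather than this ad hoc structure.

```python
import json

# The six mandatory properties of the DataCite Metadata Schema (4.x).
# ASSUMPTION: key names are simplified for illustration; they are NOT the
# exact DataCite XML or REST API JSON format.
MANDATORY = ("identifier", "creators", "title", "publisher",
             "publicationYear", "resourceType")


def build_record(doi, creators, title, publisher, year, resource_type="Dataset"):
    """Assemble a minimal, illustrative (hypothetical) citation record."""
    return {
        "identifier": {"value": doi, "identifierType": "DOI"},
        "creators": list(creators),
        "title": title,
        "publisher": publisher,
        "publicationYear": str(year),
        "resourceType": resource_type,
    }


def missing_mandatory(record):
    """Return the mandatory properties that are absent or empty in a record."""
    return [key for key in MANDATORY if not record.get(key)]


def to_schema_org(record):
    """Map the record onto a schema.org Dataset description in JSON-LD."""
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        # Use the doi.org resolver URL so machines can follow the identifier.
        "@id": "https://doi.org/" + record["identifier"]["value"],
        "name": record["title"],
        "creator": [{"@type": "Person", "name": n} for n in record["creators"]],
        "publisher": {"@type": "Organization", "name": record["publisher"]},
        "datePublished": record["publicationYear"],
    }


# Hypothetical example: the DOI and the data set are invented.
record = build_record("10.5072/example-dataset", ["Doe, Jane"],
                      "Example survey data", "Example Repository", 2019)
print(missing_mandatory(record))  # prints [] because every mandatory field is set
print(json.dumps(to_schema_org(record), indent=2))
```

Mapping to schema.org is only one possible serialization; the design point is that a single persistent identifier anchors both the human-readable citation and the machine-readable description of the data set.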
Acknowledgements
This work was partially supported by the Horizon 2020 project CORBEL (Coordinated Research Infrastructures Building Enduring Life-science services; call INFRADEV-4-2014-2015, grant agreement 654248).
Article and author information
Cite As
P. Groth, H. Cousijn, T. Clark & C. Goble. FAIR data reuse – the path through data citation. Data Intelligence 2(2020), 78–86. doi: 10.1162/dint_a_00030
Paul Groth
P. Groth (p.groth@uva.nl) conceptualized and wrote the first draft of the paper. All authors edited and reviewed the final version of the article.
p.groth@uva.nl
Paul Groth is Professor of Algorithmic Data Science at the University of Amsterdam, where he leads the Intelligent Data Engineering Lab (INDElab). He holds a PhD in Computer Science from the University of Southampton (2007). His research focuses on intelligent systems for dealing with large amounts of diverse, contextualized knowledge, with a particular focus on Web and science applications.
ORCID: 0000-0003-0183-6910
Helena Cousijn
T. Clark (twc8q@virginia.edu), H. Cousijn (hcousijn@datacite.org) and C. Goble (carole.goble@manchester.ac.uk) clarified the ideas and concepts in the paper. All authors edited and reviewed the final version of the article.
Helena Cousijn is DataCite’s Community Engagement and Communications Director. She is committed to DataCite’s mission of enabling data sharing and reuse and is especially passionate about data citation. Before joining DataCite, Helena worked as Senior Product Manager for Research Data Management Solutions at Elsevier. She holds a DPhil in Neuroscience from the University of Oxford.
ORCID: 0000-0001-6660-6214
Tim Clark
Tim Clark is Associate Professor of Public Health Sciences, Associate Professor of Neurology (by courtesy), and Associate Research Director for Neuroinformatics in the Data Science Institute, at the University of Virginia. He holds a PhD in Computer Science from The University of Manchester. His research interests include biomedical knowledge representation, computational models of evidence, cloud computing, and neuroscience.
ORCID: 0000-0003-4060-7360
Carole Goble
Carole Goble is Professor of Computer Science at The University of Manchester. Over the past 25 years, Carole has pursued research interests in accelerating FAIR scientific innovation through distributed computing, workflows and automation; knowledge management and the Semantic Web; social, virtual environments; software engineering for scientific software; and new models of scholarship for data-intensive science. Carole has served on numerous committees and currently serves in the G7 Open Science Working Group as the UK expert. In 2008 she received the Microsoft Jim Gray e-Science Award for contributions to e-Science, and in 2010 she was elected a Fellow of the Royal Academy of Engineering. In 2014 she was appointed Commander of the Order of the British Empire (CBE) for services to Science.
ORCID: 0000-0003-1219-2137
Data Intelligence