Online First, Version 1. Vol. 2(2), 2019
Refining Linked Data with Games with a Purpose
Dates: 2019-04-15, 2019-08-10, 2019-10-10
Abstract: With the rise of linked data and knowledge graphs, there is a compelling need for suitable solutions to increase the coverage and correctness of data sets, to add missing knowledge, and to identify and remove errors. Several approaches, mostly relying on machine learning and natural language processing techniques, have been proposed to address this refinement goal; they usually need a partial gold standard, i.e. some “ground truth” to train automatic models. Gold standards are constructed manually, either by involving domain experts or by adopting crowdsourcing and human computation solutions. In this paper, we present an open source software framework to build Games with a Purpose (GWAP) for linked data refinement, i.e. web applications that crowdsource partial ground truth by motivating user participation through fun incentives. We detail the impact of this new resource by explaining the specific data linking “purposes” supported by the framework (creation, ranking and validation of links) and by defining the respective crowdsourcing tasks to achieve those goals. We also introduce our approach for incremental truth inference over the contributions provided by GWAP players: we motivate the need for such a method with the specificity of GWAP vs. traditional crowdsourcing; we explain and formalize the proposed process and discuss its positive consequences; and we illustrate the results of an experimental comparison with state-of-the-art approaches. To show this resource’s versatility, we describe a set of diverse applications that we built on top of it; to demonstrate its reusability and extensibility potential, we provide references to detailed documentation, including a complete tutorial that guides new adopters, in a few hours, through customizing and adapting the framework to a new use case.
Keywords: Human computation; Games with a purpose; Linked data; Knowledge graph; Data refinement; Data linking; Truth inference
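The abstract names incremental truth inference but does not spell out the procedure. As a hedged illustration of what "incremental" can mean for link validation in a game, the sketch below folds binary answers (is this link valid or not?) into a running score one at a time and marks the link as decided once the evidence is strong enough, so it can be taken out of play. It uses standard log-odds (naive-Bayes) weighting of each player's estimated reliability; it is not the authors' algorithm, and `LinkState`, `add_vote`, `vote_weight`, the threshold of 3.0 and the cap of 15 answers are all hypothetical choices.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkState:
    """Running evidence for one candidate link (hypothetical structure)."""
    score: float = 0.0              # sum of signed log-odds weights of the answers so far
    votes: int = 0                  # number of answers received
    verdict: Optional[bool] = None  # True = valid, False = invalid, None = still in play


def vote_weight(reliability: float) -> float:
    """Log-odds weight of a player who answers correctly with probability `reliability`."""
    # Clamp so that an estimated reliability of 0 or 1 does not yield an infinite weight.
    # Players below 0.5 get a negative weight, i.e. their answers count as evidence for the opposite.
    r = min(max(reliability, 0.01), 0.99)
    return math.log(r / (1.0 - r))


def add_vote(state: LinkState, says_valid: bool, reliability: float,
             threshold: float = 3.0, max_votes: int = 15) -> LinkState:
    """Fold one player's answer into the link's running score.

    The link is decided as soon as |score| reaches `threshold`, or after
    `max_votes` answers; a score of exactly zero at the cap counts as invalid.
    """
    if state.verdict is not None:
        return state  # already decided: further answers are ignored
    w = vote_weight(reliability)
    state.score += w if says_valid else -w
    state.votes += 1
    if abs(state.score) >= threshold or state.votes >= max_votes:
        state.verdict = state.score > 0
    return state


if __name__ == "__main__":
    link = LinkState()
    for says_valid, reliability in [(True, 0.9), (True, 0.8), (False, 0.6)]:
        add_vote(link, says_valid, reliability)
    print(link)  # decided (valid) after two answers; the third answer is ignored
```

One plausible rationale for stopping early in a game setting: players' contributions are a scarce resource, so once a link has enough agreement it no longer needs to be shown, and play time shifts to links that are still uncertain.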
Acknowledgements
This work was partially supported by the STARS4ALL project (H2020-688135) co-funded by the European Commission.
Article and author information
Cite As
I. Celino, G. Re Calegari & A. Fiano. Refining linked data with games with a purpose. Data Intelligence 2(2020).
Irene Celino
The ideas and concepts presented in the paper are the results of at least three years of cooperation between the authors. I. Celino (irene.celino@cefriel.com) focused on data linking, crowdsourcing tasks and incremental truth inference. All authors contributed to writing the manuscript, and they edited and reviewed the final version of the article.
irene.celino@cefriel.com
Irene Celino is the Head of the Knowledge Technologies group at Cefriel, where she leads an R&D team and is a Portfolio and Project Manager. With expertise in Semantic Web and Human Computation technologies, her research activities cover the application of such innovative technologies to the design and development of Web applications, search engines, recommendation systems and mobile games, especially in Smart City and transportation-related scenarios. She has over 15 years of experience in over 30 R&D cooperative projects, both at National/Regional level and at European level within FP6, FP7, H2020 and EIT Digital. She is the author of over 70 scientific publications in peer-reviewed journals, books and conferences.
ORCID: 0000-0001-9962-7193
Gloria Re Calegari
The ideas and concepts presented in the paper are the results of at least three years of cooperation between the authors. G. Re Calegari (gloria.re@cefriel.com) and A. Fiano (andrea.fiano@cefriel.com) focused on the framework, its applications, evaluation and tutorial. All authors contributed to writing the manuscript, and they edited and reviewed the final version of the article.
Gloria Re Calegari is a researcher at Cefriel. She has a computer science background and her fields of expertise are Data Science and Human Computation technologies. Her research activities cover the design and development of gamified applications and Games with a Purpose, alongside the development of machine learning solutions that bring together humans and artificial intelligence. During over 5 years of experience in R&D cooperative projects, both at National and Regional level, she has authored more than 20 scientific publications in peer-reviewed journals and conferences.
Andrea Fiano
Andrea Fiano is a senior developer at Cefriel. Starting with the development of Web applications in .NET, he moved on to backend solutions and REST APIs in Java, and has worked with Single Page Applications and Progressive Web Apps in Angular and Node.js. He provides his expertise in the development of customer-tailored solutions as well as in supporting the research branch. In particular, he has contributed to the field of Human Computation by developing several Games with a Purpose in Smart City and crowdsourcing scenarios.
Publication records
Published: Nov. 12, 2019 (Version 1)