Online First Versions 3 Vol 2 (2) 2019
The Computer Science Ontology: A Comprehensive Automatically-Generated Taxonomy of Research Areas
2019-04-15
2019-08-10
2019-10-10
Abstract & Keywords
Abstract: Ontologies of research areas are important tools for characterizing, exploring, and analyzing the research landscape. Some fields of research are comprehensively described by large-scale taxonomies, e.g., MeSH in Biology and PhySH in Physics. Conversely, current Computer Science taxonomies are coarse-grained and tend to evolve slowly. For instance, the ACM classification scheme contains only about 2K research topics, and its latest version dates back to 2012. In this paper, we introduce the Computer Science Ontology (CSO), a large-scale, automatically generated ontology of research areas, which includes about 14K topics and 162K semantic relationships. It was created by applying the Klink-2 algorithm to a very large data set of 16M scientific articles. CSO presents two main advantages over the alternatives: i) it includes a very large number of topics that do not appear in other classifications, and ii) it can be updated automatically by running Klink-2 on recent corpora of publications. CSO powers several tools adopted by the editorial team at Springer Nature and has been used to enable a variety of solutions, such as classifying research publications, detecting research communities, and predicting research trends. To facilitate the uptake of CSO, we have also released the CSO Classifier, a tool for automatically classifying research papers, and the CSO Portal, a Web application that enables users to download, explore, and provide granular feedback on CSO. Through the portal, users can navigate and visualize sections of the ontology, rate topics and relationships, and suggest missing ones. The portal will support the publication of, and access to, regular new releases of CSO, with the aim of providing a comprehensive resource to the various research communities engaged with scholarly data.
Keywords: Scholarly data; Ontology learning; Bibliographic data; Scholarly ontologies; Semantic Web
Article and author information
Cite As
A.A. Salatino, T. Thanapalasingam, A. Mannocci, A. Birukou, F. Osborne & E. Motta. The computer science ontology: A comprehensive automatically-generated taxonomy of research areas. Data Intelligence 2(2020).
Angelo A. Salatino
Angelo A. Salatino (angelo.salatino@open.ac.uk) and Thiviyan Thanapalasingam (thiviyan.thanapalasingam@open.ac.uk) designed and developed the resource, wrote the paper, and reviewed drafts of the paper.
Angelo A. Salatino is a Research Associate in the Intelligent Systems and Data Science (ISDS) group at the Knowledge Media Institute (KMi) of The Open University. He obtained his PhD studying methods for the early detection of research trends. In particular, his project aimed at identifying the emergence of new research topics at their embryonic stage (i.e., before they are recognized by the research community). He currently works mainly on: i) new technologies for classifying scientific papers according to their relevant research topics, and ii) how the research output of academia fosters innovation in industry. His research interests are in the areas of Semantic Web, Network Science and Knowledge Discovery technologies, with a focus on the structure and evolution of science (Science of Science). His academic record is available at https://salatino.org.
0000-0002-4763-3943
Thiviyan Thanapalasingam
Thiviyan Thanapalasingam is a PhD candidate at the University of Amsterdam. Under the supervision of Professor Paul Groth, he is studying graph embedding methods for rapidly constructing Knowledge Graphs for the natural sciences and adapting them for downstream applications, such as query answering and text summarization. Before starting his doctoral research, Thiviyan obtained a first-class honours master's degree in Chemistry from the University of Leicester and then worked as a Research Assistant in the Intelligent Systems and Data Science (ISDS) Group at the Knowledge Media Institute (KMi) of The Open University.
Andrea Mannocci
Andrea Mannocci (andrea.mannocci@open.ac.uk) analyzed the data, wrote and reviewed the paper.
Andrea Mannocci is a Research Fellow at the Institute of Information Science and Technologies (ISTI) of the Italian Research Council (CNR) in Pisa. He currently works as a data scientist within the framework of the EU project OpenAIRE Advance. His research interests range from the development of novel metrics and impact indicators for Open Science to Science of Science, complex networks, and the analysis of research as a global-scale phenomenon embedded in a delicate socioeconomic and geopolitical context. Previously, he was a member of the Scholarly Knowledge Mining, Modelling and Sense Making (SKM3) team at the Knowledge Media Institute (KMi) of The Open University, UK. He obtained his PhD degree in Information Engineering from the University of Pisa, Italy, researching systems for data flow quality monitoring in data infrastructures.
Aliaksandr Birukou
Aliaksandr Birukou (aliaksandr.Birukou@springer.com) wrote and reviewed drafts of the paper.
Aliaksandr (Alex) Birukou works as an Editorial Director at Springer Nature. His team in Computer Science (CS) Editorial in Heidelberg publishes the conference proceedings in CS (about 850 volumes/year, including the LNCS series). Alex's other team runs a portfolio of about 200 journals in different disciplines, translated from Russian into English. Apart from editorial work, Alex represents CS in several internal and external research and development (R&D) projects dealing with the optimization or innovation of scientific publishing. His most recent projects include: persistent identifiers for conferences (Alex chairs the Crossref/DataCite group: https://www.crossref.org/working-groups/conferences-projects/); intelligent dashboards for editorial work (automated assessment of conferences); and OCS/EquinOCS (a conference submission management system). Previously, Alex founded lod.springer.com, which later became part of Springer Nature SciGraph. Alex is also a co-founder of ConfRef.org, which received a Digital Science Catalyst Grant and aims to become the Google of scientific conferences. He enjoys teaching courses about publishing and publishing innovation at the University of Trento, Italy, and at People's Friendship University, Moscow, Russia. His academic record is available at https://scholar.google.com/citations?user=ilAhtBgAAAAJ&hl=en.
Francesco Osborne
Francesco Osborne (francesco.osborne@open.ac.uk) and Enrico Motta (enrico.motta@open.ac.uk) designed the resource, wrote the paper, and reviewed drafts of the paper.
Dr. Francesco Osborne is a Research Fellow at the Knowledge Media Institute, The Open University, UK, where he leads the Scholarly Knowledge Mining team. He received his PhD degree in Computer Science from the University of Torino, Italy. He has authored more than 70 peer-reviewed publications in the fields of Information Extraction, Knowledge Graphs, Science of Science, Semantic Web, Research Analytics, and Semantic Publishing. His work on innovative solutions for scholarly analytics has attracted extensive interest from the industrial sector, leading to funding from both Springer Nature and Elsevier, the top two international academic publishers.
Enrico Motta
Prof. Enrico Motta has a PhD in Artificial Intelligence from the UK's Open University, where he is currently a Professor in Knowledge Technologies. He also holds a professorial appointment at the University of Bergen in Norway. In the course of his academic career, he has authored over 350 refereed publications, and his h-index is 67, an impact indicator that positions him among the top computer scientists in the UK. His research focuses on large-scale data integration and analysis to support decision making in complex scenarios. Among his recent projects, he led the MK:Smart initiative, a £17.2M project that tackled key barriers to economic growth in Milton Keynes through the deployment of innovative data-intensive solutions. He is also currently working on new solutions for scholarly analytics; in particular, he is collaborating with Springer Nature to develop new tools that improve both the quality and efficiency of editorial processes in the academic publishing industry. Prof. Motta was Editor-in-Chief of the International Journal of Human-Computer Studies from 2004 to 2018 and, over the years, has advised strategic research boards and governments in several countries, including the UK, the US, the Netherlands, Austria, Finland, and Estonia.