Published, Versions 3, Vol. 2 (3): 379-416, 2020
The Computer Science Ontology: A Comprehensive Automatically-Generated Taxonomy of Research Areas
: 2020 - 04 - 15
: 2019 - 08 - 10
: 2019 - 10 - 10
: 2020 - 09 - 28
Abstract & Keywords
Abstract: Ontologies of research areas are important tools for characterizing, exploring, and analyzing the research landscape. Some fields of research are comprehensively described by large-scale taxonomies, e.g., MeSH in Biology and PhySH in Physics. Conversely, current Computer Science taxonomies are coarse-grained and tend to evolve slowly. For instance, the ACM classification scheme contains only about 2K research topics and the last version dates back to 2012. In this paper, we introduce the Computer Science Ontology (CSO), a large-scale, automatically generated ontology of research areas, which includes about 14K topics and 162K semantic relationships. It was created by applying the Klink-2 algorithm on a very large data set of 16M scientific articles. CSO presents two main advantages over the alternatives: i) it includes a very large number of topics that do not appear in other classifications, and ii) it can be updated automatically by running Klink-2 on recent corpora of publications. CSO powers several tools adopted by the editorial team at Springer Nature and has been used to enable a variety of solutions, such as classifying research publications, detecting research communities, and predicting research trends. To facilitate the uptake of CSO, we have also released the CSO Classifier, a tool for automatically classifying research papers, and the CSO Portal, a Web application that enables users to download, explore, and provide granular feedback on CSO. Users can use the portal to navigate and visualize sections of the ontology, rate topics and relationships, and suggest missing ones. The portal will support the publication of and access to regular new releases of CSO, with the aim of providing a comprehensive resource to the various research communities engaged with scholarly data.
Keywords: Scholarly data; Ontology learning; Bibliographic data; Scholarly ontologies
Article and author information
Cite As
A.A. Salatino, T. Thanapalasingam, A. Mannocci, A. Birukou, F. Osborne & E. Motta. The computer science ontology: A comprehensive automatically-generated taxonomy of research areas. Data Intelligence 2 (2020), 379-416. doi: 10.1162/dint_a_00055
Angelo A. Salatino
A.A. Salatino and T. Thanapalasingam designed and developed the resource, wrote the paper, and reviewed drafts of the paper.
Angelo A. Salatino is a Research Associate in the Intelligent Systems and Data Science (ISDS) group at the Knowledge Media Institute (KMi) of The Open University, UK. He obtained a PhD studying methods for the early detection of research trends. In particular, his project aimed at identifying the emergence of new research topics at their embryonic stage (i.e., before they are recognized by the research community). Currently, he is mainly working on: i) new technologies for classifying scientific papers according to their relevant research topics, and ii) how the research output of academia fosters innovation in industry. His research interests are in the areas of Semantic Web, Network Science and Knowledge Discovery technologies, with a focus on the structure and evolution of science: Science of Science. His academic record is available at https://salatino.org.
Thiviyan Thanapalasingam
A.A. Salatino and T. Thanapalasingam designed and developed the resource, wrote the paper, and reviewed drafts of the paper.
Thiviyan Thanapalasingam is a PhD candidate at the University of Amsterdam, The Netherlands. Under the supervision of Professor Paul Groth, Thiviyan is studying graph embedding methods for rapidly constructing Knowledge Graphs for the natural sciences and adapting them for downstream applications, such as query answering and text summarization. Before starting his doctoral research, Thiviyan obtained a first-class honours Master's degree in Chemistry from the University of Leicester and then worked as a Research Assistant within the Intelligent Systems and Data Science (ISDS) Group at the Knowledge Media Institute (KMi) of The Open University.
Andrea Mannocci
A. Mannocci analyzed the data, wrote and reviewed the paper.
Andrea Mannocci is a Research Fellow at the Institute of Information Science and Technologies (ISTI) of the Italian Research Council (CNR) in Pisa. He currently works as a data scientist within the framework of the EU project OpenAIRE Advance. His research interests span from the development of novel metrics and impact indicators for Open Science to Science of Science, complex networks, and the analysis of research as a global-scale phenomenon embedded in a delicate socioeconomic and geopolitical context. Previously, he was a member of the Scholarly Knowledge Mining, Modelling and Sense Making (SKM3) team at the Knowledge Media Institute (KMi) of The Open University, UK. He obtained his PhD degree in Information Engineering from the University of Pisa, Italy, researching systems for data flow quality monitoring in data infrastructures.
Aliaksandr Birukou
A. Birukou wrote and reviewed drafts of the paper.
Aliaksandr (Alex) Birukou works as an Editorial Director at Springer Nature. His team in Computer Science (CS) Editorial in Heidelberg publishes conference proceedings in CS (about 850 volumes/year, including the LNCS series). Alex's other team runs a portfolio of about 200 journals in different disciplines, translated from Russian into English. Apart from editorial work, Alex represents CS in several internal and external research and development (R&D) projects dealing with the optimization or innovation of scientific publishing. The most recent projects include: persistent identifiers for conferences (Alex chairs the Crossref/DataCite group: https://www.crossref.org/working-groups/conferences-projects/); intelligent dashboards for editorial (automated assessment of conferences); and OCS/EquinOCS (a conference submission management system). Previously, Alex founded lod.springer.com, which later became a part of Springer Nature SciGraph. Alex is also a co-founder of ConfRef.org, which received a Digital Science Catalyst Grant and has the ambition of becoming the Google for scientific conferences. He enjoys teaching courses about publishing and publishing innovation at the University of Trento, Italy, and People's Friendship University, Moscow, Russia. You can see his academic record at https://scholar.google.com/citations?user=ilAhtBgAAAAJ&hl=en.
Francesco Osborne
F. Osborne and E. Motta designed the resource, wrote the paper, and reviewed drafts of the paper.
Dr. Francesco Osborne is a Research Fellow at the Knowledge Media Institute, The Open University, UK, where he leads the Scholarly Knowledge Mining team. He received his PhD degree in Computer Science from the University of Torino, Italy. He has authored more than 70 peer-reviewed publications in the fields of Information Extraction, Knowledge Graphs, Science of Science, Semantic Web, Research Analytics, and Semantic Publishing. His work on innovative solutions for scholarly analytics has attracted extensive interest from the industrial sector, leading to funding from both Springer Nature and Elsevier, the top two international academic publishers.
Enrico Motta
F. Osborne and E. Motta designed the resource, wrote the paper, and reviewed drafts of the paper.
Enrico Motta has a PhD in Artificial Intelligence from the UK’s Open University, where he is currently a Professor in Knowledge Technologies. He also holds a professorial appointment at the University of Bergen in Norway. In the course of his academic career he has authored over 350 refereed publications and his h-index is 67, an impact indicator that positions him among the top computer scientists in the UK. His research focuses on large-scale data integration and analysis to support decision making in complex scenarios. Among his recent projects, he has led the MK:Smart initiative, a £17.2M project that tackled key barriers to economic growth in Milton Keynes through the deployment of innovative data-intensive solutions. He is also currently working on new solutions for scholarly analytics; in particular, he is collaborating with Springer Nature to develop new tools to improve both the quality and efficiency of editorial processes in the academic publishing industry. Prof. Motta was Editor-in-Chief of the International Journal of Human-Computer Studies from 2004 to 2018 and, over the years, has advised strategic research boards and governments in several countries, including the UK, the US, The Netherlands, Austria, Finland and Estonia.
Publication records
Published: Sept. 28, 2020 (Versions 3)
Data Intelligence