Oil and Gas. Equinor(formerly Statoil ASA) is a Norwegian multinational oil and gas company. One of the common tasks for geologists at Equinor is to find new exploitable accumulations of oil or gas in given areas by analyzing data about these areas in a timely manner. However, gathering the required data is not a trivial task since it is stored in multiple complex and large data sources, including EPDS, Recall, CoreDB, GeoChemDB, OpenWorks, Compass and NPD FactPages. Construction of the right queries is not possible for the Equinor geologists, so they have to communicate their information needs to IT specialists who then turn them into SQL queries. This drastically affects the efficiency of finding the right data to back decision making. The work of [52] describes how the data access and integration challenges in Equinor have been addressed by adopting the VKG-based system Optique [63], which relies on the following tools: (1)the bootstrapper BootOX to create ontologies and mappings from relational databases in a semi-automatic fashion; (2)the VKG system Ontop to perform query reformulation; (3)the federator Exareme to evaluate the reformulated queries over the federated DBs; and (4) the query formulation module OptiqueVQS to support query construction for engineers with a limited IT background.
Machine Diagnoses. Siemens Energy runs several service centers that remotely monitor and perform diagnostics for several thousand appliances, such as gas and steam turbines, generators and compressors installed in power plants. For performing reactive and predictive diagnostics at Siemens, data access and integration of both static data (e.g., configuration and structure of turbines) and dynamic data (e.g., sensor data) are particularly important but very challenging. The work of [53] addressed these data access requirements by using the Optique platform as a VKG solution, similar to the Equinor use-case.
Government and Public Administration. The Italian Public Debt Directorate is responsible for various matters, such as issuance and management of the public debt, and analysis of the problems inherent to its management. The Directorate is organized into offices that deal with specific aspects, and each sub-unit has an understanding of a particular portion of the public debt domain. However, a shared and formalized description of the relevant concepts and relations in the whole domain was missing, since data were managed by different systems in different offices, and their structure had been heavily modified and updated to serve specific application needs. There was a clear need to coordinate and integrate the data of the various sub-units. The work of [54] presented a project for addressing this issue. They developed the Public Debt Ontology to formalize the whole domain of the Italian public debt. The VKG system Mastro Studio has been used to provide a comprehensive software environment. Users can take advantage of the wiki-like documentation of the ontology to access both its graphical representation and its OWL2 specification.
To promote more transparent and inclusive governance in the Tuscany region, SIRIS Academic, a small Spanish company specialized in providing data management solutions, has developed Tuscany’s Observatory of Research and Innovation portal [55]. They integrate Open Data on the Higher Education & Research field, including official Italian student and researcher data coming from the Ministero dell’Istruzione, dell’Università e della Ricerca (MIUR), and European data on FP7 and H2020 research projects. They follow the VKG approach and use the platform University Analytics (UNiCS) developed by SIRIS Academic. The platform uses Ontop to integrate open data repositories and to make them available via a dedicated SPARQL endpoint. Then the platform shows the data as an interactive dashboard hosting data visualisations, which are fed by the underlying UNiCS SPARQL endpoint.
Over the last 200 years, countries have replaced their constitutions on average every 19 years, and some have amended them almost yearly. A basic problem in the drafting of these documents is the search and analysis of model text deployed in other jurisdictions. In the Constitute Project, Ultrawrap was used to integrate the world’s constitutions into a single unified semantic endpoint for contextual searching. The project was launched at the General Assembly of the United Nations in 2013 and continues to integrate over 196 current databases of all of the world’s constitutions on the Web. Countries throughout the world can take advantage of this free service to modify and develop their constitutions.
Cultural heritage. Historians, especially in Digital Humanities, are starting to use new data sets to aggregate information about history. These are collections of data, information and knowledge that are devoted to the preservation of the legacy of tangible and intangible culture inherited from previous generations. In the project Production and distribution of food during the Roman Empire: Economics and Political Dynamics (EPNet), the work of [56] presents a framework that eases the access of scholars to historical and cultural data about food production and commercial trade system during the Roman Empire, distributed across different data sources. The proposed approach relies on the VKG paradigm to integrate the following data sets: (1)the EPNet relational repository, (2)the Heidelberg Epigraphic database, and (3)Pleiades, an open-access digital gazetteer for ancient history. An ontology provides to the historians a clear point of access and a unified and unambiguous conceptual view over these data sets.
Maritime security. The maritime security domain presents a need for efficient combining and processing of dynamic (real-time) and static vessel data that come from heterogeneous sources. The project Real-time Services for the Maritime Security (EMSec) needed to integrate static, real-time and geospatial data, including (1)static vessel metadata, (2) open data like GeoNames and OpenStreetMap, (3)large radar and satellite images, and (4)real-time vessel data (approximately 1,000 vessel positions are acquired per second). To address this objective, the system Real-time Maritime Situation Awareness System (RMSAS), which relies on the VKG technology, has been developed [57]. RMSAS uses Ontop (with the Ontop-spatial extension) to expose the data mentioned above as SPARQL endpoints. The Web-based tool Sextantis then used to visualize the results on temporally-enabled maps combining geospatial and temporal results from different (Geo)SPARQL endpoints.
Manufacturing. Digitalization in the manufacturing domain requires information models describing assets and information sources of companies to enable the semantic integration and interoperable exchange of data. The work of [58] reports on a case study where, for a global manufacturing company, an information model using semantic technologies is proposed. Three types of data were of particular interest in the project: (1)sensor data, (2)the Bill of Materials, and (3)data from the Manufacturing Execution System. The information model is centered around machine data and describes all relevant assets, key terms and relations in a structured way, making use of existing as well as newly developed RDF vocabularies. In addition, it comprises numerous RML mappings that link different data sources required for integrated data access and querying via SPARQL. The technical infrastructure and methodology used to develop and maintain the information model is based on a Git repository and utilizes the development environment VoCol as well as Ontop.
Healthcare. Semantic interoperability is essential when carrying out post-genomic clinical trials where several institutions collaborate, since researchers and developers need to have an integrated view and access to heterogeneous data sources. The work of [59] presents how to query clinical data in HL7 RIM based relational model using the Morph system. It presents a solution that uses an ontology based on the HL7v3 Reference Information Model and a set of R2RML mappings that relate this ontology to an underlying relational database implementation, and where morph-RDB is used to expose a SPARQL endpoint.
Improving healthcare for people with chronic conditions requires clinical information systems that support integrated care and information exchange. The adoption of an approach based on semantic information simplifies the use of multiple and diversified Electronic Health Records (EHRs). Within the work described in [60], a Diabetes Mellitus Ontology (DMO) has been developed, and has been used to diagnose patients with diabetes, and automatically identify them by analyzing EHRs. Specifically, by using Ontop, the EHR data from a general practice (with almost 1,000 active patients) could be queried via SPARQL. The accuracy of the algorithm for automatic identification of patients with diabetes was validated by performing a manual audit of the EHRs, and considered good enough for the purpose. Not surprisingly, the accuracy of the automatic method was influenced by data quality, such as incorrect data due to mistaken units of measurement, unavailable data due to lack of or wrong documentation, and data management errors.
Also, Capsenta has reported that VKG technology has been deployed in the healthcare sector to help clinical investigators to increase procedure volume, to improve patient identification and to reduce IT resources.
Smart cities. Smart City applications rely on large amounts of data retrieved from sensors, social networks or government authorities. Open data and data from existing enterprise systems are two valuable resources. However, open data are often published in a tabular form with little or incomplete schema information, while enterprise applications typically rely on complex relational schemas. There is a clear need to make city-specific information easy to consume and combine at low cost, but this proves to be a difficult task. The work of [61] presents the system DALI, which exploits linked data to provide federated entity search and spatial exploration across hundreds of information sources containing open and enterprise data pertaining to cities. Ontop is used as the VKG solution, and mappings are created using a rule and pattern-based entity extraction mechanism to detect different kinds of entities. The DALI system has been evaluated in two scenarios: (1) data-engineers bring together public and enterprise data sets about public safety; (2)knowledge-engineers and domain-experts build a view of health and social care providers for vulnerable populations.
Log extraction in process mining. Process mining techniques are able to extract knowledge from event log data, which is often available in today’s information systems [64]. Process mining tools normally assume that the data to be analyzed are already organized in some specific textual (XML based) format, notably IEEE standard for eXtensible Event Stream (XES) for achieving interoperability in event logs and event streams [65]. However, in practice, many companies have already had their own IT infrastructure that maintains the data relevant for process logs, e.g., in standard relational databases, and hence in a form that is not compliant with the XES standard. To cope with this kind of problem, the approach proposed in [66] exploits a VKG based framework and associated methodology for the extraction of XES event logs from relational data sources. This approach is implemented in OnProm, which provides a complete tool-chain that (i) allows for describing event logs by means of suitable annotations of a conceptual model of the available data, (ii)exploits the Ontop system for the actual log extraction, and (iii)is fully integrated with the well-known ProM process mining framework. It has been tested in EBITmax, an Italian company that provides consultancy services in program management and business process management for small and large enterprises, and that has incorporated process mining to complement its standard consultancy services [62]. The experimentation has shown the added value and flexibility of an approach based on semantics for the semi-automatic generation of process logs from legacy data.