Home

An increasing amount of data is now available from public and private sources, and the types, formats, and number of data sources are also increasing. These sources differ widely in the level and kind of structure they have. In the last few years, techniques for extracting, processing, and analyzing such data have been developed to manage this bewildering variety, based on a structure called a knowledge graph. This website introduces the knowledge graph data model and specifies it mathematically. This data model has many advantages over existing data models that have been used for representing graph structures. One important advantage is that all statements are reified, which can reduce the cost and complexity of storage and retrieval for knowledge graphs that have rich semantics, such as provenance, units of measure, and uncertainty specifications. Despite the added capabilities of the knowledge graph data model, knowledge graphs can be stored efficiently in existing triple stores, and existing tools can be used with only minor modifications.

This website also introduces a new data language, called KGSQL, which is specifically designed for the knowledge graph data model. The syntax and denotational semantics of KGSQL are both specified formally. Denotational semantics is an approach to formalizing the meanings of programming languages and is especially well suited to specifying data languages. In addition, KGSQL is specified using the notion of an institution from category theory, which can be used to express KGSQL in the Distributed Ontology, Model, and Specification Language™ (DOL™) of the Object Management Group.
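To make the idea of a reified statement concrete, the following is a minimal sketch in Python. It is not the formal knowledge graph data model specified on this website. It only illustrates the general technique using the standard RDF reification vocabulary (`rdf:Statement`, `rdf:subject`, `rdf:predicate`, `rdf:object`). The `Triple` type, the `reify` and `annotations_of` helpers, and all `ex:` and `unit:` names are hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Triple(NamedTuple):
    """A plain subject-predicate-object triple, as a triple store would hold it."""
    subject: str
    predicate: str
    obj: str


# Predicates that describe the reified statement itself (standard RDF vocabulary).
CORE_PREDICATES = {"rdf:type", "rdf:subject", "rdf:predicate", "rdf:object"}


def reify(statement_id: str, subject: str, predicate: str, obj: str,
          annotations: Optional[Dict[str, str]] = None) -> List[Triple]:
    """Represent one statement as a node, so that further facts can be attached to it.

    Hypothetical helper: the statement gets its own identifier, and every
    annotation (provenance, unit, uncertainty, ...) becomes an ordinary triple
    whose subject is that identifier.
    """
    triples = [
        Triple(statement_id, "rdf:type", "rdf:Statement"),
        Triple(statement_id, "rdf:subject", subject),
        Triple(statement_id, "rdf:predicate", predicate),
        Triple(statement_id, "rdf:object", obj),
    ]
    for key, value in (annotations or {}).items():
        triples.append(Triple(statement_id, key, value))
    return triples


def annotations_of(triples: List[Triple], statement_id: str) -> Dict[str, str]:
    """Collect the annotation triples attached to a given reified statement."""
    return {
        t.predicate: t.obj
        for t in triples
        if t.subject == statement_id and t.predicate not in CORE_PREDICATES
    }


# Example: a measurement with a unit, a source, and a confidence value.
# All identifiers below are made up for this sketch.
stmt = reify(
    "ex:stmt1", "ex:sample42", "ex:hasMass", "3.2",
    annotations={
        "ex:unit": "unit:kilogram",
        "ex:source": "ex:labNotebook7",
        "ex:confidence": "0.95",
    },
)
```

Because every element of `stmt` is an ordinary triple, a list like this could be loaded into an existing triple store without changes to the store itself, which is the kind of compatibility the paragraph above describes. Calling `annotations_of(stmt, "ex:stmt1")` retrieves the unit, source, and confidence attached to the statement.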

Introduction to KGSQL