Course:CPSC532:StaRAI2020:KnowledgeGraphs
Knowledge Graphs
Knowledge graphs contain and represent structural information about the world using a set of entities and a set of facts about them.
Authors: Lucca Siaudzionis, Maulik Parmar.
Abstract
A Knowledge Graph (KG) is a structured representation of facts that consists of entities, relationships, and semantic descriptions about them. Entities can be abstract concepts or objects in the real world. Relationships illustrate relations between entities, and they contain types and properties with a precise meaning. We can use the KG built from a set of facts to infer new knowledge about a set of entities and to highlight interesting relations. [1]
Introduction
A knowledge graph (KG) represents structural information about the world using entities and relations. It consists of a set of triples (or "facts") of the form (head, relation, tail), in which the head and tail are elements of a set of entities Ε, and the relation is an element of a set of relations Ρ. These triples capture the structural relationships between entities.
Examples include:
- (Alan Turing, BornIn, London)
- (London, CapitalOf, England)
- (Turing machine, ProposedBy, Alan Turing)
- (Alan Turing, SupervisedBy, Alonzo Church)
- (Alonzo Church, GraduatedFrom, Princeton University)
Entities and relations can be shown visually through a graph, in which each entity becomes a node, and each relation becomes a labeled directed edge. (See image on the side.)
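As a minimal sketch of this graph view, the example triples can be stored as Python tuples and indexed into an adjacency map from each head entity to its labeled outgoing edges. The names `TRIPLES`, `build_graph`, and `entities` are illustrative choices, not part of any KG library.

```python
from collections import defaultdict

# The example facts from above, as (head, relation, tail) tuples.
TRIPLES = [
    ("Alan Turing", "BornIn", "London"),
    ("London", "CapitalOf", "England"),
    ("Turing machine", "ProposedBy", "Alan Turing"),
    ("Alan Turing", "SupervisedBy", "Alonzo Church"),
    ("Alonzo Church", "GraduatedFrom", "Princeton University"),
]


def build_graph(triples):
    """Map each head entity to a list of (relation, tail) labeled edges."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((relation, tail))
    return dict(graph)


def entities(triples):
    """Return the set of entities: every head or tail of some triple."""
    return {e for head, _, tail in triples for e in (head, tail)}


graph = build_graph(TRIPLES)
for relation, tail in graph["Alan Turing"]:
    print(f"Alan Turing --{relation}--> {tail}")
```

Each key of `graph` is a node, and each `(relation, tail)` pair is a labeled directed edge leaving that node.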
Usually, information is presented in the world in more complicated ways, and it is necessary to transform it into triples in order to store it in a KG.
Example: Suppose we want to represent the statement "Cristiano Ronaldo plays as a forward for Juventus and captains the Portugal national team."
We create a unique entity id (cr7) to represent Cristiano Ronaldo, the soccer player, and write one triple per property:
prop(cr7, name, Cristiano Ronaldo)
prop(cr7, captains, Portugal National Soccer Team)
prop(cr7, club, Juventus)
prop(cr7, club-position, Forward)
prop(cr7, type, player)
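This transformation can be sketched in Python as flattening a record into prop triples keyed by the entity id. The helper `record_to_triples` and the dictionary layout are assumptions made for illustration only.

```python
def record_to_triples(entity_id, record):
    """Turn a {property: value} record into (individual, property, value) triples."""
    return [(entity_id, prop, value) for prop, value in record.items()]


# Hypothetical record for the player; the id "cr7" is the one chosen above.
ronaldo = {
    "name": "Cristiano Ronaldo",
    "captains": "Portugal National Soccer Team",
    "club": "Juventus",
    "club-position": "Forward",
    "type": "player",
}

for triple in record_to_triples("cr7", ronaldo):
    print("prop" + str(triple))
```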
Reifying Triples
Imagine the following two facts:
- "Air Canada flies from Vancouver to Ottawa"; and
- "Air Canada flies from Toronto to New York."
A naive implementation would make Air Canada the subject of every triple: (Air Canada, fliesFrom, Toronto) and (Air Canada, fliesTo, New York) for the second flight, and likewise (Air Canada, fliesFrom, Vancouver) and (Air Canada, fliesTo, Ottawa) for the first. This is problematic because nothing distinguishes the two flights, so one could mistakenly infer that Air Canada flies from Vancouver to New York.
To avoid problems such as these, we can reify the flights, which means making an individual out of each of them. Real-world systems already do this: there is one flight ID for the first flight and a different flight ID for the second. For example:
- (AC338, fliesFrom, Vancouver);
- (AC338, fliesTo, Ottawa);
- (AC7654, fliesFrom, Toronto); and
- (AC7654, fliesTo, New York).
Now the two flights are explicitly distinct, because each has been individualized with a unique ID. (Those are actually the IDs of real flights.) To record that both flights are operated by Air Canada, we add:
- (Air Canada, flies, AC338); and
- (Air Canada, flies, AC7654).
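The difference between the two encodings can be checked with a small Python sketch. The function `routes` is a hypothetical helper that pairs origins and destinations sharing the same subject, which is the inference the text warns about.

```python
def routes(triples):
    """Pair every fliesFrom origin with every fliesTo destination of the same subject."""
    origins = [(s, o) for s, p, o in triples if p == "fliesFrom"]
    dests = [(s, o) for s, p, o in triples if p == "fliesTo"]
    return {(src, dst) for s1, src in origins for s2, dst in dests if s1 == s2}


# Naive encoding: the airline is the subject of every triple.
naive = [
    ("Air Canada", "fliesFrom", "Vancouver"),
    ("Air Canada", "fliesTo", "Ottawa"),
    ("Air Canada", "fliesFrom", "Toronto"),
    ("Air Canada", "fliesTo", "New York"),
]

# Reified encoding: each flight is its own individual.
reified = [
    ("AC338", "fliesFrom", "Vancouver"),
    ("AC338", "fliesTo", "Ottawa"),
    ("AC7654", "fliesFrom", "Toronto"),
    ("AC7654", "fliesTo", "New York"),
    ("Air Canada", "flies", "AC338"),
    ("Air Canada", "flies", "AC7654"),
]

print(("Vancouver", "New York") in routes(naive))    # True: a spurious route
print(("Vancouver", "New York") in routes(reified))  # False
```

With the naive encoding all four origin and destination combinations appear as routes; with the reified encoding only the two real ones do.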
Famous Examples
Different major companies use knowledge graphs in different ways, depending on what the company specializes in. The following table summarizes major industry uses of KGs.
Developer | Purpose & Function |
---|---|
Microsoft | Uses knowledge graph for the Bing search engine, LinkedIn data & Academics. |
Google | Knowledge graph is used as a massive categorization function across Google’s devices and is directly embedded in the search engine. |
Facebook | Develops connections between people, events and ideas, mainly focusing on news, people and events related to the social network. |
IBM | Provides a framework for other companies and/or industries to develop internal knowledge graphs. |
Universality of Triples
When designing a knowledge base, we have to decide which entities and relationships to represent. Some guiding principles are useful for choosing relations and individuals. We illustrate them with an example.
Example:
Suppose you decide that “cotton” is an appropriate category for classifying individuals. You could treat the name “cotton” as a unary relation and write that product y is cotton:
cotton(y)
You can ask “What is cotton?” with the query:
Ask cotton(X)
The X returned are the cotton individuals (products). With this representation, however, it is hard to ask the question “What material is product y?”. In the syntax of definite clauses, you cannot ask
Ask X(y)
because, in languages based on first-order logic, predicate names cannot be variables. Instead, we could treat the material as an individual and use the constant “cotton” to denote the material cotton. You can then use a predicate “material”, where material(Ind, Val) means that physical individual Ind is made of material Val. “Product y is cotton” can now be written as
material(y, cotton)
The world now consists of materials as individuals as well as products. Under this binary relation material, you can ask “What material is product y?” using the following query:
Ask material(y, X)
You can also ask “Which products have material cotton?” using the query:
Ask material(X, cotton)
To make an abstract concept into an object is to reify it.[2] We reified the material cotton. However, we still cannot answer queries such as “Which property of product y has value cotton?” (the answer being “material”). To answer this type of query, you can further treat the property material as an individual, invent a relation prop, and write “individual y is of material cotton” as
prop(y, material, cotton)
This representation allows us to answer all the above queries. The individual-property-value representation is in terms of a single relation prop, where prop(Ind, Prop, Val) means that individual Ind has value Val for property Prop. All relations are represented as triples, so this is also called the triple representation. The first element of the triple is called the subject, the second the verb, and the third the object, using the analogy that a triple is a simple three-word sentence. We write a triple as
subject verb object
or
prop(subject, verb, object)
We can interpret the prop relation in terms of a directed graph, where each triple prop(Ind, Prop, Val) is depicted as an arc labeled Prop from node Ind to node Val. Such a graph is known as a knowledge graph or semantic network.[2]
Triples are universal in that any relation can be expressed with them, which is why the single prop relation can answer all the above queries.
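A minimal Python sketch of the triple representation shows how one query function covers all three questions. The function `ask`, the use of `None` as a variable, and the extra facts about colour and product z are assumptions for illustration.

```python
def ask(kb, ind=None, prop=None, val=None):
    """Return all triples in kb matching the pattern; None acts as a variable."""
    return [
        (i, p, v)
        for i, p, v in kb
        if (ind is None or i == ind)
        and (prop is None or p == prop)
        and (val is None or v == val)
    ]


# Hypothetical products; "y" is the product from the running example.
kb = [
    ("y", "material", "cotton"),
    ("y", "colour", "blue"),
    ("z", "material", "wool"),
]

print(ask(kb, ind="y", prop="material"))       # What material is product y?
print(ask(kb, prop="material", val="cotton"))  # Which products are cotton?
print(ask(kb, ind="y", val="cotton"))          # Which property of y has value cotton?
```

Because the property is now an ordinary position in the triple, it can be left unbound like any other argument, which was not possible when it was a predicate name.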
Classes
Typically, you know more about a domain than a database of facts: you also know general rules that can be used to derive new facts. Primitive knowledge is knowledge that is specified explicitly in terms of facts. Derived knowledge is knowledge that can be inferred from other knowledge.[2]
A standard way to use derived knowledge is to put individuals into classes, and then give general properties to classes so that individuals inherit the properties of classes. A class is typically an intensional set, defined by a characteristic function that is true of members of the set and false of other individuals[2]. A class is the set of those actual and potential individuals that would be members of the class. A natural kind is a class such that describing individuals using that class is more compact than describing individuals without the class. For example, “birds” is a natural kind, because describing the common attributes of birds makes a knowledge base that uses “birds” more compact than one that does not use “birds” and instead repeats the attributes for every individual.
Class S is a subclass of class C if S is a subset of C.
The relationship between types and subclasses can be written as a definite clause:
prop(X, type, C) ← prop(S, subClassOf, C) ∧ prop(X, type, S)
Property inheritance occurs when a value for a property is specified at the class level and inherited by the members of the class.
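The type and subclass rule can be sketched in Python as a simple forward-chaining loop that keeps applying the clause until no new facts appear. The function `infer_types` and the example facts about tweety are hypothetical and only meant to illustrate derived knowledge.

```python
def infer_types(kb):
    """Apply prop(X,type,C) <- prop(S,subClassOf,C) and prop(X,type,S) to a fixpoint."""
    facts = set(kb)
    changed = True
    while changed:
        changed = False
        # Snapshot the current subclass and type facts before adding new ones.
        subclass = [(s, c) for s, p, c in facts if p == "subClassOf"]
        typed = [(x, s) for x, p, s in facts if p == "type"]
        for x, s in typed:
            for s2, c in subclass:
                if s == s2 and (x, "type", c) not in facts:
                    facts.add((x, "type", c))
                    changed = True
    return facts


# Hypothetical facts: tweety is a canary, canaries are birds, birds are animals.
kb = {
    ("tweety", "type", "canary"),
    ("canary", "subClassOf", "bird"),
    ("bird", "subClassOf", "animal"),
}

derived = infer_types(kb) - kb
print(sorted(derived))
```

Only the first fact about tweety is primitive; its membership in the classes bird and animal is derived by the rule.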
Research and Tasks
We can use the structural information present in a KG to infer new information, which can in turn be used to answer questions. Google, for example, uses knowledge graphs both to answer queries and to show related searches, which themselves answer questions before they are asked.[3] The research field is split into four major areas: knowledge acquisition, temporal knowledge graphs, knowledge representation learning, and knowledge-aware applications.
Knowledge Acquisition
Knowledge acquisition tasks are mainly divided into:
- knowledge graph completion (KGC), which is the expansion of the KG using the current sets of entities and relations;
- relation extraction, a key field, where unknown relations are extracted from plain text and added to the KG; and
- entity discovery, which involves recognizing and disambiguating entities to be incorporated into a KG.
One of the tasks that falls under KGC is link prediction. At a high level, the task is to learn a model that, given a head entity eh, a relation r, and a tail entity et, estimates the likelihood that the triple (eh, r, et) is true, and uses this likelihood to decide whether the triple should be added as inferred knowledge. One embedding method for link prediction is SimplE, described by Kazemi and Poole in “SimplE Embedding for Link Prediction in Knowledge Graphs”.
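As a rough sketch of how such a model scores a triple, the following Python code follows the SimplE scoring form: each entity has a head vector and a tail vector, each relation has a forward vector and an inverse vector, and the score averages the two resulting three-way products. The embeddings here are hand-picked rather than learned, and squashing the score with a logistic function and thresholding at 0.5 are simplifying assumptions for illustration, not the training procedure from the paper.

```python
import math


def triple_product(a, b, c):
    """Sum of element-wise products <a, b, c>."""
    return sum(x * y * z for x, y, z in zip(a, b, c))


def simple_score(head, rel, tail, ent_h, ent_t, rel_fwd, rel_inv):
    """SimplE-style score: average of the forward and inverse-relation terms."""
    forward = triple_product(ent_h[head], rel_fwd[rel], ent_t[tail])
    inverse = triple_product(ent_h[tail], rel_inv[rel], ent_t[head])
    return 0.5 * (forward + inverse)


def likelihood(score):
    """Map a real-valued score to (0, 1) with the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))


# Hand-picked 2-d embeddings (illustrative only; real ones are learned).
ent_h = {"Alan Turing": [1.0, 0.0], "London": [0.0, 1.0], "England": [0.0, -1.0]}
ent_t = {"Alan Turing": [0.0, 1.0], "London": [2.0, 0.0], "England": [-2.0, 0.0]}
rel_fwd = {"BornIn": [1.5, 0.0]}
rel_inv = {"BornIn": [0.0, 1.5]}

for tail in ("London", "England"):
    s = simple_score("Alan Turing", "BornIn", tail, ent_h, ent_t, rel_fwd, rel_inv)
    accepted = likelihood(s) > 0.5  # assumed decision threshold
    print(tail, round(likelihood(s), 3), accepted)
```

With these toy vectors the candidate tail London scores higher than England, so only (Alan Turing, BornIn, London) would be accepted under the assumed threshold.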
Temporal Knowledge Graphs
This field researches cases where temporal information is of the essence. It includes:
- temporal relational dependency, in which we know some temporal order between the relations of an entity (for example, we know a person was born, then worked, then died);
- temporal information embedding, where the triples are extended into quadruples (eh, r, et, τ), with the element τ for time;
- temporal logical reasoning, in which logical rules are used for the temporal reasoning; and
- entity dynamics, reflecting temporal changes to the entities.
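The quadruple form (eh, r, et, τ) can be sketched in Python by adding a time stamp to each fact. The entities, relations, and years below are made up for illustration, and treating τ as a single year is an assumption; other work may use intervals or finer time stamps.

```python
# Hypothetical quadruples (head, relation, tail, year); the years are illustrative.
QUADS = [
    ("alice", "worksFor", "AcmeCorp", 2015),
    ("alice", "worksFor", "Globex", 2019),
    ("alice", "livesIn", "Vancouver", 2015),
]


def facts_at(quads, tau):
    """Return the triples whose time stamp equals tau."""
    return [(h, r, t) for h, r, t, time in quads if time == tau]


def timeline(quads, head, relation):
    """Return (time, tail) pairs for one entity and relation in temporal order."""
    return sorted((time, t) for h, r, t, time in quads if h == head and r == relation)


print(facts_at(QUADS, 2015))
print(timeline(QUADS, "alice", "worksFor"))
```

Dropping the time element from each quadruple recovers the ordinary triples, while `timeline` shows how an entity's facts change over time.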
Knowledge Representation Learning
Knowledge Representation Learning (KRL) focuses on representing, encoding, and scoring parts of the KG. One of its tasks is to define scoring functions to measure the plausibility of certain triples existing in the KG, which is related to the task of link prediction.
Knowledge-Aware Applications
Knowledge-aware application tasks address miscellaneous real-world problems faced by industry. The main tasks are question answering and recommendation systems. An example of a system that does this is the aforementioned Google Knowledge Graph, which answers queries with a "knowledge box" and also suggests searches related to what the user is querying.
Annotated Bibliography
- "SimplE Embedding for Link Prediction in Knowledge Graphs" (Seyed Kazemi, David Poole).
- WikiData.
- "What is a Knowledge Graph?" (Neelam Tyagi).
- "Knowledge Graph – A Powerful Data Science Technique to Mine Information from Text (with Python code)" (Prateek Joshi).
- "What is a Knowledge Graph?" (Ontotext).
- "A Survey on Knowledge Graphs: Representation, Acquisition and Applications" (Shaoxiong Ji, Shirui Pan, Erik Cambria, Pekka Marttinen, Philip Yu).
- "Artificial Intelligence: Foundations of Computational Agents" (David Poole, Alan Mackworth).
- "Introducing the Knowledge Graph: Things, Not Strings" (Google).