Course:CPSC522/Knowledge Graphs


Knowledge Graphs

Knowledge graphs graphically represent information about the world and can be useful for many applications.

Principal Author: Sarah Chen
Collaborators:

Abstract

For intelligent systems to reason about the world, they need a representation of it. Knowledge graphs are one such representation, and they have become increasingly significant and commonplace as many companies adopt them. Knowledge is represented as a directed graph in which nodes are entities and directed edges are relations. This article discusses knowledge graphs in more detail, along with their construction, representation, and the important problem of completion. Lastly, it covers some applications of knowledge graphs that demonstrate their usefulness, while acknowledging their challenges and limitations.

Builds on

Knowledge graphs build on Ontologies and Knowledge Bases.

Related Pages

Knowledge graphs can be used in Natural Language Processing.

Content

What is a Knowledge Graph?

Knowledge can be represented using prop(Individual, Property, Value), where prop is the relation. This means that an Individual has some Property with the specified Value. For example, prop(UBC, LocatedIn, British Columbia) means UBC is LocatedIn British Columbia. As it turns out, prop is the only relation needed, so this article uses the triple representation <Individual, Property, Value>, in which prop is omitted. Knowledge graphs represent triples as a directed graph. Specifically, the Individual and the Value become nodes, and the Property becomes a directed edge from the Individual to the Value.

Shows how a triple can be represented as a directed graph.
Shows a graphical representation of the triple <UBC, LocatedIn, British Columbia>.

In the knowledge graph, Individuals and Values may also be referred to as entities. Furthermore, the Individual and the Value can be referred to as the subject and the object, or as the head and the tail. The Property can be referred to as the relation. A knowledge graph can be formalized as $\mathcal{G} = \{\mathcal{E}, \mathcal{R}, \mathcal{F}\}$. $\mathcal{E}$ is the set of entities, each of which can be a head or a tail. $\mathcal{R}$ is the set of relations between entities. $\mathcal{F}$ is the set of facts, where a fact is a triple <h, r, t> with head h, relation r, and tail t.
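As a minimal sketch of the sets of entities, relations, and facts described above, the facts can be stored as a set of Python tuples, and the other two sets can be recovered from them. The variable names are illustrative, and the second triple is an extra example that does not appear elsewhere in this article.

```python
from collections import defaultdict

# Facts as (head, relation, tail) triples. The second triple is an
# illustrative addition, not taken from the article.
facts = {
    ("UBC", "LocatedIn", "British Columbia"),
    ("British Columbia", "LocatedIn", "Canada"),
}

# Entities are everything that appears as a head or a tail.
entities = {h for h, _, _ in facts} | {t for _, _, t in facts}

# Relations are the edge labels.
relations = {r for _, r, _ in facts}

# Directed edge-labelled graph: each head maps to its outgoing (relation, tail) edges.
out_edges = defaultdict(list)
for h, r, t in sorted(facts):
    out_edges[h].append((r, t))
```

Storing only the facts and deriving the entity and relation sets keeps the three sets consistent with each other by construction.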

It is worth mentioning, though, that there is no standard definition of a knowledge graph. Some definitions are more restrictive and others more general. The main characteristic of a knowledge graph is that it represents data using a graph-based data model. The knowledge graph defined above is specifically a directed edge-labelled (del) graph. Other examples of knowledge graphs are heterogeneous graphs and property graphs. In a heterogeneous graph, each node and each edge has a type. A node that represents The Lorax would then simply have the type Book, instead of a directed edge to another entity indicating that it is a book. In a property graph, information that a del graph would represent as a triple can instead be stored as a property of a node or an edge. For example, instead of <UBC, LocatedIn, British Columbia> in graphical form, there would just be the property Location: British Columbia on the node UBC. Heterogeneous and property graphs have both advantages and disadvantages compared to a del graph. Del graphs are the focus of this page, so unless stated otherwise, the knowledge graph being referenced is a del graph.

There is also debate about the differences between knowledge bases, ontologies, and knowledge graphs. One common view is that a knowledge graph is different from an ontology but essentially synonymous with a knowledge base. The paper "Towards a Definition of Knowledge Graphs" discusses the debate in detail and offers an alternative view of knowledge graphs.
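To make the three graph models described above concrete, the following sketch stores the same two statements in each style. The dictionary layouts and the helper `expand_to_triples` are hypothetical choices made for illustration; real graph databases have their own data models.

```python
# Del graph: every statement is a labelled edge between two nodes.
del_triples = {
    ("UBC", "LocatedIn", "British Columbia"),
    ("The Lorax", "Type", "Book"),
}

# Heterogeneous graph: the type is part of the node itself, not an edge.
node_types = {"The Lorax": "Book"}

# Property graph: the location is a key-value property stored on the node.
node_properties = {"UBC": {"Location": "British Columbia"}}


def expand_to_triples(node_types, node_properties, key_to_relation):
    """Rebuild del-style triples from node types and node properties."""
    triples = {(node, "Type", type_) for node, type_ in node_types.items()}
    for node, props in node_properties.items():
        for key, value in props.items():
            triples.add((node, key_to_relation[key], value))
    return triples


# The mapping from property keys to relation names is an assumption.
rebuilt = expand_to_triples(node_types, node_properties, {"Location": "LocatedIn"})
```

In this small case the three forms carry the same information, which is why the choice between them is mostly about convenience for a given application.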

Constructing Knowledge Graphs

Currently, knowledge graph construction relies heavily on manual methods. Some data is straightforward to convert to triples and then to a graph. For example, the fact "Dr. Seuss is the author of The Lorax" can be represented as <Dr. Seuss, AuthorOf, The Lorax>. For other data, triples may seem too complex. Consider "The Lorax is a book". There are two possible triple representations. A relation Type could be used, with the Value being the type of the Individual, which gives <The Lorax, Type, Book>. Alternatively, Book could be the relation, with the Value being true or false depending on whether the Individual is a book. For The Lorax, this would be <The Lorax, Book, true>. Some data can also seem too complex for triples. For example, a school could have a book fair scheduled to start at 9:00 a.m. and end at 5:00 p.m. on Tuesday in Room 7. This can still be converted to triples through reification, where the book fair itself becomes an individual. Assume the book fair is represented by BookFair1. As triples, this data would be
<BookFair1, StartTime, 9:00 a.m.>,
<BookFair1, EndTime, 5:00 p.m.>,
<BookFair1, Day, Tuesday>, and
<BookFair1, Room, Room 7>.
Because knowledge graphs can be very large, manual construction can be expensive and time-consuming, which makes automation important. Automation is challenging, though, since source data can be unstructured or semi-structured as well as structured. As a result, work has generally focused on semi-automatic construction.
Alongside general open-world knowledge graphs, there are also domain-specific ones, such as those for healthcare and education, which can have their own construction methods.
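The reification step described earlier in this section can be sketched as a small function that turns one multi-attribute record into triples sharing a new individual. The function name `reify` and the property names (StartTime, EndTime, Day, Room) are assumptions chosen for this illustration.

```python
def reify(event_id, attributes):
    """Turn one multi-attribute fact into triples that share the individual event_id."""
    return [(event_id, prop, value) for prop, value in attributes.items()]


# The book fair becomes the individual BookFair1; property names are assumed.
book_fair = reify(
    "BookFair1",
    {
        "StartTime": "9:00 a.m.",
        "EndTime": "5:00 p.m.",
        "Day": "Tuesday",
        "Room": "Room 7",
    },
)
```

Each attribute of the event becomes its own triple, so the event can be linked to other entities in the graph like any other individual.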

Real World Knowledge Graphs

Knowledge graphs can be either open or enterprise. The content of an open knowledge graph is publicly available online; examples are DBpedia, Wikidata, and YAGO. Enterprise knowledge graphs are internal to a company, where they are used for various commercial applications. A portion of the Wikidata knowledge graph involving director Christine Choy is shown below.

Wikidata knowledge graph - Christine Choy

Knowledge Graph Representation

Knowledge graph representation, also known as knowledge graph embedding, is important because the embeddings are used for knowledge graph completion, which is described in the next section, and for other applications. The aim is to find low-dimensional representations, or embeddings, of entities and relations that preserve semantic meaning. There are many possible spaces for the embeddings, such as real vector space, complex vector space, and hyperbolic space. An encoding model specifies how the embeddings interact to express the information in the knowledge graph. For example, linear and bilinear models apply linear or bilinear operations to the embeddings of the entities and relations. Other possible models include factorization models and convolutional neural networks. The embeddings are evaluated using scoring functions, which measure the plausibility of a fact. Auxiliary information, such as images, can also be embedded alongside the knowledge graph, which may increase the information contained in the embeddings.

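Some example embedding models
Model | Embedding space | Scoring function
RotatE | Complex vector space, $\mathbf{h}, \mathbf{r}, \mathbf{t} \in \mathbb{C}^d$ with $|r_i| = 1$ | $-\lVert \mathbf{h} \circ \mathbf{r} - \mathbf{t} \rVert$
TorusE | Torus, $[\mathbf{h}], [\mathbf{r}], [\mathbf{t}] \in \mathbb{T}^n$ | $-\min_{(x, y) \in ([\mathbf{h}] + [\mathbf{r}]) \times [\mathbf{t}]} \lVert x - y \rVert$
SimplE | Real vector space, with vectors $\mathbf{h}_e, \mathbf{t}_e \in \mathbb{R}^d$ per entity and $\mathbf{v}_r, \mathbf{v}_{r^{-1}} \in \mathbb{R}^d$ per relation | $\frac{1}{2}\left(\langle \mathbf{h}_{e_i}, \mathbf{v}_r, \mathbf{t}_{e_j} \rangle + \langle \mathbf{h}_{e_j}, \mathbf{v}_{r^{-1}}, \mathbf{t}_{e_i} \rangle\right)$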
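As a rough illustration of a scoring function, RotatE models each relation as an element-wise rotation in complex space, so a fact is plausible when rotating the head embedding lands close to the tail embedding. The sketch below uses hand-picked two-dimensional embeddings and the L2 norm; both are assumptions made for this example, and trained embeddings would be learned from data.

```python
import cmath
import math


def rotate_score(head, relation_phases, tail):
    """Negative L2 distance between the rotated head and the tail (higher is better)."""
    rotated = [h * cmath.exp(1j * phase) for h, phase in zip(head, relation_phases)]
    return -math.sqrt(sum(abs(x - t) ** 2 for x, t in zip(rotated, tail)))


# Toy 2-dimensional complex embeddings, chosen by hand for illustration.
head = [1 + 0j, 0 + 1j]
phases = [math.pi / 2, math.pi]    # rotate by 90 and 180 degrees
matching_tail = [0 + 1j, 0 - 1j]   # exactly the head after rotation
other_tail = [1 + 0j, 1 + 0j]      # an unrelated entity

good = rotate_score(head, phases, matching_tail)
bad = rotate_score(head, phases, other_tail)
```

Representing the relation by its phases guarantees that every component has modulus 1, which is what makes the relation a pure rotation. Here `good` is approximately 0 and `bad` is approximately -2, so the matching tail is scored as more plausible.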

Knowledge Graph Completion

Another important task is knowledge graph completion, since even large knowledge graphs, such as Freebase, can be incomplete. It may be possible to identify new triples using the information already contained in the graph. For example, if Person A is married to Person B, then Person B is married to Person A, so if the triple <Person A, MarriedTo, Person B> were in the graph, the triple <Person B, MarriedTo, Person A> could be added. On the other hand, if Person A was born in Canada, whether they are a citizen is not certain, and its probability can increase or decrease based on Person A's relations with other entities. Discovering these links is known as the link prediction task. It can be expressed as <h, r, ?> when predicting the tail or <?, r, t> when predicting the head.
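The marriage example is a case where a missing triple follows with certainty from an existing one, because the relation is symmetric. A minimal sketch of that kind of completion is shown below. The function name and the relation names MarriedTo and BornIn are assumptions, and which relations count as symmetric has to be supplied by hand.

```python
def complete_symmetric(triples, symmetric_relations):
    """Add <t, r, h> for every <h, r, t> whose relation is declared symmetric."""
    completed = set(triples)
    for h, r, t in triples:
        if r in symmetric_relations:
            completed.add((t, r, h))
    return completed


graph = {
    ("Person A", "MarriedTo", "Person B"),
    ("Person A", "BornIn", "Canada"),
}
completed = complete_symmetric(graph, {"MarriedTo"})
new_triples = completed - graph
```

Only the MarriedTo triple is mirrored. The BornIn triple is left alone, since nothing certain about citizenship follows from it.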
A common approach to link prediction is to use the low-dimensional embeddings. A score function measures the plausibility of facts: the higher the score, the more plausible the fact. Since link prediction is a binary prediction task, in which a link is either absent or present, the sigmoid function can be used to convert the score to a probability. Other methods, such as deep learning models, have been used as well. Knowledge graphs may also have paths that can be taken advantage of. A path is a sequence of triples in which the tail of each triple is the head of the next, such as <Person A, BornIn, Vancouver> and <Vancouver, LocatedIn, Canada>. The relational path considers only the relations. From the relational path BornIn-LocatedIn, <Person A, BornIn, Canada> could be inferred. However, embedding approaches tend to fail at this kind of inference, so deep reinforcement learning has also been used to do link prediction by finding relational paths.
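The idea of inferring a triple from a relational path can be sketched with a hand-written rule. This is not the deep reinforcement learning approach mentioned above, which learns which paths to follow; here the rule and the example entities are assumptions fixed in advance for illustration.

```python
def follow_path(triples, start, relation_path):
    """Return the entities reached from start by following the relations in order."""
    frontier = {start}
    for relation in relation_path:
        frontier = {t for h, r, t in triples if h in frontier and r == relation}
    return frontier


triples = {
    ("Person A", "BornIn", "Vancouver"),
    ("Vancouver", "LocatedIn", "Canada"),
}

# Hypothetical hand-written rule: BornIn followed by LocatedIn implies BornIn.
rule_path, rule_relation = ("BornIn", "LocatedIn"), "BornIn"
inferred = {
    ("Person A", rule_relation, entity)
    for entity in follow_path(triples, "Person A", rule_path)
}
```

The function only looks at relation labels along the way, which is what distinguishes a relational path from a path through specific entities.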

Applications of Knowledge Graphs

Knowledge graphs were first popularized in 2012 by Google, which used them for its search engine. Since then, they have been found useful for other applications as well.

Web Search

Search engines can benefit from knowledge graphs by incorporating entity data. This may allow queries to be better represented. It can also enrich the document representation, and comparing the entities in the query with those in the document may improve the quality of the page ranking.

Question-Answering

Knowledge graphs can be utilized in question answering (QA) systems. Questions can be simple and involve only one triple, such as "Which country is Paris the capital of?", which only requires looking at triples of the form <Paris, CapitalOf, ?>. More complex questions require reasoning over several triples in the knowledge graph, which is known as multi-hop reasoning. Consider "Who co-authored with Person D?". Assuming there is no co-authored-with relation, triples of the form <Person D, AuthorOf, ?> would have to be looked at to find the relevant papers, and then triples of the form <?, AuthorOf, Paper> would have to be looked at for each paper found, ignoring the ones whose head is Person D.
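The co-author question can be sketched as two lookups over a set of triples. The relation name AuthorOf and the people and papers below are made up for illustration, and a real QA system would also need to map the natural-language question onto these lookups.

```python
def coauthors(triples, person):
    """Two hops: person -> papers they authored -> other authors of those papers."""
    # Hop 1: papers written by the person.
    papers = {t for h, r, t in triples if h == person and r == "AuthorOf"}
    # Hop 2: other authors of those papers, ignoring the person themselves.
    return {
        h for h, r, t in triples
        if r == "AuthorOf" and t in papers and h != person
    }


triples = {
    ("Person D", "AuthorOf", "Paper 1"),
    ("Person E", "AuthorOf", "Paper 1"),
    ("Person D", "AuthorOf", "Paper 2"),
    ("Person F", "AuthorOf", "Paper 2"),
    ("Person G", "AuthorOf", "Paper 3"),
}
answer = coauthors(triples, "Person D")
```

Person G is excluded because Paper 3 is never reached in the first hop.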

Recommender Systems

Recommender systems exist to help users find items of interest such as music, movies, and products. Adding side information has been explored as a way to improve recommendations, and knowledge graphs are a promising way of doing so. In this case, knowledge graphs use the heterogeneous data model. Knowledge graphs can represent information about both items and users, and represent their interactions as relations between the two. They also potentially provide interpretability regarding the rationale for a recommendation. For example, in a movie recommender system, the fact that a user has watched many movies by one director can be used to explain why another movie by that director was recommended.
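The director example can be sketched as a simple heuristic over user-item and item-attribute triples. This is a hand-written rule for illustration only, not one of the knowledge-graph-based recommender models from the literature, and the relation names Watched and DirectedBy are assumptions.

```python
from collections import Counter


def recommend_by_director(triples, user):
    """Recommend unwatched movies by the user's most-watched director, with a reason."""
    watched = {t for h, r, t in triples if h == user and r == "Watched"}
    director_of = {h: t for h, r, t in triples if r == "DirectedBy"}
    counts = Counter(director_of[m] for m in watched if m in director_of)
    if not counts:
        return []
    director, n = counts.most_common(1)[0]
    return [
        (movie, f"{user} watched {n} movies directed by {director}")
        for movie, d in sorted(director_of.items())
        if d == director and movie not in watched
    ]


triples = {
    ("User 1", "Watched", "Movie 1"),
    ("User 1", "Watched", "Movie 2"),
    ("User 1", "Watched", "Movie 3"),
    ("Movie 1", "DirectedBy", "Director X"),
    ("Movie 2", "DirectedBy", "Director X"),
    ("Movie 3", "DirectedBy", "Director Y"),
    ("Movie 4", "DirectedBy", "Director X"),
    ("Movie 5", "DirectedBy", "Director Y"),
}
recs = recommend_by_director(triples, "User 1")
```

The explanation string is built from the same path through the graph that produced the recommendation, which is the source of the interpretability.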

Additional Applications

A detailed breakdown and additional applications of knowledge graphs are given in A Survey on Knowledge Graphs: Representation, Acquisition, and Applications.

Challenges and Limitations

Knowledge graphs have a lot of potential, but they are not without problems. As mentioned in the section Knowledge Graph Completion, knowledge graphs can suffer from a lack of complete information. Because a variety of sources may be used, false information can be added to the knowledge graph, and it may be difficult to trace where it came from. This can hurt the accuracy and reliability of knowledge graphs. Aside from incorrect or absent information, sensitive information presents another risk, since it can be used for malicious purposes. The large size of knowledge graphs can also make machine learning that uses them computationally expensive. These challenges do not mean that knowledge graphs should not be used, since there are possible remedies, such as evaluating the data sources being used. Rather, awareness of the challenges and limitations is crucial, and care and thought should be taken when using knowledge graphs.

Annotated Bibliography

[1] Poole D.L., Mackworth A.K. Artificial Intelligence: Foundations of Computational Agents, 2nd Edition. Cambridge UP, New York, 2017. http://artint.info/2e/html/ArtInt2e.html
[2] Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia d'Amato, Gerard De Melo, Claudio Gutierrez, Sabrina Kirrane, José Emilio Labra Gayo, Roberto Navigli, Sebastian Neumaier, Axel-Cyrille Ngonga Ngomo, Axel Polleres, Sabbir M. Rashid, Anisa Rula, Lukas Schmelzeisen, Juan Sequeda, Steffen Staab, and Antoine Zimmermann. 2021. Knowledge Graphs. ACM Comput. Surv. 54, 4, Article 71 (May 2022), 37 pages. https://doi.org/10.1145/3447772
[3] S. Ji, S. Pan, E. Cambria, P. Marttinen and P. S. Yu, "A Survey on Knowledge Graphs: Representation, Acquisition, and Applications," in IEEE Transactions on Neural Networks and Learning Systems, vol. 33, no. 2, pp. 494-514, Feb. 2022, doi: 10.1109/TNNLS.2021.3070843.
[4] Bilal Abu-Salih, Domain-specific knowledge graphs: A survey, Journal of Network and Computer Applications, Volume 185, 2021, 103076, ISSN 1084-8045, https://doi.org/10.1016/j.jnca.2021.103076.
[5] Andrea Rossi, Denilson Barbosa, Donatella Firmani, Antonio Matinata, and Paolo Merialdo. 2021. Knowledge Graph Embedding for Link Prediction: A Comparative Analysis. ACM Trans. Knowl. Discov. Data 15, 2, Article 14 (April 2021), 49 pages. https://doi.org/10.1145/3424672
[6] Wang, P., Jiang, H., Xu, J. and Zhang, Q., 2019. Knowledge graph construction and applications for Web search and beyond. Data Intelligence, 1(4), pp.333-349.
[7] Y. Zhang, H. Dai, Z. Kozareva, A. J. Smola, and L. Song, “Variational reasoning for question answering with knowledge graph,” in Proc. AAAI, 2018, pp. 6069–6076.
[8] Q. Guo et al., "A Survey on Knowledge Graph-Based Recommender Systems," in IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 8, pp. 3549-3568, 1 Aug. 2022, doi: 10.1109/TKDE.2020.3028705.
[9] C. F. Draschner, H. Jabeen and J. Lehmann, "Ethical and Sustainability Considerations for Knowledge Graph based Machine Learning," 2022 IEEE Fifth International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), Laguna Hills, CA, USA, 2022, pp. 53-60, doi: 10.1109/AIKE55402.2022.00015.


Some rights reserved
Permission is granted to copy, distribute and/or modify this document according to the terms in Creative Commons License, Attribution-NonCommercial-ShareAlike 3.0. The full text of this license may be found here: CC by-nc-sa 3.0