# Course:CPSC522/Future Directions for Semantic Systems

## Ontology Search Engine

This page describes the basics of ontologies and the Semantic Web, and reviews two ontology search engines: OntoSearch and WI OntoSearch.

Principal Author: Samprity Kashyap
Collaborators: Junyuan Zheng
Papers:
1) Mingxia Gao, Chunnian Liu, Furong Chen. An ontology search engine based on semantic analysis. Third International Conference on Information Technology and Applications (ICITA 2005), 2005, Volume 1, pp. 256-259. DOI: 10.1109/ICITA.2005.68 [1]
2) Y. Zhang, W. Vasconcelos and D. Sleeman. OntoSearch: An Ontology Search Engine. In Proceedings of the Twenty-fourth SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence (AI-2004), Cambridge, UK, 2004 [2]

### Builds on

This page builds on Ontology and Semantic Web.

### Related Pages

Related pages include Learning Ontologies for the Semantic Web and Understanding Semantic Web and Ontologies: Theory and Applications.

## Abstract

An ontology is a formal, explicit specification of a shared conceptualization (Gruber, 1993). It is a form of knowledge management and representation for a domain, it enables knowledge sharing, and it is considered the backbone of the Semantic Web. Finding a suitable ontology for a given user requirement is therefore an important task. Current keyword-based web search engines return plain text for an ontology search query, which is difficult to visualize and understand. Moreover, these engines consider only the string value of the query, so they return a lot of irrelevant data that merely contains the keyword and cannot provide good results for ontology searches. This page describes the basic concepts of ontologies and the Semantic Web, and possible solutions for ontology search. Two ontology search engines, OntoSearch and WI OntoSearch, are discussed and reviewed.

## Background

### Ontology

An ontology is a data model that represents knowledge as a set of concepts within a domain and the relationships between these concepts. It is a way of managing knowledge: it captures the knowledge within an organisation as a model, which users can then use to answer complex questions and display relationships across an enterprise. Today people have access to more data in a single day than most people had access to in a lifetime in previous decades. The problem with this development is that data is found in many different forms, and information spread across many different formats makes it almost impossible to understand the existing relationships between different pieces of data. Data should be represented in a way that allows these relationships to be discovered, and an ontology captures data in exactly such a way.[3]

An ontology is made up of two main components:

1. Concepts/Classes: represented by ovals
2. Relationships: represented by arrows

Figure 1: Representation of concept and relationship
Figure 2 : Example of class and relationship

In this example there are two classes, Person and Organisation, each representing a real-world concept. There is also a relationship, has employer. Together these classes and relationships can be combined to assert statements about the real world. Here the class Person is related to the class Organisation through the property has employer. We can use an ontology to define real-world relationships. For example, Todd is an instance of the class Person and Facebook is an instance of the class Organisation, and the ontology captures the relationship between these instances. The combination of two classes (or instances) and the relationship between them is known as a triple. A triple consists of a subject, a predicate and an object. In this example the subject is Todd, the predicate is has employer and the object is Facebook.
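
The subject-predicate-object structure described above can be sketched in a few lines of Python. This is only an illustration: the `Triple` type, the `isInstanceOf` predicate name and the `objects_of` helper are hypothetical names chosen for this page and do not belong to any RDF library.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


# Class-level statement from Figure 2 plus the instance-level statements from the text.
triples = [
    Triple("Person", "hasEmployer", "Organisation"),
    Triple("Todd", "isInstanceOf", "Person"),
    Triple("Facebook", "isInstanceOf", "Organisation"),
    Triple("Todd", "hasEmployer", "Facebook"),
]


def objects_of(subject: str, predicate: str) -> List[str]:
    """Return every object linked to `subject` through `predicate`."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


employers = objects_of("Todd", "hasEmployer")
print(employers)  # ['Facebook']
```

Storing every statement in the same three-part shape is what lets a generic query such as `objects_of` work without knowing anything about people or organisations in advance.
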
Two standards underpin ontologies on the Semantic Web:

1. RDF (Resource Description Framework): A resource is anything that has an identity, and that identity is given by a Uniform Resource Identifier (URI). URIs can be thought of as playing a role similar to namespaces in XML. A description is a container holding several statements that describe the resource. The framework is what enables humans and machines to make and understand such statements. RDF adds some extra structure to triples.
2. OWL (Web Ontology Language): OWL allows you to describe far more about properties and classes. For instance, if A isMarriedTo B, OWL can indicate that this implies B isMarriedTo A. It can also state that if C isAncestorOf D and D isAncestorOf E, then C isAncestorOf E. OWL describes semantic relationships that conventional programming rarely models explicitly, which places it closer to AI research.
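
The two kinds of inference mentioned for OWL (a symmetric property such as isMarriedTo and a transitive property such as isAncestorOf) can be imitated with a small fixed-point loop. The sketch below is plain Python, not an OWL reasoner; the `infer` function and the `SYMMETRIC` and `TRANSITIVE` sets are assumptions made purely for illustration.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical property declarations, standing in for OWL property axioms.
SYMMETRIC = {"isMarriedTo"}
TRANSITIVE = {"isAncestorOf"}


def infer(facts: Set[Fact]) -> Set[Fact]:
    """Apply the symmetric and transitive rules until no new fact appears."""
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            if p in SYMMETRIC:
                new.add((o, p, s))
            if p in TRANSITIVE:
                for s2, p2, o2 in known:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= known:  # nothing new was derived, so the closure is complete
            return known
        known |= new


facts = {
    ("A", "isMarriedTo", "B"),
    ("C", "isAncestorOf", "D"),
    ("D", "isAncestorOf", "E"),
}
closure = infer(facts)
print(("B", "isMarriedTo", "A") in closure)   # True
print(("C", "isAncestorOf", "E") in closure)  # True
```

The point of the sketch is that the two derived facts are never stated by the author of the data; they follow from the declared nature of the properties.
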

WebProtege is a good website for creating ontologies using OWL. We have created an ontology of Vehicles using WebProtege.

Figure 3: Can you identify who the Mystery Man is?

Within an ontology, concepts are defined only in terms of their relationships to other concepts. For example, in Figure 3 we are describing a concept called Mystery Person. It has many relationships to other concepts, including gender, date of birth, race, hair color and address, and we can use these relationships to define the Mystery Person. Individuals, also known as instances or particulars, are the base unit of an ontology: they are the things that the ontology describes or could potentially describe.
Ontologies are also easily extensible, since we can add further relationships that link a person to other concepts. They are an excellent alternative to encoding knowledge in source code. Many approaches capture the knowledge and relationships established by different working groups as lines and lines of source code, which is extremely hard to manage and cannot easily adapt to changes in the environment. Ontologies present a different method of managing knowledge and capturing relationships. The history of artificial intelligence shows that knowledge is vital for intelligent systems, and in many cases better knowledge can be more effective for solving a task than better algorithms. In order to have truly intelligent systems we need to capture, process, reuse and communicate knowledge, and ontologies support these tasks.[4]

### Semantic Web

"The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation."
– Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001

The big idea: a web that will evolve around a collection of knowledge. It will allow people to add what they know and also help them find answers to their questions. The unique feature of this web is a structured form that can be read by both humans and machines.

Currently users search for data on the web by asking questions of the form "Which documents contain these words and phrases?". The Semantic Web will instead surface related items and new relationships rather than relying on word matching. For example: how does the weather affect the stock market? Crime? Birth rates? Ontologies are considered one of the pillars of the Semantic Web.

Figure 4: Architecture of Semantic Web[5]

#### Architecture of Semantic Web

At the very bottom are Unicode and URIs (Uniform Resource Identifiers). URIs are nameplates for everything that is going to be captured on the web (a URL both identifies and locates a resource, whereas a URI only needs to identify it). Unicode is a common character encoding. On top of the Unicode base are XML and the schemas that go with it; XML is the common language that connects everything together. On top of XML, the Semantic Web uses RDF. With RDF statements we can build an ontology vocabulary, with basic rules in the Logic layer lying on top of it. We can use ontologies to make computers act as if they understand the information they are dealing with. Basically, all we are doing is setting up many relationships between RDF statements, and ontologies bring in semantics that make the meaning clear enough for a computer to process. Figure 4 shows the RDF, Ontology, Logic and Proof layers being combined with a digital signature, so that we know a document is what it says it is. That brings us to the top layer, Trust. For example, suppose we have an RDF statement saying Jane sells books. If this statement comes from a person we know very well, we are going to trust it a lot more than if it comes from an unknown third party. In the same way, the computer should also be able to trust the source of information.[5]

Knowledge encoded in Semantic Web languages differs both from the unstructured free text found on web pages and from the highly structured information found in databases. Effective indexing and retrieval of this semi-structured information requires a combination of techniques. RDF and OWL introduce aspects beyond those used in ordinary XML: they allow users to define terms (classes and properties), express relationships among them, and assert constraints and axioms that hold for well-formed data.

### Ontology Search Engine

Figure 5 : Search results for keyword food in Watson

Ontology search engines search for relevant ontologies in formats such as RDF, OWL or DAML. Queries are usually written as natural language keywords, and results are ranked. Figure 5 shows the results returned by Watson Semantic Web Search for the keyword food. Clicking one of the links returns an ontology file rather than a web page, so ontology search engines return results in RDF, OWL or DAML format.

## Content

### Problem Definition

A search engine is a system for retrieving documents. It helps in finding information stored in a system, such as on the World Wide Web, inside a network, or on a personal computer. The search engine allows us to ask for content meeting specific criteria and retrieves a list of items that match those criteria. Users specify keywords, which are matched against words in huge search engine databases to produce a ranked list of URLs and web pages in which the keywords occur.[6]

The traditional search engines based on keywords have problems when it comes to ontology search:

1. Visualization issues: the plain text of an ontology cannot describe the structure of the ontology in a clear manner.
2. They do not check the semantics of the search objects and treat them as character strings.
3. A lot of irrelevant ontology information is usually returned to the user, simply because the keywords occur somewhere in the files.
4. To get correct ontology information, we have to input a set of relevant concepts as query words, and keyword-based search engines cannot provide satisfying results for such queries.
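
Problems 2 and 3 in the list above can be made concrete with a toy comparison between raw string matching and matching against declared concepts only. The file names, the `concepts` and `comment` fields and both search functions are hypothetical and exist only for this illustration.

```python
# Hypothetical ontology files, each reduced to its concept names and a free-text comment.
ontologies = {
    "food.owl": {
        "concepts": ["food", "fruit", "apple"],
        "comment": "A small food taxonomy.",
    },
    "travel.owl": {
        "concepts": ["trip", "hotel", "flight"],
        "comment": "Hotels that serve food are tagged.",
    },
}


def keyword_search(query):
    """Match the query as a raw substring anywhere in the file contents."""
    hits = []
    for name, onto in ontologies.items():
        text = " ".join(onto["concepts"]) + " " + onto["comment"]
        if query.lower() in text.lower():
            hits.append(name)
    return hits


def concept_search(query):
    """Match the query only against the declared concept names."""
    wanted = query.lower()
    return [
        name
        for name, onto in ontologies.items()
        if wanted in [c.lower() for c in onto["concepts"]]
    ]


keyword_hits = keyword_search("food")
concept_hits = concept_search("food")
print(keyword_hits)  # ['food.owl', 'travel.owl']
print(concept_hits)  # ['food.owl']
```

The travel ontology is returned by the keyword search only because the word appears in a comment, which is the kind of irrelevant hit the list above describes.
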

Ontology search engines have their own features in comparison to general search engines. The most important distinction between ontology information and web information is that an ontology has a semantic structure. Searching for useful information and locating an appropriate ontology on the WWW or the Semantic Web is a vital task in the ontology research domain, and ontology search engines help in identifying the most useful and efficient result for an input query.[7] Locating suitable existing ontologies that capture the information the user requires, and improving the precision of ontology search through semantic analysis, are major challenges in current ontology research.[2][1]

### OntoSearch: An Ontology Search Engine

This paper discusses OntoSearch, which can be thought of as a Google for ontologies. Google provides a powerful web search engine, and the paper notes that its “filetype:” operator can be used to restrict a search to a given file type. For ontology searching, however, this approach has some issues. Ontologies are not always accessible for a specific topic or domain. Google returns links to candidate files, but it is up to the user to check whether they are actually relevant. Finally, Google searches files based on the keywords supplied by the user and not on semantics. OntoSearch combines the Google Web APIs with a hierarchy visualization technique: it can search for ontology files on the Internet and visualize them as hierarchies. The OntoSearch system is built on Java, JSP, Jena and JBoss technologies.[2]

#### Design of Ontosearch

Figure 6: Overview of OntoSearch[2]

The paper investigates the applicability of visualization techniques to ontology searching on the Internet, in the form of a hierarchical view of an ontology, which is a good way to give the user a quick overview of a selected ontology. The authors developed a visualization tool, OntoSearch, which combines the Google search API with an RDFS ontology (hierarchy) visualization technique. The tool helps the user search for relevant (keyword-matching) ontology files on the Internet and displays the files in a visually understandable format: a hierarchy tree. The hierarchical view enables users to review the structure of the ontology files and select the relevant ones quickly.

Figure 6 shows the working of OntoSearch. The rectangles in the figure represent processes and the ovals represent data/information. The workflow is as follows:

1. The user inputs keywords to OntoSearch to describe the nature of the ontology required.
2. OntoSearch uses the Google Web APIs to search the Internet for relevant RDFS files and returns a list of their URLs on the screen.
3. The user chooses a few of the returned RDFS files, displays their structure, and decides which of the files are relevant.
4. The user can select the relevant files in a hierarchy tree view and save them to local disk.
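
The hierarchy tree view used in steps 3 and 4 can be approximated with a short sketch that turns subclass statements into an indented tree. This is not OntoSearch's implementation (which the paper builds on Jena); the `SUBCLASS_OF` pairs and the `render_hierarchy` function are hypothetical, and the sketch assumes the subclass statements contain no cycles.

```python
from collections import defaultdict
from typing import List, Tuple

# Hypothetical (child, parent) subclass statements, as might be extracted from an RDFS file.
SUBCLASS_OF = [("fruit", "food"), ("meat", "food"), ("apple", "fruit")]


def render_hierarchy(pairs: List[Tuple[str, str]]) -> List[str]:
    """Return the class hierarchy as indented lines, one class per line."""
    children = defaultdict(list)
    child_names = set()
    for child, parent in pairs:
        children[parent].append(child)
        child_names.add(child)

    # A root is any class that appears as a parent but never as a child.
    roots = sorted(p for p in children if p not in child_names)

    lines: List[str] = []

    def walk(node: str, depth: int) -> None:
        lines.append("  " * depth + node)
        for child in sorted(children.get(node, [])):
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


tree_lines = render_hierarchy(SUBCLASS_OF)
print("\n".join(tree_lines))
```

Even in this plain-text form, the indentation conveys at a glance what the raw list of subclass statements does not: which class is the root and how deep each concept sits.
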

### WI OntoSearch : An Ontology Search Engine Based on Semantic Analysis

The ontology search engine introduced in the previous paper only adds visualization of the semantic structure; it does not change the keyword-based nature of the underlying search. [8] describes an intelligent search tool, TUCUXI, which captures the semantics of web pages through linguistic tools such as WordNet and returns appropriate results by matching structure. This tool does consider semantics, but it needs a whole ontology file as input, so it is not suitable for users who are still looking for an ontology. Hence this paper proposes a new tool, "WI OntoSearch", based on the concepts-weights vectors matching algorithm (CWVMA), which takes the semantic structure of ontology information into account.[1]

#### CWVMA Algorithm

This paper considers the semantic structure of ontology information and proposes the concepts-weights vectors matching algorithm (CWVMA). The algorithm parses the input information and the preliminary keyword-based results into sets of concepts. It then creates a weight vector for each set according to the impact of its concepts on the semantics of the whole ontology. Finally, it combines the corresponding weights of matching concepts into a result vector. Irrelevant ontologies can then be filtered out and the remaining ontologies ordered by the sum of the result vector. The algorithm is the basis for a prototype ontology search engine named WI OntoSearch, which is designed and implemented in the paper. Some of the terms used by the algorithm are as follows:

1. Weight vector: An ontology of n concepts is mapped to the vector ${\displaystyle (r_{1},r_{2},\dots ,r_{n})}$ by a matching rule (a process for determining correspondences between concepts), with ${\displaystyle r_{i}\in [0,1]}$. The value of ${\displaystyle r_{i}}$ denotes the influence of the ${\displaystyle i^{th}}$ concept on the semantics of the whole ontology and is decided by the matching rule. The vector ${\displaystyle (r_{1},r_{2},\dots ,r_{n})}$ is called the weight vector.
2. Concepts-weights vectors: The concepts set is represented as ${\displaystyle (C_{1},C_{2},\dots ,C_{n})}$, where ${\displaystyle C_{i}}$ is the ${\displaystyle i^{th}}$ concept of the ontology. The corresponding weight vector is ${\displaystyle (r_{1},r_{2},\dots ,r_{n})}$. Together they are called the concepts-weights vectors.
3. Benchmark ontology: The ontology that serves as the standard against which other ontologies are assessed or measured.
4. Evaluating ontology: An ontology that needs to be compared with the benchmark ontology.

Two factors should be considered when measuring the semantics of an ontology: the concepts that compose the ontology, and the impact of these concepts on its semantics. The concepts-weights vectors matching algorithm therefore first determines a weight vector according to a matching rule and then maps the concepts sets onto each other to obtain the result vector. WI OntoSearch as described in the paper uses the algorithm with the setting-one rule, which assigns the constant 1 to every component of the weight vector. However, if an application needs to discriminate between small variations in the influence of concepts on the ontology semantics, it must use a more complex matching rule such as the distance-root rule.
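
The two matching rules named above can be sketched as functions from concept depths (distance from the root concept) to weights. The setting-one rule follows directly from the description. For the distance-root rule, the linear decay of 0.25 per level is an assumption inferred from the weights in the worked example later on this page, not a formula quoted from the paper, and both function names are hypothetical.

```python
from typing import List, Sequence


def setting_one_rule(depths: Sequence[int]) -> List[float]:
    """Setting-one rule: every concept gets the constant weight 1."""
    return [1.0 for _ in depths]


def distance_root_rule(depths: Sequence[int], step: float = 0.25) -> List[float]:
    """Distance-root rule (assumed form): weight falls by `step` per level below the root.

    The value step=0.25 and the floor at 0 are assumptions chosen so that the
    output matches the weights used in this page's worked example.
    """
    return [max(0.0, 1.0 - step * depth) for depth in depths]


# Depths for a chain food -> fruit -> apples (root has depth 0).
print(setting_one_rule([0, 1, 2]))    # [1.0, 1.0, 1.0]
print(distance_root_rule([0, 1, 2]))  # [1.0, 0.75, 0.5]
```

Under the setting-one rule every matched concept counts equally, whereas the distance-root rule lets concepts near the root dominate the similarity score.
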
The fundamental idea of the algorithm is to approximate the semantic similarity between the input message and the preliminary keyword-based results. The input message is the benchmark ontology and the preliminary keyword-based results are the evaluating ontologies. The algorithm is as follows:

% Input: benchmark Ontology (Ontology1), evaluating Ontology (Ontology2)
% Output: result vector ${\displaystyle (R_{1},R_{2},\dots ,R_{n})}$
% parse Ontology1 and Ontology2 into ${\displaystyle (I_{1},I_{2},\dots ,I_{n})}$ and ${\displaystyle (C_{1},C_{2},\dots ,C_{m})}$ ;
% create corresponding weight vectors ${\displaystyle (t_{1},t_{2},\dots ,t_{n})}$ and ${\displaystyle (r_{1},r_{2},\dots ,r_{m})}$ by matching rules;

for all ${\displaystyle I_{i}}$ in ${\displaystyle (I_{1},I_{2},\dots ,I_{n})}$ do
compare ${\displaystyle I_{i}}$ with each ${\displaystyle C_{j}}$ in ${\displaystyle (C_{1},C_{2},\dots ,C_{m})}$ ;
if ${\displaystyle I_{i}=C_{j}}$ for some ${\displaystyle j}$ then ${\displaystyle R_{i}=t_{i}*r_{j}}$ ;
else ${\displaystyle R_{i}=0}$ ;
end if
end for
% output result vector ${\displaystyle (R_{1},R_{2},\dots ,R_{n})}$


Each ${\displaystyle R_{i}}$ in the result vector ${\displaystyle (R_{1},R_{2},\dots ,R_{n})}$ expresses the semantic similarity between the ${\displaystyle i^{th}}$ concept of the benchmark ontology and the matching concept of the evaluating ontology. The sum of the result vector, ${\displaystyle \sum _{i=1}^{n}R_{i}}$, therefore serves as a measure of the semantic similarity between the evaluating ontology and the benchmark ontology.[1]

The rank value in the paper is based on this similarity value between the benchmark ontology and each evaluating ontology.
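
A minimal Python sketch of the matching step and of ranking by the result-vector sum is given below. It follows the pseudocode above, but several details are assumptions rather than statements from the paper: concepts are compared by exact string equality, the first occurrence wins if a concept is repeated in the evaluating ontology, and candidates are filtered with a hypothetical `threshold` parameter. The toy data uses the setting-one rule.

```python
from typing import Dict, List, Sequence, Tuple

Ontology = Tuple[Sequence[str], Sequence[float]]  # (concepts, weights)


def cwvma(bench_concepts: Sequence[str], bench_weights: Sequence[float],
          eval_concepts: Sequence[str], eval_weights: Sequence[float]) -> List[float]:
    """Return the result vector, one entry per benchmark concept."""
    eval_weight_of: Dict[str, float] = {}
    for concept, weight in zip(eval_concepts, eval_weights):
        eval_weight_of.setdefault(concept, weight)  # assumption: first occurrence wins
    # R_i = t_i * r_j when benchmark concept i equals evaluating concept j, else 0.
    return [t * eval_weight_of.get(concept, 0.0)
            for concept, t in zip(bench_concepts, bench_weights)]


def rank(bench: Ontology, candidates: Dict[str, Ontology],
         threshold: float = 0.0) -> List[Tuple[str, float]]:
    """Drop candidates scoring at or below `threshold`, then sort by similarity."""
    bench_concepts, bench_weights = bench
    scored = []
    for name, (concepts, weights) in candidates.items():
        score = sum(cwvma(bench_concepts, bench_weights, concepts, weights))
        if score > threshold:
            scored.append((name, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical benchmark and candidate ontologies, all weighted by the setting-one rule.
benchmark = (["car", "engine", "wheel"], [1.0, 1.0, 1.0])
candidates = {
    "vehicles.owl": (["vehicle", "car", "wheel"], [1.0, 1.0, 1.0]),
    "zoo.owl": (["animal", "lion"], [1.0, 1.0]),
    "garage.owl": (["car", "engine", "wheel", "tool"], [1.0, 1.0, 1.0, 1.0]),
}
ranking = rank(benchmark, candidates)
print(ranking)  # [('garage.owl', 3.0), ('vehicles.owl', 2.0)]
```

With the setting-one rule the score is simply the number of benchmark concepts found in the candidate, which is why an ontology sharing no concepts with the benchmark is filtered out entirely.
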

#### CWVMA Example

Figure 7: Example Ontologies[1]

Two ontology snippets are presented in Figure 7. Ontology 1 is the input benchmark ontology and Ontology 2 is the evaluating ontology. The operations performed on them are as follows:

1. First they are parsed into the concepts sets (food, fruit, apples) and (food, fruit-Vegetables, meats, fruit, apples).
2. The distance-root rule determines each component of the weight vector according to the distance of the concept from the root concept (the root node in the ontology tree). The weight vectors obtained by the distance-root rule are ${\displaystyle (1,0.75,0.5)}$ for the first ontology and ${\displaystyle (1,0.75,0.75,0.5,0.25)}$ for the second ontology.
3. The result vector ${\displaystyle (1,0.75*0.5,0.5*0.25)}$ (first term for food, second for fruit and third for apples) is then calculated by the algorithm above.

The result shows that the similarity for “food” between the evaluating ontology (Ontology 2) and the benchmark ontology (Ontology 1) is 1, the first term of ${\displaystyle (1,0.75*0.5,0.5*0.25)}$, and that the similarity for “fruit” is 0.75*0.5. In the same way, the similarity for “apples” is 0.5*0.25. The value of similarity between the evaluating ontology and the benchmark ontology is ${\displaystyle \sum _{i=1}^{3}R_{i}=1.5}$.[1]
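
Written out term by term, with the Ontology 1 weight first in each product, the arithmetic behind this similarity value is as follows (a restatement of the numbers given in the steps above, not an additional result):

${\displaystyle (R_{1},R_{2},R_{3})=(1\cdot 1,\ 0.75\cdot 0.5,\ 0.5\cdot 0.25)=(1,\ 0.375,\ 0.125)}$

${\displaystyle \sum _{i=1}^{3}R_{i}=1+0.375+0.125=1.5}$

The second and third terms are smaller than the first because fruit and apples sit one level deeper in Ontology 2 than in Ontology 1, so their Ontology 2 weights drop to 0.5 and 0.25.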

#### Evaluation of WI OntoSearch

Figure 8: Input and number of results before and after filtering[1]

Using WI OntoSearch, the authors searched about 4 billion web pages made available through the Google Web Service. A large number of experiments showed that the algorithm can improve the precision of ontology search. The experimental results were evaluated on two criteria:

1. Recall : Percent of related messages on filtered results given an appropriate filter threshold
2. Precision : Percent of appropriate results for user on candidate results that satisfy a selected precision threshold
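
For orientation, the sketch below computes precision and recall in their standard information-retrieval sense: precision is the share of returned results that are relevant, and recall is the share of relevant items that were returned. This is an assumption about how the two criteria are normally measured. The paper's threshold-based wording above may differ in detail, and the file names are hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall(returned: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Standard IR precision and recall over sets of result identifiers."""
    returned_set, relevant_set = set(returned), set(relevant)
    hits = returned_set & relevant_set
    precision = len(hits) / len(returned_set) if returned_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical filtered results and hand-labelled relevant ontologies.
returned = ["a.owl", "b.owl", "c.owl", "d.owl"]
relevant = ["a.owl", "b.owl", "e.owl"]
p, r = precision_recall(returned, relevant)
print(p)  # 0.5 (2 of the 4 returned files are relevant)
```

A stricter filter usually raises precision at the cost of recall, which is why the paper checks both.
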

Figure 8 shows some of the results from the paper. We can clearly see that applying the filter reduces the number of returned results. To check recall, the authors performed random sampling, and some noise was present. When three words were input, precision was close to 90%-100%, compared with an average precision of about 50% for a single word. We can therefore conclude that precision increases with the number of input words.
The experimental data showed that the concepts-weights vectors matching algorithm had a positive effect on filtering irrelevant ontologies for the owl/rdfs/daml file formats and for one, two and three input words.

### Conclusion

OntoSearch provides a visually appealing way to depict ontologies. Without it, the user cannot view an ontology in an understandable graphical format and has to browse through ontologies as structured text files. This takes a lot of time and does not guarantee a good result, because the plain text of an ontology cannot depict its internal structure precisely. The OntoSearch paper states, for example, that if we search in Google for “filetype:RDFs Food”, then Google will return all the RDFs files with the keyword “Food”[2]. We tried this in Google search as well as in Google advanced search, but nothing substantial came up, and advanced search offered no RDF file type option. Possibly no RDFS files for food were available, or the Google API for ontology search has changed since the paper was written in 2004. The OntoSearch system described in the paper is quite simple. It can search for only one type of ontology file (RDFS), and it compares the user's keywords with the contents of the ontology files wherever they occur, so it indiscriminately matches keywords in both concept and comment fields. It compares only keywords and does not take semantics into account. The current UI and implementation of OntoSearch also seem quite different from the ones described in the paper.

The second paper builds on the first and takes semantics into account. The authors propose CWVMA to measure ontology semantics and build WI OntoSearch on top of it. It improves the precision of ontology search, and it also displays the semantic structure of each resulting ontology to help the user understand it. WI OntoSearch uses only the setting-one rule for the algorithm. The setting should be flexible, so that the algorithm can be run with different matching rules to satisfy different requirements. Another interesting direction for future work would be clustering ontologies using the algorithm with more complex matching rules.

An updated paper on OntoSearch is available here. We could not find any updated paper on WI OntoSearch. Current accessible ontology search engines include Falcons Concept Search, OntoSearch, Swoogle and Watson.

## Annotated Bibliography

1. Mingxia Gao, Chunnian Liu, Furong Chen: An ontology search engine based on semantic analysis. Third International Conference on Information Technology and Applications (ICITA 2005), 2005, Volume 1, pp. 256-259
2. Y. Zhang, W. Vasconcelos and D. Sleeman: OntoSearch: An Ontology Search Engine
3. SpryKnowledge's channel: What is an Ontology?
4. Ontologies and Semantic Web
5. Peter WebExplorations: The Semantic Web - An Overview
6. D. Mukhopadhyay, A. Banik, S. Mukherjee, J. Bhattacharya: A Domain Specific Ontology Based Semantic Web Search Engine
7. R. Aravindhan, M. Chitra: A Review on Ontology Based Search Engine. International Journal of Advanced Research in Computer and Communication Engineering, Vol. 3, Issue 10, October 2014
8. R. Benassi, S. Bergamaschi and M. Vincini: “TUCUXI: The InTelligent Hunter Agent for Concept Understanding and LeXical ChaIning”. IEEE/WIC/ACM Web Intelligence International Conference, Beijing, China, 2004, pp. 249-255