Search Engine

From UBC Wiki

Authors: XinRang Zhang, Andrew Lin

What is the problem?

We will design a simple search engine, similar to web search engines such as Google.

It judges the similarity between the input query and the articles in its built-in knowledge base, and outputs the results in rank order.

We do not use crawlers. Instead, we manually collect some test articles, or fetch articles from a web API, and insert them into the knowledge base.

The search engine also needs a certain degree of robustness: similar words should produce similar results.

Some explanations of the match weight rules:

A word should be treated as equivalent to its past tense, progressive form, and other grammatical forms

Query words that occur consecutively in the article should carry more weight than the same words occurring separately; the longer the run of consecutive words, the higher the weight

Tags and keywords have the highest matching weight

When a search term exactly matches an entry in the keyword list, only the exact-match results are displayed
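
The first two rules above (grammatical forms and consecutive runs) can be sketched roughly as follows. This is an illustrative Python sketch, not the project's Prolog code: the suffix list, the function names, and the choice to score a run of k consecutive words as k squared are all assumptions made for the example.

```python
import re

# Hypothetical suffix list; a real system would use a proper stemmer.
SUFFIXES = ("ing", "ed", "es", "s")

def normalize(word):
    """Lowercase a word and crudely reduce it to a stem."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Strip at most one suffix, keeping at least three letters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Drop a trailing "e" so that "article" and "articles" agree.
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word

def tokenize(text):
    """Split text into alphabetic words and normalize each one."""
    return [normalize(w) for w in re.findall(r"[A-Za-z]+", text)]

def run_score(query, article):
    """Add k*k for each run of k consecutive query words found in the article.

    Squaring the run length is an assumed weighting: it makes one run of
    three words (9) worth more than three separate matches (1 + 1 + 1).
    """
    q = tokenize(query)
    a = tokenize(article)
    score = 0
    i = 0
    while i < len(a):
        best = 0
        # Find the longest slice of the query that starts at article position i.
        for start in range(len(q)):
            k = 0
            while (start + k < len(q) and i + k < len(a)
                   and q[start + k] == a[i + k]):
                k += 1
            best = max(best, k)
        if best:
            score += best * best
            i += best
        else:
            i += 1
    return score

consecutive = run_score("search engine design", "a search engine design guide")
scattered = run_score("search engine design", "design a search for an engine")
```

Here `consecutive` is 9 and `scattered` is 3, so the article containing the whole phrase ranks higher. The stemming is deliberately crude and will not unify every word form.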

What is the something extra?

We will utilize a knowledge base to store webpages/articles or use an API to retrieve websites.

When we add an article to the knowledge base, we can enter one or several tags or keywords for the article.

The search checks tags and keywords first, and then matches against the content of the article.
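
A minimal sketch of the tags-first lookup, written in Python for illustration only (the project itself is described as using Prolog). The `Article` and `KnowledgeBase` names, and the plain substring match used for article content, are assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Article:
    title: str
    body: str
    tags: set = field(default_factory=set)

class KnowledgeBase:
    """Hypothetical in-memory store of articles with optional tags."""

    def __init__(self):
        self.articles = []

    def add_article(self, title, body, tags=()):
        """Store an article with zero or more tags or keywords."""
        self.articles.append(Article(title, body, {t.lower() for t in tags}))

    def search(self, query):
        """Return matching titles, checking tags before article content."""
        q = query.lower().strip()
        # Stage 1: an exact tag/keyword match; if any exist, return only these.
        exact = [a.title for a in self.articles if q in a.tags]
        if exact:
            return exact
        # Stage 2: fall back to a simple substring match on the content.
        return [a.title for a in self.articles if q in a.body.lower()]

kb = KnowledgeBase()
kb.add_article("Intro to Prolog", "Prolog is a logic programming language.",
               tags=["prolog", "logic"])
kb.add_article("Parsing JSON", "This article mentions prolog once.",
               tags=["json"])
tag_hits = kb.search("Prolog")       # only the article tagged "prolog"
body_hits = kb.search("mentions")    # no tag matches, so content is searched
```

Although the second article mentions "prolog" in its content, `tag_hits` contains only the first article, because an exact tag match suppresses content matches.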

The user can choose which search result to display; the response is returned as an HTML snippet.

What did we learn from doing this?

Prolog makes it very easy to integrate different types of APIs. The libraries we used provide many built-in functions that simplify common steps, such as converting JSON results to a dictionary. Whether functional programming is the best option for our project is debatable: we often found ourselves writing code in a more imperative style, though the end result was still a functional program. Writing the program this way did make many of the tasks simpler: the code structure is easy to read and understand, and it helped us break the project into smaller chunks. One problem that came up was removing HTML tags from the JSON responses returned by our API calls. Because the responses came back as HTML snippets, we wanted to strip the tags with regular expressions so that the output would be more visually appealing.
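
The tag-stripping step described above might look something like the following. This is a Python sketch for illustration, not the project's Prolog code; the response shape (a `snippet` field) and the function name are hypothetical, since the actual API is not named here. A regular expression like this is adequate for simple snippets but is not a full HTML parser.

```python
import html
import json
import re

# Matches anything that looks like an HTML tag, e.g. <span class="hit">.
TAG_PATTERN = re.compile(r"<[^>]+>")

def strip_tags(snippet):
    """Remove HTML tags, then decode entities such as &amp;."""
    return html.unescape(TAG_PATTERN.sub("", snippet)).strip()

# Hypothetical JSON response; the real API's fields are an assumption.
raw = '{"title": "Search", "snippet": "<span class=\\"hit\\">Search</span> engines &amp; ranking"}'
result = json.loads(raw)              # JSON text -> Python dictionary
clean = strip_tags(result["snippet"])
```

With this input, `clean` is the plain string `Search engines & ranking`.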

Links to code etc