Course:CPSC312-2021/Course Suggester

From UBC Wiki

Authors: Caleb Kellett, Heather DeHaven

What is the problem?

A course recommender system: the user asks for a course related to a keyword or subject, and the system finds UBC computer science courses related to that keyword or subject. Additionally, the user may ask for subjects related to a specific UBC CPSC course.

What is the something extra?

We take the keywords provided by the user and use Wikidata to gain a better understanding of the subject the user provided. We then use the subject's description to find matches in the course descriptions. For example, if the user enters "biology", we go to the Wikidata entity page for biology and compare its description with the course descriptions; if there are matching words, we suggest the course. Furthermore, we look for matches in entities that are in the "has part" relation of the subject the user provided. For example, "biology" has part "genetics", so our program also compares the words in the description of genetics with the course descriptions.

We also use a database of noun inflections to search for matches more accurately. For example, in the case of CPSC445, there would be no match if 'gene' were not also searched in its plural form 'genes'; with inflected searches, the match is found.
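The inflection lookup can be sketched in SWI-Prolog as below. This is a minimal illustration, not the project's code: it assumes the inflection database is exposed as hypothetical noun_inflection(Singular, Plural) facts and that course descriptions have already been cleaned into lists of atoms.

```prolog
:- use_module(library(lists)).   % member/2

% Hypothetical inflection facts: noun_inflection(Singular, Plural).
% The format of the project's real inflection database is not shown here.
noun_inflection(gene, genes).
noun_inflection(algorithm, algorithms).
noun_inflection(database, databases).

% word_form(+Word, -Form): Form is Word itself or a known inflection of it,
% looked up in either direction (singular -> plural, plural -> singular).
word_form(Word, Word).
word_form(Word, Plural)   :- noun_inflection(Word, Plural).
word_form(Word, Singular) :- noun_inflection(Singular, Word).

% inflected_match(+Word, +DescriptionWords): succeeds once if Word or one of
% its inflections occurs in the (already cleaned) description word list.
inflected_match(Word, DescriptionWords) :-
    word_form(Word, Form),
    member(Form, DescriptionWords),
    !.
```

For instance, inflected_match(gene, [sequence, analysis, genes]) succeeds through the gene/genes fact, whereas a plain member/2 check for gene on the same list would fail.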

What did we learn from doing this?

What is the bottom line? Is functional programming suitable for (part of) the task? Make sure you include the evidence for your claims.

Prolog is very suitable for a recommendation system like this, and for using NLP to interface with the user.

First, we learned how to create a simple natural-language interface for the user. In Prolog, we built a dictionary of nouns, verbs, and prepositional phrases that we use to parse specific sentences and questions. Parsing a sentence or question lets us add a constraint that is then used to find matches. Prolog makes this kind of NLP simple: with a dictionary, we can process the sentences and questions we need, treating words as atoms and breaking the sentences down grammatically.
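A grammar of this kind can be sketched with a Prolog DCG. The vocabulary and the two sentence shapes here are hypothetical, chosen only to show how a parse can yield the keyword that constrains the search; the project's actual dictionary is not reproduced on this page.

```prolog
% Minimal DCG sketch of a question parser. The vocabulary and sentence
% shapes are hypothetical examples, not the project's actual dictionary.

% question(-Keyword): parse a question and return the subject it asks about;
% Keyword is the constraint later used to find matching courses.
question(Keyword) --> [what], noun_phrase, verb, prep_phrase(Keyword).
question(Keyword) --> verb, noun_phrase, prep_phrase(Keyword).

noun_phrase --> det, noun.
noun_phrase --> noun.

det --> [a].
det --> [the].
det --> [some].

noun --> [course].
noun --> [courses].

verb --> [are].
verb --> [suggest].
verb --> [recommend].

% prep_phrase(-Keyword): "about X" or "related to X"; X becomes the keyword.
prep_phrase(Keyword) --> [about, Keyword].
prep_phrase(Keyword) --> [related, to, Keyword].
```

Queried as phrase(question(K), [what, courses, are, about, biology]), this binds K = biology. The sketch assumes the user's sentence has already been tokenized into a list of atoms.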

Second, we learned how to clean data (strings) in Prolog. The core of our program is matching the user's input with a course, and the user gives us one word to find a recommendation. To do so, we needed to clean both the course descriptions and the Wikidata descriptions, which meant removing irrelevant words like "and" so that they would not produce matches. Prolog was well suited to this. Using split_string, we took the descriptions (stored as strings in our course.pl, and retrieved as strings from Wikidata) and split them into a list of words, using newlines, tabs, spaces, periods, commas, semicolons, and hyphens as separators. split_string made this very simple. Then, using maplist with string_to_atom, we converted each string into an atom. Finally, Prolog's subtract let us remove the atoms for irrelevant or non-descriptive words, such as "and", "for", and "involving". Prolog made this very simple and is well suited to this type of task.
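A minimal sketch of this cleaning pipeline, using the same built-ins (split_string/4, maplist/3 with string_to_atom/2, and subtract/3). The stop-word list is hypothetical, and two steps are additions not described above: lowercasing with string_lower/2, and dropping the empty strings that split_string/4 produces between adjacent separators.

```prolog
:- use_module(library(lists)).   % subtract/3
:- use_module(library(apply)).   % exclude/3, maplist/3

% Hypothetical stop-word list; the project's real list is not shown here.
stop_words([and, for, involving, the, of, a, to, in, that]).

% clean_description(+Text, -Words)
% Splits Text on newlines, tabs, spaces, periods, commas, semicolons and
% hyphens, turns each piece into an atom, and removes stop words.
clean_description(Text, Words) :-
    string_lower(Text, Lower),                  % assumption: case-insensitive matching
    split_string(Lower, "\n\t .,;-", "", Parts),
    exclude(==(""), Parts, NonEmpty),           % adjacent separators produce ""
    maplist(string_to_atom, NonEmpty, Atoms),
    stop_words(StopWords),
    subtract(Atoms, StopWords, Words).
```

With this list, clean_description("Algorithms for genes and genomes.", W) gives W = [algorithms, genes, genomes]. Keeping words as atoms lets later steps compare them directly with member/2 and intersection/3.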

Third, once our descriptions were clean, we could begin matching.

The simplest matching method checks whether the user's input, such as "algorithms", appears in a course description. Prolog made this easy with its member predicate.
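The direct check amounts to a member/2 call over each course's word list. In this sketch the course/2 facts and their word lists are invented stand-ins (the project's course.pl stores raw description strings, which would first be cleaned), and once/1 is an added guard so that a course is reported only once even if the word repeats.

```prolog
:- use_module(library(lists)).   % member/2

% Hypothetical course facts: course(Code, CleanedWords). The word lists are
% invented stand-ins; the project's course.pl stores raw description strings,
% which would first be cleaned into atom lists.
course(cpsc320, [design, analysis, algorithms, complexity]).
course(cpsc445, [algorithms, computational, biology, genes, sequences]).
course(cpsc304, [relational, databases, queries]).

% direct_match(+Keyword, -Code): Keyword occurs verbatim in the course words.
% once/1 stops a course from being reported twice if the word repeats.
direct_match(Keyword, Code) :-
    course(Code, Words),
    once(member(Keyword, Words)).
```

Here setof(C, direct_match(algorithms, C), Cs) gives Cs = [cpsc320, cpsc445].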

Next, our program compares the Wikidata description of the user's input with the course descriptions. There is a limitation here: we needed a hard-coded map from the user's input, such as "biology", to the corresponding Wikidata entity URI. Without this mapping, there was no way to locate the biology information in the triples (which are loaded using rdf at the start of the program). Once we have a mapping, we query rdf with the schema.org description property (http://schema.org/description) to get the Wikidata entity's description. After cleaning it, we use Prolog's intersection predicate to check for words shared between the Wikidata entity's description and a course description. Prolog's RDF support made it easy to get the description of the user's input, provided we had a mapping from it to the Wikidata URI.
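The description lookup and intersection step can be sketched with SWI-Prolog's library(semweb/rdf_db). To keep the example self-contained, it asserts one toy triple instead of loading Wikidata data; the entity IRI and the description text are placeholders, and subject_uri/2 stands in for the hard-coded mapping described above. The cleaning predicate follows the split_string/subtract approach described earlier, with a hypothetical stop-word list and an added lowercasing step.

```prolog
:- use_module(library(semweb/rdf_db)).   % rdf/3, rdf_assert/3
:- use_module(library(lists)).           % subtract/3, intersection/3
:- use_module(library(apply)).           % exclude/3, maplist/3

% Hypothetical hard-coded keyword -> Wikidata entity map. The IRI below is a
% placeholder, not a real Wikidata QID.
subject_uri(biology, 'http://www.wikidata.org/entity/EXAMPLE_biology').

schema_description('http://schema.org/description').

% Toy triple standing in for the Wikidata triples loaded at startup;
% the description text is invented for illustration.
load_toy_triples :-
    subject_uri(biology, Uri),
    schema_description(Desc),
    rdf_assert(Uri, Desc, literal(lang(en, 'science that studies living organisms and genes'))).

% Hypothetical cleaned course descriptions.
course(cpsc320, [design, analysis, algorithms, complexity]).
course(cpsc445, [algorithms, computational, biology, genes, sequences]).

stop_words([and, for, involving, the, of, a, to, in, that]).

clean_description(Text, Words) :-
    string_lower(Text, Lower),
    split_string(Lower, "\n\t .,;-", "", Parts),
    exclude(==(""), Parts, NonEmpty),
    maplist(string_to_atom, NonEmpty, Atoms),
    stop_words(StopWords),
    subtract(Atoms, StopWords, Words).

% description_words(+Keyword, -Words): map the keyword to its entity IRI,
% fetch its English schema.org description, and clean it.
% Fails when the keyword has no mapping.
description_words(Keyword, Words) :-
    subject_uri(Keyword, Uri),
    schema_description(Desc),
    rdf(Uri, Desc, literal(lang(en, Text))),
    clean_description(Text, Words).

% description_match(+Keyword, -Code, -Shared): Shared is the non-empty list of
% words common to the entity's description and course Code's description.
description_match(Keyword, Code, Shared) :-
    description_words(Keyword, Words),
    course(Code, CourseWords),
    intersection(Words, CourseWords, Shared),
    Shared \== [].

:- load_toy_triples.
```

With these toy facts, description_match(biology, C, S) yields only C = cpsc445 with S = [genes]. A keyword without an entry in subject_uri/2, such as chemistry, simply fails, which is the limitation noted above.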

Additionally, we look for matches with the Wikidata entities that are in the "has part" relation. We get their descriptions, clean them, and use intersection to find matches, just as above.
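Following the "has part" relation adds one more rdf/3 hop before the same comparison. This sketch assumes the has-part triples use Wikidata's direct-claim property P527 ("has part(s)") under the http://www.wikidata.org/prop/direct/ namespace; the entity IRIs, the genetics description, and the course word lists are placeholders, and the example asserts toy triples rather than loading real Wikidata data.

```prolog
:- use_module(library(semweb/rdf_db)).   % rdf/3, rdf_assert/3
:- use_module(library(lists)).           % subtract/3, intersection/3
:- use_module(library(apply)).           % exclude/3, maplist/3

% Placeholder entity IRIs (not real QIDs). Assumption: "has part" is the
% direct-claim property P527.
subject_uri(biology, 'http://www.wikidata.org/entity/EXAMPLE_biology').
genetics_uri('http://www.wikidata.org/entity/EXAMPLE_genetics').
has_part('http://www.wikidata.org/prop/direct/P527').
schema_description('http://schema.org/description').

% Toy triples standing in for the loaded Wikidata data (invented description).
load_toy_triples :-
    subject_uri(biology, Bio),
    genetics_uri(Gen),
    has_part(HasPart),
    schema_description(Desc),
    rdf_assert(Bio, HasPart, Gen),
    rdf_assert(Gen, Desc, literal(lang(en, 'study of genes and heredity'))).

% Hypothetical cleaned course descriptions.
course(cpsc320, [design, analysis, algorithms, complexity]).
course(cpsc445, [algorithms, computational, biology, genes, sequences]).

stop_words([and, for, involving, the, of, a, to, in, that]).

clean_description(Text, Words) :-
    string_lower(Text, Lower),
    split_string(Lower, "\n\t .,;-", "", Parts),
    exclude(==(""), Parts, NonEmpty),
    maplist(string_to_atom, NonEmpty, Atoms),
    stop_words(StopWords),
    subtract(Atoms, StopWords, Words).

% part_match(+Keyword, -Part, -Code, -Shared): for each entity Part that the
% keyword's entity "has part", compare Part's cleaned description with each
% course description and keep courses sharing at least one word.
part_match(Keyword, Part, Code, Shared) :-
    subject_uri(Keyword, Uri),
    has_part(HasPart),
    rdf(Uri, HasPart, Part),
    schema_description(Desc),
    rdf(Part, Desc, literal(lang(en, Text))),
    clean_description(Text, Words),
    course(Code, CourseWords),
    intersection(Words, CourseWords, Shared),
    Shared \== [].

:- load_toy_triples.
```

Here part_match(biology, Part, C, S) binds Part to the placeholder genetics IRI, C = cpsc445, and S = [genes].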

As for the downsides, Prolog does not seem very suitable for creating a stand-alone application. It lends itself very well to searching and finding matches, but its built-in mechanisms for user interaction seem lacking. It would be simple to create a front end for the program in Java or JavaScript, assuming there is an API that makes the program's output easy to retrieve; such a front end would allow much smoother interaction with the user, as well as better output display and filtering.

Overall, we learned how to use NLP, how to clean string data, how to use rdf and schema.org to get data from Wikidata, and how to recommend courses using word matching, all in Prolog.

To conclude, we found Prolog quite suitable for all of our tasks outside of user interaction.

Links to code

https://github.students.cs.ubc.ca/hdehaven/CPSC312PrologCourseSuggestor