Course:CPSC312-2018-Happiness-NLP

Authors:

Susie Chen, James Luo, Tom Lee

What is the problem?

We build a natural language query interface over a world happiness score dataset. Python parses the CSV, and Prolog answers queries against it. To process a question, we first find the question phrase, then parse the constraints (nouns and parameters), parse the verb group, check that the types are compatible, and finally compose a formal query in Prolog.
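
As a minimal sketch of the CSV-to-Prolog step, the Python below turns each row into a Prolog fact. The column names, the `country/4` predicate, and the sample rows are assumptions made for illustration; they are not taken from the project repository, and the values are placeholders rather than real dataset entries.

```python
import csv
import io

# Placeholder rows; column names and values are illustrative assumptions.
SAMPLE_CSV = """Country,Region,Happiness Score,Family
Exampleland,North America,7.1,1.3
O'Testia,Western Europe,6.5,1.2
"""

def quote_atom(text):
    """Return text as a single-quoted Prolog atom, escaping backslashes and quotes."""
    return "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"

def row_to_fact(row):
    """Format one CSV row as a hypothetical country/4 fact."""
    return "country({}, {}, {}, {}).".format(
        quote_atom(row["Country"]),
        quote_atom(row["Region"]),
        float(row["Happiness Score"]),
        float(row["Family"]),
    )

def csv_to_facts(handle):
    """Read CSV rows from an open file-like object and return a list of facts."""
    return [row_to_fact(row) for row in csv.DictReader(handle)]

facts = csv_to_facts(io.StringIO(SAMPLE_CSV))
for fact in facts:
    print(fact)
```

Writing the facts to a `.pl` file would let Prolog consult them directly. Quoting every name as an atom keeps multi-word countries and regions valid Prolog terms.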

Sample Queries:

"What is the happiness rating for Germany?"

"What countries are in North America?"

"What countries are happy?"

"What is the family score of Argentina?"

"What is the highest gdp score country?"

Dataset can be found here: https://www.kaggle.com/unsdsn/world-happiness

What is the something extra?

Users can perform complex queries that return different types of output. For example, a query of the form "Which x _" returns the countries that meet the constraints.

To support this, we rely primarily on Prolog's difference lists to process each query, along with other query techniques.

Later on, we also decided to explore the Prolog testing suite to see what kinds of tools it offers.

What did we learn from doing this?

We learned that NLP is not as easy as it seems. A sentence can be broken into different parts that help us determine what kind of question the user is asking. A part may also contain a nested word group (for example, a noun phrase within a relation phrase). The difficulty arose from deciding which parts of a sentence fall into which word groups. Prolog's difference lists helped us achieve this. They simplified our job, since we only need to declare which words belong to which word group, and Prolog takes care of the recursion, trying different combinations of word groups to answer the user's question.
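
To illustrate the idea outside Prolog, here is a rough Python analogue of difference-list parsing. Each parser takes a token list and yields the leftover tokens together with the constraints it collected, so the input and leftover lists play the role of the two ends of a difference list, and generators stand in for Prolog's backtracking. The vocabulary and function names are hypothetical and are not taken from the project code.

```python
# Hypothetical vocabulary; the project's real word lists are not shown here.
QUESTION_WORDS = {"what", "which"}
DETERMINERS = {"the", "a"}
ADJECTIVES = {"happy"}
NOUNS = {"countries": "country", "country": "country"}

def question_phrase(tokens):
    """Consume one leading question word, if present."""
    if tokens and tokens[0] in QUESTION_WORDS:
        yield tokens[1:], []

def optional_determiner(tokens):
    """Try consuming a determiner first, then try consuming nothing."""
    if tokens and tokens[0] in DETERMINERS:
        yield tokens[1:], []
    yield tokens, []

def adjectives(tokens):
    """Consume zero or more adjectives, recording each as a constraint."""
    if tokens and tokens[0] in ADJECTIVES:
        for rest, found in adjectives(tokens[1:]):
            yield rest, [("adj", tokens[0])] + found
    yield tokens, []

def noun(tokens):
    """Consume one noun and record its normalized form."""
    if tokens and tokens[0] in NOUNS:
        yield tokens[1:], [("noun", NOUNS[tokens[0]])]

def noun_phrase(tokens):
    """A noun phrase is an optional determiner, then adjectives, then a noun."""
    for t1, c1 in optional_determiner(tokens):
        for t2, c2 in adjectives(t1):
            for t3, c3 in noun(t2):
                yield t3, c1 + c2 + c3

def parse(sentence):
    """Return the constraints of the first parse that consumes every token."""
    tokens = sentence.lower().rstrip("?").split()
    for t1, _ in question_phrase(tokens):
        for rest, constraints in noun_phrase(t1):
            if not rest:
                return constraints
    return None

print(parse("What happy countries?"))
```

In Prolog the leftover list is threaded through each rule automatically, which is why only the word groups themselves need to be declared.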

Testing in Prolog turned out to be a lot simpler than we expected. This may be due to the simplicity of Prolog itself, but the testing tools are also intuitive and well documented. Automated tests made our lives much easier, since we no longer had to test manually whenever a feature was finished.

Links to code

https://github.com/leesw98/countries.pl