Course:CPSC312-2023-Naive-Bayes

From UBC Wiki

Authors: Konstantin Mestnikov, Yegor Yeryomenko

What is the problem?

Bayes' theorem is a simple yet powerful mathematical formula that allows predictions to be made from prior beliefs and the likelihood of the observed evidence. A Naive Bayes classifier belongs to the family of probabilistic classifiers and is used, among other things, in tasks such as spam detection and recommendation systems. Here we use the algorithm for sentiment analysis of movie reviews, classifying each review as either positive or negative. We will:

  • implement text preprocessing for the algorithm
  • implement the algorithm in Haskell
  • train and test the model using a publicly available dataset
  • implement a command-line interface where users can enter the text of a review and get the model's predicted sentiment
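The core of such a classifier can be sketched in a few lines of Haskell. This is a minimal sketch, not the project's code. It makes three assumptions:

- It uses a multinomial Naive Bayes model with add-one (Laplace) smoothing.
- It scores each class in log space, as log P(c) + Σ log P(w | c), to avoid floating-point underflow on long reviews.
- Preprocessing is deliberately simple: text is lowercased and every non-letter is treated as a separator.

The names `Sentiment`, `ClassStats`, `tokenize`, `train`, `logScore` and `classify` are hypothetical.

```haskell
import qualified Data.Map.Strict as Map
import Data.Char (isAlpha, toLower)
import Data.List (foldl', maximumBy)
import Data.Ord (comparing)

-- Hypothetical label type; the project may represent labels differently.
data Sentiment = Positive | Negative
  deriving (Show, Eq, Ord)

-- Per-class statistics: number of documents, total tokens, word frequencies.
data ClassStats = ClassStats
  { docCount   :: Int
  , wordTotal  :: Int
  , wordCounts :: Map.Map String Int
  } deriving Show

type Model = Map.Map Sentiment ClassStats

-- Assumed preprocessing: lowercase, turn non-letters into spaces, split on whitespace.
tokenize :: String -> [String]
tokenize = words . map (\c -> if isAlpha c then toLower c else ' ')

-- Accumulate document and word counts per class from labelled reviews.
train :: [(Sentiment, String)] -> Model
train = foldl' step Map.empty
  where
    step model (label, text) =
      let toks = tokenize text
          new  = ClassStats 1 (length toks) (Map.fromListWith (+) [(t, 1) | t <- toks])
      in Map.insertWith merge label new model
    merge (ClassStats d1 n1 w1) (ClassStats d2 n2 w2) =
      ClassStats (d1 + d2) (n1 + n2) (Map.unionWith (+) w1 w2)

-- Number of distinct words seen in any class (used for smoothing).
vocabSize :: Model -> Int
vocabSize = Map.size . Map.unions . map wordCounts . Map.elems

-- log P(c) + sum of log P(w | c), with add-one smoothing for unseen words.
logScore :: Model -> ClassStats -> [String] -> Double
logScore model cs toks = logPrior + sum (map logLikelihood toks)
  where
    totalDocs = sum (map docCount (Map.elems model))
    logPrior  = log (fromIntegral (docCount cs) / fromIntegral totalDocs)
    v         = vocabSize model
    logLikelihood w =
      let count = Map.findWithDefault 0 w (wordCounts cs)
      in log (fromIntegral (count + 1) / fromIntegral (wordTotal cs + v))

-- Return the class with the highest log-posterior score (Nothing if untrained).
classify :: Model -> String -> Maybe Sentiment
classify model text
  | Map.null model = Nothing
  | otherwise      = Just (fst (maximumBy (comparing snd) scored))
  where
    toks   = tokenize text
    scored = [ (label, logScore model cs toks) | (label, cs) <- Map.toList model ]
```

For example, in GHCi, `classify (train [(Positive, "great movie"), (Negative, "boring plot")]) "what a great film"` returns `Just Positive`. The word "great" was seen only in the positive review, and the other words are unseen in both classes, so they contribute equally to both scores.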

What is the something extra?

  • allow users to export the trained model parameters to a text file
  • re-train the model on datasets of various sizes and compare the resulting models' performance
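Exporting the model parameters mostly amounts to choosing a plain-text layout for the learned counts. This is a minimal sketch with two assumptions. First, the parameters are stored as per-class word counts, using the hypothetical `Counts` type. Second, they are written one tab-separated `label word count` entry per line. The function names are illustrative and are not taken from the repository. Parsing returns `Nothing` on any malformed line rather than failing at runtime.

```haskell
import qualified Data.Map.Strict as Map
import Text.Read (readMaybe)

-- Hypothetical parameter layout: class label -> (word -> count).
type Counts = Map.Map String (Map.Map String Int)

-- Write one "label<TAB>word<TAB>count" line per entry.
-- Assumes labels and words contain no tabs or newlines.
serialize :: Counts -> String
serialize counts = unlines
  [ label ++ "\t" ++ w ++ "\t" ++ show n
  | (label, ws) <- Map.toList counts
  , (w, n)      <- Map.toList ws ]

-- Parse the format above; any malformed line makes the whole result Nothing.
deserialize :: String -> Maybe Counts
deserialize txt = do
  entries <- mapM parseLine (lines txt)
  pure (Map.fromListWith (Map.unionWith (+))
          [ (label, Map.singleton w n) | (label, w, n) <- entries ])
  where
    parseLine l = case splitTabs l of
      [label, w, n] -> (\k -> (label, w, k)) <$> readMaybe n
      _             -> Nothing

-- Split a line on tab characters.
splitTabs :: String -> [String]
splitTabs s = case break (== '\t') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitTabs rest

-- Save the parameters to a text file.
saveModel :: FilePath -> Counts -> IO ()
saveModel path = writeFile path . serialize

-- Load parameters previously written by saveModel.
loadModel :: FilePath -> IO (Maybe Counts)
loadModel path = deserialize <$> readFile path
```

Because the counts are plain integers, a model saved this way can be reloaded without re-training. Models trained on different dataset sizes can also be saved side by side for comparison. One limitation of this layout is that a class with no recorded words would not survive the round trip.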

What did we learn from doing this?

We gained more practical experience in writing Haskell applications and Cabal projects. We learned about input/output (I/O) in more depth, as well as several data structures and their operations (Map, Set, etc.). We also developed a better intuition for cryptic Haskell compile errors and how to debug them. Last but not least, we got a taste of working with files, reading data from them and writing data to them. Finally, by implementing the simple Naive Bayes algorithm, we saw the potential of implementing machine learning algorithms in Haskell.

Links to code etc

https://github.com/yegory/NaiveBayes