Course:CPSC312-2023-Decision-Trees

Authors: Aung, Ethan, Timothy

What is the problem?

The goal is to investigate the feasibility of using Haskell for machine learning applications, specifically decision trees. We aim to show that Haskell can be applied to ML models by building a functional decision tree model that reads training data from one file and makes predictions on another. We will judge success by whether the model performs these tasks effectively. Our main features will include:

  • reading input both from cache and disk
  • transforming raw data into quantitative and nominal factors
  • training and predicting binary labels
  • customizable hyperparameters for the model

What is the something extra?

Our "something extra" is the data transformation step. We currently assume that only tidy data will be used with our model and that the transformation step can identify factors in the input data. If time permits, we will also include some simple data tidying.

  • We implemented a file reader for CSVs that parses the data into a String matrix.
  • We created a parser that transforms the matrix into a custom Dataframe class.
    • It allows for nominal, quantitative, and missing factors.
    • It leaves room for implementing tidying at a later date.
  • We included hyperparameters.
    • A depth parameter that sets the maximum depth of the tree.
    • A comparator function parameter that controls how values are compared.
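
The parsing steps above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the names `Cell`, `splitOnComma`, `toMatrix`, `parseCell`, and `parseMatrix` are hypothetical, the splitter ignores quoted fields, and treating an empty field or "NA" as missing is an assumption.

```haskell
import Text.Read (readMaybe)

-- Hypothetical cell type: one value in the parsed table.
data Cell
  = Quantitative Double
  | Nominal String
  | Missing
  deriving (Show, Eq)

-- Split one CSV line on commas (assumption: no quoted fields).
splitOnComma :: String -> [String]
splitOnComma s = case break (== ',') s of
  (field, [])       -> [field]
  (field, _ : rest) -> field : splitOnComma rest

-- Turn raw CSV text into a String matrix, one row per line.
toMatrix :: String -> [[String]]
toMatrix = map splitOnComma . lines

-- Classify a raw field (assumption: "" and "NA" mean missing).
-- Fields that read as a Double are quantitative, all others nominal.
parseCell :: String -> Cell
parseCell raw
  | raw `elem` ["", "NA"] = Missing
  | otherwise             = maybe (Nominal raw) Quantitative (readMaybe raw)

-- Apply the classification to every field of the matrix.
parseMatrix :: [[String]] -> [[Cell]]
parseMatrix = map (map parseCell)
```

In GHCi, `parseMatrix (toMatrix "sunny,3.5,")` evaluates to `[[Nominal "sunny",Quantitative 3.5,Missing]]`. A fuller version would probably decide the factor kind per column rather than per cell.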

What did we learn from doing this?

  • We learnt that Haskell can be used for ML applications.
    • The type safety of Haskell made it really easy to debug.
    • Functional programming makes it easy to break a problem into small, abstract helper functions.
      • We could use the inferred type to write the signature.
      • We knew about the type at all times so it was easy to tell where mistakes were made.
    • It is important to give each function only one functionality.
    • Making custom, recursive types is really easy.
    • We discovered the data structures and algorithms required to build a simple decision tree model.
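
To illustrate the points about recursive types and hyperparameters, here is a small sketch of a decision tree for binary labels. It is not taken from the project's repository: the names `Tree`, `Row`, `Comparator`, `predict`, `majority`, `score`, and `train` are hypothetical, features are assumed to be numeric, and the split is chosen greedily by counting correctly classified rows, which is a simplification that may differ from the criterion the project uses.

```haskell
import Data.List (maximumBy, nub, partition)
import Data.Ord (comparing)

-- Hypothetical recursive tree type: a leaf holds a binary label,
-- a node tests one feature against a threshold.
data Tree
  = Leaf Bool
  | Node Int Double Tree Tree -- feature index, threshold, pass branch, fail branch
  deriving (Show, Eq)

type Row = ([Double], Bool)              -- feature values and binary label
type Comparator = Double -> Double -> Bool

-- Follow the tests down to a leaf.
predict :: Comparator -> Tree -> [Double] -> Bool
predict _ (Leaf label) _ = label
predict cmp (Node i t yes no) xs
  | cmp (xs !! i) t = predict cmp yes xs
  | otherwise       = predict cmp no xs

-- Majority label of a set of rows (ties go to True).
majority :: [Row] -> Bool
majority rows = 2 * length (filter snd rows) >= length rows

-- Rows a split classifies correctly if each side predicts its majority label.
score :: Comparator -> [Row] -> (Int, Double) -> Int
score cmp rows (i, t) = correct yes + correct no
  where
    (yes, no)    = partition (\(xs, _) -> cmp (xs !! i) t) rows
    correct side = let k = length (filter snd side) in max k (length side - k)

-- Grow a tree greedily, stopping at the depth limit, on pure data,
-- or when no split separates the rows.
train :: Comparator -> Int -> [Row] -> Tree
train cmp depth rows
  | depth <= 0 || allSame || null candidates || null yes || null no =
      Leaf (majority rows)
  | otherwise =
      Node i t (train cmp (depth - 1) yes) (train cmp (depth - 1) no)
  where
    allSame    = length (nub (map snd rows)) <= 1
    width      = case rows of
                   []            -> 0
                   ((xs, _) : _) -> length xs
    candidates = [ (j, x) | j <- [0 .. width - 1], x <- nub (map ((!! j) . fst) rows) ]
    (i, t)     = maximumBy (comparing (score cmp rows)) candidates
    (yes, no)  = partition (\(xs, _) -> cmp (xs !! i) t) rows
```

For example, `train (<=) 1 [([1], False), ([2], False), ([3], True), ([4], True)]` yields `Node 0 2.0 (Leaf False) (Leaf True)`. Passing a different comparator, such as `(<)`, changes how each threshold test is interpreted without touching the tree type.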

Links to code, etc.

GitHub link: https://github.com/min2028/Decision-Trees