Decision tree

From UBC Wiki
Decision Tree
Wiki.png
CPSC 312
Section: 201
Instructor: David Poole
Email:
Office:
Office Hours:
Class Schedule:
Classroom:
Important Course Pages
Syllabus
Lecture Notes
Assignments
Course Discussion

Overview

Authors: Ryan Xu, En (Michael) Xie, Zifeng Liang

What is the problem?

Decision Tree is a machine learning algorithm used for classification and regression.

Before running the program, the following is required:

  • a CSV file that contains at least one sample
  • Each sample must contain at least one feature with numeric value
  • Labels of the sample must be discrete values

After running the program, the users can use the program to predict values by inputting a list of values that match to the sample format. For example, given the coordinate of the United State map, the program will predict whether it’s a red state or a blue state.

What is the something extra?

  • implement a CSV reader to load data
  • implement the prediction function
  • implement a model that can run recursively to build a decision tree based on the best split
  • finding the ideal parameter/hyperparameter for our model to get the best accuracy

What did we learn from doing this?

  • Use Haskell to implement a Machine Learning (ML) model that was originally implemented in Python
    • Unlike other languages, Haskell doesn't have for-loop or while-loop. the implementation of looping through a matrix or a list is dependent on recursive functions or build-in functions (e.g. map, foldr, foldl, etc.)
    • Unlike Python, it was challenging to do indexing and slicing matrices/lists in Haskell.
    • Writing the correct type definition of a function is crucial before implementing the function. When the type definition is wrong, it was challenging to debug the function even if the implementation may be correct.
    • Similarly, having the correct "type" and "data" declarations are important to chain functions together. Haskell is strict on types. If one type is wrong, it can cause a chain effect that whole program is corrupted.
    • Breaking the problem into small pieces is the key to implement Haskell programs. Ideally, one function should do only one thing. Once each subproblem is well defined, we can write a ML model with much fewer lines compared with the Python version.
    • I/Os in Haskell
    • Cabal for building and packaging Haskell libraries and programs
    • Using Data modules to process matrices and CSV files

References used:

Code Repository

Code for this project can be found at here.