Decision tree

Decision Tree
CPSC 312
Section:	201
Instructor:	David Poole
Email:
Office:
Office Hours:
Class Schedule:
Classroom:
Important Course Pages
	Syllabus
	Lecture Notes
	Assignments
	Course Discussion

Overview

Authors: Ryan Xu, En (Michael) Xie, Zifeng Liang

What is the problem?

Decision Tree is a machine learning algorithm used for classification and regression.

Before running the program, the following is required:

a CSV file that contains at least one sample
Each sample must contain at least one feature with numeric value
Labels of the sample must be discrete values

After running the program, the users can use the program to predict values by inputting a list of values that match to the sample format. For example, given the coordinate of the United State map, the program will predict whether it’s a red state or a blue state.

What is the something extra?

implement a CSV reader to load data
implement the prediction function
implement a model that can run recursively to build a decision tree based on the best split
finding the ideal parameter/hyperparameter for our model to get the best accuracy

What did we learn from doing this?

Use Haskell to implement a Machine Learning (ML) model that was originally implemented in Python
- Unlike other languages, Haskell doesn't have for-loop or while-loop. the implementation of looping through a matrix or a list is dependent on recursive functions or build-in functions (e.g. map, foldr, foldl, etc.)
- Unlike Python, it was challenging to do indexing and slicing matrices/lists in Haskell.
- Writing the correct type definition of a function is crucial before implementing the function. When the type definition is wrong, it was challenging to debug the function even if the implementation may be correct.
- Similarly, having the correct "type" and "data" declarations are important to chain functions together. Haskell is strict on types. If one type is wrong, it can cause a chain effect that whole program is corrupted.
- Breaking the problem into small pieces is the key to implement Haskell programs. Ideally, one function should do only one thing. Once each subproblem is well defined, we can write a ML model with much fewer lines compared with the Python version.
- I/Os in Haskell
- Cabal for building and packaging Haskell libraries and programs
- Using Data modules to process matrices and CSV files

References used:

https://wiki.ubc.ca/Random_Forest
https://www.kaggle.com/code/prashant111/decision-tree-classifier-tutorial
https://en.wikipedia.org/wiki/Decision_tree

Code Repository

Code for this project can be found at here.