Decision tree
Decision Tree | |
---|---|
CPSC 312 | |
Section: | 201 |
Instructor: | David Poole |
Email: | |
Office: | |
Office Hours: | |
Class Schedule: | |
Classroom: | |
Important Course Pages | |
Syllabus | |
Lecture Notes | |
Assignments | |
Course Discussion | |
Overview
Authors: Ryan Xu, En (Michael) Xie, Zifeng Liang
What is the problem?
Decision Tree is a machine learning algorithm used for classification and regression.
Before running the program, the following is required:
- a CSV file that contains at least one sample
- Each sample must contain at least one feature with numeric value
- Labels of the sample must be discrete values
After running the program, the users can use the program to predict values by inputting a list of values that match to the sample format. For example, given the coordinate of the United State map, the program will predict whether it’s a red state or a blue state.
What is the something extra?
- implement a CSV reader to load data
- implement the prediction function
- implement a model that can run recursively to build a decision tree based on the best split
- finding the ideal parameter/hyperparameter for our model to get the best accuracy
What did we learn from doing this?
- Use Haskell to implement a Machine Learning (ML) model that was originally implemented in Python
- Unlike other languages, Haskell doesn't have for-loop or while-loop. the implementation of looping through a matrix or a list is dependent on recursive functions or build-in functions (e.g. map, foldr, foldl, etc.)
- Unlike Python, it was challenging to do indexing and slicing matrices/lists in Haskell.
- Writing the correct type definition of a function is crucial before implementing the function. When the type definition is wrong, it was challenging to debug the function even if the implementation may be correct.
- Similarly, having the correct "type" and "data" declarations are important to chain functions together. Haskell is strict on types. If one type is wrong, it can cause a chain effect that whole program is corrupted.
- Breaking the problem into small pieces is the key to implement Haskell programs. Ideally, one function should do only one thing. Once each subproblem is well defined, we can write a ML model with much fewer lines compared with the Python version.
- I/Os in Haskell
- Cabal for building and packaging Haskell libraries and programs
- Using Data modules to process matrices and CSV files
References used:
- https://wiki.ubc.ca/Random_Forest
- https://www.kaggle.com/code/prashant111/decision-tree-classifier-tutorial
- https://en.wikipedia.org/wiki/Decision_tree
Code Repository
Code for this project can be found at here.