# Decision tree

Decision Tree
CPSC 312
Section: 201
Instructor: David Poole
Email:
Office:
Office Hours:
Class Schedule:
Classroom:
Important Course Pages
Syllabus
Lecture Notes
Assignments
Course Discussion

## Overview

Authors: Ryan Xu, En (Michael) Xie, Zifeng Liang

### What is the problem?

Decision Tree is a machine learning algorithm used for classification and regression.

Before running the program, the following is required:

• a CSV file that contains at least one sample
• Each sample must contain at least one feature with numeric value
• Labels of the sample must be discrete values

After running the program, the users can use the program to predict values by inputting a list of values that match to the sample format. For example, given the coordinate of the United State map, the program will predict whether it’s a red state or a blue state.

### What is the something extra?

• implement the prediction function
• implement a model that can run recursively to build a decision tree based on the best split

### What did we learn from doing this?

• Use Haskell to implement a Machine Learning (ML) model that was originally implemented in Python
• Unlike other languages, Haskell doesn't have for-loop or while-loop. the implementation of looping through a matrix or a list is dependent on recursive functions or build-in functions (e.g. map, foldr, foldl, etc.)
• Unlike Python, it was challenging to do indexing and slicing matrices/lists in Haskell.
• Writing the correct type definition of a function is crucial before implementing the function. When the type definition is wrong, it was challenging to debug the function even if the implementation may be correct.
• Similarly, having the correct "type" and "data" declarations are important to chain functions together. Haskell is strict on types. If one type is wrong, it can cause a chain effect that whole program is corrupted.
• Breaking the problem into small pieces is the key to implement Haskell programs. Ideally, one function should do only one thing. Once each subproblem is well defined, we can write a ML model with much fewer lines compared with the Python version.