CPSC312-2019-DecisionTrees

From UBC Wiki

What is the problem?

There are various types of machine learning algorithms. We plan to implement a clustering algorithm, namely K-means, which uses Euclidean distance as its measure for assigning points to clusters. We also plan to run this algorithm with different numbers of clusters and different data ranges to test the processing time.
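To make the idea concrete, here is a minimal sketch of K-means with Euclidean distance in Haskell. It is not the project's actual code: the names `Point`, `euclidean`, `nearest`, `mean`, `step` and `kMeans` are hypothetical, points are assumed to be lists of `Double` of equal length, and the initial centroids are assumed to be supplied by the caller.

```haskell
import Data.List (minimumBy, transpose)
import Data.Ord (comparing)

-- Assumption: a point is a list of coordinates, all points share one dimension.
type Point = [Double]

-- Euclidean distance between two points.
euclidean :: Point -> Point -> Double
euclidean p q = sqrt (sum [(a - b) ^ (2 :: Int) | (a, b) <- zip p q])

-- Index of the centroid closest to a point (centroid list must be non-empty).
nearest :: [Point] -> Point -> Int
nearest centroids p =
  fst (minimumBy (comparing (euclidean p . snd)) (zip [0 ..] centroids))

-- Component-wise mean of a non-empty list of points.
mean :: [Point] -> Point
mean ps = map (/ fromIntegral (length ps)) (map sum (transpose ps))

-- One iteration: assign each point to its nearest centroid, then move each
-- centroid to the mean of its members. An empty cluster keeps its centroid.
step :: [Point] -> [Point] -> [Point]
step points centroids =
  [ if null members then c else mean members
  | (i, c) <- zip [0 ..] centroids
  , let members = [p | p <- points, nearest centroids p == i]
  ]

-- Iterate until the centroids stop changing or the iteration budget runs out.
kMeans :: Int -> [Point] -> [Point] -> [Point]
kMeans 0 _ centroids = centroids
kMeans n points centroids
  | next == centroids = centroids
  | otherwise = kMeans (n - 1) points next
  where
    next = step points centroids
```

For example, `kMeans 10 [[0,0],[0,2],[10,10],[10,12]] [[0,0],[10,10]]` evaluates to `[[0.0,1.0],[10.0,11.0]]`. Passing the starting centroids explicitly keeps the sketch pure and deterministic; a real run would usually pick them at random.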

What is the something extra?

We also plan to implement a second clustering program, hierarchical clustering, which groups the data according to its hierarchical structure.
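As an illustration only, the bottom-up (agglomerative) variant of hierarchical clustering can be sketched in Haskell as follows. The choice of single linkage, the stopping rule of merging until `k` clusters remain, and the names `Cluster`, `linkage`, `mergeClosest` and `agglomerate` are all assumptions made for this sketch, not a description of the project's code.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Assumption: a point is a list of coordinates; a cluster is a list of points.
type Point = [Double]
type Cluster = [Point]

-- Euclidean distance between two points.
euclidean :: Point -> Point -> Double
euclidean p q = sqrt (sum [(a - b) ^ (2 :: Int) | (a, b) <- zip p q])

-- Single linkage (assumed here): distance between the closest pair of members.
linkage :: Cluster -> Cluster -> Double
linkage xs ys = minimum [euclidean x y | x <- xs, y <- ys]

-- Merge the two closest clusters; expects at least two clusters.
mergeClosest :: [Cluster] -> [Cluster]
mergeClosest cs =
  (cs !! i ++ cs !! j) : [c | (k, c) <- zip [0 ..] cs, k /= i, k /= j]
  where
    pairs = [(a, b) | a <- [0 .. length cs - 1], b <- [a + 1 .. length cs - 1]]
    (i, j) = minimumBy (comparing (\(a, b) -> linkage (cs !! a) (cs !! b))) pairs

-- Start with one cluster per point and merge until k clusters remain.
agglomerate :: Int -> [Point] -> [Cluster]
agglomerate k points = go (map (: []) points)
  where
    go cs
      | length cs <= max 1 k = cs
      | otherwise = go (mergeClosest cs)
```

Calling `agglomerate 2 [[0,0],[0,1],[10,10],[10,11]]` groups the two points near the origin into one cluster and the two points near (10, 10) into the other. Every merge in this sketch compares all pairs of clusters, which is simple to read but does a lot of repeated distance work.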

What did we learn from doing this?

We implemented K-means and hierarchical clustering in Haskell, and in the process learned how these algorithms work and why they are useful. We referred to many Medium and Towards Data Science posts to understand the underpinnings of these algorithms before implementing them. Not surprisingly, we were able to implement them in a relatively small number of lines without compromising the readability or functionality of the program, which leads us to believe that the task was well suited to functional programming. However, we also found that K-means processed the data much faster than hierarchical clustering when given the same data and parameters. We attribute this to the nature of the algorithms: K-means is mostly arithmetic, whereas hierarchical clustering builds its clusters step by step, either top down or bottom up.

Links to code etc [1]