CSV Data Manipulation Program


What is the problem?

We wanted to explore the properties of Haskell that make it a feasible option for creating a data manipulation library, and to see why its features make this especially easy. Using Haskell, we created a small program that provides much of the same basic functionality as Excel for transforming and visualizing data.

What is the something extra?

A variety of Excel functions have been recreated in Haskell:

select_row :: Int -> [a] -> a

select_column

sum_row

average_row

min_row

max_row

min_column

max_column

select_value

read_int

row_name

column_name

row_column_name

row_column

compare_values

compare_columns

compare_rows
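As a minimal sketch of how a few of the functions listed above could be written: only the signature of select_row comes from the list; the Table type, every other signature, and all function bodies are assumptions made for illustration and may differ from the code in the repository.

```haskell
-- Hypothetical representation: a table is a list of rows of Ints.
type Table = [[Int]]

-- Signature from the list above; the body is an assumption.
-- Partial: fails if the index is out of range.
select_row :: Int -> [a] -> a
select_row i rows = rows !! i

-- Assumed signature: take the i-th cell of every row.
select_column :: Int -> [[a]] -> [a]
select_column i = map (!! i)

-- Assumed signature: add up the cells of row i.
sum_row :: Int -> Table -> Int
sum_row i = sum . select_row i

-- Assumed signature: arithmetic mean of row i (NaN for an empty row).
average_row :: Int -> Table -> Double
average_row i table = fromIntegral (sum xs) / fromIntegral (length xs)
  where
    xs = select_row i table

-- Assumed signatures: smallest and largest cell of column i.
-- Both fail on an empty table, because minimum and maximum are partial.
min_column :: Int -> Table -> Int
min_column i = minimum . select_column i

max_column :: Int -> Table -> Int
max_column i = maximum . select_column i
```

Under these assumptions, sum_row 0 [[1,2,3],[4,5,6]] evaluates to 6 and select_column 0 [[1,2,3],[4,5,6]] evaluates to [1,4]. Building the column functions on top of select_column keeps each definition to a single line of function composition.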

In addition to these data manipulation functions we have extended the program to include data visualization functions, as well as a user interface.

What did we learn from doing this?

We learned a lot about the features of Haskell that make it especially useful for data science. In Haskell's strong type system every value has a type: the programmer either declares it explicitly or the compiler infers a precise one. While programming, we saw how this extra friction ensures type safety. Because Haskell is statically checked, the type of each value is known at compile time, which made debugging much easier: any mismatch was reported (and had to be fixed) before the program ran. We also found that mathematical concepts are more intuitive and simpler to encode in Haskell than in other languages. All of these features allowed us to keep our code clean and concise. This project extended our knowledge of parsing and manipulating data from files, and we gained more experience in creating a full program from scratch.
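To illustrate the points about type inference and compile-time checking, here is a small sketch. The names read_int_safe, mean, and average_cells are hypothetical and are not taken from the project; readMaybe is the standard function from Text.Read.

```haskell
import Text.Read (readMaybe)

-- Hypothetical safe counterpart to read_int: a cell that is not a
-- number yields Nothing instead of crashing at run time.
read_int_safe :: String -> Maybe Int
read_int_safe = readMaybe

-- No signature is written here, so the compiler infers one.
-- The definition reads almost exactly like the mathematical formula.
mean xs = sum xs / fromIntegral (length xs)

-- Parse every cell, then average. A single bad cell makes the whole
-- result Nothing, and an empty list has no average.
average_cells :: [String] -> Maybe Double
average_cells cells = do
  ns <- traverse read_int_safe cells
  if null ns
    then Nothing
    else Just (mean (map fromIntegral ns))

-- A call such as mean ["1", "2"] is rejected by the compiler, because
-- String is not a fractional number type; the mistake never reaches
-- a running program.
```

With these definitions, average_cells ["1", "2", "3"] gives Just 2.0, while average_cells ["1", "x"] gives Nothing. The Maybe in the result type forces every caller to handle the failure case, which is the kind of friction described above.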

Links to code etc

https://github.com/austinparkk/CPSC312Project