Course:CPSC312-2021/HalGrep

From UBC Wiki

HalGrep

Authors:

  • Kyle Da Silva
  • Julian Mentasti
  • Reginald McDonald

What is the problem?

We would like to test how good of a language Haskell is for creating unix utility programs. To test this, we would like to build the grep program using Haskell. Our program will provide an easy way to perform full-text search on local files. To learn more about grep, run `man grep` in a linux terminal.

What is the something extra?

For something extra, we will implement a way to perform fuzzy string matching so that a user can get approximate results. For more information on fuzzy string matching, please see its Wikipedia page

What did we learn from doing this?

In this project we investigated whether functional programming (in particular Haskell) is good for writing Unix utility programs. The three main portions of the program were 1) User input and output on the command line 2) navigating the file system, and reading files 3) text matching. We will evaluate each part separately.

1. User IO on the Command line Originally we thought that this part of the project would be a weakness for functional programming. However the functional solution to the problem of accepting command line arguments and options was very good. We were able to create a tree of our arguments and options which in turn was created into a parser by a library. This makes for a very easy to read, and easy to extend. Having a full fledged parser also means that we can ensure the input is correct.

2. Navigating the File System

One problem we noticed while writing the file functionality of the program was that lazy evaluation caused a problem for us: It means that we had to make sure to open and close each file we worked with. Another problem that haskell had when reading files is that readFile chooses its character encoding based on environment variables which is unexpected, and challenging for a tool supposed to read all sorts of files. IO is instantaneous so never times out so you can't put it on its own thread.It is also the case that just being able to load a file and encode it does not mean we can always call it.

3. Text Matching

Functional programming was a good tool for text matching, however there were some particular problems with Haskell which caused difficulties. In particular fuzzy string matching, we used a bitwise algorithm and Haskell required importing a separate library where many other languages give you bitwise operations for free. Also, Haskell boxes all the variables so it was a lot of extra code to ensure you were using the right data type.

Overall we felt functional programming was an awkward fit for some common tasks of utility programs, such as dealing with files. We would recommend it if the particular utility has some core functionality that is otherwise well suited to functional programming. However in general functional programming does not stand out as great for Unix utility programs.

Links to code etc

https://github.com/reggiemcdonald/halgrep