KMP Pattern-matching algorithm

From UBC Wiki

Authors: Tom Mo, Shijia Su

What is the problem?

Given a string and a pattern to match, use the KMP pattern-matching algorithm to return all matching indexes.

What is the something extra?

  • We implemented a DP(Dynamic Programming) array which is the the longest prefix which is also suffix (LPS) in the Haskell for our algorithm
  • We implemented a brute force algorithm to compare its implementation complexity as well as runtime with KMP
  • A CLI with text file IO and error handling is implemented as the interface to test our algorithm

What did we learn from doing this?

  • The naive way of pattern matching is relatively quick to implement, because Haskell syntax is suitable for recursive programs.
  • The KMP way is relatively tricky, we need to generate a DP array and then use the recursion to discuss the scenario by having different conditions. However, after having DP array, the main algorithm is like 5-6 lines long, which is beautiful and elegant.
  • The hardest part for this project is to understand the KMP and to know when to drop character and how to add indexes. We spent a lot of time for the bugs.
  • By finishing this project, we got a lot of practice by managing pointers in the algorithm, which is what we need in java in recursive programs and also learnt about how to build an DP array in the H
  • Regarding runtime, conventionally, the KMP should have an upper bound of O(n+m) and the naive approach should have O(nm). Therefore, the KMP approach should be faster by a factor proportional to the pattern length. We calculated actual time for some cases and put the time on the readMe in the Github.
  • In conclusion, For the implementation complexity, the naive way is very easy compared to other programming methods. KMP is more abstract and requires a different way of describing the same algorithm. From our observation, haskell is generally very suitable for quick implementation of algorithms. The challenge of KMP arose from not thinking of the algorithm process functionally.
  • Runtime assessment can also be challenging, partly due to lazy evaluation. Wrapping pure functions inside IO actions gets around this issue.

Links to code etc