Course:CPSC312-2021/SearchAutocomplete

From UBC Wiki

Search Autocomplete

Authors: Kevin Yuen Gee Lee, Yiang Mao

What is the problem?

We want to provide a Haskell library that can take care of search autocompletion. That is, given a prefix string and a library of english words, we will return a list of possible matches that the user may be in the process of typing. Our library will provide functions to enable the following:

  • Inserting a single search term to populate the library
  • Inserting a list of search terms to populate the library
  • Loading a library of words from a file
  • Returning a list of possible matches in the library given a prefix string representing the current search term. For example, given a prefix string "do" and a library containing ["do", "dog", "cat"], our library will return "[do, dog]". If no matches are found our library will return an empty list.

What is the something extra?

We will write a custom Trie data structure to implement this library efficiently. Our Trie data structure will represent a Trie node as an object with:

  1. a boolean property that represents whether this node represents the end of a dictionary word
  2. a Data.Map property that represents the children of this Trie node. This will be a map of Char to Trie

For example, a dictionary of the words [cat, car] would be represented in our Trie data structure pictorially as follows:

                             Root 
                              |
                              c
                              |
                              a
                            /   \
                           t     r

This is represented by the following recursive Trie Node structure:

Trie {
 False, 
 [
   ('c', Trie {
           False, 
           [ ('a', Trie {
                     False, 
                     [ ('t', Trie { 
                                True, 
                                []
                       }),
                       ('r', Trie {
                               True,
                               []
                       }),
                     ]
           })]
   })]
}

Given a definition for an empty trie:

emptyNode :: Trie
emptyNode = Node False Map.empty

One will be able to insert a word into the trie like this:

mytrie = insert "cat" emptyNode

Or, insert a list of words like this:

mytrie = insertList ["cat", "car"]

Give a words.txt file containing the following words:

cat
car

you can also load the entire words.txt file into a trie by running the loadDictionary function:

mytrie = loadDictionary "words.txt"

Once a dictionary has been populated we can get a list of autocomplete suggestions by running the autocomplete command with a given prefix and trie node containing the words that we have loaded:

autoComplete "c" mytrie

In our example, this autocomplete call will return the following list: ["cat", "car"].

What did we learn from doing this?

We built more knowledge about things such as: (1) List comprehensions, (2) Working with library functions like Data.Map, (3) Using recursion to accomplish looping. On top of these things, we also learned how to build a more complex functional program by piecing together smaller functional components. For example, consider the insert function defined here.

It takes a: (1) String representing a word and (2) Trie node, and inserts the word into the trie. It's made up of 4 smaller functions that handles different cases we can encounter in the recursion:

  1. We have not yet finished processing the string and there is no existing node representing the next character in the word. For this case, we want insert a new node into the trie. Thus we call insertNewNode.
  2. We have finished processing the string and there is no existing node matching the last character in the string. For this case, we want to insert a new node in the trie and mark that node as the end of a word. Thus we call insertEndNode.
  3. We have not yet finished processing the string and there is already an existing node matching the next character in the word. For this case, we want to continue moving down the trie. Thus we call traverseToNextNode.
  4. We have finished processing the insertion string and there is an existing node matching the next character in the string. For this case, we want to mark that node as the end of a word. Thus we call markEndNode.

We also learned about the haskell module system and used it to define a Helpers file that we were able to import into our main haskell file.

We thought that functional programming was very suitable for our project since traversing a trie is recursive in nature. Since functional programming languages like Haskell rely on a lot of recursion and treating functions as values, we felt it was perfect for this project. Furthermore, our program had no need for external side effects like printing, database operations or dealing with any web requests. Using a functional language like Haskell helped us enforce the predictability of our program as it's impossible to introduce side effects in this language i.e it's pure. A given input to a trie like "insertList ["cat", "car"]" will always return the same output. This predictable nature of writing our program in a functional manner also makes it easier to verify and debug.

Links to code etc

https://github.com/yiangMao/autocomplete