Predictive Texts


Authors:

Justin Frank, Ricky Ma, Ian Ho

What is the problem?

We create N-gram language models from given texts (e.g. published scientific papers, a person's tweets, Edgar Allan Poe poems) to analyze the frequency of words, word pairs, and longer word "grams".
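
As a rough sketch of the idea (the names ngrams and countNGrams are illustrative, not the repository's actual API), counting word-level n-grams in Haskell can be as short as:

    import qualified Data.Map.Strict as Map

    -- Slide a window of n words across the token list.
    ngrams :: Int -> [a] -> [[a]]
    ngrams n xs
      | length (take n xs) < n = []   -- forces at most n cells, not the whole list
      | otherwise              = take n xs : ngrams n (drop 1 xs)

    -- Count how often each n-gram occurs in a text.
    countNGrams :: Int -> String -> Map.Map [String] Int
    countNGrams n text =
      Map.fromListWith (+) [ (g, 1) | g <- ngrams n (words text) ]

For example, countNGrams 2 "the cat sat on the mat" maps the bigram ["the","cat"] to 1 and ["the","mat"] to 1.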

What is the something extra?

Using word-level Markov chain text generation, the program generates new text (1-5 sentences) based on the likelihood of one word following another, drawn from the N-gram language models built from the given texts. If user input is given, the program also returns a completed sentence that "speaks" like the training data.
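
A minimal sketch of the sampling step, assuming bigram counts and the random package (the names here are ours, not the project's):

    import qualified Data.Map.Strict as Map
    import System.Random (randomRIO)  -- from the random package

    -- For each word, how often each possible next word follows it.
    type Bigrams = Map.Map String (Map.Map String Int)

    -- Sample a successor of w, weighted by observed frequency.
    nextWord :: Bigrams -> String -> IO (Maybe String)
    nextWord model w = case Map.lookup w model of
      Nothing    -> pure Nothing
      Just succs -> do
        r <- randomRIO (1, sum (Map.elems succs))
        pure (Just (pick r (Map.toList succs)))

    -- Subtract counts until r falls inside a word's bucket.
    pick :: Int -> [(String, Int)] -> String
    pick r ((w, c) : rest)
      | r <= c    = w
      | otherwise = pick (r - c) rest
    pick _ []     = error "empty successor map"

Each candidate successor is drawn with probability proportional to how often it followed the current word in the training text; chaining nextWord calls yields a full generated sentence.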

What did we learn from doing this?

  • Haskell has very convenient ways to work with lists, even with large amounts of data (over a million elements).
  • The length function runs in time linear in the size of the list, so it isn't always the best choice when working with very large lists (see the sketch after this list).
  • The IO monad and "do" notation work well for looping the UI back to the menu after each operation completes.
  • Informative types save a lot of debugging time and reduce runtime errors.
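
For instance, to ask whether a list holds more than k elements, dropping and testing for emptiness touches at most k+1 cons cells, while length always walks the entire list (a hypothetical comparison, not code from the project):

    -- length forces the whole spine: O(n) even for a million elements.
    tooSlow :: [a] -> Bool
    tooSlow xs = length xs > 1000000

    -- drop + null inspects at most k+1 cells, and even works on infinite lists.
    longerThan :: Int -> [a] -> Bool
    longerThan k xs = not (null (drop k xs))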

We also learned how to profile Haskell code and use the Cabal build system, both of which were very useful while developing the program.
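
One common profiling workflow (the executable name impersonator is a placeholder, not necessarily the project's):

    $ cabal build --enable-profiling
    $ cabal run --enable-profiling impersonator -- +RTS -p -RTS

Running with +RTS -p writes a .prof report of where time and allocation were spent, which makes hotspots like repeated length calls easy to find.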

Functional programming was very well suited to this task. The best example of this is how generic many of the functions in our program are: although we didn't end up exploiting it, our implementation works for any gram type that implements Ord and Hashable, simply from writing each function in a sensible manner. The same program would have been much longer in an imperative language.
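
As a minimal illustration of that genericity (countOccurrences is our illustrative name; here Eq and Hashable suffice for a HashMap, alongside the Ord and Hashable the project relies on):

    import Data.Hashable (Hashable)              -- hashable package
    import qualified Data.HashMap.Strict as HM   -- unordered-containers package

    -- Nothing here names a concrete gram type, so the same function
    -- counts word grams, character grams, or tuples of words alike.
    countOccurrences :: (Eq g, Hashable g) => [g] -> HM.HashMap g Int
    countOccurrences gs = HM.fromListWith (+) [ (g, 1) | g <- gs ]

Writing against type-class constraints rather than concrete types is what keeps functions like this reusable for free.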

Links to code etc


https://github.com/ricky-ma/Impersonator