Morse Code Decrypt

From UBC Wiki

What is the problem?

Given some input Morse code input, we want to be able to decrypt the code and return the corresponding alphanumeric message.

The main issues in the transmission and decryption of Morse code is intra-word and inter-word space, i.e. the space between letters and the space between words. To simulate those conditions, we will generate a Morse code string with no distinction between word space and letter space. Given those problems, we will have the decryptor also account for the following:

- Messages that are on repeatedly sent on a loop, and return only one copy of the an entire message instead of the repeats

- Inputs that could begin in the middle of a message, so the output cannot be returned starting from the same point as the original Morse code in its correct interpretation

- The program returns a list of possible interpretations of the message from a dictionary. The input "mangoandpineapple" should return ["mango and pineapple", "mango and pine apple", "man go and pineapple", "man go and pine apple"]

For the purposes of this project, we will leave out names and words found in the spoken language (e.g. "doesn't", "wanna", "lol") and focus on commonly used words.

What is the something extra?

As the decoder will need to account for the additional cases, it cannot act as a simple translator by taking in parsed segments of code and translating character by character. Instead, it will need to build and revise a message as it builds acquires more of the string, and ultimately cut off the message to be a singular copy of the intended message.

The Morse code translation will be stored in a BST. We will also create the dictionary by parsing a .txt file and storing it in either a Trie or a Ternary Search Tree after some experimentation with space usage. The program will need insert and search functions to decrypt the message.

It made sense for us to implement a BST to store the Morse code translation since Morse consists solely of '.' and '-' commonly referred to as "dit" and "dah". Therefore for a Char v, a MorseBST Node is either Empty or in the form Node v (MorseBST v) (MorseBST v). For a sequence '.-', we would start at the root and travel to the dit branch and then the dah branch to return the corresponding letter 'a'.

root
(dit) / \ (dah)
'E' 'T'
/ | | \
'I' 'A' 'N' 'M'

Our final version of the dictionary data structure is a Trie with a branch for siblings and a second for children. A Trie node is either Empty or in the form TrieNode v w (DicTrie v w) (DicTrie v w) where v is a Char and w is a Bool that indicates the end of a word. To store the dictionary, we first found a txt file of most used English words and parsed it as a list of strings to insert each word.

(sibling)
'b' --------------- 'd'
(child) | |
'a' --- 'o' 'a'
| | |
'b' 'x' 'd'
|
'y'

What did we learn from doing this?

Functional programming was appropriate for this project, considering how recursion simplified tree operations. We built more knowledge on tree data structures such as Tries and Ternary Search Trees, especially the different implementations of Tries, such as using Maps or list of Triples. It was also interesting to see how different implementations of our data structure would affect performance; our final implementation of a dictionary is able to store over 200k words, albeit given some time, whereas our first implementation would consume all memory available before even finishing.

Throughout this project, laying a framework of modularity was obvious, but crucial to having the overall program perform exactly as intended. Steady creation and debugging of the various functionality, such as converting messages both ways, trimming down messages into a single length, and breaking sentences into correct words and returning possible messages. Overall, our program has functionality that can be extended into translations for other languages, and simple encryption programs, through the various functions that have been implemented.

Relevant links

Morse Code Decrypt Github

https://en.wikipedia.org/wiki/Morse_code

https://en.wikipedia.org/wiki/Trie

https://en.wikipedia.org/wiki/Ternary_search_tree

https://raw.githubusercontent.com/eneko/data-repository/master/data/words.txt (Raw dictionary found on Unix and MacOS. Will be modified to reduce size and process numeric data)