Course:CPSC312-2019-Text-Prediction

Authors: Gina Bolognesi, John Turkson, Andy Ma

What is the problem?

We implement a Markov chain to roughly predict the weather based on the weather conditions of the past few days.

What is the something extra?

We use API calls to collect weather data (from forecast.io), then parse the resulting JSON-formatted string to extract a simple weather state for each day.
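A minimal sketch of that pipeline, using the http-conduit and aeson libraries (the URL and the "summary" field name below are placeholders, not necessarily the exact endpoint and schema we used):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON (..), decode, withObject, (.:))
import Network.HTTP.Simple (getResponseBody, httpLBS, parseRequest)

-- A simplified daily weather state; the real API response contains
-- far more fields than we actually need.
newtype DailySummary = DailySummary { summary :: String }
  deriving Show

instance FromJSON DailySummary where
  parseJSON = withObject "DailySummary" $ \o ->
    DailySummary <$> o .: "summary"

-- Fetch the forecast and decode just the field we care about.
fetchSummary :: String -> IO (Maybe DailySummary)
fetchSummary url = do
  request  <- parseRequest url   -- e.g. a forecast.io endpoint
  response <- httpLBS request
  return (decode (getResponseBody response))
```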

The Markov chain predicts by randomly sampling from the computed probability distribution over states.
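For illustration, a first-order chain over simple weather states can be built by collecting every state that followed the current state in the observed history, then sampling uniformly from those successors, which is equivalent to sampling from the empirical transition distribution. (A sketch of the idea, not our exact code; a real version should guard against an empty history or a state with no observed successors.)

```haskell
import System.Random (randomRIO)

data Weather = Sunny | Cloudy | Rainy deriving (Eq, Show)

-- All states that immediately followed 'current' in the history.
successors :: Weather -> [Weather] -> [Weather]
successors current history =
  [next | (prev, next) <- zip history (tail history), prev == current]

-- Predict the next state by sampling uniformly from the observed
-- successors, i.e. from the empirical transition probabilities.
predictNext :: Weather -> [Weather] -> IO Weather
predictNext current history = do
  let candidates = successors current history
  i <- randomRIO (0, length candidates - 1)
  return (candidates !! i)
```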

What did we learn from doing this?

Random sampling is unexpectedly difficult in Haskell: a function that involves randomness will not always give the same output for the same input, so it is no longer a pure function and must either run in IO or thread a generator explicitly. The Markov chain still needs to be tested at scale, but Haskell's lazy evaluation seems well suited to computing the transition table, as it only computes the transition probability of interest rather than populating the whole table. Memoization may be a useful addition if we routinely plan to predict multiple future states.
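One general pattern that keeps the sampler pure (a sketch of the technique, not necessarily our exact solution) is to pass the random generator in and return its updated successor, so identical inputs, seed included, always yield identical outputs:

```haskell
import System.Random (StdGen, mkStdGen, randomR)

-- A pure sampler: given a generator, pick one element uniformly and
-- return the updated generator so the caller can keep sampling.
sampleFrom :: [a] -> StdGen -> (a, StdGen)
sampleFrom xs gen =
  let (i, gen') = randomR (0, length xs - 1) gen
  in (xs !! i, gen')

-- Deterministic for a fixed seed:
-- fst (sampleFrom "abc" (mkStdGen 42)) always returns the same element.
```

With this shape, only the code that supplies the initial seed (e.g. via newStdGen) has to live in IO; everything downstream stays pure and testable.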

Making API requests turned out to be logical and surprisingly comparable to how they are usually done in imperative programming languages. The use of do blocks makes each successive action simple to reason about, and although Functors and their types confused us for a while, making HTTP API requests turned out to be quite doable in Haskell. We found the use of IO interesting, seeing how Haskell blends the purity of functions with the mutable state of many things that exist in the real world, like real-time weather data. Having said this, we think that Haskell is very well suited to querying websites and obtaining data, especially given its parallels with other programming languages. The elimination of complicated loops and most mutable state makes the behaviour of processed queries easier to reason about, and apart from different mechanisms and semantics for dealing with 400-level response codes, we find that Haskell is well suited to making HTTP/API requests.
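As an example of that last point, the response status can be inspected explicitly before the body is used; the sketch below assumes http-conduit's Network.HTTP.Simple module and a placeholder URL:

```haskell
import qualified Data.ByteString.Lazy as L
import Network.HTTP.Simple
  (getResponseBody, getResponseStatusCode, httpLBS, parseRequest)

-- Return the body on success, or a readable error for non-2xx codes.
fetchBody :: String -> IO (Either String L.ByteString)
fetchBody url = do
  request  <- parseRequest url
  response <- httpLBS request
  let code = getResponseStatusCode response
  if code >= 200 && code < 300
    then return (Right (getResponseBody response))
    else return (Left ("request failed with status " ++ show code))
```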

Besides Markov chains, parsing weather data made up the remaining major part of the project. We found that Haskell is a very suitable tool for this, mostly due to pattern matching and lazily-evaluated functions. This was especially useful when filtering a larger dataset for specific keys. We were impressed by the low memory footprint Haskell had even when parsing data consisting of thousands of lines, as well as the overall speed with which it did so. Haskell is an effective tool for processing data, due to the inclusion of (lazy) map, filter, and reduce functions (among others) that quickly narrow down data. This is especially useful when large parts of the file do not need to be processed, as was the case in this project. It is a large advantage over some non-functional languages that have to iterate through the entire dataset during preprocessing, whereas Haskell can process only the subset of the data that matches a given condition.

The only gripe we had about Haskell is the lack of a built-in data structure that provides constant-time index-based access to its elements. As versatile as Haskell's built-in lists are, it would have been useful to quickly access the middle of a list, or to traverse it in reverse from an arbitrary point. These points only detract slightly from Haskell's suitability as a dataset parser, as other algorithms can be used that take advantage of Haskell's laziness and built-in types. Overall, we find that Haskell is an excellent tool for processing data, especially when only a small subset of the dataset needs to be processed. Its biggest advantages are built-in pattern matching, which allows one to concisely search and query the dataset, and lazy list-processing functions, which ensure that only the minimum amount of the dataset is processed, and only when needed.
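As a small illustration of that lazy filtering (not our exact parser), pattern matching in a list comprehension extracts only the values stored under a given key, and laziness means only as much of the input is traversed as the consumer demands:

```haskell
-- Keep only the values stored under a given key.
valuesFor :: Eq k => k -> [(k, v)] -> [v]
valuesFor key pairs = [v | (k, v) <- pairs, k == key]

-- Laziness: even on a huge (here, infinite) list of records, only
-- enough of it is traversed to produce the first three matches.
firstThree :: [String]
firstThree = take 3 (valuesFor "icon" records)
  where records = cycle [("icon", "rain"), ("temp", "11"), ("icon", "cloudy")]
```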

GUI programming in Haskell is more difficult than it needs to be, in our opinion. Granted, this was mostly due to installation issues and sparse online documentation for the GUI packages, but we nonetheless encountered issue after issue while trying to get a rudimentary GUI working in Haskell. However, we do see Haskell's potential for the web, with the front end created and managed by one of the more popular web frameworks while Haskell handles actions on the backend.

Links to code

https://github.com/ginabolo/haskell-markov-complete