Course:CPSC-312-2019-Course-Scheduler

From UBC Wiki

What is the problem?

We will create a scheduler, where given some UBC courses and some constraints, for example "I need 15 credits", "I would like my courses to be after 10am", and of course that no courses conflict, the system returns possible course selections that meet the user's criteria.

What is the something extra?

Our scheduler will have a simple natural language interface, and draw its data directly from the SSC course offerings.

What did we learn from doing this?

We found that Prolog is a great tool to handle Natural Language Processing. Its ability to search for solutions to possible parts of speech in sentences was paramount to implementing an expressive way for users to query the database of facts for courses. It was not only easy for the user to express queries but also for the programmer to describe the language that queries are made in. It was very simple and elegant to design and describe the domain of the queries being asked over; all of this can be expressed in module parts in the program - there is no need for multiple different files and modules that have complex structures describing rules about the queries that can be made. For example, the programmer is easily able to specify queries that operate either on courses or on sections, and the structure (different word combinations, etc.) that describe how to obtain information relating to these facts. These features made up a significant portion of this project - and without Prolog's innate ability to describe parts of sentences and grammar so expressively, it would have taken many more hours and much more complicated code to do something simple and intuitive. We hope to see some of these features natively supported in other, more popular programming languages, in order to integrate with and simplify existing code used for large-scale production purposes.

Another instrumental feature contained in Prolog is its built-in mechanism to find proofs to queries. This was a vital feature not only to the scheduling aspect of the project but also to the information-retrieving aspect of the project. Prolog made it extremely simple to write functions and predicates to search over courses, instructors, buildings, and sections due to abstracting the search mechanism away from the user and programmer. As a result, you will see no loops or if statements in our codebase - everything is a statement that either evaluates to true (returning a match, and information) or false. The fact that this functionality can be applied generally in Prolog (that is, able to be used without much configuration in many different situations/domains) allows for simpler, more readable code - within the project, we applied a standard "template" for functions that search over similar parts of the knowledge-base, reducing duplication while also increasing readability. We see the strength in Prolog in its abstractions that many modern languages can borrow from: just as Prolog has abstracted search and satisfiability, similar to how other languages like Julia have abstracted parallel processing and machine learning, other languages could benefit from not needing custom implementations that often end up cluttering the codebase. Furthermore, the search capabilities of Prolog are not limited only to scheduling, but can be applied to almost any topic, due to its generalized nature. This also makes it an attractive candidate to be ported to/used in place of other, modern languages at searching for solutions to constraint-bounded problems.

We were also impressed at the speed in which Prolog was able to parse, interpret, and find solutions to the (large) knowledge-base at hand. Our knowledge base totalled over 3000 rdf triples, consisting of course and section data, and Prolog was able to handle this without issue. Although this is not a very large amount of facts present, we anticipate that Prolog would be able to scale this number up to a much larger scale without issue. We even predict that Prolog would be better-suited for our previous project, a weather predictor, that handled between 50,000 - 100,000 weather datapoints at a time, to parse and query data to predict the weather than Haskell was. With this being said, a strength of Prolog lies within its performance when searching for results, and combined with its relatively ease of describing and expressing constraint satisfaction problems, would be well-suited for various applications, especially those involving large datasets.

Our initial thoughts of Prolog was that it was a somewhat limiting language - first in its syntax, and also in what it can do. We initially thought that it could only be used for searching for solutions efficiently, which fit the project well. But we soon discovered that Prolog can do much more than simply "search for solutions". It is much closer to a conventional programming language than we thought. We discovered packages for a wide array of tasks, such as web requests, parsing information, streaming to/from files, large-scale data processing, and many more. If we had discovered these capabilities sooner, it would have saved us some headache from writing a custom json-to-rdf parser in a separate program to pre-process our data for Prolog. We see the potential of Prolog to be used in, or alongside modern applications due to its extended capabilities - we theorize that Prolog would make a good candidate as web servers or databases - being able to both serve a large amount of connections at a time, while also being able to maintain a huge library of facts - an ideal candidate for processing large amounts of data. Moreover, with its ability to natively be incorporated with Java through JPL, we see Prolog as an ideal contender to SQL itself, having simpler, more understandable queries, while also being able to be used alongside object-oriented code.

However, there are some minor inconveniences involving Prolog that we noticed. Prolog isn't suitable for computing joins over large datasets; attempting to build a schedule where both courses and sections are variables subject to other constraints will result in stack overflows and excessive runtimes. This is not very surprising, as the possible combinations of explore becomes factorial. A possible solution would me to merge the entities, and simply fold the attributes of courses into each of the sections of that course, so as to avoid computing the join. It is possible to run a query over one of the two entity types at a time, the only issue being that requesting another answer after exhausting the available answers will hang, as it takes a long time to exhaustively verify there are no further possible answers.

We found that the order in which predicates are defined in the body of a rule defines the the order in which they are evaluated, and when dealing with large datasets, the order can make a huge difference in the runtime and even be the difference between completing or freezing. This mars Prolog's focus on elegance and logic somewhat, as we need to think about search space optimize our queries are we write them.

An issue that we encountered was that Prolog was generating a very large amount of duplicate schedules, especially when scheduling multiple courses (6+) at a time. We found this difficult to prevent, needing to refactor most of our query structures as well as the order in which predicates are evaluated. From our research in mitigating this problem, we discovered that it is also quite simple to write meta-interpreters for Prolog, allowing any user to modify or extend the semantics of Prolog code itself. As different issues require different domain-specific tools, we see the birth of many different variants of Prolog suited for these different tasks.

All in all, we found that Prolog was very suitable to address the problem of scheduling, as well as many others, and is a good candidate to replace other modern tools/frameworks, such as databases or prediction services, while also being able to work alongside others, such as object-oriented code or web servers. The elegant and simplistic nature of Prolog allows for queries to be easily expressed and functions easy to read, significantly improving the maintainability and readability of existing codebases. The abstractions that Prolog provides eliminate common sources of verbosity while preserving performance. We see a bright future in Prolog.

Links to code

https://github.com/ginabolo/prolog-scheduler