From UBC Wiki

Syntax Validator

Authors: Laura Barton, JM Messham

What is the problem?

For a program to compile, it must be free of all syntax errors. We have designed a tool which, given a C++ program, will determine whether there are syntax errors (e.g., missing semicolon delimiters, unbalanced brackets). Whenever possible, the line number that the error occurred on will be returned as part of the output.

What is the something extra?

In addition to running the syntax validator by manually inputting your code, it can be by using the included Emacs plugin (written in Lisp). This allows validation of larger C++ files that would be impractical to type directly into swipl. Basic instructions for setting up and using the plugin are included in the README, although some familiarity with Emacs is recommended.

What did we learn from doing this?

Overall, Prolog is suitable for this task; syntax validation is essentially pattern matching, which Prolog excels at. The biggest hurdle was figuring out an efficient way to pass non-trivial amounts of code to our program, however this was surmountable by writing our "something extra" emacs plugin.

The simplest task was semicolon delimitation; since this should typically occur at the end of a line, it was clear where to expect the character. Balanced brackets was a bit trickier; not only is it challenging to predict where the closing bracket will occur (it could be anywhere from a few characters to the whole file away), but there are both () and {} to deal with (we did not include square brackets for indexing, although these could be easily added). In order to accommodate this, we essentially treated our bracket validator as a stack, building up a list of open brackets and popping them off as the closing brackets appeared. If the list was empty at the end of the parsing process, we knew that all brackets had found matching pairs.

In terms of Prolog itself, we learned to use the cut (!) operator. This helped the program's output make sense, since it didn't repeat itself with all possible sets of errors. Some other Prolog built-ins came in handy, like string_concat and sub_atom, which are similar to difference lists except they work with strings.

Lingering questions are those related to the efficiencies of various methods of processing strings; it appear that if one wants to examine all characters in a string, or even a character at a particular index in a string, this can become cumbersome very quickly. At the same time, there were many ways to make it work. We're unsure which ways would scale the best, or what other tradeoffs we might be making.