Authors: Finn Hackett, Peter Deltchev
What is the problem?
In a lot of lower-lever languages, code can be written that does bad things with memory. Think "segmentation fault", or strange data corruption. By making some assumptions regarding what is intended behaviour, it is possible to determine from static analysis if a program contains a large set of common bugs.
We intend to write a program in Prolog that is able to detect memory-related errors by analysing an abstract representation of programs that are provided to it.
Languages that may benefit from such analysis: C, C++, assembly?, languages that can get NullPointerException or somesuch
Partly inspired by the latter part of this talk: https://www.youtube.com/watch?v=hEx5DNLWGgA&index=2&list=PLHTh1InhhwT75gykhs7pqcR_uSiG601oh
Errors we hope to be able to detect:
- out-of-bounds read/write
- unhandled corner cases
- mishandled memory ownership (memory leaks, double-free, dangling pointer)
What is the something extra?
Some interesting aspects to this project are the following planned features:
- Parsing actual code from a file (in our example language)
- Fully generic abstract algorithm that should support a wide range of languages given the appropriate converters/parsers
What did we learn from doing this?
In writing the source code tokenizer+parser, we learned how to perform basic I/O with the filesystem as well as string manipulation in Prolog - neither of these topics were covered in class and were intriguing to learn about. Pattern matching in particular was well suited to determining what a particular syntactic element represented but Prolog's capabilities for string manipulation were underwhelming and limiting.
In writing the memory checker, we got to understand a bit more about larger-scale program design in Prolog. Interesting mentions include the subtleties of operating on difference lists across multiple nested predicate calls, and how the member predicate can be used to extract elements from a list by unification instead of directly recursing over the list. The main tricky part was the sheer amount of state that needed to be passed around, and the monstrous functions with many parameters that it lead to. An improvement on our end could be packing a lot of related state into parameterised predicates so that they can be passed around more generically by code that does not need to know their exact contents.
The project as a whole has been an interesting exploration of what kind of data structures can be built and processed using Prolog, since we had to build and navigate an Abstract Syntax Tree of sorts. We also explored the limitations of Prolog regarding data typing and its tolerance for programming error - sometimes we only noticed a mis-typed parameter to a predicate after a lot of testing.