Course:CPSC312-2023-Mini-C-Compiler

From UBC Wiki

What is the Problem?

Compile programs written in the C programming language into machine code for a specific target

What's the Original Language?

Subset of ANSI C (C89)

What are the Main Procedures?

  1. Preprocessing
  2. Tokenizing & Parsing
  3. Generating IR
  4. Assembling for the target architecture (x86, ARM, etc.)

What's the Target Language?

LLVM (Low-Level Virtual Machine) IR (Intermediate Representation)

Link to our Program:

https://github.com/Tengs-Penkwe/Haskell-C-Compiler

What Did We Learn?

  1. Tokenization
    • How to use the Text.Parsec library in Haskell to write a parser
    • How to define and tokenize different types of values in the language, including integers and floats.
    • How to define and tokenize identifiers in the language.
    • How to define and tokenize different types of symbols in the language, including special symbols, separators, and brackets.
  2. Parsing
    • How to define an AST for a programming language: declarations, statements, and expressions.
    • How to use monads in Haskell to handle errors and return values from parsing functions.
    • How to write a parser that generates an abstract syntax tree (AST) for the parsed input.
  3. Code Generation
    • How to maintain state in a stateful process
    • How to translate our AST into LLVM's AST
    • How to define an LLVM Module and Block
    • How this process should cooperate with the AST design
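
The tokenization and parsing points above can be illustrated with a minimal sketch. It assumes only the Text.Parsec library named in the list; the `Expr` type and the helper names (`lexeme`, `symbol`, `number`, `identifier`, `parseExpr`) are hypothetical and chosen for illustration, not taken from our repository. It recognizes integer and float literals, identifiers, and bracket symbols, and builds a small AST for arithmetic expressions.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical AST for a tiny expression fragment (not the project's real AST).
data Expr
  = IntLit Integer
  | FloatLit Double
  | Var String
  | BinOp Char Expr Expr
  deriving (Show, Eq)

-- Run a parser, then skip any whitespace that follows the token.
lexeme :: Parser a -> Parser a
lexeme p = p <* spaces

-- A single-character symbol such as '+' or '('.
symbol :: Char -> Parser Char
symbol = lexeme . char

-- Numeric literal: digits with an optional fractional part.
number :: Parser Expr
number = lexeme $ do
  whole <- many1 digit
  frac  <- optionMaybe (try (char '.' *> many1 digit))
  pure $ case frac of
    Nothing -> IntLit (read whole)
    Just f  -> FloatLit (read (whole ++ "." ++ f))

-- C-style identifier: a letter or underscore, then letters, digits, or underscores.
identifier :: Parser Expr
identifier = lexeme $
  Var <$> ((:) <$> (letter <|> char '_') <*> many (alphaNum <|> char '_'))

-- Literals, identifiers, or a parenthesized expression.
factor :: Parser Expr
factor = number <|> identifier <|> between (symbol '(') (symbol ')') expr

-- Left-associative '*' and '/', binding tighter than '+' and '-'.
term :: Parser Expr
term = factor `chainl1` (BinOp <$> (symbol '*' <|> symbol '/'))

expr :: Parser Expr
expr = term `chainl1` (BinOp <$> (symbol '+' <|> symbol '-'))

-- Errors come back in the Either monad as a ParseError.
parseExpr :: String -> Either ParseError Expr
parseExpr = parse (spaces *> expr <* eof) "<input>"
```

Under these assumptions, `parseExpr "1 + 2 * x"` yields a `Right` value whose tree nests the multiplication under the addition, while malformed input such as `"1 +"` yields a `Left` carrying the error position.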


When we started this project, we were not very familiar with how a compiler works. We learned that the main steps of compilation are to preprocess, tokenize, parse, generate an intermediate representation (IR), and finally convert the IR into the target language. Through this process, we discovered how useful Haskell and functional programming can be. Lectures had mentioned that problems which would be complex in other languages can often be solved quite simply in Haskell, and this project really showcased that idea. Because Haskell is so good at pattern matching, it is quite easy to work with trees. Since we relied heavily on algebraic data types in our project, creating this program in another language would have been much more complex.
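
As a rough illustration of why pattern matching over algebraic data types makes tree-walking code short, here is a sketch of a code generator for a toy expression tree. It is not our actual code generator: the `Expr` type, `CodegenState`, and the helpers `fresh`, `emit`, and `compileExpr` are hypothetical, the output is LLVM-like text rather than LLVM's own AST, and the state is threaded with `State` from the mtl package (`Control.Monad.State`).

```haskell
import Control.Monad.State

-- Hypothetical toy AST; the real project AST covers declarations and statements too.
data Expr
  = Lit Int
  | Add Expr Expr
  | Mul Expr Expr
  deriving Show

-- Generator state: the next unused temporary number and the
-- instructions emitted so far (stored in reverse order).
data CodegenState = CodegenState { nextTemp :: Int, instrs :: [String] }

type Codegen = State CodegenState

-- Allocate a fresh temporary name such as %t0, %t1, ...
fresh :: Codegen String
fresh = do
  n <- gets nextTemp
  modify (\s -> s { nextTemp = n + 1 })
  pure ("%t" ++ show n)

-- Record one instruction.
emit :: String -> Codegen ()
emit i = modify (\s -> s { instrs = i : instrs s })

-- One equation per constructor; returns the operand holding the result.
genExpr :: Expr -> Codegen String
genExpr (Lit n)   = pure (show n)
genExpr (Add a b) = binary "add" a b
genExpr (Mul a b) = binary "mul" a b

-- Generate both operands, then emit a single binary instruction.
binary :: String -> Expr -> Expr -> Codegen String
binary op a b = do
  x <- genExpr a
  y <- genExpr b
  t <- fresh
  emit (t ++ " = " ++ op ++ " i32 " ++ x ++ ", " ++ y)
  pure t

-- Run the generator from an empty state; returns the instructions in
-- program order together with the operand that holds the final value.
compileExpr :: Expr -> ([String], String)
compileExpr e =
  let (result, st) = runState (genExpr e) (CodegenState 0 [])
  in (reverse (instrs st), result)
```

With these definitions, `compileExpr (Add (Lit 1) (Mul (Lit 2) (Lit 3)))` produces the instructions `%t0 = mul i32 2, 3` and `%t1 = add i32 1, %t0`, with `%t1` as the result operand. Adding a new kind of expression only requires a new constructor and one more equation in `genExpr`.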