Course:CPSC311/2010WT1/Final Exam Appendix

From UBC Wiki

Do not remove: This reference sheet is the appendix for the CPSC 311 2010W1 Final Exam. Only the first 10 printed pages are guaranteed to be printed; so be compact! Deadline for edits: 0900 Monday 20 December.

Final Appendix (Wiki, Student-Generated; credits at end in print version)

Midterm 1 Material

Multiple Argument Function Conversion Example

(with (f (fun (x y z) (+ x y)))
  (f 1 2))
==>
 (with (f (fun (x) (fun (y) (fun (z) (+ x y)))))
   ((f 1) 2)

Definitions

Abstract syntax - idealized syntax, designed for convenient use by the underlying interpreter/compiler; result of parsing a concrete syntax expression
Concrete syntax - syntax written by the user; consumed by the parser
Closure - a wrapper that contains an expression (function/ binding) and its environment; expression saved until it needs to be used
CPS stands for Continuation-Passing Style (a.k.a. Tail Call Elimination)
Static Scope: In a language with static scope, the scope of an identifier's binding is a syntactically delimited region. A typical region would be the body of a function or other binding construct.
Dynamic Scope: The scope of an identifier's binding is the entire remainder of the execution during which that binding is in effect.
First-order Functions are not values in the language. They can only be defined in a designated portion of the program, where they must be given names for use in the remainder of the program.
Higher-order Functions can return other functions as values and take other functions as parameters.
First-class Functions are values with all the rights of other values. In particular, they can be supplied as the value of arguments to functions, returned by functions as answers, constructed as values by expressions, and stored in data structures.
Deferred Substitution Every time we encounter a substitution (in the form of a with or application), we augment the repository with one more entry, recording the identifier's name and the value (if eager) or expression (if lazy) it should eventually be substituted with. We continue to evaluate without actually performing the substitution. To prevent dynamic scoping, use closures to store the environment in which the functions are created
Cache calculated substitution values are saved the first time they are calculated. By including a 3rd field, that is either false or contains the calculated value

Eager vs. Lazy Evaluation

Lazy With- we reduce the named expression to a value only when we need to
Lazy Functions- we do not reduce the argument to a value until necessary
Strictness Points- the points where the implementation of a lazy language forces an expression to reduce to a value (example: in a function application, need to evaluate exactly what function to invoke, can't just leave it as a function closure)
Eager- the named expression is reduced to a value before being substituting it.

Recursion implementation by patching the environment

Fixed-point of a function: a value that, when input to a function, is also output from the function.

interp the bound expression with a fake environment binding, and then patch that binding to point to the result of said interp afterwards using set! mutation. This creates a cyclic environment that addresses the needs of recursion. The cyclicity ensures that there is always "one more binding" for the recursive function when we need it.

(rec (bound-id named-exp body)
  (local ([define new-env (anEnv bound-id (numV 0) env)]
          [define named-value (interp named-exp new-env)])
    (begin
      (set-anEnv-val! new-env named-value)  ;; Patches false out for the correct value
      (interp body new-env))))

Racket syntax and General Examples

BNF

 <CFWAE> ::= <num>
   | {+ <CFWAE> <CFWAE>}
   | {- <CFWAE> <CFWAE>}
   | {* <CFWAE> <CFWAE>}
   | {/ <CFWAE> <CFWAE>}
   | <id>
   | {if0 <CFWAE> <CFWAE> <CFWAE>}
   | {with {<id> <CFWAE>} <CFWAE>}
   | {fun {<id> ...} <CFWAE>}
   | {<CFWAE> <CFWAE> ...}
   | {rec {<symbol> <CFWAE>} <CFWAE>}

Conditional chains

 (cond
     [(number? v) ...]
     [(boolean? v) ...]
     [(string? v) ...]
     [else false])

Map performs a procedure on all elements of the list in order

 (map (lamda (x) (* 2 x)) [1, 2, 3]) 
 outputs [2, 4, 6]

Pattern matching

 '...' or '___' means 0 or more
'..n' or '__n" means n or more
'_' matches anything

==>

 (match data
  [pattern  action]
     [(? number?) (foo)]              ;; check if data is number 
     [(list a b c) (foo a b c) ]      ;; back-reference of list elements
     [((and n (? number?))) (foo n)]  ;; check data and back-reference it as n
     [else (foo)])

Begin evaluate expressions in order. Results are ignored except for that of last expression, which is the evaluated value. In this case, the side-effect of printing "Hello World" occurs, however the expression overall evaluates to 4.

 (begin
   (+ 5 2) (printf "Hello World!\n") (+ 3 1))

Functions
(lambda (x) x) ; first-class value, not used here
(define f (lambda (x) x))
(define (f x) x)
Type-case is a conditional construct specific to a particular define-type

 (type-case WAE expr
   [num (n) expr]
   ...)

Racket code specific to implementation of our languages

Environment: - repository of deferred substitutions

(define-type Env
     [mtEnv]
     [anEnv (name symbol?)(value CFWAE-Value?)])

Expression Closures:

(define-type CFWAE-Value
     [numV (num number?)]
     [exprV (body CFWAE?)(env Env?)] ;; used to hold an unevaluated expression and its environment (used in lazy interp)
     [closureV (param symbol?)(body CFWAE?)(env Env?)]) ;; used to close over the existing environment to ensure static scope

Define-type (From assignment 1)

(define-type WAE
 [num (n number?)]
 [binop (op procedure?) (lhs WAE?) (rhs WAE?)]
 [with (b Binding?) (body WAE?)]
 [with* (lob (listof Binding?)) (body WAE?)]
 [id (name symbol?)])

Strict code (from lecture)

(define (strict val)
 (if (expV? val)
     (strict (interp (expV-exp val) (expV-env val)))
     val))

Midterm 2 Material

State

  • Entails mutation
  • State is "threaded through" a program in the sense that the state that results from evaluating a given function is the state in which the next sequential function is to be evaluated in.
  • State is inherently dynamic and so isn't supported by solely the environment, as it has static scope in our implementations.
    • The environment protects static scope while the store tracks the dynamic changes
    • The environment maps identifiers to either locations (or values if there is no store) while the store maps locations to values.
  • Closures do not remember state as function are to be applied using the state that exists where they are applied.

Mutability implementation data structures

To support mutability in a language, we need two data structures:

Environment: This is our regular substitution list required to support static scope.

  • Maps either:
    • Identifiers to values: Makes sense when we are differentiating between mutable and immutable identifiers. If it is immutable, it will simply map to a regular value. If it is mutable, it will map to a boxV value with a store location stored within.
    • Identifiers to store locations: Only store locations are stored within the environment. Thus, even for an immutable value, there is always a two-step lookup: the environment maps the identifier to a store location, and the store maps the store location to a value.
  • Identifiers in the environment have lexical scope: a particular environment belongs to a particular expression scope.

Store: This is where we keep track of dynamic variable values.

  • Maps location to a value (which can be a boxV pointing to another location itself).
  • The store is "threaded through" the evaluation across sub-expressions.
  • Mutation occurs when we change a value in the store.

"Whereas the interpreter employs the same environment for both arms of an addition, for instance, it cascades the store from one arm to the next and then back out alongside the resulting value."

Meta vs Syntactic Interpreters

Syntactic Interpreter - An interpreter that uses the interpreting language to represent only terms of the interpreted language, implementing all the corresponding behavior explicitly

  • doesn't use Scheme's implementation of things (like numbers)
  • doesn't matter how well the interpreting and interpreted languages correspond

Meta Interpreter - An interpreter that uses language features of the interpreting language to directly implement behaviour of the interpreted language

  • easy to write when there is a strong match between interpreted and interpreting language
  • uses Scheme's implementation of things (closures, procedure applications, numbers, etc)

Meta-Circular Interpreter - A meta interpreter in which the interpreting and interpreted language are the same

  • Only uses Scheme implementation of things

Continuations

What are they?

  • stack is represented procedurally
  • remember only the result and what is left to do (passes the state along or back to caller)
  • roughly the same idea as tail recursion
Why? Why Not?
  • Saves on memory and computation (ex. tail recursion summation vs augmenting recursion summation)
  • A way of recording state when the processing computer is not always available (ex. web-servers)
  • get rid of machine optimizations by forcing the structure of stack into continuation
  • memory wastage if language needlessly creates stack frames
  • need access to the source of entire program otherwise CPS translator may fail
  • security: what if someone outside knew how to read the stack procedure?

Continuation Passing Style (end in /k)

Main Ideas

  • every function takes an extra argument (its continuation)
  • every argument in a function call must be either a variable or a lambda expression (not a more complex expression).
  • every call is a tail call

Example from midterm 2
We have the following, and want to convert it to CPS (assuming (all-pos/k) and (range/k) are correctly implemented, and represent CPS versions of all-pos and range, respectively:

(define (test-all-pos)
 (if (all-pos (range -5 5))
     'fail
     'pass))

==>

(define (test-all-pos/k k)
 (range/k -5 5
  (lambda (range-result)
   (all-pos/k range-result
    (lambda (allpos-result)
     (k (if allpos-result
           'fail
           'pass))))))

Note that this assumes if is in CPS.

(define (map f l)
 (if (empty? l)
    empty
    (cons (f (first l)) 
             (map f (rest l)))))

==>

(define (map/k f/k list k)
 (if (empty? list)
     (k empty)
     (f/k (first list)
          (lambda (f-result)
            (map/k f/k (rest list)
                   (lambda (r-result)
                     (k (cons f-result r-result))))))))

Note that this assumes empty, first, rest, and cons are all in CPS.

Variables

Call-by-value Call-by-reference
  • evaluated argument is held in a new location,
    • Therefore, changes to the content of that location in the store don't affect the actual parameter
  • pass a reference to the actual argument, not the value
  • updates to the reference within the called procedure will become visible to the calling context
  • cheaper to use (no additional allocation) but introduce problems
  • to implement:
    • create a closure and give it the location of the actual argument
    • uses l-value: env lookup without store lookup
    • any mutations to the formal parameter are now changes to the same location as the actual parameter
Stateful Stateless
  • server maintains state information
  • easier to program
    • don't need setup and breakdown of state at each interaction
  • ex: FTP
    • interp of each command is relative to history of past commands
  • does not retain record of prior communication
  • Web application must completely restore state of the computation for each interaction
  • server can handle higher loads
  • server can ignore clients who don't appear to be active
  • must transmit enough data to resume computation

Web Programs

Receiver - a procedure of one argument representing the pending computation

  • any computation not mentioned in the receiver never gets performed because of the program's termination after each iteration

Lifting - make nested procedures into top-level procedures

  • e.g. lambdas become defines

Make a program Web-Ready

  • 1. Generate receivers that capture pending computations
  • 2. Pass values to receivers instead of returning them

Implications

  • 1. Order of evaluation
  • 2. Transformation is global
    • all procedures in program must consume an extra receiver
  • 3. Sequentializes the program

Escaper Lambdas Example

From assignment 7 (midterm 2 review), we want to describe the continuation of the bolded expression (namely (* (- x 32) 5) ) below:

(define (fahrenheit-celcius-converter type x)
  (case type
    ['Fahrenheit (/ (* (- x 32) 5) 9)]
    ['Celcius    (+ 32 (/ (* x 9) 5))]))

  (+ (fahrenheit-celcius-converter 'Fahrenheit 32) 
     (fahrenheit-celcius-converter 'Fahrenheit 212))

We can use escaper lambdas to do this. The continuation of the bolded expression in the first evaluation of the function is:

(lambda^ (value) (+ (/ value 9) (fahrenheit-celcius-converter 'Fahrenheit 212)))

And the second continuation is: (lambda^ (value) (+ 0 (/ value 9)))

Post-Midterm 2 Material

Lambda Calculus

Consists of:

  • 1-parameter functions
  • 1-argument function applications
  • identifiers

Shrinking of a language is achieved by showing that a given component of the language may be replaced by a combination of other components making the component unnecessary. (ex. numbers are unnecessary as they can be represented by a function that represents zero and a successor function) By creating a simplified language it is easier to construct proofs showing that the language has desired properties. (ex. correctly implements tail-call optimization, never accesses an array out of bounds, ...)

Representing zero, succession and sum using lambda calculus:

zero_ = (lambda (f) (lambda (x) x)) ;; apply the function f zero times
succ_ = (lambda (n) (lambda (f) (lambda (x) (f ((n f) x)))))   ;; apply the function f n times
sum_ = (lambda (m)  (lambda (n)  ((n succ) m)))

Eliminating Recursion

While reducing the language closer to the lambda calculus, we needed a fixed-point function to avoid the problem of having unbound errors in recursion definitions (eventually would get to some expression of the form (f f))

These are the lecture notes that show how we created the fixed-point function and used it in implementing factorial (_mul is multiplication, _pred is predecessor (i.e. "decrement"):

(define _fixmaker
  (lambda (fixmaker)
    (lambda (f)
      (f (lambda (x) (((fixmaker fixmaker) f) x))))))

;; Then, we'll give it itself:
(define _fix
  ((lambda (f) (f f))
   _fixmaker))

;; Now, we can use fix to define recursive functions:
(define _fac
  (_fix (lambda (fac)
          (lambda (n)
            (((_if (_zero? n))
              (lambda (IGNORE) _one))
             (lambda (IGNORE) ((_mul n) (fac (_pred n)))))))))

Here is the Y-combinator all of its lambadic glory:

(lambda (p)
  ((lambda (f)
     (f f))
   (lambda (f)
     (p (f f)))))

Type Checking

Advantages More Advantages Disadvantages

Safety: A type checker can guarantee certain errors won't happen in the program before we run it.
Readability: Type declarations give information about the code to a programmer, and is never outdated in a program that compiles (unlike comments). This in itself is a form of Documentation.

  • Exploit types to make programs run faster, use less space, etc

Efficiency: Type declarations document static properties at compile time, so we don't have to check them each time code is visited during run time.

  • Reduce time spent debugging.
  • Catch errors in code that is not executed by the programmer (good if test suite is weak)
  • Prohibits certain kinds of terrible coding styles

Loses Expressiveness: some things will never produce errors but can't be type-checked will be rejected.
More Work at Start: upfront work is necessary before runtime testing.

Type - any property that can be established without executing the program. Types record kind of value not the precise value

A type system is that part of a programming language (definition and implementation) that concerns itself with making sure that no operations are performed on inappropriate (types of) arguments.

A language is “Type Safe” (or just “Safe”) if it has a type system that ensures that no operations can be performed on inappropriate arguments without this being detected. The property of type safety can be achieved by static checking (=detecting before runtime), dynamic checking (=detecting at runtime) or a combination of both!

Type Judgements - Collection of rules. One type rule for every syntactic construct. At least one type rule applies to every sub-term. A type error occurs when we are unable to construct a type judgment tree

  • Relation between type judments for functions and applications:
    • Function declaration: assume arg has right type, guarantee that the body will have the promised type
    • Function application: guarantee the argument has the right type for function, assume the result will have the type the function promises
Function Application Recursion
Г[i <- τ1] |- b : τ2
Г |- fun {i : τ1} : τ2 b} : (τ1 ->τ2
Г |- e1: τ1 -> τ2, Г |- e2: τ1
Г |- {e1 e2}: τ2
Г[i <- τi] |- b : τ, Г[i <- τi] |- v : τi
Г |- {rec {i : τi v} b} : τ
  • Type Procedures (^) (polymorphic function)

e.g. map : (for all) a, b. list(a) × (a -> b) -> list(b)

Soundness:

  • A program P typechecks without error --> No possible runs of P have "forbidden errors"
  • Typechecker says P has an error <-- Some possible runs of P have "forbidden errors"

Metavariables - Not program variables. Stand in for program text

Typing Control

  • Cannot write an infinitely long type
  • Strongly Normalizing: No matter what program you write in a strongly normalized language, it will always terminate
  • Typing Recursion: extend the environment for body to initiate recursion, extend again for v to sustain it
  • Datatype Variant Tags -- With static type checker, only need to store variant since type is guaranteed by type checker. Better space conception

Type Inferencing

  • Constraints are generated by traversing the abstract syntax tree and determining what types expressions are required to have by the operations that act with them.
  • The unification algorithm is applied on the generated constraints: If the constraints unify a type can be determined for the expression otherwise type inferencing fails on the expression.

Unification Algorithm

1. If X and Y are identical identifiers, do nothing.
2. If X is an identifier, replace all occurrences of X by Y both on the stack and in the substitution, and
   add X |-> Y to the substitution.
3. If Y is an identifier, replace all occurrences of Y by X both on the stack and in the substitution, and
   add Y |-> X to the substitution.
4. If X is of the form C(X1,...,Xn) for some constructor C, and Y is of the form C(Y1,...,Yn) (i.e., it
   has the same constructor), then push Xi=Yi for all 1 <= i <= n onto the stack.
5. Otherwise, X and Y do not unify. Report an error.

Occurs Check - Throw an error if the replacee is in the replacer (ie - recursion)

Principal Type - Imposes only enough constraints needed for type soundness, and no more

Garbage Collection

Why manual?

1. Automatic GC is a slow and expensive process, therefore let people handle it and only do as much as is necessary.
2. Soundness over completeness -> Automatic GC cannot detect all correct instances of GC (can fail to reclaim, or reclaim too early)

Why not manual?

1. People make mistakes when de-allocating memory. 
   (i.e. Reclaiming something that is in use, or failing to reclaim unused memory)
2. Loss of structural simplicity 
   (e.g. multiple versions of the same code that differ only in memory management)
3. Loops often lose tail-calling behaviour

Algorithms for GC:

1. Pointer Counting -> Collect anything with 0 things referencing it (Cyclic Pointers!)
2. Mark and sweep
   0) System is low on memory
   1) Mark root set as always live, and recursively Mark down any reachable children of root set as live.
   2) anything that can't be reached is trash and is Sweeped.

GC should demonstrate:

  • Utility: - must reclaim enough garbage to actually help computation
  • Soundness: - never reclaim something that is still needed
  • Efficiency: - Run quickly

Space Leakage: - Collector not reclaiming space that we know is no longer needed (eg. global variables that are no longer needed)

Explicit Polymorphism

(define length
  < Λ (τ)
   (lambda (l : list (τ)) : number
     (cond
       [(Empty? <τ> l) 0]
       [(Cons? <τ> l) (add1 (length <τ> (Rest <τ>  l)))]))>)
  • The expression (Rest <τ> l) first applies Rest to τ, resulting in an actual rest procedure that applies to lists of values of type τ
  • This procedure consumes l as an argument and proceeds as it would in the type system free case
  • Every type-parameterized procedure, such as Rest or length, is a generator of infinitely many procedures that each operate on specific types.
  • Type Elaborator : The phase that performs type applications.

Implicit Polymorphism

  • Want to get new identifiers for the ones bound by the closures but not the ones in its lexical scope (lexical scope variables are shared between all applications of the closure). Only get fresh type varaibale for types introduced by let or letrec

Г|- v : τ' Г[x <- CLOSE(τ', Г)]|- b : τ
Г |- (let ([xv])b):τ

  • to get fresh type variables:

Г|- e : CLOSE(τ', Г')
Г |- e : τ

  • where τ is the same as τ', except the renaming applies only to type variables in τ' that are not bound by Г