# CPSC 320 2009W2 Final Exam Reference Sheet

Do not remove: This reference sheet is the appendix for the CPSC 320 2009W2 final exam. Only the first 10 printed pages are guaranteed to be printed; so be compact! Deadline for edits: 11:59PM Sunday Apr 18.

NOTE: to initialize the page, I have simply copied in the Midterm #2 reference sheet.

## Reducing for proving NP Completeness

If we are trying to prove that problem A is NP-Complete, show that:

1. A is in NP (e.g. there is a certificate for A verified in polynomial time), and
2. We can reduce B to A in polynomial time, where B is already shown to be NP-complete, OR
1. Show that A is NP-hard

A problem H is NP-hard iff there is NP problem that can be reduced in poly. time to H

## Important Theorem

If any NPC problem is P time solvable, then P = NP. Equivalently, if any NP problem is not in P then no NPC is in P. Y is Polynomial Time Reducible to X Solve problem Y with a polynomial number of computation steps and a polynomial number of calls to a black box that solves X

## Dynamic Programming

• A problem must have optimal substructure and overlapping subproblems for dynamic programming to be considered. Note that the overlapping subproblems should only be slightly smaller; if they are something like half the size of the original problem, we'd use divide and conquer instead.

Directly from Wikipedia:

• Top-down approach: This is the direct fall-out of the recursive formulation of any problem. If the solution to any problem can be formulated recursively using the solution to its subproblems, and if its subproblems are overlapping, then one can easily memoize or store the solutions to the subproblems in a table. Whenever we attempt to solve a new subproblem, we first check the table to see if it is already solved. If a solution has been recorded, we can use it directly, otherwise we solve the subproblem and add its solution to the table.
• Bottom-up approach: This is the more interesting case. Once we formulate the solution to a problem recursively as in terms of its subproblems, we can try reformulating the problem in a bottom-up fashion: try solving the subproblems first and use their solutions to build-on and arrive at solutions to bigger subproblems. This is also usually done in a tabular form by iteratively generating solutions to bigger and bigger subproblems by using the solutions to small subproblems. For example, if we already know the values of F41 and F40, we can directly calculate the value of F42. (Referring to the Fibonacci problem)

Using dynamic programming: (from http://www.cs.cornell.edu/courses/cs482/2004su/handouts/dp.pdf)

Step 1: Define your sub-problem. Tell us in English what your sub-problem means, whether it look like P(k) or R(i, j) or anything else.

Step 2: Present your recurrence. Give a mathematical definition of your sub-problem in terms of “smaller” sub-problems.

Step 3: Prove that your recurrence is correct. Usually a small paragraph.

Step 4: State your base cases. Sometimes only one or two or three bases cases are needed, and sometimes you’ll need a lot (say O(n)).

Step 5: Present the algorithm. This often involves initializing the base cases and then using your recurrence to solve all the remaining sub-problems. You want to ensure that by filling in your table of sub-problems in the correct order, you can compute all the required solutions. Finally, generate the desired solution. Often this is the solution to one of your sub-problems, but not always.

Step 6: Running time.

Matrix Chain Multiplication
M(i,j) = min i<=k<j ( M(i,k) + M(k+1,j) + pi-1*pk*pj )

Longest Common Subsequence
C(i,j) = C(i-1, j-1) + 1 if x[i] = y[i]

max(C(i-1,j), C(i,j-1)) otherwise

Longest Increasing Subsequence
L(j) = 1 if ak > aj

max 1<=k<j ( L(k) ) + 1 otherwise

0-1 Knapsack problem
Value(i,w) = 0 if i=0 or w=0

Value(i-1,w) if wi > w
max(Value(i-1, w), Value(i-1, w - wi) + vi) otherwise

Balls and Jars
Ways(n,k) = Ways(n,k-1) + Ways(n-k,k)
Servers
MinCost(i) = 0 if i>n

min1<=k<j (ck + sumj=i -> k-1 (k-j) + MinCost(k+1) otherwise

Word Segmentation
BQ(i) = 0 if i = 0

max0<=j<i (BQ(j) + quality(yj+1,..,yi)) if i>0

Travelling SalesPerson
C(S,j) = min( C(S-{j}, i) + dij)

Coin Change Making
C(j) = 0 if j =0

1 + min1<=i<k(C(j-di) if j>=1

Moving on a Grid
A(i,j) = C(i,j) + min(A(i-1,j-1),A(i-1,j), A(i-1,j+1))

Boolean Parenthization
T(i,j) = sumi<=k<j T(i,k)T(k+1,j)

(T(i,k)+F(i,k))(T(k+1,j)+F(k+1,j)) - F(i,k)F(k+1,j)
T(i,k)F(k+1,j)+F(i,k)T(k+1,j)

RNA Prediction Structure
W(i,j) = -1 + W(i+1,j-1) if i and j can pair

mini<=k<j (W(i,k)+W(k+1,j)) otherwise

TOP-DOWN-SEGMENT(y; i)

if table[i]: bq is null

then best-choice 0
best-bq quality(y1 : : : yi)
for j 1 to i 􀀀 1
do bq TOP-DOWN-SEGMENT(y; j) + quality(yj+1 : : : yi)
if bq > best-bq
then best-choice j
best-bq bq
table[i]: bq best-bq
table[i]: last -end best-choice
return table[i]: bq


Here’s a bottom-up version:
BOTTOM-UP-SEGMENT(y; n)

for i   1 to n

do best-choice 0
best-bq quality(y1 : : : yi)
for j 1 to i 􀀀 1
do bq table[j]: bq +quality(yj+1 : : : yi)
if bq > best-bq
then best-choice j
best-bq bq
table[i]: bq best-bq
table[i]: last -end best-choice
return table[n]: bq


## Amortized Analysis

### Aggregate Analysis

Show that for any n, a sequence of n operations takes worst case ${\displaystyle T(n)}$ in total. The average (amortized) cost per operation is then ${\displaystyle {\frac {T(n)}{n}}}$, and this cost applies to every operation in the sequence, not just a single type of operation in the sequence.

### Accounting Method

Charge one type of operation more early in the sequence, and use the excess credit to pay for other operations later in the sequence. Here, different operations can have a different cost. We require

${\displaystyle \sum _{i=1}^{n}{\widehat {c}}_{i}\geq \sum _{i=1}^{n}c_{i}}$

Where ${\displaystyle {\widehat {c}}_{i}}$ is the amortized (charged) cost of the ${\displaystyle i^{th}}$ operation and ${\displaystyle c_{i}}$ is the actual cost of the ${\displaystyle i^{th}}$ operation. The total credit stored in the data structure is thus ${\displaystyle \sum _{i=1}^{n}{\widehat {c}}_{i}-\sum _{i=1}^{n}c_{i}}$ and we must always have nonnegative credit.

### Potential Method

Similar to the accounting method, we store prepaid charges in the data structure, but we store the credit as a whole in the data structure, not for individual operations. This "total credit" is the potential. We have a potential function ${\displaystyle \Phi :D_{i}\rightarrow \mathbb {R} }$ where ${\displaystyle \Phi (D_{i})}$ is the potential stored in the data structure ${\displaystyle D_{i}}$.

The amortized cost ${\displaystyle {\widehat {c}}_{i}=c_{i}+\Phi (D_{i})-\Phi (D_{i-1})}$. Thusly, the total amortized cost is the actual cost plus the potential stored in the data structure:

${\displaystyle \sum _{i=1}^{n}{\widehat {c}}_{i}=\sum _{i=1}^{n}c_{i}+\Phi (D_{n})-\Phi (D_{0})}$

And we require that ${\displaystyle \forall i\;\Phi (D_{i})\geq \Phi (D_{0})}$ then we guarantee that we pay for each operation in advance.

• If ${\displaystyle \Phi (D_{i})-\Phi (D_{i-1})>0}$, then ${\displaystyle {\widehat {c}}_{i}}$ represents an overcharge on the ${\displaystyle i^{th}}$ operation
• If ${\displaystyle \Phi (D_{i})-\Phi (D_{i-1})<0}$, then ${\displaystyle {\widehat {c}}_{i}}$ represents an undercharge on the ${\displaystyle i^{th}}$ operation, and the potential decrease pays for the actual cost of this ${\displaystyle i^{th}}$ operation

## Optimal vs. Greedy

• Optimization problem: The problem of finding the best solution from all feasible solutions, according to some metric.
• Greedy algorithm: Repeatedly make the "locally best choice" until the choices form a complete solution.
• Optimal substructure: Optimal solution to the problem is composed of pieces which are themselves optimal solutions to subproblems.
• Greedy-choice property: Locally optimal choices can be extended to a globally optimal solution.

How does one prove that a greedy solution is optimal? Two ways:

• Greedy algorithm stays ahead: Measure the greedy algorithm's progress in a step-by-step fashion, and show that it does better than any other algorithm at each step.
• Exchange argument: Gradually transform any given solution to the greedy solution.

e.g Exchange argument for Activity Selection

• Let O be some optimal solution and G a greedy solution (where we pick the earliest finish time) to an activity selection problem.
• Let Ai be the first activity that differs between G and O. That is, let Ai conflict with some elements ALT in O.
• Three Cases
• ALT = 0 (no conflicts). Contradiction because then we could add Ai to O and get a better sol'n so O was not optimal.
• ALT = 1. We can now swap Ai and ALT because Ai ends earlier than ALT and does not conflict with anything else in O. If Ai did not end earlier then G would've just picked ALT instead.
• ALT >= 2. Conflict have one element in ELT ending before Ai and some element ending after Ai. This is a contradiction because then G would've picked the element in ELT that ended before Ai.

Since O's conflict with G is always a single item that can be swapped, keep swapping until O == G. Now Size(O) == Size(G) and we G gave us an optimal sol'n.

## Graphs

### Graph definitions

A graph G = (V, E) is composed of vertices ${\displaystyle V=\{v_{1},v_{2},...,v_{n}\}}$ and edges ${\displaystyle E=\{e_{1},e_{2},...,e_{n}\}}$ where each ${\displaystyle e_{i}}$ connects two vertices ${\displaystyle (v_{i1},v_{i2})}$.

• sparse: ${\displaystyle |E|}$ is much less than ${\displaystyle |V|^{2}}$
• dense: ${\displaystyle |E|}$ is close to ${\displaystyle |V|^{2}}$
• acyclic - A graph is acyclic if it contains no cycles; unicyclic if it contains exactly one cycle; and pancyclic if it contains cycles of every possible length (from 3 to the order of the graph).
• weakly connected - replacing all of its directed edges with undirected edges produces a connected (undirected) graph
• strongly connected - there is a path from each vertex in the graph to every other vertex. Implies graph is directed
• spanning tree: a subset of edges from a connected graph that touches all vertices in the graph (spans the graph) and forms a tree (is connected and contains no cycles)
• minimum spanning tree: the spanning tree with the least total edge cost
• cut (S, V-S)
• light edge
• any undirected graph (not necessarily connected) has a minimum spanning forest, which is a union of minimum spanning trees for its connected components.

### Representing a graph in memory

• adjacency-list representation: efficient for sparse graphs
• memory required: ${\displaystyle O(V+E)}$
• adjacency-matrix representation: efficient for dense graphs
• memory required: ${\displaystyle O(V^{2})}$

### Kruskal (MST algorithm)

TODO: analyzed runtimes, and proof.

1. Let ${\displaystyle Q}$ be a min-priority queue of the edges in E
2. Let ${\displaystyle T=\{\}}$ (the minimum spanning tree being built)
3. While ${\displaystyle Q}$ is not empty do:
1. ${\displaystyle (u,v)=Q.deleteMin}$
2. If ${\displaystyle u}$ is not connected to ${\displaystyle v}$
1. ${\displaystyle T=T\cup \{(u,v)\}}$
2. Record that ${\displaystyle u}$ and ${\displaystyle v}$ are now connected.

Note: To efficiently implement the priority queue, use a binary heap. To efficiently implement checking for and changing connectivity, use the up-tree disjoint-set union-find data structure (with union-by-rank and path compression).

Where E is the number of edges in the graph and V is the number of vertices, Kruskal's algorithm can be shown to run in O(E log E) time, or equivalently, O(E log V) time, all with simple data structures.We can achieve this bound as follows: first sort the edges by weight using a comparison sort in O(E log E) time; this allows the step "remove an edge with minimum weight from S" to operate in constant time. Next, we use a disjoint-set data structure (Union&Find) to keep track of which vertices are in which components. We need to perform O(E) operations, two 'find' operations and possibly one union for each edge. Even a simple disjoint-set data structure such as disjoint-set forests with union by rank can perform O(E) operations in O(E log V) time. Thus the total time is O(E log E) = O(E log V).

If the graph is not connected, then it finds a minimum spanning forest (a minimum spanning tree for each connected component).

### Prim (MST algorithm)

The algorithm continuously increases the size of a tree starting with a single vertex until it spans all the vertices.

   * Input: A connected weighted graph with vertices V and edges E.
* Initialize: Vnew = {x}, where x is an arbitrary node (starting point) from V, Enew = {}
* Repeat until Vnew = V:
o Choose edge (u,v) with minimal weight such that u is in Vnew and v is not (if there are multiple edges with the same weight, choose arbitrarily but consistently)
* Output: Vnew and Enew describe a minimal spanning tree


Time Complexity:

Minimum edge weight data structure Time complexity (total)
binary heap and adjacency list O((V + E) log(V)) = O(E log(V))
Fibonacci heap and adjacency list O(E + V log(V))

There are a number of specialized heap data structures that either supply additional operations or outperform the above approaches. The binary heap uses O(log n) time for both operations, but allows peeking at the element of highest priority without removing it in constant time. Binomial heaps add several more operations, but require O(log n) time for peeking. Fibonacci heaps can insert elements, peek at the maximum priority element, and increase an element's priority in amortized constant time (deletions are still O(log n)).v

Proof: Let P be a connected, weighted graph. At every iteration of Prim's algorithm, an edge must be found that connects a vertex in a subgraph to a vertex outside the subgraph. Since P is connected, there will always be a path to every vertex. The output Y of Prim's algorithm is a tree, because the edge and vertex added to Y are connected. Let Y1 be a minimum spanning tree of P. If Y1=Y then Y is a minimum spanning tree. Otherwise, let e be the first edge added during the construction of Y that is not in Y1, and V be the set of vertices connected by the edges added before e. Then one endpoint of e is in V and the other is not. Since Y1 is a spanning tree of P, there is a path in Y1 joining the two endpoints. As one travels along the path, one must encounter an edge f joining a vertex in V to one that is not in V. Now, at the iteration when e was added to Y, f could also have been added and it would be added instead of e if its weight was less than e. Since f was not added, we conclude that

${\displaystyle w(f)\geq w(e).}$

Let Y2 be the graph obtained by removing f and adding e from Y1. It is easy to show that Y2 is connected, has the same number of edges as Y1, and the total weights of its edges is not larger than that of Y1, therefore it is also a minimum spanning tree of P and it contains e and all the edges added before it during the construction of V. Repeat the steps above and we will eventually obtain a minimum spanning tree of P that is identical to Y. This shows Y is a minimum spanning tree.

### Shortest-path (Dijkstra's)

Algorithm for Dijkstra's, analyzed runtimes, and proof.

## Description of the algorithm

Suppose you want to find the shortest path between two intersections on a map, a starting point and a destination. To accomplish this, you could highlight the streets (tracing the streets with a marker) in a certain order, until you have a route highlighted from the starting point to the destination. The order is conceptually simple: at each iteration, create a set of intersections consisting of every unmarked intersection that is directly connected to a marked intersection, this will be your set of considered intersections. From that set of considered intersections, find the closest intersection to the destination (this is the "greedy" part, as described above) and highlight it and mark that street to that intersection, draw an arrow with the direction, then repeat. In each stage mark just one new intersection. When you get to the destination, follow the arrows backwards. There will be only one path back against the arrows, the shortest one.

## Pseudocode

In the following algorithm, the code u := vertex in Q with smallest dist[], searches for the vertex u in the vertex set Q that has the least dist[u] value. That vertex is removed from the set Q and returned to the user. dist_between(u, v) calculates the length between the two neighbor-nodes u and v. The variable alt on line 13 is the length of the path from the root node to the neighbor node v if it were to go through u. If this path is shorter than the current shortest path recorded for v, that current path is replaced with this alt path. The previous array is populated with a pointer to the "next-hop" node on the source graph to get the shortest route to the source.

 1  function Dijkstra(Graph, source):
2      for each vertex v in Graph:           // Initializations
3          dist[v] := infinity               // Unknown distance function from source to v
4          previous[v] := undefined          // Previous node in optimal path from source
5      dist[source] := 0                     // Distance from source to source
6      Q := the set of all nodes in Graph
// All nodes in the graph are unoptimized - thus are in Q
7      while Q is not empty:                 // The main loop
8          u := vertex in Q with smallest dist[]
9          if dist[u] = infinity:
10              break                         // all remaining vertices are inaccessible from source
11          remove u from Q
12          for each neighbor v of u:         // where v has not yet been removed from Q.
13              alt := dist[u] + dist_between(u, v)
14              if alt < dist[v]:             // Relax (u,v,a)
15                  dist[v] := alt
16                  previous[v] := u
17      return dist[]


## Old material

### Asymptotic Notations

• ${\displaystyle f(n)\in O(g(n))}$ iff ${\displaystyle \exists c\in \mathbf {R^{+}} ,\exists n_{0}\in \mathbf {Z^{+}} ,\forall n\in \mathbf {Z^{+}} ,n\geq n_{0}\rightarrow f(n)\leq cg(n)}$
• ${\displaystyle f(n)\in \Omega (g(n))}$ iff ${\displaystyle g(n)\in O(f(n))}$, ${\displaystyle \exists c\in \mathbf {R^{+}} ,\exists n_{0}\in \mathbf {Z^{+}} ,\forall n\in \mathbf {Z^{+}} ,n\geq n_{0}\rightarrow 0\leq cg(n)\leq f(n)}$
• ${\displaystyle f(n)\in \Theta (g(n))}$ iff ${\displaystyle f(n)\in O(g(n))}$ and ${\displaystyle g(n)\in O(f(n))}$, ${\displaystyle \exists c_{1},c_{2}\in \mathbf {R^{+}} ,\exists n_{0}\in \mathbf {Z^{+}} ,\forall n\in \mathbf {Z^{+}} ,n\geq n_{0}\rightarrow 0\leq c_{1}g(n)\leq f(n)\leq c_{2}g(n)}$
• ${\displaystyle f(n)\in o(g(n))}$ iff ${\displaystyle \forall c\in \mathbf {R^{+}} ,\exists n_{0}\in \mathbf {Z^{+}} ,\forall n\in \mathbf {Z^{+}} ,n\geq n_{0}\rightarrow f(n)
• ${\displaystyle f(n)\in \omega (g(n))}$ iff ${\displaystyle g(n)\in o(f(n))}$, ${\displaystyle \forall c\in \mathbf {R^{+}} ,\exists n_{0}\in \mathbf {Z^{+}} ,\forall n\in \mathbf {Z^{+}} ,n\geq n_{0}\rightarrow 0\leq cg(n)

Also, if ${\displaystyle \displaystyle \lim _{n\to \infty }{\frac {f(n)}{g(n)}}=}$

• ${\displaystyle \infty }$, then ${\displaystyle f(n)\in \omega (g(n))}$
• a non-zero constant, then ${\displaystyle f(n)\in \Theta (g(n))}$
• ${\displaystyle 0}$, then ${\displaystyle f(n)\in o(g(n))}$

### Complexity of Deterministic Select

T(n) <= c[n/5] + c[7n/10 + 6] + an

    <= cn/5 + c + 7cn/10 + 6c + an
<= 9cn/10 + 7c + an
<= cn + (-cn/10 + 7c + an)


-cn/10 + 7c + an <= 0 cn/10 - 7c >= an cn - 70c >= 10an c(n - 70) >= 10an c >= 10an(n/(n - 70))

### Master theorem

If ${\displaystyle \displaystyle T(n)=aT(''n/b'')+f(n)}$ and a constant in the base case, where ${\displaystyle ''n/b''}$ can be either ${\displaystyle \lfloor n/b\rfloor }$ or ${\displaystyle \lceil n/b\rceil }$, ${\displaystyle a>=1}$ and ${\displaystyle b>1}$ then:

1. If ${\displaystyle f(n)\in O(n^{log_{b}{a-{\mathcal {E}}}})}$ for some ${\displaystyle {\mathcal {E}}>0}$, then ${\displaystyle T(n)\in \Theta (n^{log_{b}a})}$
• Dominated by leaf cost
1. If ${\displaystyle f(n)\in \Theta (n^{log_{b}a})}$ then, ${\displaystyle T(n)\in \Theta (n^{log_{b}a}lgn)}$
• Balanced cost
1. If ${\displaystyle f(n)\in \Omega (n^{log_{b}{a+{\mathcal {E}}}})}$ for some ${\displaystyle {\mathcal {E}}>0}$ and if ${\displaystyle \displaystyle af(''n/b'')\leq cf(n)}$ for some constant ${\displaystyle \displaystyle c<1}$ and sufficiently large ${\displaystyle n}$, then ${\displaystyle T(n)\in \Theta (f(n))}$
• Dominated by root cost - It is important in this case to check that f(n) is well behave. More specifically,
• For sufficient large n, a*f(n/b) <= c*f(n)

The following equations cannot be solved using the master theorem:

Also, Case 3 always hold when ${\displaystyle f=n^{k}}$ and if ${\displaystyle f(n)\in \Omega (n^{log_{b}{a+{\mathcal {E}}}})}$

### Log Laws

For all real a > 0, b > 0, c > 0, and n,

• ${\displaystyle a=b^{\log _{b}a}}$
• ${\displaystyle \log _{c}ab=\log _{c}a+\log _{c}b}$
• ${\displaystyle \log _{b}a^{n}=n\log _{b}a}$
• ${\displaystyle \log _{b}a={\frac {\log _{c}a}{\log _{c}b}}}$
• ${\displaystyle \log _{b}{\frac {1}{a}}=-\log _{b}a}$
• ${\displaystyle \log _{b}a={\frac {1}{\log _{a}b}}}$
• ${\displaystyle a^{\log _{b}c}=c^{\log _{b}a}}$
• ${\displaystyle \log 1=0}$

### Summation

• Arithmetic Series: ${\displaystyle \displaystyle {\sum _{k=1}^{n}(k)=1+2+\dots +n}={\frac {1}{2}}n(n+1)}$
• Sum of Squares: ${\displaystyle \displaystyle {\sum _{k=0}^{n}(k^{2})}={\frac {n(n+1)(2n+1)}{6}}}$
• Sum of Cubes: ${\displaystyle \displaystyle {\sum _{k=0}^{n}(k^{3})}={\frac {n^{2}(n+1)^{2}}{4}}}$
• Geometric Series: ${\displaystyle \displaystyle {\sum _{k=0}^{n}(x^{k})=1+x+x^{2}+\dots +x^{k}}={\frac {x^{n+1}-1}{x-1}},}$ for real ${\displaystyle x\neq 1}$
• Infinite decreasing: ${\displaystyle \displaystyle {\sum _{k=0}^{\infty }(x^{k})={\frac {1}{1-x}},for|x|<1}}$
• Telescoping: ${\displaystyle \displaystyle {\sum _{k=1}^{n-1}(a_{k}-a_{k+1})=a_{0}-a_{n}}}$

### Decision Tree-Related Notes

• for a list of ${\displaystyle n}$ elements, there's ${\displaystyle n!}$ permutations
• a binary tree of height ${\displaystyle d}$ has at most ${\displaystyle 2^{d}}$ leaves
• therefore, a binary tree with at least ${\displaystyle n}$ leaves must have height at least ${\displaystyle \lceil \lg n\rceil }$
• to get the height of a tree: the longest path (say we divide by 3/2 at each level) finishes when ${\displaystyle \left({\frac {2}{3}}\right)^{k}n=1\rightarrow k=log_{\frac {3}{2}}n}$, so the height is ${\displaystyle log_{\frac {3}{2}}n}$
• ${\displaystyle \lg(n!)\in \Theta (n\lg n)}$, which we can establish by proving big-O and big-Ω bounds separately (pumping "up" or "down" the values of the terms in the factorial and the overall number of terms as needed)

### Stirling's Approximation

• ${\displaystyle \ln n!\sim n\ln n-n\ .}$

### REDUCTION

If we reduce problem A to problem B (A -> B), then A<=B.

• Q.Why is reducing sorting to selection not a promising approach to develop a lower-bound on the selection problem?
• A.The best we could hope for is a lower bound of(n lg n)—”transferring” sorting’s lower bound to selection, but we know already that we can do better than this bound (i.e., O(n)); so, the reduction is hopeless.
• Q.why is reducing selection to sorting also not a promising approach to develop a lower-bound on the selection problem?
• A.This reduction can establish a lower bound on sorting, but it cannot establish a lower bound on selection. So, it’s useless for this purpose.

## Assignment Solutions

Consider our algorithm’s first choice of school on a particular problem, call it i. Compare to an optimal solution OPT. If i is in OPT, we’re done. If i is not in OPT, then select the school j in OPT that made the fewest requests for tickets (or even just any school in OPT). It cannot be that nj < ni or our algorithm would have selected j before it selected i. So, nj >= ni. In that case, we can build the solution (OPT - {j}) Union {i}, which demands no more tickets in total than OPT (and so is a feasible solution) and includes the same number of schools as OPT (and so is also optimal). Thus, our algorithm’s first choice can always be extended to an optimal solution. This corresponds tightly to our notion of the “greedy choice property”. An algorithm for a problem exhibits the greedy choice property if its initial choice can be extended to an optimal solution, i.e., it’s a “good” choice. (Note: we should be careful about the case where our algorithm does not allocate any tickets. Technically, that doesn’t clearly fall under this question, but let’s address it here. In that case, even the school that requests the fewest tickets requests too many to distribute. Alternatively, there is no smallest school because no schools request tickets. In either case, the optimal (and greedy) solution is to allocate no tickets.)

Assume S is an optimal solution to the problem of distributing tickets to schools other than i and OPT is an optimal solution to the overall problem. For contradiction, say that S union {i} is not an optimal solution. Then, OPT gives out tickets to more schools than S Union {i}; that is, |OPT| > |S U {i}|. Now, remove i from each solution to get S and OPT - {i}. Both are solutions to the subproblem (the former by assumption and the latter because removing i “frees up” just enough tickets for the change in capacity to the subproblem). |OPT-{i}| must also be greater than |S|, meaning S is not the optimal way to give out capacity - ni tickets, but this is a contradiction. QED This corresponds tightly to our notion of the “optimal substructure” property. A problem exhibits optimal substructure if we can form an optimal solution by combining any “good” choice that breaks the problem into smaller subproblem(s) and any optimal solution(s) to the subproblem(s). Note that (1) optimal substructure is a property of the problem, not the algorithm (unlike the “greedy choice property”) and (2) the “good” choice and the optimal subproblem solution(s) need not be obtained by a greedy algorithm or any particular algorithm (or even by the same algorithm as each other).

Consider two minimum spanning trees of a graph with different numbers of edges of a given weight. Consider the least weight w at which they differ. Arbitrarily order the edges of weight w and consider the first one (u; v) at which they differ. Without loss of generality, let MST1 contain (u; v) and MST2 exclude it. Then, MST1 contains (u; v) and no other path from u to v. MST2 does not contain (u; v) but does contain a different path from u to v. That path must include an edge (x; y) with weight w0 � w. (MST1 and MST2 don’t differ on edges with weights smaller than w. So, if the path has only such edges, thenMST1 also contains that path and therefore with the addition of (u; v) contains a cycle, which is a contradiction.) If w0 > w, thenMST2 was not a minimum spanning tree. If w0 = w, then the new tree is also a minimum spanning tree and has the same number of edges of weight w as MST2. However, we can repeat this process on the new tree and the oldMST1 (which are now “a pair of MSTs with different numbers of edges of weight w”) until we eliminate all differences between the two MSTs on edges of weight w. This is a contradiction.