Hessians and the Second Derivative Test

Statement of the Theorem

Let z=f(x,y) be a function of two real variables with continuous second derivatives in an open disk centered at a point P=(a,b). The Hessian determinant of f at P is the number

  $\displaystyle D=D(a,b)=\left|{\begin{matrix}f_{xx}(P)&f_{xy}(P)\\f_{yx}(P)&f_{yy}(P)\\\end{matrix}}\right|=f_{xx}(P)f_{yy}(P)-(f_{yx}(P))^{2}.$

Recall that P is a critical point of f if ∇f(P)=0.

Theorem (Second Derivative Test). Suppose that z=f(x,y) is a function of two real variables with continuous second derivatives in an open disk centered at a point P=(a,b) and P is a critical point of f. Let D denote the Hessian determinant. Then

if D>0 and f_xx(P)>0, f(P) is a local minimum of f;
if D>0 and f_xx(P)<0, f(P) is a local maximum of f;
if D<0, f(P) is neither a local maximum nor a local minimum of f.
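
The three cases of the theorem can be sketched as a small decision procedure. This is only an illustration: the function name classify_critical_point and the example function f(x,y)=x³-3x+y² are assumptions, not part of the text, and the "inconclusive" branch for D=0 reflects the fact that the theorem makes no claim in that case.

```python
def classify_critical_point(fxx, fxy, fyy):
    """Apply the Second Derivative Test at a critical point P.

    fxx, fxy, fyy are the second partial derivatives of f evaluated at P.
    (Hypothetical helper, written for illustration only.)
    """
    d = fxx * fyy - fxy ** 2  # Hessian determinant D
    if d > 0 and fxx > 0:
        return "local minimum"
    if d > 0 and fxx < 0:
        return "local maximum"
    if d < 0:
        return "neither"
    # D == 0: the theorem says nothing about this case.
    return "inconclusive"


# Illustrative example (an assumption, not from the text):
# f(x, y) = x**3 - 3*x + y**2 has critical points (1, 0) and (-1, 0),
# with f_xx = 6x, f_xy = 0, f_yy = 2.
print(classify_critical_point(6.0, 0.0, 2.0))   # at (1, 0): local minimum
print(classify_critical_point(-6.0, 0.0, 2.0))  # at (-1, 0): neither
```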

Proof of the Theorem

A special case

Lemma. Let A,B and C be real numbers and set

  $\displaystyle f(x,y)=Ax^{2}+2Bxy+Cy^{2}.$

Note that the Hessian determinant is constant and is given by D=(2A)(2C)-(2B)²=4(AC-B²). Suppose A and D are non-zero. Then we have

   $\displaystyle f(x,y)=A(x+{\frac {By}{A}})^{2}+{\frac {D}{4A}}y^{2}.$

Proof. Multiply out the right-hand side. The formula is found by completing the square: the first term on the right-hand side is chosen so that its expansion produces both Ax² and the cross-term 2Bxy of the left-hand side.
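
Written out, the verification is a single expansion, using D/(4A)=(AC-B²)/A:

  $\displaystyle A\left(x+{\frac {By}{A}}\right)^{2}+{\frac {D}{4A}}y^{2}=Ax^{2}+2Bxy+{\frac {B^{2}}{A}}y^{2}+{\frac {AC-B^{2}}{A}}y^{2}=Ax^{2}+2Bxy+Cy^{2}.$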

Corollary. Suppose f(x,y)=Ax² + 2Bxy+Cy². Then the Theorem holds for f at the critical point P=(0,0). In fact, if A,D>0, then f(0,0) is an absolute minimum of f on R² and, if D>0 and A<0, f(0,0) is an absolute maximum of f.

Proof. The point P=(0,0) is a critical point because the gradient ∇f=(2Ax+2By, 2Bx+2Cy) vanishes there, and f(0,0)=0. Suppose A, D>0. Then the coefficients A and D/(4A) on the right-hand side of the expression for f in the Lemma are both positive. It follows that f(x,y)≥ 0 for all (x,y). Similarly, if D>0 but A<0, both of the coefficients on the right-hand side are negative, so f(x,y)≤ 0 for all (x,y).

Suppose that D<0 and A≠ 0. Pick a number r>0. Then f(r,0)=Ar² while, by the Lemma, f(-Br/A,r)=Dr²/(4A). These two values have opposite signs, and r can be taken arbitrarily small, so f(0,0) is neither a local maximum nor a local minimum.

The last case to consider is when D<0 but A=0. If C≠ 0, then we can just switch the roles of x and y and apply the reasoning of the previous paragraph. So we can assume C=0. Then f(x,y)=2Bxy, and B≠ 0 because D=-4B²<0. For r>0 we have f(r,r)=2Br² and f(r,-r)=-2Br², which have opposite signs. Thus f(0,0)=0 is neither a local maximum nor a local minimum.
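
As a numerical sanity check of the D<0 cases, the sketch below evaluates the quadratic form at the points (r,0) and (-Br/A,r) for one sample choice of coefficients, and at (r,r) and (r,-r) when A=C=0. The helper names and the sample values A=1, B=2, C=1 are assumptions chosen for illustration.

```python
def quadratic_form(A, B, C, x, y):
    """Evaluate f(x, y) = A x^2 + 2 B x y + C y^2."""
    return A * x**2 + 2 * B * x * y + C * y**2


def saddle_witnesses(A, B, C, r):
    """Return (f(r, 0), f(-B r / A, r)); requires A != 0."""
    return (quadratic_form(A, B, C, r, 0.0),
            quadratic_form(A, B, C, -B * r / A, r))


# Sample coefficients (illustrative only): D = 4(AC - B^2) = -12 < 0.
A, B, C = 1.0, 2.0, 1.0
D = 4 * (A * C - B**2)
r = 0.5

along_x, along_line = saddle_witnesses(A, B, C, r)
# along_x equals A r^2; along_line equals D r^2 / (4 A).
print(along_x, along_line, D * r**2 / (4 * A))

# Case A = C = 0: f(x, y) = 2 B x y changes sign between (r, r) and (r, -r).
print(quadratic_form(0.0, 3.0, 0.0, r, r), quadratic_form(0.0, 3.0, 0.0, r, -r))
```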

The General Case

The idea for the general case is essentially to approximate a general function f(x,y) by a quadratic function as in the special case. In fact, using Taylor's theorem with remainder we could prove the general case using this sort of approximation. However, I prefer a direct proof along the lines of the way the theorem is proved for the D>0 case in the text.

Proof of the Second Derivative Test. To simplify the notation, I assume that P=(0,0) and f(P)=0 and leave the reduction to this case to the reader as an exercise. For Q=(x,y), set H(x,y)=f_xx(Q)f_yy(Q)-[f_xy(Q)]² and D=H(P).

Suppose D>0 and that f_xx(P)>0. Since f has continuous second-order derivatives, we can find an open disk B of radius r>0 centered at P such that H(Q)>0 and f_xx(Q)>0 for all Q in B. Now, for Q=(a,b) in B define a function g by

  $\displaystyle g(t):=f(at,bt).$

Using the chain rule, we can see that g'(0)=af_x(P)+bf_y(P)=0 and

  $\displaystyle g''(t)=a^{2}f_{xx}(at,bt)+2abf_{xy}(at,bt)+b^{2}f_{yy}(at,bt).$

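Since H(Q), f_xx(Q)>0 on B, and the point (at,bt) lies in B for t∈ [0,1], the Lemma (applied to the quadratic form with coefficients f_xx(at,bt), f_xy(at,bt), f_yy(at,bt), evaluated at (a,b)) implies that g''(t)≥ 0 for t∈ [0,1]. Together with g'(0)=0, this gives g'(t)≥ 0 on [0,1], and hence f(Q)=g(1)≥ g(0)=f(P)=0. So f(P) is a local minimum.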
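
The chain-rule formula for g''(t) can be checked numerically against a central finite difference. The sketch below does this for a hypothetical test function f(x,y)=x²+xy+2y²+x³, which has a critical point at the origin with D=7>0 and f_xx=2>0 there; the function, the helper names, and the step size h are all assumptions made for illustration.

```python
def f(x, y):
    # Hypothetical test function (not from the text): critical point at (0, 0).
    return x**2 + x * y + 2 * y**2 + x**3


def g_second_formula(a, b, t):
    """g''(t) from the chain rule: a^2 f_xx + 2ab f_xy + b^2 f_yy at (at, bt)."""
    x = a * t
    fxx = 2 + 6 * x  # second partials of the test function above
    fxy = 1.0
    fyy = 4.0
    return a**2 * fxx + 2 * a * b * fxy + b**2 * fyy


def g_second_numeric(a, b, t, h=1e-4):
    """Central-difference approximation of g''(t) for g(t) = f(at, bt)."""
    def g(s):
        return f(a * s, b * s)
    return (g(t + h) - 2 * g(t) + g(t - h)) / h**2


a, b, t = 0.3, -0.2, 0.5
print(g_second_formula(a, b, t), g_second_numeric(a, b, t))
```

The two printed values should agree up to rounding error, and the first is nonnegative, as the argument requires near the origin.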

To handle the case D>0 and f_xx(P)<0, apply the result of the last paragraph to the function -f(x,y). This shows that -f(0,0) is a local minimum of -f(x,y). Thus f(0,0) is a local maximum of f(x,y).

Finally, suppose D<0. Assume that f_xx(P)>0. (The general case is similar.) Then set h(t)=f(t,0). We have h'(0)=f_x(P)=0 and h''(0)=f_xx(P)>0. So f(P)=h(0) is a strict local minimum of the function h(t). On the other hand, set

  $\displaystyle g(t)=f(-{\frac {f_{xy}(P)}{f_{xx}(P)}}t,t).$

A short calculation using the chain rule shows that g'(0)=0 and

  $\displaystyle g''(0)=D/f_{xx}(P)<0.$

So f(P)=g(0) is a strict local maximum of the function g(t). Thus f takes values both above and below f(P) at points arbitrarily close to P, which implies that f(P) is neither a local maximum nor a local minimum of f.
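
The calculation of g''(0) can be spelled out as follows. Write g(t)=f(at,bt) with a=-f_xy(P)/f_xx(P) and b=1. The chain rule gives g'(0)=af_x(P)+bf_y(P)=0, since P is a critical point, and

  $\displaystyle g''(0)=a^{2}f_{xx}(P)+2abf_{xy}(P)+b^{2}f_{yy}(P)={\frac {f_{xy}(P)^{2}}{f_{xx}(P)}}-{\frac {2f_{xy}(P)^{2}}{f_{xx}(P)}}+f_{yy}(P)={\frac {f_{xx}(P)f_{yy}(P)-f_{xy}(P)^{2}}{f_{xx}(P)}}={\frac {D}{f_{xx}(P)}}.$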