2.2 - The Probability Density Function

An Important Distinction Between Continuous and Discrete Random Variables

What is Pr(X = x)? The answer clearly depends on the distribution of the random variable X. For discrete random variables, we have already seen that if x is a possible value that X can assume, then Pr(X = x) is some positive number. But is this still true if X is a continuous random variable?

In the context of our maximum outdoor air temperature example from the previous section, we may ask: what is the probability that the maximum outdoor air temperature in downtown Vancouver on any given day in January is exactly 0°C? Since our measurements of the air temperature are never exact, this probability should be zero. If we had instead asked for the probability that the maximum outdoor air temperature was within 0.005° of 0°C, then we would have arrived at a nonzero probability. Practical measurements of continuous data are always approximate. They may be very precise, but they can never be truly exact. Hence, we cannot expect to measure the likelihood of an exact outcome, only an approximate one.

In general, for any continuous random variable X, we will always have Pr(X = x) = 0. We can prove this fact directly by appealing to our basic results about combining probabilities of disjoint events.

Suppose we choose any interval ( x - Δx , x ] with Δx > 0. Since the event X = x is contained in the event x - Δx < X ≤ x, we have

${\text{Pr}}(X=x)\leq {\text{Pr}}(x-\Delta x<X\leq x).$

Using our identity for probabilities of disjoint events, we can write the right-hand side as the difference

${\text{Pr}}(X\leq x)-{\text{Pr}}(X\leq x-\Delta x).$

If we take the limit as Δx goes to zero, we obtain

${\begin{aligned}\lim _{\Delta x\rightarrow 0}{\text{Pr}}(x-\Delta x<X\leq x)&=\lim _{\Delta x\rightarrow 0}{\Big [}{\text{Pr}}(X\leq x)-{\text{Pr}}(X\leq x-\Delta x){\Big ]}\\&={\text{Pr}}(X\leq x)-{\text{Pr}}(X\leq x)\\&=0\end{aligned}}$

Since Pr(X = x) is nonnegative and is bounded above by a quantity that tends to zero, it must equal zero.

Notice that the crucial step in this argument is the evaluation of the limit in the second to last line. Since X is a continuous random variable, its CDF F(x) is a continuous function; thus, we are allowed to pass the limit through to the argument of the function F(x) = Pr(X ≤ x). Notice that if X were a discrete random variable, this evaluation would not be possible in general since its CDF would not be continuous.
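The shrinking of these interval probabilities can be observed numerically. The sketch below assumes, purely for illustration, an exponential random variable with CDF F(x) = 1 - e^(-x) for x ≥ 0; this distribution and the names `exp_cdf`, `left_probs`, and `right_probs` are not part of the text above. It evaluates the probability of a window of width Δx on either side of x as a difference of CDF values.

```python
import math

def exp_cdf(x):
    """CDF of a rate-1 exponential random variable (illustrative assumption)."""
    return 1.0 - math.exp(-x) if x >= 0.0 else 0.0

x = 1.0
widths = [10.0 ** -k for k in range(1, 7)]

# Probability of a window of width dx immediately to the left and right of x.
left_probs = [exp_cdf(x) - exp_cdf(x - dx) for dx in widths]
right_probs = [exp_cdf(x + dx) - exp_cdf(x) for dx in widths]

for dx, lp, rp in zip(widths, left_probs, right_probs):
    print(f"dx = {dx:.0e}   left window: {lp:.3e}   right window: {rp:.3e}")
```

Both columns shrink roughly in proportion to Δx, which is what continuity of the CDF at x predicts. For a CDF with a jump at x, the difference across the jump would not shrink to zero.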

This gives a direct proof of the fact that Pr(X = x) = 0 for any continuous random variable X. We will see that an even simpler proof comes for free for most continuous random variables via the Fundamental Theorem of Calculus. In order to do this, however, we need to express these probabilities as integrals of some appropriate function. It turns out that this function plays a vital role in describing the distribution of a continuous random variable and will be extremely useful for performing calculations.

The Probability Density Function

The "appropriate function" referred to above is called the probability density function (PDF). It can be defined for most continuous random variables, and is extremely useful for calculating probabilities of events associated to a continuous random variable.

The Probability Density Function

Let F(x) be the cumulative distribution function for a continuous random variable X.

The probability density function (PDF) of X is given by

$f(x)={\frac {dF(x)}{dx}},$

wherever the derivative exists.

In short, the PDF of a continuous random variable is the derivative of its CDF. Using the Fundamental Theorem of Calculus, we see that the CDF F(x) of a continuous random variable X may be expressed in terms of its PDF:

$F(x)=\int _{-\infty }^{x}f(t)dt,$

where f denotes the PDF of X.
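The two directions of this relationship can be checked numerically. The following is a minimal sketch that assumes, for illustration only, a logistic random variable with CDF F(x) = 1/(1 + e^(-x)), whose PDF is F(x)(1 - F(x)). The distribution and the helper names `derivative` and `integrate` are assumptions, not part of the text. A central difference of the CDF should approximate the PDF, and a midpoint-rule integral of the PDF should approximate the CDF.

```python
import math

def logistic_cdf(x):
    """CDF of the standard logistic distribution (illustrative assumption)."""
    return 1.0 / (1.0 + math.exp(-x))

def logistic_pdf(x):
    """Its PDF, obtained by differentiating the CDF by hand."""
    F = logistic_cdf(x)
    return F * (1.0 - F)

def derivative(func, x, h=1e-5):
    """Central-difference approximation to func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

def integrate(func, a, b, n=20000):
    """Midpoint-rule approximation to the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

x0 = 0.7
pdf_from_cdf = derivative(logistic_cdf, x0)
cdf_from_pdf = integrate(logistic_pdf, -40.0, x0)  # -40 stands in for -infinity

print(pdf_from_cdf, logistic_pdf(x0))
print(cdf_from_pdf, logistic_cdf(x0))
```

The lower limit -40 is a practical stand-in for -∞; the probability below it is negligible for this particular distribution.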

Properties of the PDF

This formulation of the PDF via the Fundamental Theorem of Calculus allows us to derive the following properties.

Properties of the Probability Density Function
If f(x) is a probability density function for a continuous random variable X, then ${\begin{aligned}1.&\ F(x)=\mathrm {Pr} (X\leq x)=\int _{-\infty }^{x}f(t)dt\\2.&\ f(x)\geq 0\ {\text{for any value of}}\ x\\3.&\int _{-\infty }^{\infty }f(t)dt=1\end{aligned}}$

The first property, as we have already seen, is just an application of the Fundamental Theorem of Calculus and relates the CDF of a continuous random variable to its PDF.

The second property states that for a function to be a PDF, it must be nonnegative. This makes intuitive sense since probabilities are always nonnegative numbers. More precisely, we already know that the CDF F(x) is a nondecreasing function of x. Thus, its derivative f(x) is nonnegative wherever it exists.

The third property states that the area between the function and the x-axis must be 1, or that all probabilities must integrate to 1. This must be true since $\lim _{x\rightarrow -\infty }F(x)=0{\text{ and }}\lim _{x\rightarrow +\infty }F(x)=1\!$ ; thus Property 3 follows from the Fundamental Theorem of Calculus.
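Properties 2 and 3 give a practical test for whether a candidate function can be a PDF. The sketch below is a numerical sanity check rather than a proof: it samples the function on a grid to look for negative values and uses the midpoint rule to estimate the total area. The two candidate functions and all helper names are hypothetical examples chosen for illustration.

```python
def midpoint_integral(func, a, b, n=10000):
    """Midpoint-rule approximation to the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def looks_like_pdf(func, a, b, n=10000, tol=1e-6):
    """Check Properties 2 and 3 numerically for a function that vanishes outside [a, b]."""
    h = (b - a) / n
    nonnegative = all(func(a + i * h) >= 0.0 for i in range(n + 1))
    total = midpoint_integral(func, a, b, n)
    return nonnegative and abs(total - 1.0) < tol

def triangle(x):
    """Hypothetical candidate: a triangular density on [0, 2]."""
    if 0.0 <= x <= 1.0:
        return x
    if 1.0 < x <= 2.0:
        return 2.0 - x
    return 0.0

def shifted_line(x):
    """Hypothetical candidate that integrates to 1 on [0, 2] but is negative near 0."""
    return x - 0.5 if 0.0 <= x <= 2.0 else 0.0

print(looks_like_pdf(triangle, 0.0, 2.0))
print(looks_like_pdf(shifted_line, 0.0, 2.0))
```

The second candidate shows that Property 3 alone is not enough: its total area is 1, yet it takes negative values and so cannot be a PDF.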

The PDF gives us a helpful geometrical interpretation of the probability of an event: the probability that a continuous random variable X is at most some value x₀ is equal to the area under the PDF f(x) on the interval (-∞, x₀], as demonstrated in the following graph.

Similarly, we have ${\text{Pr}}(a\leq X\leq b)=\int _{a}^{b}f(x)dx$.

Now that we can interpret probabilities as integrals, it is clear that for a continuous random variable X, we will always have Pr(X = x) = 0. This is simply because the area under a single point of a curve is always zero. In other words, if X is a continuous random variable, the probability that X is equal to a particular value will always be zero. We again note that this is an important difference between continuous and discrete random variables.

The PDF of a continuous random variable plays a role similar to that of the PMF for discrete random variables. In particular, both are used to compute probabilities of events associated to a random variable. However, as the previous paragraph shows, PDFs and PMFs are different objects, just as continuous and discrete random variables are different concepts: a value of a PMF is itself a probability, whereas a value of a PDF is not, and only its integral over an interval is.

Example

Let f(x) = k(3x² + 1) for 0 ≤ x ≤ 2, and f(x) = 0 elsewhere.

1. Find the value of k that makes the given function a PDF.
2. Let X be a continuous random variable whose PDF is f(x). Compute the probability that X is between 1 and 2.
3. Find the cumulative distribution function of X.
4. Find the probability that X is exactly equal to 1.

Solution

Part 1)

${\begin{aligned}1&=\int _{-\infty }^{\infty }f(x)dx\\&=\int _{0}^{2}k(3x^{2}+1)dx\\&=k{\Big (}{\frac {3x^{3}}{3}}+x{\Big )}{\Big |}_{0}^{2}\\&=k(10)\end{aligned}}$

Therefore, k = 1/10.

Notice that f(x) ≥ 0 for all x. Also notice that we can rewrite this PDF as a piecewise function:

$f(x)={\begin{cases}{\frac {1}{10}}(3x^{2}+1)&{\text{if }}0\leq x\leq 2\\0&{\text{otherwise}}\end{cases}}$
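The normalization in Part 1 can be reproduced with exact rational arithmetic. The sketch below integrates the polynomial 3x² + 1 over [0, 2] using the power rule and then takes the reciprocal of the area; the helper `poly_integral` is a hypothetical name introduced for this illustration.

```python
from fractions import Fraction

def poly_integral(coeffs, a, b):
    """Exact integral over [a, b] of sum(coeffs[n] * x**n), using the power rule."""
    a, b = Fraction(a), Fraction(b)
    return sum(Fraction(c) * (b ** (n + 1) - a ** (n + 1)) / (n + 1)
               for n, c in enumerate(coeffs))

unnormalized = [1, 0, 3]              # coefficients of 1 + 0*x + 3*x**2
area = poly_integral(unnormalized, 0, 2)
k = 1 / area                          # the constant that makes the total area 1

print(area, k)                        # prints: 10 1/10
```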

Part 2)

Using our value of k from Part 1:

${\begin{aligned}\mathrm {Pr} (1\leq X\leq 2)=\int _{1}^{2}{\frac {3x^{2}+1}{10}}dx={\frac {x^{3}+x}{10}}{\Big |}_{1}^{2}=1-2/10=4/5\end{aligned}}$

Therefore, Pr(1 ≤ X ≤ 2) is 4/5.
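As a cross-check on Part 2, the same integral can be approximated numerically. The following sketch uses composite Simpson's rule, which happens to be exact (up to rounding) for polynomials of degree at most three; the function names are assumptions made for this example.

```python
def pdf(x):
    """The PDF from Part 1 with k = 1/10."""
    return (3.0 * x * x + 1.0) / 10.0 if 0.0 <= x <= 2.0 else 0.0

def simpson(func, a, b, n=100):
    """Composite Simpson's rule with an even number n of subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3.0

prob = simpson(pdf, 1.0, 2.0)         # approximates Pr(1 <= X <= 2)
total_area = simpson(pdf, 0.0, 2.0)   # should be close to 1

print(prob, total_area)
```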

Part 3)

Using the Fundamental Theorem of Calculus, the CDF of X at x in [0,2] is

${\begin{aligned}{\text{Pr}}(X\leq x)=F(x)&=\int _{-\infty }^{x}f(t)dt\\&=\int _{0}^{x}{\frac {1}{10}}(3t^{2}+1)dt\\&={\frac {1}{10}}(t^{3}+t){\Big |}_{0}^{x}\\&={\frac {1}{10}}(x^{3}+x),{\text{ for }}0\leq x\leq 2\end{aligned}}$

A similar calculation easily verifies that F(x) = 0 for all x < 0 and that F(x) = 1 for all x > 2.
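Putting the three pieces together gives a CDF defined for every real x. The sketch below writes it as a piecewise function and checks, by central differences at a few interior points, that its derivative agrees with the PDF; the names `cdf`, `pdf`, and `central_difference` are chosen for this illustration only.

```python
def cdf(x):
    """CDF from Part 3, extended to all real x."""
    if x < 0.0:
        return 0.0
    if x > 2.0:
        return 1.0
    return (x ** 3 + x) / 10.0

def pdf(x):
    """The PDF from Part 1 with k = 1/10."""
    return (3.0 * x * x + 1.0) / 10.0 if 0.0 <= x <= 2.0 else 0.0

def central_difference(func, x, h=1e-6):
    """Central-difference approximation to func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

# At interior points of [0, 2] the slope of the CDF should match the PDF.
for x in (0.5, 1.0, 1.5):
    print(x, cdf(x), central_difference(cdf, x), pdf(x))
```

Note that the PDF jumps at x = 0 and x = 2, so the CDF has a corner there and is not differentiable at those two points; the check is restricted to interior points for that reason.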

Part 4)

Since X is a continuous random variable, we immediately know that the probability that it equals any one particular value must be zero. More directly, we compute

${\text{Pr}}(X=1)=\int _{1}^{1}f(t)dt=0$

An Important Subtlety

There is an important subtlety in the definition of the PDF of a continuous random variable. Notice that the PDF of a continuous random variable X can only be defined when the cumulative distribution function of X is differentiable.

As a first example, consider the experiment of choosing a real number uniformly at random from the interval [0,1]. Let X denote the outcome of this experiment. Since the likelihood of picking a number in a given subinterval of [0,1] is proportional to the length of that subinterval, we see that the CDF F(x) is given by

${\text{Pr}}(X\leq x)=F(x)={\begin{cases}0&{\text{if }}x<0\\x&{\text{if }}0\leq x\leq 1\\1&{\text{if }}x>1\end{cases}}$

This function is differentiable everywhere except at the points x = 0 and x = 1. So the PDF of X is defined at all points except for these two:

${\frac {dF(x)}{dx}}=f(x)={\begin{cases}1&{\text{if }}0<x<1\\0&{\text{if }}x<0{\text{ or }}x>1\end{cases}}$

Nevertheless, it can still make sense to define the PDF at the points where the CDF fails to be differentiable. We know that the integral over a single point is always zero, so we can always change the value of our PDF at any particular point (or at any finite set of points) without changing the probabilities of events associated to our random variable. Thus, we could define

$f(x):={\begin{cases}1&{\text{if }}0<x<1\\0&{\text{otherwise}}\end{cases}}$

or

$f(x):={\begin{cases}1&{\text{if }}0\leq x\leq 1\\0&{\text{otherwise}}\end{cases}}$

Both of these functions are also PDFs of the continuous random variable X. These two formulations have the advantage of being defined for all real numbers.
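The claim that the value of a PDF at finitely many points does not affect probabilities can be illustrated with Riemann sums. In the sketch below, the left-endpoint sum on [0, 1] samples the point x = 0, where the two candidate PDFs disagree. The two sums therefore differ by one term of size h = 1/n, which vanishes as n grows. The function names are hypothetical.

```python
def pdf_open(x):
    """Uniform PDF set to 0 at the endpoints 0 and 1."""
    return 1.0 if 0.0 < x < 1.0 else 0.0

def pdf_closed(x):
    """Uniform PDF set to 1 at the endpoints 0 and 1."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def left_riemann_sum(func, a, b, n):
    """Left-endpoint Riemann sum of func over [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + i * h) for i in range(n))

sizes = (10, 100, 1000, 10000)
gaps = []
for n in sizes:
    s_open = left_riemann_sum(pdf_open, 0.0, 1.0, n)
    s_closed = left_riemann_sum(pdf_closed, 0.0, 1.0, n)
    gaps.append(s_closed - s_open)    # contribution of the single point x = 0
    print(n, s_open, s_closed)
```

In the limit both sums converge to the same integral, so either function serves equally well as a PDF of X.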

Not All Continuous Random Variables Have PDFs

We can sometimes encounter continuous random variables that simply do not have a meaningful PDF at all. The simplest such example is given by a distribution function called the Cantor staircase.

The Cantor set is defined recursively as follows:

Start with the interval [0,1).
Delete the middle third of this interval. You are now left with two subintervals [0,1/3) and [2/3,1).
Delete the middle third of each of these remaining subintervals. Now we have four new subintervals: [0,1/9), [2/9,3/9), [6/9,7/9), and [8/9,1).
Repeat this middle third deletion for the new subintervals. Continue indefinitely.
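The recursive construction above can be sketched in a few lines. The code below represents each half-open interval [a, b) as a pair of exact fractions and performs the middle-third deletion a given number of times; the function names are assumptions introduced for this illustration.

```python
from fractions import Fraction

def remove_middle_thirds(intervals):
    """One step: replace each [a, b) by its outer thirds [a, a + w) and [b - w, b)."""
    result = []
    for a, b in intervals:
        w = (b - a) / 3
        result.append((a, a + w))
        result.append((b - w, b))
    return result

def cantor_step(n):
    """Intervals remaining after n middle-third deletions, starting from [0, 1)."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        intervals = remove_middle_thirds(intervals)
    return intervals

step2 = cantor_step(2)
for a, b in step2:
    print(f"[{a}, {b})")

# Total length left after n steps is (2/3)**n, which tends to zero.
lengths = [sum(b - a for a, b in cantor_step(n)) for n in range(6)]
print(lengths)
```

After two steps this reproduces the four subintervals listed above, with 3/9 and 6/9 printed in lowest terms as 1/3 and 2/3.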

If we take this process to the limit, the set that remains is called the Cantor set. It is extremely sparse in [0,1) (the total length of the intervals remaining after n steps is (2/3)^n, which tends to zero), yet it still has the same cardinality as the entire interval itself. In particular, notice that every point of the form x = 1 - 3^(-k) is in the Cantor set for every k > 0.

We can define a Cantor random variable to have the cumulative distribution function that increases on the Cantor set and remains constant off of this set. We define this function as follows:

Let F(x) be the CDF of our Cantor random variable X. Define F(x) = 0 for x < 0 and F(x) = 1 for x > 1.
Define F(x) = 1/2 on [1/3,2/3), i.e. on the first middle third deleted in the construction of the Cantor set.
Define F(x) = 1/4 on [1/9,2/9) and F(x) = 3/4 on [7/9,8/9).
Define F(x) = 1/8, 3/8, 5/8, and 7/8 on the deleted middle thirds from the third step in our Cantor set construction.
Continue indefinitely.

After a limiting argument and some technicalities with defining F(x) on the Cantor set itself, this procedure defines a continuous function that begins at 0 and increases to 1. However, since this function is constant except on the Cantor set, we see that its derivative off of the Cantor set must be identically zero. On the Cantor set the function is not differentiable and so has no natural PDF.
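One common way to evaluate this staircase function, offered here as a sketch rather than as the construction used above, reads off the base-3 digits of x. Each digit 2 contributes the corresponding binary place value, and the first digit 1 means x lies in a deleted middle third, where F is constant. The function name `cantor_cdf` and the truncation parameter `depth` are assumptions; truncating after `depth` digits leaves an error of at most 2^(-depth).

```python
from fractions import Fraction

def cantor_cdf(x, depth=40):
    """Approximate the Cantor staircase F(x) from the first `depth` ternary digits of x."""
    x = Fraction(x)
    if x <= 0:
        return Fraction(0)
    if x >= 1:
        return Fraction(1)
    value = Fraction(0)
    weight = Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = x.numerator // x.denominator   # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:
            # x lies in a deleted middle third, where F is constant.
            return value + weight
        if digit == 2:
            value += weight
        weight /= 2
    return value

# F is flat across the first deleted middle third and on the next two.
print(cantor_cdf(Fraction(2, 5)), cantor_cdf(Fraction(3, 5)))   # both 1/2
print(cantor_cdf(Fraction(1, 6)), cantor_cdf(Fraction(5, 6)))   # 1/4 and 3/4
```

Evaluating the function on a fine grid shows long flat stretches separated by increases concentrated on an ever smaller set, which is the behaviour that rules out an ordinary PDF.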

What we see is that, for a Cantor random variable, we cannot make any sensible definition for the PDF. It is either identically zero or not defined.

This is an interesting example of how identifying a random variable with its PDF can lead us astray. Thankfully, for our purposes, we will never need to consider continuous random variables that do not have PDFs defined everywhere (except possibly at finitely many points).