Science:MATH105 Probability/Lesson 1 DRV/1.05 Variance and Standard Deviation

Another important quantity related to a given random variable is its variance. The variance is a numerical description of the spread, or the dispersion, of the random variable. That is, the variance of a random variable X is a measure of how spread out the values of X are, given how likely each value is to be observed.

Definition: Variance and Standard Deviation of a Discrete Random Variable

The variance, Var(X), of a discrete random variable X is

${\text{Var}}(X)=\sum _{k=1}^{N}{\Big (}x_{k}-\mathbb {E} (X){\Big )}^{2}\,\Pr(X=x_{k})$

The integer N is the number of possible values of X.

The standard deviation, σ(X), is the non-negative square root of the variance:

$\sigma (X)={\sqrt {{\text{Var}}(X)}}$
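As a minimal sketch of how these two definitions translate into a computation, the following Python snippet evaluates the sum directly. The function names `variance` and `standard_deviation`, and the fair-coin example, are illustrative assumptions and not part of the course material.

```python
import math


def variance(values, probs):
    """Var(X) = sum over k of (x_k - E(X))^2 * Pr(X = x_k)."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    if any(p < 0 for p in probs) or not math.isclose(sum(probs), 1.0):
        raise ValueError("probs must be non-negative and sum to 1")
    # Expectation E(X): the probability-weighted average of the values.
    mean = sum(x * p for x, p in zip(values, probs))
    # Variance: the probability-weighted average of the squared deviations.
    return sum((x - mean) ** 2 * p for x, p in zip(values, probs))


def standard_deviation(values, probs):
    """sigma(X) = sqrt(Var(X))."""
    return math.sqrt(variance(values, probs))


# Hypothetical example: a fair coin with X = 0 (tails) or X = 1 (heads).
coin_var = variance([0, 1], [0.5, 0.5])
coin_sd = standard_deviation([0, 1], [0.5, 0.5])
print(coin_var, coin_sd)
```

For the fair coin, E(X) = 0.5, so each value deviates from the mean by 0.5 and the variance is 0.25.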

Observe that the variance of a distribution is always non-negative: each probability Pr(X = x_k) is non-negative, and the square of a real number is also non-negative.

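Observe also that, much like the expectation of a random variable X, the variance is a weighted average. Here the quantity being averaged is the squared deviation of X from its expectation, weighted by the probability of each value. (The standard deviation is the square root of this weighted average.) More precisely, notice that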

${\text{Var}}(X)=\mathbb {E} \left(\left[X-\mathbb {E} (X)\right]^{2}\right)$
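To spell out the step behind this identity, here is a short sketch. It assumes the usual rule that the expectation of a function g(X) of a discrete random variable is the probability-weighted average of the values g(x_k); the symbol g is introduced here only for illustration. Taking g(x) = (x − E(X))² gives

$\mathbb {E} {\big (}g(X){\big )}=\sum _{k=1}^{N}g(x_{k})\,\Pr(X=x_{k})=\sum _{k=1}^{N}{\big (}x_{k}-\mathbb {E} (X){\big )}^{2}\,\Pr(X=x_{k})={\text{Var}}(X)$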

Students in MATH 105 are expected to memorize the formulas for variance and standard deviation.

Example: Grade Distributions

Using the grade distribution example of the previous page, calculate the variance and standard deviation of the random variable associated with randomly selecting a single exam.

Solution

The variance of the random variable X is given by

${\begin{aligned}{\text{Var}}(X)&=\sum _{k=1}^{N}(x_{k}-\mathbb {E} (X))^{2}\,\Pr(X=x_{k})\\&=(30-64)^{2}{\frac {3}{10}}+(60-64)^{2}{\frac {2}{10}}+(80-64)^{2}{\frac {3}{10}}+(90-64)^{2}{\frac {1}{10}}+(100-64)^{2}{\frac {1}{10}}\\&=624\end{aligned}}$

The standard deviation of X is then

$\sigma (X)={\sqrt {624}}\approx 24.979992$
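The arithmetic in this solution can be checked with a few lines of Python. This is only a sketch: the grades and probabilities are read off the sum in the solution, the variable names are illustrative, and exact fractions are used (an implementation choice) so the variance is not affected by floating-point rounding.

```python
from fractions import Fraction
import math

# Grades and their probabilities, as they appear in the worked sum.
grades = [30, 60, 80, 90, 100]
probs = [Fraction(3, 10), Fraction(2, 10), Fraction(3, 10),
         Fraction(1, 10), Fraction(1, 10)]

# Expectation: probability-weighted average of the grades.
mean = sum(x * p for x, p in zip(grades, probs))

# Variance: probability-weighted average of the squared deviations.
var = sum((x - mean) ** 2 * p for x, p in zip(grades, probs))

# Standard deviation: square root of the variance (converted to a float).
sd = math.sqrt(var)

print(mean)          # 64
print(var)           # 624
print(round(sd, 6))  # 24.979992
```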

Interpretation of the Standard Deviation

For most "nice" random variables, i.e. ones that are not too wildly distributed, the standard deviation has a convenient informal interpretation. Consider the intervals $S_{m}=\left[\mathbb {E} (X)-m\sigma (X),\ \mathbb {E} (X)+m\sigma (X)\right],$ for some positive integer m. As we increase the value of m, these intervals will contain more of the possible values of the random variable X.

A good rule of thumb is that for "nicely distributed" random variables, the most likely values of the random variable are contained in the interval S₃. Another way to say this is that most of the probability described by the PDF is concentrated on the interval S₃.

For our grade distribution example, notice that all possible values of X are contained in the interval S₃. In fact, all possible values of X are contained in S₂ for this particular example.
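The containment claims for the grade example can be verified numerically. The sketch below uses E(X) = 64 and Var(X) = 624 from the solution; the helper names `interval` and `values_inside` are hypothetical and chosen only for this illustration.

```python
import math

# Values of X, with the expectation and variance from the worked example.
grades = [30, 60, 80, 90, 100]
mean = 64
sd = math.sqrt(624)


def interval(m):
    """Endpoints of S_m = [E(X) - m*sigma, E(X) + m*sigma]."""
    return (mean - m * sd, mean + m * sd)


def values_inside(m):
    """Possible values of X that fall inside S_m."""
    lo, hi = interval(m)
    return [x for x in grades if lo <= x <= hi]


for m in (1, 2, 3):
    lo, hi = interval(m)
    print(m, round(lo, 2), round(hi, 2), values_inside(m))
```

Under these numbers, S₁ is roughly [39.02, 88.98] and contains only the grades 60 and 80, while S₂ and S₃ both contain every possible grade.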