Heteroskedasticity

From UBC Wiki
This article is part of the EconHelp Tutoring Wiki


In statistics, a sequence of random variables is heteroskedastic if the random variables have different variances. The term means "differing variance" and comes from the Greek "hetero" ('different') and "skedasis" ('dispersion'). In contrast, a sequence of random variables is called homoskedastic if it has constant variance.

Heteroskedasticity does not cause ordinary least squares (OLS) coefficient estimates to be biased, but it does bias the OLS estimates of the variance (and thus the standard errors) of the coefficients, possibly above or below the true population variance. Regression analysis using heteroskedastic data therefore still provides an unbiased estimate of the relationship between the predictor variable and the outcome, but the standard errors, and hence any inferences drawn from them, are suspect. Biased standard errors lead to biased inference, so the results of hypothesis tests may be wrong. For example, if OLS underestimates the standard errors in the presence of heteroskedasticity, a researcher may reject a null hypothesis at a chosen significance level when that hypothesis is in fact true (a type I error); if the standard errors are overestimated, a false null hypothesis may survive the test (a type II error).
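The point that coefficients stay unbiased while conventional standard errors go wrong can be illustrated with a short simulation (a sketch using numpy; the data-generating process and all numbers here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000
beta0, beta1 = 1.0, 2.0
x = np.linspace(1.0, 10.0, n)

slopes = np.empty(reps)
naive_se = np.empty(reps)
for r in range(reps):
    # Error standard deviation grows with x: classic heteroskedasticity.
    e = rng.normal(0.0, 0.5 * x**2)
    y = beta0 + beta1 * x + e
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS fit
    slopes[r] = coef[1]
    # Conventional standard error, which assumes one common error variance.
    resid = y - X @ coef
    s2 = resid @ resid / (n - 2)
    cov = s2 * np.linalg.inv(X.T @ X)
    naive_se[r] = np.sqrt(cov[1, 1])

print(np.mean(slopes))    # close to the true slope 2.0: OLS is still unbiased
print(np.std(slopes))     # actual sampling variability of the slope estimate
print(np.mean(naive_se))  # conventional SE understates it in this design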

It is widely known that, under certain assumptions, the OLS estimator has a normal asymptotic distribution when properly normalized and centered, even when the data do not come from a normal distribution. This result justifies using a normal or chi-squared distribution (depending on how the test statistic is calculated) when conducting a hypothesis test, and it holds even under heteroskedasticity: the OLS estimator remains asymptotically normal when properly normalized and centered, but with a variance-covariance matrix that differs from the homoskedastic case.

Corrections for Heteroskedasticity

There are four common corrections for heteroskedasticity:

* View logged data. Unlogged series that are growing exponentially often appear to have increasing variability as the series rises over time; the variability in percentage terms may, however, be rather stable.

* Use a different specification for the model (different X variables, or perhaps non-linear transformations of the X variables).

* Apply a weighted least squares (WLS) estimation method, in which OLS is applied to transformed or weighted values of X and Y. The weights vary over observations, depending on the changing error variances.

* Use heteroskedasticity-consistent standard errors (HCSE), which, while still biased, improve upon OLS estimates (White 1980). HCSE is a consistent estimator of standard errors in regression models with heteroskedasticity. The White method corrects the standard errors for heteroskedasticity without altering the values of the coefficients. It can improve on conventional OLS inference because it corrects for heteroskedasticity when it is present, while if the data are homoskedastic the standard errors are equivalent to the conventional standard errors estimated by OLS. Several modifications of the White method of computing heteroskedasticity-consistent standard errors have been proposed as corrections with superior finite-sample properties.
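The White correction can be sketched in a few lines of numpy (a minimal illustration of the HC0 "sandwich" estimator, not a production implementation; the function name and test data are hypothetical):

```python
import numpy as np

def ols_with_white_se(X, y):
    """OLS coefficients with conventional and White (HC0) standard errors.

    X is an (n, k) design matrix that already includes a constant column.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Conventional SEs assume a single error variance sigma^2.
    s2 = resid @ resid / (n - k)
    se_conventional = np.sqrt(np.diag(s2 * XtX_inv))

    # White/HC0 sandwich: the "meat" uses each observation's own squared
    # residual, so no constant-variance assumption is needed.
    meat = X.T @ (X * resid[:, None] ** 2)
    cov_white = XtX_inv @ meat @ XtX_inv
    se_white = np.sqrt(np.diag(cov_white))
    return beta, se_conventional, se_white

# Illustrative data with error variance growing in x (hypothetical numbers).
rng = np.random.default_rng(1)
x = np.linspace(1.0, 10.0, 500)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5 * x**2)
X = np.column_stack([np.ones_like(x), x])
beta, se_conv, se_hc0 = ols_with_white_se(X, y)
```

Note that the coefficient estimates are the same either way; only the standard errors change, which is exactly the behavior described above.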

Examples of Heteroskedasticity

Heteroskedasticity often occurs when there is a large difference among the sizes of the observations. A classic example of heteroskedasticity is that of income versus expenditure on meals. As one's income increases, the variability of food consumption will increase. A poorer person will spend a rather constant amount by always eating less expensive food; a wealthier person may occasionally buy inexpensive food and at other times eat expensive meals. Those with higher incomes display a greater variability of food consumption.

Imagine you are watching a rocket take off nearby and measuring the distance it has traveled once each second. In the first couple of seconds your measurements may be accurate to the nearest centimeter, say. However, 5 minutes later, as the rocket recedes into space, the accuracy of your measurements may only be good to 100 m, because of the increased distance, atmospheric distortion, and a variety of other factors. The data you collect would exhibit heteroskedasticity.