Q: Heteroscedasticity:
"I'm fitting an OLS regression model, which assumes that all cases have
equal error variance (homoscedasticity). I suspect my data may violate this
assumption. Is this a problem, and if so, what can I do about it?"
A: When heteroscedasticity is mild, OLS standard errors behave quite well (Long and Ervin 2000). However, when heteroscedasticity is severe, ignoring it may bias your standard errors and p values. The direction of the bias depends on the pattern of heteroscedasticity: p values may be too large or too small.
Sometimes heteroscedasticity can be removed by a data transformation, such as logging the dependent variable. This may also improve the approximation to normality. Be careful, however, that your transformation doesn't make the results hard to interpret.
Sometimes the form of the heteroscedasticity is clear and can be modeled. More commonly, though, heteroscedasticity is a nuisance that cannot be modeled because its source is not well understood. In this case, the classic correction for heteroscedasticity is the HC0 estimator proposed by Huber (1967) and White (1980). But although this estimator is correct in large samples, it is no better than OLS in small samples. MacKinnon and White (1985) discussed three improvements, HC1, HC2, and HC3. An evaluation by Long and Ervin (2000) suggests that HC3 is the best, especially in small samples.
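For reference, the four estimators can be written in one common "sandwich" form that differs only in how each squared residual is weighted. The notation below is an assumption made for this sketch, not taken from the cited papers: X is the n-by-k matrix of predictors (including the constant), e_i is the i-th OLS residual, and h_ii is the i-th leverage.

```latex
\begin{align*}
\widehat{V} &= (X'X)^{-1}\, X'\,\mathrm{diag}(\omega_1,\dots,\omega_n)\, X\, (X'X)^{-1},
\qquad h_{ii} = x_i'(X'X)^{-1}x_i \\
\text{HC0:}\quad \omega_i &= e_i^2 \\
\text{HC1:}\quad \omega_i &= \frac{n}{n-k}\, e_i^2 \\
\text{HC2:}\quad \omega_i &= \frac{e_i^2}{1-h_{ii}} \\
\text{HC3:}\quad \omega_i &= \frac{e_i^2}{(1-h_{ii})^2}
\end{align*}
```

Because 0 < 1 - h_ii < 1 for the usual case, HC2 and HC3 inflate the squared residuals of high-leverage cases, which is where OLS residuals tend to understate the true error.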
It is possible to correct for heteroscedasticity using popular software:
In Stata:
reg y x, hc3
In small samples, this is better than the ROBUST option, which implements HC1.
In SAS:
proc reg;
model y = x / acov;
run;
However, ACOV reports only the corrected covariance matrix; it does not report corrected standard errors. To get the corrected standard errors, you have to take the square roots of the diagonal elements of that covariance matrix.
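The square-root step for ACOV-style output is simple enough to do by hand, but it can be sketched as follows. This is a minimal illustration in Python rather than SAS; the function name and the numbers in the matrix are hypothetical, made up purely to show the arithmetic.

```python
import math

def standard_errors(cov):
    """Return the square roots of the diagonal of a covariance matrix.

    `cov` is a square matrix given as a list of rows, ordered the same way
    as the coefficients (for example: intercept first, then slope).
    """
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]

# Hypothetical corrected covariance matrix for (intercept, slope).
# These values are invented for illustration, not taken from any data set.
acov = [[0.0400, -0.0030],
        [-0.0030, 0.0009]]

se = standard_errors(acov)
print(se)  # one standard error per coefficient
```

Only the diagonal matters here; the off-diagonal covariances are needed only for tests that involve more than one coefficient.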
In SAS, using PROC MIXED:
proc mixed empirical;
model y = x ;
run;
Unlike the ACOV option in PROC REG, the EMPIRICAL option reports corrected standard errors as well as the corrected covariance matrix. The EMPIRICAL option also works for more complicated mixed models, not just vanilla regressions.
In LIMDEP:
Title; Calculate MacKinnon and White's HC3 estimator of
OLS Standard Errors $
Namelist; iv=one,list of predictors; dv=name of dependent variable$
Regress; lhs=dv; rhs=iv; res=resy$
Matrix; xpxinv=<iv'iv>$
Create; resysq=resy^2$
Create; hii=qfr(iv,xpxinv); hc3= resysq/ (1-hii)^2 $
Matrix; varhc3=xpxinv*iv'[hc3]iv*xpxinv; stat(bols,varhc3)$
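The same sequence of steps (fit OLS, save residuals, compute leverages, reweight the squared residuals, form the sandwich) can be sketched in plain Python for the one-predictor case. This is an illustrative sketch, not a replacement for the packaged routines above: the function name and return keys are hypothetical, and it assumes a single predictor plus a constant so that the 2-by-2 inverse can be written out by hand.

```python
import math

def hc3_simple_regression(x, y):
    """OLS of y on a constant and x, with classical and HC3 standard errors."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx
    # (X'X)^{-1} = [[a, b], [b, d]] for the design matrix with columns (1, x)
    a, b, d = sxx / det, -sx / det, n / det

    # OLS coefficients: (X'X)^{-1} X'y
    sy = sum(y)
    sxy = sum(u * v for u, v in zip(x, y))
    b0 = a * sy + b * sxy
    b1 = b * sy + d * sxy

    resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
    # Leverage h_ii = x_i' (X'X)^{-1} x_i with x_i = (1, x_i)
    h = [a + 2 * b * xi + d * xi * xi for xi in x]
    # HC3 weights: squared residual divided by (1 - h_ii)^2
    # (assumes no case has leverage exactly 1)
    w = [e * e / (1 - hi) ** 2 for e, hi in zip(resid, h)]

    # "Meat" of the sandwich: X' diag(w) X
    m00 = sum(w)
    m01 = sum(wi * xi for wi, xi in zip(w, x))
    m11 = sum(wi * xi * xi for wi, xi in zip(w, x))

    # Diagonal of (X'X)^{-1} * meat * (X'X)^{-1}
    v00 = a * a * m00 + 2 * a * b * m01 + b * b * m11
    v11 = b * b * m00 + 2 * b * d * m01 + d * d * m11

    # Classical OLS variances for comparison: s^2 (X'X)^{-1}
    s2 = sum(e * e for e in resid) / (n - 2)
    return {
        "coef": [b0, b1],
        "se_ols": [math.sqrt(s2 * a), math.sqrt(s2 * d)],
        "se_hc3": [math.sqrt(v00), math.sqrt(v11)],
    }

# Tiny made-up data set, only to show the call
fit = hc3_simple_regression([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
print(fit["coef"], fit["se_ols"], fit["se_hc3"])
```

For real work, the built-in options shown above are preferable; the sketch is only meant to make the individual steps visible.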
Now, how do you know if you should correct for heteroscedasticity? There are a number of tests for heteroscedasticity, so it seems natural to conduct a test, then use a correction if the test suggests heteroscedasticity. The trouble with this approach is that the tests often fail to detect heteroscedasticity, leading you to neglect the correction when it is actually needed. In simulations, Long and Ervin (2000) found that this possibility was quite serious. As a result, they recommended that "a test for heteroscedasticity should not be used to determine whether [an HC estimator] should be used." It is better to use an HC estimator whenever heteroscedasticity is suspected.
Greene, W.H. 1997. Econometric Analysis (3rd edition). New York: Prentice-Hall.
Hayes, A.F. and L. Cai. In review. "Heteroscedasticity-robust moderated multiple regression using heteroscedasticity-consistent standard error estimates." Manuscript submitted for publication.
Huber, P.J. 1967. "The behavior of maximum likelihood estimates under
non-standard conditions." Proceedings of the Fifth Berkeley Symposium on
Mathematical Statistics and Probability 1: 221-233.
Long, J.S. and L.H. Ervin. 2000. "Using heteroscedasticity consistent standard errors in the linear regression model." The American Statistician 54: 217-224.
MacKinnon, J.G. and H. White. 1985. "Some heteroskedasticity consistent covariance matrix estimators with improved finite sample properties." Journal of Econometrics 29: 305-325.
White, H. 1980. "A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity." Econometrica 48: 817-838.