Variables in a regression can be endogenous for several reasons, including omitted variable bias, measurement error, and simultaneity (reverse causation). One example from the previous post was unobserved ability in the determination of wages. Since unobserved ability is omitted from a regression of wages on education, it is possible that the return to education is overestimated.
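To see why omitting ability can inflate the estimated return, here is a minimal simulation sketch in Python. Every number in it (the true 6% return, the ability effects, the noise levels) is an assumption made up for illustration, not an estimate from the data used in this post.

```python
import random

rng = random.Random(0)
n = 20000
TRUE_RETURN = 0.06  # assumed true effect of one year of schooling on log wage

educ, lwage = [], []
for _ in range(n):
    ability = rng.gauss(0, 1)                  # unobserved by the analyst
    e = 12 + 1.5 * ability + rng.gauss(0, 2)   # ability raises schooling...
    w = 1.0 + TRUE_RETURN * e + 0.2 * ability + rng.gauss(0, 0.4)  # ...and wages
    educ.append(e)
    lwage.append(w)

def slope(x, y):
    """Slope from a simple regression of y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Ability is left out of the regression, so educ picks up part of its effect.
ols_return = slope(educ, lwage)
print(f"assumed true return: {TRUE_RETURN:.3f}, OLS estimate: {ols_return:.3f}")
```

Under these made-up parameters the OLS slope converges to about 0.06 + 0.2 × 1.5 / 6.25 ≈ 0.108, so the naive regression overstates the assumed 6% return.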

The Hausman test for endogeneity can help us determine whether there is some form of omitted variable bias in this regression.

Since there is a suspicion that education (educ) suffers from omitted variable bias in the form of unobserved ability, we choose father's and mother's education as instrumental variables. Parents' education is unlikely to affect the wages of their children directly, but your parents' education is a good predictor of your own education and of the genetic transmission of intellectual ability. This is why they may be good instrumental variables. We can test the assumption that father's and mother's education are strong instruments by running a reduced form regression, with educ as the dependent variable and all exogenous variables (the instruments and the other explanatory variables) on the right-hand side.
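The instrument-strength check described above can be sketched in plain Python. This is a minimal illustration on simulated data, not a reproduction of the post's regression: the variable `exper`, the coefficients, and the helper functions `ols` and `ssr` are all assumptions introduced for the example. It computes the joint F statistic for the two instruments by comparing the reduced form with and without them.

```python
import random

def ols(X, y):
    """OLS coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on X."""
    b = ols(X, y)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(X, y))

# Simulated data with made-up parameters (hypothetical, for illustration only).
rng = random.Random(1)
n = 2000
rows = []
for _ in range(n):
    fatheduc, motheduc = rng.gauss(10, 3), rng.gauss(10, 3)
    exper = rng.uniform(0, 30)      # an exogenous control
    ability = rng.gauss(0, 1)       # unobserved
    educ_i = 6 + 0.25 * fatheduc + 0.25 * motheduc + ability + rng.gauss(0, 1.5)
    rows.append((fatheduc, motheduc, exper, educ_i))

educ = [r[3] for r in rows]
X_unrestricted = [[1.0, r[2], r[0], r[1]] for r in rows]  # const, exper, instruments
X_restricted = [[1.0, r[2]] for r in rows]                # instruments dropped

ssr_u, ssr_r = ssr(X_unrestricted, educ), ssr(X_restricted, educ)
q, k = 2, 4  # restrictions tested, parameters in the unrestricted model
f_stat = ((ssr_r - ssr_u) / q) / (ssr_u / (n - k))
print(f"first-stage F for the instruments: {f_stat:.1f}")
```

A common rule of thumb treats a first-stage F above roughly 10 as a sign that the instruments are not weak; the simulated instruments here are strong by construction.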

The F-test from the reduced form regression shows that father's and mother's education are in fact jointly statistically significant in determining their offspring's educational attainment. The next step is to take the residuals of the reduced form equation and plug those residuals back into the structural equation. The structural equation is the original relationship that we care about. Testing the statistical significance of the coefficient on the residuals in the structural equation is the Hausman test.
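In equation form, the procedure can be sketched as follows. The notation is an assumption for illustration rather than something taken from the original regression output: $y$ is the wage outcome, $\mathbf{x}$ collects the other exogenous explanatory variables, and $\hat{v}$ is the residual from the reduced form.

$$
\begin{aligned}
\text{structural:} \quad & y = \beta_0 + \beta_1\,\text{educ} + \mathbf{x}'\boldsymbol{\gamma} + u \\
\text{reduced form:} \quad & \text{educ} = \pi_0 + \pi_1\,\text{fatheduc} + \pi_2\,\text{motheduc} + \mathbf{x}'\boldsymbol{\delta} + v \\
\text{augmented:} \quad & y = \beta_0 + \beta_1\,\text{educ} + \mathbf{x}'\boldsymbol{\gamma} + \rho\,\hat{v} + e
\end{aligned}
$$

The test is of $H_0: \rho = 0$. If $u$ and $v$ are uncorrelated, educ is exogenous and $\hat{v}$ adds nothing to the structural equation.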

The null hypothesis is that the coefficient on ‘resid’ is zero and that education is therefore exogenous. This hypothesis can be rejected at the 10% level, but not at the 5% level. This is a borderline case, but for the sake of completeness we will use the 10% significance level to reject the null hypothesis that the coefficient on ‘resid’ is zero and thus that education is exogenous. In other words, there is evidence that education is endogenous.
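Here is a hedged sketch of the residual-based test in plain Python, run on simulated data rather than the data from this post. The coefficients, the absence of other controls, and the helper functions `ols` and `ssr` are assumptions for illustration. With a single restriction, the F statistic computed below equals the squared t statistic on the residual term, and it is compared with the asymptotic chi-square(1) critical values 2.706 (10%) and 3.841 (5%).

```python
import random

def ols(X, y):
    """OLS coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on X."""
    b = ols(X, y)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(X, y))

# Simulated data: ability is unobserved and drives both educ and lwage.
rng = random.Random(2)
n = 3000
fatheduc, motheduc, educ, lwage = [], [], [], []
for _ in range(n):
    f, m, a = rng.gauss(10, 3), rng.gauss(10, 3), rng.gauss(0, 1)
    e = 6 + 0.25 * f + 0.25 * m + a + rng.gauss(0, 1.5)
    w = 0.5 + 0.06 * e + 0.3 * a + rng.gauss(0, 0.4)
    fatheduc.append(f); motheduc.append(m); educ.append(e); lwage.append(w)

# Step 1: reduced form, educ on the instruments; keep the residuals.
Z = [[1.0, f, m] for f, m in zip(fatheduc, motheduc)]
pi = ols(Z, educ)
resid = [e - sum(p * z for p, z in zip(pi, row)) for row, e in zip(Z, educ)]

# Step 2: structural equation with and without the first-stage residual.
X_restricted = [[1.0, e] for e in educ]
X_augmented = [[1.0, e, v] for e, v in zip(educ, resid)]
ssr_r, ssr_u = ssr(X_restricted, lwage), ssr(X_augmented, lwage)

f_stat = (ssr_r - ssr_u) / (ssr_u / (n - 3))  # one restriction: coef on resid = 0
reject_10 = f_stat > 2.706
reject_5 = f_stat > 3.841
print(f"F = {f_stat:.2f}, reject at 10%: {reject_10}, reject at 5%: {reject_5}")
```

Because endogeneity is built into the simulation, the test rejects comfortably here; in real data the statistic can land between the two critical values, as it did in this post.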

We have selected what we believe to be good instruments: 1) parents' education is related to their offspring's education, and 2) parents' education is unlikely to be related to their offspring's wages. Since we rejected the null hypothesis that the coefficient on ‘resid’ is zero at the 10% level in the previous regression, the next step is to estimate the model using parents' education as instruments, for the people in the sample who are earning wages.
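The IV estimate can be sketched as two-stage least squares done by hand. This is an illustration on simulated data with an assumed true return of 6%, not the post's actual regression, and the helper `ols` is introduced only for the example. Note that the second-stage standard errors from a manual two-step like this are not valid, so only the coefficients are compared.

```python
import random

def ols(X, y):
    """OLS coefficients via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Simulated data with made-up parameters; ability is the omitted variable.
rng = random.Random(3)
n = 5000
TRUE_RETURN = 0.06
fatheduc, motheduc, educ, lwage = [], [], [], []
for _ in range(n):
    f, m, a = rng.gauss(10, 3), rng.gauss(10, 3), rng.gauss(0, 1)
    e = 6 + 0.25 * f + 0.25 * m + a + rng.gauss(0, 1.5)
    w = 0.5 + TRUE_RETURN * e + 0.3 * a + rng.gauss(0, 0.4)
    fatheduc.append(f); motheduc.append(m); educ.append(e); lwage.append(w)

# First stage: predict educ from the instruments.
Z = [[1.0, f, m] for f, m in zip(fatheduc, motheduc)]
pi = ols(Z, educ)
educ_hat = [sum(p * z for p, z in zip(pi, row)) for row in Z]

# Second stage: regress lwage on the predicted educ.
iv_return = ols([[1.0, eh] for eh in educ_hat], lwage)[1]

# Naive OLS for comparison.
ols_return = ols([[1.0, e] for e in educ], lwage)[1]

print(f"OLS return: {ols_return:.3f}, IV return: {iv_return:.3f}")
```

In this simulation the IV estimate sits close to the assumed 6% while OLS is pushed upward by the omitted ability term, the same direction as the gap between the 11% and 6% estimates reported in this post.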

**Concluding Remarks:** The Hausman test is used to determine whether one of the explanatory variables in a regression suffers from endogeneity (omitted variable bias, measurement error, or reverse causality). Here the Hausman test found evidence of such endogeneity, which we attribute to omitted variable bias.

**The correct regression to run is the instrumental variable regression if you reject the null hypothesis at the 10% level, as we did. Running the IV regression, one finds that each year of education increases wages by 6%.**

**If one believes that the 10% level is too generous and decides to use the 5% significance level instead, we would not reject the null hypothesis that the coefficient on ‘resid’ is zero, and thus we would not reject the hypothesis that education is exogenous. This would lead us to use the original OLS estimate of an 11% yearly return to education.**

This is so helpful. I like how you explicitly state that if you fail to reject the null, it’s really saying that you don’t need IV… and that THAT is the Hausman test.

How is this Hausman test equivalent to the derivation in the Durbin Wu Hausman paper?

Noted. Thank you.

Was the data used in this model from the Mroz 1987 paper ?

Yes, that is the data.

Cheers.

After the IV regression is run and the educ variable is now statistically insignificant, what does that actually mean? Does it mean that we have now resolved the problem of endogeneity and can treat the regression normally, using motheduc and fatheduc as instruments for educ?

Please ignore my last comment. I’ve put some comments in the concluding remarks that should help.

The Hausman test here was borderline. We can reject the null hypothesis that the coefficient on resid is zero at the 10% significance level, but not at the 5% significance level. If we reject the null hypothesis (i.e. use the 10% level), we are essentially rejecting the null hypothesis that education is exogenous. Therefore, there is evidence that education is endogenous.

We would use instrumental variables to fix the endogenous education variable. Instrumental variables increase standard errors; that is one of their costs, but it’s a cost we are willing to pay if endogeneity is ruining our estimates. This may have hurt the statistical significance of our return to education, but only slightly.

Thanks for the article. I am using count data and I suspect social capital is endogenous. Can I use OLS in the Hausman test, considering the dependent variable is a count (density of adopted technologies)? Thanks

Thanks, it was very helpful for me, but I have just one question: if we have 2 endogenous variables in the same model, what should we do? Thanks again

Hi. What if the Hausman test shows that the IV is fine, but logically the instruments don’t make sense since they might be related to the DV directly? Should I just blindly trust the test or follow my logic? Are there doubts concerning the results of the Hausman test, as in one cannot always rely on it? A reference would be appreciated.

very helpful, includes the command too! thanks

Hi,

Thanks for this helpful post. I am attempting to carry out the steps of this Hausman test, but my data is panel data and has dummy variables.

I was wondering if I should run panel regression for the reduced form equation? When I attempt it STATA does not let me save the residuals, so I cannot do the next stage, i.e. run the structural equation with the residuals?

Is this just something I need to do differently in STATA or do I do the reduced form equation in OLS ?

Also, concerning the dummy variables, do I include them in the reduced form equation and the second equation?

Thanks