I Just Ran Two Million Regressions

Xavier X. Sala-i-Martin, 1997

Lukas Hager

5/3/2021

What is the Paper's Main Question?

What are the "true" variables that belong in a complex regression?
For this paper: what variables are really correlated with growth?
Levine and Renelt (1992) run an extreme bounds test to identify robust coefficients
Is this test too strict?

What Data Does the Paper Use?

World Bank Data on growth since 1960
62 variables in total, 59 of which are tested
- Variables had to be measured as close to 1960 as possible to minimize endogeneity
Examples of regressors:
- "Fraction Protestant"
- "Spanish colony"
- "War dummy"
Outcome: growth

What are the Methods and Models Used for Analysis?

Three groups of variables
- 3 variables that are always included (level of income in 1960, life expectancy in 1960, primary school enrollment rate in 1960): $\boldsymbol{X}$
- Relevant variable that we want to assess: $x$
- Three "control" variables taken from the pool of remaining variables (every combination of three is tried): $\boldsymbol{Z}$
So each regression is of the form $Y = \boldsymbol{X}\beta + x\gamma + \boldsymbol{Z}\zeta + \varepsilon$
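Leamer's extreme bounds test: across all regressions, the lower extreme bound is the lowest value of $\hat{\gamma} - 2\hat{\sigma}_{\gamma}$ and the upper extreme bound is the highest value of $\hat{\gamma} + 2\hat{\sigma}_{\gamma}$; if one bound is positive and the other negative, the coefficient is not robust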
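
The extreme bounds procedure can be sketched in a few lines. This is a minimal illustration on synthetic data, not the paper's code: the helper names (`ols`, `extreme_bounds`), the sample size, and the pool of six candidate controls are all assumptions made for the example. It also assumes that every three-variable combination from the pool is tried, which with 58 remaining variables would give 30,856 regressions per tested variable.

```python
import itertools
import math

import numpy as np


def ols(y, design):
    """Return OLS coefficients and their standard errors."""
    n, k = design.shape
    xtx_inv = np.linalg.inv(design.T @ design)
    beta = xtx_inv @ design.T @ y
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - k)  # residual variance
    se = np.sqrt(np.diag(sigma2 * xtx_inv))
    return beta, se


def extreme_bounds(y, fixed, x, pool, n_controls=3):
    """Regress y on [1, fixed, x, Z] for every combination Z of controls.

    Returns the lowest gamma - 2*se, the highest gamma + 2*se,
    a robustness flag, and the number of regressions run.
    """
    n = len(y)
    j = 1 + fixed.shape[1]  # column index of x (gamma) in the design
    lower, upper, n_models = np.inf, -np.inf, 0
    for combo in itertools.combinations(range(pool.shape[1]), n_controls):
        design = np.column_stack([np.ones(n), fixed, x, pool[:, list(combo)]])
        beta, se = ols(y, design)
        lower = min(lower, beta[j] - 2 * se[j])
        upper = max(upper, beta[j] + 2 * se[j])
        n_models += 1
    robust = bool(lower > 0 or upper < 0)  # both bounds share a sign
    return lower, upper, robust, n_models


# Synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
n = 100
fixed = rng.normal(size=(n, 3))  # the three always-included variables X
x = rng.normal(size=n)           # the variable of interest
pool = rng.normal(size=(n, 6))   # candidate controls Z are drawn from here
y = 1.0 + fixed @ np.array([0.5, -0.3, 0.2]) + 0.8 * x + rng.normal(scale=0.5, size=n)

lower, upper, robust, n_models = extreme_bounds(y, fixed, x, pool)

# Regressions per tested variable if 58 variables remain in the pool.
regressions_per_variable = math.comb(58, 3)
```

Because a single regression with a wrong-signed bound is enough to fail, the test gets harder to pass as the pool of controls grows.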

What are the Methods and Models Used for Analysis?

For each of $M$ models, get the integrated likelihood $L_j$ and create weights (better-fitting regressions get more weight): $\omega_j = \frac{L_j}{\sum_{i=1}^M L_i}$
Compute an estimate of the mean by combining all $M$ models weighted by $\omega_j$: $\hat{\gamma} = \sum_{j=1}^M \omega_j \hat{\gamma}_j$
Do the same for the variances: $\hat{\sigma}_{\gamma}^2 = \sum_{j=1}^M \omega_j \hat{\sigma}_j^2$
If the distribution of the estimates across models is assumed to be normal, $\hat{\gamma}$ and $\hat{\sigma}_{\gamma}^2$ are sufficient to compute the CDF at zero, CDF(0)
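
A small sketch of the weighting step under the normality assumption. The function names and the three example models are hypothetical. Working from log-likelihoods instead of raw likelihoods is a numerical-stability choice made here, and reporting the larger of the two masses on either side of zero is a convention assumed for this sketch.

```python
import math


def normal_cdf(x, mean, var):
    """CDF of a normal distribution with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def likelihood_weights(log_likelihoods):
    """omega_j = L_j / sum_i L_i, computed from log-likelihoods."""
    top = max(log_likelihoods)  # subtract the max to avoid underflow
    scaled = [math.exp(ll - top) for ll in log_likelihoods]
    total = sum(scaled)
    return [s / total for s in scaled]


def pooled_normal_cdf0(gammas, variances, log_likelihoods):
    """Weighted mean and variance, then the normal mass on one side of zero."""
    w = likelihood_weights(log_likelihoods)
    gamma_hat = sum(wj * g for wj, g in zip(w, gammas))
    var_hat = sum(wj * v for wj, v in zip(w, variances))
    below = normal_cdf(0.0, gamma_hat, var_hat)
    return gamma_hat, var_hat, max(below, 1.0 - below)


# Three hypothetical models: estimate, variance, log-likelihood.
gammas = [0.5, 0.7, 0.6]
variances = [0.04, 0.09, 0.01]
log_likelihoods = [-10.0, -12.0, -9.0]

weights = likelihood_weights(log_likelihoods)
gamma_hat, var_hat, cdf0 = pooled_normal_cdf0(gammas, variances, log_likelihoods)
```

The third model fits best, so it receives the largest weight and pulls the pooled estimate toward 0.6.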

What are the Methods and Models Used for Analysis?

If we don't assume normality, compute the individual CDF(0) for each model and weight them as before: $\Phi_{\gamma}(0) = \sum_{j=1}^M \omega_j \Phi(0 \mid \hat{\gamma}_j, \hat{\sigma}_j^2)$
Take this weighted CDF for each variable and assess whether "enough" mass lies to one side of zero to consider the variable robust
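
The non-normal version can be sketched the same way: evaluate each model's own normal CDF at zero and average with the weights. The names below are hypothetical, and the 0.95 cutoff is an assumption for the example, since the slide only says "enough" mass.

```python
import math


def normal_cdf(x, mean, var):
    """CDF of a normal distribution with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


def weighted_cdf0(gammas, variances, weights):
    """Weighted average of per-model CDF(0); returns the larger side's mass."""
    below = sum(
        w * normal_cdf(0.0, g, v)
        for w, g, v in zip(weights, gammas, variances)
    )
    return max(below, 1.0 - below)


def is_robust(mass, threshold=0.95):
    """Assumed cutoff: call a variable robust if mass >= threshold."""
    return mass >= threshold


# Two precisely estimated models that disagree on the sign.
mixed = weighted_cdf0([1.0, -1.0], [0.01, 0.01], [0.5, 0.5])

# Two models that agree on the sign.
agree = weighted_cdf0([0.5, 0.6], [0.01, 0.01], [0.5, 0.5])
```

In the first case the mass is split evenly across zero, so the variable is not robust even though each individual estimate is tightly estimated. A pooled normal fit to the same two models would hide that disagreement behind a mean of zero.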

What are the Main Findings?

Unsurprisingly, the weaker test has more variables that pass it
Original test applied to the 59 variables yields one robust variable
Using the revised test, 22 of the 59 variables are robust (significant)

What are Some Limitations of the Paper?

If none of the models is the "true" regression model, the likelihood weights are not correct
Potential technological limitations: adding more variables was not feasible with the computing power available in 1997