
I Just Ran Two Million Regressions

Xavier X. Sala-i-Martin, 1997

Lukas Hager

5/3/2021

1 / 17

What is the Paper's Main Question?

  • What are the "true" variables that belong in a complex regression?

  • For this paper: what variables are really correlated with growth?

  • Levine and Renelt (1992) run extreme bounds test to identify robust coefficients

  • Is this test too strict?

5 / 17

What Data Does the Paper Use?

  • World Bank Data on growth since 1960

  • 62 variables in total, 59 of which are tested

    • Variables had to be measured as close to 1960 as possible to minimize endogeneity
  • Examples of regressors:

    • "Fraction Protestant"
    • "Spanish colony"
    • "War dummy"
  • Outcome: growth

9 / 17

What are the Methods and Models Used for Analysis?

  • Three groups of variables

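    • 3 variables that are always included (level of income in 1960, life expectancy in 1960, primary school enrollment rate in 1960): $\boldsymbol{X}$

    • The variable of interest whose robustness we want to assess: $x$

    • Three "control" variables taken from the remaining pool of candidate variables, with every combination of three tried in turn: $\boldsymbol{Z}$

  • So each regression is of the form $Y = \boldsymbol{X}\beta + x\gamma + \boldsymbol{Z}\zeta + \varepsilon$

  • Leamer's extreme bounds test: for each regression, compute $\hat{\gamma} \pm 2\hat{\sigma}_{\gamma}$; the lower extreme bound is the smallest $\hat{\gamma} - 2\hat{\sigma}_{\gamma}$ and the upper extreme bound is the largest $\hat{\gamma} + 2\hat{\sigma}_{\gamma}$ across all regressions. If one bound is positive and the other negative, the coefficient is not robust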
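
A minimal Python sketch of the extreme bounds procedure, for illustration only. It assumes synthetic data, a small hypothetical control pool of 6 variables, and that every three-variable subset of the pool is used; the helper names `ols` and `extreme_bounds` are made up for this example and are not from the paper. The last lines count the regressions under the assumption that each of the 59 tested variables is paired with every set of three drawn from the other 58.

```python
import itertools
import math

import numpy as np


def ols(y, design):
    """Return OLS coefficients and their standard errors."""
    beta, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    dof = design.shape[0] - design.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return beta, np.sqrt(np.diag(cov))


def extreme_bounds(y, X, x, pool, k=3):
    """Regress y on [1, X, x, Z] for every k-subset Z of pool.

    Returns the smallest gamma - 2*se and the largest gamma + 2*se.
    """
    const = np.ones((len(y), 1))
    gamma_idx = 1 + X.shape[1]  # column of x in the design matrix
    lows, highs = [], []
    for cols in itertools.combinations(range(pool.shape[1]), k):
        design = np.hstack([const, X, x[:, None], pool[:, list(cols)]])
        beta, se = ols(y, design)
        lows.append(beta[gamma_idx] - 2 * se[gamma_idx])
        highs.append(beta[gamma_idx] + 2 * se[gamma_idx])
    return min(lows), max(highs)


# Synthetic data (hypothetical): the true coefficient on x is 1.0.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))      # always-included variables
x = rng.normal(size=n)           # variable of interest
pool = rng.normal(size=(n, 6))   # candidate controls
y = X @ np.array([0.5, -0.3, 0.2]) + 1.0 * x + rng.normal(scale=0.5, size=n)

lower, upper = extreme_bounds(y, X, x, pool)
robust = lower > 0 or upper < 0  # both bounds on the same side of zero

# Regression count, assuming 59 tested variables and 58 remaining controls.
n_regressions = math.comb(58, 3) * 59
print(lower, upper, robust, n_regressions)
```

Under that counting assumption, `math.comb(58, 3)` gives 30,856 regressions per tested variable, or roughly 1.8 million in total, which is in line with the paper's title.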

13 / 17

What are the Methods and Models Used for Analysis?

  • For each of the $M$ models, compute the integrated likelihood $L_j$ and form weights that favor better-fitting regressions: $\omega_j = \frac{L_j}{\sum_{i=1}^M L_i}$
  • Estimate the mean of $\gamma$ by combining all $M$ models with these weights: $\hat{\gamma} = \sum_{j=1}^M \omega_j \hat{\gamma}_j$
  • Do the same for the variances: $\hat{\sigma}_{\gamma}^2 = \sum_{j=1}^M \omega_j \hat{\sigma}_j^2$
  • If the distribution of the estimates across models is assumed to be normal, $\hat{\gamma}$ and $\hat{\sigma}_{\gamma}^2$ are sufficient to compute CDF(0), the share of the distribution lying on one side of zero
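
The weighting steps above can be sketched in Python as follows. The per-model estimates, variances, and log-likelihoods are made-up numbers, and the helper names `normal_cdf` and `likelihood_weights` are hypothetical. The sketch takes log-likelihoods as input and subtracts the maximum before exponentiating, which gives the same weights as normalizing raw likelihoods while avoiding numerical underflow.

```python
import math

import numpy as np


def normal_cdf(value, mean, var):
    """CDF of a normal distribution with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((value - mean) / math.sqrt(2.0 * var)))


def likelihood_weights(log_likelihoods):
    """Weights proportional to each model's likelihood, summing to one."""
    ll = np.asarray(log_likelihoods, dtype=float)
    w = np.exp(ll - ll.max())  # shift by the max for numerical stability
    return w / w.sum()


# Hypothetical estimates from M = 3 regressions.
gammas = np.array([0.8, 1.0, 1.2])        # coefficient estimates
variances = np.array([0.04, 0.09, 0.16])  # squared standard errors
log_liks = np.array([-10.0, -10.0, -10.0])

w = likelihood_weights(log_liks)
gamma_hat = float(w @ gammas)    # weighted mean of the estimates
var_hat = float(w @ variances)   # weighted mean of the variances

# Under normality, the mean and variance pin down CDF(0).
cdf0 = normal_cdf(0.0, gamma_hat, var_hat)
mass_one_side = max(cdf0, 1.0 - cdf0)
print(gamma_hat, var_hat, mass_one_side)
```

With equal log-likelihoods the weights are all 1/3, so the combined estimate is just the simple average of the three coefficients.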
14 / 17

What are the Methods and Models Used for Analysis?

  • If we don't assume normality, compute CDF(0) for each model individually and weight them as before: $\Phi_{\gamma}(0) = \sum_{j=1}^M \omega_j \Phi(0 \mid \hat{\gamma}_j, \hat{\sigma}_j^2)$
  • Compute this aggregate CDF(0) for every variable and assess whether "enough" mass lies to one side of zero to consider the variable robust
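
A short Python sketch of the non-normal variant, again with made-up inputs. Each model contributes its own normal CDF evaluated at zero, and the contributions are combined with the model weights. The 0.95 cutoff for "enough" mass and the helper name `aggregate_cdf0` are assumptions of this example.

```python
import math

import numpy as np


def normal_cdf(value, mean, var):
    """CDF of a normal distribution with the given mean and variance."""
    return 0.5 * (1.0 + math.erf((value - mean) / math.sqrt(2.0 * var)))


def aggregate_cdf0(gammas, variances, weights):
    """Weighted average of the per-model CDFs evaluated at zero."""
    cdfs = np.array([normal_cdf(0.0, g, v) for g, v in zip(gammas, variances)])
    return float(np.dot(weights, cdfs))


# Hypothetical estimates: the second model puts the coefficient below zero.
gammas = [0.8, -0.1, 1.2]
variances = [0.04, 0.09, 0.16]
weights = np.array([0.5, 0.2, 0.3])  # assumed to sum to one already

cdf0 = aggregate_cdf0(gammas, variances, weights)
mass_one_side = max(cdf0, 1.0 - cdf0)
robust = mass_one_side >= 0.95  # assumed threshold for "enough" mass
print(cdf0, mass_one_side, robust)
```

In this example the one model with a negative estimate pulls the larger one-sided mass down to about 0.87, so the variable falls short of the assumed cutoff even though two of the three models place it clearly above zero.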
15 / 17

What are the Main Findings?

  • Unsurprisingly, the weaker test has more variables that pass it
    • Original test applied to the 59 variables yields one robust variable
    • Using the revised test, 22 of the 59 variables are robust (significant)
16 / 17

What are Some Limitations of the Paper?

  • If models are not the "true" regression models, the likelihood weights are not correct
  • Potential technological limitations; adding more variables was not feasible with 1997 computing power
17 / 17
