class: center, middle, inverse, title-slide

# I Just Ran Two Million Regressions
## Xavier X. Sala-I-Martin, 1997
### Lukas Hager
### 5/3/2021

---

### What is the Paper's Main Question?

- What are the "true" variables that belong in a complex regression?

--

- For this paper: which variables are really correlated with growth?

--

- Levine and Renelt (1992) run an *extreme bounds test* to identify robust coefficients

--

- Is this test too strict?

---

### What Data Does the Paper Use?

- World Bank data on growth since 1960

--

- 62 variables in total, 59 of which are tested
- Variables had to be measured close to 1960 to minimize endogeneity

--

- Examples of regressors:
  - "Fraction Protestant"
  - "Spanish colony"
  - "War dummy"

--

- Outcome: growth

---

### What are the Methods and Models Used for Analysis?

- Three groups of variables
  - 3 variables that are always included (level of income in 1960, life expectancy in 1960, primary school enrollment rate in 1960): `$$\boldsymbol{X}$$`

--

  - Relevant variable that we want to assess: `$$x$$`

--

  - Three "control" variables drawn from the remaining pool: `$$\boldsymbol{Z}$$`
- So each regression is of the form
`$$Y = \boldsymbol{X}\beta + x\gamma + \boldsymbol{Z}\zeta + \varepsilon$$`

--

- **Leamer's Extreme Bound Test**: Take the lowest value of \\(\gamma - 2\sigma_{\gamma}\\) and the highest value of \\(\gamma + 2\sigma_{\gamma}\\) across all regressions, where \\(\sigma_{\gamma}\\) is the standard error of \\(\gamma\\); if one extreme bound is positive and the other negative, the coefficient is not robust

---

### What are the Methods and Models Used for Analysis?

- For each of the \\(M\\) models, get the integrated likelihood \\(L_j\\) and create weights (better-fitting regressions get more weight)
`$$\omega_j = \frac{L_j}{\sum_{i=1}^M L_i}$$`
- Compute an estimate of the mean by combining all \\(M\\) models weighted by \\(\omega_j\\):
`$$\hat{\gamma} = \sum_{j=1}^M \omega_j\gamma_j$$`
- Do the same for the variances:
`$$\hat{\sigma}_{\gamma}^2 = \sum_{j=1}^M \omega_j\sigma_j^2$$`
- If the distribution of the estimates across models is assumed to be normal, \\(\hat{\gamma}\\) and \\(\hat{\sigma}_{\gamma}^2\\) are sufficient to compute CDF(0)

---

### What are the Methods and Models Used for Analysis?

- If we don't assume normality, compute the individual CDF(0) for each model and weight them as before:
`$$\Phi_{\gamma}(0) = \sum_{j=1}^M \omega_j \Phi(0 \mid \gamma_j, \sigma_j^2)$$`
- Do this for every variable and assess whether "enough" mass lies to one side of zero to consider the variable robust

---

### What are the Main Findings?

- Unsurprisingly, more variables pass the weaker test
- The original extreme bounds test applied to the 59 variables yields **one** robust variable
- Using the revised test, 22 of the 59 variables are robust (significant)

---

### What are Some Limitations of the Paper?

- If none of the models is the "true" regression model, the likelihood weights are not correct
- Potential technological limitations: adding more variables was not feasible with the computing power available at the time
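
---

### Sketch: The Two Tests in Code

A minimal sketch of the steps from the methods slides, not the paper's own code. It assumes the per-model estimates \\(\gamma_j\\), variances \\(\sigma_j^2\\), and likelihoods \\(L_j\\) have already been computed. The function name `robustness_summary`, the dictionary keys, and the three example values are hypothetical and only illustrate the arithmetic.

```python
from math import erf, sqrt


def normal_cdf(x, mean, var):
    """CDF of a normal distribution N(mean, var) evaluated at x."""
    return 0.5 * (1.0 + erf((x - mean) / sqrt(2.0 * var)))


def robustness_summary(gammas, variances, likelihoods):
    """Combine M per-model estimates of gamma (hypothetical helper)."""
    # Likelihood weights: omega_j = L_j / sum_i L_i
    total = sum(likelihoods)
    weights = [L / total for L in likelihoods]

    # Weighted mean and weighted variance of the estimates
    gamma_hat = sum(w * g for w, g in zip(weights, gammas))
    var_hat = sum(w * v for w, v in zip(weights, variances))

    # CDF(0) if the estimates are assumed normal across models
    cdf0_normal = normal_cdf(0.0, gamma_hat, var_hat)

    # CDF(0) without that assumption: weight each model's own CDF(0)
    cdf0_mixture = sum(
        w * normal_cdf(0.0, g, v)
        for w, g, v in zip(weights, gammas, variances)
    )

    # Extreme bounds: lowest gamma - 2*sd and highest gamma + 2*sd
    lower = min(g - 2.0 * sqrt(v) for g, v in zip(gammas, variances))
    upper = max(g + 2.0 * sqrt(v) for g, v in zip(gammas, variances))

    return {
        "weights": weights,
        "gamma_hat": gamma_hat,
        "var_hat": var_hat,
        "cdf0_normal": cdf0_normal,
        "cdf0_mixture": cdf0_mixture,
        "lower_bound": lower,
        "upper_bound": upper,
        # Not robust when the two bounds straddle zero
        "passes_extreme_bounds": not (lower < 0.0 < upper),
    }


if __name__ == "__main__":
    # Made-up numbers for three models, for illustration only
    result = robustness_summary(
        gammas=[0.8, 1.2, -0.1],
        variances=[0.16, 0.25, 0.09],
        likelihoods=[2.0, 1.0, 1.0],
    )
    for key, value in result.items():
        print(key, value)
```

The mass on the larger side of zero is `max(cdf0, 1 - cdf0)`; how much counts as "enough" is left to the reader, since the slides do not fix a threshold.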