Problem Set 2

Author

Lukas Hager

Published

April 9, 2024

This problem set must be submitted on Canvas by 11:59 PM PST on April 10, 2024.

As always, you may write helper functions that these functions call, as long as the .py file that you submit contains the four functions specified below (github, simulate_data, estimate_mle, and estimate_ols).

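Be careful with scipy.optimize.minimize! Remember that it can be finicky: the result can depend on the starting values and the method you choose, so check that the optimizer reports convergence before trusting its output.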
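As a minimal sketch of what "being careful" might look like, the snippet below minimizes a toy function that is unrelated to the exercises. The objective `toy_objective`, the starting point, and the choice of `method="Nelder-Mead"` are all assumptions made for illustration, not requirements of this problem set.

```python
import numpy as np
from scipy.optimize import minimize


def toy_objective(theta: np.ndarray) -> float:
    """Convex bowl with its minimum at (1, -2); a hypothetical example."""
    return float((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)


# Start from an arbitrary point and name the solver explicitly.
result = minimize(toy_objective, x0=np.zeros(2), method="Nelder-Mead")

# Inspect the result object before trusting result.x.
print(result.success, result.message)
print(result.x)
```

If `result.success` is False, or the answer changes noticeably when you change `x0` or `method`, that is a sign the optimizer has not converged reliably.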

Exercise 0

Please write a function that takes no arguments and returns a link to your solutions on GitHub.

Use the following shell:

def github() -> str:
    """
    Some docstrings.
    """

    return "https://github.com/<user>/<repo>/blob/main/<filename.py>"

Exercise 1

Please write a function that returns 1000 simulated observations via the following data generating process:

\[ y_i = 5 + 3x_{i1} + 2x_{i2} + 6x_{i3} + \varepsilon_i \]

Here, \(x_{i1},x_{i2},x_{i3}\sim \mathcal{N}(0,2)\) and \(\varepsilon_i\sim\mathcal{N}(0,1)\).¹

In particular, your function should take one argument, seed, an integer that is used to set a seed (this should default to 481 if not provided), and should return a tuple of two elements, (y,X) where y is a \(1000\times 1\) np.array and X is a \(1000\times 3\) np.array.
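Because the second argument of \(\mathcal{N}(\cdot,\cdot)\) here is a variance while NumPy's normal sampler is parameterized by a standard deviation, the conversion is easy to get wrong. The sketch below illustrates it; the use of `np.random.default_rng`, the seed value 0, and the sample size are assumptions chosen for illustration and are not the required interface.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed 0 is arbitrary, for illustration only

variance = 2.0
# The `scale` argument is the standard deviation, so take the square root.
draws = rng.normal(loc=0.0, scale=np.sqrt(variance), size=(100_000, 3))

print(draws.shape)
print(draws.var(axis=0))  # each column's sample variance should be near 2
```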

Use the following shell:

def simulate_data(seed: int = 481) -> tuple:
    """
    Some docstrings.
    """

    return None

Exercise 2

Write a function that estimates the MLE parameters \(\hat{\beta}_{MLE}\) for data simulated as above, where the assumed model is

\[ y_i = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + \beta_3x_{i3} + \varepsilon_i \]

where \(\varepsilon_i\sim \mathcal{N}(0,1)\) (note that this is a very strong assumption – in reality, you would also estimate the variance of the error term \(\hat{\sigma}^2\))².
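As a hedged sketch of where this assumption leads: if the observations are independent (an assumption made here for the derivation) and \(n\) denotes the number of observations, each error has the standard normal density, so the log-likelihood is

\[ \ell(\beta) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1x_{i1} - \beta_2x_{i2} - \beta_3x_{i3}\right)^2 \]

Since scipy.optimize.minimize minimizes its objective, one would typically pass it \(-\ell(\beta)\) rather than \(\ell(\beta)\).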

In particular, your function should take as arguments a \(1000\times 1\) np.array y and a \(1000 \times 3\) np.array X and return a \(4 \times 1\) np.array with the coefficients \(\beta_0, \beta_1, \beta_2, \beta_3\) (in that order).
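The shapes involved can be sketched on a small stand-in array. Everything below is hypothetical: the 5-row `X`, the placeholder coefficient values, and the names `X_with_const`, `flat_beta`, and `beta_column` are for illustration only and do not come from the problem set.

```python
import numpy as np

# Hypothetical stand-in for the simulated regressors: 5 rows instead of 1000.
X = np.arange(15, dtype=float).reshape(5, 3)

# Prepend a column of ones so the first coefficient acts as the intercept.
X_with_const = np.column_stack([np.ones(X.shape[0]), X])

# Optimizers such as scipy.optimize.minimize return a flat vector of shape (4,);
# reshape it into a 4 x 1 column to match the requested output.
flat_beta = np.array([5.0, 3.0, 2.0, 6.0])  # placeholder values, not an estimate
beta_column = flat_beta.reshape(-1, 1)

print(X_with_const.shape, beta_column.shape)
```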

Warning

Do not use any packages besides numpy and scipy.optimize.

Use the following shell:

import numpy as np

def estimate_mle(y: np.array, X: np.array) -> np.array:
    """
    Some docstrings.
    """

    return None

Exercise 3

Write a function to estimate the OLS coefficients for the simulated data, with a catch: do not use the closed-form solution (i.e. \(\hat{\beta} = (\mathbb{X}^{\top}\mathbb{X})^{-1}\mathbb{X}^{\top}y\)).

In particular, your function should take as arguments a \(1000\times 1\) np.array y and a \(1000 \times 3\) np.array X (in that order, matching the shell below) and return a \(4 \times 1\) np.array with the coefficients \(\beta_0, \beta_1, \beta_2, \beta_3\) (in that order).
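One way to avoid the closed-form solution is to minimize the sum of squared residuals numerically. The sketch below does this for a toy one-regressor problem, not for the data in this exercise. The toy data, the function name `sum_squared_residuals`, the starting values, and the choice of `method="Nelder-Mead"` are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data lying exactly on the line y = 1 + 2x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1.0 + 2.0 * x


def sum_squared_residuals(beta: np.ndarray) -> float:
    """Sum of squared residuals for intercept beta[0] and slope beta[1]."""
    residuals = y - (beta[0] + beta[1] * x)
    return float(residuals @ residuals)


result = minimize(sum_squared_residuals, x0=np.zeros(2), method="Nelder-Mead")
print(result.success, result.x)
```

The same idea extends to several regressors by building the fitted values from a matrix product instead of a single slope.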

Warning

Do not use any packages besides numpy and scipy.optimize.

Use the following shell:

def estimate_ols(y: np.array, X: np.array) -> np.array:
    """
    Some docstrings.
    """

    return None

Footnotes

1. Here, the second argument references the variance, not the standard deviation.
2. One way to think about the likelihood function: what does the error have to be for us to get a given combination of \(x_i, y_i\)?