In-class exercise
Note
Please respect the time limits. Of course you do not need to set a timer, but it is important that you make it to the very end.
Be prepared to present/discuss each of the following questions in class. You should designate group members for each of the subtasks (e.g., task 1 / example 1, task 2 / question 1, etc.) such that everybody is prepared to take a lead on at least one of these subtasks.
Task 1: Heckman selection formula (30min)
Think of the transformation of the selection equation
that leads to expressing selection in terms of propensity scores. Explain intuitively or formally why assuming normally distributed unobservables \(\eta(0, \omega), \eta(1, \omega)\)
leads to the Heckman selection model, in which the inverse Mills ratio controls for selection bias.
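As a numerical companion to the formal argument, here is a minimal sketch that checks the truncated-normal mean behind the inverse Mills ratio. The names `V` (a standard normal selection unobservable) and `c` (a selection threshold) are illustrative assumptions and do not come from the exercise. For standard normal `V`, \(E[V \mid V > -c] = \phi(c)/\Phi(c)\); under joint normality, the conditional mean of an outcome unobservable given selection is this ratio scaled by its covariance with `V`.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def inverse_mills_ratio(c: float) -> float:
    """Return phi(c) / Phi(c), the mean of a standard normal V given V > -c."""
    return STD_NORMAL.pdf(c) / STD_NORMAL.cdf(c)


def simulated_truncated_mean(c: float, n_draws: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[V | V > -c] for standard normal V."""
    rng = random.Random(seed)
    draws = (rng.gauss(0.0, 1.0) for _ in range(n_draws))
    selected = [v for v in draws if v > -c]  # keep only the "selected" units
    return fmean(selected)


c = 0.3  # hypothetical threshold, not taken from the exercise
analytic = inverse_mills_ratio(c)
simulated = simulated_truncated_mean(c)
print(f"analytic: {analytic:.3f}, simulated: {simulated:.3f}")
```

The two numbers should agree up to simulation noise, which is the property that lets the inverse Mills ratio enter the outcome equation as a control term.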
Task 2: Graphical representation of selection and outcome equations (70min)
2.1 Policy-relevant treatment effect (PRTE)
Assume \(Z\) is an actual policy. Explain the PRTE for moving from \(Z=0\) to \(Z=1\) in the graph.
Hint: You can read the answer almost immediately off the graph.
Now assume there is another policy, denoted by \(Z=2\), which increases treatment take-up to 40%. Calculate the PRTE for moving from \(Z=1\) to \(Z=2\) for both the Heckman and the linear selection models. Explain the difference between the two values. Do you think it is large or small?
Hint: This is the case of an “Additive PRTE”.
Try to find one example each where using the Roy model for such an exercise (observing data on \(Z=0\) and \(Z=1\) with all assumptions on the instrument being fulfilled and extrapolating to \(Z=2\)) seems reasonable and where it seems unreasonable. Explain your reasoning.
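For the PRTE calculation above, one way to organise the arithmetic is to average a marginal treatment effect (MTE) curve over the range of propensity scores that the policy change moves into treatment. The sketch below assumes this reading of the additive PRTE. The two MTE curves, their coefficients, and the starting take-up of 0.25 are hypothetical placeholders, not the values from the notebook; only the target take-up of 0.40 comes from the task.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def average_over(fn, lower, upper, n_grid=10_000):
    """Average fn over [lower, upper] with the midpoint rule."""
    width = (upper - lower) / n_grid
    return sum(fn(lower + (i + 0.5) * width) for i in range(n_grid)) / n_grid


def mte_linear(u):
    """Hypothetical MTE that is linear in the propensity-score quantile u."""
    return 2.0 - 3.0 * u


def mte_normal(u):
    """Hypothetical MTE that is linear in the standard normal quantile of u."""
    return 0.5 - 1.2 * STD_NORMAL.inv_cdf(u)


p_old = 0.25  # placeholder take-up under Z=1 (assumption)
p_new = 0.40  # take-up under Z=2, as stated in the task

prte_linear = average_over(mte_linear, p_old, p_new)
prte_normal = average_over(mte_normal, p_old, p_new)
print(f"linear MTE: {prte_linear:.3f}, normal-quantile MTE: {prte_normal:.3f}")
```

Replacing the placeholder curves with the fitted curves from the notebook would give the two values the task asks you to compare.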
2.2 Selection on levels
Sketch how the average selection bias (see Table 1 in Mogstad et al.) is represented in the graphs when you only observe data with \(P(S=1) = 0.4\) (i.e., forget about \(Z\)).
Calculate it for the two Roy models in the notebook. Explain the difference between the two values. Do you think it is large or small?
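A minimal sketch of this calculation, assuming the average selection bias is defined as \(E[Y_0 \mid S=1] - E[Y_0 \mid S=0]\) and that units select into treatment when a uniformly distributed resistance `u` is at most the take-up share `p`. The untreated-outcome curve `m0_linear` and its coefficients are hypothetical and stand in for the curves in the notebook.

```python
def average_over(fn, lower, upper, n_grid=10_000):
    """Average fn over [lower, upper] with the midpoint rule."""
    width = (upper - lower) / n_grid
    return sum(fn(lower + (i + 0.5) * width) for i in range(n_grid)) / n_grid


def m0_linear(u):
    """Hypothetical E[Y0 | U = u], linear in the resistance u (assumption)."""
    return 1.0 - 0.5 * u


p = 0.4  # share treated, P(S=1) = 0.4 as in the task

mean_y0_treated = average_over(m0_linear, 0.0, p)    # units with u <= p
mean_y0_untreated = average_over(m0_linear, p, 1.0)  # units with u > p
selection_bias = mean_y0_treated - mean_y0_untreated
print(f"average selection bias: {selection_bias:.3f}")
```

In the graph, the two averages correspond to the mean height of the untreated-outcome curve to the left and to the right of `p`.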
2.3 Selection on gains
Sketch how the average selection on the gain (see Table 1 in Mogstad et al.) is represented in the graphs when you only observe data with \(P(S=1) = 0.4\) (i.e., forget about \(Z\)).
Calculate it for the two Roy models in the notebook. Explain the difference between the two values. Do you think it is large or small?
Note: You may find it helpful to look at the LaLonde (1986) formulation as written down in the very beginning of Section 7 (Equivalence Failures) of Kline and Walters, which rules out selection on gains.
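To see what ruling out selection on gains means numerically, the simulation sketch below compares a constant gain with a gain that falls in the resistance to treatment. It assumes the average selection on the gain is defined as \(E[Y_1 - Y_0 \mid S=1] - E[Y_1 - Y_0 \mid S=0]\) and that units with uniform resistance `u` at most `p` are treated. Both gain functions are hypothetical.

```python
import random
from statistics import fmean


def selection_on_gain(gain_fn, p, n_draws=100_000, seed=1):
    """Simulate E[gain | S=1] - E[gain | S=0] with S = 1 if u <= p."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n_draws):
        u = rng.random()  # uniform resistance to treatment
        (treated if u <= p else untreated).append(gain_fn(u))
    return fmean(treated) - fmean(untreated)


def constant_gain(u):
    """Hypothetical homogeneous gain: the same effect for every unit."""
    return 1.5


def declining_gain(u):
    """Hypothetical gain that falls as resistance u rises."""
    return 2.0 - 3.0 * u


p = 0.4  # share treated, P(S=1) = 0.4 as in the task
gain_gap_constant = selection_on_gain(constant_gain, p)
gain_gap_declining = selection_on_gain(declining_gain, p)
print(f"constant gain: {gain_gap_constant:.3f}, declining gain: {gain_gap_declining:.3f}")
```

With a constant gain the gap is zero by construction, while the declining gain produces a positive gap because low-resistance units, who are the ones treated, gain the most.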
2.4 Your own example
Come up with an example (binary policy and binary treatment) where you would find the Roy model useful. What would you expect regarding selection on levels and on gains? Add a graph with a parametrisation of the Heckit or the linear model that implies this type of selection for some value of the instrument.