The binary Roy model in graphs#
This notebook contains an illustration of how the binary (generalised) Roy model works. The figures are based on Figures 1 and 2 in Kline and Walters (2019).
The code below is annotated and should be clear enough. Learning Python is not an objective of this course, but code is the most efficient way to illustrate the model. All formulas are simple enough that you can understand them without knowing Python.
First, start with some imports.
from functools import partial
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from scipy.integrate import quad
from scipy.stats import norm
MTR functions for the Heckman selection model#
Define the marginal treatment response (MTR) functions when they take the form from the Heckman selection model.
Define the general functional form.
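Concretely, each MTR function is linear in the standard normal quantile of \(u\):
\[ m_s(u) = \alpha_s + \gamma_s \, \Phi^{-1}(u), \qquad s \in \{0, 1\}, \]
where \(\Phi^{-1}\) denotes the standard normal quantile function (norm.ppf below), \(s=0\) refers to the untreated state, and \(s=1\) to the treated state.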
Define the MTR functions for the untreated and the treated. The function partial supplies the numerical values for the parameters; the resulting function only takes the argument u.
def heckit(u, α, γ):
return α + γ * norm.ppf(u)
m_by_s_heckit = {
0: partial(heckit, α=0.8, γ=1.2),
1: partial(heckit, α=0.7, γ=0.7),
}
Define a function calculating the expected value of fun on the interval p_low to p_high. fun is expected to take exactly one argument, which corresponds to u in the graph.
def calculate_expectation(fun, p_low, p_high):
assert p_low < p_high
return quad(fun, p_low, p_high)[0] / (p_high - p_low)
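Because \(u\) is uniformly distributed on \([0, 1]\) (the usual normalisation in this class of models), this is simply the conditional expectation of fun given that \(u\) falls in the interval:
\[ E\bigl[\mathrm{fun}(u) \mid p_{\text{low}} \leq u < p_{\text{high}}\bigr] = \frac{1}{p_{\text{high}} - p_{\text{low}}} \int_{p_{\text{low}}}^{p_{\text{high}}} \mathrm{fun}(u)\, du . \]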
Data#
Define the probability of choosing treatment (\(S=1\)) conditional on \(Z\). That is, for \(Z=0\), 10% of individuals choose treatment; for \(Z=1\), this fraction is 30%.
prob_s1_by_z = [0.1, 0.3]
Now define the observed values that would obtain if an infinite amount of data were generated from the Roy model with the above parametrisation of the Heckman selection model.
For each value of the treatment encouragement \(Z\), we calculate the expected value in the respective interval (a partition of \([0, 1]\), which is defined by \(P(S=1|Z)\)).
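Writing \(p_z = P(S=1|Z=z)\), the quantities computed in the next cell are
\[ E[Y \mid S=1, Z=z] = \frac{1}{p_z} \int_0^{p_z} m_1(u)\, du, \qquad E[Y \mid S=0, Z=z] = \frac{1}{1 - p_z} \int_{p_z}^{1} m_0(u)\, du . \]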
observed_values = pd.DataFrame.from_dict(
{
z: [
calculate_expectation(m_by_s_heckit[0], prob_s1_by_z[z], 1),
calculate_expectation(m_by_s_heckit[1], 0, prob_s1_by_z[z]),
]
for z in (0, 1)
},
)
observed_values.index.name = "s"
observed_values.columns.name = "z"
observed_values.loc["prob_s1"] = prob_s1_by_z
observed_values.round(2)
| s | z = 0 | z = 1 |
|---|---|---|
| 0 | 1.03 | 1.40 |
| 1 | -0.53 | -0.11 |
| prob_s1 | 0.10 | 0.30 |
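As a plausibility check (an aside, not needed in what follows): because \(\Phi^{-1}(u)\) is standard normal for uniform \(u\), the heckit averages have the closed form \(E[\Phi^{-1}(u) \mid u \leq p] = -\varphi(\Phi^{-1}(p))/p\), where \(\varphi\) is the standard normal density. For example, the entry for \(s=1, z=0\) is \(0.7 + 0.7 \cdot \bigl(-\varphi(\Phi^{-1}(0.1))/0.1\bigr) \approx -0.53\).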
Now add a couple more data points. In particular, we can identify the mean outcomes of compliers conditional on \(S\). This will be helpful for the graphs below.
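In terms of the MTR functions, the compliers' means are the averages over the interval \([p_0, p_1) = [P(S=1|Z=0), P(S=1|Z=1))\):
\[ E[Y(S=s) \mid \text{complier}] = \frac{1}{p_1 - p_0} \int_{p_0}^{p_1} m_s(u)\, du, \qquad s \in \{0, 1\}. \]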
observed_values["compliers"] = pd.Series(
{
0: calculate_expectation(m_by_s_heckit[0], prob_s1_by_z[0], prob_s1_by_z[1]),
1: calculate_expectation(m_by_s_heckit[1], prob_s1_by_z[0], prob_s1_by_z[1]),
"prob_s1": (prob_s1_by_z[0] + prob_s1_by_z[1]) / 2,
},
)
observed_values.round(2)
| s | z = 0 | z = 1 | compliers |
|---|---|---|---|
| 0 | 1.03 | 1.40 | -0.23 |
| 1 | -0.53 | -0.11 | 0.10 |
| prob_s1 | 0.10 | 0.30 | 0.20 |
MTR functions for the linear selection model#
Define the general MTR function for a linear model. Use the demeaned form so that the constant term \(α_s\) equals the mean potential outcome \(E[Y(s)]\); the ATE is then the difference of the two constants.
Define a function that takes the observed data and returns the coefficients \(α_s, γ_s, s \in \{0, 1\}\), which would generate these data if we observed an infinite amount of data.
Calculate and display these coefficients.
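A sketch of the algebra implemented below: the average of a linear function over an interval equals its value at the interval's midpoint, so with \(p_z = P(S=1|Z=z)\) as before, the observed means satisfy
\[ E[Y \mid S=1, Z=z] = \alpha_1 + \gamma_1 \Bigl(\tfrac{p_z}{2} - \tfrac{1}{2}\Bigr), \qquad E[Y \mid S=0, Z=z] = \alpha_0 + \gamma_0 \Bigl(\tfrac{1 + p_z}{2} - \tfrac{1}{2}\Bigr). \]
Differencing across \(z\) for fixed \(s\) gives \(\gamma_s = \dfrac{E[Y \mid s, Z=1] - E[Y \mid s, Z=0]}{(p_1 - p_0)/2}\), and substituting back into the \(z=0\) equation yields \(\alpha_s\).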
def linear_fun(u, α, γ):
return α + γ * (u - 0.5)
def get_linear_α_γ_from_observed(obs):
out = {}
for s in 0, 1:
out[s] = {}
out[s]["γ"] = (obs.loc[s, 1] - obs.loc[s, 0]) / (
(obs.loc["prob_s1", 1] - obs.loc["prob_s1", 0]) / 2
)
out[s]["α"] = (obs.loc[s, 0] + out[s]["γ"] * 0.5) - out[s]["γ"] * (
1 - s + obs.loc["prob_s1", 0]
) / 2
return out
_α_γ = get_linear_α_γ_from_observed(observed_values)
_α_γ
{0: {'γ': 3.6204670557663747, 'α': 0.8529744231092482},
1: {'γ': 4.172055570565626, 'α': 1.348936683236413}}
Define the MTR functions for the untreated and the treated. The function partial supplies the numerical values for the parameters; the resulting function only takes the argument u.
m_by_s_linear = {s: partial(linear_fun, α=_α_γ[s]["α"], γ=_α_γ[s]["γ"]) for s in (0, 1)}
Functions for plotting#
You can safely ignore this cell. It only contains helper functions for plotting, which you won’t need to call directly.
def update_layout(title):
return {
"template": "plotly_dark",
"width": 1125,
"height": 685,
"xaxis_title": "u",
"xaxis_range": [0, 1],
"yaxis_title": "Y",
"yaxis_range": [-4, 4],
"title": title,
"legend": {
"x": 0,
"y": 1,
"xanchor": "left",
"yanchor": "top",
"orientation": "h",
},
"paper_bgcolor": "black", # Set the paper background color to transparent
}
def one_point(p, y, z, s):
if z == "compliers":
name = f"compliers, s={s}"
symbol = "star"
else:
name = f"z={z}, s={s}"
if z == 0:
symbol = "x"
elif z == 1:
symbol = "circle"
return go.Scatter(
x=[p],
y=[y],
name=name,
mode="markers",
marker={
"color": "#87CEEB" if s else "red",
"symbol": symbol,
"size": 12,
},
)
Define a function that plots the observed averages for Z=0 and Z=1.
Define a function that plots transformations of these observed averages.
def plot_observed(obs, title):
fig = go.Figure()
for z in 0, 1:
for s in 0, 1:
if s == 0:
p = (1 + obs.loc["prob_s1", z]) / 2
left = obs.loc["prob_s1", z]
right = 1
elif s == 1:
p = obs.loc["prob_s1", z] / 2
left = 0
right = obs.loc["prob_s1", z]
fig.add_trace(one_point(p=p, y=obs.loc[s, z], z=z, s=s))
fig.add_shape(
type="rect",
x0=left,
x1=right,
y0=-3.5 - z * 0.5,
y1=-3.1 - z * 0.5,
line={"color": "rgba(0, 0, 0, 0)"},
fillcolor="#87CEEB" if s else "red",
opacity=0.4,
label={"text": f"z={z}, s={s}"},
showlegend=True,
)
fig.update_layout(update_layout(title=title))
return fig
def plot_late(obs, title):
fig = go.Figure()
for z in obs.columns:
for s in 0, 1:
if z == 0 and s == 0 or z == 1 and s == 1:
continue
if s == 0:
p = (1 + obs.loc["prob_s1", z]) / 2
left = obs.loc["prob_s1", z]
right = 1
elif s == 1:
p = obs.loc["prob_s1", z] / 2
left = 0
right = obs.loc["prob_s1", z]
# Override for compliers
if z == "compliers":
p = obs.loc["prob_s1", z]
left = obs.loc["prob_s1", 0]
right = obs.loc["prob_s1", 1]
for _zz in 0, 1:
if _zz == 0 and s == 0 or _zz == 1 and s == 1:
continue
fig.add_shape(
type="rect",
x0=left,
x1=right,
y0=-3.5 - _zz * 0.5,
y1=-3.1 - _zz * 0.5,
line={"color": "rgba(0, 0, 0, 0)"},
fillcolor="red" if s else "#87CEEB",
opacity=0.4,
label={"text": "compliers"},
showlegend=True,
)
else:
fig.add_shape(
type="rect",
x0=left,
x1=right,
y0=-3.5 - z * 0.5,
y1=-3.1 - z * 0.5,
line={"color": "rgba(0, 0, 0, 0)"},
fillcolor="#87CEEB" if s else "red",
opacity=0.4,
label={"text": f"z={z}, s={s}"},
showlegend=True,
)
fig.add_trace(one_point(p=p, y=obs.loc[s, z], z=z, s=s))
fig.update_layout(update_layout(title=title))
return fig
Plot the observed values#
fig = plot_observed(obs=observed_values, title="Observed means conditional on Z, S")
fig.show()
These are the data that we would observe if we had an infinitely large sample (the two probabilities first, then the conditional means as they appear in the graph from left to right):
\(P(S=1|Z=0)\)
\(P(S=1|Z=1)\)
\(E[Y|S=1,Z=0]\)
\(E[Y|S=1,Z=1]\)
\(E[Y|S=0,Z=0]\)
\(E[Y|S=0,Z=1]\)
Note that both models are constructed to be observationally equivalent, so either the Heckman selection model or the linear selection model with the chosen parametrisations could have generated these data.
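As a quick numerical check (an addition to the original notebook, reusing the objects defined above), we can average the linear MTR functions over the same intervals and confirm that they reproduce the observed means:

# Average the linear MTR functions over the intervals implied by P(S=1|Z);
# by construction, this reproduces E[Y | S, Z] from observed_values.
linear_check = pd.DataFrame.from_dict(
    {
        z: [
            calculate_expectation(m_by_s_linear[0], prob_s1_by_z[z], 1),
            calculate_expectation(m_by_s_linear[1], 0, prob_s1_by_z[z]),
        ]
        for z in (0, 1)
    },
)
linear_check.round(2)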
Plot the transformed observed quantities#
Instead of the averages in the previous graph, plot
\(E[Y(S=1)|P(S=1|Z=0) \leq u < P(S=1|Z=1)]\) (expected value of \(Y\) for compliers if they take up treatment)
\(E[Y(S=0)|P(S=1|Z=0) \leq u < P(S=1|Z=1)]\) (expected value of \(Y\) for compliers if they do not take up treatment)
\(E[Y(S=1)|u < P(S=1|Z=0)]\) (expected value of \(Y\) for always-takers if they take up treatment)
\(E[Y(S=0)|P(S=1|Z=1) \leq u]\) (expected value of \(Y\) for never-takers if they do not take up treatment)
These are the values shown in Figures 1 and 2 of Kline and Walters (2019).
fig = plot_late(
obs=observed_values,
title="Observed values, transformed",
)
fig.show()
Add the MTR functions#
Now add the four MTR functions (\((s, \text{model}) \in \{0, 1\} \times \{\text{heckit}, \text{linear}\}\)) to the previous graph.
fig = plot_late(
obs=observed_values,
title="Identified and unidentified quantities through the lens of the Roy model",
)
u_grid = np.linspace(0, 1, 101)
fig.add_traces(
[
go.Scatter(
x=u_grid,
y=m_by_s_linear[s](u_grid),
mode="lines",
name=f"s={s}, linear",
line={"color": "#87CEEB" if s else "red"},
opacity=0.5,
)
for s in (0, 1)
]
+ [
go.Scatter(
x=u_grid,
y=np.round(m_by_s_heckit[s](u_grid), 3),
mode="lines",
name=f"s={s}, heckit",
line={"color": "#87CEEB" if s else "red", "dash": "dash"},
opacity=0.8,
)
for s in (0, 1)
],
)
fig.show()
To reiterate: both the Heckman selection model and the linear selection model would generate exactly the same “observed” data. However, their MTR functions are very different. If we did not know the truth, we would not know which one to pick. Of course, there is an infinite number of other possibilities (quadratic, cubic, exponential, …).
Note that all values shown are expectations, i.e. the integrals of the MTR functions over the respective intervals divided by the interval lengths. For the linear MTR functions, these averages equal the function values at the interval midpoints, so the points lie exactly on the lines. For the Heckman selection model they do not, because the MTR functions are nonlinear (Jensen's inequality).
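To see this in code (again an addition, reusing the objects defined above), compare the average of the treated heckit MTR over \([0, 0.1]\) with its value at the interval midpoint \(u = 0.05\):

# Average over [0, 0.1] vs. function value at the midpoint 0.05
calculate_expectation(m_by_s_heckit[1], 0, 0.1), m_by_s_heckit[1](0.05)

The two numbers differ (roughly \(-0.53\) vs. \(-0.45\)); repeating the comparison with m_by_s_linear[1] returns two identical numbers.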