Simple example of using the instrumental variables method for estimation
[1]:
import numpy as np
import pandas as pd
import patsy as ps
from statsmodels.sandbox.regression.gmm import IV2SLS
import os, sys
sys.path.append(os.path.abspath("../../../"))
from dowhy import CausalModel
[2]:
n_points = 1000
education_ability = 1
education_voucher = 0.5
income_ability = 2
income_education = 4
# confounder
ability = np.random.normal(0, 3, size=n_points)
# instrument
voucher = np.random.normal(2, 1, size=n_points)
# treatment
education = np.random.normal(5, 1, size=n_points) + education_ability * ability +\
            education_voucher * voucher
# outcome
income = np.random.normal(10, 3, size=n_points) +\
         income_ability * ability + income_education * education
# build dataset
data = np.stack([ability, education, income, voucher]).T
df = pd.DataFrame(data, columns = ['ability', 'education', 'income', 'voucher'])
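Because `ability` drives both `education` and `income`, a plain regression of income on education should overstate the true coefficient of 4. The sketch below regenerates the same data-generating process and compares the naive OLS slope with its large-sample value. The fixed seed and the names `rng`, `naive_slope`, and `expected_naive` are assumptions added here for illustration; they are not part of the original notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed, not in the original notebook
n_points = 1000

# Same data-generating process as the cell above
ability = rng.normal(0, 3, size=n_points)   # confounder (standard deviation 3)
voucher = rng.normal(2, 1, size=n_points)   # instrument
education = rng.normal(5, 1, size=n_points) + 1 * ability + 0.5 * voucher
income = rng.normal(10, 3, size=n_points) + 2 * ability + 4 * education

# Naive OLS slope of income on education, ignoring ability
naive_slope = np.cov(education, income)[0, 1] / np.var(education, ddof=1)

# Large-sample value of that slope:
#   4 + Cov(education, 2 * ability) / Var(education)
#   Cov(education, 2 * ability) = 2 * 1 * Var(ability) = 2 * 9
#   Var(education) = 1 (noise) + 9 (ability) + 0.25 (0.5 * voucher)
expected_naive = 4 + (2 * 1 * 9) / (1 + 9 + 0.25)

print(f"naive OLS slope:    {naive_slope:.3f}")
print(f"large-sample value: {expected_naive:.3f}")
```

The gap between this slope and 4 is the confounding bias that the instrument is meant to remove.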
[3]:
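# patsy adds an intercept column to both design matrices
income_vec, endog = ps.dmatrices("income ~ education", data=df)  # outcome, endogenous regressor
exog = ps.dmatrix("voucher", data=df)  # instrument
# statsmodels names these arguments IV2SLS(endog, exog, instrument), so the local
# `endog` is passed as its `exog` and the local `exog` as its `instrument`
m = IV2SLS(income_vec, endog, exog).fit()
m.summary()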
[3]:
Dep. Variable: | income | R-squared: | 0.899 |
---|---|---|---|
Model: | IV2SLS | Adj. R-squared: | 0.899 |
Method: | Two Stage Least Squares | F-statistic: | 160.6 |
Date: | Tue, 07 Jan 2020 | Prob (F-statistic): | 3.05e-34 |
Time: | 14:32:06 | | |
No. Observations: | 1000 | | |
Df Residuals: | 998 | | |
Df Model: | 1 | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
Intercept | 8.3670 | 1.987 | 4.211 | 0.000 | 4.468 | 12.266 |
education | 4.2607 | 0.336 | 12.674 | 0.000 | 3.601 | 4.920 |

Omnibus: | 0.871 | Durbin-Watson: | 2.058 |
---|---|---|---|
Prob(Omnibus): | 0.647 | Jarque-Bera (JB): | 0.953 |
Skew: | 0.059 | Prob(JB): | 0.621 |
Kurtosis: | 2.904 | Cond. No. | 14.3 |
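To make the "Two Stage Least Squares" label in the summary concrete, here is a minimal NumPy sketch of the two stages done by hand. It assumes the same data-generating process as the notebook with a hypothetical fixed seed, and the helper `ols` is introduced only for illustration. It is not how statsmodels implements `IV2SLS` internally.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed, not in the original notebook
n_points = 1000

# Same data-generating process as the notebook
ability = rng.normal(0, 3, size=n_points)
voucher = rng.normal(2, 1, size=n_points)
education = rng.normal(5, 1, size=n_points) + 1 * ability + 0.5 * voucher
income = rng.normal(10, 3, size=n_points) + 2 * ability + 4 * education


def ols(X, y):
    """Least-squares coefficients of y on the columns of X (illustrative helper)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Stage 1: regress the treatment on the instrument (with intercept)
Z = np.column_stack([np.ones(n_points), voucher])
education_hat = Z @ ols(Z, education)

# Stage 2: regress the outcome on the fitted treatment (with intercept)
X_hat = np.column_stack([np.ones(n_points), education_hat])
intercept_2sls, slope_2sls = ols(X_hat, income)

print(f"2SLS intercept: {intercept_2sls:.3f}")
print(f"2SLS slope on education: {slope_2sls:.3f}")
```

Only the point estimates are reproduced this way. The residuals of the second-stage regression use the fitted rather than the observed treatment, so standard errors read off that regression would not be valid, which is one reason to rely on a dedicated 2SLS routine.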
[4]:
model = CausalModel(
    data=df,
    treatment='education',
    outcome='income',
    common_causes=['ability'],
    instruments=['voucher']
)
identified_estimand = model.identify_effect()
estimate = model.estimate_effect(
    identified_estimand,
    method_name="iv.instrumental_variable",
    test_significance=True
)
print(estimate)
WARNING:dowhy.causal_model:Causal Graph not provided. DoWhy will construct a graph based on data inputs.
INFO:dowhy.causal_graph:If this is observed data (not from a randomized experiment), there might always be missing confounders. Adding a node named "Unobserved Confounders" to reflect this.
INFO:dowhy.causal_model:Model to find the causal effect of treatment ['education'] on outcome ['income']
INFO:dowhy.causal_identifier:Common causes of treatment and outcome:['U', 'ability']
WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.
WARN: Do you want to continue by ignoring any unobserved confounders? (use proceed_when_unidentifiable=True to disable this prompt) [y/n] y
INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:['voucher']
INFO:dowhy.causal_estimator:INFO: Using Instrumental Variable Estimator
INFO:dowhy.causal_estimator:Realized estimand: Wald Estimator
Realized estimand type: nonparametric-ate
Estimand expression:
Expectation(Derivative(income, voucher))⋅Expectation(Derivative(education, voucher))**(-1)
Estimand assumption 1, As-if-random: If U→→income then ¬(U →→{voucher})
Estimand assumption 2, Exclusion: If we remove {voucher}→{education}, then ¬({voucher}→income)
Estimand assumption 3, treatment_effect_homogeneity: Each unit's treatment ['education'] is affected in the same way by common causes of ['education'] and income
Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome income is affected in the same way by common causes of ['education'] and income
*** Causal Estimate ***
## Target estimand
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
     d
────────────(Expectation(income|ability))
d[education]
Estimand assumption 1, Unconfoundedness: If U→{education} and U→income then P(income|education,ability,U) = P(income|education,ability)
### Estimand : 2
Estimand name: iv
Estimand expression:
Expectation(Derivative(income, [voucher])*Derivative([education], [voucher])**(-1))
Estimand assumption 1, As-if-random: If U→→income then ¬(U →→{voucher})
Estimand assumption 2, Exclusion: If we remove {voucher}→{education}, then ¬({voucher}→income)
## Realized estimand
Realized estimand: Wald Estimator
Realized estimand type: nonparametric-ate
Estimand expression:
Expectation(Derivative(income, voucher))⋅Expectation(Derivative(education, voucher))**(-1)
Estimand assumption 1, As-if-random: If U→→income then ¬(U →→{voucher})
Estimand assumption 2, Exclusion: If we remove {voucher}→{education}, then ¬({voucher}→income)
Estimand assumption 3, treatment_effect_homogeneity: Each unit's treatment ['education'] is affected in the same way by common causes of ['education'] and income
Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome income is affected in the same way by common causes of ['education'] and income
## Estimate
Value: 4.2606685045720365
## Statistical Significance
p-value: <0.001
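The realized estimand above is a ratio: the effect of the instrument on the outcome divided by the effect of the instrument on the treatment. As a rough illustration of that Wald ratio, the sketch below estimates both effects as simple regression slopes and divides them. The fixed seed and the names `reduced_form`, `first_stage`, and `wald_estimate` are assumptions for this example, and this is not a reproduction of DoWhy's internal code.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed, not in the original notebook
n_points = 1000

# Same data-generating process as the notebook
ability = rng.normal(0, 3, size=n_points)
voucher = rng.normal(2, 1, size=n_points)
education = rng.normal(5, 1, size=n_points) + 1 * ability + 0.5 * voucher
income = rng.normal(10, 3, size=n_points) + 2 * ability + 4 * education

# np.polyfit with degree 1 returns [slope, intercept]
reduced_form = np.polyfit(voucher, income, 1)[0]    # d income / d voucher
first_stage = np.polyfit(voucher, education, 1)[0]  # d education / d voucher

# Wald ratio: effect of education on income implied by the instrument
wald_estimate = reduced_form / first_stage

print(f"reduced-form slope: {reduced_form:.3f}")
print(f"first-stage slope:  {first_stage:.3f}")
print(f"Wald estimate:      {wald_estimate:.3f}")
```

With a single instrument and a single treatment, this ratio coincides with the 2SLS coefficient on the same data, which is consistent with the DoWhy value (4.2607) matching the `education` coefficient in the IV2SLS summary.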