Simple example on using Instrumental Variables method for estimation

[1]:

import numpy as np
import pandas as pd
import patsy as ps

from statsmodels.sandbox.regression.gmm import IV2SLS
import os, sys
sys.path.append(os.path.abspath("../../../"))
from dowhy import CausalModel

[2]:

n_points = 1000
education_abilty = 1
education_voucher = 0.5
income_abilty = 2
income_education = 4


# confounder
ability = np.random.normal(0, 3, size=n_points)

# instrument
voucher = np.random.normal(2, 1, size=n_points)

# treatment
education = np.random.normal(5, 1, size=n_points) + education_abilty * ability +\
            education_voucher * voucher

# outcome
income = np.random.normal(10, 3, size=n_points) +\
         income_abilty * ability + income_education * education

# build dataset
data = np.stack([ability, education, income, voucher]).T
df = pd.DataFrame(data, columns = ['ability', 'education', 'income', 'voucher'])

[3]:

income_vec, endog = ps.dmatrices("income ~ education", data=df)
exog = ps.dmatrix("voucher", data=df)

m = IV2SLS(income_vec, endog, exog).fit()
m.summary()

[3]:

IV2SLS Regression Results
Dep. Variable:	income	R-squared:	0.899
Model:	IV2SLS	Adj. R-squared:	0.899
Method:	Two Stage	F-statistic:	160.6
	Least Squares	Prob (F-statistic):	3.05e-34
Date:	Tue, 07 Jan 2020
Time:	14:32:06
No. Observations:	1000
Df Residuals:	998
Df Model:	1

	coef	std err	t	P>\|t\|	[0.025	0.975]
Intercept	8.3670	1.987	4.211	0.000	4.468	12.266
education	4.2607	0.336	12.674	0.000	3.601	4.920

Omnibus:	0.871	Durbin-Watson:	2.058
Prob(Omnibus):	0.647	Jarque-Bera (JB):	0.953
Skew:	0.059	Prob(JB):	0.621
Kurtosis:	2.904	Cond. No.	14.3

[4]:

model=CausalModel(
        data = df,
        treatment='education',
        outcome='income',
        common_causes=['ability'],
        instruments=['voucher']
        )

identified_estimand = model.identify_effect()

estimate = model.estimate_effect(identified_estimand,
        method_name="iv.instrumental_variable", test_significance=True
)
print(estimate)

WARNING:dowhy.causal_model:Causal Graph not provided. DoWhy will construct a graph based on data inputs.
INFO:dowhy.causal_graph:If this is observed data (not from a randomized experiment), there might always be missing confounders. Adding a node named "Unobserved Confounders" to reflect this.
INFO:dowhy.causal_model:Model to find the causal effect of treatment ['education'] on outcome ['income']
INFO:dowhy.causal_identifier:Common causes of treatment and outcome:['U', 'ability']
WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.

WARN: Do you want to continue by ignoring any unobserved confounders? (use proceed_when_unidentifiable=True to disable this prompt) [y/n] y

INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:['voucher']
INFO:dowhy.causal_estimator:INFO: Using Instrumental Variable Estimator
INFO:dowhy.causal_estimator:Realized estimand: Wald Estimator
Realized estimand type: nonparametric-ate
Estimand expression:

Expectation(Derivative(income, voucher))⋅Expectation(Derivative(education, vou

      -1
cher))
Estimand assumption 1, As-if-random: If U→→income then ¬(U →→{voucher})
Estimand assumption 2, Exclusion: If we remove {voucher}→{education}, then ¬({voucher}→income)
Estimand assumption 3, treatment_effect_homogeneity: Each unit's treatment ['education'] is affected in the same way by common causes of ['education'] and income
Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome income is affected in the same way by common causes of ['education'] and income

*** Causal Estimate ***

## Target estimand
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
     d
────────────(Expectation(income|ability))
d[education]
Estimand assumption 1, Unconfoundedness: If U→{education} and U→income then P(income|education,ability,U) = P(income|education,ability)
### Estimand : 2
Estimand name: iv
Estimand expression:
Expectation(Derivative(income, [voucher])*Derivative([education], [voucher])**
(-1))
Estimand assumption 1, As-if-random: If U→→income then ¬(U →→{voucher})
Estimand assumption 2, Exclusion: If we remove {voucher}→{education}, then ¬({voucher}→income)

## Realized estimand
Realized estimand: Wald Estimator
Realized estimand type: nonparametric-ate
Estimand expression:

Expectation(Derivative(income, voucher))⋅Expectation(Derivative(education, vou

      -1
cher))
Estimand assumption 1, As-if-random: If U→→income then ¬(U →→{voucher})
Estimand assumption 2, Exclusion: If we remove {voucher}→{education}, then ¬({voucher}→income)
Estimand assumption 3, treatment_effect_homogeneity: Each unit's treatment ['education'] is affected in the same way by common causes of ['education'] and income
Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome income is affected in the same way by common causes of ['education'] and income

## Estimate
Value: 4.2606685045720365

## Statistical Significance
p-value: <0.001