DoWhy: Different estimation methods for causal inference
This is a quick introduction to the DoWhy causal inference library. We will load a sample dataset and use different methods to estimate the causal effect of a (pre-specified) treatment variable on a (pre-specified) outcome variable.
First, let us add the required path for Python to find the DoWhy code and load all required packages.
[1]:
import os, sys
sys.path.append(os.path.abspath("../../"))
[2]:
import numpy as np
import pandas as pd
import logging
import dowhy
from dowhy.do_why import CausalModel
import dowhy.datasets
Now, let us load a dataset. For simplicity, we simulate a dataset with linear relationships between common causes and treatment, and common causes and outcome.
Beta is the true causal effect.
[3]:
data = dowhy.datasets.linear_dataset(beta=10,
        num_common_causes=5,
        num_instruments=2,
        num_samples=10000,
        treatment_is_binary=True)
df = data["df"]
Note that we are using a pandas dataframe to load the data.
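To make the phrase "linear relationships between common causes and treatment, and common causes and outcome" concrete, here is a minimal NumPy sketch of such a simulation. It is not the code behind `dowhy.datasets.linear_dataset`; the single common cause `x`, the instrument `z`, and all coefficients other than `beta` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
beta = 10.0  # true causal effect of v on y (mirrors beta=10 above)

x = rng.normal(size=n)                        # one common cause (assumed)
z = rng.normal(size=n)                        # one instrument: affects v only (assumed)
p = 1.0 / (1.0 + np.exp(-(x + z)))            # treatment probability, linear in the logit
v = rng.binomial(1, p)                        # binary treatment
y = beta * v + 2.0 * x + rng.normal(size=n)   # outcome, linear in v and x

# Comparing group means without adjustment mixes the effect of v with that of x.
naive = y[v == 1].mean() - y[v == 0].mean()
print("naive difference in means:", naive)
```

Because `x` raises both the chance of treatment and the outcome, the unadjusted difference in means lands above `beta`. The estimation methods in this notebook are different ways of removing that gap.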
Identifying the causal estimand
We now input a causal graph in the GML graph format (DoWhy also accepts graphs in the DOT format).
[4]:
# With graph
model = CausalModel(
        data=df,
        treatment=data["treatment_name"],
        outcome=data["outcome_name"],
        graph=data["gml_graph"],
        instruments=data["instrument_names"],
        logging_level=logging.INFO
        )
INFO:dowhy.do_why:Model to find the causal effect of treatment ['v'] on outcome ['y']
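The string stored in `data["gml_graph"]` is not printed in this notebook. As a rough illustration of what a GML graph specification can look like, the sketch below assembles one from an edge list. The helper name `build_gml`, the exact spacing of the output, and the four-node graph are all hypothetical.

```python
def build_gml(edges):
    """Return a directed graph in GML text form for a list of (source, target) edges."""
    nodes = sorted({name for edge in edges for name in edge})
    parts = ["graph [", "  directed 1"]
    for name in nodes:
        parts.append(f'  node [ id "{name}" label "{name}" ]')
    for source, target in edges:
        parts.append(f'  edge [ source "{source}" target "{target}" ]')
    parts.append("]")
    return "\n".join(parts)

# Hypothetical graph: X0 is a common cause, Z0 an instrument, v -> y the effect of interest.
gml_graph = build_gml([("X0", "v"), ("X0", "y"), ("Z0", "v"), ("v", "y")])
print(gml_graph)
```

A string of this kind would be passed as the `graph` argument, in place of `data["gml_graph"]`, when the graph comes from domain knowledge rather than from a simulated dataset.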
[5]:
model.view_model()
[6]:
from IPython.display import Image, display
display(Image(filename="causal_model.png"))
We get a causal graph. Next, we identify the causal estimand and then estimate it with several methods.
[7]:
identified_estimand = model.identify_effect()
print(identified_estimand)
INFO:dowhy.causal_identifier:Common causes of treatment and outcome:['X2', 'Z0', 'X3', 'X4', 'X1', 'X0', 'Z1', 'Unobserved Confounders']
WARNING:dowhy.causal_identifier:There are unobserved common causes. Causal effect cannot be identified.
WARN: Do you want to continue by ignoring these unobserved confounders? [y/n] y
INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:['Z1', 'Z0']
Estimand type: ate
### Estimand : 1
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, Z1)/Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d
──(Expectation(y|X2,Z0,X3,X4,X1,X0,Z1))
dv
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X2,Z0,X3,X4,X1,X0,Z1,U) = P(y|v,X2,Z0,X3,X4,X1,X0,Z1)
Method 1: Regression
Use linear regression.
[8]:
causal_estimate_reg = model.estimate_effect(identified_estimand,
        method_name="backdoor.linear_regression",
        test_significance=True)
print(causal_estimate_reg)
print("Causal Estimate is " + str(causal_estimate_reg.value))
INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: y~v+X2+Z0+X3+X4+X1+X0+Z1
*** Causal Estimate ***
## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, Z1)/Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d
──(Expectation(y|X2,Z0,X3,X4,X1,X0,Z1))
dv
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X2,Z0,X3,X4,X1,X0,Z1,U) = P(y|v,X2,Z0,X3,X4,X1,X0,Z1)
## Realized estimand
b: y~v+X2+Z0+X3+X4+X1+X0+Z1
## Estimate
Value: 9.999999999999995
## Statistical Significance
p-value: <0.001
Causal Estimate is 10.0
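The realized estimand `y~v+X2+...` is an ordinary regression of the outcome on the treatment and the adjustment variables, with the coefficient on `v` read off as the effect. A minimal sketch of that idea on simulated data follows; the single common cause `x` and its coefficients are assumptions, and this is not DoWhy's estimator code.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
x = rng.normal(size=n)                              # common cause (assumed)
v = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))       # binary treatment
y = 10.0 * v + 2.0 * x + rng.normal(size=n)         # true effect is 10

# Regress y on an intercept, the treatment, and the common cause.
design = np.column_stack([np.ones(n), v, x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
effect_reg = coef[1]                                # coefficient on v
print("regression estimate:", effect_reg)
```

The estimate is only as good as the linear model: it recovers the effect here because the simulated outcome really is linear in `v` and `x`.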
Method 2: Stratification
We will be using propensity scores to stratify units in the data.
[9]:
causal_estimate_strat = model.estimate_effect(identified_estimand,
        method_name="backdoor.propensity_score_stratification")
print(causal_estimate_strat)
print("Causal Estimate is " + str(causal_estimate_strat.value))
INFO:dowhy.causal_estimator:INFO: Using Propensity Score Stratification Estimator
INFO:dowhy.causal_estimator:b: y~v+X2+Z0+X3+X4+X1+X0+Z1
*** Causal Estimate ***
## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, Z1)/Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d
──(Expectation(y|X2,Z0,X3,X4,X1,X0,Z1))
dv
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X2,Z0,X3,X4,X1,X0,Z1,U) = P(y|v,X2,Z0,X3,X4,X1,X0,Z1)
## Realized estimand
b: y~v+X2+Z0+X3+X4+X1+X0+Z1
## Estimate
Value: 10.061716788484345
Causal Estimate is 10.0617167885
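As an illustration of what stratifying on propensity scores involves, the sketch below fits a propensity model, cuts the scores into equal-sized strata, and averages the within-stratum differences in mean outcome. It assumes a single common cause `x`, 20 strata, and a hand-written logistic fit (`fit_propensity` is a hypothetical helper); DoWhy's own implementation may differ in each of these choices.

```python
import numpy as np

def fit_propensity(x, v, n_iter=25):
    """Logistic regression of v on x by Newton's method; returns P(v=1 | x)."""
    design = np.column_stack([np.ones(len(v)), x])
    w = np.zeros(design.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-design @ w))
        gradient = design.T @ (v - p)
        hessian = (design * (p * (1.0 - p))[:, None]).T @ design
        w += np.linalg.solve(hessian, gradient)
    return 1.0 / (1.0 + np.exp(-design @ w))

rng = np.random.default_rng(2)
n = 10_000
x = rng.normal(size=n)                              # common cause (assumed)
v = rng.binomial(1, 1.0 / (1.0 + np.exp(-x)))       # binary treatment
y = 10.0 * v + 2.0 * x + rng.normal(size=n)         # true effect is 10

ps = fit_propensity(x, v)
num_strata = 20
edges = np.quantile(ps, np.linspace(0.0, 1.0, num_strata + 1))
strata = np.clip(np.searchsorted(edges, ps, side="right") - 1, 0, num_strata - 1)

weighted_sum, used = 0.0, 0
for s in range(num_strata):
    in_stratum = strata == s
    treated = in_stratum & (v == 1)
    control = in_stratum & (v == 0)
    if treated.any() and control.any():             # skip strata lacking either group
        diff = y[treated].mean() - y[control].mean()
        weighted_sum += diff * in_stratum.sum()
        used += in_stratum.sum()

effect_strat = weighted_sum / used
print("stratification estimate:", effect_strat)
```

Units within a stratum have similar propensity scores, so the treated and control units there are roughly comparable on the common cause.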
Method 3: Matching
We will be using propensity scores to match units in the data.
[10]:
causal_estimate_match = model.estimate_effect(identified_estimand,
        method_name="backdoor.propensity_score_matching")
print(causal_estimate_match)
print("Causal Estimate is " + str(causal_estimate_match.value))
INFO:dowhy.causal_estimator:INFO: Using Propensity Score Matching Estimator
INFO:dowhy.causal_estimator:b: y~v+X2+Z0+X3+X4+X1+X0+Z1
*** Causal Estimate ***
## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, Z1)/Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d
──(Expectation(y|X2,Z0,X3,X4,X1,X0,Z1))
dv
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X2,Z0,X3,X4,X1,X0,Z1,U) = P(y|v,X2,Z0,X3,X4,X1,X0,Z1)
## Realized estimand
b: y~v+X2+Z0+X3+X4+X1+X0+Z1
## Estimate
Value: 7.391127134286239
Causal Estimate is 7.391127134286239
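One common form of propensity score matching pairs each unit with the unit in the opposite group whose score is closest, then averages the paired outcome differences. The sketch below does this with replacement on simulated data. For brevity it uses the known simulation propensity instead of an estimated one, which is an assumption of the sketch; the helper `nearest_index` is hypothetical, and DoWhy's matching rule is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2_000
x = rng.normal(size=n)                              # common cause (assumed)
ps = 1.0 / (1.0 + np.exp(-x))                       # known propensity (sketch assumption)
v = rng.binomial(1, ps)                             # binary treatment
y = 10.0 * v + 2.0 * x + rng.normal(size=n)         # true effect is 10

def nearest_index(ps_from, ps_to):
    """For each score in ps_from, index of the closest score in ps_to."""
    return np.abs(ps_from[:, None] - ps_to[None, :]).argmin(axis=1)

treated, control = v == 1, v == 0
# Effect on the treated: each treated unit minus its matched control.
att = np.mean(y[treated] - y[control][nearest_index(ps[treated], ps[control])])
# Effect on the controls: each control's matched treated unit minus the control.
atc = np.mean(y[treated][nearest_index(ps[control], ps[treated])] - y[control])
effect_match = (att * treated.sum() + atc * control.sum()) / n
print("matching estimate:", effect_match)
```

The pairwise distance matrix grows with the product of the two group sizes, so this brute-force search is only practical for small samples.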
Method 4: Weighting
We will be using (inverse) propensity scores to assign weights to units in the data.
[11]:
causal_estimate_ipw = model.estimate_effect(identified_estimand,
        method_name="backdoor.propensity_score_weighting")
print(causal_estimate_ipw)
print("Causal Estimate is " + str(causal_estimate_ipw.value))
INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator
INFO:dowhy.causal_estimator:b: y~v+X2+Z0+X3+X4+X1+X0+Z1
*** Causal Estimate ***
## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, Z1)/Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d
──(Expectation(y|X2,Z0,X3,X4,X1,X0,Z1))
dv
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X2,Z0,X3,X4,X1,X0,Z1,U) = P(y|v,X2,Z0,X3,X4,X1,X0,Z1)
## Realized estimand
b: y~v+X2+Z0+X3+X4+X1+X0+Z1
## Estimate
Value: 17.631870964903452
Causal Estimate is 17.6318709649
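Inverse propensity weighting gives each treated unit the weight 1/ps and each control unit the weight 1/(1 - ps), so that both groups resemble the full population. The sketch below computes two standard variants, the Horvitz-Thompson estimator and its normalized (Hajek) form, using the known simulation propensity. Which variant DoWhy uses is not stated in this notebook, and the data-generating choices here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
x = rng.normal(size=n)                              # common cause (assumed)
ps = 1.0 / (1.0 + np.exp(-x))                       # known propensity (sketch assumption)
v = rng.binomial(1, ps)                             # binary treatment
y = 10.0 * v + 2.0 * x + rng.normal(size=n)         # true effect is 10

w_treated = v / ps                                  # nonzero only for treated units
w_control = (1 - v) / (1.0 - ps)                    # nonzero only for control units

# Horvitz-Thompson: divide the weighted sums by n.
effect_ht = np.mean(w_treated * y) - np.mean(w_control * y)
# Hajek: divide each weighted sum by the sum of its own weights.
effect_hajek = (np.sum(w_treated * y) / np.sum(w_treated)
                - np.sum(w_control * y) / np.sum(w_control))
print("Horvitz-Thompson:", effect_ht, "Hajek:", effect_hajek)
```

Weighting estimates can be sensitive to units whose propensity is close to 0 or 1, because those units receive very large weights.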
Method 5: Instrumental Variable
We will be using the Wald estimator for the provided instrumental variable.
[12]:
causal_estimate_iv = model.estimate_effect(identified_estimand,
        method_name="iv.instrumental_variable",
        method_params={'iv_instrument_name': 'Z1'})
print(causal_estimate_iv)
print("Causal Estimate is " + str(causal_estimate_iv.value))
INFO:dowhy.causal_estimator:INFO: Using Instrumental Variable Estimator
INFO:dowhy.causal_estimator:Realized estimand: Wald Estimator
Realized estimand type: ate
Estimand expression:
-1
Expectation(Derivative(y, Z1))⋅Expectation(Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, treatment_effect_homogeneity: Each unit's treatment v isaffected in the same way by common causes of v and y
Estimand assumption 3, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome y isaffected in the same way by common causes of v and y
*** Causal Estimate ***
## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, Z1)/Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d
──(Expectation(y|X2,Z0,X3,X4,X1,X0,Z1))
dv
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X2,Z0,X3,X4,X1,X0,Z1,U) = P(y|v,X2,Z0,X3,X4,X1,X0,Z1)
## Realized estimand
Realized estimand: Wald Estimator
Realized estimand type: ate
Estimand expression:
-1
Expectation(Derivative(y, Z1))⋅Expectation(Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, treatment_effect_homogeneity: Each unit's treatment v isaffected in the same way by common causes of v and y
Estimand assumption 3, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome y isaffected in the same way by common causes of v and y
## Estimate
Value: 12.992472396947559
Causal Estimate is 12.9924723969
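The Wald estimator divides the instrument's association with the outcome by its association with the treatment, matching the expression `Expectation(Derivative(y, Z1)/Derivative(v, Z1))` above. The sketch below shows this on simulated data with an unobserved confounder `u`. The data-generating process is an assumption, and the covariance ratio is one standard way to write the estimator, not necessarily the form DoWhy computes.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
u = rng.normal(size=n)                                   # unobserved confounder
z = rng.normal(size=n)                                   # instrument: affects y only through v
v = (z + u + rng.normal(size=n) > 0).astype(float)       # binary treatment
y = 10.0 * v + 3.0 * u + rng.normal(size=n)              # true effect is 10

# Regressing y on v alone is biased, because u drives both.
naive = np.cov(y, v)[0, 1] / np.var(v, ddof=1)
# Wald estimator: ratio of the instrument's covariance with y and with v.
effect_wald = np.cov(y, z)[0, 1] / np.cov(v, z)[0, 1]
print("naive:", naive, "Wald:", effect_wald)
```

The ratio recovers the effect here because the simulation satisfies the assumptions listed in the output: `z` is independent of `u`, `z` affects `y` only through `v`, and the effect is the same for every unit.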
Method 6: Regression Discontinuity
DoWhy internally converts this to an equivalent instrumental variables problem.
[13]:
causal_estimate_regdist = model.estimate_effect(identified_estimand,
        method_name="iv.regression_discontinuity",
        method_params={'rd_variable_name': 'Z1',
                       'rd_threshold_value': 0.5,
                       'rd_bandwidth': 0.1})
print(causal_estimate_regdist)
print("Causal Estimate is " + str(causal_estimate_regdist.value))
INFO:dowhy.causal_estimator:Using Regression Discontinuity Estimator
INFO:dowhy.causal_estimator:
INFO:dowhy.causal_estimator:INFO: Using Instrumental Variable Estimator
INFO:dowhy.causal_estimator:Realized estimand: Wald Estimator
Realized estimand type: ate
Estimand expression:
-1
Expectation(Derivative(y, Z1))⋅Expectation(Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, treatment_effect_homogeneity: Each unit's treatment local_treatment isaffected in the same way by common causes of local_treatment and local_outcome
Estimand assumption 3, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome local_outcome isaffected in the same way by common causes of local_treatment and local_outcome
*** Causal Estimate ***
## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, Z1)/Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d
──(Expectation(y|X2,Z0,X3,X4,X1,X0,Z1))
dv
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X2,Z0,X3,X4,X1,X0,Z1,U) = P(y|v,X2,Z0,X3,X4,X1,X0,Z1)
## Realized estimand
Realized estimand: Wald Estimator
Realized estimand type: ate
Estimand expression:
-1
Expectation(Derivative(y, Z1))⋅Expectation(Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, treatment_effect_homogeneity: Each unit's treatment local_treatment isaffected in the same way by common causes of local_treatment and local_outcome
Estimand assumption 3, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome local_outcome isaffected in the same way by common causes of local_treatment and local_outcome
## Estimate
Value: 12.84020418877542
Causal Estimate is 12.8402041888
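One way to read the conversion to an instrumental variables problem: keep only units whose running variable lies within the bandwidth of the threshold, treat "above the threshold" as a binary instrument, and apply the Wald estimator to that local sample. The sketch below follows this reading on simulated data. The threshold 0.5 and bandwidth 0.1 mirror the parameters above, while the data-generating process is an assumption and the code is not DoWhy's implementation.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
threshold, bandwidth = 0.5, 0.1

r = rng.uniform(0.0, 1.0, size=n)                   # running variable
u = rng.normal(size=n)                              # unobserved confounder
above = (r > threshold).astype(float)               # crossing the threshold raises treatment odds
v = (u + 1.5 * above - 0.75 + rng.normal(size=n) > 0).astype(float)
y = 10.0 * v + 3.0 * u + rng.normal(size=n)         # true effect is 10

# Restrict to a window around the threshold and use `above` as a local instrument.
local = np.abs(r - threshold) <= bandwidth
z_local, v_local, y_local = above[local], v[local], y[local]

jump_in_outcome = y_local[z_local == 1].mean() - y_local[z_local == 0].mean()
jump_in_treatment = v_local[z_local == 1].mean() - v_local[z_local == 0].mean()
effect_rd = jump_in_outcome / jump_in_treatment     # Wald ratio on the local sample
print("regression discontinuity estimate:", effect_rd)
```

A narrower bandwidth keeps the units on either side more comparable but leaves fewer observations, so the estimate becomes noisier.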