Causal Discovery example

The goal of this notebook is to show how causal discovery methods can work with DoWhy. We use discovery methods from Causal Discovery Tool (CDT) repo. As we will see, causal discovery methods are not fool-proof and there is no guarantee that they will recover the correct causal graph. Even for the simple examples below, there is a large variance in results. These methods, however, may be combined usefully with domain knowledge to construct the final causal graph.

[1]:
import dowhy
from dowhy import CausalModel

import numpy as np
import pandas as pd
import graphviz
import networkx as nx

np.set_printoptions(precision=3, suppress=True)
np.random.seed(0)

Utility function

We define a utility function to draw the directed acyclic graph.

[2]:
def make_graph(adjacency_matrix, labels=None):
    idx = np.abs(adjacency_matrix) > 0.01
    dirs = np.where(idx)
    d = graphviz.Digraph(engine='dot')
    names = labels if labels else [f'x{i}' for i in range(len(adjacency_matrix))]
    for name in names:
        d.node(name)
    for to, from_, coef in zip(dirs[0], dirs[1], adjacency_matrix[idx]):
        d.edge(names[from_], names[to], label=str(coef))
    return d

def str_to_dot(string):
    '''
    Converts input string from graphviz library to valid DOT graph format.
    '''
    graph = string.strip().replace('\n', ';').replace('\t','')
    graph = graph[:9] + graph[10:-2] + graph[-1] # Removing unnecessary characters from string
    return graph

Experiments on the Auto-MPG dataset

In this section, we will use a dataset on the technical specification of cars. The dataset is downloaded from UCI Machine Learning Repository. The dataset contains 9 attributes and 398 instances. We do not know the true causal graph for the dataset and will use CDT to discover it. The causal graph obtained will then be used to estimate the causal effect.

1. Load the data

[3]:
data_mpg = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data-original',
                   delim_whitespace=True, header=None,
                   names = ['mpg', 'cylinders', 'displacement',
                            'horsepower', 'weight', 'acceleration',
                            'model year', 'origin', 'car name'])
data_mpg.dropna(inplace=True)
data_mpg.drop(['model year', 'origin', 'car name'], axis=1, inplace=True)
print(data_mpg.shape)
data_mpg.head()
(392, 6)
[3]:
mpg cylinders displacement horsepower weight acceleration
0 18.0 8.0 307.0 130.0 3504.0 12.0
1 15.0 8.0 350.0 165.0 3693.0 11.5
2 18.0 8.0 318.0 150.0 3436.0 11.0
3 16.0 8.0 304.0 150.0 3433.0 12.0
4 17.0 8.0 302.0 140.0 3449.0 10.5

Causal Discovery with Causal Discovery Tool (CDT)

We use the CDT library to perform causal discovery on the Auto-MPG dataset. We use three methods for causal discovery here -LiNGAM, PC and GES. These methods are widely used and do not take much time to run. Hence, these are ideal for an introduction to the topic. Other neural network based methods are also available in CDT and the users are encouraged to try them out by themselves.

The documentation for the methods used are as follows: - LiNGAM [link] - PC [link] - GES [link]

[4]:
from cdt.causality.graph import LiNGAM, PC, GES

graphs = {}
labels = [f'{col}' for i, col in enumerate(data_mpg.columns)]
functions = {
    'LiNGAM' : LiNGAM,
    'PC' : PC,
    'GES' : GES,
}

for method, lib in functions.items():
    obj = lib()
    output = obj.predict(data_mpg)
    adj_matrix = nx.to_numpy_matrix(output)
    adj_matrix = np.asarray(adj_matrix)
    graph_dot = make_graph(adj_matrix, labels)
    graphs[method] = graph_dot

# Visualize graphs
for method, graph in graphs.items():
    print("Method : %s"%(method))
    display(graph)
No GPU automatically detected. Setting SETTINGS.GPU to 0, and SETTINGS.NJOBS to cpu_count.
Method : LiNGAM
../_images/example_notebooks_dowhy_causal_discovery_example_8_2.svg
Method : PC
../_images/example_notebooks_dowhy_causal_discovery_example_8_4.svg
Method : GES
../_images/example_notebooks_dowhy_causal_discovery_example_8_6.svg

As you can see, no two methods agree on the graphs. PC and GES effectively produce an undirected graph whereas LiNGAM produces a directed graph. We use only the LiNGAM method in the next section.

Estimate causal effects using Linear Regression

Now let us see whether these differences in the graphs also lead to significant differences in the causal estimate of effect of mpg on weight.

[5]:
for method, graph in graphs.items():
        if method != "LiNGAM":
            continue
        print('\n*****************************************************************************\n')
        print("Causal Discovery Method : %s"%(method))

        # Obtain valid dot format
        graph_dot = str_to_dot(graph.source)

        # Define Causal Model
        model=CausalModel(
                data = data_mpg,
                treatment='mpg',
                outcome='weight',
                graph=graph_dot)

        # Identification
        identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
        print(identified_estimand)

        # Estimation
        estimate = model.estimate_effect(identified_estimand,
                                        method_name="backdoor.linear_regression",
                                        control_value=0,
                                        treatment_value=1,
                                        confidence_intervals=True,
                                        test_significance=True)
        print("Causal Estimate is " + str(estimate.value))

*****************************************************************************

Causal Discovery Method : LiNGAM
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
──────(E[weight|cylinders])
d[mpg]
Estimand assumption 1, Unconfoundedness: If U→{mpg} and U→weight then P(weight|mpg,cylinders,U) = P(weight|mpg,cylinders)

### Estimand : 2
Estimand name: iv
No such variable(s) found!

### Estimand : 3
Estimand name: frontdoor
No such variable(s) found!

Causal Estimate is -38.940973656209735

As mentioned earlier, due to the absence of directed edges, no backdoor, instrmental or frontdoor variables can be found out for PC and GES. Thus, causal effect estimation is not possible for these methods. However, LiNGAM does discover a DAG and hence, its possible to output a causal estimate for LiNGAM. The estimate is still pretty far from the original estimate of -70.466 (which can be calculated from the graph).

Experiments on the Sachs dataset

The dataset consists of the simultaneous measurements of 11 phosphorylated proteins and phospholipids derived from thousands of individual primary immune system cells, subjected to both general and specific molecular interventions (Sachs et al., 2005).

The specifications of the dataset are as follows - - Number of nodes: 11 - Number of arcs: 17 - Number of parameters: 178 - Average Markov blanket size: 3.09 - Average degree: 3.09 - Maximum in-degree: 3 - Number of instances: 7466

The original causal graph is known for the Sachs dataset and we compare the original graph with the ones discovered using CDT in this section.

1. Load the data

[6]:
from cdt.data import load_dataset
data_sachs, graph_sachs = load_dataset("sachs")

data_sachs.dropna(inplace=True)
print(data_sachs.shape)
data_sachs.head()
(7466, 11)
[6]:
praf pmek plcg PIP2 PIP3 p44/42 pakts473 PKA PKC P38 pjnk
0 26.4 13.2 8.82 18.30 58.80 6.61 17.0 414.0 17.00 44.9 40.0
1 35.9 16.5 12.30 16.80 8.13 18.60 32.5 352.0 3.37 16.5 61.5
2 59.4 44.1 14.60 10.20 13.00 14.90 32.5 403.0 11.40 31.9 19.5
3 73.0 82.8 23.10 13.50 1.29 5.83 11.8 528.0 13.70 28.6 23.1
4 33.7 19.8 5.19 9.73 24.80 21.10 46.1 305.0 4.66 25.7 81.3

Ground truth of the causal graph

[7]:
labels = [f'{col}' for i, col in enumerate(data_sachs.columns)]
adj_matrix = nx.to_numpy_matrix(graph_sachs)
adj_matrix = np.asarray(adj_matrix)
graph_dot = make_graph(adj_matrix, labels)
display(graph_dot)
../_images/example_notebooks_dowhy_causal_discovery_example_17_0.svg

Causal Discovery with Causal Discovery Tool (CDT)

We use the CDT library to perform causal discovery on the Auto-MPG dataset. We use three methods for causal discovery here -LiNGAM, PC and GES. These methods are widely used and do not take much time to run. Hence, these are ideal for an introduction to the topic. Other neural network based methods are also available in CDT and the users the encourages to try them out by themselves.

The documentation for the methods used in as follows: - LiNGAM [link] - PC [link] - GES [link]

[8]:
from cdt.causality.graph import LiNGAM, PC, GES

graphs = {}
graphs_nx = {}
labels = [f'{col}' for i, col in enumerate(data_sachs.columns)]
functions = {
    'LiNGAM' : LiNGAM,
    'PC' : PC,
    'GES' : GES,
}

for method, lib in functions.items():
    obj = lib()
    output = obj.predict(data_sachs)
    graphs_nx[method] = output
    adj_matrix = nx.to_numpy_matrix(output)
    adj_matrix = np.asarray(adj_matrix)
    graph_dot = make_graph(adj_matrix, labels)
    graphs[method] = graph_dot

# Visualize graphs
for method, graph in graphs.items():
    print("Method : %s"%(method))
    display(graph)
Method : LiNGAM
../_images/example_notebooks_dowhy_causal_discovery_example_19_1.svg
Method : PC
../_images/example_notebooks_dowhy_causal_discovery_example_19_3.svg
Method : GES
../_images/example_notebooks_dowhy_causal_discovery_example_19_5.svg

As you can see, no two methods agree on the graphs. Next we study the causal effects of these different graphs

Estimate effects using Linear Regression

Now let us see whether these differences in the graphs also lead to significant differences in the causal estimate of effect of PIP2 on PKC.

[9]:
for method, graph in graphs.items():
        if method != "LiNGAM":
            continue
        print('\n*****************************************************************************\n')
        print("Causal Discovery Method : %s"%(method))

        # Obtain valid dot format
        graph_dot = str_to_dot(graph.source)

        # Define Causal Model
        model=CausalModel(
                data = data_sachs,
                treatment='PIP2',
                outcome='PKC',
                graph=graph_dot)

        # Identification
        identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
        print(identified_estimand)

        # Estimation
        estimate = model.estimate_effect(identified_estimand,
                                        method_name="backdoor.linear_regression",
                                        control_value=0,
                                        treatment_value=1,
                                        confidence_intervals=True,
                                        test_significance=True)
        print("Causal Estimate is " + str(estimate.value))

*****************************************************************************

Causal Discovery Method : LiNGAM
No directed path from ['PIP2'] to ['PKC'] in the causal graph.
Causal effect is zero.
Causal Estimate is 0

From the causal estimates obtained, it can be seen that the three estimates differ in different aspects. The graph obtained using LiNGAM contains a backdoor path and instrumental variables. On the other hand, the graph obtained using PC contains a backdoor path and a frontdoor path. However, despite these differences, both obtain the same mean causal estimate.

The graph obtained using GES contains only a backdoor path with different backdoor variables and obtains a different causal estimate than the first two cases.

Graph Validation

We compare the graphs obtained with the true causal graph using the causal discovery methods using 2 graph distance metrics - Structural Hamming Distance (SHD) and Structural Intervention Distance (SID). SHD between two graphs is, in simple terms, the number of edge insertions, deletions or flips in order to transform one graph to another graph. SID, on the other hand, is based on a graphical criterion only and quantifies the closeness between two DAGs in terms of their corresponding causal inference statements.

[10]:
from cdt.metrics import SHD, SHD_CPDAG, SID, SID_CPDAG
from numpy.random import randint

for method, graph in graphs_nx.items():
    print("***********************************************************")
    print("Method: %s"%(method))
    tar, pred = graph_sachs, graph
    print("SHD_CPDAG = %f"%(SHD_CPDAG(tar, pred)))
    print("SHD = %f"%(SHD(tar, pred, double_for_anticausal=False)))
    print("SID_CPDAG = [%f, %f]"%(SID_CPDAG(tar, pred)))
    print("SID = %f"%(SID(tar, pred)))
***********************************************************
Method: LiNGAM
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SHD_CPDAG = 17.000000
SHD = 18.000000
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SID_CPDAG = [80.000000, 80.000000]
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SID = 80.000000
***********************************************************
Method: PC
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SHD_CPDAG = 29.000000
SHD = 27.000000
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SID_CPDAG = [82.000000, 82.000000]
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SID = 82.000000
***********************************************************
Method: GES
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SHD_CPDAG = 34.000000
SHD = 31.000000
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SID_CPDAG = [82.000000, 88.000000]
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SID = 91.000000

The graph similarity metrics show that the scores are the lowest for the LiNGAM method of graph extraction. Hence, of the three methods used, LiNGAM provides the graph that is most similar to the original graph.

Graph Refutation

Here, we use the same SHD and SID metric to find out how different the discovered graph are from each other.

[11]:
import itertools
from numpy.random import randint
from cdt.metrics import SHD, SHD_CPDAG, SID, SID_CPDAG

# Find combinations of pair of methods to compare
combinations = list(itertools.combinations(graphs_nx, 2))

for pair in combinations:
    print("***********************************************************")
    graph1 = graphs_nx[pair[0]]
    graph2 = graphs_nx[pair[1]]
    print("Methods: %s and %s"%(pair[0], pair[1]))
    print("SHD_CPDAG = %f"%(SHD_CPDAG(graph1, graph2)))
    print("SHD = %f"%(SHD(graph1, graph2, double_for_anticausal=False)))
    print("SID_CPDAG = [%f, %f]"%(SID_CPDAG(graph1, graph2)))
    print("SID = %f"%(SID(graph1, graph2)))
***********************************************************
Methods: LiNGAM and PC
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SHD_CPDAG = 25.000000
SHD = 24.000000
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SID_CPDAG = [13.000000, 13.000000]
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SID = 13.000000
***********************************************************
Methods: LiNGAM and GES
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SHD_CPDAG = 33.000000
SHD = 33.000000
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SID_CPDAG = [10.000000, 13.000000]
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SID = 13.000000
***********************************************************
Methods: PC and GES
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SHD_CPDAG = 22.000000
SHD = 22.000000
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SID_CPDAG = [70.000000, 78.000000]
adjacency_matrix will return a scipy.sparse array instead of a matrix in Networkx 3.0.
SID = 78.000000

The values for the metrics show how different the graphs are from each other. A higher distance value implies that the difference between the graphs is more.

[ ]: