dowhy package

Subpackages

Submodules

dowhy.causal_estimator module

class dowhy.causal_estimator.CausalEstimate(estimate, target_estimand, realized_estimand_expr, **kwargs)[source]

Bases: object

TODO.

add_params(**kwargs)[source]

add_significance_test_results(test_results)[source]

class dowhy.causal_estimator.CausalEstimator(data, identified_estimand, treatment, outcome, test_significance, params=None)[source]

Bases: object

Base class for an estimator of causal effect.

Initializes an estimator with data and names of relevant variables.

More description.

Parameters

data – data frame containing the data
identified_estimand – probability expression representing the target identified estimand to estimate.
treatment – name of the treatment variable
outcome – name of the outcome variable
params – (optional) additional method parameters

Returns

an instance of the estimator class.

do(x)[source]

TODO.

More description.

Parameters: arg1 –
Returns

estimate_effect()[source]

TODO.

More description.

Parameters: self – object instance of class Estimator
Returns: point estimate of causal effect

test_significance(estimate, num_simulations=1000)[source]

Test statistical significance of obtained estimate.

Uses resampling to create a non-parametric significance test. A general procedure. Individual estimators can override this method.

Parameters

self – object instance of class Estimator
estimate – obtained estimate

Returns
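As an illustration of the resampling idea described for test_significance, the sketch below runs a permutation test on a toy difference-in-means estimator. It is not the library's implementation: the helper names difference_in_means and permutation_p_value, the choice to shuffle the outcome, and the two-sided p-value are all assumptions made for this example.

```python
import numpy as np


def difference_in_means(treatment, outcome):
    """Toy estimator: mean outcome of treated minus mean outcome of untreated."""
    treatment = np.asarray(treatment, dtype=bool)
    outcome = np.asarray(outcome, dtype=float)
    return outcome[treatment].mean() - outcome[~treatment].mean()


def permutation_p_value(treatment, outcome, estimate, num_simulations=1000, seed=0):
    """Two-sided p-value from re-estimating the effect on shuffled outcomes."""
    rng = np.random.default_rng(seed)
    outcome = np.asarray(outcome, dtype=float)
    null_estimates = np.empty(num_simulations)
    for i in range(num_simulations):
        # Shuffling breaks any link between treatment and outcome,
        # so each re-estimate is a draw from the null distribution.
        shuffled = rng.permutation(outcome)
        null_estimates[i] = difference_in_means(treatment, shuffled)
    extreme = np.sum(np.abs(null_estimates) >= abs(estimate))
    # Add-one correction keeps the p-value strictly positive.
    return (extreme + 1) / (num_simulations + 1)


# Hypothetical data: 100 untreated and 100 treated units, true effect 2.0.
rng = np.random.default_rng(42)
treatment = np.repeat([0, 1], 100)
outcome = 2.0 * treatment + rng.normal(size=200)

estimate = difference_in_means(treatment, outcome)
p_value = permutation_p_value(treatment, outcome, estimate)
```

A small p_value indicates that an estimate as large as the observed one rarely arises when the treatment and outcome are unrelated.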

class dowhy.causal_estimator.RealizedEstimand(identified_estimand, estimator_name)[source]

Bases: object

update_assumptions(estimator_assumptions)[source]

update_estimand_expression(estimand_expression)[source]

dowhy.causal_graph module

class dowhy.causal_graph.CausalGraph(treatment_name, outcome_name, graph=None, common_cause_names=None, instrument_names=None, observed_node_names=None)[source]

Bases: object

add_node_attributes(observed_node_names)[source]

add_unobserved_common_cause(observed_node_names)[source]

all_observed(node_names)[source]

build_graph(common_cause_names, instrument_names)[source]

do_surgery(node_names, remove_outgoing_edges=False, remove_incoming_edges=False)[source]

filter_unobserved_variables(node_names)[source]

get_ancestors(node_name, new_graph=None)[source]

get_causes(nodes, remove_edges=None)[source]

get_common_causes(nodes1, nodes2)[source]

get_descendants(node_name)[source]

get_instruments(treatment_nodes, outcome_nodes)[source]

get_parents(node_name)[source]

get_unconfounded_observed_subgraph()[source]

view_graph(layout='dot')[source]

dowhy.causal_identifier module

class dowhy.causal_identifier.CausalIdentifier(graph, estimand_type, proceed_when_unidentifiable=False)[source]

Bases: object

construct_backdoor_estimand(estimand_type, treatment_name, outcome_name, common_causes)[source]

construct_iv_estimand(estimand_type, treatment_name, outcome_name, instrument_names)[source]

identify_effect()[source]

class dowhy.causal_identifier.IdentifiedEstimand(treatment_variable, outcome_variable, estimand_type=None, estimands=None, backdoor_variables=None, instrumental_variables=None)[source]

Bases: object

set_identifier_method(identifier_name)[source]

dowhy.causal_refuter module

class dowhy.causal_refuter.CausalRefutation(estimated_effect, new_effect, refutation_type)[source]

Bases: object

class dowhy.causal_refuter.CausalRefuter(data, identified_estimand, estimate, **kwargs)[source]

Bases: object

dowhy.data_transformer module

class dowhy.data_transformer.DimensionalityReducer(data_array, ndims, **kwargs)[source]

Bases: object

reduce(target_dimensions=None)[source]

dowhy.datasets module

dowhy.datasets.choice(a, size=None, replace=True, p=None)

Generates a random sample from a given 1-D array

New in version 1.7.0.

Note

New code should use the choice method of a default_rng() instance instead; please see the random-quick-start.

Parameters

a – 1-D array-like or int. If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if it were np.arange(a).
size – int or tuple of ints, optional. Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
replace – boolean, optional. Whether the sample is with or without replacement. Default is True, meaning that a value of a can be selected multiple times.
p – 1-D array-like, optional. The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.

Returns

samples – single item or ndarray. The generated random samples.

Raises

ValueError – If a is an int and less than zero, if a or p are not 1-dimensional, if a is an array-like of size 0, if p is not a vector of probabilities, if a and p have different lengths, or if replace=False and the sample size is greater than the population size.

See also

randint, shuffle, permutation
random.Generator.choice – which should be used in new code

Setting user-specified probabilities through p uses a more general but less efficient sampler than the default. The general sampler produces a different sample than the optimized sampler even if each element of p is 1 / len(a).

Sampling random rows from a 2-D array is not possible with this function, but is possible with Generator.choice through its axis keyword.

Generate a uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3)
array([0, 3, 4]) # random
>>> #This is equivalent to np.random.randint(0,5,3)

Generate a non-uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0]) # random

Generate a uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False)
array([3,1,0]) # random
>>> #This is equivalent to np.random.permutation(np.arange(5))[:3]

Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0]) # random

Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:

>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], # random
      dtype='<U11')
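The note above recommends Generator.choice for new code and mentions its axis keyword. A minimal sketch of both points follows; the seed and the array contents are arbitrary values chosen for illustration.

```python
import numpy as np

# A Generator instance replaces the legacy np.random.choice function.
rng = np.random.default_rng(seed=0)

# Sample two distinct rows from a 2-D array, which the legacy
# function cannot do, by selecting along axis 0.
data = np.arange(12).reshape(4, 3)
rows = rng.choice(data, size=2, replace=False, axis=0)

# The non-uniform example from above, rewritten for the Generator API.
sample = rng.choice(5, size=3, p=[0.1, 0, 0.3, 0.6, 0])
```

Each row of rows is one of the four rows of data, and sample can only contain the values 0, 2 and 3 because the other entries have probability zero.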

dowhy.datasets.linear_dataset(beta, num_common_causes, num_samples, num_instruments=0, treatment_is_binary=True)[source]

dowhy.datasets.sigmoid(x)[source]

dowhy.datasets.stochastically_convert_to_binary(x)[source]

dowhy.datasets.xy_dataset(num_samples, effect=True, sd_error=1)[source]

dowhy.do_sampler module

class dowhy.do_sampler.DoSampler(data, params=None, variable_types=None, num_cores=1, causal_model=None, keep_original_treatment=False)[source]

Bases: object

Base class for a sampler from the interventional distribution.

Initializes a do sampler with data and names of relevant variables.

Do sampling implements the do() operation from Pearl (2000). This operation is defined on a causal Bayesian network, an explicit implementation of which is the basis for the MCMC sampling method.

We abstract the idea behind the three-step process to allow other methods, as well. The disrupt_causes method is the means to make treatment assignment ignorable. In the Pearlian framework, this is where we cut the edges pointing into the causal state. With other methods, this will typically be by using some approach which assumes conditional ignorability (e.g. weighting, or explicit conditioning with Robins G-formula.)

Next, the make_treatment_effective method reflects the assumption that the intervention we impose is “effective”. Most simply, we fix the causal state to some specific value. We skip this step if there is no value specified for the causal state, and the original values are used instead.

Finally, we sample from the resulting distribution. This can be either from a point_sample method, in the case that the inference method doesn’t support batch sampling, or the sample method in the case that it does. For convenience, the point_sample method parallelizes with multiprocessing using the num_cores kwargs to set the number of cores to use for parallelization.

While different methods will have their own class attributes, the _df attribute should be common to all methods. This is the temporary dataset, which starts as a copy of the original data and is modified to reflect the steps of the do operation. Read through the existing methods (weighting is likely the most minimal) to get an idea of how this works before implementing one yourself.
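The three steps can be sketched with a toy inverse-propensity-weighting sampler. This is a hypothetical stand-alone class, not a DoSampler subclass and not the library's weighting sampler: the class name ToyWeightingSampler, the NumPy array layout (columns z, t, y), and the empirical propensity estimate are assumptions. It also makes the treatment "effective" by keeping only rows already observed at the requested value, rather than overwriting the treatment column.

```python
import numpy as np


class ToyWeightingSampler:
    """Hypothetical illustration of the three-step do operation."""

    def __init__(self, z, t, y, seed=0):
        # Columns: 0 = binary confounder z, 1 = binary treatment t, 2 = outcome y.
        self._data = np.column_stack([z, t, y]).astype(float)
        self._rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # _df starts as a copy of the original data, as described above.
        self._df = self._data.copy()
        self._weights = np.ones(len(self._df))

    def disrupt_causes(self):
        # Step 1: weight each row by 1 / P(t | z) so that treatment
        # assignment becomes ignorable in the weighted data.
        z, t = self._df[:, 0], self._df[:, 1]
        propensity = np.empty(len(z))
        for value in np.unique(z):
            mask = z == value
            p_treated = t[mask].mean()
            propensity[mask] = np.where(t[mask] == 1, p_treated, 1 - p_treated)
        self._weights = 1.0 / propensity

    def make_treatment_effective(self, x):
        # Step 2: keep only rows whose treatment equals the intervention value.
        keep = self._df[:, 1] == x
        self._df = self._df[keep]
        self._weights = self._weights[keep]

    def sample(self, size):
        # Step 3: draw rows with probability proportional to their weights.
        p = self._weights / self._weights.sum()
        idx = self._rng.choice(len(self._df), size=size, replace=True, p=p)
        return self._df[idx]

    def do_sample(self, x, size):
        self.reset()
        self.disrupt_causes()
        self.make_treatment_effective(x)
        return self.sample(size)


# Hypothetical confounded data: z raises both the chance of treatment and y.
rng = np.random.default_rng(1)
n = 20000
z = rng.binomial(1, 0.5, size=n)
t = rng.binomial(1, np.where(z == 1, 0.8, 0.2))
y = 1.0 * t + 2.0 * z + rng.normal(scale=0.5, size=n)

sampler = ToyWeightingSampler(z, t, y)
interventional_mean = sampler.do_sample(x=1, size=n)[:, 2].mean()
naive_mean = y[t == 1].mean()
```

In this simulation the mean of y under do(t=1) is 2.0 by construction, so interventional_mean should land close to that value, while naive_mean is pulled upward by the confounder.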

Parameters

data – pandas.DataFrame containing the data
identified_estimand – dowhy.causal_identifier.IdentifiedEstimand: an estimand using a backdoor method for effect identification
treatments – list or str: names of the treatment variables
outcomes – list or str: names of the outcome variables
variable_types – dict: a dictionary containing the variables’ names and types: ‘c’ for continuous, ‘o’ for ordered, ‘d’ for discrete, and ‘u’ for unordered discrete
keep_original_treatment – bool: whether to use make_treatment_effective, or to keep the original treatment assignments
params – (optional) additional method parameters

disrupt_causes()[source]

Override this method to render treatment assignment conditionally ignorable.

do_sample(x)[source]

make_treatment_effective(x)[source]

This is most likely the implementation you’d like to use, but some methods may require overriding it to make the treatment effective.

point_sample()[source]

reset()[source]

If your DoSampler has more attributes than the _df attribute, you should reset them all to their initialization values by overriding this method.

sample()[source]

By default, this expects a sampler to be built on class initialization which contains a sample method. Override this method if you want to use a different approach to sampling.

dowhy.do_why module

Module containing the main model class for the dowhy package.

class dowhy.do_why.CausalModel(data, treatment, outcome, graph=None, common_causes=None, instruments=None, estimand_type='ate', proceed_when_unidentifiable=False, **kwargs)[source]

Bases: object

Main class for storing the causal model state.

Initialize data and create a causal graph instance.

Assigns treatment and outcome variables. Also checks and finds the common causes and instruments for treatment and outcome.

At least one of graph, common_causes or instruments must be provided.

Parameters

data – a pandas dataframe containing treatment, outcome and other variables
treatment – name of the treatment variable
outcome – name of the outcome variable
graph – path to DOT file containing a DAG or a string containing a DAG specification in DOT format
common_causes – names of common causes of treatment and outcome
instruments – names of instrumental variables for the effect of treatment on outcome

Returns

an instance of CausalModel class
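For the graph parameter, a DAG specification in DOT format might look like the string below. The node names (Z as a common cause, W as an instrument) are hypothetical, and the regular-expression parsing only demonstrates the structure of the string; it is not how the library reads graphs.

```python
import re

# Hypothetical DAG: Z is a common cause of treatment and outcome,
# W is an instrument that affects the outcome only through treatment.
dot_graph = """
digraph {
    Z -> treatment;
    Z -> outcome;
    W -> treatment;
    treatment -> outcome;
}
"""

# Extract the directed edges to check the structure of the specification.
edges = re.findall(r"(\w+)\s*->\s*(\w+)", dot_graph)
parents_of_treatment = sorted(src for src, dst in edges if dst == "treatment")
```

A string of this form, or the path to a file containing it, is what the graph argument is documented to accept.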

do(x, identified_estimand, method_name=None, method_params=None)[source]

Estimate the identified causal effect.

If method_name is provided, uses the provided method. Else, finds a suitable method to be used.

Parameters

identified_estimand – a probability expression that represents the effect to be estimated. Output of CausalModel.identify_effect method
method_name – (optional) name of the estimation method to be used.

Returns

an instance of the CausalEstimate class, containing the causal effect estimate and other method-dependent information

estimate_effect(identified_estimand, method_name=None, test_significance=None, method_params=None)[source]

Estimate the identified causal effect.

If method_name is provided, uses the provided method. Else, finds a suitable method to be used.

Parameters

identified_estimand – a probability expression that represents the effect to be estimated. Output of CausalModel.identify_effect method
method_name – (optional) name of the estimation method to be used.

Returns

an instance of the CausalEstimate class, containing the causal effect estimate and other method-dependent information
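To show what estimating a backdoor-identified effect involves, the sketch below adjusts for a single common cause with ordinary least squares on simulated data. It uses only NumPy and does not call the library; the helper ols_slope, the simulated coefficients, and the linear model are assumptions for illustration, not the estimator that estimate_effect selects.

```python
import numpy as np

# Hypothetical data: the common cause drives both treatment and outcome,
# and the true effect of treatment on outcome is 3.0.
rng = np.random.default_rng(0)
n = 5000
common_cause = rng.normal(size=n)
treatment = 0.8 * common_cause + rng.normal(size=n)
outcome = 3.0 * treatment + 2.0 * common_cause + rng.normal(size=n)


def ols_slope(columns, y):
    """Coefficient on the first column from a least-squares fit with intercept."""
    X = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


# Regressing on treatment alone mixes the causal effect with confounding.
unadjusted = ols_slope([treatment], outcome)

# Including the common cause as a regressor applies the backdoor adjustment.
adjusted = ols_slope([treatment, common_cause], outcome)
```

Here adjusted should be close to the simulated effect of 3.0, while unadjusted is biased upward because the common cause is left out.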

identify_effect(proceed_when_unidentifiable=None)[source]

Identify the causal effect to be estimated, using properties of the causal graph.

Returns: a probability expression for the causal effect if identified, else NULL

refute_estimate(estimand, estimate, method_name=None, **kwargs)[source]

Refute an estimated causal effect.

If method_name is provided, uses the provided method. Else, finds a suitable method to use.

Parameters: estimate – an instance of the CausalEstimate class.
Returns: an instance of the RefuteResult class
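One common way to challenge an estimate is a placebo check: replace the treatment with a shuffled copy and re-estimate, expecting the new effect to be near zero. The sketch below illustrates that idea with NumPy only. It is an assumption about the general technique, not a description of which refutation methods this class provides; the names estimated_effect and new_effect merely echo the CausalRefutation constructor arguments.

```python
import numpy as np

# Hypothetical confounded data with a true treatment effect of 3.0.
rng = np.random.default_rng(0)
n = 5000
common_cause = rng.normal(size=n)
treatment = 0.8 * common_cause + rng.normal(size=n)
outcome = 3.0 * treatment + 2.0 * common_cause + rng.normal(size=n)


def adjusted_effect(t, c, y):
    """Least-squares coefficient on t, adjusting for c (with intercept)."""
    X = np.column_stack([np.ones(len(y)), t, c])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


estimated_effect = adjusted_effect(treatment, common_cause, outcome)

# Placebo treatment: a shuffled copy that cannot have caused the outcome.
placebo = rng.permutation(treatment)
new_effect = adjusted_effect(placebo, common_cause, outcome)
```

If new_effect were as large as estimated_effect, the original estimate would be suspect; here it should be close to zero.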

summary()[source]

Print a text summary of the model.

Returns: None

view_model(layout='dot')[source]

View the causal DAG.

Returns: a visualization of the graph

dowhy.plotter module

dowhy.plotter.plot_causal_effect(estimate, treatment, outcome)[source]

dowhy.plotter.plot_treatment_outcome(treatment, outcome, time_var)[source]

Module contents