dowhy package
Subpackages
- dowhy.api package
- dowhy.causal_estimators package
- Submodules
- dowhy.causal_estimators.causalml module
- dowhy.causal_estimators.econml module
- dowhy.causal_estimators.generalized_linear_model_estimator module
- dowhy.causal_estimators.instrumental_variable_estimator module
- dowhy.causal_estimators.linear_regression_estimator module
- dowhy.causal_estimators.propensity_score_estimator module
- dowhy.causal_estimators.propensity_score_matching_estimator module
- dowhy.causal_estimators.propensity_score_stratification_estimator module
- dowhy.causal_estimators.propensity_score_weighting_estimator module
- dowhy.causal_estimators.regression_discontinuity_estimator module
- dowhy.causal_estimators.regression_estimator module
- dowhy.causal_estimators.two_stage_regression_estimator module
- Module contents
- dowhy.causal_refuters package
- Submodules
- dowhy.causal_refuters.add_unobserved_common_cause module
- dowhy.causal_refuters.bootstrap_refuter module
- dowhy.causal_refuters.data_subset_refuter module
- dowhy.causal_refuters.dummy_outcome_refuter module
- dowhy.causal_refuters.placebo_treatment_refuter module
- dowhy.causal_refuters.random_common_cause module
- Module contents
- dowhy.data_transformers package
- dowhy.do_samplers package
- dowhy.interpreters package
- dowhy.utils package
Submodules
dowhy.causal_estimator module
- class dowhy.causal_estimator.CausalEstimate(estimate, target_estimand, realized_estimand_expr, conditional_estimates=None, **kwargs)[source]
Bases: object
Class for the estimate object that every causal estimator returns
- estimate_conditional_effects(effect_modifiers=None, num_quantiles=5)[source]
Estimate treatment effect conditioned on given variables.
If a numeric effect modifier is provided, it is discretized into quantile bins. If you would like a custom discretization, you can do so yourself: create a new column containing the discretized effect modifier and then include that column’s name in the effect_modifier_names argument.
- Parameters
effect_modifiers – Names of effect modifier variables over which the conditional effects will be estimated. If not provided, defaults to the effect modifiers specified during creation of the CausalEstimator object.
num_quantiles – The number of quantiles into which a numeric effect modifier variable is discretized. Does not affect any categorical effect modifiers.
- Returns
A (multi-index) dataframe that provides separate effects for each value of the (discretized) effect modifiers.
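An illustrative sketch (using the synthetic-data helper from dowhy.datasets and the backdoor.linear_regression method documented elsewhere on this page; the chosen parameter values are arbitrary):

import dowhy.datasets
from dowhy import CausalModel

data = dowhy.datasets.linear_dataset(beta=10, num_common_causes=4,
                                     num_effect_modifiers=1,
                                     num_samples=5000, treatment_is_binary=True)
model = CausalModel(data=data["df"], treatment=data["treatment_name"],
                    outcome=data["outcome_name"], graph=data["gml_graph"])
estimand = model.identify_effect(proceed_when_unidentifiable=True)
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
# Conditional effects over the (discretized) effect modifiers declared in the model
conditional_effects = estimate.estimate_conditional_effects(num_quantiles=4)
print(conditional_effects)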
- get_confidence_intervals(confidence_level=None, method=None, **kwargs)[source]
Get confidence intervals of the obtained estimate.
By default, this is done with the help of bootstrapped confidence intervals but can be overridden if the specific estimator implements other methods of estimating confidence intervals.
If the method provided is not bootstrap, this function calls the implementation of the specific estimator.
- Parameters
method – Method for estimating confidence intervals.
confidence_level – The confidence level of the confidence intervals of the estimate.
kwargs – Other optional args to be passed to the CI method.
- Returns
The obtained confidence interval.
- get_standard_error(method=None, **kwargs)[source]
Get standard error of the obtained estimate.
By default, this is done with the help of bootstrapped standard errors but can be overridden if the specific estimator implements other methods of estimating standard error.
If the method provided is not bootstrap, this function calls the implementation of the specific estimator.
- Parameters
method – Method for computing the standard error.
kwargs – Other optional parameters to be passed to the estimating method.
- Returns
Standard error of the causal estimate.
- interpret(method_name=None, **kwargs)[source]
Interpret the causal estimate.
- Parameters
method_name – Method used (string) or a list of methods. If None, then the default for the specific estimator is used.
kwargs – Optional parameters that are directly passed to the interpreter method.
- Returns
None
- test_stat_significance(method=None, **kwargs)[source]
Test statistical significance of the estimate obtained.
By default, uses resampling to create a non-parametric significance test. Individual child estimators can implement different methods. If the method name is different from “bootstrap”, this function calls the implementation of the child estimator.
- Parameters
method – Method for checking statistical significance
kwargs – Other optional parameters to be passed to the estimating method.
- Returns
p-value from the significance test
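Continuing from an estimate such as the one in the previous sketch, the summaries above can be requested after estimation; unless the specific estimator provides its own method, they fall back to the bootstrap:

# `estimate` is a CausalEstimate, e.g. returned by CausalModel.estimate_effect(...)
ci = estimate.get_confidence_intervals(confidence_level=0.95)  # bootstrap by default
se = estimate.get_standard_error()
significance = estimate.test_stat_significance()  # p-value (see Returns above)
estimate.interpret()  # uses the default textual_effect_interpreter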
- class dowhy.causal_estimator.CausalEstimator(data, identified_estimand, treatment, outcome, control_value=0, treatment_value=1, test_significance=False, evaluate_effect_strength=False, confidence_intervals=False, target_units=None, effect_modifiers=None, params=None)[source]
Bases: object
Base class for an estimator of causal effect.
Subclasses implement different estimation methods. All estimation methods are in the package “dowhy.causal_estimators”
Initializes an estimator with data and names of relevant variables.
This method is called from the constructors of its child classes.
- Parameters
data – data frame containing the data
identified_estimand – probability expression representing the target identified estimand to estimate.
treatment – name of the treatment variable
outcome – name of the outcome variable
control_value – Value of the treatment in the control group, for effect estimation. If treatment is multi-variate, this can be a list.
treatment_value – Value of the treatment in the treated group, for effect estimation. If treatment is multi-variate, this can be a list.
test_significance – Binary flag or a string indicating whether to test significance and by which method. All estimators support test_significance=”bootstrap” that estimates a p-value for the obtained estimate using the bootstrap method. Individual estimators can override this to support custom testing methods. The bootstrap method supports an optional parameter, num_null_simulations that can be specified through the params dictionary. If False, no testing is done. If True, significance of the estimate is tested using the custom method if available, otherwise by bootstrap.
evaluate_effect_strength – (Experimental) whether to evaluate the strength of effect
confidence_intervals – Binary flag or a string indicating whether the confidence intervals should be computed and which method should be used. All methods support estimation of confidence intervals using the bootstrap method by using the parameter confidence_intervals=”bootstrap”. The bootstrap method takes in two arguments (num_simulations and sample_size_fraction) that can be optionally specified in the params dictionary. Estimators may also override this to implement their own confidence interval method. If this parameter is False, no confidence intervals are computed. If True, confidence intervals are computed by the estimator’s specific method if available, otherwise through bootstrap.
target_units – The units for which the treatment effect should be estimated. This can be a string for common specifications of target units (namely, “ate”, “att” and “atc”). It can also be a lambda function that can be used as an index for the data (pandas DataFrame). Alternatively, it can be a new DataFrame that contains values of the effect_modifiers and effect will be estimated only for this new data.
effect_modifiers – Variables on which to compute separate effects, or return a heterogeneous effect function. Not all methods support this currently.
params – (optional) Additional method parameters.
num_null_simulations: The number of simulations for testing the statistical significance of the estimate.
num_simulations: The number of simulations for finding the confidence interval (and/or standard error) for an estimate.
sample_size_fraction: The size of the sample for the bootstrap estimator.
confidence_level: The confidence level of the confidence interval estimate.
num_quantiles_to_discretize_cont_cols: The number of quantiles into which a numeric effect modifier is split, to enable estimation of the conditional treatment effect over it.
- Returns
an instance of the estimator class.
- class BootstrapEstimates(estimates, params)
Bases: tuple
Create new instance of BootstrapEstimates(estimates, params)
- estimates
Alias for field number 0
- params
Alias for field number 1
- DEFAULT_CONFIDENCE_LEVEL = 0.95
- DEFAULT_INTERPRET_METHOD = ['textual_effect_interpreter']
- DEFAULT_NOTIMPLEMENTEDERROR_MSG = 'not yet implemented for {0}. If you would this to be implemented in the next version, please raise an issue at https://github.com/microsoft/dowhy/issues'
- DEFAULT_NUMBER_OF_SIMULATIONS_CI = 100
- DEFAULT_NUMBER_OF_SIMULATIONS_STAT_TEST = 1000
- DEFAULT_SAMPLE_SIZE_FRACTION = 1
- NUM_QUANTILES_TO_DISCRETIZE_CONT_COLS = 5
- TEMP_CAT_COLUMN_PREFIX = '__categorical__'
- do(x, data_df=None)[source]
Method that implements the do-operator.
Given a value x for the treatment, returns the expected value of the outcome when the treatment is intervened to a value x.
- Parameters
x – Value of the treatment
data_df – Data on which the do-operator is to be applied.
- Returns
Value of the outcome when treatment is intervened/set to x.
- estimate_confidence_intervals(confidence_level=None, method=None, **kwargs)[source]
Find the confidence intervals corresponding to any estimator.
By default, this is done with the help of bootstrapped confidence intervals but can be overridden if the specific estimator implements other methods of estimating confidence intervals.
If the method provided is not bootstrap, this function calls the implementation of the specific estimator.
- Parameters
method – Method for estimating confidence intervals.
confidence_level – The confidence level of the confidence intervals of the estimate.
kwargs – Other optional args to be passed to the CI method.
- Returns
The obtained confidence interval.
- estimate_effect()[source]
Base estimation method that calls the estimate_effect method of its calling subclass.
Can optionally also test significance and estimate effect strength for any returned estimate.
- Parameters
self – object instance of class Estimator
- Returns
A CausalEstimate instance that contains point estimates of average and conditional effects. Based on the parameters provided, it optionally includes confidence intervals, standard errors, statistical significance and other statistical parameters.
- estimate_std_error(method=None, **kwargs)[source]
Compute standard error of an obtained causal estimate.
- Parameters
method – Method for computing the standard error.
kwargs – Other optional parameters to be passed to the estimating method.
- Returns
Standard error of the causal estimate.
- static get_estimator_object(new_data, identified_estimand, estimate)[source]
Create a new estimator of the same type as the one passed in the estimate argument.
Creates a new object with new_data and the identified_estimand
- Parameters
new_data – np.ndarray, pd.Series, or pd.DataFrame. The newly assigned data on which the estimator should run.
identified_estimand – IdentifiedEstimand. An instance of the identified estimand class that provides information about which causal pathways are employed when the treatment affects the outcome.
estimate – CausalEstimate. An existing estimate whose properties we wish to replicate.
- Returns
An instance of the same estimator class that had generated the given estimate.
- static is_bootstrap_parameter_changed(bootstrap_estimates_params, given_params)[source]
Check whether parameters of the bootstrap have changed.
This is an efficiency method that checks if fresh resampling of the bootstrap samples is required. Returns True if parameters have changed and resampling should be done again.
- Parameters
bootstrap_estimates_params – A dictionary of parameters for the current bootstrap samples
given_params – A dictionary of parameters passed by the user
- Returns
A binary flag denoting whether the parameters are different.
- test_significance(estimate_value, method=None, **kwargs)[source]
Test statistical significance of obtained estimate.
By default, uses resampling to create a non-parametric significance test; this is a general procedure, and individual child estimators can implement different methods. If the method name is different from “bootstrap”, this function calls the implementation of the child estimator.
- Parameters
self – object instance of class Estimator
estimate_value – obtained estimate’s value
method – Method for checking statistical significance
- Returns
p-value from the significance test
dowhy.causal_graph module
- class dowhy.causal_graph.CausalGraph(treatment_name, outcome_name, graph=None, common_cause_names=None, instrument_names=None, effect_modifier_names=None, mediator_names=None, observed_node_names=None, missing_nodes_as_confounders=False)[source]
Bases: object
Class for creating and modifying the causal graph.
Accepts a graph string (or a text file) in GML format (preferred) or DOT format. Graphviz-like attributes can be set for edges and nodes. E.g. style=”dashed” as an edge attribute ensures that the edge is drawn with a dashed line.
If a graph string is not given, names of treatment, outcome, and confounders, instruments and effect modifiers (if any) can be provided to create the graph.
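For example, a small graph can be built directly from variable names (the names used here are arbitrary placeholders):

from dowhy.causal_graph import CausalGraph

graph = CausalGraph(
    treatment_name=["v0"],
    outcome_name=["y"],
    common_cause_names=["W0", "W1"],
    instrument_names=["Z0"],
    effect_modifier_names=["X0"],
    observed_node_names=["v0", "y", "W0", "W1", "Z0", "X0"],
)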
- build_graph(common_cause_names, instrument_names, effect_modifier_names, mediator_names)[source]
Creates nodes and edges based on variable names and their semantics.
Currently only considers the graphical representation of “direct” effect modifiers. Thus, all effect modifiers are assumed to be “direct” unless otherwise expressed using a graph. Based on the taxonomy of effect modifiers by VanderWeele and Robins: “Four types of effect modification: A classification based on directed acyclic graphs. Epidemiology. 2007.”
- check_valid_frontdoor_set(nodes1, nodes2, candidate_nodes, frontdoor_paths=None)[source]
Check whether the candidate variables form a valid frontdoor set for the set of treatments, nodes1, to the set of outcomes, nodes2.
- check_valid_mediation_set(nodes1, nodes2, candidate_nodes, mediation_paths=None)[source]
Check whether the candidate nodes are valid mediators for the set of treatments, nodes1, to the set of outcomes, nodes2.
- get_all_directed_paths(nodes1, nodes2)[source]
Get all directed paths between sets of nodes.
Currently only supports singleton sets.
- get_common_causes(nodes1, nodes2)[source]
Assume that nodes1 causes nodes2 (e.g., nodes1 are the treatments and nodes2 are the outcomes)
dowhy.causal_identifier module
- class dowhy.causal_identifier.CausalIdentifier(graph, estimand_type, method_name=None, proceed_when_unidentifiable=False)[source]
Bases: object
Class that implements different identification methods.
Currently supports backdoor and instrumental variable identification methods. The identification is based on the causal graph provided.
Other specific ways of identification, such as the ID* algorithm, minimal adjustment criteria, etc. will be added in the future. If you’d like to contribute, please raise an issue or a pull request on Github.
- MAX_BACKDOOR_ITERATIONS = 100000
- NONPARAMETRIC_ATE = 'nonparametric-ate'
- NONPARAMETRIC_NDE = 'nonparametric-nde'
- NONPARAMETRIC_NIE = 'nonparametric-nie'
- build_backdoor_estimands_dict(treatment_name, outcome_name, backdoor_sets, estimands_dict, proceed_when_unidentifiable=None)[source]
- construct_frontdoor_estimand(estimand_type, treatment_name, outcome_name, frontdoor_variables_names)[source]
- identify_effect()[source]
Main method that returns an identified estimand (if one exists).
If estimand_type is non-parametric ATE, then uses backdoor, instrumental variable and frontdoor identification methods to check whether an identified estimand exists, based on the causal graph.
- Parameters
self – instance of the CausalIdentifier class (or its subclass)
- Returns
target estimand, an instance of the IdentifiedEstimand class
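A usage sketch, assuming a CausalGraph instance such as the one constructed in the dowhy.causal_graph example above:

from dowhy.causal_identifier import CausalIdentifier

identifier = CausalIdentifier(graph, estimand_type="nonparametric-ate",
                              proceed_when_unidentifiable=True)
identified_estimand = identifier.identify_effect()
print(identified_estimand.get_backdoor_variables())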
- identify_frontdoor()[source]
Find a valid frontdoor variable if it exists.
Currently only supports a single variable frontdoor set.
- class dowhy.causal_identifier.IdentifiedEstimand(identifier, treatment_variable, outcome_variable, estimand_type=None, estimands=None, backdoor_variables=None, instrumental_variables=None, frontdoor_variables=None, mediator_variables=None, mediation_first_stage_confounders=None, mediation_second_stage_confounders=None, default_backdoor_id=None, identifier_method=None)[source]
Bases: object
Class for storing a causal estimand, typically as a result of the identification step.
- get_backdoor_variables(key=None)[source]
Return a list containing the backdoor variables.
If the calling estimator method is a backdoor method, return the backdoor variables corresponding to its target estimand. Otherwise, return the backdoor variables for the default backdoor estimand.
dowhy.causal_model module
Module containing the main model class for the dowhy package.
- class dowhy.causal_model.CausalModel(data, treatment, outcome, graph=None, common_causes=None, instruments=None, effect_modifiers=None, estimand_type='nonparametric-ate', proceed_when_unidentifiable=False, missing_nodes_as_confounders=False, **kwargs)[source]
Bases: object
Main class for storing the causal model state.
Initialize data and create a causal graph instance.
Assigns treatment and outcome variables. Also checks and finds the common causes and instruments for treatment and outcome.
At least one of graph, common_causes or instruments must be provided.
- Parameters
data – a pandas dataframe containing treatment, outcome and other variables.
treatment – name of the treatment variable
outcome – name of the outcome variable
graph – path to a DOT file containing a DAG, or a string containing a DAG specification in DOT format
common_causes – names of common causes of treatment and outcome. Only used when graph is None.
instruments – names of instrumental variables for the effect of treatment on outcome. Only used when graph is None.
effect_modifiers – names of variables that can modify the treatment effect. If not provided, then the causal graph is used to find the effect modifiers. Estimators will return multiple different estimates based on each value of effect_modifiers.
estimand_type – the type of estimand requested (currently only “nonparametric-ate” is supported). In the future, may support other specific parametric forms of identification.
proceed_when_unidentifiable – Binary flag indicating whether identification should proceed by ignoring potential unobserved confounders.
missing_nodes_as_confounders – Binary flag indicating whether variables in the dataframe that are not included in the causal graph should be automatically included as confounder nodes.
kwargs – More optional parameters that can be passed to the causal model. Currently supported params: “logging_level” to indicate the level of logging needed. Possible values are logging.CRITICAL, logging.ERROR, logging.WARNING, logging.INFO, and logging.DEBUG. Default is logging.INFO.
- Returns
an instance of the CausalModel class
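A minimal construction sketch (the synthetic dataset, its dictionary keys, and the graph string come from dowhy.datasets.linear_dataset, documented below):

import dowhy.datasets
from dowhy import CausalModel

data = dowhy.datasets.linear_dataset(beta=10, num_common_causes=5,
                                     num_instruments=2, num_samples=5000,
                                     treatment_is_binary=True)
model = CausalModel(
    data=data["df"],
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    graph=data["gml_graph"],  # graph string describing the true DAG
)
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)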
- do(x, identified_estimand, method_name=None, method_params=None)[source]
Do operator for estimating values of the outcome after intervening on treatment.
- Parameters
identified_estimand – a probability expression that represents the effect to be estimated. Output of CausalModel.identify_effect method
method_name – name of the estimation method to be used. See the docs for the estimate_effect method for a list of supported estimation methods.
method_params – Dictionary containing any method-specific parameters. These are passed directly to the estimating method.
- Returns
an instance of the CausalEstimate class, containing the causal effect estimate and other method-dependent information
- estimate_effect(identified_estimand, method_name=None, control_value=0, treatment_value=1, test_significance=None, evaluate_effect_strength=False, confidence_intervals=False, target_units='ate', effect_modifiers=None, method_params=None)[source]
Estimate the identified causal effect.
- Currently requires an explicit method name to be specified. Method names follow the convention of the identification method followed by the specific estimation method: “[backdoor/iv].estimation_method_name”. The following methods are supported (a usage sketch appears after the parameter list below).
Propensity Score Matching: “backdoor.propensity_score_matching”
Propensity Score Stratification: “backdoor.propensity_score_stratification”
Propensity Score-based Inverse Weighting: “backdoor.propensity_score_weighting”
Linear Regression: “backdoor.linear_regression”
Generalized Linear Models (e.g., logistic regression): “backdoor.generalized_linear_model”
Instrumental Variables: “iv.instrumental_variable”
Regression Discontinuity: “iv.regression_discontinuity”
In addition, you can directly call any of the EconML estimation methods. The convention is “backdoor.econml.path-to-estimator-class”. For example, for the double machine learning estimator (“DMLCateEstimator” class) that is located inside “dml” module of EconML, you can use the method name, “backdoor.econml.dml.DMLCateEstimator”. CausalML estimators can also be called. See this demo notebook.
- Parameters
identified_estimand – a probability expression that represents the effect to be estimated. Output of CausalModel.identify_effect method
method_name – name of the estimation method to be used.
control_value – Value of the treatment in the control group, for effect estimation. If treatment is multi-variate, this can be a list.
treatment_value – Value of the treatment in the treated group, for effect estimation. If treatment is multi-variate, this can be a list.
test_significance – Binary flag on whether to additionally do a statistical significance test for the estimate.
evaluate_effect_strength – (Experimental) Binary flag on whether to estimate the relative strength of the treatment’s effect. This measure can be used to compare different treatments for the same outcome (by running this method with different treatments sequentially).
confidence_intervals – (Experimental) Binary flag indicating whether confidence intervals should be computed.
target_units – (Experimental) The units for which the treatment effect should be estimated. This can be of three types. (1) a string for common specifications of target units (namely, “ate”, “att” and “atc”), (2) a lambda function that can be used as an index for the data (pandas DataFrame), or (3) a new DataFrame that contains values of the effect_modifiers and effect will be estimated only for this new data.
effect_modifiers – Names of effect modifier variables can be (optionally) specified here too, since they do not affect identification. If None, the effect_modifiers from the CausalModel are used.
method_params – Dictionary containing any method-specific parameters. These are passed directly to the estimating method. See the docs for each estimation method for allowed method-specific params.
- Returns
An instance of the CausalEstimate class, containing the causal effect estimate and other method-dependent information
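A usage sketch, assuming model and identified_estimand were created as in the constructor example above; the method name and target_units shown are just one possible choice:

estimate = model.estimate_effect(
    identified_estimand,
    method_name="backdoor.propensity_score_stratification",
    target_units="att",           # effect on the treated
    confidence_intervals=True,
    test_significance=True,
)
print(estimate.value)             # point estimate of the causal effect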
- identify_effect(estimand_type=None, method_name='auto', proceed_when_unidentifiable=None)[source]
Identify the causal effect to be estimated, using properties of the causal graph.
- Parameters
proceed_when_unidentifiable – Binary flag indicating whether identification should proceed in the presence of (potential) unobserved confounders.
- Returns
a probability expression (estimand) for the causal effect if identified, else NULL
- interpret(method_name=None, **kwargs)[source]
Interpret the causal model.
- Parameters
method_name – method used for interpreting the model. If None, then default interpreter is chosen that describes the model summary and shows the associated causal graph.
kwargs – Optional parameters that are directly passed to the interpreter method.
- Returns
None
- refute_estimate(estimand, estimate, method_name=None, **kwargs)[source]
Refute an estimated causal effect.
- If method_name is provided, uses the provided method. In the future, we may support automatic selection of suitable refutation tests. The following refutation methods are supported (a usage sketch appears after the parameter list below).
Adding a randomly-generated confounder: “random_common_cause”
Adding a confounder that is associated with both treatment and outcome: “add_unobserved_common_cause”
Replacing the treatment with a placebo (random) variable: “placebo_treatment_refuter”
Removing a random subset of the data: “data_subset_refuter”
- Parameters
estimand – target estimand, an instance of the IdentifiedEstimand class (typically, the output of identify_effect)
estimate – estimate to be refuted, an instance of the CausalEstimate class (typically, the output of estimate_effect)
method_name – name of the refutation method
kwargs – (optional) additional arguments that are passed directly to the refutation method. Can specify a random seed here to ensure reproducible results (‘random_seed’ parameter). For method-specific parameters, consult the documentation for the specific method. All refutation methods are in the causal_refuters subpackage.
- Returns
an instance of the RefuteResult class
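A usage sketch, assuming model, identified_estimand and estimate from the earlier examples; each refuter is selected by name, and method-specific parameters (such as placebo_type or subset_fraction, documented with the individual refuters) are passed as keyword arguments:

res_random = model.refute_estimate(identified_estimand, estimate,
                                   method_name="random_common_cause")
res_placebo = model.refute_estimate(identified_estimand, estimate,
                                    method_name="placebo_treatment_refuter",
                                    placebo_type="permute")
res_subset = model.refute_estimate(identified_estimand, estimate,
                                   method_name="data_subset_refuter",
                                   subset_fraction=0.9)
print(res_random)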
dowhy.causal_refuter module
- class dowhy.causal_refuter.CausalRefutation(estimated_effect, new_effect, refutation_type)[source]
Bases: object
Class for storing the result of a refutation method.
- class dowhy.causal_refuter.CausalRefuter(data, identified_estimand, estimate, **kwargs)[source]
Bases: object
Base class for different refutation methods.
Subclasses implement specific refutation methods.
- DEFAULT_NUM_SIMULATIONS = 100
- choose_variables(required_variables)[source]
This method provides a way to choose the confounders whose values we wish to modify, in order to find their effect on the ability of the treatment to affect the outcome.
- test_significance(estimate, simulations, test_type='auto', significance_level=0.05)[source]
Tests the statistical significance of the estimate obtained, against the simulations produced by a refuter.
The rationale for using the refuter’s sample statistics when testing the estimate is that, ideally, we would expect them to follow the same distribution.
For refutation tests (e.g., placebo refuters), the null distribution is a distribution of effect estimates over multiple simulations with a placebo treatment, and we compute how likely the true estimate (e.g., zero for the placebo test) is under the null. If that probability is lower than the significance level, the estimator method fails the test.
For sensitivity analysis tests (e.g., bootstrap, subset or common cause refuters), the null distribution captures the distribution of effect estimates under the “true” dataset (e.g., with an additional confounder or different sampling), and we compute the probability of the obtained estimate under this distribution. If that probability is lower than the significance level, the estimator method fails the test.
Null hypothesis: the estimate is part of the distribution. Alternative hypothesis: the estimate does not fall within the distribution.
- Parameters
estimate – CausalEstimate. The estimate obtained from the estimator for the original data.
simulations – np.array. An array containing the results of the refuter for the simulations.
test_type – string, default ‘auto’. The type of test the user wishes to perform.
significance_level – float, default 0.05. The significance level for the statistical test.
- Returns
significance_dict: Dict
A Dict containing the p_value and a boolean that indicates if the result is statistically significant
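A conceptual sketch of the two-sided version of this test (an illustration only, not DoWhy’s exact implementation, which also handles one-sided and ‘auto’ behaviour):

import numpy as np

def refuter_p_value(estimate_value, simulations):
    # Fraction of simulated (null) estimates at least as extreme as the
    # observed estimate on either side, doubled for a two-sided test.
    sims = np.asarray(simulations)
    left_tail = np.mean(sims <= estimate_value)
    right_tail = np.mean(sims >= estimate_value)
    return min(1.0, 2 * min(left_tail, right_tail))

# The estimate fails the refutation test when this p-value falls below the
# chosen significance_level.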
dowhy.data_transformer module
dowhy.datasets module
Module for generating some sample datasets.
- dowhy.datasets.choice(a, size=None, replace=True, p=None)
Generates a random sample from a given 1-D array
New in version 1.7.0.
Note
New code should use the choice method of a default_rng() instance instead; please see the NumPy random Quick Start.
- Parameters
a – 1-D array-like or int. If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if it were np.arange(a).
size – int or tuple of ints, optional. Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.
replace – boolean, optional. Whether the sample is with or without replacement. Default is True, meaning that a value of a can be selected multiple times.
p – 1-D array-like, optional. The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.
- Returns
samples – single item or ndarray. The generated random samples.
- Raises
ValueError – If a is an int and less than zero, if a or p are not 1-dimensional, if a is an array-like of size 0, if p is not a vector of probabilities, if a and p have different lengths, or if replace=False and the sample size is greater than the population size.
See also: randint, shuffle, permutation, random.Generator.choice (which should be used in new code).
Notes
Setting user-specified probabilities through p uses a more general but less efficient sampler than the default. The general sampler produces a different sample than the optimized sampler even if each element of p is 1 / len(a).
Sampling random rows from a 2-D array is not possible with this function, but is possible with Generator.choice through its axis keyword.
Examples
Generate a uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3)
array([0, 3, 4])  # random
>>> # This is equivalent to np.random.randint(0,5,3)
Generate a non-uniform random sample from np.arange(5) of size 3:
>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0])  # random
Generate a uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False)
array([3, 1, 0])  # random
>>> # This is equivalent to np.random.permutation(np.arange(5))[:3]
Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:
>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0])  # random
Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:
>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'],  # random
      dtype='<U11')
- dowhy.datasets.construct_col_names(name, num_vars, num_discrete_vars, num_discrete_levels, one_hot_encode)[source]
- dowhy.datasets.convert_to_categorical(arr, num_vars, num_discrete_vars, quantiles=[0.25, 0.5, 0.75], one_hot_encode=False)[source]
- dowhy.datasets.create_dot_graph(treatments, outcome, common_causes, instruments, effect_modifiers=[], frontdoor_variables=[])[source]
- dowhy.datasets.create_gml_graph(treatments, outcome, common_causes, instruments, effect_modifiers=[], frontdoor_variables=[])[source]
- dowhy.datasets.linear_dataset(beta, num_common_causes, num_samples, num_instruments=0, num_effect_modifiers=0, num_treatments=1, num_frontdoor_variables=0, treatment_is_binary=True, outcome_is_binary=False, num_discrete_common_causes=0, num_discrete_instruments=0, num_discrete_effect_modifiers=0, one_hot_encode=False)[source]
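For example, a synthetic dataset with a known average treatment effect can be generated as follows (keys such as "df", "treatment_name", "outcome_name" and "gml_graph" index the returned dictionary):

import dowhy.datasets

data = dowhy.datasets.linear_dataset(
    beta=10,                  # true causal effect used to generate the outcome
    num_common_causes=5,
    num_instruments=2,
    num_effect_modifiers=1,
    num_samples=5000,
    treatment_is_binary=True,
)
df = data["df"]                                      # generated pandas DataFrame
treatment, outcome = data["treatment_name"], data["outcome_name"]
graph = data["gml_graph"]                            # graph string for CausalModel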
dowhy.do_sampler module
- class dowhy.do_sampler.DoSampler(data, params=None, variable_types=None, num_cores=1, causal_model=None, keep_original_treatment=False)[source]
Bases: object
Base class for a sampler from the interventional distribution.
Initializes a do sampler with data and names of relevant variables.
Do sampling implements the do() operation from Pearl (2000). This operation is defined on a causal Bayesian network, an explicit implementation of which is the basis for the MCMC sampling method.
We abstract the idea behind the three-step process to allow other methods as well. The disrupt_causes method is the means to make treatment assignment ignorable. In the Pearlian framework, this is where we cut the edges pointing into the causal state. With other methods, this is typically done using an approach that assumes conditional ignorability (e.g., weighting, or explicit conditioning with Robins’ G-formula).
Next, the make_treatment_effective method reflects the assumption that the intervention we impose is “effective”. Most simply, we fix the causal state to some specific value. We skip this step if there is no value specified for the causal state, and the original values are used instead.
Finally, we sample from the resulting distribution. This can be either from a point_sample method, in the case that the inference method doesn’t support batch sampling, or the sample method in the case that it does. For convenience, the point_sample method parallelizes with multiprocessing, using the num_cores kwarg to set the number of cores to use for parallelization.
While different methods will have their own class attributes, the _df attribute should be common to all methods. This is the temporary dataset, which starts as a copy of the original data and is modified to reflect the steps of the do operation. Read through the existing methods (weighting is likely the most minimal) to get an idea of how this works before implementing one yourself.
- Parameters
data – pandas.DataFrame containing the data
identified_estimand – dowhy.causal_identifier.IdentifiedEstimand: an estimand using a backdoor method for effect identification
treatments – list or str: names of the treatment variables
outcomes – list or str: names of the outcome variables
variable_types – dict: A dictionary containing the variables’ names and types. ‘c’ for continuous, ‘o’ for ordered, ‘d’ for discrete, and ‘u’ for unordered discrete.
keep_original_treatment – bool: Whether to use make_treatment_effective, or to keep the original treatment assignments.
params – (optional) additional method parameters
- disrupt_causes()[source]
Override this method to render treatment assignment conditionally ignorable.
- make_treatment_effective(x)[source]
This is most likely the implementation you’d like to use, but some methods may require overriding this method to make the treatment effective.
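The three-step process described above can be summarized with a hypothetical subclass sketch; the method names disrupt_causes, make_treatment_effective and point_sample come from the description above, while the class name and the elided bodies are placeholders:

from dowhy.do_sampler import DoSampler

class SketchSampler(DoSampler):  # hypothetical subclass for illustration
    def disrupt_causes(self):
        # Step 1: make treatment assignment ignorable on the temporary
        # dataset (e.g., by weighting or explicit conditioning).
        ...

    def make_treatment_effective(self, x):
        # Step 2: fix the treatment to the intervened value x (skipped
        # when keep_original_treatment=True).
        ...

    def point_sample(self):
        # Step 3: draw one sample from the resulting interventional
        # distribution (used when batch sampling is not supported).
        ...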
dowhy.interpreter module
- class dowhy.interpreter.Interpreter(instance, **kwargs)[source]
Bases: object
Base class for all interpretation methods.
Initialize an interpreter.
- Parameters
instance – An object of type CausalModel, CausalEstimate or CausalRefutation.
- SUPPORTED_ESTIMATORS = []
- SUPPORTED_MODELS = []
- SUPPORTED_REFUTERS = []