dowhy.causal_refuters package
Submodules
dowhy.causal_refuters.add_unobserved_common_cause module
- class dowhy.causal_refuters.add_unobserved_common_cause.AddUnobservedCommonCause(*args, **kwargs)[source]
- Bases: CausalRefuter
- Add an unobserved confounder for refutation.
- Supports additional parameters that can be specified in the refute_estimate() method.
- ‘confounders_effect_on_treatment’: how the simulated confounder affects the value of treatment. This can be linear (for continuous treatment) or binary_flip (for binary treatment).
- ‘confounders_effect_on_outcome’: how the simulated confounder affects the value of outcome. This can be linear (for continuous outcome) or binary_flip (for binary outcome).
- ‘effect_strength_on_treatment’: parameter for the strength of the effect of the simulated confounder on treatment. For a linear effect, it is the regression coefficient. For binary_flip, it is the probability that the simulated confounder’s effect flips the value of treatment from 0 to 1 (or vice versa).
- ‘effect_strength_on_outcome’: parameter for the strength of the effect of the simulated confounder on outcome. For a linear effect, it is the regression coefficient. For binary_flip, it is the probability that the simulated confounder’s effect flips the value of outcome from 0 to 1 (or vice versa).
 - TODO: Needs a scaled version of the parameters and an interpretation module (e.g., in comparison to the biggest effect of a known confounder).
- Initialize the parameters required for the refuter.
- Parameters
- effect_on_t – str : This is used to represent the type of effect on the treatment due to the unobserved confounder. 
- effect_on_y – str : This is used to represent the type of effect on the outcome due to the unobserved confounder. 
- kappa_t – float, numpy.ndarray: This refers to the strength of the confounder on treatment. For a linear effect, it behaves like the regression coefficient. For a binary flip, it is the probability with which it can invert the value of the treatment.
- kappa_y – float, numpy.ndarray: This refers to the strength of the confounder on outcome. For a linear effect, it behaves like the regression coefficient. For a binary flip, it is the probability with which it can invert the value of the outcome.
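As a rough illustration of how these options might be supplied, the sketch below collects them into a dictionary of keyword arguments. The four option names are the ones listed above. The `method_name` value mirrors the module name and is an assumption about how the refuter is selected. `model`, `identified_estimand` and `estimate` are hypothetical objects from an earlier analysis, and the numeric strengths are arbitrary. The call itself is left as a comment so the snippet runs without DoWhy installed.

```python
# Keyword arguments for the unobserved-common-cause refuter.
# Keys are the parameter names documented above; the numeric
# strengths are arbitrary values chosen only for illustration.
refuter_kwargs = {
    "method_name": "add_unobserved_common_cause",       # assumed selector
    "confounders_effect_on_treatment": "binary_flip",   # binary treatment
    "confounders_effect_on_outcome": "linear",          # continuous outcome
    "effect_strength_on_treatment": 0.01,  # probability of flipping treatment
    "effect_strength_on_outcome": 0.02,    # regression coefficient on outcome
}

# Assumed usage, given hypothetical objects from an earlier analysis:
# refutation = model.refute_estimate(identified_estimand, estimate, **refuter_kwargs)
# print(refutation)
```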
 
 - include_confounders_effect(new_data, kappa_t, kappa_y)[source]
- This function deals with the change in the value of the data due to the effect of the unobserved confounder. In the case of a binary flip, we flip only if the random number is greater than the threshold set. In the case of a linear effect, we use the variable as the linear regression coefficient.
- Parameters
- new_data – pandas.DataFrame: The data to be changed due to the effects of the unobserved confounder. 
- kappa_t – numpy.float64: The value of the threshold for binary_flip or the value of the regression coefficient for linear effect. 
- kappa_y – numpy.float64: The value of the threshold for binary_flip or the value of the regression coefficient for linear effect. 
 
- Returns
- pandas.DataFrame: The DataFrame that includes the effects of the unobserved confounder. 
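The two effect types can be sketched in plain Python as follows. This is not the library's implementation: the row format, the function name `include_confounder_effect` and the standard-normal confounder are all assumptions made for illustration. The sketch flips the treatment with probability `kappa_t`, following the parameter description, so the direction of the comparison against the random number is also an assumption.

```python
import random


def include_confounder_effect(rows, kappa_t, kappa_y, seed=0):
    """Return new rows with a simulated confounder applied.

    rows: list of dicts with a binary 'treatment' (0/1) and a numeric 'outcome'.
    kappa_t: probability of flipping the binary treatment (binary_flip).
    kappa_y: coefficient of the confounder on the outcome (linear).
    """
    rng = random.Random(seed)
    new_rows = []
    for row in rows:
        w = rng.gauss(0.0, 1.0)  # simulated unobserved confounder (assumed N(0, 1))
        treatment = row["treatment"]
        # binary_flip: invert the treatment with probability kappa_t,
        # modelled here with an independent uniform draw.
        if rng.random() < kappa_t:
            treatment = 1 - treatment
        # linear: add kappa_y times the confounder to the outcome.
        outcome = row["outcome"] + kappa_y * w
        new_rows.append({"treatment": treatment, "outcome": outcome})
    return new_rows


data = [{"treatment": i % 2, "outcome": float(i)} for i in range(1000)]
unchanged = include_confounder_effect(data, kappa_t=0.0, kappa_y=0.0)
all_flipped = include_confounder_effect(data, kappa_t=1.0, kappa_y=0.0)
```

With both strengths at zero the data comes back unchanged, and with `kappa_t=1.0` every treatment value is inverted, which gives two easy sanity checks for any implementation of this step.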
 
 - refute_estimate()[source]
- This function attempts to add an unobserved common cause to the outcome and the treatment. At present, the behavior is implemented for one-dimensional continuous and binary variables. This function can take either single-valued inputs or a range of inputs. It inspects the data type of the input and then decides on the course of action.
- Returns
- CausalRefuter: An object that contains the estimated effect, the new effect, and the name of the refutation used.
 
 
dowhy.causal_refuters.bootstrap_refuter module
- class dowhy.causal_refuters.bootstrap_refuter.BootstrapRefuter(*args, **kwargs)[source]
- Bases: CausalRefuter
- Refute an estimate by running it on a random sample of the data containing measurement error in the confounders. This allows us to assess the ability of the estimator to find the effect of the treatment on the outcome.
- It supports additional parameters that can be specified in the refute_estimate() method.
- Parameters
- num_simulations (int, optional) – The number of simulations to be run, CausalRefuter.DEFAULT_NUM_SIMULATIONS by default.
- sample_size (int, optional) – The size of each bootstrap sample, which is the size of the original data by default.
- required_variables (int, list, bool, optional) – The list of variables to be used as the input for y~f(W). This is True by default, which in turn selects all variables except the treatment and the outcome.
 
 - An integer argument refers to how many variables will be used for estimating the value of the outcome.
- A list explicitly refers to which variables will be used to estimate the outcome. Furthermore, it gives the ability to explicitly select or deselect the covariates present in the estimation of the outcome. This is done by either adding or explicitly removing variables from the list as shown below:
 - Note
- We need to pass required_variables = [W0,W1] if we want W0 and W1.
- We need to pass required_variables = [-W0,-W1] if we want all variables excluding W0 and W1.
 - If the value is True, we wish to include all variables to estimate the value of the outcome.
 - Warning - A False value is INVALID and will result in an error.
- Parameters
- noise (float, optional) – The standard deviation of the noise to be added to the data, which is BootstrapRefuter.DEFAULT_STD_DEV by default.
- probability_of_change (float, optional) – The probability with which we change the data for a boolean or categorical variable. It is noise by default, only if the value of noise is less than 1.
- random_state (int, RandomState, optional) – The seed value to use if we wish to repeat the same random behavior. For this purpose, we reuse the same seed in the pseudo-random generator.
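The idea behind this refuter can be sketched without the library: resample rows with replacement, add Gaussian noise to the confounder, and re-run the estimator each time. Everything below is an illustrative assumption rather than DoWhy's code, including the row format, the helper names and the partial-regression estimator used to adjust for a single confounder `w`.

```python
import random
import statistics


def slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x (with intercept)."""
    b = slope(x, y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def adjusted_effect(rows):
    """Coefficient of t in y ~ t + w, obtained by partial regression."""
    w = [r["w"] for r in rows]
    t = [r["t"] for r in rows]
    y = [r["y"] for r in rows]
    return slope(residuals(w, t), residuals(w, y))


def bootstrap_with_noise(rows, estimator, num_simulations=100,
                         sample_size=None, noise=0.1, seed=0):
    """Re-estimate on bootstrap samples whose confounder carries noise."""
    rng = random.Random(seed)
    sample_size = sample_size or len(rows)  # default: size of original data
    estimates = []
    for _ in range(num_simulations):
        sample = rng.choices(rows, k=sample_size)  # draw with replacement
        noisy = [{"t": r["t"], "y": r["y"], "w": r["w"] + rng.gauss(0.0, noise)}
                 for r in sample]
        estimates.append(estimator(noisy))
    return estimates


# Synthetic data with a true effect of 2.0 and one confounder w.
data_rng = random.Random(1)
rows = []
for _ in range(1000):
    w = data_rng.gauss(0.0, 1.0)
    t = 1 if w + data_rng.gauss(0.0, 1.0) > 0 else 0
    rows.append({"w": w, "t": t, "y": 2.0 * t + w + data_rng.gauss(0.0, 0.1)})

estimates = bootstrap_with_noise(rows, adjusted_effect, num_simulations=20)
```

If the estimator is robust to small measurement error, the bootstrap estimates should stay close to the original estimate.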
 
 - DEFAULT_NUMBER_OF_TRIALS = 1
 - DEFAULT_STD_DEV = 0.1
 - DEFAULT_SUCCESS_PROBABILITY = 0.5
 
dowhy.causal_refuters.data_subset_refuter module
- class dowhy.causal_refuters.data_subset_refuter.DataSubsetRefuter(*args, **kwargs)[source]
- Bases: CausalRefuter
- Refute an estimate by rerunning it on a random subset of the original data.
- Supports additional parameters that can be specified in the refute_estimate() method.
- Parameters
- subset_fraction (float, optional) – Fraction of the data to be used for re-estimation, which is DataSubsetRefuter.DEFAULT_SUBSET_FRACTION by default.
- num_simulations (int, optional) – The number of simulations to be run, which is CausalRefuter.DEFAULT_NUM_SIMULATIONS by default.
- random_state (int, RandomState, optional) – The seed value to use if we wish to repeat the same random behavior. To repeat the same behavior, we pass the same seed to the pseudo-random generator.
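A minimal sketch of the subset check, assuming a simple difference-in-means estimator on randomized data. The function names and row format are hypothetical and this is not the library's code. Each simulation draws a random subset without replacement and re-runs the estimator.

```python
import random
import statistics


def difference_in_means(rows):
    """Naive effect estimate: mean outcome of treated minus control."""
    treated = [r["y"] for r in rows if r["t"] == 1]
    control = [r["y"] for r in rows if r["t"] == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


def subset_refute(rows, estimator, subset_fraction=0.8,
                  num_simulations=100, seed=0):
    """Re-run the estimator on random subsets drawn without replacement."""
    rng = random.Random(seed)
    k = int(len(rows) * subset_fraction)
    return [estimator(rng.sample(rows, k)) for _ in range(num_simulations)]


# Synthetic randomized data with a true effect of 3.0.
data_rng = random.Random(2)
rows = [{"t": i % 2, "y": 3.0 * (i % 2) + data_rng.gauss(0.0, 1.0)}
        for i in range(1000)]

original = difference_in_means(rows)
new_estimates = subset_refute(rows, difference_in_means, num_simulations=50)
```

A stable estimate should change little across subsets, so the mean of `new_estimates` is expected to sit close to `original`.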
 
 - DEFAULT_SUBSET_FRACTION = 0.8
 
dowhy.causal_refuters.dummy_outcome_refuter module
- class dowhy.causal_refuters.dummy_outcome_refuter.DummyOutcomeRefuter(*args, **kwargs)[source]
- Bases: CausalRefuter
- Refute an estimate by replacing the outcome with a randomly generated variable.
- This allows us to have a well-defined effect of the treatment on the outcome.
- We find f(W) for a fixed value of t.
- For all the other values of t, we obtain the value of the dummy outcome as: y_dummy = h(t) + f(W)
 - If we originally started out with
       W
      / \
     t --->y
 - On estimating, y_dummy = f(W)
       W
      / \
     t --|->y
 - This ensures that we try to capture as much of W--->Y as possible.
 - On adding h(t)
       W
      / \
     t --->y
       h(t)
- Supports additional parameters that can be specified in the refute_estimate() method.
- Parameters
- num_simulations (int, optional) – The number of simulations to be run, which defaults to CausalRefuter.DEFAULT_NUM_SIMULATIONS.
- transformation_list (list, optional) – A list of actions to be performed to obtain the outcome, which defaults to DummyOutcomeRefuter.DEFAULT_TRANSFORMATION. The default transformation is as follows: [("zero",""),("noise", {'std_dev':1} )]
 
 - Each of the actions within a transformation is one of the following types:
- function argument: function pd.DataFrame -> np.ndarray - It takes in a function that takes the input data frame as the input and outputs the outcome variable. This allows us to create an output variable that only depends on the covariates and does not depend on the treatment variable.
- string argument - Currently it supports some common estimators like:
- Linear Regression
- K Nearest Neighbours 
- Support Vector Machine 
- Neural Network 
- Random Forest 
 
- Or functions such as:
- Permute: This permutes the rows of the outcome, disassociating any effect of the treatment on the outcome.
- Noise: This adds white noise to the outcome, reducing any causal relationship with the treatment.
- Zero: It replaces all the values in the outcome with zero.
 
 - Examples:
- The transformation_list is of the following form:
 - If the function pd.DataFrame -> np.ndarray is already defined:
   [(func,func_params),('permute',{'permute_fraction':val}),('noise',{'std_dev':val})]
   Every function should support a minimum of two arguments, X_train and outcome_train, which correspond to the training data and the outcome that we want to predict. Additional parameters, such as the learning rate or the momentum constant, can be set with the help of func_args.
   [(neural_network,{'alpha': 0.0001, 'beta': 0.9}),('permute',{'permute_fraction': 0.2}),('noise',{'std_dev': 0.1})]
   The neural network is invoked as neural_network(X_train, outcome_train, **args).
- If a function from the above list is used:
   [('knn',{'n_neighbors':5}), ('permute', {'permute_fraction': val} ), ('noise', {'std_dev': val} )]
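To make the chaining concrete, here is a small sketch that applies a transformation list step by step, feeding each action's output into the next. It covers only the 'zero' and 'noise' string actions; estimator actions and custom functions are left out. The function `apply_transformations` and its dispatch table are assumptions for illustration, not the library's internals.

```python
import random
import statistics


def apply_transformations(outcome, transformation_list, seed=0):
    """Apply each (action, params) pair in order to the outcome values."""
    rng = random.Random(seed)

    def zero(y):
        # Replace every value in the outcome with zero.
        return [0.0] * len(y)

    def noise(y, std_dev=1.0):
        # Add white noise with mean 0 and the given standard deviation.
        return [v + rng.gauss(0.0, std_dev) for v in y]

    actions = {"zero": zero, "noise": noise}
    result = list(outcome)
    for name, params in transformation_list:
        # The default list passes "" when an action takes no parameters.
        kwargs = params if isinstance(params, dict) else {}
        result = actions[name](result, **kwargs)
    return result


DEFAULT_TRANSFORMATION = [("zero", ""), ("noise", {"std_dev": 1})]
dummy = apply_transformations([5.0] * 2000, DEFAULT_TRANSFORMATION)
dummy_mean = statistics.fmean(dummy)
dummy_std = statistics.pstdev(dummy)
```

With the default list, the original outcome is discarded and replaced by pure noise, so the dummy outcome has no dependence on the treatment.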
 
 - Parameters
- true_causal_effect (function) – A function that is used to get the true causal effect for the modelled dummy outcome. It defaults to DummyOutcomeRefuter.DEFAULT_TRUE_CAUSAL_EFFECT, which means that there is no relationship between the treatment and outcome in the dummy data.
 - The equation for the dummy outcome is given by y_hat = h(t) + f(W), where
- y_hat is the dummy outcome
- h(t) is the function that gives the true causal effect
- f(W) is the best estimate of y obtained keeping t constant. This ensures that the variation in output of function f(W) is not caused by t.
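The equation y_hat = h(t) + f(W) can be illustrated with a toy example. This is a sketch only: `f` and `h` below are made-up functions, and the code works element by element for readability, whereas the refuter expects functions that operate on whole arrays. The default `h` is a zero function, assumed here to match the "no relationship" default described above.

```python
def dummy_outcome(treatment, covariates, f, h=lambda t: 0.0):
    """Compute y_hat = h(t) + f(W) for each row."""
    return [h(t) + f(w) for t, w in zip(treatment, covariates)]


def f(w):
    # Hypothetical covariate model: outcome explained by W alone.
    return 2.0 * w


treatment = [0, 1, 0, 1]
covariates = [1.0, 1.0, 2.0, 2.0]

# Default h(t) = 0: the dummy outcome ignores the treatment entirely.
no_effect = dummy_outcome(treatment, covariates, f)

# h(t) = 3t: a known true effect of 3 that a sound estimator should recover.
known_effect = dummy_outcome(treatment, covariates, f, h=lambda t: 3.0 * t)
```

Rows that share the same W differ only by h(t), which is what makes the true effect of the dummy outcome known in advance.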
 - Note - The true causal effect should take an input of the same shape as the treatment, and the output should match the shape of the outcome.
- Parameters
- required_variables (int, list, bool, optional) – The list of variables to be used as the input for y~f(W). This is True by default, which in turn selects all variables except the treatment and the outcome.
 - An integer argument refers to how many variables will be used for estimating the value of the outcome.
- A list explicitly refers to which variables will be used to estimate the outcome. Furthermore, it gives the ability to explicitly select or deselect the covariates present in the estimation of the outcome. This is done by either adding or explicitly removing variables from the list as shown below:
 - Note
- We need to pass required_variables = [W0,W1] if we want W0 and W1.
- We need to pass required_variables = [-W0,-W1] if we want all variables excluding W0 and W1.
 - If the value is True, we wish to include all variables to estimate the value of the outcome.
 - Warning - A False value is INVALID and will result in an error.
 - Note - These inputs are fed to the function for estimating the outcome variable. The same set of required_variables is used for each instance of an internal estimation function.
- Parameters
- bucket_size_scale_factor – For continuous data, the scale factor helps us scale the size of the bucket used on the data. The default scale factor is DummyOutcomeRefuter.DEFAULT_BUCKET_SCALE_FACTOR.
- min_data_point_threshold (int, optional) – The minimum number of data points for an estimator to run. This defaults to DummyOutcomeRefuter.MIN_DATA_POINT_THRESHOLD. If there are too few data points for a certain category, we make use of the DummyOutcomeRefuter.DEFAULT_TRANSFORMATION for generating the dummy outcome.
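The selection rules for required_variables described above can be sketched as a small helper. This is an illustration under stated assumptions, not the library's logic: variable names are taken to be strings such as "W0" and "-W0", an integer n is taken to mean the first n variables, and mixing included and excluded names is rejected.

```python
def choose_variables(required_variables, all_variables):
    """Resolve required_variables against the available covariates."""
    # bool must be checked before int, because bool is a subclass of int.
    if required_variables is False:
        raise ValueError("A False value for required_variables is invalid")
    if required_variables is True:
        return list(all_variables)
    if isinstance(required_variables, int):
        # Assumption: take the first n variables.
        return list(all_variables)[:required_variables]

    removed = [v[1:] for v in required_variables if v.startswith("-")]
    kept = [v for v in required_variables if not v.startswith("-")]
    if removed and kept:
        # Assumption: a list must either select or deselect, not both.
        raise ValueError("Mix of selected and deselected variables")
    if removed:
        return [v for v in all_variables if v not in removed]
    return kept


covariates = ["W0", "W1", "W2", "W3"]
selected = choose_variables(["W0", "W1"], covariates)
excluded = choose_variables(["-W0", "-W1"], covariates)
everything = choose_variables(True, covariates)
first_two = choose_variables(2, covariates)
```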
 
 - DEFAULT_BUCKET_SCALE_FACTOR = 0.5
 - DEFAULT_STD_DEV = 0.1
 - DEFAULT_TEST_FRACTION = [TestFraction(base=0.5, other=0.5)]
 - DEFAULT_TRANSFORMATION = [('zero', ''), ('noise', {'std_dev': 1})]
 - DEFAULT_TRUE_CAUSAL_EFFECT()
 - MIN_DATA_POINT_THRESHOLD = 30
 - SUPPORTED_ESTIMATORS = ['linear_regression', 'knn', 'svm', 'random_forest', 'neural_network']
 - noise(outcome, std_dev)[source]
- Add white noise with mean 0 and standard deviation = std_dev.
- Parameters
- outcome – np.ndarray: The outcome variable, to which the white noise is added.
- std_dev – float: The standard deviation of the white noise.
- Returns
- Input with added noise.
 
 - permute(outcome, permute_fraction)[source]
- If the permute_fraction is 1, we permute all the values in the outcome. Otherwise we make use of the Fisher Yates shuffle. Refer to https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle for more details.
- Parameters
- outcome – np.ndarray: The outcome variable to be permuted.
- permute_fraction – float [0, 1]: The fraction of rows permuted.
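One way to read the partial permutation is as a Fisher-Yates pass in which each position is swapped only with some probability. The sketch below follows that reading and is an assumption about the behavior, not a copy of the library's code. In particular, halving the swap probability (because each swap moves two rows) is a design choice of this sketch.

```python
import random


def permute(outcome, permute_fraction, seed=0):
    """Return a copy of outcome with roughly permute_fraction of rows moved."""
    rng = random.Random(seed)
    values = list(outcome)
    n = len(values)
    if permute_fraction >= 1:
        rng.shuffle(values)  # permute all values
        return values
    # Each swap changes two rows, so select positions with half the probability.
    swap_probability = permute_fraction / 2
    for i in range(n - 1):
        if rng.random() < swap_probability:
            # Fisher-Yates step: swap position i with a later position.
            j = rng.randrange(i + 1, n)
            values[i], values[j] = values[j], values[i]
    return values


original = list(range(100))
untouched = permute(original, 0.0)
partially = permute(original, 0.2)
fully = permute(original, 1.0)
```

Whatever the fraction, the result contains exactly the original values, only in a different order.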
 - preprocess_data_by_treatment()[source]
- This function groups data based on the data type of the treatment.
- Expected variable types supported for the treatment:
- bool
- pd.categorical 
- float 
- int 
 - Returns
- pandas.core.groupby.generic.DataFrameGroupBy
 
 - process_data(X_train, outcome_train, X_validation, outcome_validation, transformation_list)[source]
- We process the data by first training the estimators in the transformation_list on X_train and outcome_train. We then apply the estimators on X_validation to get the value of the dummy outcome, which we store in outcome_validation.
- Parameters
- X_train (np.ndarray) – The data of the covariates which is used to train an estimator. It corresponds to the data of a single category of the treatment 
- outcome_train (np.ndarray) – This is used to hold the intermediate values of the outcome variable in the transformation list 
 
 - For example: [ ('permute', {'permute_fraction': val} ), (func,func_params)]
- The value obtained from permutation is used as an input for the custom estimator.
- Parameters
- X_validation (np.ndarray) – The data of the covariates that is fed to a trained estimator to generate a dummy outcome 
- outcome_validation (np.ndarray) – This variable stores the dummy_outcome generated by the transformations 
- transformation_list (np.ndarray) – The list of transformations on the outcome data required to produce a dummy outcome 
 
 
 
dowhy.causal_refuters.placebo_treatment_refuter module
- class dowhy.causal_refuters.placebo_treatment_refuter.PlaceboTreatmentRefuter(*args, **kwargs)[source]
- Bases: CausalRefuter
- Refute an estimate by replacing treatment with a randomly generated placebo variable.
- Supports additional parameters that can be specified in the refute_estimate() method.
- Parameters
- placebo_type (str, optional) – Default is to generate random values for the treatment. If placebo_type is “permute”, then the original treatment values are permuted by row.
- num_simulations (int, optional) – The number of simulations to be run, which is CausalRefuter.DEFAULT_NUM_SIMULATIONS by default.
- random_state (int, RandomState, optional) – The seed value to use if we wish to repeat the same random behavior. To repeat the same behavior, we pass the same seed to the pseudo-random generator.
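The placebo idea can be sketched as follows: replace the treatment with a variable that cannot have caused the outcome and check that the estimated effect drops to about zero. The helper names, the difference-in-means estimator, and the Bernoulli(0.5) and standard-normal draws used for random placebos are assumptions for illustration, not the library's implementation.

```python
import random
import statistics


def placebo_treatment(treatment, placebo_type=None, seed=0):
    """Return a placebo column to use in place of the real treatment."""
    rng = random.Random(seed)
    if placebo_type == "permute":
        placebo = list(treatment)
        rng.shuffle(placebo)  # permute the original values by row
        return placebo
    # Otherwise generate random values (distributions assumed for this sketch).
    if all(t in (0, 1) for t in treatment):
        return [1 if rng.random() < 0.5 else 0 for _ in treatment]
    return [rng.gauss(0.0, 1.0) for _ in treatment]


def difference_in_means(treatment, outcome):
    """Mean outcome of treated rows minus mean outcome of control rows."""
    treated = [y for t, y in zip(treatment, outcome) if t == 1]
    control = [y for t, y in zip(treatment, outcome) if t == 0]
    return statistics.fmean(treated) - statistics.fmean(control)


# Synthetic randomized data with a true effect of 3.0.
data_rng = random.Random(3)
treatment = [i % 2 for i in range(2000)]
outcome = [3.0 * t + data_rng.gauss(0.0, 1.0) for t in treatment]

real_effect = difference_in_means(treatment, outcome)
placebo = placebo_treatment(treatment, placebo_type="permute")
placebo_effect = difference_in_means(placebo, outcome)
```

An estimate that stays far from zero under the placebo would suggest that the estimator is picking up something other than the treatment.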
 
 - DEFAULT_MEAN_OF_NORMAL = 0
 - DEFAULT_NUMBER_OF_TRIALS = 1
 - DEFAULT_PROBABILITY_OF_BINOMIAL = 0.5
 - DEFAULT_STD_DEV_OF_NORMAL = 0
 
dowhy.causal_refuters.random_common_cause module
- class dowhy.causal_refuters.random_common_cause.RandomCommonCause(*args, **kwargs)[source]
- Bases: CausalRefuter
- Refute an estimate by introducing a randomly generated confounder (that may have been unobserved).