dowhy.api package

Submodules

dowhy.api.causal_data_frame module

class dowhy.api.causal_data_frame.CausalAccessor(pandas_obj)[source]

Bases: object

An accessor for the pandas.DataFrame under the causal namespace.

Parameters

pandas_obj

convert_to_custom_type(input_type)[source]

This function converts a DataFrame type to a custom type used within dowhy. We make use of the following mapping:

  • int -> ‘c’

  • float -> ‘c’

  • binary -> ‘b’

  • category -> ‘d’

Currently we have not added support for time.

Parameters
  • input_type – str: The datatype of a column within a DataFrame
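The documented mapping can be sketched as a small lookup. This is an illustration only: the helper name `to_custom_type` and the substring matching on type names are assumptions, not dowhy's implementation.

```python
def to_custom_type(input_type: str) -> str:
    """Map a column type name to a one-letter type code (hypothetical helper)."""
    # Assumption: types are matched by substring, so "int64" counts as int.
    mapping = [
        ("int", "c"),
        ("float", "c"),
        ("binary", "b"),
        ("category", "d"),
    ]
    for name, code in mapping:
        if name in input_type:
            return code
    # Time types (and anything else) are outside the documented mapping.
    raise ValueError(f"unsupported column type: {input_type!r}")


codes = [to_custom_type(t) for t in ("int64", "float32", "binary", "category")]
```

Raising on unknown types is a design choice of this sketch; the source does not say how unsupported types are handled.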

do(x, method='weighting', num_cores=1, variable_types={}, outcome=None, params=None, dot_graph=None, common_causes=None, estimand_type='nonparametric-ate', proceed_when_unidentifiable=False, stateful=False)[source]

The do-operation implemented with sampling. This will return a pandas.DataFrame with the outcome variable(s) replaced with samples from P(Y|do(X=x)).
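To make the idea of sampling from P(Y|do(X=x)) concrete, here is a minimal sketch of a weighting approach using only the standard library: each row is weighted by the inverse of its estimated treatment probability given the control variables, and rows are then resampled with those weights. The function names, the counting-based probability estimate, and the toy data are all assumptions made for illustration; this is not dowhy's sampler.

```python
import random
from collections import Counter


def ipw_weights(rows, treatment, confounders):
    """Return 1 / P(treatment | confounders) per row, estimated by counting."""
    def stratum(row):
        return tuple(row[c] for c in confounders)

    joint = Counter((stratum(r), r[treatment]) for r in rows)
    marginal = Counter(stratum(r) for r in rows)
    return [marginal[stratum(r)] / joint[(stratum(r), r[treatment])] for r in rows]


def do_sample(rows, treatment, confounders, n=None, seed=0):
    """Resample rows with inverse-probability weights (with replacement)."""
    rng = random.Random(seed)
    weights = ipw_weights(rows, treatment, confounders)
    return rng.choices(rows, weights=weights, k=n or len(rows))


# Toy data: w confounds x and y.
rows = [
    {"w": 0, "x": 0, "y": 1.0},
    {"w": 0, "x": 0, "y": 1.2},
    {"w": 0, "x": 0, "y": 0.9},
    {"w": 0, "x": 1, "y": 2.1},
    {"w": 1, "x": 0, "y": 3.0},
    {"w": 1, "x": 1, "y": 4.2},
    {"w": 1, "x": 1, "y": 3.9},
    {"w": 1, "x": 1, "y": 4.0},
]
weights = ipw_weights(rows, "x", ["w"])
sample = do_sample(rows, "x", ["w"], n=1000)
```

Rows whose treatment value is rare within their stratum receive larger weights, so the resampled data behaves as if treatment were assigned independently of `w`.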

If x is passed as a string or list (variable names only, with no values), the original values of x are kept in the DataFrame, and Y is sampled from its respective P(Y|do(x)). If x is passed as a dict, where keys are variable names and values are the interventional values, the new DataFrame will contain the specified values of x.
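The accepted forms of x can be illustrated with a small normalizer. The helper `split_x` below is hypothetical and only mirrors the behaviour described above; it is not the library's parse_x.

```python
def split_x(x):
    """Return (variable_names, fixed_values), where fixed_values is None
    when x names variables without assigning them values."""
    if isinstance(x, str):
        return [x], None          # one variable, original values kept
    if isinstance(x, list):
        return list(x), None      # several variables, original values kept
    if isinstance(x, dict):
        return list(x), dict(x)   # variables with specified values
    raise TypeError(f"x must be a str, list, or dict, got {type(x).__name__}")


by_name = split_x("treatment")
by_list = split_x(["t1", "t2"])
by_dict = split_x({"treatment": 1})
```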

For some methods, the variable_types field must be specified. It should be a dict, where the keys are variable names, and values are ‘o’ for ordered discrete, ‘u’ for un-ordered discrete, ‘d’ for discrete, or ‘c’ for continuous.

Inference requires a set of control variables. These can be provided explicitly with common_causes, a list of variable names to control for. Alternatively, they can be provided implicitly by specifying a causal graph with dot_graph, from which they will be chosen using the default identification method.
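As a rough picture of how a graph string relates to an explicit list of control variables, the sketch below reads edges from a DOT-style string and returns the direct parents shared by the treatment and the outcome. This is a deliberate simplification and an assumption of this example: it is not the identification method dowhy applies, and the regex handles only plain `a -> b` edges.

```python
import re


def shared_parents(dot_graph, treatment, outcome):
    """Return nodes with a direct edge into both treatment and outcome."""
    edges = re.findall(r"(\w+)\s*->\s*(\w+)", dot_graph)

    def parents(node):
        return {src for src, dst in edges if dst == node}

    return sorted(parents(treatment) & parents(outcome))


dot_graph = "digraph { w -> x; w -> y; x -> y; z -> x; }"
common_causes = shared_parents(dot_graph, "x", "y")
```

Here `w` points into both `x` and `y`, while `z` only affects `x`, so only `w` is returned.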

When the set of control variables can’t be identified with the provided assumptions, the user is prompted to confirm whether to proceed. To override the prompt automatically, set the proceed_when_unidentifiable flag to True.

Some methods build expensive components during inference. To retain those components for later inference (e.g. successive calls to do with different values of x), set the stateful flag to True. Be cautious when using the do operation statefully: state is stored on the namespace rather than on the method, so it can behave unpredictably. To reset the namespace and run statelessly again, call the reset method.
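The caution about shared state can be illustrated with a toy class. `ToyNamespace` is entirely hypothetical and says nothing about dowhy's internals; it only shows how a component cached on a shared object can be silently reused by a later call.

```python
class ToyNamespace:
    """Toy stand-in for a namespace that caches an expensive component."""

    def __init__(self):
        self._sampler = None   # cached component, shared by every call
        self.build_count = 0   # how many times the expensive build ran

    def reset(self):
        self._sampler = None

    def do(self, x, stateful=False):
        if not stateful:
            self.reset()       # stateless calls start from scratch
        if self._sampler is None:
            self.build_count += 1
            self._sampler = {"built_for": x}  # pretend this is expensive
        built_for = self._sampler["built_for"]
        if not stateful:
            self.reset()       # and leave nothing behind
        return built_for


ns = ToyNamespace()
first = ns.do("a", stateful=True)
second = ns.do("b", stateful=True)  # reuses the component built for "a"
ns.reset()
third = ns.do("b")                  # rebuilt for "b"
```

In this toy, the second stateful call returns a result tied to the first call's input, which is the kind of surprise a reset avoids.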

Parameters
  • x – str, list, dict: The causal state on which to intervene, and (optional) its interventional value(s).

  • method – The inference method to use with the sampler. Currently, ‘mcmc’, ‘weighting’, and ‘kernel_density’ are supported. The mcmc sampler requires pymc3>=3.7.

  • num_cores – int: If the inference method only supports sampling one point at a time, this will parallelize sampling.

  • variable_types – dict: The dictionary containing the variable types. Must contain the union of the causal state, control variables, and the outcome.

  • outcome – str: The outcome variable.

  • params – dict: extra parameters to set as attributes on the sampler object

  • dot_graph – str: A string specifying the causal graph.

  • common_causes – list: A list of strings containing the variable names to control for.

  • estimand_type – str: ‘nonparametric-ate’ is the only one currently supported. Others may be added later, to allow for specific, parametric estimands.

  • proceed_when_unidentifiable – bool: A flag to override user prompts to proceed when effects aren’t identifiable with the assumptions provided.

  • stateful – bool: Whether to retain state. By default, the do operation is stateless.

Returns

pandas.DataFrame: A DataFrame containing the sampled outcome
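The requirement that variable_types cover the causal state, the control variables, and the outcome can be checked before calling do. The helper `missing_variable_types` and the column names below are hypothetical, chosen only to show the shape of the arguments.

```python
def missing_variable_types(x, common_causes, outcome, variable_types):
    """List variables that need an entry in variable_types but lack one."""
    names = [x] if isinstance(x, str) else list(x)  # dict -> its keys
    required = set(names) | set(common_causes) | {outcome}
    return sorted(required - set(variable_types))


# Hypothetical arguments in the shapes described in the parameter list.
call_args = {
    "x": {"treatment": 1},
    "method": "weighting",
    "outcome": "y",
    "common_causes": ["age"],
    "variable_types": {"treatment": "d", "y": "c"},
}
missing = missing_variable_types(
    call_args["x"],
    call_args["common_causes"],
    call_args["outcome"],
    call_args["variable_types"],
)
```

In this example the control variable `age` has no type entry, so it is reported as missing.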

parse_x(x)[source]

reset()[source]

If a causal namespace method (especially do) was run statefully, this resets the namespace.

Returns

Module contents