dowhy.api package
Submodules
dowhy.api.causal_data_frame module
- class dowhy.api.causal_data_frame.CausalAccessor(pandas_obj)
Bases: object
An accessor for the pandas.DataFrame under the causal namespace.
- Parameters
pandas_obj – pandas.DataFrame: The DataFrame that the accessor is attached to.
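pandas accessors of this kind are created by registering a class with `pd.api.extensions.register_dataframe_accessor`. pandas then passes the DataFrame to the class as `pandas_obj` and exposes the instance as an attribute of every DataFrame. A minimal sketch of that pattern follows. The namespace `causal_sketch` and the method `columns_with_types` are hypothetical and are not part of dowhy. The name was chosen so the sketch does not collide with the real `causal` namespace.

```python
import pandas as pd


# Hypothetical namespace "causal_sketch", chosen to avoid clashing with the
# "causal" namespace documented above.
@pd.api.extensions.register_dataframe_accessor("causal_sketch")
class CausalSketchAccessor:
    """Minimal accessor: pandas passes the DataFrame to __init__."""

    def __init__(self, pandas_obj: pd.DataFrame):
        # Keep a reference to the DataFrame the accessor was reached from.
        self._obj = pandas_obj

    def columns_with_types(self) -> dict:
        # Illustrative method only; not part of dowhy's API.
        return {col: str(dtype) for col, dtype in self._obj.dtypes.items()}


df = pd.DataFrame({"x": [0, 1, 1], "y": [1.0, 2.5, 3.0]})
types = df.causal_sketch.columns_with_types()
print(types)
```

Methods such as `do` are reached the same way, through the registered namespace on the DataFrame instance.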
- convert_to_custom_type(input_type)
This function converts a DataFrame column type to a custom type used within dowhy, using the following mapping:
int -> ‘c’
float -> ‘c’
binary -> ‘b’
category -> ‘d’
Time types are not currently supported.
- Parameters
input_type – str: The datatype of a column within a DataFrame.
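The mapping above can be sketched with pandas' dtype-introspection helpers. This is a hedged illustration, not dowhy's implementation. It works on a column (a `pd.Series`) rather than on a type string, and it assumes that "binary" means a boolean column. The helper names `to_custom_type` and `infer_variable_types` are hypothetical.

```python
import pandas as pd
from pandas.api.types import (
    is_bool_dtype,
    is_datetime64_any_dtype,
    is_float_dtype,
    is_integer_dtype,
)


def to_custom_type(series: pd.Series) -> str:
    """Map a column's dtype to the codes in the mapping above (a sketch)."""
    dtype = series.dtype
    # Check categorical first so a categorical column is always 'd'.
    if isinstance(dtype, pd.CategoricalDtype):
        return "d"
    # Assumption: "binary" is taken to mean a boolean column.
    if is_bool_dtype(dtype):
        return "b"
    if is_integer_dtype(dtype) or is_float_dtype(dtype):
        return "c"
    if is_datetime64_any_dtype(dtype):
        # Time columns are not supported, mirroring the note above.
        raise NotImplementedError(f"time column {series.name!r} is not supported")
    raise ValueError(f"unsupported dtype {dtype} for column {series.name!r}")


def infer_variable_types(df: pd.DataFrame) -> dict:
    """Hypothetical helper: build a variable_types-style dict for every column."""
    return {col: to_custom_type(df[col]) for col in df.columns}


df = pd.DataFrame({
    "age": [30, 40],
    "income": [1.5, 2.0],
    "treated": [True, False],
    "region": pd.Categorical(["north", "south"]),
})
types = infer_variable_types(df)
```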
- do(x, method='weighting', num_cores=1, variable_types={}, outcome=None, params=None, dot_graph=None, common_causes=None, estimand_type='nonparametric-ate', proceed_when_unidentifiable=False, stateful=False)
The do-operation implemented with sampling. This will return a pandas.DataFrame with the outcome variable(s) replaced with samples from P(Y|do(X=x)).
If the value of x is left unspecified (e.g. x is passed as a string or a list of variable names), the original values of x are kept in the DataFrame, and Y is sampled from its respective P(Y|do(x)). If the value of x is specified (by passing a dict whose keys are variable names and whose values are the interventional values), the new DataFrame contains those specified values of x.
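When x is given a value, the sampler has to draw Y from P(Y|do(X=x)) rather than from the observational P(Y|X=x). The sketch below shows the idea behind a weighting-style sampler. It is not dowhy's implementation, which may estimate propensities differently. It makes the following assumptions:

- The treatment is discrete.
- The common causes are discrete, so P(X=x|Z) can be estimated by stratum frequencies.
- Positivity holds in every stratum.

The rows that received x are resampled with weights proportional to 1 / P(X=x|Z). The function name `weighting_do_sample` is hypothetical.

```python
import numpy as np
import pandas as pd


def weighting_do_sample(df, treatment, value, common_causes, rng):
    """Resample rows so that they follow P(. | do(treatment=value)) (sketch)."""
    is_value = (df[treatment] == value).astype(float)
    # Propensity P(treatment=value | common causes), estimated per stratum.
    propensity = is_value.groupby([df[c] for c in common_causes]).transform("mean")
    treated = (df[treatment] == value).to_numpy()
    # Inverse-propensity weights on the rows that actually received `value`.
    weights = 1.0 / propensity[treated].to_numpy()
    weights = weights / weights.sum()
    idx = rng.choice(df.index[treated].to_numpy(), size=len(df), replace=True, p=weights)
    return df.loc[idx].reset_index(drop=True)


rng = np.random.default_rng(0)
n = 20_000
z = rng.integers(0, 2, size=n)                                  # confounder
x = (rng.random(n) < np.where(z == 1, 0.8, 0.2)).astype(int)    # treatment depends on z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)                      # true effect of x on y is 2
df = pd.DataFrame({"z": z, "x": x, "y": y})

naive = df.loc[df.x == 1, "y"].mean() - df.loc[df.x == 0, "y"].mean()
do1 = weighting_do_sample(df, "x", 1, ["z"], rng)["y"].mean()
do0 = weighting_do_sample(df, "x", 0, ["z"], rng)["y"].mean()
print(round(naive, 2), round(do1 - do0, 2))
```

In this synthetic setup the naive difference is inflated by confounding through z, with an expected value of 3.8. The difference of the interventional means should instead land near the true effect of 2.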
For some methods, the variable_types field must be specified. It should be a dict, where the keys are variable names, and values are ‘o’ for ordered discrete, ‘u’ for un-ordered discrete, ‘d’ for discrete, or ‘c’ for continuous.
Inference requires a set of control variables. These can be provided explicitly using common_causes, which contains a list of variable names to control for. These can be provided implicitly by specifying a causal graph with dot_graph, from which they will be chosen using the default identification method.
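To illustrate how a graph can imply a control set, the sketch below reads plain `a -> b` edges from a DOT string and returns the parents of the treatment. When all of them are observed, that set satisfies the backdoor criterion. This simplification is an assumption and is not the default identification method referred to above. The helper `parents_from_dot` is hypothetical and ignores chained edges and attributes.

```python
import re


def parents_from_dot(dot_graph: str, node: str) -> list:
    """Return the parents of `node` in a simple DOT digraph (one edge per statement).

    The parents of the treatment block every backdoor path into it, so they form
    a valid adjustment set when all are observed. This is a simplification,
    not dowhy's identification procedure.
    """
    edges = re.findall(r"(\w+)\s*->\s*(\w+)", dot_graph)
    return sorted({src for src, dst in edges if dst == node})


dot = "digraph { z -> x; z -> y; x -> y; w -> x; }"
controls = parents_from_dot(dot, "x")
```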
When the set of control variables can’t be identified under the provided assumptions, the user is prompted to confirm whether to proceed. To skip the prompt and proceed automatically, set proceed_when_unidentifiable to True.
Some methods build components during inference that are expensive to compute. To retain those components for later inference (e.g. successive calls to do with different values of x), set the stateful flag to True. Be cautious about using the do operation statefully: state is stored on the namespace rather than on the method, so stateful calls can behave unpredictably. To reset the namespace and run statelessly again, call the reset method.
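The caution about namespace-level state can be seen in a small hypothetical model. An object caches one expensive component, and a later call reuses it even though that call asks for a different outcome. The class `SketchAccessorState` and its behaviour are assumptions made for illustration. They do not reproduce dowhy's internals.

```python
class SketchAccessorState:
    """Hypothetical illustration of state kept on a shared namespace object."""

    def __init__(self):
        self._sampler = None  # cached, expensive component

    def do(self, outcome, stateful=False):
        if self._sampler is None:
            # Stand-in for an expensive fit; records which outcome it was built for.
            self._sampler = {"fitted_for": outcome}
        sampler = self._sampler
        if not stateful:
            # A stateless call clears the cache after use.
            self.reset()
        return sampler["fitted_for"]

    def reset(self):
        self._sampler = None


state = SketchAccessorState()
first = state.do("y", stateful=True)  # fits and keeps the component
stale = state.do("y2")                # reuses the component fitted for "y"
state.reset()                         # clear the namespace explicitly
fresh = state.do("y2")                # fits anew for "y2"
```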
- Parameters
x – str, list, dict: The causal state on which to intervene, and (optionally) its interventional value(s).
method – str: The inference method to use with the sampler. Currently, ‘mcmc’, ‘weighting’, and ‘kernel_density’ are supported. The mcmc sampler requires pymc3>=3.7.
num_cores – int: If the inference method only supports sampling one point at a time, this parallelizes sampling.
variable_types – dict: The dictionary containing the variable types. Must contain the union of the causal state, control variables, and the outcome.
outcome – str: The outcome variable.
params – dict: Extra parameters to set as attributes on the sampler object.
dot_graph – str: A string specifying the causal graph.
common_causes – list: A list of strings containing the variable names to control for.
estimand_type – str: ‘nonparametric-ate’ is the only one currently supported. Others may be added later, to allow for specific, parametric estimands.
proceed_when_unidentifiable – bool: A flag to override user prompts to proceed when effects aren’t identifiable with the assumptions provided.
stateful – bool: Whether to retain state. By default, the do operation is stateless.
- Returns
pandas.DataFrame: A DataFrame containing the sampled outcome.
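The variable_types requirement can be checked before calling do. Below is a minimal sketch using a hypothetical helper, `check_variable_types`, which is not part of dowhy. It assumes x is a str, list, or dict as described for the x parameter. It checks coverage only, not the individual type codes.

```python
def check_variable_types(variable_types, x, outcome, common_causes):
    """Hypothetical check: variable_types must cover the treatment(s), controls, and outcome."""
    # x may be a str, a list of names, or a dict {name: value}; iterating a dict yields its keys.
    treatments = [x] if isinstance(x, str) else list(x)
    required = set(treatments) | set(common_causes) | {outcome}
    missing = required - set(variable_types)
    if missing:
        raise ValueError(f"variable_types is missing entries for: {sorted(missing)}")


# 'd' for the discrete treatment and control, 'c' for the continuous outcome.
variable_types = {"x": "d", "z": "d", "y": "c"}
check_variable_types(variable_types, x={"x": 1}, outcome="y", common_causes=["z"])
```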