dowhy.gcm.util package

Submodules

dowhy.gcm.util.general module

Functions in this module should be considered experimental, meaning there might be breaking API changes in the future.

dowhy.gcm.util.general.apply_one_hot_encoding(X: ndarray, one_hot_encoder_map: Dict[int, OneHotEncoder]) ndarray[source]
dowhy.gcm.util.general.fit_one_hot_encoders(X: ndarray) Dict[int, OneHotEncoder][source]

Fits one-hot encoders to each categorical column in X. A categorical input needs to be a string, i.e. a categorical column consists only of strings.

Parameters

X – Input data matrix.

Returns

Dictionary that maps a column index to a scikit-learn OneHotEncoder.

dowhy.gcm.util.general.geometric_median(x: ndarray) ndarray[source]
dowhy.gcm.util.general.has_categorical(X: ndarray) bool[source]

Checks if any of the given columns are categorical, i.e. either a string or a boolean. If any of the columns is categorical, this method will return True. Alternatively, consider is_categorical for checking if all columns are categorical.

Note: A NumPy array with mixed data types might internally convert numeric columns to strings and vice versa. To ensure that the given data keeps its original data types, consider converting/initializing it with the dtype 'object'. For instance: np.array([[1, 'True', '0', 0.2], [3, 'False', '1', 2.3]], dtype=object)

Parameters

X – Input array to check if any of the columns are categorical.

Returns

True if at least one column of the input is categorical, False otherwise.
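The dtype behavior described in the note can be checked with NumPy alone. The following is a minimal sketch that does not call has_categorical; it only shows how the example array from the note behaves with and without dtype=object. The variable names (rows, coerced, preserved) are illustrative.

```python
import numpy as np

rows = [[1, "True", "0", 0.2], [3, "False", "1", 2.3]]

# Without an explicit dtype, NumPy picks a single common type for the whole
# array. For mixed numbers and strings, this is a Unicode string type, so
# the numeric entries are stored as strings.
coerced = np.array(rows)
print(coerced.dtype.kind)  # kind code of the common dtype

# With dtype=object, every entry keeps its original Python type.
preserved = np.array(rows, dtype=object)
print(type(preserved[0, 0]), type(preserved[0, 1]), type(preserved[0, 3]))
```

In the first array the numeric columns can no longer be told apart from the string columns, which is why the note recommends dtype=object for mixed data.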

dowhy.gcm.util.general.is_categorical(X: ndarray) bool[source]

Checks if all of the given columns are categorical, i.e. either a string or a boolean. Only if all of the columns are categorical, this method will return True. Alternatively, consider has_categorical for checking if any of the columns is categorical.

Note: A NumPy array with mixed data types might internally convert numeric columns to strings and vice versa. To ensure that the given data keeps its original data types, consider converting/initializing it with the dtype 'object'. For instance: np.array([[1, 'True', '0', 0.2], [3, 'False', '1', 2.3]], dtype=object)

Parameters

X – Input array to check if all columns are categorical.

Returns

True if all columns of the input are categorical, False otherwise.

dowhy.gcm.util.general.means_difference(randomized_predictions: ndarray, baseline_values: ndarray) ndarray[source]
dowhy.gcm.util.general.set_random_seed(random_seed: int) None[source]

Sets random seed in numpy and the random module.

Parameters

random_seed – Random seed for the numpy and random module.

Returns

None
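As a rough illustration of what seeding both modules means in practice, the sketch below uses a hypothetical stand-in, seed_both_sketch, which is an assumption for illustration and not the library's implementation. It seeds NumPy's global random state and Python's random module, then checks that the same draws are reproduced.

```python
import random

import numpy as np


def seed_both_sketch(random_seed: int) -> None:
    """Hypothetical stand-in: seed NumPy's global state and the random module."""
    np.random.seed(random_seed)
    random.seed(random_seed)


# Draw once, re-seed with the same value, and draw again.
seed_both_sketch(42)
first = (np.random.rand(3), random.random())

seed_both_sketch(42)
second = (np.random.rand(3), random.random())

# Both draws should be identical because the seeds were identical.
print(np.array_equal(first[0], second[0]), first[1] == second[1])
```

Note that generators created separately via np.random.default_rng() carry their own state and are not affected by seeding NumPy's global random state.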

dowhy.gcm.util.general.shape_into_2d(*args)[source]

If necessary, shapes the numpy inputs into 2D matrices.

Example:

array([1, 2, 3]) -> array([[1], [2], [3]])
2 -> array([[2]])

Parameters

args – The function expects numpy arrays as inputs and returns a reshaped (2D) version of them (if necessary).

Returns

Reshaped versions of the input numpy arrays. For instance, given 1D inputs X, Y and Z, shape_into_2d(X, Y, Z) reshapes them into 2D and returns them. An input that is already 2D is returned unmodified.
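The reshaping rule from the example can be sketched with plain NumPy. The helper to_2d_sketch below is hypothetical and only mirrors the documented behavior (scalars become 1x1, 1D arrays become column vectors, 2D arrays pass through). It is not the library's code, and returning a list is a choice made for this sketch.

```python
import numpy as np


def to_2d_sketch(*args):
    """Hypothetical helper mirroring the documented reshaping rule."""

    def reshape(x):
        x = np.asarray(x)
        if x.ndim == 0:
            # Scalar: 2 -> [[2]]
            return x.reshape(1, 1)
        if x.ndim == 1:
            # 1D array: [1, 2, 3] -> [[1], [2], [3]]
            return x.reshape(-1, 1)
        # Inputs with two or more dimensions are passed through unchanged.
        return x

    return [reshape(x) for x in args]


X, Y, Z = to_2d_sketch(np.array([1, 2, 3]), 2, np.array([[1, 2], [3, 4]]))
print(X.shape, Y.shape, Z.shape)
```

How the real function treats inputs with more than two dimensions is not stated above; the sketch simply leaves them as they are.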

dowhy.gcm.util.general.variance_of_deviations(randomized_predictions: ndarray, baseline_values: ndarray) ndarray[source]
dowhy.gcm.util.general.variance_of_matching_values(randomized_predictions: ndarray, baseline_values: ndarray) ndarray[source]

dowhy.gcm.util.plotting module

dowhy.gcm.util.plotting.bar_plot(values: Dict[str, float], uncertainties: Optional[Dict[str, Tuple[float, float]]] = None, ylabel: str = '', filename: Optional[str] = None, display_plot: bool = True, figure_size: Optional[List[int]] = None, bar_width: float = 0.8, xticks: Optional[List[str]] = None, xticks_rotation: int = 90, sort_names: bool = True) None[source]

Convenience function to make a bar plot of the given values with uncertainty bars, if provided. Useful for all kinds of attribution results (including confidence intervals).

Parameters
  • values – A dictionary where the keys are the labels and the values are the values to be plotted.

  • uncertainties – An optional dictionary that maps a label to a pair of floats (e.g. the bounds of a confidence interval) used to draw the error bar for that label.

  • ylabel – The label for the y-axis.

  • filename – An optional filename if the plot should be saved to a file.

  • display_plot – Whether the plot should be displayed (defaults to True).

  • figure_size – The size of the figure to be plotted.

  • bar_width – The width of the bars.

  • xticks – Explicitly specify the labels for the bars on the x-axis.

  • xticks_rotation – Specify the rotation of the labels on the x-axis.

  • sort_names – If True, the names in the plot are sorted alphabetically. If False, the order given in values is used.

dowhy.gcm.util.plotting.plot(causal_graph: Graph, causal_strengths: Optional[Dict[Tuple[Any, Any], float]] = None, colors: Optional[Dict[Union[Any, Tuple[Any, Any]], str]] = None, filename: Optional[str] = None, display_plot: bool = True, figure_size: Optional[List[int]] = None, **kwargs) None[source]

Convenience function to plot causal graphs. This function uses different backends based on what’s available on the system. The best result is achieved when using Graphviz as the backend. This requires both the Python pygraphviz package (pip install pygraphviz) and the shared system library (e.g. brew install graphviz or apt-get install graphviz). When graphviz is not available, it will fall back to the networkx backend.

Parameters
  • causal_graph – The graph to be plotted.

  • causal_strengths – An optional dictionary with Edge -> float entries.

  • colors – An optional dictionary with color specifications for edges or nodes.

  • filename – An optional filename if the plot should be saved to a file.

  • display_plot – Whether the plot should be displayed (defaults to True).

  • figure_size – The width and height of the figure. This is used to set pyplot’s ‘figure.figsize’ parameter. If None is given, the current/default value is used.

  • kwargs – Remaining parameters will be passed through to the backend verbatim.

Example usage:

>>> plot(nx.DiGraph([('X', 'Y')])) # plots X -> Y
>>> plot(nx.DiGraph([('X', 'Y')]), causal_strengths={('X', 'Y'): 0.43}) # annotates arrow with 0.43
>>> plot(nx.DiGraph([('X', 'Y')]), colors={('X', 'Y'): 'red', 'X': 'green'}) # colors X -> Y red and X green

dowhy.gcm.util.plotting.plot_adjacency_matrix(adjacency_matrix: DataFrame, is_directed: bool, filename: Optional[str] = None, display_plot: bool = True) None[source]

dowhy.gcm.util.pygraphviz module

Module contents