API Reference

dagsampler

Top-level package for dagsampler.

Provides the main public API surface for consumers of the package.

class dagsampler.CausalDataGenerator(config)

Bases: object

Generates data from a causal graph based on a detailed configuration.

A single configuration object specifies the graph structure, variable types, functional relationships, and noise distributions. Parameters can either be sampled at random or hardcoded for reproducible experiments.

Parameters:

config (dict[str, Any])

simulate()

Run the full simulation pipeline and return generated artifacts.

Returns:

Dictionary with:
  • "data": pandas DataFrame with one column per node.

  • "parametrization": dict containing all resolved parameters (including sampled defaults) used for generation.

  • "dag": networkx.DiGraph aligned to the dataframe column order.

  • "ci_oracle" (optional): list of d-separation records when simulation_params.store_ci_oracle is enabled.

Return type:

dict[str, Any]
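The return contract above can be checked with a small wrapper. This is a minimal sketch: check_result and run_simulation are hypothetical helper names, not part of dagsampler, and the config is assumed to come from one of the template helpers documented below.

```python
from typing import Any

# Keys that simulate() is documented to always return.
REQUIRED_KEYS = ("data", "parametrization", "dag")


def check_result(result: dict[str, Any]) -> bool:
    """Validate the documented keys; return True if a CI oracle is present."""
    missing = [key for key in REQUIRED_KEYS if key not in result]
    if missing:
        raise KeyError(f"simulate() result is missing keys: {missing}")
    # "ci_oracle" is optional and only present when store_ci_oracle is enabled.
    return "ci_oracle" in result


def run_simulation(config: dict[str, Any]) -> dict[str, Any]:
    """Run the generator on a ready-made config and validate the result."""
    # Imported lazily so check_result() works without dagsampler installed.
    from dagsampler import CausalDataGenerator

    result = CausalDataGenerator(config).simulate()
    check_result(result)
    return result
```

Checking for "ci_oracle" with a membership test rather than direct indexing avoids a KeyError when the oracle was not requested.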

dagsampler.indep_config(var_specs=None, n_samples=100, seed=None, force_uniform=True, force_uniform_marginals=None, n_vars=None, node_type='binary', cardinality=2, prefix='X')

Backward-compatible shorthand for independent-node configurations.

If var_specs is omitted, generate n_vars variables of node_type (default: binary) named {prefix}0..{prefix}{n_vars-1}.

Parameters:
  • var_specs (list[dict[str, Any]] | None)

  • n_samples (int)

  • seed (int | dict[str, int] | None)

  • force_uniform (bool)

  • force_uniform_marginals (bool | None)

  • n_vars (int | None)

  • node_type (str)

  • cardinality (int)

  • prefix (str)

Return type:

dict[str, Any]
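The default naming scheme {prefix}0..{prefix}{n_vars-1} can be illustrated as follows. default_var_specs is a hypothetical stand-in written for this example; the exact spec dictionaries that indep_config builds internally are an assumption.

```python
from typing import Any


def default_var_specs(
    n_vars: int, node_type: str = "binary", prefix: str = "X"
) -> list[dict[str, Any]]:
    """Hypothetical helper: one spec per node, named {prefix}0..{prefix}{n_vars-1}."""
    return [{"name": f"{prefix}{i}", "type": node_type} for i in range(n_vars)]


specs = default_var_specs(3)
names = [spec["name"] for spec in specs]
print(names)  # ['X0', 'X1', 'X2']
```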

dagsampler.independence_config(var_specs, n_samples, seed=None, force_uniform=True)

Build a config with no edges; every node is exogenous.

Parameters:
  • var_specs (list[dict[str, Any]]) – List of node specs {"name", "type", optionally "cardinality"}. type is one of "continuous", "binary", "categorical"; cardinality defaults to 3 for categorical.

  • n_samples (int) – Number of rows to generate.

  • seed (int | dict[str, int] | None) – int (sets both seed_structure and seed_data), {"structure": int, "data": int}, or None.

  • force_uniform (bool) – Passed as force_uniform_marginals in simulation_params; when True, exogenous binary nodes get an exact 50/50 split and exogenous categorical nodes get equal class counts.

Return type:

dict[str, Any]
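The two accepted seed forms can be pictured with the sketch below. normalize_seed is a hypothetical helper, not the library's implementation, and mapping None to two unset seeds is an assumption.

```python
from typing import Optional, Union

SeedArg = Union[int, dict[str, int], None]


def normalize_seed(seed: SeedArg) -> dict[str, Optional[int]]:
    """Hypothetical helper: expand a seed argument into both seed fields."""
    if seed is None:
        # Assumption: no seed leaves both fields unset.
        return {"seed_structure": None, "seed_data": None}
    if isinstance(seed, int):
        # A single int sets both the structure seed and the data seed.
        return {"seed_structure": seed, "seed_data": seed}
    # Dict form: separate seeds for graph structure and data sampling.
    return {"seed_structure": seed["structure"], "seed_data": seed["data"]}
```

Passing the dict form keeps the graph fixed while varying only the sampled data, or the other way round.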

dagsampler.chain_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)

Build a chain config: var_specs[0] -> var_specs[1] -> ... -> var_specs[-1].

The first node is exogenous; each subsequent node is endogenous with the previous node as its single parent. Continuous endogenous nodes use additive Gaussian noise (std=0.5). Categorical endogenous nodes always use categorical_model = {"name": "logistic"} regardless of the mechanism argument.

Parameters:
  • var_specs (list[dict[str, Any]]) – Ordered list of node specs ({"name", "type", optionally "cardinality"}).

  • mechanism (str) – "linear" for all-continuous/binary parents, "sigmoid" for a tanh nonlinearity, or "stratum_means" when any parent is categorical.

  • n_samples (int) – Number of rows to generate.

  • seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.

  • post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").

Return type:

dict[str, Any]
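The edge structure described above, where each node has the previous node as its single parent, can be derived from the ordered specs. chain_edges is a hypothetical helper for illustration, and the node names are made up.

```python
from typing import Any


def chain_edges(var_specs: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Hypothetical helper: (parent, child) pairs for consecutive specs."""
    names = [spec["name"] for spec in var_specs]
    # Pair every node with its successor: names[0] -> names[1] -> ...
    return list(zip(names, names[1:]))


chain_specs = [
    {"name": "A", "type": "continuous"},
    {"name": "B", "type": "continuous"},
    {"name": "C", "type": "categorical", "cardinality": 3},
]
edges = chain_edges(chain_specs)
print(edges)  # [('A', 'B'), ('B', 'C')]
```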

dagsampler.fork_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)

Build a fork config: root -> left and root -> right.

root is exogenous; left and right are endogenous with root as their single parent.

Parameters:
  • var_specs (dict[str, dict[str, Any]]) – Dict with keys "root", "left", "right" — each a node spec {"name", "type", optionally "cardinality"}.

  • mechanism (str) – Same options as chain_config().

  • n_samples (int) – Number of rows to generate.

  • seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.

  • post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").

Return type:

dict[str, Any]
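A possible end-to-end use of the fork template, based only on the signatures documented here. The node names Z, X, and Y and the helper simulate_fork are illustrative assumptions; "linear" is chosen because the single parent is continuous.

```python
from typing import Any

# One spec per role; the keys "root", "left", "right" are required.
fork_specs: dict[str, dict[str, Any]] = {
    "root": {"name": "Z", "type": "continuous"},
    "left": {"name": "X", "type": "continuous"},
    "right": {"name": "Y", "type": "continuous"},
}


def simulate_fork(n_samples: int = 500, seed: int = 0) -> dict[str, Any]:
    """Build the fork config and run the simulation (requires dagsampler)."""
    # Imported lazily so the spec above can be inspected without the package.
    from dagsampler import CausalDataGenerator, fork_config

    config = fork_config(
        fork_specs, mechanism="linear", n_samples=n_samples, seed=seed
    )
    return CausalDataGenerator(config).simulate()
```

The returned dictionary follows the simulate() contract, so the generated rows are under "data" and the graph under "dag".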

dagsampler.collider_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)

Build a collider config: left -> collider and right -> collider.

left and right are exogenous; collider is endogenous with both as parents.

Parameters:
  • var_specs (dict[str, dict[str, Any]]) – Dict with keys "left", "right", "collider" — each a node spec {"name", "type", optionally "cardinality"}.

  • mechanism (str) – Same options as chain_config().

  • n_samples (int) – Number of rows to generate.

  • seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.

  • post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").

Return type:

dict[str, Any]
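The exogenous and endogenous roles in a collider follow directly from the edge list: a node without parents is exogenous. The sketch below uses a hypothetical helper parents_of and made-up node names to show this.

```python
from typing import Any


def parents_of(
    edges: list[tuple[str, str]], nodes: list[str]
) -> dict[str, list[str]]:
    """Hypothetical helper: map each node to the list of its parents."""
    parents: dict[str, list[str]] = {node: [] for node in nodes}
    for parent, child in edges:
        parents[child].append(parent)
    return parents


collider_specs: dict[str, dict[str, Any]] = {
    "left": {"name": "A", "type": "continuous"},
    "right": {"name": "B", "type": "continuous"},
    "collider": {"name": "C", "type": "continuous"},
}
target = collider_specs["collider"]["name"]
collider_edges = [
    (collider_specs["left"]["name"], target),
    (collider_specs["right"]["name"], target),
]
nodes = [spec["name"] for spec in collider_specs.values()]
parents = parents_of(collider_edges, nodes)
exogenous = [node for node in nodes if not parents[node]]
print(exogenous)  # ['A', 'B']
```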

dagsampler.causal_sim

class dagsampler.causal_sim.CausalDataGenerator(config)

Bases: object

Generates data from a causal graph based on a detailed configuration.

A single configuration object specifies the graph structure, variable types, functional relationships, and noise distributions. Parameters can either be sampled at random or hardcoded for reproducible experiments.

Parameters:

config (dict[str, Any])

simulate()

Run the full simulation pipeline and return generated artifacts.

Returns:

Dictionary with:
  • "data": pandas DataFrame with one column per node.

  • "parametrization": dict containing all resolved parameters (including sampled defaults) used for generation.

  • "dag": networkx.DiGraph aligned to the dataframe column order.

  • "ci_oracle" (optional): list of d-separation records when simulation_params.store_ci_oracle is enabled.

Return type:

dict[str, Any]

dagsampler.templates

Config template helpers for common DAG structures.

dagsampler.templates.indep_config(var_specs=None, n_samples=100, seed=None, force_uniform=True, force_uniform_marginals=None, n_vars=None, node_type='binary', cardinality=2, prefix='X')

Backward-compatible shorthand for independent-node configurations.

If var_specs is omitted, generate n_vars variables of node_type (default: binary) named {prefix}0..{prefix}{n_vars-1}.

Parameters:
  • var_specs (list[dict[str, Any]] | None)

  • n_samples (int)

  • seed (int | dict[str, int] | None)

  • force_uniform (bool)

  • force_uniform_marginals (bool | None)

  • n_vars (int | None)

  • node_type (str)

  • cardinality (int)

  • prefix (str)

Return type:

dict[str, Any]

dagsampler.templates.independence_config(var_specs, n_samples, seed=None, force_uniform=True)

Build a config with no edges; every node is exogenous.

Parameters:
  • var_specs (list[dict[str, Any]]) – List of node specs {"name", "type", optionally "cardinality"}. type is one of "continuous", "binary", "categorical"; cardinality defaults to 3 for categorical.

  • n_samples (int) – Number of rows to generate.

  • seed (int | dict[str, int] | None) – int (sets both seed_structure and seed_data), {"structure": int, "data": int}, or None.

  • force_uniform (bool) – Passed as force_uniform_marginals in simulation_params; when True, exogenous binary nodes get an exact 50/50 split and exogenous categorical nodes get equal class counts.

Return type:

dict[str, Any]
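One way to picture the effect of force_uniform is sampling with exactly equal class counts instead of drawing each row independently. balanced_labels is a hypothetical helper, not the library's code, and the handling of a sample size that is not divisible by the cardinality is an assumption.

```python
import random


def balanced_labels(n_samples: int, cardinality: int, rng: random.Random) -> list[int]:
    """Hypothetical helper: labels 0..cardinality-1 with equal class counts."""
    base, extra = divmod(n_samples, cardinality)
    labels: list[int] = []
    for label in range(cardinality):
        # Assumption: any remainder goes to the first `extra` classes.
        labels.extend([label] * (base + (1 if label < extra else 0)))
    # Shuffle so that row order carries no information about the class.
    rng.shuffle(labels)
    return labels


binary = balanced_labels(100, 2, random.Random(0))
print(binary.count(0), binary.count(1))  # 50 50
```

With independent draws the split would only be close to 50/50; fixing the counts removes that sampling variation from the marginals.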

dagsampler.templates.chain_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)

Build a chain config: var_specs[0] -> var_specs[1] -> ... -> var_specs[-1].

The first node is exogenous; each subsequent node is endogenous with the previous node as its single parent. Continuous endogenous nodes use additive Gaussian noise (std=0.5). Categorical endogenous nodes always use categorical_model = {"name": "logistic"} regardless of the mechanism argument.

Parameters:
  • var_specs (list[dict[str, Any]]) – Ordered list of node specs ({"name", "type", optionally "cardinality"}).

  • mechanism (str) – "linear" for all-continuous/binary parents, "sigmoid" for a tanh nonlinearity, or "stratum_means" when any parent is categorical.

  • n_samples (int) – Number of rows to generate.

  • seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.

  • post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").

Return type:

dict[str, Any]
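The guidance on the mechanism argument can be written as a small decision rule. pick_mechanism and its nonlinear flag are hypothetical and only restate the parameter description above.

```python
def pick_mechanism(parent_types: list[str], nonlinear: bool = False) -> str:
    """Hypothetical helper: choose a mechanism name from the parent types."""
    if "categorical" in parent_types:
        # Any categorical parent calls for per-stratum means.
        return "stratum_means"
    # All parents are continuous or binary.
    return "sigmoid" if nonlinear else "linear"


print(pick_mechanism(["continuous", "binary"]))  # linear
print(pick_mechanism(["categorical"]))  # stratum_means
```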

dagsampler.templates.fork_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)

Build a fork config: root -> left and root -> right.

root is exogenous; left and right are endogenous with root as their single parent.

Parameters:
  • var_specs (dict[str, dict[str, Any]]) – Dict with keys "root", "left", "right" — each a node spec {"name", "type", optionally "cardinality"}.

  • mechanism (str) – Same options as chain_config().

  • n_samples (int) – Number of rows to generate.

  • seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.

  • post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").

Return type:

dict[str, Any]
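The post_transform option can be pictured as a function applied to each value of a continuous child after the mechanism and noise. The sketch below assumes a linear mechanism with a made-up weight and the additive Gaussian noise (std=0.5) mentioned for chain_config; sample_child is a hypothetical helper, and applying the transform after the noise is an assumption.

```python
import math
import random
from typing import Optional

# Assumption: only "tanh" is named in this reference as a transform.
TRANSFORMS = {"tanh": math.tanh}


def sample_child(
    parent_values: list[float],
    weight: float,
    rng: random.Random,
    post_transform: Optional[str] = None,
) -> list[float]:
    """Hypothetical helper: linear mechanism, Gaussian noise, optional transform."""
    values = []
    for x in parent_values:
        y = weight * x + rng.gauss(0.0, 0.5)  # additive noise with std=0.5
        if post_transform is not None:
            y = TRANSFORMS[post_transform](y)  # applied element-wise
        values.append(y)
    return values


root = [random.Random(0).gauss(0.0, 1.0) for _ in range(5)]
raw = sample_child(root, 2.0, random.Random(1))
squashed = sample_child(root, 2.0, random.Random(1), post_transform="tanh")
```

Because both calls use the same noise seed, squashed is exactly the tanh of raw, which makes the effect of the transform easy to isolate.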

dagsampler.templates.collider_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)

Build a collider config: left -> collider and right -> collider.

left and right are exogenous; collider is endogenous with both as parents.

Parameters:
  • var_specs (dict[str, dict[str, Any]]) – Dict with keys "left", "right", "collider" — each a node spec {"name", "type", optionally "cardinality"}.

  • mechanism (str) – Same options as chain_config().

  • n_samples (int) – Number of rows to generate.

  • seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.

  • post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").

Return type:

dict[str, Any]

dagsampler.cli

dagsampler.cli.build_parser()

Return type:

ArgumentParser

dagsampler.cli.main()

Return type:

int