API Reference

dagsampler

Top-level package for dagsampler.

Provides the main public API surface for consumers of the package.

class dagsampler.CausalDataGenerator(config)

Bases: object

Generates data from a causal graph based on a detailed configuration.

A single configuration object specifies the graph structure, variable types, functional relationships, and noise distributions. Parameters can either be sampled at random or hardcoded for reproducible experiments.

Parameters: config (dict[str, Any])

simulate()

Run the full simulation pipeline and return generated artifacts.

Returns:

Dictionary with:

"data": pandas DataFrame with one column per node.
"parametrization": dict containing all resolved parameters (including sampled defaults) used for generation.
"dag": networkx.DiGraph aligned to the dataframe column order.
"ci_oracle" (optional): list of d-separation records when simulation_params.store_ci_oracle is enabled.

Return type:

dict[str, Any]
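
As a minimal sketch of consuming this return value, the helper below unpacks the dictionary and treats "ci_oracle" as optional. The helper name unpack_simulation_result and the placeholder values are hypothetical; only the four key names come from the description above.

```python
from typing import Any


def unpack_simulation_result(result: dict[str, Any]) -> tuple[Any, Any, Any, list]:
    """Split a simulate()-style result dict into its documented parts.

    Hypothetical helper: only the key names are taken from the reference text.
    """
    data = result["data"]                        # pandas DataFrame in the real API
    parametrization = result["parametrization"]  # resolved generation parameters
    dag = result["dag"]                          # networkx.DiGraph in the real API
    # "ci_oracle" is present only when simulation_params.store_ci_oracle is enabled,
    # so fall back to an empty list when the key is missing.
    ci_oracle = result.get("ci_oracle", [])
    return data, parametrization, dag, ci_oracle


# Stand-in result with placeholder values instead of real pandas/networkx objects.
fake_result = {"data": "df", "parametrization": {}, "dag": "graph"}
data, params, dag, ci_oracle = unpack_simulation_result(fake_result)
```

Using dict.get for "ci_oracle" keeps calling code identical whether or not the oracle was stored.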

dagsampler.indep_config(var_specs=None, n_samples=100, seed=None, force_uniform=True, force_uniform_marginals=None, n_vars=None, node_type='binary', cardinality=2, prefix='X')

Backward-compatible shorthand for independent-node configurations.

If var_specs is omitted, generate n_vars variables of node_type (default: binary) named {prefix}0..{prefix}{n_vars-1}.

Parameters:

var_specs (list[dict[str, Any]] | None)
n_samples (int)
seed (int | dict[str, int] | None)
force_uniform (bool)
force_uniform_marginals (bool | None)
n_vars (int | None)
node_type (str)
cardinality (int)
prefix (str)

Return type:

dict[str, Any]
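
The default naming scheme described above can be sketched as follows. This is an illustrative stand-in, not the library's code: the function name default_var_specs is hypothetical, and attaching cardinality only to categorical nodes is an assumption.

```python
def default_var_specs(n_vars, node_type="binary", cardinality=2, prefix="X"):
    """Build node specs named {prefix}0 .. {prefix}{n_vars-1} (illustrative only)."""
    specs = []
    for i in range(n_vars):
        spec = {"name": f"{prefix}{i}", "type": node_type}
        # Assumption: cardinality is only meaningful for categorical nodes.
        if node_type == "categorical":
            spec["cardinality"] = cardinality
        specs.append(spec)
    return specs


specs = default_var_specs(3)
names = [s["name"] for s in specs]  # ['X0', 'X1', 'X2']
```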

dagsampler.independence_config(var_specs, n_samples, seed=None, force_uniform=True)

Build a config with no edges; every node is exogenous.

Parameters:

var_specs (list[dict[str, Any]]) – List of node specs {"name", "type", optionally "cardinality"}. type is one of "continuous", "binary", "categorical"; cardinality defaults to 3 for categorical.
n_samples (int) – Number of rows to generate.
seed (int | dict[str, int] | None) – int (sets both seed_structure and seed_data) or {"structure": int, "data": int}.
force_uniform (bool) – Passed as force_uniform_marginals in simulation_params; when True, exogenous binary nodes get an exact 50/50 split and exogenous categorical nodes get equal class counts.

Return type:

dict[str, Any]
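
To illustrate what an exact 50/50 split or equal class counts means, here is one way to draw balanced labels with the standard library. It is a sketch under assumptions: balanced_labels is a hypothetical name, and how the library handles an n_samples value that is not divisible by the number of classes is not stated in this reference.

```python
import random


def balanced_labels(n_samples, n_classes, rng):
    """Return n_samples labels with class counts as equal as possible, in random order."""
    # Cycle through 0..n_classes-1 so that class counts differ by at most one,
    # then shuffle so the order carries no information.
    labels = [i % n_classes for i in range(n_samples)]
    rng.shuffle(labels)
    return labels


rng = random.Random(0)
binary = balanced_labels(100, 2, rng)      # exactly 50 zeros and 50 ones
categorical = balanced_labels(90, 3, rng)  # exactly 30 rows per class
```

Unlike independent Bernoulli or multinomial draws, this fixes the marginal counts exactly rather than only in expectation.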

dagsampler.chain_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)

Build a chain config: var_specs[0] -> var_specs[1] -> ... -> var_specs[-1].

The first node is exogenous; each subsequent node is endogenous with the previous node as its single parent. Continuous endogenous nodes use additive Gaussian noise (std=0.5). Categorical endogenous nodes always use categorical_model = {"name": "logistic"} regardless of the mechanism argument.

Parameters:

var_specs (list[dict[str, Any]]) – Ordered list of node specs ({"name", "type", optionally "cardinality"}).
mechanism (str) – "linear" for all-continuous/binary parents, "sigmoid" for a tanh nonlinearity, or "stratum_means" when any parent is categorical.
n_samples (int) – Number of rows to generate.
seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.
post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").

Return type:

dict[str, Any]
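
The data-generating process that a continuous linear chain describes can be sketched without the library. Everything below is an assumption made for illustration except the noise level (std=0.5) and the idea of an element-wise post-transform: the function name simulate_linear_chain, the unit edge weight, the standard normal root, and applying the transform after the noise are not taken from the package.

```python
import math
import random


def simulate_linear_chain(n_nodes, n_samples, seed=None, weight=1.0, post_transform=None):
    """Simulate X0 -> X1 -> ... with additive Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_samples):
        # Exogenous first node; its distribution is an assumption.
        row = [rng.gauss(0.0, 1.0)]
        for _ in range(1, n_nodes):
            # Each endogenous node depends only on the previous node,
            # plus additive Gaussian noise with std=0.5.
            value = weight * row[-1] + rng.gauss(0.0, 0.5)
            if post_transform is not None:
                # Post-nonlinear step applied element-wise to the endogenous node.
                value = post_transform(value)
            row.append(value)
        rows.append(row)
    return rows


rows = simulate_linear_chain(3, 200, seed=1)
squashed = simulate_linear_chain(3, 200, seed=1, post_transform=math.tanh)
```

With the same seed the two runs share their exogenous column; only the endogenous columns are squashed into [-1, 1].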

dagsampler.fork_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)

Build a fork config: root -> left and root -> right.

root is exogenous; left and right are endogenous with root as their single parent.

Parameters:

var_specs (dict[str, dict[str, Any]]) – Dict with keys "root", "left", "right" — each a node spec {"name", "type", optionally "cardinality"}.
mechanism (str) – Same options as chain_config().
n_samples (int) – Number of rows to generate.
seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.
post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").

Return type:

dict[str, Any]
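
The shape of var_specs and the edges it implies can be sketched as below. The helper fork_edges and the node names Z, X, Y are hypothetical; only the "root", "left", "right" keys and the spec fields come from the description above.

```python
def fork_edges(var_specs):
    """Return the two directed edges of a fork as (parent, child) name pairs."""
    root = var_specs["root"]["name"]
    return [
        (root, var_specs["left"]["name"]),
        (root, var_specs["right"]["name"]),
    ]


fork_specs = {
    "root": {"name": "Z", "type": "continuous"},
    "left": {"name": "X", "type": "continuous"},
    "right": {"name": "Y", "type": "binary"},
}
edges = fork_edges(fork_specs)  # [('Z', 'X'), ('Z', 'Y')]
```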

dagsampler.collider_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)

Build a collider config: left -> collider and right -> collider.

left and right are exogenous; collider is endogenous with both as parents.

Parameters:

var_specs (dict[str, dict[str, Any]]) – Dict with keys "left", "right", "collider" — each a node spec {"name", "type", optionally "cardinality"}.
mechanism (str) – Same options as chain_config().
n_samples (int) – Number of rows to generate.
seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.
post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").

Return type:

dict[str, Any]
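
A collider is useful for testing conditional-independence methods because its two parents are marginally independent but become dependent once the collider is conditioned on. The sketch below shows that effect with the standard library only. It does not call the package: the unit weights, the standard normal parents, the noise std of 0.5, and selecting rows with collider > 1 are all assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
n = 5000
left = [rng.gauss(0.0, 1.0) for _ in range(n)]    # exogenous
right = [rng.gauss(0.0, 1.0) for _ in range(n)]   # exogenous
# Endogenous collider with both nodes as parents.
collider = [l + r + rng.gauss(0.0, 0.5) for l, r in zip(left, right)]

# Marginally, left and right are close to uncorrelated.
corr_all = pearson(left, right)

# Conditioning on the collider (here: keeping rows with collider > 1)
# induces a negative correlation between its parents.
selected = [(l, r) for l, r, c in zip(left, right, collider) if c > 1.0]
corr_selected = pearson([l for l, _ in selected], [r for _, r in selected])
```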

dagsampler.causal_sim

class dagsampler.causal_sim.CausalDataGenerator(config)

Bases: object

Generates data from a causal graph based on a detailed configuration.

Parameters: config (dict[str, Any])

simulate()

Run the full simulation pipeline and return generated artifacts.

Returns:

Dictionary with:

"data": pandas DataFrame with one column per node.
"parametrization": dict containing all resolved parameters (including sampled defaults) used for generation.
"dag": networkx.DiGraph aligned to the dataframe column order.
"ci_oracle" (optional): list of d-separation records when simulation_params.store_ci_oracle is enabled.

Return type:

dict[str, Any]

dagsampler.templates

Config template helpers for common DAG structures.

dagsampler.templates.indep_config(var_specs=None, n_samples=100, seed=None, force_uniform=True, force_uniform_marginals=None, n_vars=None, node_type='binary', cardinality=2, prefix='X')

Backward-compatible shorthand for independent-node configurations.

If var_specs is omitted, generate n_vars variables of node_type (default: binary) named {prefix}0..{prefix}{n_vars-1}.

Parameters:

var_specs (list[dict[str, Any]] | None)
n_samples (int)
seed (int | dict[str, int] | None)
force_uniform (bool)
force_uniform_marginals (bool | None)
n_vars (int | None)
node_type (str)
cardinality (int)
prefix (str)

Return type:

dict[str, Any]

dagsampler.templates.independence_config(var_specs, n_samples, seed=None, force_uniform=True)

Build a config with no edges; every node is exogenous.

Parameters:

var_specs (list[dict[str, Any]]) – List of node specs {"name", "type", optionally "cardinality"}. type is one of "continuous", "binary", "categorical"; cardinality defaults to 3 for categorical.
n_samples (int) – Number of rows to generate.
seed (int | dict[str, int] | None) – int (sets both seed_structure and seed_data) or {"structure": int, "data": int}.
force_uniform (bool) – Passed as force_uniform_marginals in simulation_params; when True, exogenous binary nodes get an exact 50/50 split and exogenous categorical nodes get equal class counts.

Return type:

dict[str, Any]
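
The two accepted seed forms can be read as the normalization sketched below. The function normalize_seed is hypothetical, and returning an empty dict for None (leaving both seeds unset) is an assumption; the reference only states the int and dict cases.

```python
def normalize_seed(seed):
    """Map the seed argument onto seed_structure / seed_data (illustrative only)."""
    if seed is None:
        # Assumption: no seeds are set, so results are not reproducible.
        return {}
    if isinstance(seed, int):
        # A single int sets both seeds to the same value.
        return {"seed_structure": seed, "seed_data": seed}
    # Otherwise expect {"structure": int, "data": int}.
    return {"seed_structure": seed["structure"], "seed_data": seed["data"]}


same = normalize_seed(7)
split = normalize_seed({"structure": 1, "data": 2})
```

Separate structure and data seeds make it possible to redraw data from a fixed graph parametrization, or the reverse.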

dagsampler.templates.chain_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)

Build a chain config: var_specs[0] -> var_specs[1] -> ... -> var_specs[-1].

Parameters:

var_specs (list[dict[str, Any]]) – Ordered list of node specs ({"name", "type", optionally "cardinality"}).
mechanism (str) – "linear" for all-continuous/binary parents, "sigmoid" for a tanh nonlinearity, or "stratum_means" when any parent is categorical.
n_samples (int) – Number of rows to generate.
seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.
post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").

Return type:

dict[str, Any]
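
The guidance on choosing mechanism can be written as a small decision rule. This is a reading of the parameter description above, not library code: suggest_mechanism and its nonlinear flag are hypothetical.

```python
def suggest_mechanism(parent_types, nonlinear=False):
    """Pick a mechanism name from the parents' node types (illustrative only)."""
    # Any categorical parent calls for per-stratum means.
    if any(t == "categorical" for t in parent_types):
        return "stratum_means"
    # With only continuous/binary parents, choose between linear and the
    # nonlinear option named "sigmoid" in the reference.
    return "sigmoid" if nonlinear else "linear"


linear_choice = suggest_mechanism(["continuous", "binary"])
stratum_choice = suggest_mechanism(["continuous", "categorical"])
nonlinear_choice = suggest_mechanism(["continuous"], nonlinear=True)
```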

dagsampler.templates.fork_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)

Build a fork config: root -> left and root -> right.

root is exogenous; left and right are endogenous with root as their single parent.

Parameters:

var_specs (dict[str, dict[str, Any]]) – Dict with keys "root", "left", "right" — each a node spec {"name", "type", optionally "cardinality"}.
mechanism (str) – Same options as chain_config().
n_samples (int) – Number of rows to generate.
seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.
post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").

Return type:

dict[str, Any]

dagsampler.templates.collider_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)

Build a collider config: left -> collider and right -> collider.

left and right are exogenous; collider is endogenous with both as parents.

Parameters:

var_specs (dict[str, dict[str, Any]]) – Dict with keys "left", "right", "collider" — each a node spec {"name", "type", optionally "cardinality"}.
mechanism (str) – Same options as chain_config().
n_samples (int) – Number of rows to generate.
seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.
post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").

Return type:

dict[str, Any]

dagsampler.cli

dagsampler.cli.build_parser()

Return type: ArgumentParser

dagsampler.cli.main()

Return type: int
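
The two signatures follow a common argparse layout: one function builds the parser and another runs it and returns a process exit code. The sketch below shows that generic pattern only. The names sketch_build_parser and sketch_main are hypothetical, the argv parameter is added for testability, and the package's actual command-line options are not documented here, so the parser defines none.

```python
import argparse


def sketch_build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser; the real CLI options are not known here."""
    return argparse.ArgumentParser(
        prog="dagsampler-sketch",
        description="Illustrative stand-in parser.",
    )


def sketch_main(argv=None) -> int:
    """Parse arguments and return an exit code (0 means success)."""
    parser = sketch_build_parser()
    parser.parse_args(argv)
    return 0


# An entry point would typically end with: raise SystemExit(sketch_main())
exit_code = sketch_main([])
```

Returning an int instead of calling sys.exit inside main keeps the function easy to call from tests.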