API Reference
dagsampler
Top-level package for dagsampler.
Provides the main public API surface for consumers of the package.
- class dagsampler.CausalDataGenerator(config)
Bases: object
Generates data from a causal graph based on a detailed configuration.
This class is designed to be flexible, allowing for the specification of graph structures, variable types, functional relationships, and noise distributions through a single configuration object. It supports both random parameter generation and hardcoded parameters for reproducible experiments.
- Parameters:
config (dict[str, Any])
- simulate()
Run the full simulation pipeline and return generated artifacts.
- Returns:
- Dictionary with:
"data": pandas DataFrame with one column per node.
"parametrization": dict containing all resolved parameters (including sampled defaults) used for generation.
"dag": networkx.DiGraph aligned to the dataframe column order.
"ci_oracle" (optional): list of d-separation records when simulation_params.store_ci_oracle is enabled.
- Return type:
dict[str, Any]
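The returned dictionary has three keys that are always listed and one that is optional, so calling code may want to unpack it defensively. The following is a minimal sketch of that pattern. The helper name split_result and the stand-in values are hypothetical and not part of dagsampler; only the key names come from the description above.

```python
from typing import Any, Optional

# Keys documented for the dictionary returned by simulate().
REQUIRED_KEYS = ("data", "parametrization", "dag")


def split_result(result: dict[str, Any]) -> tuple[Any, dict, Any, Optional[list]]:
    """Hypothetical helper: unpack a simulate()-style result dictionary.

    Raises KeyError when a required key is missing and returns None for
    "ci_oracle" when the oracle was not stored.
    """
    missing = [key for key in REQUIRED_KEYS if key not in result]
    if missing:
        raise KeyError(f"result is missing required keys: {missing}")
    return (
        result["data"],
        result["parametrization"],
        result["dag"],
        result.get("ci_oracle"),  # optional; absent unless explicitly enabled
    )


# Stand-in values only; a real result holds a DataFrame and a DiGraph.
fake_result = {"data": [[0, 1]], "parametrization": {}, "dag": object()}
data, params, dag, ci_oracle = split_result(fake_result)
```

Using result.get("ci_oracle") rather than indexing keeps the same code path working whether or not simulation_params.store_ci_oracle was enabled.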
- dagsampler.indep_config(var_specs=None, n_samples=100, seed=None, force_uniform=True, force_uniform_marginals=None, n_vars=None, node_type='binary', cardinality=2, prefix='X')
Backward-compatible shorthand for independent-node configurations.
If var_specs is omitted, generate n_vars variables of node_type (default: binary) named {prefix}0..{prefix}{n_vars-1}.
- Parameters:
var_specs (list[dict[str, Any]] | None)
n_samples (int)
seed (int | dict[str, int] | None)
force_uniform (bool)
force_uniform_marginals (bool | None)
n_vars (int | None)
node_type (str)
cardinality (int)
prefix (str)
- Return type:
dict[str, Any]
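The naming rule for generated variables can be illustrated without the package itself. This sketch uses a hypothetical function, default_var_specs, that builds node specs in the {"name", "type", optionally "cardinality"} shape used elsewhere in this reference. Attaching cardinality only to categorical nodes is an assumption, not documented behavior.

```python
from typing import Any


def default_var_specs(
    n_vars: int,
    node_type: str = "binary",
    cardinality: int = 2,
    prefix: str = "X",
) -> list[dict[str, Any]]:
    """Hypothetical stand-in for the documented naming rule."""
    specs = []
    for i in range(n_vars):
        # Names run from {prefix}0 to {prefix}{n_vars-1}.
        spec: dict[str, Any] = {"name": f"{prefix}{i}", "type": node_type}
        # Assumption: cardinality is only attached to categorical nodes.
        if node_type == "categorical":
            spec["cardinality"] = cardinality
        specs.append(spec)
    return specs


specs = default_var_specs(3)
names = [spec["name"] for spec in specs]
```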
- dagsampler.independence_config(var_specs, n_samples, seed=None, force_uniform=True)
Build a config with no edges; every node is exogenous.
- Parameters:
var_specs (list[dict[str, Any]]) – List of node specs {"name", "type", optionally "cardinality"}. type is one of "continuous", "binary", "categorical"; cardinality defaults to 3 for categorical.
n_samples (int) – Number of rows to generate.
seed (int | dict[str, int] | None) – int (sets both seed_structure and seed_data) or {"structure": int, "data": int}.
force_uniform (bool) – Passed as force_uniform_marginals in simulation_params; when True, exogenous binary nodes get an exact 50/50 split and exogenous categorical nodes get equal class counts.
- Return type:
dict[str, Any]
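The two accepted seed forms can be made concrete with a small sketch. The function normalize_seed below is hypothetical and only mirrors the documented mapping: an int sets both seed_structure and seed_data, while a dict supplies them separately. Leaving both unset for None is an assumption.

```python
from typing import Optional, Union

SeedArg = Union[int, dict[str, int], None]


def normalize_seed(seed: SeedArg) -> dict[str, Optional[int]]:
    """Hypothetical helper mirroring the documented seed forms."""
    if seed is None:
        # Assumption: no seed means both values stay unset.
        return {"seed_structure": None, "seed_data": None}
    if isinstance(seed, int):
        # A single int sets both seeds.
        return {"seed_structure": seed, "seed_data": seed}
    # Otherwise expect {"structure": int, "data": int}.
    return {"seed_structure": seed["structure"], "seed_data": seed["data"]}


same = normalize_seed(7)
split = normalize_seed({"structure": 1, "data": 2})
```

Separate structure and data seeds make it possible to hold the graph fixed while redrawing samples, or the reverse.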
- dagsampler.chain_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)
Build a chain config: var_specs[0] -> var_specs[1] -> ... -> var_specs[-1].
The first node is exogenous; each subsequent node is endogenous with the previous node as its single parent. Continuous endogenous nodes use additive Gaussian noise (std=0.5). Categorical endogenous nodes always use categorical_model = {"name": "logistic"} regardless of the mechanism argument.
- Parameters:
var_specs (list[dict[str, Any]]) – Ordered list of node specs ({"name", "type", optionally "cardinality"}).
mechanism (str) – "linear" for all-continuous/binary parents, "sigmoid" for a tanh nonlinearity, or "stratum_means" when any parent is categorical.
n_samples (int) – Number of rows to generate.
seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.
post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").
- Return type:
dict[str, Any]
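To make the chain structure concrete, the sketch below derives the edge list implied by an ordered list of node specs. The function chain_edges is a hypothetical illustration and not part of dagsampler; it only reflects the documented rule that each node's single parent is the node before it.

```python
def chain_edges(var_specs: list[dict]) -> list[tuple[str, str]]:
    """Hypothetical helper: edges of var_specs[0] -> ... -> var_specs[-1]."""
    names = [spec["name"] for spec in var_specs]
    # Pair every node with its successor in the list.
    return list(zip(names, names[1:]))


chain_specs = [
    {"name": "A", "type": "continuous"},
    {"name": "B", "type": "continuous"},
    {"name": "C", "type": "binary"},
]
edges = chain_edges(chain_specs)

# The first node has no parent (exogenous); all others are endogenous.
exogenous = chain_specs[0]["name"]
endogenous = [child for _, child in edges]
```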
- dagsampler.fork_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)
Build a fork config: root -> left and root -> right.
root is exogenous; left and right are endogenous with root as their single parent.
- Parameters:
var_specs (dict[str, dict[str, Any]]) – Dict with keys "root", "left", "right", each a node spec {"name", "type", optionally "cardinality"}.
mechanism (str) – Same options as chain_config().
n_samples (int) – Number of rows to generate.
seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.
post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").
- Return type:
dict[str, Any]
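Unlike chain_config(), the fork template takes var_specs as a dict keyed by role. The sketch below shows one way such a dict could be checked and turned into edges. The helper fork_edges and its error message are hypothetical; only the three role keys and the edge directions come from the description above.

```python
FORK_KEYS = ("root", "left", "right")


def fork_edges(var_specs: dict[str, dict]) -> list[tuple[str, str]]:
    """Hypothetical helper: edges root -> left and root -> right."""
    missing = [key for key in FORK_KEYS if key not in var_specs]
    if missing:
        raise ValueError(f"fork var_specs is missing roles: {missing}")
    root = var_specs["root"]["name"]
    return [
        (root, var_specs["left"]["name"]),
        (root, var_specs["right"]["name"]),
    ]


fork_specs = {
    "root": {"name": "Z", "type": "continuous"},
    "left": {"name": "X", "type": "continuous"},
    "right": {"name": "Y", "type": "binary"},
}
fork = fork_edges(fork_specs)
```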
- dagsampler.collider_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)
Build a collider config: left -> collider and right -> collider.
left and right are exogenous; collider is endogenous with both as parents.
- Parameters:
var_specs (dict[str, dict[str, Any]]) – Dict with keys "left", "right", "collider", each a node spec {"name", "type", optionally "cardinality"}.
mechanism (str) – Same options as chain_config().
n_samples (int) – Number of rows to generate.
seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.
post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").
- Return type:
dict[str, Any]
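The collider is the only template here in which a node has two parents. As a rough illustration, the hypothetical helper below builds a child-to-parents map from the role-keyed var_specs; it is a sketch of the documented structure, not dagsampler's internal representation.

```python
def collider_parents(var_specs: dict[str, dict]) -> dict[str, list[str]]:
    """Hypothetical helper: map each node name to its list of parents."""
    left = var_specs["left"]["name"]
    right = var_specs["right"]["name"]
    collider = var_specs["collider"]["name"]
    # left and right have no parents; the collider has both.
    return {left: [], right: [], collider: [left, right]}


collider_specs = {
    "left": {"name": "X", "type": "continuous"},
    "right": {"name": "Y", "type": "continuous"},
    "collider": {"name": "C", "type": "continuous"},
}
parents = collider_parents(collider_specs)

# Nodes with an empty parent list are the exogenous ones.
exogenous_nodes = [name for name, pa in parents.items() if not pa]
```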
dagsampler.causal_sim
- class dagsampler.causal_sim.CausalDataGenerator(config)
Bases: object
Generates data from a causal graph based on a detailed configuration.
This class is designed to be flexible, allowing for the specification of graph structures, variable types, functional relationships, and noise distributions through a single configuration object. It supports both random parameter generation and hardcoded parameters for reproducible experiments.
- Parameters:
config (dict[str, Any])
- simulate()
Run the full simulation pipeline and return generated artifacts.
- Returns:
- Dictionary with:
"data": pandas DataFrame with one column per node.
"parametrization": dict containing all resolved parameters (including sampled defaults) used for generation.
"dag": networkx.DiGraph aligned to the dataframe column order.
"ci_oracle" (optional): list of d-separation records when simulation_params.store_ci_oracle is enabled.
- Return type:
dict[str, Any]
dagsampler.templates
Config template helpers for common DAG structures.
- dagsampler.templates.indep_config(var_specs=None, n_samples=100, seed=None, force_uniform=True, force_uniform_marginals=None, n_vars=None, node_type='binary', cardinality=2, prefix='X')
Backward-compatible shorthand for independent-node configurations.
If var_specs is omitted, generate n_vars variables of node_type (default: binary) named {prefix}0..{prefix}{n_vars-1}.
- Parameters:
var_specs (list[dict[str, Any]] | None)
n_samples (int)
seed (int | dict[str, int] | None)
force_uniform (bool)
force_uniform_marginals (bool | None)
n_vars (int | None)
node_type (str)
cardinality (int)
prefix (str)
- Return type:
dict[str, Any]
- dagsampler.templates.independence_config(var_specs, n_samples, seed=None, force_uniform=True)
Build a config with no edges; every node is exogenous.
- Parameters:
var_specs (list[dict[str, Any]]) – List of node specs {"name", "type", optionally "cardinality"}. type is one of "continuous", "binary", "categorical"; cardinality defaults to 3 for categorical.
n_samples (int) – Number of rows to generate.
seed (int | dict[str, int] | None) – int (sets both seed_structure and seed_data) or {"structure": int, "data": int}.
force_uniform (bool) – Passed as force_uniform_marginals in simulation_params; when True, exogenous binary nodes get an exact 50/50 split and exogenous categorical nodes get equal class counts.
- Return type:
dict[str, Any]
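An exact split differs from sampling each row with equal probability: the class counts are fixed up front and only the row order is random. The sketch below shows one way to produce such labels with the standard library. The function balanced_labels is hypothetical, and how dagsampler handles an n_samples value that is not divisible by the number of classes is not stated here; assigning leftover rows to the lowest-numbered classes is an assumption.

```python
import random
from collections import Counter


def balanced_labels(n_samples: int, cardinality: int, seed: int = 0) -> list[int]:
    """Hypothetical helper: shuffled labels with class counts differing by at most one."""
    base, remainder = divmod(n_samples, cardinality)
    labels: list[int] = []
    for cls in range(cardinality):
        # Assumption: leftover rows go to the lowest-numbered classes.
        count = base + (1 if cls < remainder else 0)
        labels.extend([cls] * count)
    # Fixed counts, random order.
    random.Random(seed).shuffle(labels)
    return labels


binary = balanced_labels(100, cardinality=2)
binary_counts = Counter(binary)       # 50 zeros and 50 ones
ternary_counts = Counter(balanced_labels(9, cardinality=3))
```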
- dagsampler.templates.chain_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)
Build a chain config: var_specs[0] -> var_specs[1] -> ... -> var_specs[-1].
The first node is exogenous; each subsequent node is endogenous with the previous node as its single parent. Continuous endogenous nodes use additive Gaussian noise (std=0.5). Categorical endogenous nodes always use categorical_model = {"name": "logistic"} regardless of the mechanism argument.
- Parameters:
var_specs (list[dict[str, Any]]) – Ordered list of node specs ({"name", "type", optionally "cardinality"}).
mechanism (str) – "linear" for all-continuous/binary parents, "sigmoid" for a tanh nonlinearity, or "stratum_means" when any parent is categorical.
n_samples (int) – Number of rows to generate.
seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.
post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").
- Return type:
dict[str, Any]
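The combination of additive Gaussian noise and an optional post-nonlinear transform can be sketched for a single continuous parent and child. This is an illustration under stated assumptions, not the library's implementation: the function sample_child, the weight parameter, and the ordering g(weight * parent + noise) are all hypothetical. Only std=0.5 and the "tanh" transform name come from the description above.

```python
import math
import random
from typing import Optional

NOISE_STD = 0.5  # documented noise scale for continuous endogenous nodes


def sample_child(
    parent: list[float],
    weight: float = 1.0,
    post_transform: Optional[str] = None,
    seed: int = 0,
) -> list[float]:
    """Hypothetical sketch: linear mechanism, additive noise, optional transform."""
    rng = random.Random(seed)
    transforms = {None: lambda v: v, "tanh": math.tanh}
    g = transforms[post_transform]
    # Assumed post-nonlinear form: g(weight * parent + noise), element-wise.
    return [g(weight * x + rng.gauss(0.0, NOISE_STD)) for x in parent]


parent_values = [-2.0, 0.0, 2.0]
raw = sample_child(parent_values, weight=1.5)
squashed = sample_child(parent_values, weight=1.5, post_transform="tanh")
```

With the same seed, squashed is the element-wise tanh of raw, which makes the effect of the transform easy to inspect in isolation.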
- dagsampler.templates.fork_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)
Build a fork config: root -> left and root -> right.
root is exogenous; left and right are endogenous with root as their single parent.
- Parameters:
var_specs (dict[str, dict[str, Any]]) – Dict with keys "root", "left", "right", each a node spec {"name", "type", optionally "cardinality"}.
mechanism (str) – Same options as chain_config().
n_samples (int) – Number of rows to generate.
seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.
post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").
- Return type:
dict[str, Any]
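Because the mechanism options depend on the parent's type, it can help to derive a suitable value from the specs before calling a template. The helper suggest_mechanism below is hypothetical and encodes only the rule stated for chain_config(): "stratum_means" when any parent is categorical, otherwise "linear" or "sigmoid". The nonlinear flag is an illustrative parameter, not a dagsampler option.

```python
def suggest_mechanism(parent_types: list[str], nonlinear: bool = False) -> str:
    """Hypothetical helper: pick a mechanism name from the parents' types."""
    if "categorical" in parent_types:
        # Documented for use when any parent is categorical.
        return "stratum_means"
    # All parents are continuous or binary.
    return "sigmoid" if nonlinear else "linear"


fork_var_specs = {
    "root": {"name": "Z", "type": "categorical", "cardinality": 3},
    "left": {"name": "X", "type": "continuous"},
    "right": {"name": "Y", "type": "continuous"},
}
# In a fork, root is the single parent of both left and right.
mechanism = suggest_mechanism([fork_var_specs["root"]["type"]])
```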
- dagsampler.templates.collider_config(var_specs, mechanism, n_samples, seed=None, post_transform=None)
Build a collider config: left -> collider and right -> collider.
left and right are exogenous; collider is endogenous with both as parents.
- Parameters:
var_specs (dict[str, dict[str, Any]]) – Dict with keys "left", "right", "collider", each a node spec {"name", "type", optionally "cardinality"}.
mechanism (str) – Same options as chain_config().
n_samples (int) – Number of rows to generate.
seed (int | dict[str, int] | None) – int, {"structure": int, "data": int}, or None.
post_transform (str | None) – Optional name of a post-nonlinear transform applied element-wise to continuous endogenous nodes (e.g. "tanh").
- Return type:
dict[str, Any]
dagsampler.cli
- dagsampler.cli.build_parser()
- Return type:
ArgumentParser
- dagsampler.cli.main()
- Return type:
int