citk.tests package

This package contains the implementations of various conditional independence tests.

Statistical Model-Based Tests

class citk.tests.statistical_model_tests.Regression(data: ndarray, **kwargs)

Bases: CITKTest

CI test for continuous targets using Linear Regression (OLS).

__call__(X: int, Y: int, condition_set: List[int] | None = None, **kwargs) float

Performs a conditional independence test for continuous data using a likelihood-ratio test between nested Ordinary Least Squares (OLS) linear regression models.

Parameters:
  • X (int) – The index of the first variable.

  • Y (int) – The index of the second variable (the target).

  • condition_set (list[int], optional) – A list of indices for the conditioning set. Can be empty.

Returns:

  • p_value (float) – The p-value of the test.

See also

For a detailed explanation of the statistical test, including mathematical formulations and assumptions, please refer to the Linear Regression Test guide.
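The nested-model comparison described above can be sketched with NumPy alone. This is an illustration under stated assumptions, not citk's implementation: the helper names `ols_rss` and `lr_test_ols` are hypothetical, and the statistic n * log(RSS_restricted / RSS_full) is referred to a chi-square distribution with one degree of freedom because the full model adds only the single column X.

```python
import math
import numpy as np

def ols_rss(target, design):
    """Residual sum of squares of an OLS fit of target on design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def lr_test_ols(data, x, y, condition_set=None):
    """Hypothetical helper: LR test of the model Y ~ Z against Y ~ Z + X."""
    cond = np.asarray(list(condition_set or []), dtype=int)
    n = data.shape[0]
    restricted = np.column_stack([np.ones(n), data[:, cond]])  # intercept + Z
    full = np.column_stack([restricted, data[:, x]])           # intercept + Z + X
    rss_restricted = ols_rss(data[:, y], restricted)
    rss_full = ols_rss(data[:, y], full)
    # Gaussian likelihood-ratio statistic for nested OLS models
    stat = max(n * math.log(rss_restricted / rss_full), 0.0)
    # Chi-square survival function with 1 d.o.f.: P(chi2_1 > s) = erfc(sqrt(s / 2))
    return math.erfc(math.sqrt(stat / 2.0))

# Chain X -> Z -> Y, stored with columns ordered as [X, Y, Z]
rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal(n)
Z = 2 * X + rng.standard_normal(n)
Y = 3 * Z + rng.standard_normal(n)
data = np.column_stack([X, Y, Z])

p_unconditional = lr_test_ols(data, 0, 1)
p_conditional = lr_test_ols(data, 0, 1, [2])
print(f"unconditional: {p_unconditional:.4f}, conditional: {p_conditional:.4f}")
```

A small p-value means that adding X improves the fit of Y beyond what the conditioning set already explains.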

Examples

Standalone Usage

import numpy as np
from citk.tests import Regression

# Generate data where X and Y are independent given Z
# X -> Z -> Y
n = 500
X = np.random.randn(n)
Z = 2 * X + np.random.randn(n)
Y = 3 * Z + np.random.randn(n)
data = np.vstack([X, Y, Z]).T

# Initialize the test
regression_test = Regression(data)

# Test if X and Y are independent
p_value_unconditional = regression_test(0, 1)
print(f"P-value (unconditional) for X _||_ Y: {p_value_unconditional:.4f}")

# Test if X and Y are independent given Z
p_value_conditional = regression_test(0, 1, [2])
print(f"P-value (conditional) for X _||_ Y | Z: {p_value_conditional:.4f}")
P-value (unconditional) for X _||_ Y: 0.0000
P-value (conditional) for X _||_ Y | Z: 0.6210

Usage with PC Algorithm

from causallearn.search.ConstraintBased.PC import pc
from citk.tests import Regression # Make sure it's registered

# Re-use the same data from the standalone example
cg = pc(data, alpha=0.05, indep_test='reg')

print("Estimated Causal Graph:")
print(cg.G)
Estimated Causal Graph:
Graph Nodes:
X1;X2;X3

Graph Edges:
1. X1 --- X3
2. X2 --- X3

class citk.tests.statistical_model_tests.Logit(data: ndarray, **kwargs)

Bases: CITKTest

CI test for binary targets using Logistic Regression.

__call__(X: int, Y: int, condition_set: List[int] | None = None, **kwargs) float

Performs a conditional independence test for binary data using a likelihood-ratio test between nested Logistic Regression models.

Parameters:
  • X (int) – The index of the first variable.

  • Y (int) – The index of the second variable (the binary target).

  • condition_set (list[int], optional) – A list of indices for the conditioning set. Can be empty.

Returns:

  • p_value (float) – The p-value of the test.

See also

For a detailed explanation of the statistical test, including mathematical formulations and assumptions, please refer to the Logistic Regression Test guide.
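As a rough illustration of the likelihood-ratio comparison between nested logistic models, the sketch below fits both models by Newton-Raphson using only NumPy. The helpers `logistic_loglik` and `lr_test_logit` are hypothetical names, not part of citk, and the sketch assumes the data are not perfectly separable and that a chi-square reference distribution with one degree of freedom applies.

```python
import math
import numpy as np

def logistic_loglik(target, design, n_iter=50):
    """Maximised Bernoulli log-likelihood, fitted by Newton-Raphson."""
    beta = np.zeros(design.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-design @ beta))
        gradient = design.T @ (target - prob)
        hessian = (design * (prob * (1.0 - prob))[:, None]).T @ design
        step = np.linalg.solve(hessian, gradient)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    prob = np.clip(1.0 / (1.0 + np.exp(-design @ beta)), 1e-12, 1 - 1e-12)
    return float(np.sum(target * np.log(prob) + (1 - target) * np.log(1 - prob)))

def lr_test_logit(data, x, y, condition_set=None):
    """Hypothetical helper: LR test of logit(Y) ~ Z against logit(Y) ~ Z + X."""
    cond = np.asarray(list(condition_set or []), dtype=int)
    n = data.shape[0]
    restricted = np.column_stack([np.ones(n), data[:, cond]])
    full = np.column_stack([restricted, data[:, x]])
    stat = 2.0 * (logistic_loglik(data[:, y], full)
                  - logistic_loglik(data[:, y], restricted))
    # Chi-square survival function with 1 d.o.f. (one extra coefficient)
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

# Chain X -> Z -> Y with a noisy binary Y, columns ordered as [X, Y, Z]
rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal(n)
Z = X + rng.standard_normal(n)
Y = (rng.random(n) < 1.0 / (1.0 + np.exp(-1.5 * Z))).astype(float)
data = np.column_stack([X, Y, Z])

p_unconditional = lr_test_logit(data, 0, 1)
p_conditional = lr_test_logit(data, 0, 1, [2])
print(f"unconditional: {p_unconditional:.4f}, conditional: {p_conditional:.4f}")
```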

Examples

Standalone Usage

import numpy as np
from citk.tests import Logit

# Generate data where Y is a binary variable
# X -> Z -> Y
n = 500
X = np.random.randn(n)
Z = 2 * X + np.random.randn(n)
Y = (3 * Z + np.random.randn(n)) > 0
data = np.vstack([X, Y, Z]).T

# Initialize the test
logit_test = Logit(data)

# Test if X and Y are independent
p_value_unconditional = logit_test(0, 1)
print(f"P-value (unconditional) for X _||_ Y: {p_value_unconditional:.4f}")

# Test for conditional independence of X and Y given Z
p_value_conditional = logit_test(0, 1, [2])
print(f"P-value (conditional) for X _||_ Y | Z: {p_value_conditional:.4f}")
P-value (unconditional) for X _||_ Y: 0.0000
P-value (conditional) for X _||_ Y | Z: 0.9388

Usage with PC Algorithm

from causallearn.search.ConstraintBased.PC import pc
from citk.tests import Logit
import numpy as np

# For the PC algorithm example, we use fully binary data to ensure
# the Logit test is always applicable, as PC may test any variable pair.
# We model a causal chain X -> Z -> Y with some noise.
n = 500
X = np.random.randint(0, 2, n)
# Z depends on X, with a 10% chance of flipping
Z = X.copy()
flip_mask_z = np.random.random(n) < 0.1
Z[flip_mask_z] = 1 - Z[flip_mask_z]
# Y depends on Z, with a 10% chance of flipping
Y = Z.copy()
flip_mask_y = np.random.random(n) < 0.1
Y[flip_mask_y] = 1 - Y[flip_mask_y]
binary_data = np.vstack([X, Y, Z]).T

cg = pc(binary_data, alpha=0.05, indep_test='logit')

print("Estimated Causal Graph:")
print(cg.G)
Estimated Causal Graph:
Graph Nodes:
X1;X2;X3

Graph Edges:
1. X1 --- X3
2. X2 --- X3

class citk.tests.statistical_model_tests.Poisson(data: ndarray, **kwargs)

Bases: CITKTest

CI test for count targets using Poisson Regression.

__call__(X: int, Y: int, condition_set: List[int] | None = None, **kwargs) float

Performs a conditional independence test for count data using a likelihood-ratio test between nested Poisson Regression models.

Parameters:
  • X (int) – The index of the first variable.

  • Y (int) – The index of the second variable (the count target).

  • condition_set (list[int], optional) – A list of indices for the conditioning set. Can be empty.

Returns:

  • p_value (float) – The p-value of the test.

See also

For a detailed explanation of the statistical test, including mathematical formulations and assumptions, please refer to the Poisson Regression Test guide.
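The same nested-model idea can be sketched for counts with a log-link Poisson model fitted by Newton-Raphson. The names `poisson_loglik` and `lr_test_poisson` are hypothetical, the log(y!) term is dropped because it cancels in the likelihood ratio, and the chi-square reference with one degree of freedom is an assumption of this sketch rather than a statement about citk's internals.

```python
import math
import numpy as np

def poisson_loglik(target, design, n_iter=50):
    """Maximised Poisson log-likelihood (up to the constant sum of log(y!))."""
    beta = np.zeros(design.shape[1])
    beta[0] = math.log(target.mean())  # first column is the intercept
    for _ in range(n_iter):
        mu = np.exp(design @ beta)
        gradient = design.T @ (target - mu)
        hessian = (design * mu[:, None]).T @ design
        step = np.linalg.solve(hessian, gradient)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    eta = design @ beta
    return float(np.sum(target * eta - np.exp(eta)))

def lr_test_poisson(data, x, y, condition_set=None):
    """Hypothetical helper: LR test of log E[Y] ~ Z against log E[Y] ~ Z + X."""
    cond = np.asarray(list(condition_set or []), dtype=int)
    n = data.shape[0]
    restricted = np.column_stack([np.ones(n), data[:, cond]])
    full = np.column_stack([restricted, data[:, x]])
    stat = 2.0 * (poisson_loglik(data[:, y], full)
                  - poisson_loglik(data[:, y], restricted))
    # Chi-square survival function with 1 d.o.f. (one extra coefficient)
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

# Chain X -> Z -> Y with a count-valued Y, columns ordered as [X, Y, Z]
rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal(n)
Z = 0.5 * X + rng.standard_normal(n)
Y = rng.poisson(np.exp(1 + 0.5 * Z)).astype(float)
data = np.column_stack([X, Y, Z])

p_unconditional = lr_test_poisson(data, 0, 1)
p_conditional = lr_test_poisson(data, 0, 1, [2])
print(f"unconditional: {p_unconditional:.4f}, conditional: {p_conditional:.4f}")
```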

Examples

Standalone Usage

import numpy as np
from citk.tests import Poisson

# Generate data where Y is a count variable
# X -> Z -> Y
n = 500
X = np.random.randn(n)
Z = 0.5 * X + np.random.randn(n)
Y = np.random.poisson(np.exp(1 + 0.5 * Z))
data = np.vstack([X, Y, Z]).T

# Initialize the test
poisson_test = Poisson(data)

# Test if X and Y are independent
p_value_unconditional = poisson_test(0, 1)
print(f"P-value (unconditional) for X _||_ Y: {p_value_unconditional:.4f}")

# Test for conditional independence of X and Y given Z
p_value_conditional = poisson_test(0, 1, [2])
print(f"P-value (conditional) for X _||_ Y | Z: {p_value_conditional:.4f}")
P-value (unconditional) for X _||_ Y: 0.0000
P-value (conditional) for X _||_ Y | Z: 0.2017

Usage with PC Algorithm

from causallearn.search.ConstraintBased.PC import pc
from citk.tests import Poisson
import numpy as np

# For the PC algorithm example, we use fully count-based data to
# ensure the Poisson test is always applicable.
n = 500
X = np.random.poisson(2, size=n)
Z = np.random.poisson(1 + X / 2)
Y = np.random.poisson(1 + Z / 2)
count_data = np.vstack([X, Y, Z]).T

cg = pc(count_data, alpha=0.05, indep_test='pois')

print("Estimated Causal Graph:")
print(cg.G)
Estimated Causal Graph:
Graph Nodes:
X1;X2;X3

Graph Edges:
1. X1 --- X3
2. X2 --- X3

Machine Learning-Based Tests

class citk.tests.ml_based_tests.KCI(data, **kwargs)

Bases: CITKTest

Wrapper for the Kernel Conditional Independence (KCI) test from the causal-learn library.

Parameters:
  • data (np.ndarray) – The dataset on which to run the test.

  • **kwargs (dict) – Additional keyword arguments for the KCI test. See the causal-learn documentation.

__call__(X, Y, condition_set=None, **kwargs)

Performs a Kernel Conditional Independence (KCI) test.

Parameters:
  • X (int) – The index of the first variable.

  • Y (int) – The index of the second variable.

  • condition_set (list[int], optional) – A list of indices for the conditioning set. Can be empty.

Returns:

  • p_value (float) – The p-value of the test.

See also

For a detailed explanation of the statistical test, including mathematical formulations and assumptions, please refer to the Kernel Conditional Independence (KCI) Test guide.

Examples

Standalone Usage

import numpy as np
from citk.tests import KCI

# Generate data with a non-linear relationship: X -> Z -> Y
n = 500
X = np.random.randn(n)
Z = np.cos(X) + np.random.randn(n) * 0.1
Y = Z**2 + np.random.randn(n) * 0.1
data = np.vstack([X, Y, Z]).T

# Initialize the test
kci_test = KCI(data)

# Test for unconditional independence (should be dependent)
p_unconditional = kci_test(0, 1)
print(f"P-value (unconditional) for X _||_ Y: {p_unconditional:.4f}")

# Test for conditional independence given Z (should be independent)
p_conditional = kci_test(0, 1, [2])
print(f"P-value (conditional) for X _||_ Y | Z: {p_conditional:.4f}")
P-value (unconditional) for X _||_ Y: 0.0000
P-value (conditional) for X _||_ Y | Z: 0.8521

Usage with PC Algorithm

from causallearn.search.ConstraintBased.PC import pc
from citk.tests import KCI
import numpy as np

# Same non-linear data-generating process as above, with a smaller sample (n = 200)
n = 200
X = np.random.randn(n)
Z = np.cos(X) + np.random.randn(n) * 0.1
Y = Z**2 + np.random.randn(n) * 0.1
data = np.vstack([X, Y, Z]).T

cg = pc(data, alpha=0.05, indep_test='kci')
print("Estimated Causal Graph:")
print(cg.G)
Estimated Causal Graph:
Graph Nodes:
X1;X2;X3

Graph Edges:
1. X1 --- X3
2. X2 --- X3

class citk.tests.ml_based_tests.RandomForest(data: ndarray, **kwargs)

Bases: CITKTest

Performs a conditional independence test using Random Forest feature importance.

Parameters:
  • data (np.ndarray) – The dataset on which to run the test.

  • n_estimators (int, optional) – The number of trees in the forest.

  • num_permutations (int, optional) – The number of permutations to perform for the permutation test.

  • random_state (int, optional) – Seed for the random number generator for reproducibility.

__call__(X: int, Y: int, condition_set: List[int] | None = None, **kwargs) float

Performs a conditional independence test using Random Forest feature importance.

The test measures the feature importance of X in predicting Y, given the conditioning set Z. A permutation test then assesses the statistical significance of this importance.

Parameters:
  • X (int) – The index of the first variable.

  • Y (int) – The index of the second variable (the target).

  • condition_set (list[int], optional) – A list of indices for the conditioning set. Can be empty.

Returns:

  • p_value (float) – The p-value of the test.

See also

For a detailed explanation of the statistical test, including mathematical formulations and assumptions, please refer to the Random Forest CI Test guide.
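The permutation step described above can be sketched as follows. To keep the sketch dependent on NumPy only, a least-squares gain stands in for the random forest importance score; that substitution, the helper names, and the (1 + count) / (1 + num_permutations) p-value convention are all assumptions of this illustration, not citk's implementation.

```python
import numpy as np

def rss(target, design):
    """Residual sum of squares of a least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def importance_of_x(x, y, z):
    """Stand-in score: drop in RSS when x joins z as a predictor of y.
    A random forest feature importance would take this role."""
    base = np.column_stack([np.ones(len(y)), z])
    return rss(y, base) - rss(y, np.column_stack([base, x]))

def permutation_p_value(x, y, z, num_permutations=99, random_state=0):
    """Compare the observed score with scores obtained after shuffling x."""
    rng = np.random.default_rng(random_state)
    observed = importance_of_x(x, y, z)
    null_scores = np.array([
        importance_of_x(rng.permutation(x), y, z)
        for _ in range(num_permutations)
    ])
    exceed = int(np.sum(null_scores >= observed))
    return (1 + exceed) / (1 + num_permutations)

# Chain X -> Z -> Y
rng = np.random.default_rng(1)
n = 300
X = rng.standard_normal(n)
Z = X + rng.standard_normal(n)
Y = Z + rng.standard_normal(n)

empty = np.empty((n, 0))  # empty conditioning set
p_unconditional = permutation_p_value(X, Y, empty)
p_conditional = permutation_p_value(X, Y, Z.reshape(-1, 1))
print(f"unconditional: {p_unconditional:.4f}, conditional: {p_conditional:.4f}")
```

Under this convention the smallest attainable p-value with 99 permutations is 1 / 100 = 0.01, so the resolution of the test is set by `num_permutations`.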

Examples

Standalone Usage

import numpy as np
from citk.tests import RandomForest

# Generate data with a non-linear relationship: X -> Z -> Y
n = 500
X = np.random.randn(n)
Z = np.sin(X * 2) + np.random.randn(n) * 0.2
Y = Z**3 + np.random.randn(n) * 0.2
data = np.vstack([X, Y, Z]).T

# Initialize the test
rf_test = RandomForest(data, num_permutations=99, random_state=42)

# Test for unconditional independence (should be dependent)
p_unconditional = rf_test(0, 1)
print(f"P-value (unconditional) for X _||_ Y: {p_unconditional:.4f}")

# Test for conditional independence given Z (should be independent)
p_conditional = rf_test(0, 1, [2])
print(f"P-value (conditional) for X _||_ Y | Z: {p_conditional:.4f}")
P-value (unconditional) for X _||_ Y: 0.0100
P-value (conditional) for X _||_ Y | Z: 0.5400

Usage with PC Algorithm

from causallearn.search.ConstraintBased.PC import pc
from citk.tests import RandomForest
import numpy as np

n = 200
X = np.random.randn(n)
Z = np.sin(X * 2) + np.random.randn(n) * 0.2
Y = Z**3 + np.random.randn(n) * 0.2
data = np.vstack([X, Y, Z]).T

cg = pc(data, alpha=0.05, indep_test='rf', num_permutations=49)
print("Estimated Causal Graph:")
print(cg.G)
Estimated Causal Graph:
Graph Nodes:
X1;X2;X3

Graph Edges:
1. X1 --- X3
2. X2 --- X3

class citk.tests.ml_based_tests.DML(data: ndarray, **kwargs)

Bases: CITKTest

Double-ML based conditional independence test.

Parameters:
  • data (np.ndarray) – The dataset on which to run the test.

  • model (scikit-learn compatible regressor, optional) – The model used to predict X from Z and Y from Z. Defaults to LightGBM.

  • cv_folds (int, optional) – The number of folds for cross-fitting.

  • n_perms (int, optional) – The number of permutations for the final distance correlation test.

__call__(X: int, Y: int, condition_set: List[int] | None = None, **kwargs) float

Performs a Double Machine Learning (DML) based conditional independence test.

It partials out the effect of the conditioning set Z from X and Y using a machine learning model and then tests for independence between the residuals.

Parameters:
  • X (int) – The index of the first variable.

  • Y (int) – The index of the second variable.

  • condition_set (list[int], optional) – A list of indices for the conditioning set. Can be empty.

Returns:

  • p_value (float) – The p-value from the distance correlation test on the residuals.

See also

For a detailed explanation of the statistical test, including mathematical formulations and assumptions, please refer to the Double Machine Learning (DML) CI Test guide.
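The residualize-then-test procedure can be sketched with NumPy alone. In this illustration a polynomial least-squares fit stands in for the LightGBM learner, the distance correlation is computed from double-centred distance matrices, and every helper name is hypothetical; it is a sketch under those assumptions rather than citk's implementation.

```python
import numpy as np

def poly_features(z, degree=5):
    """Intercept plus powers 1..degree of each conditioning column (stand-in learner)."""
    n = z.shape[0]
    return np.hstack([np.ones((n, 1))] + [z ** k for k in range(1, degree + 1)])

def cross_fit_residuals(target, z, cv_folds=5, seed=0):
    """Out-of-fold residuals of target after regressing on z."""
    n = len(target)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), cv_folds)
    design = poly_features(z)
    residuals = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        coef, *_ = np.linalg.lstsq(design[train], target[train], rcond=None)
        residuals[test_idx] = target[test_idx] - design[test_idx] @ coef
    return residuals

def distance_correlation(a, b):
    """Sample distance correlation of two 1-D arrays."""
    def centred(v):
        d = np.abs(v[:, None] - v[None, :])
        return d - d.mean(axis=0) - d.mean(axis=1)[:, None] + d.mean()
    A, B = centred(a), centred(b)
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(max((A * B).mean(), 0.0) / denom)) if denom > 0 else 0.0

def dcor_permutation_p(a, b, n_perms=99, seed=0):
    """Permutation p-value for the distance correlation between a and b."""
    rng = np.random.default_rng(seed)
    observed = distance_correlation(a, b)
    exceed = sum(distance_correlation(a, rng.permutation(b)) >= observed
                 for _ in range(n_perms))
    return (1 + exceed) / (1 + n_perms)

def dml_ci_test(data, x, y, condition_set=None):
    """Hypothetical helper: partial Z out of X and Y, then test the residuals."""
    z = data[:, np.asarray(list(condition_set or []), dtype=int)]
    return dcor_permutation_p(cross_fit_residuals(data[:, x], z),
                              cross_fit_residuals(data[:, y], z))

# Common cause Z -> X and Z -> Y, columns ordered as [X, Y, Z]
rng = np.random.default_rng(0)
n = 200
Z = rng.uniform(-3, 3, n)
X = np.sin(Z) + rng.standard_normal(n) * 0.2
Y = np.cos(Z) + rng.standard_normal(n) * 0.2
data = np.column_stack([X, Y, Z])

p_conditional = dml_ci_test(data, 0, 1, [2])
print(f"conditional: {p_conditional:.4f}")
```

Cross-fitting means each residual is computed from a model that never saw that observation, which limits the overfitting bias that flexible learners would otherwise pass on to the final test.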

Examples

Standalone Usage

import numpy as np
from citk.tests import DML

# Generate data with a non-linear common confounder Z
# Z -> X and Z -> Y
n = 500
Z = np.random.uniform(-3, 3, n)
X = np.sin(Z) + np.random.randn(n) * 0.2
Y = np.cos(Z) + np.random.randn(n) * 0.2
data = np.vstack([X, Y, Z]).T

# Initialize the test (uses LightGBM by default)
dml_test = DML(data)

# Test for unconditional independence (should be dependent)
p_unconditional = dml_test(0, 1)
print(f"P-value (unconditional) for X _||_ Y: {p_unconditional:.4f}")

# Test for conditional independence given Z (should be independent)
p_conditional = dml_test(0, 1, [2])
print(f"P-value (conditional) for X _||_ Y | Z: {p_conditional:.4f}")
P-value (unconditional) for X _||_ Y: 0.0050
P-value (conditional) for X _||_ Y | Z: 0.6381

Usage with PC Algorithm

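from causallearn.search.ConstraintBased.PC import pc
from citk.tests import DML # Make sure it's registered

# Re-use the same data from the standalone example
cg = pc(data, alpha=0.05, indep_test='dml')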
print("Estimated Causal Graph:")
print(cg.G)
Estimated Causal Graph:
Graph Nodes:
X1;X2;X3

Graph Edges:
1. X1 --- X3
2. X2 --- X3

class citk.tests.ml_based_tests.CRIT(data: ndarray, **kwargs)

Bases: CITKTest

Conformalized Residual Independence Test (CRIT).

Parameters:
  • data (np.ndarray) – The dataset on which to run the test.

  • alpha (float, optional) – The significance level for the conformal prediction intervals.

  • cv_folds (int, optional) – The number of folds for cross-fitting.

  • n_perms (int, optional) – The number of permutations for the final distance correlation test.

__call__(X: int, Y: int, condition_set: List[int] | None = None, **kwargs) float

Performs a Conformalized Residual Independence Test (CRIT).

This test uses conformal prediction to create robust, distribution-free residuals before testing for independence.

Parameters:
  • X (int) – The index of the first variable.

  • Y (int) – The index of the second variable.

  • condition_set (list[int], optional) – A list of indices for the conditioning set. Can be empty.

Returns:

  • p_value (float) – The p-value from the distance correlation test on the conformalized residuals.

See also

For a detailed explanation of the statistical test, including mathematical formulations and assumptions, please refer to the Conformalized Residual Independence Test (CRIT) guide.

Examples

Standalone Usage

import numpy as np
from citk.tests import CRIT

# Generate data with a non-linear relationship: X -> Z -> Y
n = 500
X = np.random.randn(n)
Z = np.sin(X * 2) + np.random.randn(n) * 0.2
Y = Z**3 + np.random.randn(n) * 0.2
data = np.vstack([X, Y, Z]).T

# Initialize the test
crit_test = CRIT(data, alpha=0.1, n_perms=99)

# Test for unconditional independence (should be dependent)
p_unconditional = crit_test(0, 1)
print(f"P-value (unconditional) for X _||_ Y: {p_unconditional:.4f}")

# Test for conditional independence given Z (should be independent)
p_conditional = crit_test(0, 1, [2])
print(f"P-value (conditional) for X _||_ Y | Z: {p_conditional:.4f}")
P-value (unconditional) for X _||_ Y: 0.0100
P-value (conditional) for X _||_ Y | Z: 0.6800

Usage with PC Algorithm

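from causallearn.search.ConstraintBased.PC import pc
from citk.tests import CRIT # Make sure it's registered

# Re-use the same data from the standalone example
cg = pc(data, alpha=0.05, indep_test='crit')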
print("Estimated Causal Graph:")
print(cg.G)
Estimated Causal Graph:
Graph Nodes:
X1;X2;X3

Graph Edges:
1. X1 --- X3
2. X2 --- X3

class citk.tests.ml_based_tests.EDML(data: ndarray, **kwargs)

Bases: CITKTest

E-Value Double-ML based conditional independence test.

Parameters:
  • data (np.ndarray) – The dataset on which to run the test.

  • model (scikit-learn compatible regressor, optional) – The model used to predict X from Z and Y from Z. Defaults to LightGBM.

  • cv_folds (int, optional) – The number of folds for cross-fitting the residual models.

  • betting_folds (int, optional) – The number of folds for the e-value betting mechanism.

__call__(X: int, Y: int, condition_set: List[int] | None = None, **kwargs) float

Performs an E-Value Double Machine Learning (EDML) CI test.

This test produces an e-value, which is then converted to a p-value. It uses the same residualization as DML but replaces the final permutation test with a betting-based e-value calculation.

Parameters:
  • X (int) – The index of the first variable.

  • Y (int) – The index of the second variable.

  • condition_set (list[int], optional) – A list of indices for the conditioning set. Can be empty.

Returns:

  • p_value (float) – The p-value derived from the calculated e-value.

See also

For a detailed explanation of the statistical test, including mathematical formulations and assumptions, please refer to the E-Value Double Machine Learning (EDML) CI Test guide.
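To make the e-value step concrete, here is a minimal sketch of a betting-style e-value and its conversion to a p-value. It assumes payoffs bounded in [-1, 1] whose mean is at most zero under the null, a fixed bet fraction, and the standard calibration p = min(1, 1 / e); the function names are hypothetical, and whether citk uses exactly this betting rule or calibration is not confirmed by this page.

```python
import math

def betting_e_value(payoffs, bet_fraction=0.5):
    """Wealth after betting a fixed fraction on each payoff, starting from 1.

    If every payoff lies in [-1, 1] and has mean <= 0 under the null,
    the final wealth has expectation <= 1, which makes it an e-value.
    """
    if not 0.0 <= bet_fraction < 1.0:
        raise ValueError("bet_fraction must lie in [0, 1)")
    wealth = 1.0
    for payoff in payoffs:
        if not -1.0 <= payoff <= 1.0:
            raise ValueError("payoffs must lie in [-1, 1]")
        wealth *= 1.0 + bet_fraction * payoff
    return wealth

def e_to_p(e_value):
    """Calibrate an e-value to a p-value via Markov's inequality."""
    return min(1.0, 1.0 / e_value) if e_value > 0 else 1.0

# Consistently positive payoffs grow the wealth and shrink the p-value
e_dependent = betting_e_value([0.5] * 4)
print(e_dependent, e_to_p(e_dependent))

# Payoffs that cancel out leave the wealth below 1, so the p-value is capped at 1
e_null = betting_e_value([0.5, -0.5] * 2)
print(e_null, e_to_p(e_null))
```

Because the p-value is capped at 1 whenever the e-value does not exceed 1, a test of this kind can report a p-value of exactly 1 when there is no evidence against independence.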

Examples

Standalone Usage

import numpy as np
from citk.tests import EDML

# Generate data with a non-linear common confounder Z
# Z -> X and Z -> Y
n = 500
Z = np.random.uniform(-3, 3, n)
X = np.sin(Z) + np.random.randn(n) * 0.2
Y = np.cos(Z) + np.random.randn(n) * 0.2
data = np.vstack([X, Y, Z]).T

# Initialize the test.
edml_test = EDML(data)

# Test for unconditional independence (should be dependent, p-value should be small)
p_unconditional = edml_test(0, 1)
print(f"P-value (unconditional) for X _||_ Y: {p_unconditional:.4f}")

# Test for conditional independence given Z (should be independent, p-value should be large)
p_conditional = edml_test(0, 1, [2])
print(f"P-value (conditional) for X _||_ Y | Z: {p_conditional:.4f}")
P-value (unconditional) for X _||_ Y: 0.0000
P-value (conditional) for X _||_ Y | Z: 1.0000

Usage with PC Algorithm

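from causallearn.search.ConstraintBased.PC import pc
from citk.tests import EDML # Make sure it's registered

# Re-use the same data from the standalone example
cg = pc(data, alpha=0.05, indep_test='edml')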
print("Estimated Causal Graph:")
print(cg.G)
Estimated Causal Graph:
Graph Nodes:
X1;X2;X3

Graph Edges:
1. X1 --- X3
2. X2 --- X3

Simple Correlation-Based Tests

class citk.tests.simple_tests.Spearman(data: ndarray, **kwargs)

Bases: CITKTest

Wrapper around the fisherz test from the causal-learn library, applied to rank-transformed data.

Parameters:

data (np.ndarray) – The dataset on which to run the test.

__call__(X: int, Y: int, condition_set: list[int] | None = None, **kwargs) float

Performs a Spearman partial correlation conditional independence test.

Parameters:
  • X (int) – The index of the first variable.

  • Y (int) – The index of the second variable.

  • condition_set (list[int], optional) – A list of indices for the conditioning set. Can be empty.

Returns:

  • p_value (float) – The p-value of the test.

See also

For a detailed explanation of the statistical test, including mathematical formulations and assumptions, please refer to the Spearman’s Rho Test guide.
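A Fisher-z test on ranked data can be sketched in a few lines of NumPy. The helpers below are hypothetical and only illustrate the idea: ranks are computed without tie averaging (adequate for continuous data), the partial correlation is read off the inverse correlation matrix, and the p-value uses the usual normal approximation with sqrt(n - |S| - 3) scaling.

```python
import math
import numpy as np

def rank_columns(data):
    """Replace each column by its ranks (ties are not averaged in this sketch)."""
    return np.argsort(np.argsort(data, axis=0), axis=0).astype(float)

def fisher_z_p_value(data, x, y, condition_set=None):
    """Two-sided Fisher-z test of the partial correlation of x and y given S."""
    cond = list(condition_set or [])
    corr = np.corrcoef(data[:, [x, y] + cond], rowvar=False)
    precision = np.linalg.inv(corr)
    r = -precision[0, 1] / math.sqrt(precision[0, 0] * precision[1, 1])
    r = min(max(r, -1 + 1e-12), 1 - 1e-12)  # keep atanh finite
    z = math.sqrt(data.shape[0] - len(cond) - 3) * abs(math.atanh(r))
    # Two-sided normal tail: 2 * (1 - Phi(z)) = erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2.0))

def spearman_ci_test(data, x, y, condition_set=None):
    """Hypothetical helper: Fisher-z test applied to rank-transformed columns."""
    return fisher_z_p_value(rank_columns(data), x, y, condition_set)

# Monotonic, non-linear chain X -> Z -> Y, columns ordered as [X, Y, Z]
rng = np.random.default_rng(0)
n = 500
X = rng.random(n) * 5
Z = np.exp(X / 2) + rng.standard_normal(n) * 0.1
Y = np.log(Z**2) + rng.standard_normal(n) * 0.1
data = np.column_stack([X, Y, Z])

p_unconditional = spearman_ci_test(data, 0, 1)
p_conditional = spearman_ci_test(data, 0, 1, [2])
print(f"unconditional: {p_unconditional:.4f}, conditional: {p_conditional:.4f}")
```

Ranking first makes the test invariant to monotonic transformations of each variable, which is why it suits relationships like the exponential and logarithmic links in this data.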

Examples

Standalone Usage

import numpy as np
from citk.tests import Spearman

# Generate data with a non-linear, monotonic relationship
# X -> Z -> Y, where the relationships are not linear
n = 500
X = np.random.rand(n) * 5
Z = np.exp(X / 2) + np.random.randn(n) * 0.1
Y = np.log(Z**2) + np.random.randn(n) * 0.1
data = np.vstack([X, Y, Z]).T

# Initialize the test
spearman_test = Spearman(data)

# Test if X and Y are independent
p_value_unconditional = spearman_test(0, 1)
print(f"P-value (unconditional) for X _||_ Y: {p_value_unconditional:.4f}")

# Test if X and Y are independent given Z
p_value_conditional = spearman_test(0, 1, [2])
print(f"P-value (conditional) for X _||_ Y | Z: {p_value_conditional:.4f}")
P-value (unconditional) for X _||_ Y: 0.0000
P-value (conditional) for X _||_ Y | Z: 0.4640

Usage with PC Algorithm

from causallearn.search.ConstraintBased.PC import pc
from citk.tests import Spearman # Make sure it's registered

# The same data from the standalone example
cg = pc(data, alpha=0.05, indep_test='spearman')

print("Estimated Causal Graph:")
print(cg.G)
Estimated Causal Graph:
Graph Nodes:
X1;X2;X3

Graph Edges:
1. X1 --- X3
2. X2 --- X3