How to Choose a Conditional Independence Test

Choosing the right conditional independence (CI) test depends on the type of your data and the assumptions you are willing to make about the underlying dependence. citk ships 19 tests organised under the six survey families plus four adapter strategies; this guide gives a practical mapping from data types and assumptions to concrete tests.

Key Considerations

1. Data Type

  • All continuous:

    • fisherz_citk: linear, Gaussian — fastest baseline.

    • spearman: monotonic but not necessarily linear — robust non-parametric alternative.

    • kci: kernel-based, captures arbitrary non-linear dependence.

    • rcit, rcot: random-Fourier-feature approximations to KCI; faster on larger samples.

    • cmiknn: kNN-based conditional mutual information with local-permutation p-values.

  • All discrete (categorical):

    • gsq (G-test) or chisq (Chi-Square): classical contingency-table tests.

    • dummy_fisherz: one-hot encoding adapter that aggregates Fisher-Z calls; competitive when categorical cardinalities are moderate.

  • Mixed continuous + discrete:

    • cmiknn_mixed: mixed-type kNN CMI estimator (tigramite).

    • mcmiknn: another mixed-type kNN CMI implementation (vendored from upstream hpi-epic/mCMIkNN).

    • regci: parametric likelihood-ratio test using GLM regression chosen per response type (continuous → linear, discrete → logistic).

    • ci_mm: symmetric likelihood-ratio test from R MXM that runs both regression directions and combines them.

    • gcm, wgcm, pcm: ML-residualisation tests using random forest regression (via pycomets); flexible, asymptotically calibrated, and the RF nuisance regressions handle continuous, discrete, or mixed inputs natively.

    • disc_chisq, disc_gsq: equal-frequency discretisation adapters around classical discrete tests.

    • hartemink_chisq: information-preserving Hartemink discretisation (via R bnlearn) + Chi-Square; better dependence preservation than equal-frequency binning.
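
If you want to automate this mapping, the dtype-driven sketch below shows one way to pick a default test name for a pandas DataFrame. It is a hypothetical helper, not part of citk's API: the function name, the prefer_nonlinear flag, and the unique-value threshold are illustrative assumptions; only the returned test names come from the lists above.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

def choose_ci_test(df: pd.DataFrame, prefer_nonlinear: bool = False) -> str:
    """Hypothetical helper (not part of citk's API): map a DataFrame's column
    types onto a test name from the lists above. The >10-unique-values rule
    for treating a numeric column as continuous is an illustrative threshold."""
    continuous = [c for c in df.columns
                  if is_numeric_dtype(df[c]) and df[c].nunique() > 10]
    discrete = [c for c in df.columns if c not in continuous]

    if not discrete:                       # all continuous
        return "kci" if prefer_nonlinear else "fisherz_citk"
    if not continuous:                     # all discrete / categorical
        return "gsq"
    return "cmiknn_mixed"                  # mixed continuous + discrete

# Example: one continuous and one low-cardinality column -> a mixed-type test.
df = pd.DataFrame({"income": np.random.default_rng(0).normal(size=200),
                   "region": np.random.default_rng(1).integers(0, 3, size=200)})
print(choose_ci_test(df))  # -> "cmiknn_mixed"
```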

2. Relationship Type

  • Linear: fisherz_citk is the computationally efficient choice when both Gaussianity and linearity hold.

  • Monotonic: spearman works on ranks; robust to non-linearities as long as the relationship is monotonic.

  • Non-linear / complex: kernel tests (kci, rcit, rcot), kNN-based tests (cmiknn, cmiknn_mixed, mcmiknn), and ML-residualisation tests (gcm, wgcm, pcm) are all designed to detect arbitrary dependence at higher computational cost. wgcm and pcm add power on alternatives where the dependence is localised in the conditioning space or where the predictor is weakly identified.
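
To see why the relationship type matters, the snippet below uses plain Pearson and Spearman correlations from scipy (marginal statistics, not conditional tests, and not citk code). A linear statistic understates a monotonic non-linear dependence, and both linear and rank statistics miss a symmetric non-monotonic one; the latter is the regime where the kernel, kNN, and ML-residualisation families earn their extra cost.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
x = rng.normal(size=2000)

# Monotonic but non-linear: Spearman recovers the dependence, Pearson understates it.
y_mono = np.exp(x) + 0.1 * rng.normal(size=2000)
print("monotonic    :", pearsonr(x, y_mono)[0], spearmanr(x, y_mono)[0])

# Symmetric, non-monotonic dependence: both statistics are near zero even though
# y is a noisy deterministic function of x; this is where kernel (kci/rcit/rcot),
# kNN (cmiknn), or ML-residualisation (gcm/wgcm/pcm) tests are needed.
y_quad = x**2 + 0.1 * rng.normal(size=2000)
print("non-monotonic:", pearsonr(x, y_quad)[0], spearmanr(x, y_quad)[0])
```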

3. Sample Size

  • Small samples: classical tests (fisherz_citk, spearman, chisq, gsq) are most reliable; non-parametric and ML-based tests need more data for stable estimation.

  • Large samples: kernel tests (especially exact kci) become expensive — prefer rcit/rcot for random-feature approximations, or gcm/wgcm/pcm for ML-residualisation with linear cost.

  • Very large samples: kci's cost grows at least quadratically in \(n\); consider capping the number of samples used per test or switching to a faster family.
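
For intuition about why the partial-correlation family is the cheap default, the sketch below re-derives a Fisher-Z style CI test with numpy/scipy: residualise X and Y on Z by least squares, correlate the residuals, and compare the Fisher-transformed correlation against a standard normal. This is a textbook illustration under the linear-Gaussian assumption, not citk's fisherz_citk implementation.

```python
import numpy as np
from scipy.stats import norm

def fisher_z_sketch(x, y, Z):
    """Illustrative Fisher-Z CI test: p-value for X _||_ Y | Z under
    linear-Gaussian assumptions. Z is an (n, k) array (k may be 0)."""
    n = len(x)
    Z1 = np.column_stack([np.ones(n), Z]) if Z.size else np.ones((n, 1))
    # Residualise X and Y on Z via ordinary least squares (closed form, cheap).
    rx = x - Z1 @ np.linalg.lstsq(Z1, x, rcond=None)[0]
    ry = y - Z1 @ np.linalg.lstsq(Z1, y, rcond=None)[0]
    r = np.corrcoef(rx, ry)[0, 1]
    # Fisher transform; the scaled statistic is ~N(0, 1) under the null.
    z = 0.5 * np.log((1 + r) / (1 - r))
    stat = np.sqrt(n - Z.shape[1] - 3) * abs(z)
    return 2 * norm.sf(stat)

rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 2))
x = Z @ np.array([1.0, -0.5]) + rng.normal(size=1000)
y = Z @ np.array([0.7, 0.3]) + rng.normal(size=1000)
print(fisher_z_sketch(x, y, Z))  # large p-value expected: X _||_ Y | Z holds
```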

Summary Table

| Test Name | Family | Data Type | Relationship Type | Key Assumption(s) |
| --- | --- | --- | --- | --- |
| fisherz_citk | Partial Correlation | Continuous | Linear | Approximate Gaussianity |
| spearman | Partial Correlation | Continuous | Monotonic | Monotonicity |
| chisq | Contingency Table | Discrete | Any | Adequate cell counts |
| gsq | Contingency Table | Discrete | Any | Adequate cell counts |
| regci | Regression | Mixed or continuous | Any (within model class) | Correct GLM specification per variable type; requires tigramite |
| ci_mm | Regression | Mixed | Any (within model class) | Correct linear/logistic per variable; requires rpy2 + R MXM |
| cmiknn | Nearest Neighbor | Continuous | Any | Sample size adequate for kNN density estimation; requires tigramite |
| cmiknn_mixed | Nearest Neighbor | Mixed | Any | Variable types declared via data_type; requires tigramite |
| mcmiknn | Nearest Neighbor | Mixed | Any | Vendored upstream mCMIkNN; no extra install required |
| kci | Kernel | Continuous | Any | Suitable kernel choice; cost is at least quadratic in \(n\) |
| rcit | Kernel | Continuous | Any | Random-feature approximation; requires rpy2 + R RCIT |
| rcot | Kernel | Continuous | Any | Random-feature approximation with reduced-dim conditioning; requires rpy2 + R RCIT |
| gcm | Machine-Learning-Based | Mixed or continuous | Any | Consistent nuisance regression; requires pycomets |
| wgcm | Machine-Learning-Based | Mixed or continuous | Any (esp. localised) | Consistent nuisance regression + sample splitting; requires pycomets |
| pcm | Machine-Learning-Based | Mixed or continuous | Any (assumption-lean) | Consistent residualisation; requires pycomets |
| disc_chisq | Adapter Strategies | Mixed or continuous | Any | Discretisation preserves dependence; ChiSq cell-count rule |
| disc_gsq | Adapter Strategies | Mixed or continuous | Any | Discretisation preserves dependence; GSq cell-count rule |
| dummy_fisherz | Adapter Strategies | Mixed or discrete | Any (encoded space) | One-hot encoding fidelity; combined p-values approximation |
| hartemink_chisq | Adapter Strategies | Mixed or continuous | Any | Information-preserving discretisation; requires rpy2 + R bnlearn |
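
To make the adapter rows concrete, the sketch below illustrates the idea behind an equal-frequency discretisation adapter such as disc_chisq: quantile-bin the continuous variables, then sum Chi-Square statistics over the X-Y contingency tables within each bin of the conditioning variable. It is a minimal pandas/scipy illustration of the strategy, not citk's implementation, and it omits the cell-count rule noted in the table.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2, chi2_contingency

def disc_chisq_sketch(x, y, z, bins=3):
    """Illustrative equal-frequency discretisation adapter (not citk's
    disc_chisq): bin each continuous variable by quantiles, then sum
    Chi-Square statistics over the X-Y tables within each bin of Z."""
    df = pd.DataFrame({
        "x": pd.qcut(x, bins, labels=False, duplicates="drop"),
        "y": pd.qcut(y, bins, labels=False, duplicates="drop"),
        "z": pd.qcut(z, bins, labels=False, duplicates="drop"),
    })
    stat, dof = 0.0, 0
    for _, stratum in df.groupby("z"):
        table = pd.crosstab(stratum["x"], stratum["y"])
        if table.shape[0] < 2 or table.shape[1] < 2:
            continue  # stratum too sparse to contribute
        res = chi2_contingency(table, correction=False)
        stat += res[0]  # Chi-Square statistic for this stratum
        dof += res[2]   # its degrees of freedom
    return chi2.sf(stat, dof) if dof > 0 else 1.0

rng = np.random.default_rng(0)
z = rng.normal(size=2000)
x = z + rng.normal(size=2000)
y = z + rng.normal(size=2000)
# X _||_ Y | Z holds in the raw data, but coarse binning of Z can leave residual
# within-stratum dependence, and coarse binning of X/Y can hide real dependence:
# both are why the binning choice matters and why hartemink_chisq's
# information-preserving binning can be preferable.
print(disc_chisq_sketch(x, y, z, bins=5))
```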