A Taxonomy of Conditional Independence Tests
The 19 conditional independence tests in citk are organised under the six families of the Paper 0 survey, plus a group of adapter strategies that wrap base tests rather than constituting a distinct family. Understanding these groups helps you reason about which test fits a given research question and data type.
1. Partial Correlation
Tests based on the Pearson or Spearman partial correlation between two variables after controlling for the conditioning set.
Core idea: Partial out the conditioning set linearly (on the raw values or on their ranks) and test whether the correlation between the residuals is zero.
Examples in citk: fisherz_citk, spearman.
Strengths: Computationally very fast; statistically efficient under the relevant assumptions (linearity, Gaussianity, or monotonicity).
Weaknesses: Low power against non-linear or non-monotonic dependence.
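The core idea above can be sketched in a few lines of NumPy and SciPy. This is an illustrative sketch only, not the citk implementation: the function name `fisher_z_test` is hypothetical, and the code assumes continuous data with a roughly linear-Gaussian relationship.

```python
import numpy as np
from scipy import stats


def fisher_z_test(x, y, z=None):
    """Fisher-z test of X independent of Y given Z (hypothetical helper, sketch only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    if z is None:
        z = np.empty((n, 0))  # empty conditioning set
    else:
        z = np.asarray(z, dtype=float).reshape(n, -1)
    k = z.shape[1]

    # Regress X and Y on [1, Z] by least squares and keep the residuals.
    design = np.column_stack([np.ones(n), z])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]

    # The partial correlation is the correlation of the two residual vectors.
    r = np.clip(np.corrcoef(rx, ry)[0, 1], -0.999999, 0.999999)

    # Fisher z-transform; approximately standard normal under the null.
    stat = np.sqrt(n - k - 3) * np.arctanh(r)
    p_value = 2 * stats.norm.sf(abs(stat))
    return r, p_value
```

A Spearman-style variant would rank-transform each column (for example with `scipy.stats.rankdata`) before calling the same function.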
2. Contingency Table
Classical statistical tests designed for discrete (categorical) variables.
Core idea: Compare observed and expected cell counts in a stratified contingency table.
Examples in citk: chisq, gsq.
Strengths: Well-understood asymptotic theory; robust on truly categorical data.
Weaknesses: Requires discrete data; loses power as the contingency table grows sparse relative to the sample size.
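A minimal sketch of the stratified comparison, assuming a single discrete conditioning variable: one table is built per level of Z, and the per-stratum statistics and degrees of freedom are summed. The function name `stratified_contingency_test` is hypothetical and this is not the citk code; `scipy.stats.chi2_contingency` supplies the Pearson or log-likelihood (G²) statistic.

```python
import numpy as np
from scipy import stats


def stratified_contingency_test(x, y, z, g_test=False):
    """Chi-squared (or G^2) test of X independent of Y given a discrete Z (sketch)."""
    x, y, z = (np.asarray(a) for a in (x, y, z))
    lambda_ = "log-likelihood" if g_test else "pearson"
    stat, dof = 0.0, 0
    for level in np.unique(z):
        mask = z == level
        # Observed counts for this stratum, using only the levels that occur in it.
        _, xi = np.unique(x[mask], return_inverse=True)
        _, yi = np.unique(y[mask], return_inverse=True)
        table = np.zeros((xi.max() + 1, yi.max() + 1))
        np.add.at(table, (xi, yi), 1)
        if min(table.shape) < 2:
            continue  # a stratum with a constant variable has no degrees of freedom
        s, _, d, _ = stats.chi2_contingency(table, correction=False, lambda_=lambda_)
        stat += s
        dof += d
    p_value = stats.chi2.sf(stat, dof) if dof > 0 else 1.0
    return stat, dof, p_value
```

With several conditioning variables, each distinct combination of their values would form one stratum, which is where sparsity sets in as the conditioning set grows.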
3. Regression
Parametric likelihood-ratio tests built on regression models, with link functions chosen per variable type.
Core idea: Compare nested regression fits with and without the variable of interest in the predictor set; the likelihood-ratio statistic is asymptotically chi-squared under the null.
Examples in citk: regci (tigramite RegressionCI), ci_mm (R MXM ci.mm; symmetric, both directions combined).
Strengths: Native support for mixed continuous and discrete data; small-sample behaviour better than non-parametric tests when the model class is appropriate.
Weaknesses: Power degrades when the linear / logistic link misrepresents the true dependence.
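For a continuous response with Gaussian errors, the nested comparison reduces to a ratio of residual sums of squares. The sketch below covers only that case and is not the regci or ci_mm implementation; the name `lr_test_gaussian` is hypothetical, and a discrete response would need a logistic or multinomial fit in place of least squares.

```python
import numpy as np
from scipy import stats


def lr_test_gaussian(y, x, z):
    """Likelihood-ratio test for adding X to a linear regression of Y on Z (sketch)."""
    y = np.asarray(y, dtype=float)
    n = y.shape[0]
    x = np.asarray(x, dtype=float).reshape(n, -1)
    z = np.asarray(z, dtype=float).reshape(n, -1)  # assumes a non-empty conditioning set

    reduced = np.column_stack([np.ones(n), z])  # predictors without X
    full = np.column_stack([reduced, x])        # predictors with X

    def rss(design):
        coef = np.linalg.lstsq(design, y, rcond=None)[0]
        return np.sum((y - design @ coef) ** 2)

    # For Gaussian errors, -2 * log likelihood ratio = n * log(RSS_reduced / RSS_full).
    stat = n * np.log(rss(reduced) / rss(full))
    dof = full.shape[1] - reduced.shape[1]
    return stat, stats.chi2.sf(stat, dof)
```

A symmetric variant in the spirit of ci_mm would run the test in both directions (Y on X and X on Y) and combine the two p-values; the combination rule is left out here.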
4. Nearest Neighbor
Non-parametric tests based on \(k\)-nearest-neighbour estimators of conditional mutual information, paired with permutation-based p-values.
Core idea: Estimate \(I(X; Y \mid Z)\) from local neighbourhood statistics and assess significance with a local-permutation null.
Examples in citk: cmiknn, cmiknn_mixed, mcmiknn.
Strengths: Detects arbitrary non-linear dependence; mixed-data variants handle ties on discrete coordinates.
Weaknesses: Requires adequate sample size for stable density estimation; permutation p-values are computationally non-trivial.
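The estimate-then-permute recipe can be sketched for continuous data as follows. This is a simplified illustration, not the cmiknn code: it uses a Frenzel-Pompe style estimator with max-norm neighbourhoods, and the local permutation draws neighbours with replacement, which is simpler than the schemes used in practice. The function names and the defaults `k`, `k_perm`, and `n_perm` are assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma


def cmi_knn(x, y, z, k=10):
    """kNN estimate of I(X; Y | Z) in nats for continuous data (sketch)."""
    x, y, z = (np.asarray(a, dtype=float).reshape(len(a), -1) for a in (x, y, z))
    xyz = np.hstack([x, y, z])
    # Max-norm distance to the k-th neighbour in the joint space
    # (k + 1 because each query point is returned as its own nearest neighbour).
    eps = cKDTree(xyz).query(xyz, k=k + 1, p=np.inf)[0][:, -1]
    radius = np.nextafter(eps, 0)  # count points strictly closer than eps

    def count(sub):
        # Number of points within the radius in a subspace, the point itself included.
        return cKDTree(sub).query_ball_point(sub, radius, p=np.inf, return_length=True)

    c_xz, c_yz, c_z = count(np.hstack([x, z])), count(np.hstack([y, z])), count(z)
    return digamma(k) + np.mean(digamma(c_z) - digamma(c_xz) - digamma(c_yz))


def local_permutation_pvalue(x, y, z, k=10, k_perm=5, n_perm=100, seed=0):
    """P-value from shuffling X among points that are close in Z (simplified)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    z2 = np.asarray(z, dtype=float).reshape(n, -1)
    observed = cmi_knn(x, y, z, k)
    # Indices of each point's k_perm nearest neighbours in Z (itself included).
    neighbours = cKDTree(z2).query(z2, k=k_perm, p=np.inf)[1]
    null = np.empty(n_perm)
    for b in range(n_perm):
        pick = neighbours[np.arange(n), rng.integers(k_perm, size=n)]
        null[b] = cmi_knn(x[pick], y, z, k)
    return observed, (1 + np.sum(null >= observed)) / (1 + n_perm)
```

Each permutation re-runs the estimator, so the cost grows linearly in `n_perm` on top of the neighbour searches, which is the computational burden noted above.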
5. Kernel
Non-parametric tests that operate in a Reproducing Kernel Hilbert Space (RKHS), with Hilbert-Schmidt independence criteria as the underlying dependence measure.
Core idea: Map data into an RKHS and test for independence in the residualised kernel features; under a universal kernel, the criterion is zero exactly when the variables are independent.
Examples in citk: kci (exact, Python causal-learn implementation), rcit and rcot (random Fourier feature approximations, R RCIT package).
Strengths: Detects arbitrary smooth dependence; few distributional assumptions.
Weaknesses: Exact kci is at least quadratic in sample size; sensitivity to kernel and bandwidth choice.
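To make the dependence measure concrete, here is a sketch of the unconditional Hilbert-Schmidt independence criterion with a Gaussian kernel and a permutation p-value. It is only the building block: it does not condition on Z and is not the kci, rcit, or rcot procedure. The function names and the median-heuristic bandwidth are assumptions made for the example.

```python
import numpy as np


def gaussian_gram(a):
    """Gaussian-kernel Gram matrix with a median-heuristic bandwidth."""
    a = np.asarray(a, dtype=float).reshape(len(a), -1)
    sq = np.sum((a[:, None, :] - a[None, :, :]) ** 2, axis=-1)
    bandwidth = np.median(sq[sq > 0])
    return np.exp(-sq / bandwidth)


def hsic_permutation_test(x, y, n_perm=200, seed=0):
    """Biased HSIC estimate trace(H K H L) / n^2 with a permutation null (sketch)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    centring = np.eye(n) - np.full((n, n), 1.0 / n)
    kx = centring @ gaussian_gram(x) @ centring  # centred Gram matrix of X
    ky = gaussian_gram(y)
    stat = np.sum(kx * ky) / n**2
    null = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(n)  # shuffling Y breaks any dependence on X
        null[b] = np.sum(kx * ky[np.ix_(perm, perm)]) / n**2
    p_value = (1 + np.sum(null >= stat)) / (1 + n_perm)
    return stat, p_value
```

The n-by-n Gram matrices are what make exact kernel tests at least quadratic in the sample size; random Fourier features replace them with low-dimensional feature maps.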
6. Machine-Learning-Based
Tests built around nuisance regressions estimated by flexible ML predictors, with calibrated test statistics derived from the residual structure.
Core idea: Regress \(X\) and \(Y\) on the conditioning set \(Z\) using an ML method, then test the residuals for non-zero covariance (GCM), weighted covariance (WGCM), or projected covariance (PCM).
Examples in citk: gcm, wgcm, pcm (all via pycomets with random forest regression by default).
Strengths: Asymptotic-normal calibration with flexible nuisance models; the RF nuisance regressions handle continuous, discrete, and mixed inputs natively; wgcm adds power on localised dependence; pcm is assumption-lean and robust to weakly identified predictors.
Weaknesses: Requires sufficient sample size for nuisance estimation rates to hold; test calibration depends on the rate condition.
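The GCM statistic itself is short once the residuals are available. In the sketch below the nuisance regression is a pluggable callable; the default `poly_residuals` is a hypothetical cubic least-squares stand-in chosen so the example needs only NumPy and SciPy, where a random forest would be used in practice. It is not the pycomets API.

```python
import numpy as np
from scipy import stats


def poly_residuals(target, z, degree=3):
    """Residuals of a polynomial least-squares fit of target on Z (stand-in learner)."""
    target = np.asarray(target, dtype=float)
    z = np.asarray(z, dtype=float).reshape(len(target), -1)
    feats = np.column_stack([np.ones(len(target))] + [z**d for d in range(1, degree + 1)])
    coef = np.linalg.lstsq(feats, target, rcond=None)[0]
    return target - feats @ coef


def gcm_test(x, y, z, residualise=poly_residuals):
    """Generalised covariance measure test of X independent of Y given Z (sketch)."""
    rx = residualise(x, z)
    ry = residualise(y, z)
    prod = rx * ry  # products of residuals have mean zero under the null
    n = len(prod)
    # Normalised mean of the products; approximately N(0, 1) under the null
    # when the nuisance regressions converge fast enough.
    stat = np.sqrt(n) * prod.mean() / prod.std()
    p_value = 2 * stats.norm.sf(abs(stat))
    return stat, p_value
```

Any function with the signature `residualise(target, z)` can be passed in, so swapping the learner does not change the test statistic, only the quality of the residuals it is computed from.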
7. Adapter Strategies
Adapters modify or wrap a base test rather than forming a distinct family. The survey describes them as robustness layers: transformations applied on top of an existing CI test.
Core idea: Transform the data (discretise, dummy-encode, or apply an information-preserving binning) and then call a base CI test on the transformed data.
Examples in citk: disc_chisq, disc_gsq (equal-frequency discretisation + chisq/gsq); dummy_fisherz (one-hot encoding + Fisher’s combined fisherz); hartemink_chisq (Hartemink information-preserving discretisation via R bnlearn + chisq).
Strengths: Lets classical tests apply to data types they were not designed for; useful baselines for mixed-data settings.
Weaknesses: Inherits the assumptions of both the transformation and the base test; performance depends on whether the transformation preserves the dependence structure.
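As an illustration of the transform-then-test pattern, the sketch below bins continuous variables into equal-frequency groups and then applies a stratified chi-squared test. It is a simplified stand-in in the spirit of disc_chisq, not the citk implementation; the function names and the default of three bins are assumptions.

```python
import numpy as np
from scipy import stats


def equal_frequency_bins(values, n_bins=3):
    """Map continuous values to bin labels 0..n_bins-1 with roughly equal counts."""
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.searchsorted(edges, values, side="right")


def disc_chisq_sketch(x, y, z, n_bins=3):
    """Discretise X, Y, Z, then sum chi-squared statistics over the bins of Z."""
    xb, yb, zb = (equal_frequency_bins(v, n_bins) for v in (x, y, z))
    edges = np.arange(n_bins + 1) - 0.5  # one histogram cell per bin label
    stat, dof = 0.0, 0
    for level in np.unique(zb):
        mask = zb == level
        table = np.histogram2d(xb[mask], yb[mask], bins=[edges, edges])[0]
        # Drop empty rows and columns so every expected count is positive.
        table = table[table.sum(axis=1) > 0][:, table.sum(axis=0) > 0]
        if min(table.shape) < 2:
            continue
        s, _, d, _ = stats.chi2_contingency(table, correction=False)
        stat += s
        dof += d
    p_value = stats.chi2.sf(stat, dof) if dof > 0 else 1.0
    return stat, dof, p_value
```

Coarse bins of Z only partly control for Z, so some dependence between X and Y can survive within a bin even when they are conditionally independent; this is one concrete way the transformation can fail to preserve the dependence structure.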