How to Choose a Conditional Independence Test
Choosing the right conditional independence (CI) test is crucial for the validity of your causal discovery or feature selection analysis. The appropriate test depends on the characteristics of your data and the underlying assumptions you are willing to make.
Key Considerations
Here are the primary factors to consider when selecting a test:
1. Data Type
Continuous Data: If your variables are all continuous, you have several options:
fisherz_citk: Assumes linear relationships and multivariate normal data. It is very fast but may fail if these assumptions are violated.spearman: A non-parametric alternative that works on ranked data. It is suitable for monotonic (but not necessarily linear) relationships.kci: Optional R-backed KCIT (RCIT::KCIT) for complex, non-linear relationships.rcot,rcit: Optional R-backed kernel/random-feature tests from the RCIT package.cmiknn,cmiknn_mixed: Optional tigramite kNN-CMI tests.mcmiknn: Optional mixed-type kNN CMI wrapper from local mCMIkNN repo.regci: Optional tigramite parametric mixed-data regression CI.rf,dml,crit,edml: Flexible ML-based options for non-linear structure.
Discrete Data: If your variables are categorical:
gsq(G-Square) orchisq(Chi-Square): Classical tests based on contingency tables.
Mixed Data: Current built-in options are limited; discretization is still a practical baseline for
gsq/chisq.disc_chisq,disc_gsq: Equal-frequency discretization adapters around classical discrete tests.dummy_fisherz: One-hot encoding adapter with Fisher-Z aggregation.hartemink_chisq: Information-preserving Hartemink discretization (via Rbnlearn) + Chi-square.dct: Optional DCT wrapper from local DCT repository.
2. Relationship Type
Linear: If you believe the relationships between your variables are linear,
fisherzis a computationally efficient choice.Monotonic: For relationships that are consistently increasing or decreasing but not necessarily linear,
spearmanis a robust option.Non-Linear / Complex: For arbitrary, complex relationships, machine learning-based tests like
kciorrfare the most powerful and flexible choices, though they come at a higher computational cost.
Summary Table
Test Name |
Data Type |
Relationship Type |
Key Assumption(s) |
|---|---|---|---|
|
Continuous |
Linear |
Approximate Gaussianity |
|
Continuous |
Monotonic |
Monotonicity |
|
Discrete |
Any |
Adequate contingency support |
|
Continuous |
Any |
Requires |
|
Continuous |
Any |
Requires |
|
Mixed or continuous |
Any |
Requires |
|
Mixed |
Any |
Requires local mCMIkNN repository |
|
Continuous |
Any |
ML residualization quality |
|
Continuous |
Any |
Residual covariance test |
|
Mixed or continuous |
Any |
Discretization quality |
|
Mixed or discrete |
Any |
One-hot encoding fidelity |
|
Mixed or continuous |
Any |
Requires |
|
Mixed or discretized continuous |
Any |
Requires local DCT repository |