How to Choose a Conditional Independence Test
Choosing the right conditional independence (CI) test is crucial for the validity of your causal discovery or feature selection analysis. The appropriate test depends on the characteristics of your data and the underlying assumptions you are willing to make.
Key Considerations
Here are the primary factors to consider when selecting a test:
1. Data Type
Continuous Data: If your variables are all continuous, you have several options:
fisherz: Assumes linear relationships and multivariate normal data. It is very fast, but its results can be misleading when these assumptions are violated; for example, it can miss non-linear dependencies.
spearman: A non-parametric alternative that works on ranked data. It is suitable for monotonic (but not necessarily linear) relationships.
kci: A kernel-based test that can capture complex, non-linear relationships. It is powerful but computationally more intensive, and its cost grows rapidly with sample size.
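To make the fisherz and spearman descriptions concrete, here is a minimal, dependency-free sketch of the idea behind them, assuming the usual formulation: fisherz tests whether the partial correlation of X and Y given Z is zero using Fisher's z-transform, and the spearman variant here simply applies the same test to rank-transformed columns. The helper names (`pearson`, `partial_corr`, `fisher_z_test`, `ranks`, `spearman_test`) are hypothetical and are not the library's API; the library's own implementation may differ in details such as tie handling or how the conditioning set is processed.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def partial_corr(cols, i, j, cond):
    """Partial correlation of cols[i] and cols[j] given the columns listed in
    cond, via the recursive formula (exponential in len(cond); sketch only)."""
    if not cond:
        return pearson(cols[i], cols[j])
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(cols, i, j, rest)
    r_ik = partial_corr(cols, i, k, rest)
    r_jk = partial_corr(cols, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def fisher_z_test(cols, i, j, cond=()):
    """Hypothetical helper: two-sided p-value for H0 'X_i independent of X_j
    given cond', assuming linear-Gaussian data. Needs n > len(cond) + 3."""
    cond = list(cond)
    n = len(cols[i])
    r = partial_corr(cols, i, j, cond)
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    stat = math.sqrt(n - len(cond) - 3) * abs(math.atanh(r))
    return math.erfc(stat / math.sqrt(2))  # equals 2 * (1 - Phi(stat))


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            out[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return out


def spearman_test(cols, i, j, cond=()):
    """Hypothetical helper: the Fisher-z test applied to rank-transformed columns."""
    return fisher_z_test([ranks(c) for c in cols], i, j, cond)


# Usage: X and Y share the common cause Z; their remaining variation is orthogonal.
z = [1, 1, 1, 1, -1, -1, -1, -1] * 50
x = [a + b for a, b in zip(z, [1, 1, -1, -1, 1, 1, -1, -1] * 50)]
y = [a + b for a, b in zip(z, [1, -1, 1, -1, 1, -1, 1, -1] * 50)]
cols = [x, y, z]
print(fisher_z_test(cols, 0, 1), fisher_z_test(cols, 0, 1, [2]))
print(spearman_test(cols, 0, 1), spearman_test(cols, 0, 1, [2]))
```

In this toy data X and Y are correlated marginally (tiny p-value) but, by construction, uncorrelated once Z is accounted for (p-value close to 1). The recursive partial-correlation formula keeps the sketch short; a practical implementation would more likely invert the correlation matrix of the variables involved.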
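Discrete Data: If your variables are categorical:
gsq (G-square) or chisq (chi-square): Both are classical tests for discrete data based on contingency tables. gsq, a likelihood-ratio test, is often preferred for theoretical reasons. Both rely on a large-sample approximation, so their p-values can be unreliable when the sample is small or many table cells are sparsely filled.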
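Both discrete tests follow one recipe: build a contingency table of X and Y within each stratum (each observed value of the conditioning set Z), compute either G² = 2 Σ O ln(O / E) or Pearson's X² = Σ (O - E)² / E against the counts expected under independence, sum over strata, and compare the total with a chi-square distribution. Below is a minimal stdlib-only sketch under those assumptions. `discrete_ci_test` and `chi2_sf` are hypothetical names, the `"gsq"`/`"chisq"` strings only mirror the test names above, and the degrees-of-freedom convention used here (counting only levels observed in each stratum) is one of several in use, so a library's p-values may differ.

```python
import math
from collections import Counter


def chi2_sf(x, df):
    """Upper-tail probability P(chi2_df > x) for an integer df, using the
    closed forms of the regularized upper incomplete gamma function."""
    if df <= 0 or x <= 0:
        return 1.0
    y = x / 2
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp((i - 0.5) * math.log(y) - y - math.lgamma(i + 0.5))
    return total


def discrete_ci_test(x, y, z=None, method="gsq"):
    """Hypothetical helper: test X independent of Y given Z for categorical data.

    z holds one hashable stratum label per row (use tuples for several
    conditioning variables). Returns (statistic, df, p_value)."""
    if method not in ("gsq", "chisq"):
        raise ValueError(f"unknown method: {method!r}")
    if z is None:
        z = [()] * len(x)  # a single stratum = unconditional test
    strata = {}
    for xi, yi, zi in zip(x, y, z):
        strata.setdefault(zi, []).append((xi, yi))
    stat, df = 0.0, 0
    for pairs in strata.values():
        n = len(pairs)
        n_xy = Counter(pairs)
        n_x = Counter(a for a, _ in pairs)
        n_y = Counter(b for _, b in pairs)
        for a in n_x:
            for b in n_y:
                expected = n_x[a] * n_y[b] / n
                observed = n_xy[(a, b)]
                if method == "gsq":
                    if observed > 0:  # 0 * log(0) is taken as 0
                        stat += 2 * observed * math.log(observed / expected)
                else:  # Pearson's chi-square statistic
                    stat += (observed - expected) ** 2 / expected
        # count only the levels actually observed in this stratum
        df += (len(n_x) - 1) * (len(n_y) - 1)
    return stat, df, chi2_sf(stat, df)


# Usage: Z shifts both X and Y; within each stratum X and Y are balanced and independent.
base = [(0, 0), (0, 1), (1, 0), (1, 1)] * 10
z = [0] * 40 + [1] * 40
x = [a for a, _ in base] + [a + 1 for a, _ in base]
y = [b for _, b in base] + [b + 1 for _, b in base]
print(discrete_ci_test(x, y))                   # marginal G-square test
print(discrete_ci_test(x, y, z))                # conditional on Z
print(discrete_ci_test(x, y, method="chisq"))   # marginal chi-square test
```

Marginally, X and Y look dependent because both move with Z; within each stratum every observed count equals its expected count, so the conditional statistic is 0 and the p-value is 1.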
Mixed Data: When you have a combination of continuous and discrete variables, you currently need to discretize your continuous data to use tests like gsq or chisq. Future versions may include dedicated tests for mixed data.
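One common way to discretize a continuous column is equal-frequency (quantile) binning, which keeps the resulting contingency-table cells roughly balanced. The sketch below assumes that approach; `quantile_discretize` is a hypothetical helper, the values are made up for illustration, and the number of bins is a trade-off: more bins preserve more information but spread the data over more, sparser cells.

```python
import bisect


def quantile_discretize(values, n_bins=3):
    """Hypothetical helper: equal-frequency binning of a continuous column.

    Cut points are empirical quantiles, so each bin receives roughly
    len(values) / n_bins observations; tied values always share a bin."""
    s = sorted(values)
    n = len(s)
    cuts = [s[(k * n) // n_bins] for k in range(1, n_bins)]
    return [bisect.bisect_right(cuts, v) for v in values]


values = [12.1, 15.3, 9.8, 22.4, 18.0, 11.5, 25.2, 16.7, 20.9]
print(quantile_discretize(values))  # [0, 1, 0, 2, 1, 0, 2, 1, 2]
```

The resulting integer codes can then be passed, together with the genuinely discrete columns, to a contingency-table test such as gsq or chisq.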
2. Relationship Type
Linear: If you believe the relationships between your variables are linear, fisherz is a computationally efficient choice.
Monotonic: For relationships that are consistently increasing or decreasing but not necessarily linear, spearman is a robust option.
Non-Linear / Complex: For arbitrary, complex relationships, flexible kernel- or machine-learning-based tests such as kci or rf are the most suitable choices, though they come at a higher computational cost.
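A quick way to see why the relationship type matters is to compare Pearson correlation (the quantity behind fisherz) with Spearman's rank correlation on noise-free toy data. This is a plain-Python sketch; the helpers are hypothetical and compute unconditional correlations only, which is enough to illustrate the point.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return sab / math.sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            out[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return out


def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


x = list(range(-10, 11))
relationships = {
    "linear (y = 2x + 1)": [2 * v + 1 for v in x],
    "monotonic (y = x**3)": [v ** 3 for v in x],
    "non-monotonic (y = x**2)": [v ** 2 for v in x],
}
for name, y in relationships.items():
    print(f"{name:26s} pearson={pearson(x, y):+.3f} spearman={spearman(x, y):+.3f}")
```

For the cubic, Spearman's coefficient is exactly 1 while Pearson's falls below 1. For the quadratic, both coefficients are zero even though y is a deterministic function of x; this is the kind of dependence that only a more flexible test such as kci or rf can be expected to detect.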
Summary Table
| Test Name | Data Type | Relationship Type | Key Assumption(s) |
|---|---|---|---|
| fisherz | Continuous | Linear | Multivariate normality |
| spearman | Continuous | Monotonic | Monotonicity |
| gsq / chisq | Discrete | Any | Adequate sample size for contingency table cells |
| kci | Continuous | Any | None (non-parametric) |
| rf | Continuous | Any | None (non-parametric) |
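The summary table can also be read as a simple lookup. The function below is a hypothetical convenience helper, not part of any library, that encodes the rows above; treat its output as a starting point for your own judgment rather than a rule.

```python
def suggest_ci_tests(data_type, relationship="any"):
    """Hypothetical helper: candidate test names from the summary table."""
    data_type, relationship = data_type.lower(), relationship.lower()
    if data_type in ("discrete", "mixed"):
        # mixed data: discretize the continuous columns first
        return ["gsq", "chisq"]
    if data_type != "continuous":
        raise ValueError(f"unknown data type: {data_type!r}")
    if relationship == "linear":
        return ["fisherz"]
    if relationship == "monotonic":
        return ["spearman"]
    return ["kci", "rf"]  # non-linear or unknown: no parametric assumption


print(suggest_ci_tests("continuous", "monotonic"))  # ['spearman']
```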