How to Choose a Conditional Independence Test

Choosing the right conditional independence (CI) test is crucial for the validity of your causal discovery or feature selection analysis. The appropriate test depends on the characteristics of your data and the underlying assumptions you are willing to make.

Key Considerations

Here are the primary factors to consider when selecting a test:

1. Data Type

  • Continuous Data: If your variables are all continuous, you have several options:

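    • fisherz: Tests whether the (partial) correlation between two variables is zero. It assumes linear relationships and multivariate normal data. It is very fast, but it can give misleading results when these assumptions are violated; for example, it can miss non-linear dependence.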

    • spearman: A non-parametric alternative that works on ranked data. It is suitable for monotonic (but not necessarily linear) relationships.

    • kci: A kernel-based test that can capture complex, non-linear relationships. It is powerful but computationally intensive: it works with n-by-n kernel matrices, so its cost grows roughly cubically with the sample size n.

  • Discrete Data: If your variables are categorical:

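    • gsq (G-Square) or chisq (Chi-Square): Both are classical tests for discrete data based on contingency tables, and they are asymptotically equivalent. gsq is the likelihood-ratio statistic and is common in causal discovery, partly because it is proportional to the empirical conditional mutual information. Both rely on a chi-square approximation that becomes unreliable when many cells have small expected counts.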

  • Mixed Data: When you have a combination of continuous and discrete variables, you currently need to discretize (bin) the continuous variables to use tests like gsq or chisq. Binning discards information, and the result can depend on the number and placement of the bins. Future versions may include dedicated tests for mixed data.
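
The discrete tests compare observed and expected counts in contingency tables, one table per configuration of the conditioning variables. The sketch below is a plain-Python illustration of a G-square (likelihood-ratio) conditional independence test, plus an equal-frequency binning helper for the mixed-data case. It is not the library's gsq implementation: the function names (chi2_sf, g_square_test, quantile_bins) are hypothetical, and the degrees-of-freedom rule (counting only the categories observed in each stratum) is one common convention among several.

```python
import math
from collections import Counter, defaultdict


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for X ~ chi-square(df), integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: sum of exp(-h) * h**i / i! for i < df/2, computed in log space.
        total = sum(math.exp(-h + i * math.log(h) - math.lgamma(i + 1))
                    for i in range(df // 2))
    else:
        # Odd df: erfc term plus half-integer incomplete-gamma terms.
        total = math.erfc(math.sqrt(h)) + sum(
            math.exp(-h + (i - 0.5) * math.log(h) - math.lgamma(i + 0.5))
            for i in range(1, (df - 1) // 2 + 1))
    return min(1.0, total)


def g_square_test(x, y, z=()):
    """Hypothetical G-square test of X independent of Y given the columns in z.

    x, y: sequences of category labels; z: list of such sequences.
    Returns (G statistic, degrees of freedom, p-value).
    """
    strata = defaultdict(list)  # conditioning configuration -> row indices
    for row in range(len(x)):
        strata[tuple(col[row] for col in z)].append(row)
    g, df = 0.0, 0
    for rows in strata.values():
        n_xy = Counter((x[r], y[r]) for r in rows)
        n_x = Counter(x[r] for r in rows)
        n_y = Counter(y[r] for r in rows)
        m = len(rows)
        for (a, b), observed in n_xy.items():
            # Empty cells are skipped because 0 * log(0) is taken as 0.
            expected = n_x[a] * n_y[b] / m
            g += 2.0 * observed * math.log(observed / expected)
        # Degrees of freedom from the levels actually observed in this stratum.
        df += (len(n_x) - 1) * (len(n_y) - 1)
    if df == 0:
        return g, 0, 1.0  # degenerate: nothing varies, so nothing to test
    return g, df, chi2_sf(g, df)


def quantile_bins(values, k=3):
    """Discretize a continuous column into k roughly equal-frequency bins."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)  # ties are split by sort order
    return bins
```

For mixed data, you might pass quantile_bins(column, k) in place of a continuous column. Changing k can change the result, so it is worth checking a few bin counts before trusting a single answer.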

2. Relationship Type

  • Linear: If you believe the relationships between your variables are linear, fisherz is a computationally efficient choice.

  • Monotonic: For relationships that are consistently increasing or decreasing but not necessarily linear, spearman is a robust option.

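  • Non-Linear / Complex: For arbitrary, complex relationships, kernel-based (kci) or machine learning-based (rf, dml) tests are the most flexible choices. They come at a higher computational cost and, when the relationships really are linear and Gaussian, they typically have less power than fisherz at the same sample size.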
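
To make the linear-versus-monotonic distinction concrete, here is a minimal sketch of a Fisher z test on the partial correlation and a rank-based variant that applies the same test to ranks. This reflects how such tests are commonly built and is an assumption, not the library's fisherz or spearman code; all helper names are hypothetical. Partial correlations use the recursive formula, which is fine for small conditioning sets but grows exponentially with their size.

```python
import math


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def partial_corr(cols, i, j, cond):
    """Partial correlation of cols[i] and cols[j] given cols[k], k in cond (recursive)."""
    if not cond:
        return pearson(cols[i], cols[j])
    k, rest = cond[-1], cond[:-1]
    r_ij = partial_corr(cols, i, j, rest)
    r_ik = partial_corr(cols, i, k, rest)
    r_jk = partial_corr(cols, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def fisher_z_test(cols, i, j, cond=()):
    """Two-sided p-value for H0: partial correlation of i and j given cond is zero."""
    n = len(cols[0])
    r = partial_corr(cols, i, j, list(cond))
    r = max(-1 + 1e-12, min(1 - 1e-12, r))  # keep atanh finite at |r| = 1
    stat = math.sqrt(n - len(cond) - 3) * math.atanh(r)
    return math.erfc(abs(stat) / math.sqrt(2))  # equals 2 * (1 - Phi(|stat|))


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        for idx in order[pos:end + 1]:
            out[idx] = (pos + end) / 2 + 1
        pos = end + 1
    return out


def spearman_test(cols, i, j, cond=()):
    """Rank-based variant: run the Fisher z test on rank-transformed columns."""
    return fisher_z_test([ranks(c) for c in cols], i, j, cond)
```

For y = exp(x), the Pearson correlation is well below 1 while the rank correlation is exactly 1, which is why a rank-based test suits monotonic relationships. Neither helper will detect a non-monotonic dependence such as y = x² on a range symmetric around zero.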

Summary Table

Test Name     Data Type    Relationship Type    Key Assumption(s)
-----------   ----------   ------------------   ------------------------------------------------
fisherz       Continuous   Linear               Multivariate normality
spearman      Continuous   Monotonic            Monotonicity
gsq / chisq   Discrete     Any                  Adequate sample size for contingency table cells
kci           Continuous   Any                  None (non-parametric)
rf / dml      Continuous   Any                  None (non-parametric)
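
The kernel-based row makes no parametric assumption about the relationship. The sketch below illustrates why kernel statistics can see dependence that correlation misses, using an unconditional HSIC (Hilbert-Schmidt Independence Criterion) statistic with a permutation null. It is a deliberate simplification, not KCI: KCI additionally accounts for the conditioning set and approximates its null distribution from the kernel matrices rather than by permuting samples. The Gaussian kernel, the median-distance bandwidth, and the function names are illustrative assumptions.

```python
import math
import random


def gaussian_gram(values):
    """Gaussian kernel matrix using the median pairwise distance as bandwidth."""
    n = len(values)
    dists = [abs(values[i] - values[j]) for i in range(n) for j in range(i + 1, n)]
    sigma = sorted(dists)[len(dists) // 2] or 1.0  # median heuristic; avoid zero
    return [[math.exp(-((values[i] - values[j]) ** 2) / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]


def center(k):
    """Double-center a kernel matrix: H K H with H = I - (1/n) * ones."""
    n = len(k)
    row = [sum(r) / n for r in k]
    col = [sum(k[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[k[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]


def hsic_permutation_test(x, y, n_perm=200, seed=0):
    """Hypothetical unconditional HSIC test; returns (statistic, p-value)."""
    n = len(x)
    kc = center(gaussian_gram(x))
    l = gaussian_gram(y)

    def stat(perm):
        # Biased HSIC tr(K H L H) / n^2, with y's samples reordered by perm.
        return sum(kc[i][j] * l[perm[i]][perm[j]]
                   for i in range(n) for j in range(n)) / n ** 2

    observed = stat(list(range(n)))
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)  # shuffling y breaks any dependence on x
        if stat(perm) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)
```

With x on a symmetric grid in [-1, 1] and y = x², the Pearson correlation is essentially zero, yet this test typically returns a small p-value. Each statistic evaluation loops over all n² pairs, and holding full n-by-n kernel matrices is one reason kernel tests become expensive at larger sample sizes.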