# How to Choose a Conditional Independence Test

Choosing the right conditional independence (CI) test is crucial for the validity of your causal discovery or feature selection analysis. The appropriate test depends on the characteristics of your data and on the assumptions you are willing to make.

## Key Considerations

Here are the primary factors to consider when selecting a test:

### 1. Data Type

- **Continuous Data**: If your variables are all continuous, you have several options:
  - `fisherz`: Assumes linear relationships and multivariate normal data. It is very fast, but it can miss dependencies or report spurious ones when these assumptions are violated.
  - `spearman`: A non-parametric alternative that works on ranked data. It is suitable for monotonic (but not necessarily linear) relationships.
  - `kci`: A kernel-based test that can capture complex, non-linear relationships. It is powerful but computationally more intensive.
- **Discrete Data**: If your variables are categorical:
  - `gsq` (G-Square) or `chisq` (Chi-Square): Both are classical tests for discrete data based on contingency tables. `gsq` is often preferred on theoretical grounds (it is a likelihood-ratio statistic), but both tests need adequately populated contingency-table cells to be reliable.
- **Mixed Data**: When you have a combination of continuous and discrete variables, you currently need to discretize your continuous data to use tests like `gsq` or `chisq`. Future versions may include dedicated tests for mixed data.

### 2. Relationship Type

- **Linear**: If you believe the relationships between your variables are linear, `fisherz` is a computationally efficient choice.
- **Monotonic**: For relationships that are consistently increasing or decreasing but not necessarily linear, `spearman` is a robust option.
- **Non-Linear / Complex**: For arbitrary, complex relationships, kernel-based or machine-learning-based tests such as `kci` or `rf` are the most powerful and flexible choices, though they come at a higher computational cost.

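
To make the linear-Gaussian assumption behind `fisherz` concrete, here is a minimal sketch of a Fisher z test built on partial correlation. It assumes NumPy and SciPy are available. The function name `fisherz_test` and the simulated chain X → Z → Y are illustrative assumptions written for this guide, not this library's implementation.

```python
import numpy as np
from scipy import stats


def fisherz_test(data, x, y, cond=()):
    """Return the p-value for H0: column x is independent of column y
    given the columns in cond. Illustrative sketch, not a library API."""
    idx = [x, y, *cond]
    # Correlation matrix of the variables involved in the test.
    corr = np.corrcoef(data[:, idx], rowvar=False)
    # Partial correlation of x and y given cond, read off the precision matrix.
    prec = np.linalg.inv(corr)
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
    r = np.clip(r, -0.999999, 0.999999)  # keep the log below finite
    # Fisher z-transform; approximately normal under H0 for Gaussian data.
    z = 0.5 * np.log((1 + r) / (1 - r))
    n = data.shape[0]
    stat = np.sqrt(n - len(cond) - 3) * abs(z)
    return 2 * stats.norm.sf(stat)


# Simulated linear-Gaussian chain X -> Z -> Y (assumed toy data).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
z = 2.0 * x + rng.normal(size=n)
y = -1.0 * z + rng.normal(size=n)
data = np.column_stack([x, y, z])

p_marg = fisherz_test(data, 0, 1)            # X vs Y, no conditioning
p_cond = fisherz_test(data, 0, 1, cond=[2])  # X vs Y given Z
print(f"marginal p-value:    {p_marg:.3g}")
print(f"conditional p-value: {p_cond:.3g}")
```

In this setup X and Y are strongly correlated, so the marginal p-value is effectively zero. Conditioning on Z removes the dependence, so the conditional p-value behaves like a draw from a uniform distribution. The same statistic carries no such guarantee when the relationships are non-linear or the data are far from Gaussian, which is the situation where `spearman` or `kci` becomes the safer choice.
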
## Summary Table

| Test Name       | Data Type  | Relationship Type | Key Assumption(s)                                |
|-----------------|------------|-------------------|--------------------------------------------------|
| `fisherz`       | Continuous | Linear            | Multivariate normality                           |
| `spearman`      | Continuous | Monotonic         | Monotonicity                                     |
| `gsq` / `chisq` | Discrete   | Any               | Adequate sample size for contingency table cells |
| `kci`           | Continuous | Any               | None (non-parametric)                            |
| `rf` / `dml`    | Continuous | Any               | None (non-parametric)                            |
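
The summary table can also be read as a simple lookup. The sketch below encodes it as a small Python helper. `suggest_tests` and `CANDIDATES` are hypothetical names introduced here for illustration; they are not part of any library, and the helper only returns test names without running anything.

```python
# Hypothetical helper that mirrors the summary table; not a library API.
CANDIDATES = {
    ("continuous", "linear"): ["fisherz"],
    ("continuous", "monotonic"): ["spearman"],
    ("continuous", "any"): ["kci", "rf", "dml"],
    ("discrete", "any"): ["gsq", "chisq"],
}


def suggest_tests(data_type, relationship="any"):
    """Return candidate CI test names for a data type and relationship type."""
    data_type = data_type.lower()
    relationship = relationship.lower()
    if data_type in ("discrete", "mixed"):
        # Mixed data must be discretized first, after which the
        # discrete tests apply regardless of relationship type.
        return list(CANDIDATES[("discrete", "any")])
    key = (data_type, relationship)
    if key not in CANDIDATES:
        raise ValueError(f"no test listed for {key!r}")
    return list(CANDIDATES[key])


print(suggest_tests("continuous", "linear"))
print(suggest_tests("mixed"))
```

The helper returns every candidate in a table row rather than a single winner, because choosing among, say, `kci`, `rf`, and `dml` still depends on sample size and the compute budget you can afford.
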