What is Conditional Independence?
Conditional Independence (CI) is a fundamental concept in probability theory and statistics, forming the bedrock of modern causal inference and graphical modeling.
Formal Definition
Two variables, \(X\) and \(Y\), are said to be conditionally independent given a third variable (or set of variables), \(Z\), if and only if their joint distribution conditional on \(Z\) equals the product of their individual conditional distributions given \(Z\).
Mathematically, this is written as:
\(X \perp Y \mid Z \iff P(X, Y \mid Z) = P(X \mid Z)\,P(Y \mid Z)\)
where the equality must hold for all values \(x\), \(y\), and \(z\) with \(P(z) > 0\).
This statement implies that once we know the value of \(Z\), gaining knowledge about \(Y\) tells us nothing new about \(X\), and vice versa. The conditioning set \(Z\) “blocks” the flow of information between \(X\) and \(Y\).
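To make the factorization concrete, here is a minimal NumPy sketch (the probability tables are arbitrary toy values, not data from any source): it builds a discrete joint distribution in which \(X \perp Y \mid Z\) holds by construction, then checks the identity numerically.

```python
import numpy as np

# A minimal sketch: construct a discrete joint distribution in which
# X and Y are conditionally independent given Z by construction, then
# verify the factorization P(X, Y | Z) = P(X | Z) P(Y | Z) numerically.

rng = np.random.default_rng(0)

# Arbitrary toy distributions: Z takes 3 values, X and Y each take 2.
p_z = np.array([0.5, 0.3, 0.2])                   # P(Z)
p_x_given_z = rng.dirichlet(np.ones(2), size=3)   # P(X | Z), each row sums to 1
p_y_given_z = rng.dirichlet(np.ones(2), size=3)   # P(Y | Z), each row sums to 1

# Joint P(x, y, z) = P(z) P(x | z) P(y | z), with shape (2, 2, 3).
joint = np.einsum('z,zx,zy->xyz', p_z, p_x_given_z, p_y_given_z)

# Recover the conditional joint P(X, Y | Z) from the full joint ...
p_xy_given_z = joint / joint.sum(axis=(0, 1), keepdims=True)

# ... and the product of the individual conditionals, P(X | Z) P(Y | Z).
product = np.einsum('zx,zy->xyz', p_x_given_z, p_y_given_z)

# The two agree for every (x, y, z), i.e., X ⊥ Y | Z holds.
assert np.allclose(p_xy_given_z, product)
print("Factorization verified: P(X, Y | Z) = P(X | Z) P(Y | Z)")
```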
Intuitive Explanation
Consider the following variables:
\(X\): Ice Cream Sales
\(Y\): Number of Drownings
\(Z\): Average Daily Temperature
In a typical city, you would observe a strong positive correlation between ice cream sales and drownings. As one goes up, the other tends to go up.
However, this relationship is not causal. It is a result of a common cause, or a confounder: the temperature.
When the temperature is high, more people buy ice cream.
When the temperature is high, more people go swimming, which unfortunately leads to more drownings.
If we condition on temperature (i.e., we look only at data from days where the temperature was, say, 75°F), the relationship between ice cream sales and drownings would vanish. Knowing how many ice creams were sold on a 75°F day gives you no new information about how many drownings occurred on that same day.
Thus, we can say: Ice Cream Sales \(\perp\) Drownings \(\mid\) Temperature.
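A quick simulation makes this concrete. The sketch below uses made-up coefficients and noise scales (not real data): temperature drives both variables, with no direct link between them, so the marginal correlation is strong while the correlation within a narrow temperature stratum around 75°F is essentially zero.

```python
import numpy as np

# Hedged simulation of the example above (all numbers are illustrative):
# temperature drives both ice cream sales and drownings; there is no
# direct causal link between the two.

rng = np.random.default_rng(42)
n = 100_000

temperature = rng.uniform(40, 100, size=n)                 # Z: daily temp (°F)
ice_cream = 2.0 * temperature + rng.normal(0, 10, size=n)  # X: driven only by Z
drownings = 0.1 * temperature + rng.normal(0, 1, size=n)   # Y: driven only by Z

# Marginal correlation: strongly positive, despite no causal link.
print("corr(X, Y)           =", round(np.corrcoef(ice_cream, drownings)[0, 1], 3))

# Condition on temperature: keep only days within 1°F of 75°F.
mask = np.abs(temperature - 75) < 1
print("corr(X, Y | Z≈75°F)  =",
      round(np.corrcoef(ice_cream[mask], drownings[mask])[0, 1], 3))
```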
Why It Matters in Causal Discovery
Conditional independence tests are the primary tool used by constraint-based causal discovery algorithms (such as the PC algorithm). These algorithms systematically perform CI tests on the data to identify the “d-separations” in the underlying causal graph; the inference from independences in the data to structure in the graph relies on the standard faithfulness assumption.
By determining which variables become independent when conditioned on others, these algorithms can prune edges from a fully connected graph to reveal the likely causal structure that generated the data.
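To illustrate the idea, here is a hedged sketch of the edge-pruning (skeleton) step in the spirit of the PC algorithm, not a reproduction of any particular library's implementation. It uses a partial-correlation CI test with a Fisher z-transform, which is only appropriate for roughly linear-Gaussian data; all function names, thresholds, and the small synthetic demo are our own illustrative choices.

```python
import itertools
import numpy as np
from scipy import stats

def partial_corr_ci_test(data, i, j, cond, alpha=0.05):
    """Test X_i ⊥ X_j | X_cond via partial correlation (Fisher z).
    A sketch assuming roughly linear-Gaussian data, not a robust test."""
    n = data.shape[0]
    if cond:
        # Regress out the conditioning variables, then correlate residuals.
        Z = np.column_stack([np.ones(n), data[:, list(cond)]])
        res_i = data[:, i] - Z @ np.linalg.lstsq(Z, data[:, i], rcond=None)[0]
        res_j = data[:, j] - Z @ np.linalg.lstsq(Z, data[:, j], rcond=None)[0]
        r = np.corrcoef(res_i, res_j)[0, 1]
    else:
        r = np.corrcoef(data[:, i], data[:, j])[0, 1]
    # Fisher z-transform; compare against a standard normal.
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - len(cond) - 3)
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    return p_value > alpha  # True => treat as conditionally independent

def prune_edges(data, max_cond_size=1, alpha=0.05):
    """Skeleton search in the spirit of the PC algorithm: start from a
    complete graph and drop edge i-j if some conditioning set makes
    X_i and X_j conditionally independent."""
    d = data.shape[1]
    edges = {frozenset(e) for e in itertools.combinations(range(d), 2)}
    for i, j in itertools.combinations(range(d), 2):
        others = [k for k in range(d) if k not in (i, j)]
        for size in range(max_cond_size + 1):
            for cond in itertools.combinations(others, size):
                if partial_corr_ci_test(data, i, j, cond, alpha):
                    edges.discard(frozenset((i, j)))
                    break
            else:
                continue
            break
    return edges

# Demo on synthetic data matching the ice cream example above.
rng = np.random.default_rng(0)
n = 5000
temp = rng.normal(70, 15, size=n)
data = np.column_stack([
    temp,                               # column 0: Z (temperature)
    2.0 * temp + rng.normal(0, 10, n),  # column 1: X (ice cream sales)
    0.1 * temp + rng.normal(0, 1, n),   # column 2: Y (drownings)
])
# Typically prints {frozenset({0, 1}), frozenset({0, 2})}: the X-Y edge
# is pruned because X ⊥ Y | Z (up to test error at level alpha).
print(prune_edges(data))
```

A full constraint-based algorithm additionally grows the conditioning-set size adaptively, restricts conditioning sets to neighbors of the tested pair, and afterwards orients the remaining edges; the sketch above covers only the pruning step described in this section.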