If $P(A) \neq 0$ and $P(B) \neq 0$, then $P(B|A) \geq P(B)$ is equivalent to $P(A|B) \geq P(A)$

Both statements are saying that $P(A \cap B) \ge P(A)\cdot P(B)$. Note that $P(A)\cdot P(B)$ corresponds to $P(A \cap B)$ if $A$ and $B$ were independent. Thus $P(A \cap B) \ge P(A)\cdot P(B)$ means that there is some positive correlation (in a figurative sense) between these two events.
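
To spell out why both statements reduce to the same inequality, here is the chain of equivalences. It uses only the definition $P(X|Y) = P(X \cap Y)/P(Y)$ and the assumption that $P(A) > 0$ and $P(B) > 0$, so multiplying or dividing by either one preserves the direction of the inequality:

$$
\begin{aligned}
P(B|A) \ge P(B)
&\iff \frac{P(A \cap B)}{P(A)} \ge P(B) \\
&\iff P(A \cap B) \ge P(A)\cdot P(B) \\
&\iff \frac{P(A \cap B)}{P(B)} \ge P(A) \\
&\iff P(A|B) \ge P(A).
\end{aligned}
$$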

On the other hand, the phrase that always pops into my mind when I talk about conditional probability is "you are changing your universe." $P(X|Y)$ means that you are looking at the probability of $X$ in a different universe, namely $Y$. Since there is a positive correlation between $A$ and $B$, it makes sense that $P(A|B) \ge P(A)$: we are changing our universe to $B$, and since $B$ is positively correlated with $A$, we are more likely to see $A$ in this new universe. This is symmetric in $A$ and $B$, of course.

I don't know if this is intuition enough.


General intuition answer: Having $P(A | B) > P(A)$ is an indication that $A$ and $B$ are positively correlated, that is, they occur together more often than they would if they were independent. From this notion, it should be natural that if $A$ has happened, $B$ is more likely.

Geometric intuition answer: If we talk about this in a Venn diagram context, then the statement $P(B | A) > P(B)$ means that events $A$ and $B$ have heavy overlap. Here are some poorly-drawn MSPaint pictures to explain what I mean:

[Image: Venn diagrams of $A$ and $B$; top set shows independent events, bottom set shows a larger overlap]

Here, I'm presuming that (1) areas in a Venn diagram correspond to probabilities (which is of course not true in general), and (2) my drawings are actually accurate, which is probably false. But the spirit of the drawings is: the top set shows independent events, for which the intersection area is "proportional" to the event probabilities. Specifically, with $|\Omega| = 1$, the proportion is $$\frac{|A \cap B|}{|A|} = \frac{|B|}{|\Omega|},$$ which is obviously just a thinly-veiled version of the original probabilistic statement.

In the top case, the pictures (allegedly) have "proportional intersections" in this way; in the bottom pictures, the overlap is larger than that prescribed proportion. I think this "intersection proportionality" might be the kind of geometric crux of the question that you're asking about.


First, note that this theorem is equivalent to

If $P(A) \neq 0$ and $P(B) \neq 0$ and $$P(B|A) \geq P(B), \tag{1}$$ then $$P(A|B) \geq P(A).\tag{2}$$

because you can just swap $A$ and $B$ in the theorem to get the direction (2) $\Rightarrow$ (1).
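
As a sanity check on the symmetry, here is a minimal sketch that brute-forces every pair of nonempty events on a small sample space and confirms that (1) holds exactly when (2) holds. The sample space (one roll of a fair six-sided die) and the helper names `prob`, `cond`, and `nonempty_events` are assumptions chosen for illustration; exact fractions are used so that no floating-point rounding can blur a borderline case.

```python
from fractions import Fraction
from itertools import combinations

# Assumed sample space: one roll of a fair six-sided die (uniform measure).
OMEGA = frozenset(range(1, 7))


def prob(event):
    """P(event) under the uniform measure on OMEGA."""
    return Fraction(len(event), len(OMEGA))


def cond(x, given):
    """P(x | given); requires P(given) > 0."""
    return prob(x & given) / prob(given)


def nonempty_events():
    """Yield every nonempty subset of OMEGA, so each has P > 0."""
    items = sorted(OMEGA)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


events = list(nonempty_events())

# Collect any pair where condition (1) and condition (2) disagree.
mismatches = [
    (a, b)
    for a in events
    for b in events
    if (cond(b, a) >= prob(b)) != (cond(a, b) >= prob(a))
]

print(len(events), len(mismatches))  # 63 events, 0 mismatches
```

For instance, with $A = \{2, 4, 6\}$ and $B = \{4, 5, 6\}$ we get $P(A \cap B) = 1/3 \ge 1/4 = P(A)\cdot P(B)$, and both conditional probabilities equal $2/3 \ge 1/2$.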


Real life intuition (aka Bayesian inference):

Consider an event $B$ that is more likely when you know $A$ happens than when you don't know whether $A$ happens. Mathematically, this is condition (1). Now if you observe $B$, it is actually evidence for $A$, which mathematically means (2).

Example:

You know that rigged dice tend to give $6$s more often than fair dice. You pick up a die you find somewhere. As you haven't encountered many significantly rigged dice in your life, but have seen plenty of nearly fair ones, you initially think that your die is likely close to fair. You throw the die 20 times and observe 20 sixes. What do you deduce? You start thinking that the die is very probably rigged.

Mathematically: Let $A$ be the event that the die is significantly rigged, and $B$ be the event that you get 20 sixes in a row. The fact that rigged dice give more sixes than fair dice means $P(B|A)>P(B)$. Before observing the throws, you think the probability that the die is rigged is $P(A)$, which is very small. But once you see the throws, you think that the probability is $P(A|B)$. The new probability is a lot larger, which means that $P(A|B)>P(A)$. The fact that the new probability is a lot larger was based on your knowledge that $P(B|A)>P(B)$.
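
To put numbers on the dice example, here is a small sketch of the Bayesian update. The specific values are made up for illustration and do not come from the argument above: a prior $P(A) = 1/100$ that the die is rigged, and a rigged die that shows a six with probability $1/2$. The function name `update` is likewise hypothetical.

```python
from fractions import Fraction


def update(prior_a, p_b_given_a, p_b_given_not_a):
    """Return (P(B), P(A|B)) via the law of total probability and Bayes' rule."""
    p_b = prior_a * p_b_given_a + (1 - prior_a) * p_b_given_not_a
    p_a_given_b = prior_a * p_b_given_a / p_b
    return p_b, p_a_given_b


# Illustrative assumptions, not taken from the original text.
prior_rigged = Fraction(1, 100)   # P(A): die is significantly rigged
p_six_rigged = Fraction(1, 2)     # chance of a six on a rigged die
p_six_fair = Fraction(1, 6)       # chance of a six on a fair die
n_throws = 20

# B = "20 sixes in a row"; throws are assumed independent given the die.
p_b_given_a = p_six_rigged ** n_throws
p_b_given_not_a = p_six_fair ** n_throws

p_b, p_a_given_b = update(prior_rigged, p_b_given_a, p_b_given_not_a)

print("P(B|A) > P(B):", p_b_given_a > p_b)
print("P(A|B) > P(A):", p_a_given_b > prior_rigged)
print("P(A|B) as a float:", float(p_a_given_b))
```

Under these assumed numbers, both inequalities hold and the posterior $P(A|B)$ ends up very close to $1$, even though the prior was only $1/100$.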