Conditional Probability & Bayes Rule

Conditional Probabilities

Let us consider a probability measure P: \mathcal{A} \rightarrow \mathbb{R} on a measurable space (\Omega, \mathcal{A}). Further, let A, B\in \mathcal{A}; these assumptions hold throughout this post.

Venn diagram of a possible constellation of the sets A and B

Let us directly start with the formal definition of a conditional probability. Illustrations and explanations follow immediately afterwards.

Definition (Conditional Probability)
Let (\Omega, \mathcal{A}, P) be a probability space and P(B)>0. The real value

P(A|B) := \frac{P(A \cap B)}{P(B)}

is the probability of A given that B has occurred. Here, P(A \cap B) is the probability that both events A and B occur, and B serves as the new basic set: the conditional measure assigns probability zero to \Omega \setminus B.


A conditional probability, denoted by P(A|B), is a probability measure of an event A occurring, given that another event B has already occurred. That is, P(A|B) reflects the probability that both events A and B occur relative to the new basic set B.

The objective of P(A|B) is two-fold:

  1. Determine the probability of A \in \mathcal{A} while
  2. Considering that B\in \mathcal{A} has already occurred.

The last bullet point 2. actually means that, under the conditional measure, B is certain, i.e. P(B|B)=1, since we know (by assumption, presumption, assertion or evidence) that B has occurred. In particular, B cannot be a null set since P(B)>0. Due to the additivity of a probability measure, the conditional measure assigns probability zero to \Omega \setminus B as (\Omega\setminus B)\cup B=\Omega. The knowledge about B might be interpreted as an additional piece of information that we have received over time.
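The definition can be checked directly on a small finite sample space. The following is a minimal sketch assuming the uniform measure on a fair die; the particular choices of A and B are illustrative and not taken from the text:

```python
from fractions import Fraction

def cond_prob(A, B, omega):
    """P(A|B) = P(A ∩ B) / P(B) under the uniform measure on omega."""
    p = lambda event: Fraction(len(event & omega), len(omega))
    assert p(B) > 0, "conditioning event must have positive probability"
    return p(A & B) / p(B)

omega = set(range(1, 7))          # fair die
A = {2, 4, 6}                     # "even number"
B = {4, 5, 6}                     # "at least four"

print(cond_prob(A, B, omega))     # 2/3: two of the three outcomes in B are even
```

Restricting attention to B shrinks the basic set from six outcomes to three, of which two lie in A, so the conditional probability is 2/3 rather than the unconditional 1/2.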

The following examples are going to illustrate this very basic concept.

Example (Default Rates)
Let us assume that A is the event that a company defaults, and B is the event that the company is located in Germany. Let us further assume that the average probability of default worldwide equals P(A)=0.021. If we restrict the population to companies located in Germany, our estimate can be updated by this knowledge. For instance, we could state that P(A | B)=0.0141.

The figures in this example are motivated by the average default rates stated in the latest S&P 2018 Annual Global Corporate Default And Rating Transition Study and the 2018 Creditreform default study of German companies.


Example (Urn)
An urn contains 3 white and 3 black balls. Two balls are drawn successively without replacement, i.e. the first ball is not put back into the urn. We are interested in the event

A:= \{ “white ball in the second draw” \}

The probability of A obviously depends on the result of the first draw. We distinguish two cases as follows.

  1. B_w := \{ “First draw results in a white ball” \}
    3 black and 2 white balls are left after the first draw. Hence, we have P(A|B_w)=\frac{\frac{3}{6} \cdot  \frac{2}{5} }{ \frac{3}{6} }= \frac{2}{5}.
  2. B_b := \{ “First draw results in a black ball” \}
    2 black and 3 white balls are left after the first draw. Hence, we have P(A|B_b)=\frac{\frac{3}{6} \cdot  \frac{3}{5}}{\frac{3}{6} }= \frac{3}{5}.

Notice that P(B_b)=P(B_w)= \frac{1}{2}. In addition, please realize that A and B_w (resp. B_b) are not independent: since we have not put the first ball back into the urn, the outcome of the first draw changes the composition of the urn for the second draw.
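The two conditional probabilities of the urn example can also be estimated by simulation. The sketch below draws two balls without replacement many times and estimates P(A|B_w) and P(A|B_b) by their relative frequencies; the trial count and seed are arbitrary choices:

```python
import random

def urn_conditionals(n_trials=100_000, seed=0):
    """Estimate P(second white | first white) and P(second white | first black)
    for an urn with 3 white and 3 black balls, drawn without replacement."""
    rng = random.Random(seed)
    first_white = second_white_after_white = 0
    first_black = second_white_after_black = 0
    for _ in range(n_trials):
        urn = ["w"] * 3 + ["b"] * 3
        rng.shuffle(urn)                  # a random order of the six balls
        first, second = urn[0], urn[1]    # the two successive draws
        if first == "w":
            first_white += 1
            second_white_after_white += (second == "w")
        else:
            first_black += 1
            second_white_after_black += (second == "w")
    return (second_white_after_white / first_white,
            second_white_after_black / first_black)

p_w, p_b = urn_conditionals()
print(p_w, p_b)   # ≈ 2/5 and 3/5, matching the computed values
```

The estimates cluster around 0.4 and 0.6, in line with P(A|B_w)=\frac{2}{5} and P(A|B_b)=\frac{3}{5}.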


Let us consider the probability measure derived from the conditional probability in more detail.

Let (\Omega, \mathcal{A}, P) be a probability space, A, B\in \mathcal{A} and P(B)>0. The map

A \mapsto P(A|B) =  \frac{P(A \cap B)}{P(B)}

defines a probability measure on \mathcal{A}.

Apparently, P(A|B)\geq 0 since P(A\cap B)\geq 0 and P(B)>0 for all A\in \mathcal{A}. Further, P(\Omega|B)= \frac{P(\Omega\cap B)}{P(B)}= \frac{P(B)}{P(B)}=1. The \sigma-additivity follows for pairwise disjoint A_1, A_2, \ldots \in \mathcal{A} by

    \begin{align*}P(\sum_{i=1}^{\infty}{A_i} | B) &=  \frac{ P(\sum_{i=1}^{\infty}{A_i \cap B}) }{ P(B) } \\ &=  \frac{  \sum_{i=1}^{\infty} { P(A_i \cap B) } }{ P(B) } \\ &=   \sum_{i=1}^{\infty}{ P(A_i | B)}. \end{align*}
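The three defining properties of a probability measure can be verified mechanically on a finite sample space. The following sketch checks non-negativity, normalization, and additivity of A \mapsto P(A|B) for a fair die; the sets B and the partition below are illustrative choices:

```python
from fractions import Fraction
from itertools import combinations

omega = set(range(1, 7))                  # fair die
B = {4, 5, 6}                             # conditioning event with P(B) > 0
p = lambda E: Fraction(len(E & omega), len(omega))
cond = lambda A: p(A & B) / p(B)          # the map A ↦ P(A|B)

# Non-negativity: P(A|B) >= 0 for every event A
assert all(cond(set(E)) >= 0
           for r in range(len(omega) + 1)
           for E in combinations(omega, r))

# Normalization: P(Ω|B) = 1
assert cond(omega) == 1

# Additivity over a pairwise disjoint decomposition of Ω
partition = [{1, 2}, {3, 4}, {5, 6}]
assert sum(cond(A) for A in partition) == cond(omega) == 1
```

On a finite space, finite additivity over every disjoint decomposition already implies \sigma-additivity, since only finitely many of the A_i can be non-empty.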


Bayes Rule

As outlined in the last section of this post, the conditional probability P(A|B) is the probability that both events A and B occur relative to the new basic set B. Let us transform the conditional probability formula as follows:

    \begin{align*}P(A|B) &= \frac{P(A \cap B)}{P(B)} \\\Leftrightarrow P(A|B) P(B) &= P(A\cap B).\end{align*}

Notice that

    \begin{align*}P(B|A) &= \frac{P(B \cap A)}{P(A)} \\\Leftrightarrow P(B|A) P(A) &= P(A\cap B).\end{align*}

Hence, we can conclude that

(1)   \begin{align*}P(B|A) P(A) &= P(A|B) P(B)\\\Leftrightarrow P(A|B) &= \frac{P(A) P(B|A)}{P(B)}.\end{align*}

Formula (1) is also called Bayes’ Rule or Bayes’ Theorem.
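Formula (1) can be confirmed numerically on the same fair-die setup used above; the sets A and B are again illustrative choices:

```python
from fractions import Fraction

omega = set(range(1, 7))                 # fair die
p = lambda E: Fraction(len(E & omega), len(omega))
A = {2, 4, 6}                            # "even number"
B = {4, 5, 6}                            # "at least four"

p_A_given_B = p(A & B) / p(B)            # direct computation
p_B_given_A = p(B & A) / p(A)

# Bayes' rule: P(A|B) = P(A) P(B|A) / P(B)
assert p_A_given_B == p(A) * p_B_given_A / p(B)
print(p_A_given_B)                       # 2/3
```

Both sides of (1) equal P(A \cap B), which is why swapping the roles of A and B only requires rescaling by P(A) and P(B).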