Probability Refresher

We can display the joint frequency distribution of two discrete random variables as a table, which offers a straightforward way to calculate probabilities from observed data (and gives a better intuition for what is going on). These tables are also known as cross-tabs (or “cross-tabulation tables”).

We can calculate the “marginal probabilities” by summing the counts across each row and column, reporting those sums in the margins of the table (hence “marginal”), and dividing them by the grand total.

Key Idea: Probability at the end of the day is just counting with style.

                      Previous Drug Use   No Previous Drug Use   Total
Voted (2016)                        344                    456     800
Did Not Vote (2016)                 566                    132     698
Total                               910                    588    1498

  1. What is the probability that someone voted in 2016?

\[ Pr(\text{Voted}) = \frac{800}{1498} \approx .53\]

  2. What is the probability that someone voted AND reported previous drug use?

\[ Pr(\text{Voted} \cap \text{Drug Use}) = \frac{344}{1498} \approx .23 \]

  3. What is the probability that someone voted OR reported no previous drug use?

\[ Pr(\text{Voted} \cup \text{No Drug Use}) = \frac{800}{1498} + \frac{588}{1498} - \frac{456}{1498} \approx .62 \]

  4. What is the probability that someone voted GIVEN that they reported previous drug use?

\[ Pr(\text{Voted} | \text{Drug Use}) = \frac{344}{910} \approx .38 \]

\[ Pr(\text{Voted} | \text{Drug Use}) = \frac{Pr(\text{Voted} \cap \text{Drug Use})}{Pr(\text{Drug Use})} = \frac{\frac{344}{1498}}{\frac{910}{1498}} \approx .38 \]
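The four answers above can be reproduced by plain counting. The following is a minimal sketch that assumes only the cell counts from the table; the dictionary keys and variable names are illustrative and do not come from the course data.

```python
# Cell counts copied from the table above: (vote status, drug-use status) -> n.
counts = {
    ("voted", "drugs"): 344,
    ("voted", "no_drugs"): 456,
    ("not_voted", "drugs"): 566,
    ("not_voted", "no_drugs"): 132,
}

# Marginal counts are sums over rows or columns of the table.
total = sum(counts.values())
voted = counts[("voted", "drugs")] + counts[("voted", "no_drugs")]
drugs = counts[("voted", "drugs")] + counts[("not_voted", "drugs")]
no_drugs = total - drugs

# 1. Marginal probability of voting.
pr_voted = voted / total
# 2. Joint probability: voted AND previous drug use.
pr_voted_and_drugs = counts[("voted", "drugs")] / total
# 3. Union: voted OR no previous drug use (subtract the overlap once).
pr_voted_or_no_drugs = (voted + no_drugs - counts[("voted", "no_drugs")]) / total
# 4. Conditional: voted GIVEN previous drug use (restrict the denominator).
pr_voted_given_drugs = counts[("voted", "drugs")] / drugs

for name, value in [
    ("Pr(Voted)", pr_voted),
    ("Pr(Voted and Drug Use)", pr_voted_and_drugs),
    ("Pr(Voted or No Drug Use)", pr_voted_or_no_drugs),
    ("Pr(Voted | Drug Use)", pr_voted_given_drugs),
]:
    print(f"{name}: {value:.2f}")
```

Note that only question 4 changes the denominator: conditioning replaces the grand total with the size of the conditioning group.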

Conditional Probability as Aggregating and Subsetting Data

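In terms of data, calculating a conditional probability means subsetting the data to the rows where the conditioning event holds and then aggregating the outcome of interest within that subset.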

Calculating Conditional Probability

.rename({0.0:"Pr(Voted =  0 | Drugs == 1)",
         1.0:"Pr(Voted =  1 | Drugs == 1)"}) 
##                          voted  drugs
## 0  Pr(Voted =  0 | Drugs == 1)  566.0
## 1  Pr(Voted =  1 | Drugs == 1)  344.0
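As a self-contained illustration of the subset-then-aggregate idea, the sketch below rebuilds a stand-in for `dat` from the table's cell counts. This reconstruction is an assumption (the original data file is not shown in these notes), so only the column names `voted` and `drugs` are taken from the source.

```python
import pandas as pd

# Hypothetical stand-in for `dat`: one row per respondent, expanded from the
# cell counts in the table. Each tuple is (voted, drugs, number of respondents).
cells = [(1, 1, 344), (1, 0, 456), (0, 1, 566), (0, 0, 132)]
dat = pd.DataFrame(
    [(v, d) for v, d, n in cells for _ in range(n)],
    columns=["voted", "drugs"],
)

# Subset: keep only respondents who reported previous drug use.
subset = dat.query("drugs == 1")

# Aggregate: the share of each `voted` value within that subset,
# i.e. Pr(Voted = 0 | Drugs = 1) and Pr(Voted = 1 | Drugs = 1).
cond = subset["voted"].value_counts(normalize=True).sort_index()
print(cond.round(2))
```

Using `value_counts(normalize=True)` returns proportions rather than raw counts, so the two values sum to 1 within the subset.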

Use cross-tabs to build out the contingency tables with the marginal counts.

## drugs  0.0  1.0   All
## voted                
## 0.0    132  566   698
## 1.0    456  344   800
## All    588  910  1498
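A table like this one can be produced with `pd.crosstab`. The sketch below again assumes a stand-in `dat` rebuilt from the cell counts (a hypothetical reconstruction, not the course data) and shows how the `margins` and `normalize` arguments give counts, joint probabilities, and conditional probabilities.

```python
import pandas as pd

# Hypothetical stand-in for `dat`, expanded from the table's cell counts.
cells = [(1, 1, 344), (1, 0, 456), (0, 1, 566), (0, 0, 132)]
dat = pd.DataFrame(
    [(v, d) for v, d, n in cells for _ in range(n)],
    columns=["voted", "drugs"],
)

# Counts, with marginal totals in the "All" row and column.
tab = pd.crosstab(dat["voted"], dat["drugs"], margins=True)

# Joint and marginal probabilities: every count divided by the grand total.
joint = pd.crosstab(dat["voted"], dat["drugs"], margins=True, normalize="all")

# Conditional probabilities Pr(voted | drugs): each column sums to 1.
cond = pd.crosstab(dat["voted"], dat["drugs"], normalize="columns")

print(tab)
print(joint.round(2))
print(cond.round(2))
```

Normalizing over `"columns"` conditions on `drugs`; normalizing over `"index"` would instead condition on `voted`.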

With the above in mind, calculating probabilities is straightforward.

For example, let’s consider question 4 again: What is the probability someone voted GIVEN reporting previous drug use?

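# Number of respondents who reported previous drug use
total_drugs = dat.query("drugs == 1").shape[0]
# Of those, the number who also voted
total_voted_given_drugs = dat.query("drugs == 1 & voted == 1").shape[0]
# Pr(Voted | Drug Use)
pr = total_voted_given_drugs / total_drugs
round(pr, 2)
## 0.38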

Bayes' Theorem

The joint probability of \(A\) and \(B\) can be factored in two ways, since both \(Pr(B | A)Pr(A)\) and \(Pr(A | B)Pr(B)\) equal \(Pr(A \cap B)\):

\[Pr(B | A)Pr(A) = Pr(A|B)Pr(B)\]

Dividing both sides by \(Pr(A)\), a conditional probability can be expressed as:

\[\begin{equation} Pr(B | A) = \frac{Pr(A|B)Pr(B)}{Pr(A)} \end{equation}\]

Bayes' rule (or Bayes' theorem) is the same identity solved for the other conditional probability:

\[\begin{equation} Pr(A | B) = \frac{Pr(B|A)Pr(A)}{Pr(B)} \end{equation}\]

This is useful when \(Pr(B | A)\) is easier to calculate than \(Pr(A | B)\) (or vice versa) or when the joint probability is unknown.

Looking at the above equation, we might not know \(Pr(B)\). However, we can calculate it from information that we do have, using the law of total probability.

\[ Pr(B) = Pr(B | A)Pr(A) + Pr(B | A^{not})Pr(A^{not}) \]


  • \(Pr(A^{not}) = 1 - Pr(A)\)
  • \(Pr(B | A^{not}) = 1 - Pr(B^{not}|A^{not})\)
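As a quick numerical check of this decomposition, the sketch below plugs in the counts from the voting table, taking \(A\) to be "reported previous drug use" and \(B\) to be "voted". That mapping is chosen here for illustration. Exact fractions are used so the result can be compared with the table's margin without rounding.

```python
from fractions import Fraction

# A = reported previous drug use, B = voted (counts from the table above).
pr_A = Fraction(910, 1498)
pr_B_given_A = Fraction(344, 910)
pr_notB_given_notA = Fraction(132, 588)  # did not vote, given no drug use

# The two complement rules listed above.
pr_notA = 1 - pr_A
pr_B_given_notA = 1 - pr_notB_given_notA

# Law of total probability: weight each conditional by how likely its case is.
pr_B = pr_B_given_A * pr_A + pr_B_given_notA * pr_notA

print(pr_B, float(pr_B))
```

The result equals 800/1498, the marginal probability of voting read directly from the table.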

Substituting this expression for \(Pr(B)\) gives the expanded form of Bayes' theorem.

\[\begin{equation} Pr(A | B) = \frac{Pr(B|A)Pr(A)}{Pr(B | A)Pr(A) + Pr(B | A^{not})Pr(A^{not})} \end{equation}\]


  • \(Pr(B|A)\) is the likelihood of event \(B\) given \(A\).
  • \(Pr(A)\) is the prior probability of event \(A\) (i.e. our belief about how likely \(A\) is before seeing the data).
  • \(Pr(B)\) or \(Pr(B | A)Pr(A) + Pr(B | A^{not})Pr(A^{not})\) is a normalizing constant (it ensures the posterior probabilities sum to 1).
  • \(Pr(A|B)\) is known as the posterior probability: the updated probability of event \(A\) after learning from the data that \(B\) occurred.
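The expanded formula translates directly into a small function. This is a minimal sketch: the function name `bayes_posterior` and its argument names are hypothetical, and the example values come from the voting table with \(A\) taken as "reported previous drug use" and \(B\) as "voted".

```python
def bayes_posterior(likelihood, prior, likelihood_not):
    """Return Pr(A | B) given Pr(B | A), Pr(A), and Pr(B | A^not)."""
    if not all(0.0 <= p <= 1.0 for p in (likelihood, prior, likelihood_not)):
        raise ValueError("all inputs must be probabilities in [0, 1]")
    # Normalizing constant Pr(B), from the law of total probability.
    evidence = likelihood * prior + likelihood_not * (1 - prior)
    if evidence == 0:
        raise ValueError("Pr(B) is zero, so Pr(A | B) is undefined")
    return likelihood * prior / evidence


# Pr(Drug Use | Voted) from Pr(Voted | Drug Use), Pr(Drug Use),
# and Pr(Voted | No Drug Use).
posterior = bayes_posterior(
    likelihood=344 / 910,
    prior=910 / 1498,
    likelihood_not=456 / 588,
)
print(round(posterior, 2))
```

The answer can be checked against the table by direct counting: 344 of the 800 voters reported previous drug use.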

Put simply,

\[\text{Posterior} \propto \text{Likelihood}\times \text{Prior}\]
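To see the proportionality at work, one can multiply likelihood by prior for every competing hypothesis and then rescale so the results sum to 1. The sketch below does this for the two hypotheses "previous drug use" and "no previous drug use", with "voted" as the observed data. The numbers are taken from the table, and the list layout is an illustrative choice.

```python
# Hypotheses, in order: A = previous drug use, A^not = no previous drug use.
prior = [910 / 1498, 588 / 1498]
likelihood = [344 / 910, 456 / 588]  # Pr(Voted | each hypothesis)

# Likelihood x prior is only proportional to the posterior...
unnormalized = [lik * pri for lik, pri in zip(likelihood, prior)]

# ...until it is divided by the normalizing constant, Pr(Voted).
evidence = sum(unnormalized)
posterior = [u / evidence for u in unnormalized]

print([round(p, 2) for p in posterior])
```

The normalizing constant is the same for every hypothesis, which is why it can be dropped when only the relative sizes of the posteriors matter.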


The following materials were generated for students enrolled in PPOL564. Please do not distribute without permission.