# Introduction: What is Probability?

Probability mathematically describes how likely an event is to occur. It quantifies randomness and uncertainty. For example, probability tells us the chance of rain on a particular day, or of someone winning a lottery. The probability of an event always lies between 0 and 1 (inclusive), where 1 means the event is certain and 0 means it is impossible.

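Probability can be determined in two basic ways: theoretically and empirically. The theoretical (also called classical) method applies when all outcomes are equally likely, so a probability can be computed by counting outcomes; it is typically used for games of chance such as lotteries, roulette wheels and coin flips. The empirical (also called observational) method estimates a probability from data, as the fraction of many repeated trials or observations in which the event occurred. It is used when the probability can't be worked out in advance, for example the chance of rain on a given day.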
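To make the two methods concrete, here is a minimal Python sketch, assuming a fair six-faced die. The function names `theoretical_probability` and `empirical_probability` are hypothetical, chosen for illustration, and the simulated rolls stand in for real observations.

```python
import random
from fractions import Fraction

def theoretical_probability(event, sample_space):
    """Classical probability: favourable outcomes / total outcomes.
    Assumes every outcome in sample_space is equally likely."""
    favourable = sum(1 for outcome in sample_space if outcome in event)
    return Fraction(favourable, len(sample_space))

def empirical_probability(event, trials, seed=0):
    """Empirical probability: the fraction of simulated fair-die rolls
    (standing in for real observations) that land in the event."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) in event)
    return hits / trials

die = range(1, 7)
six = {6}
print(theoretical_probability(six, die))           # 1/6
print(empirical_probability(six, trials=100_000))  # close to 0.1667
```

With more trials, the empirical estimate tends to get closer to the theoretical value of 1/6.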

To describe probability mathematically, we need to define a few terms first:

**Sample Space**

The sample space is the set of all possible outcomes of a random experiment. For example, when throwing a six-faced die, the sample space is **S** = {1, 2, 3, 4, 5, 6}.

**Event Space**

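An event is a subset of the sample space, and the event space is the set of all events we can assign probabilities to. For a finite sample space such as **S**, the event space is usually taken to be every subset of **S**. For example, **A** = {1, 2} is an event in the event space of **S**. Here **A** denotes the event that the die throw results in a 1 or a 2.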

**Probability measure**

The probability measure **P** is a function that assigns each event a number between 0 and 1 measuring how likely it is. For example, if the die is fair, every face is equally likely, and event **A** happens when the throw results in a 1 or a 2, i.e. in 2 out of 6 equally likely outcomes. Hence, the probability of **A** is **P(A)** = 2/6 = 1/3.
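The three definitions can be written down directly for the die. This is a sketch assuming a fair die; `S`, `A`, `P` and `event_space` are illustrative names, and the event space is built as the set of all subsets of `S`.

```python
from fractions import Fraction
from itertools import chain, combinations

# Sample space for one throw of a six-faced die.
S = frozenset({1, 2, 3, 4, 5, 6})

# Event space: every subset of S (2**6 = 64 events for a finite S).
event_space = [
    frozenset(c)
    for c in chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))
]

# Event A: the throw results in a 1 or a 2.
A = frozenset({1, 2})

def P(event, sample_space=S):
    """Probability measure for a fair die (assumption): every outcome is
    equally likely, so P(event) = |event| / |sample space|."""
    if not event <= sample_space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))

print(len(event_space))  # 64
print(A in event_space)  # True
print(P(A))              # 1/3
print(P(S))              # 1, the certain event
print(P(frozenset()))    # 0, the impossible event
```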

# Probability of multiple variables

## Joint Probability

Joint probability is the probability of multiple events occurring at the same time. For example, say we care about two things about a person: their gender (male or female) and the length of their hair (long or short). Then P(male, short hair) is the probability that a person is male and has short hair.

## Conditional Probability

Conditional probability is the probability of an event given that some other event has already occurred. For example, the probability of a person being male given that they have long hair is written P(male | long hair). Note that this is different from P(long hair | male), the probability of having long hair given that a person is male.

## Marginal Probability

The probability of an event on its own, regardless of any other event, is called its marginal probability. For example, the probability of a person being male is P(male), no matter whether the person has short or long hair. It can be obtained from the joint probabilities by summing over the other variable: P(male) = P(male, short hair) + P(male, long hair).
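The three kinds of probability can be computed from a table of counts. The counts below are made up purely for illustration (a hypothetical survey of 100 people), and the function names are hypothetical; conditional probabilities are computed as joint divided by marginal.

```python
from fractions import Fraction

# Hypothetical survey of 100 people: made-up counts, for illustration only.
counts = {
    ("male", "short"): 40,
    ("male", "long"): 10,
    ("female", "short"): 15,
    ("female", "long"): 35,
}
total = sum(counts.values())

def joint(gender, hair):
    """P(gender, hair): both are true at the same time."""
    return Fraction(counts[(gender, hair)], total)

def marginal_gender(gender):
    """P(gender): sum the joint probability over every hair length."""
    return sum(joint(gender, hair) for hair in ("short", "long"))

def marginal_hair(hair):
    """P(hair): sum the joint probability over every gender."""
    return sum(joint(gender, hair) for gender in ("male", "female"))

def gender_given_hair(gender, hair):
    """P(gender | hair) = P(gender, hair) / P(hair)."""
    return joint(gender, hair) / marginal_hair(hair)

def hair_given_gender(hair, gender):
    """P(hair | gender) = P(gender, hair) / P(gender)."""
    return joint(gender, hair) / marginal_gender(gender)

print(joint("male", "short"))             # 2/5
print(marginal_gender("male"))            # 1/2
print(gender_given_hair("male", "long"))  # 2/9
print(hair_given_gender("long", "male"))  # 1/5
```

The last two lines show that P(male | long hair) and P(long hair | male) are generally different numbers.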

## Relation between joint, conditional and marginal probabilities

Joint, conditional and marginal probabilities satisfy the following relation:

$$P(A, B) = P(A)\,P(B \mid A)$$

That is, the probability of A and B both occurring equals the probability of A times the probability of B given A. For example,

$$P(\text{male}, \text{long hair}) = P(\text{male})\,P(\text{long hair} \mid \text{male})$$

That is, the probability of a person being male and having long hair equals the probability of the person being male times the probability of having long hair given that the person is male.
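The relation P(A, B) = P(A) P(B | A) can be checked numerically on the die. This sketch assumes a fair die and uses two illustrative events: A = "the throw is a 1 or a 2" and B = "the throw is even". `P` and `P_given` are hypothetical helper names; `P_given` counts outcomes inside the condition, which is only valid when outcomes are equally likely.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # fair six-faced die (assumption)
A = frozenset({1, 2})       # the throw is a 1 or a 2
B = frozenset({2, 4, 6})    # the throw is even

def P(event):
    """Fair-die probability: |event| / |S|."""
    return Fraction(len(event), len(S))

def P_given(event, condition):
    """P(event | condition) for equally likely outcomes:
    count only the outcomes that lie inside `condition`."""
    return Fraction(len(event & condition), len(condition))

joint = P(A & B)               # P(A, B): both A and B happen
print(joint)                   # 1/6
print(P(A) * P_given(B, A))    # 1/6, i.e. P(A) P(B | A)
print(P(B) * P_given(A, B))    # 1/6, the same relation with roles swapped
```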

# Bayes' Theorem

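Bayes' theorem relates the conditional and marginal probabilities of two events A and B. For P(B) > 0, it states:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$

Bayes' theorem follows from the relation between joint, marginal and conditional probabilities given above. Applying that relation in both orders gives

$$P(A)\,P(B \mid A) = P(A, B) = P(B)\,P(A \mid B),$$

and dividing both sides by P(B) yields the theorem.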
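Bayes' theorem, P(A | B) = P(B | A) P(A) / P(B), translates into a one-line function. The name `bayes` is hypothetical, and the check below reuses a fair die with illustrative events A = "the throw is a 1 or a 2" and B = "the throw is even".

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A | B) = P(B | A) * P(A) / P(B). Requires P(B) > 0."""
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return p_b_given_a * p_a / p_b

# Fair-die example (assumption): A = {1, 2}, B = {2, 4, 6}.
p_a = Fraction(1, 3)          # P(A) = 2/6
p_b = Fraction(1, 2)          # P(B) = 3/6
p_b_given_a = Fraction(1, 2)  # of the outcomes {1, 2}, only 2 is even
print(bayes(p_b_given_a, p_a, p_b))  # 1/3
```

The result matches direct counting: of the three even outcomes {2, 4, 6}, only 2 lies in A.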

## Example of Bayes' Theorem

To illustrate Bayes' theorem, let's work through an example.

Suppose a man is going to throw a die and tell us the result. Two out of three times, he reports the result honestly. The other one out of three times, he lies in the following way:

- if the die result is 6, he picks a number randomly (from 1 to 5) to lie
- if the die result is between 1 and 5, he says the result is 6.

He throws the die and reports that he obtained a 6. We don't get to see the die, but we want to know the probability that the result was actually a 6.

To solve the problem, let **A** be the event that the die throw resulted in a 6, and let **B** be the event that the man reports a 6. We would like to calculate P(A|B).

We know that,

- P(A) = 1/6, i.e. the probability that a 6 occurs is 1/6, since it is 1 of 6 equally likely outcomes.
- P(B|A) = 2/3, i.e. the probability that the man reports a 6 when it is a 6 is 2/3, since he tells the truth 2 out of 3 times.
- P(B), the probability that the man reports a 6, is slightly more complicated. We need to take into account what the man would do in each case, depending on whether the die throw was actually a 6 or not, and then add up the two cases:
  - The probability that the man reports a 6 when it actually is a 6 is P(B|A) = 2/3, and P(A) = 1/6.
  - Let A' be the event that the throw is not a 6. The probability that the man reports a 6 when it is not a 6 is P(B|A') = 1/3, because he lies 1 out of 3 times and, when the result is not a 6, his lie is always "6". Also, P(A') = 5/6.
  - Hence, P(B) = P(A)P(B|A) + P(A')P(B|A') = 1/6 × 2/3 + 5/6 × 1/3 = 2/18 + 5/18 = 7/18.
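The calculation of P(B) by adding up the two cases can be reproduced with exact fractions. The variable names are illustrative only.

```python
from fractions import Fraction

p_six = Fraction(1, 6)                   # P(A): the die shows a 6
p_not_six = 1 - p_six                    # P(A') = 5/6
p_report_given_six = Fraction(2, 3)      # P(B | A): he tells the truth
p_report_given_not_six = Fraction(1, 3)  # P(B | A'): he lies, and that lie is always "6"

# Add up both ways the man can end up saying "6".
p_report_six = p_six * p_report_given_six + p_not_six * p_report_given_not_six
print(p_report_six)  # 7/18
```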

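Now, we can apply Bayes' theorem:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{\frac{2}{3} \times \frac{1}{6}}{\frac{1}{6} \times \frac{2}{3} + \frac{5}{6} \times \frac{1}{3}} = \frac{\frac{1}{9}}{\frac{7}{18}} = \frac{2}{7}$$

That is, given that the man says the result was a 6, the probability that it actually was a 6 is only 2/7.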
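As a sanity check on the 2/7 answer, here is a Monte Carlo simulation that plays the game many times. It assumes a fair die and that the man's decision to lie is independent of the roll; the function name `simulate` and its `trials`/`seed` parameters are hypothetical. Among the simulated games where he reports a 6, it counts how often the die really showed a 6.

```python
import random

def simulate(trials=200_000, seed=1):
    """Estimate P(die shows 6 | man reports 6) by simulating the game.
    Assumes a fair die and that lying is independent of the roll."""
    rng = random.Random(seed)
    reports_six = 0
    actually_six = 0
    for _ in range(trials):
        roll = rng.randint(1, 6)
        honest = rng.random() < 2 / 3  # truthful 2 out of 3 times
        if honest:
            report = roll
        elif roll == 6:
            report = rng.randint(1, 5)  # lies with a random number from 1 to 5
        else:
            report = 6                  # lies by claiming a 6
        if report == 6:
            reports_six += 1
            if roll == 6:
                actually_six += 1
    return actually_six / reports_six

print(simulate())  # close to 2/7 ≈ 0.2857
```

The estimate fluctuates slightly with the seed and number of trials, but it should settle near the exact value from Bayes' theorem.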
