In the previous tutorial, we reviewed some basic concepts in probability. In this tutorial, we are going to talk about **probability distributions** and **random variables**. We will also discuss some probability distributions that commonly appear in real-world datasets and problems.

# Probability Distributions and Random Variables

Variables that can take different values randomly are called **random variables**. If a variable can take only a countable set of values, it is called a **discrete random variable**. For instance, the number of heads that occur in a series of coin tosses (say, 15 tosses) is a discrete random variable: it can take any whole-number value from 0 to 15. Similarly, if a variable can take any value in a continuous range, it is called a **continuous random variable**. For example, the time taken by a radioactive particle to decay is a continuous random variable, because it can take infinitely many possible values that cannot be enumerated in advance.

As discussed above, a random variable can randomly take on different values. But are all of those values equally likely, or is the random variable more likely to take some values than others? To answer this we use a **probability distribution**, which describes how likely a random variable is to take on each of its possible values.

## Probability Mass Function

If the random variable is discrete, we use a Probability Mass Function (PMF) to describe its probability distribution. The figure below shows the PMF for an experiment in which two six-sided dice are thrown simultaneously. The **x** axis shows the sum of the numbers on the two dice, and the **y** axis shows the probability of that sum. For example, the probability that the sum is 2 is 1/36, because only one of the 36 equally likely outcomes (1 and 1) produces it.
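The two-dice PMF can be reproduced by enumerating all 36 outcomes. The following is a minimal sketch that assumes fair, independent dice; the names `counts` and `pmf` are illustrative choices, not part of the original tutorial.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair six-sided dice
# and count how often each sum occurs.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# The PMF maps each possible sum (2..12) to its exact probability.
pmf = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total, prob in pmf.items():
    print(total, prob)
```

Using `Fraction` keeps the probabilities exact, so it is easy to confirm that they add up to 1.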

## Probability Density Function

If the random variable is continuous, we use a Probability Density Function (PDF) to describe its probability distribution. The graph shown in the figure is the probability distribution of vehicle speed at a particular location on a highway. The x-axis represents the speed of a vehicle and the y-axis represents the probability density at that speed. From the graph, we can see that the density is highest at 65 km/hr, so speeds close to 65 km/hr are the most likely. Note that for a continuous variable the probability of any single exact value is zero; the probability that the speed falls within a range is the area under the curve over that range. The total area under the curve is the total probability over all speeds, which is 1.
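To make the "area under the curve" idea concrete, here is a rough sketch that integrates a density numerically. It assumes, purely for illustration, that vehicle speed follows a bell-shaped curve peaking at 65 km/hr with a spread of 10 km/hr; these numbers and the helper names `speed_density` and `area` are hypothetical and are not taken from the figure.

```python
import math

MEAN_SPEED = 65.0  # km/hr, hypothetical location of the peak
STD_SPEED = 10.0   # km/hr, hypothetical spread

def speed_density(v):
    """Assumed bell-shaped density of vehicle speed (not fitted to real data)."""
    z = (v - MEAN_SPEED) / STD_SPEED
    return math.exp(-0.5 * z * z) / (STD_SPEED * math.sqrt(2 * math.pi))

def area(f, lo, hi, steps=10_000):
    """Approximate the integral of f over [lo, hi] with the trapezoidal rule."""
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * width)
    return total * width

# Probability that the speed lies between 60 and 70 km/hr.
prob_60_to_70 = area(speed_density, 60, 70)

# Area over a range wide enough to capture essentially all of the curve.
total_area = area(speed_density, -35, 165)

print(prob_60_to_70, total_area)
```

The density value at a single speed is not itself a probability; only areas such as `prob_60_to_70` are.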

# Expectation or Expected Value

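The expectation E of a random variable **x** is the average value of the random variable when it is sampled from its probability distribution **P(x)**. Each possible value is weighted by how likely it is. Mathematically, it is given as:

$$E[x] = \sum_{x} x \, P(x)$$

for a discrete random variable, and

$$E[x] = \int x \, P(x) \, dx$$

for a continuous random variable, where **P(x)** is the probability density.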
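As a small illustration of expectation as a probability-weighted average, the sketch below computes it for a fair die (discrete case) and approximates it numerically for a constant density on the interval from 0 to 10 (continuous case). The function names and the two example distributions are assumptions made for this example.

```python
from fractions import Fraction

def expectation(pmf):
    """Expected value of a discrete variable given as {value: probability}."""
    return sum(value * prob for value, prob in pmf.items())

def expectation_continuous(pdf, lo, hi, steps=10_000):
    """Approximate the integral of x * pdf(x) over [lo, hi] (midpoint rule)."""
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width
        total += x * pdf(x)
    return total * width

# Discrete case: a fair six-sided die.
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
expected_face = expectation(fair_die)

# Continuous case: constant density 1/10 on [0, 10].
expected_uniform = expectation_continuous(lambda x: 0.1, 0.0, 10.0)

print(expected_face, expected_uniform)
```

The die never actually shows its expected value; the expectation is a long-run average, not a value the variable must take.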

# Common Probability Distributions and Random Variables

## Discrete Probability Distributions / Random Variables

**Bernoulli**

A Bernoulli random variable is used when an experiment results in either success or failure, with success represented as 1 and failure as 0. If **p** is the probability of success, the probability mass function of a Bernoulli variable **x** is given as:

$$P(x) = p^{x} (1 - p)^{1 - x}, \quad x \in \{0, 1\}$$
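A Bernoulli variable is simple enough to write out directly. The sketch below evaluates the standard Bernoulli PMF and draws samples with Python's `random` module; the success probability 0.3, the seed, and the function names are arbitrary choices for illustration.

```python
import random

def bernoulli_pmf(x, p):
    """P(x) for a Bernoulli variable: p if x == 1, 1 - p if x == 0."""
    if x not in (0, 1):
        return 0.0
    return p**x * (1 - p) ** (1 - x)

def bernoulli_sample(p, rng):
    """Return 1 (success) with probability p, otherwise 0 (failure)."""
    return 1 if rng.random() < p else 0

rng = random.Random(0)  # fixed seed so the run is repeatable
p = 0.3                 # hypothetical probability of success
samples = [bernoulli_sample(p, rng) for _ in range(100_000)]

# The fraction of successes should be close to p.
success_rate = sum(samples) / len(samples)
print(bernoulli_pmf(1, p), bernoulli_pmf(0, p), success_rate)
```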

**Binomial**

This distribution is used when we want to count the number of successes in **n** independent repetitions of the same experiment. The probability mass function of a binomial variable **x**, where **p** is the probability of success in a single repetition, is given as:

$$P(x) = \binom{n}{x} p^{x} (1 - p)^{n - x}, \quad x = 0, 1, \dots, n$$
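The coin-toss example from earlier (15 tosses) is a binomial variable. The sketch below evaluates the standard binomial PMF with `math.comb` (available from Python 3.8); the assumption of a fair coin (p = 0.5) and the function name are illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 15, 0.5  # 15 tosses of a coin assumed to be fair
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

for heads, prob in enumerate(pmf):
    print(heads, round(prob, 5))
```

With a fair coin the distribution is symmetric, so 7 heads and 8 heads are equally likely and are the most probable counts.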

**Poisson**

A Poisson variable **x** is used when we are counting the number of occurrences of an event in a unit of time, where the occurrences are independent and rarely simultaneous. For an average number of occurrences **λ** per unit of time, the probability mass function is given as:

$$P(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots$$
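The following sketch evaluates the standard Poisson PMF and checks two of its properties numerically: the probabilities sum to 1, and the mean equals λ. The rate of 3 events per unit time is a made-up value, and the sum is truncated at 50 terms, beyond which the remaining probability is negligible for this rate.

```python
import math

def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the average count is lam."""
    if x < 0:
        return 0.0
    return lam**x * math.exp(-lam) / math.factorial(x)

lam = 3.0  # hypothetical average number of occurrences per unit time

# Truncate the infinite support at 50; the tail beyond that is negligible here.
pmf = [poisson_pmf(x, lam) for x in range(50)]
total = sum(pmf)
mean = sum(x * prob for x, prob in enumerate(pmf))

print(total, mean)
```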

## Continuous Probability Distributions / Random Variables

**Uniform**

We use a uniform random variable **x** when the probability density is the same for every value in an interval (starting at **a** and ending at **b**). The probability density function of **x** is given as:

$$P(x) = \begin{cases} \dfrac{1}{b - a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}$$
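Because the density is constant, the probability of landing in any sub-interval depends only on its length. The sketch below checks this by sampling with `random.uniform`; the interval endpoints 2 and 6, the seed, and the variable names are arbitrary choices for illustration.

```python
import random

def uniform_pdf(x, a, b):
    """Constant density 1 / (b - a) inside [a, b], zero outside."""
    if a <= x <= b:
        return 1.0 / (b - a)
    return 0.0

a, b = 2.0, 6.0         # hypothetical interval
rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.uniform(a, b) for _ in range(100_000)]

# A sub-interval of length 1 inside an interval of length 4
# should capture about 1/4 of the samples.
fraction_in_3_4 = sum(3.0 <= s <= 4.0 for s in samples) / len(samples)

print(uniform_pdf(3.5, a, b), fraction_in_3_4)
```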

**Exponential**

An exponential variable **x** is used when we are measuring the time until the first occurrence of an event, where the occurrences in disjoint time intervals are independent and rarely simultaneous. If **a** is the average number of occurrences per unit time, the probability density function is given as:

$$P(x) = a \, e^{-a x}, \quad x \ge 0$$
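A useful consequence of this model is that the average waiting time is the reciprocal of the rate. The sketch below evaluates the standard exponential density and checks the average by sampling with `random.expovariate`, which takes the rate as its argument; the rate of 2 events per unit time and the seed are hypothetical.

```python
import math
import random

def exponential_pdf(x, rate):
    """Density of the waiting time until the first event."""
    if x < 0:
        return 0.0
    return rate * math.exp(-rate * x)

rate = 2.0              # hypothetical average number of occurrences per unit time
rng = random.Random(0)  # fixed seed so the run is repeatable
waits = [rng.expovariate(rate) for _ in range(100_000)]

# The average waiting time should be close to 1 / rate.
mean_wait = sum(waits) / len(waits)

print(exponential_pdf(0.0, rate), mean_wait)
```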

**Normal**

A normal random variable **x** comes in handy in many situations. We can use it to model physical measurements such as weight and height. We can also use it to model the errors made by measuring instruments. In general, it can be used when the mean **μ** and variance **σ²** of the quantity being measured are known. The probability density function is given as:

$$P(x) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\left( -\frac{(x - \mu)^{2}}{2 \sigma^{2}} \right)$$
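The sketch below writes out the standard normal density by hand and cross-checks it against `statistics.NormalDist` from the Python standard library (Python 3.8+). The mean of 170 and standard deviation of 8, loosely meant as heights in centimetres, are invented for illustration only.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 170.0, 8.0  # hypothetical mean and standard deviation
dist = NormalDist(mu, sigma)

# Probability of falling within one standard deviation of the mean.
within_one_sigma = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)

print(normal_pdf(175.0, mu, sigma), dist.pdf(175.0), within_one_sigma)
```

Roughly 68% of the probability lies within one standard deviation of the mean, regardless of the particular mean and variance chosen.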

# References

- Probability and Information Theory, Deep Learning, Ian Goodfellow: http://www.deeplearningbook.org/contents/prob.html
- Probability Theory Review for Machine Learning: https://see.stanford.edu/materials/aimlcs229/cs229-prob.pdf