

Keshav Dhandhania

BSc, MSc 2014 @ MIT (AI, Deep Learning). Former Competitive Programmer.

Active In

Competitive Programming

68 replies. 64 discussions. Moderator

Deep Learning

33 replies. 105 discussions. Moderator

International Olympiad in Informatics

150 replies. 111 discussions. Moderator

Machine Learning

34 replies. 42 discussions. Moderator

Indian Computing Olympiad (ICO)

56 replies. 45 discussions. Moderator

Algorithms and Computer Programming

19 discussions. Moderator

HBO Westworld

46 replies. 3 discussions. Member

Artificial Intelligence

10 replies. 7 discussions. Moderator

Natural Language Processing

10 discussions. Moderator

Cryptography and Cyber Security

7 discussions. Moderator

Featured Contributions


tutorial

Machine Learning · 23w

Probability: Conditional and Marginal Probabilities, Bayes' Rule and Random Variables (Quick Review), by Wiki

Probability is used to mathematically describe the chance of occurrence of an event. It quantifies randomness and uncertainty. For example, probability tells us the chance of it raining on a particular day, or of someone winning a lottery. The probability that an event occurs is always between 0 and 1, where 1 represents absolute certainty and 0 represents complete impossibility.

Probability can be determined in two ways: theoretically and empirically. The theoretical (also called classical) method counts equally likely outcomes, and is used especially for games of chance like lotteries, roulette wheels and coin flips. The empirical (also called observational) method estimates probabilities from observed frequencies, and is used for events whose outcomes can't be derived by counting equally likely cases.

To describe probability mathematically, we need to define a few terms first:

**Sample Space**

Sample space is a collection of all possible outcomes of a random experiment. For example, in a six faced die throwing experiment, the sample space **S** = {1, 2, 3, 4, 5, 6}.

**Event Space**

An event is a subset of the sample space, and the event space is the collection of all such events. For example, **A** = {1, 2} is an event in the sample space **S**. Here **A** denotes the event that the die throw results in a 1 or 2.

**Probability measure**

The probability measure **P** is the function which gives the measure of how likely a certain event is. For example, the event **A** happens when the die throw results in 1 or 2, i.e. 2 out of 6 equally likely possibilities. Hence, the probability measure of **A** is **P(A)** = 2/6 = 1/3.
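As a quick illustration, here is a minimal Python sketch (using the sample space and event from the text) that computes this probability by counting outcomes:

```python
from fractions import Fraction

# Classical probability by counting equally likely outcomes: P(A) = |A| / |S|.
S = {1, 2, 3, 4, 5, 6}  # sample space of a fair die throw
A = {1, 2}              # event: the throw results in a 1 or 2

print(Fraction(len(A), len(S)))  # 1/3
```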

Joint probability denotes the probability of multiple events occurring at the same time. For example, let's say we care about two things about a person: their gender, male or female, and the length of their hair, long or short. P(male, short) is the probability that a person is male and has short hair.

Conditional probability denotes the probability of an event given that some other event has already occurred. For example, the probability of a person being male given that they have long hair is denoted P(male | long hair). Note that this is different from P(long hair | male), the probability of having long hair given that a person is male.

When we find the probability of a certain event irrespective of any other event, it is called a marginal probability. For example, the probability of a person being male is given by P(male); it doesn't matter whether the person has short or long hair.

Joint, conditional and marginal probabilities satisfy the following relation:

P(A,B) = P(B|A) \space P(A)

That is, the probability of A and B is the same as the probability of A times the probability of B given A. For example,

P(\text{male}, \text{long hair}) = P(\text{long hair}|\text{male}) \space P(\text{male})

That is, the probability of a person being male and having long hair equals the probability of a person being male times the probability of having long hair given that the person is male.
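To make the identity concrete, here is a small Python sketch with made-up counts for a hypothetical population of 100 people; the numbers are purely illustrative:

```python
from fractions import Fraction

# Made-up counts for a hypothetical population of 100 people,
# purely to illustrate the identity P(A, B) = P(B|A) * P(A).
counts = {
    ("male", "long"): 5, ("male", "short"): 45,
    ("female", "long"): 35, ("female", "short"): 15,
}
total = sum(counts.values())

P_male = Fraction(counts[("male", "long")] + counts[("male", "short")], total)
P_male_and_long = Fraction(counts[("male", "long")], total)
P_long_given_male = P_male_and_long / P_male

# the identity holds exactly
assert P_long_given_male * P_male == P_male_and_long
print(P_male, P_long_given_male, P_male_and_long)  # 1/2 1/10 1/20
```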

Bayes' Theorem is a relation between conditional and marginal probabilities of two variables. It's given as follows:

P(A|B)=\frac{P(B|A)P(A)}{P(B)}

Bayes' theorem can be derived from the relation between joint, marginal and conditional probabilities given above, since

P(A|B) \space P(B) = P(A,B) = P(B|A) \space P(A)

To illustrate Bayes' theorem, let's take an example.

Suppose a man is going to throw a die and tell us the result. 2 out of 3 times, he simply reports the result honestly. The other 1 out of 3 times, he lies in the following way:

- if the die result is 6, he picks a number randomly (from 1 to 5) to lie
- if the die result is between 1 and 5, he says the result is 6.

He throws a die and reports that the number he obtained is 6. We don't get to see the die, but we want to know the probability that the result was actually a 6.

To solve the problem, let **A** be the event that the die throw resulted in a 6, and let **B** be the event that the man reports a six. We would like to calculate P(A|B).

We know that,

- P(A) = 1/6, i.e. the probability that a six occurs is 1/6, since it is 1 out of 6 possible outcomes.
- P(B|A) = 2/3, i.e. the probability that the man reports a 6 when it is a 6 is 2/3, since he tells the truth 2 out of 3 times.
- P(B), the probability that the man reports a six, is slightly more complicated. We need to take into account what the man would do in each case, depending on whether the die throw result was actually a six or not, and then add it up.
- The probability that the man reports six when it is actually six is P(B|A) = 2/3. We know that P(A) = 1/6.
- The probability that the man reports six when it is not is P(B|A') = 1/3. Also, P(A') = 5/6.
- Hence, P(B) = 1/6 × 2/3 + 5/6 × 1/3 = 7/18 (verified in the snippet below).
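Here is a minimal Python sketch that reproduces this total-probability computation with exact fractions:

```python
from fractions import Fraction

# Exact total-probability computation for P(B), the man reporting a six.
P_A = Fraction(1, 6)              # die is actually a six
P_B_given_A = Fraction(2, 3)      # reports six when it is a six
P_B_given_not_A = Fraction(1, 3)  # reports six when it is not

P_B = P_B_given_A * P_A + P_B_given_not_A * (1 - P_A)
print(P_B)  # 7/18
```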

Now, we can use Bayes theorem:

P(A|B)=\frac{P(B|A)P(A)}{P(B)}

P(A|B)=\frac{\frac{1}{6}\times\frac{2}{3}}{\frac{1}{6}\times\frac{2}{3}+\frac{5}{6}\times\frac{1}{3}}=\frac{2}{7}
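As a sanity check, here is a small simulation sketch in plain Python that implements the reporting rule from the example and estimates P(A|B) empirically:

```python
import random

# Monte Carlo estimate of P(A|B) for the lying die-reporter example.
# We simulate many throws, apply the reporting rule from the text,
# and count how often the die was really a 6 among reported 6s.
random.seed(0)
reported_six = 0
true_six_among_reported = 0

for _ in range(1_000_000):
    die = random.randint(1, 6)
    if random.random() < 2 / 3:      # honest 2 out of 3 times
        report = die
    elif die == 6:                   # lying: a 6 becomes a random 1-5
        report = random.randint(1, 5)
    else:                            # lying: a non-6 is reported as 6
        report = 6
    if report == 6:
        reported_six += 1
        if die == 6:
            true_six_among_reported += 1

print(true_six_among_reported / reported_six)  # ~0.2857, i.e. 2/7
```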

Variables which can take different values randomly are called **random variables**. If the variable is discrete in nature, it is called a **discrete random variable**. For instance, the number of heads that occur in a series of coin tosses (say, 15 tosses) is a discrete random variable; it can take any whole-number value in the range 0 to 15. Similarly, if the variable is continuous in nature, it is called a **continuous random variable**. For example, the time taken by a radioactive particle to decay is a continuous random variable, as it can take any of infinitely many possible values.

As we discussed above, a random variable can randomly take on different values. But are all the values that a random variable can take equally likely, or does it take some values more often than others? To answer this we use a **probability distribution**, which describes how likely a random variable is to take on each of its possible values.

If the random variable is discrete in nature, we use a Probability Mass Function to describe its probability distribution. The figure given below is a graph constructed from an experiment in which two six-faced dice are thrown simultaneously. The **x** axis shows the sum of the numbers appearing on the two dice, and the **y** axis shows the probability of that sum occurring. For example, the probability of getting 2 as the sum is 1/36.
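The same distribution can be built by enumerating all 36 equally likely outcomes of the two dice; a short Python sketch:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# PMF of the sum of two fair dice, by enumerating all 36 outcomes.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {s: Fraction(c, 36) for s, c in sorted(counts.items())}

print(pmf[2])  # 1/36
print(pmf[7])  # 1/6, the most likely sum
```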

If the random variable is continuous in nature, we use a Probability Density Function to describe its probability distribution. The graph shown in the figure is the probability distribution of the speed of vehicles on a highway at a particular location. The x-axis represents the speed of a vehicle and the y-axis represents the probability density at that speed. From the graph, we can see that the density is highest at 65 km/h, i.e. it is the most likely speed. The total area under the curve is 1, the total probability of a vehicle moving at some speed at that location.

The expectation E of a random variable x is the average value of the random variable when it is sampled from its probability distribution **P(x)**. Mathematically, it's given as:

E[x] = \sum_x{xP(x)}

for a discrete random variable, and

E[x] = \int{xP(x)dx}

for a continuous random variable.
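For instance, the expectation of a single fair die throw works out to 3.5; a short Python check:

```python
from fractions import Fraction

# Expectation of a fair die throw: E[x] = sum over x of x * P(x).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
print(sum(x * p for x, p in pmf.items()))  # 7/2, i.e. 3.5
```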

**Bernoulli**

A Bernoulli random variable is used when an experiment results in either success or failure, where success is represented as 1 and failure as 0. If **p** is the probability of success, then the probability mass function of a Bernoulli variable **x** is given as:

P(x) = \begin{cases} p & \quad \text{if } x=1\\ 1-p & \quad \text{if } x=0 \end{cases}
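A minimal Python sketch of the Bernoulli PMF, with an assumed success probability p = 0.3 and a quick empirical check by sampling:

```python
import random

# Bernoulli PMF: P(x=1) = p, P(x=0) = 1 - p.
def bernoulli_pmf(x, p):
    return p if x == 1 else 1 - p

p = 0.3  # assumed success probability, for illustration
samples = [1 if random.random() < p else 0 for _ in range(100_000)]
print(bernoulli_pmf(1, p))          # 0.3
print(sum(samples) / len(samples))  # ~0.3, empirical frequency of success
```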

**Binomial**

This distribution is used when we want to count how many successes we have when we repeat an experiment **n** times independently. The probability mass function of a binomial variable **x**, where p is the probability of success, is given as:

P(x) = {n \choose x} p^x (1-p)^{ n-x}
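A short Python sketch of the binomial PMF, e.g. for counting heads in 15 fair coin tosses (the specific numbers are just for illustration):

```python
from math import comb

# Binomial PMF: probability of exactly x successes in n independent trials.
def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(binomial_pmf(7, 15, 0.5))                          # ~0.1964
print(sum(binomial_pmf(x, 15, 0.5) for x in range(16)))  # ~1.0, PMF sums to 1
```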

**Poisson**

A Poisson variable **x** is used when we are counting the number of occurrences of an event in a unit of time, such that the occurrences are independent and rarely simultaneous. For an average number of occurrences **λ**, the probability mass function is given as:

P(x) = e^{-\lambda}\frac{\lambda^x}{x!}
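A minimal Python sketch of the Poisson PMF, with an assumed average rate of 3 occurrences per unit of time:

```python
from math import exp, factorial

# Poisson PMF for an average of lam occurrences per unit of time.
def poisson_pmf(x, lam):
    return exp(-lam) * lam**x / factorial(x)

lam = 3.0  # assumed average rate, for illustration
print(poisson_pmf(5, lam))  # ~0.1008, probability of exactly 5 occurrences
```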

**Uniform**

We use a uniform random variable **x** when the probability density is equal for every value in an interval (starting at **a** and ending at **b**). The probability density function of x is given as:

P(x) = \begin{cases} \frac{1}{b-a} & \quad \text{if } a\leq x\leq b\\ 0 & \quad \text{otherwise} \end{cases}
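A minimal Python sketch, taking the interval [2, 6] purely for illustration:

```python
# Uniform PDF on [a, b]: constant density 1/(b-a) inside, 0 outside.
def uniform_pdf(x, a, b):
    return 1 / (b - a) if a <= x <= b else 0.0

print(uniform_pdf(4, 2, 6))  # 0.25
print(uniform_pdf(9, 2, 6))  # 0.0
# total area under the density: height 1/(b-a) times width (b-a) = 1
```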

**Exponential**

An exponential variable **x** is used when we are measuring the time until the first occurrence of an event, such that occurrences in disjoint time intervals are independent and rarely simultaneous. For an average number of occurrences per unit time **a**, the probability density function is given as:

P(x) = \begin{cases} ae^{-ax} & \quad \text{if } x\geq 0\\ 0 & \quad \text{otherwise} \end{cases}
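A short Python sketch of the exponential PDF with an assumed rate a = 2; it also draws samples via inverse-CDF sampling (a standard technique, not from the text) and checks that their mean is close to 1/a:

```python
import random
from math import exp, log

# Exponential PDF with rate a (average number of occurrences per unit time).
def exponential_pdf(x, a):
    return a * exp(-a * x) if x >= 0 else 0.0

a = 2.0  # assumed rate, for illustration
print(exponential_pdf(0.5, a))  # ~0.7358

# Inverse-CDF sampling: if U ~ Uniform(0, 1), then -ln(1 - U) / a is
# exponentially distributed; the sample mean should be close to 1/a.
samples = [-log(1 - random.random()) / a for _ in range(100_000)]
print(sum(samples) / len(samples))  # ~0.5
```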

**Normal**

The normal random variable **x** comes in handy in many different situations. We can use it to model physical measurements like weight, height, etc. We can also use it to model the error made by measuring instruments. In general, it can be used when the average and variance of the quantity being measured are known. The probability density function is given as:

P(x) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{1}{2\sigma^2}(x-\mu)^2}

where \mu is the mean and \sigma^2 is the variance.
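A minimal Python sketch of the normal PDF; the mean below echoes the 65 km/h highway-speed example, while the spread σ = 8 is made up for illustration:

```python
from math import exp, pi, sqrt

# Normal PDF with mean mu and standard deviation sigma (variance sigma^2).
def normal_pdf(x, mu, sigma):
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)

mu, sigma = 65.0, 8.0  # 65 km/h from the example; sigma is assumed
print(normal_pdf(65.0, mu, sigma))  # ~0.0499, the peak density at the mean
print(normal_pdf(80.0, mu, sigma))  # smaller density farther from the mean
```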

- Probability and Information Theory, Deep Learning, Ian Goodfellow: http://www.deeplearningbook.org/contents/prob.html
- Probability Theory Review for Machine Learning: https://see.stanford.edu/materials/aimlcs229/cs229-prob.pdf



reply in this discussion

Keshav Dhandhania · MSc in Deep Learning @ MIT (2014)

Ravi, the man only lies 1/3rd of the time. Hence P(B|A') = 1/3.


3.

course

Advanced Algorithms and Data Structures

This is an advanced course on algorithms and data structures.

If you want to learn algorithms and data structures from the ground up, follow this course instead: Learn Algorithms and Data Structures.

The tutorials have been edited and curated meticulously and are some of the best tutorials on each topic available online.



Solving linear recurrence using matrix exponentiation

Contributed 100%

4.

tutorial

Kruskal's algorithm for Minimum Spanning Tree

Given an *undirected weighted graph*, a minimum spanning tree (MST) is a subset of the edges of the graph which *forms a tree* and has the *minimum total edge weight*. For an MST to exist, the graph must be *connected* (that is, every pair of nodes must be reachable from each other).


5.

quiz

Pro Content

Hands-on: Implementing Minimum Spanning Tree algorithms (Prim's / Kruskal's)

6.

discussion

Pro Content

Hands-on: Implementing Dijkstra's algorithm

7.

discussion

Pro Content

Hands-on: Implementing Breadth-first search

8.

quiz

Pro Content

Hands-on: Longest Common Subsequence (with progressive 'hint-by-hint' solution)

9.

quiz

Pro Content

Hands-on: Paths in a Grid (with progressive 'hint-by-hint' solution)

10.

quiz

Pro Content

Hands-on: Tiling Problem v2.0 (with progressive 'hint-by-hint' solution)

11.

discussion

Pro Content

Hands-on: Implementing Binary Search
