So far, we have calculated the probability of events from empirical (experimental) evidence or from the probabilities of other events. As we remarked in earlier tutorials, statistics enables us to tell a story about data, and Bayesian inference is another powerful form of such storytelling. It helps us understand how our **belief** about the occurrence of an event affects the probability of that event.

Generally, **Bayes' theorem** can be stated as:

P(A|B) = P(B|A) · P(A) / P(B)

where A and B are two events.

- P(A|B) := probability of the occurrence of event A given that event B has occurred (the posterior).
- P(A) := prior probability of the occurrence of event A.
- P(B) := prior probability of the occurrence of event B.
- P(B|A) := probability of the occurrence of event B given that event A has occurred (the likelihood).

In calculating the probability that A occurs given that B has occurred, we incorporate the evidence of B occurring. The prior probability is the probability of an event before any test evidence is observed, while the posterior probability accounts for the change in that probability after incorporating the evidence.
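As a minimal sketch, the update described above can be written as a small function (the function name and example numbers are illustrative, not from the original text):

```python
def bayes_posterior(prior_a, likelihood_b_given_a, prior_b):
    """Posterior P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood_b_given_a * prior_a / prior_b

# Illustrative numbers: P(A) = 0.3, P(B|A) = 0.6, P(B) = 0.5
print(bayes_posterior(0.3, 0.6, 0.5))  # -> 0.36
```

Note how the posterior (0.36) is larger than the prior (0.3) here: observing B is evidence in favor of A because B is more likely when A holds.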

Q) Let us work through a simple example on spam classification. Let A denote the event that a mail is **spam**, B the event that it is of **low priority**, and C the event that it is of **high priority**. The prior probabilities are:

P(A) = 0.7, P(B) = 0.2, P(C) = 0.1.

Let K denote the event that the mail contains the word "**free**". The likelihoods, based on prior experience with mails containing "**free**", are:

P(K|A) = 0.9, P(K|B) = 0.01, P(K|C) = 0.01.

Given that I have received a mail containing "**free**", what is the probability that it is spam?

**Ans.** We have to calculate the probability of the mail being spam given that it contains the word "free". Applying **Bayes' theorem**, we get:

P(A|K) = P(K|A) · P(A) / P(K)

Now, to calculate P(K), we account for our experience across all three mail categories (the law of total probability):

P(K) = P(K|A) · P(A) + P(K|B) · P(B) + P(K|C) · P(C)

Evaluating both of the above equations, we get P(K) = 0.9 × 0.7 + 0.01 × 0.2 + 0.01 × 0.1 = 0.633, and therefore P(A|K) = 0.63 / 0.633 ≈ 0.995.
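The arithmetic above can be verified with a short Python sketch (variable names are illustrative):

```python
# Priors: spam (A), low priority (B), high priority (C)
p_a, p_b, p_c = 0.7, 0.2, 0.1
# Likelihoods of the word "free" appearing in each category
p_k_given_a, p_k_given_b, p_k_given_c = 0.9, 0.01, 0.01

# Law of total probability: P(K)
p_k = p_k_given_a * p_a + p_k_given_b * p_b + p_k_given_c * p_c

# Bayes' theorem: P(A|K) = P(K|A) * P(A) / P(K)
p_a_given_k = p_k_given_a * p_a / p_k
print(round(p_a_given_k, 3))  # -> 0.995
```

Even though the prior probability of spam was 0.7, observing the word "free" pushes the posterior up to roughly 0.995, because "free" is far more likely in spam than in either priority category.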

## Key Takeaways:

1) Bayes' theorem lets us incorporate evidence into prior probabilities to obtain a posterior probability.

2) Bayes' rule helps us draw stronger inferences from the probabilities we are given.


**I would love to receive feedback in the comment section below.**