Naive Bayes is a widely used classification algorithm. It is a supervised learning algorithm based on Bayes’ Theorem. The word naive comes from the assumption of independence among features. That is, if our input vector is (x1, x2,...,xn), then xi's are conditionally independent given y.
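Let's start with Bayes' theorem (for Naive Bayes, x is the input and y is the output):

$$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)}$$

When we have more than one feature, we can rewrite Bayes' theorem as:

$$P(y \mid x_1, \ldots, x_n) = \frac{P(x_1, \ldots, x_n \mid y)\,P(y)}{P(x_1, \ldots, x_n)}$$

Since we assume that the xi's are conditionally independent given y, the joint likelihood factorizes and we can rewrite the above as:

$$P(y \mid x_1, \ldots, x_n) = \frac{P(y)\,\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \ldots, x_n)}$$

We also know that P(x1, x2, ..., xn) is a constant for a given input, so it can be dropped, i.e.

$$P(y \mid x_1, \ldots, x_n) \propto P(y)\,\prod_{i=1}^{n} P(x_i \mid y) \tag{1}$$

Here: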
- the left-hand side is the term we are interested in: the probability distribution of the output y given the input x
- P(y) can be estimated by counting the number of times each class y appears in our training data (this relative frequency is the maximum likelihood estimate of the class prior)
- P(xi|y) can be estimated by counting the number of times each value of xi appears for each class y in our training data
In summary, the algorithm works as follows:
- Estimate P(y): P(y=t) = (number of times class t appears in the dataset) / (size of the dataset)
- Estimate P(xi|y): P(xi=k|y=t) = (number of data points where xi has value k and y has value t) / (number of data points of class t)
- Estimate P(y|x1,...,xn): plug the estimated values of P(y) and P(xi|y) into equation (1), then normalize the results so that they sum to 1 over all classes.
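The counting steps above can be sketched in plain Python for categorical features. This is a minimal illustration, not a reference implementation: the function names `fit_naive_bayes` and `predict_proba` and the toy weather data are assumptions made up for this example. It applies no smoothing, so a feature value never seen with a class gives that class a probability of zero.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(X, y):
    """Estimate P(y) and P(xi|y) by counting (hypothetical helper)."""
    n = len(y)
    class_counts = Counter(y)
    # P(y=t) = number of times class t appears / size of dataset
    priors = {t: c / n for t, c in class_counts.items()}
    # value_counts[(i, t)][k] = number of points of class t with xi == k
    value_counts = defaultdict(Counter)
    for row, t in zip(X, y):
        for i, k in enumerate(row):
            value_counts[(i, t)][k] += 1
    # P(xi=k|y=t) = count(xi=k, y=t) / count(y=t)
    likelihoods = {
        (i, t): {k: c / class_counts[t] for k, c in counts.items()}
        for (i, t), counts in value_counts.items()
    }
    return priors, likelihoods


def predict_proba(x, priors, likelihoods):
    """Apply equation (1), then normalize over the classes."""
    scores = {}
    for t, prior in priors.items():
        score = prior
        for i, k in enumerate(x):
            # unseen (value, class) pairs contribute probability 0
            score *= likelihoods[(i, t)].get(k, 0.0)
        scores[t] = score
    total = sum(scores.values())
    if total == 0:
        return scores  # no class is compatible with this input
    return {t: s / total for t, s in scores.items()}


# Made-up toy data: (outlook, temperature) -> play outside?
X = [("sunny", "hot"), ("sunny", "hot"), ("rainy", "mild"), ("sunny", "mild"),
     ("rainy", "mild"), ("rainy", "hot"), ("sunny", "mild"), ("rainy", "mild")]
y = ["no", "no", "no", "no", "yes", "yes", "yes", "yes"]

priors, likelihoods = fit_naive_bayes(X, y)
proba = predict_proba(("sunny", "hot"), priors, likelihoods)
print(proba)  # "no" gets 6/7 (about 0.857), "yes" gets 1/7 (about 0.143)
```

For ("sunny", "hot") the unnormalized scores are 0.5 × 3/4 × 2/4 = 0.1875 for "no" and 0.5 × 1/4 × 1/4 = 0.03125 for "yes". Dividing each by their sum gives 6/7 and 1/7.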
There are several variants of Naive Bayes, which differ in the distribution they assume for P(xi|y): the Gaussian distribution (Gaussian Naive Bayes), the multinomial distribution (Multinomial Naive Bayes) and the Bernoulli distribution (Bernoulli Naive Bayes).
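To make the Gaussian variant concrete, here is a rough NumPy sketch that models each P(xi|y) as a normal distribution with a per-class mean and variance. The function names and the tiny two-feature dataset are assumptions for illustration only. Unlike library implementations, it adds no variance smoothing, so every feature needs a non-zero variance within each class.

```python
import numpy as np


def gaussian_nb_fit(X, y):
    """Estimate class priors and per-class feature means/variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    variances = np.array([X[y == c].var(axis=0) for c in classes])
    return classes, priors, means, variances


def gaussian_nb_predict_proba(X, priors, means, variances):
    """Posterior P(y|x) for each row of X, computed in log space."""
    # diff has shape (n_samples, n_classes, n_features)
    diff = X[:, None, :] - means[None, :, :]
    # sum over features of log N(xi | mean, variance)
    log_lik = -0.5 * (
        np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]
    ).sum(axis=2)
    log_post = np.log(priors)[None, :] + log_lik
    # subtract the row maximum before exponentiating, for numerical stability
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


# Made-up data: two numeric features, two classes
X_train = np.array([[150.0, 50.0], [160.0, 55.0], [155.0, 52.0],
                    [175.0, 75.0], [180.0, 80.0], [185.0, 85.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

classes, priors, means, variances = gaussian_nb_fit(X_train, y_train)
X_new = np.array([[158.0, 54.0], [178.0, 79.0]])
posterior = gaussian_nb_predict_proba(X_new, priors, means, variances)
predicted = classes[posterior.argmax(axis=1)]
print(predicted)  # the first point falls in class 0, the second in class 1
```

Working with log-probabilities avoids the underflow that can occur when many small likelihoods are multiplied together.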
```python
# we will use the iris dataset
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# load the dataset
data = load_iris()

# fit a Gaussian Naive Bayes model
model = GaussianNB()
model.fit(data.data, data.target)

# evaluate (accuracy on the training data)
print(model.score(data.data, data.target))
# output = 0.96

# predict the class of a new sample
print(model.predict([[4.2, 3, 0.9, 2.1]]))
# 0 = setosa, 1 = versicolor, and 2 = virginica
```
Naive Bayes is one of the simplest yet most effective algorithms for tasks such as:
- Text classification: For example, given a number of news articles, we want to learn to classify whether an article is about politics, health, technology, sports or lifestyle.
- Spam filtering: Given a number of emails, we want to learn to classify whether an email is spam or not.
- Gender classification: Given features such as height, weight, etc, predict whether the person is male or female.
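As a small illustration of the spam filtering use case, the sketch below combines scikit-learn's `CountVectorizer` (word counts) with `MultinomialNB`. The six training emails and their labels are invented for this example, and a real filter would need far more data and proper evaluation on held-out emails.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training emails and labels, for illustration only
emails = [
    "win free money now",
    "free prize claim now",
    "cheap money offer",
    "meeting schedule for monday",
    "project report attached",
    "lunch on monday",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Turn each email into word counts, then fit Multinomial Naive Bayes
spam_filter = make_pipeline(CountVectorizer(), MultinomialNB())
spam_filter.fit(emails, labels)

new_emails = ["free money prize", "monday project meeting"]
predictions = spam_filter.predict(new_emails)
print(predictions)  # first email is classified as spam, second as ham
```

`MultinomialNB` applies additive (Laplace) smoothing by default, so words that never appear with a class do not force that class's probability to zero.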