**Naive Bayes** is a widely used classification algorithm. It is a supervised learning algorithm based on Bayes' Theorem. The word **naive** comes from the assumption of independence among features. That is, if our input vector is (x_{1}, x_{2},...,x_{n}), then the x_{i}'s are conditionally independent given *y*.

# Deriving the algorithm

Let's start with Bayes' theorem (for naive Bayes, x is the input and y is the output):

$$P(y \mid x) = \frac{P(x \mid y)\,P(y)}{P(x)}$$

When we have more than one feature, we can rewrite Bayes' theorem as:

$$P(y \mid x_1, \dots, x_n) = \frac{P(x_1, \dots, x_n \mid y)\,P(y)}{P(x_1, \dots, x_n)}$$

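Since we assume that the x_{i}'s are conditionally independent given *y*, the joint likelihood factorizes as P(x_{1},...,x_{n}|y) = P(x_{1}|y) P(x_{2}|y) ... P(x_{n}|y), so we can rewrite the above as:

$$P(y \mid x_1, \dots, x_n) = \frac{P(y)\prod_{i=1}^{n} P(x_i \mid y)}{P(x_1, \dots, x_n)}$$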

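But the denominator P(x_{1},...,x_{n}) does not depend on *y*: it is a constant for a given input. Therefore:

$$P(y \mid x_1, \dots, x_n) \propto P(y)\prod_{i=1}^{n} P(x_i \mid y) \tag{1}$$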
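Dropping the constant denominator does not change which class scores highest, and dividing the scores by their sum recovers the actual posterior. The sketch below checks this for two classes and two observed features; every probability in it is a made-up value for illustration, not an estimate from real data.

```python
import math

# Hypothetical, made-up parameters for a two-class problem.
prior = {"spam": 0.4, "ham": 0.6}   # P(y)
likelihood = {                       # P(x_i = observed value | y) for features i = 1, 2
    "spam": [0.8, 0.5],
    "ham": [0.1, 0.5],
}

# Unnormalized scores: P(y) * prod_i P(x_i | y)
scores = {y: prior[y] * math.prod(likelihood[y]) for y in prior}

# The sum of the scores over all classes equals P(x_1, ..., x_n)
# (law of total probability), so dividing by it gives P(y | x_1, ..., x_n).
evidence = sum(scores.values())
posterior = {y: s / evidence for y, s in scores.items()}

print(posterior)
print(max(posterior, key=posterior.get))  # spam
```

The class with the largest unnormalized score is also the class with the largest posterior, which is why the denominator can be ignored when only the predicted class is needed.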

Notice that:

- the left-hand side is the quantity we are interested in: the probability distribution of the output y given the input x;
- P(y) can be estimated by counting the number of times each class y appears in our training data and dividing by the size of the dataset (this relative frequency is the maximum likelihood estimate of P(y); picking the class with the largest posterior is the __Maximum a Posteriori__ (MAP) decision rule);
- P(x_{i}|y) can be estimated by counting the number of times each value of x_{i} appears for each class y in our training data.

# Pseudocode

Training:

- **Estimate P(y)**: P(y=t) = (number of times class t appears in the dataset) / (size of dataset)
- **Estimate P(x_{i}|y)**: P(x_{i}=k|y=t) = (number of times x_{i} has value k and y has value t) / (number of data points of class t)
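The two counting rules for training can be sketched in Python for categorical features. The function name `train_naive_bayes` and the small weather-style dataset are hypothetical, chosen only for illustration. No smoothing is applied, so a feature value that never occurs with a class simply gets no entry (probability 0).

```python
from collections import Counter, defaultdict


def train_naive_bayes(X, y):
    """Estimate P(y) and P(x_i | y) by counting (no smoothing).

    X: list of samples, each a tuple of n categorical feature values.
    y: list of class labels, same length as X.
    Returns (prior, likelihood) where
      prior[t] = P(y = t)
      likelihood[t][i][k] = P(x_i = k | y = t)
    """
    n_samples = len(y)
    class_counts = Counter(y)

    # P(y = t) = count(y = t) / size of dataset
    prior = {t: c / n_samples for t, c in class_counts.items()}

    # counts[t][i][k] = number of samples with x_i = k and y = t
    counts = defaultdict(lambda: defaultdict(Counter))
    for xs, t in zip(X, y):
        for i, k in enumerate(xs):
            counts[t][i][k] += 1

    # P(x_i = k | y = t) = count(x_i = k, y = t) / count(y = t)
    likelihood = {
        t: {i: {k: c / class_counts[t] for k, c in feat.items()}
            for i, feat in counts[t].items()}
        for t in class_counts
    }
    return prior, likelihood


# Made-up toy data: features are (outlook, windy); label is "play" or "stay".
X = [("sunny", False), ("sunny", True), ("rainy", True),
     ("overcast", False), ("rainy", False), ("overcast", True)]
y = ["stay", "stay", "stay", "play", "play", "play"]

prior, likelihood = train_naive_bayes(X, y)
print(prior)  # {'stay': 0.5, 'play': 0.5}
print(likelihood["play"][0])
```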

Predicting:

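**Estimate P(y|x_{1},...,x_{n})**: Use the estimated values of P(y) and P(x_{i}|y) from training together with equation (1). Then normalize the values so that they sum to 1 over all classes; the predicted class is the one with the largest value.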
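Prediction multiplies P(y) by the product of the P(x_{i}|y) for each class and then normalizes. In the sketch below, `predict_proba` is a hypothetical helper and the `prior` and `likelihood` tables are hypothetical trained values stored as nested dicts (`likelihood[t][i][k]` = P(x_{i}=k|y=t)). Working in log space is a design choice of this sketch, not something the text prescribes: summing logarithms avoids floating-point underflow when many small probabilities are multiplied, and the normalization uses the log-sum-exp trick.

```python
import math


def predict_proba(x, prior, likelihood):
    """Return normalized P(y | x_1, ..., x_n) for one sample x.

    prior[t] = P(y = t); likelihood[t][i][k] = P(x_i = k | y = t).
    A value k never seen with class t is treated as probability 0.
    """
    log_scores = {}
    for t, p_t in prior.items():
        probs = [likelihood[t][i].get(k, 0.0) for i, k in enumerate(x)]
        if p_t == 0.0 or any(p == 0.0 for p in probs):
            log_scores[t] = -math.inf  # the product is exactly zero
        else:
            # log P(y) + sum_i log P(x_i | y)
            log_scores[t] = math.log(p_t) + sum(math.log(p) for p in probs)

    # Normalize with the log-sum-exp trick so the probabilities sum to 1
    m = max(log_scores.values())
    if m == -math.inf:
        raise ValueError("every class has zero probability for this input")
    unnorm = {t: math.exp(s - m) for t, s in log_scores.items()}
    total = sum(unnorm.values())
    return {t: u / total for t, u in unnorm.items()}


# Hypothetical trained parameters: features are (outlook, windy).
prior = {"stay": 0.5, "play": 0.5}
likelihood = {
    "stay": {0: {"sunny": 2 / 3, "rainy": 1 / 3}, 1: {False: 1 / 3, True: 2 / 3}},
    "play": {0: {"overcast": 2 / 3, "rainy": 1 / 3}, 1: {False: 2 / 3, True: 1 / 3}},
}

proba = predict_proba(("rainy", False), prior, likelihood)
print(proba)
print(max(proba, key=proba.get))  # play
```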

# Variants

There are several variants of naive Bayes that use different distributions for P(x_{i}|y), such as the Gaussian distribution (Gaussian naive Bayes), the multinomial distribution (multinomial naive Bayes), and the Bernoulli distribution (Bernoulli naive Bayes).
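In scikit-learn these three variants are available as `GaussianNB`, `MultinomialNB`, and `BernoulliNB` in `sklearn.naive_bayes`. The sketch below fits each one to a tiny made-up dataset whose features suit that variant: real-valued measurements, counts (such as word counts), and binary presence flags. The data are illustrative only.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

y = np.array([0, 0, 1, 1])

# Continuous features -> Gaussian naive Bayes (P(x_i | y) is a normal density)
X_cont = np.array([[1.0, 2.1], [1.2, 1.9], [3.1, 4.0], [2.9, 4.2]])
gnb = GaussianNB().fit(X_cont, y)

# Count features -> multinomial naive Bayes (Laplace smoothing, alpha=1.0 by default)
X_counts = np.array([[3, 0, 1], [2, 0, 0], [0, 4, 1], [0, 3, 2]])
mnb = MultinomialNB().fit(X_counts, y)

# Binary features (present / absent) -> Bernoulli naive Bayes
X_bin = (X_counts > 0).astype(int)
bnb = BernoulliNB().fit(X_bin, y)

print(gnb.predict([[1.1, 2.0]]))  # [0]
print(mnb.predict([[0, 5, 1]]))   # [1]
print(bnb.predict([[1, 0, 1]]))   # [0]
```

Note that `BernoulliNB` also penalizes the *absence* of a feature, whereas `MultinomialNB` only uses the features that are present (non-zero counts).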

# Scikit-learn implementation

```python
# We will use the iris dataset
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# Load the dataset
data = load_iris()

model = GaussianNB()
model.fit(data.data, data.target)

# Evaluate (accuracy measured on the same data the model was trained on)
print(model.score(data.data, data.target))
# output = 0.96

# Predict the class of a new sample
# 0 = setosa, 1 = versicolor, and 2 = virginica
print(model.predict([[4.2, 3, 0.9, 2.1]]))
```

# Applications

Naive Bayes is simple yet effective for tasks such as:

- Text classification: for example, we have a number of news articles and want to learn to classify whether each article is about politics, health, technology, sports, or lifestyle.
- Spam filtering: we have a number of emails and want to learn to classify whether each email is spam or not.
- Gender classification: given features such as height and weight, predict whether the person is male or female.
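For text tasks such as spam filtering, a common setup is to turn each document into word counts and fit a multinomial naive Bayes model. Below is a minimal sketch with scikit-learn's `CountVectorizer` and `MultinomialNB`; the four emails and their labels are made up, so the resulting model is only a toy.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = [
    "win a free prize now",
    "claim your free money today",
    "meeting moved to monday",
    "please review the project report",
]
labels = ["spam", "spam", "ham", "ham"]

# CountVectorizer maps each email to a vector of word counts;
# MultinomialNB then estimates P(word | class) from those counts.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

print(model.predict(["free prize money", "project meeting on monday"]))
# ['spam' 'ham']
```

Words not seen during training (such as "on" above) are dropped by the vectorizer, and the default smoothing (`alpha=1.0`) keeps a word seen only in one class from driving the other class's probability to zero.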