Naive Bayes (and text classification)

- Deriving the algorithm
- Pseudocode
- Variants
- Scikit-learn implementation
- Applications

**Naive Bayes** is a widely used supervised classification algorithm based on Bayes' theorem. The word **naive** comes from the assumption of independence among features: if our input vector is (x_{1}, x_{2},...,x_{n}), then the x_{i}'s are assumed to be conditionally independent given the output *y*.

Let's start with Bayes' theorem (for Naive Bayes, x is the input and y is the output):

P( y | x ) = \frac{P(y)P(x | y)}{P(x)}

When we have more than one feature, we can rewrite Bayes' theorem as:

P( y | x_1,...,x_n) = \frac{P(y)P(x_1,...,x_n | y)}{P(x_1,x_2,...,x_n)}

Since we are making the assumption that x_{i}'s are conditionally independent given *y*, we can rewrite the above as

P(y|x_1,...,x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i|y)}{P(x_1,x_2,...,x_n)}

but we also know that P(x_{1}, x_{2},..., x_{n}) does not depend on y, so it is a constant for a given input, i.e.

P(y|x_1,...,x_n) \propto P(y) \prod_{i=1}^{n} P(x_i|y) \quad \quad (1)

Notice that

- the left-hand side is the term we are interested in: the probability distribution of the output y given the input x
- P(y) can be estimated by counting the number of times each class y appears in our training data (this relative-frequency estimate is the __maximum likelihood__ estimate)
- P(x_{i}|y) can be estimated by counting the number of times each value of x_{i} appears for each class y in our training data
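Because the denominator does not depend on y, it never has to be estimated separately: under the same independence assumption it equals the sum of the right-hand side of (1) over all classes. The following is a short worked form of that step, where y' (a symbol introduced here only for illustration) ranges over the classes and \hat{y} denotes the predicted class:

```latex
P(x_1,...,x_n) = \sum_{y'} P(y') \prod_{i=1}^{n} P(x_i|y')

P(y|x_1,...,x_n) = \frac{P(y) \prod_{i=1}^{n} P(x_i|y)}{\sum_{y'} P(y') \prod_{i=1}^{n} P(x_i|y')}

\hat{y} = \arg\max_{y} P(y) \prod_{i=1}^{n} P(x_i|y)
```

If only the most likely class is needed, the last line is enough, since dividing by a constant does not change which class scores highest.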

Training:

- **Estimate P(y)**: P(y=t) = number of times class t appears in the dataset / size of dataset
- **Estimate P(x_{i}|y)**: P(x_{i}=k|y=t) = number of times x_{i} has value k and y has value t / number of data points of class t

Predicting:

**Estimate P(y|x_{1},...,x_{n})**: Use the estimated values of P(y) and P(x_{i}|y) from training in equation (1), then normalize the resulting values so that they sum to 1 over the classes.
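The training and prediction steps can be sketched in plain Python for categorical features. This is a minimal illustration, not a reference implementation: the function names `train` and `predict_proba` and the toy weather-style data are hypothetical, and no smoothing is applied, so a feature value never seen with a class gets probability 0.

```python
from collections import Counter, defaultdict


def train(X, y):
    """Estimate P(y) and P(x_i | y) by counting, as in the training step."""
    n = len(y)
    class_counts = Counter(y)
    prior = {t: c / n for t, c in class_counts.items()}

    # value_counts[(i, t)][k] = number of rows with x_i == k and y == t
    value_counts = defaultdict(Counter)
    for row, t in zip(X, y):
        for i, k in enumerate(row):
            value_counts[(i, t)][k] += 1

    likelihood = {
        (i, t): {k: c / class_counts[t] for k, c in counts.items()}
        for (i, t), counts in value_counts.items()
    }
    return prior, likelihood


def predict_proba(x, prior, likelihood):
    """Apply equation (1), then normalize so the values sum to 1."""
    scores = {}
    for t, p_t in prior.items():
        score = p_t
        for i, k in enumerate(x):
            # Unseen (value, class) pairs contribute probability 0 (no smoothing).
            score *= likelihood.get((i, t), {}).get(k, 0.0)
        scores[t] = score
    total = sum(scores.values())
    if total == 0:
        return scores  # every class scored 0, nothing to normalize
    return {t: s / total for t, s in scores.items()}


# Hypothetical toy data: (outlook, windy) -> play
X = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
     ("rainy", "no"), ("sunny", "no"), ("rainy", "no")]
y = ["yes", "no", "no", "yes", "yes", "no"]

prior, likelihood = train(X, y)
probs = predict_proba(("sunny", "no"), prior, likelihood)
print(probs)  # "yes" gets 6/7 and "no" gets 1/7
```

For the query ("sunny", "no"), the unnormalized scores are 0.5 · 2/3 · 1 = 1/3 for "yes" and 0.5 · 1/3 · 1/3 = 1/18 for "no", which normalize to 6/7 and 1/7.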

There are several variants of Naive Bayes, which differ in the distribution they use for P(x_{i}|y): the Gaussian distribution (Gaussian Naive Bayes), the multinomial distribution (multinomial Naive Bayes) and the Bernoulli distribution (Bernoulli Naive Bayes).
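To make the Gaussian variant concrete, here is a small from-scratch sketch for continuous features, where P(x_{i}|y) is modeled as a normal distribution with a per-class mean and variance. The function names and toy data are hypothetical. Working in log space and adding a tiny constant to each variance are choices made here for numerical safety, not something the derivation above requires.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """For each class, store its prior and the per-feature mean and variance."""
    rows_by_class = defaultdict(list)
    for row, t in zip(X, y):
        rows_by_class[t].append(row)

    params = {}
    for t, rows in rows_by_class.items():
        n_t = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n_t for col in columns]
        # The 1e-9 term is an assumption added here to avoid a zero variance.
        variances = [
            sum((v - m) ** 2 for v in col) / n_t + 1e-9
            for col, m in zip(columns, means)
        ]
        params[t] = (n_t / len(y), means, variances)
    return params


def gaussian_log_pdf(x, mean, var):
    """Log density of a normal distribution with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def predict(x, params):
    """Return the class maximizing log P(y) + sum_i log P(x_i | y)."""
    def log_score(t):
        prior, means, variances = params[t]
        return math.log(prior) + sum(
            gaussian_log_pdf(v, m, s2) for v, m, s2 in zip(x, means, variances)
        )
    return max(params, key=log_score)


# Hypothetical toy data: two continuous features, two classes
X = [(180.0, 80.0), (175.0, 75.0), (185.0, 90.0),
     (160.0, 55.0), (165.0, 60.0), (155.0, 50.0)]
y = ["a", "a", "a", "b", "b", "b"]

params = fit_gaussian_nb(X, y)
print(predict((178.0, 78.0), params))
print(predict((158.0, 52.0), params))
```

The multinomial and Bernoulli variants follow the same pattern, swapping the Gaussian density for word-count or presence/absence probabilities.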

```python
# we will use the iris dataset
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

# load the dataset
data = load_iris()

# fit a Gaussian Naive Bayes model
model = GaussianNB()
model.fit(data.data, data.target)

# evaluate (accuracy on the training data)
print(model.score(data.data, data.target))
# output = 0.96

# predict: the input must be 2D, one row per sample
print(model.predict([[4.2, 3, 0.9, 2.1]]))
# 0 = setosa, 1 = versicolor, and 2 = virginica
```

Naive Bayes is one of the simplest yet most effective algorithms for:

- Text classification: For example, we have a number of news articles, and we want to learn to classify whether an article is about politics, health, technology, sports or lifestyle.
- Spam filtering: We have a number of emails, and we want to learn to classify whether an email is spam or not.
- Gender classification: Given features such as height, weight, etc., predict whether the person is male or female.
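As an illustration of the text classification and spam filtering use cases, the sketch below treats each word as a feature and estimates P(word|class) from word counts, in the style of multinomial Naive Bayes. Everything here is hypothetical: the function names, the four toy messages, and the "spam"/"ham" labels. The `alpha` parameter applies add-one (Laplace) smoothing, an assumption added so that a word unseen in one class does not force that class's probability to 0. Words outside the training vocabulary are simply ignored.

```python
import math
from collections import Counter


def train_text_nb(docs, labels, alpha=1.0):
    """Estimate log P(y) and smoothed log P(word | y) from word counts."""
    class_counts = Counter(labels)
    word_counts = {t: Counter() for t in class_counts}
    for doc, t in zip(docs, labels):
        word_counts[t].update(doc.lower().split())

    vocab = {w for counts in word_counts.values() for w in counts}
    log_prior = {t: math.log(c / len(labels)) for t, c in class_counts.items()}

    log_likelihood = {}
    for t, counts in word_counts.items():
        denom = sum(counts.values()) + alpha * len(vocab)
        log_likelihood[t] = {
            w: math.log((counts[w] + alpha) / denom) for w in vocab
        }
    return log_prior, log_likelihood, vocab


def classify(doc, log_prior, log_likelihood, vocab):
    """Pick the class with the highest log-score for the words in doc."""
    words = [w for w in doc.lower().split() if w in vocab]  # skip unseen words
    scores = {
        t: log_prior[t] + sum(log_likelihood[t][w] for w in words)
        for t in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical toy corpus
docs = [
    "win money now",
    "cheap money offer",
    "meeting schedule today",
    "project meeting notes",
]
labels = ["spam", "spam", "ham", "ham"]

model = train_text_nb(docs, labels)
print(classify("cheap money", *model))
print(classify("meeting today", *model))
```

A real spam filter would need far more data and better tokenization, but the estimation and scoring steps have the same shape.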

About the contributors:

- Keshav Dhandhania, MSc in Deep Learning @ MIT (2014) (62%)
- Savan Visalpara, Machine Learning Practitioner (38%)
