Logistic Regression

- The logistic (or sigmoid) function
- Logistic Regression Model (for binary classification)
- Cost function
- Training the model / Optimization
- Making Predictions
- Example with Scikit-learn on predicting Diabetes


Logistic Regression

Logistic Regression is a variant of linear regression in which the dependent (output) variable is *categorical*, i.e. it takes one of a few possible discrete values. Don't be confused by the name logistic *regression*: it is a *classification* algorithm.

In particular, we can use it for **binary classification** (two categories). For example, we might want to predict whether or not a person has diabetes, or whether or not an email is spam.

The term *logistic* in logistic regression comes from the **logistic function** (also known as the **sigmoid function**), which can be written as:

f(z) = \frac{1}{1 + e^{-z}} = \frac{e^z}{e^z + 1}

The following is what we get if we plot *f*(*z*):

As we can see, the sigmoid function squashes its input into the range (0, 1). For large negative values the output is very close to 0, and for large positive values the output is very close to 1. Since the output always lies between 0 and 1, we can interpret it as a probability.

The logistic function also has the desirable property of being differentiable everywhere. Hence, we can train the machine learning model using gradient descent.
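To see the squashing behavior numerically, here is a minimal sketch of the sigmoid function (the `sigmoid` helper is our own, not from any library):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs map close to 0, large positive inputs close to 1,
# and z = 0 maps to exactly 0.5
for z in [-6.0, -2.0, 0.0, 2.0, 6.0]:
    print(f"sigmoid({z:+.1f}) = {sigmoid(z):.4f}")
```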

In linear regression, we have a linear equation ( y = *w*_{0} + *w*_{1}*x*_{1} ) as our hypothesis. Since y can take arbitrarily large negative or positive values, linear regression is not a good choice when the output is a binary categorical variable, for example y = *spam* or *not spam*.

In logistic regression, the output *y_{w}*(*x*) is given by:

y_w(x) = \sigma(w^\top x) = \frac{\exp(w^\top x)}{\exp(w^\top x) + 1}

where w^{T}x is w_{0} + w_{1}x_{1} + ... + w_{d}x_{d} (we assume x_{0} = 1 to simplify notation).

Let's take an example. Suppose the input *x* has two dimensions, and for the current data point, (x_{1}, x_{2}) = (2.0, 1.0). If our weights are (w_{0}, w_{1}, w_{2}) = (1.0, 2.0, -1.0) then we have w^{T}x = 1.0 + 2.0*2.0 + -1.0*1.0 = 4.0, and hence y_{w}(x) = e^{4.0}/(e^{4.0}+1) = 0.982.
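The worked example above can be checked with a few lines of numpy (the `sigmoid` helper is our own definition):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function."""
    return 1.0 / (1.0 + np.exp(-z))

# Weights (w0, w1, w2) and input x = (x0, x1, x2) from the example above;
# x0 = 1 accounts for the bias term w0
w = np.array([1.0, 2.0, -1.0])
x = np.array([1.0, 2.0, 1.0])

z = w @ x                     # w^T x = 1.0 + 2.0*2.0 + (-1.0)*1.0 = 4.0
print(z)                      # 4.0
print(round(sigmoid(z), 3))   # 0.982
```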

Since the output is always between 0 and 1, it can be interpreted as the *probability* that *y* = spam given input *x*, i.e.

\begin{aligned}
P(y=spam|x) &= y_w(x) \\
P(y=not\ spam|x) &= 1 - P(y=spam|x) = 1 - y_w(x)
\end{aligned}


For example, if the model outputs y_w(x) = 0.8, this means the model predicts an 80% chance that the email is spam, or mathematically, P(y = spam|x) = 0.8.

The cost function (or loss function) we use for logistic regression is the average negative log-likelihood function.

L(w) = - \frac{1}{n} \sum_i \big( y^{(i)}_{true} \log( y_w(x^{(i)}) ) + (1 - y^{(i)}_{true}) \log( 1 - y_w(x^{(i)}) ) \big)

It is the average, over the training data (the *i*-th data point is (*x*^{(i)}, *y*^{(i)}_{true})), of the negative log probability assigned to the target class. Note that in the loss function, *y_{true}* is either 0 (not spam) or 1 (spam), so for each data point only one of the two terms is non-zero.

**Example**: If y_{true} = 1 (email is spam) and the prediction y_w(x^{(i)}) = 0.8, then the loss for this data point is given by:

L = - \big( 1 \times \log(0.8) + 0 \times \log(0.2) \big) = -\log(0.8) = 0.22

On the other hand, if y_{true} = 0 (email is not spam) and the prediction y_w(x^{(i)}) = 0.8, then the loss for the data point is given by:

L = - \big( 0 \times \log(0.8) + 1 \times \log(0.2) \big) = -\log(0.2) = 1.61

As you can see, the cost is much higher when the target class is assigned a low probability by the model. In general, the negative log-likelihood extends to multiple categories, where the loss is -log(prob(target class)), i.e. the negative log of the probability assigned to the target class.
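The two cases above can be reproduced with a small helper function (our own sketch of the per-data-point loss):

```python
import numpy as np

def log_loss(y_true, y_pred):
    """Negative log-likelihood for a single data point.

    y_true is 0 or 1; y_pred is the model's predicted P(y=1|x)."""
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(round(log_loss(1, 0.8), 2))  # 0.22 (confident and correct: low cost)
print(round(log_loss(0, 0.8), 2))  # 1.61 (confident but wrong: high cost)
```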

**Plot**: The figure below is a plot of the negative log probability. As we can see, this cost is high when the target class is assigned a low probability, and is 0 if the assigned probability is 1.

We use gradient descent to optimize the model. In fact, the cost function above is chosen so that the gradients dL/dw take a simple form. If you do the algebra and derive the formula for the gradients (we'll skip this, since the algebra is quite messy and doesn't add much to the conceptual understanding), you'll find that the *gradients* are exactly the same as in linear regression (even though the predictions and the cost are not the same). Roughly speaking, the log in the cost function 'undoes' the sigmoid function.

After training, we can predict the class by calculating the probability of each class. The prediction is the class with the highest probability, i.e. we predict *spam* if *y_{w}*(*x*) > 0.5, and *not spam* otherwise.
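As a sketch of the two steps above, here is a minimal gradient-descent training loop on a made-up toy dataset (the data, learning rate, and iteration count are all illustrative choices, not from the original). Note that the gradient line has exactly the linear-regression form, and the last line applies the 0.5 threshold:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D dataset (hypothetical): class 1 tends to have larger feature values.
# First column of 1s plays the role of x0, so w[0] is the bias term w0.
X = np.array([[1, -2.0], [1, -1.0], [1, 0.5], [1, 1.5], [1, 3.0]])
y = np.array([0, 0, 0, 1, 1])

w = np.zeros(2)
lr = 0.5
for _ in range(1000):
    preds = sigmoid(X @ w)
    grad = X.T @ (preds - y) / len(y)  # same form as the linear-regression gradient
    w -= lr * grad

# Predict class 1 whenever the model's probability exceeds 0.5
print((sigmoid(X @ w) > 0.5).astype(int))
```

On this separable toy data the loop recovers the labels; with real data you would stop on a validation criterion rather than a fixed iteration count.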

In this section, we'll see an example of using logistic regression. We'll use the Pima Indians Diabetes Database, where all patients belong to the Pima Indian heritage (a subgroup of Native Americans) and are females aged 21 and above.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

## load the data
diabetesDF = pd.read_csv('diabetes.csv')
print(diabetesDF.head())

Outcome is whether or not the patient is diabetic. 0 denotes non-diabetic, and 1 is diabetic.

## split and normalize the data
# split into train and test
dfTrain = diabetesDF[:650]
dfTest = diabetesDF[650:750]
dfCheck = diabetesDF[750:]

# split features from target variable
trainLabel = np.asarray(dfTrain['Outcome'])
trainData = np.asarray(dfTrain.drop('Outcome', axis=1))
testLabel = np.asarray(dfTest['Outcome'])
testData = np.asarray(dfTest.drop('Outcome', axis=1))

# normalize the data
# makes it easier to interpret the model by looking at its weights
means = np.mean(trainData, axis=0)
stds = np.std(trainData, axis=0)
trainData = (trainData - means) / stds
testData = (testData - means) / stds

## train and evaluate the model
# models target t as sigmoid(w0 + w1*x1 + w2*x2 + ... + wd*xd)
diabetesCheck = LogisticRegression()
diabetesCheck.fit(trainData, trainLabel)
accuracy = diabetesCheck.score(testData, testLabel)
print("accuracy = ", accuracy * 100, "%")
# prints "accuracy =  78.0 %"

## interpreting the model
coeff = list(diabetesCheck.coef_[0])
labels = list(dfTrain.drop('Outcome', axis=1).columns)
features = pd.DataFrame()
features['Features'] = labels
features['importance'] = coeff
features.sort_values(by=['importance'], ascending=True, inplace=True)
features['positive'] = features['importance'] > 0
features.set_index('Features', inplace=True)
features.importance.plot(kind='barh', figsize=(11, 6),
                         color=features.positive.map({True: 'blue', False: 'red'}))
plt.xlabel('Importance')

Notice how the model assigns the largest weights to the features Glucose and BMI. It is good to see the machine learning model match what we have been hearing from doctors our entire lives!

## making predictions
sampleData = dfCheck[:1]

# prepare sample
sampleDataFeatures = np.asarray(sampleData.drop('Outcome', axis=1))
sampleDataFeatures = (sampleDataFeatures - means) / stds

# predict
predictionProbability = diabetesCheck.predict_proba(sampleDataFeatures)
prediction = diabetesCheck.predict(sampleDataFeatures)
print('Probability:', predictionProbability)
print('prediction:', prediction)

The output produced by the above code is

Probability: [[ 0.4385153  0.5614847]]
prediction: [1]

That is, the model thinks there is a 56.1% chance that the person is diabetic. To see this example in more detail, check out this tutorial: End-to-End Example: Using Logistic Regression for predicting Diabetes.


About the contributors: Keshav Dhandhania (MSc in Deep Learning @ MIT, 2014) and Savan Visalpara (Machine Learning Practitioner), with minor contributions from David Svetlecic.
