Logistic Regression

- The logistic (or sigmoid) function
- Logistic regression for binary classification
- Model
- Cost function
- Optimization technique
- Prediction

Logistic Regression

Logistic Regression is a variant of linear regression in which the dependent (output) variable is categorical. Don't be confused by the name: despite the word "regression", it is a classification algorithm. In particular, we can use it for binary classification.

The term *logistic* in logistic regression comes from the **logistic function** (also known as the **sigmoid function**), which can be written as:

y(z)=\frac{1}{1 + e^{-z}}

The following is a plot of **y(z)**:

[Figure: plot of the sigmoid function y(z). Source: Wikipedia]

The sigmoid function squashes any input value into the interval (0, 1), so we can interpret the output as a probability. It also has the desirable property of being differentiable, which means we can use gradient descent as the optimization technique.
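A minimal sketch of the sigmoid and its derivative in NumPy (the function names are illustrative, not from the article):

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)
```

Note that `sigmoid(0) = 0.5`, large positive inputs approach 1, and large negative inputs approach 0, which matches the S-shaped plot above.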

In linear regression, our hypothesis is a linear equation ( y = *w*_{0} + *w*_{1}*x*_{1} ). Since y can take arbitrarily large negative or positive values, linear regression is not a good choice when the output is a binary categorical variable, i.e. y^{(i)} = 0 or 1.

In logistic regression, the output *h_{w}(x)* is squashed by a sigmoid function, i.e.

\begin{aligned}
P(y=1|x) &= h_w(x) = \sigma(w^\top x) = \frac{1}{1 + \exp(-w^\top x)} \\
P(y=0|x) &= 1 - P(y=1|x) = 1 - h_w(x)
\end{aligned}

and is interpreted as the probability that y = 1 given the input x, i.e. P(y=1| x). The probability that y = 0 is simply 1 - P(y=1|x).
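The model above can be sketched in a few lines of NumPy (the weight and input values here are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, x):
    """P(y=1 | x) = sigma(w^T x)."""
    return sigmoid(np.dot(w, x))

w = np.array([0.5, -0.25])   # hypothetical weights
x = np.array([2.0, 4.0])     # here w^T x = 1.0 - 1.0 = 0.0
p1 = predict_proba(w, x)     # P(y=1 | x)
p0 = 1.0 - p1                # P(y=0 | x), as in the equations above
```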

The cost function (or loss function) we use for logistic regression is the negative log-likelihood function,

L(w) = - \sum_i \left[ y^{(i)} \log( h_w(x^{(i)}) ) + (1 - y^{(i)}) \log( 1 - h_w(x^{(i)}) ) \right]

i.e. the sum, over the training data, of the negative log probability assigned to the target class. As you can see in the figure below, this cost is high when the target class is assigned a low probability, and is 0 if the assigned probability is 1.

[Figure: plot of -log(p) for p in the range 0 to 1]
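The cost function above translates directly into code. A minimal sketch (function name is illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll_cost(w, X, y):
    """Negative log-likelihood over the training set.
    X: (n, d) array of inputs, y: (n,) array of 0/1 labels."""
    h = sigmoid(X @ w)  # h_w(x^(i)) for every example
    return -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))
```

With confident, correct predictions the cost is near 0; with h_w(x) = 0.5 for every example the cost is log 2 per example, matching the figure above.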

We use gradient descent to optimize the model. In fact, the cost function above is chosen so that the gradients dL/dw we get are simple and well-behaved: the log in the cost function 'undoes' the exponential in the sigmoid (looking at the whole thing from the perspective of the gradient descent optimizer), and the gradient works out to dL/dw = \sum_i (h_w(x^{(i)}) - y^{(i)}) \, x^{(i)}.
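A minimal batch gradient descent loop, assuming the gradient form X^T(h - y) (the toy dataset and hyperparameters are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent on the negative log-likelihood."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        h = sigmoid(X @ w)       # predicted P(y=1|x) for every example
        grad = X.T @ (h - y)     # dL/dw, the simple form noted above
        w -= lr * grad
    return w

# toy 1-D data separable at x = 0, with a constant bias column appended
X = np.array([[-2.0, 1.0], [-1.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w = fit_logistic(X, y)
```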

After training, we can predict the class by calculating the probability of each class. The prediction is the class with the highest probability, i.e. if h_{w}(*x*) > 0.5 then the class is 1, otherwise 0.
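The thresholding step can be sketched as follows (the weights here are hypothetical, standing in for the result of training):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, X, threshold=0.5):
    """Class 1 when P(y=1|x) > threshold, else class 0."""
    return (sigmoid(X @ w) > threshold).astype(int)

w = np.array([1.0, 0.0])                  # hypothetical learned weights
X = np.array([[3.0, 1.0], [-3.0, 1.0]])   # two test inputs
labels = predict(w, X)
```

Since this is binary classification, picking the class with the highest probability is equivalent to comparing P(y=1|x) against 0.5.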

About the contributors:

Savan Visalpara

52%

Keshav Dhandhania, MSc in Deep Learning @ MIT (2014)

48%
