Dropout (neural network regularization)

- Dropout algorithm
- Why does Dropout work?
- Implementation trick
- TensorFlow Example

**Dropout** is a widely used regularization technique for neural networks. Neural networks, especially deep neural networks, are flexible machine learning algorithms and hence prone to overfitting. In this tutorial, we'll explain what dropout is and how it works, including a sample TensorFlow implementation.

If you [have] a deep neural net and it's not overfitting, you should probably be using a bigger one and using dropout, ... - Geoffrey Hinton [2]

Dropout is a regularization technique where, during each iteration of gradient descent, we *drop* a set of neurons selected at random. By *drop*, we mean that for that iteration we act as if those neurons do not exist.

Left: A standard neural network without dropout. Right: A neural network with dropout. Source [1]

Each neuron is dropped at random with some fixed probability 1-p, and kept with probability p. The value p may be different for each layer in the neural network. A drop probability of 0.5 for the hidden layers and 0.0 for the input layer (that is, p = 0.5 and p = 1.0 respectively, so no dropout on the input) has been shown to work well on a wide range of tasks [1].

During evaluation (and prediction), no dropout is applied: every neuron is kept. Instead, the output of each neuron is multiplied by p, so that the input to the next layer has the same expected value as it had during training.
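The training and evaluation rules described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `dropout_train` and `dropout_eval`, the sample activations, and the number of trials are all made up for this example and do not come from any library.

```python
import random

def dropout_train(activations, p, rng):
    """Training: keep each activation with probability p, otherwise output 0."""
    return [a if rng.random() < p else 0.0 for a in activations]

def dropout_eval(activations, p):
    """Evaluation: keep every neuron, but multiply its output by p."""
    return [a * p for a in activations]

rng = random.Random(0)               # fixed seed so the run is repeatable
layer_output = [1.0, 2.0, 3.0, 4.0]  # hypothetical outputs of one layer
p = 0.5                              # keep probability

# Average many random training-time outputs, neuron by neuron.
trials = 20000
totals = [0.0] * len(layer_output)
for _ in range(trials):
    dropped = dropout_train(layer_output, p, rng)
    totals = [t + d for t, d in zip(totals, dropped)]
train_mean = [t / trials for t in totals]

eval_output = dropout_eval(layer_output, p)
print("mean training output:", train_mean)
print("evaluation output:   ", eval_output)
```

The averaged training outputs should land close to the evaluation outputs, which is the sense in which multiplying by p preserves the expected value.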

Source [1]

One can think of dropout as being **analogous to applying model averaging to every layer of a neural network** separately. For a layer with n neurons, we are (in essence) **training 2^n different configurations** (one for each choice of which neurons are dropped), and at test time we are (in essence) averaging over all of these configurations. Moreover, the computational requirements are almost unchanged.
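To make the averaging view concrete, the sketch below enumerates all 2^n configurations for a single linear neuron with n = 3 inputs and compares their probability-weighted average with the "multiply by p" rule. The weights, inputs, and the helper `masked_output` are hypothetical values chosen for illustration. The match is exact only because the neuron here is linear; with nonlinear activations and several layers, scaling by p should be read as an approximation to the average.

```python
from itertools import product

def masked_output(weights, inputs, mask):
    """Output of one linear neuron when `mask` marks which inputs are kept (1) or dropped (0)."""
    return sum(w * x * m for w, x, m in zip(weights, inputs, mask))

weights = [0.2, -0.5, 1.0]   # hypothetical weights
inputs = [1.0, 2.0, 3.0]     # hypothetical outputs of the previous layer
p = 0.5                      # keep probability
n = len(inputs)

masks = list(product([0, 1], repeat=n))   # all 2^n drop configurations

# Probability-weighted average of the output over every configuration.
average = 0.0
for mask in masks:
    kept = sum(mask)
    prob = (p ** kept) * ((1 - p) ** (n - kept))
    average += prob * masked_output(weights, inputs, mask)

# Test-time rule: keep every input and multiply by p.
scaled = p * masked_output(weights, inputs, [1] * n)

print("number of configurations:", len(masks))
print("average over configurations:", average)
print("full output scaled by p:    ", scaled)
```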

Dropout's original inspiration was the following idea: in a neural network without dropout regularization, neurons tend to develop co-dependencies on one another, which leads to overfitting. When we use dropout, a neuron cannot rely on any individual neuron's output (since that neuron may be dropped with some probability). This gives the neural network an incentive to reduce co-adaptations between neurons. [2]

Dropout is implemented in libraries such as TensorFlow and PyTorch by setting the output of the randomly selected neurons to 0. That is, the neuron still exists, but its output is overwritten to be 0.
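A common variant of this trick, often called inverted dropout, also scales the kept outputs by 1/p during training, so that nothing needs to be multiplied by p at evaluation time. TensorFlow's `tf.nn.dropout` is documented to scale kept elements by `1 / keep_prob` in this way. The plain-Python sketch below illustrates the idea only; the function `inverted_dropout` and its `training` flag are hypothetical names for this example, not a library API.

```python
import random

def inverted_dropout(activations, keep_prob, training, rng):
    """Zero each activation with probability 1 - keep_prob and scale the
    survivors by 1 / keep_prob; pass activations through unchanged at evaluation."""
    if not training or keep_prob >= 1.0:
        return list(activations)
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)            # fixed seed so the run is repeatable
hidden = [0.5, 1.5, 2.5, 3.5]     # hypothetical hidden-layer outputs

train_out = inverted_dropout(hidden, keep_prob=0.5, training=True, rng=rng)
eval_out = inverted_dropout(hidden, keep_prob=0.5, training=False, rng=rng)

print("training output:  ", train_out)   # each entry is either 0 or doubled
print("evaluation output:", eval_out)    # identical to `hidden`
```

With this variant the expected value of each output already matches between training and evaluation, which is why setting the keep probability to 1 at test time is sufficient.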

The code below is a simple example of dropout in TensorFlow. The neural network has two hidden layers, both of which use dropout. Notice how dropout in TensorFlow is just another layer on top of the hidden layer. The important lines are marked with asterisks (*).

    # import necessary libraries (this example uses the TensorFlow 1.x API)
    import tensorflow as tf
    from tensorflow.examples.tutorials.mnist import input_data

    # load mnist dataset
    mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

    # create placeholders for the input and output
    X = tf.placeholder(tf.float32, shape=[None, 784])
    y = tf.placeholder(tf.float32, shape=[None, 10])

    # model parameters
    W1 = tf.Variable(tf.truncated_normal([784, 200], stddev=0.1))
    b1 = tf.Variable(tf.ones([200]))
    W2 = tf.Variable(tf.truncated_normal([200, 100], stddev=0.1))
    b2 = tf.Variable(tf.ones([100]))
    W3 = tf.Variable(tf.truncated_normal([100, 10], stddev=0.1))
    b3 = tf.Variable(tf.ones([10]))

    # "p" = probability that a neuron is kept
    keep_prob = tf.placeholder(tf.float32)

    # each layer takes the dropout output of the previous layer as its input
    y1 = tf.nn.relu(tf.add(tf.matmul(X, W1), b1))           # hidden layer 1
    y1_dropout = tf.nn.dropout(y1, keep_prob)               # apply dropout (*)
    y2 = tf.nn.relu(tf.add(tf.matmul(y1_dropout, W2), b2))  # hidden layer 2
    y2_dropout = tf.nn.dropout(y2, keep_prob)               # apply dropout (*)
    y_logits = tf.add(tf.matmul(y2_dropout, W3), b3)        # output layer
    y_hat = tf.nn.softmax(y_logits)                         # softmax

    # cost function
    cost = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=y_logits, labels=y))

    # optimizer (adam optimizer)
    train_step = tf.train.AdamOptimizer(0.001).minimize(cost)

    # accuracy
    true_pred = tf.equal(tf.argmax(y_hat, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(true_pred, tf.float32))

    # train the model
    sess = tf.Session()
    init = tf.global_variables_initializer()
    sess.run(init)

    for i in range(1000):
        batch_X, batch_y = mnist.train.next_batch(128)
        sess.run(train_step, {X: batch_X, y: batch_y, keep_prob: 0.5})

        # print accuracy and cost after every 100 iterations
        if i % 100 == 0:
            # notice keep_prob = 1.0 during test (*)
            acc, cost_v = sess.run([accuracy, cost],
                                   {X: batch_X, y: batch_y, keep_prob: 1.0})
            print("Iteration number:", i)
            print("Accuracy", acc)
            print("Cost", cost_v)

*Note: The code doesn't split the data into train and test sets; the accuracy and cost it prints are computed on the current training batch.*

**Source:**

About the contributors:

- Keshav Dhandhania, MSc in Deep Learning @ MIT (2014): 51%
- Savan Visalpara, Machine Learning Practitioner: 49%

Copyright 2016-18, Compose Labs Inc. All rights reserved.