# Dropout (neural network regularization)

October 05, 2017

Dropout is a widely used regularization technique for neural networks. Neural networks, especially deep neural networks, are flexible machine learning algorithms and hence prone to overfitting. In this tutorial, we’ll explain what dropout is and how it works, including a sample TensorFlow implementation.

If you [have] a deep neural net and it’s not overfitting, you should probably be using a bigger one and using dropout, … - Geoffrey Hinton [2]

# Dropout algorithm

Dropout is a regularization technique in which, during each iteration of gradient descent, we drop a randomly selected set of neurons. To drop a neuron means to act as if it does not exist for that iteration: it takes no part in that iteration's forward or backward pass.

Left: a standard neural network without dropout. Right: a neural network with dropout. Source: [1]

Each neuron is kept with some fixed probability p and dropped with probability 1-p. The value of p may differ from layer to layer. Keeping hidden units with p = 0.5 and applying no dropout to the input layer (p = 1.0) has been shown to work well on a wide range of tasks [1].
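The sampling step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names `dropout_mask` and `apply_dropout` and the activation values are made up for the example and do not come from any library.

```python
import random

def dropout_mask(n_units, keep_prob, rng):
    """Sample one 0/1 mask: each unit is kept (1) with probability keep_prob."""
    return [1 if rng.random() < keep_prob else 0 for _ in range(n_units)]

def apply_dropout(activations, mask):
    """Zero the activations of dropped units; kept units pass through unchanged."""
    return [a * m for a, m in zip(activations, mask)]

rng = random.Random(0)  # fixed seed so the run is repeatable
hidden = [0.7, 1.3, 0.2, 2.1, 0.9, 0.4]  # made-up activations of one hidden layer

# A fresh mask is drawn on every training iteration, so a different
# subset of units is dropped each time.
for step in range(3):
    mask = dropout_mask(len(hidden), keep_prob=0.5, rng=rng)
    print(step, mask, apply_dropout(hidden, mask))
```

Passing a different `keep_prob` per layer corresponds to the per-layer choice of p described above; `keep_prob=1.0` leaves a layer untouched.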

During evaluation (and prediction), we do not ignore any neurons, i.e. no dropout is applied. Instead, the output of each neuron is multiplied by p. This is done so that the input to the next layer has the same expected value as it had during training, when each neuron was present only a fraction p of the time.

Source [1]
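The claim about expected values can be checked numerically. The sketch below assumes a single downstream unit with made-up inputs `x` and weights `w` (both hypothetical), and compares the average training-time input over many random masks with the test-time input obtained by scaling every output by p.

```python
import random

def unit_input_train(x, w, keep_prob, rng):
    """Training-time input to one downstream unit: each x[i] is kept with probability keep_prob."""
    return sum(wi * xi for wi, xi in zip(w, x) if rng.random() < keep_prob)

def unit_input_test(x, w, keep_prob):
    """Test-time input: every x[i] is kept, but scaled by keep_prob."""
    return sum(wi * (keep_prob * xi) for wi, xi in zip(w, x))

rng = random.Random(0)           # fixed seed so the run is repeatable
x = [0.7, 1.3, 0.2, 2.1]         # made-up outputs of the previous layer
w = [0.5, -1.0, 2.0, 0.25]       # made-up weights into the downstream unit
p = 0.5

# Monte Carlo estimate of the expected training-time input.
n_samples = 100_000
train_mean = sum(unit_input_train(x, w, p, rng) for _ in range(n_samples)) / n_samples
test_value = unit_input_test(x, w, p)

# The two numbers should agree up to sampling noise.
print(train_mean, test_value)
```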

# Why does Dropout work?

One can think of dropout as being analogous to applying model averaging to every layer of a neural network separately. For a layer with n neurons we are (in essence) training 2^n different configurations, one for each choice of which neurons are dropped, and at test time we are (in essence) averaging over all possible configurations. Moreover, the computational requirements are almost unchanged.
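For a tiny layer the averaging argument can be verified by brute force. The sketch below assumes a single linear unit with made-up inputs and weights (the function name `average_over_all_masks` is hypothetical) and enumerates every mask. For a linear unit the probability-weighted average over all masks equals one pass with the outputs scaled by p; once nonlinearities are involved the test-time scaling is only an approximation of this average.

```python
from itertools import product

def average_over_all_masks(x, w, keep_prob):
    """Probability-weighted average of a linear unit's output over all 2^n dropout masks."""
    n = len(x)
    total = 0.0
    n_masks = 0
    for mask in product([0, 1], repeat=n):
        kept = sum(mask)
        # Probability of this exact mask: kept units with keep_prob, dropped with 1 - keep_prob.
        prob = keep_prob ** kept * (1 - keep_prob) ** (n - kept)
        output = sum(wi * xi * mi for wi, xi, mi in zip(w, x, mask))
        total += prob * output
        n_masks += 1
    return total, n_masks

x = [0.7, 1.3, 0.2, 2.1]         # made-up inputs
w = [0.5, -1.0, 2.0, 0.25]       # made-up weights
p = 0.5

ensemble_average, n_masks = average_over_all_masks(x, w, p)
scaled_single_pass = p * sum(wi * xi for wi, xi in zip(w, x))

print(n_masks)                                # number of configurations, 2**len(x)
print(ensemble_average, scaled_single_pass)   # agree up to floating-point rounding
```

The enumeration costs 2^n evaluations, while the scaled pass costs one, which is why the averaging is only ever done implicitly.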

Dropout’s original inspiration was the following idea: in a neural network without dropout regularization, neurons tend to become co-dependent, which leads to overfitting. When we use dropout, a neuron cannot rely on any individual neuron’s output, since that neuron may be dropped with some probability. This gives the neural network an incentive to reduce co-adaptations of neurons. [2]

# Implementation trick

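Libraries such as TensorFlow and PyTorch implement dropout by setting the outputs of the randomly selected neurons to 0. That is, the neuron still exists, but its output is overwritten with 0. These libraries also scale the outputs of the kept neurons by 1/p during training, so nothing needs to be multiplied by p at evaluation time.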
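A minimal sketch of this zero-and-rescale behaviour in plain Python, assuming the "inverted" form of dropout in which kept outputs are divided by p during training. The function name `inverted_dropout` and its `training` flag are hypothetical and only mimic what the library calls do internally.

```python
import random

def inverted_dropout(activations, keep_prob, rng, training=True):
    """Zero each activation with probability 1 - keep_prob and scale the rest by 1 / keep_prob.

    With training=False the activations are returned unchanged, because the
    rescaling was already applied during training.
    """
    if not training:
        return list(activations)
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)           # fixed seed so the run is repeatable
layer_output = [1.0] * 10        # made-up activations, all equal to make the effect visible

train_out = inverted_dropout(layer_output, keep_prob=0.5, rng=rng)
eval_out = inverted_dropout(layer_output, keep_prob=0.5, rng=rng, training=False)

print(train_out)   # each entry is either 0.0 (dropped) or 2.0 (kept and scaled by 1 / 0.5)
print(eval_out)    # unchanged at evaluation time
```

Feeding `keep_prob` as 1 at evaluation time, as the TensorFlow example does, has the same effect as the `training=False` branch here: nothing is dropped and nothing is rescaled.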

# TensorFlow Example

The code below is a simple example of dropout in TensorFlow, written against the 1.x API (tf.placeholder, tf.Session). The neural network has two hidden layers, both of which use dropout. Notice how dropout in TensorFlow is just another layer on top of the hidden layer. The important lines are marked with asterisks (*).

# import necessary libraries
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# load mnist dataset
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
# create placeholders for the input and output
X = tf.placeholder(tf.float32, shape=[None,784])
y = tf.placeholder(tf.float32, shape=[None,10])
# model parameters
W1 = tf.Variable(tf.truncated_normal([784,200], stddev=0.1))
b1 = tf.Variable(tf.ones([200]))
W2 = tf.Variable(tf.truncated_normal([200,100], stddev=0.1))
b2 = tf.Variable(tf.ones([100]))
W3 = tf.Variable(tf.truncated_normal([100,10], stddev=0.1))
b3 = tf.Variable(tf.ones([10]))
# "p" = probability that a neuron is kept
keep_prob = tf.placeholder(tf.float32)
y1 = tf.nn.relu(tf.add(tf.matmul(X, W1), b1))    # hidden layer 1
y1_dropout = tf.nn.dropout(y1, keep_prob)        # apply dropout (*)
y2 = tf.nn.relu(tf.add(tf.matmul(y1_dropout, W2), b2))   # hidden layer 2, fed by the dropped-out layer 1
y2_dropout = tf.nn.dropout(y2, keep_prob)        # apply dropout (*)
y_logits = tf.add(tf.matmul(y2_dropout, W3), b3) # output layer, fed by the dropped-out layer 2
y_hat = tf.nn.softmax(y_logits)                  # softmax
# cost function
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_logits, labels=y))
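# optimizer (adam optimizer); this line was missing, so the default learning rate is an assumption
train_step = tf.train.AdamOptimizer().minimize(cost)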
# accuracy
true_pred = tf.equal(tf.argmax(y_hat,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(true_pred, tf.float32))
# train the model
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)
for i in range(1000):
    batch_X, batch_y = mnist.train.next_batch(128)
    sess.run(train_step, {X:batch_X, y:batch_y, keep_prob:0.5})
    # print accuracy and cost after every 100 iterations
    if i % 100 == 0:
        # notice keep_prob = 1.0 during test (*)
        acc, cost_v = sess.run([accuracy, cost], {X:batch_X, y:batch_y, keep_prob:1})
        print("Iteration number:", i)
        print("Accuracy", acc)
        print("Cost", cost_v)

Note: the code does not evaluate on a separate test set. The accuracy and cost it prints are measured on the current training batch.