**Dropout** is a widely used regularization technique for neural networks. Neural networks, especially deep neural networks, are flexible machine learning algorithms and hence prone to overfitting. In this tutorial, we'll explain what dropout is and how it works, including a sample TensorFlow implementation.

If you [have] a deep neural net and it's not overfitting, you should probably be using a bigger one and using dropout, ... - Geoffrey Hinton [2]

# Dropout algorithm

Dropout is a regularization technique where during each iteration of gradient descent, we *drop* a set of neurons selected at random. By *drop*, what we mean is that we essentially act as if they do not exist.

Each neuron is dropped at random with some fixed probability 1-p, and kept with probability p. The value of p may be different for each layer in the neural network. A keep probability of p = 0.5 for the hidden layers and p = 1.0 for the input layer (no dropout) has been shown to work well on a wide range of tasks [1].

During evaluation (and prediction), we do not ignore any neurons, i.e. no dropout is applied. Instead, the output of each neuron is multiplied by p. This is done so that the input to the next layer has the same expected value as it had during training.
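The training and evaluation rules above can be sketched in a few lines of plain Python. This is a minimal illustration, not library code: the helper names `dropout_train` and `dropout_eval`, the sample activations, and the number of trials are all assumptions made for the example.

```python
import random


def dropout_train(activations, p, rng):
    """Training: keep each activation with probability p, otherwise zero it."""
    return [a if rng.random() < p else 0.0 for a in activations]


def dropout_eval(activations, p):
    """Evaluation: keep every activation, but scale it by p."""
    return [a * p for a in activations]


rng = random.Random(0)            # fixed seed so the run is repeatable
activations = [1.0, 2.0, 3.0, 4.0]
p = 0.5                           # probability that a neuron is kept

# Average many independent training-time passes over the same activations.
n_trials = 20000
sums = [0.0] * len(activations)
for _ in range(n_trials):
    dropped = dropout_train(activations, p, rng)
    sums = [s + d for s, d in zip(sums, dropped)]
train_mean = [s / n_trials for s in sums]

eval_out = dropout_eval(activations, p)

print("mean training output:", train_mean)
print("evaluation output:   ", eval_out)
```

The averaged training output should land close to the evaluation output, which is the sense in which multiplying by p preserves the expected value.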

# Why does Dropout work?

One can think of dropout as being **analogous to applying model averaging to every layer of a neural network** separately. For a layer with $n$ neurons we are (in essence) **training $2^{n}$ different configurations** (depending on which neurons are dropped), and during test time we are (in essence) averaging over all possible configurations. Moreover, the computational requirements are almost unchanged.
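To make the averaging view concrete, the sketch below enumerates every dropout configuration of a tiny layer and compares the average with a single pass scaled by p. It assumes a single linear unit with n = 3 inputs and p = 0.5 (so all configurations are equally likely); the weights and inputs are made-up values. For a linear unit the two quantities match exactly, while for networks with non-linearities the scaling is only an approximation to the true average.

```python
from itertools import product

x = [1.0, 2.0, 3.0]     # outputs of the n = 3 neurons that may be dropped
w = [0.5, -1.0, 2.0]    # weights of one linear unit in the next layer
p = 0.5                 # keep probability; every mask is equally likely
n = len(x)

# All 2^n keep/drop configurations (1 = kept, 0 = dropped).
masks = list(product([0, 1], repeat=n))


def unit_output(mask):
    """Output of the linear unit when only the neurons in `mask` are kept."""
    return sum(wi * xi * mi for wi, xi, mi in zip(w, x, mask))


# Explicit model averaging over every configuration.
average = sum(unit_output(m) for m in masks) / len(masks)

# Single pass with no neuron dropped, scaled by p.
scaled = p * sum(wi * xi for wi, xi in zip(w, x))

print(len(masks), average, scaled)
```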

Dropout's original inspiration was the following idea: in a neural network without dropout regularization, neurons tend to develop co-dependencies on one another, which leads to overfitting. When we use dropout, a neuron cannot rely on any individual neuron's output (since it may be dropped with some probability). This gives the neural network an incentive to reduce co-adaptations between neurons. [2]

# Implementation trick

Dropout is implemented in libraries such as TensorFlow and PyTorch by setting the output of the randomly selected neurons to 0. That is, the neuron still exists, but its output is overwritten with 0.
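In addition to zeroing outputs, `tf.nn.dropout` and PyTorch's `torch.nn.Dropout` rescale the surviving outputs by 1/p during training (often called inverted dropout), so no multiplication by p is needed at evaluation time. The sketch below mimics that behaviour in plain Python; `inverted_dropout` is a hypothetical helper written for illustration, not the libraries' actual code.

```python
import random


def inverted_dropout(activations, keep_prob, rng):
    """Zero each activation with probability 1 - keep_prob and scale the
    survivors by 1 / keep_prob so the expected output is unchanged."""
    mask = [1.0 if rng.random() < keep_prob else 0.0 for _ in activations]
    return [a * m / keep_prob for a, m in zip(activations, mask)]


rng = random.Random(0)
layer_output = [1.0, 2.0, 3.0, 4.0]

# Training: some outputs are overwritten with 0, the rest are doubled.
print(inverted_dropout(layer_output, 0.5, rng))

# Evaluation: keep_prob = 1.0 leaves the outputs untouched.
print(inverted_dropout(layer_output, 1.0, rng))
```

This is why a training loop can simply feed a keep probability of 1 at test time without any extra scaling step.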

# TensorFlow Example

The code below is a simple example of dropout in TensorFlow. The neural network has two hidden layers, both of which use dropout. Notice how dropout in TensorFlow is just another layer on top of the hidden layer. The important lines are marked with asterisks (*).

```python
# import necessary libraries (this example uses the TensorFlow 1.x API)
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# load mnist dataset
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# create placeholders for the input and output
X = tf.placeholder(tf.float32, shape=[None, 784])
y = tf.placeholder(tf.float32, shape=[None, 10])

# model parameters
W1 = tf.Variable(tf.truncated_normal([784, 200], stddev=0.1))
b1 = tf.Variable(tf.ones([200]))
W2 = tf.Variable(tf.truncated_normal([200, 100], stddev=0.1))
b2 = tf.Variable(tf.ones([100]))
W3 = tf.Variable(tf.truncated_normal([100, 10], stddev=0.1))
b3 = tf.Variable(tf.ones([10]))

# "p" = probability that a neuron is kept
keep_prob = tf.placeholder(tf.float32)

y1 = tf.nn.relu(tf.add(tf.matmul(X, W1), b1))           # hidden layer 1
y1_dropout = tf.nn.dropout(y1, keep_prob)               # apply dropout (*)
# each layer must consume the dropped-out output of the previous layer
y2 = tf.nn.relu(tf.add(tf.matmul(y1_dropout, W2), b2))  # hidden layer 2
y2_dropout = tf.nn.dropout(y2, keep_prob)               # apply dropout (*)
y_logits = tf.add(tf.matmul(y2_dropout, W3), b3)        # output layer
y_hat = tf.nn.softmax(y_logits)                         # softmax

# cost function
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=y_logits, labels=y))

# optimizer (adam optimizer)
train_step = tf.train.AdamOptimizer(0.001).minimize(cost)

# accuracy
true_pred = tf.equal(tf.argmax(y_hat, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(true_pred, tf.float32))

# train the model
sess = tf.Session()
init = tf.global_variables_initializer()
sess.run(init)

for i in range(1000):
    batch_X, batch_y = mnist.train.next_batch(128)
    sess.run(train_step, {X: batch_X, y: batch_y, keep_prob: 0.5})

    # print accuracy and cost after every 100 iterations
    if i % 100 == 0:
        # notice keep_prob = 1.0 during test (*)
        acc, cost_v = sess.run([accuracy, cost],
                               {X: batch_X, y: batch_y, keep_prob: 1})
        print("Iteration number:", i)
        print("Accuracy", acc)
        print("Cost", cost_v)
```

*Note: The code doesn't split the data into train and test sets; the accuracy and cost it prints are computed on the current training batch.*
