In this article, we'll discuss some regularization methods for deep learning. Recall that machine learning models, including deep learning models, are susceptible to overfitting, and regularization methods are techniques for preventing or curtailing it. In this tutorial, we'll discuss early stopping, data augmentation, and transfer learning. Dropout, an important regularization technique that has proven very successful for deep learning models, is discussed in more detail in a linked tutorial.
We train neural networks using gradient descent, an iterative algorithm. Below is a plot of training error (red) and validation error (purple), where the x-axis is the number of training iterations. Early stopping exploits the gap between the two curves: we halt training once the validation error starts to rise, even though the training error keeps falling, and keep the best model seen so far.
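To make early stopping concrete, here's a minimal sketch of the loop in PyTorch. The tiny model and random data are stand-ins just to keep it self-contained, and `patience` (how many consecutive worse-validation epochs to tolerate before stopping) is a common but arbitrary choice:

```python
import copy
import torch
import torch.nn as nn

# Stand-in model and data; in practice these would be your real network and splits.
torch.manual_seed(0)
model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
x_train, y_train = torch.randn(256, 10), torch.randn(256, 1)
x_val, y_val = torch.randn(64, 10), torch.randn(64, 1)

best_val, best_state, patience, bad_epochs = float("inf"), None, 5, 0
for epoch in range(200):
    opt.zero_grad()
    loss_fn(model(x_train), y_train).backward()    # one gradient descent step
    opt.step()

    with torch.no_grad():                          # measure validation error
        val = loss_fn(model(x_val), y_val).item()
    if val < best_val:                             # still improving: checkpoint
        best_val, bad_epochs = val, 0
        best_state = copy.deepcopy(model.state_dict())
    else:                                          # got worse: count it
        bad_epochs += 1
        if bad_epochs >= patience:                 # stop before overfitting worsens
            break

model.load_state_dict(best_state)                  # roll back to the best checkpoint
```

Framework callbacks (e.g. Keras's `EarlyStopping`) package this same pattern, so in practice you rarely need to write the loop by hand.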
This paper is an influential publication in the subfield of Question Answering. Authored by Jason Weston, Sumit Chopra, and Antoine Bordes, it introduced a class of models called memory networks.
The intuitive idea is that in order to accurately answer a question about a piece of text, you need to somehow store the information originally given to you. If I were to ask you the question “What does RNN stand for?” (assuming you’ve read this post fully), you’d be able to give me an answer because the information you absorbed by reading the first part of this post was stored somewhere in your memory. You just had to take a few seconds to locate that info and articulate it in words. Now, I have no clue how the brain is able to do that, but the idea of having a storage place for this information still remains.
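To make the store-then-retrieve idea concrete, here's a toy sketch, not the paper's actual model (whose input, generalization, output, and response components are learned): sentences are written into a memory of embeddings, and a question retrieves the most relevant one by dot-product scoring. The tiny vocabulary and random embeddings are made up for the example:

```python
import torch

torch.manual_seed(0)
vocab = {w: i for i, w in enumerate(
    "rnn stands for recurrent neural network what does stand".split())}
embed = torch.nn.Embedding(len(vocab), 16)

def encode(sentence):
    ids = torch.tensor([vocab[w] for w in sentence.lower().split()])
    return embed(ids).mean(dim=0)        # bag-of-words sentence embedding

# Write: store each sentence we read into memory.
memory_texts = ["RNN stands for recurrent neural network"]
memory = torch.stack([encode(t) for t in memory_texts])

# Read: score every stored sentence against the question, take the best match.
question = encode("what does rnn stand for")
scores = memory @ question
print(memory_texts[scores.argmax()])     # the supporting memory for the answer
```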
The memory network described in the paper is unique ...
This paper describes an approach to the task of Machine Translation. Authored by Google ML visionaries Jeff Dean, Greg Corrado, Oriol Vinyals, and others, this paper introduced a machine translation system that serves as the backbone of Google’s popular Translate service. The system reduced translation errors by an average of 60% compared to Google’s previous production system.
Traditional approaches to automated translation include variants of phrase-based matching. This approach required large amounts of linguistic domain knowledge, and its design ultimately proved too brittle and lacked the ability to generalize. One of the problems with the traditional approach was that it tried to translate the input sentence piece by piece. It turns out the more effective approach (the one NMT uses) is to translate a whole sentence at a time, allowing for broader context and a more natural rearrangement of words.
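A stripped-down encoder-decoder sketch makes the contrast concrete: the entire source sentence is encoded into a context vector before a single target word is produced. (GNMT itself is a deep LSTM stack with attention and many other refinements; the GRUs, dimensions, and vocabulary sizes here are simplified stand-ins.)

```python
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, DIM = 1000, 1000, 64     # made-up sizes for illustration

src_embed = nn.Embedding(SRC_VOCAB, DIM)
tgt_embed = nn.Embedding(TGT_VOCAB, DIM)
encoder = nn.GRU(DIM, DIM, batch_first=True)
decoder = nn.GRU(DIM, DIM, batch_first=True)
out_proj = nn.Linear(DIM, TGT_VOCAB)

src = torch.randint(0, SRC_VOCAB, (1, 7))      # one source sentence, 7 tokens
tgt_in = torch.randint(0, TGT_VOCAB, (1, 9))   # shifted target (teacher forcing)

# Encode the *whole* source sentence first...
_, context = encoder(src_embed(src))
# ...then generate the translation conditioned on that sentence-level context.
dec_out, _ = decoder(tgt_embed(tgt_in), context)
logits = out_proj(dec_out)   # (1, 9, TGT_VOCAB): a distribution per target step
```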
This paper looks into an advancement in Sentiment Analysis, the task of determining whether a phrase has a positive or negative connotation / meaning. More formally, sentiment can be defined as “a view or attitude toward a situation or event”. At the time, LSTMs were the most commonly used units in sentiment analysis networks. Authored by Kai Sheng Tai, Richard Socher, and Christopher Manning, this paper introduces a novel way of chaining together LSTMs in a non-linear structure.
The motivation behind this non-linear arrangement lies in the observation that, in natural language, words in sequence combine into phrases, and these phrases, depending on the order of the words, can hold meanings different from those of their component words. To capture this characteristic, a network of LSTM units can be arranged into a tree structure in which each unit is affected by its child nodes.
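Here is a minimal sketch of the per-node update of the paper's Child-Sum Tree-LSTM: a unit sums its children's hidden states and keeps a separate forget gate for each child, so the node decides independently how much of each child's memory to carry upward. (Batching, the N-ary variant, and the tree traversal around the cell are omitted, and the dimensions in the usage example are arbitrary.)

```python
import torch
import torch.nn as nn

class ChildSumTreeLSTMCell(nn.Module):
    def __init__(self, in_dim, mem_dim):
        super().__init__()
        self.iou = nn.Linear(in_dim, 3 * mem_dim)          # input/output/update gates
        self.iou_h = nn.Linear(mem_dim, 3 * mem_dim, bias=False)
        self.f = nn.Linear(in_dim, mem_dim)                # forget gate, per child
        self.f_h = nn.Linear(mem_dim, mem_dim, bias=False)

    def forward(self, x, child_h, child_c):
        # child_h, child_c: (num_children, mem_dim); zero-length for leaves
        h_sum = child_h.sum(dim=0)                         # children summed, not ordered
        i, o, u = torch.chunk(self.iou(x) + self.iou_h(h_sum), 3, dim=-1)
        i, o, u = torch.sigmoid(i), torch.sigmoid(o), torch.tanh(u)
        f = torch.sigmoid(self.f(x) + self.f_h(child_h))   # one gate per child
        c = i * u + (f * child_c).sum(dim=0)               # keep/forget each child's memory
        h = o * torch.tanh(c)
        return h, c

cell = ChildSumTreeLSTMCell(in_dim=50, mem_dim=64)
no_kids = torch.zeros(0, 64)                               # a leaf has no children
h1, c1 = cell(torch.randn(50), no_kids, no_kids)
h2, c2 = cell(torch.randn(50), no_kids, no_kids)
# An internal node combines the states of its two children.
h, c = cell(torch.randn(50), torch.stack([h1, h2]), torch.stack([c1, c2]))
```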
What is deep learning? How does it relate to machine learning?
Deep learning is a subfield of machine learning; that is, as in the rest of machine learning, we are dealing with algorithms that learn from examples. The main difference is that deep learning models have a notion of multiple layers, or multiple levels of hierarchy, which opens up the possibility of learning models for more complicated tasks.
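As a bare-bones illustration of "multiple levels", here's a three-layer network in PyTorch; the sizes are arbitrary, and the point is only that each layer transforms the previous layer's output into a more abstract representation:

```python
import torch.nn as nn

deep_model = nn.Sequential(
    nn.Linear(784, 256), nn.ReLU(),   # level 1: low-level features of the input
    nn.Linear(256, 64), nn.ReLU(),    # level 2: combinations of those features
    nn.Linear(64, 10),                # level 3: task-specific outputs (10 classes)
)
```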
The following are some different ways of thinking about deep learning ...
Convolutional Neural Networks (CNNs, or ConvNets) are a family of neural network architectures that have been very successful in computer vision applications and are also widely used in applications that process media such as audio and video. The main difference between a standard neural network and a CNN is a special type of layer called the convolutional layer.
The deep learning research paper that sparked the field’s revival in 2012 used a CNN for image classification. A standard CNN architecture for image classification takes an image as input, passes it through a series of convolutional, nonlinear, pooling (downsampling), and fully connected layers, and produces an output: the probabilities of the classes that best describe the image. The network layers might look something like this:
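For instance, here's a small PyTorch sketch of that pipeline; all channel counts and the 32×32 input size are illustrative choices, not the 2012 paper's actual architecture:

```python
import torch.nn as nn

# Input -> Conv -> ReLU -> Pool -> Conv -> ReLU -> Pool -> Fully Connected
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # convolutional layer
    nn.ReLU(),                                    # nonlinearity
    nn.MaxPool2d(2),                              # pooling (downsampling): 32 -> 16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                              # 16 -> 8
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),                    # fully connected: 10 class scores
)
```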
Image classification is the task of taking an input image and outputting a class (a cat, dog, etc.) or a probability over classes that best describes the image. We humans learn to do this task within the first months of our lives, and for the rest of our lives it comes naturally and effortlessly. We’re able to quickly and seamlessly identify the environments we’re in and the objects that surround us, all without consciously noticing. This ability to quickly recognize patterns, generalize from prior knowledge, and adapt to different image environments is one we do not share with our fellow machines.
Deep Learning Research Review Week 3: Natural Language Processing
This is the 3rd installment of a new series called Deep Learning Research Review. Every couple weeks or so, I’ll be summarizing and explaining research papers in specific subfields of deep learning. This week focuses on applying deep learning to Natural Language Processing.