Deep learning is a subfield of machine learning: like other machine learning methods, deep learning algorithms learn from examples. The main difference from the rest of machine learning is that deep learning models are built from multiple layers, or levels of hierarchy, which makes it possible to learn models for more complicated tasks.
The following are some different ways of thinking about the multiple layers of hierarchy in a deep learning model. Deep learning algorithms:
use a cascade of many layers of transformation, where each successive layer uses the output of the previous layer as its input.
are based on the (unsupervised) learning of multiple levels of features or representations of the data, where higher-level features are derived from lower-level features to form a hierarchical representation.
learn multiple levels of representation that correspond to different levels of abstraction, where the levels form a hierarchy of concepts.
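The first view above, a cascade of layers in which each layer consumes the output of the previous one, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the layer sizes, the random (untrained) weights, the tanh nonlinearity, and the helper names make_layer, apply_layer, and forward are all hypothetical choices made here, not something prescribed by the text.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for one fully connected layer.
    The weights are random placeholders; a real model would learn them."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def apply_layer(layer, inputs):
    """One transformation: a weighted sum per output unit, followed by tanh."""
    weights, biases = layer
    return [
        math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(layers, inputs):
    """Feed the input through the cascade, keeping every intermediate output."""
    activations = [inputs]
    for layer in layers:
        # Each layer takes the previous layer's output as its input.
        activations.append(apply_layer(layer, activations[-1]))
    return activations

rng = random.Random(0)
sizes = [4, 3, 3, 2]  # assumed widths: 4 raw inputs, two hidden levels, 2 outputs
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
activations = forward(layers, [0.5, -0.2, 0.1, 0.9])

for depth, values in enumerate(activations):
    print(f"level {depth}: {len(values)} values")
```

Level 0 is the raw input, and each later level is a representation computed only from the level directly below it. With trained rather than random weights, those intermediate levels are what the text calls higher-level features.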
Let's take an example of a domain in which deep learning works particularly well: computer vision. Suppose we want to write a computer program that, given an image of an animal, predicts whether it is a cat, dog, horse, tiger, etc.
In the above image, notice how each part is composed of simpler sub-parts. The cat is made of a face, body, legs, tail, etc. The cat’s face is made of eyes, nose, ears, whiskers, and mouth. The ears are made of various shapes such as curves, corners, and gradients in a particular arrangement, and a curve itself is composed of individual locations (pixels), each with a certain color. Deep learning architectures are designed with multiple layers on the intuition that successive layers, from lowest to highest, will automatically learn to model progressively higher levels of abstraction, starting in this case from pixels and going all the way up to the animal.
We are surrounded by this kind of compositionality. Let's take another example: books are made of chapters, chapters are made of paragraphs, paragraphs are made of sentences, sentences are made of words, and words are made of characters.
Another intuition behind deep learning is that humans very often learn and model things in a similar way. For example, when we learn how to write, we first learn the alphabet, then small words, then short sentences, then paragraphs or stories that consist of a few lines, and then complete pages or even books.
Similarly, when learning to draw, a teacher will often emphasize that the student should first learn to make simple brush or pencil strokes freely and repeatedly. Only after that does the student move on to making outlines and sketches with those strokes, and then to filling in the details.
One of the big challenges with traditional machine learning models is a process called feature extraction. Specifically, the programmer needs to tell the computer what kinds of things it should look for that will be informative in making a decision. Feeding the algorithm raw data rarely works, so feature extraction is a critical part of the traditional machine learning workflow. This places a huge burden on the programmer, and the algorithm’s effectiveness relies heavily on how insightful the programmer is. For complex problems such as object recognition or handwriting recognition, this is a huge challenge.
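To make the burden of manual feature extraction concrete, here is a minimal sketch of what the programmer has to write by hand in a traditional workflow. The three features chosen here (mean brightness and two contrast measures), the function name extract_features, and the tiny 4x4 image are all illustrative assumptions; a real system would need many more, carefully designed features.

```python
def extract_features(image):
    """Turn a grayscale image (rows of 0-255 ints) into a few hand-picked numbers.
    Which numbers to compute is entirely the programmer's decision."""
    height, width = len(image), len(image[0])
    pixels = [p for row in image for p in row]
    mean_brightness = sum(pixels) / len(pixels)

    # Average absolute difference between horizontally adjacent pixels:
    # large when the image contains vertical edges.
    horizontal_contrast = sum(
        abs(row[x + 1] - row[x]) for row in image for x in range(width - 1)
    ) / (height * (width - 1))

    # Average absolute difference between vertically adjacent pixels:
    # large when the image contains horizontal edges.
    vertical_contrast = sum(
        abs(image[y + 1][x] - image[y][x])
        for y in range(height - 1)
        for x in range(width)
    ) / ((height - 1) * width)

    return [mean_brightness, horizontal_contrast, vertical_contrast]

# A tiny 4x4 image: dark left half, bright right half (one vertical edge).
image = [[0, 0, 255, 255] for _ in range(4)]
features = extract_features(image)
print(features)
```

A traditional classifier would then be trained on these three numbers instead of the 16 raw pixels. If the chosen features miss what actually distinguishes the classes, no amount of training can recover it, which is why the result depends so heavily on the programmer's insight.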
Deep learning, with its ability to learn multiple layers of representation, is one of the few methods that have enabled us to circumvent manual feature extraction. We can think of the lower layers as performing automatic feature extraction, requiring little guidance from the programmer.
Some applications in which deep learning has produced good results are computer vision, language translation, image captioning, audio transcription, and molecular biology (predicting protein interaction).