Labelled data is far more useful than unlabelled data for training models, but it is also much more expensive to acquire. Cheap, high-quality labelled data is a blessing, and creative ways of scraping the internet make it possible. Below are some strategies I have come across while skimming through deep learning papers.
Output = input, a.k.a. data compression
Let's start with the simplest one. If the task is to predict the input itself, you automatically have the labels, because each label is just the input. It's more intuitive to think of this set-up as a data compression mechanism: since the input needs to be regenerated from the hidden layers, the hidden layer can be thought of as a compressed version of the input. For example, Alex Graves used LSTM-based neural networks to compress Wikipedia text, and Google Research used neural networks for image compression.
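As a minimal sketch of this set-up, assuming PyTorch and an illustrative 784-dimensional input, the snippet below trains an autoencoder whose reconstruction target is the input itself, so no external labels are needed:

```python
# A minimal sketch, assuming PyTorch and a flattened 784-dim input
# (e.g. a 28x28 image). The training target is the input itself, so
# the data is "labelled" for free, and the narrow hidden layer plays
# the role of the compressed code.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # Encoder squeezes the input down to a small code vector.
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.ReLU())
        # Decoder tries to regenerate the original input from the code.
        self.decoder = nn.Linear(code_dim, input_dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.rand(64, 784)            # a batch of stand-in inputs
optimizer.zero_grad()
loss = loss_fn(model(x), x)        # the label *is* the input
loss.backward()
optimizer.step()
```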
Fill-in-the-blank (i.e. partially hide the input)
Language modeling, i.e. the task of predicting the next word given a sequence of words (say, the first half of a sentence), is one of the oldest and most popular examples of partially hiding the input. In computer vision, the analogous set-up is to crop out a portion of an image and try to predict it (example 1, example 2). For any data that is time-based (like speech, video, or weather), one simply hides the input after a certain time t.
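Here is a toy sketch in plain Python of how raw text labels itself under this scheme; the six-word corpus is an illustrative stand-in. Every prefix of the sentence becomes an input and the word that follows becomes its label:

```python
# A toy sketch: every position in a sentence yields a
# (context, next word) pair, so a raw corpus labels itself.
# The corpus below is an illustrative stand-in.
corpus = "the cat sat on the mat".split()

pairs = [(corpus[:t], corpus[t]) for t in range(1, len(corpus))]

for context, target in pairs:
    print("input:", context, "-> label:", target)
# input: ['the'] -> label: cat
# input: ['the', 'cat'] -> label: sat
# ...
```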
Discriminating between real data and generated data
In Natural Language Processing (almost) from Scratch, the authors use sentences found in Wikipedia as positive samples. The negative samples, or generated data, were made by replacing a word in the sentence with a random word. Data generation strategies can get much more complex, such as using a different neural network to generate the data. This is done in Generative Adversarial Networks, where the positive samples are images from the web and the negative samples are images generated by a neural network.
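Below is a rough sketch of the word-replacement trick described above; the vocabulary and sentence are illustrative stand-ins, not the paper's actual setup:

```python
# A rough sketch of negative sampling by word replacement: a real
# sentence is a positive example (label 1), and the same sentence
# with one word swapped for a random vocabulary word is a negative
# example (label 0). Vocabulary and sentence are stand-ins.
import random

vocabulary = ["apple", "runs", "blue", "quickly", "house"]

def make_negative(sentence):
    words = sentence.split()
    i = random.randrange(len(words))
    words[i] = random.choice(vocabulary)   # corrupt one word at random
    return " ".join(words)

positive = "the cat sat on the mat"        # real text, label 1
negative = make_negative(positive)         # generated text, label 0
dataset = [(positive, 1), (negative, 0)]
print(dataset)
```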
Repurposing naturally paired data
Movie subtitles have been used as speech transcription data, and also as training data for chat-bots.
I'll keep updating this article as I recall more strategies I have encountered. In the meantime, feel free to reply to this discussion and suggest your favorite ones.