A word cloud (also called a tag cloud) is a data visualization technique that highlights the most prominent terms in a large text corpus. By drawing frequently occurring words more prominently, it gives a quick visual summary of what the text is about. This type of visualization can assist exploratory text analysis by identifying important terms (which may be potential features) and contextual themes appearing in a set of documents.
In a word cloud, the more common words in the documents appear larger and bolder. Word cloud generators break the text into word tokens and count how often each token appears in the entire corpus. Each word is then assigned a font size based on its frequency, so the more often a word appears, the larger it is drawn. The raw frequency can also be replaced by the word's TF-IDF score, which down-weights words that are common across all documents and gives a relatively more meaningful representation. Finally, the words are arranged in a cluster or cloud, which may take any form, such as horizontal lines, columns, or a shape.
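The counting and sizing steps described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the method of any particular word cloud generator. The function names, the sample documents, the 10 to 48 point range, and the linear scaling are all assumptions made for the example. The TF-IDF variant shown (term frequency times log(N / document frequency), summed over documents) is one of several common formulations.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(documents):
    """Count how often each token appears across the whole corpus."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


def tfidf_weights(documents):
    """Sum each word's per-document TF-IDF score (an assumed, simple variant)."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    scores = Counter()
    for tokens in tokenized:
        if not tokens:
            continue
        for word, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[word])
            scores[word] += tf * idf
    return scores


def font_sizes(weights, min_pt=10, max_pt=48):
    """Linearly map each word's weight to a font size in [min_pt, max_pt]."""
    lo, hi = min(weights.values()), max(weights.values())
    span = (hi - lo) or 1  # avoid division by zero when all weights are equal
    return {
        word: min_pt + (value - lo) / span * (max_pt - min_pt)
        for word, value in weights.items()
    }


# Invented sample corpus, for illustration only.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the bird sang",
]

counts = word_frequencies(docs)
sizes = font_sizes(counts)
tfidf = tfidf_weights(docs)
```

With raw counts, "the" is the most frequent word and receives the largest font size. With the TF-IDF weights, "the" scores zero because it occurs in every document, which is the down-weighting effect described above.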
Word clouds can also be used to display words that have metadata assigned to them. For example, in a word cloud of countries, each country's population could determine the size of its name. Colors in a word cloud are usually purely aesthetic, but they can also be used to denote categories.
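As a rough sketch of the metadata idea, the following Python snippet sizes each word by a numeric attribute and colors it by a category. The country names, population figures, regions, and color codes are invented placeholders, and the helper name `styled_words` and the linear scaling are assumptions made for the example.

```python
# Hypothetical records: names, populations (in millions), and regions are
# invented placeholders, not real data.
countries = [
    {"name": "Country A", "population": 100.0, "region": "north"},
    {"name": "Country B", "population": 55.0, "region": "south"},
    {"name": "Country C", "population": 10.0, "region": "north"},
]

# Assumed mapping from category to display color.
REGION_COLORS = {"north": "#1f77b4", "south": "#d62728"}


def styled_words(records, min_pt=10, max_pt=48):
    """Size each word by its population and color it by its region."""
    populations = [r["population"] for r in records]
    lo, hi = min(populations), max(populations)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return [
        {
            "text": r["name"],
            "size": min_pt + (r["population"] - lo) / span * (max_pt - min_pt),
            "color": REGION_COLORS[r["region"]],
        }
        for r in records
    ]


words = styled_words(countries)
```

Each resulting entry carries the text, font size, and color that a layout step would need to place the word in the cloud.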