Word Clouds: An Introduction with Code (in Python) and Examples
- What are word clouds?
- How to build a word cloud?
- More tools (software and online)
- Algorithm used in word clouds
- Applications of word clouds
A word cloud (also called a tag cloud) is a data visualization technique that highlights the most important words in a large text corpus. It produces a compact visual summary of the text in which the words that appear most frequently stand out. This type of visualization can assist exploratory text analysis by surfacing important textual data points (possibly useful as features) and the contextual themes that run through a set of documents.
In a word cloud, the more common words in the documents appear larger and bolder. Word cloud generators break the text down into word tokens and count how often each token appears in the entire corpus. Each word is then assigned a font size based on its frequency, so the more often a word appears, the larger it is drawn in the cloud. The raw frequency can also be replaced by the word's tf-idf score, which down-weights words that are common across all documents and gives a more meaningful representation. Finally, the words are arranged into a cluster, or cloud, which can take any layout, such as horizontal lines, columns, or the outline of a shape.
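The counting and sizing steps described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the wordcloud library's implementation: the tokenizing regex, the tiny `STOP` stopword set and the linear `font_sizes` mapping (with its `min_size`/`max_size` bounds) are all assumptions chosen for clarity.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real libraries ship much larger ones.
STOP = {"a", "an", "the", "is", "in", "of", "and", "to", "it", "that", "which"}


def word_frequencies(text):
    """Split text into lowercase word tokens and count them, skipping stopwords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP)


def font_sizes(counts, min_size=10, max_size=60):
    """Map each word's count linearly onto the range [min_size, max_size] points."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        word: round(min_size + (c - lo) / span * (max_size - min_size))
        for word, c in counts.items()
    }


text = "Word clouds show frequent words larger. Frequent words stand out in word clouds."
counts = word_frequencies(text)
sizes = font_sizes(counts)

# The most frequent words get the largest font size.
for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
    print(f"{word:10s} count={counts[word]} size={size}pt")
```

A linear mapping is only one choice. The wordcloud library's `relative_scaling` parameter instead blends word rank and relative frequency: with `relative_scaling=0` only the rank matters, and with `relative_scaling=1` a word that is twice as frequent gets twice the font size.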
Word clouds can also display words that have metadata assigned to them. For example, in a word cloud of countries, each country's size could be determined by its population rather than by how often its name appears in a text. Colour in a word cloud is usually purely aesthetic, but it can also be used to categorize words.
Several word cloud generators are freely available on the internet. Here, let's use Python's wordcloud library to build a word cloud for the paragraphs above. First, make sure the wordcloud package is installed:
pip install wordcloud
The following code creates a word cloud from a piece of text. You can try it by assigning the first three paragraphs of this article, or any other text of your choice, to the text variable.
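import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

# Replace this placeholder with the text you want to visualize.
text = '''Copy Paste the above text'''

# relative_scaling=1.0 makes font size proportional to word frequency;
# STOPWORDS is the library's built-in set of common English words to ignore.
wordcloud = WordCloud(relative_scaling=1.0,
                      stopwords=set(STOPWORDS)).generate(text)

# Display the generated image without axes.
plt.imshow(wordcloud)
plt.axis("off")
plt.show()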
This is what the output word cloud looks like:
Wow! As you can see, a single glance at the output image not only tells us that the text is about word clouds but also highlights contextual elements such as “visualization”, “appear” and “frequently”.
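The Python wordcloud library also provides configurable parameters to customize your word cloud, including the canvas size (width, height), background_color, max_words, the font-size range (min_font_size, max_font_size), stopwords and colormap.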
Let us now look at the algorithm behind word clouds, which will help us understand how some of the commonly available libraries build them.
The first step is to create the list of words we want to plot, along with weights that measure the importance of each word (for example, raw frequencies or tf-idf scores).
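One way to compute these weights is the tf-idf score mentioned earlier. The sketch below uses one common textbook variant (term frequency normalized by document length, multiplied by idf = log(N / df)); libraries such as scikit-learn use smoothed variants, so their exact numbers will differ. The helper names `tokenize` and `tfidf_weights` and the three toy documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(doc):
    """Lowercase word tokens; a deliberately simple, assumed tokenizer."""
    return re.findall(r"[a-z']+", doc.lower())


def tfidf_weights(docs, doc_index):
    """Weight each word of docs[doc_index] by tf * idf, with idf = log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # df: number of documents that contain each word at least once
    df = Counter(word for tokens in tokenized for word in set(tokens))
    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())
    return {w: (c / total) * math.log(n_docs / df[w]) for w, c in tf.items()}


docs = [
    "the cloud shows the most frequent words",
    "the report lists the quarterly figures",
    "the survey collects the customer feedback",
]
weights = tfidf_weights(docs, 0)
top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
print(top[:3])
```

Words that occur in every document (here, "the") get idf = log(1) = 0, so they carry no weight. The remaining positive weights can be passed to WordCloud's `generate_from_frequencies`, which accepts a word-to-weight mapping; dropping the zero-weight words first keeps them out of the cloud entirely.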
The real challenge is then placing the words on the canvas so that they do not overlap and the most important words remain prominent.
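To make the placement problem concrete, here is a simplified greedy sketch in the spirit of the spiral approach popularized by Wordle: words are placed largest first, and each one starts at the centre of the canvas and walks outward along an Archimedean spiral until its bounding box fits inside the canvas without overlapping anything already placed. The box-size estimate (0.6 times the font size per character) and the names `place_words` and `overlaps` are assumptions; a real implementation measures the rendered text with the actual font.

```python
import math


def overlaps(a, b):
    """True if two axis-aligned boxes (x, y, w, h) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_words(sizes, width=400, height=200, step=0.1, max_steps=20000):
    """Greedy spiral placement: biggest words first, each moved outward
    along an Archimedean spiral from the centre until its box is free."""
    placed = []  # list of (word, (x, y, w, h))
    cx, cy = width / 2, height / 2
    for word, size in sorted(sizes.items(), key=lambda kv: -kv[1]):
        # Hypothetical box estimate: about 0.6 * font size per character.
        w, h = 0.6 * size * len(word), size
        for i in range(max_steps):
            t = i * step
            r = 2 * t  # radius grows linearly with the angle (Archimedean spiral)
            x = cx + r * math.cos(t) - w / 2
            y = cy + r * math.sin(t) - h / 2
            box = (x, y, w, h)
            inside = x >= 0 and y >= 0 and x + w <= width and y + h <= height
            if inside and not any(overlaps(box, other) for _, other in placed):
                placed.append((word, box))
                break
        # In this sketch, a word that never finds a free spot is simply skipped.
    return placed


layout = place_words({"cloud": 40, "word": 30, "text": 20, "data": 12})
for word, (x, y, w, h) in layout:
    print(f"{word:6s} at ({x:.0f}, {y:.0f}) size {w:.0f}x{h:.0f}")
```

The Python wordcloud library works differently: roughly speaking, it keeps an integral image of the pixels already drawn, picks a random position where the word still fits, and shrinks the font when no position is free. A bounding-box check like the one above is simpler but leaves more empty space, because it cannot tuck small words into the gaps around larger letters.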
In the right setting, word cloud visualization is a powerful tool for analyzing textual information (feedback, tweets, posts, etc.) at a single glance. Another powerful use case is building a word cloud of a website to identify potential keywords to target for SEO. Finally, word clouds can be used to understand the context of blogs, articles and other longer texts, and to discover critical textual features.