**Matplotlib** is one of the most widely used Python packages for data visualization. It provides a quick way to visualize data from Python and create publication-quality figures in a variety of formats. Matplotlib is a multi-platform library built on NumPy arrays, which allows it to work with the broader SciPy stack.

In this article, we are going to explore Matplotlib in interactive mode, covering seven basic cases. You are encouraged to **follow along** with the tutorial and play around with Matplotlib, trying various things and making sure you're getting the hang of it. Let's get started!

# Matplotlib, Pyplot and IPython Shell

## Importing Matplotlib

Just as we use the np shorthand for NumPy and the pd shorthand for Pandas, we will use some standard shorthands for Matplotlib imports:

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
```

**Pyplot**, shortened above as `plt`, is a module within the Matplotlib package that provides a convenient interface to Matplotlib's plotting classes and methods.

## Plotting from an IPython shell

If you are using the IPython interactive shell in a terminal (or something similar), each of your plots will be displayed in a new window.
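Matplotlib's *interactive mode* controls whether figures update as soon as you draw on them. Below is a minimal sketch of toggling it from Python with `plt.ion()` and `plt.ioff()`; in IPython, running the `%matplotlib` magic (or starting the shell with `ipython --matplotlib`) sets this up for you. The `matplotlib.use("Agg")` line is an assumption added only so the sketch also runs on a machine without a display; leave it out in an interactive session to get real windows.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display; omit it in IPython
import matplotlib.pyplot as plt

plt.ion()                              # turn interactive mode on
was_interactive = plt.isinteractive()  # True while interactive mode is on

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])          # with a GUI backend, the window would update without plt.show()

plt.ioff()                             # switch back to non-interactive mode
print(was_interactive, plt.isinteractive())  # True False
```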

# First Look: Line Chart

Creating plots with Matplotlib takes just a few lines of code. As a first example, we will plot sine and cosine waves with a **line chart**. Line charts are also a good choice for showing trends in general.

## Creating the data points

First things first, let's import the NumPy library and create an array of data points.

```
In [3]: import numpy as np

In [4]: x = np.linspace(-np.pi, np.pi, 256, endpoint=True)

In [5]: S, C = np.sin(x), np.cos(x)

In [14]: x.shape
Out[14]: (256,)

In [15]: S.shape
Out[15]: (256,)

In [16]: C.shape
Out[16]: (256,)

In [17]: x[:10]
Out[17]:
array([-3.14159265, -3.11695271, -3.09231277, -3.06767283, -3.04303288,
       -3.01839294, -2.993753  , -2.96911306, -2.94447311, -2.91983317])

In [18]: S[:10]
Out[18]:
array([-1.22464680e-16, -2.46374492e-02, -4.92599411e-02, -7.38525275e-02,
       -9.84002783e-02, -1.22888291e-01, -1.47301698e-01, -1.71625679e-01,
       -1.95845467e-01, -2.19946358e-01])

In [19]: C[:10]
Out[19]:
array([-1.        , -0.99969645, -0.99878599, -0.99726917, -0.99514692,
       -0.99242051, -0.98909161, -0.98516223, -0.98063477, -0.97551197])
```

Each of the above variables is a NumPy array of 256 values: `S` holds sin(x) and `C` holds cos(x). Now that we have the data, let's plot it.

## Our first plot

```python
# Start your figure
plt.figure()

# Plot the sine curve with a solid ('-') line
plt.plot(x, S, '-')

# Plot the cosine curve with a dashed ('--') line
plt.plot(x, C, '--')

# Display the plot on screen
plt.show()
```

All our plots will begin by first initiating a figure (plt.figure()), and end with displaying the plot (plt.show()). In between, we'll call the functions which decide what gets plotted. In this case, we used the plt.plot function to plot lines.
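Matplotlib also offers an object-oriented interface, where you call methods on explicit Figure and Axes objects instead of relying on pyplot's "current" figure. The sketch below reproduces the same two curves that way with `plt.subplots()`; both styles draw the same result, and the explicit style tends to scale better once a figure has several subplots. The `matplotlib.use("Agg")` line is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-np.pi, np.pi, 256, endpoint=True)
S, C = np.sin(x), np.cos(x)

# Object-oriented style: keep explicit references to the Figure and Axes
fig, ax = plt.subplots()
ax.plot(x, S, '-')   # sine, solid line
ax.plot(x, C, '--')  # cosine, dashed line
# Calling plt.show() here would display it, just like the pyplot version
```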

## A more detailed look at plotting

Let's move on to setting the plot's properties explicitly, so that we can customize its appearance to suit our needs. Any setting we don't specify keeps its default value.

```python
# Create a new figure of size 10x6 inches, using 80 dots per inch
fig = plt.figure(figsize=(10, 6), dpi=80)

# Plot cosine as a blue dashed line, 2.5 points wide
plt.plot(x, C, color="blue", linewidth=2.5, linestyle="--", label="cosine")

# Plot sine as a green solid line, 2.5 points wide
plt.plot(x, S, color="green", linewidth=2.5, linestyle="-", label="sine")

# Set axis limits and ticks (markers on the axes)
# x goes from -4.0 to 4.0, with 9 equally spaced ticks
plt.xlim(-4.0, 4.0)
plt.xticks(np.linspace(-4, 4, 9, endpoint=True))

# y goes from -1.0 to 1.0, with 5 equally spaced ticks
plt.ylim(-1.0, 1.0)
plt.yticks(np.linspace(-1, 1, 5, endpoint=True))

# Add a legend, title and axis labels
plt.legend(loc='upper left', frameon=False)
plt.title("Graph of wave movement with Sine and Cosine functions")
plt.xlabel("Time, t")
plt.ylabel("Position, x")

# Turn on a faint grid
plt.grid(color='b', linestyle='-', linewidth=0.1)

# Move the spines to the middle of the plot
ax = plt.gca()

# Move the left y-axis and bottom x-axis to the center;
# with these symmetric limits they cross at (0, 0)
ax.spines['left'].set_position('center')
ax.spines['bottom'].set_position('center')

# Hide the top and right spines
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')

# Show ticks on the left and bottom axes only
ax.xaxis.set_ticks_position('bottom')
ax.yaxis.set_ticks_position('left')

plt.show()
```

Well, there you have it!

## Saving a plot

If you would like to save the figure to a file instead of (or as well as) displaying it, use the figure's `savefig()` method:

```
In [5]: fig.savefig('my_figure.png')
```

There are multiple formats we can save this image in, and the file extension selects which one is used. The available formats depend on your backend and installed packages; you can list them with:

```
In [6]: fig.canvas.get_supported_filetypes()
Out[6]:
{'eps': 'Encapsulated Postscript',
 'jpeg': 'Joint Photographic Experts Group',
 'jpg': 'Joint Photographic Experts Group',
 'pdf': 'Portable Document Format',
 'pgf': 'PGF code for LaTeX',
 'png': 'Portable Network Graphics',
 'ps': 'Postscript',
 'raw': 'Raw RGBA bitmap',
 'rgba': 'Raw RGBA bitmap',
 'svg': 'Scalable Vector Graphics',
 'svgz': 'Scalable Vector Graphics',
 'tif': 'Tagged Image File Format',
 'tiff': 'Tagged Image File Format'}
```
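Below is a short sketch that saves one figure in both a raster format (PNG) and a vector format (PDF), using the `dpi` and `bbox_inches` parameters of `savefig()`. The file names and the temporary output directory are placeholders for illustration, and `matplotlib.use("Agg")` is an assumption so the sketch runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-np.pi, np.pi, 256)
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sin(x), label="sine")
ax.legend()

out_dir = tempfile.mkdtemp()  # placeholder location for the example files
png_path = os.path.join(out_dir, "my_figure.png")
pdf_path = os.path.join(out_dir, "my_figure.pdf")

# The extension picks the format; dpi mainly matters for raster output such as PNG
fig.savefig(png_path, dpi=150, bbox_inches="tight")  # "tight" trims the surrounding whitespace
fig.savefig(pdf_path)                                 # vector output
print(sorted(os.listdir(out_dir)))  # ['my_figure.pdf', 'my_figure.png']
```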

# Types of plots

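For reference, here are the common kinds of plots (several of them are covered below). These names are the `kind` values accepted by pandas' `DataFrame.plot()`, which draws with Matplotlib; in pyplot itself you call the corresponding function, such as `plt.bar()`, `plt.hist()` or `plt.pie()`:

- 'bar' or 'barh' for bar charts
- 'hist' for histograms
- 'box' for boxplots
- 'kde' or 'density' for density plots
- 'area' for area plots
- 'scatter' for scatter plots
- 'hexbin' for hexagonal bin plots
- 'pie' for pie charts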
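In pyplot, most of these kinds map to an Axes method: `bar`/`barh`, `hist`, `boxplot`, `stackplot` (area), `scatter`, `hexbin` and `pie`. Matplotlib has no built-in KDE plot, so the sketch below leaves it out. It draws the other kinds side by side on made-up random data; `matplotlib.use("Agg")` is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # illustrative random data
data = rng.normal(size=500)
x, y = rng.normal(size=(2, 500))

fig, axes = plt.subplots(2, 4, figsize=(14, 6))
axes = axes.ravel()  # flatten the 2x4 grid into a list of 8 Axes

axes[0].bar(["A", "B", "C"], [3, 5, 2])
axes[1].barh(["A", "B", "C"], [3, 5, 2])
axes[2].hist(data, bins=30)
axes[3].boxplot([data, data + 1])
axes[4].stackplot([1, 2, 3], [1, 2, 3], [2, 1, 2])
axes[5].scatter(x, y, s=5)
axes[6].hexbin(x, y, gridsize=15)
axes[7].pie([35, 30, 20, 10, 5])

kinds = ["bar", "barh", "hist", "box", "area", "scatter", "hexbin", "pie"]
for ax, kind in zip(axes, kinds):
    ax.set_title(kind)  # label each panel with the plot kind it shows

fig.tight_layout()
```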

# Bar Chart

A bar chart is a good choice when you want to show how some quantity varies across a discrete set of items. Let's start with a horizontal bar chart of a small dataset.

```python
# Set figure size to 7x5 inches
fig = plt.figure(figsize=(7, 5))

# Data set: group means and standard deviations
men_means = [20, 35, 30, 35, 27]
men_stds = [2, 3, 4, 1, 2]

# One index position per group
ind = np.arange(5)

# Bar thickness
width = 0.35

# Plot a horizontal bar graph of men_means against the index,
# with error bars equal to the standard deviation.
# align='edge' makes each bar start at `bottom`, so it ends up centered on its tick
bottom = ind - width / 2
error_args = {'ecolor': (0, 0, 0), 'linewidth': 2.0}
plt.barh(bottom, men_means, width, xerr=men_stds, error_kw=error_args, align='edge')

# Y-axis ticks and labels
ax = plt.gca()
ax.set_ylim(-0.5, 4.5)
ax.set_yticks(ind)
ax.set_yticklabels(['A', 'B', 'C', 'D', 'E'])

plt.show()
```

In the plot, we calculate `bottom` separately: it is the y-axis position where each bar starts (the bottom edge of each bar), shifted down by half the bar width so the bar sits centered on its tick. `error_args` makes the error bars black and 2 points wide.

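Next, let's stack the women's data on top of the men's in a vertical bar chart, reusing `ind`, `width` and `error_args` from above:

```python
# Set figure size to 7x5 inches
fig = plt.figure(figsize=(7, 5))

# Data set for the second group
women_means = [25, 32, 34, 20, 25]
women_stds = [3, 5, 2, 3, 3]

# x position where each bar starts (align='edge'), so bars are centered on the ticks
left = ind - width / 2

# Plot a vertical stacked bar graph: men's data at the bottom, women's data on top.
# The bottom= keyword sets the height at which each women's bar begins.
p1 = plt.bar(left, men_means, width, yerr=men_stds, color='b', error_kw=error_args, align='edge')
p2 = plt.bar(left, women_means, width, bottom=men_means, yerr=women_stds, color='g', error_kw=error_args, align='edge')

# Modify the x-axis
ax = plt.gca()
ax.set_xlim(-0.5, 4.5)
ax.set_xticks(ind)
ax.set_xticklabels(['A', 'B', 'C', 'D', 'E'])

# Label the two stacked series
plt.legend((p1[0], p2[0]), ('Men', 'Women'))

plt.show()
```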
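Stacking emphasizes each group's total. To compare men and women directly, you can instead place the two series side by side by shifting each one half a bar width away from the tick. A minimal sketch with the same data; the ±width/2 offsets are one common choice rather than the only one, and `matplotlib.use("Agg")` is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

men_means = [20, 35, 30, 35, 27]
men_stds = [2, 3, 4, 1, 2]
women_means = [25, 32, 34, 20, 25]
women_stds = [3, 5, 2, 3, 3]

ind = np.arange(5)  # one tick per group
width = 0.35        # width of each bar

fig, ax = plt.subplots(figsize=(7, 5))
# Shift the two series left and right of each tick so they sit side by side
men_bars = ax.bar(ind - width / 2, men_means, width, yerr=men_stds, color='b', label='Men')
women_bars = ax.bar(ind + width / 2, women_means, width, yerr=women_stds, color='g', label='Women')

ax.set_xticks(ind)
ax.set_xticklabels(['A', 'B', 'C', 'D', 'E'])
ax.legend()
```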

# Histogram

A histogram shows how often values fall within each range (bin) of a continuous or discrete variable. Let's have a look.

```python
# Generate 3 different arrays of normally distributed samples
x = np.random.normal(0, 0.8, 1000)
y = np.random.normal(-2, 1, 1000)
z = np.random.normal(3, 2, 1000)

# Set figure size to 9x6 inches
fig = plt.figure(figsize=(9, 6))

# Configure keyword arguments to customize the histogram:
# alpha adjusts translucency, bins sets the number of bins,
# and density=True normalizes each histogram so its area is 1.
# (Older tutorials use normed=True, which has been removed from recent Matplotlib.)
# More features are available in the documentation.
kwargs = {
    'histtype': 'stepfilled',
    'alpha': 0.9,
    'density': True,
    'bins': 40,
}

# Plot all 3 arrays on one graph
plt.hist([x, y, z], **kwargs)

plt.show()
```

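We can also stack several samples in one histogram. Here each column of `X` is a separate sample:

```python
# Generate a 1000x3 array: 3 columns of 1000 samples each
X = 200 + 25 * np.random.randn(1000, 3)

# Set figure size to 9x6 inches
fig = plt.figure(figsize=(9, 6))

# Plot a stacked histogram of the 3 columns with 30 bins,
# normalized (density=True) so that the total area is 1
n, bins, patches = plt.hist(X, 30, alpha=0.9, stacked=True, density=True, linewidth=0.0, rwidth=1.0)

plt.show()
```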
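With `density=True`, bar heights are scaled so that the bar areas (height × bin width) add up to 1, which makes histograms of differently sized samples comparable. A quick check of that property using the counts and bin edges that `plt.hist` returns; the sample is illustrative, and `matplotlib.use("Agg")` is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)   # illustrative sample
sample = rng.normal(0, 0.8, 1000)

fig = plt.figure()
n, bins, patches = plt.hist(sample, bins=40, density=True)

# Sum of height × width over all bins
area = np.sum(n * np.diff(bins))
print(round(area, 6))  # 1.0
```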

# Scatter Plot

A scatter plot displays every point in a dataset, which makes it a good choice for visually looking for clusters or correlations between two variables.

```python
N = 100

# Generate 2 different arrays
x = np.random.rand(N)
y = np.random.rand(N)

fig = plt.figure(figsize=(9, 6))

# Plotting a scatter graph at the given x-y coordinates
plt.scatter(x, y)

plt.show()
```

To turn this into a bubble chart, we can also map a color and a size to each point:

```python
N = 100

# Generate 2 different arrays
x = np.random.rand(N)
y = np.random.rand(N)

fig = plt.figure(figsize=(9, 6))

# Assign random colors and variable sizes to the bubbles
colors = np.random.rand(N)
area = np.pi * (20 * np.random.rand(N))**2  # marker area in points^2, for radii of 0 to 20 points

# Scatter plot on x-y coordinates with the assigned sizes and colors
plt.scatter(x, y, s=area, c=colors, alpha=0.7)

plt.show()
```
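Scatter plots are most revealing when the two variables are related. Below is a sketch with synthetic data where `y` depends linearly on `x` plus noise (the slope of 2 and the noise level are arbitrary choices for illustration); `np.corrcoef` puts a number on the trend the plot makes visible. `matplotlib.use("Agg")` is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
N = 100
x = rng.random(N)
y = 2 * x + rng.normal(0, 0.1, N)   # linear trend plus small noise

r = np.corrcoef(x, y)[0, 1]         # Pearson correlation coefficient

fig = plt.figure(figsize=(9, 6))
plt.scatter(x, y, alpha=0.7)
plt.title(f"Pearson r = {r:.2f}")
```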

# Box and Whisker Plot

A box plot is an easy and effective way to read descriptive statistics at a glance. It summarizes the distribution of the data by displaying the minimum, first quartile, median, third quartile and maximum in a single graph.

```python
np.random.seed(10)

# Generate 4 different arrays and combine them in a list
u = np.random.normal(100, 10, 200)
v = np.random.normal(80, 30, 200)
w = np.random.normal(90, 20, 200)
x = np.random.normal(70, 25, 200)
data_to_plot = [u, v, w, x]

fig = plt.figure(figsize=(9, 6))

# Draw a box plot showing the median, quartiles and outliers of each array.
# patch_artist=True draws the boxes as filled patches so they can take a fill color.
bp = plt.boxplot(data_to_plot, patch_artist=True)
plt.xticks([1, 2, 3, 4], ['A', 'B', 'C', 'D'])  # boxes sit at x = 1..4 by default

# Change outline color, fill color and line width of the boxes
for box in bp['boxes']:
    box.set(color='#7570b3', linewidth=2)  # outline color
    box.set(facecolor='#1b9e77')           # fill color

# Change color and line width of the whiskers
for whisker in bp['whiskers']:
    whisker.set(color='#7570b3', linewidth=2)

# Change color and line width of the caps
for cap in bp['caps']:
    cap.set(color='#7570b3', linewidth=2)

# Change color and line width of the medians
for median in bp['medians']:
    median.set(color='#b2df8a', linewidth=2)

# Change the style and fill of the fliers (outliers);
# fliers are drawn as markers, so set the marker colors
for flier in bp['fliers']:
    flier.set(marker='o', markerfacecolor='#e7298a', markeredgecolor='#e7298a', alpha=0.5)

plt.show()
```

If you haven't seen a box plot before, here's how to read the plot above. The bottom and top of each box mark the first and third quartiles (the 25th and 75th percentiles). The line inside the box marks the median. The whiskers extend to the smallest and largest values that are not outliers; by default, Matplotlib treats points lying more than 1.5 times the interquartile range (IQR) beyond the box as outliers. Any dots beyond the whiskers are those outlier data points.
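To see where the box, whisker and outlier positions come from, here is a sketch that computes them by hand with NumPy for the first array, assuming Matplotlib's default whisker rule (`whis=1.5`: whiskers reach the most extreme points within 1.5 × IQR of the box). It uses the same seed, so `u` matches box A above; `matplotlib.use("Agg")` is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(10)
u = np.random.normal(100, 10, 200)   # same data as box 'A' above

# Quartiles, median and interquartile range
q1, med, q3 = np.percentile(u, [25, 50, 75])
iqr = q3 - q1

# Whiskers: most extreme points still within 1.5 * IQR of the box
whisker_low = u[u >= q1 - 1.5 * iqr].min()
whisker_high = u[u <= q3 + 1.5 * iqr].max()

# Anything beyond the whiskers is drawn as an outlier ("flier")
outliers = u[(u < whisker_low) | (u > whisker_high)]

fig = plt.figure()
bp = plt.boxplot([u])
print(q1, med, q3, whisker_low, whisker_high, len(outliers))
```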

# Area Plot

Area charts represent cumulative totals, as numbers or percentages, over time. Because these plots are stacked by default, each series should contain either all positive or all negative values.

```python
x = range(1, 6)

# Set values for each line (4 lines in this example)
y = [
    [1, 4, 6, 8, 9],
    [2, 2, 7, 10, 12],
    [2, 8, 5, 10, 6],
    [1, 5, 2, 5, 2],
]

# Setting figure size to 9x6 with dpi of 80
fig = plt.figure(figsize=(9, 6), dpi=80)

# Stacked area plot
plt.stackplot(x, y, labels=['A', 'B', 'C', 'D'], alpha=0.8)

# Set location of legend
plt.legend(loc='upper left')

plt.show()
```
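A stacked area plot fills the band between one series' running total and the next. Below is a rough sketch of that idea with `np.cumsum` and `fill_between`, using the same data as above. It mimics the look of `stackplot` with its default baseline and is not Matplotlib's actual implementation; `matplotlib.use("Agg")` is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(1, 6)
y = np.array([
    [1, 4, 6, 8, 9],
    [2, 2, 7, 10, 12],
    [2, 8, 5, 10, 6],
    [1, 5, 2, 5, 2],
])

# Running totals: row i is the top edge of layer i
tops = np.cumsum(y, axis=0)
# Each layer starts where the previous one ends (the first starts at 0)
bottoms = np.vstack([np.zeros_like(x), tops[:-1]])

fig, ax = plt.subplots(figsize=(9, 6), dpi=80)
for bottom, top, label in zip(bottoms, tops, ['A', 'B', 'C', 'D']):
    ax.fill_between(x, bottom, top, label=label, alpha=0.8)  # fill the band for one series
ax.legend(loc='upper left')
print(tops[-1])  # [ 6 19 20 33 29]
```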

# Pie Chart

A pie chart shows each category's proportion of the whole. With the `autopct` option, the percentage for each category is printed on its slice. In Matplotlib, slices are plotted counter-clockwise by default, in the order given, as shown:

```python
# Set keyword arguments
labels = 'Kenya', 'Tanzania', 'Uganda', 'Rwanda', 'Burundi'
sizes = [35, 30, 20, 10, 5]
explode = (0, 0.1, 0, 0, 0)  # only "explode" the 2nd slice (i.e. 'Tanzania')

# Plot pie chart with the above set arguments
fig = plt.figure(figsize=(9, 6))
plt.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
plt.axis('equal')  # Equal aspect ratio ensures that the pie is drawn as a circle.

plt.show()
```

Above, `autopct='%1.1f%%'` displays each percentage with one decimal place (`%%` produces a literal percent sign), and `startangle=90` makes the first slice (Kenya) start at 90 degrees, measured counter-clockwise from the positive x-axis.
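`autopct` can be either a format string, applied to each slice's percentage with the `%` operator, or a function that receives the percentage and returns the label text. Below is a sketch comparing both with the same data. The `with_count` helper is a hypothetical example that converts the percentage back into the original value, and `matplotlib.use("Agg")` is an assumption so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt

labels = 'Kenya', 'Tanzania', 'Uganda', 'Rwanda', 'Burundi'
sizes = [35, 30, 20, 10, 5]

# A format string is applied to the percentage with the % operator
print('%1.1f%%' % 35.0)  # 35.0%

def with_count(pct):
    """Hypothetical autopct callback: show the percentage and the original value."""
    total = sum(sizes)
    return f"{pct:.1f}%\n({pct * total / 100:.0f})"

fig, ax = plt.subplots()
patches, texts, autotexts = ax.pie(sizes, labels=labels, autopct=with_count, startangle=90)
ax.axis('equal')  # keep the pie circular
```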

# For Further Exploration

- For more on Matplotlib: the pyplot section of the Matplotlib documentation
- **Seaborn** is built on top of matplotlib and allows you to easily produce prettier (and more complex) visualizations.
- **D3.js** is a JavaScript library for producing sophisticated interactive visualizations for the web. Although it is not in Python, it is both trendy and widely used.
- **Bokeh** is a newer library that brings D3-style visualizations into Python.
- **ggplot** is a Python port of the popular R library ggplot2, which is widely used for creating "publication quality" charts and graphics. It's probably most interesting if you're already an avid ggplot2 user, and possibly a little opaque if you're not.

Before wrapping up, I'll leave you to ponder this quote from Antoine de Saint-Exupéry: "*Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away*".