**Matplotlib** is the most popular Python package for data visualization. It provides a quick way to visualize data from Python and create publication-quality figures in a variety of formats. Matplotlib is a multi-platform data visualization library built on NumPy arrays, which allows it to work with the broader SciPy stack.

In this article, we are going to explore Matplotlib in interactive mode, covering 7 basic cases. You are encouraged to **follow along** with the tutorial and play around with Matplotlib, trying various things and making sure you're getting the hang of it. Let's get started!

# Matplotlib, Pyplot and IPython Shell

## Importing Matplotlib

Just as we use the np shorthand for NumPy and the pd shorthand for Pandas, we will use some standard shorthands for Matplotlib imports:

    import matplotlib as mpl
    import matplotlib.pyplot as plt

**Pyplot**, shortened above as plt, is a module within the matplotlib package that provides a convenient interface to Matplotlib's plotting classes and methods.

## Plotting from an IPython shell

If you are using the IPython interactive shell in a terminal (or something similar), each of your plots will be displayed in a new window.
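As a small sketch of how this looks in practice: in IPython the `%matplotlib` magic enables interactive plotting, and in a plain Python session `plt.ion()` plays a similar role. The snippet below only toggles and inspects the interactive flag. Whether a window actually appears depends on the backend available on your machine, which is an assumption here.

```python
import matplotlib.pyplot as plt

# Turn interactive mode on: figures are redrawn after each plotting command
plt.ion()
interactive_on = plt.isinteractive()

# Turn it off again: figures are only displayed when plt.show() is called
plt.ioff()
interactive_off = plt.isinteractive()

print(interactive_on, interactive_off)
```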

# First Look: Line Chart

Creating plots with Matplotlib can be easily accomplished with just a few lines of code. As our first choice, we will create sine and cosine waves with a **line chart**. Line charts in general are also a good choice for showing trends.

## Creating the data points

First things first, let’s import the NumPy library and create an array of data points.

    In [3]: import numpy as np
    In [4]: x = np.linspace(-np.pi, np.pi, 256, endpoint=True)
    In [5]: S, C = np.sin(x), np.cos(x)
    In [14]: x.shape
    Out[14]: (256,)
    In [15]: S.shape
    Out[15]: (256,)
    In [16]: C.shape
    Out[16]: (256,)
    In [17]: x[:10]
    Out[17]:
    array([-3.14159265, -3.11695271, -3.09231277, -3.06767283, -3.04303288,
           -3.01839294, -2.993753  , -2.96911306, -2.94447311, -2.91983317])
    In [18]: S[:10]
    Out[18]:
    array([-1.22464680e-16, -2.46374492e-02, -4.92599411e-02, -7.38525275e-02,
           -9.84002783e-02, -1.22888291e-01, -1.47301698e-01, -1.71625679e-01,
           -1.95845467e-01, -2.19946358e-01])
    In [19]: C[:10]
    Out[19]:
    array([-1.        , -0.99969645, -0.99878599, -0.99726917, -0.99514692,
           -0.99242051, -0.98909161, -0.98516223, -0.98063477, -0.97551197])

Each of the above variables is a vector of size 256. S holds sin(x), and C holds cos(x). Now that we have the data, let's plot it.

## Our first plot

    # Start your figure
    plt.figure()
    # Plot sine curve with a solid - line
    plt.plot(x, S, '-')
    # Plot cosine curve with a dashed -- line
    plt.plot(x, C, '--')
    # Display plot and show result on screen.
    plt.show()

All our plots will begin by first initiating a figure (plt.figure()), and end with displaying the plot (plt.show()). In between, we'll call the functions which decide what gets plotted. In this case, we used the plt.plot function to plot lines.
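The same figure, plot, show sequence can also be written against explicit Figure and Axes objects, which some readers find easier to follow once plots get more complex. This is only an alternative sketch, not the approach used in the rest of this article. The names `fig` and `ax` are conventions rather than requirements.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-np.pi, np.pi, 256)

# One Figure containing a single Axes
fig, ax = plt.subplots()

# Each call to ax.plot adds one line to the Axes
ax.plot(x, np.sin(x), '-')    # solid line
ax.plot(x, np.cos(x), '--')   # dashed line

n_lines = len(ax.lines)
print(n_lines)
```

Calling `plt.show()` afterwards displays the figure, exactly as in the pyplot version.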

## A more detailed look at plotting

Let’s move on to setting the built-in options explicitly so that we can customize the appearance of our plot to suit our needs. Each setting falls back to a default value unless specified.

    ## Create a new figure of size 10x6 inches, using 80 dots per inch
    fig = plt.figure(figsize=(10, 6), dpi=80)
    ## Plot cosine using blue color with a dashed line of width 2.5 (points)
    plt.plot(x, C, color="blue", linewidth=2.5, linestyle="--", label="cosine")
    ## Plot sine using green color with a continuous line of width 2.5 (points)
    plt.plot(x, S, color="green", linewidth=2.5, linestyle="-", label="sine")
    ## Set axis limits and ticks (markers on axis)
    # x goes from -4.0 to 4.0
    plt.xlim(-4.0, 4.0)
    # 9 ticks, equally spaced
    plt.xticks(np.linspace(-4, 4, 9, endpoint=True))
    # Set y limits from -1.0 to 1.0
    plt.ylim(-1.0, 1.0)
    # 5 ticks, equally spaced
    plt.yticks(np.linspace(-1, 1, 5, endpoint=True))
    ## Add legend, title and axis names
    plt.legend(loc='upper left', frameon=False)
    plt.title("Graph of wave movement with Sine and Cosine functions")
    plt.xlabel("Time, t")
    plt.ylabel("Position, x")
    ## Turn on grid
    plt.grid(color='b', linestyle='-', linewidth=0.1)
    ## Move the spines to the middle of the plot
    ax = plt.gca()
    # Move left y-axis and bottom x-axis to the centre, passing through (0,0)
    ax.spines['left'].set_position('center')
    ax.spines['bottom'].set_position('center')
    # Eliminate upper and right axes
    ax.spines['right'].set_color('none')
    ax.spines['top'].set_color('none')
    # Show ticks in the left and lower axes only
    ax.xaxis.set_ticks_position('bottom')
    ax.yaxis.set_ticks_position('left')
    plt.show()

Well, there you have it!
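The default values mentioned above live in `plt.rcParams`. As a rough sketch, the snippet below prints a few of them and checks that a line drawn without an explicit `linewidth` picks up the default, while an explicit value overrides it. The printed values depend on your Matplotlib version and configuration, so none are assumed here.

```python
import matplotlib.pyplot as plt

# A few of the defaults that apply when no explicit value is passed
for key in ("figure.figsize", "figure.dpi", "lines.linewidth", "lines.linestyle"):
    print(key, "=", plt.rcParams[key])

fig, ax = plt.subplots()

# No linewidth given: the value comes from rcParams
(default_line,) = ax.plot([0, 1], [0, 1])
default_width = default_line.get_linewidth()

# Explicit linewidth: overrides the default for this line only
(thick_line,) = ax.plot([0, 1], [1, 0], linewidth=2.5)
thick_width = thick_line.get_linewidth()
```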

## Saving a plot

If you would like to save the figure instead of seeing its output, you can use the savefig() command.

    In [5]: fig.savefig('my_figure.png')

There are multiple formats we can save this image in.

    In [6]: fig.canvas.get_supported_filetypes()
    Out[6]:
    {'eps': 'Encapsulated Postscript',
     'jpeg': 'Joint Photographic Experts Group',
     'jpg': 'Joint Photographic Experts Group',
     'pdf': 'Portable Document Format',
     'pgf': 'PGF code for LaTeX',
     'png': 'Portable Network Graphics',
     'ps': 'Postscript',
     'raw': 'Raw RGBA bitmap',
     'rgba': 'Raw RGBA bitmap',
     'svg': 'Scalable Vector Graphics',
     'svgz': 'Scalable Vector Graphics',
     'tif': 'Tagged Image File Format',
     'tiff': 'Tagged Image File Format'}
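As a minimal sketch of saving in more than one format, the snippet below writes the same figure as PNG and PDF. Matplotlib infers the format from the file extension. The temporary output directory and the `dpi` and `bbox_inches` values are illustrative assumptions, not settings taken from this article.

```python
import os
import tempfile
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1, 2], [0, 1, 4])

# Hypothetical output location: a fresh temporary directory
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "my_figure.png")
pdf_path = os.path.join(out_dir, "my_figure.pdf")

# dpi sets the raster resolution; bbox_inches="tight" trims surrounding whitespace
fig.savefig(png_path, dpi=150, bbox_inches="tight")
# Vector formats such as PDF need no dpi for the line work
fig.savefig(pdf_path)

saved_sizes = {path: os.path.getsize(path) for path in (png_path, pdf_path)}
print(saved_sizes)
```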

# Types of plots

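For your reference, here are some common kinds of plots (more on this below). The names are the kind strings accepted by the pandas plotting interface, which draws with Matplotlib; the sections that follow call the corresponding pyplot functions directly: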

- ‘bar’ or ‘barh’ for bar charts
- ‘hist’ for histograms
- ‘box’ for boxplots
- ‘kde’ or ‘density’ for density plots
- ‘area’ for area plots
- ‘scatter’ for scatter plots
- ‘hexbin’ for hexagonal bin plots
- ‘pie’ for pie charts
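As a rough guide, the kinds in the list above correspond to the pyplot functions below. The mapping is this sketch's own summary rather than an official table. Density plots are left out because pyplot has no single function for them: a kernel density estimate has to be computed first and then drawn as a line.

```python
import matplotlib.pyplot as plt

# Plot kind -> name of the pyplot function that draws it
kind_to_pyplot = {
    "bar": "bar",
    "barh": "barh",
    "hist": "hist",
    "box": "boxplot",
    "area": "stackplot",
    "scatter": "scatter",
    "hexbin": "hexbin",
    "pie": "pie",
}

# Check that every function named above really exists in pyplot
available = {kind: hasattr(plt, name) for kind, name in kind_to_pyplot.items()}
print(available)
```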

# Bar Chart

A bar chart is a good choice when you want to show how some quantity varies among a discrete set of items. Let’s create a bar chart from a small data set.

    # Set the figure size to 7x5 inches
    fig = plt.figure(figsize=(7, 5))
    # Data set
    men_means = [20, 35, 30, 35, 27]
    men_stds = [2, 3, 4, 1, 2]
    # Index of the bars
    ind = np.arange(5)
    # Thickness of each bar
    width = 0.35
    # Plot a horizontal bar graph of men_means against the index,
    # with error bars equal to the standard deviation
    bottom = ind - width / 2
    error_args = {'ecolor': (0, 0, 0), 'linewidth': 2.0}
    # align='edge' makes `bottom` the lower edge of each bar,
    # so the bars end up centered on the tick positions in `ind`
    plt.barh(bottom, men_means, width, align='edge', xerr=men_stds, error_kw=error_args)
    # Y-axis ticks and labels
    ax = plt.gca()
    ax.set_ylim(-0.5, 4.5)
    ax.set_yticks(ind)
    ax.set_yticklabels(['A', 'B', 'C', 'D', 'E'])
    plt.show()

In the plot, we calculate bottom separately: it is the y-axis position where each bar starts (the lower edge of the bar). error_args specifies that the error bars are black and that their line width is 2 points.
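To see what the `bottom` calculation does, the sketch below draws the same bars and reads their geometry back from the returned rectangles. It passes `align='edge'` explicitly so that `bottom` is treated as the lower edge of each bar; that argument is a choice made for this sketch, since current Matplotlib releases center bars on the given position by default.

```python
import numpy as np
import matplotlib.pyplot as plt

men_means = [20, 35, 30, 35, 27]
ind = np.arange(5)
width = 0.35
bottom = ind - width / 2

fig, ax = plt.subplots()
# Third positional argument of barh is the bar height (thickness)
bars = ax.barh(bottom, men_means, width, align='edge')

# Each bar is a Rectangle: read its lower edge and thickness back
lower_edges = np.array([bar.get_y() for bar in bars])
heights = np.array([bar.get_height() for bar in bars])
centers = lower_edges + heights / 2

print(lower_edges)
print(centers)
```

Because each bar starts half a bar-width below its index, the bar centers land exactly on the tick positions in `ind`.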

    # Set the figure size to 7x5 inches
    fig = plt.figure(figsize=(7, 5))
    # Data set values
    women_means = [25, 32, 34, 20, 25]
    women_stds = [3, 5, 2, 3, 3]
    # Plot a vertical stacked bar graph with men's data at the bottom and women's data on top.
    # Here `bottom` (computed earlier) is reused as the left edge of each bar.
    p1 = plt.bar(bottom, men_means, width, align='edge', yerr=men_stds, color='b', error_kw=error_args)
    p2 = plt.bar(bottom, women_means, width, align='edge', bottom=men_means, yerr=women_stds, color='g', error_kw=error_args)
    # Modify the x-axis
    ax = plt.gca()
    ax.set_xlim(-0.5, 4.5)
    ax.set_xticks(ind)
    ax.set_xticklabels(['A', 'B', 'C', 'D', 'E'])
    plt.show()
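In the stacked chart above, the stacking comes from the `bottom=men_means` argument: each green bar starts where the blue bar below it ends. A minimal sketch that checks this by reading the top of each upper bar (error bars and colors are left out here for brevity):

```python
import numpy as np
import matplotlib.pyplot as plt

men_means = [20, 35, 30, 35, 27]
women_means = [25, 32, 34, 20, 25]
ind = np.arange(5)
width = 0.35

fig, ax = plt.subplots()
lower = ax.bar(ind, men_means, width)
# bottom= shifts the baseline of the second series to the top of the first
upper = ax.bar(ind, women_means, width, bottom=men_means)

# Top of each stacked bar = baseline + bar height
tops = [bar.get_y() + bar.get_height() for bar in upper]
print(tops)
```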

# Histogram

A histogram is a plot type that shows how frequently values occur across a continuous or discrete variable. Let's have a look.

    # Generate 3 different arrays
    x = np.random.normal(0, 0.8, 1000)
    y = np.random.normal(-2, 1, 1000)
    z = np.random.normal(3, 2, 1000)
    # Set figure size to 9x6
    fig = plt.figure(figsize=(9, 6))
    # Configure keyword arguments to customize the histogram.
    # alpha adjusts translucency, bins sets the number of bins, and
    # density normalizes each histogram to unit area
    # (density replaces the `normed` argument of older Matplotlib releases).
    # More options are available in the documentation.
    kwargs = {
        'histtype': 'stepfilled',
        'alpha': 0.9,
        'density': True,
        'bins': 40,
    }
    # Plot all 3 arrays on one graph
    plt.hist([x, y, z], **kwargs)
    plt.show()

    # Generate a 2-D NumPy array with 3 columns (one data set per column)
    X = 200 + 25 * np.random.randn(1000, 3)
    # Set figure size to 9x6
    fig = plt.figure(figsize=(9, 6))
    # Plot a stacked, normalized histogram of the 3 columns
    # (density replaces the `normed` argument of older Matplotlib releases)
    n, bins, patches = plt.hist(X, 30, alpha=0.9, stacked=True, density=True, linewidth=0.0, rwidth=1.0)
    plt.show()
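The normalization option used above (`normed` in older Matplotlib releases, `density` in current ones) rescales the bar heights so that the total area of the histogram is 1. The sketch below checks this for a single sample; the seed and the use of `np.random.default_rng` are choices made here for reproducibility.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
sample = rng.normal(0, 0.8, 1000)

fig, ax = plt.subplots()
# hist returns the bar heights, the bin edges, and the drawn patches
heights, edges, _ = ax.hist(sample, bins=40, density=True)

# Area under the histogram = sum of (height * bin width)
area = float(np.sum(heights * np.diff(edges)))
print(area)
```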

# Scatter Plot

A scatter plot is the right choice for visualizing an entire dataset and visually looking for clusters or correlations.

    N = 100
    # Generate 2 different arrays
    x = np.random.rand(N)
    y = np.random.rand(N)
    fig = plt.figure(figsize=(9, 6))
    # Plot a scatter graph at the given x-y coordinates
    plt.scatter(x, y)
    plt.show()

    N = 100
    # Generate 2 different arrays
    x = np.random.rand(N)
    y = np.random.rand(N)
    fig = plt.figure(figsize=(9, 6))
    # Assign random colors and variable sizes to the bubbles
    colors = np.random.rand(N)
    area = np.pi * (20 * np.random.rand(N))**2  # 0 to 20 point radii
    # Scatter plot on x-y coordinates with the assigned size and color
    plt.scatter(x, y, s=area, c=colors, alpha=0.7)
    plt.show()
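The two examples above use independent random data, so they show no pattern. To illustrate what a correlation looks like, here is a sketch with synthetic data in which `y` depends linearly on `x` plus some noise. The slope, noise level, and seed are arbitrary assumptions, and `np.corrcoef` is used only to put a number on what the plot shows.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = rng.random(100)
# y follows x linearly, with a little Gaussian noise added
y = 2 * x + rng.normal(0, 0.1, 100)

# Pearson correlation coefficient between x and y
r = np.corrcoef(x, y)[0, 1]

fig, ax = plt.subplots(figsize=(9, 6))
points = ax.scatter(x, y, alpha=0.7)
ax.set_title(f"Correlated data, r = {r:.2f}")
```

On screen, the points fall along a rising diagonal band instead of filling the plot evenly.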

# Box and Whisker Plot

A box plot is an easy and effective way to read descriptive statistics. It summarizes the distribution of the data by displaying the minimum, first quartile, median, third quartile, and maximum in a single graph.

    np.random.seed(10)
    # Generate 4 different arrays and combine them in a list
    u = np.random.normal(100, 10, 200)
    v = np.random.normal(80, 30, 200)
    w = np.random.normal(90, 20, 200)
    x = np.random.normal(70, 25, 200)
    data_to_plot = [u, v, w, x]
    fig = plt.figure(figsize=(9, 6))
    ## Plot a box plot that shows the median, quartiles and range of each array.
    # Add patch_artist=True option to plt.boxplot() to get fill color
    bp = plt.boxplot(data_to_plot, patch_artist=True)
    # Label the boxes (they are drawn at x = 1, 2, 3, 4)
    plt.xticks([1, 2, 3, 4], ['A', 'B', 'C', 'D'])
    # change outline color, fill color and linewidth of the boxes
    for box in bp['boxes']:
        # change outline color
        box.set(color='#7570b3', linewidth=2)
        # change fill color
        box.set(facecolor='#1b9e77')
    # change color and linewidth of the whiskers
    for whisker in bp['whiskers']:
        whisker.set(color='#7570b3', linewidth=2)
    # change color and linewidth of the caps
    for cap in bp['caps']:
        cap.set(color='#7570b3', linewidth=2)
    # change color and linewidth of the medians
    for median in bp['medians']:
        median.set(color='#b2df8a', linewidth=2)
    # change the style of fliers and their fill
    for flier in bp['fliers']:
        flier.set(marker='o', color='#e7298a', alpha=0.5)
    plt.show()

If you haven't seen a box plot before, here's how to read the above plot. The start and end of the box mark the first-quartile and third-quartile values (i.e. the 25th and 75th percentiles). The line inside the box marks the median value. The ends of the whiskers mark the minimum and the maximum values, excluding the outliers; by default, Matplotlib treats points more than 1.5 times the interquartile range beyond the box as outliers. Any dots above or below the whiskers are the outlier data points.
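To make the reading guide concrete, the sketch below computes the same quantities numerically for one array. It uses `matplotlib.cbook.boxplot_stats`, the helper that produces the statistics a box plot is drawn from, and compares the result with NumPy percentiles. The sample and seed are assumptions chosen for illustration.

```python
import numpy as np
from matplotlib import cbook

rng = np.random.default_rng(10)
data = rng.normal(100, 10, 200)

# One dict of statistics per data set; we passed a single 1-D array
stats = cbook.boxplot_stats(data)[0]

# The box edges and median line are plain percentiles
q1, med, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

print("box:", stats["q1"], stats["med"], stats["q3"])
print("whiskers:", stats["whislo"], stats["whishi"])
print("number of outliers:", len(stats["fliers"]))
```

The whiskers stop at the most extreme data points that still lie within 1.5 times the interquartile range of the box; everything beyond them is reported in `fliers`.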

# Area Plot

Area charts are used to represent cumulative totals, as numbers or percentages, over time. Since these plots are stacked by default, each column of data needs to contain either all positive or all negative values.

    x = range(1, 6)
    # Set values for each line (4 lines in this example)
    y = [
        [1, 4, 6, 8, 9],
        [2, 2, 7, 10, 12],
        [2, 8, 5, 10, 6],
        [1, 5, 2, 5, 2],
    ]
    # Setting figure size to 9x6 with dpi of 80
    fig = plt.figure(figsize=(9, 6), dpi=80)
    # Stacked area plot
    plt.stackplot(x, y, labels=['A', 'B', 'C', 'D'], alpha=0.8)
    # Set location of legend
    plt.legend(loc='upper left')
    plt.show()
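The stacking in the plot above is a running total: the upper boundary of each layer is the cumulative sum of that series and all the series below it. A short sketch with the same numbers, using `np.cumsum` to compute those boundaries by hand:

```python
import numpy as np
import matplotlib.pyplot as plt

x = range(1, 6)
y = np.array([
    [1, 4, 6, 8, 9],
    [2, 2, 7, 10, 12],
    [2, 8, 5, 10, 6],
    [1, 5, 2, 5, 2],
])

# Row i holds the upper boundary of layer i (sum of rows 0..i)
upper_edges = np.cumsum(y, axis=0)
print(upper_edges[-1])  # top of the whole stack at each x

fig, ax = plt.subplots(figsize=(9, 6))
layers = ax.stackplot(x, y, labels=['A', 'B', 'C', 'D'], alpha=0.8)
ax.legend(loc='upper left')
```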

# Pie Chart

Pie charts show the percentage or proportion each category contributes to a whole. The percentage represented by each category is displayed on or next to its corresponding slice of the pie. For pie charts in Matplotlib, the slices are ordered and plotted counter-clockwise, as shown:

    # Set keyword arguments
    labels = 'Kenya', 'Tanzania', 'Uganda', 'Rwanda', 'Burundi'
    sizes = [35, 30, 20, 10, 5]
    explode = (0, 0.1, 0, 0, 0)  # only "explode" the 2nd slice (i.e. 'Tanzania')
    # Plot pie chart with the above set arguments
    fig = plt.figure(figsize=(9, 6))
    plt.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
    plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
    plt.show()

Above, autopct='%1.1f%%' says to display each percentage with one decimal place, and startangle=90 says that the first slice (Kenya) should start at an angle of 90 degrees, measured counter-clockwise from the positive x-axis.
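As a quick check of both arguments, the sketch below draws the same slices and reads back the generated percentage labels and the starting angle of the first wedge. It assumes `pie` returns the wedges, the category texts, and the percentage texts as three sequences, which is how it behaves when `autopct` is given in the Matplotlib releases this sketch was written against.

```python
import matplotlib.pyplot as plt

sizes = [35, 30, 20, 10, 5]

fig, ax = plt.subplots(figsize=(9, 6))
wedges, texts, autotexts = ax.pie(sizes, autopct='%1.1f%%', startangle=90)

# Text placed on each slice by autopct
pct_labels = [t.get_text() for t in autotexts]

# Angles (in degrees) where the first wedge starts and ends
first_start = wedges[0].theta1
first_end = wedges[0].theta2

print(pct_labels)
print(first_start, first_end)
```

The first wedge ends at a larger angle than it starts, which reflects the counter-clockwise ordering of the slices.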

# For Further Exploration

- For more on Matplotlib: pyplot (Matplotlib documentation)
- **Seaborn** is built on top of matplotlib and allows you to easily produce prettier (and more complex) visualizations.
- **D3.js** is a JavaScript library for producing sophisticated interactive visualizations for the web. Although it is not in Python, it is both trendy and widely used.
- **Bokeh** is a newer library that brings D3-style visualizations into Python.
- **ggplot** is a Python port of the popular R library ggplot2, which is widely used for creating “publication quality” charts and graphics. It’s probably most interesting if you’re already an avid ggplot2 user, and possibly a little opaque if you’re not.

Before wrapping up, I'll leave you to ponder this quote from Antoine de Saint-Exupéry: "*Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away*".