What is a box plot?
A box plot (also called a ‘box and whisker plot’) visualizes the distribution of numerical data and helps identify outliers in the data.
Below is the box plot of a toy dataset with just 10 observations: 144, 147, 153, 154, 156, 157, 161, 164, 170, 181.
Box plot for the toy dataset
Notice the following components in the above box plot:
- 1st Quartile (25th percentile)
- Median (2nd Quartile or 50th percentile)
- 3rd Quartile (75th percentile)
- Interquartile Range (IQR – difference between 3rd and 1st Quartile)
- Whiskers: lines that extend down to the lowest data point lying within 1.5 * IQR below the 1st quartile, and up to the highest data point lying within 1.5 * IQR above the 3rd quartile
- Outliers: any data points more than 1.5 * IQR beyond the 1st or 3rd quartile, i.e. values greater than 3rd Quartile + 1.5 * IQR or less than 1st Quartile – 1.5 * IQR
In other words, whiskers extend from the quartiles to the rest of the distribution, except for points that are determined to be “outliers”.
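To make these components concrete, here is a minimal sketch that computes them for the toy dataset using only Python's standard library. The function name `box_plot_stats` is hypothetical, and the quartile method is an assumption: quartile definitions vary between tools, and this sketch uses the "inclusive" linear interpolation from `statistics.quantiles`, which matches NumPy's default percentile behaviour. Other methods may give slightly different quartiles.

```python
from statistics import quantiles


def box_plot_stats(values):
    """Compute the box plot components described above (hypothetical helper)."""
    data = sorted(values)

    # Assumption: "inclusive" linear interpolation between data points.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Fences mark the 1.5 * IQR boundaries; they are not drawn on the plot.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers end at the most extreme data points still inside the fences.
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": outliers,
    }


toy = [144, 147, 153, 154, 156, 157, 161, 164, 170, 181]
result = box_plot_stats(toy)
for name, value in result.items():
    print(f"{name}: {value}")
```

Under this quartile method, the toy dataset gives a 1st quartile of 153.25, a median of 156.5, a 3rd quartile of 163.25 and an IQR of 10.0. The fences therefore sit at 138.25 and 178.25, so the whiskers end at 144 and 170, and 181 is the only outlier.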