Understanding and interpreting box plots
How to read a box plot/Introduction to box plots
Box plots are drawn for groups of W@S scale scores. They enable us to study the distributional characteristics
of a group of scores as well as the level of the scores.
To begin with, scores are sorted. Then four equal sized groups are made from the ordered scores. That is, 25%
of all scores are placed in each group. The lines dividing the groups are called quartiles, and the groups are
referred to as quartile groups. Usually we label these groups 1 to 4 starting at the bottom.
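The sort-and-split procedure described above can be sketched in a few lines of Python. This is only an illustration: the scores are made up, the function name quartile_groups is hypothetical, and the sketch assumes the number of scores divides evenly by four (real data usually needs a rule for the leftovers).

```python
def quartile_groups(scores):
    """Sort the scores and split them into four equal-sized groups, lowest first."""
    ordered = sorted(scores)
    n = len(ordered)
    if n % 4 != 0:
        # Assumption for this sketch only: the count divides evenly by four.
        raise ValueError("this simple sketch needs a count divisible by 4")
    size = n // 4
    return [ordered[i * size:(i + 1) * size] for i in range(4)]


scores = [52, 47, 61, 38, 55, 49, 70, 44]  # made-up scale scores
groups = quartile_groups(scores)
for label, group in enumerate(groups, start=1):
    # Groups are labelled 1 to 4 starting at the bottom.
    print(f"Quartile group {label}: {group}")
```

Each group holds 25% of the scores, and the boundaries between neighbouring groups are where the quartile lines would be drawn.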
Definitions
Median
The median (middle quartile) marks the mid-point of the data and is shown by the line that divides the box into
two parts. Half the scores are greater than or equal to this value and half are less.
Inter-quartile range
The middle “box” represents the middle 50% of scores for the group. The range of scores from lower to upper
quartile is referred to as the inter-quartile range. The middle 50% of scores fall within the inter-quartile range.
Upper quartile
Seventy-five percent of the scores fall below the upper quartile.
Lower quartile
Twenty-five percent of scores fall below the lower quartile.
Whiskers
The upper and lower whiskers represent scores outside the middle 50%. Whiskers often (but not always)
stretch over a wider range of scores than the middle quartile groups.
Interpreting box plots/Box plots in general
Box plots are used to show overall patterns of response for a group. They provide a useful way to visualize the
range and other characteristics of responses for a large group.
The diagram below shows a variety of different box plot shapes and positions.
Some general observations about box plots
The box plot is comparatively short – see example (2). This suggests that overall students have a
high level of agreement with each other.
The box plot is comparatively tall – see examples (1) and (3). This suggests students hold quite
different opinions about this aspect or sub-aspect.
One box plot is much higher or lower than another – compare (3) and (4). This could suggest a
difference between groups. For example, the box plot for boys may be lower or higher than the
equivalent plot for girls. Follow this up by looking at the Items at a Glance reports.
Obvious differences between box plots – see examples (1) and (2), (1) and (3), or (2) and (4). Any
obvious difference between box plots for comparative groups is worthy of further investigation.
Your school box plot is much higher or lower than the national reference group box plot. This also
suggests an area of difference that could be explored further in the Items in Detail reports and through
consultation.
The 4 sections of the box plot are uneven in size – See example (1). This shows that many students
have similar views at certain parts of the scale, but in other parts of the scale students are more variable
in their views. The long upper whisker in the example means that students' views are varied amongst the
most positive quartile group, and very similar for the least positive quartile group. The Items in
Detail reports can be used to explore this further.
Same median, different distribution – See examples (1), (2), and (3). The medians (which generally
will be close to the average) are all at the same level. However, the box plots in these examples show
very different distributions of views.
It is always important to consider the pattern of the whole distribution of responses in a box plot.
Box Plot Explained: Interpretation, Examples, & Comparison
By Saul Mcleod, PhD
Updated on July 31, 2023
Reviewed by Olivia Guy-Evans, MSc
In descriptive statistics, a box plot or boxplot (also known as a box and whisker plot) is a type
of chart often used in exploratory data analysis. Box plots visually show the distribution of
numerical data and its skewness by displaying the data quartiles (or percentiles) and the median.
Box plots show the five-number summary of a set of data: the minimum score, first
(lower) quartile, median, third (upper) quartile, and maximum score.
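As a minimal sketch, the five-number summary can be computed with Python's standard statistics module. The data and the function name five_number_summary are made up for illustration. Quartile conventions differ between tools, so the "inclusive" method used here is an assumption, and this sketch takes the plain minimum and maximum without first excluding outliers.

```python
import statistics


def five_number_summary(scores):
    """Return the minimum, lower quartile, median, upper quartile, and maximum."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {
        "min": min(scores),   # not outlier-adjusted in this sketch
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(scores),   # not outlier-adjusted in this sketch
    }


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up scores
summary = five_number_summary(scores)
print(summary)
```

These five values are exactly what a box plot draws: the two whisker ends, the two edges of the box, and the line inside the box.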
Minimum Score
The lowest score, excluding outliers (shown at the end of the left whisker).
Lower Quartile
Twenty-five percent of scores fall below the lower quartile value (also known as the first
quartile).
Median
The median marks the mid-point of the data and is shown by the line that divides the box into
two parts (sometimes known as the second quartile). Half the scores are greater than or equal
to this value, and half are less.
Upper Quartile
Seventy-five percent of the scores fall below the upper quartile value (also known as the third
quartile). Thus, 25% of data are above this value.
Maximum Score
The highest score, excluding outliers (shown at the end of the right whisker).
Whiskers
The upper and lower whiskers represent scores outside the middle 50% (i.e., the lower 25% of
scores and the upper 25% of scores).
The Interquartile Range (or IQR)
The box plot shows the middle 50% of scores (i.e., the range between the 25th and 75th
percentile).
Why Are Box Plots Useful?
Box plots divide the data into sections containing approximately 25% of the data in that set.
Box plots are useful as they provide a visual summary of the data enabling researchers to quickly
identify median values, the dispersion of the data set, and signs of skewness.
Note that the image above represents perfectly symmetric data (such as a normal distribution), and
most box plots will not conform to this symmetry (where the two sides of the plot mirror each other).
Box plots are useful as they show the average score of a data set
The median is the middle value of a set of data (one type of average) and is shown by the line that
divides the box into two parts. Half the scores are greater than or equal to this value, and half are less.
Box plots are useful as they show the skewness of a data set
The box plot shape will show if a statistical data set is roughly symmetric or skewed.
When the median is in the middle of the box, and the whiskers are about the same on both
sides of the box, then the distribution is symmetric.
When the median is closer to the bottom of the box, and if the whisker is shorter on the lower
end of the box, then the distribution is positively skewed (skewed right).
When the median is closer to the top of the box, and if the whisker is shorter on the upper
end of the box, then the distribution is negatively skewed (skewed left).
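The three rules above can be turned into a rough check on the five-number summary. This is a heuristic sketch only: the function name describe_shape and the tolerance threshold are hypothetical choices, not a standard test for skewness, and the inputs are made-up values.

```python
def describe_shape(minimum, q1, median, q3, maximum, tolerance=0.1):
    """Read the direction of skew from the median position and whisker lengths.

    tolerance is an arbitrary threshold (as a fraction of the full range)
    chosen for this illustration.
    """
    spread = maximum - minimum
    if spread == 0:
        return "no spread"
    # Positive when the median sits nearer the bottom of the box.
    median_shift = ((q3 - median) - (median - q1)) / spread
    # Positive when the lower whisker is the shorter one.
    whisker_shift = ((maximum - q3) - (q1 - minimum)) / spread
    if median_shift > tolerance and whisker_shift > tolerance:
        return "positively skewed (skewed right)"
    if median_shift < -tolerance and whisker_shift < -tolerance:
        return "negatively skewed (skewed left)"
    if abs(median_shift) <= tolerance and abs(whisker_shift) <= tolerance:
        return "roughly symmetric"
    return "mixed signals"


print(describe_shape(0, 4, 5, 6, 10))
print(describe_shape(0, 2, 3, 8, 20))
print(describe_shape(0, 12, 17, 18, 20))
```

When the box and the whiskers point in different directions the sketch reports "mixed signals", which is a prompt to look at the raw data rather than rely on the plot alone.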
Box plots are useful as they show the dispersion of a data set
In statistics, dispersion (also called variability, scatter, or spread) is the extent to which a
distribution is stretched or squeezed.
The smallest and largest values are found at the end of the ‘whiskers’ and are useful for
providing a visual indicator regarding the spread of scores (e.g., the range).
The interquartile range (IQR) is shown by the box, which covers the middle 50% of scores, and can be
calculated by subtracting the lower quartile from the upper quartile (i.e., Q3 − Q1).
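A short worked example of both measures of spread, using Python's standard statistics module. The scores are made up, and the "inclusive" quartile method is an assumption, since other tools may use a slightly different convention and give slightly different quartiles.

```python
import statistics

scores = [10, 12, 15, 17, 20, 22, 25, 30, 41]  # made-up scores

# The three quartile cut points; only the lower and upper ones are needed here.
q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")

iqr = q3 - q1                            # length of the box (Q3 - Q1)
score_range = max(scores) - min(scores)  # distance between the extreme values

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}, range = {score_range}")
```

The range depends on the two most extreme scores, while the IQR depends only on the middle half of the data, so the IQR changes far less when a single unusual score is added.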
Box plots are useful as they show outliers within a data set
An outlier is an observation that is numerically distant from the rest of the data.
When reviewing a box plot, an outlier is defined as a data point that is located outside the
whiskers of the box plot. For example, a common convention treats as outliers any points that lie
more than 1.5 times the interquartile range above the upper quartile or below the lower quartile
(that is, above Q3 + 1.5 * IQR or below Q1 − 1.5 * IQR).
Source: https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51
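The 1.5 times IQR rule can be sketched as follows. The function name find_outliers and the data are made up for illustration, the multiplier k is exposed as a parameter because 1.5 is a convention rather than a fixed law, and the "inclusive" quartile method is an assumption.

```python
import statistics


def find_outliers(scores, k=1.5):
    """Return scores below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [s for s in scores if s < lower_fence or s > upper_fence]


scores = [10, 12, 15, 17, 20, 22, 25, 30, 60]  # made-up scores
outliers = find_outliers(scores)
print(outliers)
```

For this data Q1 is 15 and Q3 is 25, so the IQR is 10 and the fences sit at 0 and 40. Only the score of 60 falls outside them.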
How To Compare Box Plots
Box plots are a useful way to visualize differences among different samples or groups. They
provide a lot of statistical information, including medians, ranges, and outliers.
Note that although box plots have been presented horizontally in this article, it is more common to
view them vertically in research papers.
Step 1: Compare the medians of box plots
Compare the respective medians of each box plot. If the median line of a box plot lies outside
of the box of a comparison box plot, then there is likely to be a difference between the two
groups.
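The median comparison in Step 1 can be expressed as a simple check. This is a sketch of the visual rule of thumb, not a statistical significance test. The function name median_outside_other_box and both groups are made up, and the "inclusive" quartile method is an assumption.

```python
import statistics


def median_outside_other_box(group_a, group_b):
    """True if the median of group_a lies outside the box (Q1 to Q3) of group_b."""
    median_a = statistics.median(group_a)
    q1_b, _, q3_b = statistics.quantiles(group_b, n=4, method="inclusive")
    return median_a < q1_b or median_a > q3_b


group_a = [1, 2, 3, 4, 5, 6, 7, 8, 9]         # made-up scores
group_b = [6, 7, 8, 9, 10, 11, 12, 13, 14]    # made-up scores

likely_different = median_outside_other_box(group_a, group_b)
print(likely_different)
```

Here the median of group_a (5) falls below the lower quartile of group_b (8), so the rule of thumb suggests a likely difference between the groups.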
Source: https://blog.bioturing.com/2018/05/22/how-to-compare-box-plots/
Step 2: Compare the interquartile ranges and whiskers of box plots
Compare the interquartile ranges (that is, the box lengths) to examine how the data is
dispersed within each sample. The longer the box, the more dispersed the data. The
shorter the box, the less dispersed the data.
Next, look at the overall spread as shown by the extreme values at the end of two whiskers.
This shows the range of scores (another type of dispersion). Larger ranges indicate wider
distribution, that is, more scattered data.
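Both comparisons in Step 2 can be made numerically. In this sketch the function name spread_summary and the two samples are made up, the "inclusive" quartile method is an assumption, and the range is taken from the raw minimum and maximum, so it includes any points a plot would draw separately as outliers.

```python
import statistics


def spread_summary(scores):
    """Return the box length (IQR) and the overall range of a sample."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return {"iqr": q3 - q1, "range": max(scores) - min(scores)}


sample_a = [4, 5, 5, 6, 6, 6, 7, 7, 8]      # made-up, tightly clustered
sample_b = [1, 3, 4, 6, 6, 8, 9, 11, 14]    # made-up, more scattered

summary_a = spread_summary(sample_a)
summary_b = spread_summary(sample_b)
print("Sample A:", summary_a)
print("Sample B:", summary_b)
```

Both samples have the same median (6), but sample_b has the longer box and the wider range, so its scores are more dispersed.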
Step 3: Look for potential outliers
When reviewing a box plot, an outlier is defined as a data point that is located outside the
whiskers of the box plot.
Step 4: Look for signs of skewness
If the data do not appear to be symmetric, does each sample show the same kind of
asymmetry?