Content
Sr. no.  Content name
1.  Definition
2.  How to read a box plot
3.  Exception
4.  Parts of a box plot
5.  Formula of quartiles
6.  Box plot distribution
7.  Uses of box plots
8.  How to compare box plots
9.  Example for raw data, no. 1
10. Example for raw data, no. 2
11. Example for discrete frequency data
12. Example for continuous frequency data
Box plot
• A box plot is a method of summarizing a set of data that is measured on an interval scale.
or
• A box plot is a graphical way of summarizing the important aspects of the distribution of numeric data.
• It is also called a box-and-whisker plot because it displays the data in a box-and-whiskers format.
• Box plots can be drawn either vertically or horizontally.
Figure: Diagram of a box plot
How to Read a Box Plot
• A box plot is a way to show a five-number summary in a chart.
• The main part of the chart (the “box”) shows where the middle half of the data lies: the interquartile range.
• At the ends of the box, you’ll find the first quartile (the 25% mark) and the third quartile (the 75% mark).
• In a horizontal box plot, the far left of the chart (at the end of the left “whisker”) is the minimum (the smallest number in the set) and the far right is the maximum (the largest number in the set).
• Finally, the median is represented by a vertical bar inside the box.
• Box plots are not used much in everyday life outside statistics and research. However, they are a useful tool for getting a quick summary of data.
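The five numbers a box plot displays can be computed in a few lines of Python. This is a minimal sketch, assuming Python 3.8+ and the standard-library function `statistics.quantiles` with its `"exclusive"` method, which places the k-th quartile at position k(n + 1)/4 of the sorted data (the same item-position rule used in the worked examples) and interpolates linearly between neighbouring items. The name `five_number_summary` is hypothetical.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    values = sorted(data)
    # Assumption: "exclusive" puts the k-th quartile at position k(n + 1)/4
    # of the sorted data and interpolates between neighbouring items.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return values[0], q1, q2, q3, values[-1]


print(five_number_summary([20, 28, 40, 12, 30, 15, 50]))
```

For the data of the first raw-data example this gives minimum 12, Q1 15, median 28, Q3 40 and maximum 50; the quartiles come back as floats.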
Exception
• If your data set has outliers (values that are very high or very low and fall far outside the other values of the data set), the box-and-whisker chart may not show the minimum or maximum value. Instead, each whisker extends only to the most extreme data value that lies within one and a half times the interquartile range (1.5 × IQR) of the box, and values beyond that limit are plotted individually as outliers.
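The 1.5 × IQR rule can be sketched as follows, where IQR = Q3 − Q1. This is an illustration under assumptions: quartiles come from `statistics.quantiles` (the k(n + 1)/4 position rule), the multiplier is k = 1.5, and each whisker stops at the most extreme value still inside its "fence" (Q1 − 1.5 × IQR or Q3 + 1.5 × IQR). The function name `whiskers_and_outliers` is hypothetical.

```python
import statistics


def whiskers_and_outliers(data, k=1.5):
    """Return (lower_whisker_end, upper_whisker_end, outliers)."""
    values = sorted(data)
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    # Fences: values outside these limits are drawn as separate outlier points.
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    # Each whisker ends at the most extreme value that is still inside a fence.
    return inside[0], inside[-1], outliers


print(whiskers_and_outliers([12, 15, 20, 28, 30, 40, 50, 120]))
```

With the made-up value 120 appended to the first example's data, 120 falls beyond the upper fence, so it is reported as an outlier and the upper whisker stops at 50 instead of reaching the maximum.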
Parts of a Box Plot
1. Minimum: The minimum value in the given dataset.
2. First Quartile (Q1): The first quartile is the median of the lower half of the data set.
3. Median: The median is the middle value of the dataset, which divides the given dataset into two equal parts. The median is also called the second quartile (Q2).
4. Third Quartile (Q3): The third quartile is the median of the upper half of the data.
5. Maximum: The maximum value in the given dataset.
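Reading Q1 and Q3 literally as "the median of the lower half" and "the median of the upper half" can be sketched as below. This is an assumption-laden illustration: when n is odd, the overall median is left out of both halves (other conventions keep it in). For n = 8, this reading gives Q1 = 27 and Q3 = 54.5, whereas the (n + 1)/4 position rule with linear interpolation gives 26 and 55.75, so the two methods do not always agree. The name `quartiles_by_halves` is hypothetical.

```python
import statistics


def quartiles_by_halves(data):
    """(Q1, median, Q3) with Q1/Q3 taken as medians of the lower/upper halves."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]
    # Assumption: for odd n, skip the middle value so it is in neither half.
    upper = values[half + len(values) % 2:]
    return (statistics.median(lower),
            statistics.median(values),
            statistics.median(upper))


print(quartiles_by_halves([20, 28, 40, 12, 30, 15, 50]))
print(quartiles_by_halves([64, 25, 52, 32, 48, 29, 57, 21]))
```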
Apart from these five terms, the other terms used in a box plot are:
• Interquartile Range (IQR): The difference between the third quartile and the first quartile is known as the interquartile range, i.e. IQR = Q3 − Q1.
• Outlier: Data values that fall on the far left or far right of the ordered data are checked to see whether they are outliers. Generally, an outlier lies more than a specified distance (commonly 1.5 × IQR) below the first quartile or above the third quartile.
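Quartile formulas
For raw and discrete data sets:
• Q1 = $\left(\frac{n+1}{4}\right)^{\text{th}}$ item value
• Median or Q2 = $\left(\frac{n+1}{2}\right)^{\text{th}}$ item value
• Q3 = $\left(\frac{3(n+1)}{4}\right)^{\text{th}}$ item value
where
n = number of observations (for discrete frequency data, n is the total frequency N)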
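The item-position formulas above translate directly into code. This is a minimal sketch for raw data: positions are 1-based, and a fractional position such as 2.25 is resolved by linear interpolation between the neighbouring items. That interpolation rule is an assumption, since the formulas only give the position. The name `quartile_by_position` is hypothetical.

```python
def quartile_by_position(data, k):
    """k-th quartile (k = 1, 2, 3) as the (k(n + 1)/4)th item of sorted data."""
    values = sorted(data)
    n = len(values)
    pos = k * (n + 1) / 4            # e.g. 2.0 for Q1 when n = 7
    pos = min(max(pos, 1), n)        # keep the position within items 1..n
    below = int(pos)                 # whole-number part of the position
    frac = pos - below               # fractional part, e.g. 0.25 for 2.25
    if frac == 0:
        return values[below - 1]     # position is a whole item
    # Assumption: interpolate linearly between items `below` and `below + 1`.
    return values[below - 1] + frac * (values[below] - values[below - 1])


raw = [20, 28, 40, 12, 30, 15, 50]
print([quartile_by_position(raw, k) for k in (1, 2, 3)])
```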
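• For continuous data sets:
Q1 = $L + \dfrac{\frac{n}{4} - CF}{F} \times H$
Q2 = $L + \dfrac{\frac{n}{2} - CF}{F} \times H$
Q3 = $L + \dfrac{\frac{3n}{4} - CF}{F} \times H$
where
n = number of observations (total frequency)
L = lower limit of the quartile class
CF = cumulative frequency of the class just before (above) the quartile class
F = frequency of the quartile class
H = width of the class interval
The quartile class is the first class whose cumulative frequency reaches n/4 (for Q1), n/2 (for Q2) or 3n/4 (for Q3).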
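The continuous-data formula can be sketched as a small function. This assumes classes are given in ascending order as (lower limit, upper limit, frequency) tuples, and that CF means the cumulative frequency of the classes before the quartile class. The function name `grouped_quartile` and the class table in the usage are hypothetical and only illustrate the arithmetic.

```python
def grouped_quartile(classes, k):
    """Qk = L + (k*n/4 - CF) / F * H for grouped continuous data (k = 1, 2, 3)."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")
    n = sum(f for _, _, f in classes)
    if n <= 0:
        raise ValueError("total frequency must be positive")
    target = k * n / 4              # n/4, n/2 or 3n/4
    cf = 0                          # CF of the classes before the current one
    for lower, upper, f in classes:
        if cf + f >= target:        # first class whose CF reaches the target
            return lower + (target - cf) / f * (upper - lower)
        cf += f
    raise ValueError("no quartile class found")


# Hypothetical class table: 0-10, 10-20, ..., 40-50 with n = 40.
classes = [(0, 10, 5), (10, 20, 8), (20, 30, 12), (30, 40, 10), (40, 50, 5)]
print([grouped_quartile(classes, k) for k in (1, 2, 3)])
```

For Q1 here, n/4 = 10 falls in the 10-20 class (CF before it is 5, F = 8, H = 10), so Q1 = 10 + (10 − 5)/8 × 10 = 16.25.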
Box Plot Distribution
• The box plot distribution shows how tightly the data is grouped, how the data is skewed, and whether the data is symmetric.
• Positively Skewed: If the distance from the median to the maximum is greater than the distance from the median to the minimum, then the box plot is positively skewed.
• Negatively Skewed: If the distance from the median to the minimum is greater than the distance from the median to the maximum, then the box plot is negatively skewed.
• Symmetric: The box plot is said to be symmetric if the median is equidistant from the maximum and minimum values.
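The skewness rules above compare two distances from the median, which can be sketched as follows. This follows the rule exactly as stated here (using the minimum and maximum); the function name `box_plot_skewness` is hypothetical, and with floating-point inputs the "symmetric" case requires the two distances to be exactly equal.

```python
def box_plot_skewness(minimum, median, maximum):
    """Classify a box plot using the median-to-extremes rule."""
    right = maximum - median    # distance from the median to the maximum
    left = median - minimum     # distance from the median to the minimum
    if right > left:
        return "positively skewed"
    if left > right:
        return "negatively skewed"
    return "symmetric"


print(box_plot_skewness(12, 28, 50))
```

For the first raw-data example (minimum 12, median 28, maximum 50), the distance to the maximum (22) exceeds the distance to the minimum (16), so the rule classifies it as positively skewed.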
Uses of Box Plots
• Box plots are widely used in statistics, process improvement, scientific research, economics, and the social and human sciences.
• They are mainly used to explore data and to present it in an easy-to-understand manner.
• Box plots provide a visual summary of the data from which we can quickly identify the central value of the data, how dispersed the data is, and whether the data is skewed (skewness).
• The median gives the central (average) value of the data.
• Box plots show the skewness of the data.
• The dispersion or spread of the data can be seen from the minimum and maximum values, which are found at the ends of the whiskers.
• A box plot also gives an idea of the outliers, which are points that are numerically distant from the rest of the data.
How to Compare Box Plots
• Box plots make it very easy to compare the characteristics of data across categories. Let us look at how we can compare different box plots and draw statistical conclusions from them.
• Let us take the two plots below, Plot A and Plot B, as an example.
• Compare the medians: If the median line of one box plot lies outside the box of the other box plot it is being compared with, then there is likely to be a difference between the two groups. Here the median line of Plot B lies outside the box of Plot A.
• Compare the dispersion or spread of the data: The interquartile range (the length of the box) gives an idea of how dispersed the data is. Here Plot A has a longer box than Plot B, which means that the data in Plot A is more dispersed than the data in Plot B. The length of the whiskers also gives an idea of the overall spread of the data. The extreme values (minimum and maximum) give the range of the data distribution: the larger the range, the more scattered the data. Here Plot A has a larger range than Plot B.
• Compare the outliers: Outliers indicate unusual data values that are distant from the rest of the data. More outliers mean that predictions will be more uncertain, so we can be more confident when predicting values for a plot that has few or no outliers.
• Compare the skewness: Skewness gives the direction and the magnitude of the lack of symmetry. We discussed above how to identify skewness. Here Plot A is positively (right) skewed and Plot B is negatively (left) skewed.
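The median, box-length and range comparisons can be sketched as a function over two five-number summaries. This is an illustrative sketch: the tuple layout (minimum, Q1, median, Q3, maximum), the names `compare_box_plots` and `_larger`, and the summaries in the usage are all hypothetical, and "likely difference" is only the rule of thumb stated above, not a formal test.

```python
def _larger(x, y):
    """Label which of two measurements (plot A or plot B) is larger."""
    if x > y:
        return "A"
    if y > x:
        return "B"
    return "equal"


def compare_box_plots(a, b):
    """Compare two (minimum, Q1, median, Q3, maximum) summaries."""
    a_min, a_q1, a_med, a_q3, a_max = a
    b_min, b_q1, b_med, b_q3, b_max = b
    # Rule of thumb: a median outside the other plot's box suggests a difference.
    median_outside = not (a_q1 <= b_med <= a_q3) or not (b_q1 <= a_med <= b_q3)
    return {
        "likely_difference": median_outside,
        "wider_iqr": _larger(a_q3 - a_q1, b_q3 - b_q1),        # box length
        "wider_range": _larger(a_max - a_min, b_max - b_min),  # max - min
    }


plot_a = (10, 20, 30, 50, 80)   # hypothetical summary
plot_b = (15, 35, 42, 45, 55)   # hypothetical summary
print(compare_box_plots(plot_a, plot_b))
```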
Example for raw data (no. 1)
Q.1- Draw the box plot for the given data.
Given- 20, 28, 40, 12, 30, 15, 50
Solution- First, arrange the data in ascending order.
12, 15, 20, 28, 30, 40, 50
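• Number of observations (n) = 7
• Q1 = $\left(\frac{n+1}{4}\right)^{\text{th}}$ item value
= $\left(\frac{7+1}{4}\right)^{\text{th}}$ item value
= 2nd item value
Q1 = 15
• Q2 or median = $\left(\frac{n+1}{2}\right)^{\text{th}}$ item value
= $\left(\frac{7+1}{2}\right)^{\text{th}}$ item value
= 4th item value
Q2 = 28
• Q3 = $\left(\frac{3(n+1)}{4}\right)^{\text{th}}$ item value
= $\left(\frac{3(7+1)}{4}\right)^{\text{th}}$ item value
= $(3 \times 2)^{\text{th}}$ item value
= 6th item value
Q3 = 40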
Q1 = 15
Q2 = 28
Q3 = 40
Minimum value = 12
Maximum value = 50
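Once the five values are known, the box plot can be drawn. As a dependency-free sketch, the hypothetical helper `render_box_plot` below draws a horizontal text box plot by mapping each value linearly onto a character column: "|" marks the minimum, the median and the maximum, "[" and "]" mark Q1 and Q3, and "-" forms the whiskers. The width of 39 characters is an arbitrary choice that makes one column equal one unit for this data.

```python
def render_box_plot(minimum, q1, median, q3, maximum, width=40):
    """Draw a horizontal text box plot such as |--[   |   ]---|."""
    if maximum == minimum:
        return "|"
    def col(v):
        # Map a data value linearly onto a column 0 .. width - 1.
        return round((v - minimum) / (maximum - minimum) * (width - 1))
    a, b, c, d, e = (col(v) for v in (minimum, q1, median, q3, maximum))
    line = [" "] * width
    for i in range(a, e + 1):
        line[i] = "-"              # whiskers (box interior is cleared below)
    for i in range(b, d + 1):
        line[i] = " "              # inside of the box
    line[a] = line[e] = "|"        # minimum and maximum ends
    line[b], line[d] = "[", "]"    # Q1 and Q3 edges of the box
    line[c] = "|"                  # median bar
    return "".join(line)


print(render_box_plot(12, 15, 28, 40, 50, width=39))
```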
Example for raw data (no. 2)
Q.1- Draw the box plot for the given data.
Given- 64, 25, 52, 32, 48, 29, 57, 21
Solution- First, arrange the data in ascending order.
21, 25, 29, 32, 48, 52, 57, 64
• Number of observations (n) = 8
• Q1 = $\left(\frac{n+1}{4}\right)^{\text{th}}$ item value
= $\left(\frac{8+1}{4}\right)^{\text{th}}$ item value
= 2.25th item value
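A fractional position such as 2.25 lies between two items. Assuming it is resolved by linear interpolation between the 2nd and 3rd items (a common convention; the rule used in the rest of this example is not shown here), and writing $x_{(i)}$ for the $i$-th item of the ordered data (notation introduced only for this step), the arithmetic would be:

```latex
Q_1 = x_{(2)} + 0.25\,\bigl(x_{(3)} - x_{(2)}\bigr)
    = 25 + 0.25\,(29 - 25)
    = 25 + 1
    = 26
```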
Example for discrete frequency data
Q. 2- Draw the box plot for the given discrete frequency data.
Given-
X    10   20   30   40   50   60
F     4    7   15    8    7    2
Solution-
Here X is the value, F is its frequency and CF is the cumulative frequency.
X     F    CF
10    4    4
20    7    11
30    15   26
40    8    34
50    7    41
60    2    43
• Number of observations (N) = ΣF = 43
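• Q1 = $\left(\frac{N+1}{4}\right)^{\text{th}}$ item
= $\left(\frac{43+1}{4}\right)^{\text{th}}$ item
= 11th item
Now look up the 11th item in the table: the CF first reaches 11 at X = 20, so the 11th item is 20.
Here we found Q1 = 20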
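• Q2 = $\left(\frac{N+1}{2}\right)^{\text{th}}$ item
= $\left(\frac{43+1}{2}\right)^{\text{th}}$ item
= 22nd item
Now look up the 22nd item in the table: the CF first reaches 22 at X = 30 (CF = 26), so the 22nd item is 30.
Here we found Q2 = 30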
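• Q3 = $\left(\frac{3(N+1)}{4}\right)^{\text{th}}$ item
= $\left(\frac{3(43+1)}{4}\right)^{\text{th}}$ item
= $(3 \times 11)^{\text{th}}$ item
= 33rd item
Now look up the 33rd item in the table: the CF first reaches 33 at X = 40 (CF = 34), so the 33rd item is 40.
Here we found Q3 = 40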
Q1= 20
Q2= 30
Q3 = 40
Minimum value = 10
Maximum value = 60
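The table lookup used in this example (build the CF column, then find the first value whose CF reaches the item position) can be sketched with the standard library. Assumptions: values are listed in ascending order, and a fractional position would be rounded up to the next whole item, which this example never needs because its positions (11, 22, 33) are whole numbers. The name `discrete_quartile` is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def discrete_quartile(values, freqs, k):
    """k-th quartile (k = 1, 2, 3) of discrete frequency data."""
    cf = list(accumulate(freqs))          # cumulative frequency column CF
    position = k * (cf[-1] + 1) / 4       # the (k(N + 1)/4)th item
    # First value whose CF reaches the position; a fractional position is
    # therefore rounded up to the next whole item (an assumption).
    idx = min(bisect_left(cf, position), len(values) - 1)
    return values[idx]


x = [10, 20, 30, 40, 50, 60]
f = [4, 7, 15, 8, 7, 2]
print([discrete_quartile(x, f, k) for k in (1, 2, 3)], min(x), max(x))
```

The minimum and maximum are simply the smallest and largest X values that occur (here every X has a non-zero frequency).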