2. Which graph to use?
• Depends on type of data
• Depends on what you want to illustrate
• Depends on available statistical software
3. Bar Chart
[Figure: bar chart, "Birth Order of Spring 1998 Stat 250 Students" (n=92 students). Horizontal axis: Birth Order (Middle, Oldest, Only, Youngest). Vertical axis: Percent.]
4. Bar Chart
• Summarizes categorical data.
• Horizontal axis represents categories, while
vertical axis represents either counts
(“frequencies”) or percentages (“relative
frequencies”).
• Used to illustrate the differences in
percentages (or counts) between categories.
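The counts and percentages behind a bar chart can be sketched in a few lines of Python. This is only an illustration: the `responses` list is made-up data, not the class survey, and the helper names `relative_frequencies` and `text_bar_chart` are hypothetical.

```python
from collections import Counter

# Hypothetical birth-order responses (made up for illustration only).
responses = ["Oldest", "Youngest", "Middle", "Oldest", "Only",
             "Youngest", "Oldest", "Middle", "Youngest", "Oldest"]

def relative_frequencies(values):
    """Return {category: (count, percent)} for a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}

def text_bar_chart(values, width=40):
    """Return one text line per category; bar length is proportional to percent."""
    lines = []
    for cat, (n, pct) in sorted(relative_frequencies(values).items()):
        bar = "#" * round(width * pct / 100)
        lines.append(f"{cat:<10}{bar} {pct:.0f}% (n={n})")
    return lines

for line in text_bar_chart(responses):
    print(line)
```

Each category gets one bar, and the bar lengths can be compared directly, which is the point of the chart.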
5. Histogram
[Figure: histogram, "Age of Spring 1998 Stat 250 Students" (n=92 students). Horizontal axis: Age (in years), 18 to 27. Vertical axis: Frequency (Count).]
6. Analogy
Bar chart is to categorical data as
histogram is to ...
measurement data.
7. Histogram
• Divide the measurement scale into equal-width
categories.
• Determine number (or percentage) of
measurements falling into each category.
• Draw a bar for each category so bars’ heights
represent number (or percent) falling into the
categories.
• Label and title appropriately.
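The steps above can be sketched as follows. This is a minimal illustration, assuming made-up ages and a hypothetical helper `histogram_counts`; the starting point and category width are choices made here, not values from the slides.

```python
def histogram_counts(data, low, width, n_bins):
    """Count the values falling in each of n_bins equal-width categories.

    Category i covers [low + i*width, low + (i+1)*width).
    Values outside all categories are ignored.
    """
    counts = [0] * n_bins
    for x in data:
        i = int((x - low) // width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

# Hypothetical ages (made up for illustration only).
ages = [18, 19, 19, 20, 20, 20, 21, 21, 22, 24, 27]
low, width, n_bins = 18, 2, 5
counts = histogram_counts(ages, low, width, n_bins)

# Draw one bar per category, then label and title the result.
print("Ages of students (hypothetical data), n =", len(ages))
for i, c in enumerate(counts):
    left = low + i * width
    print(f"{left}-{left + width - 1} years | {'#' * c} ({c})")
```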
8. Histogram
• Use common sense in determining the number of
categories to use.
(Trial-and-error works fine, too.)
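Trial and error with the number of categories might look like the sketch below. The GPA values are made up, and `bin_counts` is a hypothetical helper that splits the observed range into equal-width categories.

```python
def bin_counts(data, n_bins):
    """Split the range of data into n_bins equal-width categories and count."""
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all values are identical; nothing to divide")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        # The largest value would land one past the end, so clamp it
        # into the last category.
        i = min(int((x - lo) / width), n_bins - 1)
        counts[i] += 1
    return counts

# Hypothetical GPAs (made up for illustration only).
gpas = [2.1, 2.4, 2.8, 2.9, 3.0, 3.1, 3.2, 3.3, 3.3, 3.5, 3.6, 3.9]

# Try a few choices and look at each result before settling on one.
for n in (2, 4, 12):
    print(f"{n:>2} categories:", bin_counts(gpas, n))
```

With very few categories the counts hide most of the detail; with as many categories as data points, most bars hold zero or one value and no overall pattern shows.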
9. Too few categories
[Figure: histogram with too few categories, "Age of Spring 1998 Stat 250 Students" (n=92 students). Horizontal axis: Age (in years), ticks at 18, 23, 28. Vertical axis: Frequency (Count).]
10. Too many categories
[Figure: histogram with too many categories, "GPAs of Spring 1998 Stat 250 Students" (n=92 students). Horizontal axis: GPA, 2 to 4. Vertical axis: Frequency (Count).]
14. Stem-and-Leaf Plot
• Summarizes measurement data.
• Each data point is broken down into a
“stem” and a “leaf.”
• First, “stems” are aligned in a column.
• Then, “leaves” are attached to the stems.
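A stem-and-leaf plot is simple enough to build by hand or in code. The sketch below assumes non-negative whole numbers, takes the last digit as the leaf and the remaining digits as the stem, and uses made-up scores; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot for non-negative integers.

    Stem = all digits except the last; leaf = the last digit.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    rows = []
    # List every stem in the range, even empty ones, so gaps stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(d) for d in stems.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}")
    return rows

# Hypothetical scores (made up for illustration only).
scores = [62, 67, 71, 75, 75, 78, 83, 84, 89, 90, 104]
for row in stem_and_leaf(scores):
    print(row)
```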
16. Box Plot
• Summarizes measurement data.
• Vertical (or horizontal) axis represents
measurement scale.
• The three lines of the box mark, in order along
the scale, the 25th percentile (“first quartile”),
the 50th percentile (“median”), and the 75th
percentile (“third quartile”).
17. An aside...
• Roughly speaking:
– The “25th percentile” is the number such that
25% of the data points fall below the number.
– The “median” or “50th percentile” is the
number such that half of the data points fall
below the number.
– The “75th percentile” is the number such that
75% of the data points fall below the number.
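As a rough check of these definitions, the sketch below computes the three quartiles with Python's standard `statistics.quantiles` and reports what share of the points fall below each one. The data are made up, and the `"inclusive"` method is just one of several conventions; other software may give slightly different values.

```python
import statistics

# Hypothetical measurements (made up for illustration only).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for the three cut points that split the data into quarters.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

def share_below(values, cutoff):
    """Fraction of values strictly below the cutoff."""
    return sum(1 for v in values if v < cutoff) / len(values)

for name, q in (("25th percentile", q1), ("median", median), ("75th percentile", q3)):
    print(f"{name}: {q}  (share of points below: {share_below(data, q):.0%})")
```

With a small data set the shares only come close to 25%, 50%, and 75%, which is why the definitions are stated "roughly speaking."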
18. Box Plot (cont’d)
• “Whiskers” are drawn to the most extreme
data points that are not more than 1.5 times
the length of the box beyond either quartile.
– Whiskers are useful for identifying outliers.
• “Outliers,” or extreme observations, are
denoted by asterisks.
– Generally, data points falling beyond the
whiskers are considered outliers.
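The whisker rule can be sketched as below, assuming made-up speeds and the `"inclusive"` quartile convention of Python's `statistics.quantiles` (other conventions shift the quartiles slightly). `box_plot_summary` is a hypothetical helper name.

```python
import statistics

def box_plot_summary(data):
    """Return the quartiles, whisker ends, and outliers for a box plot."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    box_length = q3 - q1
    # Whiskers may reach at most 1.5 box lengths beyond either quartile.
    low_limit = q1 - 1.5 * box_length
    high_limit = q3 + 1.5 * box_length
    inside = [x for x in data if low_limit <= x <= high_limit]
    outliers = sorted(x for x in data if x < low_limit or x > high_limit)
    return {
        "quartiles": (q1, median, q3),
        # Whiskers end at the most extreme points still within the limits.
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }

# Hypothetical fastest driving speeds in mph (made up for illustration only).
speeds = [70, 80, 85, 90, 90, 95, 100, 105, 110, 160]
summary = box_plot_summary(speeds)
print(summary)
```

Here the box runs from 86.25 to 103.75, so the whiskers may extend to 60 and 130 at most; they stop at 70 and 110, and 160 is flagged as an outlier.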
19. Using Box Plots to Compare
[Figure: side-by-side box plots, "Fastest Ever Driving Speed" (226 Stat 100 Students, Fall 1998). Horizontal axis: Gender (female, male). Vertical axis: Fastest Speed (mph).]
20. Which graph to use when?
• Stem-and-leaf plots and dotplots are good for
small data sets, while histograms and box plots
are good for large data sets.
• Box plots and dotplots are good for comparing
two groups.
• Box plots are good for identifying outliers.
• Histograms and box plots are good for
identifying the “shape” of the data.
21. Scatter Plots
[Figure: scatter plot, "Foot sizes of Spring 1998 Stat 250 students" (n=88 students). Horizontal axis: Left foot (in cm), 22 to 31. Vertical axis: Right foot (in cm), 22 to 31.]
22. Scatter Plots
• Summarizes the relationship between two
measurement variables.
• Horizontal axis represents one variable and
vertical axis represents second variable.
• Plot one point for each pair of
measurements.
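Plotting one point per pair can be sketched with a small character grid. The foot lengths below are made up, whole-centimeter values, and `text_scatter` is a hypothetical helper; real plotting software would handle non-integer values and axis labels.

```python
def text_scatter(xs, ys):
    """Return the rows of a character grid with '*' at each (x, y) pair.

    Assumes integer measurements; xs[i] and ys[i] belong to the same subject.
    """
    if len(xs) != len(ys):
        raise ValueError("each x needs exactly one matching y")
    points = set(zip(xs, ys))
    rows = []
    # Largest y first so the vertical axis increases upward.
    for y in range(max(ys), min(ys) - 1, -1):
        row = "".join("*" if (x, y) in points else "."
                      for x in range(min(xs), max(xs) + 1))
        rows.append(f"{y:>3} {row}")
    return rows

# Hypothetical left and right foot lengths in cm (made up for illustration only).
left = [24, 25, 26, 27, 28]
right = [24, 25, 26, 28, 28]
for row in text_scatter(left, right):
    print(row)
```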
23. No relationship
[Figure: scatter plot showing no relationship, "Lengths of left forearms and head circumferences of Spring 1998 Stat 250 Students" (n=89 students). Horizontal axis: Head circumference (in cm), ticks at 52, 57, 62. Vertical axis: Left forearm (in cm), 22 to 32.]
24. Closing comments
• Many possible types of graphs.
• Use common sense in reading graphs.
• When creating graphs, don’t summarize
your data too much or too little.
• When creating graphs, label everything for
others. Remember you are trying to
communicate something to others!