Introductory
Statistics
Laboratory
for Excel
Lab Manual Author:
R. J. (Bob) Baker
December 2003
Revised by:
Krista Wilde (2016)
Table of Contents
Assignment #0
Assignment #1
Assignment #2
Assignment #3
Assignment #4
Assignment #5
Assignment #6
Assignment #7
Assignment #8
Assignment #9
INTRODUCTION
Example 1: Reading data from a data file into the EXCEL worksheet
Example 2: Preparing a histogram of data
Example 3: Entering data from the keyboard into the EXCEL worksheet
Example 4: Calculating relative frequencies
Example 5: Leaving EXCEL and grading your assignment
Example 6: How to prepare a stem-and-leaf diagram
Example 7: How to draw a frequency (or relative frequency) polygon
Example 8: How to use EXCEL to calculate various numbers that summarize the characteristics of a population (or sample)
Example 9: How to use the DESCRIPTIVE STATISTICS command of EXCEL
Example 10: Further uses of EXCEL -> As a calculator
Example 11: Calculations with a discrete probability distribution
Example 12: Reading and storing constants for further use
Example 13: Using EXCEL to answer questions about continuous distributions
Example 14: How to calculate a chi-squared statistic for a 'goodness-of-fit' test
Example 15: How to calculate a confidence interval for one mean when σ is known
Example 16: How to calculate a confidence interval for one mean when σ is NOT known
Example 17: How to calculate a confidence interval for a binomial proportion
Example 18: How to calculate a test of hypothesis concerning one mean when σ is NOT known
Example 19: Large sample confidence intervals and tests of hypothesis for differences between two means when population variance is unknown and equal
Example 20: Confidence intervals and tests of hypothesis for differences between two means for independent samples: population variances are unknown but equal
Example 21: Large sample confidence intervals and tests of hypothesis for differences between two binomial proportions
Example 22: How to carry out a one-way analysis of variance
Example 23:
Example 24: How to use information from analysis of variance to calculate confidence intervals or test hypotheses about treatment means (including least significant difference)
Example 25: How to perform a two-way analysis of variance
Example 26: How to calculate a randomized complete block analysis of variance
Example 27: How to prepare a scatterplot of two variables
Example 28: How to calculate a correlation coefficient
Example 29: How to perform a regression analysis using EXCEL
Introductory Statistics Laboratory
Assignment #0
Purpose
This assignment is designed for use in the instructed
introduction for students using the
Introductory Statistics Laboratory for Excel (ISLeX) program.
NOTES
Log in to ISLeX and get the data for Assignment 0. Then start Microsoft Excel and
determine the answers to the questions in this assignment. When finished, exit from EXCEL,
return to ISLeX and submit your answers.
In this assignment, all students use the same data set. In the remaining assignments, each
student will have a unique data set.
See the examples indicated by {Example } to learn how to use EXCEL to perform a
particular task. A reference to an example is given at the end of each major task. A special
symbol marks the beginning of each new task.
Question A
The data in Table A (LAB0A.DAT) represents measured yields (q/ha, where 1 q = 1
quintal = 100 kg) of a sample of wheat varieties tested at Saskatoon.
Read the data into the EXCEL worksheet.
{Example 1}
Prepare a histogram of the data, using 20 as the first midpoint (20.5 as its upper bin) and 1 as
the interval width (bin size).
Record the frequencies from the histogram into the following
table; add the relative
frequencies later.
Bin     Midpoint   Frequency   Relative frequency
20.5    20
21.5    21
22.5    22
23.5    23
24.5    24
25.5    25
26.5    26
27.5    27
28.5    28
Record your answers to the following questions
1. How many observations were there in this sample?
2. What is the midpoint of the most frequent class?
(If tied, give lowest midpoint)
3. How many observations were there in the class with midpoint
equal to 22?
{Example 2}
Enter the midpoints and frequencies into two columns of the EXCEL
worksheet. Verify that you have entered the correct data. Calculate and store relative frequencies
in a new column. Record relative frequencies in the above table.
{Examples 3 and 4}
Question B
Data in Table B represents measured yields (q/ha) of a sample
of wheat varieties
evaluated at Tisdale.
Read the data into the EXCEL worksheet and calculate the mean value.
4. How many observations were there in this data set?
5. What was the mean yield of this sample of wheat varieties?
{Example 1, and Example 8 a and b}
Having recorded numerical answers to each of
the five questions, you should now leave EXCEL and submit your answers for grading by the
ISLeX program.
{Example 5}
- END OF ASSIGNMENT 0 -
Introductory Statistics Laboratory
Assignment #1
Purpose
This lab is an introduction to tabular and graphical methods of
descriptive statistics.
NOTE
As you proceed through this assignment, write your answers in
the spaces provided.
When you exit from EXCEL, you are then required to enter the
answers into the ISLeX
program.
Question A
Data in Table A represents measured yields (q/ha, where 1q = 1
quintal = 100 kg) of a
sample of wheat varieties tested at Saskatoon.
Read the data into an EXCEL worksheet.
{Example 1}
Prepare a histogram of the data, using 20 as the first midpoint (20.5 as the starting bin) and 1
as the interval width (bin size). Note that the lower endpoint of any interval is the midpoint
minus one-half the interval width while the upper endpoint is the midpoint plus one-half the
interval width. Record the frequencies in the following table; add relative frequencies later.
Excel places data points that are on a bin boundary in the lower bin.
Bin     Midpoint   Frequency   Relative frequency
20.5    20
21.5    21
22.5    22
23.5    23
24.5    24
25.5    25
26.5    26
27.5    27
28.5    28
Record your answers to the following questions
1. How many observations were there in this sample?
2. What is the midpoint of the most frequent class?
(If tied, give lowest midpoint)
3. How many observations were greater than 21.5 and less than or
equal to 22.5 q/ha?
{Example 2}
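The binning rule used for the histogram (each bin value is an upper endpoint, that is, midpoint plus half the width, and a value on a boundary is counted in the lower bin) can be sketched in Python. This is only an illustration and not part of the ISLeX workflow; the `yields` values are made up and the function name `bin_counts` is hypothetical.

```python
from bisect import bisect_left

def bin_counts(values, upper_bounds):
    """Count values per bin, where each bin holds lower < x <= upper.

    One extra final count collects values above the last upper bound.
    """
    counts = [0] * (len(upper_bounds) + 1)
    for x in values:
        # bisect_left returns the index of the first bound >= x, so a value
        # that sits exactly on a boundary is counted in the lower bin.
        counts[bisect_left(upper_bounds, x)] += 1
    return counts

# Upper bin values 20.5, 21.5, ..., 28.5 (midpoints 20, 21, ..., 28; width 1).
upper_bounds = [20.5 + i for i in range(9)]

# Made-up yields (q/ha) for illustration only; not the assignment data.
yields = [20.1, 21.5, 21.6, 22.5, 23.0, 23.4, 24.9, 26.2, 28.5]

counts = bin_counts(yields, upper_bounds)
for upper, count in zip(upper_bounds, counts):
    print(f"midpoint {upper - 0.5:4.1f}  frequency {count}")
```

In this sketch the value 21.5 lands in the bin with midpoint 21, and 21.6 lands in the bin with midpoint 22, which is the behaviour question 3 depends on.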
Enter the midpoints and frequencies into the EXCEL worksheet.
{Example 3}
Calculate and store the relative frequencies (these will be used in question C).
Check the data you have entered and verify that the relative frequencies sum to 1.0 (within
0.001). Record the relative frequencies in the preceding table.
4. What is the relative frequency of yields in sample A that were greater than
21.5 and less than or equal to 22.5?
{Example 4}
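The relative frequency calculation and the sum check described above can be sketched as follows. The frequencies are made up for illustration, and `relative_frequencies` is a hypothetical helper name.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the total number of observations."""
    total = sum(frequencies)
    return [f / total for f in frequencies]

# Made-up class frequencies for illustration only.
frequencies = [1, 3, 6, 9, 7, 3, 1]
rel = relative_frequencies(frequencies)

# The relative frequencies should sum to 1.0 (within 0.001).
sum_is_one = abs(sum(rel) - 1.0) < 0.001

for f, r in zip(frequencies, rel):
    print(f"{f:3d}  {r:.3f}")
print("sums to 1.0:", sum_is_one)
```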
Prepare a stem-and-leaf diagram of the data from sample A. Use an increment of 1.0 between
consecutive stem positions (leaf unit = 0.1). Use the stem-and-leaf diagram to answer the
following questions.
5. What is the value (in q/ha) of the leaf unit in this stem-and-leaf diagram?
6. What is the yield (in q/ha) for the item represented by the last leaf position in
the fifth (from the top) stem position?
{Example 6}
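A stem-and-leaf diagram with stems one unit apart and a leaf unit of 0.1 can be sketched as below. The data are made up, the helper name `stem_and_leaf` is hypothetical, and the sketch assumes positive values recorded to one decimal place.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group values into stems (whole units) and leaves (tenths)."""
    stems = defaultdict(list)
    for x in sorted(values):
        # Work in integer tenths to avoid floating-point surprises.
        tenths = round(x * 10)
        stem, leaf = divmod(tenths, 10)
        stems[stem].append(leaf)
    return dict(stems)

# Made-up yields (q/ha) for illustration only.
yields = [21.3, 21.8, 22.0, 22.4, 22.4, 23.1, 24.7, 24.9]

diagram = stem_and_leaf(yields)
for stem in range(min(diagram), max(diagram) + 1):
    leaves = "".join(str(leaf) for leaf in diagram.get(stem, []))
    print(f"{stem:3d} | {leaves}")
```

Reading a value back is stem + leaf × 0.1; for example, stem 24 with leaf 9 represents 24.9 q/ha.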
Question B
Data in Table B represents measured yields (q/ha) of a sample
of wheat varieties
evaluated at Tisdale.
{Example 1}
Prepare a histogram of the data, using 24 as the first midpoint (24.5 as the first bin) and 1 as
the interval width.
Record the frequencies in the following table; add relative
frequencies later.
Bin     Midpoint   Frequency   Relative frequency
24.5    24
25.5    25
26.5    26
27.5    27
28.5    28
29.5    29
30.5    30
31.5    31
32.5    32
33.5    33
34.5    34
35.5    35
36.5    36
Record your answers to the following questions
7. How many observations were there in this sample?
8. What is the midpoint of the most frequent class?
(If tied, give lowest midpoint)
9. How many observations fell between 31.5 and 32.5 q/ha?
{Example 2}
Enter the midpoints and frequencies into the EXCEL worksheet.
Calculate the relative frequencies in each class.
Check that the correct information has been entered, that
frequencies sum to the total
number of observations and that the relative frequencies sum to
1.0.
Record the relative frequencies in the preceding table. Answer
the following question.
10. What is the relative frequency of yields in sample B that were greater than
31.5 and less than or equal to 32.5 q/ha?
{Example 4}
Question C
Compare the distributions of yields of wheat varieties in
sample A (Saskatoon) with those
from sample B (Tisdale).
Draw a relative frequency polygon of the data from both samples. Include
appropriate titles and axis labels. Use different line types for each sample.
Answer the following questions from the relative frequency
polygon.
11. Which of the two samples, Saskatoon (1) or Tisdale (2), has the higher
relative frequency in the class whose midpoint is 26 q/ha?
(Answer 1 or 2; 0 if same)
12. Which of the two samples, Saskatoon (1) or Tisdale (2), has the greater
spread of midpoints (i.e. the greater difference between maximum and minimum
midpoint values)?
(Answer 1 or 2; 0 if same)
{Example 7}
Having recorded numerical answers to each of
the twelve questions, you should now leave EXCEL and submit your answers for grading by the
ISLeX program.
{Example 5}
- End of Assignment #1 -
Introductory Statistics Laboratory
Assignment #2
Purpose
The three main objectives of this assignment are to:
a) use numerical values as descriptive statistics,
b) introduce the concept of sampling from a population, and
c) demonstrate the effects of sample size.
NOTE
As you proceed through this assignment, write your answers in
the spaces provided.
When you exit from EXCEL, you are then required to enter the
answers into the ISLeX program.
Question A
Data in Table A represents protein concentrations (g/kg) of boxcar lots of durum wheat
delivered to Thunder Bay, Ontario. This data is to be treated as a population of data points.
Read the data into a column of the EXCEL worksheet, and name the
column. When viewing the data for the first time, you should try to determine approximately the
number of items and guess at the average value. Scan the data to try to determine what the
smallest and largest values are.
{Example 1}
Use EXCEL to calculate and record the values of the following
population characteristics (i.e. parameters).
1. How many data points are there in this data set?
2. What is the mean protein concentration (g/kg)?
3. What is the minimum protein concentration?
4. What is the maximum protein concentration?
5. What is the median protein concentration?
6. What is the value of the first quartile?
7. What is the value of the third quartile?
8. What is the standard deviation of the population of protein
concentrations?
{Example 8}
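For readers who want to cross-check the EXCEL results, the population characteristics listed above can be sketched with Python's standard `statistics` module. The data values and the helper name `population_summary` are hypothetical. The quartile rule is also an assumption: the "inclusive" method interpolates between data points in a way comparable to EXCEL's inclusive quartile function, but the manual does not state which quartile rule ISLeX expects.

```python
import statistics

def population_summary(values):
    """Return descriptive measures, treating the values as a whole population."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "minimum": min(values),
        "maximum": max(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
        # pstdev divides by N (population), not N - 1 (sample).
        "std_dev": statistics.pstdev(values),
    }

# Made-up protein concentrations (g/kg) for illustration only.
protein = [128, 131, 135, 137, 140, 142, 145, 149, 152]
summary = population_summary(protein)
for name, value in summary.items():
    print(f"{name:8s} {value}")
```

Because Table A is treated as a population, the sketch uses the population standard deviation; a sample would use `statistics.stdev` instead.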
Question B
The data in Table B constitutes 10 random samples, each of
size 7, from the population of
protein concentrations. The data file contains seven rows of
data with each row containing ten
columns.
Read the data for the ten samples into columns of the
EXCEL worksheet.
{Example 1}
Calculate the mean, median, standard
deviation, minimum, maximum, first quartile and third quartile of each of the ten samples.
Record these descriptive statistics in the following table.
Sample   Size   Mean   Median   Standard Deviation   Minimum   Maximum   Q1   Q3
1        7
2        7
3        7
4        7
5        7
6        7
7        7
8        7
9        7
10       7
{Example 9}
Use these statistics and the values calculated in question A to answer
the following questions.
These questions are designed to get you thinking about how
well sample statistics
represent the characteristics of the population from which the
sample was taken.
9. How many of the ten sample means are less than or equal to
the
population mean?
10. How many of the ten sample medians are exactly equal to
the population
median?
11. How many of the ten sample minimums are less than or
equal to the
population minimum?
12. How many of the ten sample maximums are greater than or
equal to the
population maximum?
13. How many of the sample first quartiles are less than or
equal to the
population first quartile?
14. How many of the sample third quartiles are greater than or
equal to the
population third quartile?
15. Which sample has the largest standard deviation?
16. Which sample has the largest range (=Maximum -
Minimum)?
17. What is the ratio of the largest sample standard deviation to
the smallest
sample standard deviation?
18. What is the ratio of the largest sample mean to the smallest
sample mean?
19. Of the two ratios (Questions 17 and 18), which is larger: the ratio
of standard deviations (17) or the ratio of means (18)?
{Answer 17 or 18}
{Example 10}
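The ratio comparisons in questions 17 to 19 can be sketched as follows. Only three made-up samples are shown (the assignment uses ten), and `ratio_largest_to_smallest` is a hypothetical helper name.

```python
import statistics

def ratio_largest_to_smallest(values):
    """Return max(values) / min(values)."""
    return max(values) / min(values)

# Three made-up samples of size 7 (the assignment uses ten); illustration only.
samples = [
    [131, 140, 128, 137, 145, 135, 142],
    [149, 133, 138, 141, 129, 152, 136],
    [134, 139, 143, 130, 147, 132, 138],
]

means = [statistics.fmean(s) for s in samples]
# stdev divides by n - 1, the usual choice for a sample.
std_devs = [statistics.stdev(s) for s in samples]

mean_ratio = ratio_largest_to_smallest(means)
sd_ratio = ratio_largest_to_smallest(std_devs)
print(f"ratio of means: {mean_ratio:.3f}")
print(f"ratio of standard deviations: {sd_ratio:.3f}")
```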
Question C
The data in Table C constitutes 10 random samples, each of
size 27, from the population
of protein concentrations.
Read the data for the ten samples into columns of the EXCEL worksheet.
{Example 1}
Calculate the mean, median, standard deviation,
minimum, maximum, first quartile and third quartile of each of the ten samples. Record the
descriptive statistics in the following table.
Sample   Size   Mean   Median   Standard Deviation   Minimum   Maximum   Q1   Q3
1        27
2        27
3        27
4        27
5        27
6        27
7        27
8        27
9        27
10       27
{Example 9}
Use these statistics and your results from questions A and B to answer the
following questions.
The following questions are designed to get you thinking about how the size of the
sample affects the relationship between sample statistics and population parameters.
20. How many of the ten sample minimums were exactly equal
to the population
minimum?
21. How many of the ten sample maximums were exactly equal
to the population
maximum?
22. For samples of size 27, what is the ratio of the largest
sample mean to the
smallest sample mean?
23. For samples of 27, what is the ratio of the largest sample
standard deviation
to the smallest sample standard deviation?
For the following questions, answer 0 if the statement is false
or 1 if it is true.
24. The ratio of the largest sample mean to the smallest sample
mean was less in
samples of 27 than in samples of 7.
25. The ratio of the largest to the smallest sample standard
deviations was greater
in the larger samples.
{Example 10}
- Please use ISLeX to record and grade your answers -
- END OF ASSIGNMENT 2 -
Introductory Statistics Laboratory
Assignment #3
Purpose
This assignment is an introduction to questions concerning discrete probability
distributions.
NOTE
As you proceed through this assignment, write your answers in
the spaces provided.
When you exit from EXCEL, you will then be required to enter
the answers into the ISLeX
program.
Question A
A binomial experiment consists of repeated trials each with two
possible outcomes. The
outcome of any trial is independent of all other trials. The
binomial distribution gives the
probability that a number X of n independent trials will have
one type of outcome. X can be any
number from 0 up to the total number of trials.
The data in Table A gives the probabilities of observing that X = 0, 1, ..., 20 out of 20
flower seeds from a given lot will germinate.
Read the data into two columns of the EXCEL worksheet and attach appropriate names to those
two columns. Then, record the probabilities in the following table.
{Example 1}
Number germinated (out of 20) Probability
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
Use the table to answer the following questions.
1. What is the probability that all 20 seeds in a random sample
of 20 seeds will
germinate?
2. What is the probability that fewer than 15 seeds in a random
sample of 20
seeds will germinate?
3. What is the probability that at least 17 seeds in a random
sample of 20 will
germinate?
4. What is the probability that the number of seeds in a random
sample of 20 that
will germinate is between 10 and 15?
HINT: Do not include 10 and 15.
5. What is the probability that the number of seeds in a random sample of 20 that
will germinate will be less than 10 or greater than 17?
HINT: You will have to add the probabilities for 0, 1, ..., 9 and 18, 19, 20.
6. What is the mean of this binomial distribution?
HINT: The mean of a discrete variable can be calculated by summing the
products of each value multiplied by its corresponding probability.
7. What is the variance of this binomial distribution?
HINT: The variance of a probability distribution is the mean of the
squares of values minus the square of the mean of values.
{Example 11}
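The calculations hinted at in questions 2 to 7 (summing probabilities over a range of counts, then the mean and variance of a discrete distribution) can be sketched as below. The germination probability `p = 0.8` is a hypothetical value chosen for illustration; in the assignment the probabilities are read from Table A rather than computed.

```python
from math import comb

def binomial_pmf(n, p):
    """Return the list [P(X = 0), ..., P(X = n)] for a binomial distribution."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

n = 20
p = 0.8  # hypothetical germination probability; not given in the assignment
probs = binomial_pmf(n, p)

p_fewer_than_15 = sum(probs[0:15])        # X = 0, 1, ..., 14
p_at_least_17 = sum(probs[17:])           # X = 17, 18, 19, 20
p_between_10_and_15 = sum(probs[11:15])   # X = 11, ..., 14 (endpoints excluded)

# Mean: sum of value * probability.
mean = sum(x * px for x, px in enumerate(probs))
# Variance: mean of the squares minus the square of the mean.
variance = sum(x**2 * px for x, px in enumerate(probs)) - mean**2

print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

For a binomial distribution the results should agree with the closed forms n·p for the mean and n·p·(1 - p) for the variance, which gives a quick check on the column arithmetic.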
Question B
This question is based on a Poisson discrete probability
distribution. The distribution is
important in biology and medicine, and can be dealt with in the
same way as any other discrete
distribution.
Red blood cell deficiency may be determined by examining a
specimen of blood under
the microscope. The data in Table B gives a hypothetical
distribution of numbers of red blood
cells in a certain small fixed volume of blood from normal
patients. Theoretically, there is no
upper limit to the value of a Poisson distribution. In reality, you can force only so many red
blood cells into a given volume.
Read the data into the EXCEL worksheet, name the columns, and
view the table. Since the table is quite large, you should attempt to answer the following
questions without actually recording the table.
{Example 1}
Use the table to answer the following questions.
8. What is the probability that a blood sample from this
distribution will have
exactly 20 red blood cells?
9. What is the probability that a blood sample from a normal
person will have
between 19 and 26 red blood cells?
HINT: See questions 3 and 4.
10. What is the probability that a blood sample from a normal
person would have
fewer than 10 red blood cells?
11. What is the probability that a blood sample from a normal
person will have at
least 15 red blood cells?
HINT: Since there is no theoretical upper limit to the Poisson
distribution, the
correct way to answer this question is to calculate 1 –
probability of fewer than
15 red blood cells.
12. A person with a red blood cell count in the lower 2.5 percent of the
distribution might be considered as deficient. What is the red blood cell
count below which 2.5 percent of the distribution lies?
HINT: You need to determine a value X so that if you sum all the probabilities
for counts up to and including that value they will sum to at least 0.025. The
sum of probabilities of all counts up to but excluding X should be less than
0.025.
You can proceed in the following way.
Look at the table to guess how many probabilities (P[X = 0] + P[X = 1] + ...)
should be added to give a sum of approximately 0.025. Calculate sums of
probabilities for your guess of X.
Continue your guessing of X until you get a sum ≥ 0.025 while the sum for
X-1 is < 0.025.
13. What is the mean red blood cell count in this distribution?
14. What is the variance of red blood cell count in this
distribution?
HINT: See question 7, and remember it is a Poisson
distribution.
15. Is the following statement true (1) or false (0) for this
distribution?
In a Poisson distribution, the variance is equal to the
mean (within
rounding error). Record 1 if true, 0 if false.
{Example 11}
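The search described in the hint for question 12, the complement rule from question 11, and the mean and variance check from question 15 can be sketched as below. The mean count `mu = 20.0` is hypothetical and is not the value behind Table B; the table is truncated at a count of 100, where the remaining probability is negligible for this choice of `mu`. The helper names are hypothetical.

```python
from math import exp

def poisson_pmf(mu, max_count):
    """Return [P(X = 0), ..., P(X = max_count)] for a Poisson distribution."""
    probs = [exp(-mu)]
    for x in range(1, max_count + 1):
        # Recurrence: P(X = x) = P(X = x - 1) * mu / x
        probs.append(probs[-1] * mu / x)
    return probs

def lower_percentile_count(probs, tail=0.025):
    """Smallest x whose cumulative probability P(X <= x) is at least `tail`."""
    cumulative = 0.0
    for x, px in enumerate(probs):
        cumulative += px
        if cumulative >= tail:
            return x
    raise ValueError("tail probability not reached; extend the table")

mu = 20.0  # hypothetical mean count; not the value behind Table B
probs = poisson_pmf(mu, max_count=100)

p_at_least_15 = 1 - sum(probs[:15])   # 1 - P(X <= 14)
cutoff = lower_percentile_count(probs)

mean = sum(x * px for x, px in enumerate(probs))
variance = sum(x**2 * px for x, px in enumerate(probs)) - mean**2
print(cutoff, round(mean, 4), round(variance, 4))
```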
Please enter your answers into the ISLeX program
- END OF ASSIGNMENT 3 -
Introductory Statistics Laboratory
Assignment #4
Purpose
This lab is an introduction to questions concerning cumulative
continuous probability
distributions.
NOTES
As you proceed through this assignment, write your answers in
the spaces provided.
When you exit from EXCEL, you will then be required to enter
the answers into the ISLeX
program.
With continuous distributions, P{X = x} = 0. In words, the
probability that a continuous
variable equals a particular value is considered to be zero. For
this reason, all questions
concerning continuous distributions must be phrased in terms of
intervals. Furthermore, the
probability that a continuous variable is less than or equal (≤) to
a particular value is the same as
the probability that the variable is less (<) than that particular
value.
The EXCEL NORM.DIST function gives the probability that a
normal variable is less
than (or equal to) a specified constant.
The terminology concerning probability varies from one source
to another. For this
assignment, consider that probability = relative frequency =
proportion. Also for this
assignment, percentage = 100 * probability.
Question A
Suppose that height (cm) of male university students is
normally distributed with the
mean given in column 1 of Table A (LAB4A.DAT) and a
standard deviation given in column 2
of Table A.
Read the mean and standard deviation of heights from Table A and store
them for further use. The data file contains one row with two columns. The first column contains
the mean, the second contains the standard deviation.
1. What is the mean height in this population?
2. What is the standard deviation of height in this population?
{Example 12}
Use these values, and the EXCEL NORM.DIST function, to calculate
answers for the following questions.
3. What proportion of male university students are expected to
have a height
between 170 and 180 cm?
4. What percentage of male university students would have a
height less than
170 cm?
5. If a student is chosen at random from this population, what is
the probability
that he will be taller than 180 cm?
{Example 13}
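The three kinds of question above (below a value, between two values, above a value) all reduce to cumulative probabilities. A minimal Python sketch follows, using `statistics.NormalDist` in the role that NORM.DIST with its cumulative option plays in EXCEL. The mean and standard deviation are hypothetical; the assignment's values come from Table A.

```python
from statistics import NormalDist

# Hypothetical parameters; the assignment's values come from Table A.
mean_height = 175.0
sd_height = 7.0
heights = NormalDist(mu=mean_height, sigma=sd_height)

# cdf(x) gives P(X <= x), the cumulative probability.
p_below_170 = heights.cdf(170)
p_between = heights.cdf(180) - heights.cdf(170)
p_above_180 = 1 - heights.cdf(180)

print(f"P(X < 170)       = {p_below_170:.4f}")
print(f"P(170 < X < 180) = {p_between:.4f}")
print(f"P(X > 180)       = {p_above_180:.4f}")
print(f"percentage below 170 cm = {100 * p_below_170:.2f}")
```

The three probabilities partition the whole distribution, so they should sum to 1; this is a useful check before recording answers.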
Question B
Suppose that the average length of telephone calls made by
teenagers is a normally
distributed variable with mean and standard deviation given in
columns 1 and 2 of Table B
(LAB4B.DAT).
Read the mean and standard deviation of the distribution of
lengths of telephone calls from the
first two columns of Table B and store them for further use.
{Example 12}
Use the values and the EXCEL NORM.DIST function to
calculate answers for the
following.
6. What is the mean length of telephone call?
7. What is the standard deviation of this distribution?
8. What is the probability that a random telephone call will last
a length of time
that is within one standard deviation of the mean (± 1 standard
deviation)?
9. What is the proportion of telephone calls that last a length of
time that is
within two standard deviations of the mean (± 2 standard
deviations)?
10. What is the relative frequency of lengths of teenage
telephone calls that lie
within three standard deviations of the mean (± 3 standard
deviations)?
11. What is the probability that a telephone call will be longer
than the mean by
more than 1.645 standard deviations?
{Example 13}
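Questions 8 to 11 ask for probabilities expressed in standard deviations from the mean. A sketch under assumed parameters follows; the call-length mean and standard deviation are hypothetical, and `prob_within` and `prob_above` are illustrative helper names.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal variable X."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

def prob_above(k, mu=0.0, sigma=1.0):
    """P(X > mu + k*sigma) for a normal variable X."""
    return 1 - NormalDist(mu, sigma).cdf(mu + k * sigma)

# Hypothetical call-length parameters (minutes); Table B supplies the real ones.
mu, sigma = 12.0, 4.0
for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k, mu, sigma):.4f}")
print(f"more than 1.645 SD above the mean: {prob_above(1.645, mu, sigma):.4f}")
```

These probabilities do not depend on the particular mean and standard deviation, which is why the same answers appear for any normal distribution, including the standard normal.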
Question C
In a study conducted by Booth et al. (Int. J. Sports Psychol. 17:269-279, 1986), student
nurses at the University of Ottawa completed the Thurston-Richardson attitude questionnaire and
voluntarily took the Canadian Home Fitness Test. They found that the frequency response of
heart rates after a second exercise bout ranged from 101 to 190 beats per minute and seemed to
follow a normal distribution. The mean heart rate was 145 with a standard deviation of 20.
Use the EXCEL NORM.DIST function (with mean = 145 and standard deviation = 20) to
calculate the answer to the following question.
12. What is the estimated proportion that had a frequency
response of less than
130 after the second exercise session?
{Example 13}
Question D
A standard normal distribution is one for which the mean is
zero and the standard
deviation is unity (1.0). This distribution is often referred to as
the z-distribution.
Use the EXCEL NORM.DIST function to calculate answers
to the following questions.
13. What is the probability that a standard normal variable will
have a value less
than 1.96?
14. What is the probability that a standard normal variable will
have a value
between -1 and +1?
{Example 13}
Please enter your answers into the ISLeX program
- END OF ASSIGNMENT 4 -
Introductory Statistics Laboratory
Assignment #5
Purpose
The main objectives of this assignment are to:
a) use a goodness-of-fit test to demonstrate an important
statistical theorem and
b) calculate means and confidence intervals for a single sample
when σ is known and when σ is
not known.
NOTE
As you proceed with this assignment, write your answers in the
spaces provided. When
you have completed the assignment and exit from EXCEL, you
are required to enter your
answers into the ISLeX program.
Question A
The central limit theorem states that means of large samples (conventionally 30 or more
observations) from any distribution with a finite variance will have a distribution that
a) is approximately normal,
b) has a mean equal to the mean of the original distribution, and
c) has a standard deviation equal to the standard deviation of the original distribution
divided by the square root of the sample size.
The Poisson distribution is discrete and skewed; it is decidedly non-normal! However,
the central limit theorem states that the means of sufficiently large (n ≥ 30) samples from even a
Poisson distribution will be approximately normally distributed.
The means of 100 samples, each of size 40, from a Poisson
distribution are recorded in
Table A. For this first question, you are required to use the
'goodness-of-fit' test to test the
hypothesis that the means in this file are normally distributed
with a mean of 10 and a standard
deviation of 0.5.
Read the means of the samples from the Poisson distribution into the EXCEL worksheet.
{Example 1}
Calculate the mean and standard deviation of the sample means.
1. What is the mean of the 100 sample means?
2. What is the standard deviation of the 100 sample means?
{Example 9}
Prepare a histogram to determine the number of sample means in each of the classes
indicated in the following table. Note that interval endpoints are midpoint ± 0.5*width and the
interval midpoint is the average of the two endpoints. Use 9.31, 9.61, 9.91, 10.21, 10.51 and 10.81 as the
“bin” boundaries for the Excel HISTOGRAM procedure.
Class interval    Midpoint   Expected frequency   Observed frequency
< 9.31            -           8.3794
9.31 - 9.61       9.46       13.3902
9.61 - 9.91       9.76       21.0881
9.91 - 10.21      10.06      23.4181
10.21 - 10.51     10.36      18.3379
10.51 - 10.81     10.66      10.1248
> 10.81           -           5.2616
3. What was the observed frequency of sample means that fell
between 9.91 and
10.21 ?
{Example 2}
Enter the seven expected frequencies into one column of the EXCEL worksheet and the seven
observed frequencies into another column. Make sure that expected and observed frequencies for
the same class are entered in the same row. Check that both columns of data sum to 100 (within
rounding error). If they do not, correct your error(s).
{Example 3}
A goodness-of-fit test should now be used to see if the observed frequencies in two or more
classes of observed values agree sufficiently well with those expected on the basis of some
hypothesis. In this example, the hypothesis is that the means of samples will be normal with
mean 10 and standard deviation 0.5.
The test requires that you calculate a chi-squared statistic by:
a) calculating the differences between the observed and
expected frequencies in each class,
b) squaring the differences and dividing by the expected
frequencies in each class, and
c) summing the values from step b.
4. What is the value of (O-E)^2/E for the first class?
5. What is the value of the chi-squared statistic (that is, the sum over all seven
classes of (O-E)^2/E)?
With seven classes, the chi-squared statistic has 7-1 = 6 degrees of freedom and the critical
value at the 5% significance level is 12.6. If your test statistic is less than 12.6, you should
conclude that the observed data show a good fit to the hypothesis.
6. Does the data show a good fit to the normal distribution with
mean 10 and
standard deviation 0.5 (0 for no, 1 for yes) ?
7. Based on your limited experience, is the following statement
true (use 1) or
false (use 0)?
Means of samples of size 40 from a Poisson (discrete)
distribution are
approximately normal (continuous).
{Example 14}
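The three steps for the chi-squared statistic (difference, square and divide by expected, sum) can be sketched as follows. The expected frequencies are taken from the table in this assignment; the observed frequencies are made up for illustration, and `chi_squared_statistic` is a hypothetical helper name.

```python
def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Expected frequencies from the table in this assignment.
expected = [8.3794, 13.3902, 21.0881, 23.4181, 18.3379, 10.1248, 5.2616]
# Made-up observed frequencies (summing to 100) for illustration only.
observed = [9, 12, 22, 24, 17, 11, 5]

chi_sq = chi_squared_statistic(observed, expected)
critical_value = 12.6  # 5% level, 7 - 1 = 6 degrees of freedom (from the text)
good_fit = chi_sq < critical_value
print(f"chi-squared = {chi_sq:.4f}, good fit: {good_fit}")
```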
Question B
The time (in minutes) required for six-year-old children to assemble a certain toy is
believed to be normally distributed with a known standard deviation of 3.0. The data in Table
B gives the assembly times for a random sample of 25 children.
Read the data into the EXCEL worksheet, and compute and report the mean and
standard deviation.
8. What was the mean assembly time for this sample of 25 six-year-old children?
9. What was the estimated standard deviation?
{Examples 1 and 9}
When the population standard deviation is known or given, one
should use a standard normal distribution to calculate a confidence interval for the population
mean. The procedure for calculating a large sample confidence interval for one mean involves
three basic steps:
a) determine a critical value from the appropriate distribution (for a 90% confidence
interval with known standard deviation the critical value is z_{0.05} = 1.645),
b) calculate the margin of error of the estimate, E = z_{α/2} · σ/√n, and
c) calculate lower limit = mean - margin of error
and upper limit = mean + margin of error
10. What was the margin of error of the estimate for a 90%
confidence interval?
11. What was the lower limit of the 90% confidence interval for
average
assembly time?
12. What was the upper limit of the 90% confidence interval for
average
assembly time?
13. From this example, would you say that the following
statement is true (use 1)
or false (use 0) ?
The lower confidence limit must always be less than the sample
mean and
the upper confidence limit must always be greater.
14. From this example, would you say that the following
statement is true (use 1)
or false (use 0)?
When one has a choice of a known (or given) standard deviation
and an
estimated standard deviation, one should ignore the estimated
standard
deviation in calculating confidence intervals.
{Example 15}
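The three-step procedure for a confidence interval with known σ can be sketched as below. The values σ = 3.0 and n = 25 come from Question B; the sample mean of 14.2 is made up, and `z_confidence_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.90):
    """Return (lower, upper, margin) for a mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.645 for 90%
    margin = z * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin, margin

# sigma = 3.0 and n = 25 come from Question B; the sample mean is made up.
lower, upper, margin = z_confidence_interval(sample_mean=14.2, sigma=3.0, n=25)
print(f"margin = {margin:.4f}, interval = ({lower:.4f}, {upper:.4f})")
```

Note that the margin of error depends only on the critical value, σ and n, not on the sample mean, so it can be checked before the data are even read.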
Question C
The level of monoamine oxidase (MOA) activity (nmol/hr/mg protein) was recorded for
fourteen non-responsive depressed patients who had been treated with phenelzine. MOA activity
is assumed to follow a normal distribution. The data are stored in a single column of Table C.
You are asked to calculate a point estimate and an interval estimate of the mean MOA activity of
this type of patient. Nothing is known about the variability of MOA activity.
Read the data into the EXCEL worksheet, and compute and
report the mean and standard deviation.
15. What was the point estimate for the mean MOA activity for
this sample of 14
depressed patients?
16. What was the standard deviation?
{Examples 1 & 9}
When data has a normal distribution but is from a small
(<30) sample or when data is from a
large sample (≥30) and in either case σ is not known, one
should use a t-distribution to calculate
a confidence interval for the population mean. The procedure
for calculating a confidence
interval for one mean when σ is not known involves three basic
steps:
a) determine a critical value from the appropriate distribution
(for a 90% confidence
interval with estimated standard deviation the critical value is
tα/2,n-1 = t0.05,13 = 1.771),
b) calculate the margin of error of the estimate, $E = t_{\alpha/2,\,n-1}\,s/\sqrt{n}$, and
c) calculate lower limit = mean – margin of error
and upper limit = mean + margin of error
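As a cross-check of the t-based procedure, here is a short Python sketch. It assumes 14 hypothetical activity values (not the lab's Table C) and takes the critical value 1.771 quoted above as a constant, since the Python standard library has no t-distribution function. The name `t_confidence_interval` is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_90_DF13 = 1.771  # t_(0.05, 13), the value quoted in the text


def t_confidence_interval(data, t_crit):
    """Return (margin, lower, upper) for a mean with estimated sigma."""
    n = len(data)
    # b) margin of error E = t * s / sqrt(n), with s the sample st. dev.
    margin = t_crit * stdev(data) / sqrt(n)
    # c) limits = mean -/+ margin of error
    centre = mean(data)
    return margin, centre - margin, centre + margin


# Hypothetical activity values for 14 patients (assumption).
activity = [float(v) for v in range(21, 35)]
margin, lower, upper = t_confidence_interval(activity, T_CRIT_90_DF13)
print(f"E = {margin:.4f}, limits = ({lower:.4f}, {upper:.4f})")
```

For a different sample size the constant must be replaced by the matching table value or by the EXCEL T.INV result.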
17. What was the margin of error of estimate for a 90%
confidence interval in
this sample of 14 depressed patients?
18. What was the lower limit of the 90% confidence interval for
average MOA
activity?
19. What was the upper limit of the 90% confidence interval for
average MOA
activity?
20. From these examples, would you say that the following
statement is true (use
1) or false (use 0)?
All confidence intervals are calculated by calculating a point
estimate and then
subtracting and adding a margin of error of the estimate.
{Example 16}
- END OF ASSIGNMENT 5 -
Introductory Statistics Laboratory
Assignment #6
Purpose
The objectives of this assignment are to:
a) calculate a confidence interval for a proportion and
b) present confidence intervals and tests of hypothesis for
matched pairs.
NOTE
As you proceed with this assignment, write your answers in the
spaces provided. When
you have completed the assignment and exit from EXCEL, you
are required to enter your
answers into the ISLeX program.
Question A
Opinion polls are a popular method for assessing product
preference, political preference,
and more. As a simple example, consider that a poll was taken
ten days prior to a civic election
to try to predict what proportion of the electorate would vote for
the incumbent mayor. The data
in Table A represents the results of a moderate sample of
persons who were asked if they would
vote for the same mayor; a yes was recorded as 1, a no as 0.
You are required to analyze the
results of the poll and predict what proportion of voters will
vote for the incumbent.
Copy the data from Table A into the EXCEL worksheet, prepare a
histogram to count the number of yes (1)
and no (0) responses, and calculate the proportion who
indicated that they would vote for the
incumbent mayor. Note that, since yes and no are represented by
1 and 0, the proportion of yes
can be determined by calculating the sum and dividing by the
total sample size.
1. How large was the sample of voters represented in this poll?
2. What proportion of the sample voters indicated they would
vote for the
incumbent mayor?
{Examples 1, 2 and 10}
voters expected to vote for the
incumbent mayor. The procedure for calculating a confidence
interval for a proportion involves
three basic steps.
3. Determine the α/2 critical value for the appropriate
distribution (standard
normal in this case). Use the NORM.INV function to calculate
the critical value
=NORM.INV(0.95,0,1). What is the critical value for a 90%
confidence interval
based on the standard normal distribution?
4. What is the standard error of the estimated proportion of
polled voters who
favour the incumbent? $s_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n}$
5. What is the margin of error of the estimated proportion?
6. What is the lower 90% confidence limit on the proportion of
voters who will
vote for the incumbent?
7. What is the upper 90% confidence limit on the estimated
proportion of voters
who will vote for the incumbent?
{Example 17}
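The calculations in questions 3 to 7 can be sketched in Python as follows. The poll below is hypothetical (40 yes and 60 no answers, not the lab's Table A), and the function name is an assumption made for illustration.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(responses, confidence=0.90):
    """Return (p_hat, std_error, margin, lower, upper) for 0/1 responses."""
    n = len(responses)
    p_hat = sum(responses) / n  # proportion of yes (1) answers
    q_hat = 1.0 - p_hat
    # critical value from the standard normal distribution
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    std_error = sqrt(p_hat * q_hat / n)
    margin = z_crit * std_error
    return p_hat, std_error, margin, p_hat - margin, p_hat + margin


# Hypothetical poll results (assumption): 1 = yes, 0 = no.
poll = [1] * 40 + [0] * 60
p_hat, std_error, margin, lower, upper = proportion_confidence_interval(poll)
print(f"p = {p_hat:.3f}, SE = {std_error:.4f}, E = {margin:.4f}")
print(f"90% limits = ({lower:.4f}, {upper:.4f})")
```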
23,217 of the 58,839 persons that voted
actually voted for the incumbent. Calculate and report the actual
proportion that voted for the
incumbent.
8. What was the proportion that actually voted for the
incumbent?
9. Based on the results given in questions 6, 7 and 8, which of
the following
statements (1, 2 or 3) is most correct?
1 - The poll of a sample of voters gave a good indication of the
final vote.
2 - Many of the voters who would have voted for the incumbent
at the time of the poll
must have changed their minds.
3 - The persons sampled in the poll must have contained an
unusually low proportion of
those who favoured the incumbent.
{Example 10}
Question B
The Monster Chemical Company believes that its herbicide
(Avena-doom) is better than
its competitor's herbicide (Avena-kill) for controlling wild oat
in barley fields. To demonstrate
the advantage of their herbicide over that of their competitor,
Monster grew side-by-side plots of
barley treated with each of the two herbicides in a large sample
of farmers' fields throughout
western Canada. The company then wished to compare the
yields of barley treated with the two
types of herbicides.
Yield of barley will vary from farm to farm regardless of which
herbicide is used. A
difference in climate, differences in agronomic practices, and
differences in type of barley grown
cause variation. For this reason, it is desirable to match the data
from the two plots on each farm.
The analysis is one of looking at differences between matched
pairs.
with Avena-doom (second
column), and barley yield with Avena-kill (third column) from
the three columns in Table B into
columns of the EXCEL worksheet. Describe the data from the
two treatments.
10. What was the average barley yield for plots treated with the
Avena-doom
herbicide?
11. What was the standard deviation of yields of barley plots
treated with
Avena-doom?
12. What was the average yield of plots treated with Avena-kill?
13. What was the standard deviation with Avena-kill?
{Examples 1 and 9}
calculated and then analyze the
differences.
14. What was the mean of the differences between yield of
barley plots treated
with Avena-doom and Avena-kill?
15. What was the standard deviation of the differences (for each
pair)?
16. Was the standard deviation of the differences smaller (0) or
larger (1) than
the standard deviation of the barley yields from plots treated
with
Avena-doom?
{Examples 10 and 9}
differences in yield between plots
treated with Avena-doom and those treated with Avena-kill.
NOTE: The standard deviation is estimated from the data so
we use the t distribution.
17. What is the critical value for the confidence interval?
18. What was the margin of error of the estimated mean
difference?
19. What was the lower limit of the 95% confidence interval for
the average
difference in yields of barley treated with Avena-doom and
barley treated with
Avena-kill?
20. What was the upper limit?
{Example 16}
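A matched-pairs interval is a one-sample t interval applied to the differences. The sketch below assumes five hypothetical pairs of yields (not the lab's Table B) and uses the table value t(0.025, 4) = 2.776, which matches five pairs only. Names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_95_DF4 = 2.776  # t_(0.025, 4) from a t table; valid for 5 pairs


def paired_confidence_interval(first, second, t_crit):
    """Return (mean_diff, sd_diff, margin, lower, upper) for matched pairs."""
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)
    margin = t_crit * sd_diff / sqrt(len(diffs))
    return mean_diff, sd_diff, margin, mean_diff - margin, mean_diff + margin


# Hypothetical yields (q/ha) on five farms (assumption).
doom = [45.0, 50.0, 48.0, 52.0, 55.0]
kill = [40.0, 46.0, 45.0, 46.0, 48.0]
mean_diff, sd_diff, margin, lower, upper = paired_confidence_interval(
    doom, kill, T_CRIT_95_DF4
)
print(f"mean difference = {mean_diff:.3f}, s = {sd_diff:.4f}")
print(f"95% limits = ({lower:.4f}, {upper:.4f})")
```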
Question C
Use the same data and results of Question B to investigate the
hypothesis that the
increase in barley yield by using Avena-doom instead of Avena-
kill is no greater than 3.0 q/ha
(300 kg/ha). The alternative to this hypothesis is that the
increase is greater than 3 q/ha.
To test this hypothesis, one must calculate a test statistic,
$t = \dfrac{\text{mean of differences} - \text{hypothesized mean } (= 3.0)}{\text{standard error of the differences}}$
The null hypothesis should be rejected if the test statistic
exceeds the critical value from
the theoretical distribution. For a 5% significance level, α =
0.05, the critical value for a
one-tailed test can be found by using the appropriate T.INV
function (see Example 18) with n-1
degrees of freedom. For matched pairs, n is the number of pairs.
In this instance, the null hypothesis should be rejected if the
test statistic exceeds the
critical value.
21. What is the value of the test statistic for testing the
hypothesis that the mean
difference is 3.0 q/ha or less?
22. What is the critical value against which the test statistic in
question 21 should
be compared?
23. Should the hypothesis that the yield difference is 3 q/ha or
less be rejected
(1) or not (0)?
{Example 18}
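The one-tailed matched-pairs test can be sketched as below. The yields are hypothetical (five pairs, not the lab's data), and the critical value t(0.05, 4) = 2.132 is a table value that applies only to five pairs; with the lab data the T.INV result for n-1 degrees of freedom should be used instead.

```python
from math import sqrt
from statistics import mean, stdev

T_CRIT_05_DF4 = 2.132  # one-tailed t_(0.05, 4) from a t table


def paired_t_statistic(first, second, hypothesized):
    """t = (mean of differences - hypothesized mean) / standard error."""
    diffs = [a - b for a, b in zip(first, second)]
    std_error = stdev(diffs) / sqrt(len(diffs))
    return (mean(diffs) - hypothesized) / std_error


# Hypothetical yields (q/ha) on five farms (assumption).
doom = [45.0, 50.0, 48.0, 52.0, 55.0]
kill = [40.0, 46.0, 45.0, 46.0, 48.0]
t_stat = paired_t_statistic(doom, kill, hypothesized=3.0)
reject = t_stat > T_CRIT_05_DF4  # reject H0 if t exceeds the critical value
print(f"t = {t_stat:.4f}, reject H0: {reject}")
```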
- END OF ASSIGNMENT 6 -
Introductory Statistics Laboratory
Assignment #7
Purpose
This lengthy assignment serves to review calculations of
confidence intervals and tests of
hypothesis for:
a) two means of large independent samples from populations
with unknown and unequal
variances,
b) two means of small independent samples from populations
with the same unknown variance,
c) two proportions from large independent samples.
NOTE
As you proceed with this assignment, write your answers in the
spaces provided. When
you have completed the assignment and exit from EXCEL, you
are required to enter your
answers into the ISLeX program.
Question A
The role that cholesterol plays in the development of
"hardening of the arteries"
(atherosclerosis) and heart disease has been widely reported. In
one experiment, a group of
patients who were considered to be high-risk were split into two
equal groups. The first group
was put on a special diet with a high proportion of fish (salmon,
tuna, mackerel and cod). Oil
from these deep-sea fish is known to be very rich in Omega-3
fatty acids. The other (control)
group was maintained on a standard diet (high-protein, low-fat,
complex carbohydrates and
polyunsaturated cooking oil). The change (decrease) in
cholesterol was measured after a period
of time. A greater change is desirable.
The (simulated) data (mg decrease per decilitre of blood) for
the Omega-3 group is stored
in Table A1, and the data for the control group is stored in
Table A2. You are required to
calculate a 95% confidence interval for the average difference
in cholesterol reduction and to test
the hypothesis that there was no difference between the two
diets in average reduction of
cholesterol.
Copy the data from the 'Omega-3' group [Table A1] and the data
from the 'control' group [Table
A2] into the EXCEL worksheet. Determine and report the
number of observations in each group,
the mean change (mg/dl) in each group and the standard
deviation of the change in each group.
1. How many patients were in each diet group?
2. What was the mean (decrease) in cholesterol for the Omega-3
group of
patients?
3. What was the standard deviation in that group?
4. What was the mean (decrease) in cholesterol for the control
group of patients?
5. What was the standard deviation in the control group?
{Examples 1 and 9}
variances that are unequal. We
can use the normal distribution as an approximation to the t
distribution when the sample sizes
are large. The method for calculating a large-sample confidence
interval for the difference
between two means consists of three basic steps.
a) Estimate the difference between the two sample means
and the standard error of the
difference between the two sample means.
6. What is the estimated difference of means?
7. Standard error of the difference between means $= \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
What is the standard error of the difference of means?
b) Calculate the margin of error of the estimated difference of
means. For this
large-sample 95% confidence interval we can approximate with
a z value which is z0.025 = 1.96.
Calculate the confidence interval as difference between means ±
margin of error.
8. What is the margin of error of the estimated difference?
9. What is the lower limit for the 95% confidence interval of the
difference in
cholesterol reduction between Omega-3 and control diets?
10. What is the upper limit?
{Example 19}
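The large-sample interval for a difference between two means can be sketched from summary statistics alone. The means, standard deviations and sample sizes below are hypothetical, not results from Tables A1 and A2, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def difference_confidence_interval(mean1, s1, n1, mean2, s2, n2,
                                   confidence=0.95):
    """Return (diff, std_error, margin, lower, upper) for mean1 - mean2."""
    diff = mean1 - mean2
    # standard error allows the two variances to be unequal
    std_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z_crit * std_error
    return diff, std_error, margin, diff - margin, diff + margin


# Hypothetical summary statistics (assumption): decrease in mg/dl.
diff, std_error, margin, lower, upper = difference_confidence_interval(
    mean1=30.0, s1=8.0, n1=50, mean2=25.0, s2=6.0, n2=50
)
print(f"difference = {diff:.2f}, SE = {std_error:.4f}, E = {margin:.4f}")
print(f"95% limits = ({lower:.4f}, {upper:.4f})")
```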
difference between the two diets
proceeds as follows. Since we expect that the Omega-3 diet
should give a greater decrease in
cholesterol than the control, we will use a one-tailed alternative
hypothesis. Use a 5%
significance level to test the null hypothesis that there is no
difference between the diets against
an alternative that the difference between Omega-3 and control
groups is greater than zero.
The test of hypothesis has two basic steps:
a) Compute the test statistic (z) as the difference in means
divided by standard error of the
difference.
b) The null hypothesis should be rejected if the test statistic
exceeds the critical value for
a one-tailed alternative (approximately 1.645 for 5%
significance in a large-sample, one-tailed
test).
11. What is the value of the test statistic?
12. Should the null hypothesis be rejected and the conclusion be
that Omega-3
diet did indeed cause a greater reduction in cholesterol than the
control diet?
Yes =1, No = 0
{Example 19}
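The two steps of the large-sample test can be sketched as follows, again with hypothetical summary statistics rather than the lab's data. The function name `one_tailed_z_test` is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def one_tailed_z_test(mean1, s1, n1, mean2, s2, n2, significance=0.05):
    """Return (z, z_crit, reject) for H0: no difference vs. H1: diff > 0."""
    std_error = sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    # a) test statistic = difference in means / standard error
    z = (mean1 - mean2) / std_error
    # b) reject H0 if z exceeds the upper-tail critical value
    z_crit = NormalDist().inv_cdf(1.0 - significance)
    return z, z_crit, z > z_crit


# Hypothetical summary statistics (assumption).
z, z_crit, reject = one_tailed_z_test(30.0, 8.0, 50, 25.0, 6.0, 50)
print(f"z = {z:.4f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```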
Question B
In some law schools, the score on a test known as LSAT is an
important criterion for
acceptance. Two law schools decided to compare the LSAT
scores of students registered in their
respective schools. LSAT scores for students in Law school 1
are stored in Table B1 and those
for students from Law school 2 in Table B2.
Assume that the variances of LSAT scores are equal in the two
schools. You are asked to
calculate a 90% confidence interval for the difference in
average LSAT scores and to test the
hypothesis that students from the two schools do not differ in
their average LSAT scores. Use a
5% significance level.
from Law school 2 into the
EXCEL worksheet. Compute and report the number, means and
standard deviations of scores
from each school.
13. How many LSAT scores from school 1?
14. What was the mean LSAT score from school 1?
15. What was the standard deviation of scores from school 1?
16. How many LSAT scores from school 2?
17. What was the mean LSAT score from school 2?
18. What was the standard deviation of scores from school 2?
{Examples 1 and 9}
Follow these steps to calculate a 90% confidence
interval for the difference in mean
LSAT scores when variances are unknown but assumed to be
equal.
a) calculate the difference between the two means (school 1 -
school 2)
b) calculate the pooled variance for the two samples:
c) calculate the standard error of the difference:
d) Calculate the critical value and margin of error for α = 0.10.
Use the T.INV function to
get the critical value. Multiply the critical value by the standard
error of the difference to get the
margin of error. Use degrees of freedom = n1 + n2 – 2.
e) Calculate the lower and upper 90% confidence limits
19. What is the estimated pooled variance for this data?
20. What is the standard error of the difference?
21. What is the margin of error of the difference?
22. What is the lower limit of the difference between the two
schools in LSAT
scores?
{Example 20}
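$s_{pool} = \sqrt{\dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{(n_1-1) + (n_2-1)}}$ (the pooled variance is $s_{pool}^2$)
$n_1$ = size, sample 1
$n_2$ = size, sample 2
$s_1$ = st.dev, sample 1
$s_2$ = st.dev, sample 2
$s_{\bar{x}_1-\bar{x}_2} = s_{pool}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$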
hypothesis that the means of the
two groups of LSAT scores are equal when the samples are
independent and the population
variances are unknown but equal. The test statistic is the
difference in means minus zero divided
by the standard error of the difference. The null hypothesis
should be rejected if the test statistic
is less than -tα/2,df or greater than tα/2,df where df = n1 + n2 -
2 and α=0.05 is the chosen
significance level. Use the T.INV function to calculate the
critical values for this two-tailed test.
23. What is the value of the test statistic for testing the
hypothesis that the mean
LSAT scores are the same for the two law schools?
24. Using the 5% significance level, should the null hypothesis
be rejected (1) or
not (0)?
{Example 20}
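The pooled-variance interval and test can be sketched together. The two samples below are hypothetical (six scores each, not Tables B1 and B2), and the critical values are t-table entries for 10 degrees of freedom, so they apply only to this illustration. Function and variable names are assumptions.

```python
from math import sqrt
from statistics import mean, variance

T_CRIT_05_DF10 = 1.812   # t_(0.05, 10), for the 90% interval
T_CRIT_025_DF10 = 2.228  # t_(0.025, 10), for the two-tailed 5% test


def pooled_comparison(sample1, sample2):
    """Return (diff, pooled_var, std_error) for two independent samples."""
    n1, n2 = len(sample1), len(sample2)
    diff = mean(sample1) - mean(sample2)
    pooled_var = ((n1 - 1) * variance(sample1)
                  + (n2 - 1) * variance(sample2)) / ((n1 - 1) + (n2 - 1))
    std_error = sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return diff, pooled_var, std_error


# Hypothetical LSAT scores (assumption).
school1 = [150.0, 152.0, 154.0, 156.0, 158.0, 160.0]
school2 = [149.0, 151.0, 152.0, 154.0, 155.0, 157.0]
diff, pooled_var, std_error = pooled_comparison(school1, school2)
margin = T_CRIT_05_DF10 * std_error
lower, upper = diff - margin, diff + margin
t_stat = (diff - 0.0) / std_error
reject = abs(t_stat) > T_CRIT_025_DF10  # two-tailed rejection rule
print(f"pooled variance = {pooled_var:.3f}, SE = {std_error:.4f}")
print(f"90% limits = ({lower:.4f}, {upper:.4f}), t = {t_stat:.4f}")
```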
Question C
The legislature of a southern state in the U.S. passed a rule,
commonly called "no-pass,
no-play", which prohibits a student who fails in any subject
from participating in any
extracurricular activity for six weeks. Data were collected for
students involved in football,
volleyball, cross country, and band for the first six-week
grading period. Records were kept from
last year and this year.
The numbers of students are stored in column 1 and the
proportions sidelined because of
the rule are stored in column 2 of Table C, the first row being
for last year and the second for this
year.
values.
25. How many students were there in last year's sample?
26. What proportion of the last year's students were sidelined
because of one or
more failures?
27. How large was this year's sample?
28. What proportion failed and were sidelined this year?
{Example 1}
change (last year minus this year)
in proportion of students sidelined.
a) Calculate the difference in proportions.
b) Calculate the standard error of the difference.
$s_{\hat{p}_1-\hat{p}_2} = \sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$
c) Calculate the margin of error of estimate. For a 90%
confidence interval with large
samples, use z0.05 = 1.645.
d) Calculate the lower and upper limits.
29. What is the upper 90% confidence limit on the change in
proportion of
students sidelined because of failure?
{Example 21}
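Steps a) to d) can be sketched in Python as below. The sample sizes and proportions are hypothetical (not the lab's Table C); group 1 stands for last year and group 2 for this year.

```python
from math import sqrt
from statistics import NormalDist


def proportion_difference_interval(p1, n1, p2, n2, confidence=0.90):
    """Return (diff, std_error, margin, lower, upper) for p1 - p2."""
    diff = p1 - p2                                           # step a)
    std_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # step b)
    z_crit = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    margin = z_crit * std_error                              # step c)
    return diff, std_error, margin, diff - margin, diff + margin  # step d)


# Hypothetical values (assumption): 30% of 200 last year, 20% of 200 this year.
diff, std_error, margin, lower, upper = proportion_difference_interval(
    0.30, 200, 0.20, 200
)
print(f"difference = {diff:.3f}, SE = {std_error:.4f}, E = {margin:.4f}")
print(f"90% limits = ({lower:.4f}, {upper:.4f})")
```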
an alternative that the proportion
sidelined has decreased (that is, the difference in proportions is
greater than zero). Use a 5%
significance level.
NOTE: Under the null hypothesis, the proportions are equal
and we should therefore calculate
an average proportion for the two groups. This will result in a
new estimate of the standard error
of the difference between sample proportions.
average (pooled) proportion: $p = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$
30. What was the average (pooled) proportion sidelined?
31. Now use the pooled proportion to calculate the standard
error of the
difference between the two proportions.
$s_{\hat{p}_1-\hat{p}_2} = \sqrt{p(1-p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
What is the value of the test statistic for testing the hypothesis
that the
proportion did not change (remember to divide by the standard
error of the
difference between the two proportions which was calculated
using the
pooled proportion)?
Use a one-tailed test with a 5% significance level to answer
the following question.
Remember that you will reject the null hypothesis if the test
statistic exceeds the critical value
(1.645 in this case).
32. Was the superintendent of schools justified in saying, "We
are very pleased
with the improvement. It shows coaches and students are taking
the rule
seriously"? Answer 1 for yes or 0 for no.
{Example 21}
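The pooled-proportion test can be sketched as follows, using the same kind of hypothetical inputs (they are assumptions, not the lab's Table C values). The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def pooled_proportion_test(p1, n1, p2, n2, significance=0.05):
    """Return (pooled, std_error, z, reject) for H1: p1 - p2 > 0."""
    # under H0 the proportions are equal, so average them
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    std_error = sqrt(pooled * (1 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / std_error
    z_crit = NormalDist().inv_cdf(1.0 - significance)  # about 1.645
    return pooled, std_error, z, z > z_crit


# Hypothetical values (assumption): last year first, this year second.
pooled, std_error, z, reject = pooled_proportion_test(0.30, 200, 0.20, 200)
print(f"pooled p = {pooled:.3f}, SE = {std_error:.4f}, z = {z:.4f}")
print(f"reject H0: {reject}")
```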
- END OF ASSIGNMENT 7 –
Introductory Statistics Laboratory
Assignment #8
Purpose
In this assignment calculations will be completed for analyses
of variance for:
a) a one-way design,
b) a two-way design with more than one observation per cell,
and
c) a two-way design with one observation per cell (randomized
complete block design)
NOTE
As you proceed with this assignment, write your answers in the
spaces provided. When you
have completed the assignment and exit from EXCEL, you are
required to enter your answers into
the ISLeX program.
Question A
Gasoline mileage (mpg) was measured on several cars of each
of four different makes
(coded 1, 2, 3 and 4). The make of each car is stored in the first
column, and the mileage for each
car is stored in the second column, of Table A. You need to
conduct an analysis of variance to see if
there are differences among the four makes in gasoline mileage.
You should also estimate the
mileage of each of the four makes of cars.
worksheet. Name the columns and
view the data.
{Example 1}
Carry out a one-way analysis of variance on this data. Since
each data point can be classified only
according to the make of car, a one-way analysis of variance is
required. It is important that students
be able to interpret analysis of variance tables such as those
produced by EXCEL. For this analysis,
you will need to copy data for each make into different adjacent
columns. Fill in the following
one-way analysis of variance table and answer the first five
questions.
Source of variation   Degrees of freedom   Sum of squares   Mean square   F   P
Make of car           3
Error
1. What is the value of the F-statistic for testing the null
hypothesis that there are no
differences in gasoline mileage among the four makes of
automobile?
2. What are the degrees of freedom associated with the
numerator of this test
statistic?
3. What are the degrees of freedom associated with the
denominator of the F-value
for MAKE of car?
4. What is the estimate of the pooled variance within makes of
cars (i.e. the Error
mean square)?
5. What are the degrees of freedom for this variance in #4?
{Example 22}
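To see where the entries of a one-way analysis of variance table come from, the sums of squares can be computed directly. This is a sketch with hypothetical mileages for four makes (three cars each, not the lab's Table A); the dictionary keys are illustrative names.

```python
from statistics import mean


def one_way_anova(groups):
    """Return the main entries of a one-way ANOVA table."""
    all_values = [x for g in groups for x in g]
    grand = mean(all_values)
    # between-groups (treatment) and within-groups (error) sums of squares
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error
    return {
        "df": (df_between, df_error),
        "SS": (ss_between, ss_error),
        "MS": (ms_between, ms_error),
        "F": ms_between / ms_error,
    }


# Hypothetical mileage (mpg) for makes 1 to 4 (assumption).
makes = [
    [20.0, 22.0, 24.0],
    [25.0, 27.0, 29.0],
    [30.0, 32.0, 34.0],
    [23.0, 25.0, 27.0],
]
table = one_way_anova(makes)
print(table)
```

The P column still requires an F distribution, for example from the EXCEL analysis output.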
NOTE: For the following questions (6 - 13), use the error mean
square and the error degrees of
freedom to calculate confidence intervals and to test hypotheses
about pairs of means.
car
and record them in the following table.
Make of car Number tested Average mileage
1
2
3
4
6. How many cars of make 2 were evaluated in this experiment?
7. What was the average gasoline mileage for make 2?
8. How many cars of make 3 were evaluated in this experiment?
9. What was the average gasoline mileage for make 3?
make 2. Use the method for single
means when σ is not known, but use the Error Mean Square as
the estimate of the variance. The
degrees of freedom will be the Error DF, not n-1!
Reminders:
Confidence Interval = mean ± margin of error
Margin of error = critical value * standard error
Use critical value for T at α/2 = 0.025 and df = error df (t table
or EXCEL T.INV function)
Use standard error = √(error mean square/number of
observations of that make of car)
10. What was the margin of error for the confidence interval for
gasoline mileage
of make 2?
11. What was the lower 95% confidence limit for make 2
mileage?
12. What was the upper 95% confidence limit for make 2
mileage?
{Example 24}
of makes
2 and 3 do not differ. Use the
method for single means when σ is not known with the Error
MS serving as the pooled variance.
Reminders:
Test statistic t = difference of means / standard error of
difference of means.
The standard error of the difference equals square root of the
sum of variances of the two
means. The variance of each mean is estimated by the error
mean square/number of
observations in that mean.
13. What is the value of the t test statistic for testing the
hypothesis that makes 2
and 3 do not differ in mileage?
{Example 24}
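The reminders for questions 10 to 13 can be sketched in Python. The error mean square (4.0), error degrees of freedom (8), group sizes and means below are hypothetical, and 2.306 is the t-table value for 8 degrees of freedom, so it applies only to this illustration.

```python
from math import sqrt

T_CRIT_025_DF8 = 2.306  # t_(0.025, 8) from a t table (error df = 8)


def mean_interval_from_anova(group_mean, n, ms_error, t_crit):
    """Confidence interval for one mean using the error mean square."""
    margin = t_crit * sqrt(ms_error / n)
    return margin, group_mean - margin, group_mean + margin


def t_for_difference(mean_a, n_a, mean_b, n_b, ms_error):
    """t = difference of means / standard error of the difference."""
    std_error = sqrt(ms_error / n_a + ms_error / n_b)
    return (mean_a - mean_b) / std_error


# Hypothetical ANOVA results (assumption): MS error = 4.0, 3 cars per make.
margin, lower, upper = mean_interval_from_anova(27.0, 3, 4.0, T_CRIT_025_DF8)
t_stat = t_for_difference(27.0, 3, 32.0, 3, 4.0)
print(f"E = {margin:.4f}, 95% limits = ({lower:.4f}, {upper:.4f})")
print(f"t for make 2 vs make 3 = {t_stat:.4f}")
```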
Question B
The data in Table B represents the times (in seconds) for men
of three different ages (40, 50
and 60) in each of three different fitness classes (1, 2 and 3) to
run a 2 km course. For each runner,
age is recorded in the first column, fitness category is recorded
in the second column, and running
time is recorded in the third.
Two men in each of the nine categories ran the course. You
should be interested in
determining whether age and/or fitness affect running time.
Each data point can be classified
according to age of the runner or according to fitness of the
runner. The data therefore requires a
two-way analysis of variance. It is possible that differences
among ages of runner will depend upon
the fitness categories of those two runners. The model for the
analysis should include an interaction
term.
the columns, and view the data. You
will have to copy the data into three different columns each
with six observations in order to
perform the following analysis (see Example 25).
{Example 1, 25}
Carry out a two-way analysis of variance and answer the
following questions.
Source of variation   Degrees of freedom   Sum of squares   Mean square   F   P
Age of runner         2
Fitness of runner     2
Interaction           4
Error                 9
14. What is the value of the F test statistic for testing the
hypothesis that age, on
average, has no effect on running time?
15. What are the numerator degrees of freedom for that F
statistic reported in
question 14?
16. What are the denominator degrees of freedom for that F
statistic reported in
question 14?
17. What is the value of the F test statistic for testing the
hypothesis that fitness, on
average, has no effect on running time?
18. What is the value of the F test statistic for testing the
hypothesis that the effect
of age (if any) on running time does not depend on the runner's
fitness?
NOTE
In analysis of variance, the null hypothesis should be rejected
whenever the calculated F-statistic is
greater than the critical value for a chosen significance level
and appropriate numerator and
denominator degrees of freedom. Equivalently, the null
hypothesis should be rejected whenever the
computed p-value is less than the chosen significance level. Use
α = 0.01 (significance level =1 %)
and answer the following two questions.
19. Should the null hypothesis that age has no effect on running
time be rejected (1)
or not rejected (0)?
20. Should the null hypothesis that the effect of age is
independent of the effect of
fitness be rejected (1) or not rejected (0)?
{Example 25}
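The F statistics of a two-way analysis with replication can be reproduced by hand. The sketch below assumes equal replication and uses hypothetical running times that were constructed so that the interaction sum of squares is exactly zero; they are not the lab's Table B. All names are illustrative.

```python
from statistics import mean


def two_way_anova(cells):
    """cells[i][j] holds the replicates for level i of A and level j of B."""
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(x for row in cells for cell in row for x in cell)
    a_means = [mean(x for cell in row for x in cell) for row in cells]
    b_means = [mean(x for row in cells for x in row[j]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss_a = b * r * sum((m - grand) ** 2 for m in a_means)
    ss_b = a * r * sum((m - grand) ** 2 for m in b_means)
    ss_ab = r * sum(
        (cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    ss_error = sum(
        (x - cell_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    df_a, df_b = a - 1, b - 1
    df_ab, df_error = df_a * df_b, a * b * (r - 1)
    ms_error = ss_error / df_error
    return {
        "df": (df_a, df_b, df_ab, df_error),
        "MS_error": ms_error,
        "F_A": (ss_a / df_a) / ms_error,
        "F_B": (ss_b / df_b) / ms_error,
        "F_AB": (ss_ab / df_ab) / ms_error,
    }


# Hypothetical times (s): rows = ages 40, 50, 60; columns = fitness 1, 2, 3.
times = [
    [[595.0, 605.0], [615.0, 625.0], [635.0, 645.0]],
    [[625.0, 635.0], [645.0, 655.0], [665.0, 675.0]],
    [[655.0, 665.0], [675.0, 685.0], [695.0, 705.0]],
]
result = two_way_anova(times)
print(result)
```

Each F value is then compared with the critical F for its numerator degrees of freedom and the 9 error degrees of freedom.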
following three questions.
Age Fitness 1 Fitness 2 Fitness 3 Average
40
50
60
Average
21. What was the average running time for all 60-year olds?
22. What was the average running time for all men in fitness
category 3?
23. What was the mean running time of the two 60-year,
category 3 runners?
{Example 25}
Question C
In many agricultural and biological experiments, one may use a
two-way model with only
one observation per cell. When one of the factors is related to
the grouping of experimental units
into more uniform groups, the design may be called a
randomized complete block design (RCBD).
The analysis is similar to a two-way analysis of variance
(question B) except that the model does
not include an interaction term.
The specific leaf areas (area per unit mass) of three types of
citrus each treated with one of
three levels of shading are stored in Table C. The first column
contains the code for the shading
treatment, the second column contains the code for the citrus
species, and the third column contains
the specific leaf area. Assume that there is no interaction
between citrus species and shading. Carry
out a two-way analysis of this data.
The shading treatment and citrus species are coded as follows:
Treatment Code Species Code
Full sun 1 Shamouti orange 1
Half shade 2 Marsh grapefruit 2
Full shade 3 Clementine mandarin 3
leaf area into the EXCEL worksheet,
label the columns and look at the data.
{Example 1}
Carry out a two-way (without interaction) analysis of this
data and answer the following questions.
Use a 5% significance level.
Source of variation   Degrees of freedom   Sum of squares   Mean square   F   P
Shading treatment     2
Citrus species        2
Error                 4
24. Should the hypothesis that shading treatment has no effect
on specific leaf area
be rejected (1) or not (0)?
25. Should the hypothesis that citrus species do not differ in
specific leaf area be
rejected (1) or not (0)?
26. What is the estimate of the average (pooled) variance in this
experiment (i.e.
Error mean square)?
27. What are the error degrees of freedom for the pooled
variance?
{Example 26}
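With one observation per cell, the error sum of squares is what remains after removing the two main effects. A minimal sketch, assuming a hypothetical 3 × 3 table of specific leaf areas (not the lab's Table C) and illustrative names:

```python
from statistics import mean


def rcbd_anova(table):
    """Two-way ANOVA without interaction; table[i][j] is one observation."""
    a, b = len(table), len(table[0])
    grand = mean(x for row in table for x in row)
    row_means = [mean(row) for row in table]
    col_means = [mean(row[j] for row in table) for j in range(b)]
    ss_rows = b * sum((m - grand) ** 2 for m in row_means)
    ss_cols = a * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    df_error = (a - 1) * (b - 1)
    ms_error = ss_error / df_error
    return {
        "df_error": df_error,
        "MS_error": ms_error,
        "F_rows": (ss_rows / (a - 1)) / ms_error,
        "F_cols": (ss_cols / (b - 1)) / ms_error,
    }


# Hypothetical data (assumption): rows = shading 1-3, columns = species 1-3.
leaf_area = [
    [10.0, 12.0, 14.0],
    [13.0, 15.0, 20.0],
    [16.0, 21.0, 23.0],
]
result = rcbd_anova(leaf_area)
print(result)
```

At the 5% level each F is compared with the critical F for 2 and 4 degrees of freedom (about 6.94 in standard tables).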
Recall that the confidence interval for a difference between two
means is based on a
calculation of the margin of error of the estimated difference.
With a common variance (Error MS)
and the same number of observations in all shading treatments,
the margin of error of an estimated
difference will be the same whether we calculate it for
treatments 1 and 2, 1 and 3, or 2 and 3. This
margin of error of the difference between two means is
sometimes referred to as the least significant
difference (LSD).
experiment.
LSD = critical t value × standard error of difference.
The critical t value with 4 degrees of freedom is t0.025,4 =
2.776.
n is the number of times each treatment was tested (in
this case n = 3 for the 3 species).
$LSD(\alpha) = t_{\alpha/2,\,edf}\sqrt{\dfrac{2 \times \text{ErrorMeanSquare}}{n}}$
28. What is the least significant difference (α = 0.05) for
comparing shading
treatments in this experiment?
{Example 24}
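The LSD calculation and the pairwise comparisons can be sketched as below. The error mean square (1.5) and the treatment means are hypothetical; only the critical value 2.776 and n = 3 come from the text above.

```python
from itertools import combinations
from math import sqrt

T_CRIT_025_DF4 = 2.776  # t_(0.025, 4), as quoted in the text


def least_significant_difference(t_crit, ms_error, n):
    """LSD = critical t * sqrt(2 * error mean square / n)."""
    return t_crit * sqrt(2.0 * ms_error / n)


# Hypothetical results (assumption): MS error and treatment means.
lsd = least_significant_difference(T_CRIT_025_DF4, ms_error=1.5, n=3)
means = {"Full sun": 12.0, "Half shade": 16.0, "Full shade": 20.0}

# A pair differs significantly if its absolute difference exceeds the LSD.
significant = {
    (a, b): abs(means[a] - means[b]) > lsd
    for a, b in combinations(means, 2)
}
print(f"LSD = {lsd:.3f}")
for pair, flag in significant.items():
    print(pair, "differ" if flag else "do not differ")
```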
Any two shading treatments are judged to be significantly
different if their absolute (ignore
the + or - sign) difference exceeds the least significant
difference.
differences. Compare the appropriate
differences to the LSD to answer the following questions.
Shading Treatment Mean Specific Leaf Area
Full Sun
Half Shade
Full Shade
29. Should the hypothesis that the specific leaf area under full
sun is not different
from the specific leaf area in half shade be rejected (1) or not
rejected (0)?
30. Should the hypothesis that the specific leaf areas of half
shade and full shade
are not different be rejected (1) or not rejected (0)?
{Example 24}
- END OF ASSIGNMENT 8 -
Introductory Statistics Laboratory
Assignment #9
Purpose
This final assignment presents some of the important points to
consider in correlation
analysis and simple linear regression analysis.
Question A
The data in Table A gives the (simulated) advertising
expenditures of 25 large companies
for last year and this year. You are asked to investigate the
question of whether or not expenditures
in one year are related to expenditures in another. The data file
contains the company number in the
first column, last year's expenditures ($ millions) in the second
column, and this year's expenditures
($ millions) in the third column.
Copy the data into the EXCEL worksheet,
name the columns, and view the data.
1. Which company had the greatest advertising expenditures last
year?
2. Which company had the greatest advertising expenditures this
year?
{Example 1}
expenditures in the
two years and answer the following
question.
3. Which of the following three statements (1, 2 or 3) most
correctly describes the
relationship between last year's and this year's expenditures?
1 - There is little relationship between what a company spends
on advertising in one year and
what that company spends in another.
2 - Companies that spent most on advertising last year tended
to be among those spending the
greatest amount this year.
3 - Companies that spend a lot on advertising in one year tend
to reduce their advertising
expenditures in the next.
{Example 27}
The relationship between two variables
can be measured by the covariance.
The covariance is a measure of how much two random variables
vary together. The larger the
magnitude of the covariance, the stronger the
relationship.
The value of the covariance is interpreted as follows:
• Positive covariance - indicates that higher than average values
of one variable tend to be
paired with higher than average values of the other variable.
• Negative covariance - indicates that higher than average
values of one variable tend to
be paired with lower than average values of the other variable.
• Zero covariance - if the two random variables are independent,
the covariance will be
zero. However, a covariance of zero does not necessarily mean
that the variables are
independent. A nonlinear relationship can exist that still would
result in a covariance
value of zero.
Calculate the standard deviation for last year's expenditures, the
standard deviation for this year's
expenditures and the covariance between the two.
4. What is the standard deviation of last year's advertising
expenditures ($ millions)
of these 25 companies?
5. What is the standard deviation of this year's advertising
expenditures ($ millions)
of these 25 companies?
6. What is the covariance between the last year's and this year's
advertising
expenditures ($ millions²) of these 25 companies?
Because the covariance depends on the units of the data, it is
difficult to compare covariances
among data sets having different scales. A value that might
represent a strong linear relationship
for one data set might represent a very weak one in another.
The correlation coefficient (r) addresses this issue by
normalizing the covariance (i.e. dividing the
covariance s_xy by the product of the two standard deviations, s_x
× s_y), creating a dimensionless
quantity that allows the comparison of different data sets.
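As a rough illustration of why the correlation is easier to compare than the covariance, the Python sketch below computes both for the same data expressed in two different units. The expenditure values are hypothetical (not the assignment data), the function names are made up for this sketch, and the n - 1 divisor is an assumption.

```python
import math


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_std(xs):
    """Sample standard deviation: square root of the covariance of xs with itself."""
    return math.sqrt(sample_covariance(xs, xs))


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


# Hypothetical expenditures in $ millions (not the assignment data).
last_year = [1.0, 2.5, 3.0, 4.5, 6.0]
this_year = [1.2, 2.0, 3.5, 4.0, 6.5]

# The same expenditures expressed in $ thousands.
last_year_k = [1000 * v for v in last_year]
this_year_k = [1000 * v for v in this_year]

cov_millions = sample_covariance(last_year, this_year)
cov_thousands = sample_covariance(last_year_k, this_year_k)  # 1,000,000 times larger
r_millions = correlation(last_year, this_year)
r_thousands = correlation(last_year_k, this_year_k)          # unchanged by the rescaling
```

Changing the units multiplies the covariance by a million, while r stays the same.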
7. What is the correlation (r) between last year's and this year's
expenditures?
{Example 28}
expenditures from one year to another?
Test the null hypothesis that there is no relationship between
last year's and this year's expenditures
against an alternative that there is a positive relationship (r >
0). Use a 10% significance level.
Because this is a one-tailed test with 25 pairs of observations
(degrees of freedom = 23), we find
that the critical value against which to compare the estimated
correlation is t = 1.319. Using your r
value and n = 25, calculate the test statistic tcalc and compare.
If the test statistic is greater than the
critical value of 1.319, the null hypothesis will be rejected.
$$t_{calc} = r\sqrt{\frac{n-2}{1-r^{2}}}$$
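A minimal Python sketch of this test follows. It evaluates the statistic r × sqrt((n - 2) / (1 - r²)) and compares it with the critical value of 1.319 quoted in the text. The value r = 0.40 is hypothetical and the function name t_from_r is made up for this sketch; substitute your own r.

```python
import math


def t_from_r(r, n):
    """Test statistic for H0: no relationship, from a correlation r and n pairs."""
    return r * math.sqrt((n - 2) / (1 - r * r))


CRITICAL_T = 1.319   # one-tailed, 10% level, 23 degrees of freedom (from the text)
n = 25               # number of pairs of observations (from the text)
r_example = 0.40     # hypothetical correlation, for illustration only

t_calc = t_from_r(r_example, n)
reject = 1 if t_calc > CRITICAL_T else 0   # 1 = reject H0, 0 = do not reject
```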
8. Should the hypothesis that there is no relationship between
last year's and this
year's advertising expenditures be rejected (1) or not (0)?
{Example 28}
Question B
In a study of the role of young drivers in automobile accidents,
data on percentage of
licensed drivers under the age of 21 and the number of fatal
accidents per 1000 licenses were
determined for 32 cities. The data are stored in Table B. The
first column contains a number as the
city code, the second column contains the percentage of drivers
who are under 21, and the third
column contains the number of fatal accidents per 1000 drivers.
The primary interest is whether or
not the number of fatal accidents is dependent upon the
proportion of licensed drivers that are under
21.
Copy the data into the EXCEL worksheet, name the
columns, and view the data.
9. Which city (number) had the highest number of fatal
accidents per 1000 licensed
drivers?
{Example 1}
Plot the number of fatal accidents per 1000 licenses against the
percentage of drivers under 21. Based on the
plot, try to anticipate whether or not the following analysis will
show that there is a significant
increase or decrease in number of fatalities with increases in
percentage of drivers under 21.
{Example 27}
A least-squares regression line can be used to predict levels of a
dependent variable for specified levels of an independent
variable. Use the EXCEL REGRESSION
command to calculate the intercept and slope of the least-
squares line, as well as the analysis of
variance associated with that line. Fill in the following table
and use the results to answer the next
few questions. Carefully choose your independent and
dependent variables and input them
correctly using EXCEL’s regression command. In this example,
the percentage of drivers under the
age of 21 affects the number of Fatals/1000 licenses.
The regression equation (least-squares line) is
Fatals/1000 licenses = ________ + ________ × % under 21
                       (intercept)  (slope)
Analysis of variance
Source             DF   SS         MS        F         P
Regression          1   ________   _______   ________  _______
Residual (Error)   30   ________   _______
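To show where the entries in such a table come from, here is a minimal Python sketch of a least-squares fit with one independent variable. It is not EXCEL's implementation; the function name simple_regression and the five data pairs are hypothetical, and the formulas are the standard textbook ones.

```python
import math


def simple_regression(x, y):
    """Least-squares line y = intercept + slope * x with its ANOVA quantities."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_regression = slope * sxy          # variation explained by the line
    ss_residual = syy - ss_regression    # variation left unexplained
    df_residual = n - 2
    ms_residual = ss_residual / df_residual
    return {
        "slope": slope,
        "intercept": intercept,
        "ss_regression": ss_regression,
        "ss_residual": ss_residual,
        "df_residual": df_residual,
        "ms_regression": ss_regression / 1,   # regression has 1 degree of freedom
        "ms_residual": ms_residual,
        "f": ss_regression / ms_residual,
        "r_squared": ss_regression / syy,
        "sd_slope": math.sqrt(ms_residual / sxx),
    }


# Hypothetical (x, y) pairs, not the Table B data.
percent_under_21 = [1, 2, 3, 4, 5]
fatals_per_1000 = [2, 4, 5, 4, 5]
fit = simple_regression(percent_under_21, fatals_per_1000)
```

With the 32 cities of Table B, df_residual would be 30, matching the table above.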
10. What is the estimated increase in number of fatal accidents
per 1000 licenses
due to a one percent increase in the percentage of drivers under
21 (i.e. the
slope)?
11. What is the standard deviation of the estimated slope?
12. What is the estimated number of fatal accidents per 1000
licenses if there were
no drivers under the age of 21 (i.e. the y intercept)?
13. What percentage of the variation in accident fatalities can
be explained by the
linear relationship with drivers under 21 (i.e. 100 × the
unadjusted coefficient
of determination)?
14. Should the hypothesis that the slope does not differ from
zero (no effect of
young drivers on fatals) be rejected (1) or not (0) based on a
test at the 1%
significance level (i.e. is the p-value from the ANOVA less than
0.01)?
15. What are the degrees of freedom for the standard error of
estimate (and the
standard deviation of the slope); i.e. what are the error degrees
of freedom?
{Example 29}
The regression results can also be used to calculate a confidence interval for
the slope of the least-squares line and to test hypotheses other
than H0 : β1 = 0. In both cases, one
needs to have an estimate of the slope and of its standard
deviation (sometimes called standard
error). Furthermore, one needs to recognize that the degrees of
freedom for the standard deviation is
the same as the error degrees of freedom (n - 2).
Note that EXCEL gives the standard error of estimate
directly, but correctly calls it the standard
deviation of the slope. Therefore, you must not divide by the
square root of sample size as in
Example 16.
Use the above information to calculate a 90% confidence
interval for the slope of the true regression
line. For 30 degrees of freedom and α = 0.1, the critical t-value
is 1.697.
16. What is the margin of error for calculating a 90%
confidence interval for the
slope of the regression line (i.e. 1.697 × the standard deviation
of the slope)?
17. What is the lower 90% confidence limit for the slope?
(i.e. slope – margin of error)
18. What is the upper 90% confidence limit for the slope?
(i.e. slope + margin of error)
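The three calculations in questions 16 to 18 can be sketched in Python as follows. The slope and standard deviation used here are hypothetical placeholders and the function name is made up for this sketch; use the values from your own regression output.

```python
def slope_confidence_interval(slope, sd_slope, t_critical):
    """Return (margin of error, lower limit, upper limit) for the slope."""
    margin = t_critical * sd_slope
    return margin, slope - margin, slope + margin


T_CRITICAL_90 = 1.697    # 30 degrees of freedom, alpha = 0.1 (from the text)
slope = 0.29             # hypothetical estimate, for illustration only
sd_slope = 0.04          # hypothetical standard deviation of the slope

margin, lower, upper = slope_confidence_interval(slope, sd_slope, T_CRITICAL_90)
```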
Test the null hypothesis H0 : β1 = 0.05 against
a one-sided alternative H1 : β1 > 0.05. Use a 1 percent
significance level (for which the critical value
is 2.423).
Reminder:
$$t = \frac{\text{estimated value} - \text{hypothesized value}}{\text{standard error (deviation) of estimate}} = \frac{\text{slope} - 0.05}{\text{st dev of slope}}$$
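A minimal Python sketch of this test statistic is given below. The slope and its standard deviation are hypothetical placeholders, the function name is made up for this sketch, and the critical value is the one stated in the text.

```python
def slope_t_statistic(slope, hypothesized, sd_slope):
    """(estimated value - hypothesized value) / standard deviation of the estimate."""
    return (slope - hypothesized) / sd_slope


CRITICAL_T = 2.423       # critical value stated in the text for the 1% level
slope = 0.29             # hypothetical estimate, for illustration only
sd_slope = 0.04          # hypothetical standard deviation of the slope

t_stat = slope_t_statistic(slope, 0.05, sd_slope)
reject = 1 if t_stat > CRITICAL_T else 0   # one-sided test: reject only for large t
```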
19. What is the value of the test statistic for testing this
hypothesis?
20. Should the hypothesis that the increase in fatals per one
percent increase in
drivers under 21 is not greater than 0.05 be rejected (1) or not
(0)?
- END OF ASSIGNMENT #9 - THE LAST ASSIGNMENT -
Introductory
Statistics
Laboratory
for Excel
PC Instructions for Excel 2013
Excel Examples
INTRODUCTION
Note: Specific Excel 2013 instructions are shown in [Excel
2013: ] throughout the Excel
examples.
These EXCEL examples provide a basis for learning to use
MICROSOFT EXCEL to
perform various tasks required in the ISLeX laboratory
assignments.
The examples may not refer exactly to the task to be
performed. For instance, in some
cases, the example may use different columns than required for
a particular task.
Your laboratory sessions will be much less frustrating if you
study the assignment and
associated examples before sitting down at a computer.
The examples will not match exactly what you need to do to
complete your assignments.
They should provide an adequate outline, but you will have to
modify the example to complete
your assigned task. For instance, you will need to use different
file names in your lab
assignments than those used in examples. You will also have to
refer to different EXCEL
worksheet columns.
The EXCEL workbook contains one or more worksheets each
identified by a tab on the
lower left part of the window. EXCEL will assign default
names, such as Sheet 1, to individual
worksheets or the user can change the name by clicking the
right mouse button on the tab and
choosing the 'rename' option.
Each worksheet is composed of cells arranged in rows and
columns. Rows are identified
by numbers 1, 2, 3 and so on, while columns are identified by
letters A, B, C and so on. After
column Z, naming starts with AA and proceeds to ZZ. Each cell
may contain a number, some
text, or a formula.
In this manual, only absolute referencing is used to refer to
cells or blocks of cells. To
refer to the cell located in the second row of column C, use C2.
To indicate all cells in the block
that includes rows 2 to 10 of columns B through D, use the cell
designations for the cell in the
upper left corner (i.e. B2) and for the cell in the lower right
corner (i.e. D10) separated by a
colon, thus B2:D10.
Sometimes, it will be useful to enter a formula into a cell and
then copy that formula to
other cells. If the formula in cell B2 refers to cell A1, it will
refer to cell D5 when the formula is
copied to cell E6. If you wish it to continue to refer to cell A1,
use $A$1 instead of A1 in the
formula.
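The way a reference shifts when a formula is copied can be mimicked with a short Python sketch. This is only an illustration of the rule described above, not how EXCEL works internally; the function names are hypothetical, and references that would move off the sheet are not handled.

```python
import re

REFERENCE = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")


def column_number(letters):
    """A -> 1, B -> 2, ..., Z -> 26, AA -> 27, ..."""
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def column_letters(number):
    """Inverse of column_number."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def split_cell(cell):
    """'E6' -> (5, 6) as (column number, row number)."""
    letters, row = re.fullmatch(r"([A-Z]+)(\d+)", cell).groups()
    return column_number(letters), int(row)


def copied_reference(reference, from_cell, to_cell):
    """How `reference` reads after its formula is copied from from_cell to to_cell."""
    col_fixed, col, row_fixed, row = REFERENCE.fullmatch(reference).groups()
    from_col, from_row = split_cell(from_cell)
    to_col, to_row = split_cell(to_cell)
    # A $ in front of the column or the row stops that part from shifting.
    new_col = col if col_fixed else column_letters(column_number(col) + to_col - from_col)
    new_row = row if row_fixed else str(int(row) + to_row - from_row)
    return f"{col_fixed}{new_col}{row_fixed}{new_row}"


relative_result = copied_reference("A1", "B2", "E6")      # shifts 3 columns, 4 rows
absolute_result = copied_reference("$A$1", "B2", "E6")    # stays put
```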
EXCEL commands and subcommands can be selected by
clicking the left mouse button
on the required command or subcommand.
When you first start using Excel, you should become familiar
with three important areas
in the Excel window. Mention has already been made of the
cells arranged in rows and columns
in the worksheet. In fact there may be several worksheets in a
single workbook.
If you place the cursor in a particular cell, the “Name box”
located at the upper left hand
side of the worksheet will indicate the identity of the active
cell, e.g. B5.
If you type a number, name or formula into that cell, it will
also appear in the “Formula
bar” at the top of the worksheet. If you then press the enter key,
the cursor will move to the next
cell and the formula bar will become blank (if the next cell is
empty). If you entered an
actual formula, it will be evaluated and the result will
appear in the cell in which you entered
the formula. If you made an error and need to edit the formula,
highlight the cell and then move
the cursor to the formula bar to edit the formula.
In these laboratory assignments, you are sometimes required to
combine information
from two parts of an assignment. Typically, each part will result
in a separate workbook in
Excel. You can copy data from one workbook to another by
using the following procedure.
Highlight the data you wish to copy and press Ctrl-C to copy
the data.
Use the Window command of Excel to choose the workbook you
wish to copy to.
Place the cursor where you wish to paste the data and press Ctrl-V.
Note: Rather than using Ctrl-C and Ctrl-V to copy and paste,
you may use Edit->Copy and
Edit->Paste.
Most data analysis tools of Excel default to printing their
results on a new worksheet.
However, most also have an option to specify an output range
on the same worksheet. If you
choose the Output range option, click in the adjacent box and
then highlight the area of the
worksheet where you wish to store the results.
Example 1: Copying data from the assignment webpage into the
EXCEL worksheet.
Your data will be presented to you in a web page. To copy
the data to Excel:
• First highlight the data and either press the key combination
ctrl-c, or select Copy from
the Edit menu to copy the data (to the clipboard).
• Then, switch to the Excel window and either use the key
combination ctrl-v, or select
Paste from the Edit menu to paste the data into Excel.
At this stage, you should now have the data on an Excel
worksheet. (If you wish, you
can name this worksheet LAB0A.DAT by right clicking on its
tab at the bottom and choosing
the rename option.)
This same procedure applies to all assignments. Follow the
above procedure even with
multi-column tables.
If you wish to add a label in cell 1 of column A, move the
cursor to that cell and then
choose Insert->Cells and click OK (or press enter) on the Insert
dialog box to move all cells
down. [Excel 2013: Home Tab – Insert] This will allow you to
type a label in cell A1.
The following procedure will allow you to calculate some
summary statistics for data in
a column. It is good practice to look at summary statistics
before proceeding with further
analysis. This will alert you to the number of data points, their
average value, and a few other
informative characteristics about the data.
• Choose Data Analysis… to open the Data Analysis window [Excel 2013:
Data Tab – Data
Analysis over on far right side] (SEE NOTE BELOW if Data
Analysis is missing.)
• Double click on Descriptive statistics.
• With the cursor flashing in the Input Range: box, click on the column letter
for the column with the data.
• If you have entered a label in the first row of the column, click Labels in
first row.
• Click in the box preceding Summary statistics, and click on OK or
press the enter key.
EXCEL will create a new worksheet with the summary
statistics. You should note such key
characteristics as count, minimum, mean and maximum. At
more advanced stages, you may
choose to think about kurtosis, skewness and standard deviation
or standard error.
If you wish, you can delete this temporary worksheet by right-
clicking on its tab and
choosing the delete option.
The same basic procedures will be used in later assignments to
enter data from a file that
contains several columns.
NOTE: The Analysis ToolPak is a Microsoft Excel add-in
program that is available when
you install Microsoft Office or Excel. To use it in Excel,
however, you need to load it first.
1. Click the File tab, and then click Options.
2. Click Add-Ins, and then in the Manage box, select Excel
Add-ins.
3. Click Go.
4. In the Add-Ins available box, select the Analysis ToolPak
check box, and then click
OK.
a. If Analysis ToolPak is not listed in the Add-Ins available
box, click Browse to
locate it.
b. If you get prompted that the Analysis ToolPak is not
currently installed on your
computer, click Yes to install it.
5. After you load the Analysis ToolPak, the Data Analysis
command is available in the
Analysis group on the Data tab.
Example 2: Preparing a histogram of data
A histogram is a graphical summary of numerical data. In this
example, data stored in
EXCEL worksheet column A is summarized in a histogram.
Before calculating frequencies in
different groups, you must define the classes. In EXCEL, the
classes are called "bins". For this
example, suppose that the data to be summarized varies from 21
to 28 and you wish to group the
observations into "bins" each with one unit for a class width.
The first bin will include all data
points with values up to and including 22, the second bin will
include values greater than 22 up
to and including 23 and so on. You only need to indicate the
upper boundary for each bin. For
this example, use 22, 23, 24, 25, 26, 27, and 28. These values
need to be entered into a new
column, say column B. You can type the numbers into the first
seven rows of column B.
To actually draw the histogram, you must first calculate
frequencies of data in each bin.
Choose Data analysis [Excel 2013: Data Tab – Data Analysis]
and select Histogram
In the histogram dialog box,
move cursor to Input range and click on top of column A,
move cursor to Bin range and click on top of column B,
if you have labels in A1 and B1, check the Labels option,
and
click on OK or press the enter key.
EXCEL is very slow at this calculation, so be patient! In a few
seconds, you should get a
new sheet in the workbook that contains the upper ends of the
bins and the frequencies of
observations in each bin. In this example, the results look like
this:
Bin Frequency
22 9
23 6
24 6
25 5
26 7
27 2
28 1
More 0
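The bin rule described above (each bin holds values greater than the previous upper boundary, up to and including its own) can be sketched in Python as follows. The data values are hypothetical and the function name is made up for this sketch; it is meant only to show the counting rule, not to reproduce the table above.

```python
from bisect import bisect_left


def histogram_counts(data, upper_bounds):
    """Count values per bin; the extra last slot plays the role of 'More'."""
    counts = [0] * (len(upper_bounds) + 1)
    for value in data:
        # bisect_left gives the index of the first upper bound >= value,
        # so a value equal to a boundary falls in that boundary's bin.
        counts[bisect_left(upper_bounds, value)] += 1
    return counts


bins = [22, 23, 24, 25, 26, 27, 28]   # upper boundaries, as in the example
# Hypothetical observations, for illustration only.
observations = [21.0, 21.6, 22.0, 22.4, 23.0, 24.7, 25.1, 27.9, 28.0]

frequencies = histogram_counts(observations, bins)
```

Note that 22.0 is counted in the first bin and 22.4 in the second, which is the "up to and including" convention. The list of upper boundaries must be in increasing order.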
At this point, you should have a numerical representation of a
histogram. Most
histograms are presented in graphical form. To develop a bar
graph to show the histogram,
proceed as follows. Note that Excel creates a bar graph not a
true histogram as there are spaces
between the bars. A true histogram has no spaces between the
bars.
Highlight the data, including titles, using the cursor.
Insert a chart. [Excel 2013: Insert Tab – in Charts choose Insert
Column Chart – select
2D (first choice of the options)]
Excel will automatically produce a chart.
A histogram gives the frequency (number of observations) in
each of various classes. In
EXCEL, the classes are defined by giving the upper boundaries
of each class (bin).
The + sign allows you to format your chart’s elements. You
can click on the boxes to
include whatever elements you feel are appropriate for your
chart. If you want to edit the Axis
Title, you can click into that box and type a new axis title.
The paint brush allows you to choose the style and color of your
chart.
The funnel icon (Chart Filters) allows you to select your data source and make
changes instead of having to
highlight the Excel cells that hold the data and start the chart
all over again.
How to make a true histogram: To get rid of the gaps between
the bars and make a true
histogram, right click on any bar and choose Format Data Series
from the menu that appears.
In the Format Data Series window, choose the three-column
symbol to open Series Options; at the bottom is Gap Width.
Change the gap width to zero and
you will have a true histogram.
You can change the outline of your bars to a different color to
have them appear separated by
clicking the Outline option and changing the color
to black or white.
The resulting chart looks like this (remember to make changes
to your titles according to best
graphing practices, not shown in this chart):
Example 3: Entering data from the keyboard into the EXCEL
worksheet
Occasionally, you will be required to enter data or intermediate
results directly into the
EXCEL worksheet. You merely type the data into the cells
where you wish to store the
information.
Example 4: Calculating relative frequencies
To calculate relative frequencies in each of several classes,
you must divide each
frequency of a class by the sum of all the frequencies. Consider
data summarized in three classes.
Class Frequency
1 5
2 10
3 5
Total 20
The relative frequency for Class 1 is 5/20 = 0.25, for Class
2 is 10/20 = 0.50, and for
Class 3 is 5/20 = 0.25. Note that the relative frequencies must
always sum to 1.0 (within
rounding error). Thus, 0.25 + 0.50 + 0.25 = 1.0.
If the frequencies are stored in EXCEL Worksheet column C,
you can calculate relative
frequencies and store them in another column in the following
way. Suppose 5 is in cell C1, 10
in cell C2 and 5 in cell C3. Move the cursor to cell D1, type
‘=C1/SUM($C$1:$C$3)’ in the
formula bar, and press enter. Don’t forget the = at the beginning
of your equation; otherwise it
will be entered only as text and will not be calculated. You
should see the value 0.25 in cell D1.
To calculate the remaining relative frequencies, just copy the
formula in cell D1 to cells D2 and
D3. Note that, as the formula is copied, C1 will change to C2
and then to C3, but $C$1:$C$3
will remain constant.
An alternative would be to first calculate the sum (20) and
store in a cell that could then
be used to calculate all relative frequencies. For example, enter
the formula ‘=SUM(C1:C3)’ in
cell C4. Now, use the formula ‘= C1/$C$4’ in cell D1. Again,
copy cell D1 to cells D2 and D3.
You should also confirm that the relative frequencies sum to
1.0.
Use the formula ‘= SUM(D1:D3)’ in cell D4. You can also use
the Σ in the tool bar and Excel
will help you calculate a sum for that column. [Excel 2013:
Home Tab – Σ]
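For comparison, the same relative-frequency calculation can be written as a short Python sketch, using the three class frequencies from the table above. The variable names are made up for this sketch.

```python
frequencies = [5, 10, 5]          # Classes 1, 2 and 3 from the table above
total = sum(frequencies)          # plays the role of SUM($C$1:$C$3)

# Each frequency divided by the same fixed total, like copying =C1/$C$4 down.
relative_frequencies = [f / total for f in frequencies]

check = sum(relative_frequencies)  # should be 1.0 within rounding error
```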
Example 5: Leaving EXCEL and grading your assignment.
When you have completed an assignment and have recorded
numerical answers to each
of the questions in the INTRODUCTORY STATISTICS
LABORATORY, you should try your
answers in ISLeX.
In submitting your answers to the Introductory Statistics
Laboratory Program (ISLeX),
you are required to use numbers for all answers. Place the
cursor in the appropriate box and type
in your answer. Use the mouse or the tab key to move to the
next box. If you press enter, it will
go right to grading. (You have the option to go back again, so
DO NOT accept unless you are
completely finished.) Click on the “Check my answers” box to
grade your assignment.
At the end of the assignment, your grade will be displayed on
the screen and you will be
given the option of accepting the grade or repeating the
assignment. Once you accept your grade,
you will not be able to repeat the assignment. You are
encouraged to repeat the assignment until
you are satisfied with your effort. You must achieve 80 or
higher to move on to the next
assignment.
Example 6: How to prepare a stem-and-leaf diagram
A stem-and-leaf diagram combines graphical and numerical
methods to summarize data.
Unfortunately, EXCEL does not have a command for preparing
a stem-and-leaf diagram.
Suppose you wish to develop a stem-and-leaf diagram of the
following data.
25.6 26.0 25.3 27.2 23.6 26.3 25.4 23.8 21.1 23.4 23.9
23.8 26.0 20.0 22.5 28.0 26.7
24.8 25.1 24.9 26.6 24.9 25.0 27.5 20.6 24.0 22.1 20.0
21.8 24.7 21.7 25.2 27.1 24.8
25.8 26.9 25.6
Enter (or read) the data into a column in EXCEL and then sort
the data from lowest to
highest using the Data->Sort command. [Excel 2013: Data Tab –
Sort] The results follow.
20.0
20.0
20.6
21.1
21.7
21.8
22.1
22.5
23.4
23.6
23.8
23.8
23.9
24.0
24.4
24.7
24.8
24.8
24.9
24.9
25.0
25.1
25.2
25.3
25.4
25.6
25.6
25.8
26.0
26.0
26.3
26.6
26.7
26.9
27.1
27.2
27.5
28.0
If you decide to have leaf units of
0.1, the successive stem units will
be 10 × 0.1 = 1.0 higher than the
previous one. Start by writing the
stem units in a column followed by
a vertical bar.
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
Then, go down the data and write the last digit of each number in the leaf
position.
20 | 0 0 6
21 | 1 7 8
22 | 1 5
23 | 4 6 8 8 9
24 | 0 4 7 8 8 9 9
25 | 0 1 2 3 4 6 6 8
26 | 0 0 3 6 7 9
27 | 1 2 5
28 | 0
And, finally, add a title and leaf
unit to complete the job.
Stem-and-leaf diagram of example
data.
Leaf unit = 0.1
20 | 0 0 6
21 | 1 7 8
22 | 1 5
23 | 4 6 8 8 9
24 | 0 4 7 8 8 9 9
25 | 0 1 2 3 4 6 6 8
26 | 0 0 3 6 7 9
27 | 1 2 5
28 | 0
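Because EXCEL has no stem-and-leaf command, the hand procedure above can also be sketched in Python. The function name stem_and_leaf is hypothetical; the data are the sorted values listed above.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=0.1):
    """Return the rows of a stem-and-leaf diagram as text lines."""
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):
        units = round(value / leaf_unit)   # e.g. 21.8 -> 218 leaf units
        stem, leaf = divmod(units, 10)     # 218 -> stem 21, leaf 8
        leaves_by_stem[stem].append(leaf)
    first, last = min(leaves_by_stem), max(leaves_by_stem)
    rows = []
    for stem in range(first, last + 1):    # keep stems that have no leaves
        leaves = " ".join(str(leaf) for leaf in leaves_by_stem[stem])
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


# The sorted example data from this section.
data = [
    20.0, 20.0, 20.6, 21.1, 21.7, 21.8, 22.1, 22.5, 23.4, 23.6,
    23.8, 23.8, 23.9, 24.0, 24.4, 24.7, 24.8, 24.8, 24.9, 24.9,
    25.0, 25.1, 25.2, 25.3, 25.4, 25.6, 25.6, 25.8, 26.0, 26.0,
    26.3, 26.6, 26.7, 26.9, 27.1, 27.2, 27.5, 28.0,
]

diagram = stem_and_leaf(data)
print("Leaf unit = 0.1")
print("\n".join(diagram))
```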
The stem-and-leaf diagram consists of two columns of
numbers. The first column is
called the stem. The second column contains the leaves; one
leaf for each data point. The value
of any number in a leaf position is indicated by the leaf unit,
0.1 in this example. Any number in
a leaf position represents that number multiplied by the leaf unit
0.1. In the first row of the
diagram, the 0 stands for 0 × 0.1 = 0.0, and the 6 stands for 6 ×
0.1 = 0.6.
Any number in the stem position represents that number multiplied by 10 × leaf
unit, i.e. 1 in this case. In the
last row, the 28 stands for 28 × 1 = 28. The final value of any leaf is
calculated by adding the leaf value
to the corresponding stem value. The 0 in the last row
represents the number 0 × 0.1 + 28 × 1 =
28.0. The third leaf in stem position 21 represents 8 × 0.1 + 21
× 1 = 21.8.
Example 7: How to draw a frequency (or relative frequency)
polygon.
In this example, midpoints for Samples 1 and 2 are stored in
column A, and relative
frequencies from Sample 1 are stored in column B and relative
frequencies from Sample 2 are
stored in column C of an EXCEL worksheet. In order to
compare the two samples, it will be
useful to plot relative frequencies for both samples on the same
graph.
Here are columns A, B, and C of an example worksheet.
20 0.0357 0.0000
21 0.1429 0.0270
22 0.2143 0.1081
23 0.1786 0.1081
24 0.2500 0.1622
25 0.1071 0.2162
26 0.0714 0.1892
27 0.0000 0.1081
28 0.0000 0.0811
[Excel 2013: highlight the data. Insert Tab – Charts and Choose
SCATTER, then click 2D
‘Straight Line with Markers’]. The resulting graph will look
like:
However, you will want to edit the graph. Click the + to edit the
chart. Choose Axes and
move the cursor over until the little right arrow appears, then
choose More Options and then
click on the histogram picture.
You can now edit the Axis. Change
the minimum Bounds to 19 and the
maximum Bounds to 29. Then
change the Major Units to 1.0.
The resulting graph will now have better representation.
Remember to label your chart title and
axis appropriately (not shown in chart below).
Example 8: How to use EXCEL to calculate various numbers
that summarize the characteristics
of a population (or sample).
In this example, the Function command is used to calculate
various constant values to be
stored in cells in the worksheet. [Excel 2013: Formulas Tab –
Insert Function (fx)]. There are
many different functions that can be used. Some refer to whole
columns, some to individual
observations. The following examples demonstrate a few of the
uses of functions in EXCEL.
You can type the function into any particular cell by first
typing an equal sign in the
formula bar and then typing the name of the function along with
its required arguments. As an
alternative, you can use the [Excel 2013: Formulas Tab – Insert
Function (fx)] to choose a
function and have EXCEL prompt you for necessary arguments.
In this course, you would
probably choose Function category = Statistical and then double
click on the Function name
for the function you want to use.
For this example, consider that there are 22 observations
stored in column A.
a) Determine the number of data points in the population.
=COUNT(A:A)
b) Calculate the mean (= sum of all observations divided by
number of observations)
=SUM(A1:A22)/COUNT(A1:A22)
=AVERAGE(A1:A22)
c) Determine the minimum in this population (the first value in
a magnitude array). If the data
have been sorted from smallest to largest, the smallest
(minimum) value will be in the first
position, cell A1, and the largest will be located in the last
position, cell A22 in this example.
=MIN(A1:A22)
d) Determine the maximum in this population (the last value in
a magnitude array).
=MAX(A1:A22)
e) Determine the median (the middle value in a magnitude
array).
For an odd number of data points, the median is the middle
value. The middle value of n data
points if n is even is given by the average of the values of the
two middle terms.
=MEDIAN(A1:A22)
f) Determine the first quartile.
The first quartile is that value below which one-quarter of the
observations lie. Because there is
no generally accepted definition of quartile, different programs
give different results for
quartiles. ISLeX is programmed to calculate quartiles in the
same way that Excel does.
=QUARTILE(A1:A22,1)
g) Determine the third quartile.
The third quartile is that value below which three-quarters of
the observations lie.
=QUARTILE(A1:A22,3)
NOTE: The median is sometimes referred to as the second
quartile (Q2) because it is the
value below which 2/4 of the values lie. The first quartile (Q1),
the median (Q2) and the third
quartile (Q3) divide the data values into four groups. We know
that 1/4 of the data values are less
than Q1, 1/4 are between Q1 and Q2, 1/4 are between Q2 and
Q3, and 1/4 are greater than Q3.
For some purposes, it may be sufficient to summarize a large
data set by presenting these three
values.
h) Determine the standard deviation.
The standard deviation is the square root of the variance, and
the variance is the average of the
squares of differences between individual data points and the
overall mean. Remember that the
standard deviation of a population is calculated differently than
a standard deviation of a sample.
It is important to know if you have a sample or a population.
=STDEV.S(A1:A22) for a sample
=STDEV.P(A1:A22) for a population
23
20   22         Uses =COUNT(A1:A22) to count number of observations
29   22.77273   Uses =SUM(A1:A22)/COUNT(A1:A22) to calculate average
29   16         Uses =MIN(A1:A22) to calculate the minimum value
27   30         Uses =MAX(A1:A22) to calculate maximum value
23   23         Uses =MEDIAN(A1:A22) to calculate median value
17   19         Uses =QUARTILE(A1:A22,1) to calculate first quartile
17   27.75      Uses =QUARTILE(A1:A22,3) to calculate third quartile
22   4.669372   Uses =STDEV.S(A1:A22) to calculate standard deviation for a sample
23
25
21
21
18
16
21
24
19
27
19
25
24
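For readers who want to cross-check these functions outside EXCEL, the Python sketch below computes the same kinds of summaries with the standard statistics module. The eight data values are hypothetical. The "inclusive" quartile method is assumed to correspond to EXCEL's QUARTILE function; as noted above, other definitions give different quartiles.

```python
import statistics

# Hypothetical observations, chosen only so the results are easy to check by hand.
data = [2, 4, 4, 4, 5, 5, 7, 9]

count = len(data)                              # like =COUNT(...)
mean = sum(data) / count                       # like =SUM(...)/COUNT(...)
minimum = min(data)                            # like =MIN(...)
maximum = max(data)                            # like =MAX(...)
median = statistics.median(data)               # like =MEDIAN(...)

# Quartiles by linear interpolation between ranked values
# (assumed to match =QUARTILE(..., 1) and =QUARTILE(..., 3)).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

sd_sample = statistics.stdev(data)             # like =STDEV.S(...), divides by n - 1
sd_population = statistics.pstdev(data)        # like =STDEV.P(...), divides by n
```

The sample standard deviation is always a little larger than the population version because it divides by n - 1 rather than n.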
Example 9: How to use the DESCRIPTIVE STATISTICS
command of EXCEL
The Descriptive statistics command of EXCEL will
automatically calculate most of the
summary statistics required of data in a single column [Excel
2013 Data Tab – Data Analysis and
then choose Descriptive Statistics]. By listing several columns,
the
Descriptive statistics command can be applied to several
columns simultaneously.
Consider that data has been stored in column A. To calculate
summary statistics for this
column, follow these steps.
Excel 2013: Data Tab and choose Data Analysis (on right)
Double click on Descriptive statistics in the Data Analysis
dialog box
Set Input range to = A:A (or just highlight the data with the
cursor)
Click on Summary statistics
Click on OK
Your results will be on a new worksheet and will look like this
(move column borders to
see full text).
Column1
Mean 23.90909
Standard Error 1.038041
Median 23.5
Mode #NUM!
Standard Deviation 4.868843
Sample Variance 23.70563
Kurtosis -1.32235
Skewness -0.11628
Range 14
Minimum 16
Maximum 30
Sum 526
Count 22
This approach gives many of the summary statistics described
in the preceding example
as well as several others. The #NUM! message means only that
there are several possible values
for the mode in this data set.
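Several entries in this output are related to one another, and checking those relations is a quick way to understand what each line means. The Python sketch below uses only the numbers printed in the output above; the variable names are made up for this sketch.

```python
import math

# Values copied from the Descriptive statistics output above.
count = 22
total = 526
standard_deviation = 4.868843
minimum = 16
maximum = 30

mean = total / count                                      # Mean = Sum / Count
standard_error = standard_deviation / math.sqrt(count)   # Standard Error of the mean
sample_variance = standard_deviation ** 2                # Sample Variance
data_range = maximum - minimum                            # Range
```

The computed values agree with the Mean, Standard Error, Sample Variance and Range lines of the output to the precision shown.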
Example 10: Further uses of EXCEL->As a calculator
EXCEL can also be used as a calculator.
The following statements would allow you to calculate 5.6-3.2
= 2.4 and store it in a cell
in the EXCEL worksheet. It is important to start your equation
with an “=”; otherwise the
calculator function is not enabled.
=5.6-3.2
If 5.6 was stored in cell D3 and 3.2 was stored in cell D4, you
could also use
=D3-D4
The second option may be useful if 5.6 and 3.2 may be used in
other calculations.
This same scheme may be used for all elementary mathematical
operations.
Use - to indicate subtraction [ = 5.6 - 3.2]
Use + to indicate addition [ = 5.6 + 3.2]
Use * to indicate multiplication [ = 5.6 * 3.2]
Use / to indicate division [ = 5.6 / 3.2]
Use POWER to indicate exponentiation [ = POWER(5.6, 3.2)]
Example 11: Calculations with a discrete probability
distribution
In this example, EXCEL is used to answer various questions
dealing with a discrete
probability distribution. EXCEL worksheet column A contains
the event names and column B
contains the corresponding probabilities. In PL SC 314, we will
discuss only events that
represent counts; e.g. number of seeds germinated, number of
red blood cells, number of live
plantlets, number of microbial colonies, et cetera.
0 0.018316
1 0.073263
2 0.146525
3 0.195367
4 0.195367
5 0.156293
6 0.104196
7 0.059540
8 0.029770
9 0.013231
10 0.005292
11 0.001925
12 0.000642
13 0.000197
14 0.000056
15 0.000015
16 0.000004
17 0.000001
18 0.000000
19 0.000000
20 0.000000
Suppose one were interested in the probability of exactly 10 in
this distribution. This can
be read directly from column B in the row position
corresponding to A = 10. Thus, P(X = 10) =
0.005292.
A powerful way of calculating the probabilities of compound
events is to sum parts of the
probability table.
Suppose you want the probability of less than 13. You must add
the probabilities for 0, 1,
. . ., 12. Those probabilities are in cells B1:B13. To calculate the
probability, you could move to
cell C1 and enter the formula = SUM(B1:B13). In this example,
the probability of less than 13 is
0.99973 or 99.973 percent.
Note that terms such as 'less than 13' and 'fewer than 13'
include all possible values from
the smallest up to, but excluding, 13.
Similarly, 'more than 13' or 'greater than 13' would not include
13. Moreover, the term
'between 5 and 10' would include 6, 7, 8 and 9, and would
exclude 5 and 10.
However, ‘no more than 13’ would include 13. ‘At least 13’
would include 13 and all
higher values.
The following three examples show other questions that can be
dealt with in this general
manner.
a) P[10 < X < 21] = ?
= P(11) + P(12) + P(13) + P(14) + … + P(20). P(11) is listed in
row 12 of column
B while P(20) is listed in row 21 of column B.
= SUM(B12:B21) = 0.0028398
b) P[(X < 6) or (X > 14)] = ?
In this example, calculate P(0) + P(1) + … +P(5) + P(15) +
P(16) + … + P(20)
= SUM(B1:B6)+SUM(B16:B21) = 0.78515
c) P[X > 0] = ?
= SUM(B2:B21) = 0.98168
or = 1 - B1 = 0.98168
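The cell-range sums above can be cross-checked outside EXCEL. The following Python sketch is an illustration only and is not part of the lab procedure. The list probs restates column B of the table, and the helper name prob_between is an assumption introduced here. Position k in the list corresponds to worksheet row k + 1.

```python
# Column B of the worksheet: probs[k] is P(X = k) for k = 0..20.
probs = [
    0.018316, 0.073263, 0.146525, 0.195367, 0.195367, 0.156293, 0.104196,
    0.059540, 0.029770, 0.013231, 0.005292, 0.001925, 0.000642, 0.000197,
    0.000056, 0.000015, 0.000004, 0.000001, 0.000000, 0.000000, 0.000000,
]


def prob_between(low, high):
    """Return P(low <= X <= high), with both end points included."""
    return sum(probs[low:high + 1])


p_less_13 = prob_between(0, 12)                    # like =SUM(B1:B13)
p_a = prob_between(11, 20)                         # like =SUM(B12:B21)
p_b = prob_between(0, 5) + prob_between(15, 20)    # like =SUM(B1:B6)+SUM(B16:B21)
p_c = 1 - probs[0]                                 # like =1-B1

print(round(p_less_13, 5), round(p_a, 5), round(p_b, 5), round(p_c, 5))
```

Because the helper includes both end points, a strict inequality such as 10 < X < 21 has to be translated to the range 11 to 20 before calling it, just as the cell range has to be chosen by hand in EXCEL.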
In order to calculate the mean of a probability distribution, one
must use the methods for
calculating the mean of a relative frequency distribution. The
mean is equal to the sum of the
products of each value multiplied by its corresponding
probability. In the example table, the hand
calculation would require 0(0.018316) + 1(0.073263) + ... for
21 terms. In EXCEL, the
following formula will operate on whole columns and
calculation of the mean is simple.
= SUMPRODUCT(A1:A21*B1:B21) = 4.00
For this probability distribution, one would conclude that the
average value in a great
many samples from this distribution will be 4.0.
The variance of a probability distribution can be most easily
calculated as the average of
the squares of the values minus the square of the average. The
following EXCEL formula will
calculate the variance. The mean (see above) must have
previously been calculated and stored in
cell D7.
= SUMPRODUCT(A1:A21*A1:A21*B1:B21)-D7*D7 = 4.000
In this example, the variance of the distribution (4.0) is
identical to the mean. This is a
characteristic of the 'Poisson' probability distribution (that deals
with the random occurrence of
rare events). Such a relationship does not hold in general for other
distributions.
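As a rough check on the mean and variance formulas, the sketch below rebuilds the table in Python. It assumes that the tabulated probabilities come from a Poisson distribution with mean 4, which reproduces the listed values such as P(0) = 0.018316. The names LAMBDA, values and probs are illustrative only.

```python
import math

LAMBDA = 4.0  # assumed Poisson mean; reproduces the tabulated P(0) = 0.018316

# Values 0..20 and their probabilities, mirroring worksheet columns A and B.
values = list(range(21))
probs = [math.exp(-LAMBDA) * LAMBDA**k / math.factorial(k) for k in values]

# Mean: sum of value * probability, like =SUMPRODUCT(A1:A21*B1:B21).
mean = sum(x * p for x, p in zip(values, probs))

# Variance: average of the squares minus the square of the average.
variance = sum(x * x * p for x, p in zip(values, probs)) - mean**2

print(round(mean, 3), round(variance, 3))
```

The table stops at 20, so a very small amount of probability is left out. For this distribution the effect on the mean and variance is far below the precision shown in the worksheet.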
Example 12: Reading and storing constants for further use.
In some assignments, you are required to read numerical values
from a file and then use
them to calculate answers to specific questions. Consider the
situation where the mean and
standard deviation are stored in columns 1 and 2 of the data file
called 'Table R'.
1. Copy the table using the method described in Example 1.
2. Observe the two columns to see the mean and standard
deviation.
3. Suppose the data is loaded into cells B1 and C1 of the
EXCEL worksheet and that you are told
the first value is the mean and the second is the standard
deviation. Use the value stored in B1 as
the mean and the one stored in C1 as the standard deviation in
subsequent computations.
Suppose the values were 100.4 and 7.89.
You could calculate the value of the mean plus two standard
deviations by using this
formula in a cell.
= B1 + 2*C1
Example 13: Using EXCEL to answer questions about
continuous distributions.
Consider that X is a continuous variable with a mean whose
value is stored in EXCEL
cell A1 and a standard deviation whose value is stored in B1.
For example, if you are given that
the mean is 86.7 and the standard deviation is 4.81, the
following calculations will work if you
first store 86.7 in A1 and 4.81 in B1.
In the assignments, you will be dealing only with a continuous
distribution known as a
normal distribution. When using the NORM.DIST function to
calculate a probability, it will be
necessary to indicate i) the value below which you require the
probability, ii) the mean of the
distribution, iii) the standard deviation of the distribution, and
iv) TRUE to indicate that you
want a cumulative probability [P(X < Value)].
In these examples, consider that X is an observation from a
normal distribution. The
NORM.DIST function will give the probability that a random
observation will be less than some
specified value V, i.e. P[X < V].
To calculate P[X < 90] use
=NORM.DIST(90,86.7,4.81,TRUE) = 0.75367
or =NORM.DIST(90,A1,B1,TRUE) = 0.75367
if the mean is in A1 and the standard deviation is in B1.
By choosing NORM.DIST, you will be prompted for the four
arguments [Excel 2013: Formula Tab – Insert Function, scroll to
choose NORM.DIST from the statistical list]. Enter
your x value, type in or use the cursor to select your mean, type
in or use the cursor to select your
standard deviation, and type TRUE into the Cumulative box (TRUE
requests the cumulative probability P[X < x]).
The following examples should help to convert questions into
mathematical expressions and then
into EXCEL commands.
What is the probability that a continuous normal variable X will
be less than 75?
P[X < 75] = ?
=NORM.DIST(75,A1,B1,TRUE)
What is the probability that a continuous normal variable X will
exceed 75?
P[X > 75] = 1 - P[X < 75] = ?
=1-NORM.DIST(75,A1,B1,TRUE)
What is the probability that a random observation from a normal
distribution will be between 70
and 80?
P[70 < X < 80] = P[X < 80] - P[X < 70] = ?
=NORM.DIST(80,A1,B1,TRUE)-NORM.DIST(70,A1,B1,TRUE)
What proportion of random observations from a normal
distribution should lie within two
standard deviations of the mean?
P[mean - 2*stdev < X < mean + 2*stdev]
= P[X < mean + 2*stdev] - P[X < mean - 2*stdev] = ?
=NORM.DIST(A1+2*B1,A1,B1,TRUE)-NORM.DIST(A1-2*B1,A1,B1,TRUE)
What percentage of observations from a normal population
should exceed the mean by more than 1.96
standard deviations?
100 × P[X > mean + 1.96*stdev]
= 100 × (1 - P[X < mean + 1.96*stdev]) = ?
=100*(1-NORM.DIST(A1+1.96*B1,A1,B1,TRUE))
Results of the above calculations are
a) P{X<90} = 0.75367
b) P{X<75} = 0.00750
c) P{X>75}= 0.99250
d) P{70<X<80}= 0.08156
e) P{mean-2*sd < X < mean + 2 sd} = 0.95450
f) 100*P(X > mean + 1.96 * sd) = 2.49978
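The same six probabilities can be reproduced with Python's standard library, which may help when checking a worksheet. This is a sketch for comparison only. NormalDist.cdf plays the role of NORM.DIST(..., TRUE), and the variable names are illustrative.

```python
from statistics import NormalDist

# Mean 86.7 and standard deviation 4.81, as stored in cells A1 and B1.
mean, sd = 86.7, 4.81
dist = NormalDist(mu=mean, sigma=sd)

p_below_90 = dist.cdf(90)                          # a) P[X < 90]
p_below_75 = dist.cdf(75)                          # b) P[X < 75]
p_above_75 = 1 - dist.cdf(75)                      # c) P[X > 75]
p_70_to_80 = dist.cdf(80) - dist.cdf(70)           # d) P[70 < X < 80]
p_within_2sd = dist.cdf(mean + 2 * sd) - dist.cdf(mean - 2 * sd)   # e)
pct_above_196 = 100 * (1 - dist.cdf(mean + 1.96 * sd))             # f)

for label, value in [("a", p_below_90), ("b", p_below_75), ("c", p_above_75),
                     ("d", p_70_to_80), ("e", p_within_2sd), ("f", pct_above_196)]:
    print(label, round(value, 5))
```

Every question is reduced to one or two "less than" probabilities, which is the only form the cumulative function provides.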
Example 14: How to calculate a chi-squared statistic for a
'goodness-of-fit' test.
Consider this example from Steel and Torrie (1981). A
researcher observed 1178 barley
plants in class 1 (green, non-two-row), 291 in class 2 (green,
two-row), 273 in class 3 (chlorina,
non-two-row), and 156 in class 4 (chlorina, two-row). Test the
hypothesis that distribution in the
four classes is in the ratio of 9 : 3 : 3 : 1.
Step 1. Store the observed frequencies in one column of the
EXCEL worksheet. To
calculate expected frequencies, first convert the numbers in the
expected ratio to proportions
(relative frequencies) by dividing each by 16. Then, multiply
the proportions 9/16, 3/16, 3/16 and
1/16 by the total number of barley plants in order to calculate
expected frequencies.
If the observed frequencies (1178, 291, 273 and 156) are stored
in column A, the
following four formulas should be entered in cells B1, B2, B3
and B4.
Cell B1 =9/16*SUM(A1:A4)
Cell B2 =3/16*SUM(A1:A4)
Cell B3 =3/16*SUM(A1:A4)
Cell B4 =1/16*SUM(A1:A4)
This will give the following table where the first column
contains the observed
frequencies and the second column the frequencies expected if
the observations are distributed
into the four classes in a ratio of 9 : 3 : 3 : 1.
1178 1067.625
291 355.875
273 355.875
156 118.625
Step 2. Calculate the chi-squared statistic as the sum of (O-E)²/E,
where O is an observed frequency and E is the corresponding expected frequency.
Enter the formula =(A1-B1)*(A1-B1)/B1 into cell C1 and copy
it into cells C2,
C3 and C4 (Note that the 1 will change to 2, 3 or 4 as you copy
the formula into each successive
cell). Finally, enter the formula =SUM(C1:C4) into cell C6 to
calculate the chi-squared statistic.
The worksheet should now look like this.
1178 1067.625 11.41097
291 355.875 11.82653
273 355.875 19.29966
156 118.625 11.77568
54.31284
Note that the sum of (O-E) should be zero.
The sum of (O-E)²/E [in cell C6] gives the required
chi-squared statistic, 54.313.
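Steps 1 and 2 amount to a few lines of arithmetic. The Python sketch below repeats them as a cross-check. It is not part of the EXCEL procedure, and the variable names are illustrative.

```python
observed = [1178, 291, 273, 156]   # column A: observed frequencies
ratio = [9, 3, 3, 1]               # hypothesised 9 : 3 : 3 : 1 ratio

total = sum(observed)
# Column B: expected frequency = proportion * total number of plants.
expected = [r / sum(ratio) * total for r in ratio]

# Column C: one (O - E)^2 / E contribution per class.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_squared = sum(contributions)   # cell C6

# The deviations O - E should sum to zero, as noted in the text.
deviation_sum = sum(o - e for o, e in zip(observed, expected))

print(expected)
print([round(c, 5) for c in contributions], round(chi_squared, 5))
```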
Step 3. Compare the calculated statistic to the appropriate
critical value. If the statistic exceeds
the critical value, reject the hypothesis that the observed
frequencies show a good fit to a 9 : 3 : 3
: 1 ratio.
In this example, the four expected frequencies are required to
sum to 1898. Because of this one
restriction, the chi-squared statistic has 4-1 = 3 degrees of
freedom. People will often choose
critical values that correspond to a 5 % significance level (α =
0.05). You may read the critical
value for the chi-squared distribution with 3 degrees of freedom
and a 5% significance level
directly from a table in a statistical textbook (= 7.82) or use the
following EXCEL commands.
To calculate the critical value, one uses α = 0.05 and df = 3 as
the arguments for the
CHISQ.INV.RT function [Excel 2013: Formula Tab – Insert
Function, scroll to choose
CHISQ.INV.RT from the statistical list; the RT suffix selects the right tail].
=CHISQ.INV.RT(0.05,3) = 7.814725
Rather than comparing the calculated test statistic (54.31284) to
the critical value 7.82 and
concluding that the observed frequencies do not fit a 9 : 3 : 3 : 1
ratio, you can also calculate the
p-value, the probability of a chi-squared statistic at least this large if
the null hypothesis is really true. If
the calculated test statistic is in cell C6, use [Excel 2013:
Formula Tab – Insert Function, scroll
to choose CHISQ.DIST.RT from the statistical list; the RT suffix
selects the right tail].
=CHISQ.DIST.RT(C6,3) ≈ 9.6E-12
With the p-value formula written in cell D6, some headings
typed in cells C5 and D5, and
some formatting of cell D6, the worksheet now looks like this.
1178    1067.625    11.41097
291     355.875     11.82653
273     355.875     19.29966
156     118.625     11.77568
                    Chi-square    p-value
                    54.31284      0.0000
Conclusion: Since the calculated value (54.313) exceeds the
critical value (7.8147), reject
the hypothesis of a good fit to a 9 : 3 : 3 : 1 ratio. Equivalently, we
reject the hypothesis that the observed
frequencies show a good fit to a 9 : 3 : 3 : 1 ratio whenever the
p-value is less than the significance level (α = 0.05).
In this case, we reject because the p-value (0.0000 to four
decimal places) is less than 0.05.
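The p-value decision can also be checked without EXCEL. Python's standard library has no chi-squared distribution, so the sketch below relies on a closed form that holds only for exactly 3 degrees of freedom: P(χ² > x) = erfc(√(x/2)) + √(2x/π)·exp(−x/2). The function name chi2_sf_3df is an assumption made for this illustration, and it is not how EXCEL computes CHISQ.DIST.RT.

```python
import math


def chi2_sf_3df(x):
    """Upper-tail probability P(chi-squared > x) for exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


ALPHA = 0.05
chi_squared = 54.31284                 # test statistic from cell C6
p_value = chi2_sf_3df(chi_squared)

# Sanity check: the tail area beyond the 5 % critical value should be about 0.05.
tail_at_critical = chi2_sf_3df(7.814725)

print(p_value, round(tail_at_critical, 4))
print("reject H0" if p_value < ALPHA else "do not reject H0")
```

For any other number of degrees of freedom this shortcut does not apply, and a statistical library or the EXCEL functions should be used instead.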
Example 15: How to calculate a confidence interval for one
mean when σ is known.
Consider an example where the data are stored in worksheet
column A and you are required to
calculate a 90 % confidence interval for the mean of the data
and σ is given.
Step 1. Calculate the mean of the data, as well as the number of
observations.
Let's store these intermediate results in column D along with
identification in column C.
Type 'Mean' in cell C1 and the formula =AVERAGE(A:A) in
cell D1
Type ‘n’ in cell C2 and the formula =COUNT(A:A) in cell D2
Step 2. The standard deviation of the population is given and is
4.0. This value can be
typed into D3 with a title of ‘St.DevP.’ in C3.
a) Since 90 = 100(1 - α), α = 0.10 and α/2 = 0.05. Determine the
critical value (CV) of
the standard normal distribution corresponding to α/2 = 0.05
from a table or using
EXCEL as follows (CV = 1.645) [Excel 2013: Formula Tab –
Insert Function, scroll to
choose NORM.INV from the statistical list].
=NORM.INV(0.95,0,1) = 1.644853
b) Calculate the margin of error as CV multiplied by standard
deviation of the population
and divided by the square root of the sample size.
Type 'E =' in cell C4, and the formula
=NORM.INV(0.95,0,1)*D3/SQRT(D2) in cell D4
c) Calculate the lower and upper limits as mean ± margin of
error.
Type 'LL =' in cell C5, and the formula =D1-D4 in cell D5.
Type 'UL =' in cell C6, and the formula =D1+D4 in cell D6.
29.6    Mean =        30.7285
30.7    n =           35
31.4    St.DevP. =    4.0
31.1    E =           1.1122
25.5    LL =          29.6163
34.6    UL =          31.8407
34
31
34
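The interval can be recomputed from the three summary values as a check. In this Python sketch, NormalDist().inv_cdf stands in for NORM.INV(p,0,1). The mean and n are copied from the worksheet output, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

sample_mean = 30.7285   # worksheet value of =AVERAGE(A:A)
n = 35                  # worksheet value of =COUNT(A:A)
sigma = 4.0             # population standard deviation, given
confidence = 0.90

alpha = 1 - confidence
cv = NormalDist().inv_cdf(1 - alpha / 2)      # like =NORM.INV(0.95,0,1)
margin = cv * sigma / math.sqrt(n)            # margin of error E
lower, upper = sample_mean - margin, sample_mean + margin

print(round(cv, 6), round(margin, 4), round(lower, 4), round(upper, 4))
```

Changing confidence to 0.95 or 0.99 shows how the critical value, and with it the width of the interval, grows with the confidence level.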
Example 16: How to calculate a confidence interval for one
mean when σ is NOT known.
For this example, consider that the sample data are stored in
column A and that you are required
to calculate a 95 % confidence interval for the mean of the
population from which the sample
was taken. This example is very similar to Example 15. There
are two differences. First, you will
have to use an estimate of the population standard deviation
because σ is not given. Second,
we will use the t distribution to find our critical value, with
the function T.INV rather
than NORM.INV, to calculate the margin
of error. Note that T.INV works with the LEFT tail: entering
α/2 therefore always gives the negative
critical value.
Step 1. Calculate n, MEAN and STDEV.S
Let's store these intermediate results in column D along with
identification in column C.
Type 'Mean =' in cell C1, and the formula =AVERAGE(A:A)
in cell D1
Type 'St. Dev. =' in cell C2, and the formula =STDEV.S(A:A)
in cell D2
Type 'n =' in cell C3, and the formula =COUNT(A:A) in cell
D3
Step 2. Because we have sample data for the standard deviation,
we will determine the
α/2 critical value (CV) from the t-distribution with 24 - 1 = 23
degrees of freedom. For a
95% confidence interval, α/2 = 0.025. Read the value from a
table of critical values for
the t-distribution (= 2.069) or calculate it using EXCEL T.INV
function. [Excel 2013:
Formula Tab – Insert Function, scroll to choose T.INV off of
statistical list]. Note that
for questions which have sample sizes of 76 or larger, we must
use the T.INV
function to get the correct CV (ISLeX will mark an
approximation as incorrect).
=T.INV(0.025,23) = -2.068655
Step 3. Calculate margin of error = E = CV * STDEV.S /
SQRT(n)
Type 'E = ' in cell C4, and the formula
=T.INV(0.025,D3-1)*D2/SQRT(D3) in cell D4
Step 4. Calculate lower limit = mean – margin of error and
upper limit = mean + margin
of error.
Type 'LL =' in cell C5, and the formula =D1+D4 in cell D5 (it is
+ because the E is
calculated using the critical value in the left tail and is a
negative number).
Type 'UL =' in cell C6, and the formula =D1-D4 in cell D6. (it
is - because the E is
calculated using the critical value in the left tail and is a
negative number).
29.6 Mean = 30.9875
30.7 St. dev. = 2.788465
31.4 n = 24
31.1 E = -1.17747
25.5 LL = 29.8100
34.6 UL = 32.1649
34
31
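The sign bookkeeping in Step 4 can be avoided by taking the absolute value of the critical value first. The Python sketch below illustrates this. Python's standard library has no inverse t function, so the critical value is copied from the worksheet rather than computed, and the variable names are illustrative.

```python
import math

sample_mean = 30.9875   # worksheet value of =AVERAGE(A:A)
s = 2.788465            # worksheet value of =STDEV.S(A:A)
n = 24                  # worksheet value of =COUNT(A:A)
t_left = -2.068655      # =T.INV(0.025,23), quoted from the worksheet

# abs() removes the left-tail sign, so the limits follow the usual
# "mean minus E" and "mean plus E" pattern.
margin = abs(t_left) * s / math.sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

print(round(margin, 5), round(lower, 4), round(upper, 4))
```

In EXCEL the same effect could be obtained by wrapping the T.INV call in ABS, in which case LL would be =D1-D4 and UL would be =D1+D4.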
Example 17: How to calculate a confidence interval for a
proportion
A proportion is the number of observations in one class
expressed as a proportion of the
total number of observations.
Consider that there are n = 978 observations of which 123 are
in the first class and the
remaining 855 are in the second class. Further, consider the
proportion p̂ = 123/978 = 0.12577
that are in the first class.
This example shows how to calculate a 95 % confidence
interval for the proportion that
are in the first class in the population from which these 978
observations were randomly taken.
The following steps can be used to calculate a confidence
interval for a proportion.
a) Calculate the estimated standard error of the proportion,
$s_{\hat{p}} = \sqrt{\hat{p}\,\hat{q}/n}$, where $\hat{q} = 1 - \hat{p}$.
In this example, let's use columns A and B of a new worksheet
for the calculations.
Type ' p̂ = ' in cell A1, and the formula =123/978 in cell B1
Type 'st.dev. = ' in cell A2, and the formula
=SQRT(B1*(1-B1)/978) in cell B2
b) Get the critical value for a standard normal (z) distribution
for confidence level
1 - α = 0.95 or α/2 = 0.025. Use NORM.INV with 1 - α/2 =
0.975. Calculate the margin of error
by multiplying the critical value by the standard error.
Type 'CV =' in cell A3, and the formula
=NORM.INV(0.975,0,1) in cell B3.
Type 'E =' in cell A4, and the formula =B2*B3 in cell B4.
c) Calculate lower limit = estimate – margin of error
and upper limit = estimate + margin of error.
Type 'LL = ' in cell A5, and the formula =B1-B4 in cell B5.
Type 'UL =' in cell A6, and the formula =B1+B4 in cell B6.
p̂ = 0.125767
st.dev. = 0.010603
cv = 1.959961
E = 0.020781
LL = 0.104985
UL = 0.146548
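Steps a) to c) can be verified with the short Python sketch below. It is offered only as a cross-check of the worksheet. NormalDist().inv_cdf plays the role of NORM.INV(p,0,1), and the variable names are illustrative.

```python
import math
from statistics import NormalDist

successes, n = 123, 978            # observations in the first class, total

p_hat = successes / n                          # cell B1
se = math.sqrt(p_hat * (1 - p_hat) / n)        # cell B2: standard error
cv = NormalDist().inv_cdf(0.975)               # cell B3: 1 - alpha/2 = 0.975
margin = cv * se                               # cell B4: margin of error E
lower, upper = p_hat - margin, p_hat + margin  # cells B5 and B6

print(round(p_hat, 6), round(se, 6), round(margin, 6))
print(round(lower, 6), round(upper, 6))
```

The critical value may differ from the worksheet in the sixth decimal place, depending on the software version, without affecting the limits at the precision shown.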
Example 18: How to calculate a test of hypothesis concerning
one mean when σ is NOT known.
In tests of hypothesis, we are interested in evaluating
assertions about population
parameters in light of the evidence we have in a sample taken
from that population.
In this example, we look at hypotheses concerning the mean of
a population.
Step 1. Make an assertion, the null hypothesis, that the mean is
equal to some value.
Consider the possible alternative(s).
H0 : population mean = 3
H1 : population mean > 3 {one-tailed (right) alternative}
or population mean < 3 {one-tailed (left) alternative}
or population mean ≠ 3 {two-tailed alternative}
In this example, consider H0: mean = 3 and H1 : mean > 3. This
will be a right-tailed test.
Step 2. Calculate the sample mean, size and standard deviation.
Suppose that the data is stored in column A of an EXCEL
worksheet. Let's use columns C and D
to store identification and intermediate and final results.
Type 'Mean =' in cell C1, and the formula =AVERAGE(A:A)
in cell D1.
Type 's =' in cell C2, and the formula =STDEV.S(A:A) in cell
D2.
Type 'n =' in cell C3, and the formula =COUNT(A:A) in cell
D3.
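Step 3. Calculate the test statistic t as:
$$t_{\text{calculated}} = \frac{\bar{x} - \mu_0}{s_{\bar{x}}}, \qquad s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
where $\mu_0$ is the value of the mean from the null hypothesis.
Type 't<calc> = ' in cell C4, and the formula
=(D1-3)/(D2/SQRT(D3)) in cell D4.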
Step 4. Calculate the critical value of the t-distribution for the
degrees of freedom appropriate for
this sample, for the desired significance level (α), and for the
appropriate alternative hypothesis.
Consider a right-tailed test at α = 0.05.
Type 't<table> =' in cell C5, and the formula
=T.INV(0.95,D3-1) in cell D5.
Type 'p-value =' in cell C6, and the formula
=T.DIST.RT(D4,D3-1) in cell D6.
Mean = 4.88
s = 0.28
n = 22
t<calc> = 1.431491
t<table> = 1.720744
p-value = 0.083503
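Note that T.INV(0.95,D3-1) uses 0.95 (that is, 1 - α) because the alternative
hypothesis, in this case, is in the right tail.
Likewise, T.DIST.RT(D4,D3-1) uses the .RT form because the alternative
hypothesis is in the right tail. T.DIST.RT takes only two arguments,
the test statistic and the degrees of freedom.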
Step 5. Compare the calculated test statistic to the critical
value and decide whether or not to
reject the null hypothesis that the population mean equals the
specified value.
In this case, 1.431491 is less than 1.720744 and we do not
reject the null hypothesis that
the population mean is equal to 3. Equivalently, the p-value of
0.083503 is
greater than α = 0.05, so we do not
reject the null hypothesis.
What if the alternative hypothesis was less than 3?
In the case of a one-tailed (left) alternative:
H0 : population mean = 3
H1 : population mean < 3 {one-tailed (left) alternative}
If the alternative hypothesis is that the mean is really less than
3, we would compare the
test statistic to a critical value of -1.7207 (the negative of that
used for the one-tailed upper
alternative). We could find the critical value for the left tail by
using T.INV(0.05,D3-1).
We would reject the null hypothesis only if the test statistic was
more negative than the lower
critical value. In this example, 1.431491 is not less than
-1.7207 and we do not reject the null
hypothesis. The p-value for the left tail is the probability below
the test statistic,
T.DIST(1.431491,22-1,TRUE) = 1 - 0.083503 = 0.916497. This p-value
is greater than α = 0.05, so we do not reject the
null hypothesis.
What if the alternative hypothesis was not equal to 3?
In the case of a two-tailed alternative:
H0 : population mean = 3
H1 : population mean ≠ 3 {two-tailed alternative}
For the two-tailed alternative, we need two critical values (one
for each tail). Using T.INV.2T
with α = 0.05 will give the positive critical value for a two-
tailed test with α appropriately split
into both tails.
=T.INV.2T(0.05,D3-1) = 2.079614
The lower critical value is the negative of the upper critical
value, i. e. -2.079614.
The decision rule in this case is to reject the null hypothesis if
the test statistic is smaller
than the lower critical value or greater than the upper critical
value. In this example, the test
statistic is between the lower critical value and the upper
critical value for a two-tailed test and
we do not reject the null hypothesis. To find the p-value
for both tails, use
T.DIST.2T(1.431491,22-1) = 0.16700. The p-value for a
two-tailed test (0.16700) is
greater than α = 0.05, so we do not reject the null hypothesis.
Example 19: Large sample confidence intervals and tests of
hypothesis for differences between
two means when population variances are unknown and unequal.
When we have large sample sizes with unknown variances that
are unequal, we can use
the normal distribution as an approximation to the t distribution.
For this example, 49 individuals with anorexia nervosa were
bulimic and had an average
depression score of 30.0 (standard deviation = 5.9) while 56
individuals were non-bulimic and
had an average depression score of 27.0 (standard deviation =
5.4).
i) Calculate a 90 % confidence interval for the difference in
depression score.
a) Record sample sizes, means and standard deviations as
constants in EXCEL cells.
Type 'Sample 1:' in cell A1.
Type 'Mean1 =' in cell B2, and the number 30.0 in cell C2.
Type 's1 =' in cell B3, and the number 5.9 in cell C3.
Type 'n1 =' in cell B4, and the number 49 in cell C4.
Type 'Sample 2:' in cell A5.
Type 'Mean2 =' in cell B6, and the number 27.0 in cell C6.
Type 's2 =' in cell B7, and the number 5.4 in cell C7.
Type 'n2 =' in cell B8, and the number 56 in cell C8.
b) Calculate the standard error of the difference between the
two sample means.
Type 'se<diff> =' in cell B10,
and the formula =SQRT(C3*C3/C4+C7*C7/C8) in cell C10.
c) Determine the critical value for a 90% confidence interval.
In this case we will use the
standard normal distribution to approximate the t value because
the sample sizes are so large.
For 100(1-α)% CI, use the NORM.INV function with 1-α/2 (here 1 - 0.10/2 = 0.95).
Type 'cv = ' in cell B11, and the formula
=NORM.INV(0.95,0,1) in cell C11.
d) Calculate margin of error of difference = critical value ×
standard error.
Type 'E =' in cell B12, and the formula =C11*C10 in cell C12.
e) Calculate lower limit = difference – margin of error
and upper limit = difference + margin of error.
Type 'LL =' in cell B13, and the formula =C2-C6-C12 in cell
C13.
Type 'UL =' in cell B14, and the formula =C2-C6+C12 in cell
C14.
The following is a copy of the first 14 rows of columns A, B
and C
Bulimics
                Mean1 =       30
                s1 =          5.9
                n1 =          49
Non-bulimics
                Mean2 =       27
                s2 =          5.4
                n2 =          56

                se<diff> =    1.10956
                cv =          1.644853
                E =           1.825062
                LL =          1.174938
                UL =          4.825062
ii) Test the hypothesis that the mean depression scores for the
two groups are equal against an
alternative that they are not equal.
a) Calculate the test statistic (z) by dividing the difference
between the means minus zero
(for no difference from the null hypothesis) by the standard
error of the difference.
=(C2-C6-0)/C10 = 2.70378
b) Compare the calculated test statistic to a critical value that
correctly reflects your
choice of significance level and the form of the alternative
hypothesis.
For a two-tailed test, use NORM.INV with 1 - α/2. For one-
tailed test, use NORM.INV
with 1 - α. In this example, consider α = 0.05 and a two-tailed
test.
=NORM.INV(0.975,0,1) = 1.959961
Since the test statistic (2.70378) is greater than the upper
critical value for the two-tailed
test, reject the null hypothesis that the mean depression score is the
same for both groups.
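Both parts of this example depend on the same standard error, which the Python sketch below makes explicit. It is a cross-check only. The two-tailed p-value at the end is an addition that the worksheet does not compute, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

mean1, s1, n1 = 30.0, 5.9, 49    # bulimic group
mean2, s2, n2 = 27.0, 5.4, 56    # non-bulimic group
z_dist = NormalDist()            # standard normal distribution

diff = mean1 - mean2
se_diff = math.sqrt(s1**2 / n1 + s2**2 / n2)     # cell C10

# i) 90 % confidence interval: critical value at 1 - alpha/2 = 0.95
margin = z_dist.inv_cdf(0.95) * se_diff          # cell C12
lower, upper = diff - margin, diff + margin      # cells C13 and C14

# ii) two-tailed test of "no difference" at alpha = 0.05
z_calc = (diff - 0) / se_diff
cv = z_dist.inv_cdf(0.975)
p_value = 2 * (1 - z_dist.cdf(abs(z_calc)))      # not computed in the worksheet
reject = abs(z_calc) > cv

print(round(se_diff, 5), round(lower, 4), round(upper, 4))
print(round(z_calc, 5), reject)
```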
Example 20: Confidence intervals and tests of hypothesis for
differences between two means for
independent samples: population variances are unknown but
equal.
For this example, 25 men had an average decrease in systolic
blood pressure of 8.9 units
(standard deviation = 6.2) due to transcendental meditation. For
25 women, the average decrease
was 5.0 units (standard deviation = 6.0).
i) Calculate a 95 % confidence interval for the difference in
average decrease.
a) Record sample sizes, means and standard deviations as
EXCEL constants.
Type 'Men:' in cell A1.
Type 'Mean1 =' in cell B2, and the number 8.9 in cell C2.
Type 's1 =' in cell B3, and the number 6.2 in cell C3.
Type 'n1 =' in cell B4, and the number 25 in cell C4.
Type 'Women:' in cell A5.
Type 'Mean2 =' in cell B6, and the number 5.0 in cell C6.
Type 's2 =' in cell B7, and the number 6.0 in cell C7.
Type 'n2 =' in cell B8, and the number 25 in cell C8.
b) Calculate the pooled variance for the two samples (assumed
to be same in both
populations).
Type 'Var(pooled) = ' in cell B10,
and the formula =((C4-1)*C3*C3+(C8-1)*C7*C7)/(C4-1+C8-1) in cell C10.
c) Calculate the standard error of the difference between the
two means.
Type 'se<diff> =' in cell B11, and the formula
=SQRT(C10*(1/C4+1/C8)) in cell C11.
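The pooled variance and the standard error of the difference are
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}, \qquad s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where $n_1$ and $n_2$ are the sizes of samples 1 and 2, and $s_1$ and $s_2$ are the standard deviations of samples 1 and 2.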
d) Calculate the α/2 critical value for the t-distribution with (n1
- 1 + n2 - 1) degrees of
freedom because the population variances are equal. Using
T.INV.2T with α will give the
positive critical value for a two-tailed test with α appropriately
split into both tails.
=T.INV.2T(0.05,C4+C8-2) = 2.01064
Type 'cv = ' in cell B12, and the formula
=T.INV.2T(0.05,C4+C8-2) in cell C12
e) Calculate the margin of error = critical value × standard
error of the difference.
Type 'E =' in cell B13, and the formula =C12*C11 in cell C13.
f) Calculate lower limit = difference between means – margin
of error
and upper limit = difference between means + margin of error.
Type 'LL =' in cell B14, and the formula =C2-C6-C13 in cell
C14.
Type 'UL =' in cell B15, and the formula =C2-C6+C13 in cell
C15.
Men:
Mean1 = 8.9
s1 = 6.2
n1 = 25
Women:
Mean2 = 5.0
s2 = 6.0
n2 = 25
Var(pooled) = 37.22
se<diff> = 1.7256
cv = 2.0106
E = 3.4695
LL = 0.4305
UL = 7.3695
On average, transcendental meditation resulted in a greater
decrease (3.9 units) in blood
pressure for men than for women. We are 95% confident that
the population difference is
between 0.4 and 7.4 units.
ii) Test the hypothesis that the decrease in blood pressure is the
same in men as in women.
Use a 5% significance level and a two-tailed alternative
hypothesis.
a) Calculate the test statistic as difference/standard error of
difference.
=((C2-C6)-0)/C11 = 2.26012
b) Compare to critical values from the t-distribution with n1 +
n2 - 2 = 48 degrees of
freedom and α = 0.05.
=T.INV.2T(0.05,C4+C8-2) = 2.01064; therefore, the critical
values for a two-tailed test are -2.01064 and 2.01064.
Since test statistic = 2.26012 is greater than the upper critical
value of 2.01064, reject the
null hypothesis that the decrease in blood pressure is the same
for both sexes.
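The pooled-variance calculation can be traced in the Python sketch below. Python's standard library has no inverse t function, so the critical value is copied from the worksheet rather than computed. The variable names are illustrative.

```python
import math

mean1, s1, n1 = 8.9, 6.2, 25     # men
mean2, s2, n2 = 5.0, 6.0, 25     # women
t_crit = 2.01064                 # =T.INV.2T(0.05,48), quoted from the worksheet

df = (n1 - 1) + (n2 - 1)
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df     # cell C10
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))          # cell C11

# i) 95 % confidence interval for the difference between the means
diff = mean1 - mean2
margin = t_crit * se_diff                                    # cell C13
lower, upper = diff - margin, diff + margin                  # cells C14 and C15

# ii) two-tailed test of "no difference" at alpha = 0.05
t_calc = (diff - 0) / se_diff
reject = abs(t_calc) > t_crit

print(df, round(pooled_var, 2), round(se_diff, 4))
print(round(lower, 4), round(upper, 4), round(t_calc, 5), reject)
```

Because the two samples are the same size, the pooled variance is simply the average of the two sample variances. With unequal sample sizes the larger sample would carry more weight.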
Example 21: Large sample confidence intervals and tests of
hypothesis for differences between
two proportions.
Of 1500 people from a high-income group, 62.4 % were
registered to vote. Of 1500 in a
low-income group, 58.2% were registered to vote.
i) Calculate a 95 % confidence interval for the difference in
voter registration between
high-income and low-income groups.
a) Store n1, p1, n2 and p2 as EXCEL constants.
Type 'n1 =' in cell A1, and '1500' in cell B1.
Type ' p̂ 1 =' in cell A2, and '0.624' in cell B2.
Type 'n2 =' in cell A3, and '1500' in cell B3.
Type ' p̂ 2 =' in cell A4, and '0.582' in cell B4.
b) Calculate the standard error of the difference between the two
sample proportions:
Type 'sd<diff> = ' in cell A6,
and the formula =SQRT(B2*(1-B2)/B1+B4*(1-B4)/B3) in cell
B6.
c) Determine the critical value of the standard normal
distribution corresponding to α/2 =
0.025 and 1-α/2 = 0.975 [required for a (1 - α) = 0.95
confidence interval].
Type 'cv =' in cell A7, and the formula
=NORM.INV(0.975,0,1) in cell B7.
d) Calculate margin of error = critical value × standard error of
the difference.
Type 'E =' in cell A8, and the formula =B7*B6 in cell B8.
e) Calculate lower limit = difference in proportion – margin of
error
and upper limit = difference in proportion + margin of error.
Type 'LL =' in cell A9, and the formula =B2-B4-B8 in cell B9.
Type 'UL =' in cell A10, and the formula =B2-B4+B8 in cell
B10.
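The standard error calculated in step b) is
$$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$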
n1 = 1500
p1 = 0.624
n2 = 1500
p2 = 0.582
sd<diff> = 0.017849
cv = 1.959961
E = 0.034984
LL = 0.007016
UL = 0.076984
Using a 95% confidence interval, the difference in voter
registration between high-
income and low-income groups is between 0.007 and 0.077 (0.7
to 7.7 %).
ii) Test the hypothesis that the high-income group has a higher
voter registration than the
low-income group. Use α = 0.05.
a) The test statistic must be calculated as if the null hypothesis
were true. Thus, we need
to calculate the average proportion of voter registration.
=(B1*B2+B3*B4)/(B1+B3) = 0.603000
Type 'p<pooled> =' in cell A12, and the formula
=(B1*B2+B3*B4)/(B1+B3) in cell B12
b) Use the pooled proportion to calculate new standard error of
a difference.
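$$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
where $\bar{p}$ is the pooled proportion from step a).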
Type 'sd =' in cell A13,
and the formula =SQRT(B12*(1-B12)*(1/B1+1/B3)) in cell
B13.
c) Calculate the test statistic
Type 'z<calc> =' in cell A14, and the formula =(B2-B4-0)/B13
in cell B14.
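The formulas used in steps a) and c) are
$$\bar{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}, \qquad z = \frac{\hat{p}_1 - \hat{p}_2 - 0}{sd}$$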
d) Calculate the critical value for a one-tailed (upper) test at α
= 0.05;
Type 'cv =' in cell A15, and the formula =NORM.INV(0.95,0,1)
in cell B15.
As an alternative, the p-value can be calculated
Type 'p-value =' in cell A16,
and the formula =1-NORM.DIST(B14,0,1,TRUE) in cell B16.
The following are the results.
p<pooled> = 0.603
sd = 0.017866
z<calc> = 2.350856
cv = 1.644853
p-value = 0.009365
Since z = 2.35086 is greater than zα = 1.6449, reject the null
hypothesis. Equivalently, because the p-value (0.009365) is less
than α = 0.05, the null
hypothesis is rejected.
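The two parts of this example use different standard errors: an unpooled one for the confidence interval and a pooled one for the test. The Python sketch below shows both side by side as a cross-check of the worksheet. The variable names are illustrative.

```python
import math
from statistics import NormalDist

n1, p1 = 1500, 0.624     # high-income group
n2, p2 = 1500, 0.582     # low-income group
z_dist = NormalDist()    # standard normal distribution
diff = p1 - p2

# i) 95 % confidence interval, using the unpooled standard error
se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)   # cell B6
margin = z_dist.inv_cdf(0.975) * se_unpooled                        # cell B8
lower, upper = diff - margin, diff + margin                         # cells B9, B10

# ii) right-tailed test, using the pooled standard error (as if H0 were true)
p_pooled = (n1 * p1 + n2 * p2) / (n1 + n2)                          # cell B12
se_pooled = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))  # cell B13
z_calc = (diff - 0) / se_pooled                                     # cell B14
cv = z_dist.inv_cdf(0.95)                                           # cell B15
p_value = 1 - z_dist.cdf(z_calc)                                    # cell B16

print(round(lower, 6), round(upper, 6))
print(round(z_calc, 5), round(p_value, 6), z_calc > cv)
```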
Example 22: How to carry out a one-way analysis of variance.
A one-way analysis of variance should be used where data can
be grouped by only one
criterion. This type of design is sometimes called a completely
random design because
treatments are assigned randomly to all available experimental
units.
For this example, consider the mercury concentration
(micrograms per gram of body
weight) of fish living 5.5 km upstream from a chloralkali plant
(treatment 1), 3.7 km downstream
from the plant (treatment 2), 21 km downstream (treatment 3),
or 133 km downstream (treatment
4). Consider that the treatment number for each of 40 fish has
been read into column A and that
the mercury concentration has been read into column B.
The ANOVA procedure should be used to carry out this
analysis of variance.
In this example, the variable to be analyzed, mercury
concentration, is stored in column B
and the classification variable (treatment) is stored in column A
on an EXCEL worksheet.
Before analysis can begin, it is necessary to copy data for the
different treatments
into different columns of the EXCEL worksheet. In this
example, the data for the four
treatments is stored in columns F, G, H and I. The labels 'Trt 1'
in cell F1, 'Trt 2' in cell G1, 'Trt
3' in cell H1, and 'Trt 4' in cell I1 are added. Then select all the
data in column B that belongs to
Trt 1, and then use Edit->Paste Special to paste the values in
cells F2 through F11. The same
procedure is repeated for treatments 2, 3 and 4. Prior to analysis
of variance, the data is arranged
as follows:
Trt 1 Trt 2 Trt 3 Trt 4
23.84 26.92 29.20 32.73
23.58 26.68 29.70 32.88
23.42 26.91 29.11 32.90
23.74 26.26 29.02 32.08
23.23 26.72 29.19 32.80
23.01 26.05 29.06 32.96
23.14 26.12 29.39 32.22
23.31 26.86 29.68 32.31
23.02 26.87 29.69 32.13
23.79 26.31 29.78 32.35
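The copy-and-paste step above amounts to regrouping a two-column (treatment, value) listing into one column per treatment. The following is a minimal Python sketch of that regrouping, for readers who want to check the layout outside EXCEL. Only the first two fish of each treatment are listed, and the variable names are illustrative assumptions, not part of the manual.

```python
from collections import defaultdict

# (treatment, mercury) pairs as they would sit in columns A and B;
# only the first two fish of each treatment are listed here
long_rows = [
    (1, 23.84), (1, 23.58),
    (2, 26.92), (2, 26.68),
    (3, 29.20), (3, 29.70),
    (4, 32.73), (4, 32.88),
]

# Collect the values for each treatment, preserving their order
wide = defaultdict(list)
for treatment, mercury in long_rows:
    wide[f"Trt {treatment}"].append(mercury)

# Print one labelled column per treatment, like columns F to I
labels = sorted(wide)
print("  ".join(labels))
for row in zip(*(wide[label] for label in labels)):
    print("  ".join(f"{value:5.2f}" for value in row))
```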
To perform a one-way ANOVA, choose Anova: Single Factor
to open the single factor
anova dialog box. [Excel 2013: Data Tab – Data Analysis –
Anova: Single Factor]
Set the Input range to $F$1:$I$11
Grouped by to Columns and select Labels in first row.
Click OK.
The following results appear on a new worksheet.
Anova: Single Factor
SUMMARY
Groups Count Sum Average Variance
Trt 1 10 234.0743 23.40743 0.099489
Trt 2 10 265.6906 26.56906 0.120285
Trt 3 10 293.8229 29.38229 0.090206
Trt 4 10 325.3521 32.53521 0.121919
ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        456.1535    3   152.0512   1408.211   2.34E-37   2.866265
Within Groups         3.887088   36   0.107975
Total                 460.0405   39
The degrees of freedom (DF) for differences among the four
treatment groups is equal to
one less than the number of treatments (4 – 1 = 3). The degrees
of freedom for error is equal to
the sum over four treatments of the number of individuals in
each treatment minus one [(10 - 1)
+ (10 - 1) + (10 - 1) + (10 - 1) = 36]. The degrees of freedom
for the total sum of squares is equal
to the total number of observations minus one (39 = 40 - 1).
The mean square for treatment groups is equal to the sum of
squares for treatment groups
divided by its degrees of freedom (152.0512 = 456.1535/3).
This mean square is a measure of
variation among the four groups of fish. The error mean square
is equal to the error sum of
squares divided by its degrees of freedom (0.107975 =
3.887088/36). It measures the average
(pooled) variation among individuals within treatments. The
error mean square is the estimate of
pooled variance and will be used for calculating confidence
intervals or tests of hypothesis about
treatment means.
The F-ratio is calculated by dividing the treatment mean square
by the error mean square
(1408.211 = 152.0512/0.107975). The F-ratio is the test statistic
for testing the null hypothesis
that all four treatments have the same mean. The alternative
hypothesis is that not all four
treatments have the same mean.
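The arithmetic in the ANOVA table can be checked from the SUMMARY table alone. The sketch below is a minimal Python illustration, assuming only the counts, averages and variances printed by EXCEL; because those figures are rounded, the results agree with the table to about three or four significant decimals rather than exactly.

```python
# Summary statistics copied from the EXCEL SUMMARY table
counts = [10, 10, 10, 10]
means = [23.40743, 26.56906, 29.38229, 32.53521]
variances = [0.099489, 0.120285, 0.090206, 0.121919]

n_total = sum(counts)
grand_mean = sum(n * m for n, m in zip(counts, means)) / n_total

# Between-groups SS: group size times squared deviation of each group mean
ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(counts, means))
# Within-groups SS: (n - 1) times each sample variance, summed over groups
ss_within = sum((n - 1) * v for n, v in zip(counts, variances))

df_between = len(counts) - 1        # number of treatments minus one
df_within = n_total - len(counts)   # sum over treatments of (n - 1)

ms_between = ss_between / df_between
ms_within = ss_within / df_within   # error mean square (pooled variance)
f_ratio = ms_between / ms_within

print(f"SS between = {ss_between:.4f}, df = {df_between}, MS = {ms_between:.4f}")
print(f"SS within  = {ss_within:.4f}, df = {df_within}, MS = {ms_within:.6f}")
print(f"F = {f_ratio:.2f}")
```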
NOTE
The alternative hypothesis sounds like a two-tailed hypothesis.
However, only the upper
tail of the F distribution is considered when evaluating the
significance of an F statistic. Only the
upper tail is used because the F statistic is calculated from
squares of differences. Squares of
differences will be positive regardless of whether the
differences are positive or negative.
The F-statistic has a numerator degrees of freedom equal to the
degrees of freedom that
correspond to the numerator mean square (3, in this example)
and a denominator degrees of
freedom equal to the degrees of freedom associated with the
error (36, in this example). To test
the hypothesis that all treatments have the same mean, one
should compare the calculated F-
statistic to the critical value of the F-distribution corresponding
to 3 and 36 degrees of freedom
and a suitable significance level (0.01 or 0.05 are most
common).
The critical value of the F-distribution can be determined by
reference to a statistical
table. EXCEL gives the correct critical F-value for the test.
Since the calculated F-value (1408.211) greatly exceeds the
critical value (2.8663), we reject the null hypothesis and
conclude that there were differences among the treatments in
average mercury concentration.
An alternative to comparing the calculated F-value to a critical
value is to compare the p-
value (2.34E-37 = 0.0000 to four decimal places) to the
significance level (α = 0.05). Since
0.0000 is much less than 0.05, we reject the null hypothesis.
Example 23: Is MIA.
Example 24: How to use information from analysis of variance
to calculate confidence intervals
or test hypotheses about treatment means (including least
significant difference) using data from
Example 22.
For these examples, consider an analysis of variance {Example
22} that has an error
mean square of 0.107975 with 36 degrees of freedom. Consider
treatment 2 with a mean (of 10
observations) equal to 26.569 and treatment 3 (also 10
observations) with a mean of 29.382.
IMPORTANT
Confidence intervals and tests of hypotheses about means in an
analysis of variance will
always use the error mean square as the estimate of the pooled
variance.
a) Store the error degrees of freedom, the error mean square
(pooled variance) and means in
EXCEL worksheet cells.
Type 'df =' in cell A1, and '36' in cell B1.
Type 'ems =' in cell A2, and '0.107975' in cell B2.
Type 't2 =' in cell A3, and '26.569' in cell B3.
Type 't3 =' in cell A4, and '29.382' in cell B4.
df = 36
ems = 0.107975
t2 = 26.569
t3 = 29.382
b) Calculate a 90% confidence limit for the mean of treatment 2
using the method when σ is not
known as described in example 16.
Standard error of one mean = square root of (error mean
square/sample size).
Type 'sd =' in cell A6, and the formula =SQRT(B2/10) in cell
B6.
Get critical value for α = 1 - 0.90 = 0.10 and error degrees of
freedom.
Type 'cv =' in cell A7, and the formula =T.INV.2T(0.10,36) in
cell B7.
Limits = mean ± critical value x standard error of mean.
Type 'LL =' in cell A8, and the formula =B3-B6*B7 in cell B8,
Type 'UL =' in cell A9, and the formula =B3+B6*B7 in cell
B9.
sd = 0.103911
cv = 1.688297
LL = 26.39357
UL = 26.74443
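The same limits can be reproduced outside the worksheet. This is a minimal Python sketch under one stated assumption: the critical value 1.688297 is copied from the worksheet result of T.INV.2T(0.10,36) rather than recomputed, because the Python standard library has no t quantile function.

```python
import math

ems = 0.107975      # error mean square (pooled variance) from the ANOVA table
n = 10              # observations behind the treatment mean
mean_t2 = 26.569    # mean of treatment 2
t_crit = 1.688297   # T.INV.2T(0.10, 36), copied from the worksheet (assumed)

# Standard error of one mean = sqrt(error mean square / sample size)
se = math.sqrt(ems / n)

# Limits = mean +/- critical value x standard error
lower = mean_t2 - t_crit * se
upper = mean_t2 + t_crit * se
print(f"se = {se:.6f}, LL = {lower:.5f}, UL = {upper:.5f}")
```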
c) Test the hypothesis that the two means are not different using
the method described in
example 20. Use α = 0.05 and consider a two-tailed alternative
hypothesis.
Type 't =' in cell A11, and the formula
=(B3-B4)/SQRT(B2/10+B2/10) in cell B11.
Type 'cv =' in cell A12, and the formula =T.INV.2T(0.05,36) in
cell B12.
t = -19.1423
cv = 2.028091
Since the calculated test statistic (-19.1423) is outside the
range of -2.028091 to
2.028091, we reject the hypothesis that the two means are equal.
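As a cross-check of the worksheet formula, the test statistic can be computed in a few lines of Python. This is a sketch only; the critical value is copied from the worksheet (T.INV.2T(0.05,36)) and is not recomputed here.

```python
import math

ems = 0.107975                    # error mean square used as the pooled variance
n2 = n3 = 10                      # observations per treatment
mean_t2, mean_t3 = 26.569, 29.382
t_crit = 2.028091                 # T.INV.2T(0.05, 36), copied from the worksheet (assumed)

# t = (mean 2 - mean 3) / sqrt(s^2/n2 + s^2/n3), with s^2 = error mean square
t_stat = (mean_t2 - mean_t3) / math.sqrt(ems / n2 + ems / n3)

# Two-tailed test: reject when the statistic falls outside (-t_crit, +t_crit)
reject = abs(t_stat) > t_crit
print(f"t = {t_stat:.4f}, reject H0: {reject}")
```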
d) The least significant difference is the margin of error for a
confidence interval for the
difference between two means, provided both means are based
on the same sample size. To
calculate an LSD(α), we use the Error Mean Square and the
sample size (remember all
sample sizes are the same).
$$\mathrm{LSD}(\alpha) = t_{\alpha/2,\,\mathrm{edf}} \sqrt{\frac{2 \times \mathrm{ErrorMeanSquare}}{n}}$$
Type 'LSD(0.05) =' in cell A14, and the formula
=T.INV.2T(0.05,36)*SQRT(2*B2/10) in cell B14.
LSD(0.05) = 0.298033
If the absolute value of the difference between two means is
greater than the least
significant difference, we reject the hypothesis that the two
means are equal.
For treatments 2 and 3, the difference is 26.569 - 29.382 = -
2.813 with absolute value
2.813. Since 2.813 is greater than LSD(0.05) = 0.298033, we
reject the hypothesis that
treatments 2 and 3 are equal.
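One LSD value can be reused for every pair of treatments, which is its main convenience. The sketch below applies it to all six pairs, assuming the four treatment averages printed in the SUMMARY table of example 22 and the worksheet critical value; the variable names are illustrative.

```python
import math
from itertools import combinations

ems = 0.107975      # error mean square from example 22
n = 10              # observations per treatment mean
t_crit = 2.028091   # T.INV.2T(0.05, 36), copied from the worksheet (assumed)

# LSD = critical t value x sqrt(2 x error mean square / n)
lsd = t_crit * math.sqrt(2 * ems / n)
print(f"LSD(0.05) = {lsd:.6f}")

# Treatment averages from the SUMMARY table of example 22
means = {"Trt 1": 23.40743, "Trt 2": 26.56906, "Trt 3": 29.38229, "Trt 4": 32.53521}

# Compare the absolute difference of every pair of means with the LSD
results = {}
for (a, mean_a), (b, mean_b) in combinations(means.items(), 2):
    diff = abs(mean_a - mean_b)
    results[(a, b)] = diff > lsd
    verdict = "different" if diff > lsd else "not different"
    print(f"{a} vs {b}: |difference| = {diff:.3f} -> {verdict}")
```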
Test statistic used in part c):
$$t = \frac{\bar{x}_2 - \bar{x}_3}{\sqrt{\frac{s^2}{n_2} + \frac{s^2}{n_3}}}$$
Example 25: How to perform a two-way analysis of variance.
When data are classified according to two criteria, and when
there is more than one
observation in each combination of the two criteria, a two-way
analysis of variance includes a
term for the interaction between the two classification factors.
Data for this example consist of
the number of diatoms found in a stream at each of two
locations (1 = upstream, 2 = downstream
from a water treatment plant) with sampling occurring in three
different weeks. For each
observation, the site designation is stored in column A, the
week designation in column B, and
the number of diatoms in column C.
Site Week Number
1 1 689
1 1 756
1 2 831
1 2 916
1 3 558
1 3 423
2 1 204
2 1 229
2 2 56
2 2 73
2 3 34
2 3 78
First, arrange the data in a two-way table like this.
Site 1 Site 2
Week 1 689 204
756 229
Week2 831 56
916 73
Week3 558 34
423 78
To perform a two-way Anova use: Anova: Two-Factor With
Replication [Excel 2013: Data
Tab – Data Analysis – Anova: Two Factor With Replication]
In Input Range:, indicate the cells that contain the data and the
labels. For example, if the first
seven rows of columns E, F and G contain the two-way table of
data, specify the input range as
E1:G7.
Set Rows per sample: to 2 and click OK.
Here are the results from this EXCEL analysis.
Anova: Two-Factor With Replication
SUMMARY Site 1 Site 2 Total
Week 1
Count 2 2 4
Sum 1445 433 1878
Average 722.5 216.5 469.5
Variance 2244.5 312.5 86197.67
Week2
Count 2 2 4
Sum 1747 129 1876
Average 873.5 64.5 469
Variance 3612.5 144.5 219412.7
Week3
Count 2 2 4
Sum 981 112 1093
Average 490.5 56 273.25
Variance 9112.5 968 66290.25
Total
Count 6 6
Sum 4173 674
Average 695.5 112.3333
Variance 32769.1 6809.867
ANOVA
Source of Variation SS df MS F P-value F crit
Sample 102443.2 2 51221.58 18.74589 0.002626 5.143249
Columns 1020250 1 1020250 373.3874 1.24E-06 5.987374
Interaction 79057.17 2 39528.58 14.46653 0.005067 5.143249
Within 16394.5 6 2732.417
Total 1218145 11
F = 373.39 has 1 and 6 degrees of freedom and can be used to
test the hypothesis that
there is no difference between the upstream and downstream
sites. F = 18.75 has 2 and 6 degrees
of freedom and can be used to test the hypothesis that there
were no differences among weeks. F
= 14.47 has 2 and 6 degrees of freedom and can be used to test
the hypothesis that the differences
between sites (if any) were the same in all three weeks (i.e., no
interaction between the two
factors).
Since all three p-values were less than 0.05, we would reject all
three null hypotheses.
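The sums of squares and F-ratios in this table can be reproduced from the twelve raw counts. The following Python sketch uses the usual totals-based formulas for a balanced two-way layout; it is an illustration of the arithmetic, not a description of how EXCEL computes it internally, and the helper name ss_from_totals is hypothetical.

```python
# Diatom counts keyed by (week, site); two replicate samples per cell
cells = {
    (1, 1): [689, 756], (1, 2): [204, 229],
    (2, 1): [831, 916], (2, 2): [56, 73],
    (3, 1): [558, 423], (3, 2): [34, 78],
}
weeks = sorted({w for w, _ in cells})
sites = sorted({s for _, s in cells})
reps = 2

all_values = [x for obs in cells.values() for x in obs]
n_total = len(all_values)
correction = sum(all_values) ** 2 / n_total   # grand total squared / N


def ss_from_totals(totals, n_per_total):
    """Sum of squares for totals that each combine n_per_total observations."""
    return sum(t ** 2 for t in totals) / n_per_total - correction


week_totals = [sum(sum(cells[(w, s)]) for s in sites) for w in weeks]
site_totals = [sum(sum(cells[(w, s)]) for w in weeks) for s in sites]
cell_totals = [sum(obs) for obs in cells.values()]

ss_weeks = ss_from_totals(week_totals, reps * len(sites))   # "Sample" row
ss_sites = ss_from_totals(site_totals, reps * len(weeks))   # "Columns" row
ss_cells = ss_from_totals(cell_totals, reps)
ss_inter = ss_cells - ss_weeks - ss_sites                   # "Interaction" row
ss_total = sum(x ** 2 for x in all_values) - correction
ss_within = ss_total - ss_cells                             # "Within" row

df_weeks, df_sites = len(weeks) - 1, len(sites) - 1
df_inter = df_weeks * df_sites
df_within = n_total - len(cells)
ms_within = ss_within / df_within

f_weeks = ss_weeks / df_weeks / ms_within
f_sites = ss_sites / df_sites / ms_within
f_inter = ss_inter / df_inter / ms_within

for name, ss, df, f in [("Weeks", ss_weeks, df_weeks, f_weeks),
                        ("Sites", ss_sites, df_sites, f_sites),
                        ("Interaction", ss_inter, df_inter, f_inter)]:
    print(f"{name:12s} SS = {ss:12.2f}  df = {df}  F = {f:8.2f}")
print(f"{'Within':12s} SS = {ss_within:12.2f}  df = {df_within}")
```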
Average of Number Site
Week 1 2 Grand Total
1 722.50 216.50 469.50
2 873.50 64.50 469.00
3 490.50 56.00 273.25
Grand Total 695.50 112.33 403.92
From the two-way table of means, it is clear that the number of
diatoms was much lower
(112.33 on average) at the downstream site than at the upstream
site (average = 695.50). It is also
clear that numbers were down in week 3 compared to the other
two weeks. The difference
between the upstream and downstream sites was 506.00 in week
1, 809.00 in week 2, and 434.50
in week 3. It is clear that there is an interaction between the site
and week factors; the difference
between sites depends upon which week the sampling was done.
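The site-to-site differences can be read straight off the table of means. A short Python sketch of that subtraction, using the cell averages from the table above (variable names are illustrative):

```python
# Cell means from the two-way table of means: week -> (site 1, site 2)
cell_means = {1: (722.50, 216.50), 2: (873.50, 64.50), 3: (490.50, 56.00)}

# Upstream minus downstream difference within each week
differences = {week: site1 - site2 for week, (site1, site2) in cell_means.items()}
for week, diff in differences.items():
    print(f"Week {week}: site 1 - site 2 = {diff:.2f}")

# With no interaction, the differences would be roughly equal across weeks
spread = max(differences.values()) - min(differences.values())
print(f"Range of differences across weeks = {spread:.2f}")
```

A large spread relative to the within-cell variation is what the significant Interaction F-ratio reflects.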
Example 26: How to calculate a randomized complete block
analysis of variance
Many experiments in agriculture and biology are similar to a
two-way design but have
only one observation per cell. In these experiments, one must
assume that there is no interaction
between the two factors. This assumption is always valid when
one of the factors consists of
ways of grouping the experimental units into more uniform
groups, as is common in field
research.
The present example consists of data on the number of soybean
plants (out of 100;
column C) that failed to emerge. There are two factors in the
experiment. Each observation can
be classified according to the fungicide treatment (Check, Arasan,
Spergon, Semasan, or Fermate;
column A) or according to the block in the field (Block 1, Block
2, Block 3, Block 4 or Block 5;
column B). The 25 observations consist of five fungicide
treatments in all combinations with 5
blocks. The model will not include an interaction term.
The data, arranged for analysis by EXCEL, is stored in rows 1
through 6 of columns E
through J.
Block 1 Block 2 Block 3 Block 4 Block 5
Check 8 10 12 13 11
Arasan 2 6 7 11 5
Spergon 4 10 9 8 10
Semasan 3 5 9 10 6
Fermate 9 7 5 5 3
To analyze this data, proceed as follows:
To perform an Anova for a RCBD, use: Anova: Two-Factor
Without Replication [Excel 2013:
Data Tab – Data Analysis – Anova: Two Factor Without
Replication]
In Input Range:, indicate the cells that contain the data and the
labels. For example, if the first
six rows of columns E through J contain the two-way table of
data, specify the input range as
E1:J6, or select those cells by using the mouse.
Check Labels, and click on OK.
Results are as follows:
Anova: Two-Factor Without Replication
SUMMARY Count Sum Average Variance
Check 5 54 10.8 3.7
Arasan 5 31 6.2 10.7
Spergon 5 41 8.2 6.2
Semasan 5 33 6.6 8.3
Fermate 5 29 5.8 5.2
Block 1 5 26 5.2 9.7
Block 2 5 38 7.6 5.3
Block 3 5 42 8.4 6.8
Block 4 5 47 9.4 9.3
Block 5 5 35 7 11.5
ANOVA
Source of Variation SS df MS F P-value F crit
Rows 83.84 4 20.96 3.874307 0.021886 3.006917
Columns 49.84 4 12.46 2.303142 0.103195 3.006917
Error 86.56 16 5.41
Total 220.24 24
The error mean square (5.41) is an estimate of the pooled
variance and has 16 degrees of freedom. The table gives two
p-values for the two F-tests, but only one of them is for the
treatments. The F value for Rows tests whether there are
significant differences among rows (fungicide treatments) in
the number of plants that failed to emerge. This information
appears in Rows because that is how the original data were
arranged in EXCEL. Since the p-value for Rows (0.021886) is
less than α = 0.05, we reject the hypothesis that all fungicide
treatments have the same mean. The p-value for blocks (Columns)
cannot be used to glean information about our treatments, and is
ignored because the blocks are not randomly assigned.
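The randomized complete block table can be verified from the 25 raw counts. The Python sketch below uses the standard totals-based formulas with one observation per cell, so the error sum of squares is what remains after removing rows (treatments) and columns (blocks). It illustrates the arithmetic only; variable names are assumptions.

```python
# Rows are fungicide treatments, columns are blocks 1 to 5
data = {
    "Check":   [8, 10, 12, 13, 11],
    "Arasan":  [2, 6, 7, 11, 5],
    "Spergon": [4, 10, 9, 8, 10],
    "Semasan": [3, 5, 9, 10, 6],
    "Fermate": [9, 7, 5, 5, 3],
}
rows = list(data.values())
n_rows, n_cols = len(rows), len(rows[0])
n_total = n_rows * n_cols
correction = sum(map(sum, rows)) ** 2 / n_total   # grand total squared / N

ss_total = sum(x ** 2 for row in rows for x in row) - correction
ss_rows = sum(sum(row) ** 2 for row in rows) / n_cols - correction        # treatments
ss_cols = sum(sum(col) ** 2 for col in zip(*rows)) / n_rows - correction  # blocks
ss_error = ss_total - ss_rows - ss_cols           # no interaction term in the model

df_rows, df_cols = n_rows - 1, n_cols - 1
df_error = df_rows * df_cols
ms_error = ss_error / df_error
f_rows = (ss_rows / df_rows) / ms_error
f_cols = (ss_cols / df_cols) / ms_error

print(f"Rows    SS = {ss_rows:.2f}  df = {df_rows}  F = {f_rows:.4f}")
print(f"Columns SS = {ss_cols:.2f}  df = {df_cols}  F = {f_cols:.4f}")
print(f"Error   SS = {ss_error:.2f}  df = {df_error}  MS = {ms_error:.2f}")
```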
Example 27: How to prepare a scatterplot of two variables.
For this example, data on cage size (cm²) and body weight (g)
of 12 crabs has been stored
in columns A and B of the EXCEL worksheet. Note that cage
size is the independent (x) variable
and body weight (y) is the dependent variable. In other words,
the size of the cage affects the
body weight of the crab. Excel will pick the first column as the
x variable and the second column
as the y variable.
CageSize BodyWt
159 14.40
179 15.20
100 11.30
45 2.50
384 22.70
230 14.90
100 1.41
320 15.81
80 4.19
220 15.39
320 17.25
210 9.52
To make a scatterplot, a chart must be inserted.
The first step is to highlight the data including labels, then
choose Chart [Excel 2013: Insert Tab
– Insert Scatter (X,Y) or Bubble Chart]
Choose the first option under Scatter (used to compare at least
two sets of values or pairs of
data).
The Chart Title should be descriptive. Click on the title and
rename the graph to describe the
subject matter.
The Axes should also be appropriately labeled. Click the graph
and check the Axis Titles to add
them in.
A trendline can also be added. PLSC 214 discusses linear
relationships, and Excel allows the regression equation and the
r² value to be displayed on the graph.
Because the points are scattered in a pattern from the lower left
corner to the upper right
corner, we conclude that there is a positive relationship between
the two variables. It appears that
bigger cage sizes result in heavier crabs. The slope is 0.0528,
which is positive, and the y
intercept is 1.7287. The r² value is 0.7485; nearly 75% of the
variation is explained by the
model.
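The trendline equation and r² shown on the chart follow from the usual least-squares formulas. A minimal Python sketch, assuming the twelve data pairs listed above, with cage size as x and body weight as y:

```python
cage_size = [159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210]
body_wt = [14.40, 15.20, 11.30, 2.50, 22.70, 14.90, 1.41, 15.81, 4.19,
           15.39, 17.25, 9.52]

n = len(cage_size)
mean_x = sum(cage_size) / n
mean_y = sum(body_wt) / n

# Corrected sums of squares and cross-products
sxx = sum((x - mean_x) ** 2 for x in cage_size)
syy = sum((y - mean_y) ** 2 for y in body_wt)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cage_size, body_wt))

slope = sxy / sxx                       # change in body weight per cm² of cage
intercept = mean_y - slope * mean_x     # fitted body weight at cage size zero
r_squared = sxy ** 2 / (sxx * syy)      # share of variation explained by the line

print(f"y = {slope:.4f}x + {intercept:.4f}, r^2 = {r_squared:.4f}")
```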
Example 28: How to calculate a correlation coefficient.
For this example, data on cage size (cm²) and body weight (g)
of 12 crabs has been stored
in columns A and B of the EXCEL worksheet (see example 27).
CageSize BodyWt
159 14.40
179 15.20
100 11.30
45 2.50
384 22.70
230 14.90
100 1.41
320 15.81
80 4.19
220 15.39
320 17.25
210 9.52
a) Calculate standard deviations of each of the two variables and
store results in column D.
Type 's1 =' in cell C1, and the formula =STDEV.S(A:A) in cell
D1.
Type 's2 =' in cell C2, and the formula =STDEV.S(B:B) in cell
D2.
b) Calculate the covariance for sample data.
Type 's12 =' in cell C3, and the formula
=COVARIANCE.S(A2:A13,B2:B13) in cell D3
c) Calculate the correlation = covariance/(STDEV.S(x) *
STDEV.S(y))
Type 'r = ' in cell C4, and the formula =D3/(D1*D2) in cell D4.
Results are:
s1 = 106.3309
s2 = 6.484094
s12 = 596.5107
r = 0.86519
d) Alternative method of calculating correlation.
Type 'r =' in cell C6, and the formula
=CORREL(A2:A13,B2:B13) in cell D6.
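Both routes to the correlation coefficient can be checked with a few lines of Python. This sketch assumes the twelve data pairs listed above and uses sample (n - 1) divisors, matching STDEV.S and COVARIANCE.S:

```python
import math

cage_size = [159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210]
body_wt = [14.40, 15.20, 11.30, 2.50, 22.70, 14.90, 1.41, 15.81, 4.19,
           15.39, 17.25, 9.52]

n = len(cage_size)
mean_x = sum(cage_size) / n
mean_y = sum(body_wt) / n

# Sample standard deviations (divisor n - 1), like STDEV.S
s1 = math.sqrt(sum((x - mean_x) ** 2 for x in cage_size) / (n - 1))
s2 = math.sqrt(sum((y - mean_y) ** 2 for y in body_wt) / (n - 1))

# Sample covariance (divisor n - 1), like COVARIANCE.S
s12 = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(cage_size, body_wt)) / (n - 1)

# Correlation = covariance / (product of the two standard deviations)
r = s12 / (s1 * s2)
print(f"s1 = {s1:.4f}, s2 = {s2:.6f}, s12 = {s12:.4f}, r = {r:.5f}")
```

Note the parentheses around the product of the standard deviations; dividing by one and then multiplying by the other gives a very different number.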
e) How to test if the correlation is significant. A test statistic
can be calculated and compared to
a t value with degrees of freedom n – 2. If the test statistic falls
in the rejection region, you
would reject the null hypothesis of ρ = 0.
$$t_{calc} = r\sqrt{\frac{n-2}{1-r^2}}$$
In this example of 12 crabs, we could test at the 5%
significance level if there is a
positive linear correlation. The critical t value with n - 2 = 10
degrees of freedom for a right-tailed test is t = 1.812.
$$t_{calc} = r\sqrt{\frac{n-2}{1-r^2}} = 0.86519\sqrt{\frac{12-2}{1-(0.86519)^2}} = 5.456$$
The test statistic for a right-tailed test falls into the rejection
region. The null hypothesis is
rejected and we conclude that there is a positive linear
relationship between cage size and weight
of crabs.
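The test statistic above is quick to verify. A minimal Python sketch, taking r, n and the critical value 1.812 directly from the text (the critical value is not recomputed here):

```python
import math

r = 0.86519      # sample correlation from part c)
n = 12           # number of crabs
t_crit = 1.812   # right-tailed 5% critical value for 10 df, as given in the text

# t_calc = r * sqrt((n - 2) / (1 - r^2))
t_calc = r * math.sqrt((n - 2) / (1 - r ** 2))

# Right-tailed test: reject H0 (rho = 0) when t_calc exceeds the critical value
reject = t_calc > t_crit
print(f"t_calc = {t_calc:.3f}, reject H0: {reject}")
```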
Example 29: How to perform a regression analysis using
EXCEL
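For this example, data on cage size (cm²) and body weight (g)
of 12 crabs has been stored
in columns A and B of the EXCEL worksheet (see example 27).
For this example, body weight is the dependent variable and
cage size is the independent
variable. We wish to predict body weight by using its
relationship to cage size. In this example, we have
only one independent variable.
Highlight the data and select Regression [Excel 2013: Data Tab
– Data Analysis – Regression]
Set Input Y Range: to a1:a13.
Set Input X Range: to b1:b13.
Check Confidence Level.
Click OK.
NOTE
With these ranges, EXCEL treats column A (cage size) as Y and
column B (body weight) as X, so the output that follows
describes cage size as a function of body weight. To regress
body weight on cage size, as in example 27, set Input Y Range:
to b1:b13 and Input X Range: to a1:a13; the slope and intercept
are then 0.0528 and 1.7287 (see example 27), while R Square, F
and the p-value for the slope are unchanged.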
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.8651
R Square 0.7485
Adjusted R Square 0.7234
Standard Error 55.922
Observations 12
ANOVA
df SS MS F Significance F
Regression 1 93095.89 93095.89 29.76876 0.00028
Residual 10 31273.02 3127.302
Total 11 124368.9
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 24.65 35.24 0.700 0.50016 -53.87 103.18
X Variable 1 14.188 2.600 5.456 0.00028 8.40 19.98
The intercept (the expected value of the dependent variable
when the independent
variable is zero) was estimated as 24.65 with a standard
deviation of 35.24.
The slope (the expected change in the dependent variable for an
increase of one unit in the
independent variable) was 14.188 with a standard deviation of
2.600. A t-test indicates that the
slope was significantly different from zero because the p-value
(0.00028) is less than α =
0.05.
The standard deviations of the intercept and slope may also be
called standard errors and
they are standard deviations of the distribution of sample
statistics. Both standard deviations
have n - 2 = 10 degrees of freedom because both are (complex)
functions of the error sum of
squares.
The coefficient of determination (R-Square = 74.85%) indicates
that nearly 75% of the
variation in body weight can be explained as a linear function of
cage size.
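The Coefficients table can be reproduced with the standard simple-regression formulas. The Python sketch below is an illustration under stated assumptions: the function name simple_regression is hypothetical, and the variables are given the same roles as the Input Y Range (column A, cage size) and Input X Range (column B, body weight) used for the output above.

```python
import math

cage_size = [159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210]
body_wt = [14.40, 15.20, 11.30, 2.50, 22.70, 14.90, 1.41, 15.81, 4.19,
           15.39, 17.25, 9.52]


def simple_regression(x, y):
    """Least-squares fit of y on x with standard errors (hypothetical helper)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_resid = syy - slope * sxy                 # residual sum of squares
    std_error = math.sqrt(ss_resid / (n - 2))    # n - 2 degrees of freedom
    se_slope = std_error / math.sqrt(sxx)
    se_intercept = std_error * math.sqrt(1 / n + mean_x ** 2 / sxx)
    return {"intercept": intercept, "slope": slope, "std_error": std_error,
            "se_intercept": se_intercept, "se_slope": se_slope,
            "t_slope": slope / se_slope, "r_squared": sxy ** 2 / (sxx * syy)}


# Same roles as the dialog settings: column A (cage size) is Y, column B is X
fit = simple_regression(x=body_wt, y=cage_size)
for name, value in fit.items():
    print(f"{name:13s} {value:10.4f}")
```

Calling simple_regression(x=cage_size, y=body_wt) instead gives the trendline of example 27 (slope about 0.0528, intercept about 1.7287) with the same R Square and the same t statistic for the slope.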
Annual Editions Journal Summary
Instructions:
1. Summarize each of the readings in the tables below.
2. You may expand the table to accommodate your information.
3. Write in complete sentences using proper grammar and
mechanics.
Readings:
Unit 5 in the textbook: Social Media and Commerce
· The Rising Influence of Social Media as Reflected by Data
· How Google Dominates Us.
· Can Online Piracy Be Stopped by Laws?
· How Psychology Will Shape the Future of Social Media
Marketing.
· AmazonFresh is Jeff Bezos’ Last Mile Quest for Total Retail
Domination.
Reading #15 – The Rising Influence of Social Media as
Reflected by Data
Main idea of the article:
Information presented: List at least five points made by the
author
1.
2.
3.
4.
5.
Response to the article:
Reading #16 –How Psychology Will Shape the Future of Social
Media Marketing
Main idea of the article:
Information presented: List at least five points made by the
author
1.
2.
3.
4.
5.
Response to the article:
Reading #17– How Google Dominates Us
Main idea of the article:
Information presented: List at least five points made by the
author
1.
2.
3.
4.
5.
Response to the article:
Reading #18 – AmazonFresh is Jeff Bezos’ Last Mile Quest for
Total Retail Domination.
Main idea of the article:
Information presented: List at least five points made by the
author
1.
2.
3.
4.
5.
Response to the article:
Reading #19 - Can Online Piracy Be Stopped by Laws?
Main idea of the article:
Information presented: List at least five points made by the
author
1.
2.
3.
4.
5.
Response to the article:
Adapted from Dushkin Online Annual Editions Test Your
Knowledge Form http://www.dushkin.com/online/
LAB2A.DAT142142130139132150137133147135134146140132
13614114914113513613013613413714613815213213712613413
51471421421351311421381461351481291381351371411441471
41141138139139145139137147141143135136140139137139134
13912913714914214013013913514413413213313514413413913
51341311421421521411401361441401391421461391391351391
42138135133142137141141142136141134135138135140144142
13814813514113913814113713513614114412913813313813013
01331381241421421381321441401461461451381391361321391
35137136131137147140137137134129134140141139143140138
13913715014215014613813012213213814112313413613914215
21491381391371351331381381351411451391301401331441431
41137137138136134143143138136140142136148141133149139
13114414313914214612713913713513113614413513714514714
11361471331311451361411401391451401441371371391441381
38141134145136139136135143135136135149144133146134140
15013714114213015414114313813413813113514914913113214
21361321441341521361391421391411401311371341381521371
34139124147144146140139141132143145137142139138138143
14913013513413614915014513714514113813614113113914213
61441371351511421431431401301451421391301371511391401
35138133137143134132136131135141145132135139139141138
13914814114313813314713013512813813614514313413313813
81471371401401361331391431381431371421461331511411331
41138145149139140128140137140146138132141151137140128
144132143149137
LAB2B.DAT146140135137140137138143149142144145145139
14114113713913915413713515113913813613813914614214113
61371471441401321461481441401341391411391391391491301
39138134134135141139151130135136138137142138140138130
147141136
LAB2C.DAT152147131139145136131142138140140134130134
13713514013314414013913814514514513813914614614713514
31391391301421361451381351391451341341381361421351361
43135130140149141137140152137132136133140141145141132
13313513813714114413714314214413113613814613813513913
71411391371411441421451301431301361461421391331391451
45133137137150139142140140137142143129135132133144138
14714114214514213314214114013414913913813313813914112
61401371361421311351321361391351361381511461371441411
32141135149137139142150131134139139139135134132134147
13713913614913713214113713213713413214314214913113313
21421441411351441341391371471341411431341351521371481
24137135130123133147142146131137139141139142136130147
13713613713414313914413714013314014914014914012813414
91301441411351391381411291301421291361361391281381371
35138138131150134132140135
Introductory Statistics Laboratory for Excel .docx

More Related Content

DOCX
De vry math 221 all discussion+ilbs latest 2016 november 1
DOCX
De vry math 221 all discussion+ilbs latest 2016 november
DOCX
De vry math 399 ilabs &amp; discussions latest 2016
DOCX
De vry math 399 ilabs &amp; discussions latest 2016 november
DOCX
De vry math 221 all ilabs latest 2016 november
DOCX
De vry math 399 all ilabs latest 2016 november
PDF
Reading Data into R
DOCX
De vry math221 all ilabs latest 2016 november
De vry math 221 all discussion+ilbs latest 2016 november 1
De vry math 221 all discussion+ilbs latest 2016 november
De vry math 399 ilabs &amp; discussions latest 2016
De vry math 399 ilabs &amp; discussions latest 2016 november
De vry math 221 all ilabs latest 2016 november
De vry math 399 all ilabs latest 2016 november
Reading Data into R
De vry math221 all ilabs latest 2016 november

Similar to Introductory Statistics Laboratory for Excel .docx (20)

PDF
Solution manual-of-probability-statistics-for-engineers-scientists-9th-editio...
DOCX
STAT 200 Introduction to Statistics Final Examination, Su.docx
PDF
QNT 275 Education Specialist |tutorialrank.com
DOC
Math 221 Massive Success / snaptutorial.com
DOCX
STAT 200 Introduction to Statistics Final Examination, Spri.docx
PDF
An Introduction To Statistical Concepts For Education And Behavioral Sciences...
DOCX
Problem I - Write your first name, middle name, and last name in c.docx
PPTX
Statistical techniques used in measurement
DOCX
Answer all 20 questions. Make sure your answers are as complet.docx
DOC
9417-2.doc
PPTX
Statistics Assignment Help
PPT
Penggambaran Data dengan Grafik
DOCX
STAT 200 Introduction to Statistics Final Examination, Fa.docx
PDF
Manuale di PAST
PPT
Biostatistics
PPTX
LESSON 5-DATA ANALYSIS-Practical Research 2
PDF
Functions
PDF
Lecture-2 Descriptive Statistics-Box Plot Descriptive Measures.pdf
PDF
USE OF EXCEL IN STATISTICS: PROBLEM SOLVING VS PROBLEM UNDERSTANDING
PDF
Use of Excel in Statistics: Problem Solving Vs Problem Understanding
Solution manual-of-probability-statistics-for-engineers-scientists-9th-editio...
STAT 200 Introduction to Statistics Final Examination, Su.docx
QNT 275 Education Specialist |tutorialrank.com
Math 221 Massive Success / snaptutorial.com
STAT 200 Introduction to Statistics Final Examination, Spri.docx
An Introduction To Statistical Concepts For Education And Behavioral Sciences...
Problem I - Write your first name, middle name, and last name in c.docx
Statistical techniques used in measurement
Answer all 20 questions. Make sure your answers are as complet.docx
9417-2.doc
Statistics Assignment Help
Penggambaran Data dengan Grafik
STAT 200 Introduction to Statistics Final Examination, Fa.docx
Manuale di PAST
Biostatistics
LESSON 5-DATA ANALYSIS-Practical Research 2
Functions
Lecture-2 Descriptive Statistics-Box Plot Descriptive Measures.pdf
USE OF EXCEL IN STATISTICS: PROBLEM SOLVING VS PROBLEM UNDERSTANDING
Use of Excel in Statistics: Problem Solving Vs Problem Understanding
Ad

More from normanibarber20063 (20)

DOCX
Assist with first annotated bibliography.  Assist with f.docx
DOCX
Assistance needed with SQL commandsI need assistance with the quer.docx
DOCX
assingment Assignment Agenda Comparison Grid and Fact Sheet or .docx
DOCX
Assimilate the lessons learned from the dream sequences in Defense o.docx
DOCX
Assignmnt-500 words with 2 referencesRecognizing the fa.docx
DOCX
Assignmnt-700 words with 3 referencesToday, there is a crisi.docx
DOCX
Assignment  For Paper #2, you will pick two poems on a similar th.docx
DOCX
Assignment Write an essay comparingcontrasting two thingspeople.docx
DOCX
Assignment Travel Journal to Points of Interest from the Early Midd.docx
DOCX
Assignment What are the factors that influence the selection of .docx
DOCX
Assignment Write a research paper that contains the following.docx
DOCX
Assignment Thinking about Managers and Leaders· Identifya man.docx
DOCX
Assignment Talk to friends, family, potential beneficiaries abou.docx
DOCX
Assignment The objective of assignment is to provide a Power .docx
DOCX
Assignment During the on-ground, residency portion of Skill.docx
DOCX
Assignment PurposeThe first part of this assignment will assist.docx
DOCX
Assignment PowerPoint Based on what you have learned so .docx
DOCX
Assignment In essay format, please answer the following quest.docx
DOCX
Assignment NameUnit 2 Discussion BoardDeliverable Length150-.docx
DOCX
Assignment In essay format, please answer the following questions.docx
Assist with first annotated bibliography.  Assist with f.docx
Assistance needed with SQL commandsI need assistance with the quer.docx
assingment Assignment Agenda Comparison Grid and Fact Sheet or .docx
Assimilate the lessons learned from the dream sequences in Defense o.docx
Assignmnt-500 words with 2 referencesRecognizing the fa.docx
Assignmnt-700 words with 3 referencesToday, there is a crisi.docx
Assignment  For Paper #2, you will pick two poems on a similar th.docx
Assignment Write an essay comparingcontrasting two thingspeople.docx
Assignment Travel Journal to Points of Interest from the Early Midd.docx
Assignment What are the factors that influence the selection of .docx
Assignment Write a research paper that contains the following.docx
Assignment Thinking about Managers and Leaders· Identifya man.docx
Assignment Talk to friends, family, potential beneficiaries abou.docx
Assignment The objective of assignment is to provide a Power .docx
Assignment During the on-ground, residency portion of Skill.docx
Assignment PurposeThe first part of this assignment will assist.docx
Assignment PowerPoint Based on what you have learned so .docx
Assignment In essay format, please answer the following quest.docx
Assignment NameUnit 2 Discussion BoardDeliverable Length150-.docx
Assignment In essay format, please answer the following questions.docx
Ad

Recently uploaded (20)

PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PDF
01-Introduction-to-Information-Management.pdf
PDF
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
PDF
Basic Mud Logging Guide for educational purpose
PPTX
Institutional Correction lecture only . . .
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PPTX
Lesson notes of climatology university.
PPTX
master seminar digital applications in india
PDF
TR - Agricultural Crops Production NC III.pdf
PPTX
Final Presentation General Medicine 03-08-2024.pptx
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PDF
Classroom Observation Tools for Teachers
PPTX
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
PPTX
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
PDF
Pre independence Education in Inndia.pdf
PDF
Sports Quiz easy sports quiz sports quiz
PPTX
Renaissance Architecture: A Journey from Faith to Humanism
PDF
RMMM.pdf make it easy to upload and study
PPTX
Microbial diseases, their pathogenesis and prophylaxis
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
01-Introduction-to-Information-Management.pdf
Black Hat USA 2025 - Micro ICS Summit - ICS/OT Threat Landscape
Basic Mud Logging Guide for educational purpose
Institutional Correction lecture only . . .
2.FourierTransform-ShortQuestionswithAnswers.pdf
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
Lesson notes of climatology university.
master seminar digital applications in india
TR - Agricultural Crops Production NC III.pdf
Final Presentation General Medicine 03-08-2024.pptx
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
Classroom Observation Tools for Teachers
1st Inaugural Professorial Lecture held on 19th February 2020 (Governance and...
IMMUNITY IMMUNITY refers to protection against infection, and the immune syst...
Pre independence Education in Inndia.pdf
Sports Quiz easy sports quiz sports quiz
Renaissance Architecture: A Journey from Faith to Humanism
RMMM.pdf make it easy to upload and study
Microbial diseases, their pathogenesis and prophylaxis

Introductory Statistics Laboratory for Excel .docx

  • 1. Introductory Statistics Laboratory for Excel. Lab Manual Author: R. J. (Bob) Baker, December 2003. Revised by: Krista Wilde (2016).
    Table of Contents
    Assignment #0
  • 2. Assignment #1
    Assignment #2
    Assignment #3
    Assignment #4
    Assignment #5
    Assignment #6
    Assignment #7
    Assignment #8
    Assignment #9
    INTRODUCTION
  • 3. Example 1: Reading data from a data file into the EXCEL worksheet
    Example 2: Preparing a histogram of data
    Example 3: Entering data from the keyboard into the EXCEL worksheet
    Example 4: Calculating relative frequencies
    Example 5: Leaving EXCEL and grading your assignment
    Example 6: How to prepare a stem-and-leaf diagram
    Example 7: How to draw a frequency (or relative frequency) polygon
    Example 8: How to use EXCEL to calculate various numbers that summarize the characteristics of a population (or sample)
    Example 9: How to use the DESCRIPTIVE STATISTICS command of EXCEL
    Example 10: Further uses of EXCEL as a calculator
    Example 11: Calculations with a discrete probability distribution
    Example 12: Reading and storing constants for further use
    Example 13: Using EXCEL to answer questions about continuous distributions
    Example 14: How to calculate a chi-squared statistic for a 'goodness-of-fit' test
  • 4. Example 15: How to calculate a confidence interval for one mean when σ is known
    Example 16: How to calculate a confidence interval for one mean when σ is NOT known
    Example 17: How to calculate a confidence interval for a binomial proportion
    Example 18: How to calculate a test of hypothesis concerning one mean when σ is NOT known
    Example 19: Large sample confidence intervals and tests of hypothesis for differences between two means when population variance is unknown and equal
    Example 20: Confidence intervals and tests of hypothesis for differences between two means for independent samples: population variances are unknown but equal
    Example 21: Large sample confidence intervals and tests of hypothesis for differences between two binomial proportions
    Example 22: How to carry out a one-way analysis of variance
  • 5. Example 23: .
    Example 24: How to use information from analysis of variance to calculate confidence intervals or test hypotheses about treatment means (including least significant difference)
    Example 25: How to perform a two-way analysis of variance
    Example 26: How to calculate a randomized complete block analysis of variance
    Example 27: How to prepare a scatterplot of two variables
    Example 28: How to calculate a correlation coefficient
    Example 29: How to perform a regression analysis using EXCEL
  • 6. Introductory Statistics Laboratory Assignment #0 Purpose This assignment is designed for use in the instructed introduction for students using the Introductory Statistics Laboratory for Excel (ISLeX) program. NOTES Log in to ISLeX and get the data for Assignment 0. Then start Microsoft Excel and determine the answers to the questions in this assignment. When finished, exit from EXCEL, return to ISLeX and submit your answers. In this assignment, all students use the same data set. In the remaining assignments, each student will have a unique data set. See the examples indicated by {Example } to learn how to use EXCEL to perform a particular task. A reference to an example is given at the end of each major task. A symbol marks the beginning of each new task. Question A
  • 7. The data called LAB0A.DAT in Table A represent measured yields (q/ha, where 1 q = 1 quintal = 100 kg) of a sample of wheat varieties tested at Saskatoon. Read the data into an EXCEL worksheet. {Example 1} Prepare a histogram of the data using 20 as the first midpoint (20.5 as its upper bin) and 1 as the interval width (bin size).
  • 8. Record the frequencies from the histogram into the following table; add the relative frequencies later.
      Bin    Midpoint   Frequency   Relative frequency
      20.5   20
      21.5   21
      22.5   22
      23.5   23
      24.5   24
      25.5   25
      26.5   26
      27.5   27
      28.5   28
    Record your answers to the following questions. 1. How many observations were there in this sample? 2. What is the midpoint of the most frequent class? (If tied, give lowest midpoint) 3. How many observations were there in the class with midpoint equal to 22? {Example 2}
  • 9. Enter the midpoints and frequencies from the table into two columns of the EXCEL worksheet. Verify that you have entered the correct data. Calculate and store relative frequencies in a new column. Record relative frequencies in the above table. {Examples 3 and 4} Question B Data in Table B represents measured yields (q/ha) of a sample of wheat varieties evaluated at Tisdale. Read the data into the EXCEL worksheet and calculate the mean value.
  • 10. 4. How many observations were there in this data set? 5. What was the mean yield of this sample of wheat varieties? {Example 1, and Example 8 a and b} Once you have recorded numerical answers to each of the five questions, you should leave EXCEL and submit your answers for grading by the ISLeX program. {Example 5} - END OF ASSIGNMENT 0 -
  • 11. Introductory Statistics Laboratory Assignment #1 Purpose This lab is an introduction to tabular and graphical methods of descriptive statistics. NOTE As you proceed through this assignment, write your answers in the spaces provided. When you exit from EXCEL, you are then required to enter the answers into the ISLeX program. Question A Data in Table A represents measured yields (q/ha, where 1 q = 1 quintal = 100 kg) of a sample of wheat varieties tested at Saskatoon. Read the data into an EXCEL worksheet. {Example 1} Prepare a histogram of the data using 20 as the first midpoint (20.5 as the starting bin) and 1 as the interval width (bin size). Note that the lower endpoint of any interval is the midpoint minus one-half the interval width while the upper endpoint is the midpoint plus one-half the interval width. Record the frequencies in the following table;
  • 12. add relative frequencies later. Excel places data points that are on a bin boundary in the lower bin.
      Bin    Midpoint   Frequency   Relative frequency
      20.5   20
      21.5   21
      22.5   22
      23.5   23
      24.5   24
      25.5   25
      26.5   26
      27.5   27
      28.5   28
    Record your answers to the following questions. 1. How many observations were there in this sample? 2. What is the midpoint of the most frequent class? (If tied, give lowest midpoint) 3. How many observations were greater than 21.5 and less than or equal to 22.5 q/ha?
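The binning rule described here (a value on a bin boundary is counted in the lower bin) and the relative-frequency calculation can be sketched outside Excel. This is only an illustration in Python, not the manual's own procedure; the function name `bin_frequencies` and the yield values are hypothetical and are not the LAB data.

```python
from bisect import bisect_left

def bin_frequencies(values, upper_bounds):
    """Count values per bin, given sorted upper bin boundaries.

    A value equal to a boundary is counted in the lower bin, as the
    text describes for Excel. The final count is an overflow bin for
    values above the last boundary.
    """
    counts = [0] * (len(upper_bounds) + 1)
    for v in values:
        # bisect_left finds the first boundary that is >= v
        counts[bisect_left(upper_bounds, v)] += 1
    return counts

# Hypothetical yields (q/ha); these are NOT the assignment data
yields = [19.8, 20.5, 21.2, 21.5, 21.6, 22.5, 22.9, 23.4]
bounds = [20.5, 21.5, 22.5, 23.5]

freq = bin_frequencies(yields, bounds)
rel_freq = [f / len(yields) for f in freq]

for upper, f, r in zip(bounds, freq, rel_freq):
    print(f"bin <= {upper}: frequency {f}, relative frequency {r:.3f}")
```

The relative frequencies should sum to 1.0 (within rounding), which is the same check the assignment asks for.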
  • 13. {Example 2} Enter the midpoints and frequencies into two columns of the EXCEL Worksheet. {Example 3} Calculate the relative frequencies (these will be used in question C). Check the data you have entered and verify that the relative frequencies sum to 1.0 (within 0.001). Record the relative frequencies in the preceding table. 4. What is the relative frequency of yields in sample A that were greater than 21.5 and less than or equal to 22.5? {Example 4} Prepare a stem-and-leaf diagram of the data from sample A. Use an increment of 1.0 between consecutive stem positions (leaf unit = 0.1). Use the stem-and-leaf diagram to answer the following questions. 5. What is the value (in q/ha) of the leaf unit in this stem-and-leaf diagram?
  • 14. 6. What is the yield (in q/ha) for the item represented by the last leaf position in the fifth (from the top) stem position? {Example 6} Question B Data in Table B represents measured yields (q/ha) of a sample of wheat varieties evaluated at Tisdale. Read the data into an EXCEL worksheet. {Example 1} Prepare a histogram of the data using 24 as the first midpoint (24.5 as the first bin) and 1 as the interval width.
  • 15. Record the frequencies in the following table; add relative frequencies later.
      Bin    Midpoint   Frequency   Relative frequency
      24.5   24
      25.5   25
      26.5   26
      27.5   27
      28.5   28
      29.5   29
      30.5   30
      31.5   31
      32.5   32
      33.5   33
      34.5   34
      35.5   35
      36.5   36
    Record your answers to the following questions. 7. How many observations were there in this sample? 8. What is the midpoint of the most frequent class? (If tied, give lowest midpoint)
  • 16. 9. How many observations fell between 31.5 and 32.5 q/ha? {Example 2} Enter the midpoints and frequencies into two columns of the EXCEL Worksheet. Calculate the relative frequencies in each class. Check that the correct information has been entered, that frequencies sum to the total number of observations and that the relative frequencies sum to 1.0. Record the relative frequencies in the preceding table. Answer the following question. 10. What is the relative frequency of yields in sample B that were greater than 31.5 and less than or equal to 32.5 q/ha? {Example 4} Question C
  • 17. Compare the distributions of yields of wheat varieties in sample A (Saskatoon) with those from sample B (Tisdale). Draw a relative frequency polygon of the data from both samples. Include appropriate titles and axis labels. Use different line types for each sample. Answer the following questions from the relative frequency polygon. 11. Which of the two samples, Saskatoon (1) or Tisdale (2), has the highest relative frequency in the class whose midpoint is 26 q/ha? (Answer 1 or 2; 0 if same) 12. Which of the two samples, Saskatoon (1) or Tisdale (2), has the greatest spread looking at the midpoints (i.e. greatest difference between maximum and minimum midpoint values)? (Answer 1 or 2; 0 if same) {Example 7} Once you have recorded numerical answers to each of the twelve questions, you should leave EXCEL and submit
  • 18. your answers for grading by the ISLeX program. {Example 5} - End of Assignment #1 - Introductory Statistics Laboratory Assignment #2 Purpose The three main objectives of this assignment are to: a) use numerical values as descriptive statistics, b) introduce the concept of sampling from a population, and c) demonstrate the effects of sample size. NOTE As you proceed through this assignment, write your answers in the spaces provided. When you exit from EXCEL, you are then required to enter the answers into the ISLeX program. Question A
  • 19. Data in Table A represents protein concentrations (g/kg) of boxcar lots of durum wheat delivered to Thunder Bay, Ontario. This data set is to be treated as a population of data points. Read the data into a column of the EXCEL worksheet, and name the column. When viewing the data for the first time, you should try to determine approximately the number of items and guess at the average value. Scan the data to try to determine what the smallest and largest values are. {Example 1} Calculate and record the values of the following population characteristics (i.e. parameters). 1. How many data points are there in this data set? 2. What is the mean protein concentration (g/kg)? 3. What is the minimum protein concentration? 4. What is the maximum protein concentration? 5. What is the median protein concentration? 6. What is the value of the first quartile?
  • 20. 7. What is the value of the third quartile? 8. What is the standard deviation of the population of protein concentrations? {Example 8} Question B The data in Table B constitutes 10 random samples, each of size 7, from the population of protein concentrations. The data file contains seven rows of data with each row containing ten columns. Read the data for the ten samples into columns of the EXCEL worksheet. {Example 1} Calculate the mean, median, standard deviation, minimum, maximum, first quartile and third quartile of each of the ten samples.
  • 21. Record these descriptive statistics in the following table.
      Sample   Size   Mean   Median   Standard Deviation   Minimum   Maximum   Q1   Q3
      1        7
      2        7
      3        7
      4        7
      5        7
      6        7
      7        7
      8        7
      9        7
      10       7
    {Example 9} Use these statistics and the parameters calculated in question A to answer the following questions.
  • 22. These questions are designed to get you thinking about how well sample statistics represent the characteristics of the population from which the sample was taken. 9. How many of the ten sample means are less than or equal to the population mean? 10. How many of the ten sample medians are exactly equal to the population median? 11. How many of the ten sample minimums are less than or equal to the population minimum? 12. How many of the ten sample maximums are greater than or equal to the
  • 23. population maximum? 13. How many of the sample first quartiles are less than or equal to the population first quartile? 14. How many of the sample third quartiles are greater than or equal to the population third quartile? 15. Which sample has the largest standard deviation? 16. Which sample has the largest range (=Maximum - Minimum)? 17. What is the ratio of the largest sample standard deviation to the smallest sample standard deviation? 18. What is the ratio of the largest sample mean to the smallest sample mean? 19. Of the two ratios (Questions 17 and 18), which is the largest, the ratio
  • 24. of standard deviations (17) or the ratio of means (18)? {Answer 17 or 18} {Example 10} Question C The data in Table C constitutes 10 random samples, each of size 27, from the population of protein concentrations. Read the data for the ten samples into columns of the EXCEL worksheet. {Example 1}
  • 25. Calculate the mean, median, standard deviation, minimum, maximum, first quartile and third quartile of each of the ten samples. Record the descriptive statistics in the following table.
      Sample   Size   Mean   Median   Standard Deviation   Minimum   Maximum   Q1   Q3
      1        27
      2        27
      3        27
      4        27
      5        27
      6        27
      7        27
      8        27
      9        27
      10       27
    {Example 9}
  • 26. Use these statistics and the results from questions A and B to answer the following questions. The following questions are designed to get you thinking about how the size of the sample affects the relationship between sample statistics and population parameters. 20. How many of the ten sample minimums were exactly equal to the population minimum? 21. How many of the ten sample maximums were exactly equal to the population maximum? 22. For samples of size 27, what is the ratio of the largest sample mean to the smallest sample mean? 23. For samples of size 27, what is the ratio of the largest sample standard deviation to the smallest sample standard deviation?
  • 27. For the following questions, answer 0 if the statement is false or 1 if it is true. 24. The ratio of the largest sample mean to the smallest sample mean was less in samples of 27 than in samples of 7. 25. The ratio of the largest to the smallest sample standard deviations was greater in the larger samples. {Example 10} - Please use ISLeX to record and grade your answers - - END OF ASSIGNMENT 2 -
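The descriptive statistics requested in Assignment 2 can be cross-checked with a short sketch. This is an illustration in Python rather than the manual's Excel procedure. Two assumptions are made here: the quartiles use the "inclusive" interpolation rule, which is assumed to correspond to Excel's QUARTILE.INC, and the `population` flag chooses between the population standard deviation (question A) and the sample standard deviation (questions B and C). The function name `describe` and the data values are hypothetical.

```python
import statistics

def describe(data, population=False):
    """Return summary statistics for one column of data."""
    # "inclusive" quartiles: assumed to match Excel's QUARTILE.INC
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    # population SD divides by n, sample SD divides by n - 1
    sd = statistics.pstdev(data) if population else statistics.stdev(data)
    return {
        "size": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": sd,
        "min": min(data),
        "max": max(data),
        "q1": q1,
        "q3": q3,
    }

# Hypothetical protein concentrations (g/kg) for one sample of size 7
sample = [118, 121, 125, 127, 130, 133, 140]

summary = describe(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

For the same data, the sample standard deviation is always at least as large as the population standard deviation, which is worth keeping in mind when comparing question A with questions B and C.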
  • 28. Introductory Statistics Laboratory Assignment #3 Purpose This assignment is an introduction to questions concerning discrete probability distributions. NOTE As you proceed through this assignment, write your answers in the spaces provided. When you exit from EXCEL, you will then be required to enter the answers into the ISLeX program.
  • 29. Question A A binomial experiment consists of repeated trials each with two possible outcomes. The outcome of any trial is independent of all other trials. The binomial distribution gives the probability that a number X of n independent trials will have one type of outcome. X can be any number from 0 up to the total number of trials. The data in Table A gives the probabilities of observing that X = 0, 1, .. 20 out of 20 flower seeds from a given lot will germinate. Read the data into two columns of the EXCEL worksheet and attach appropriate names to those two columns. Then, record the probabilities in the following table. {Example 1} Number germinated (out of 20) Probability 0 1
  • 31. 20 Use the table to answer the following questions. 1. What is the probability that all 20 seeds in a random sample of 20 seeds will germinate? 2. What is the probability that fewer than 15 seeds in a random sample of 20 seeds will germinate? 3. What is the probability that at least 17 seeds in a random sample of 20 will germinate? 4. What is the probability that the number of seeds in a random sample of 20 that
  • 32. will germinate is between 10 and 15? HINT: Do not include 10 and 15. 5. What is the probability that the number of seeds in a random sample of 20 that will germinate will be less than 10 or greater than 17? HINT: You will have to add the probabilities for 0, 1, .. 9 and 18, 19, 20. 6. What is the mean of this binomial distribution? HINT: The mean of a discrete variable can be calculated by summing the products of each value multiplied by its corresponding probability. 7. What is the variance of this binomial distribution? HINT: The variance of a probability distribution is the mean of the squares of values minus the square of the mean of values. {Example 11}
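The hints for questions 6 and 7 (mean as the sum of value times probability, variance as the mean of squares minus the square of the mean) can be sketched as follows. This is an illustration in Python under stated assumptions: the germination probability `p = 0.8` is hypothetical and is not taken from Table A, and the probabilities are generated from the binomial formula instead of being read from a data file.

```python
from math import comb

n = 20    # number of seeds, as in the assignment
p = 0.8   # hypothetical germination probability, NOT from Table A

# Probability of exactly x germinating, for x = 0 .. n
probs = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# Mean: sum of each value multiplied by its probability
mean = sum(x * px for x, px in enumerate(probs))

# Variance: mean of the squares minus the square of the mean
mean_of_squares = sum(x * x * px for x, px in enumerate(probs))
variance = mean_of_squares - mean**2

# Interval questions are sums over the relevant table rows
p_fewer_than_15 = sum(probs[:15])        # X = 0 .. 14
p_at_least_17 = sum(probs[17:])          # X = 17 .. 20
p_between_10_and_15 = sum(probs[11:15])  # X = 11 .. 14, excluding 10 and 15

print(mean, variance)
```

For a binomial distribution these sums should agree with the closed forms n·p and n·p·(1 − p), which gives a quick consistency check on any table of probabilities.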
  • 33. Question B This question is based on a Poisson discrete probability distribution. The distribution is important in biology and medicine, and can be dealt with in the same way as any other discrete distribution. Red blood cell deficiency may be determined by examining a specimen of blood under the microscope. The data in Table B gives a hypothetical distribution of numbers of red blood cells in a certain small fixed volume of blood from normal patients. Theoretically, there is no upper limit to the value of a Poisson distribution. In reality, you can force only so many red blood cells into a given volume. Read the data into the EXCEL worksheet, name the columns, and view the table. Since the table is quite large, you should attempt to answer the following questions without actually recording the table. {Example 1} Use the table to answer the following questions. 8. What is the probability that a blood sample from this distribution will have exactly 20 red blood cells?
  • 34. 9. What is the probability that a blood sample from a normal person will have between 19 and 26 red blood cells? HINT: See questions 3 and 4. 10. What is the probability that a blood sample from a normal person would have fewer than 10 red blood cells? 11. What is the probability that a blood sample from a normal person will have at least 15 red blood cells? HINT: Since there is no theoretical upper limit to the Poisson distribution, the correct way to answer this question is to calculate 1 – probability of fewer than 15 red blood cells.
  • 35. 12. A person with a red blood cell count in the lower 2.5 percent of the distribution might be considered as deficient. What is the red blood cell count below which 2.5 percent of the distribution lies? HINT: You need to determine a value X so that if you sum all the probabilities for counts up to and including that value they will sum to at least 0.025. The sum of probabilities of all counts up to but excluding X should be less than 0.025. You can proceed in the following way. Look at the table to guess how many probabilities (P[X = 0] + P[X = 1] + . . ) should be added to give a sum of approximately 0.025. Calculate sums of probabilities for your guess of X. Continue your guessing of X until you get a sum ≥ 0.025 while the sum for X-1 < 0.025. 13. What is the mean red blood cell count in this distribution?
  • 36. 14. What is the variance of red blood cell count in this distribution? HINT: See question 7, and remember it is a Poisson distribution. 15. Is the following statement true (1) or false (0) for this distribution? In a Poisson distribution, the variance is equal to the mean (within rounding error). Record 1 if true, 0 if false. {Example 11} Please enter your answers into the ISLeX program - END OF ASSIGNMENT 3 –
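The search described in the hint for question 12 (add probabilities from X = 0 upward until the running sum first reaches 0.025) and the complement rule from question 11 can be sketched as below. This is an illustration in Python, not the manual's procedure. The Poisson mean `lam = 20.0` and the table length `max_count` are hypothetical choices; the assignment's Table B supplies its own probabilities.

```python
from math import exp

lam = 20.0        # hypothetical Poisson mean, NOT taken from Table B
max_count = 100   # table is truncated here; the remaining tail is negligible

# Build the probability table iteratively: P[x] = P[x-1] * lam / x
probs = [exp(-lam)]
for x in range(1, max_count + 1):
    probs.append(probs[-1] * lam / x)

def cumulative(x):
    """P(X <= x), summed from the table."""
    return sum(probs[: x + 1])

def lower_cutoff(target):
    """Smallest x whose cumulative probability is at least `target`."""
    running = 0.0
    for x, px in enumerate(probs):
        running += px
        if running >= target:
            return x
    raise ValueError("target not reached within the table")

cutoff = lower_cutoff(0.025)

# "At least 15" uses the complement, since the distribution has no upper limit
p_at_least_15 = 1 - cumulative(14)

mean = sum(x * px for x, px in enumerate(probs))
variance = sum(x * x * px for x, px in enumerate(probs)) - mean**2
print(cutoff, p_at_least_15, mean, variance)
```

The computed mean and variance should agree within rounding error, which is the property question 15 asks about.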
  • 37. Introductory Statistics Laboratory Assignment #4 Purpose This lab is an introduction to questions concerning cumulative continuous probability distributions. NOTES As you proceed through this assignment, write your answers in the spaces provided. When you exit from EXCEL, you will then be required to enter the answers into the ISLeX
  • 38. program. With continuous distributions, P{X = x} = 0. In words, the probability that a continuous variable equals a particular value is considered to be zero. For this reason, all questions concerning continuous distributions must be phrased in terms of intervals. Furthermore, the probability that a continuous variable is less than or equal (≤) to a particular value is the same as the probability that the variable is less (<) than that particular value. The EXCEL NORM.DIST function gives the probability that a normal variable is less than (or equal to) a specified constant. The terminology concerning probability varies from one source to another. For this assignment, consider that probability = relative frequency = proportion. Also for this assignment, percentage = 100 * probability. Question A Suppose that height (cm) of male university students is normally distributed with the mean given in column 1 of Table A (LAB4A.DAT) and a standard deviation given in column 2 of Table A. Read the mean and standard deviation of heights from Table A and store
  • 39. them for further use. The data file contains one row with two columns. The first column contains the mean, the second contains the standard deviation. 1. What is the mean height in this population? 2. What is the standard deviation of height in this population? {Example 12} Use these values, and the EXCEL NORM.DIST function, to calculate answers for the following questions. 3. What proportion of male university students are expected to have a height between 170 and 180 cm? 4. What percentage of male university students would have a height less than 170 cm?
  • 40. 5. If a student is chosen at random from this population, what is the probability that he will be taller than 180 cm? {Example 13} Question B Suppose that the length of telephone calls made by teenagers is a normally distributed variable with mean and standard deviation given in columns 1 and 2 of Table B (LAB4B.DAT). Read the mean and standard deviation of the distribution of lengths of telephone calls from the first two columns of Table B and store them for further use. {Example 12} Use the values and the EXCEL NORM.DIST function to calculate answers for the following. 6. What is the mean length of telephone call? 7. What is the standard deviation of this distribution? 8. What is the probability that a random telephone call will last
  • 41. a length of time that is within one standard deviation of the mean (± 1 standard deviation)? 9. What is the proportion of telephone calls that last a length of time that is within two standard deviations of the mean (± 2 standard deviations)? 10. What is the relative frequency of lengths of teenage telephone calls that lie within three standard deviations of the mean (± 3 standard deviations)? 11. What is the probability that a telephone call will be longer than the mean by more than 1.645 standard deviations? {Example 13}
  • 42. Question C In a study conducted by Booth et al. (Int. J. Sports Psychol. 17:269-279, 1986), student nurses at the University of Ottawa completed the Thurston-Richardson attitude questionnaire and voluntarily took the Canadian Home Fitness Test. They found that the frequency response of heart rates after a second exercise bout ranged from 101 to 190 beats per minute and seemed to follow a normal distribution. The mean heart rate was 145 with a standard deviation of 20. Use the EXCEL NORM.DIST function (with mean = 145 and standard deviation = 20) to calculate the answer to the following question. 12. What is the estimated proportion that had a frequency response of less than 130 after the second exercise session? {Example 13} Question D
  • 43. A standard normal distribution is one for which the mean is zero and the standard deviation is unity (1.0). This distribution is often referred to as the z-distribution. Use the EXCEL NORM.DIST function to calculate answers to the following questions. 13. What is the probability that a standard normal variable will have a value less than 1.96? 14. What is the probability that a standard normal variable will have a value between -1 and +1? {Example 13} Please enter your answers into the ISLeX program - END OF ASSIGNMENT 4 -
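The interval questions in Assignment 4 all reduce to differences of cumulative probabilities. The sketch below illustrates this in Python; `norm_dist` is a hypothetical helper written here to mirror the cumulative form of Excel's NORM.DIST, and the mean and standard deviation of height are made-up values, not the contents of LAB4A.DAT.

```python
from statistics import NormalDist

def norm_dist(x, mean, sd):
    """P(X <= x) for a normal variable; mirrors the cumulative NORM.DIST."""
    return NormalDist(mean, sd).cdf(x)

# Hypothetical parameters; NOT the values stored in LAB4A.DAT
mean_height, sd_height = 175.0, 7.0

# Less than a value: one cumulative probability
p_below_170 = norm_dist(170, mean_height, sd_height)

# Between two values: difference of two cumulative probabilities
p_between = norm_dist(180, mean_height, sd_height) - p_below_170

# Greater than a value: complement of the cumulative probability
p_above_180 = 1 - norm_dist(180, mean_height, sd_height)

# Percentage = 100 * probability, as defined in the notes
percent_below_170 = 100 * p_below_170

# Within k standard deviations of the mean, using the standard normal
within_one_sd = norm_dist(1, 0, 1) - norm_dist(-1, 0, 1)

print(p_below_170, p_between, p_above_180, within_one_sd)
```

Because the three height intervals cover the whole distribution, their probabilities should sum to 1, which is a convenient check on any set of answers.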
  • 44. Introductory Statistics Laboratory Assignment #5 Purpose The main objectives of this assignment are to: a) use a goodness-of-fit test to demonstrate an important statistical theorem and b) calculate means and confidence intervals for a single sample when σ is known and when σ is not known. NOTE As you proceed with this assignment, write your answers in the spaces provided. When you have completed the assignment and exit from EXCEL, you are required to enter your answers into the ISLeX program.
  • 45. Question A The central limit theorem states that means of samples of more than 30 observations from any distribution will have a distribution that a) is approximately normal, b) has a mean equal to the mean of the original distribution, and c) has a standard deviation equal to the standard deviation of the original distribution divided by the square root of the sample size. The Poisson distribution is discrete and skewed; it is decidedly non-normal! However, the central limit theorem states that the means of sufficiently large (n ≥ 30) samples from even a Poisson distribution will be approximately normally distributed. The means of 100 samples, each of size 40, from a Poisson distribution are recorded in Table A. For this first question, you are required to use the 'goodness-of-fit' test to test the hypothesis that the means in this file are normally distributed with a mean of 10 and a standard deviation of 0.5. Read the means of the 100 samples from the Poisson distribution into the EXCEL worksheet. {Example 1} Calculate the mean and standard deviation of the sample means.
  • 46. 1. What is the mean of the 100 sample means? 2. What is the standard deviation of the 100 sample means? {Example 9} Prepare a histogram to count the number of sample means in each of the classes indicated in the following table. Note that interval endpoints are midpoint ± 0.5*width and the interval midpoint is the average of the two endpoints. Use 9.31, 9.61, 9.91, 10.21, 10.51 and 10.81 as the “bin” boundaries for the Excel HISTOGRAM procedure.
      Class interval   Midpoint   Expected frequency   Observed frequency
      < 9.31           -          8.3794
      9.31 - 9.61      9.46       13.3902
      9.61 - 9.91      9.76       21.0881
      9.91 - 10.21     10.06      23.4181
      10.21 - 10.51    10.36      18.3379
      10.51 - 10.81    10.66      10.1248
      > 10.81          -          5.2616
  • 47. 3. What was the observed frequency of sample means that fell between 9.91 and 10.21? {Example 2} Enter the seven expected frequencies into one column of the EXCEL worksheet and the seven observed frequencies into another column. Make sure that expected and observed frequencies for the same class are entered in the same row. Check that both columns of data sum to 100 (within rounding error). If they do not, correct your error(s). {Example 3} A goodness-of-fit test should now be used to see if the observed frequencies in two or more classes of observed values agree sufficiently well with those expected on the basis of some hypothesis. In this example, the hypothesis is that the means of samples will be normal with mean 10 and standard deviation 0.5.
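The expected frequencies in the table follow from the hypothesized normal distribution (mean 10, standard deviation 0.5) and the six bin boundaries, and the goodness-of-fit statistic for this question is the sum over classes of (O − E)²/E. The sketch below illustrates both in Python. The observed counts are hypothetical and are not the assignment's data; the critical value 12.6 is the one quoted in the assignment for 6 degrees of freedom at the 5% level.

```python
from statistics import NormalDist

hypothesis = NormalDist(mu=10, sigma=0.5)
boundaries = [9.31, 9.61, 9.91, 10.21, 10.51, 10.81]
n_means = 100

# Cumulative probability at each boundary, padded with 0 and 1 so that
# the open-ended first and last classes are included
cdf = [0.0] + [hypothesis.cdf(b) for b in boundaries] + [1.0]
expected = [n_means * (hi - lo) for lo, hi in zip(cdf, cdf[1:])]

# Hypothetical observed frequencies (sum to 100); NOT the lab data
observed = [9, 12, 22, 24, 17, 11, 5]

# One (O - E)^2 / E term per class, then the chi-squared statistic
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_squared = sum(terms)

# Critical value quoted in the assignment for 7 - 1 = 6 degrees of freedom
good_fit = chi_squared < 12.6

for e, o, t in zip(expected, observed, terms):
    print(f"expected {e:8.4f}  observed {o:3d}  term {t:.4f}")
print(chi_squared, good_fit)
```

The computed expected frequencies should agree with the tabulated values to within rounding, and both columns should sum to 100.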
  • 48. The test requires that you calculate a chi-squared statistic by: a) calculating the differences between the observed and expected frequencies in each class, b) squaring the differences and dividing by the expected frequencies in each class, and c) summing the values from step b. 4. What is the value of (O-E)²/E for the first class? 5. What is the value of the chi-squared statistic (that is, the sum over all seven classes of (O-E)²/E)? With seven classes, the chi-squared statistic has 7-1 = 6 degrees of freedom and the critical value at the 5% significance level is 12.6. If your test statistic is less than 12.6, you should conclude that the observed data show a good fit to the hypothesis. 6. Does the data show a good fit to the normal distribution with mean 10 and
  • 49. standard deviation 0.5 (0 for no, 1 for yes)? 7. Based on your limited experience, is the following statement true (use 1) or false (use 0)? Means of samples of size 40 from a Poisson (discrete) distribution are approximately normal (continuous). {Example 14} Question B The time (in minutes) required for six-year-old children to assemble a certain toy is believed to be normally distributed with a known standard deviation of 3.0. The data in Table B gives the assembly times for a random sample of 25 children. Read the data into the EXCEL worksheet, and compute and report the mean and standard deviation. 8. What was the mean assembly time for this sample of 25 six-year-old children? 9. What was the estimated standard deviation?
  • 50. {Examples 1 and 9} When the population standard deviation is known or given, one should use a standard normal distribution to calculate a confidence interval for the population mean. The procedure for calculating a large sample confidence interval for one mean involves three basic steps: a) determine a critical value from the appropriate distribution (for a 90% confidence interval with known standard deviation the critical value is $z_{0.05} = 1.645$), b) calculate the margin of error of the estimate, $E = z_{\alpha/2}\,\sigma/\sqrt{n}$, and c) calculate lower limit = mean – margin of error, and upper limit = mean + margin of error. 10. What was the margin of error of the estimate for a 90% confidence interval? 11. What was the lower limit of the 90% confidence interval for average assembly time?
  • 51. 12. What was the upper limit of the 90% confidence interval for average assembly time? 13. From this example, would you say that the following statement is true (use 1) or false (use 0) ? The lower confidence limit must always be less than the sample mean and the upper confidence limit must always be greater. 14. From this example, would you say that the following statement is true (use 1) or false (use 0)? When one has a choice of a known (or given) standard deviation and an estimated standard deviation, one should ignore the estimated standard deviation in calculating confidence intervals. {Example 15}
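The three steps for a confidence interval with known σ (critical value, margin of error, limits) can be sketched as follows. This is an illustration in Python, not the manual's Excel procedure; the function name `z_interval` and the sample mean of 12.4 minutes are hypothetical, while σ = 3.0 and n = 25 are the values stated in Question B.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence=0.90):
    """Confidence interval for one mean when sigma is known."""
    alpha = 1 - confidence
    # Step a: critical value z_{alpha/2} from the standard normal
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Step b: margin of error E = z * sigma / sqrt(n)
    margin = z * sigma / sqrt(n)
    # Step c: limits are the point estimate minus and plus the margin
    return margin, mean - margin, mean + margin

# sigma and n come from the question; the sample mean is hypothetical
margin, lower, upper = z_interval(mean=12.4, sigma=3.0, n=25)
print(margin, lower, upper)
```

Note that the margin of error depends only on the confidence level, σ and n, so it is the same for every sample of this size regardless of the sample mean.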
  • 52. Question C The level of monoamine oxidase (MOA) activity (nmol/hr/mg protein) was recorded for fourteen non-responsive depressed patients who had been treated with phenelzine. MOA activity is assumed to follow a normal distribution. The data are stored in a single column of Table C. You are asked to calculate a point estimate and an interval estimate of the mean MOA activity of this type of patient. Nothing is known about the variability of MOA activity. Read the data into the EXCEL worksheet, and compute and report the mean and standard deviation. 15. What was the point estimate for the mean MOA activity for this sample of 14 depressed patients? 16. What was the standard deviation? {Examples 1 & 9}
  • 53. When data has a normal distribution but is from a small (<30) sample or when data is from a large sample (≥30) and in either case σ is not known, one should use a t-distribution to calculate a confidence interval for the population mean. The procedure for calculating a confidence interval for one mean when σ is not known involves three basic steps: a) determine a critical value from the appropriate distribution (for a 90% confidence interval with estimated standard deviation the critical value is $t_{\alpha/2,\,n-1} = t_{0.05,\,13} = 1.771$), b) calculate the margin of error of the estimate, $E = t_{\alpha/2,\,n-1}\,s/\sqrt{n}$, and c) calculate lower limit = mean – margin of error and upper limit = mean + margin of error. 17. What was the margin of error of estimate for a 90% confidence interval in this sample of 14 depressed patients? 18. What was the lower limit of the 90% confidence interval for average MOA activity?
  • 54. 19. What was the upper limit of the 90% confidence interval for average MOA activity? 20. From these examples, would you say that the following statement is true (use 1) or false (use 0)? All confidence intervals are calculated by calculating a point estimate and then subtracting and adding a margin of error of the estimate. {Example 16} - END OF ASSIGNMENT 5 -
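The three steps for a confidence interval for one mean when σ is not known (Question C of Assignment 5) can be sketched in Python as follows. This is only an illustration: the function name and the sample values are hypothetical, not the Table C data, and the critical value is passed in by hand (for example from EXCEL's T.INV or a t table) because the Python standard library has no t quantile function.

```python
import math
import statistics

def t_interval_for_mean(data, t_crit):
    """Return (mean, margin, lower, upper) for a t-based confidence interval."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)          # sample standard deviation (n - 1 divisor)
    margin = t_crit * s / math.sqrt(n)  # E = t * s / sqrt(n)
    return mean, margin, mean - margin, mean + margin

# Hypothetical sample of 5 values; for a 90% interval with 4 degrees of
# freedom the tabled critical value is t(0.05, 4) = 2.132.
mean, margin, lower, upper = t_interval_for_mean([1, 2, 3, 4, 5], t_crit=2.132)
print(mean, margin, lower, upper)
```

For the 14 patients in Question C the same function would be called with the critical value 1.771 quoted in the text.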
  • 55. Introductory Statistics Laboratory Assignment #6 Purpose The objectives of this assignment are to: a) calculate a confidence interval for a proportion and b) present confidence intervals and tests of hypothesis for matched pairs. NOTE As you proceed with this assignment, write your answers in the spaces provided. When you have completed the assignment and exit from EXCEL, you are required to enter your answers into the ISLeX program. Question A Opinion polls are a popular method for assessing product
  • 56. preference, political preference, and more. As a simple example, consider that a poll was taken ten days prior to a civic election to try to predict what proportion of the electorate would vote for the incumbent mayor. The data in Table A represents the results of a moderate sample of persons who were asked if they would vote for the same mayor; a yes was recorded as 1, a no as 0. You are required to analyze the results of the poll and predict what proportion of voters will vote for the incumbent. Copy the data into the EXCEL worksheet, prepare a histogram to count the number of yes (1) and no (0) responses, and calculate the proportion who indicated that they would vote for the incumbent mayor. Note that, since yes and no are represented by 1 and 0, the proportion of yes can be determined by calculating the sum and dividing by the total sample size. 1. How large was the sample of voters represented in this poll? 2. What proportion of the sample voters indicated they would vote for the incumbent mayor? {Examples 1, 2 and 10}
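As a rough sketch of the calculation that Question A builds toward, the proportion of yes responses and a large-sample confidence interval for it could be computed as below. The function name and the 0/1 responses are hypothetical (they are not the Table A data); `NormalDist().inv_cdf` plays the role of EXCEL's NORM.INV with mean 0 and standard deviation 1.

```python
import math
from statistics import NormalDist

def proportion_interval(responses, confidence=0.90):
    """Large-sample confidence interval for a proportion from 0/1 data."""
    n = len(responses)
    p_hat = sum(responses) / n                 # proportion of 1s (sum / sample size)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # e.g. 0.95 quantile for 90%
    se = math.sqrt(p_hat * (1 - p_hat) / n)    # standard error of p_hat
    margin = z * se
    return p_hat, se, margin, p_hat - margin, p_hat + margin

# Hypothetical poll: 40 yes (1) and 60 no (0).
poll = [1] * 40 + [0] * 60
print(proportion_interval(poll))
```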
  • 57. Calculate a 90% confidence interval for the proportion of voters expected to vote for the incumbent mayor. The procedure for calculating a confidence interval for a proportion involves three basic steps. 3. Determine the α/2 critical value for the appropriate distribution (standard normal in this case). Use the NORM.INV function to calculate the critical value =NORM.INV(0.95,0,1). What is the critical value for a 90% confidence interval based on the standard normal distribution? 4. What is the standard error of the estimated proportion of polled voters who favour the incumbent? $s_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n}$, where $\hat{q} = 1 - \hat{p}$. 5. What is the margin of error of the estimated proportion? 6. What is the lower 90% confidence limit on the proportion of voters who will
  • 58. vote for the incumbent? 7. What is the upper 90% confidence limit on the estimated proportion of voters who will vote for the incumbent? {Example 17} 23,217 of the 58,839 persons that voted actually voted for the incumbent. Calculate and report the actual proportion that voted for the incumbent. 8. What was the proportion that actually voted for the incumbent? 9. Based on the results given in questions 6, 7 and 8, which of the following statements (1, 2 or 3) is most correct? 1 - The poll of a sample of voters gave a good indication of the final vote. 2 - Many of the voters who would have voted for the incumbent at the time of the poll must have changed their minds. 3 - The persons sampled in the poll must have contained an
  • 59. unusually low proportion of those who favoured the incumbent. {Example 10} Question B The Monster Chemical Company believes that its herbicide (Avena-doom) is better than its competitor's herbicide (Avena-kill) for controlling wild oat in barley fields. To demonstrate the advantage of their herbicide over that of their competitor, Monster grew side-by-side plots of barley treated with each of the two herbicides in a large sample of farmers' fields throughout western Canada. The company then wished to compare the yields of barley treated with the two types of herbicides. Yield of barley will vary from farm to farm regardless of which herbicide is used. Differences in climate, differences in agronomic practices, and differences in type of barley grown cause variation. For this reason, it is desirable to match the data from the two plots on each farm.
  • 60. The analysis is one of looking at differences between matched pairs. with Avena-doom (second column), and barley yield with Avena-kill (third column) from the three columns in Table B into columns of the EXCEL worksheet. Describe the data from the two treatments. 10. What was the average barley yield for plots treated with the Avena-doom herbicide? 11. What was the standard deviation of yields of barley plots treated with Avena-doom ? 12. What was the average yield of plots treated with Avena-kill? 13. What was the standard deviation with Avena-kill? {Examples 1 and 9} calculated and then analyze the differences.
  • 61. 14. What was the mean of the differences between yield of barley plots treated with Avena-doom and Avena-kill? 15. What was the standard deviation of the differences (for each pair)? 16. Was the standard deviation of the differences smaller (0) or larger (1) than the standard deviation of the barley yields from plots treated with Avena-doom? {Examples 10 and 9} Calculate a 95% confidence interval for the mean of the differences in yield between plots treated with Avena-doom and those treated with Avena-kill. NOTE: The standard deviation is estimated from the data so we use the t distribution.
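A minimal sketch of the matched-pairs interval, assuming the two yield columns are available as equal-length lists. The function name and the yields are hypothetical (not the Table B data), and the critical value must be supplied from T.INV or a t table with n - 1 degrees of freedom, where n is the number of pairs.

```python
import math
import statistics

def paired_difference_interval(first, second, t_crit):
    """Confidence interval for the mean of pairwise differences (first - second)."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)                       # number of matched pairs
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)    # standard deviation of the differences
    margin = t_crit * sd_diff / math.sqrt(n)
    return mean_diff, sd_diff, margin, mean_diff - margin, mean_diff + margin

# Hypothetical yields on 4 farms; t(0.025, 3) = 3.182 for a 95% interval.
doom = [5, 7, 9, 6]
kill = [3, 4, 8, 5]
print(paired_difference_interval(doom, kill, t_crit=3.182))
```

Working on the differences removes the farm-to-farm variation, which is why their standard deviation is typically smaller than that of either yield column.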
  • 62. 17. What is the critical value for the confidence interval? 18. What was the margin of error of the estimated mean difference? 19. What was the lower limit of the 95% confidence interval for the average difference in yields of barley treated with Avena-doom and barley treated with Avena-kill? 20. What was the upper limit? {Example 16} Question C Use the same data and results of Question B to investigate the hypothesis that the increase in barley yield by using Avena-doom instead of Avena-kill is no greater than 3.0 q/ha (300 kg/ha). The alternative to this hypothesis is that the increase is greater than 3 q/ha. To test this hypothesis, one must calculate a test statistic, $t = \dfrac{\text{mean of differences} - \text{hypothesized mean } (= 3.0)}{\text{standard error of the differences}}$. The null hypothesis should be rejected if the test statistic exceeds the critical value from the theoretical distribution. For a 5% significance level, α = 0.05, the critical value for a
  • 63. one-tailed test can be found by using the appropriate T.INV function (see Example 18) with n-1 degrees of freedom. For matched pairs, n is the number of pairs. In this instance, the null hypothesis should be rejected if the test statistic exceeds the critical value. 21. What is the value of the test statistic for testing the hypothesis that the mean difference is 3.0 q/ha or less? 22. What is the critical value against which the test statistic in question 21 should be compared? 23. Should the hypothesis that the yield difference is 3 q/ha or less be rejected (1) or not (0)? {Example 18} - END OF ASSIGNMENT 6 -
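The one-tailed matched-pairs test in Question C of Assignment 6 can be sketched as follows. The function names and the differences are hypothetical, and the critical value is assumed to come from T.INV or a t table with n - 1 degrees of freedom.

```python
import math
import statistics

def paired_t_statistic(diffs, hypothesized_mean):
    """t = (mean of differences - hypothesized mean) / standard error."""
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)   # standard error of the mean difference
    return (statistics.mean(diffs) - hypothesized_mean) / se

def reject_upper_tail(t_stat, t_crit):
    """Reject the null hypothesis only when t exceeds the upper critical value."""
    return t_stat > t_crit

# Hypothetical differences for 6 pairs; t(0.05, 5) = 2.015.
diffs = [4, 5, 3, 4, 6, 2]
t = paired_t_statistic(diffs, hypothesized_mean=3.0)
print(t, reject_upper_tail(t, t_crit=2.015))
```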
  • 64. Introductory Statistics Laboratory Assignment #7 Purpose This lengthy assignment serves to review calculations of confidence intervals and tests of hypothesis for: a) two means of large independent samples from populations with unknown and unequal variances, b) two means of small independent samples from populations with the same unknown variance, c) two proportions from large independent samples. NOTE As you proceed with this assignment, write your answers in the spaces provided. When you have completed the assignment and exit from EXCEL, you are required to enter your answers into the ISLeX program. Question A The role that cholesterol plays in the development of "hardening of the arteries" (atherosclerosis) and heart disease has been widely reported. In one experiment, a group of patients who were considered to be high-risk were split into two equal groups. The first group
  • 65. was put on a special diet with a high proportion of fish (salmon, tuna, mackerel and cod). Oil from these deep-sea fish is known to be very rich in Omega-3 fatty acids. The other (control) group was maintained on a standard diet (high-protein, low-fat, complex carbohydrates and polyunsaturated cooking oil). The change (decrease) in cholesterol was measured after a period of time. A greater change is desirable. The (simulated) data (mg decrease per decilitre of blood) for the Omega-3 group is stored in Table A1, and the data for the control group is stored in Table A2. You are required to calculate a 95% confidence interval for the average difference in cholesterol reduction and to test the hypothesis that there was no difference between the two diets in average reduction of cholesterol. Copy the data from the 'Omega-3' group [Table A1] and the data from the 'control' group [Table A2] into the EXCEL worksheet. Determine and report the number of observations in each group, the mean change (mg/dl) in each group and the standard deviation of the change in each group. 1. How many patients were in each diet group? 2. What was the mean (decrease) in cholesterol for the Omega-3 group of patients?
  • 66. 3. What was the standard deviation in that group? 4. What was the mean (decrease) in cholesterol for the control group of patients? 5. What was the standard deviation in the control group? {Examples 1 and 9} Calculate a 95% confidence interval for the difference between the two means, treating the groups as large independent samples from populations with variances that are unequal. We can use the normal distribution as an approximation to the t distribution when the sample sizes are large. The method for calculating a large-sample confidence interval for the difference between two means consists of three basic steps. a) Estimate the difference between the two sample means and the standard error of the difference between the two sample means. 6. What is the estimated difference of means? 7. Standard error of the difference between means
  • 67. $= \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$. What is the standard error of the difference of means? b) Calculate the margin of error of the estimated difference of means. For this large-sample 95% confidence interval we can approximate with a z value, $z_{0.025} = 1.96$. c) Calculate the confidence interval as difference between means ± margin of error. 8. What is the margin of error of the estimated difference? 9. What is the lower limit for the 95% confidence interval of the difference in cholesterol reduction between Omega-3 and control diets?
  • 68. 10. What is the upper limit? {Example 19} The test of the hypothesis that there is no difference between the two diets proceeds as follows. Since we expect that the Omega-3 diet should give a greater decrease in cholesterol than the control, we will use a one-tailed alternative hypothesis. Use a 5% significance level to test the null hypothesis that there is no difference between the diets against an alternative that the difference between Omega-3 and control groups is greater than zero. The test of hypothesis has two basic steps: a) Compute the test statistic (z) as the difference in means divided by standard error of the difference. b) The null hypothesis should be rejected if the test statistic exceeds the critical value for a one-tailed alternative (approximately 1.645 for 5% significance in a large-sample, one-tailed test).
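The large-sample interval and one-tailed z test for two independent means can be sketched from summary statistics alone. The function name and the numbers are hypothetical (they are not the Table A1/A2 results); the constants 1.96 and 1.645 are the critical values quoted in the text.

```python
import math

def large_sample_two_means(mean1, s1, n1, mean2, s2, n2):
    """95% interval and one-tailed z test for mean1 - mean2 (unequal variances)."""
    diff = mean1 - mean2
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # standard error of the difference
    margin = 1.96 * se                           # z(0.025) for a 95% interval
    z = diff / se                                # test statistic under "no difference"
    return {
        "difference": diff,
        "standard_error": se,
        "margin": margin,
        "lower": diff - margin,
        "upper": diff + margin,
        "z": z,
        "reject": z > 1.645,                     # one-tailed test at the 5% level
    }

# Hypothetical summaries: group 1 (mean 10, sd 3, n 36), group 2 (mean 8, sd 4, n 64).
print(large_sample_two_means(10, 3, 36, 8, 4, 64))
```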
  • 69. 11. What is the value of the test statistic? 12. Should the null hypothesis be rejected and the conclusion be that Omega-3 diet did indeed cause a greater reduction in cholesterol than the control diet? Yes =1, No = 0 {Example 19} Question B In some law schools, the score on a test known as LSAT is an important criterion for acceptance. Two law schools decided to compare the LSAT scores of students registered in their respective schools. LSAT scores for students in Law school 1 are stored in Table B1 and those for students from Law school 2 in Table B2. Assume that the variances of LSAT scores are equal in the two schools. You are asked to calculate a 90% confidence interval for the difference in average LSAT scores and to test the hypothesis that students from the two schools do not differ in their average LSAT scores. Use a 5% significance level. from Law school 2 into the EXCEL worksheet. Compute and report the number, means and
  • 70. standard deviations of scores from each school. 13. How many LSAT scores from school 1? 14. What was the mean LSAT score from school 1? 15. What was the standard deviation of scores from school 1? 16. How many LSAT scores from school 2? 17. What was the mean LSAT score from school 2? 18. What was the standard deviation of scores from school 2? {Examples 1 and 9} Follow these steps to calculate a 90% confidence interval for the difference in mean LSAT scores when variances are unknown but assumed to be equal. a) calculate the difference between the two means (school 1 - school 2) b) calculate the pooled variance for the two samples: $s^2_{pool} = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$, where $n_1$ = size, sample 1; $n_2$ = size, sample 2; $s_1$ = st.dev, sample 1; $s_2$ = st.dev, sample 2 c) calculate the standard error of the difference: $s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s^2_{pool}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
  • 71. d) Calculate the critical value and margin of error for α = 0.10. Use the T.INV function to get the critical value. Multiply the critical value by the standard error of the difference to get the margin of error. Use degrees of freedom = n1 + n2 – 2. e) Calculate the lower and upper 90% confidence limits 19. What is the estimated pooled variance for this data? 20. What is the standard error of the difference? 21. What is the margin of error of the difference? 22. What is the lower limit of the difference between the two schools in LSAT scores? {Example 20}
  • 72. Test the hypothesis that the means of the two groups of LSAT scores are equal when the samples are
  • 73. independent and the population variances are unknown but equal. The test statistic is the difference in means minus zero divided by the standard error of the difference. The null hypothesis should be rejected if the test statistic is less than -tα/2,df or greater than tα/2,df where df = n1 + n2 - 2 and α=0.05 is the chosen significance level. Use the T.INV function to calculate the critical values for this two-tailed test. 23. What is the value of the test statistic for testing the hypothesis that the mean LSAT scores are the same for the two law schools? 24. Using the 5% significance level, should the null hypothesis be rejected (1) or not (0)? {Example 20} Question C The legislature of a southern state in the U.S. passed a rule, commonly called "no-pass, no-play", which prohibits a student who fails in any subject from participating in any extracurricular activity for six weeks. Data were collected for students involved in football, volleyball, cross country, and band for the first six-week
  • 74. grading period. Records were kept from last year and this year. The numbers of students is stored in column 1 and the proportions sidelined because of the rule are stored in column 2 of Table C, the first row being for last year and the second for this year. values. 25. How many students were there in last year's sample? 26. What proportion of the last year's students were sidelined because of one or more failures? 27. How large was this year's sample? 28. What proportion failed and were sidelined this year? {Example 1}
  • 75. Calculate a 90% confidence interval for the change (last year minus this year) in proportion of students sidelined. a) Calculate the difference in proportions. b) Calculate the standard error of the difference: $s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$ c) Calculate the margin of error of estimate. For a 90% confidence interval with large samples, use $z_{0.05} = 1.645$.
  • 76. d) Calculate the lower and upper limits. 29. What is the upper 90% confidence limit on the change in proportion of students sidelined because of failure? {Example 21} Test the null hypothesis that the proportion sidelined did not change against an alternative that the proportion sidelined has decreased (that is, the difference in proportions is greater than zero). Use a 5% significance level. NOTE: Under the null hypothesis, the proportions are equal and we should therefore calculate an average proportion for the two groups. This will result in a new estimate of the standard error of the difference between sample proportions. average (pooled) proportion: $\hat{p} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$ 30. What was the average (pooled) proportion sidelined? 31. Now use the pooled proportion to calculate the standard error of the difference between the two proportions: $s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
  • 77. What is the value of the test statistic for testing the hypothesis that the proportion did not change (remember to divide by the standard error of the difference between the two proportions which was calculated using the pooled proportion)?
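A short sketch of the pooled two-proportion test, assuming sample sizes and sample proportions are given as in Table C. The function name and the values are hypothetical, not the Table C data.

```python
import math

def two_proportion_z_test(n1, p1, n2, p2, z_crit=1.645):
    """One-tailed z test of p1 - p2 > 0 using the pooled proportion."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)                   # average proportion
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # SE under the null
    z = (p1 - p2) / se
    return pooled, se, z, z > z_crit

# Hypothetical: 30% of 100 students sidelined last year, 20% of 100 this year.
pooled, se, z, reject = two_proportion_z_test(100, 0.30, 100, 0.20)
print(pooled, se, z, reject)
```

The pooled standard error is used only for the test. The confidence interval uses the unpooled standard error, because the interval does not assume the two proportions are equal.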
  • 78. Use a one-tailed test with a 5% significance level to answer the following question. Remember that you will reject the null hypothesis if the test statistic exceeds the critical value (1.645 in this case). 32. Was the superintendent of schools justified in saying, "We are very pleased with the improvement. It shows coaches and students are taking the rule seriously"? Answer 1 for yes or 0 for no. {Example 21} - END OF ASSIGNMENT 7 - Introductory Statistics Laboratory Assignment #8
  • 79. Purpose In this assignment calculations will be completed for analyses of variance for: a) a one-way design, b) a two-way design with more than one observation per cell, and c) a two-way design with one observation per cell (randomized complete block design) NOTE As you proceed with this assignment, write your answers in the spaces provided. When you have completed the assignment and exit from EXCEL, you are required to enter your answers into the ISLeX program. Question A Gasoline mileage (mpg) was measured on several cars of each of four different makes (coded 1, 2, 3 and 4). The make of each car is stored in the first column, and the mileage for each car is stored in the second column, of Table A. You need to conduct an analysis of variance to see if there are differences among the four makes in gasoline mileage. You should also estimate the mileage of each of the four makes of cars. Copy the data into the EXCEL worksheet. Name the columns and view the data.
  • 80. {Example 1} Carry out a one-way analysis of variance on this data. Since each data point can be classified only according to the make of car, a one-way analysis of variance is required. It is important that students be able to interpret analysis of variance tables such as those produced by EXCEL. For this analysis, you will need to copy data for each make into different adjacent columns. Fill in the following one-way analysis of variance table and answer the first five questions.
Source of variation   Degrees of freedom   Sum of squares   Mean square   F   P
Make of car   3   ________   ________   ________   ________
Error   ________   ________   ________
  • 81. 1. What is the value of the F-statistic for testing the null hypothesis that there are no differences in gasoline mileage among the four makes of automobile? 2. What are the degrees of freedom associated with the numerator of this test statistic? 3. What are the degrees of freedom associated with the denominator of the F-value for MAKE of car? 4. What is the estimate of the pooled variance within makes of cars (i.e. the Error mean square)?
  • 82. 5. What are the degrees of freedom for this variance in #4? {Example 22} NOTE: For the following questions (6 - 13), use the error mean square and the error degrees of freedom to calculate confidence intervals and to test hypotheses about pairs of means. car and record them in the following table. Make of car Number tested Average mileage 1 2 3 4 6. How many cars of make 2 were evaluated in this experiment? 7. What was the average gasoline mileage for make 2? 8. How many cars of make 3 were evaluated in this experiment? 9. What was the average gasoline mileage for make 3?
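To see where the entries of the one-way analysis of variance table and the per-make summary come from, the calculation can be sketched as below. This is an illustration only: the function name and the (make, mileage) rows are hypothetical, and the P value is omitted because the Python standard library has no F distribution.

```python
from collections import defaultdict
from statistics import mean

def one_way_anova(rows):
    """Build a one-way ANOVA table from (group, value) rows."""
    groups = defaultdict(list)
    for make, mileage in rows:
        groups[make].append(mileage)
    values = [v for vals in groups.values() for v in vals]
    grand_mean = mean(values)
    # Between-group and within-group (error) sums of squares.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    ss_error = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_between = len(groups) - 1
    df_error = len(values) - len(groups)
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error        # pooled variance within groups
    return {
        "summary": {g: (len(v), mean(v)) for g, v in groups.items()},
        "df": (df_between, df_error),
        "ss": (ss_between, ss_error),
        "ms": (ms_between, ms_error),
        "F": ms_between / ms_error,
    }

# Hypothetical data: two makes with three cars each.
rows = [(1, 20), (1, 22), (1, 24), (2, 26), (2, 28), (2, 30)]
print(one_way_anova(rows))
```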
  • 83. Calculate a 95% confidence interval for the mean gasoline mileage of make 2. Use the method for single means when σ is not known, but use the Error Mean Square as the estimate of the variance. The degrees of freedom will be the Error DF, not n-1! Reminders: Confidence Interval = mean ± margin of error. Margin of error = critical value * standard error. Use critical value for T at α/2 = 0.025 and df = error df (t table or EXCEL T.INV function). Use standard error = √(error mean square/number of observations of that make of car). 10. What was the margin of error for the confidence interval for gasoline mileage of make 2? 11. What was the lower 95% confidence limit for make 2 mileage? 12. What was the upper 95% confidence limit for make 2 mileage? {Example 24}
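The reminders above translate into a few lines of code. In this sketch the function name and all numbers are hypothetical, and the critical value is assumed to be looked up with the error degrees of freedom.

```python
import math

def mean_interval_from_error_ms(group_mean, n_group, error_ms, t_crit):
    """Confidence interval for one group mean using the ANOVA error mean square."""
    se = math.sqrt(error_ms / n_group)   # standard error of that group's mean
    margin = t_crit * se                 # critical value uses the error df, not n - 1
    return margin, group_mean - margin, group_mean + margin

# Hypothetical: mean 28 mpg from 4 cars, error mean square 4, critical value 2.0.
print(mean_interval_from_error_ms(28, 4, 4, 2.0))
```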
  • 84. Test the hypothesis that the mean mileages of makes 2 and 3 do not differ. Use the method for single means when σ is not known with the Error MS serving as the pooled variance. Reminders: Test statistic t = difference of means / standard error of difference of means. The standard error of the difference equals the square root of the sum of the variances of the two means. The variance of each mean is estimated by the error mean square/number of observations in that mean. 13. What is the value of the t test statistic for testing the hypothesis that makes 2 and 3 do not differ in mileage? {Example 24} Question B The data in Table B represents the times (in seconds) for men of three different ages (40, 50 and 60) in each of three different fitness classes (1, 2 and 3) to run a 2 km course. For each runner, age is recorded in the first column, fitness category is recorded in the second column, and running time is recorded in the third.
  • 85. Two men in each of the nine categories ran the course. You should be interested in determining whether age and/or fitness affect running time. Each data point can be classified according to age of the runner or according to fitness of the runner. The data therefore requires a two-way analysis of variance. It is possible that differences among ages of runner will depend upon the fitness categories of those two runners. The model for the analysis should include an interaction term. the columns, and view the data. You will have to copy the data into three different columns each with six observations in order to perform the following analysis (see Example 25). {Example 1, 25} LABORATORY ASSIGNMENTS INTRODUCTORY STATISTICS LABORATORY 47
  • 86. Carry out a two-way analysis of variance and answer the following questions.
Source of variation   Degrees of freedom   Sum of squares   Mean square   F   P
Age of runner   2   ________   ________   ________   ________
Fitness of runner   2   ________   ________   ________   ________
Interaction   4   ________   ________   ________   ________
Error   9   ________   ________
14. What is the value of the F test statistic for testing the hypothesis that age, on average, has no effect on running time?
  • 87. 15. What are the numerator degrees of freedom for that F statistic reported in question 14? 16. What are the denominator degrees of freedom for that F statistic reported in question 14? 17. What is the value of the F test statistic for testing the hypothesis that fitness, on average, has no effect on running time? 18. What is the value of the F test statistic for testing the hypothesis that the effect of age (if any) on running time does not depend of the runner's fitness? NOTE In analysis of variance, the null hypothesis should be rejected whenever the calculated F-statistic is greater than the critical value for a chosen significance level and appropriate numerator and
  • 88. denominator degrees of freedom. Equivalently, the null hypothesis should be rejected whenever the computed p-value is less than the chosen significance level. Use α = 0.01 (significance level =1 %) and answer the following two questions. 19. Should the null hypothesis that age has no effect on running time be rejected (1) or not rejected (0)? 20. Should the null hypothesis that the effect of age is independent of the effect of fitness be rejected (1) or not rejected (0)? {Example 25} ASSIGNMENT 8 48 INTRODUCTORY STATISTICS LABORATORY
  • 89. following three questions. Age Fitness 1 Fitness 2 Fitness 3 Average 40 50 60 Average 21. What was the average running time for all 60-year olds? 22. What was the average running time for all men in fitness category 3? 23. What was the mean running time of the two 60-year, category 3 runners? {Example 25} Question C In many agricultural and biological experiments, one may use a two-way model with only one observation per cell. When one of the factors is related to the grouping of experimental units into more uniform groups, the design may be called a randomized complete block design (RCBD). The analysis is similar to a two-way analysis of variance (question B) except that the model does not include an interaction term. The specific leaf areas (area per unit mass) of three types of citrus each treated with one of
  • 90. three levels of shading are stored in Table C. The first column contains the code for the shading treatment, the second column contains the code for the citrus species, and the third column contains the specific leaf area. Assume that there is no interaction between citrus species and shading. Carry out a two-way analysis of this data. The shading treatment and citrus species are coded as follows: Treatment Code Species Code Full sun 1 Shamouti orange 1 Half shade 2 Marsh grapefruit 2 Full shade 3 Clementine mandarin 3 leaf area into the EXCEL worksheet, label the columns and look at the data. {Example 1} LABORATORY ASSIGNMENTS INTRODUCTORY STATISTICS LABORATORY 49 -way (without interaction) analysis of this data and answer the following questions.
  • 91. Use a 5% significance level. Source of variation Degrees of freedom Sum of squares Mean square F P Shading treatment 2 Citrus species 2 Error 4 24. Should the hypothesis that shading treatment has no effect on specific leaf area be rejected (1) or not (0)? 25. Should the hypothesis that citrus species do not differ in specific leaf area be
  • 92. rejected (1) or not (0)? 26. What is the estimate of the average (pooled) variance in this experiment (i.e. Error mean square)? 27. What are the error degrees of freedom for the pooled variance? {Example 26} Recall that the confidence interval for a difference between two means is based on a calculation of the margin of error of the estimated difference. With a common variance (Error MS) and the same number of observations in all shading treatments, the margin of error of an estimated difference will be the same whether we calculate it for treatments 1 and 2, 1 and 3, or 2 and 3. This margin of error of the difference between two means is sometimes referred to as the least significant difference (LSD). Calculate the LSD for comparing shading treatments in this experiment. LSD = critical t value × standard error of difference. The critical t value with 4 degrees of freedom is $t_{0.025,\,4} = 2.776$. n is the number of times each treatment was tested (in this case n = 3 for the 3 species).
  • 93. $\text{LSD}(\alpha) = t_{\alpha/2,\,\text{edf}} \sqrt{\dfrac{2 \times \text{Error Mean Square}}{n}}$ 28. What is the least significant difference (α = 0.05) for comparing shading treatments in this experiment? {Example 24} Any two shading treatments are judged to be significantly different if their absolute (ignore the + or - sign) difference exceeds the least significant difference.
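The least significant difference and the comparison rule can be sketched as follows. The function names, the error mean square, and the treatment means are hypothetical. Only the critical value 2.776 (4 error degrees of freedom) and n = 3 come from the text.

```python
import math

def least_significant_difference(error_ms, n_per_treatment, t_crit):
    """LSD = t * sqrt(2 * error mean square / n)."""
    return t_crit * math.sqrt(2 * error_ms / n_per_treatment)

def significantly_different(mean_a, mean_b, lsd):
    """Two treatment means differ if their absolute difference exceeds the LSD."""
    return abs(mean_a - mean_b) > lsd

# Hypothetical error mean square of 6 with n = 3 observations per treatment.
lsd = least_significant_difference(error_ms=6, n_per_treatment=3, t_crit=2.776)
print(lsd, significantly_different(150.0, 143.0, lsd))
```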
  • 94. differences. Compare the appropriate differences to the LSD to answer the following questions. Shading Treatment Mean Specific Leaf Area Full Sun Half Shade Full Shade 29. Should the hypothesis that the specific leaf area under full sun is not different from the specific leaf area in half shade be rejected (1) or not rejected (0)? 30. Should the hypothesis that the specific leaf areas of half shade and full shade are not different be rejected (1) or not rejected (0)? {Example 24} - END OF ASSIGNMENT 8 - LABORATORY ASSIGNMENTS INTRODUCTORY STATISTICS LABORATORY 51
  • 95. Introductory Statistics Laboratory Assignment #9 Purpose This final assignment presents some of the important points to consider in correlation analysis and simple linear regression analysis. Question A The data in Table A gives the (simulated) advertising expenditures of 25 large companies for last year and this year. You are asked to investigate the question of whether or not expenditures in one year are related to expenditures in another. The data file contains the company number in the first column, last year's expenditures ($ millions) in the second column, and this year's expenditures ($ millions) in the third column.
  • 96. t, name the columns, and view the data. 1. Which company had the greatest advertising expenditures last year? 2. Which company had the greatest advertising expenditures this year? {Example 1} ditures in the two years and answer the following question. 3. Which of the following three statements (1, 2 or 3) most correctly describes the relationship between last year's and this year's expenditures? 1 - There is little relationship between what a company spends on advertising in one year and what that company spends in another. 2 - Companies that spent most on advertising last year tended to be among those spending the greatest amount this year. 3 - Companies that spend a lot on advertising in one year tend to reduce their advertising expenditures in the next. {Example 27}
  • 97. The relationship between two variables can be measured by the covariance. The covariance is a measure of how much two random variables vary together. The larger the magnitude of the product, the stronger the relationship. The value of the covariance is interpreted as follows: • Positive covariance - indicates that higher than average values of one variable tend to be paired with higher than average values of the other variable. • Negative covariance - indicates that higher than average values of one variable tend to be paired with lower than average values of the other variable. • Zero covariance - if the two random variables are independent, the covariance will be zero. However, a covariance of zero does not necessarily mean that the variables are independent. A nonlinear relationship can exist that still would result in a covariance value of zero.
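A minimal sketch of the sample covariance, assuming the n - 1 divisor that matches the sample standard deviation (this divisor is an assumption about what the lab intends). The function name and data are hypothetical. The second example illustrates the last point above: y depends exactly on x, yet the covariance is zero because the relationship is not linear.

```python
from statistics import mean

def sample_covariance(x, y):
    """Sample covariance with an n - 1 divisor (assumed convention)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

# Values that rise together give a positive covariance.
print(sample_covariance([1, 2, 3, 4], [2, 4, 6, 8]))

# y = x squared: fully dependent on x, but the covariance is zero.
xs = [-2, -1, 0, 1, 2]
print(sample_covariance(xs, [v * v for v in xs]))
```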
  • 98. Calculate the standard deviation for last year's expenditures, the standard deviation for this year's expenditures and the covariance between the two. 4. What is the standard deviation of last year's advertising expenditures ($ millions) of these 25 companies? 5. What is the standard deviation of this year's advertising expenditures ($ millions) of these 25 companies? 6. What is the covariance between the last year's and this year's advertising expenditures ($ millions²) of these 25 companies? Because the covariance depends on the units of the data, it is difficult to compare covariances among data sets having different scales. A value that might represent a strong linear relationship for one data set might represent a very weak one in another. The correlation coefficient (r) addresses this issue by normalizing the covariance (i.e. dividing the covariance $s_{xy}$ by the product of the two standard deviations, $s_x s_y$), creating a dimensionless
  • 99. quantity that allows the comparison of different data sets. 7. What is the correlation (r) between last year's and this year's expenditures? {Example 28} Is there a relationship between expenditures from one year to another? Test the null hypothesis that there is no relationship between last year's and this year's expenditures against an alternative that there is a positive relationship (r > 0). Use a 10% significance level. Because this is a one-tailed test with 25 pairs of observations (degrees of freedom = 23), we find that the critical value against which to compare the estimated correlation is t = 1.319. Using your r value and n = 25, calculate the test statistic tcalc and compare. If the test statistic is greater than the critical value of 1.319, the null hypothesis will be rejected.
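The correlation coefficient and its test statistic, t = r√(n − 2)/√(1 − r²), can be sketched as below. The function names and the data are hypothetical (not the Table A expenditures), and the covariance again assumes the n - 1 divisor.

```python
import math
from statistics import mean, stdev

def correlation(x, y):
    """r = covariance / (sx * sy), using n - 1 divisors throughout."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

def correlation_t_statistic(r, n):
    """t statistic for testing zero correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical paired data.
r = correlation([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(r, correlation_t_statistic(r, n=5))
```

For the 25 companies, the statistic would be compared with the critical value 1.319 quoted in the text.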
  • 100. $$t_{calc} = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$$
    8. Should the hypothesis that there is no relationship between last year's and this year's advertising expenditures be rejected (1) or not (0)? {Example 28}
    Question B
    In a study of the role of young drivers in automobile accidents, the percentage of licensed drivers under the age of 21 and the number of fatal accidents per 1000 licenses were determined for 32 cities. The data are stored in Table B. The first column contains the city code (a number), the second column contains the percentage of drivers who are under 21, and the third column contains the number of fatal accidents per 1000 drivers. The primary interest is whether or not the number of fatal accidents depends on the proportion of licensed drivers who are under 21.
    Copy the data into the EXCEL worksheet, name the
  • 101. columns, and view the data.
    9. Which city (number) had the highest number of fatal accidents per 1000 licensed drivers? {Example 1}
    [...] percentage of drivers under 21. Based on the plot, try to anticipate whether the following analysis will show a significant increase or decrease in the number of fatalities with increases in the percentage of drivers under 21. {Example 27}
    [...] can be used to predict levels of a
  • 102. dependent variable for specified levels of an independent variable. Use the EXCEL REGRESSION command to calculate the intercept and slope of the least-squares line, as well as the analysis of variance associated with that line. Fill in the following table and use the results to answer the next few questions.
    Carefully choose your independent and dependent variables and input them correctly using EXCEL’s regression command. In this example, the percentage of drivers under the age of 21 (the independent variable) affects the number of Fatals/1000 licenses (the dependent variable).
    The regression equation (least-squares line) is
        Fatals/1000 licenses = ________ + ________ × (% under 21)
                               (intercept)  (slope)
    Analysis of variance
        Source             DF    SS         MS         F          P
        Regression          1    ________   ________   ________   ________
        Residual (Error)   30    ________   ________
    10. What is the estimated increase in number of fatal accidents per 1000 licenses due to a one percent increase in the percentage of drivers under 21 (i.e. the slope)?
  • 103. 11. What is the standard deviation of the estimated slope?
    12. What is the estimated number of fatal accidents per 1000 licenses if there were no drivers under the age of 21 (i.e. the y intercept)?
    13. What percentage of the variation in accident fatalities can be explained by the linear relationship with drivers under 21 (i.e. 100 × the unadjusted coefficient of determination)?
    14. Should the hypothesis that the slope does not differ from zero (no effect of young drivers on fatals) be rejected (1) or not (0) based on a test at the 1% significance level (i.e. is the p-value from the ANOVA less than 0.01)?
    15. What are the degrees of freedom for the standard error of estimate (and the standard deviation of the slope); i.e. what are the error degrees of freedom?
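The quantities asked for in questions 10 to 15 can be cross-checked outside EXCEL. The following Python sketch assumes the standard least-squares formulas for a single independent variable; the function name and the five data pairs are hypothetical and are not the Table B data.

```python
import math


def simple_linear_regression(x, y):
    """Least-squares line y = intercept + slope * x with its summary quantities."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)              # total sum of squares
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_regression = slope * sxy                          # regression sum of squares
    ss_error = syy - ss_regression                       # residual sum of squares
    df_error = n - 2                                     # error degrees of freedom
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared_percent": 100 * ss_regression / syy,
        "df_error": df_error,
        "se_slope": math.sqrt(ss_error / df_error / sxx),
    }


# Hypothetical data: percentage of drivers under 21 and fatals per 1000 licenses.
pct_under_21 = [8, 10, 12, 14, 16]
fatals_per_1000 = [0.9, 1.4, 2.1, 2.4, 3.2]

fit = simple_linear_regression(pct_under_21, fatals_per_1000)
for name, value in fit.items():
    print(name, round(value, 5))
```

The independent variable is passed first, mirroring the care needed when choosing the X and Y ranges in the EXCEL regression dialog.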
  • 104. {Example 29}
    [...] to calculate a confidence interval for the slope of the least-squares line and to test hypotheses other than H0: β1 = 0. In both cases, one needs an estimate of the slope and of its standard deviation (sometimes called its standard error). Furthermore, one needs to recognize that the degrees of freedom for this standard deviation are the same as the error degrees of freedom (n - 2). Note that EXCEL reports the standard deviation of the slope directly, in the Standard Error column of the regression coefficients table. Therefore, you must not divide by the square root of the sample size as in Example 16.
    Use the above information to calculate a 90% confidence interval for the slope of the true regression line. For 30 degrees of freedom and α = 0.1 (two-sided), the critical t-value is 1.697.
    16. What is the margin of error for calculating a 90% confidence interval for the slope of the regression line (i.e. 1.697 × the standard deviation
  • 105. of the slope)?
    17. What is the lower 90% confidence limit for the slope? (i.e. slope - margin of error)
    18. What is the upper 90% confidence limit for the slope? (i.e. slope + margin of error)
    [...] null hypothesis H0: β1 = 0.05 against a one-sided alternative H1: β1 > 0.05. Use a 1 percent significance level (for which the critical value with 30 degrees of freedom is 2.457).
    Reminder:
    $$t = \frac{\text{estimated value} - \text{hypothesized value}}{\text{standard error (deviation) of estimate}} = \frac{\text{slope} - 0.05}{\text{st dev of slope}}$$
    19. What is the value of the test statistic for testing this hypothesis?
    20. Should the hypothesis that the increase in fatals per one percent increase in drivers under 21 is not greater than 0.05 be rejected (1) or not (0)?
    - END OF ASSIGNMENT #9 - THE LAST ASSIGNMENT -
  • 106. Introductory Statistics Laboratory for Excel
    PC Instructions for Excel 2013
    Excel Examples
  • 107. INTRODUCTION
    Note: Specific Excel 2013 instructions are shown in [Excel 2013: ] throughout the Excel examples.
    These EXCEL examples provide a basis for learning to use MICROSOFT EXCEL to perform various tasks required in the ISLeX laboratory assignments. Your laboratory sessions will be much less frustrating if you study the assignment and associated examples before sitting down at a computer.
    The examples will not match exactly what you need to do to complete your assignments. They should provide an adequate outline, but you will have to modify each example to complete your assigned task. For instance, you will need to use different file names in your lab assignments than those used in the examples, and you may have to refer to different EXCEL worksheet columns.
    The EXCEL workbook contains one or more worksheets, each identified by a tab on the lower left part of the window. EXCEL will assign default names, such as Sheet1, to individual
  • 108. worksheets or the user can change the name by clicking the right mouse button on the tab and choosing the 'rename' option.
    Each worksheet is composed of cells arranged in rows and columns. Rows are identified by numbers 1, 2, 3 and so on, while columns are identified by letters A, B, C and so on. After column Z, naming starts with AA and proceeds to ZZ. Each cell may contain a number, some text, or a formula.
    In this manual, cells and blocks of cells are referred to by column letter and row number. To refer to the cell located in the second row of column C, use C2. To indicate all cells in the block that includes rows 2 to 10 of columns B through D, use the cell designations for the cell in the upper left corner (i.e. B2) and for the cell in the lower right corner (i.e. D10) separated by a colon, thus B2:D10.
    Sometimes, it will be useful to enter a formula into a cell and then copy that formula to other cells. When a formula is copied, the cell references in it shift by the same number of rows and columns as the formula itself. For example, if the formula in cell B2 refers to cell A1, it will refer to cell D5 when the formula is copied to cell E6 (three columns to the right and four rows down). If you wish it to continue to refer to cell A1, use $A$1 instead of A1 in the formula.
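The copying rule described above can be mimicked in a short Python sketch. This is only an illustration of the rule, not how EXCEL itself is implemented; the function names are hypothetical, and the sketch handles single-cell references only.

```python
import re

# A cell reference such as A1, $A$1 or AB12: optional $, letters, optional $, digits.
REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")


def column_to_number(letters):
    """Convert column letters to a number: A -> 1, Z -> 26, AA -> 27."""
    number = 0
    for ch in letters:
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


def number_to_column(number):
    """Convert a column number back to letters: 1 -> A, 27 -> AA."""
    letters = ""
    while number > 0:
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def copy_reference(ref, from_cell, to_cell):
    """Return ref as it would read after its formula is copied from from_cell to to_cell."""
    _, from_col, _, from_row = REF.fullmatch(from_cell).groups()
    _, to_col, _, to_row = REF.fullmatch(to_cell).groups()
    col_shift = column_to_number(to_col) - column_to_number(from_col)
    row_shift = int(to_row) - int(from_row)
    col_fixed, col, row_fixed, row = REF.fullmatch(ref).groups()
    if not col_fixed:   # no $ before the column letters: the column shifts
        col = number_to_column(column_to_number(col) + col_shift)
    if not row_fixed:   # no $ before the row number: the row shifts
        row = str(int(row) + row_shift)
    return f"{col_fixed}{col}{row_fixed}{row}"


print(copy_reference("A1", "B2", "E6"))    # D5
print(copy_reference("$A$1", "B2", "E6"))  # $A$1
```

The two printed results reproduce the example in the text: A1 becomes D5, while $A$1 stays fixed.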
  • 109. EXCEL commands and subcommands can be selected by clicking the left mouse button on the required command or subcommand.
    When you first start using Excel, you should become familiar with three important areas in the Excel window. Mention has already been made of the cells arranged in rows and columns in the worksheet (there may be several worksheets in a single workbook). If you place the cursor in a particular cell, the “Name box” located at the upper left hand side of the worksheet will indicate the identity of the active cell, e.g. B5. If you type a number, name or formula into that cell, it will also appear in the “Formula bar” at the top of the worksheet. If you then press the enter key, the cursor will move to the next cell and the formula bar will become blank (if the next cell is empty). If you entered a formula, it will be evaluated and the result will appear in the cell in which you entered the formula. If you made an error and need to edit the formula, highlight the cell and then move the cursor to the formula bar to edit the formula.
    In these laboratory assignments, you are sometimes required to combine information from two parts of an assignment. Typically, each part will result in a separate workbook in
  • 110. Excel. You can copy data from one workbook to another by using the following procedure.
    Highlight the data you wish to copy and press Ctrl-C to copy the data.
    Use the Window command of Excel to choose the workbook you wish to copy to.
    Place the cursor where you wish to paste the data and press Ctrl-V.
    Note: Rather than using Ctrl-C and Ctrl-V to copy and paste, you may use Edit->Copy and Edit->Paste.
    Most data analysis tools of Excel default to printing their results on a new worksheet. However, most also have an option to specify an output range on the same worksheet. If you choose the Output range option, click in the adjacent box and then highlight the area of the worksheet where you wish to store the results.
    Example 1: Copying data from the assignment webpage into the EXCEL worksheet.
    Your data will be presented to you in a web page. To copy the data to Excel:
    • First highlight the data and either press the key combination Ctrl-C, or select Copy from
  • 111. the Edit menu to copy the data (to the clipboard).
    • Then, switch to the Excel window and either use the key combination Ctrl-V, or select Paste from the Edit menu to paste the data into Excel.
    At this stage, you should have the data on an Excel worksheet. (If you wish, you can name this worksheet LAB0A.DAT by right clicking on its tab at the bottom and choosing the rename option.) This same procedure applies to all assignments. Follow the above procedure even with multi-column tables.
    If you wish to add a label in the first cell of column A, move the cursor to that cell and then choose Insert->Cells and click OK (or press enter) on the Insert dialog box to move all cells down. [Excel 2013: Home Tab – Insert] This will allow you to type a label in cell A1.
    The following procedure will allow you to calculate some summary statistics for data in a column. It is good practice to look at summary statistics before proceeding with further analysis. This will alert you to the number of data points, their average value, and a few other informative characteristics of the data.
    Choose Data Analysis… to pop up the Data Analysis window [Excel 2013: Data Tab – Data Analysis over on far right side] (SEE NOTE BELOW if Data
  • 112. Analysis is missing.)
    Double click on Descriptive statistics.
    With the cursor flashing in the Input Range: box, click on the column letter for the column with data.
    If you have entered a name in the first row of the column, click Labels in first row.
    Click in the box preceding Summary statistics, and click on OK or press the enter key.
    EXCEL will create a new worksheet with the summary statistics. You should note such key characteristics as count, minimum, mean and maximum. At more advanced stages, you may choose to think about kurtosis, skewness and standard deviation or standard error. If you wish, you can delete this temporary worksheet by right-clicking on its tab and choosing the delete option.
    The same basic procedures will be used in later assignments to enter data from a file that contains several columns.
  • 113. NOTE: The Analysis ToolPak is a Microsoft Excel add-in program that is available when you install Microsoft Office or Excel. To use it in Excel, however, you need to load it first.
    1. Click the File tab, and then click Options.
    2. Click Add-Ins, and then in the Manage box, select Excel Add-ins.
    3. Click Go.
    4. In the Add-Ins available box, select the Analysis ToolPak check box, and then click OK.
       a. If Analysis ToolPak is not listed in the Add-Ins available box, click Browse to locate it.
       b. If you are prompted that the Analysis ToolPak is not currently installed on your computer, click Yes to install it.
    5. After you load the Analysis ToolPak, the Data Analysis command is available in the Analysis group on the Data tab.
  • 114. Example 2: Preparing a histogram of data
    A histogram is a graphical summary of numerical data. In this example, data stored in EXCEL worksheet column A is summarized in a histogram.
    Before calculating frequencies in different groups, you must define the classes. In EXCEL, the classes are called "bins". For this example, suppose that the data to be summarized varies from 21 to 28 and you wish to group the observations into "bins", each with a class width of one unit. The first bin will include all data points with values up to and including 22, the second bin will include values greater than 22 up to and including 23, and so on. You only need to indicate the upper boundary for each bin. For this example, use 22, 23, 24, 25, 26, 27, and 28. These values need to be entered into a new column, say column B. You can type the numbers into the first seven rows of column B.
    To actually draw the histogram, you must first calculate frequencies of data in each bin.
    Choose Data analysis [Excel 2013: Data Tab – Data Analysis] and select Histogram.
    In the histogram dialog box,
    move the cursor to Input range and click on the top of column A,
    move the cursor to Bin range and click on the top of column B,
    if you have labels in A1 and B1, check the Labels option, and
    click on OK or press the enter key.
    EXCEL is very slow at this calculation, so be patient! In a few
  • 115. seconds, you should get a new sheet in the workbook that contains the upper ends of the bins and the frequencies of observations in each bin. In this example, the results look like this:
        Bin    Frequency
        22     9
        23     6
        24     6
        25     5
        26     7
        27     2
        28     1
        More   0
    At this point, you should have a numerical representation of a histogram. Most histograms are presented in graphical form. To develop a bar graph to show the histogram, proceed as follows. Note that Excel creates a bar graph, not a true histogram, because there are spaces between the bars. A true histogram has no spaces between the bars.
    Highlight the data, including titles, using the cursor.
    Insert a chart. [Excel 2013: Insert Tab – in Charts choose Insert Column Chart – select 2D (first choice of the options)]
    Excel will automatically produce a chart.
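The binning rule used for the frequency table (each bin includes its upper boundary, and anything above the last boundary falls in "More") can be sketched in Python as a check on your understanding. The function name and the nine data values are hypothetical; they are not the data summarized in the table.

```python
from bisect import bisect_left


def bin_frequencies(values, upper_bounds):
    """Count values per bin, where each bin includes its upper boundary."""
    bounds = sorted(upper_bounds)
    counts = [0] * (len(bounds) + 1)   # the extra last slot is the "More" row
    for value in values:
        # bisect_left finds the first boundary that is >= value,
        # so a value equal to a boundary lands in that boundary's bin.
        counts[bisect_left(bounds, value)] += 1
    labels = [str(b) for b in bounds] + ["More"]
    return list(zip(labels, counts))


# Hypothetical observations, for illustration only.
data = [21.4, 22.0, 22.1, 23.0, 24.6, 25.5, 27.9, 28.0, 28.3]

for label, count in bin_frequencies(data, [22, 23, 24, 25, 26, 27, 28]):
    print(label, count)
```

Note that 22.0 is counted in bin 22 while 22.1 is counted in bin 23, and 28.3 appears in the "More" row.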
  • 116. A histogram gives the frequency (number of observations) in each of various classes. In EXCEL, the classes are defined by giving the upper boundaries of each class (bin).
    The + sign allows you to format your chart’s elements. You can click on the boxes to include whatever elements you feel are appropriate for your chart. If you want to edit the Axis Title, you can click into that box and type a new axis title.
    The paint brush allows you to choose the style and color of your chart.
    This icon allows you to select your data source and make changes instead of having to highlight the Excel cells that hold the data and start the chart
  • 117. all over again.
    How to make a true histogram:
    To get rid of the gaps between the bars and make a true histogram, right click on any bar and choose Format Data Series from the menu that appears (see above arrow). In the Format Data Series window, choose the three column symbol (see above arrow) to open Series Options; Gap Width is at the bottom. Change the gap width to zero and you will have a true histogram.
  • 118. You can change the outline of your bars to a different color to have them appear separated by clicking the Outline (see arrow below) and changing the color to black or white.
    The resulting chart looks like this (remember to make changes to your titles according to best graphing practices, not shown in this chart):
  • 119. Example 3: Entering data from the keyboard into the EXCEL worksheet
    Occasionally, you will be required to enter data or intermediate results directly into the EXCEL worksheet. You merely type the data into the cells where you wish to store the information.
    Example 4: Calculating relative frequencies
    To calculate the relative frequency of each of several classes, divide the frequency of that class by the sum of all the frequencies. Consider data summarized in three classes.
        Class   Frequency
        1       5
        2       10
        3       5
        Total   20
    The relative frequency for Class 1 is 5/20 = 0.25, for Class 2 is 10/20 = 0.50, and for Class 3 is 5/20 = 0.25. Note that the relative frequencies must always sum to 1.0 (within rounding error). Thus, 0.25 + 0.50 + 0.25 = 1.0.
    If the frequencies are stored in EXCEL Worksheet column C,
  • 120. you can calculate relative frequencies and store them in another column in the following way. Suppose 5 is in cell C1, 10 in cell C2 and 5 in cell C3. Move the cursor to cell D1, type ‘= C1/SUM($C$1:$C$3)’ in the formula bar, and press enter. Don’t forget the = at the beginning of your formula; otherwise it will be entered only as text and will not be calculated. You should see the value 0.25 in cell D1. To calculate the remaining relative frequencies, just copy the formula in cell D1 to cells D2 and D3. Note that, as the formula is copied, C1 will change to C2 and then to C3, but $C$1:$C$3 will remain constant.
    An alternative would be to first calculate the sum (20) and store it in a cell that could then be used to calculate all relative frequencies. For example, enter the formula ‘=SUM(C1:C3)’ in cell C4. Now, use the formula ‘= C1/$C$4’ in cell D1. Again, copy cell D1 to cells D2 and D3.
    You should also confirm that the relative frequencies sum to 1.0. Use the formula ‘= SUM(D1:D3)’ in cell D4. You can also use the Σ in the tool bar and Excel will help you calculate a sum for that column. [Excel 2013: Home Tab – Σ]
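The same calculation can be written as a minimal Python sketch, using the three class frequencies from this example. The function name is hypothetical; the fixed total plays the role of $C$1:$C$3 (or $C$4) in the EXCEL formulas.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the sum of all frequencies."""
    total = sum(frequencies)   # computed once, like the fixed range $C$1:$C$3
    if total == 0:
        raise ValueError("the frequencies must not sum to zero")
    return [f / total for f in frequencies]


class_frequencies = [5, 10, 5]   # the three classes from the example
rel = relative_frequencies(class_frequencies)
print(rel)        # [0.25, 0.5, 0.25]
print(sum(rel))   # 1.0, confirming that the relative frequencies sum to one
```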
  • 121. Example 5: Leaving EXCEL and grading your assignment.
    When you have completed an assignment and have recorded numerical answers to each of the questions in the INTRODUCTORY STATISTICS LABORATORY, you should try your answers in ISLeX.
    In submitting your answers to the Introductory Statistics Laboratory Program (ISLeX), you are required to use numbers for all answers. Place the cursor in the appropriate box and type in your answer. Use the mouse or the tab key to move to the next box. If you press enter, it will go right to grading. (You have the option to go back again, so DO NOT accept unless you are completely finished.) Click on the “Check my answers” box to grade your assignment.
    At the end of the assignment, your grade will be displayed on the screen and you will be given the option of accepting the grade or repeating the assignment. Once you accept your grade, you will not be able to repeat the assignment. You are encouraged to repeat the assignment until you are satisfied with your effort. You must achieve 80 or higher to move on to the next assignment.
  • 122. Example 6: How to prepare a stem-and-leaf diagram
    A stem-and-leaf diagram combines graphical and numerical methods to summarize data. Unfortunately, EXCEL does not have a command for preparing a stem-and-leaf diagram.
    Suppose you wish to develop a stem-and-leaf diagram of the following data.
        25.6 26.0 25.3 27.2 23.6 26.3 25.4 23.8 21.1 23.4
        23.9 23.8 26.0 20.0 22.5 28.0 26.7 24.8 25.1 24.9
        26.6 24.9 25.0 27.5 20.6 24.0 22.1 20.0 21.8 24.7
        21.7 25.2 27.1 24.8 25.8 26.9 25.6
    Enter (or read) the data into a column in EXCEL and then sort the data from lowest to highest using the Data->Sort command. [Excel 2013: Data Tab – Sort] The results follow.
        20.0 20.0 20.6 21.1 21.7 21.8 22.1 22.5 23.4 23.6
        23.8 23.8 23.9 24.0 24.4 24.7 24.8 24.8 24.9 24.9
        25.0 25.1 25.2 25.3 25.4 25.6 25.6 25.8 26.0 26.0
        26.3 26.6 26.7 26.9 27.1 27.2 27.5 28.0
  • 123. If you decide to have leaf units of 0.1, each successive stem unit will be 10 × 0.1 = 1.0 higher than the previous one. Start by writing the stem units in a column followed by a vertical bar.
        20 |
        21 |
        22 |
        23 |
        24 |
        25 |
        26 |
        27 |
        28 |
    Then, go down the data and write the last digit of each number in the leaf position.
        20 | 0 0 6
        21 | 1 7 8
        22 | 1 5
        23 | 4 6 8 8 9
        24 | 0 4 7 8 8 9 9
        25 | 0 1 2 3 4 6 6 8
        26 | 0 0 3 6 7 9
        27 | 1 2 5
        28 | 0
  • 124. And, finally, add a title and leaf unit to complete the job.
        Stem-and-leaf diagram of example data. Leaf unit = 0.1
        20 | 0 0 6
        21 | 1 7 8
        22 | 1 5
        23 | 4 6 8 8 9
        24 | 0 4 7 8 8 9 9
        25 | 0 1 2 3 4 6 6 8
        26 | 0 0 3 6 7 9
        27 | 1 2 5
        28 | 0
    The stem-and-leaf diagram consists of two columns of numbers. The first column is called the stem. The second column contains the leaves; one leaf for each data point. The value of any number in a leaf position is indicated by the leaf unit, 0.1 in this example. Any number in
  • 125. a leaf position represents that number multiplied by the leaf unit 0.1. In the first row of the diagram, the 0 stands for 0 × 0.1 = 0.0, and the 6 stands for 6 × 0.1 = 0.6. The value of each number in the stem position is 10 × leaf unit, i.e. 1 in this case. In the last row, the 28 stands for 28 × 1 = 28. The final value of any leaf is calculated by adding the leaf value to the corresponding stem value. The 0 in the last row represents the number 0 × 0.1 + 28 × 1 = 28.0. The third leaf in stem position 21 represents 8 × 0.1 + 21 × 1 = 21.8.
    Example 7: How to draw a frequency (or relative frequency) polygon.
    In this example, midpoints for Samples 1 and 2 are stored in column A, relative frequencies from Sample 1 are stored in column B, and relative frequencies from Sample 2 are stored in column C of an EXCEL worksheet. In order to compare the two samples, it will be useful to plot relative frequencies for both samples on the same graph. Here are columns A, B, and C of an example worksheet.
        A     B        C
        20    0.0357   0.0000
        21    0.1429   0.0270
        22    0.2143   0.1081
        23    0.1786   0.1081
        24    0.2500   0.1622
        25    0.1071   0.2162
        26    0.0714   0.1892
        27    0.0000   0.1081
        28    0.0000   0.0811
  • 126. [Excel 2013: highlight the data. Insert Tab – Charts and choose SCATTER, then click 2D ‘Straight Line with Markers’].
    The resulting graph will look like:
    However, you will want to edit the graph. Click the + sign to edit the chart. Choose Axes and move the cursor over until the little right arrow appears, then choose More Options and then click on the histogram picture.
    The resulting graph will now have better representation. Remember to label your chart title and axis appropriately (not shown in chart below).
  • 127. You can now edit the Axis. Change the minimum Bounds to 19 and the maximum Bounds to 29. Then change the Major Units to 1.0.
    Example 8: How to use EXCEL to calculate various numbers that summarize the characteristics of a population (or sample).
    In this example, the Function command is used to calculate various constant values to be stored in cells in the worksheet. [Excel 2013: Formulas Tab – Insert Function (fx)]. There are many different functions that can be used. Some refer to whole columns, some to individual observations. The following examples demonstrate a few of the uses of functions in EXCEL.
    You can type the function into any particular cell by first typing an equal sign in the formula bar and then typing the name of the function along with its required arguments. As an alternative, you can use [Excel 2013: Formulas Tab – Insert Function (fx)] to choose a function and have EXCEL prompt you for the necessary arguments. In this course, you would probably choose Function category = Statistical and then double
  • 128. click on the Function name for the function you want to use. For this example, consider that there are 22 observations stored in column A.
    a) Determine the number of data points in the population.
        =COUNT(A:A)
    b) Calculate the mean (= sum of all observations divided by number of observations).
        =SUM(A1:A22)/COUNT(A1:A22)
        =AVERAGE(A1:A22)
    c) Determine the minimum in this population (the first value in a magnitude array). If the data have been sorted from smallest to largest, the smallest (minimum) value will be in the first position, cell A1, and the largest will be located in the last position, cell A22 in this example.
        =MIN(A1:A22)
    d) Determine the maximum in this population (the last value in a magnitude array).
        =MAX(A1:A22)
    e) Determine the median (the middle value in a magnitude array). For an odd number of data points, the median is the middle value. For an even number of data points, the median is the average of the two middle values.
        =MEDIAN(A1:A22)
    f) Determine the first quartile.
  • 129. The first quartile is that value below which one-quarter of the observations lie. Because there is no generally accepted definition of quartile, different programs give different results for quartiles. ISLeX is programmed to calculate quartiles in the same way as Excel.
        =QUARTILE(A1:A22,1)
    g) Determine the third quartile. The third quartile is that value below which three-quarters of the observations lie.
        =QUARTILE(A1:A22,3)
    NOTE: The median is sometimes referred to as the second quartile (Q2) because it is the value below which 2/4 of the values lie. The first quartile (Q1), the median (Q2) and the third quartile (Q3) divide the data values into four groups. We know that 1/4 of the data values are less than Q1, 1/4 are between Q1 and Q2, 1/4 are between Q2 and Q3, and 1/4 are greater than Q3. For some purposes, it may be sufficient to summarize a large data set by presenting these three values.
    h) Determine the standard deviation.
  • 130. The standard deviation is the square root of the variance, and the variance is the average of the squares of differences between individual data points and the overall mean (dividing by n for a population and by n - 1 for a sample). Remember that the standard deviation of a population is calculated differently from the standard deviation of a sample. It is important to know whether you have a sample or a population.
        =STDEV.S(A1:A22) for a sample
        =STDEV.P(A1:A22) for a population
    The example worksheet follows (data in column A, results in column B, formula used in column C).
        A     B          C
        23
        20    22         Uses =COUNT(A1:A22) to count number of observations
        29    22.77273   Uses =SUM(A1:A22)/COUNT(A1:A22) to calculate average
        29    16         Uses =MIN(A1:A22) to calculate the minimum value
        27    30         Uses =MAX(A1:A22) to calculate maximum value
        23    23         Uses =MEDIAN(A1:A22) to calculate median value
        17    19         Uses =QUARTILE(A1:A22,1) to calculate first quartile
        17    27.75      Uses =QUARTILE(A1:A22,3) to calculate third quartile
        22    4.669372   Uses =STDEV.S(A1:A22) to calculate standard deviation for a sample
        23
        25
        21
        21
        18
        16
        21
        24
        19
        27
        19
        25
        24
  • 131. Example 9: How to use the DESCRIPTIVE STATISTICS command of EXCEL
    The Descriptive statistics command of EXCEL will automatically calculate most of the summary statistics required for data in a single column [Excel 2013: Data Tab – Data Analysis and then choose Descriptive Statistics]. By listing several columns, the Descriptive statistics command can be applied to several columns simultaneously.
    Consider that data has been stored in column A. To calculate summary statistics for this column, follow these steps.
    Excel 2013: Data Tab and choose Data Analysis (on right)
    Double click on Descriptive statistics in the Data Analysis dialog box
    Set Input range to A:A (or just highlight the data with the cursor)
    Click on Summary statistics
  • 132. Click on OK
    Your results will be on a new worksheet and will look like this (move column borders to see full text).
        Column1
        Mean                  23.90909
        Standard Error        1.038041
        Median                23.5
        Mode                  #NUM!
        Standard Deviation    4.868843
        Sample Variance       23.70563
        Kurtosis              -1.32235
        Skewness              -0.11628
        Range                 14
        Minimum               16
        Maximum               30
        Sum                   526
        Count                 22
    This approach gives many of the summary statistics described in the preceding example as well as several others. The #NUM! message means only that there are several possible values for the mode in this data set.
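For comparison, several of the same summary statistics can be computed with Python's standard library. This is a sketch on hypothetical data, not the 22 observations used above. It assumes that the "inclusive" quartile method of Python's statistics module corresponds to EXCEL's QUARTILE function; if that assumption does not hold for your version, the quartiles may differ slightly.

```python
import math
import statistics


def describe(values):
    """Summary statistics similar to those listed by Descriptive Statistics."""
    n = len(values)
    # Assumption: method="inclusive" interpolates the same way as =QUARTILE(range, k).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    sample_sd = statistics.stdev(values)        # like =STDEV.S
    return {
        "count": n,
        "sum": sum(values),
        "mean": statistics.mean(values),
        "standard error": sample_sd / math.sqrt(n),
        "median": statistics.median(values),
        "first quartile": q1,
        "third quartile": q3,
        "sample standard deviation": sample_sd,
        "population standard deviation": statistics.pstdev(values),  # like =STDEV.P
        "minimum": min(values),
        "maximum": max(values),
        "range": max(values) - min(values),
    }


data = [2, 4, 4, 5, 7, 9, 10, 12]   # hypothetical observations
summary = describe(data)
for name, value in summary.items():
    print(name, value)
```

The standard error is the sample standard deviation divided by the square root of the count, which is also the relationship between the Standard Deviation and Standard Error rows in the EXCEL output above.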
  • 133. Example 10: Further uses of EXCEL->As a calculator
    EXCEL can also be used as a calculator. The following statement would allow you to calculate 5.6 - 3.2 = 2.4 and store it in a cell in the EXCEL worksheet. It is important to start your formula with an “=”; otherwise EXCEL treats the entry as text and does not calculate it.
        =5.6-3.2
    If 5.6 was stored in cell D3 and 3.2 was stored in cell D4, you could also use
        =D3-D4
    The second option may be useful if 5.6 and 3.2 are to be used in other calculations. This same scheme may be used for all elementary mathematical operations.
        Use - to indicate subtraction          [ = 5.6 - 3.2]
        Use + to indicate addition             [ = 5.6 + 3.2]
        Use * to indicate multiplication       [ = 5.6 * 3.2]
        Use / to indicate division             [ = 5.6 / 3.2]
        Use POWER to indicate exponentiation   [ = POWER(5.6, 3.2)]
  • 134. Example 11: Calculations with a discrete probability distribution
    In this example, EXCEL is used to answer various questions dealing with a discrete probability distribution. EXCEL worksheet column A contains the event names and column B contains the corresponding probabilities. In PL SC 314, we will discuss only events that represent counts; e.g. number of seeds germinated, number of red blood cells, number of live plantlets, number of microbial colonies, et cetera.
        A     B
        0     0.018316
        1     0.073263
        2     0.146525
        3     0.195367
        4     0.195367
        5     0.156293
        6     0.104196
        7     0.059540
        8     0.029770
        9     0.013231
        10    0.005292
        11    0.001925
        12    0.000642
        13    0.000197
        14    0.000056
        15    0.000015
        16    0.000004
        17    0.000001
        18    0.000000
        19    0.000000
        20    0.000000
  • 135. Suppose one were interested in the probability of exactly 10 in this distribution. This can be read directly from column B in the row position corresponding to A = 10. Thus, P(X = 10) = 0.005292.
    A powerful way of calculating the probabilities of compound events is to sum parts of the probability table. Suppose you want the probability of less than 13. You must add the probabilities for 0, 1, . . . , 12. Those probabilities are in cells B1:B13. To calculate the probability, you could move to cell C1 and enter the formula = SUM(B1:B13). In this example, the probability of less than 13 is 0.99973 or 99.973 percent.
    Note that terms such as 'less than 13' and 'fewer than 13' include all possible values from the smallest up to, but excluding, 13. Similarly, 'more than 13' or 'greater than 13' would not include 13. Moreover, the term 'between 5 and 10' would include 6, 7, 8 and 9, and would exclude 5 and 10.
  • 136. EXCEL EXAMPLES EXAMPLE 11 78 INTRODUCTORY STATISTICS LABORATORY However, ‘no more than 13’ would include 13. ‘At least 13’ would include 13 and all higher values. The following three examples show other questions that can be dealt with in this general manner. a) P[10 < X < 21] = ? = P(11) + P(12) + P(13) + P(14) + … + P(20). P(11) is listed in row 12 of column B while P(20) is listed in row 21 of column B. = SUM(B12:B21) = 0.0028398 b) P[(X < 6) or (X > 14)] = ? In this example, calculate P(0) + P(1) + … +P(5) + P(15) + P(16) + … + P(20) = SUM(B1:B6)+SUM(B16:B21) = 0.78515 c) P[X > 0] = ? = SUM(B2:B21) = 0.98168 or = 1 - B1 = 0.98168 In order to calculate the mean of a probability distribution, one must use the methods for calculating the mean of a relative frequency distribution. The
  • 137. mean is equal to the sum of the products of each value multiplied by its corresponding probability. In the example table, the hand calculation would require 0(0.018316) + 1(0.073263) + ... for 21 terms. In EXCEL, the following formula operates on whole columns, so calculation of the mean is simple. =SUMPRODUCT(A1:A21*B1:B21) = 4.00 For this probability distribution, one would conclude that the average value in a great many samples from this distribution will be 4.0. The variance of a probability distribution can be most easily calculated as the average of the squares of the values minus the square of the average. The following EXCEL formula will calculate the variance. The mean (see above) must have previously been calculated and stored in cell D7. =SUMPRODUCT(A1:A21*A1:A21*B1:B21)-D7*D7 = 4.000 In this example, the variance of the distribution (4.0) is identical to the mean. This is a characteristic of the 'Poisson' probability distribution (that deals with the random occurrence of rare events). Such a relationship does not hold for probability distributions in general.
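The table sums, mean and variance of Example 11 can be cross-checked outside EXCEL. The following is a minimal Python sketch, not part of the lab procedure. It assumes the table is a Poisson distribution with mean 4 truncated at 20 (an assumption that reproduces the listed probabilities, e.g. P(0) = 0.018316); the helper name `prob_between` is hypothetical.

```python
import math

# Assumption: column B holds Poisson(mean = 4) probabilities for the counts 0..20.
values = list(range(21))                                          # column A
probs = [math.exp(-4) * 4**k / math.factorial(k) for k in values]  # column B

def prob_between(lo, hi):
    """P(lo <= X <= hi): the analogue of summing a block of column B."""
    return sum(p for v, p in zip(values, probs) if lo <= v <= hi)

# Mean = sum of value * probability, like =SUMPRODUCT(A1:A21*B1:B21)
mean = sum(v * p for v, p in zip(values, probs))
# Variance = average of the squares minus the square of the average
variance = sum(v * v * p for v, p in zip(values, probs)) - mean**2

print(round(prob_between(0, 12), 5))   # P(X < 13)
print(round(1 - probs[0], 5))          # P(X > 0)
print(round(mean, 3), round(variance, 3))
```

Summing a slice of the table and taking the complement (1 minus P(0)) are the same two devices used in parts a) to c) above.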
  • 138. Example 12: Reading and storing constants for further use. In some assignments, you are required to read numerical values from a file and then use them to calculate answers to specific questions. Consider the situation where the mean and standard deviation are stored in columns 1 and 2 of the data file called 'Table R'. 1. Copy the table using the method described in Example 1. 2. Observe the two columns to see the mean and standard deviation. 3. Suppose the data are loaded into cells B1 and C1 of the EXCEL worksheet and that you are told the first value is the mean and the second is the standard deviation. Use the value stored in B1 as the mean and the one stored in C1 as the standard deviation in subsequent computations. Suppose the values were 100.4 and 7.89. You could calculate the value of the mean plus two standard deviations by using this formula in a cell. =B1+2*C1
  • 139. Example 13: Using EXCEL to answer questions about continuous distributions. Consider that X is a continuous variable with a mean whose value is stored in EXCEL cell A1 and a standard deviation whose value is stored in B1. For example, if you are given that the mean is 86.7 and the standard deviation is 4.81, the following calculations will work if you first store 86.7 in A1 and 4.81 in B1. In the assignments, you will be dealing only with a continuous distribution known as a normal distribution. When using the NORM.DIST function to calculate a probability, it will be necessary to indicate i) the value below which you require the probability, ii) the mean of the distribution, iii) the standard deviation of the distribution, and iv) TRUE to indicate that you want a cumulative probability [P(X < Value)]. In these examples, consider that X is an observation from a normal distribution. The NORM.DIST function will give the probability that a random observation will be less than some specified value V, i.e. P[X < V]. To calculate P[X < 90] use =NORM.DIST(90,86.7,4.81,TRUE) = 0.75367 or =NORM.DIST(90,A1,B1,TRUE) = 0.75367 if the mean is in A1 and the standard deviation is in B1. By choosing NORM.DIST, you will be prompted for the four
  • 140. arguments [Excel 2013: Formula Tab – Insert Function, scroll to choose NORM.DIST off of the statistical list]. Choose your x value, type in or use the cursor to select your mean, type in or use the cursor to select your standard deviation, and type TRUE into the Cumulative box (for continuous data). The following examples should help to convert questions into mathematical expressions and then into EXCEL commands. What is the probability that a continuous normal variable X will be less than 75? P[X < 75] = ? =NORM.DIST(75,A1,B1,TRUE) What is the probability that a continuous normal variable X will exceed 75? P[X > 75] = 1 - P[X < 75] = ? =1-NORM.DIST(75,A1,B1,TRUE) What is the probability that a random observation from a normal distribution will be between 70 and 80? P[70 < X < 80] = P[X < 80] - P[X < 70] = ?
  • 141. =NORM.DIST(80,A1,B1,TRUE)-NORM.DIST(70,A1,B1,TRUE) What proportion of random observations from a normal distribution should lie within two standard deviations of the mean? P[mean - 2*stdev < X < mean + 2*stdev] = P[X < mean + 2*stdev] - P[X < mean - 2*stdev] = ? =NORM.DIST(A1+2*B1,A1,B1,TRUE)-NORM.DIST(A1-2*B1,A1,B1,TRUE) What percentage of observations from a normal population should exceed the mean by more than 1.96 standard deviations? 100 × P[X > mean + 1.96*stdev] = 100 × (1 - P[X < mean + 1.96*stdev]) = ? =100*(1-NORM.DIST(A1+1.96*B1,A1,B1,TRUE)) Results of the above calculations are
      a) P{X < 90} = 0.75367
      b) P{X < 75} = 0.00750
      c) P{X > 75} = 0.99250
      d) P{70 < X < 80} = 0.08156
      e) P{mean - 2*sd < X < mean + 2*sd} = 0.95450
      f) 100*P{X > mean + 1.96*sd} = 2.49978
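The six results of Example 13 can be reproduced outside EXCEL as a check. This is a minimal Python sketch using the standard library's `statistics.NormalDist`, whose `cdf` method plays the role of NORM.DIST(..., TRUE); the variable names are illustrative only.

```python
from statistics import NormalDist

mean, sd = 86.7, 4.81             # the values stored in cells A1 and B1
X = NormalDist(mu=mean, sigma=sd)

p_less_90 = X.cdf(90)                                     # a) P[X < 90]
p_less_75 = X.cdf(75)                                     # b) P[X < 75]
p_more_75 = 1 - X.cdf(75)                                 # c) P[X > 75]
p_70_to_80 = X.cdf(80) - X.cdf(70)                        # d) P[70 < X < 80]
p_within_2sd = X.cdf(mean + 2 * sd) - X.cdf(mean - 2 * sd)  # e)
pct_above_196 = 100 * (1 - X.cdf(mean + 1.96 * sd))       # f) as a percentage

for label, value in [("a", p_less_90), ("b", p_less_75), ("c", p_more_75),
                     ("d", p_70_to_80), ("e", p_within_2sd), ("f", pct_above_196)]:
    print(label, round(value, 5))
```

Every question reduces to one or two cumulative probabilities: "greater than" uses the complement, and "between" uses the difference of two cumulative values.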
  • 142. Example 14: How to calculate a chi-squared statistic for a 'goodness-of-fit' test. Consider this example from Steel and Torrie (1981). A researcher observed 1178 barley plants in class 1 (green, non-two-row), 291 in class 2 (green, two-row), 273 in class 3 (chlorina, non-two-row), and 156 in class 4 (chlorina, two-row). Test the hypothesis that the distribution in the four classes is in the ratio of 9 : 3 : 3 : 1. Step 1. Store the observed frequencies in one column of the EXCEL worksheet. To calculate expected frequencies, first convert the numbers in the expected ratio to proportions (relative frequencies) by dividing each by 16. Then, multiply the proportions 9/16, 3/16, 3/16 and 1/16 by the total number of barley plants in order to calculate expected frequencies. If the observed frequencies (1178, 291, 273 and 156) are stored in column A, the following four formulas should be entered in cells B1, B2, B3 and B4. Cell B1 =9/16*SUM(A1:A4) Cell B2 =3/16*SUM(A1:A4) Cell B3 =3/16*SUM(A1:A4) Cell B4 =1/16*SUM(A1:A4) This will give the following table where the first column contains the observed frequencies and the second column the frequencies expected if
  • 143. the observations are distributed into the four classes in a ratio of 9 : 3 : 3 : 1.
      1178    1067.625
      291     355.875
      273     355.875
      156     118.625
    Step 2. Calculate the chi-squared statistic as the sum of (O-E)²/E. Enter the formula =(A1-B1)*(A1-B1)/B1 into cell C1 and copy it into cells C2, C3 and C4 (note that the 1 will change to 2, 3 or 4 as you copy the formula into each successive cell). Finally, enter the formula =SUM(C1:C4) into cell C6 to calculate the chi-squared statistic. The worksheet should now look like this.
      1178    1067.625    11.41097
      291     355.875     11.82653
      273     355.875     19.29966
      156     118.625     11.77568

                          54.31284
    Note that the sum of (O-E) should be zero. The sum of (O-E)²/E [in cell C6] gives the required chi-squared statistic, 54.313.
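Steps 1 and 2 can be sketched in Python as a check on the worksheet arithmetic. This is only an illustration under the same 9 : 3 : 3 : 1 hypothesis; the variable names are not part of the lab manual.

```python
observed = [1178, 291, 273, 156]   # column A: observed frequencies
ratio = [9, 3, 3, 1]               # hypothesised ratio 9 : 3 : 3 : 1

total = sum(observed)              # 1898 plants in all
# Column B: expected frequency = (ratio part / 16) * total
expected = [r / sum(ratio) * total for r in ratio]
# Column C: one (O - E)^2 / E term per class
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_squared = sum(contributions)   # cell C6

print(expected)
print([round(c, 5) for c in contributions])
print(round(chi_squared, 5))
```

Because the expected frequencies are built from the observed total, the deviations O - E sum to zero, which is a quick way to catch a typing error in either column.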
  • 144. Step 3. Compare the calculated statistic to the appropriate critical value. If the statistic exceeds the critical value, reject the hypothesis that the observed frequencies show a good fit to a 9 : 3 : 3 : 1 ratio. In this example, the four expected frequencies are required to sum to 1898. Because of this one restriction, the chi-squared statistic has 4 - 1 = 3 degrees of freedom. People will often choose critical values that correspond to a 5 % significance level (α = 0.05). You may read the critical value for the chi-squared distribution with 3 degrees of freedom and a 5% significance level directly from a table in a statistical textbook (= 7.815) or use the following EXCEL commands. To calculate the critical value, one uses α = 0.05 and df = 3 as the arguments for the CHISQ.INV.RT function [Excel 2013: Formula Tab – Insert Function, scroll to choose CHISQ.INV.RT (RT for right tail) off of the statistical list]. =CHISQ.INV.RT(0.05,3) = 7.814725 Rather than comparing the calculated test-statistic (54.31284) to the critical value 7.815 and
  • 145. concluding that the observed frequencies do not fit a 9 : 3 : 3 : 1 ratio, you can also calculate the p-value, the probability of such a large chi-squared statistic if the null hypothesis is really true. If the calculated test-statistic is in cell C6, use [Excel 2013: Formula Tab – Insert Function, scroll to choose CHISQ.DIST.RT (RT for right tail) off of the statistical list]. =CHISQ.DIST.RT(C6,3) = 0.0000000000096 (about 9.6E-12) With the p-value formula written in cell D6, some headings typed in cells C5 and D5, and some formatting of cell D6, the worksheet now looks like this.
      1178    1067.625    11.41097
      291     355.875     11.82653
      273     355.875     19.29966
      156     118.625     11.77568
                          Chi-square    p-value
                          54.31284      0.0000
    Conclusion: Since the calculated value (54.313) exceeds the critical value (7.8147), reject the hypothesis of a good fit to a 9 : 3 : 3 : 1 ratio. Also, we reject the hypothesis that the observed frequencies show a good fit to a 9 : 3 : 3 : 1 ratio if the significance level (α = .05) is greater than
  • 146. the p-value. In this case, we reject because .05 is greater than the p-value of .0000. Example 15: How to calculate a confidence interval for one mean when σ is known. Consider an example where the data are stored in worksheet column A and you are required to calculate a 90 % confidence interval for the mean of the data, with σ given. Step 1. Calculate the mean of the data, as well as the number of observations. Let's store these intermediate results in column D along with identification in column C. Type 'Mean' in cell C1 and the formula =AVERAGE(A:A) in cell D1. Type 'n' in cell C2 and the formula =COUNT(A:A) in cell D2. Step 2. The standard deviation of the population is given and is 4.0. This value can be typed into D3 with a title of 'St.DevP.' in C3.
  • 147. a) Since 90 = 100(1 - α), α = 0.10 and α/2 = 0.05. Determine the critical value (CV) of the standard normal distribution corresponding to α/2 = 0.05 from a table or using EXCEL as follows (CV = 1.645) [Excel 2013: Formula Tab – Insert Function, scroll to choose NORM.INV off of the statistical list]. =NORM.INV(0.95,0,1) = 1.644853 b) Calculate the margin of error as CV multiplied by the standard deviation of the population and divided by the square root of the sample size. Type 'E =' in cell C4, and the formula =NORM.INV(0.95,0,1)*D3/SQRT(D2) in cell D4. c) Calculate the lower and upper limits as mean ± margin of error. Type 'LL =' in cell C5, and the formula =D1-D4 in cell D5. Type 'UL =' in cell C6, and the formula =D1+D4 in cell D6.
      29.6    Mean =        30.7285
      30.7    St. dev. =    4.0
      31.4    n =           35
      31.1    E =           1.1122
      25.5    LL =          29.6163
      34.6    UL =          31.8407
      34
      31
      34
  • 148. Example 16: How to calculate a confidence interval for one mean when σ is NOT known. For this example, consider that the sample data are stored in column A and that you are required to calculate a 95 % confidence interval for the mean of the population from which the sample was taken. This example is very similar to Example 15. There are two differences. First, you will have to use an estimate of the population standard deviation because σ is not given. Second, we will use the t distribution to find our critical value. We will use the function T.INV, rather than NORM.INV, to calculate the critical value for the margin of error. Note that T.INV takes a probability in the LEFT tail; when given α/2, it therefore always returns the negative (left-tail) critical value for α/2. Step 1. Calculate n, MEAN and STDEV.S. Let's store these intermediate results in column D along with identification in column C. Type 'Mean =' in cell C1, and the formula =AVERAGE(A:A) in cell D1. Type 'St. Dev. =' in cell C2, and the formula =STDEV.S(A:A)
  • 149. in cell D2. Type 'n =' in cell C3, and the formula =COUNT(A:A) in cell D3. Step 2. Because the standard deviation is estimated from sample data, we will determine the α/2 critical value (CV) from the t-distribution with 24 - 1 = 23 degrees of freedom. For a 95% confidence interval, α/2 = 0.025. Read the value from a table of critical values for the t-distribution (= 2.069) or calculate it using the EXCEL T.INV function [Excel 2013: Formula Tab – Insert Function, scroll to choose T.INV off of the statistical list]. Note that for questions which have sample sizes of 76 or larger, we must use the T.INV function to get the correct CV (ISLeX will mark an approximation as incorrect). =T.INV(0.025,23) = -2.068655 Step 3. Calculate margin of error = E = CV * STDEV.S / SQRT(n). Type 'E =' in cell C4, and the formula =T.INV(0.025,D3-1)*D2/SQRT(D3) in cell D4. Step 4. Calculate lower limit = mean – margin of error and upper limit = mean + margin of error. Type 'LL =' in cell C5, and the formula =D1+D4 in cell D5 (it is
  • 150. + because E is calculated using the critical value in the left tail and is a negative number). Type 'UL =' in cell C6, and the formula =D1-D4 in cell D6 (it is - because E is calculated using the critical value in the left tail and is a negative number).
      29.6    Mean =        30.9875
      30.7    St. dev. =    2.788465
      31.4    n =           24
      31.1    E =           -1.17747
      25.5    LL =          29.8100
      34.6    UL =          32.1649
      34
      31
    Example 17: How to calculate a confidence interval for a proportion A proportion is the number of observations in one class expressed as a proportion of the
  • 151. total number of observations. Consider that there are n = 978 observations of which 123 are in the first class and the remaining 855 are in the second class. Further, consider the proportion p̂ = 123/978 = 0.12577 that are in the first class. This example shows how to calculate a 95 % confidence interval for the proportion that are in the first class in the population from which these 978 observations were randomly taken. The following steps can be used to calculate a confidence interval for a proportion. a) Calculate the estimated standard error of the proportion, $s_{\hat{p}} = \sqrt{\hat{p}\hat{q}/n}$, where $\hat{q} = 1 - \hat{p}$. In this example, let's use columns A and B of a new worksheet for the calculations. Type 'p̂ =' in cell A1, and the formula =123/978 in cell B1. Type 'st.dev. =' in cell A2, and the formula =SQRT(B1*(1-B1)/978) in cell B2. b) Get the critical value for a standard normal (z) distribution for confidence level
  • 152. 1 - α = 0.95, i.e. α/2 = 0.025. Use NORM.INV with 1 - α/2 = 0.975. Calculate the margin of error by multiplying the critical value by the standard error. Type 'CV =' in cell A3, and the formula =NORM.INV(0.975,0,1) in cell B3. Type 'E =' in cell A4, and the formula =B2*B3 in cell B4. c) Calculate lower limit = estimate – margin of error and upper limit = estimate + margin of error. Type 'LL =' in cell A5, and the formula =B1-B4 in cell B5. Type 'UL =' in cell A6, and the formula =B1+B4 in cell B6.
      p̂ =          0.125767
      st.dev. =    0.010603
      cv =         1.959961
      E =          0.020781
      LL =         0.104985
      UL =         0.146548
    Example 18: How to calculate a test of hypothesis concerning one mean when σ is NOT known. In tests of hypothesis, we are interested in evaluating assertions about population parameters in light of the evidence we have in a sample taken from that population.
  • 153. In this example, we look at hypotheses concerning the mean of a population. Step 1. Make an assertion, the null hypothesis, that the mean is equal to some value. Consider the possible alternative(s). H0 : population mean = 3 H1 : population mean > 3 {one-tailed (right) alternative} or population mean < 3 {one-tailed (left) alternative} or population mean ≠ 3 {two-tailed alternative} In this example, consider H0 : mean = 3 and H1 : mean > 3. This will be a right-tailed test. Step 2. Calculate the sample mean, size and standard deviation. Suppose that the data are stored in column A of an EXCEL worksheet. Let's use columns C and D to store identification and intermediate and final results. Type 'Mean =' in cell C1, and the formula =AVERAGE(A:A) in cell D1. Type 's =' in cell C2, and the formula =STDEV.S(A:A) in cell D2. Type 'n =' in cell C3, and the formula =COUNT(A:A) in cell D3. Step 3. Calculate the test statistic t as $t_{\text{calculated}} = \dfrac{\bar{x} - \mu\ (\text{from null hypothesis})}{s_{\bar{x}}}$
  • 154. and $s_{\bar{x}} = \dfrac{s}{\sqrt{n}}$. Type 't<calc> =' in cell C4, and the formula =(D1-3)/(D2/SQRT(D3)) in cell D4. Step 4. Calculate the critical value of the t-distribution for the degrees of freedom appropriate for this sample, for the desired significance level (α), and for the appropriate alternative hypothesis. Consider a right-tailed test at α = 0.05. Type 't<table> =' in cell C5, and the formula =T.INV(0.95,D3-1) in cell D5. Type 'p-value =' in cell C6, and the formula =T.DIST.RT(D4,D3-1) in cell D6. Mean = 4.88 s = 0.28 n = 22 t<calc> = 1.431491 t<table> = 1.720744 p-value = 0.083503 Note that T.INV(0.95,D3-1) uses 0.95 because the alternative hypothesis, in this case, is in the right tail. Also, T.DIST.RT(D4,D3-1) uses .RT because the alternative hypothesis is in the right tail; T.DIST.RT takes only the test statistic and the degrees of freedom as arguments.
  • 155. Step 5. Compare the calculated test statistic to the critical value and decide whether or not to reject the null hypothesis that the population mean equals the specified value. In this case, 1.431491 is less than 1.720744 and we do not reject the null hypothesis that the population mean is equal to 3. Alternatively, the p-value = 0.083503 is greater than α = 0.05, so we do not reject the null hypothesis. What if the alternative hypothesis was less than 3? In the case of a one-tailed (left) alternative: H0 : population mean = 3 H1 : population mean < 3 {one-tailed (left) alternative} If the alternative hypothesis is that the mean is really less than 3, we would compare the test statistic to a critical value of -1.7207 (the negative of that used for the one-tailed upper alternative). We could find the critical value for the left tail by using T.INV(0.05,D3-1). We would reject the null hypothesis only if the test statistic was more negative than the lower critical value. In this example, 1.431491 is not less than -1.7207 and we do not reject the null hypothesis. The p-value for the left tail is the probability of a value below the test statistic, =T.DIST(1.431491,22-1,TRUE) = 0.916497. This p-value is greater than α = 0.05, so we do not reject the null hypothesis.
  • 156. What if the alternative hypothesis was not equal to 3? In the case of a two-tailed alternative: H0 : population mean = 3 H1 : population mean ≠ 3 {two-tailed alternative} For the two-tailed alternative, we need two critical values (one for each tail). Using T.INV.2T with α = 0.05 will give the positive critical value for a two-tailed test with α appropriately split into both tails. =T.INV.2T(0.05,D3-1) = 2.079614 The lower critical value is the negative of the upper critical value, i.e. -2.079614. The decision rule in this case is to reject the null hypothesis if the test statistic is smaller than the lower critical value or greater than the upper critical value. In this example, the test statistic is between the lower critical value and the upper critical value for a two-tailed test and we do not reject the null hypothesis. To find the p-value for both tails, use =T.DIST.2T(1.431491,22-1) = 0.16700. The p-value for a two-tailed test (0.16700) is greater than α = 0.05, so we do not reject the null hypothesis.
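The three decision rules of Example 18 can be summarised in a few lines of Python. This is a sketch only: the function name `reject_null` is hypothetical, and the critical values are copied from the T.INV and T.INV.2T results above rather than computed, since the Python standard library has no t-distribution.

```python
def reject_null(t_calc, critical_value, alternative):
    """Apply the critical-value decision rule.

    critical_value is the POSITIVE critical value for the chosen test;
    alternative is 'right', 'left' or 'two-sided'.
    """
    if alternative == "right":       # H1: mean > hypothesised value
        return t_calc > critical_value
    if alternative == "left":        # H1: mean < hypothesised value
        return t_calc < -critical_value
    if alternative == "two-sided":   # H1: mean != hypothesised value
        return abs(t_calc) > critical_value
    raise ValueError("alternative must be 'right', 'left' or 'two-sided'")

t_calc = 1.431491                    # test statistic from cell D4
print(reject_null(t_calc, 1.720744, "right"))      # one-tailed (right)
print(reject_null(t_calc, 1.720744, "left"))       # one-tailed (left)
print(reject_null(t_calc, 2.079614, "two-sided"))  # two-tailed
```

In all three cases the function returns False, matching the conclusion that the null hypothesis is not rejected.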
  • 157. Example 19: Large sample confidence intervals and tests of hypothesis for differences between two means when population variances are unknown and unequal. When we have large sample sizes with unknown variances that are unequal, we can use the normal distribution as an approximation to the t distribution. For this example, 49 individuals with anorexia nervosa were bulimic and had an average depression score of 30.0 (standard deviation = 5.9) while 56 individuals were non-bulimic and had an average depression score of 27.0 (standard deviation = 5.4). i) Calculate a 90 % confidence interval for the difference in depression score. a) Record sample sizes, means and standard deviations as constants in EXCEL cells. Type 'Sample 1:' in cell A1. Type 'Mean1 =' in cell B2, and the number 30.0 in cell C2. Type 's1 =' in cell B3, and the number 5.9 in cell C3. Type 'n1 =' in cell B4, and the number 49 in cell C4. Type 'Sample 2:' in cell A5. Type 'Mean2 =' in cell B6, and the number 27.0 in cell C6. Type 's2 =' in cell B7, and the number 5.4 in cell C7. Type 'n2 =' in cell B8, and the number 56 in cell C8. b) Calculate the standard error of the difference between the two sample means. Type 'se<diff> =' in cell B10, and the formula =SQRT(C3*C3/C4+C7*C7/C8) in cell C10.
  • 158. c) Determine the critical value for a 90% confidence interval. In this case we will use the standard normal distribution to approximate the t value because the sample sizes are so large. For a 100(1-α)% CI, use the NORM.INV function with 1-α/2 (here 1 - 0.10/2 = 0.95). Type 'cv =' in cell B11, and the formula =NORM.INV(0.95,0,1) in cell C11. d) Calculate margin of error of difference = critical value × standard error. Type 'E =' in cell B12, and the formula =C11*C10 in cell C12. e) Calculate lower limit = difference – margin of error and upper limit = difference + margin of error. Type 'LL =' in cell B13, and the formula =C2-C6-C12 in cell C13. Type 'UL =' in cell B14, and the formula =C2-C6+C12 in cell C14. The following is a copy of the first 14 rows of columns A, B
  • 159. and C Bulimics Mean1 = 30 s1 = 5.9 n1 = 49 Non- bulimics Mean2 = 27 s2 = 5.4 n2 = 56 se<diff> = 1.10956 cv = 1.644853 E = 1.825062 LL = 1.174938 UL = 4.825062 ii) Test the hypothesis that the mean depression scores for the two groups are equal against an alternative that they are not equal. a) Calculate the test statistic (z) by dividing the difference between the means minus zero (for no difference from the null hypothesis) by the standard error of the difference. =(C2-C6-0)/C10 = 2.70378 b) Compare the calculated test statistic to a critical value that correctly reflects your
  • 160. choice of significance level and the form of the alternative hypothesis. For a two-tailed test, use NORM.INV with 1 - α/2. For a one-tailed test, use NORM.INV with 1 - α. In this example, consider α = 0.05 and a two-tailed test. =NORM.INV(0.975,0,1) = 1.959961 Since the test statistic (2.70378) is greater than the upper critical value for the two-tailed test, reject the null hypothesis that the mean depression score is the same for both groups. Example 20: Confidence intervals and tests of hypothesis for differences between two means for independent samples: population variances are unknown but equal. For this example, 25 men had an average decrease in systolic blood pressure of 8.9 units (standard deviation = 6.2) due to transcendental meditation. For 25 women, the average decrease was 5.0 units (standard deviation = 6.0). i) Calculate a 95 % confidence interval for the difference in average decrease.
  • 161. a) Record sample sizes, means and standard deviations as EXCEL constants. Type 'Men:' in cell A1. Type 'Mean1 =' in cell B2, and the number 8.9 in cell C2. Type 's1 =' in cell B3, and the number 6.2 in cell C3. Type 'n1 =' in cell B4, and the number 25 in cell C4. Type 'Women:' in cell A5. Type 'Mean2 =' in cell B6, and the number 5.0 in cell C6. Type 's2 =' in cell B7, and the number 6.0 in cell C7. Type 'n2 =' in cell B8, and the number 25 in cell C8. b) Calculate the pooled variance for the two samples (assumed to be the same in both populations). Type 'Var(pooled) =' in cell B10, and the formula =((C4-1)*C3*C3+(C8-1)*C7*C7)/(C4-1+C8-1) in cell C10. c) Calculate the standard error of the difference between the two means. Type 'se<diff> =' in cell B11, and the formula =SQRT(C10*(1/C4+1/C8)) in cell C11.
  • 162. $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$, where $n_1$ = size of sample 1, $n_2$ = size of sample 2, $s_1$ = standard deviation of sample 1, and $s_2$ = standard deviation of sample 2; $s_{\bar{x}_1 - \bar{x}_2} = s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
  • 163. d) Calculate the α/2 critical value for the t-distribution with (n1 - 1 + n2 - 1) degrees of freedom because the population variances are equal. Using T.INV.2T with α will give the positive critical value for a two-tailed test with α appropriately split into both tails. =T.INV.2T(0.05,C4+C8-2) = 2.01064 Type 'cv =' in cell B12, and the formula =T.INV.2T(0.05,C4+C8-2) in cell C12. e) Calculate the margin of error = critical value × standard error of the difference. Type 'E =' in cell B13, and the formula =C12*C11 in cell C13. f) Calculate lower limit = difference between means – margin of error and upper limit = difference between means + margin of error. Type 'LL =' in cell B14, and the formula =C2-C6-C13 in cell C14. Type 'UL =' in cell B15, and the formula =C2-C6+C13 in cell C15.
  • 164. Men: Mean1 = 8.9 s1 = 6.2 n1 = 25 Women: Mean2 = 5.0 s2 = 6.0 n2 = 25 Var(pooled) = 37.22 se<diff> = 1.7256 cv = 2.0106 E = 3.4695 LL = 0.4305 UL = 7.3695 On average, transcendental meditation resulted in a greater decrease (3.9 units) in blood pressure for men than for women. We are 95% confident that the population difference is between 0.4 and 7.4 units.
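The pooled-variance interval of Example 20, part i), can be re-derived in Python to check the worksheet values. This is a minimal sketch; the critical value 2.01064 is copied from the =T.INV.2T(0.05,C4+C8-2) result in the text rather than computed, and the variable names are illustrative.

```python
import math

mean1, s1, n1 = 8.9, 6.2, 25       # men
mean2, s2, n2 = 5.0, 6.0, 25       # women
t_crit = 2.01064                   # taken from the text: T.INV.2T(0.05, 48)

# Pooled variance: weighted average of the two sample variances
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
# Standard error of the difference between the two means
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
margin = t_crit * se_diff          # margin of error E
diff = mean1 - mean2
lower, upper = diff - margin, diff + margin

print(round(pooled_var, 2), round(se_diff, 4), round(margin, 4))
print(round(lower, 4), round(upper, 4))
```

Dividing `diff` by `se_diff` gives the test statistic used when the same data are tested for equality of the two means.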
  • 165. ii) Test the hypothesis that the decrease in blood pressure is the same in men as in women. Use a 5% significance level and a two-tailed alternative hypothesis. a) Calculate the test statistic as difference/standard error of difference. =((C2-C6)-0)/C11 = 2.26012 b) Compare to critical values from the t-distribution with n1 + n2 - 2 = 48 degrees of freedom and α = 0.05. =T.INV.2T(0.05,C4+C8-2) = 2.01064, therefore the critical values for a two-tailed test are -2.01064 and 2.01064. Since the test statistic (2.26012) is greater than the upper critical value of 2.01064, reject the null hypothesis that the decrease in blood pressure is the same for both sexes. Example 21: Large sample confidence intervals and tests of
  • 166. hypothesis for differences between two proportions. Of 1500 people from a high-income group, 62.4 % were registered to vote. Of 1500 in a low-income group, 58.2% were registered to vote. i) Calculate a 95 % confidence interval for the difference in voter registration between high-income and low-income groups. a) Store n1, p1, n2 and p2 as EXCEL constants. Type 'n1 =' in cell A1, and '1500' in cell B1. Type 'p̂1 =' in cell A2, and '0.624' in cell B2. Type 'n2 =' in cell A3, and '1500' in cell B3. Type 'p̂2 =' in cell A4, and '0.582' in cell B4. b) Calculate the standard error of the difference of the two sample proportions: Type 'sd<diff> =' in cell A6, and the formula =SQRT(B2*(1-B2)/B1+B4*(1-B4)/B3) in cell B6. c) Determine the critical value of the standard normal distribution corresponding to α/2 = 0.025 and 1-α/2 = 0.975 [required for a (1 - α) = 0.95 confidence interval]. Type 'cv =' in cell A7, and the formula =NORM.INV(0.975,0,1) in cell B7. d) Calculate margin of error = critical value × standard error of the difference.
  • 167. Type 'E =' in cell A8, and the formula =B7*B6 in cell B8. e) Calculate lower limit = difference in proportion – margin of error and upper limit = difference in proportion + margin of error. Type 'LL =' in cell A9, and the formula =B2-B4-B8 in cell B9. Type 'UL =' in cell A10, and the formula =B2-B4+B8 in cell B10. The standard error calculated in step b) is $s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
  • 168. n1 = 1500 p1 = 0.624 n2 = 1500 p2 = 0.582 sd<diff> = 0.017849 cv = 1.959961 E = 0.034984 LL = 0.007016 UL = 0.076984 Using a 95% confidence interval, the difference in voter registration between high-income and low-income groups is between 0.007 and 0.077 (0.7 to 7.7 %). ii) Test the hypothesis that the high-income group has a higher voter registration than the low-income group. Use α = 0.05. a) The test statistic must be calculated as if the null hypothesis
  • 169. were true. Thus, we need to calculate the average (pooled) proportion of voter registration, $\hat{p} = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$. =(B1*B2+B3*B4)/(B1+B3) = 0.603000 Type 'p<pooled> =' in cell A12, and the formula =(B1*B2+B3*B4)/(B1+B3) in cell B12. b) Use the pooled proportion to calculate a new standard error of the difference, $s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\hat{p}(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$. Type 'sd =' in cell A13, and the formula =SQRT(B12*(1-B12)*(1/B1+1/B3)) in cell B13. c) Calculate the test statistic, $z = \dfrac{\hat{p}_1 - \hat{p}_2 - 0}{sd}$. Type 'z<calc> =' in cell A14, and the formula =(B2-B4-0)/B13 in cell B14.
  • 170. d) Calculate the critical value for a one-tailed (upper) test at α = 0.05. Type 'cv =' in cell A15, and the formula =NORM.INV(0.95,0,1)
  • 171. in cell B15. As an alternative, the p-value can be calculated. Type 'p-value =' in cell A16, and the formula =1-NORM.DIST(B14,0,1,TRUE) in cell B16. The following are the results. p<pooled> = 0.603 sd = 0.017866 z<calc> = 2.350856 cv = 1.644853 p-value = 0.009365 Since z = 2.35086 is greater than zα = 1.6449, reject the null hypothesis. Alternatively, because the p-value = 0.009365 is less than α = 0.05, the null hypothesis is rejected. Example 22: How to carry out a one-way analysis of variance. A one-way analysis of variance should be used where data can be grouped by only one criterion. This type of design is sometimes called a completely random design because treatments are assigned randomly to all available experimental units. For this example, consider the mercury concentration
  • 172. (micrograms per gram of body weight) of fish living 5.5 km upstream from a chloralkali plant (treatment 1), 3.7 km downstream from the plant (treatment 2), 21 km downstream (treatment 3), or 133 km downstream (treatment 4). Consider that the treatment number for each of 40 fish has been read into column A and that the mercury concentration has been read into column B. The ANOVA procedure should be used to carry out this analysis of variance. In this example, the variable to be analyzed, mercury concentration, is stored in column B and the classification variable (treatment) is stored in column A on an EXCEL worksheet. Before analysis can begin, it is necessary to copy data for the different treatments into different columns of the EXCEL worksheet. In this example, the data for the four treatments are stored in columns F, G, H and I. The labels 'Trt 1' in cell F1, 'Trt 2' in cell G1, 'Trt 3' in cell H1, and 'Trt 4' in cell I1 are added. Then select all the data in column B that belong to Trt 1, copy them, and use Edit->Paste Special to paste the values into cells F2 through F11. The same procedure is repeated for treatments 2, 3 and 4. Prior to analysis of variance, the data are arranged as follows:
      Trt 1    Trt 2    Trt 3    Trt 4
      23.84    26.92    29.20    32.73
      23.58    26.68    29.70    32.88
      23.42    26.91    29.11    32.90
      23.74    26.26    29.02    32.08
      23.23    26.72    29.19    32.80
      23.01    26.05    29.06    32.96
      23.14    26.12    29.39    32.22
      23.31    26.86    29.68    32.31
      23.02    26.87    29.69    32.13
      23.79    26.31    29.78    32.35
  • 173. To perform a one-way ANOVA, choose Anova: Single Factor to open the single factor anova dialog box [Excel 2013: Data Tab – Data Analysis – Anova: Single Factor]. Set the Input range to $F$1:$I$11, set Grouped by to Columns, and select Labels in first row. Click OK.
  • 174. The following results appear on a new worksheet.
      Anova: Single Factor

      SUMMARY
      Groups    Count    Sum         Average     Variance
      Trt 1     10       234.0743    23.40743    0.099489
      Trt 2     10       265.6906    26.56906    0.120285
      Trt 3     10       293.8229    29.38229    0.090206
      Trt 4     10       325.3521    32.53521    0.121919

      ANOVA
      Source of Variation    SS          df    MS          F           P-value     F crit
      Between Groups         456.1535    3     152.0512    1408.211    2.34E-37    2.866265
      Within Groups          3.887088    36    0.107975
      Total                  460.0405    39
    The degrees of freedom (DF) for differences among the four treatment groups is equal to one less than the number of treatments (4 – 1 = 3). The degrees of freedom for error is equal to the sum over four treatments of the number of individuals in
  • 175. each treatment minus one [(10 - 1) + (10 - 1) + (10 - 1) + (10 - 1) = 36]. The degrees of freedom for the total sum of squares is equal to the total number of observations minus one (39 = 40 - 1). The mean square for treatment groups is equal to the sum of squares for treatment groups divided by its degrees of freedom (152.0512 = 456.1535/3). This mean square is a measure of variation among the four groups of fish. The error mean square is equal to the error sum of squares divided by its degrees of freedom (0.107975 = 3.887088/36). It measures the average (pooled) variation among individuals within treatments. The error mean square is the estimate of pooled variance and will be used for calculating confidence intervals or tests of hypotheses about treatment means. The F-ratio is calculated by dividing the treatment mean square by the error mean square (1408.211 = 152.0512/0.107975). The F-ratio is the test statistic for testing the null hypothesis that all four treatments have the same mean. The alternative hypothesis is that not all four treatments have the same mean.
    NOTE: The alternative hypothesis sounds like a two-tailed hypothesis. However, only the upper tail of the F distribution is considered when evaluating the significance of an F statistic. Only the upper tail is used because the F statistic is calculated from squares of differences, and squares of differences are positive regardless of whether the differences are positive or negative.
  • 176. The F-statistic has numerator degrees of freedom equal to the degrees of freedom of the numerator mean square (3, in this example) and denominator degrees of freedom equal to the degrees of freedom associated with the error (36, in this example). To test the hypothesis that all treatments have the same mean, compare the calculated F-statistic to the critical value of the F-distribution corresponding to 3 and 36 degrees of freedom and a suitable significance level (0.01 or 0.05 are most common). The critical value of the F-distribution can be determined by reference to a statistical table; EXCEL also reports it in the F crit column of the ANOVA output. Since the calculated F-value (1408.211) greatly exceeds the critical value (2.866265), we reject the null hypothesis and conclude that there were differences among the treatments in average mercury concentration. An alternative to comparing the calculated F-value to a critical value is to compare the p-value (2.34E-37 = 0.0000 to four decimal places) to the significance level (α = 0.05). Since 0.0000 is much less than 0.05, we reject the null hypothesis.
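The arithmetic behind the one-way ANOVA table can be checked outside EXCEL. The following is a minimal Python sketch, not the procedure EXCEL itself uses; the function name `one_way_anova` is an assumption made for illustration. The data are the two-decimal values listed for Example 22, so the results agree with the EXCEL output only approximately (EXCEL worked from unrounded values).

```python
def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and the F-ratio."""
    values = [x for group in groups for x in group]
    grand_mean = sum(values) / len(values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation among group means (Between Groups).
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Pooled variation of individuals around their own group mean (Within Groups).
    ss_within = sum(
        (x - mean) ** 2
        for group, mean in zip(groups, group_means)
        for x in group
    )
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "df_between": df_between,
        "ss_within": ss_within, "df_within": df_within,
        "ms_within": ms_within, "f_ratio": ms_between / ms_within,
    }


# Mercury concentrations for treatments 1 to 4, as listed (rounded) in Example 22.
trt1 = [23.84, 23.58, 23.42, 23.74, 23.23, 23.01, 23.14, 23.31, 23.02, 23.79]
trt2 = [26.92, 26.68, 26.91, 26.26, 26.72, 26.05, 26.12, 26.86, 26.87, 26.31]
trt3 = [29.20, 29.70, 29.11, 29.02, 29.19, 29.06, 29.39, 29.68, 29.69, 29.78]
trt4 = [32.73, 32.88, 32.90, 32.08, 32.80, 32.96, 32.22, 32.31, 32.13, 32.35]

result = one_way_anova([trt1, trt2, trt3, trt4])
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

The degrees of freedom come out as 3 and 36, and the F-ratio is close to the 1408.211 reported by EXCEL.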
  • 177. Example 23: Is MIA.
    Example 24: How to use information from analysis of variance to calculate confidence intervals or test hypotheses about treatment means (including least significant difference) using data from Example 22.
    For these examples, consider an analysis of variance (Example 22) that has an error mean square of 0.107975 with 36 degrees of freedom. Consider treatment 2 with a mean (of 10 observations) equal to 26.569 and treatment 3 (also 10 observations) with a mean of 29.382.
    IMPORTANT: Confidence intervals and tests of hypotheses about means in an analysis of variance will always use the error mean square as the estimate of the pooled variance.
    a) Store the error degrees of freedom, the error mean square (pooled variance) and means in EXCEL worksheet cells.
  • 178. Type 'df =' in cell A1, and '36' in cell B1. Type 'ems =' in cell A2, and '0.107975' in cell B2. Type 't2 =' in cell A3, and '26.569' in cell B3. Type 't3 =' in cell A4, and '29.382' in cell B4.
    df =    36
    ems =   0.107975
    t2 =    26.569
    t3 =    29.382
    b) Calculate a 90% confidence interval for the mean of treatment 2 using the method for when σ is not known, as described in example 16.
    Standard error of one mean = square root of (error mean square/sample size). Type 'sd =' in cell A6, and the formula =SQRT(B2/10) in cell B6.
    Get the critical value for α = 1 - 0.90 = 0.10 and the error degrees of freedom. Type 'cv =' in cell A7, and the formula =T.INV.2T(0.10,36) in cell B7.
    Limits = mean ± critical value x standard error of mean. Type 'LL =' in cell A8, and the formula =B3-B6*B7 in cell B8. Type 'UL =' in cell A9, and the formula =B3+B6*B7 in cell B9.
    sd =    0.103911
    cv =    1.688297
    LL =    26.39357
    UL =    26.74443
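The worksheet steps in part b) can be mirrored in a few lines of Python as a cross-check. This is only a sketch: the function name `mean_confidence_interval` is hypothetical, and the critical value is copied from the worksheet result of =T.INV.2T(0.10,36) rather than computed, because the Python standard library has no inverse t function.

```python
import math


def mean_confidence_interval(mean, error_mean_square, n, t_critical):
    """Confidence limits for one treatment mean using the pooled variance."""
    standard_error = math.sqrt(error_mean_square / n)  # cell B6
    margin = t_critical * standard_error               # B6*B7
    return mean - margin, mean + margin                # cells B8 and B9


# Values from Example 24: mean of treatment 2, error mean square, n = 10.
# t_critical = 1.688297 is the worksheet value for alpha = 0.10 and 36 df.
lower, upper = mean_confidence_interval(26.569, 0.107975, 10, 1.688297)
print(f"LL = {lower:.5f}")
print(f"UL = {upper:.5f}")
```

The printed limits should agree with the LL and UL cells of the worksheet to the displayed precision.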
  • 179. c) Test the hypothesis that the two means are not different using the method described in example 20. Use α = 0.05 and consider a two-tailed alternative hypothesis. Type 't =' in cell A11, and the formula =(B3-B4)/SQRT(B2/10+B2/10) in cell B11. Type 'cv =' in cell A12, and the formula =T.INV.2T(0.05,36) in cell B12.
    t =     -19.1423
    cv =    2.028091
    Since the calculated test statistic (-19.1423) is outside the range of -2.028091 to 2.028091, we reject the hypothesis that the two means are equal.
    d) The least significant difference is the margin of error for a confidence interval for the difference between two means, provided both means are based on the same sample size. To calculate an LSD(α), we use the Error Mean Square and the
  • 180. sample size (remember all sample sizes are the same).
    $$ \mathrm{LSD}(\alpha) = t_{\alpha/2,\,\mathrm{edf}} \sqrt{\frac{2 \times \mathrm{ErrorMeanSquare}}{n}} $$
    Here edf is the error degrees of freedom and n is the number of observations per mean. Type 'LSD(0.05) =' in cell A14, and the formula =T.INV.2T(0.05,36)*SQRT(2*B2/10) in cell B14.
    LSD(0.05) =   0.298033
    If the absolute value of the difference between two means is greater than the least significant difference, we reject the hypothesis that the two means are equal. For treatments 2 and 3, the difference is 26.569 - 29.382 = -2.813 with absolute value 2.813. Since 2.813 is greater than LSD(0.05) = 0.298033, we reject the hypothesis that treatments 2 and 3 have equal means.
    The test statistic used in part c) is
    $$ t = \frac{\bar{x}_2 - \bar{x}_3}{\sqrt{\frac{s^2}{n_2} + \frac{s^2}{n_3}}} $$
    where s² is the error mean square.
  • 181. Example 25: How to perform a two-way analysis of variance.
    When data are classified according to two criteria, and when there is more than one observation in each combination of the two criteria, a two-way analysis of variance includes a term for the interaction between the two classification factors. Data for this example consist of the number of diatoms found in a stream at each of two locations (1 = upstream, 2 = downstream from a water treatment plant) with sampling occurring in three different weeks. For each observation, the site designation is stored in column A, the
  • 182. week designation in column B, and the number of diatoms in column C.
    Site   Week   Number
    1      1      689
    1      1      756
    1      2      831
    1      2      916
    1      3      558
    1      3      423
    2      1      204
    2      1      229
    2      2      56
    2      2      73
    2      3      34
    2      3      78
    First, arrange the data in a two-way table like this.
             Site 1   Site 2
    Week 1   689      204
             756      229
    Week2    831      56
             916      73
    Week3    558      34
             423      78
    To perform a two-way Anova use: Anova: Two-Factor With Replication [Excel 2013: Data Tab – Data Analysis – Anova: Two Factor With Replication]
  • 183. In Input Range:, indicate the cells that contain the data and the labels. For example, if the first seven rows of columns E, F and G contain the two-way table of data, specify the input range as E1:G7. Set Rows per sample: to 2 and click OK.
    Here are the results from this EXCEL analysis.
    Anova: Two-Factor With Replication
    SUMMARY    Site 1    Site 2     Total
    Week 1
    Count      2         2          4
    Sum        1445      433        1878
    Average    722.5     216.5      469.5
    Variance   2244.5    312.5      86197.67
    Week2
    Count      2         2          4
    Sum        1747      129        1876
    Average    873.5     64.5       469
    Variance   3612.5    144.5      219412.7
    Week3
    Count      2         2          4
    Sum        981       112        1093
    Average    490.5     56         273.25
    Variance   9112.5    968        66290.25
    Total
    Count      6         6
    Sum        4173      674
    Average    695.5     112.3333
    Variance   32769.1   6809.867
  • 184. ANOVA
    Source of Variation   SS         df   MS         F          P-value    F crit
    Sample                102443.2   2    51221.58   18.74589   0.002626   5.143249
    Columns               1020250    1    1020250    373.3874   1.24E-06   5.987374
    Interaction           79057.17   2    39528.58   14.46653   0.005067   5.143249
    Within                16394.5    6    2732.417
    Total                 1218145    11
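The sums of squares and F-ratios in this ANOVA table can be reproduced from the twelve diatom counts. The sketch below is one straightforward way to do the arithmetic in Python for a balanced design; it is not a description of EXCEL's internals, and the names `two_way_anova` and `cells` are illustrative assumptions. Rows of `cells` are weeks ("Sample" in the EXCEL output) and columns are sites.

```python
def mean(values):
    return sum(values) / len(values)


def two_way_anova(cells):
    """cells[i][j] holds the replicate observations for row i, column j."""
    n_rows, n_cols, reps = len(cells), len(cells[0]), len(cells[0][0])
    values = [x for row in cells for cell in row for x in cell]
    grand = mean(values)
    row_means = [mean([x for cell in row for x in cell]) for row in cells]
    col_means = [mean([x for row in cells for x in row[j]]) for j in range(n_cols)]

    ss = {
        "Sample": n_cols * reps * sum((m - grand) ** 2 for m in row_means),
        "Columns": n_rows * reps * sum((m - grand) ** 2 for m in col_means),
        "Within": sum((x - mean(cell)) ** 2
                      for row in cells for cell in row for x in cell),
    }
    ss_total = sum((x - grand) ** 2 for x in values)
    # Interaction is what remains after main effects and within-cell variation.
    ss["Interaction"] = ss_total - ss["Sample"] - ss["Columns"] - ss["Within"]

    df = {
        "Sample": n_rows - 1,
        "Columns": n_cols - 1,
        "Interaction": (n_rows - 1) * (n_cols - 1),
        "Within": n_rows * n_cols * (reps - 1),
    }
    ms = {source: ss[source] / df[source] for source in ss}
    f = {source: ms[source] / ms["Within"]
         for source in ("Sample", "Columns", "Interaction")}
    return ss, df, f


# Diatom counts from Example 25: rows = weeks 1 to 3, columns = sites 1 and 2.
cells = [
    [[689, 756], [204, 229]],
    [[831, 916], [56, 73]],
    [[558, 423], [34, 78]],
]

ss, df, f = two_way_anova(cells)
for source in ("Sample", "Columns", "Interaction"):
    print(f"{source}: SS = {ss[source]:.2f}, df = {df[source]}, F = {f[source]:.4f}")
print(f"Within: SS = {ss['Within']:.2f}, df = {df['Within']}")
```

Computing the interaction by subtraction is valid here because every cell has the same number of observations.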
  • 185. F = 373.39 has 1 and 6 degrees of freedom and can be used to test the hypothesis that there is no difference between the upstream and downstream sites. F = 18.75 has 2 and 6 degrees of freedom and can be used to test the hypothesis that there were no differences among weeks. F = 14.47 has 2 and 6 degrees of freedom and can be used to test the hypothesis that the differences between sites (if any) were the same in all three weeks (i.e., no interaction between the two factors). Since all three p-values were less than 0.05, we would reject all three null hypotheses.
    Average of Number   Site
    Week                1        2        Grand Total
    1                   722.50   216.50   469.50
    2                   873.50   64.50    469.00
    3                   490.50   56.00    273.25
    Grand Total         695.50   112.33   403.92
    From the two-way table of means, it is clear that the number of diatoms was much lower
  • 186. (112.33 on average) at the downstream site than at the upstream site (average = 695.50). It is also clear that numbers were down in week 3 compared to the other two weeks. The difference between the upstream and downstream sites was 506.00 in week 1, 809.00 in week 2, and 434.50 in week 3. It is clear that there is an interaction between the site and week factors; the difference between sites depends upon which week the sampling was done.
    Example 26: How to calculate a randomized complete block analysis of variance
    Many experiments in agriculture and biology are similar to a two-way design but have only one observation per cell. In these experiments, one must assume that there is no interaction between the two factors. This assumption is always valid when one of the factors consists of ways of grouping the experimental units into more uniform groups, as is common in field research. The present example consists of data on the number of soybean plants (out of 100; column C) that failed to emerge. There are two factors in the experiment. Each observation can be classified according to the fungicide treatment (Check, Arasan, Spergon, Semasan, or Fermate;
  • 187. column A) or according to the block in the field (Block 1, Block 2, Block 3, Block 4 or Block 5; column B). The 25 observations consist of five fungicide treatments in all combinations with 5 blocks. The model will not include an interaction term. The data, arranged for analysis by EXCEL, are stored in rows 1 through 6 of columns E through J.
              Block 1   Block 2   Block 3   Block 4   Block 5
    Check     8         10        12        13        11
    Arasan    2         6         7         11        5
    Spergon   4         10        9         8         10
    Semasan   3         5         9         10        6
    Fermate   9         7         5         5         3
    To analyze these data, proceed as follows. To perform an Anova for an RCBD, use: Anova: Two-Factor Without Replication [Excel 2013: Data Tab – Data Analysis – Anova: Two Factor Without Replication] In Input Range:, indicate the cells that contain the data and the labels. For example, if the first six rows of columns E through J contain the two-way table of data, specify the input range as E1:J6, or select those cells by using the mouse. Check Labels, and click on OK.
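The computation behind a randomized complete block analysis is short enough to sketch directly. The Python below is an illustration under the no-interaction assumption stated for this example, not EXCEL's own code; the function name `rcbd_anova` and the variable names are assumptions. Rows of `table` are the fungicide treatments and columns are the blocks, in the order shown in the data table.

```python
def rcbd_anova(table):
    """table[i][j] is the single observation for treatment i in block j."""
    n_trt, n_blk = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_trt * n_blk)
    trt_means = [sum(row) / n_blk for row in table]
    blk_means = [sum(table[i][j] for i in range(n_trt)) / n_trt
                 for j in range(n_blk)]

    ss_trt = n_blk * sum((m - grand) ** 2 for m in trt_means)   # Rows
    ss_blk = n_trt * sum((m - grand) ** 2 for m in blk_means)   # Columns
    ss_total = sum((x - grand) ** 2 for row in table for x in row)
    ss_error = ss_total - ss_trt - ss_blk                       # Error

    df_trt, df_blk = n_trt - 1, n_blk - 1
    df_error = df_trt * df_blk
    ms_error = ss_error / df_error
    return {
        "ss_trt": ss_trt, "ss_blk": ss_blk, "ss_error": ss_error,
        "df_error": df_error, "ms_error": ms_error,
        "f_trt": (ss_trt / df_trt) / ms_error,
        "f_blk": (ss_blk / df_blk) / ms_error,
    }


# Soybean emergence failures: Check, Arasan, Spergon, Semasan, Fermate
# (rows) in Blocks 1 to 5 (columns).
table = [
    [8, 10, 12, 13, 11],
    [2, 6, 7, 11, 5],
    [4, 10, 9, 8, 10],
    [3, 5, 9, 10, 6],
    [9, 7, 5, 5, 3],
]

anova = rcbd_anova(table)
for name, value in anova.items():
    print(f"{name} = {value:.4f}")
```

The error mean square from this calculation is the pooled variance estimate used for any follow-up comparisons of treatment means.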
  • 188. Results are as follows:
    Anova: Two-Factor Without Replication
    SUMMARY   Count   Sum   Average   Variance
    Check     5       54    10.8      3.7
    Arasan    5       31    6.2       10.7
    Spergon   5       41    8.2       6.2
    Semasan   5       33    6.6       8.3
    Fermate   5       29    5.8       5.2
    Block 1   5       26    5.2       9.7
    Block 2   5       38    7.6       5.3
    Block 3   5       42    8.4       6.8
    Block 4   5       47    9.4       9.3
    Block 5   5       35    7         11.5
  • 189. ANOVA
    Source of Variation   SS       df   MS      F          P-value    F crit
    Rows                  83.84    4    20.96   3.874307   0.021886   3.006917
    Columns               49.84    4    12.46   2.303142   0.103195   3.006917
    Error                 86.56    16   5.41
    Total                 220.24   24
    The error mean square (5.41) is an estimate of the pooled variance and has 16 degrees of freedom. The table gives us two p-values for the two F-tests, but only one of those is for the treatments. The F value for Rows tests whether there are significant differences among rows (fungicide treatments) in the number of failed germinations. This information appears in Rows because that is how the original data were organized in Excel. The p-value for blocks cannot be used to glean information about our treatments (and is ignored because the blocks are not randomly assigned).
    Example 27: How to prepare a scatterplot of two variables.
    For this example, data on cage size (cm²) and body weight (g) of 12 crabs have been stored in columns A and B of the EXCEL worksheet. Note that cage
  • 190. size is the independent (x) variable and body weight (y) is the dependent variable. In other words, the size of the cage affects the body weight of the crab. Excel will pick the first column as the x variable and the second column as the y variable.
    CageSize   BodyWt
    159        14.40
    179        15.20
    100        11.30
    45         2.50
    384        22.70
    230        14.90
    100        1.41
    320        15.81
    80         4.19
    220        15.39
    320        17.25
    210        9.52
    To make a scatterplot, a chart must be inserted. The first step is to highlight the data including labels, then choose Chart [Excel 2013: Insert Tab – Insert Scatter (X,Y) or Bubble Chart]
  • 191. Choose the first option under Scatter (used to compare at least two sets of values or pairs of data). The Chart Title should be descriptive: click on the title and rename the graph to describe the subject matter. The axes should also be appropriately labeled: click the graph and check Axis Titles to add them. A trendline can also be added. PLSC 214 discusses linear relationships, and Excel can display the regression equation and the r² value on the graph.
  • 192. Because the points are scattered in a pattern from the lower left corner to the upper right corner, we conclude that there is a positive relationship between the two variables. It appears that larger cage sizes are associated with heavier crabs. The slope is 0.0528, which is positive, and the y-intercept is 1.7287. The r² value is 0.7485; nearly 75% of the variation is explained by the model.
    Example 28: How to calculate a correlation coefficient.
    For this example, data on cage size (cm²) and body weight (g) of 12 crabs have been stored in columns A and B of the EXCEL worksheet (see example 27).
    CageSize   BodyWt
    159        14.40
    179        15.20
    100        11.30
    45         2.50
    384        22.70
    230        14.90
    100        1.41
    320        15.81
    80         4.19
    220        15.39
    320        17.25
    210        9.52
  • 193. a) Calculate the standard deviation of each of the two variables and store the results in column D. Type 's1 =' in cell C1, and the formula =STDEV.S(A:A) in cell D1. Type 's2 =' in cell C2, and the formula =STDEV.S(B:B) in cell D2.
    b) Calculate the covariance for sample data. Type 's12 =' in cell C3, and the formula =COVARIANCE.S(A2:A13,B2:B13) in cell D3.
    c) Calculate the correlation = covariance/(STDEV.S(x) * STDEV.S(y)). Type 'r =' in cell C4, and the formula =D3/(D1*D2) in cell D4.
    Results are:
    s1 =    106.3309
    s2 =    6.484094
    s12 =   596.5107
    r =     0.86519
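Steps a) to c) can be verified with a short Python sketch that uses the same definitions as STDEV.S and COVARIANCE.S (divisor n - 1). The helper names `sample_sd` and `sample_cov` are assumptions introduced for illustration; the data are the twelve crab observations listed for this example.

```python
import math


def sample_cov(x, y):
    """Sample covariance with divisor n - 1, as in COVARIANCE.S."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)


def sample_sd(x):
    """Sample standard deviation with divisor n - 1, as in STDEV.S."""
    return math.sqrt(sample_cov(x, x))


cage_size = [159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210]
body_wt = [14.40, 15.20, 11.30, 2.50, 22.70, 14.90,
           1.41, 15.81, 4.19, 15.39, 17.25, 9.52]

s1 = sample_sd(cage_size)
s2 = sample_sd(body_wt)
s12 = sample_cov(cage_size, body_wt)
# The covariance is divided by the PRODUCT of the two standard deviations.
r = s12 / (s1 * s2)

print(f"s1 = {s1:.4f}, s2 = {s2:.6f}, s12 = {s12:.4f}, r = {r:.5f}")
```

Note the parentheses around the product in the last step; dividing by one standard deviation and then multiplying by the other gives a value far outside the range -1 to 1.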
  • 194. d) Alternative method of calculating correlation. Type 'r =' in cell C6, and the formula =CORREL(A2:A13,B2:B13) in cell D6.
    e) How to test if the correlation is significant. A test statistic can be calculated and compared to a t value with n – 2 degrees of freedom. If the test statistic falls in the rejection region, reject the null hypothesis of ρ = 0.
    $$ t_{calc} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} $$
  • 195. In this example of 12 crabs, we could test at the 5% significance level whether there is a positive linear correlation. The critical t value with n – 2 = 10 degrees of freedom for this right-tailed test is t = 1.812.
    $$ t_{calc} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.86519\sqrt{12-2}}{\sqrt{1-(0.86519)^2}} = 5.456 $$
    Because 5.456 is greater than 1.812, the test statistic for a right-tailed test falls into the rejection region. The null hypothesis is rejected and we conclude that there is a positive linear relationship between cage size and weight of crabs.
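The test statistic for the correlation is a one-line calculation, sketched here in Python as a check on the arithmetic. The function name `correlation_t_statistic` is hypothetical; the critical value 1.812 is taken from the text rather than computed.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.86519, 12
t_calc = correlation_t_statistic(r, n)
t_critical = 1.812  # right-tailed, alpha = 0.05, 10 df (value from the text)

print(f"t_calc = {t_calc:.3f}")
print("reject H0" if t_calc > t_critical else "do not reject H0")
```

For a right-tailed test the decision rule is simply whether the calculated statistic exceeds the critical value.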
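  • 196. Example 29: How to perform a regression analysis using EXCEL
    For this example, data on cage size (cm²) and body weight (g) of 12 crabs have been stored in columns A and B of the EXCEL worksheet (see example 27). For this example, body weight is the dependent variable and cage size is the independent variable. We wish to predict body weight by using its relationship to cage size. In this example, we have only one independent variable. Highlight the data and select Regression [Excel 2013: Data Tab – Data Analysis - Regression] Set Input Y Range: to a1:a13. Set Input X Range: to b1:b13. Check Confidence levels. Click OK.
    NOTE: With these ranges, EXCEL treats column A (cage size) as the Y variable and column B (body weight) as the X variable, so the coefficients in the output below describe cage size as a function of body weight. To regress body weight on cage size, as described above, set Input Y Range: to b1:b13 and Input X Range: to a1:a13; the slope and intercept then match the trendline values in example 27 (0.0528 and 1.7287). R Square is the same in either direction.
    SUMMARY OUTPUT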
  • 197. Regression Statistics
    Multiple R          0.8651
    R Square            0.7485
    Adjusted R Square   0.7234
    Standard Error      55.922
    Observations        12
    ANOVA
                 df   SS         MS         F          Significance F
    Regression   1    93095.89   93095.89   29.76876   0.00028
    Residual     10   31273.02   3127.302
    Total        11   124368.9
                   Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
    Intercept      24.65          35.24            0.700    0.50016   -53.87      103.18
    X Variable 1   14.188         2.600            5.456    0.00028   8.40        19.98
    The intercept (the expected value of the dependent variable when the independent variable is zero) was estimated as 24.65 with a standard deviation of 35.24.
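The coefficients in the output above can be checked with the usual least-squares formulas (slope = covariance of x and y divided by the variance of x; intercept = mean of y minus slope times mean of x). The Python below is a minimal sketch with a hypothetical helper `least_squares`. It fits the line in both directions, because the output above corresponds to cage size as the Y variable, while the trendline in example 27 (slope 0.0528, intercept 1.7287) has body weight as the Y variable.

```python
def least_squares(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope*x."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


cage_size = [159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210]
body_wt = [14.40, 15.20, 11.30, 2.50, 22.70, 14.90,
           1.41, 15.81, 4.19, 15.39, 17.25, 9.52]

# Y = cage size, X = body weight (the direction of the EXCEL output above).
slope_a, intercept_a = least_squares(body_wt, cage_size)
# Y = body weight, X = cage size (the direction of the example 27 trendline).
slope_b, intercept_b = least_squares(cage_size, body_wt)

print(f"cage size on body weight: slope = {slope_a:.3f}, intercept = {intercept_a:.2f}")
print(f"body weight on cage size: slope = {slope_b:.4f}, intercept = {intercept_b:.4f}")
```

The two lines are different, but the product of the two slopes equals r², which is why R Square does not depend on which variable is treated as Y.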
  • 198. The slope (the expected change in the dependent variable for an increase of one unit in the independent variable) was 14.188 with a standard deviation of 2.600. A t-test indicates that the slope was significantly different from zero because the p-value (0.00028) is less than α = 0.05. The standard deviations of the intercept and slope may also be called standard errors; they are standard deviations of the sampling distributions of these statistics. Both standard deviations have n - 2 = 10 degrees of freedom because both are (complex) functions of the error sum of squares. The coefficient of determination (R Square = 74.85%) indicates that nearly 75% of the variation in body weight can be explained as a linear function of cage size.
    Annual Editions Journal Summary
    Instructions:
    1. Summarize each of the readings in the tables below.
    2. You may expand the table to accommodate your information.
    3. Write in complete sentences using proper grammar and mechanics.
    Readings: Unit 5 in the textbook: Social Media and Commerce
  • 199. · The Rising Influence of Social Media as Reflected by Data · How Google Dominates Us. · Can Online Piracy Be Stopped by Laws? · How Psychology Will Shape the Future of Social Media Marketing. · AmazonFresh is Jeff Bezos’ Last Mile Quest for Total Retail Domination. Reading #15 – The Rising Influence of Social Media as Reflected by Data Main idea of the article: Information presented: List at least five points made by the author 1. 2. 3. 4. 5.
  • 200. Response to the article: Reading #16 –How Psychology Will Shape the Future of Social Media Marketing Main idea of the article: Information presented: List at least five points made by the author 1. 2. 3. 4. 5. Response to the article:
  • 201. Reading #17– How Google Dominates Us Main idea of the article: Information presented: List at least five points made by the author 1. 2. 3. 4. 5. Response to the article:
  • 202. Reading #18 – AmazonFresh is Jeff Bezos’ Last Mile Quest for Total Retail Domination. Main idea of the article: Information presented: List at least five points made by the author 1. 2. 3. 4. 5.
  • 203. Response to the article: Reading #19 - Can Online Piracy Be Stopped by Laws? Main idea of the article: Information presented: List at least five points made by the author 1. 2.
  • 204. 3. 4. 5. Response to the article: Adapted from Dushkin Online Annual Editions Test Your Knowledge Form http://www.dushkin.com/online/ LAB2A.DAT142142130139132150137133147135134146140132 13614114914113513613013613413714613815213213712613413 51471421421351311421381461351481291381351371411441471 41141138139139145139137147141143135136140139137139134 13912913714914214013013913514413413213313514413413913 51341311421421521411401361441401391421461391391351391 42138135133142137141141142136141134135138135140144142 13814813514113913814113713513614114412913813313813013 01331381241421421381321441401461461451381391361321391 35137136131137147140137137134129134140141139143140138 13913715014215014613813012213213814112313413613914215 21491381391371351331381381351411451391301401331441431 41137137138136134143143138136140142136148141133149139 13114414313914214612713913713513113614413513714514714
  • 205. 11361471331311451361411401391451401441371371391441381 38141134145136139136135143135136135149144133146134140 15013714114213015414114313813413813113514914913113214 21361321441341521361391421391411401311371341381521371 34139124147144146140139141132143145137142139138138143 14913013513413614915014513714514113813614113113914213 61441371351511421431431401301451421391301371511391401 35138133137143134132136131135141145132135139139141138 13914814114313813314713013512813813614514313413313813 81471371401401361331391431381431371421461331511411331 41138145149139140128140137140146138132141151137140128 144132143149137 LAB2B.DAT146140135137140137138143149142144145145139 14114113713913915413713515113913813613813914614214113 61371471441401321461481441401341391411391391391491301 39138134134135141139151130135136138137142138140138130 147141136 LAB2C.DAT152147131139145136131142138140140134130134 13713514013314414013913814514514513813914614614713514 31391391301421361451381351391451341341381361421351361 43135130140149141137140152137132136133140141145141132 13313513813714114413714314214413113613814613813513913 71411391371411441421451301431301361461421391331391451 45133137137150139142140140137142143129135132133144138 14714114214514213314214114013414913913813313813914112 61401371361421311351321361391351361381511461371441411 32141135149137139142150131134139139139135134132134147 13713913614913713214113713213713413214314214913113313 21421441411351441341391371471341411431341351521371481 24137135130123133147142146131137139141139142136130147 13713613713414313914413714013314014914014914012813414 91301441411351391381411291301421291361361391281381371 35138138131150134132140135