
Introductory Statistics Laboratory for Excel
Lab Manual Author:
R. J. (Bob) Baker
December 2003
Revised by:
Krista Wilde (2016)
Table of Contents
Assignment #0
Assignment #1
Assignment #2
Assignment #3
Assignment #4
Assignment #5
Assignment #6
Assignment #7
Assignment #8
Assignment #9
INTRODUCTION
Example 1: Reading data from a data file into the EXCEL worksheet
Example 2: Preparing a histogram of data
Example 3: Entering data from the keyboard into the EXCEL worksheet
Example 4: Calculating relative frequencies
Example 5: Leaving EXCEL and grading your assignment
Example 6: How to prepare a stem-and-leaf diagram
Example 7: How to draw a frequency (or relative frequency) polygon
Example 8: How to use EXCEL to calculate various numbers that summarize the characteristics of a population (or sample)
Example 9: How to use the DESCRIPTIVE STATISTICS command of EXCEL
Example 10: Further uses of EXCEL - As a calculator
Example 11: Calculations with a discrete probability distribution
Example 12: Reading and storing constants for further use
Example 13: Using EXCEL to answer questions about continuous distributions
Example 14: How to calculate a chi-squared statistic for a 'goodness-of-fit' test
Example 15: How to calculate a confidence interval for one mean when σ is known
Example 16: How to calculate a confidence interval for one mean when σ is NOT known
Example 17: How to calculate a confidence interval for a binomial proportion
Example 18: How to calculate a test of hypothesis concerning one mean when σ is NOT known
Example 19: Large sample confidence intervals and tests of hypothesis for differences between two means when population variance is unknown and equal
Example 20: Confidence intervals and tests of hypothesis for differences between two means for independent samples: population variances are unknown but equal
Example 21: [...] two binomial proportions
Example 22: How to carry out a one-way analysis of variance
Example 23:
Example 24: How to use information from analysis of variance to calculate confidence intervals or test hypotheses about treatment means (including least significant difference)
Example 25: How to perform a two-way analysis of variance
Example 26: How to calculate a randomized complete block analysis of variance
Example 27: How to prepare a scatterplot of two variables
Example 28: How to calculate a correlation coefficient
Example 29: How to perform a regression analysis using EXCEL
Introductory Statistics Laboratory
Assignment #0
Purpose
This assignment is designed for use in the instructed
introduction for students using the
Introductory Statistics Laboratory for Excel (ISLeX) program.
NOTES
Log in to ISLeX and get the data for Assignment 0. Then start
Microsoft Excel and
determine the answers to the questions in this assignment. When
finished, exit from EXCEL,
return to ISLeX and submit your answers.
In this assignment, all students use the same data set. In
remaining assignments, each
student will have unique data sets.
See the examples indicated by {Example } to learn how to use
EXCEL to perform a
particular task. Reference to an example will be given at the end
of each major task. A marker symbol indicates the
beginning of a new task.
Question A

Data called LAB0A.DAT in Table A represents measured
yields (q/ha, where 1q = 1
quintal = 100 kg) of a sample of wheat varieties tested at
Saskatoon.
Read the data into an EXCEL worksheet.
{Example 1}
Prepare a histogram of the data using 20 as the first midpoint (20.5 as its upper bin) and 1 as
the interval width (bin size).

Record the frequencies from the histogram into the following
table; add the relative
frequencies later.
Bin     Midpoint   Frequency   Relative frequency
20.5    20
21.5    21
22.5    22
23.5    23
24.5    24
25.5    25
26.5    26
27.5    27
28.5    28
Record your answers to the following questions
1. How many observations were there in this sample?
2. What is the midpoint of the most frequent class?
(If tied, give lowest midpoint)
3. How many observations were there in the class with midpoint
equal to 22?
{Example 2}

Enter the class midpoints and frequencies from the table into two columns of the EXCEL
worksheet. Verify that you have entered the correct data.
Calculate and store relative frequencies
in a new column. Record relative frequencies in the above table.
{Examples 3 and 4}
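The relative frequency calculation can also be sketched outside EXCEL. The short Python example below is only an illustration: the frequencies are hypothetical (they are not the LAB0A.DAT data) and the function name is an assumption made for this sketch.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the total number of observations."""
    total = sum(frequencies)
    return [f / total for f in frequencies]


# Hypothetical class midpoints and frequencies (NOT the LAB0A.DAT data).
midpoints = [20, 21, 22, 23, 24]
frequencies = [2, 5, 8, 4, 1]

rel_freq = relative_frequencies(frequencies)

# Print a small table: midpoint, frequency, relative frequency.
for m, f, r in zip(midpoints, frequencies, rel_freq):
    print(f"{m:>8} {f:>9} {r:>18.3f}")

# The relative frequencies should sum to 1.0 (within rounding error).
print("sum of relative frequencies =", round(sum(rel_freq), 3))
```

A quick check that the relative frequencies sum to 1.0 catches most data-entry errors.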
Question B
Data in Table B represents measured yields (q/ha) of a sample
of wheat varieties
evaluated at Tisdale.
Read the data into an EXCEL worksheet and calculate the mean value.

4. How many observations were there in this data set?
5. What was the mean yield of this sample of wheat varieties?
{Example 1, and Example 8 a and b}
Once you have recorded numerical answers to each of
the five questions, you should now leave EXCEL and submit
your answers for grading by the
ISLeX program.
{Example 5}
- END OF ASSIGNMENT 0 -

Assignment #1
Purpose
This lab is an introduction to tabular and graphical methods of
descriptive statistics.
NOTE
As you proceed through this assignment, write your answers in
the spaces provided.
When you exit from EXCEL, you are then required to enter the
answers into the ISLeX
program.
Question A
Data in Table A represents measured yields (q/ha, where 1q = 1
quintal = 100 kg) of a
sample of wheat varieties tested at Saskatoon.
Read the data into an EXCEL worksheet.
{Example 1}
Prepare a histogram of the data using 20 as the first midpoint (20.5 as the starting bin) and 1
as the interval width (bin size). Note that the lower endpoint of any interval is the midpoint
minus one-half the interval width, while the upper endpoint is the midpoint plus one-half the
interval width. Record the frequencies in the following table; add relative frequencies later.
Excel places data points that are on a bin boundary in the lower
bin.
Bin     Midpoint   Frequency   Relative frequency
20.5    20
21.5    21
22.5    22
23.5    23
24.5    24
25.5    25
26.5    26
27.5    27
28.5    28
3. How many observations were greater than 21.5 and less than or equal to 22.5 q/ha?

{Example 2}
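The bin-boundary rule noted above (a value exactly on a boundary is counted in the lower bin) can be illustrated with a short Python sketch. The yields are hypothetical and the function name is an assumption; the final slot plays the role of a catch-all for values above the last bin.

```python
from bisect import bisect_left


def bin_counts(values, upper_bounds):
    """Count values per bin; a value equal to a boundary goes to the lower bin.

    upper_bounds must be sorted. The extra final slot counts values that are
    greater than the last upper bound.
    """
    counts = [0] * (len(upper_bounds) + 1)
    for v in values:
        # bisect_left returns the first bin whose upper bound is >= v,
        # so v == bound lands in the bin that ends at that bound.
        counts[bisect_left(upper_bounds, v)] += 1
    return counts


# Hypothetical yields in q/ha (NOT the assignment data).
yields = [20.1, 21.5, 21.6, 22.5, 22.4, 23.0, 24.7]
bins = [20.5, 21.5, 22.5, 23.5]

counts = bin_counts(yields, bins)
for bound, count in zip(bins + ["More"], counts):
    print(bound, count)
```

Here 21.5 is counted in the bin ending at 21.5, and 22.5 in the bin ending at 22.5, which matches the phrase "greater than 21.5 and less than or equal to 22.5".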
Enter the midpoints and frequencies into the EXCEL Worksheet.
{Example 3}
Calculate the relative frequencies and store them in a new column (these will be used in question C).
Check the data you have entered and verify that the relative
frequencies sum to 1.0 (within
0.001). Record the relative frequencies in the preceding table.
4. What is the relative frequency of yields in sample A that
were greater than
21.5 and less than or equal to 22.5 q/ha?
{Example 4}
Prepare a stem-and-leaf diagram of the data from sample A.
Use an increment of 1.0 between consecutive stem positions (leaf unit = 0.1).
Use the stem-and-leaf diagram to answer the following questions.
5. What is the value (in q/ha) of the leaf unit in this stem-and-leaf diagram?

6. What is the yield (in q/ha) for the item represented by the last
leaf position in
the fifth (from the top) stem position?
{Example 6}
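To see how stems and leaves relate to the original values, the following Python sketch builds a stem-and-leaf diagram with a stem increment of 1.0 and a leaf unit of 0.1. The data are hypothetical, the function name is an assumption, and the sketch assumes non-negative values recorded to one decimal place.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=0.1):
    """Group values by stem; each leaf is the final digit in units of leaf_unit."""
    stems = defaultdict(list)
    for v in sorted(values):
        scaled = round(v / leaf_unit)      # e.g. 23.7 -> 237
        stem, leaf = divmod(scaled, 10)    # 237 -> stem 23, leaf 7
        stems[stem].append(leaf)
    return dict(stems)


# Hypothetical yields in q/ha (NOT the sample A data).
yields = [21.3, 21.8, 22.0, 22.4, 22.4, 23.7, 24.1]

diagram = stem_and_leaf(yields)

# Print every stem position, including any that have no leaves.
for stem in range(min(diagram), max(diagram) + 1):
    leaves = "".join(str(d) for d in diagram.get(stem, []))
    print(f"{stem:>3} | {leaves}")
```

A leaf of 7 on stem 23 stands for 23 + 7 × 0.1 = 23.7 q/ha.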
Question B
Data in Table B represents measured yields (q/ha) of a sample
of wheat varieties
evaluated at Tisdale.
{Example 1}
Prepare a histogram of the data using 24 as the first midpoint (24.5 as the first bin) and 1 as
the interval width.

Record the frequencies in the following table; add relative
frequencies later.
Bin     Midpoint   Frequency   Relative frequency
24.5    24
25.5    25
26.5    26
27.5    27
28.5    28
29.5    29
30.5    30
31.5    31
32.5    32
33.5    33
34.5    34
35.5    35
36.5    36

9. How many observations fell between 31.5 and 32.5 q/ha?
{Example 2}
Enter the midpoints and frequencies into the EXCEL Worksheet.
Calculate the relative frequencies in each class.
Check that the correct information has been entered, that
frequencies sum to the total
number of observations and that the relative frequencies sum to
1.0.
Record the relative frequencies in the preceding table. Answer
the following question.
10. What is the relative frequency of yields in sample B that
were greater than
31.5 and less than or equal to 32.5 q/ha?
{Example 4}
Question C

Compare the distributions of yields of wheat varieties in
sample A (Saskatoon) with those
from sample B (Tisdale).
Prepare a relative frequency polygon of the data from both samples. Include
appropriate titles and axis labels. Use different line types for
each sample.
Answer the following questions from the relative frequency
polygon.
11. Which of the two samples, Saskatoon (1) or Tisdale (2) has
the highest
relative frequency in the class whose midpoint is 26 q/ha?
(Answer 1 or 2; 0 if same)
12. Which of the two samples, Saskatoon (1) or Tisdale (2), has
the greatest spread in midpoints (i.e. the greatest difference
between the maximum and minimum midpoint values)?
(Answer 1 or 2; 0 if same)
{Example 7}
Once you have recorded numerical answers to each of
the twelve questions, you should now leave EXCEL and submit

your answers for grading by the
ISLeX program.
{Example 5}
- End of Assignment #1 -
Introductory Statistics Laboratory
Assignment #2
Purpose
The three main objectives of this assignment are to:
a) use numerical values as descriptive statistics,
b) introduce the concept of sampling from a population, and
c) demonstrate the effects of sample size.
NOTE
When you exit from EXCEL, you are then required to enter the
answers into the ISLeX program.
Question A

Data in Table A represents protein concentrations (g/kg) of
boxcar lots of durum wheat
delivered to Thunder Bay, Ontario. Treat this data as a
population of data points.
Read the data into a column of the EXCEL worksheet, and name the
column. When viewing the data for the first time, you should try
to determine approximately the
number of items and guess at the average value. Scan the data to
try to determine what the
smallest and largest values are.
{Example 1}
Calculate and record the values of the following
population characteristics (i.e. parameters).
1. How many data points are there in this data set?
2. What is the mean protein concentration (g/kg)?
3. What is the minimum protein concentration?
4. What is the maximum protein concentration?
5. What is the median protein concentration?
6. What is the value of the first quartile?

7. What is the value of the third quartile?
8. What is the standard deviation of the population of protein
concentrations?
{Example 8}
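As a cross-check on the EXCEL results, the same summary values can be sketched in Python. The data below are hypothetical, and the function name is an assumption. The "inclusive" quartile method is used because it should correspond to the interpolation of Excel's QUARTILE.INC; other quartile conventions can give slightly different answers. Note the difference between the population standard deviation (divide by N, as in Excel's STDEV.P) and the sample standard deviation (divide by n - 1, as in STDEV.S).

```python
import statistics


def describe(values):
    """Return the summary numbers asked for in Question A."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        "median": median,
        "q1": q1,
        "q3": q3,
        "sd_population": statistics.pstdev(values),  # divides by N
        "sd_sample": statistics.stdev(values),       # divides by n - 1
    }


# Hypothetical protein concentrations in g/kg (NOT the Table A data).
protein = [128, 131, 135, 135, 140, 142, 150]

summary = describe(protein)
for name, value in summary.items():
    print(f"{name:>14}: {value}")
```

When the data are treated as a whole population, as in Question A, the population form of the standard deviation is the relevant one; for the samples in Questions B and C, the sample form applies.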
Question B
The data in Table B constitutes 10 random samples, each of
size 7, from the population of
protein concentrations. The data file contains seven rows of
data with each row containing ten
columns.
Read the data for the ten samples into columns of the
EXCEL worksheet.
{Example 1}
Calculate the mean, median, standard
deviation, minimum, maximum, first quartile and third quartile
of each of the ten samples.

Record these descriptive statistics in the following table.
Sample   Size   Mean   Median   Standard Deviation   Minimum   Maximum   Q1   Q3
1        7
2        7
3        7
4        7
5        7
6        7
7        7
8        7
9        7
10       7
{Example 9}
Compare these sample statistics with the population parameters calculated in question A to answer
the following questions.

These questions are designed to get you thinking about how
well sample statistics
represent the characteristics of the population from which the
sample was taken.
9. How many of the ten sample means are less than or equal to
the
population mean?
10. How many of the ten sample medians are exactly equal to
the population
median?
11. How many of the ten sample minimums are less than or
equal to the
population minimum?
12. How many of the ten sample maximums are greater than or
equal to the

population maximum?
13. How many of the sample first quartiles are less than or
equal to the
population first quartile?
14. How many of the sample third quartiles are greater than or
equal to the
population third quartile?
15. Which sample has the largest standard deviation?
16. Which sample has the largest range (=Maximum -
Minimum)?
17. What is the ratio of the largest sample standard deviation to
the smallest
sample standard deviation?
18. What is the ratio of the largest sample mean to the smallest
sample mean?
19. Of the two ratios (Questions 17 and 18), which is the
larger: the ratio of standard deviations (17) or the ratio of means (18)?
{Answer 17 or 18}
{Example 10}
Question C
The data in Table C constitutes 10 random samples, each of
size 27, from the population
of protein concentrations.
Read the data for the ten samples into columns of the EXCEL worksheet.
{Example 1}

Calculate the mean, median, standard deviation,
minimum, maximum, first quartile and third quartile of each of
the ten samples. Record the
descriptive statistics in the following table.
Sample   Size   Mean   Median   Standard Deviation   Minimum   Maximum   Q1   Q3
1        27
2        27
3        27
4        27
5        27
6        27
7        27
8        27
9        27
10       27
{Example 9}

Compare these statistics with those from questions A and B to answer the
questions below.
These questions are designed to get you thinking about
how the size of the
sample affects the relationship between sample statistics and
population parameters.
20. How many of the ten sample minimums were exactly equal
to the population
minimum?
21. How many of the ten sample maximums were exactly equal
to the population
maximum?
22. For samples of size 27, what is the ratio of the largest
sample mean to the
smallest sample mean?
23. For samples of 27, what is the ratio of the largest sample
standard deviation
to the smallest sample standard deviation?

For the following questions, answer 0 if the statement is false
or 1 if it is true.
24. The ratio of the largest sample mean to the smallest sample
mean was less in
samples of 27 than in samples of 7.
25. The ratio of the largest to the smallest sample standard
deviations was greater
in the larger samples.
{Example 10}
- Please use ISLeX to record and grade your answers -

Assignment #3
Purpose
This assignment is an introduction to questions concerning
discrete probability
distributions.
NOTE
When you exit from EXCEL, you will then be required to enter
the answers into the ISLeX
program.

Question A
A binomial experiment consists of repeated trials each with two
possible outcomes. The
outcome of any trial is independent of all other trials. The
binomial distribution gives the
probability that a number X of n independent trials will have
one type of outcome. X can be any
number from 0 up to the total number of trials.
The data in Table A gives the probabilities of observing that X = 0, 1, ..., 20 out of 20
flower seeds from a given lot will germinate.
Read the data into two columns of the EXCEL worksheet and
attach appropriate names to those
two columns. Then, record the probabilities in the following
table.
{Example 1}
Number germinated (out of 20) Probability
0
1

2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

20
Use the table to answer the following questions.
1. What is the probability that all 20 seeds in a random sample
of 20 seeds will
germinate?
2. What is the probability that fewer than 15 seeds in a random
sample of 20
seeds will germinate?
3. What is the probability that at least 17 seeds in a random
sample of 20 will
germinate?
4. What is the probability that the number of seeds in a random
sample of 20 that

will germinate is between 10 and 15?
HINT: Do not include 10 and 15.
5. What is the probability that the number of seeds in a random
sample of 20 that
will germinate will be less than 10 or greater than 17?
HINT : You will have to add the probabilities for 0, 1, .. 9
and 18, 19, 20.
6. What is the mean of this binomial distribution?
HINT: The mean of a discrete variable can be calculated by
summing the
products of each value multiplied by its corresponding
probability.
7. What is the variance of this binomial distribution?
HINT : The variance of a probability distribution is the
mean of the
squares of values minus the square of the mean of values.
{Example 11}
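The hints in Question A translate directly into sums over the probability table. The Python sketch below illustrates them for a binomial distribution with n = 20; the germination probability p = 0.8 is a hypothetical value chosen for this illustration and is not taken from Table A.

```python
from math import comb


def binomial_pmf(n, p):
    """Return the list [P(X = 0), P(X = 1), ..., P(X = n)]."""
    return [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]


n = 20
p = 0.8  # hypothetical germination probability (an assumption, not Table A)
probs = binomial_pmf(n, p)

# Interval probabilities are sums of the individual table entries.
p_fewer_than_15 = sum(probs[:15])        # X = 0, 1, ..., 14
p_at_least_17 = sum(probs[17:])          # X = 17, 18, 19, 20
p_between_10_and_15 = sum(probs[11:15])  # X = 11, ..., 14 (10 and 15 excluded)

# Mean: sum of each value multiplied by its probability.
mean = sum(x * px for x, px in enumerate(probs))

# Variance: mean of the squares minus the square of the mean.
mean_of_squares = sum(x**2 * px for x, px in enumerate(probs))
variance = mean_of_squares - mean**2

print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

For a binomial distribution the results can be checked against the closed forms n·p for the mean and n·p·(1 - p) for the variance.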

Question B
This question is based on a Poisson discrete probability
distribution. The distribution is
important in biology and medicine, and can be dealt with in the
same way as any other discrete
distribution.
Red blood cell deficiency may be determined by examining a
specimen of blood under
the microscope. The data in Table B gives a hypothetical
distribution of numbers of red blood
cells in a certain small fixed volume of blood from normal
patients. Theoretically, there is no
upper limit to the value of a POISSON distribution. In reality,
you can force only so many red
blood cells into a given volume.
Read the data into two columns of the EXCEL worksheet, name the columns, and
view the table. Since the table is quite large, you should attempt
to answer the following
questions without actually recording the table.
{Example 1}
Use the table to answer the following questions.
8. What is the probability that a blood sample from this
distribution will have
exactly 20 red blood cells?

9. What is the probability that a blood sample from a normal
person will have
between 19 and 26 red blood cells?
HINT: See questions 3 and 4.
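10. What is the probability that a blood sample from a normal
person would have
fewer than 10 red blood cells?
11. What is the probability that a blood sample from a normal
person will have at
least 15 red blood cells?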
HINT: Since there is no theoretical upper limit to the Poisson
distribution, the
correct way to answer this question is to calculate 1 –
probability of fewer than
15 red blood cells.

12. A person with a red blood cell count in the lower 2.5
percent of the
distribution might be considered as deficient. What is the red
blood cell
count below which 2.5 percent of the distribution lies?
HINT: You need to determine a value X so that if you sum all
the probabilities
for counts up to and including that value they will sum to at
least 0.025. The
sum of probabilities of all counts up to but excluding X should
be less than
0.025.
You can proceed in the following way.
Look at the table to guess how many probabilities (P[X = 0]
+ P[X = 1] + . . )
should be added to give a sum of approximately 0.025.
Calculate sums of
probabilities for your guess of X.
Continue your guessing of X until you get a sum ≥ 0.025
while the sum for
X-1 < 0.025.
13. What is the mean red blood cell count in this distribution?

14. What is the variance of red blood cell count in this
distribution?
HINT: See question 7, and remember it is a Poisson
distribution.
15. Is the following statement true (1) or false (0) for this
distribution?
In a Poisson distribution, the variance is equal to the
mean (within
rounding error). Record 1 if true, 0 if false.
{Example 11}
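The guessing procedure in the hint for question 12, and the mean and variance calculations for questions 13 to 15, can be sketched in Python as follows. The Poisson mean of 20 is a hypothetical value for illustration only (it is not the Table B distribution), the table is truncated at 100 cells, and the function names are assumptions.

```python
from math import exp, factorial


def poisson_pmf(lam, max_count):
    """Return [P(X = 0), ..., P(X = max_count)] for a Poisson mean of lam."""
    return [exp(-lam) * lam**x / factorial(x) for x in range(max_count + 1)]


def lower_cutoff(probs, tail=0.025):
    """Smallest X whose cumulative probability P(X <= x) is at least tail."""
    cumulative = 0.0
    for x, px in enumerate(probs):
        cumulative += px
        if cumulative >= tail:
            return x
    raise ValueError("cumulative probability never reached the tail value")


lam = 20.0                     # hypothetical mean count (an assumption)
probs = poisson_pmf(lam, 100)  # truncated table; the omitted tail is negligible

cutoff = lower_cutoff(probs)

# "At least 15" is computed as 1 minus the probability of fewer than 15.
p_at_least_15 = 1 - sum(probs[:15])

mean = sum(x * px for x, px in enumerate(probs))
variance = sum(x**2 * px for x, px in enumerate(probs)) - mean**2

print("cutoff =", cutoff)
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

The cutoff satisfies both conditions in the hint: the cumulative sum up to and including it is at least 0.025, while the sum excluding it is below 0.025.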
Please enter your answers into the ISLeX program
- END OF ASSIGNMENT 3 –

Assignment #4
Purpose
This lab is an introduction to questions concerning cumulative
continuous probability
distributions.
NOTES
When you exit from EXCEL, you will then be required to enter
the answers into the ISLeX

program.
With continuous distributions, P{X = x} = 0. In words, the
probability that a continuous
variable equals a particular value is considered to be zero. For
this reason, all questions
concerning continuous distributions must be phrased in terms of
intervals. Furthermore, the
probability that a continuous variable is less than or equal (≤) to
a particular value is the same as
the probability that the variable is less (<) than that particular
value.
The EXCEL NORM.DIST function gives the probability that a
normal variable is less
than (or equal to) a specified constant.
The terminology concerning probability varies from one source
to another. For this
assignment, consider that probability = relative frequency =
proportion. Also for this
assignment, percentage = 100 * probability.
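The interval calculations described in these notes can be illustrated with Python's `statistics.NormalDist`, whose `cdf` method plays the same role as NORM.DIST with the cumulative option set to TRUE. The mean of 175 cm and standard deviation of 7 cm are hypothetical values for this sketch, not the values in LAB4A.DAT.

```python
from statistics import NormalDist

# Hypothetical population of heights in cm (an assumption, not LAB4A.DAT).
heights = NormalDist(mu=175.0, sigma=7.0)

# P(X < 170): the cumulative probability up to 170.
p_below_170 = heights.cdf(170)

# P(170 < X < 180): difference of two cumulative probabilities.
p_between = heights.cdf(180) - heights.cdf(170)

# P(X > 180): one minus the cumulative probability up to 180.
p_above_180 = 1 - heights.cdf(180)

# Percentage = 100 * probability.
percent_below_170 = 100 * p_below_170

print(f"P(X < 170)       = {p_below_170:.4f}")
print(f"P(170 < X < 180) = {p_between:.4f}")
print(f"P(X > 180)       = {p_above_180:.4f}")
print(f"percent below 170 = {percent_below_170:.2f}%")
```

Because the three intervals cover the whole range of heights, the three probabilities sum to 1.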
Question A
Suppose that height (cm) of male university students is
normally distributed with the
mean given in column 1 of Table A (LAB4A.DAT) and a
standard deviation given in column 2
of Table A.
Read the mean and standard deviation of heights from Table A and store

them for further use. The data file contains one row with two
columns. The first column contains
the mean, the second contains the standard deviation.
1. What is the mean height in this population?
2. What is the standard deviation of height in this population?
{Example 12}
Use these values, and the EXCEL NORM.DIST function, to calculate
answers for the following questions.
3. What proportion of male university students are expected to
have a height
between 170 and 180 cm?
4. What percentage of male university students would have a
height less than
170 cm?

5. If a student is chosen at random from this population, what is
the probability
that he will be taller than 180 cm?
{Example 13}
Question B
Suppose that the average length of telephone calls made by
teenagers is a normally
distributed variable with mean and standard deviation given in
columns 1 and 2 of Table B
(LAB4B.DAT).
Read the mean and standard deviation of the distribution of
lengths of telephone calls from the
first two columns of Table B and store them for further use.
{Example 12}
Use the values and the EXCEL NORM.DIST function to
calculate answers for the
following.
6. What is the mean length of telephone call?
7. What is the standard deviation of this distribution?
8. What is the probability that a random telephone call will last
a length of time
that is within one standard deviation of the mean (± 1 standard
deviation)?
9. What is the proportion of telephone calls that last a length of
time that is
within two standard deviations of the mean (± 2 standard
deviations)?
10. What is the relative frequency of lengths of teenage
telephone calls that lie
within three standard deviations of the mean (± 3 standard
deviations)?
11. What is the probability that a telephone call will be longer
than the mean by
more than 1.645 standard deviations?
{Example 13}

Question C
In a study conducted by Booth et al. (Int. J. Sports Psychol.
17:269-279, 1986), student
nurses at the University of Ottawa completed the
Thurston-Richardson attitude questionnaire and
voluntarily took the Canadian Home Fitness Test. They found
that the frequency response of
heart rates after a second exercise bout ranged from 101 to 190
beats per minute and seemed to
follow a normal distribution. The mean heart rate was 145 with
a standard deviation of 20.
Use the EXCEL NORM.DIST function (with mean = 145 and standard deviation = 20) to
calculate the answer to the following question.
12. What is the estimated proportion that had a frequency
response of less than
130 after the second exercise session?
{Example 13}
Question D

A standard normal distribution is one for which the mean is
zero and the standard
deviation is unity (1.0). This distribution is often referred to as
the z-distribution.
Use the EXCEL NORM.DIST function to calculate answers
to the following questions.
13. What is the probability that a standard normal variable will
have a value less
than 1.96?
14. What is the probability that a standard normal variable will
have a value
between -1 and +1?
{Example 13}
Please enter your answers into the ISLeX program

Assignment #5
Purpose
The main objectives of this assignment are to:
a) use a goodness-of-fit test to demonstrate an important
statistical theorem and
b) calculate means and confidence intervals for a single sample
when σ is known and when σ is
not known.
NOTE
As you proceed with this assignment, write your answers in the
spaces provided. When
you have completed the assignment and exit from EXCEL, you
are required to enter your answers into the ISLeX program.

Question A
The central limit theorem states that means of samples of more
than 30 observations from
any distribution will have a distribution that
a) is approximately normal,
b) has a mean equal to the mean of the original distribution,
and
c) has a standard deviation equal to the standard deviation of
the original distribution
divided by the square root of the sample size.
The Poisson distribution is discrete and skewed; it is decidedly
non-normal! However,
the central limit theorem states that the means of sufficiently
large (n ≥ 30) samples from even a
Poisson distribution will be approximately normally distributed.
The means of 100 samples, each of size 40, from a Poisson
distribution are recorded in
Table A. For this first question, you are required to use the
'goodness-of-fit' test to test the
hypothesis that the means in this file are normally distributed
with a mean of 10 and a standard
deviation of 0.5.
distribution into the EXCEL worksheet.
{Example 1}
Calculate the mean and standard deviation of the sample means.

1. What is the mean of the 100 sample means?
2. What is the standard deviation of the 100 sample means?
{Example 9}
Prepare a histogram to count the number of sample means in each of the classes
indicated in the following table. Note that interval endpoints are
midpoint ± 0.5*width and the
interval midpoint is the average of the two endpoints. Use 9.31, 9.61,
9.91, 10.21, 10.51 and 10.81 as the
“bin” boundaries for the Excel HISTOGRAM procedure.
Class interval    Midpoint   Expected frequency   Observed frequency
< 9.31            -          8.3794
9.31 - 9.61       9.46       13.3902
9.61 - 9.91       9.76       21.0881
9.91 - 10.21      10.06      23.4181
10.21 - 10.51     10.36      18.3379
10.51 - 10.81     10.66      10.1248
> 10.81           -          5.2616
3. What was the observed frequency of sample means that fell
between 9.91 and
10.21 ?
{Example 2}
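The manual does not say how the expected frequencies in the table were obtained. One plausible reading, sketched below in Python, is that each is 100 times the probability that a normal variable with mean 10 and standard deviation 0.5 falls in the class; under that assumption the computed values agree with the tabulated ones to about three decimal places.

```python
from statistics import NormalDist

# Hypothesized distribution of the sample means (from the question text).
hypothesized = NormalDist(mu=10.0, sigma=0.5)

boundaries = [9.31, 9.61, 9.91, 10.21, 10.51, 10.81]
n_samples = 100

# Cumulative probabilities at each boundary, padded with 0 and 1 so that the
# two open-ended classes (< 9.31 and > 10.81) are included.
cdf_values = [0.0] + [hypothesized.cdf(b) for b in boundaries] + [1.0]

# Expected frequency of a class = n * (probability of falling in the class).
expected = [n_samples * (hi - lo) for lo, hi in zip(cdf_values, cdf_values[1:])]

for e in expected:
    print(f"{e:.4f}")
print("total =", round(sum(expected), 4))
```

The seven expected frequencies sum to 100, which is the check requested for the worksheet columns.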
Enter the seven expected frequencies into one column of the EXCEL worksheet and the seven
observed frequencies into another column. Make sure that
expected and observed frequencies for
the same class are entered in the same row. Check that both
columns of data sum to 100 (within
rounding error). If they do not, correct your error(s).
{Example 3}
A goodness-of-fit test should now be used to see if the
observed frequencies in two or more
classes of observed values agree sufficiently well with those
expected on the basis of some
hypothesis. In this example, the hypothesis is that the means of
samples will be normal with
mean 10 and standard deviation 0.5.

The test requires that you calculate a chi-squared statistic by:
a) calculating the differences between the observed and
expected frequencies in each class,
b) squaring the differences and dividing by the expected
frequencies in each class, and
c) summing the values from step b.
4. What is the value of (O-E)²/E for the first class?
5. What is the value of the chi-squared statistic (that is, the sum
over all seven
classes of (O-E)²/E)?
With seven classes, the chi-squared statistic has 7-1 = 6 degrees
of freedom and the critical
value at a 5% significance level is 12.6. If your test statistic is
less than 12.6, you should
conclude that the observed data show a good fit to the
hypothesis.
6. Does the data show a good fit to the normal distribution with
mean 10 and
standard deviation 0.5 (0 for no, 1 for yes)?
7. Based on your limited experience, is the following statement
true (use 1) or
false (use 0)?
Means of samples of size 40 from a Poisson (discrete)
distribution are
approximately normal (continuous).
{Example 14}
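The three steps of the chi-squared calculation in Question A can be sketched in Python as follows. The expected frequencies are those given in the table; the observed frequencies are hypothetical values invented for this illustration, so the resulting statistic is not the answer to the assignment.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Expected frequencies from the table in Question A.
expected = [8.3794, 13.3902, 21.0881, 23.4181, 18.3379, 10.1248, 5.2616]

# Hypothetical observed frequencies (an assumption; they sum to 100).
observed = [7, 15, 20, 25, 17, 11, 5]

# Contribution of each class, then the total.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
statistic = chi_squared(observed, expected)

# Critical value for 6 degrees of freedom at the 5% level (from the text).
CRITICAL_VALUE = 12.6
good_fit = statistic < CRITICAL_VALUE

print(f"first class contribution = {contributions[0]:.4f}")
print(f"chi-squared statistic    = {statistic:.4f}")
print("good fit" if good_fit else "poor fit")
```

Keeping the per-class contributions in their own list mirrors the worksheet layout, where (O-E)²/E occupies a column that is then summed.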
Question B
The time (in minutes) required for six-year old children to
assemble a certain toy is
believed to be normally distributed with a known standard
deviation of 3.0. The data in Table
B gives the assembly times for a random sample of 25 children.
Read the data into the EXCEL worksheet, and compute and report the mean and
standard deviation.
8. What was the mean assembly time for this sample of 25 six-
year old children?
9. What was the estimated standard deviation?

{Examples 1 and 9}
When the population standard deviation is known or given, one
should use a standard normal distribution to calculate a
confidence interval for the population
mean. The procedure for calculating a large sample confidence
interval for one mean involves
three basic steps:
a) determine a critical value from the appropriate distribution
(for a 90% confidence
interval with known standard deviation the critical value is
$z_{0.05} = 1.645$),
b) calculate the margin of error of the estimate, $E = z_{\alpha/2}\,\sigma/\sqrt{n}$,
and
c) calculate lower limit = mean – margin of error,
and upper limit = mean + margin of error
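The three steps above can be sketched in Python. The assembly times, the known standard deviation of 2.0 and the function name are all hypothetical choices for this illustration; they are not the Table B data or the σ given in the question.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_confidence_interval(values, sigma, confidence=0.90):
    """Confidence interval for one mean when sigma is known."""
    alpha = 1 - confidence
    # Step a: critical value z_{alpha/2} from the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Step b: margin of error E = z * sigma / sqrt(n).
    margin = z * sigma / sqrt(len(values))
    # Step c: lower and upper limits around the sample mean.
    mean = fmean(values)
    return mean - margin, mean + margin, margin


# Hypothetical assembly times in minutes (NOT the Table B data), n = 16.
times = [12.1, 9.8, 11.4, 10.6, 13.0, 8.9, 10.2, 11.7,
         9.5, 12.4, 10.9, 11.1, 10.0, 12.8, 9.3, 11.3]
sigma_known = 2.0  # hypothetical known standard deviation

lower, upper, margin = z_confidence_interval(times, sigma_known)
print(f"margin of error = {margin:.4f}")
print(f"90% CI = ({lower:.4f}, {upper:.4f})")
```

Note that the sample standard deviation is never used here: when σ is given, only the sample mean and sample size come from the data.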
10. What was the margin of error of the estimate for a 90%
confidence interval?
11. What was the lower limit of the 90% confidence interval for
average
assembly time?

12. What was the upper limit of the 90% confidence interval for
average
assembly time?
13. From this example, would you say that the following
statement is true (use 1)
or false (use 0) ?
The lower confidence limit must always be less than the sample
mean and
the upper confidence limit must always be greater.
14. From this example, would you say that the following
statement is true (use 1)
or false (use 0)?
When one has a choice of a known (or given) standard deviation
and an
estimated standard deviation, one should ignore the estimated
standard
deviation in calculating confidence intervals.
{Example 15}

Question C
The level of monoamine oxidase (MOA) activity (nmol/hr/mg
protein) was recorded for
fourteen non-responsive depressed patients who had been
treated with phenelzine. MOA activity
is assumed to follow a normal distribution. The data are stored
in a single column of Table C.
You are asked to calculate a point estimate and an interval
estimate of the mean MOA activity of
this type of patient. Nothing is known about the variability of
MOA activity.
Read the data into the EXCEL worksheet, and compute and
report the mean and standard deviation.
15. What was the point estimate for the mean MOA activity for
this sample of 14
depressed patients?
16. What was the standard deviation?
{Examples 1 & 9}

When data has a normal distribution but is from a small
(<30) sample, or when data is from a
large sample (≥30), and in either case σ is not known, one
should use a t-distribution to calculate
a confidence interval for the population mean. The procedure
for calculating a confidence
interval for one mean when σ is not known involves three basic
steps:
a) determine a critical value from the appropriate distribution
(for a 90% confidence
interval with estimated standard deviation the critical value is
$t_{\alpha/2,\,n-1} = t_{0.05,\,13} = 1.771$),
b) calculate the margin of error of the estimate, $E = t_{\alpha/2,\,n-1}\,s/\sqrt{n}$, and
c) calculate lower limit = mean – margin of error
and upper limit = mean + margin of error
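A Python sketch of the same three steps for the case where σ is estimated from the sample is given below. The fourteen activity values and the function name are hypothetical. Python's standard library has no t-distribution, so the sketch simply reuses the critical value 1.771 quoted in step a, which applies only to a 90% interval with 13 degrees of freedom.

```python
from math import sqrt
from statistics import fmean, stdev

# Critical value t_{0.05, 13} as quoted in the text (90% interval, n = 14).
T_CRITICAL_90_DF13 = 1.771


def t_confidence_interval(values, t_critical):
    """Confidence interval for one mean when sigma is estimated by s."""
    n = len(values)
    s = stdev(values)                  # sample standard deviation (n - 1)
    margin = t_critical * s / sqrt(n)  # E = t * s / sqrt(n)
    mean = fmean(values)
    return mean - margin, mean + margin, margin


# Hypothetical activity levels (NOT the Table C data), n = 14.
activity = [9.2, 7.8, 11.5, 10.1, 8.4, 12.3, 9.9,
            10.7, 8.8, 11.0, 9.5, 10.4, 7.9, 11.8]

lower, upper, margin = t_confidence_interval(activity, T_CRITICAL_90_DF13)
print(f"point estimate  = {fmean(activity):.4f}")
print(f"margin of error = {margin:.4f}")
print(f"90% CI = ({lower:.4f}, {upper:.4f})")
```

For any other sample size or confidence level the critical value would have to be looked up again; it is passed as an argument for that reason.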
17. What was the margin of error of estimate for a 90%
confidence interval in
this sample of 14 depressed patients?
18. What was the lower limit of the 90% confidence interval for
average MOA
activity?

19. What was the upper limit of the 90% confidence interval for
average MOA
activity?
20. From these examples, would you say that the following
statement is true (use
1) or false (use 0)?
All confidence intervals are calculated by calculating a point
estimate and then
subtracting and adding a margin of error of the estimate.
{Example 16}

Assignment #6
Purpose
The objectives of this assignment are to:
a) calculate a confidence interval for a proportion and
b) present confidence intervals and tests of hypothesis for
matched pairs.
NOTE
Question A
Opinion polls are a popular method for assessing product
preference, political preference,
and more. As a simple example, consider that a poll was taken
ten days prior to a civic election
to try to predict what proportion of the electorate would vote for
the incumbent mayor. The data
in Table A represents the results of a moderate sample of
persons who were asked if they would
vote for the same mayor; a yes was recorded as 1, a no as 0.
You are required to analyze the
results of the poll and predict what proportion of voters will
vote for the incumbent.
Read the data into the EXCEL worksheet, prepare a
histogram to count the number of yes (1)
and no (0) responses, and calculate the proportion who
indicated that they would vote for the
incumbent mayor. Note that, since yes and no are represented by
1 and 0, the proportion of yes
can be determined by calculating the sum and dividing by the
total sample size.
1. How large was the sample of voters represented in this poll?
2. What proportion of the sample voters indicated they would
vote for the
incumbent mayor?
{Examples 1, 2 and 10}

Calculate a 90% confidence interval for the proportion of
voters expected to vote for the
incumbent mayor. The procedure for calculating a confidence
interval for a proportion involves
three basic steps.
3. Determine the α/2 critical value for the appropriate
distribution (standard
normal in this case). Use the NORM.INV function to calculate
the critical value
=NORM.INV(0.95,0,1). What is the critical value for a 90%
confidence interval
based on the standard normal distribution?
4. What is the standard error of the estimated proportion of
polled voters who
favour the incumbent?
$s_{\hat{p}} = \sqrt{\hat{p}\,\hat{q}/n}$, where $\hat{q} = 1 - \hat{p}$
5. What is the margin of error of the estimated proportion?
6. What is the lower 90% confidence limit on the proportion of
voters who will

vote for the incumbent?
7. What is the upper 90% confidence limit on the estimated
proportion of voters
who will vote for the incumbent?
{Example 17}
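The steps behind questions 2 to 7 can be sketched in Python as follows. The poll responses here are hypothetical (not the Table A data) and the function name is an assumption; `NormalDist().inv_cdf(0.95)` plays the same role as =NORM.INV(0.95,0,1).

```python
import math
from statistics import NormalDist

def proportion_interval(responses, confidence=0.90):
    """Confidence interval for a proportion from a list of 0/1 responses."""
    n = len(responses)
    p_hat = sum(responses) / n                   # proportion of yes (1) answers
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # critical value, about 1.645 for 90%
    se = math.sqrt(p_hat * (1 - p_hat) / n)      # standard error of p-hat
    margin = z * se
    return p_hat, se, margin, p_hat - margin, p_hat + margin

# Hypothetical poll: 40 yes and 60 no.
sample = [1] * 40 + [0] * 60
p_hat, se, margin, lower, upper = proportion_interval(sample)
print(p_hat, round(se, 4), round(lower, 4), round(upper, 4))
```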
In the election, 23,217 of the 58,839 persons that voted
actually voted for the incumbent. Calculate and report the actual
proportion that voted for the
incumbent.
8. What was the proportion that actually voted for the
incumbent?
9. Based on the results given in questions 6, 7 and 8, which of
the following
statements (1, 2 or 3) is most correct?
1 - The poll of a sample of voters gave a good indication of the
final vote.
2 - Many of the voters who would have voted for the incumbent
at the time of the poll
must have changed their minds.
3 - The persons sampled in the poll must have contained an

unusually low proportion of
those who favoured the incumbent.
{Example 10}
Question B
The Monster Chemical Company believes that its herbicide
(Avena-doom) is better than
its competitor's herbicide (Avena-kill) for controlling wild oat
in barley fields. To demonstrate
the advantage of their herbicide over that of their competitor,
Monster grew side-by-side plots of
barley treated with each of the two herbicides in a large sample
of farmers' fields throughout
western Canada. The company then wished to compare the
yields of barley treated with the two
types of herbicides.
Yield of barley will vary from farm to farm regardless of which
herbicide is used. A
difference in climate, differences in agronomic practices, and
differences in type of barley grown
cause variation. For this reason, it is desirable to match the data
from the two plots on each farm.

The analysis is one of looking at differences between matched
pairs.
with Avena-doom (second
column), and barley yield with Avena-kill (third column) from
the three columns in Table B into
columns of the EXCEL worksheet. Describe the data from the
two treatments.
10. What was the average barley yield for plots treated with the
Avena-doom
herbicide?
11. What was the standard deviation of yields of barley plots
treated with
Avena-doom ?
12. What was the average yield of plots treated with Avena-kill?
13. What was the standard deviation with Avena-kill?
{Examples 1 and 9}
calculated and then analyze the
differences.

14. What was the mean of the differences between yield of
barley plots treated
with Avena-doom and Avena-kill ?
15. What was the standard deviation of the differences (for each
pair)?
16. Was the standard deviation of the differences smaller (0) or
larger (1) than
the standard deviation of the barley yields from plots treated
with
Avena-doom?
{Examples 10 and 9}
differences in yield between plots
treated with Avena-doom and those treated with Avena-kill.
NOTE: The standard deviation is estimated from the data so
we use the t distribution.

17. What is the critical value for the confidence interval?
18. What was the margin of error of the estimated mean
difference?
19. What was the lower limit of the confidence interval for the average
difference in yields of barley treated with Avena-doom and
barley treated with
Avena-kill?
20. What was the upper limit?
{Example 16}
Question C
Use the same data and results of Question B to investigate the
hypothesis that the
increase in barley yield by using Avena-doom instead of Avena-
kill is no greater than 3.0 q/ha
(300 kg/ha). The alternative to this hypothesis is that the
increase is greater than 3 q/ha.
To test this hypothesis, one must calculate a test statistic:
$t = \dfrac{\text{mean of differences} - \text{hypothesized mean } (= 3.0)}{\text{standard error of the differences}}$
The null hypothesis should be rejected if the test statistic
exceeds the critical value from
the theoretical distribution. For a 5% significance level, α =
0.05, the critical value for a

one-tailed test can be found by using the appropriate T.INV
function (see Example 18) with n-1
degrees of freedom. For matched pairs, n is the number of pairs.
In this instance, the null hypothesis should be rejected if the
test statistic exceeds the
critical value.
21. What is the value of the test statistic for testing the
hypothesis that the mean
difference is 3.0 q/ha or less?
22. What is the critical value against which the test statistic in
question 21 should
be compared?
23. Should the hypothesis that the yield difference is 3 q/ha or
less be rejected
(1) or not (0)?
{Example 18}
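The matched-pairs calculation in Questions B and C can be sketched as below. The yields are invented for five farms (not the Table B data), the function name is hypothetical, and the critical value 2.132 is the one-tailed 5% t value for 4 degrees of freedom taken from a t table.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(first, second, hypothesized=0.0):
    """t statistic for matched pairs: (mean difference - hypothesized) / SE."""
    diffs = [a - b for a, b in zip(first, second)]   # one difference per pair
    n = len(diffs)                                   # n is the number of pairs
    d_bar = mean(diffs)
    s_d = stdev(diffs)                               # sample st. dev. (n - 1 divisor)
    se = s_d / math.sqrt(n)                          # standard error of the mean difference
    return d_bar, s_d, (d_bar - hypothesized) / se

# Hypothetical yields (q/ha) on five farms.
doom = [45.0, 52.0, 48.0, 50.0, 55.0]
kill = [41.0, 47.0, 45.0, 44.0, 50.0]
d_bar, s_d, t_stat = paired_t_statistic(doom, kill, hypothesized=3.0)

# Reject the null hypothesis if the statistic exceeds t(0.05, 4) = 2.132.
print(round(d_bar, 2), round(t_stat, 3), t_stat > 2.132)
```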
ASSIGNMENT 7

Assignment #7
Purpose
This lengthy assignment serves to review calculations of
confidence intervals and tests of
hypothesis for:
a) two means of large independent samples from populations
with unknown and unequal
variances,
b) two means of small independent samples from populations
with the same unknown variance,
c) two proportions from large independent samples.
NOTE
Question A
The role that cholesterol plays in the development of
"hardening of the arteries"
(atherosclerosis) and heart disease has been widely reported. In
one experiment, a group of
patients who were considered to be high-risk were split into two
equal groups. The first group

was put on a special diet with a high proportion of fish (salmon,
tuna, mackerel and cod). Oil
from these deep-sea fish is known to be very rich in Omega-3
fatty acids. The other (control)
group was maintained on a standard diet (high-protein, low-fat,
complex carbohydrates and
polyunsaturated cooking oil). The change (decrease) in
cholesterol was measured after a period
of time. A greater change is desirable.
The (simulated) data (mg decrease per decilitre of blood) for
the Omega-3 group is stored
in Table A1, and the data for the control group is stored in
Table A2. You are required to
calculate a 95% confidence interval for the average difference
in cholesterol reduction and to test
the hypothesis that there was no difference between the two
diets in average reduction of
cholesterol.
Copy the data from the 'Omega-3' group [Table A1] and the data
from the 'control' group [Table
A2] into the EXCEL worksheet. Determine and report the
number of observations in each group,
the mean change (mg/dl) in each group and the standard
deviation of the change in each group.
1. How many patients were in each diet group?
2. What was the mean (decrease) in cholesterol for the Omega-3
group of
patients?

3. What was the standard deviation in that group?
4. What was the mean (decrease) in cholesterol for the control
group of patients?
5. What was the standard deviation in the control group?
{Examples 1 and 9}
variances that are unequal. We
can use the normal distribution as an approximation to the t
distribution when the sample sizes
are large. The method for calculating a large-sample confidence
interval for the difference
between two means consists of three basic steps.
a) Estimate the difference between the two sample means
and the standard error of the
difference between the two sample means.
6. What is the estimated difference of means?
7. Standard error of the difference between means:
$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$
What is the standard error of the difference of means?
b) Calculate the margin of error of the estimated difference of
means. For this
large-sample 95% confidence interval we can approximate with
a z value which is z0.025 = 1.96.
Calculate the confidence interval as difference between means ±
margin of error.
8. What is the margin of error of the estimated difference?
9. What is the lower limit for the 95% confidence interval of the
difference in
cholesterol reduction between Omega-3 and control diets?

10. What is the upper limit?
{Example 19}
The test of the hypothesis that there is no difference between the two diets
proceeds as follows. Since we expect that the Omega-3 diet
should give a greater decrease in
cholesterol than the control, we will use a one-tailed alternative
hypothesis. Use a 5%
significance level to test the null hypothesis that there is no
difference between the diets against
an alternative that the difference between Omega-3 and control
groups is greater than zero.
The test of hypothesis has two basic steps:
a) Compute the test statistic (z) as the difference in means
divided by standard error of the
difference.
b) The null hypothesis should be rejected if the test statistic
exceeds the critical value for
a one-tailed alternative (approximately 1.645 for 5%
significance in a large-sample, one-tailed
test).

11. What is the value of the test statistic?
12. Should the null hypothesis be rejected and the conclusion be
that Omega-3
diet did indeed cause a greater reduction in cholesterol than the
control diet?
Yes =1, No = 0
{Example 19}
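The large-sample confidence interval and the one-tailed z test for Question A can be sketched together. The group summaries below are hypothetical (not Tables A1 and A2) and the function name is an assumption; z = 1.96 and the critical value 1.645 are the values given in the text.

```python
import math

def large_sample_difference(mean1, s1, n1, mean2, s2, n2, z=1.96):
    """Difference of two means with unequal variances, large samples."""
    diff = mean1 - mean2
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of the difference
    margin = z * se                           # margin of error for the interval
    return diff, se, margin, diff - margin, diff + margin

# Hypothetical summaries (mg/dl decrease) for two equal groups of 64 patients.
diff, se, margin, lower, upper = large_sample_difference(
    mean1=30.0, s1=12.0, n1=64, mean2=25.0, s2=9.0, n2=64)

z_stat = diff / se               # test statistic for H0: no difference
print(se, margin, lower, upper)
print(z_stat > 1.645)            # one-tailed test at the 5% level
```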
Question B
In some law schools, the score on a test known as LSAT is an
important criterion for
acceptance. Two law schools decided to compare the LSAT
scores of students registered in their
respective schools. LSAT scores for students in Law school 1
are stored in Table B1 and those
for students from Law school 2 in Table B2.
Assume that the variances of LSAT scores are equal in the two
schools. You are asked to
calculate a 90% confidence interval for the difference in
average LSAT scores and to test the
hypothesis that students from the two schools do not differ in
their average LSAT scores. Use a
5% significance level.
Copy the LSAT scores from Law school 1 and those
from Law school 2 into the
EXCEL worksheet. Compute and report the number, means and

standard deviations of scores
from each school.
13. How many LSAT scores from school 1?
14. What was the mean LSAT score from school 1?
15. What was the standard deviation of scores from school 1?
16. How many LSAT scores from school 2?
17. What was the mean LSAT score from school 2?
18. What was the standard deviation of scores from school 2?
{Examples 1 and 9}
Use the following steps to calculate a 90% confidence
interval for the difference in mean
LSAT scores when variances are unknown but assumed to be
equal.
a) calculate the difference between the two means (school 1 -
school 2)
b) calculate the pooled variance for the two samples:
$s_{pool}^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)}$
where n1 = size, sample 1
n2 = size, sample 2
s1 = st.dev, sample 1
s2 = st.dev, sample 2
c) calculate the standard error of the difference:
$s_{\bar{x}_1 - \bar{x}_2} = s_{pool}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
d) Calculate the critical value and margin of error for α = 0.10.
Use the T.INV function to
get the critical value. Multiply the critical value by the standard
error of the difference to get the
margin of error. Use degrees of freedom = n1 + n2 – 2.
e) Calculate the lower and upper 90% confidence limits
19. What is the estimated pooled variance for this data?
20. What is the standard error of the difference?
21. What is the margin of error of the difference?
22. What is the lower limit of the difference between the two
schools in LSAT
scores?
{Example 20}
Test the hypothesis that the means of the
two groups of LSAT scores are equal when the samples are

independent and the population
variances are unknown but equal. The test statistic is the
difference in means minus zero divided
by the standard error of the difference. The null hypothesis
should be rejected if the test statistic
is less than -tα/2,df or greater than tα/2,df where df = n1 + n2 -
2 and α=0.05 is the chosen
significance level. Use the T.INV function to calculate the
critical values for this two-tailed test.
23. What is the value of the test statistic for testing the
hypothesis that the mean
LSAT scores are the same for the two law schools?
24. Using the 5% significance level, should the null hypothesis
be rejected (1) or
not (0)?
{Example 20}
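The pooled-variance calculation for Question B can be sketched as follows. The school summaries are hypothetical (not Tables B1 and B2), the function name is an assumption, and the critical values 1.725 and 2.086 are the t table values for 20 degrees of freedom that match this made-up example only.

```python
import math

def pooled_two_sample(mean1, s1, n1, mean2, s2, n2):
    """Difference, pooled variance, SE and t statistic (equal variances)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))   # SE of the difference
    diff = mean1 - mean2                             # school 1 - school 2
    return diff, pooled_var, se, diff / se, df

# Hypothetical summaries for two small samples.
diff, pooled_var, se, t_stat, df = pooled_two_sample(
    mean1=160.0, s1=6.0, n1=10, mean2=155.0, s2=4.0, n2=12)

margin = 1.725 * se                  # 90% interval: t(0.05, 20) = 1.725
print(pooled_var, round(se, 3), round(diff - margin, 3), round(diff + margin, 3))
print(abs(t_stat) > 2.086)           # two-tailed 5% test: t(0.025, 20) = 2.086
```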
Question C
The legislature of a southern state in the U.S. passed a rule,
commonly called "no-pass,
no-play", which prohibits a student who fails in any subject
from participating in any
extracurricular activity for six weeks. Data were collected for
students involved in football,
volleyball, cross country, and band for the first six-week

grading period. Records were kept from
last year and this year.
The numbers of students are stored in column 1 and the
proportions sidelined because of
the rule are stored in column 2 of Table C, the first row being
for last year and the second for this
year.
values.
25. How many students were there in last year's sample?
26. What proportion of the last year's students were sidelined
because of one or
more failures?
27. How large was this year's sample?
28. What proportion failed and were sidelined this year?
{Example 1}

Calculate a 90% confidence interval for the
change (last year minus this year)
in proportion of students sidelined.
a) Calculate the difference in proportions.
b) Calculate the standard error of the difference.
$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$
c) Calculate the margin of error of estimate. For a 90%
confidence interval with large
samples, use z0.05 = 1.645.

d) Calculate the lower and upper limits.
29. What is the upper 90% confidence limit on the change in
proportion of
students sidelined because of failure?
{Example 21}
Test the null hypothesis that the proportion sidelined did not change against
an alternative that the proportion
sidelined has decreased (that is, the difference in proportions is
greater than zero). Use a 5%
significance level.
NOTE: Under the null hypothesis, the proportions are equal
and we should therefore calculate
an average proportion for the two groups. This will result in a
new estimate of the standard error
of the difference between sample proportions.
average (pooled) proportion:
$p = \dfrac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$
30. What was the average (pooled) proportion sidelined?
31. Now use the pooled proportion to calculate the standard
error of the
difference between the two proportions.
$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{p(1 - p)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
What is the value of the test statistic for testing the hypothesis
that the
proportion did not change (remember to divide by the standard
error of the
difference between the two proportions which was calculated
using the
pooled proportion)?

Use a one-tailed test with a 5% significance level to answer
the following question.
Remember that you will reject the null hypothesis if the test
statistic exceeds the critical value
(1.645 in this case).
32. Was the superintendent of schools justified in saying, "We
are very pleased
with the improvement. It shows coaches and students are taking
the rule
seriously"? Answer 1 for yes or 0 for no.
{Example 21}
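The two standard errors used in Question C (unpooled for the confidence interval, pooled for the test) can be contrasted in a short sketch. The sample sizes and proportions below are hypothetical (not the Table C values) and the function name is an assumption.

```python
import math

def two_proportion_summary(n1, p1, n2, p2):
    """Difference of two proportions with both standard errors."""
    diff = p1 - p2                                            # last year - this year
    se_unpooled = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    p_pool = (n1 * p1 + n2 * p2) / (n1 + n2)                  # average proportion
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = diff / se_pooled                                      # test statistic under H0
    return diff, se_unpooled, p_pool, se_pooled, z

# Hypothetical records: 400 students last year, 600 this year.
diff, se_unpooled, p_pool, se_pooled, z = two_proportion_summary(
    n1=400, p1=0.20, n2=600, p2=0.15)

upper = diff + 1.645 * se_unpooled   # upper 90% limit uses the unpooled SE
print(round(upper, 4), round(p_pool, 2), z > 1.645)
```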
- END OF ASSIGNMENT 7 –
ASSIGNMENT 8
Assignment #8

Purpose
In this assignment, calculations will be completed for analyses
of variance for:
a) a one-way design,
b) a two-way design with more than one observation per cell,
and
c) a two-way design with one observation per cell (randomized
complete block design)
NOTE
spaces provided. When you
have completed the assignment and exit from EXCEL, you are
required to enter your answers into
the ISLeX program.
Question A
Gasoline mileage (mpg) was measured on several cars of each
of four different makes
(coded 1, 2, 3 and 4). The make of each car is stored in the first
column, and the mileage for each
car is stored in the second column, of Table A. You need to
conduct an analysis of variance to see if
there are differences among the four makes in gasoline mileage.
You should also estimate the
mileage of each of the four makes of cars.
Copy the data in Table A into the EXCEL worksheet. Name the columns and
view the data.

{Example 1}
Perform a one-way analysis of variance on this data. Since
each data point can be classified only
according to the make of car, a one-way analysis of variance is
required. It is important that students
be able to interpret analysis of variance tables such as those
produced by EXCEL. For this analysis,
you will need to copy data for each make into different adjacent
columns. Fill in the following
one-way analysis of variance table and answer the first five
questions.
Source of variation   Degrees of freedom   Sum of squares   Mean square   F          P
Make of car           3                    ________         ________      ________   ________
Error                 ________             ________         ________
1. What is the value of the F-statistic for testing the null
hypothesis that there are no
differences in gasoline mileage among the four makes of
automobile?
2. What are the degrees of freedom associated with the
numerator of this test
statistic?
3. What are the degrees of freedom associated with the
denominator of the F-value
for MAKE of car?
4. What is the estimate of the pooled variance within makes of
cars (i.e. the Error
mean square)?

5. What are the degrees of freedom for this variance in #4?
{Example 22}
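To see where the entries of the one-way analysis of variance table come from, the sums of squares can be computed directly. This is a minimal sketch with invented mileage figures for four makes (not the Table A data); the function name is hypothetical, and EXCEL's own ANOVA output remains the reference for the assignment.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (df_between, df_error, ms_between, ms_error, F)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Between-groups sum of squares: group size times squared deviation of group mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Error (within-groups) sum of squares: deviations from each group's own mean.
    ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_error = ss_error / df_error          # pooled variance within groups
    return df_between, df_error, ms_between, ms_error, ms_between / ms_error

# Hypothetical mileage (mpg), one list per make of car.
makes = [[20.0, 22.0, 24.0], [25.0, 27.0, 29.0],
         [30.0, 32.0, 34.0], [23.0, 25.0, 27.0]]
df_between, df_error, ms_between, ms_error, f_stat = one_way_anova(makes)
print(df_between, df_error, ms_between, ms_error, f_stat)
```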
NOTE: For the following questions (6 - 13), use the error mean
square and the error degrees of
freedom to calculate confidence intervals and to test hypotheses
about pairs of means.
Calculate the number of cars tested and the average mileage for each make of car
and record them in the following table.
Make of car Number tested Average mileage
1
2
3
4
6. How many cars of make 2 were evaluated in this experiment?
7. What was the average gasoline mileage for make 2?
8. How many cars of make 3 were evaluated in this experiment?
9. What was the average gasoline mileage for make 3?

Calculate a 95% confidence interval for the average mileage of
make 2. Use the method for single
means when σ is not known, but use the Error Mean Square as
the estimate of the variance. The
degrees of freedom will be the Error DF, not n-1!
Reminders:
Confidence Interval = mean ± margin of error
Margin of error = critical value * standard error
Use critical value for T at α/2 = 0.025 and df = error df (t table
or EXCEL T.INV function)
Use standard error = √(error mean square/number of
observations of that make of car)
10. What was the margin of error for the confidence interval for
gasoline mileage
of make 2?
11. What was the lower 95% confidence limit for make 2
mileage?
12. What was the upper 95% confidence limit for make 2
mileage?
{Example 24}

Test the hypothesis that the average mileages of makes
2 and 3 do not differ. Use the
method for single means when σ is not known with the Error
MS serving as the pooled variance.
Reminders:
Test statistic t = difference of means / standard error of
difference of means.
The standard error of the difference equals square root of the
sum of variances of the two
means. The variance of each mean is estimated by the error
mean square/number of
observations in that mean.
13. What is the value of the t test statistic for testing the
hypothesis that makes 2
and 3 do not differ in mileage?
{Example 24}
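The two reminders above (a confidence interval for one make and a t statistic for two makes, both based on the Error Mean Square) can be sketched as below. All numbers are hypothetical, the function names are assumptions, and 2.306 is the t table value t(0.025, 8) that fits this invented example with 8 error degrees of freedom.

```python
import math

def mean_interval_from_anova(mean, n, error_ms, t_crit):
    """Interval for one mean using the Error MS as the variance estimate."""
    se = math.sqrt(error_ms / n)        # standard error of that make's mean
    margin = t_crit * se
    return margin, mean - margin, mean + margin

def t_for_two_means(mean_a, n_a, mean_b, n_b, error_ms):
    """t statistic for the difference of two means with a common Error MS."""
    se_diff = math.sqrt(error_ms / n_a + error_ms / n_b)
    return (mean_a - mean_b) / se_diff

# Hypothetical: Error MS = 4.0 with 8 error df, 3 cars per make.
margin, lower, upper = mean_interval_from_anova(27.0, 3, 4.0, t_crit=2.306)
t_stat = t_for_two_means(27.0, 3, 32.0, 3, 4.0)
print(round(margin, 3), round(lower, 3), round(upper, 3), round(t_stat, 3))
```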
Question B
The data in Table B represents the times (in seconds) for men
of three different ages (40, 50
and 60) in each of three different fitness classes (1, 2 and 3) to
run a 2 km course. For each runner,
age is recorded in the first column, fitness category is recorded
in the second column, and running
time is recorded in the third.

Two men in each of the nine categories ran the course. You
should be interested in
determining whether age and/or fitness affect running time.
Each data point can be classified
according to age of the runner or according to fitness of the
runner. The data therefore requires a
two-way analysis of variance. It is possible that differences
among ages of runner will depend upon
the fitness categories of those two runners. The model for the
analysis should include an interaction
term.
Copy the data in Table B into the EXCEL worksheet, name the columns, and view the data. You
will have to copy the data into three different columns each
with six observations in order to
perform the following analysis (see Example 25).
{Example 1, 25}

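Carry out a two-way analysis of variance and answer the
following questions.
Source of variation   Degrees of freedom   Sum of squares   Mean square   F          P
Age of runner         2                    ________         ________      ________   ________
Fitness of runner     2                    ________         ________      ________   ________
Interaction           4                    ________         ________      ________   ________
Error                 9                    ________         ________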
14. What is the value of the F test statistic for testing the
hypothesis that age, on
average, has no effect on running time?

15. What are the numerator degrees of freedom for that F
statistic reported in
question 14?
16. What are the denominator degrees of freedom for that F
statistic reported in
question 14?
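17. What is the value of the F test statistic for testing the
hypothesis that fitness, on
average, has no effect on running time?
18. What is the value of the F test statistic for testing the
hypothesis that the effect
of age (if any) on running time does not depend on the runner's
fitness?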
NOTE
In analysis of variance, the null hypothesis should be rejected
whenever the calculated F-statistic is
greater than the critical value for a chosen significance level
and appropriate numerator and

denominator degrees of freedom. Equivalently, the null
hypothesis should be rejected whenever the
computed p-value is less than the chosen significance level. Use
α = 0.01 (significance level =1 %)
and answer the following two questions.
19. Should the null hypothesis that age has no effect on running
time be rejected (1)
or not rejected (0)?
20. Should the null hypothesis that the effect of age is
independent of the effect of
fitness be rejected (1) or not rejected (0)?
{Example 25}
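For readers who want to see how the two-way table with interaction is built, here is a minimal sketch for a balanced design (the same number of runners in every cell). The running times are invented (not the Table B data) and the function name and dictionary keys are assumptions; EXCEL's two-factor ANOVA with replication is the tool the assignment expects.

```python
from statistics import mean

def two_way_anova(cells):
    """Balanced two-way ANOVA with interaction.

    cells[i][j] lists the replicate observations for level i of factor A
    and level j of factor B; every cell must hold the same number (> 1).
    """
    a, b, r = len(cells), len(cells[0]), len(cells[0][0])
    grand = mean(x for row in cells for cell in row for x in cell)
    a_means = [mean(x for cell in row for x in cell) for row in cells]
    b_means = [mean(x for row in cells for x in row[j]) for j in range(b)]
    cell_means = [[mean(cell) for cell in row] for row in cells]

    ss = {
        "A": b * r * sum((m - grand) ** 2 for m in a_means),
        "B": a * r * sum((m - grand) ** 2 for m in b_means),
        "AB": r * sum((cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
                      for i in range(a) for j in range(b)),
        "Error": sum((x - cell_means[i][j]) ** 2
                     for i in range(a) for j in range(b) for x in cells[i][j]),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1), "Error": a * b * (r - 1)}
    ms_error = ss["Error"] / df["Error"]
    f = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AB")}
    return ss, df, f

# Hypothetical times (seconds): rows are ages 40, 50, 60; columns are fitness 1, 2, 3.
times = [
    [[600, 620], [650, 660], [700, 720]],
    [[630, 640], [680, 700], [740, 750]],
    [[660, 680], [720, 730], [790, 800]],
]
ss, df, f = two_way_anova(times)
print(df)
print({k: round(v, 2) for k, v in f.items()})
```

With three ages, three fitness classes and two runners per cell, the degrees of freedom come out as 2, 2, 4 and 9, matching the table above.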

following three questions.
Age Fitness 1 Fitness 2 Fitness 3 Average
40
50
60
Average
21. What was the average running time for all 60-year olds?
22. What was the average running time for all men in fitness
category 3?
23. What was the mean running time of the two 60-year,
category 3 runners?
{Example 25}
Question C
In many agricultural and biological experiments, one may use a
two-way model with only
one observation per cell. When one of the factors is related to
the grouping of experimental units
into more uniform groups, the design may be called a
randomized complete block design (RCBD).
The analysis is similar to a two-way analysis of variance
(question B) except that the model does
not include an interaction term.
The specific leaf areas (area per unit mass) of three types of
citrus each treated with one of

three levels of shading are stored in Table C. The first column
contains the code for the shading
treatment, the second column contains the code for the citrus
species, and the third column contains
the specific leaf area. Assume that there is no interaction
between citrus species and shading. Carry
out a two-way analysis of this data.
The shading treatment and citrus species are coded as follows:
Treatment Code Species Code
Full sun 1 Shamouti orange 1
Half shade 2 Marsh grapefruit 2
Full shade 3 Clementine mandarin 3
leaf area into the EXCEL worksheet,
label the columns and look at the data.
{Example 1}
Carry out a two-way (without interaction) analysis of this
data and answer the following questions.
Use a 5% significance level.
Source of variation   Degrees of freedom   Sum of squares   Mean square   F          P
Shading treatment     2                    ________         ________      ________   ________
Citrus species        2                    ________         ________      ________   ________
Error                 4                    ________         ________
24. Should the hypothesis that shading treatment has no effect
on specific leaf area
be rejected (1) or not (0)?
25. Should the hypothesis that citrus species do not differ in
specific leaf area be

rejected (1) or not (0)?
26. What is the estimate of the average (pooled) variance in this
experiment (i.e.
Error mean square)?
27. What are the error degrees of freedom for the pooled
variance?
{Example 26}
Recall that the confidence interval for a difference between two
means is based on a
calculation of the margin of error of the estimated difference.
With a common variance (Error MS)
and the same number of observations in all shading treatments,
the margin of error of an estimated
difference will be the same whether we calculate it for
treatments 1 and 2, 1 and 3, or 2 and 3. This
margin of error of the difference between two means is
sometimes referred to as the least significant
difference (LSD).
experiment.
LSD = critical t value × standard error of difference.
Use the critical t value with 4 degrees of freedom, t0.025,4 =
2.776.
n is the number of times each treatment was tested (in
this case n = 3 for the 3 species).
$\mathrm{LSD}(\alpha) = t_{\alpha/2,\,edf}\sqrt{\dfrac{2 \times \text{Error Mean Square}}{n}}$, where edf is the error degrees of freedom
28. What is the least significant difference (α = 0.05) for
comparing shading
treatments in this experiment?
{Example 24}
Any two shading treatments are judged to be significantly
different if their absolute (ignore
the + or - sign) difference exceeds the least significant
difference.
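The LSD rule can be sketched in a few lines. The Error Mean Square and the treatment means below are hypothetical (not the Table C results) and the function name is an assumption; n = 3 and the critical value 2.776 are the values given in the text.

```python
import math
from itertools import combinations

def least_significant_difference(error_ms, n, t_crit):
    """LSD = t * sqrt(2 * Error Mean Square / n)."""
    return t_crit * math.sqrt(2 * error_ms / n)

# Hypothetical Error MS and treatment means.
lsd = least_significant_difference(error_ms=6.0, n=3, t_crit=2.776)
means = {"Full sun": 100.0, "Half shade": 108.0, "Full shade": 110.0}

# A pair differs significantly when its absolute difference exceeds the LSD.
significant = {
    (a, b): abs(means[a] - means[b]) > lsd
    for a, b in combinations(means, 2)
}
print(round(lsd, 3))
for pair, flag in significant.items():
    print(pair, flag)
```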

differences. Compare the appropriate
differences to the LSD to answer the following questions.
Shading Treatment Mean Specific Leaf Area
Full Sun
Half Shade
Full Shade
29. Should the hypothesis that the specific leaf area under full
sun is not different
from the specific leaf area in half shade be rejected (1) or not
rejected (0)?
30. Should the hypothesis that the specific leaf areas of half
shade and full shade
are not different be rejected (1) or not rejected (0)?
{Example 24}

ASSIGNMENT 9
Assignment #9
Purpose
This final assignment presents some of the important points to
consider in correlation
analysis and simple linear regression analysis.
Question A
The data in Table A gives the (simulated) advertising
expenditures of 25 large companies
for last year and this year. You are asked to investigate the
question of whether or not expenditures
in one year are related to expenditures in another. The data file
contains the company number in the
first column, last year's expenditures ($ millions) in the second
column, and this year's expenditures
($ millions) in the third column.

Copy the data in Table A into the EXCEL worksheet,
name the columns, and view the data.
1. Which company had the greatest advertising expenditures last
year?
2. Which company had the greatest advertising expenditures this
year?
{Example 1}
Plot the advertising expenditures in the
two years and answer the following
question.
3. Which of the following three statements (1, 2 or 3) most
correctly describes the
relationship between last year's and this year's expenditures?
1 - There is little relationship between what a company spends
on advertising in one year and
what that company spends in another.
2 - Companies that spent most on advertising last year tended
to be among those spending the
greatest amount this year.
3 - Companies that spend a lot on advertising in one year tend
to reduce their advertising
expenditures in the next.
{Example 27}

The relationship between two variables
can be measured by the covariance.
The covariance is a measure of how much two random variables
vary together. The larger the
magnitude of the covariance, the stronger the
relationship.
The value of the covariance is interpreted as follows:
• Positive covariance - indicates that higher than average values
of one variable tend to be
paired with higher than average values of the other variable.
• Negative covariance - indicates that higher than average
values of one variable tend to
be paired with lower than average values of the other variable.
• Zero covariance - if the two random variables are independent,
the covariance will be
zero. However, a covariance of zero does not necessarily mean
that the variables are
independent. A nonlinear relationship can exist that still would
result in a covariance
value of zero.

Calculate the standard deviation for last year's expenditures, the
standard deviation for this year's
expenditures and the covariance between the two.
4. What is the standard deviation of last year's advertising
expenditures ($ millions)
of these 25 companies?
5. What is the standard deviation of this year's advertising
expenditures ($ millions)
of these 25 companies?
6. What is the covariance between the last year's and this year's
advertising
expenditures ($ millions²) of these 25 companies?
Because the covariance depends on the units of the data, it is
difficult to compare covariances
among data sets having different scales. A value that might
represent a strong linear relationship
for one data set might represent a very weak one in another.
The correlation coefficient (r) addresses this issue by
normalizing the covariance (i.e. divide the
covariance sxy by the product of the two standard deviations (sx
* sy)), creating a dimensionless

quantity that allows the comparison of different data sets.
7. What is the correlation (r) between last year's and this year's
expenditures?
{Example 28}
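The link between the covariance and the correlation coefficient can be checked with a small sketch. The two short series are hypothetical (not the Table A expenditures) and the function names are assumptions; the covariance here uses the sample (n - 1) divisor, so that it is consistent with the sample standard deviations.

```python
from statistics import mean, stdev

def covariance(x, y):
    """Sample covariance s_xy with an n - 1 divisor."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)

def correlation(x, y):
    """r = s_xy / (s_x * s_y), a dimensionless quantity."""
    return covariance(x, y) / (stdev(x) * stdev(y))

# Hypothetical expenditures ($ millions) for five companies.
last_year = [1.0, 2.0, 3.0, 4.0, 5.0]
this_year = [2.0, 4.0, 5.0, 4.0, 5.0]
s_xy = covariance(last_year, this_year)
r = correlation(last_year, this_year)
print(s_xy, round(r, 4))
```

Rescaling one series (say, to $ thousands) changes the covariance but leaves r unchanged, which is the point made in the paragraph above.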
expenditures from one year to another?
Test the null hypothesis that there is no relationship between
last year's and this year's expenditures
against an alternative that there is a positive relationship (r >
0). Use a 10% significance level.
Because this is a one-tailed test with 25 pairs of observations
(degrees of freedom = 23), we find
that the critical value against which to compare the estimated
correlation is t = 1.319. Using your r
value and n = 25, calculate the test statistic tcalc and compare.
If the test statistic is greater than the
critical value of 1.319, the null hypothesis will be rejected.

$t_{calc} = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$
8. Should the hypothesis that there is no relationship between
last year's and this
year's advertising expenditures be rejected (1) or not (0)?
{Example 28}
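The test statistic for a correlation can be computed as in the sketch below. The value r = 0.5 is hypothetical; n = 25 and the critical value 1.319 are taken from the text, and the function name is an assumption.

```python
import math

def correlation_t(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r**2)

t_calc = correlation_t(r=0.5, n=25)    # hypothetical r
print(round(t_calc, 3), t_calc > 1.319)
```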
Question B
In a study of the role of young drivers in automobile accidents,
data on percentage of
licensed drivers under the age of 21 and the number of fatal
accidents per 1000 licenses were
determined for 32 cities. The data are stored in Table B. The
first column contains a number as the
city code, the second column contains the percentage of drivers
who are under 21, and the third
column contains the number of fatal accidents per 1000 drivers.
The primary interest is whether or
not the number of fatal accidents is dependent upon the
proportion of licensed drivers that are under
21.
Copy the data into the EXCEL worksheet, name the

columns, and view the data.
9. Which city (number) had the highest number of fatal
accidents per 1000 licensed
drivers?
{Example 1}
Plot the number of fatal accidents per 1000 licenses against the
percentage of drivers under 21. Based on the
plot, try to anticipate whether or not the following analysis will
show that there is a significant
increase or decrease in number of fatalities with increases in
percentage of drivers under 21.
{Example 27}
can be used to predict levels of a

dependent variable for specified levels of an independent
variable. Use the EXCEL REGRESSION
command to calculate the intercept and slope of the least-
squares line, as well as the analysis of
variance associated with that line. Fill in the following table
and use the results to answer the next
few questions. Carefully choose your independent and
dependent variables and input them
correctly using EXCEL’s regression command. In this example,
the percentage of drivers under the
age of 21 affects the number of Fatals/1000 licenses.
The regression equation (least-squares line) is
Fatals/1000 licenses = ________ + ________ × % under 21
                       (intercept) (slope)
Analysis of variance
Source DF SS MS F P
Regression 1 ________ _______ ________ _______
Residual (Error) 30 ________ _______
10. What is the estimated increase in number of fatal accidents
per 1000 licenses
due to a one percent increase in the percentage of drivers under
21 (i.e. the
slope)?

11. What is the standard deviation of the estimated slope?
12. What is the estimated number of fatal accidents per 1000
licenses if there were
no drivers under the age of 21 (i.e. the y intercept)?
13. What percentage of the variation in accident fatalities can
be explained by the
linear relationship with drivers under 21 (i.e. 100 × the
unadjusted coefficient
of determination)?
14. Should the hypothesis that the slope does not differ from
zero (no effect of
young drivers on fatals) be rejected (1) or not (0) based on a
test at the 1%
significance level (i.e. is the p-value from the ANOVA less than
0.01)?
15. What are the degrees of freedom for the standard error of
estimate (and the
standard deviation of the slope); i.e. what are the error degrees
of freedom?

{Example 29}
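The quantities reported by the EXCEL regression output (slope, intercept, standard deviation of the slope, coefficient of determination, F and error degrees of freedom) can be reproduced with a short least-squares sketch. The five data points are hypothetical (not the Table B cities) and the function name and dictionary keys are assumptions.

```python
import math
from statistics import mean

def simple_linear_regression(x, y):
    """Least-squares line y = intercept + slope * x with basic ANOVA terms."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)               # total sum of squares
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_regression = slope * sxy
    ss_error = syy - ss_regression
    df_error = n - 2
    ms_error = ss_error / df_error
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": math.sqrt(ms_error / sxx),   # standard deviation of the slope
        "r_squared": ss_regression / syy,
        "F": ss_regression / ms_error,           # regression has 1 degree of freedom
        "df_error": df_error,
    }

# Hypothetical data: independent variable first, dependent variable second.
pct_under_21 = [8.0, 10.0, 12.0, 14.0, 16.0]
fatals_per_1000 = [1.0, 2.0, 2.5, 2.0, 2.5]
fit = simple_linear_regression(pct_under_21, fatals_per_1000)
print({k: round(v, 4) for k, v in fit.items()})
```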
to calculate a confidence interval for
the slope of the least-squares line and to test hypotheses other
than H0 : ß1 = 0. In both cases, one
needs to have an estimate of the slope and of its standard
deviation (sometimes called standard
error). Furthermore, one needs to recognize that the degrees of
freedom for the standard deviation is
the same as the error degrees of freedom (n - 2).
Note that EXCEL gives the standard error of the slope estimate
directly; this is the standard
deviation of the slope. Therefore, you must not divide by the
square root of sample size as in
example 16.
Use the above information to calculate a 90% confidence
interval for the slope of the true regression
line. For 30 degrees of freedom and α = 0.1, the critical t-value
is 1.697.
16. What is the margin of error for calculating a 90%
confidence interval for the
slope of the regression line (i.e. 1.697 × the standard deviation

of the slope)?
17. What is the lower 90% confidence limit for the slope?
(i.e. slope – margin of error)
18. What is the upper 90% confidence limit for the slope?
(i.e. slope + margin of error)
Test the null hypothesis H0 : β1 = 0.05 against
a one-sided alternative H1 : β1 > 0.05. Use a 1 percent
significance level (for which the critical value
is 2.457).
Reminder: t = (estimated value - hypothesized value) / (standard error (deviation) of estimate)
            = (slope - 0.05) / (st dev of slope)
19. What is the value of the test statistic for testing this
hypothesis?
20. Should the hypothesis that the increase in fatals per one
percent increase in
drivers under 21 is not greater than 0.05 be rejected (1) or not
(0)?
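The arithmetic in questions 16 to 19 can be sketched in Python as a check on hand calculations. This is only an illustrative sketch: the slope (0.29) and its standard deviation (0.08) are made-up values, not the answers to this assignment, and the function names are hypothetical. The critical value 1.697 is the one quoted above for 30 degrees of freedom.

```python
def slope_confidence_interval(slope, se_slope, t_critical):
    """Return (lower limit, upper limit, margin of error) for the slope."""
    margin = t_critical * se_slope
    return slope - margin, slope + margin, margin

def t_statistic(slope, se_slope, hypothesized):
    """t = (estimated value - hypothesized value) / st dev of slope."""
    return (slope - hypothesized) / se_slope

# Hypothetical regression output (NOT the assignment's values)
slope = 0.29
se_slope = 0.08

lower, upper, margin = slope_confidence_interval(slope, se_slope, t_critical=1.697)
t = t_statistic(slope, se_slope, hypothesized=0.05)

print("margin of error:", round(margin, 5))
print("90% limits:", round(lower, 5), round(upper, 5))
print("test statistic:", round(t, 3))
```

For the one-sided test, the null hypothesis is rejected only if the test statistic exceeds the critical value from the t table.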
- END OF ASSIGNMENT #9 - THE LAST ASSIGNMENT -

Introductory
Statistics
Laboratory
for Excel
PC Instructions for Excel 2013
Excel Examples

INTRODUCTION
Note: Specific Excel 2013 instructions are shown in [Excel
2013: ] throughout the Excel
examples.
These EXCEL examples provide a basis for learning to use
MICROSOFT EXCEL to
perform various tasks required in the ISLeX laboratory
assignments.
The examples may not refer exactly to the task to be
performed. For instance, in some
cases, the example may use different columns than required for
a particular task.
Your laboratory sessions will be much less frustrating if you
study the assignment and
associated examples before sitting down at a computer.
The examples will not match exactly what you need to do to
complete your assignments.
They should provide an adequate outline, but you will have to
modify the example to complete
your assigned task. For instance, you will need to use different
file names in your lab
assignments than those used in examples. You will also have to
refer to different EXCEL
worksheet columns.
The EXCEL workbook contains one or more worksheets each
identified by a tab on the
lower left part of the window. EXCEL will assign default
names, such as Sheet 1, to individual

worksheets or the user can change the name by clicking the
right mouse button on the tab and
choosing the 'rename' option.
Each worksheet is composed of cells arranged in rows and
columns. Rows are identified
by numbers 1, 2, 3 and so on, while columns are identified by
letters A, B, C and so on. After
column Z, naming starts with AA and proceeds to ZZ. Each cell
may contain a number, some
text, or a formula.
In this manual, only absolute referencing is used to refer to
cells or blocks of cells. To
refer to the cell located in the second row of column C, use C2.
To indicate all cells in the block
that includes rows 2 to 10 of columns B through D, use the cell
designations for the cell in the
upper left corner (i.e. B2) and for the cell in the lower right
corner (i.e. D10) separated by a
colon, thus B2:D10.
Sometimes, it will be useful to enter a formula into a cell and
then copy that formula to
other cells. If the formula in cell B2 refers to cell A1, it will
refer to cell D5 when the formula is
copied to cell E6. If you wish it to continue to refer to cell A1,
use $A$1 instead of A1 in the
formula.
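The way references shift when a formula is copied can be mimicked in a few lines of Python. The sketch below is not part of EXCEL; the function names are hypothetical and it handles only single-cell references such as A1 or $A$1. It moves a reference by the same number of rows and columns as the copy, unless a $ locks that part of the reference.

```python
import re

REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")

def col_to_num(col):
    """Convert a column name to its number: A -> 1, Z -> 26, AA -> 27."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n

def num_to_col(n):
    """Convert a column number back to its name: 1 -> A, 27 -> AA."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def copy_reference(ref, from_cell, to_cell):
    """Return ref as it would read after copying a formula from from_cell to to_cell."""
    _, fcol, _, frow = REF.fullmatch(from_cell).groups()
    _, tcol, _, trow = REF.fullmatch(to_cell).groups()
    d_col = col_to_num(tcol) - col_to_num(fcol)
    d_row = int(trow) - int(frow)

    col_lock, col, row_lock, row = REF.fullmatch(ref).groups()
    new_col = col if col_lock else num_to_col(col_to_num(col) + d_col)
    new_row = row if row_lock else str(int(row) + d_row)
    return f"{col_lock}{new_col}{row_lock}{new_row}"

print(copy_reference("A1", "B2", "E6"))    # D5
print(copy_reference("$A$1", "B2", "E6"))  # $A$1
```

Copying from B2 to E6 moves three columns right and four rows down, which is why A1 becomes D5 while $A$1 stays fixed.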

EXCEL commands and subcommands can be selected by
clicking the left mouse button
on the required command or subcommand.
When you first start using Excel, you should become familiar
with three important areas
in the Excel window. Mention has already been made of the
cells arranged in rows and columns
in the worksheet. In fact there may be several worksheets in a
single workbook.
If you place the cursor in a particular cell, the “Name box”
located at the upper left hand
side of the worksheet will indicate the identity of the active
cell, e.g. B5.
If you type a number, name or formula into that cell, it will
also appear in the “Formula
bar” at the top of the worksheet. If you then press the enter key,
the cursor will move to the next
cell and the formula bar will become blank (if the next cell is
empty). If you had entered an
actual formula, it will be evaluated and the result will appear
in the cell in which you entered
the formula. If you made an error and need to edit the formula,
highlight the cell and then move
the cursor to the formula bar to edit the formula.
In these laboratory assignments, you are sometimes required to
combine information
from two parts of an assignment. Typically, each part will result
in a separate workbook in

Excel. You can copy data from one workbook to another by
using the following procedure.
Highlight the data you wish to copy and press Ctrl-C to copy
the data.
Use the Window command of Excel to choose the workbook you
wish to copy to.
Place the cursor where you wish to paste the data and press Ctrl-V.
Note: Rather than using Ctrl-C and Ctrl-V to copy and paste,
you may use Edit->Copy and
Edit->Paste.
Most data analysis tools of Excel default to printing their
results on a new worksheet.
However, most also have an option to specify an output range
on the same worksheet. If you
choose the Output range option, click in the adjacent box and
then highlight the area of the
worksheet where you wish to store the results.
Example 1: Copying data from the assignment webpage into the
EXCEL worksheet.
Your data will be presented to you in a web page. To copy
the data to Excel:
• First highlight the data and either press the key combination
ctrl-c, or select Copy from

the Edit menu to copy the data (to the clipboard).
• Then, switch to the Excel window and either use the key
combination ctrl-v, or select
Paste from the Edit menu to paste the data into Excel.
At this stage, you should now have the data on an Excel
worksheet. (If you wish, you
can name this worksheet LAB0A.DAT by right clicking on its
tab at the bottom and choosing
the rename option.)
This same procedure applies to all assignments. Follow the
above procedure even with
multi-column tables.
If you wish to add a label in cell 1 of column A, move the
cursor to that cell and then
choose Insert->Cells and click OK (or press enter) on the Insert
dialog box to move all cells
down. [Excel 2013: Home Tab – Insert] This will allow you to
type a label in cell A1.
The following procedure will allow you to calculate some
summary statistics for data in
a column. It is good practice to look at summary statistics
before proceeding with further
analysis. This will alert you to the number of data points, their
average value, and a few other
informative characteristics about the data.
Choose Data Analysis… to pop up the Data Analysis window [Excel 2013:
Data Tab – Data
Analysis over on far right side] (SEE NOTE BELOW if Data
Analysis is missing.)
double click on Descriptive statistics
With cursor flashing in Input Range: box, click on column letter
for column with
data
If you have entered a label in the first row of the column, click Labels in
first row.
Click in box preceding Summary statistics, and click on OK or
press the enter key.
EXCEL will create a new worksheet with the summary
statistics. You should note such key
characteristics as count, minimum, mean and maximum. At
more advanced stages, you may
choose to think about kurtosis, skewness and standard deviation
or standard error.
If you wish, you can delete this temporary worksheet by right-
clicking on its tab and
choosing the delete option.
The same basic procedures will be used in later assignments to
enter data from a file that
contains several columns.

NOTE: The Analysis ToolPak is a Microsoft Excel add-in
program that is available when
you install Microsoft Office or Excel. To use it in Excel,
however, you need to load it first.
1. Click the File tab, and then click Options.
2. Click Add-Ins, and then in the Manage box, select Excel
Add-ins.
3. Click Go.
4. In the Add-Ins available box, select the Analysis ToolPak
check box, and then click
OK.
a. If Analysis ToolPak is not listed in the Add-Ins available
box, click Browse to
locate it.
b. If you get prompted that the Analysis ToolPak is not
currently installed on your
computer, click Yes to install it.
5. After you load the Analysis ToolPak, the Data Analysis
command is available in the
Analysis group on the Data tab.

Example 2: Preparing a histogram of data
A histogram is a graphical summary of numerical data. In this
example, data stored in
EXCEL worksheet column A is summarized in a histogram.
Before calculating frequencies in
different groups, you must define the classes. In EXCEL, the
classes are called "bins". For this
example, suppose that the data to be summarized varies from 21
to 28 and you wish to group the
observations into "bins" each with one unit for a class width.
The first bin will include all data
points with values up to and including 22, the second bin will
include values greater than 22 up
to and including 23 and so on. You only need to indicate the
upper boundary for each bin. For
this example, use 22, 23, 24, 25, 26, 27, and 28. These values
need to be entered into a new
column, say column B. You can type the numbers into the first
seven rows of column B.
To actually draw the histogram, you must first calculate
frequencies of data in each bin.
Choose Data analysis [Excel 2013: Data Tab – Data Analysis]
and select Histogram
In the histogram dialog box,
move cursor to Input range and click on top of column A,
move cursor to Bin range and click on top of column B,
if you have labels in A1 and B1, check the Labels option,
and
click on OK or press the enter key.
EXCEL is very slow at this calculation, so be patient! In a few
seconds, you should get a
new sheet in the workbook that contains the upper ends of the
bins and the frequencies of
observations in each bin. In this example, the results look like
this:
Bin Frequency
22 9
23 6
24 6
25 5
26 7
27 2
28 1
More 0
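The counting rule behind this table (each bin holds values greater than the previous upper boundary, up to and including its own) can be sketched in Python. This is an illustration under stated assumptions, not EXCEL's own code: the function name is hypothetical and the eight data values are made up, because the raw data for this example are not listed here.

```python
from bisect import bisect_left

def bin_frequencies(data, upper_bounds):
    """Count observations per bin; upper_bounds must be in increasing order.

    The extra final count plays the role of EXCEL's "More" row.
    """
    counts = [0] * (len(upper_bounds) + 1)
    for value in data:
        # index of the first upper boundary that is >= value
        counts[bisect_left(upper_bounds, value)] += 1
    return counts

bins = [22, 23, 24, 25, 26, 27, 28]
data = [21.5, 22.0, 22.4, 23.0, 25.1, 27.9, 28.0, 28.3]  # hypothetical values

counts = bin_frequencies(data, bins)
for label, count in zip(bins + ["More"], counts):
    print(label, count)
```

Note that 22.0 is counted in the first bin and 28.3 falls into "More", which matches the "up to and including" rule described above.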
At this point, you should have a numerical representation of a
histogram. Most
histograms are presented in graphical form. To develop a bar
graph to show the histogram,
proceed as follows. Note that Excel creates a bar graph not a
true histogram as there are spaces
between the bars. A true histogram has no spaces between the
bars.
Highlight the data, including titles, using the cursor.
Insert a chart. [Excel 2013: Insert Tab – in Charts choose Insert
Column Chart – select
2D (first choice of the options)]
Excel will automatically produce a chart.

A histogram gives the frequency (number of observations) in
each of various classes. In
EXCEL, the classes are defined by giving the upper boundaries
of each class (bin).
The + sign allows you to format your chart’s elements. You
can click on the boxes to
include whatever elements you feel are appropriate for your
chart. If you want to edit the Axis
Title, you can click into that box and type a new axis title.
The paint brush allows you to choose the style and color of your
chart.
This icon allows you to select your data source and make
changes instead of having to
highlight your Excel cells that hold the data and start the chart

all over again.
How to make a true histogram: To get rid of the gaps between
the bars and make a true
histogram, right click on any bar and Excel comes up with a
window with Format Data Series.
Choose Format Data Series (see above arrow).
On this window you will need to choose the three column
symbol (see above arrow) and then
Excel opens Series Options and at the bottom is Gap Width.
Change the gap width to zero and
you will have a true histogram.

You can change the outline of your bars to a different color to
have them appear separated by
clicking the Outline (see arrow below) and changing the color
to black or white.
The resulting chart looks like this (remember to make changes
to your titles according to best
graphing practices, not shown in this chart):

Example 3: Entering data from the keyboard into the EXCEL
worksheet
Occasionally, you will be required to enter data or intermediate
results directly into the
EXCEL worksheet. You merely type the data into the cells
where you wish to store the
information.
Example 4: Calculating relative frequencies
To calculate relative frequencies in each of several classes,
you must divide each
frequency of a class by the sum of all the frequencies. Consider
data summarized in three classes.
Class Frequency
1 5
2 10
3 5
Total 20
The relative frequency for Class 1 is 5/20 = 0.25, for Class
2 is 10/20 = 0.50, and for
Class 3 is 5/20 = 0.25. Note that the relative frequencies must
always sum to 1.0 (within
rounding error). Thus, 0.25 + 0.50 + 0.25 = 1.0.
If the frequencies are stored in EXCEL Worksheet column C,

you can calculate relative
frequencies and store them in another column in the following
way. Suppose 5 is in cell C1, 10
in cell C2 and 5 in cell C3. Move the cursor to cell D1, type ‘=
C1/SUM($C$1:$C$3)’ in the
formula bar, and press enter. Don’t forget the = at the beginning
of your equation; otherwise it
will be entered only as text and will not calculate for you. You
should see the value 0.25 in cell D1.
To calculate the remaining relative frequencies, just copy the
formula in cell D1 to cells D2 and
D3. Note that, as the formula is copied, C1 will change to C2
and then to C3, but $C$1:$C$3
will remain constant.
An alternative would be to first calculate the sum (20) and
store in a cell that could then
be used to calculate all relative frequencies. For example, enter
the formula ‘=SUM(C1:C3)’ in
cell C4. Now, use the formula ‘= C1/$C$4’ in cell D1. Again,
copy cell D1 to cells D2 and D3.
You should also confirm that the relative frequencies sum to
1.0.
Use the formula ‘= SUM(D1:D3)’ in cell D4. You can also use
the Σ in the tool bar and Excel
will help you calculate a sum for that column. [Excel
2013:Home Tab – Σ ]
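The same calculation can be written as a short Python sketch, using the three class frequencies from this example. The function name is hypothetical; dividing by a single fixed total plays the role of the $C$1:$C$3 (or $C$4) reference in the EXCEL formulas.

```python
def relative_frequencies(frequencies):
    """Divide each frequency by the sum of all frequencies."""
    total = sum(frequencies)  # the fixed total, like SUM($C$1:$C$3)
    return [f / total for f in frequencies]

rel = relative_frequencies([5, 10, 5])
print(rel)       # [0.25, 0.5, 0.25]
print(sum(rel))  # 1.0
```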

Example 5: Leaving EXCEL and grading your assignment.
When you have completed an assignment and have recorded
numerical answers to each
of the questions in the INTRODUCTORY STATISTICS
LABORATORY, you should try your
answers in ISLeX.
In submitting your answers to the Introductory Statistics
Laboratory Program (ISLeX),
you are required to use numbers for all answers. Place the
cursor in the appropriate box and type
in your answer. Use the mouse or the tab key to move to the
next box. If you press enter, it will
go right to grading. (You have the option to go back again, so
DO NOT accept unless you are
completely finished.) Click on the “Check my answers” box to
grade your assignment.
At the end of the assignment, your grade will be displayed on
the screen and you will be
given the option of accepting the grade or repeating the
assignment. Once you accept your grade,
you will not be able to repeat the assignment. You are
encouraged to repeat the assignment until
you are satisfied with your effort. You must achieve 80 or
higher to move onto the next
assignment.

Example 6: How to prepare a stem-and-leaf diagram
A stem-and-leaf diagram combines graphical and numerical
methods to summarize data.
Unfortunately, EXCEL does not have a command for preparing
a stem-and-leaf diagram.
Suppose you wish to develop a stem-and-leaf diagram of the
following data.
25.6 26.0 25.3 27.2 23.6 26.3 25.4 23.8 21.1 23.4 23.9
23.8 26.0 20.0 22.5 28.0 26.7
24.8 25.1 24.9 26.6 24.9 25.0 27.5 20.6 24.0 22.1 20.0
21.8 24.7 21.7 25.2 27.1 24.8
25.8 26.9 25.6
Enter (or read) the data into a column in EXCEL and then sort
the data from lowest to
highest using the Data->Sort command. [Excel 2013: Data Tab –
Sort] The results follow.
20.0
20.0
20.6
21.1
21.7
21.8
22.1
22.5
23.4
23.6
23.8
23.8
23.9
24.0
24.4
24.7
24.8

24.8
24.9
24.9
25.0
25.1
25.2
25.3
25.4
25.6
25.6
25.8
26.0
26.0
26.3
26.6
26.7
26.9
27.1
27.2
27.5
28.0
If you decide to have leaf units of 0.1, the successive stem units will
be 10 × 0.1 = 1.0 higher than the previous one. Start by writing the
stem units in a column followed by a vertical bar.
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
Then, go down the data and write the last digit of each number in the
leaf position.
20 | 0 0 6
21 | 1 7 8
22 | 1 5
23 | 4 6 8 8 9
24 | 0 4 7 8 8 9 9
25 | 0 1 2 3 4 6 6 8
26 | 0 0 3 6 7 9
27 | 1 2 5
28 | 0
And, finally, add a title and leaf
unit to complete the job.
Stem-and-leaf diagram of example
data.
Leaf unit = 0.1
20 | 0 0 6
21 | 1 7 8
22 | 1 5
23 | 4 6 8 8 9
24 | 0 4 7 8 8 9 9
25 | 0 1 2 3 4 6 6 8
26 | 0 0 3 6 7 9
27 | 1 2 5
28 | 0
The stem-and-leaf diagram consists of two columns of
numbers. The first column is
called the stem. The second column contains the leaves; one
leaf for each data point. The value
of any number in a leaf position is indicated by the leaf unit,
0.1 in this example. Any number in

a leaf position represents that number multiplied by the leaf unit
0.1. In the first row of the
diagram, the 0 stands for 0 × 0.1 = 0.0, and the 6 stands for 6 ×
0.1 = 0.6.
The value of a number in the stem position is 10 × leaf
unit, i.e. 1 in this case. In the
last row, the 28 stands for 28 × 1 = 28. The final value of any leaf is
calculated by adding the leaf value
to the corresponding stem value. The 0 in the last row
represents the number 0 × 0.1 + 28 × 1 =
28.0. The third leaf in stem position 21 represents 8 × 0.1 + 21
× 1 = 21.8.
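Because EXCEL has no stem-and-leaf command, it may help to see the procedure written out as a short program. The Python sketch below is only an illustration (the function name is hypothetical). It uses the sorted data listed above, converts each value to a whole number of leaf units, and then splits that number into a stem (all digits but the last) and a leaf (the last digit).

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=0.1):
    """Return the rows of a stem-and-leaf diagram as a list of strings."""
    leaves = defaultdict(list)
    for v in sorted(values):
        units = round(v / leaf_unit)    # e.g. 21.8 -> 218 leaf units
        stem, leaf = divmod(units, 10)  # 218 -> stem 21, leaf 8
        leaves[stem].append(leaf)
    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | " + " ".join(str(leaf) for leaf in leaves[stem])
        rows.append(row.rstrip())
    return rows

data = [20.0, 20.0, 20.6, 21.1, 21.7, 21.8, 22.1, 22.5, 23.4, 23.6,
        23.8, 23.8, 23.9, 24.0, 24.4, 24.7, 24.8, 24.8, 24.9, 24.9,
        25.0, 25.1, 25.2, 25.3, 25.4, 25.6, 25.6, 25.8, 26.0, 26.0,
        26.3, 26.6, 26.7, 26.9, 27.1, 27.2, 27.5, 28.0]

rows = stem_and_leaf(data)
print("Leaf unit = 0.1")
print("\n".join(rows))
```

The printed rows should agree with the diagram shown above, one leaf for each of the 38 sorted values.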
Example 7: How to draw a frequency (or relative frequency)
polygon.
In this example, midpoints for Samples 1 and 2 are stored in
column A, and relative
frequencies from Sample 1 are stored in column B and relative
frequencies from Sample 2 are
stored in column C of an EXCEL worksheet. In order to
compare the two samples, it will be
useful to plot relative frequencies for both samples on the same
graph.
Here are columns A, B, and C of an example worksheet.
20 0.0357 0.0000

21 0.1429 0.0270
22 0.2143 0.1081
23 0.1786 0.1081
24 0.2500 0.1622
25 0.1071 0.2162
26 0.0714 0.1892
27 0.0000 0.1081
28 0.0000 0.0811
[Excel 2013: highlight the data. Insert Tab – Charts and Choose
SCATTER, then click 2D
‘Straight Line with Markers’]. The resulting graph will look
like:
However, you will want to edit the graph. Click the + to edit the
chart. Choose Axes and
move the cursor over until the little right arrow appears, then
choose More Options and then
Click on the histogram picture.
The resulting graph will now have better representation.
Remember to label your chart title and
axis appropriately (not shown in chart below).

You can now edit the Axis. Change
the minimum Bounds to 19 and the
maximum Bounds to 29. Then
change the Major Units to 1.0.
Example 8: How to use EXCEL to calculate various numbers
that summarize the characteristics
of a population (or sample).
In this example, the Function command is used to calculate
various constant values to be
stored in cells in the worksheet. [Excel 2013: Formulas Tab –
Insert Function (fx)]. There are
many different functions that can be used. Some refer to whole
columns, some to individual
observations. The following examples demonstrate a few of the
uses of functions in EXCEL.
You can type the function into any particular cell by first
typing an equal sign in the
formula bar and then typing the name of the function along with
its required arguments. As an
alternative, you can use the [Excel 2013: Formulas Tab – Insert
Function (fx)] to choose a
function and have EXCEL prompt you for necessary arguments.
In this course, you would
probably choose Function category = Statistical and then double

click on the Function name
for the function you want to use.
For this example, consider that there are 22 observations
stored in column A.
a) Determine the number of data points in the population.
=COUNT(A:A)
b) Calculate the mean (= sum of all observations divided by
number of observations)
=SUM(A1:A22)/COUNT(A1:A22)
=AVERAGE(A1:A22)
c) Determine the minimum in this population (the first value in
a magnitude array). If the data
have been sorted from smallest to largest, the smallest
(minimum) value will be in the first
position, cell A1, and the largest will be located in the last
position, cell A22 in this example.
=MIN(A1:A22)
d) Determine the maximum in this population (the last value in
a magnitude array).
=MAX(A1:A22)
e) Determine the median (the middle value in a magnitude
array).
For an odd number of data points, the median is the middle
value. The middle value of n data
points if n is even is given by the average of the values of the
two middle terms.
=MEDIAN(A1:A22)
f) Determine the first quartile.

The first quartile is that value below which one-quarter of the
observations lie. Because there is
no generally accepted definition of quartile, different programs
give different results for
quartiles. ISLeX is programmed to calculate quartiles in the
same way that Excel does.
=QUARTILE(A1:A22,1)
g) Determine the third quartile.
The third quartile is that value below which three-quarters of
the observations lie.
=QUARTILE(A1:A22,3)
NOTE: The median is sometimes referred to as the second
quartile (Q2) because it is the
value below which 2/4 of the values lie. The first quartile (Q1),
the median (Q2) and the third
quartile (Q3) divide the data values into four groups. We know
that 1/4 of the data values are less
than Q1, 1/4 are between Q1 and Q2, 1/4 are between Q2 and
Q3, and 1/4 are greater than Q3.
For some purposes, it may be sufficient to summarize a large
data set by presenting these three
values.
h) Determine the standard deviation.

The standard deviation is the square root of the variance, and
the variance is the average of the
squares of differences between individual data points and the
overall mean. Remember that the
standard deviation of a population is calculated differently than
a standard deviation of a sample.
It is important to know if you have a sample or a population.
=STDEV.S(A1:A22) for a sample
=STDEV.P(A1:A22) for a population
Column A   Column B    Formula used in column B
23
20         22          Uses =COUNT(A1:A22) to count number of observations
29         22.77273    Uses =SUM(A1:A22)/COUNT(A1:A22) to calculate average
29         16          Uses =MIN(A1:A22) to calculate the minimum value
27         30          Uses =MAX(A1:A22) to calculate maximum value
23         23          Uses =MEDIAN(A1:A22) to calculate median value
17         19          Uses =QUARTILE(A1:A22,1) to calculate first quartile
17         27.75       Uses =QUARTILE(A1:A22,3) to calculate third quartile
22         4.669372    Uses =STDEV.S(A1:A22) to calculate standard deviation for a sample
23
25
21
21
18
16
21
24
19
27
19
25
24
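For comparison, the same summary numbers can be computed with Python's standard statistics module. The sketch below uses eight made-up observations rather than the worksheet data. It also assumes that the "inclusive" quantile method follows the same interpolation rule as EXCEL's QUARTILE function, which you should confirm against EXCEL before relying on it.

```python
import statistics

data = [16, 17, 19, 21, 23, 24, 27, 30]  # hypothetical observations

count = len(data)                         # like =COUNT(...)
mean = statistics.mean(data)              # like =AVERAGE(...)
minimum, maximum = min(data), max(data)   # like =MIN(...), =MAX(...)
median = statistics.median(data)          # like =MEDIAN(...)

# Quartiles; "inclusive" is assumed here to match =QUARTILE(...)
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

sd_sample = statistics.stdev(data)        # like =STDEV.S(...), divides by n - 1
sd_population = statistics.pstdev(data)   # like =STDEV.P(...), divides by n

print(count, mean, minimum, maximum, median)
print(q1, q3)
print(round(sd_sample, 5), round(sd_population, 5))
```

The sample standard deviation is always a little larger than the population value for the same numbers, because it divides by n - 1 instead of n.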
Example 9: How to use the DESCRIPTIVE STATISTICS
command of EXCEL
The Descriptive statistics command of EXCEL will
automatically calculate most of the
summary statistics required of data in a single column [Excel
2013 Data Tab – Data Analysis and
then choose Descriptive Statistics]. By listing several columns,
the
Descriptive statistics command can be applied to several
columns simultaneously.
Consider that data has been stored in column A. To calculate
summary statistics for this
column, follow these steps.
Excel 2013: Data Tab and choose Data Analysis (on right)
Double click on Descriptive statistics in the Data Analysis
dialog box
Set Input range to = A:A (or just highlight the data with the
cursor)
Click on Summary statistics

Click on OK
Your results will be on a new worksheet and will look like this
(move column borders to
see full text).
Column1
Mean 23.90909
Standard Error 1.038041
Median 23.5
Mode #NUM!
Standard Deviation 4.868843
Sample Variance 23.70563
Kurtosis -1.32235
Skewness -0.11628
Range 14
Minimum 16
Maximum 30
Sum 526
Count 22
This approach gives many of the summary statistics described
in the preceding example
as well as several others. The #NUM! Message means only that
there are several possible values
for the mode in this data set.
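Several lines of this output are linked by simple formulas, and checking those links is a good way to understand the table. The Python sketch below recomputes four of the entries from other entries in the output shown above. Only the variable names are new; the input values are copied from that output.

```python
import math

# Values copied from the Descriptive statistics output above
total, count = 526, 22
std_dev = 4.868843
minimum, maximum = 16, 30

mean = total / count                         # Mean = Sum / Count
sample_variance = std_dev ** 2               # Sample Variance = (Standard Deviation)^2
standard_error = std_dev / math.sqrt(count)  # Standard Error = Standard Deviation / sqrt(Count)
data_range = maximum - minimum               # Range = Maximum - Minimum

print(round(mean, 5), round(sample_variance, 4),
      round(standard_error, 5), data_range)
```

Each recomputed value should agree with the corresponding line of the table to within rounding.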

Example 10: Further uses of EXCEL->As a calculator
EXCEL can also be used as a calculator.
The following statements would allow you to calculate 5.6-3.2
= 2.4 and store it in a cell
in the EXCEL worksheet. It is important to start your equation
with an “=”; otherwise the
calculator function is not enabled.
=5.6-3.2
If 5.6 was stored in cell D3 and 3.2 was stored in cell D4, you
could also use
=D3-D4
The second option may be useful if 5.6 and 3.2 may be used in
other calculations.
This same scheme may be used for all elementary mathematical
operations.
Use - to indicate subtraction [ = 5.6 - 3.2]
Use + to indicate addition [ = 5.6 + 3.2]
Use * to indicate multiplication [ = 5.6 * 3.2]
Use / to indicate division [ = 5.6 / 3.2]
Use POWER to indicate exponentiation [ = POWER(5.6, 3.2)]

Example 11: Calculations with a discrete probability
distribution
In this example, EXCEL is used to answer various questions
dealing with a discrete
probability distribution. EXCEL worksheet column A contains
the event names and column B
contains the corresponding probabilities. In PL SC 314, we will
discuss only events that
represent counts; e.g. number of seeds germinated, number of
red blood cells, number of live
plantlets, number of microbial colonies, et cetera.
0 0.018316
1 0.073263
2 0.146525
3 0.195367
4 0.195367
5 0.156293
6 0.104196
7 0.059540
8 0.029770
9 0.013231
10 0.005292
11 0.001925
12 0.000642
13 0.000197

14 0.000056
15 0.000015
16 0.000004
17 0.000001
18 0.000000
19 0.000000
20 0.000000
Suppose one were interested in the probability of exactly 10 in
this distribution. This can
be read directly from column B in the row position
corresponding to A = 10. Thus, P(X = 10) =
0.005292.
A powerful way of calculating the probabilities of compound
events is to sum parts of the
probability table.
Suppose you want the probability of less than 13. You must add
the probabilities for 0, 1,
. . 12. Those probabilities are in cells B1:B13. To calculate the
probability, you could move to
cell C1 and enter the formula = SUM(B1:B13). In this example,
the probability of less than 13 is
0.99973 or 99.973 percent.
Note that terms such as 'less than 13' and 'fewer than 13'
include all possible values from
the smallest up to, but excluding, 13.
Similarly, 'more than 13' or 'greater than 13' would not include
13. Moreover, the term
'between 5 and 10' would include 6, 7, 8 and 9, and would
exclude 5 and 10.

However, ‘no more than 13’ would include 13. ‘At least 13’
would include 13 and all
higher values.
The following three examples show other questions that can be
dealt with in this general
manner.
a) P[10 < X < 21] = ?
= P(11) + P(12) + P(13) + P(14) + … + P(20). P(11) is listed in
row 12 of column
B while P(20) is listed in row 21 of column B.
= SUM(B12:B21) = 0.0028398
b) P[(X < 6) or (X > 14)] = ?
In this example, calculate P(0) + P(1) + … +P(5) + P(15) +
P(16) + … + P(20)
= SUM(B1:B6)+SUM(B16:B21) = 0.78515
c) P[X > 0] = ?
= SUM(B2:B21) = 0.98168
or = 1 - B1 = 0.98168
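The sums above can be cross-checked outside EXCEL. The Python sketch below copies the probabilities from column B of the table into a list, so that the list position equals the value of X, and adds up the required parts. The helper name p_between is hypothetical.

```python
# Probabilities for X = 0, 1, ..., 20, copied from column B of the table above
prob = [0.018316, 0.073263, 0.146525, 0.195367, 0.195367, 0.156293, 0.104196,
        0.059540, 0.029770, 0.013231, 0.005292, 0.001925, 0.000642, 0.000197,
        0.000056, 0.000015, 0.000004, 0.000001, 0.000000, 0.000000, 0.000000]

def p_between(low, high):
    """P(low <= X <= high), with both end values included."""
    return sum(prob[low:high + 1])

p_less_than_13 = p_between(0, 12)          # like =SUM(B1:B13)
p_a = p_between(11, 20)                    # P[10 < X < 21]
p_b = p_between(0, 5) + p_between(15, 20)  # P[(X < 6) or (X > 14)]
p_c = 1 - prob[0]                          # P[X > 0]

print(round(p_less_than_13, 5), round(p_a, 7), round(p_b, 5), round(p_c, 5))
```

Because Python lists start at position 0 while worksheet rows start at 1, P(11) sits in row 12 of column B but at position 11 of the list.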
In order to calculate the mean of a probability distribution, one
must use the methods for
calculating the mean of a relative frequency distribution. The

mean is equal to the sum of the
products of each value multiplied by its corresponding
probability. In the example table, the hand
calculation would require 0(0.018316) + 1(0.073263) + . . for
21 terms. In EXCEL, the
following formula will operate on whole columns and
calculation of the mean is simple.
= SUMPRODUCT(A1:A21*B1:B21) = 4.00
For this probability distribution, one would conclude that the
average value in a great
many samples from this distribution will be 4.0.
The variance of a probability distribution can be most easily
calculated as the average of
the squares of the values minus the square of the average. The
following EXCEL formula will
calculate the variance. The mean (see above) must have
previously been calculated and stored in
cell D7.
= SUMPRODUCT(A1:A21*A1:A21*B1:B21)-D7*D7 = 4.000
In this example, the variance of the distribution (4.0) is
identical to the mean. This is a
characteristic of the 'Poisson' probability distribution (that deals
with the random occurrence of
rare events). Such a relationship will not occur with other
distributions.
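The mean and variance formulas can be checked with a short Python sketch. It assumes, as the text indicates, that the table is a Poisson distribution with mean 4 cut off at X = 20, so the probabilities are regenerated from the Poisson formula rather than typed in. The sums then play the role of SUMPRODUCT.

```python
import math

values = list(range(21))  # X = 0, 1, ..., 20
# Assumed Poisson(4) probabilities: exp(-4) * 4^k / k!
prob = [math.exp(-4) * 4 ** k / math.factorial(k) for k in values]

# Mean: sum of value * probability
mean = sum(x * p for x, p in zip(values, prob))

# Variance: sum of value^2 * probability, minus the square of the mean
variance = sum(x * x * p for x, p in zip(values, prob)) - mean ** 2

print(round(mean, 3), round(variance, 3))
```

Both results round to 4.000; the tiny shortfall comes from leaving out values above 20, whose probabilities are negligible.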

Example 12: Reading and storing constants for further use.
In some assignments, you are required to read numerical values
from a file and then use
them to calculate answers to specific questions. Consider the
situation where the mean and
standard deviation are stored in columns 1 and 2 of the data file
called 'Table R'.
1. Copy the table using the method described in Example 1.
2. Observe the two columns to see the mean and standard
deviation.
3. Suppose the data is loaded into cells B1 and C1 of the
EXCEL worksheet and that you are told
the first value is the mean and the second is the standard
deviation. Use the value stored in B1 as
the mean and the one stored in C1 as the standard deviation in
subsequent computations.
Suppose the values were 100.4 and 7.89.
You could calculate the value of the mean plus two standard
deviations by using this
formula in a cell.
= B1 + 2*C1

Example 13: Using the EXCEL to answer questions about
continuous distributions.
Consider that X is a continuous variable with a mean whose
value is stored in EXCEL
cell A1 and a standard deviation whose value is stored in B1.
For example, if you are given that
the mean is 86.7 and the standard deviation is 4.81, the
following calculations will work if you
first store 86.7 in A1 and 4.81 in B1.
In the assignments, you will be dealing only with a continuous
distribution known as a
normal distribution. When using the NORM.DIST function to
calculate a probability, it will be
necessary to indicate i) the value below which you require the
probability, ii) the mean of the
distribution, iii) the standard deviation of the distribution, and
iv) TRUE to indicate that you
want a cumulative probability [P(X < Value)].
In these examples, consider that X is an observation from a
normal distribution. The
NORM.DIST function will give the probability that a random
observation will be less than some
specified value V, i.e. P[X < V].
To calculate P[X < 90] use
=NORM.DIST(90,86.7,4.81,true) = 0.75367
or =NORM.DIST(90,A1,B1,true) = 0.75367
if mean in A1 and standard deviation in B1.
By choosing NORM.DIST, you will be prompted for the four
arguments. [Excel 2013:
Formulas Tab – Insert Function, scroll to choose NORM.DIST
from the Statistical list]. Choose
your x value, type in or use the cursor to select your mean, type
in or use the cursor to select your
standard deviation and type TRUE into the Cumulative box (for
continuous data).
The following examples should help to convert questions into
mathematical expression and then
into EXCEL commands.
What is the probability that a continuous normal variable X will
be less than 75?
P[X < 75] = ?
=NORM.DIST(75,A1,B1,TRUE)
What is the probability that a continuous normal variable X will
exceed 75?
P[X > 75] = 1 - P[X < 75] = ?
=1-NORM.DIST(75,A1,B1,TRUE)
What is the probability that a random observation from a normal
distribution will be between 70
and 80?
P[70 < X < 80] = P[X < 80] - P[X < 70] = ?

=NORM.DIST(80,A1,B1,TRUE)-NORM.DIST(70,A1,B1,TRUE)
What proportion of random observations from a normal
distribution should lie within two
standard deviations of the mean?
P[mean - 2*stdev < X < mean + 2*stdev]
= P[X < mean + 2*stdev] - P[X < mean - 2*stdev] = ?
=NORM.DIST(A1+2*B1,A1,B1,TRUE)-NORM.DIST(A1-2*B1,A1,B1,TRUE)
What percentage of observations from a normal population
should exceed the mean by 1.96
standard deviations?
100 × P[X > mean + 1.96*stdev]
= 100 × (1 - P[X < mean + 1.96*stdev]) = ?
=100*(1-NORM.DIST(A1+1.96*B1,A1,B1,TRUE))
Results of the above calculations are
a) P{X<90} = 0.75367
b) P{X<75} = 0.00750
c) P{X>75}= 0.99250
d) P{70<X<80}= 0.08156
e) P{mean-2*sd < X < mean + 2 sd} = 0.95450
f) 100*P(X > mean + 1.96 * sd) = 2.49978
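These results can be reproduced outside EXCEL as a check. The Python sketch below uses NormalDist from the standard statistics module, whose cdf method gives P[X < value] and so corresponds to NORM.DIST with TRUE as the last argument. The mean 86.7 and standard deviation 4.81 are the values from this example; the variable names are only illustrative.

```python
from statistics import NormalDist

dist = NormalDist(mu=86.7, sigma=4.81)
mean, sd = dist.mean, dist.stdev

p_a = dist.cdf(90)                                        # P[X < 90]
p_b = dist.cdf(75)                                        # P[X < 75]
p_c = 1 - dist.cdf(75)                                    # P[X > 75]
p_d = dist.cdf(80) - dist.cdf(70)                         # P[70 < X < 80]
p_e = dist.cdf(mean + 2 * sd) - dist.cdf(mean - 2 * sd)   # within 2 sd of the mean
p_f = 100 * (1 - dist.cdf(mean + 1.96 * sd))              # percent above mean + 1.96 sd

for label, value in zip("abcdef", (p_a, p_b, p_c, p_d, p_e, p_f)):
    print(label, round(value, 5))
```

Every "greater than" probability is obtained as 1 minus a "less than" probability, and every "between" probability as the difference of two "less than" probabilities.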

Example 14: How to calculate a chi-squared statistic for a
'goodness-of-fit' test.
Consider this example from Steel and Torrie (1981). A
researcher observed 1178 barley
plants in class 1 (green, non-two-row), 291 in class 2 (green,
two-row), 273 in class 3 (chlorina,
non-two-row), and 156 in class 4 (chlorina, two-row). Test the
hypothesis that distribution in the
four classes is in the ratio of 9 : 3 : 3 : 1.
Step 1. Store the observed frequencies in one column of the
EXCEL worksheet. To
calculate expected frequencies, first convert the numbers in the
expected ratio to proportions
(relative frequencies) by dividing each by 16. Then, multiply
the proportions 9/16, 3/16, 3/16 and
1/16 by the total number of barley plants in order to calculate
expected frequencies.
If the observed frequencies (1178, 291, 273 and 156) are stored
in cells A1 to A4, the following four formulas should be entered
in cells B1, B2, B3 and B4.
Cell B1 =9/16*SUM(A1:A4)
Cell B2 =3/16*SUM(A1:A4)
Cell B3 =3/16*SUM(A1:A4)
Cell B4 =1/16*SUM(A1:A4)
This will give the following table, where the first column
contains the observed frequencies and the second column the
frequencies expected if the observations are distributed
into the four classes in a ratio of 9 : 3 : 3 : 1.
1178 1067.625
291 355.875
273 355.875
156 118.625
Step 2. Calculate the chi-squared statistic as the sum of
(O-E)^2/E.
Enter the formula =(A1-B1)*(A1-B1)/B1 into cell C1 and copy
it into cells C2, C3 and C4 (note that the 1 will change to 2, 3
or 4 as you copy the formula into each successive cell). Finally,
enter the formula =SUM(C1:C4) into cell C6 to calculate the
chi-squared statistic.
The worksheet should now look like this.
1178 1067.625 11.41097
291 355.875 11.82653
273 355.875 19.29966
156 118.625 11.77568
54.31284
Note that the sum of (O-E) should be zero.
The sum of (O-E)^2/E [in cell C6] gives the required
chi-squared statistic, 54.313.
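Steps 1 and 2 can also be sketched in a few lines of Python as a cross-check on the worksheet. This is an illustrative sketch only; the list names are assumptions, and the numbers are the observed frequencies and the 9 : 3 : 3 : 1 ratio from this example.

```python
# Observed frequencies and hypothesized ratio from the barley example
observed = [1178, 291, 273, 156]
ratio = [9, 3, 3, 1]

total = sum(observed)                                  # like SUM(A1:A4)
expected = [r / sum(ratio) * total for r in ratio]     # column B

# One (O-E)^2/E term per class (column C), then their sum (cell C6)
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_squared = sum(contributions)

# The deviations O-E should sum to zero
deviation_sum = sum(o - e for o, e in zip(observed, expected))

print("expected:", expected)
print("contributions:", [round(c, 5) for c in contributions])
print("chi-squared:", round(chi_squared, 5))
```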

Step 3. Compare the calculated statistic to the appropriate
critical value. If the statistic exceeds
the critical value, reject the hypothesis that the observed
frequencies show a good fit to a 9 : 3 : 3
: 1 ratio.
In this example, the four expected frequencies are required to
sum to 1898. Because of this one
restriction, the chi-squared statistic has 4-1 = 3 degrees of
freedom. People will often choose
critical values that correspond to a 5 % significance level (α =
0.05). You may read the critical
value for the chi-squared distribution with 3 degrees of freedom
and a 5% significance level
directly from a table in a statistical textbook (= 7.82) or use the
following EXCEL commands.
To calculate the critical value, use α = 0.05 and df = 3 as
the arguments for the CHISQ.INV.RT function [Excel 2013: Formula
Tab – Insert Function, then scroll to choose CHISQ.INV.RT from
the statistical list; RT stands for right tail].
=CHISQ.INV.RT(0.05,3) = 7.814725
Rather than comparing the calculated test statistic (54.31284) to
the critical value (7.8147) and concluding that the observed
frequencies do not fit a 9 : 3 : 3 : 1 ratio, you can also
calculate the p-value, the probability of a chi-squared statistic
at least this large if the null hypothesis is really true. If
the calculated test statistic is in cell C6, use CHISQ.DIST.RT
[Excel 2013: Formula Tab – Insert Function, then scroll to choose
CHISQ.DIST.RT from the statistical list; RT stands for right tail].
=CHISQ.DIST.RT(C6,3) = 0.00000000000096
With the p-value formula written in cell D6, some headings
typed in cells C5 and D5, and some formatting of cell D6, the
worksheet now looks like this.
1178 1067.625 11.41097
291 355.875 11.82653
273 355.875 19.29966
156 118.625 11.77568
Chi-square p-value
54.31284 0.0000
Conclusion: Since the calculated value (54.313) exceeds the
critical value (7.8147), reject the hypothesis of a good fit to
a 9 : 3 : 3 : 1 ratio. Equivalently, we reject the hypothesis of
a good fit whenever the significance level (α = 0.05) is greater
than the p-value. In this case, we reject because 0.05 is greater
than the p-value of 0.0000.
Example 15: How to calculate a confidence interval for a
mean when σ is known.
Consider an example where the data are stored in worksheet
column A and you are required to
calculate a 90 % confidence interval for the mean of the data
and σ is given.
Step 1. Calculate the mean of the data, as well as the number of
observations.
Let's store these intermediate results in column D along with
identification in column C.
Type 'Mean' in cell C1 and the formula =AVERAGE(A:A) in
cell D1
Type ‘n’ in cell C2 and the formula =COUNT(A:A) in cell D2
Step 2. The standard deviation of the population is given and is
4.0. This value can be
typed into D3 with a title of ‘St.DevP.’ in C3.

a) Since 90 = 100(1 - α), α = 0.10 and α/2 = 0.05. Determine the
critical value (CV) of
the standard normal distribution corresponding to α/2 = 0.05
from a table or using
EXCEL as follows (CV = 1.645). [Excel 2013: Formula Tab –
Insert Function], scroll to
choose NORM.INV off of statistical list
=NORM.INV(0.95,0,1) = 1.644853
b) Calculate the margin of error as CV multiplied by standard
deviation of the population
and divided by the square root of the sample size.
Type 'E =' in cell C4, and the formula
=NORM.INV(0.95,0,1)*D3/SQRT(D2) in cell D4
c) Calculate the lower and upper limits as mean ± margin of
error.
Type 'LL =' in cell C5, and the formula =D1-D4 in cell D5.
Type 'UL =' in cell C6, and the formula =D1+D4 in cell D6.
29.6 Mean = 30.7285
30.7 St. dev. = 4.0
31.4 n = 35
31.1 E = 1.1122
25.5 LL = 29.6163
34.6 UL = 31.8407

34
31
34
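The margin-of-error arithmetic in this example can be checked with a short Python sketch. It is illustrative only: it starts from the summary values shown in the worksheet above (mean 30.7285, σ = 4.0, n = 35) rather than the raw data, and the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

# Summary values assumed from the worksheet above
mean, sigma, n = 30.7285, 4.0, 35
confidence = 0.90

alpha = 1 - confidence
cv = NormalDist().inv_cdf(1 - alpha / 2)   # like =NORM.INV(0.95,0,1)
margin = cv * sigma / sqrt(n)              # E = CV * sigma / sqrt(n)
lower, upper = mean - margin, mean + margin

print(f"CV = {cv:.6f}")
print(f"E  = {margin:.4f}")
print(f"LL = {lower:.4f}, UL = {upper:.4f}")
```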
Example 16: How to calculate a confidence interval for a
mean when σ is NOT known.
For this example, consider that the sample data are stored in
column A and that you are required
to calculate a 95 % confidence interval for the mean of the
population from which the sample
was taken. This example is very similar to Example 15. There
are two differences: first you will
have to use an estimate of the population standard deviation
because σ is not given. Secondly,
we will use the t distribution to find our critical value. We will
use the function T.INV, rather than NORM.INV, to calculate the
critical value for the margin of error. Note that T.INV takes a
LEFT-tail probability; when given α/2 it therefore always returns
the negative (left-tail) critical value.
Step 1. Calculate n, MEAN and STDEV.S
Let's store these intermediate results in column D along with
identification in column C.
Type 'Mean =' in cell C1, and the formula =AVERAGE(A:A)
in cell D1
Type 'St. Dev. =' in cell C2, and the formula =STDEV.S(A:A)

in cell D2
Type 'n =' in cell C3, and the formula =COUNT(A:A) in cell
D3
Step 2. Because the standard deviation is estimated from sample
data, determine the α/2 critical value (CV) from the
t-distribution with 24 - 1 = 23 degrees of freedom. For a 95%
confidence interval, α/2 = 0.025. Read the value from a table of
critical values for the t-distribution (= 2.069) or calculate it
using the EXCEL T.INV function [Excel 2013: Formula Tab – Insert
Function, then scroll to choose T.INV from the statistical list].
Note that for questions which have sample sizes of 76 or larger,
we must use the T.INV function to get the correct CV (ISLeX will
mark an approximation as incorrect).
=T.INV(0.025,23) = -2.068655
Step 3. Calculate margin of error = E = CV * STDEV.S /
SQRT(n)
Type 'E = ' in cell C4, and the formula
=T.INV(0.025,D3-1)*D2/SQRT(D3) in cell D4
Step 4. Calculate lower limit = mean – margin of error and
upper limit = mean + margin
of error.
Type 'LL =' in cell C5, and the formula =D1+D4 in cell D5 (it is +
because E is calculated using the critical value in the left tail
and is a negative number).
Type 'UL =' in cell C6, and the formula =D1-D4 in cell D6 (it is -
because E is calculated using the critical value in the left tail
and is a negative number).
29.6 Mean = 30.9875
30.7 St. dev. = 2.788465
31.4 n = 24
31.1 E = -1.17747
25.5 LL = 29.8100
34.6 UL = 32.1649
34
31
Example 17: How to calculate a confidence interval for a
proportion
A proportion is the number of observations in one class
expressed as a proportion of the

total number of observations.
Consider that there are n = 978 observations of which 123 are
in the first class and the
remaining 855 are in the second class. Further, consider the
proportion p̂ = 123/978 = 0.12577
that are in the first class.
This example shows how to calculate a 95 % confidence
interval for the proportion that
are in the first class in the population from which these 978
observations were randomly taken.
The following steps can be used to calculate a confidence
interval for a proportion.
a) Calculate the estimated standard error of the proportion:
$s_{\hat{p}} = \sqrt{\dfrac{\hat{p}\,\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$
In this example, let's use columns A and B of a new worksheet
for the calculations.
Type ' p̂ = ' in cell A1, and the formula =123/978 in cell B1
Type 'st.dev. = ' in cell A2, and the formula
=SQRT(B1*(1-B1)/978) in cell B2
b) Get the critical value for a standard normal (z) distribution
for confidence level 1 - α = 0.95 or α/2 = 0.025. Use NORM.INV
with 1 - α/2 = 0.975. Calculate the margin of error by multiplying
the critical value by the standard error.
Type 'CV =' in cell A3, and the formula
=NORM.INV(0.975,0,1) in cell B3.
Type 'E =' in cell A4, and the formula =B2*B3 in cell B4.
c) Calculate lower limit = estimate – margin of error
and upper limit = estimate + margin of error.
Type 'LL = ' in cell A5, and the formula =B1-B4 in cell B5.
Type 'UL =' in cell A6, and the formula =B1+B4 in cell B6.
p̂ = 0.125767
st.dev. = 0.010603
cv = 1.959961
E = 0.020781
LL = 0.104985
UL = 0.146548
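Steps a) to c) can be sketched in Python as a cross-check. This is a minimal illustration with variable names chosen here; it uses the counts from this example (123 of 978). Python's normal quantile differs from the worksheet value of cv in the sixth decimal place, which does not affect the limits at the precision shown.

```python
from math import sqrt
from statistics import NormalDist

n, in_first_class = 978, 123              # counts from this example

p_hat = in_first_class / n                # like =123/978
se = sqrt(p_hat * (1 - p_hat) / n)        # standard error of the proportion
cv = NormalDist().inv_cdf(0.975)          # like =NORM.INV(0.975,0,1)
margin = cv * se                          # E
lower, upper = p_hat - margin, p_hat + margin

print(f"p-hat = {p_hat:.6f}, se = {se:.6f}")
print(f"E = {margin:.6f}, LL = {lower:.6f}, UL = {upper:.6f}")
```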
Example 18: How to calculate a test of hypothesis concerning
one mean when σ is NOT known.
In tests of hypothesis, we are interested in evaluating
assertions about population
parameters in light of the evidence we have in a sample taken
from that population.

In this example, we look at hypotheses concerning the mean of
a population.
Step 1. Make an assertion, the null hypothesis, that the mean is
equal to some value.
Consider the possible alternative(s).
H0 : population mean = 3
H1 : population mean > 3 {one-tailed (right) alternative}
or population mean < 3 {one-tailed (left) alternative}
or population mean ≠ 3 {two-tailed alternative}
In this example, consider H0: mean = 3 and H1 : mean > 3. This
will be a right-tailed test.
Step 2. Calculate the sample mean, size and standard deviation.
Suppose that the data is stored in column A of an EXCEL
worksheet. Let's use columns C and D
to store identification and intermediate and final results.
Type 'Mean =' in cell C1, and the formula =AVERAGE(A:A)
in cell D1.
Type 's =' in cell C2, and the formula =STDEV.S(A:A) in cell
D2.
Type 'n =' in cell C3, and the formula =COUNT(A:A) in cell
D3.
Step 3. Calculate the test statistic t as:
$t_{calculated} = \dfrac{\bar{x} - \mu_0}{s_{\bar{x}}}$, where $\mu_0$ is the mean from the null hypothesis,
and $s_{\bar{x}} = \dfrac{s}{\sqrt{n}}$
Type 't<calc> = ' in cell C4, and the formula
=(D1-3)/(D2/SQRT(D3)) in cell D4.
Step 4. Calculate the critical value of the t-distribution for the
degrees of freedom appropriate for
this sample, for the desired significance level (α), and for the
appropriate alternative hypothesis.
Consider a right-tailed test at α = 0.05.
Type 't<table> =' in cell C5, and the formula
=T.INV(0.95,D3-1) in cell D5.
Type 'p-value =' in cell C6, and the formula
=T.DIST.RT(D4,D3-1) in cell D6.
Mean = 4.88
s = 0.28
n = 22
t<calc> = 1.431491
t<table> = 1.720744
p-value = 0.083503
Note that the probability in T.INV(0.95,D3-1) is 0.95 (= 1 - α)
because the alternative hypothesis, in this case, is in the right
tail. Also, T.DIST.RT(D4,D3-1) uses .RT because the alternative
hypothesis is in the right tail.

Step 5. Compare the calculated test statistic to the critical
value and decide whether or not to
reject the null hypothesis that the population mean equals the
specified value.
In this case, 1.431491 is less than 1.720744 and we do not
reject the null hypothesis that the population mean is equal to
3. Equivalently, the p-value = 0.083503 is greater than α = 0.05,
so we do not reject the null hypothesis.
What if the alternative hypothesis was less than 3?
In the case of a one-tailed (left) alternative:
H1 : population mean < 3 {one-tailed (left) alternative}
If the alternative hypothesis is that the mean is really less than
3, we would compare the test statistic to a critical value of
-1.7207 (the negative of that used for the one-tailed upper
alternative). We could find the critical value for the left tail
by using T.INV(0.05,D3-1).
We would reject the null hypothesis only if the test statistic was
more negative than the lower critical value. In this example,
1.431491 is not less than -1.7207 and we do not reject the null
hypothesis. The p-value for the left tail is the area below the
test statistic, T.DIST(1.431491,22-1,TRUE) = 1 - 0.083503 =
0.916497. This p-value is greater than α = 0.05, so we do not
reject the null hypothesis.

What if the alternative hypothesis was not equal to 3?
In the case of a two-tailed alternative:
H1 : population mean ≠ 3 {two-tailed alternative}
For the two-tailed alternative, we need two critical values (one
for each tail). Using T.INV.2T with α = 0.05 will give the
positive critical value for a two-tailed test with α split
equally between the two tails.
=T.INV.2T(0.05,D3-1) = 2.079614
The lower critical value is the negative of the upper critical
value, i.e. -2.079614.
The decision rule in this case is to reject the null hypothesis if
the test statistic is smaller than the lower critical value or
greater than the upper critical value. In this example, the test
statistic is between the lower and upper critical values for a
two-tailed test and we do not reject the null hypothesis. To find
the p-value for both tails, use
=T.DIST.2T(1.431491,22-1) = 0.16700
The p-value for a two-tailed test (0.16700) is greater than
α = 0.05, so we do not reject the null hypothesis.

Example 19: Confidence intervals and tests of hypothesis for
differences between two means when population variance is
unknown and unequal.
When we have large sample sizes with unknown variances that
are unequal, we can use
the normal distribution as an approximation to the t distribution.
For this example, 49 individuals with anorexia nervosa were
bulimic and had an average
depression score of 30.0 (standard deviation = 5.9) while 56
individual were non-bulimic and
had an average depression score of 27.0 (standard deviation =
5.4).
i) Calculate a 90 % confidence interval for the difference in
depression score.
a) Record sample sizes, means and standard deviations as
constants in EXCEL cells.
Type 'Sample 1:' in cell A1.
Type 'Mean1 =' in cell B2, and the number 30.0 in cell C2.
Type 's1 =' in cell B3, and the number 5.9 in cell C3.
Type 'n1 =' in cell B4, and the number 49 in cell C4.
Type 'Sample 2:' in cell A5.
Type 'Mean2 =' in cell B6, and the number 27.0 in cell C6.
Type 's2 =' in cell B7, and the number 5.4 in cell C7.
Type 'n2 =' in cell B8, and the number 56 in cell C8.
b) Calculate the standard error of the difference between the
two sample means.
Type 'se<diff> =' in cell B10,
and the formula =SQRT(C3*C3/C4+C7*C7/C8) in cell C10.

c) Determine the critical value for a 90% confidence interval.
In this case we will use the standard normal distribution to
approximate the t value because the sample sizes are so large.
For a 100(1-α)% CI, use the NORM.INV function with 1-α/2 (here
α = 0.10, so 1-α/2 = 0.95).
Type 'cv = ' in cell B11, and the formula
=NORM.INV(0.95,0,1) in cell C11.
d) Calculate margin of error of difference = critical value ×
standard error.
Type 'E =' in cell B12, and the formula =C11*C10 in cell C12.
e) Calculate lower limit = difference – margin of error
and upper limit = difference + margin of error.
Type 'LL =' in cell B13, and the formula =C2-C6-C12 in cell
C13.
Type 'UL =' in cell B14, and the formula =C2-C6+C12 in cell
C14.
The following is a copy of the first 14 rows of columns A, B
and C.
Bulimics
Mean1 = 30
s1 = 5.9
n1 = 49
Non-bulimics
Mean2 = 27
s2 = 5.4
n2 = 56
se<diff> = 1.10956
cv = 1.644853
E = 1.825062
LL = 1.174938
UL = 4.825062
ii) Test the hypothesis that the mean depression scores for the
two groups are equal against an
alternative that they are not equal.
a) Calculate the test statistic (z) by dividing the difference
between the means minus zero
(for no difference from the null hypothesis) by the standard
error of the difference.
=(C2-C6-0)/C10 = 2.70378
b) Compare the calculated test statistic to a critical value that
correctly reflects your

choice of significance level and the form of the alternative
hypothesis.
For a two-tailed test, use NORM.INV with 1 - α/2. For one-
tailed test, use NORM.INV
with 1 - α. In this example, consider α = 0.05 and a two-tailed
test.
=NORM.INV(0.975,0,1) = 1.959961
Since the test statistic (2.70378) is greater than the upper
critical value for the two-tailed test, reject the null
hypothesis that the mean depression score is the same for both
groups.
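Both parts of this example can be reproduced with a short Python sketch. It is illustrative only, with variable names chosen here; it uses the summary statistics from this example and the same large-sample normal approximation.

```python
from math import sqrt
from statistics import NormalDist

mean1, s1, n1 = 30.0, 5.9, 49   # bulimic group
mean2, s2, n2 = 27.0, 5.4, 56   # non-bulimic group

diff = mean1 - mean2
se_diff = sqrt(s1**2 / n1 + s2**2 / n2)    # like =SQRT(C3*C3/C4+C7*C7/C8)

# i) 90% confidence interval: alpha = 0.10, so use 1 - alpha/2 = 0.95
cv_90 = NormalDist().inv_cdf(0.95)
margin = cv_90 * se_diff
lower, upper = diff - margin, diff + margin

# ii) two-tailed test at alpha = 0.05: use 1 - alpha/2 = 0.975
z_calc = (diff - 0) / se_diff
cv_test = NormalDist().inv_cdf(0.975)
reject_null = abs(z_calc) > cv_test

print(f"se = {se_diff:.5f}, E = {margin:.6f}")
print(f"LL = {lower:.6f}, UL = {upper:.6f}")
print(f"z = {z_calc:.5f}, reject H0: {reject_null}")
```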
Example 20: Confidence intervals and tests of hypothesis for
differences between two means for
independent samples: population variances are unknown but
equal.
For this example, 25 men had an average decrease in systolic
blood pressure of 8.9 units
(standard deviation = 6.2) due to transcendental meditation. For
25 women, the average decrease
was 5.0 units (standard deviation = 6.0).
i) Calculate a 95% confidence interval for the difference in
average decrease.

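a) Record sample sizes, means and standard deviations as
EXCEL constants.
Type 'Men:' in cell A1.
Type 'Mean1 =' in cell B2, and the number 8.9 in cell C2.
Type 's1 =' in cell B3, and the number 6.2 in cell C3.
Type 'n1 =' in cell B4, and the number 25 in cell C4.
Type 'Women:' in cell A5.
Type 'Mean2 =' in cell B6, and the number 5.0 in cell C6.
Type 's2 =' in cell B7, and the number 6.0 in cell C7.
Type 'n2 =' in cell B8, and the number 25 in cell C8.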
b) Calculate the pooled variance for the two samples (assumed
to be same in both
populations).
Type 'Var(pooled) = ' in cell B10,
and the formula =((C4-1)*C3*C3+(C8-1)*C7*C7)/(C4-1+C8-1) in cell C10.
c) Calculate the standard error of the difference between the
two means.
Type 'se<diff> =' in cell B11, and the formula
=SQRT(C10*(1/C4+1/C8)) in cell C11.
The pooled variance and the standard error of the difference are
$s_p^2 = \dfrac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{(n_1 - 1) + (n_2 - 1)}$,
$s_{\bar{x}_1 - \bar{x}_2} = \sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}$
where
n1 = size, sample 1
n2 = size, sample 2
s1 = st.dev, sample 1
s2 = st.dev, sample 2
d) Calculate the α/2 critical value for the t-distribution with
(n1 - 1 + n2 - 1) degrees of freedom, which applies because the
population variances are assumed equal. Using T.INV.2T with α
will give the positive critical value for a two-tailed test with
α split equally between the two tails.
=T.INV.2T(0.05,C4+C8-2) = 2.01064
Type 'cv = ' in cell B12, and the formula
=T.INV.2T(0.05,C4+C8-2) in cell C12
e) Calculate the margin of error = critical value × standard
error of the difference.
Type 'E =' in cell B13, and the formula =C12*C11 in cell C13.
f) Calculate lower limit = difference between means – margin
of error
and upper limit = difference between means + margin of error.
Type 'LL =' in cell B14, and the formula =C2-C6-C13 in cell
C14.
Type 'UL =' in cell B15, and the formula =C2-C6+C13 in cell
C15.

Men:
Mean1 = 8.9
s1 = 6.2
n1 = 25
Women:
Mean2 = 5.0
s2 = 6.0
n2 = 25
Var(pooled) = 37.22
se<diff> = 1.7256
cv = 2.0106
E = 3.4695
LL = 0.4305
UL = 7.3695
On average, transcendental meditation resulted in a greater
decrease (3.9 units) in blood
pressure for men than for women. We are 95% confident that
the population difference is
between 0.4 and 7.4 units.

ii) Test the hypothesis that the decrease in blood pressure is the
same in men as in women.
Use a 5% significance level and a two-tailed alternative
hypothesis.
a) Calculate the test statistic as difference/standard error of
difference.
=((C2-C6)-0)/C11 = 2.26012
b) Compare to critical values from the t-distribution with n1 +
n2 - 2 = 48 degrees of
freedom and α = 0.05.
=T.INV.2T(0.05,C4+C8-2)= 2.01064, therefore the critical
values for a two-
tailed test are -2.01064 and 2.01064.
Since test statistic = 2.26012 is greater than the upper critical
value of 2.01064, reject the
null hypothesis that the decrease in blood pressure is the same
for both sexes.
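The pooled-variance calculations in this example can be sketched in Python as follows. This is a minimal illustration with variable names chosen here. The Python standard library has no t-distribution quantile function, so the critical value is copied from the T.INV.2T result in the text rather than computed.

```python
from math import sqrt

mean1, s1, n1 = 8.9, 6.2, 25    # men
mean2, s2, n2 = 5.0, 6.0, 25    # women

df = (n1 - 1) + (n2 - 1)
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))

# Critical value copied from =T.INV.2T(0.05,48) in the text (not computed here)
t_crit = 2.01064

diff = mean1 - mean2
margin = t_crit * se_diff
lower, upper = diff - margin, diff + margin

t_calc = (diff - 0) / se_diff
reject_null = abs(t_calc) > t_crit

print(f"Var(pooled) = {pooled_var:.2f}, se = {se_diff:.4f}")
print(f"E = {margin:.4f}, LL = {lower:.4f}, UL = {upper:.4f}")
print(f"t = {t_calc:.5f}, reject H0: {reject_null}")
```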

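Example 21: Confidence intervals and tests of hypothesis for
differences between two proportions.
Of 1500 people from a high-income group, 62.4 % were
registered to vote. Of 1500 in a low-income group, 58.2 % were
registered to vote.
i) Calculate a 95 % confidence interval for the difference in
voter registration between high-income and low-income groups.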
a) Store n1, p1, n2 and p2 as EXCEL constants.
Type 'n1 =' in cell A1, and '1500' in cell B1.
Type ' p̂ 1 =' in cell A2, and '0.624' in cell B2.
Type 'n2 =' in cell A3, and '1500' in cell B3.
Type ' p̂ 2 =' in cell A4, and '0.582' in cell B4.
b) Calculate the standard error of the difference of the two
population proportions:
Type 'sd<diff> = ' in cell A6,
and the formula =SQRT(B2*(1-B2)/B1+B4*(1-B4)/B3) in cell
B6.
c) Determine the critical value of the standard normal
distribution corresponding to α/2 =
0.025 and 1-α/2 = 0.975 [required for a (1 - α) = 0.95
confidence interval].
Type 'cv =' in cell A7, and the formula
=NORM.INV(0.975,0,1) in cell B7.
d) Calculate margin of error = critical value × standard error of
the difference.

Type 'E =' in cell A8, and the formula =B7*B6 in cell B8.
e) Calculate lower limit = difference in proportion – margin of
error
and upper limit = difference in proportion + margin of error.
Type 'LL =' in cell A9, and the formula =B2-B4-B8 in cell B9.
Type 'UL =' in cell A10, and the formula =B2-B4+B8 in cell
B10.
$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{\hat{p}_1 (1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2 (1 - \hat{p}_2)}{n_2}}$
n1 = 1500
p1 = 0.624
n2 = 1500
p2 = 0.582
sd<diff> = 0.017849
cv = 1.959961
E = 0.034984
LL = 0.007016
UL = 0.076984
Using a 95% confidence interval, the difference in voter
registration between high-
income and low-income groups is between 0.007 and 0.077 (0.7
to 7.7 %).
ii) Test the hypothesis that the high-income group has a higher
voter registration than the low-income group. Use α = 0.05.
a) The test statistic must be calculated as if the null hypothesis
were true. Thus, we need to calculate the pooled (weighted
average) proportion of voter registration.
=(B1*B2+B3*B4)/(B1+B3) = 0.603000
Type 'p<pooled> =' in cell A12, and the formula
=(B1*B2+B3*B4)/(B1+B3) in cell B12
b) Use the pooled proportion to calculate new standard error of
a difference.
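$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\bar{p}\,(1 - \bar{p}) \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}$, where $\bar{p}$ is the pooled proportion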
Type 'sd =' in cell A13,
and the formula =SQRT(B12*(1-B12)*(1/B1+1/B3)) in cell
B13.
c) Calculate the test statistic
Type 'z<calc> =' in cell A14, and the formula =(B2-B4-0)/B13
in cell B14.

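The pooled proportion and the test statistic are
$\bar{p} = \dfrac{n_1 \hat{p}_1 + n_2 \hat{p}_2}{n_1 + n_2}$ and $z = \dfrac{\hat{p}_1 - \hat{p}_2 - 0}{sd}$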
d) Calculate the critical value for a one-tailed (upper) test at
α = 0.05.
Type 'cv =' in cell A15, and the formula =NORM.INV(0.95,0,1)
in cell B15.
As an alternative, the p-value can be calculated.
Type 'p-value =' in cell A16,
and the formula =1-NORM.DIST(B14,0,1,TRUE) in cell B16.
The following are the results.
p<pooled> = 0.603
sd = 0.017866
z<calc> = 2.350856
cv = 1.644853
p-value = 0.009365
Since z = 2.35086 is greater than zα = 1.6449, reject the null
hypothesis. Equivalently, because the p-value = 0.009365 is less
than α = 0.05, the null hypothesis is rejected.
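The confidence interval and the one-tailed test for two proportions can be cross-checked with the following Python sketch. It is illustrative only, with variable names chosen here. Note that it uses two different standard errors, as in the text: an unpooled one for the interval and a pooled one for the test.

```python
from math import sqrt
from statistics import NormalDist

n1, p1 = 1500, 0.624    # high-income group
n2, p2 = 1500, 0.582    # low-income group
z = NormalDist()        # standard normal distribution

# i) 95% confidence interval (unpooled standard error)
se_ci = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
margin = z.inv_cdf(0.975) * se_ci
lower, upper = (p1 - p2) - margin, (p1 - p2) + margin

# ii) one-tailed (upper) test at alpha = 0.05 (pooled standard error)
p_pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
se_test = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z_calc = (p1 - p2 - 0) / se_test
z_crit = z.inv_cdf(0.95)
p_value = 1 - z.cdf(z_calc)    # like =1-NORM.DIST(B14,0,1,TRUE)

print(f"CI: {lower:.6f} to {upper:.6f}")
print(f"z = {z_calc:.6f}, cv = {z_crit:.6f}, p-value = {p_value:.6f}")
```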
Example 22: How to carry out a one-way analysis of variance.
A one-way analysis of variance should be used where data can
be grouped by only one
criterion. This type of design is sometimes called a completely
random design because
treatments are assigned randomly to all available experimental
units.
For this example, consider the mercury concentration

(micrograms per gram of body
weight) of fish living 5.5 km upstream from a chloralkali plant
(treatment 1), 3.7 km downstream
from the plant (treatment 2), 21 km downstream (treatment 3),
or 133 km downstream (treatment
4). Consider that the treatment number for each of 40 fish has
been read into column A and that
the mercury concentration has been read into column B.
The ANOVA procedure should be used to carry out this
analysis of variance.
In this example, the variable to be analyzed, mercury
concentration, is stored in column B
and the classification variable (treatment) is stored in column A
on an EXCEL worksheet.
Before analysis can begin, it is necessary to copy data for the
different treatments into different columns of the EXCEL
worksheet. In this example, the data for the four treatments are
stored in columns F, G, H and I. First add the labels 'Trt 1' in
cell F1, 'Trt 2' in cell G1, 'Trt 3' in cell H1, and 'Trt 4' in
cell I1. Then select all the data in column B that belong to
Trt 1, copy them, and use Edit->Paste Special to paste the values
into cells F2 through F11. Repeat the same procedure for
treatments 2, 3 and 4. Prior to analysis of variance, the data
are arranged as follows:
Trt 1 Trt 2 Trt 3 Trt 4
23.84 26.92 29.20 32.73
23.58 26.68 29.70 32.88
23.42 26.91 29.11 32.90

23.74 26.26 29.02 32.08
23.23 26.72 29.19 32.80
23.01 26.05 29.06 32.96
23.14 26.12 29.39 32.22
23.31 26.86 29.68 32.31
23.02 26.87 29.69 32.13
23.79 26.31 29.78 32.35
To perform a one-way ANOVA, choose Anova: Single Factor
to open the single factor
anova dialog box. [Excel 2013: Data Tab – Data Analysis –
Anova: Single Factor]
Set the Input range to $F$1:$I$11
Grouped by to Columns and select Labels in first row.
Click OK.

The following results appear on a new worksheet.
Anova: Single Factor
SUMMARY
Groups Count Sum Average Variance
Trt 1 10 234.0743 23.40743 0.099489
Trt 2 10 265.6906 26.56906 0.120285
Trt 3 10 293.8229 29.38229 0.090206
Trt 4 10 325.3521 32.53521 0.121919
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 456.1535 3 152.0512 1408.211 2.34E-37 2.866265
Within Groups 3.887088 36 0.107975
Total 460.0405 39
The degrees of freedom (DF) for differences among the four
treatment groups is equal to
one less than the number of treatments (4 – 1 = 3). The degrees
of freedom for error is equal to
the sum over four treatments of the number of individuals in

each treatment minus one [(10 - 1)
+ (10 - 1) + (10 - 1) + (10 - 1) = 36]. The degrees of freedom
for the total sum of squares is equal
to the total number of observations minus one (39 = 40 - 1).
The mean square for treatment groups is equal to the sum of
squares for treatment groups
divided by its degrees of freedom (152.0512 = 456.1535/3).
This mean square is a measure of
variation among the four groups of fish. The error mean square
is equal to the error sum of
squares divided by its degrees of freedom (0.107975 =
3.887088/36). It measures the average
(pooled) variation among individuals within treatments. The
error mean square is the estimate of
pooled variance and will be used for calculating confidence
intervals or tests of hypothesis about
treatment means.
The F-ratio is calculated by dividing the treatment mean square
by the error mean square
(1408.211 = 152.0512/0.107975). The F-ratio is the test statistic
for testing the null hypothesis
that all four treatments have the same mean. The alternative
hypothesis is that not all four
treatments have the same mean.
NOTE
The alternative hypothesis sounds like a two-tailed hypothesis.
However, only the upper
tail of the F distribution is considered when evaluating the
significance of an F statistic. Only the
upper tail is used because the F statistic is calculated from
squares of differences. Squares of
differences will be positive regardless of whether the
differences are positive or negative.

The F-statistic has a numerator degrees of freedom equal to the
degrees of freedom that
correspond to the numerator mean square (3, in this example)
and a denominator degrees of
freedom equal to the degrees of freedom associated with the
error (36, in this example). To test
the hypothesis that all treatments have the same mean, one
should compare the calculated F-
statistic to the critical value of the F-distribution corresponding
to 3 and 36 degrees of freedom
and a suitable significance level (0.01 or 0.05 are most
common).
The critical value of the F-distribution can be determined by
reference to a statistical
table. EXCEL gives the correct critical F-value for the test.
Since the calculated F-value (1408.211) greatly exceeds the
critical value (2.8663), we reject the null hypothesis and
conclude that there were differences among the treatments in
average mercury concentration.
An alternative to comparing the calculated F-value to a critical
value is to compare the p-
value (2.34E-37 = 0.0000 to four decimal places) to the
significance level (α = 0.05). Since
0.0000 is much less than 0.05, we reject the null hypothesis.
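The sums of squares, mean squares and F-ratio described above can be rebuilt from the SUMMARY block alone. The following Python sketch is illustrative only, with names chosen here. It starts from the group counts, averages and variances reported by EXCEL, so its results agree with the ANOVA table only up to the rounding of those summary values.

```python
# (count, average, variance) per group, copied from the SUMMARY block
groups = {
    "Trt 1": (10, 23.40743, 0.099489),
    "Trt 2": (10, 26.56906, 0.120285),
    "Trt 3": (10, 29.38229, 0.090206),
    "Trt 4": (10, 32.53521, 0.121919),
}

k = len(groups)
n_total = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * m for n, m, _ in groups.values()) / n_total

# Between-groups SS: variation of the group means about the grand mean
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups.values())
# Within-groups SS: (n - 1) * variance, summed over the groups
ss_within = sum((n - 1) * v for n, _, v in groups.values())

df_between = k - 1
df_within = n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within      # error mean square (pooled variance)
f_ratio = ms_between / ms_within

print(f"SS between = {ss_between:.4f} (df = {df_between})")
print(f"SS within  = {ss_within:.6f} (df = {df_within})")
print(f"MS between = {ms_between:.4f}, MS within = {ms_within:.6f}")
print(f"F = {f_ratio:.2f}")
```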

Example 23: Is MIA.
Example 24: How to use information from analysis of variance
to calculate confidence intervals
or test hypotheses about treatment means (including least
significant difference) using data from
Example 22.
For these examples, consider an analysis of variance {Example
22} that has an error
mean square of 0.107975 with 36 degrees of freedom. Consider
treatment 2 with a mean (of 10
observations) equal to 26.569 and treatment 3 (also 10
observations) with a mean of 29.382.
IMPORTANT
Confidence intervals and tests of hypotheses about means in an
analysis of variance will
always use the error mean square as the estimate of the pooled
variance.
a) Store the error degrees of freedom, the error mean square
(pooled variance) and means in
EXCEL worksheet cells.

Type 'df =' in cell A1, and '36' in cell B1.
Type 'ems =' in cell A2, and '0.107975' in cell B2.
Type 't2 =' in cell A3, and '26.569' in cell B3.
Type 't3 =' in cell A4, and '29.382' in cell B4.
df = 36
ems = 0.107975
t2 = 26.569
t3 = 29.382
b) Calculate a 90% confidence limit for the mean of treatment 2
using the method when σ is not
known as described in example 16.
Standard error of one mean = square root of (error mean
square/sample size).
Type 'sd =' in cell A6, and the formula =SQRT(B2/10) in cell
B6.
Get critical value for α = 1 - 0.90 = 0.10 and error degrees of
freedom.
Type 'cv =' in cell A7, and the formula =T.INV.2T(0.10,36) in
cell B7.
Limits = mean ± critical value x standard error of mean.
Type 'LL =' in cell A8, and the formula =B3-B6*B7 in cell B8,
Type 'UL =' in cell A9, and the formula =B3+B6*B7 in cell
B9.
sd = 0.103911
cv = 1.688297
LL = 26.39357
UL = 26.74443

c) Test the hypothesis that the two means are not different using
the method described in
example 20. Use α = 0.05 and consider a two-tailed alternative
hypothesis.
Type 't =' in cell A11, and the formula
=(B3-B4)/SQRT(B2/10+B2/10) in cell B11.
Type 'cv =' in cell A12, and the formula = T.INV.2T(0.05,36) in
cell B12.
t = -19.1423
cv = 2.028091
Since the calculated test statistic (-19.1423) is outside the
range of -2.028091 to
2.028091, we reject the hypothesis that the two means are equal.
d) The least significant difference is the margin of error for a
confidence interval for the difference between two means,
provided both means are based on the same sample size. To
calculate an LSD(α), we use the Error Mean Square and the
sample size (remember all sample sizes are the same).
$LSD(\alpha) = t_{\alpha/2,\,edf} \sqrt{\dfrac{2 \times ErrorMeanSquare}{n}}$
Type 'LSD(0.05) =' in cell A14, and the formula
=T.INV.2T(0.05,36)*SQRT(2*B2/10) in cell B14.
LSD(0.05) = 0.298033
If the absolute value of the difference between two means is
greater than the least
significant difference, we reject the hypothesis that the two
means are equal.
For treatments 2 and 3, the difference is 26.569 - 29.382 = -
2.813 with absolute value
2.813. Since 2.813 is greater than LSD(0.05) = 0.298033, we
reject the hypothesis that
treatments 2 and 3 are equal.
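The test statistic used in part c) is
$t = \dfrac{\bar{x}_2 - \bar{x}_3}{\sqrt{\dfrac{s^2}{n_2} + \dfrac{s^2}{n_3}}}$
where $s^2$ is the error mean square.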
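The test of two treatment means and the least significant difference can be cross-checked with a short Python sketch. It is illustrative only, with variable names chosen here. The t critical value is copied from the cv result shown in part c) because the Python standard library has no t-distribution quantile function.

```python
from math import sqrt

ems = 0.107975                    # error mean square from Example 22
n = 10                            # observations per treatment
mean_t2, mean_t3 = 26.569, 29.382

# Critical value copied from the worksheet result for =T.INV.2T(0.05,36)
t_crit = 2.028091

se_diff = sqrt(ems / n + ems / n)          # same as sqrt(2 * ems / n)
t_calc = (mean_t2 - mean_t3) / se_diff     # part c)
lsd = t_crit * sqrt(2 * ems / n)           # part d)

abs_diff = abs(mean_t2 - mean_t3)
means_differ = abs_diff > lsd

print(f"t = {t_calc:.4f}")
print(f"LSD(0.05) = {lsd:.6f}")
print(f"|difference| = {abs_diff:.3f}, reject equality: {means_differ}")
```

Comparing the absolute difference with the LSD gives the same decision as comparing the t statistic with the critical value, since the LSD is the critical value multiplied by the same standard error.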
Example 25: How to perform a two-way analysis of variance.
When data are classified according to two criteria, and when
there is more than one
observation in each combination of the two criteria, a two-way
analysis of variance includes a
term for the interaction between the two classification factors.
Data for this example consist of
the number of diatoms found in a stream at each of two
locations (1 = upstream, 2 = downstream
from a water treatment plant) with sampling occurring in three
different weeks. For each
observation, the site designation is stored in column A, the

week designation in column B, and
the number of diatoms in column C.
Site Week Number
1 1 689
1 1 756
1 2 831
1 2 916
1 3 558
1 3 423
2 1 204
2 1 229
2 2 56
2 2 73
2 3 34
2 3 78
First, arrange the data in a two-way table like this.
Site 1 Site 2
Week 1 689 204
756 229
Week2 831 56
916 73
Week3 558 34
423 78
To perform a two-way Anova use: Anova: Two-Factor With
Replication [Excel 2013: Data
Tab – Data Analysis – Anova: Two Factor With Replication]

In Input Range:, indicate the cells that contain the data and the
labels. For example, if the first
seven rows of columns E, F and G contain the two-way table of
data, specify the input range as
E1:G7.
Set Rows per sample: to 2 and click OK.
Here are the results from this EXCEL analysis.
Anova: Two-Factor With Replication
SUMMARY Site 1 Site 2 Total
Week 1
Count 2 2 4
Sum 1445 433 1878
Average 722.5 216.5 469.5
Variance 2244.5 312.5 86197.67
Week 2
Count 2 2 4
Sum 1747 129 1876
Average 873.5 64.5 469
Variance 3612.5 144.5 219412.7
Week 3
Count 2 2 4
Sum 981 112 1093
Average 490.5 56 273.25
Variance 9112.5 968 66290.25
Total
Count 6 6
Sum 4173 674
Average 695.5 112.3333
Variance 32769.1 6809.867
ANOVA
Source of Variation SS df MS F P-value F crit
Sample 102443.2 2 51221.58 18.74589 0.002626 5.143249
Columns 1020250 1 1020250 373.3874 1.24E-06 5.987374
Interaction 79057.17 2 39528.58 14.46653 0.005067 5.143249
Within 16394.5 6 2732.417
Total 1218145 11

F = 373.39 has 1 and 6 degrees of freedom and can be used to
test the hypothesis that
there is no difference between the upstream and downstream
sites. F = 18.75 has 2 and 6 degrees
of freedom and can be used to test the hypothesis that there
were no differences among weeks. F
= 14.47 has 2 and 6 degrees of freedom and can be used to test
the hypothesis that the differences
between sites (if any) were the same in all three weeks (i.e., no
interaction between the two
factors).
Since all three p-values were less than 0.05, we would reject all
three null hypotheses.
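The sums of squares and F values in the ANOVA table can be reproduced by hand. The Python sketch below is one way to do it with the usual textbook formulas for a balanced two-way design with replication; it is a cross-check, not a description of how Excel computes the table internally. P-values are left out because the Python standard library has no F distribution, so each F is compared with the F crit value copied from the output above.

```python
# Diatom counts: data[week][site] holds the two replicates for that cell
data = {
    1: {1: [689, 756], 2: [204, 229]},
    2: {1: [831, 916], 2: [56, 73]},
    3: {1: [558, 423], 2: [34, 78]},
}

def mean(values):
    return sum(values) / len(values)

weeks = sorted(data)
sites = sorted(data[weeks[0]])
reps = len(data[weeks[0]][sites[0]])  # rows per sample

all_values = [v for w in weeks for s in sites for v in data[w][s]]
grand = mean(all_values)

# Sums of squares, named as in the Excel output
ss_total = sum((v - grand) ** 2 for v in all_values)
ss_sample = reps * len(sites) * sum(
    (mean([v for s in sites for v in data[w][s]]) - grand) ** 2 for w in weeks)
ss_columns = reps * len(weeks) * sum(
    (mean([v for w in weeks for v in data[w][s]]) - grand) ** 2 for s in sites)
ss_within = sum(
    (v - mean(data[w][s])) ** 2 for w in weeks for s in sites for v in data[w][s])
ss_interaction = ss_total - ss_sample - ss_columns - ss_within

df_sample = len(weeks) - 1
df_columns = len(sites) - 1
df_interaction = df_sample * df_columns
df_within = len(weeks) * len(sites) * (reps - 1)

ms_within = ss_within / df_within
f_sample = (ss_sample / df_sample) / ms_within
f_columns = (ss_columns / df_columns) / ms_within
f_interaction = (ss_interaction / df_interaction) / ms_within

# F crit values copied from the Excel output (alpha = 0.05)
for name, f, f_crit in [("Sample", f_sample, 5.143249),
                        ("Columns", f_columns, 5.987374),
                        ("Interaction", f_interaction, 5.143249)]:
    print(f"{name}: F = {f:.4f}, exceeds F crit: {f > f_crit}")
```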
Average of Number Site
Week 1 2 Grand Total
1 722.50 216.50 469.50
2 873.50 64.50 469.00
3 490.50 56.00 273.25
Grand Total 695.50 112.33 403.92
From the two-way table of means, it is clear that the number of
diatoms was much lower (112.33 on average) at the downstream
site than at the upstream site (average = 695.50). It is also
clear that numbers were down in week 3 compared to the other
two weeks. The difference between the upstream and downstream
sites was 506.00 in week 1, 809.00 in week 2, and 434.50 in
week 3. It is clear that there is an interaction between the
site and week factors; the difference between sites depends
upon which week the sampling was done.
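The cell means and the week-by-week differences between sites can be recomputed directly from the raw counts. The short Python sketch below does this; the names cell_means and site_diff are made up for illustration. If there were no interaction, the three differences would be expected to be roughly equal.

```python
# Diatom counts: data[week][site] holds the two replicates for that cell
data = {
    1: {1: [689, 756], 2: [204, 229]},
    2: {1: [831, 916], 2: [56, 73]},
    3: {1: [558, 423], 2: [34, 78]},
}

def mean(values):
    return sum(values) / len(values)

# Mean of each week-by-site cell
cell_means = {(w, s): mean(data[w][s]) for w in data for s in data[w]}

# Upstream (site 1) minus downstream (site 2), week by week
site_diff = {w: cell_means[(w, 1)] - cell_means[(w, 2)] for w in data}

for w in sorted(site_diff):
    print(f"Week {w}: upstream - downstream = {site_diff[w]:.2f}")
```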
Example 26: How to calculate a randomized complete block
analysis of variance
Many experiments in agriculture and biology are similar to a
two-way design but have
only one observation per cell. In these experiments, one must
assume that there is no interaction
between the two factors. This assumption is always valid when
one of the factors consists of
ways of grouping the experimental units into more uniform
groups, as is common in field
research.
The present example consists of data on the number of soybean
plants (out of 100;
column C) that failed to emerge. There are two factors in the
experiment. Each observation can
be classified according to the fungicide treatment (Check,
Arasan, Spergon, Semasan, or Fermate;
column A) or according to the block in the field (Block 1, Block
2, Block 3, Block 4 or Block 5;
column B). The 25 observations consist of five fungicide
treatments in all combinations with 5
blocks. The model will not include an interaction term.
The data, arranged for analysis by EXCEL, is stored in rows 1
through 6 of columns E
through J.
Block 1 Block 2 Block 3 Block 4 Block 5
Check 8 10 12 13 11
Arasan 2 6 7 11 5
Spergon 4 10 9 8 10
Semasan 3 5 9 10 6
Fermate 9 7 5 5 3
To analyze this data, proceed as follows:
To perform an Anova for a randomized complete block design
(RCBD), use Anova: Two-Factor Without Replication [Excel 2013:
Data Tab – Data Analysis – Anova: Two-Factor Without
Replication].
In Input Range:, indicate the cells that contain the data and the
labels. For example, if the first
six rows of columns E through J contain the two-way table of
data, specify the input range as
E1:J6, or select those cells by using the mouse.
Check Labels, and click on OK.

Results are as follows:
Anova: Two-Factor Without Replication
SUMMARY Count Sum Average Variance
Check 5 54 10.8 3.7
Arasan 5 31 6.2 10.7
Spergon 5 41 8.2 6.2
Semasan 5 33 6.6 8.3
Fermate 5 29 5.8 5.2
Block 1 5 26 5.2 9.7
Block 2 5 38 7.6 5.3
Block 3 5 42 8.4 6.8
Block 4 5 47 9.4 9.3
Block 5 5 35 7 11.5
ANOVA
Source of Variation SS df MS F P-value F crit
Rows 83.84 4 20.96 3.874307 0.021886 3.006917
Columns 49.84 4 12.46 2.303142 0.103195 3.006917
Error 86.56 16 5.41
Total 220.24 24
The error mean square (5.41) is an estimate of the pooled
variance and has 16 degrees of freedom. The table gives two
p-values for the two F-tests, but only one of those is for the
treatments. The F value for Rows tests whether there are
significant differences among rows (fungicide treatments) in
the number of plants that failed to emerge. This information
appears in Rows because that is how the original data were
organized in Excel. Since the p-value for Rows (0.021886) is
less than 0.05, we reject the hypothesis that all fungicide
treatment means are equal. The p-value for blocks (Columns)
cannot be used to glean information about our treatments, and
is ignored because the blocks are not randomly assigned.
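The Rows, Columns and Error lines of the table can be checked with a few lines of Python. This sketch uses the standard formulas for a two-factor layout with one observation per cell, where the error sum of squares is what remains after rows and columns are removed from the total. Variable names are illustrative, and p-values are omitted because the standard library has no F distribution, so the F for treatments is compared with the F crit value from the output.

```python
treatments = ["Check", "Arasan", "Spergon", "Semasan", "Fermate"]
# counts[i][j] = failed plants for treatment i in block j + 1
counts = [
    [8, 10, 12, 13, 11],
    [2, 6, 7, 11, 5],
    [4, 10, 9, 8, 10],
    [3, 5, 9, 10, 6],
    [9, 7, 5, 5, 3],
]

n_rows = len(counts)     # treatments
n_cols = len(counts[0])  # blocks
grand = sum(sum(row) for row in counts) / (n_rows * n_cols)

row_means = [sum(row) / n_cols for row in counts]
col_means = [sum(counts[i][j] for i in range(n_rows)) / n_rows
             for j in range(n_cols)]

ss_total = sum((v - grand) ** 2 for row in counts for v in row)
ss_rows = n_cols * sum((m - grand) ** 2 for m in row_means)
ss_columns = n_rows * sum((m - grand) ** 2 for m in col_means)
ss_error = ss_total - ss_rows - ss_columns

df_rows = n_rows - 1
df_columns = n_cols - 1
df_error = df_rows * df_columns

ms_error = ss_error / df_error
f_rows = (ss_rows / df_rows) / ms_error
f_columns = (ss_columns / df_columns) / ms_error

f_crit = 3.006917  # copied from the Excel output (alpha = 0.05)
print(f"Error MS = {ms_error:.2f} on {df_error} df")
print(f"Rows (treatments): F = {f_rows:.4f}, exceeds F crit: {f_rows > f_crit}")
print(f"Columns (blocks):  F = {f_columns:.4f}")
```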
Example 27: How to prepare a scatterplot of two variables.
For this example, data on cage size (cm²) and body weight (g)
of 12 crabs have been stored
in columns A and B of the EXCEL worksheet. Note that cage
size is the independent (x) variable
and body weight is the dependent (y) variable. In other words,
the size of the cage is assumed to affect the
body weight of the crab. Excel will pick the first column as the
x variable and the second column
as the y variable.
CageSize BodyWt
159 14.40
179 15.20
100 11.30
45 2.50
384 22.70
230 14.90
100 1.41
320 15.81
80 4.19
220 15.39
320 17.25
210 9.52
To make a scatterplot, a chart must be inserted.
The first step is to highlight the data including labels, then
choose Chart [Excel 2013: Insert Tab
– Insert Scatter (X,Y) or Bubble Chart]

Choose the first option under Scatter (used to compare at least
two sets of values or pairs of
data).
The Chart Title should be descriptive. Click on the title and
rename the graph to describe the
subject matter.
The Axes should also be appropriately labeled. Click the graph
and check the Axis Titles to add
them in.
A trendline can also be added. PLSC 214 discusses linear
relationships, and Excel allows the
regression equation and the r² value to be displayed on the graph.

Because the points are scattered in a pattern from the lower left
corner to the upper right
corner, we conclude that there is a positive relationship between
the two variables. It appears that
bigger cage sizes result in heavier crabs. The slope is 0.0528,
which is positive, and the
y-intercept is 1.7287. The r² value is 0.7485; nearly 75% of the
variation in body weight is explained by the
model.
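The slope, intercept and r² shown on the trendline can be reproduced with the ordinary least-squares formulas. The Python sketch below is a cross-check on the chart values, not a description of Excel's internals; the function name fit_line is made up for illustration.

```python
cage_size = [159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210]
body_wt = [14.40, 15.20, 11.30, 2.50, 22.70, 14.90,
           1.41, 15.81, 4.19, 15.39, 17.25, 9.52]

def fit_line(x, y):
    """Least-squares line y = intercept + slope * x, plus r squared."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

slope, intercept, r_squared = fit_line(cage_size, body_wt)
print(f"BodyWt = {intercept:.4f} + {slope:.4f} * CageSize")
print(f"r squared = {r_squared:.4f}")
```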
Example 28: How to calculate a correlation coefficient.
For this example, data on cage size (cm²) and body weight (g)
of 12 crabs have been stored
in columns A and B of the EXCEL worksheet (see Example 27).
CageSize BodyWt
159 14.40
179 15.20
100 11.30
45 2.50
384 22.70
230 14.90
100 1.41
320 15.81
80 4.19
220 15.39
320 17.25
210 9.52
a) Calculate standard deviations of each of the two variables and
store results in column D.
Type 's1 =' in cell C1, and the formula =STDEV.S(A:A) in cell
D1.
Type 's2 =' in cell C2, and the formula =STDEV.S(B:B) in cell
D2.
b) Calculate the covariance for sample data.
Type 's12 =' in cell C3, and the formula
=COVARIANCE.S(A2:A13,B2:B13) in cell D3.
c) Calculate the correlation = covariance/(STDEV.S(x) *
STDEV.S(y)).
Type 'r =' in cell C4, and the formula =D3/(D1*D2) in cell D4.
Results are:
s1 = 106.3309
s2 = 6.484094
s12 = 596.5107
r = 0.86519
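The same four quantities can be computed in Python as a check on the worksheet. This is a sketch using only the standard library. The covariance is written out by hand with the n - 1 divisor, which is the divisor COVARIANCE.S uses, and the product of the two standard deviations sits inside parentheses in the denominator.

```python
import statistics

cage_size = [159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210]
body_wt = [14.40, 15.20, 11.30, 2.50, 22.70, 14.90,
           1.41, 15.81, 4.19, 15.39, 17.25, 9.52]

n = len(cage_size)
s1 = statistics.stdev(cage_size)  # sample standard deviation, like STDEV.S
s2 = statistics.stdev(body_wt)

x_bar = statistics.mean(cage_size)
y_bar = statistics.mean(body_wt)

# Sample covariance with the n - 1 divisor
s12 = sum((x - x_bar) * (y - y_bar)
          for x, y in zip(cage_size, body_wt)) / (n - 1)

# Correlation: covariance divided by the product of the standard deviations
r = s12 / (s1 * s2)

print(f"s1 = {s1:.4f}")
print(f"s2 = {s2:.6f}")
print(f"s12 = {s12:.4f}")
print(f"r = {r:.5f}")
```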

d) Alternative method of calculating correlation.
Type 'r =' in cell C6, and the formula
=CORREL(A2:A13,B2:B13) in cell D6.
e) How to test if the correlation is significant. A test statistic
can be calculated and compared to
a t value with n – 2 degrees of freedom. If the test statistic falls
in the rejection region, you
would reject the null hypothesis of ρ = 0.
$$t_{calc} = r\sqrt{\frac{n-2}{1-r^2}}$$
In this example of 12 crabs, we could test at the 5%
significance level if there is a
positive linear correlation. The critical t value for a
right-tailed test with n - 2 = 10 degrees of
freedom is t = 1.812.
$$t_{calc} = r\sqrt{\frac{n-2}{1-r^2}} = 0.86519\sqrt{\frac{12-2}{1-(0.86519)^2}} = 5.456$$
The test statistic for a right-tailed test (5.456) exceeds the
critical value (1.812) and so falls into the rejection
region. The null hypothesis is
rejected and we conclude that there is a positive linear
relationship between cage size and weight
of crabs.
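The test statistic is easy to verify in Python. In this sketch the function name correlation_t is made up for illustration, and the critical value 1.812 is simply the one quoted in the text rather than something the code computes.

```python
import math

def correlation_t(r, n):
    """t statistic for testing rho = 0 from a sample correlation r and n pairs."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r = 0.86519
n = 12

t_calc = correlation_t(r, n)
t_crit = 1.812  # right-tailed 5% critical value for 10 df, as quoted in the text
reject = t_calc > t_crit

print(f"t_calc = {t_calc:.3f}")
print(f"reject null hypothesis of rho = 0: {reject}")
```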

Example 29: How to perform a regression analysis using
EXCEL
For this example, data on cage size (cm²) and body weight (g)
of 12 crabs have been stored
in columns A and B of the EXCEL worksheet (see Example 27).
Here, body weight is the dependent variable and
cage size is the independent
variable. We wish to predict body weight by using its
relationship to
cage size. In this example, we have
only one independent variable.
Highlight the data and select Regression [Excel 2013: Data Tab
– Data Analysis - Regression]
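Set Input Y Range: to a1:a13.
Set Input X Range: to b1:b13.
With these ranges, Excel treats column A (cage size) as Y and
column B (body weight) as X, and the output below reflects that
assignment. To fit body weight as the dependent variable, set
Input Y Range: to b1:b13 and Input X Range: to a1:a13; the slope
and intercept then match the trendline in Example 27 (0.0528
and 1.7287), while R Square, F and the t Stat for the slope are
unchanged.
Check Confidence Level.
Click OK.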
SUMMARY OUTPUT

Regression Statistics
Multiple R 0.8651
R Square 0.7485
Adjusted R Square 0.7234
Standard Error 55.922
Observations 12
ANOVA
df SS MS F Significance F
Regression 1 93095.89 93095.89 29.76876 0.00028
Residual 10 31273.02 3127.302
Total 11 124368.9
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 24.65 35.24 0.700 0.50016 -53.87 103.18
X Variable 1 14.188 2.600 5.456 0.00028 8.40 19.98
The intercept (the expected value of the dependent variable
when the independent
variable is zero) was estimated as 24.65 with a standard
deviation of 35.24.

The slope (the expected change in the dependent variable for an
increase of one unit in the
independent variable) was 14.188 with a standard deviation of
2.600. A t-test indicates that the
slope was significantly different from zero because the p-value
= 0.00028 which is less than α =
0.05.
The standard deviations of the intercept and slope may also be
called standard errors and
they are standard deviations of the distribution of sample
statistics. Both standard deviations
have n - 2 = 10 degrees of freedom because both are (complex)
functions of the error sum of
squares.
The coefficient of determination (R-Square = 74.85%) indicates
that nearly 75% of the
variation in body weight can be explained as a linear function of
cage size.
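As a cross-check on the SUMMARY OUTPUT, the main quantities can be recomputed with the textbook least-squares formulas. The Python sketch below is illustrative only; the function name simple_regression and its dictionary keys are made up. It is called first with body weight as x and cage size as y, which is the assignment that reproduces the coefficients in the output table, and then with the roles swapped, which gives the trendline equation of Example 27. P-values and confidence limits are omitted because they need a t distribution function that the standard library does not provide.

```python
import math

cage_size = [159, 179, 100, 45, 384, 230, 100, 320, 80, 220, 320, 210]
body_wt = [14.40, 15.20, 11.30, 2.50, 22.70, 14.90,
           1.41, 15.81, 4.19, 15.39, 17.25, 9.52]

def simple_regression(x, y):
    """Least-squares fit of y on x with the usual summary quantities."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_regression = slope * sxy
    ss_residual = syy - ss_regression
    ms_residual = ss_residual / (n - 2)
    se_slope = math.sqrt(ms_residual / sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        "r_square": ss_regression / syy,
        "standard_error": math.sqrt(ms_residual),
        "se_intercept": math.sqrt(ms_residual * (1 / n + x_bar ** 2 / sxx)),
        "se_slope": se_slope,
        "t_slope": slope / se_slope,
        "f": ss_regression / ms_residual,
    }

# Roles as in the output table: Y = column A (cage size), X = column B (body weight)
fit = simple_regression(body_wt, cage_size)

# Swapped roles: Y = body weight, X = cage size
fit_swapped = simple_regression(cage_size, body_wt)

for key, value in fit.items():
    print(f"{key}: {value:.4f}")
print(f"swapped: slope = {fit_swapped['slope']:.4f}, "
      f"intercept = {fit_swapped['intercept']:.4f}")
```

R Square, F and the t statistic for the slope come out the same in both directions; only the coefficients and their standard errors change.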
Annual Editions Journal Summary
Instructions:
1. Summarize each of the readings in the tables below.
2. You may expand the table to accommodate your information.
3. Write in complete sentences using proper grammar and
mechanics.
Readings:
Unit 5 in the textbook: Social Media and Commerce

· The Rising Influence of Social Media as Reflected by Data
· How Google Dominates Us.
· Can Online Piracy Be Stopped by Laws?
· How Psychology Will Shape the Future of Social Media
Marketing.
· AmazonFresh is Jeff Bezos’ Last Mile Quest for Total Retail
Domination.
Reading #15 – The Rising Influence of Social Media as
Reflected by Data
Main idea of the article:
Information presented: List at least five points made by the
author
1.
2.
3.
4.
5.

Response to the article:
Reading #16 –How Psychology Will Shape the Future of Social
Media Marketing
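Main idea of the article:
Information presented: List at least five points made by the
author
1.
2.
3.
4.
5.

Response to the article: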
Reading #17– How Google Dominates Us
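Main idea of the article:
Information presented: List at least five points made by the
author
1.
2.
3.
4.
5.

Response to the article: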
Reading #18 – AmazonFresh is Jeff Bezos’ Last Mile Quest for
Total Retail Domination.
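Main idea of the article:
Information presented: List at least five points made by the
author
1.
2.
3.
4.
5.

Response to the article: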
Reading #19 - Can Online Piracy Be Stopped by Laws?
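Main idea of the article:
Information presented: List at least five points made by the
author
1.
2.
3.
4.
5.

Response to the article: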
Adapted from Dushkin Online Annual Editions Test Your
Knowledge Form http://www.dushkin.com/online/
LAB2A.DAT
142142130139132150137133147135134146140132
13614114914113513613013613413714613815213213712613413
51471421421351311421381461351481291381351371411441471
41141138139139145139137147141143135136140139137139134
13912913714914214013013913514413413213313514413413913
51341311421421521411401361441401391421461391391351391
42138135133142137141141142136141134135138135140144142
13814813514113913814113713513614114412913813313813013
01331381241421421381321441401461461451381391361321391
35137136131137147140137137134129134140141139143140138
13913715014215014613813012213213814112313413613914215
21491381391371351331381381351411451391301401331441431
41137137138136134143143138136140142136148141133149139
13114414313914214612713913713513113614413513714514714

11361471331311451361411401391451401441371371391441381
38141134145136139136135143135136135149144133146134140
15013714114213015414114313813413813113514914913113214
21361321441341521361391421391411401311371341381521371
34139124147144146140139141132143145137142139138138143
14913013513413614915014513714514113813614113113914213
61441371351511421431431401301451421391301371511391401
35138133137143134132136131135141145132135139139141138
13914814114313813314713013512813813614514313413313813
81471371401401361331391431381431371421461331511411331
41138145149139140128140137140146138132141151137140128
144132143149137
LAB2B.DAT
146140135137140137138143149142144145145139
14114113713913915413713515113913813613813914614214113
61371471441401321461481441401341391411391391391491301
39138134134135141139151130135136138137142138140138130
147141136
LAB2C.DAT
152147131139145136131142138140140134130134
13713514013314414013913814514514513813914614614713514
31391391301421361451381351391451341341381361421351361
43135130140149141137140152137132136133140141145141132
13313513813714114413714314214413113613814613813513913
71411391371411441421451301431301361461421391331391451
45133137137150139142140140137142143129135132133144138
14714114214514213314214114013414913913813313813914112
61401371361421311351321361391351361381511461371441411
32141135149137139142150131134139139139135134132134147
13713913614913713214113713213713413214314214913113313
21421441411351441341391371471341411431341351521371481
24137135130123133147142146131137139141139142136130147
13713613713414313914413714013314014914014914012813414
91301441411351391381411291301421291361361391281381371
35138138131150134132140135
