Practice test1 solution

Statistics, Practice Test 1 Solution
Intro to Stats, Summarizing, Describing Data
1. True or False: The variance and the standard deviation are never negative.
True – both measure how far the values vary from the mean and are built from squared deviations, so they cannot be negative (they are zero only when all values are equal).
 
 
 
 
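Population variance: $\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
Population standard deviation: $\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
Sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Sample standard deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$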
2. What kind of variable is “weights of bears”: quantitative or qualitative?
Quantitative – the variable “weights of bears” gives numbers that represent counts or
measurements
3. What kind of variable is “gender of bears”: quantitative or qualitative?
Qualitative – “gender of bears” is distinguished by nonnumeric characteristics
4. Define a population in statistics.
A population is the complete collection of all elements (scores, people, measurements, etc.)
to be studied
5. The value of the middle term in a ranked data set is called
the median
6. Given any data, how do you find the mode?
Mode is the value that appears with the greatest frequency among the data. A data set can
have one, more than one, or no mode (when all numbers appear with equal frequency).

7. True or False: The “number of chairs” is considered to be a continuous variable.
False – the number of chairs is a count, so it is a discrete variable. We cannot have ¼ of a chair.
8. On a Pareto chart, the frequencies should be represented on the vertical (or y) axis.
9. Given the frequency table, answer the following questions.
Age group Frequency
11-20 5
21-30 6
31-40 9
41-50 11
51-60 4
a) The number of classes in the table is 5 [number of statistical age groups defined]
b) The class width is 10
(upper limit – lower limit + 1 unit or difference of two consecutive lower limits or upper
limits i.e. 21-11)
c) The midpoint of the 4th class is 45.5
(41+50)/2 = 45.5
d) The lower boundary of the 5th class is 50.5
(50+51)/2 = 50.5 (think of it as the midpoint between the upper limit of the 4th class and the lower limit of the 5th class)
e) The upper limit of the 1st class is 20
The 1st class is 11-20, so its upper limit is 20.
f) The sample size is 35
5+6+9+11+4 = 35
g) The relative frequency of the 1st class is 1/7
relative frequency: f/n
relative frequency of the 1st class = f/n = 5/35 = 1/7 ≈ 0.1429 (or 14.29%)
(LL = lower limit, UL = upper limit, LB = lower boundary, UB = upper boundary, RF = relative frequency)
Class  Age group  Frequency  Midpoint = (LL+UL)/2  LB - UB    RF = f/n
1)     11-20      5          (11+20)/2 = 15.5      10.5-20.5  5/35 = 1/7
2)     21-30      6          25.5                  20.5-30.5  6/35
3)     31-40      9          35.5                  30.5-40.5  9/35
4)     41-50      11         45.5                  40.5-50.5  11/35
5)     51-60      4          55.5                  50.5-60.5  4/35
n = Σf = 35
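The table for question 9 can be reproduced with a short script. This is only a sketch: the names `classes`, `gap`, and `rows` are illustrative, and it assumes integer class limits with a constant gap between consecutive classes.

```python
# Class limits and frequencies from question 9: (lower limit, upper limit, frequency).
classes = [(11, 20, 5), (21, 30, 6), (31, 40, 9), (41, 50, 11), (51, 60, 4)]

n = sum(f for _, _, f in classes)       # sample size
width = classes[1][0] - classes[0][0]   # difference of two consecutive lower limits
gap = classes[1][0] - classes[0][1]     # distance between one class and the next (1 here)

rows = []
for lower, upper, f in classes:
    rows.append({
        "midpoint": (lower + upper) / 2,
        "lower_boundary": lower - gap / 2,
        "upper_boundary": upper + gap / 2,
        "relative_frequency": f / n,
    })

for (lower, upper, f), row in zip(classes, rows):
    print(f"{lower}-{upper}", f, row)
```

Each boundary sits half a gap outside the class limits, which is the same "midpoint between neighbouring limits" idea used in part d).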
10. The following frequency table describes the speeds of drivers ticketed through a 30 mph
speed zone.
Speed Frequency (number of drivers)
42-45 25
46-49 14
50-53 7
54-57 3
58-61 1
a) Calculate the relative frequencies for all classes.
n = 50
first class: f/n = 25/50 = 0.5 (or 50%)
second class: 14/50 = 0.28 (or 28%)
third class: 7/50 = 0.14 (or 14%)
fourth class: 3/50 = 0.06 (or 6%)
fifth class: 1/50 = 0.02 (or 2%)
∑rf = 1 (or 100%)
b) What percentage represents the speed of 53 mph or less?
“53 mph or less” covers the first three classes.
cumulative relative frequency = 0.5 + 0.28 + 0.14 = 0.92
Or: (25 + 14 + 7) / 50 = 46 / 50 = 92%
92% of the ticketed drivers drove 53 mph or less.
c) What are the class boundaries?
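Each interior class boundary is the midpoint between the upper limit of one class and the lower limit of the next, e.g. (45+46)/2 = 45.5.
For the two outer boundaries, the same half unit (0.5) is subtracted from the first lower limit and added to the last upper limit.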
class boundaries: 41.5-45.5, 45.5-49.5, 49.5-53.5, 53.5-57.5, 57.5-61.5
Class  Speed  Frequency (number of drivers)  RF = f/n       Boundaries
1      42-45  25                             25/50 = 1/2    41.5-45.5
2      46-49  14                             14/50 = 7/25   45.5-49.5
3      50-53  7                              7/50           49.5-53.5
4      54-57  3                              3/50           53.5-57.5
5      58-61  1                              1/50           57.5-61.5
n = Σf = 50
d) Construct a histogram corresponding to the frequency distribution table.

[Histogram: horizontal axis Speed (mph), marked at the class boundaries 41.5, 45.5, 49.5, 53.5, 57.5, 61.5; vertical axis Frequency, from 0 to 30; bar heights 25, 14, 7, 3, 1]
e) Prepare the cumulative frequency distribution. (see below)
f) Prepare the cumulative relative frequency distribution.
Cumulative speed  Cumulative frequency  Cumulative relative frequency
42-45             25                    25/50 = 0.5 (or 50%)
42-49             25+14 = 39            39/50 = 0.78 (or 78%)
42-53             25+14+7 = 46          46/50 = 0.92 (or 92%)
42-57             25+14+7+3 = 49        49/50 = 0.98 (or 98%)
42-61             25+14+7+3+1 = 50      50/50 = 1 (or 100%)
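The relative and cumulative columns for question 10 are running totals, so they can be checked with a few lines of Python. A minimal sketch, with variable names chosen here for illustration:

```python
from itertools import accumulate

# Frequencies of the five speed classes from question 10.
freqs = [25, 14, 7, 3, 1]
n = sum(freqs)

rel_freqs = [f / n for f in freqs]           # relative frequency of each class
cum_freqs = list(accumulate(freqs))          # running total of the frequencies
cum_rel_freqs = [c / n for c in cum_freqs]   # running total as a share of n

print(rel_freqs)
print(cum_freqs)
print(cum_rel_freqs)
```

The third cumulative entry is the answer to part b): the share of drivers at 53 mph or less.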
g) Draw an ogive of the cumulative percentage distribution.
h) Using the ogive find the percentage of drivers who drove 47 mph or less.
The value 47 lies between the boundaries 45.5 and 49.5 on the horizontal axis, where the ogive rises from 50% to 78%.
Reading the curve at 47 gives approximately 60%; therefore about 60% to 61% of drivers drove 47 mph or less.
[Ogive: horizontal axis marked at the class boundaries 41.5, 45.5, 49.5, 53.5, 57.5, 61.5; vertical axis cumulative percentage, with ticks at 0, 20, 40, 60, 80, 100, 120]
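Reading a value off the ogive amounts to linear interpolation between two plotted points. The sketch below assumes the ogive starts at 0% at the first lower boundary and joins the cumulative percentages with straight segments; `ogive_percent` is a helper name introduced here.

```python
def ogive_percent(x, boundaries, cum_percent):
    """Read the ogive at x by interpolating along its straight segments."""
    points = list(zip(boundaries, cum_percent))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("x lies outside the plotted boundaries")

boundaries = [41.5, 45.5, 49.5, 53.5, 57.5, 61.5]
cum_percent = [0, 50, 78, 92, 98, 100]   # cumulative percentages from question 10

estimate = ogive_percent(47, boundaries, cum_percent)
print(estimate)
```

The interpolated value falls inside the 60% to 61% range read from the graph in part h).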

11. The following data gives the number of hours that a few employees at the GM factory
worked last week.
17, 38, 27, 14, 18, 34, 16, 42, 28, 24, 40, 20, 23, 31, 37, 21, 30, 25
Ranked Data: (Note: We don’t need to rank data for some calculations such as finding the mean,
however, it’s a good practice to do so for those calculations that need ranked data.)
14, 16, 17, 18, 20, 21, 23, 24, 25, 27, 28, 30, 31, 34, 37, 38, 40, 42
n = 18
a) Find the mean
$\bar{x} = \dfrac{\sum x}{n}$ = (14+16+17+18+20+21+23+24+25+27+28+30+31+34+37+38+40+42)/18
= 485/18 ≈ 26.9444
b) Find the mode
There is no mode (each value appears only once).
c) Find the median.
n = 18 is even, so the median is the mean of the 9th and 10th ranked values: (25+27)/2 = 26
d) Find the midrange.
MR = (Min + Max)/2 = (14+42)/2 = 28
e) Find the range
R= Max – Min = 42 – 14 = 28.
f) Find the variance.
     
Sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)}$
$s^2 = \dfrac{18\,(14^2 + 16^2 + \dots + 42^2) - (14 + 16 + \dots + 42)^2}{18\,(18 - 1)} = \dfrac{18(14343) - (485)^2}{18(17)} = 74.99673203\ldots \approx 75$
Or we can do the following:
Sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} = \dfrac{(14 - 26.944)^2 + (16 - 26.944)^2 + \dots + (42 - 26.944)^2}{18 - 1} \approx 75$
g) Find the standard deviation.
 
Sample standard deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{75}$
s ≈ 8.66

h) Find the interquartile range (IQR).
Position:   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
Value:     14  16  17  18  20  21  23  24  25  27  28  30  31  34  37  38  40  42
Q2 = median = (25+27)/2 = 26
Q1 = median of the first half of the data (positions 1 to 9) = 20
Q3 = median of the second half of the data (positions 10 to 18) = 34
interquartile range: IQR = Q3 – Q1 = 34 – 20 = 14
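The answers to question 11 can be cross-checked with Python's standard `statistics` module. This is a sketch rather than part of the original solution; the quartiles follow the median-of-halves method used above, not the default method of `statistics.quantiles`.

```python
import statistics

hours = [17, 38, 27, 14, 18, 34, 16, 42, 28, 24, 40, 20, 23, 31, 37, 21, 30, 25]
ranked = sorted(hours)
n = len(ranked)

mean = statistics.mean(ranked)
median = statistics.median(ranked)
modes = statistics.multimode(ranked)      # returns every value when none repeats
midrange = (ranked[0] + ranked[-1]) / 2
data_range = ranked[-1] - ranked[0]
variance = statistics.variance(ranked)    # sample variance, divides by n - 1
std_dev = statistics.stdev(ranked)        # sample standard deviation

# Quartiles as medians of the lower and upper halves of the ranked data.
half = n // 2
q1 = statistics.median(ranked[:half])
q3 = statistics.median(ranked[-half:])
iqr = q3 - q1

print(mean, median, midrange, data_range, variance, std_dev, q1, q3, iqr)
```

Because `multimode` returns all 18 values here, every value ties for the highest frequency, which is the "no mode" case.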
12. IQ scores have a mean of 100 and a standard deviation of 15.
a) Find the coefficient of variation.
Given: μ = 100, σ = 15
$CV = \dfrac{\sigma}{\mu} \cdot 100\% = \dfrac{15}{100} \cdot 100\% = 15\%$
b) Use the range rule of thumb to establish the minimum and maximum “usual” IQ
scores.
μ ± 2σ: 100 – 2(15) = 70 to 100 + 2(15) = 130
The usual minimum is 70 and the usual maximum is 130.
c) Using Chebyshev’s Theorem, find the least percentage of people who will
have an IQ score of 70 to 130.
At least $1 - \dfrac{1}{K^2}$ of the values lie within K standard deviations of the mean (K > 1).
K = 2 (K is the number of standard deviations away from the mean)
$1 - \dfrac{1}{2^2}$ = 1 – ¼ = ¾
At least 75% have an IQ score of 70 to 130.
d) Using the empirical rule, find the percentage of people who will have an IQ score of
70 to 130.
About 95% will have an IQ score of 70 to 130, assuming the scores are bell-shaped.
(70 and 130 are 2 standard deviations away from the mean)
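The three calculations in question 12 can be sketched in Python as follows. The function name `chebyshev_lower_bound` is introduced here for illustration only.

```python
mean, sd = 100, 15

# a) Coefficient of variation, as a percentage of the mean.
cv_percent = sd / mean * 100

# b) Range rule of thumb: usual values lie within 2 standard deviations of the mean.
usual_min = mean - 2 * sd
usual_max = mean + 2 * sd

# c) Chebyshev's theorem: at least 1 - 1/K^2 of the values lie within K standard deviations.
def chebyshev_lower_bound(k):
    if k <= 1:
        raise ValueError("Chebyshev's theorem is informative only for K > 1")
    return 1 - 1 / k**2

k = (usual_max - mean) / sd
at_least = chebyshev_lower_bound(k)

print(cv_percent, usual_min, usual_max, k, at_least)
```

Chebyshev's bound holds for any distribution, which is why it is weaker than the 95% that the empirical rule gives for bell-shaped data.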
13. Define a parameter and a statistic.
parameter: a numerical measurement describing some characteristic of a population
statistic: a numerical measurement describing some characteristic of a sample
14. Define random sample and simple random sample.
random sample: members of the population are selected in such a way that each
individual member has an equal chance of being selected
simple random sample (of size n): subjects selected in such a way that every possible
sample of the same size n has the same chance of being chosen
15. Define the following types of sampling: systematic, convenience, stratified, cluster
systematic sampling: select some starting point, and then select every kth element in
the population
convenience sampling: use results that are easy to get
stratified sampling: subdivide the population into at least two different subgroups whose
members share the same characteristics, then draw a sample from each subgroup (stratum)
cluster sampling: divide the population into sections (clusters) that are similar to one
another, randomly select some of those clusters, then choose all members from the selected
clusters
16. What are different levels of measurement of data? Give examples.
nominal level of measurement: qualitative data
ex) gender of subjects
ordinal level of measurement: categories with some order (differences between data
values either cannot be determined or are meaningless, but there is an order)
ex) course grades A, B, C, D, F
interval level of measurement: differences between data values are meaningful, but there
is no natural starting point (the value 0 does not mean lack of)
ex) years such as 1000, 2000, 1492, 1776
ratio level of measurement: interval level modified to include natural zero starting point
ex) price of college textbooks ($0 means no cost)
Levels of measurement: examples
RATIO
• Distances (in km) travelled by cars (0 km represents no distance travelled, and 400 km is twice as far as 200 km.)
• Prices of college textbooks ($0 does represent no cost, and a $100 book does cost twice as much as a $50 book.)
INTERVAL
• Body temperatures of 98.2°F and 98.6°F
• The years 1769 and 1845
ORDINAL
• Ranks of colleges in U.S. News and World Report (Ranks can be first, second, third, and so on, which determines an ordering.)
• A school teacher assigns grades of A, B, C, D, or F (These grades can be arranged in order, but we can’t determine the difference between the grades.)
NOMINAL
• Eye colors (blue, brown, black, other)
• Political party (Democrat, Republican, Independent, other)
17. What’s the difference between an observational study and an experiment? Give
examples.
observational study: observing and measuring specific characteristics without
attempting to modify the subjects being studied
ex) Charles Darwin’s observation of Darwinian finches at the Galapagos Islands
experiment: apply some treatment and then observe its effects on the subjects
ex) giving subjects a medicine and observing whether it cures a certain disease
18. Given the following set of data: 32, 19, 14, 7, 15, 3, 4, 5, 9, 16, 15, 16, 19, 50

a) Rank the data from smallest to largest.
b) Prepare a box-and-whisker plot. [Box plot]
c) Does this data set contain any outliers? [Make sure to show the lower and the upper fences
on your graph]
d) Are the data symmetric or skewed? [If skewed, are they skewed left or right?]
a) Answer: 3, 4, 5, 7, 9, 14, 15, 15, 16, 16, 19, 19, 32, 50
b) Answer:
Q1: L = (25/100)(14) = 3.5, rounded up to 4, so Q1 = 4th value = 7
Q2 = median = (15+15)/2 = 15
Q3: L = (75/100)(14) = 10.5, rounded up to 11, so Q3 = 11th value = 19
Minimum  Q1  Median  Q3  Maximum
3        7   15      19  50
c) Answer: Outlier: 50
The values Q1 – 1.5×IQR and Q3 + 1.5×IQR are the "fences" that separate the "reasonable"
values from the outliers. Outliers lie outside the fences.
IQR = Q3 – Q1 = 19 – 7 = 12; 1.5 × IQR = 1.5 × 12 = 18
A data value is considered an outlier if it is less than
Q1 – 1.5 IQR = 7 – 18 = –11
or larger than
Q3 + 1.5 IQR = 19 + 18 = 37
A data value is considered an extreme outlier if it is less than
Q1 – 3 IQR = 7 – 36 = –29
or larger than
Q3 + 3 IQR = 19 + 36 = 55
Since 37 < 50 < 55, the value 50 is an outlier but not an extreme outlier.
d) Are the data symmetric or skewed? [If skewed, are they skewed left or right?]
Answer: Skewed to the right
Note 1: Make sure the drawing is to scale.
Note 2: Skewed data show an uneven boxplot, in which the median cuts the box
into two unequal pieces. A longer part to the right of (or above) the median indicates data skewed
to the right. A longer part to the left of (or below) the median indicates data skewed to the left.
Note 3: Sometimes the box alone looks even, or looks skewed to one side,
while the whiskers (the tails on each side of the box) indicate otherwise. Therefore, pay
attention to both.
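The five-number summary and the fences for question 18 can be sketched in Python. The helper `percentile_value` is a name introduced here; it implements the locator rule used in this solution (L = (k/100)·n, round up if L is not whole, otherwise average the Lth and next values), which can differ from the default method of library routines.

```python
import math

def percentile_value(ranked, k):
    """Value at the kth percentile (0 < k < 100) of an already ranked list."""
    n = len(ranked)
    loc = k * n / 100                      # locator L = (k/100) * n
    if loc == int(loc):                    # whole number: average this value and the next
        i = int(loc)
        return (ranked[i - 1] + ranked[i]) / 2
    return ranked[math.ceil(loc) - 1]      # otherwise round up and take that value

data = [32, 19, 14, 7, 15, 3, 4, 5, 9, 16, 15, 16, 19, 50]
ranked = sorted(data)

q1 = percentile_value(ranked, 25)
q2 = percentile_value(ranked, 50)
q3 = percentile_value(ranked, 75)
iqr = q3 - q1

lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
extreme_low, extreme_high = q1 - 3 * iqr, q3 + 3 * iqr

outliers = [x for x in ranked if x < lower_fence or x > upper_fence]
extreme_outliers = [x for x in ranked if x < extreme_low or x > extreme_high]

print(ranked[0], q1, q2, q3, ranked[-1])
print(lower_fence, upper_fence, outliers, extreme_outliers)
```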

19. Draw the box-and-whisker plot for the following data set:
77, 79, 80, 86, 87, 87, 94, 99
Median: (86 + 87) ÷ 2 = 86.5 = Q2
This splits the list into two halves: 77, 79, 80, 86 and 87, 87, 94, 99. Since the halves of the
data set each contain an even number of values, the sub-medians will be the average of the
middle two values.
Q1 = (79 + 80) ÷ 2 = 79.5
Q3 = (87 + 94) ÷ 2 = 90.5
Minimum = 77, Q1 = 79.5, Q2= 86.5, Q3= 90.5, Maximum = 99
Box & Whisker Plot:
[Figure: box-and-whisker plot with minimum 77, Q1 79.5, median 86.5, Q3 90.5, maximum 99]
OR:
Minimum  Q1    Median  Q3    Maximum
77       79.5  86.5    90.5  99
This set of five values has been given the name "the five-number summary".
To find the outliers:
IQR = Q3 – Q1 = 90.5 – 79.5 = 11.
The values Q1 – 1.5×IQR and Q3 + 1.5×IQR are the "fences" that separate the "reasonable"
values from the outliers. Outliers lie outside the fences.
The outliers will be any values below Q1 – 1.5×IQR = 79.5 – 1.5 × 11 = 79.5 – 16.5 = 63 or
above Q3 + 1.5×IQR = 90.5 + 1.5 × 11 = 90.5 + 16.5 = 107.
All values lie between 63 and 107, so this data set has no outliers.
The extreme outliers will be those below Q1 – 3×IQR or above Q3 + 3×IQR.

Answer: This data is almost symmetric (Normal, bell shaped)
20. If the mean of a set of data is 23.00, and 12.60 has a z-score of –1.30, then the
standard deviation must be:
A) 4.00 B) 32.00 C) 64.00 D) 8.00
Answer: D
$z = \dfrac{x - \bar{x}}{s} \;\Rightarrow\; s = \dfrac{x - \bar{x}}{z} = \dfrac{12.6 - 23}{-1.3} = 8$
21. Find the z score for each student and indicate which one is higher.
Art Major: $x = 46$, $\bar{x} = 50$, $s = 5$
Theater Major: $x = 70$, $\bar{x} = 75$, $s = 7$
A) Both students have the same score.
B) Neither student received a positive score; therefore, the higher score cannot be
determined.
C) The theater major has a higher score than the art major.
D) The art major has a higher score than the theater major.
Answer: C
Art: $z = \dfrac{x - \bar{x}}{s} = \dfrac{46 - 50}{5} = \dfrac{-4}{5} = -0.8$
Theater: $z = \dfrac{x - \bar{x}}{s} = \dfrac{70 - 75}{7} = \dfrac{-5}{7} \approx -0.7143$
Since –0.7143 > –0.8, the theater major has the higher z score.
22. If the five number summary for a set of data is 0, 3, 6, 7, and 16, then the mean of this
set of data is
A) 6 B) there is insufficient information to calculate the mean C) 8 D) 5
Answer: B
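The z-score work in questions 20 and 21 uses one formula solved two ways. A minimal sketch, with helper names (`z_score`, `sd_from_z`) introduced here for illustration:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

def sd_from_z(x, mean, z):
    """Solve z = (x - mean) / sd for the standard deviation."""
    return (x - mean) / z

# Question 20: mean 23.00, value 12.60, z = -1.30.
sd_q20 = sd_from_z(12.60, 23.00, -1.30)

# Question 21: compare the two students on a common scale.
z_art = z_score(46, 50, 5)
z_theater = z_score(70, 75, 7)
higher = "theater" if z_theater > z_art else "art"

print(sd_q20, z_art, z_theater, higher)
```

Both z scores are negative, so the "higher" one is the one closer to zero.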

23. A student received the following grades: an A in Statistics (4 credits), an F in Physics II
(5 credits), a B in Sociology (3 credits), a B in a Literature seminar (2 credits), and a D
in Tennis (1 credit). Assuming A = 4 grade points, B = 3 grade points, C = 2 grade
points, D = 1 grade point, and F = 0 grade points, the student's grade point average is:
$\bar{x} = \dfrac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \dfrac{4(4) + 5(0) + 3(3) + 2(3) + 1(1)}{4 + 5 + 3 + 2 + 1} = \dfrac{32}{15} \approx 2.133$
Answer: 2.133
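The grade point average is a weighted mean with credits as the weights. A short sketch of the same calculation, with variable names chosen for illustration:

```python
grade_points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

# (course, grade, credits) from question 23.
courses = [
    ("Statistics", "A", 4),
    ("Physics II", "F", 5),
    ("Sociology", "B", 3),
    ("Literature seminar", "B", 2),
    ("Tennis", "D", 1),
]

total_points = sum(grade_points[grade] * credits for _, grade, credits in courses)
total_credits = sum(credits for _, _, credits in courses)
gpa = total_points / total_credits

print(total_points, total_credits, round(gpa, 3))
```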
24. Which of the following is true?
A) $D_{50} = P_5 = Q_{25}$ B) $D_5 = P_{50} = Q_2$ C) $D_{50} = P_5 = Q_2$ D) $D_5 = P_5 = Q_5$
Answer: B
D: Decile, 10 equal parts
P: Percentile, 100 equal parts
Q: Quartile, 4 equal parts
25. Given the following data set, find the value that corresponds to the
a) 75th percentile.
b) 30th percentile.
c) Find the percentile corresponding to number 44.
d) Using the range rule of thumb, estimate the standard deviation.
10, 44, 15, 23, 14, 18, 72, 56
Ranked: 10, 14, 15, 18, 23, 44, 56, 72
a) n = 8, location: $L = \dfrac{k}{100} \cdot n = \dfrac{75}{100} \cdot 8 = 6$
Since 6 is a whole number, average the 6th and 7th values: (44 + 56)/2 = 50
Answer: 50
b) n = 8, location: $L = \dfrac{k}{100} \cdot n = \dfrac{30}{100} \cdot 8 = 2.4$
Always round up: 3rd value = 15
Answer: 3rd value = 15
c) percentile of 44 = (number of values less than 44)/n = 5/8 = 0.625, which is 62.5%. Round accordingly: 63
Answer: P63
d) $s \approx \dfrac{R}{4} = \dfrac{\text{Max} - \text{Min}}{4} = \dfrac{72 - 10}{4} = 15.5$
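Parts c) and d) of question 25 can be sketched as below. The helper name `percentile_of` is an assumption of this sketch; it counts values strictly less than the given value, as in the solution. Note that Python's built-in `round` rounds 62.5 to the even neighbour 62, so the sketch rounds half up explicitly.

```python
import math

data = [10, 44, 15, 23, 14, 18, 72, 56]

def percentile_of(value, values):
    """Percent of the values that are strictly less than `value`."""
    below = sum(1 for x in values if x < value)
    return below / len(values) * 100

p = percentile_of(44, data)          # 5 of the 8 values are below 44
p_rounded = math.floor(p + 0.5)      # round half up to a whole percentile

# Range rule of thumb: the standard deviation is roughly a quarter of the range.
s_estimate = (max(data) - min(data)) / 4

print(p, p_rounded, s_estimate)
```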
 
   
26. Construct a frequency distribution with 4 classes for the following data representing the
numbers of keyboards assembled for a sample of 25 days in a company:
45 52 48 41 56 46 44 48 53 51 53 51
48 46 43 52 50 54 47 44 47 50 49 52
42
4 classes: $\text{Width} = \dfrac{\text{Range (High - Low)}}{\text{Number of classes}} = \dfrac{56 - 41}{4} = 3.75$, and rounding up results in w = 4.
The lower limit of the 1st class (the starting point) is a convenient number ≤ the smallest value; here it is 41.
Frequency Distribution Table
Classes  Frequency  Relative Frequency
41 - 44  5          0.2
45 - 48  8          0.32
49 - 52  8          0.32
53 - 56  4          0.16
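The construction in question 26 can be sketched in Python. The sketch assumes integer data, starts the first class at the smallest value, and relies on range/classes not being a whole number (otherwise the last class could miss the maximum and the width would need to be one unit larger).

```python
import math

data = [45, 52, 48, 41, 56, 46, 44, 48, 53, 51, 53, 51,
        48, 46, 43, 52, 50, 54, 47, 44, 47, 50, 49, 52,
        42]
num_classes = 4

# Width = range / number of classes, rounded up.
width = math.ceil((max(data) - min(data)) / num_classes)
start = min(data)                    # convenient starting point <= smallest value

table = []
for i in range(num_classes):
    lower = start + i * width
    upper = lower + width - 1        # integer data: classes do not overlap
    freq = sum(1 for x in data if lower <= x <= upper)
    table.append((lower, upper, freq, freq / len(data)))

for lower, upper, freq, rel in table:
    print(f"{lower} - {upper}", freq, rel)
```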
