Distinguish between qualitative data and quantitative data.
1. 1. Introduction to Statistics
1.1 What is Statistics?
1.2 Classification of Statistics
1.3 Some Basic terms
1.4 Types of variables and measurement scales
2. LEARNING OBJECTIVES
After studying this chapter, the participants should be able to:
⚫ Distinguish between qualitative data
and quantitative data.
⚫ Describe the difference between
population and sample.
⚫ Describe nominal, ordinal, interval, and
ratio scales of measurement.
3. 1.1 What is statistics?
⚫ In common usage, statistics refers to numerical
facts (plural sense). Examples: the daily income of labourers,
the number of products produced, the number of students
enrolled in a university every year, and the number of
graduates from the universities of a country.
⚫ Statistics (singular sense) is the set of methods and
procedures for collecting, organizing, summarizing,
analyzing, and interpreting data, and for drawing
conclusions based on that analysis.
⚫ Statistics is a science that helps us make better decisions in
business and economics as well as in engineering fields.
4. Importance of statistics
⚫ Statistics teaches us how to summarize, analyze, and draw
meaningful inferences from data that then lead to improved
decisions or policy.
• In business, industry and management: with the commitment of
large amounts of capital and human resources, business has
become very complex and competitive, which creates a direct
need for sound decisions.
• Any decision or policy succeeds if the forecasts made at the
time of the decision prove to be accurate. The degree of
accuracy depends on proper analysis of past data. It is here
that statistical data and statistical tools such as averages,
variation, tests of significance, forecasting techniques, time
series, and so on play an important role.
5. Importance of Statistics (Example)
• Importance of statistics:
✓Statistics is used in establishing relationships
in
➢Input and output analysis.
➢Cost-benefit analysis.
✓Resource allocation.
✓National income accounting.
✓Population surveys.
✓Measurement of profit, wages, and inventory
management, and so on.
✓The quality of products
6. Limitation of Statistics
✓Deals only with quantitative phenomena
✓Does not deal with individual items
✓Someone who is not aware of statistical methods cannot
make the best possible use of available data.
7. 1.2 Classification of Statistics
⚫ Descriptive Statistics
✓ Collection of data
✓ Summarization of data
✓ Presentation of data
✓ Analysis of data
⚫ Inferential Statistics
✓ Predict and forecast values
of population parameters
✓ Test hypotheses about
values of population
parameters
✓ Make decisions
8. 1.3 Some Basic terms
• Population: is the aggregate of all elements,
individuals, items, or objects whose
characteristics are being studied. The
population that is being studied is also
called the target population.
• Sample: is a finite subset of the population
selected from it with the objective of
investigating its properties.
9. Some basic terms (cont)
• Sampling: is a tool which enables us to draw
conclusions about the characteristics of the
population after studying only those objects
or items that are included in the sample.
• Survey: is the collection of information from the
elements of a population or sample.
• Census: is a survey that includes every element of
the target population, although it is expensive and
time consuming.
10. Some basic terms (cont)
Census of a population may be:
✓ Impossible
✓ Impractical
✓ Too costly
• Sample survey: is a survey that is conducted on a
sample.
• Parameter: is a statistical constant obtained from the
population data set.
• Statistic: is a statistical measure obtained from the
sample data; its value can change from one sample to another.
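The parameter/statistic distinction can be made concrete with a minimal Python sketch. The population values, the sample size of 4, and the random seed below are all hypothetical choices made for illustration; they do not come from the slides.

```python
import random
import statistics

# Hypothetical population: daily incomes of 10 workers (illustrative values only).
population = [120, 135, 150, 110, 160, 145, 130, 155, 125, 140]

# Parameter: computed from every element of the population.
population_mean = statistics.mean(population)

# Statistic: computed from a sample drawn from the population.
random.seed(1)                          # fixed seed so the sketch is repeatable
sample = random.sample(population, 4)   # hypothetical sample size
sample_mean = statistics.mean(sample)

print("parameter (population mean):", population_mean)
print("statistic (sample mean):    ", sample_mean)
```

The parameter stays fixed at 137 for this population, while the statistic depends on which elements happen to be sampled.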
11. Classification of Data
• Data can be classified on the basis of time or place of
occurrence if the data are recorded for a series of time
periods or for different places.
• Depending on the nature of the variable or criterion used
for classification, four types of classification can be
identified.
a) Chronological classification: - the formation of groups is
based on the time or
sequence of occurrence.
12. Classification of Data
b) Geographical classifications: - the formation of groups is based on the
place of happening or location difference.
c) Qualitative Classification: - the formation of groups or categories is
based on some attribute or qualitative characteristics of the
population. An attribute is a qualitative feature of the
population units which cannot be measured or counted.
d) Quantitative classifications: - the formation of groups or categories is
based on quantitative (or numerical) characteristics of the population.
A numerical characteristic that takes varying values for the different
elements of the population is called a variable.
13. Types of variables
• Discrete variable: is a variable that assumes a finite or
countably infinite number of values, such as 0, 1, 2, 3, 4, ...
Example: Number of children in a family,
Number of goals scored in a football match,
Number of times a machine fails per day in a factory, etc.
• Continuous variable: is a variable that can assume infinitely
many values within the range of values it attains.
Example: Weight, height of children
Length of time of a telephone call you make
Amount of water you drink per day, etc.
14. Measurement scales
Variables differ in “how well” they can be measured, i.e., how
much measurable information their measurement scale can
provide. Variables are classified as:
•Nominal Scale - groups or classes
✓Gender, race, etc.
•Ordinal Scale - order matters
✓Ranks (top ten videos)
•Interval Scale - difference or distance matters; has an
arbitrary zero value.
✓Temperatures (°F, °C)
•Ratio Scale - ratio matters; has a natural zero value.
✓Salaries
15. 2. Methods of Data Collection &
Presentation
2.1 Methods of data collection
✓ Source of data
✓ Method of collection
2.2 Methods of data presentation
✓ Table
✓ Diagram
✓ Graph
16. LEARNING OBJECTIVES
After studying this chapter, the participants should be able to:
• Identify sources of data and collect data
• Create different types of tables and charts that
describe data sets.
17. 2.1 Method of data collection
There are two types of data:
• Secondary data:- are data that were already collected by
some other source but can be used for the immediate
purpose of the investigator.
Source of secondary data
✓government publications,
✓Journals and reports,
✓publications of research organization,
✓internal records of organizations, etc.
18. Method of data collection (cont.)
• Primary data:- are data collected directly by the
investigator for the immediate purpose of the
investigation.
Ways of collecting primary data:
✓ Questionnaire:- People who have first-hand information
are asked to fill in a written list of questions.
✓ Interview:- One-to-one communication between the
investigator (data collector) and the respondent.
✓ Experiment and observation:- The investigator may not
question anyone but instead conducts an experiment
and records the result, or observes a certain phenomenon
and records what happens.
19. 2.2 Presentation of Data
Major means of presentation of data are:
• Tables
• Graphs
• Diagrams
20. Tables
Table:- is a systematic arrangement of a group of data
in rows and columns.
A complete table consists of:
✓Self-explanatory title;
✓Row and column headings;
✓Row and column totals, where appropriate;
✓Units of measurement of the data;
✓Footnotes;
✓Table number;
✓Source of data.
22. Frequency Distribution
• Frequency distribution:- is a tabular summary of raw
data formed by listing distinct values of a variable
or the values grouped in a class with their
corresponding frequencies.
• Ungrouped frequency distribution:- a distribution that
shows frequencies of single distinct values of a variable.
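As an illustration of an ungrouped frequency distribution, here is a minimal Python sketch that tallies how often each distinct value occurs. The raw data (number of children in 12 families) are hypothetical and only serve to show the idea.

```python
from collections import Counter

# Hypothetical raw data: number of children in each of 12 families.
raw_data = [2, 0, 3, 1, 2, 2, 4, 1, 0, 3, 2, 1]

# Count how often each distinct value occurs.
frequency = Counter(raw_data)

print("value  frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>9}")
```

The frequencies add up to the number of observations, which is a quick check that no value was missed.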
23. Frequency Distribution (cont.)
⚫ Dividing data into groups or classes or
intervals
⚫ Groups should be:
✓Mutually exclusive
• Not overlapping - every observation is assigned to only
one group
✓Exhaustive
• Every observation is assigned to a group
✓Equal-width (if possible)
• First or last group may be open-ended
24. Frequency Distribution (cont.)
• Grouped frequency distribution:-a distribution
that has a list of classes or groups of values of a
variable with the corresponding frequencies.
25. Frequency Distribution (cont.)
⚫ Table with two columns listing:
✓Each and every group or class or interval of values
✓Associated frequency of each group
• Number of observations assigned to each group
• Sum of frequencies is number of observations
– N for population
– n for sample
• Class or class interval :- range of values of a
variable used in grouping
• Class limits:- the least and greatest values assigned
to a class, called the lower and upper class limits,
respectively.
26. Frequency Distribution (cont.)
⚫ Class boundary: If the classes are of the inclusive type,
there is always a gap or difference between
successive classes. To avoid such gaps we form
class boundaries by using the relations
LCB = LCL - d/2 and UCB = UCL + d/2
where d is the gap between successive classes
⚫ Class midpoint (class mark) is the middle value of a
class interval and a representative of the
corresponding class
⚫ Relative frequency is the ratio of the frequency of a
class to the total number of observations.
Sum of relative frequencies = 1
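The relations for class boundaries, class marks, and relative frequencies can be sketched in a few lines of Python. The inclusive classes and their frequencies below are hypothetical; only the formulas LCB = LCL - d/2 and UCB = UCL + d/2 come from the slide.

```python
# Hypothetical inclusive classes (lower limit, upper limit) and frequencies.
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]
frequencies = [4, 9, 5, 2]

# d is the gap between the upper limit of one class and the lower limit of the next.
d = classes[1][0] - classes[0][1]
total = sum(frequencies)

rows = []
for (lcl, ucl), f in zip(classes, frequencies):
    lcb = lcl - d / 2            # lower class boundary
    ucb = ucl + d / 2            # upper class boundary
    midpoint = (lcl + ucl) / 2   # class mark
    rows.append((lcb, ucb, midpoint, f / total))

for lcb, ucb, midpoint, rel in rows:
    print(f"{lcb:5.1f} - {ucb:5.1f}  midpoint={midpoint:5.1f}  rel.freq={rel:.2f}")
```

With boundaries in place, the upper boundary of each class coincides with the lower boundary of the next, so the gaps disappear.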
27. Frequency Distribution (cont.)
• Example of relative frequency of the 1st class: 4/55 = 0.07
• Sum of relative frequencies = 1
28. Frequency Distribution (Cont.)
The cumulative frequency of each class /group is the sum of the
frequencies of that and all preceding classes/groups.
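A running total is all that is needed to obtain cumulative frequencies. The sketch below uses hypothetical class frequencies and the standard-library helper `itertools.accumulate`.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed in class order.
frequencies = [4, 9, 5, 2]

# Each entry is the sum of that class's frequency and all preceding ones.
cumulative = list(accumulate(frequencies))

print(cumulative)  # [4, 13, 18, 20]
```

The last cumulative frequency always equals the total number of observations.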
29. Histogram
⚫ A histogram is a chart made of bars of
different heights.
✓Widths and locations of bars correspond to widths
and locations of data groupings
✓Heights of bars correspond to frequencies or
relative frequencies of data groupings
✓There is no gap between successive rectangular
bars.
32. Methods of Displaying Data
⚫ Pie Charts
✓Categories represented as percentages of total
⚫ Bar Graphs
✓Heights of rectangles represent group frequencies
⚫ Frequency Polygons
✓Height of line represents frequency
⚫ Ogives
✓Height of line represents cumulative frequency
⚫ Time Plots
✓Represents values over time
36. 3. Summary Measures
3.1 Measures of Central Tendency
✓ Median
✓ Mean
✓ Mode
3.2 Measures of Variability
✓ Range
✓ Interquartile range
✓ Variance
✓ Standard Deviation
✓ Coefficient of variation
3.3 Other summary measures:
✓ Skewness
✓ Kurtosis
37. LEARNING OBJECTIVE
After studying this chapter, the participants should be able to:
⚫ Calculate and interpret percentiles and
quartiles.
⚫ Explain measures of central tendency and how
to compute them.
⚫ Explain measures of variation and how to
compute them.
38. 3.1 Measures of Central Tendency
or Location
• Median
➢ Middle value when sorted in order of magnitude
➢ 50th percentile
➢ 2nd quartile
• Mode
➢ Most frequently occurring value
• Mean
➢ Average
39. Example 3-1
A large department store collects
data on sales made by each of its
salespersons. The number of sales
made on a given day by each of 20
salespersons is shown on the next
slide. The data have also been
sorted in order of magnitude.
41. Example 3-1 (Continued) Percentiles
⚫ Find the 50th, 80th, and the 90th percentiles of this
data set.
⚫ To find the 50th percentile, determine the data point
in position
(n + 1)P/100 = (20 + 1)(50/100) = 10.5
⚫ Thus, the percentile is located at the 10.5th position.
⚫ The 10th observation is 16, and the 11th observation
is also 16.
⚫ The 50th percentile lies halfway between the 10th
and 11th values, which are both 16, and thus equals
(16 + 16)(1/2) = 16.
42. Example 3-1 (Continued) Percentiles
⚫ To find the 80th percentile, determine the data
point in position
(n + 1)P/100 = (20 + 1)(80/100) = 16.8
⚫ Thus, the percentile is located at the 16.8th
position.
⚫ The 16th observation is 19, and the 17th
observation is 20.
⚫ The 80th percentile is a point lying 0.8 of the
way from 19 to 20 and is thus 19.8
{[19 + (0.8)(20 - 19)] = 19.8}
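The position rule (n + 1)P/100 with linear interpolation can be written as a short Python function. This is a minimal sketch: the function name `percentile` is hypothetical, and clamping to the smallest or largest observation when the position falls outside the data is an assumption that the slides do not address.

```python
def percentile(sorted_data, p):
    """Return the p-th percentile using the (n + 1)P/100 position rule."""
    n = len(sorted_data)
    position = (n + 1) * p / 100      # 1-based position, may be fractional
    lower_rank = int(position)        # rank of the observation at or below it
    fraction = position - lower_rank
    # Assumption: clamp to the ends of the data for extreme percentiles.
    if lower_rank < 1:
        return sorted_data[0]
    if lower_rank >= n:
        return sorted_data[-1]
    lower = sorted_data[lower_rank - 1]
    upper = sorted_data[lower_rank]
    # Move `fraction` of the way from the lower to the upper observation.
    return lower + fraction * (upper - lower)


# Sales data from Example 3-1, sorted in order of magnitude.
sales = sorted([9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
                17, 16, 24, 21, 22, 18, 19, 18, 20, 17])

for p in (50, 80, 90):
    print(p, round(percentile(sales, p), 2))
```

For the 50th and 80th percentiles this reproduces the hand calculations, 16 and 19.8.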
43. Quartiles – Special Percentiles
⚫ Quartiles are the percentage points that
break down the ordered data set into
quarters.
⚫ The first quartile is the 25th percentile. It is
the point below which lie 1/4 of the data.
⚫ The second quartile is the 50th percentile.
It is the point below which lie 1/2 of the
data. This is also called the median.
⚫ The third quartile is the 75th percentile. It
is the point below which lie 3/4 of the data.
44. Quartiles and Interquartile Range
⚫ The first quartile, Q1, (25th percentile) is
often called the lower quartile.
⚫ The second quartile, Q2, (50th
percentile) is often called the median
or the middle quartile.
⚫ The third quartile, Q3, (75th percentile)
is often called the upper quartile.
⚫ The interquartile range is the difference
between the third and the first quartiles (Q3 - Q1).
46. Example – Median (Data is used from
Example 3-1)
Sales Sorted Sales
9 6
6 9
12 10
10 12
13 13
15 14
16 14
14 15
14 16
16 16
17 16
16 17
24 17
21 18
22 18
18 19
19 20
18 21
20 22
17 24
Median = 50th percentile
Position: (20 + 1)(50/100) = 10.5, so Median = 16 + (.5)(0) = 16
The median is the middle
value of data sorted in
order of magnitude. It is
the 50th percentile.
47. Example - Mode (Data is used from
Example 3-1)
                       .
  .  .  .  .  .  :  .  :  :  :  .  .  .  .  .
---------------------------------------------
  6  9 10 12 13 14 15 16 17 18 19 20 21 22 24
The mode is the most frequently
occurring value. It is the value with the
highest frequency.
Mode = 16
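The median and mode of the Example 3-1 sales data can be checked with Python's standard `statistics` module. This is only a quick verification sketch; the variable names are illustrative.

```python
import statistics
from collections import Counter

# Sales data from Example 3-1 (unsorted, as collected).
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

median = statistics.median(sales)   # middle value of the sorted data
mode = statistics.mode(sales)       # most frequently occurring value
counts = Counter(sales)             # frequency of every distinct value

print("median:", median)
print("mode:", mode, "occurring", counts[mode], "times")
```

With an even number of observations, `statistics.median` averages the two middle values, which matches the 10.5th-position calculation.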
48. Arithmetic Mean or Average
• The arithmetic mean or mean of a set of
observations is their average - the sum of the
observed values divided by the number of
observations.
Sample Mean
$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
Population Mean
$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
49. Example – Mean (Data is used from
Example 3-1)
Sales: 9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17
Sum of sales = 317
The mean is a computed average:
$\bar{x} = \frac{1}{n} \sum_{i=1}^{20} x_i = \frac{1}{20}(317) = 15.85$
50. Mean for ungrouped frequency
distribution
• The following is the frequency distribution of
the number of telephone calls received in 245
successive one-minute intervals at an
exchange.
# of calls (xi)    0    1    2    3    4    5    6    7
Frequency (fi)    14   21   25   43   51   40   39   12
Obtain the mean number of calls per minute.
• Solution
$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} = \frac{0(14) + 1(21) + 2(25) + 3(43) + 4(51) + 5(40) + 6(39) + 7(12)}{14 + 21 + 25 + 43 + 51 + 40 + 39 + 12} = \frac{922}{245} \approx 3.763$
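The mean of an ungrouped frequency distribution is the sum of each value times its frequency, divided by the total frequency. A minimal Python sketch using the telephone-call data from this slide (variable names are illustrative):

```python
# Number of calls per minute and how many one-minute intervals had that count.
calls = [0, 1, 2, 3, 4, 5, 6, 7]
freq = [14, 21, 25, 43, 51, 40, 39, 12]

total_calls = sum(x * f for x, f in zip(calls, freq))   # sum of x_i * f_i
n_intervals = sum(freq)                                 # sum of f_i
mean_calls = total_calls / n_intervals

print(total_calls, n_intervals, round(mean_calls, 3))
```

The totals are 922 calls over 245 intervals, so the mean is about 3.763 calls per minute.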
51. Weighted Mean
• If the values in a data set are not all equally
important, we use the weighted mean.
• If the values $x_1, x_2, \ldots, x_n$ of a data set have weights
$w_1, w_2, \ldots, w_n$ respectively, then the weighted mean is given
by:
$\bar{x}_W = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$
52. Example on weighted average
• If an instructor counts the final examination in
a course four times as much as each 1-hour
examination, what is the weighted average
grade of a student who obtained grades of 69,
75, 56, and 72 in 1-hour examinations and a
final examination grade of 78?
• Solution
$\bar{x}_W = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{1(69) + 1(75) + 1(56) + 1(72) + 4(78)}{1 + 1 + 1 + 1 + 4} = \frac{584}{8} = 73$
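The weighted-average calculation can be reproduced with a small helper. This is a sketch; the function name `weighted_mean` is hypothetical and not part of any library.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Four 1-hour examinations (weight 1 each) and the final examination (weight 4).
grades = [69, 75, 56, 72, 78]
weights = [1, 1, 1, 1, 4]

print(weighted_mean(grades, weights))  # 73.0
```

Setting every weight to 1 reduces the weighted mean to the ordinary arithmetic mean.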
53. Example - Mode (Data is used from
Example 3-1)
Mean = 15.85
Median and Mode = 16
Mean < Median
                       .
  .  .  .  .  .  :  .  :  :  :  .  .  .  .  .
---------------------------------------------
  6  9 10 12 13 14 15 16 17 18 19 20 21 22 24
54. 3-2 Measures of Variability or Dispersion
⚫ Range
✓Difference between maximum and minimum
values
⚫ Interquartile Range
✓Difference between third and first quartile(Q3 - Q1)
⚫ Variance
✓Average of the squared deviations from the mean
⚫ Standard Deviation
✓Square root of the variance
55. Example - Range and Interquartile
Range (Data is used from Example 3-1)
Sales   Sorted Sales   Rank
  9          6           1    (Minimum)
  6          9           2
 12         10           3
 10         12           4
 13         13           5
 15         14           6
 16         14           7
 14         15           8
 14         16           9
 16         16          10
 17         16          11
 16         17          12
 24         17          13
 21         18          14
 22         18          15
 18         19          16
 19         20          17
 18         21          18
 20         22          19
 17         24          20    (Maximum)
Range = Maximum - Minimum
= 24 - 6 = 18
First quartile: Q1 = 13 + (.25)(1) = 13.25
Q2 = Median = P50
Third quartile: Q3 = 18 + (.75)(1) = 18.75
Interquartile Range (IQR) = Q3 - Q1
= 18.75 - 13.25 = 5.5
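The range and interquartile range can be cross-checked in Python. The sketch below relies on `statistics.quantiles` with `method="exclusive"`, which, as far as I understand it, interpolates at position p(n + 1) and therefore agrees with the (n + 1)P/100 rule used in these slides; treat that equivalence as an assumption to verify against the library documentation.

```python
import statistics

# Sales data from Example 3-1.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

data_range = max(sales) - min(sales)

# n=4 splits the data into quarters and returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(sales, n=4, method="exclusive")
iqr = q3 - q1

print("range:", data_range)
print("Q1, Q2, Q3:", q1, q2, q3)
print("IQR:", iqr)
```

Unlike the range, the IQR ignores the lowest and highest quarters of the data, so it is less sensitive to extreme values.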
56. Variance and Standard Deviation
• The variance is one of the most useful
measures of variation. It is defined as the mean of
the squared deviations of the given
observations from their mean.
➢For population data the variance ($\sigma^2$) is
$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 = \frac{1}{N} \sum_{i=1}^{N} x_i^2 - \mu^2$
where
$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$
57. Variance and Standard Deviation
➢For sample data the variance ($s^2$) is
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$
• The positive square root of the variance is called the
standard deviation.
• The value of the standard deviation tells us how closely
the values of a data set are clustered around the mean.
• If the standard deviation is small (large), then the
values of the data set are spread over a relatively smaller
(larger) range around the mean.
58. Calculation of Variance
x_i    x_i - x̄    (x_i - x̄)^2    x_i^2
  6     -9.85       97.0225        36
  9     -6.85       46.9225        81
 10     -5.85       34.2225       100
 12     -3.85       14.8225       144
 13     -2.85        8.1225       169
 14     -1.85        3.4225       196
 14     -1.85        3.4225       196
 15     -0.85        0.7225       225
 16      0.15        0.0225       256
 16      0.15        0.0225       256
 16      0.15        0.0225       256
 17      1.15        1.3225       289
 17      1.15        1.3225       289
 18      2.15        4.6225       324
 18      2.15        4.6225       324
 19      3.15        9.9225       361
 20      4.15       17.2225       400
 21      5.15       26.5225       441
 22      6.15       37.8225       484
 24      8.15       66.4225       576
Total
317      0         378.5500      5403
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{378.55}{20 - 1} = \frac{378.55}{19} = 19.923684$
or
$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1} = \frac{5403 - 20(15.85)^2}{20 - 1} = \frac{5403 - 5024.45}{19} = \frac{378.55}{19} = 19.923684$
$s = \sqrt{s^2} = \sqrt{19.923684} = 4.46$
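Both forms of the sample variance can be computed side by side to confirm that they agree. A minimal Python sketch on the Example 3-1 sales data; variable names such as `s2_definition` and `s2_shortcut` are illustrative only.

```python
import math
import statistics

# Sales data from Example 3-1.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

n = len(sales)
mean = sum(sales) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
ss_dev = sum((x - mean) ** 2 for x in sales)
s2_definition = ss_dev / (n - 1)

# Computational form: (sum of squares - n * mean^2) / (n - 1).
s2_shortcut = (sum(x * x for x in sales) - n * mean ** 2) / (n - 1)

s = math.sqrt(s2_definition)   # sample standard deviation

print(round(ss_dev, 2), round(s2_definition, 6), round(s2_shortcut, 6), round(s, 2))
print(round(statistics.variance(sales), 6))   # library cross-check
```

The computational form avoids computing each deviation, but it subtracts two large numbers, so the definitional form is usually the safer choice in floating-point arithmetic.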
59. Coefficients of variation
• The standard deviation is an absolute measure of variation.
One relative measure of variation is the coefficient
of variation (CV).
• The coefficient of variation is used to compare the variability of
two or more data sets.
• The coefficient of variation is the ratio of the standard deviation
to the arithmetic mean, usually expressed as a percentage:
$CV = \frac{\text{S.d}}{\text{Mean}} \times 100\% = \frac{s}{\bar{x}} \times 100\%$
• Of the data sets being compared, the one with the smaller
coefficient of variation is said to be less variable, more
consistent, or more uniform.
60. Coefficients of variation (Example)
• Suppose you are offered the opportunity to
invest money in one of the two projects listed
below. Both projects involve risk. Which
project is financially attractive to you?
Projected return (%)
Project   Mean   Standard deviation
A         7.6    3.1
B         6.8    2.5
• Solution
$CV_A = \frac{3.1}{7.6} \times 100\% = 40.79\%$
$CV_B = \frac{2.5}{6.8} \times 100\% = 36.76\%$
Project B has the smaller coefficient of variation, so its
returns are less variable relative to their mean.
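The comparison of the two projects can be sketched in Python using the mean and standard deviation listed in the table. The helper name `coefficient_of_variation` and the dictionary layout are hypothetical choices for illustration.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


# (mean, standard deviation) of projected return in percent, from the table.
projects = {"A": (7.6, 3.1), "B": (6.8, 2.5)}

cv = {name: coefficient_of_variation(sd, mean)
      for name, (mean, sd) in projects.items()}

# The project with the smallest CV is the least variable relative to its mean.
least_variable = min(cv, key=cv.get)

for name, value in cv.items():
    print(f"CV_{name} = {value:.2f}%")
print("least variable:", least_variable)
```

Because the CV is unit-free, the same comparison works for data sets measured in different units or with very different means.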
64. 3.3 Skewness and Kurtosis
⚫ Skewness
– Measure of asymmetry of a frequency distribution
• Skewed to left
• Symmetric or unskewed
• Skewed to right
⚫ Kurtosis
– Measure of flatness or peakedness of a frequency
distribution
• Platykurtic (relatively flat)
• Mesokurtic (normal)
• Leptokurtic (relatively peaked)
71. Properties of the mean
• The mean of a numerical data set can always be computed and
is unique.
• The mean is highly affected by extreme values or
outliers.
• If we add (subtract) a constant to (from) each value of a
data set, the mean will increase (decrease) by that value.
• If we multiply each value of a data set by a constant, then
the mean will be multiplied by that constant.
• The sum of the deviations of the observations in a data set
from their mean is zero.
• The sum of the squared deviations of the observations
is at a minimum when the deviations are taken from the mean.
72. Properties of the standard deviation
• If we add (subtract) a constant to (from) each
value of a data set, the standard deviation will
not be affected.
• If we multiply each value of a data set by a
constant, then the standard deviation will be
multiplied by the absolute value of that constant.
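The properties of the mean and the standard deviation listed above can be checked numerically. The data set and the constant in this Python sketch are hypothetical; any numbers would do.

```python
import math
import statistics

data = [6, 9, 10, 12, 13]   # hypothetical observations
c = 5                       # hypothetical constant

mean = statistics.mean(data)
sd = statistics.stdev(data)

shifted = [x + c for x in data]   # add a constant to every value
scaled = [x * c for x in data]    # multiply every value by a constant

# Adding c shifts the mean by c and leaves the standard deviation unchanged.
shift_ok = (math.isclose(statistics.mean(shifted), mean + c)
            and math.isclose(statistics.stdev(shifted), sd))

# Multiplying by c multiplies the mean by c and the standard deviation by |c|.
scale_ok = (math.isclose(statistics.mean(scaled), mean * c)
            and math.isclose(statistics.stdev(scaled), sd * abs(c)))

# Deviations from the mean sum to zero.
deviation_sum = sum(x - mean for x in data)

print(shift_ok, scale_ok, deviation_sum)
```

Trying a negative constant such as c = -5 shows why the absolute value matters: the mean changes sign, but the standard deviation stays non-negative.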