1. Introduction to Statistics
1.1 What is Statistics?
1.2 Classification of Statistics
1.3 Some Basic terms
1.4 Types of variables and measurement scales
1
LEARNING OBJECTIVES
After studying this chapter, the participants should be able to:
⚫ Distinguish between qualitative data and quantitative data.
⚫ Describe the difference between a population and a sample.
⚫ Describe the nominal, ordinal, interval, and ratio scales of measurement.
2
1.1 What is statistics?
⚫ In common usage, statistics refers to numerical facts (plural sense). Examples: the daily income of labourers, the number of products produced, the number of students enrolled in a university every year, and the number of students graduating from the universities of a country.
⚫ Statistics (singular sense) is a set of methods and procedures for collecting, organizing, summarizing, analyzing, and interpreting data, and for drawing conclusions based on that analysis.
⚫ Statistics is a science that helps us make better decisions in business and economics as well as in engineering fields.
3
Importance of statistics
⚫ Statistics teaches us how to summarize, analyze, and draw meaningful inferences from data, which in turn lead to improved decisions or policies.
• In business, industry and management: the commitment of large amounts of capital and human resources has made business very complex and competitive, creating a direct need for sound decisions.
• A decision or policy succeeds only if the forecasts made at the time of the decision prove to be accurate. The degree of accuracy depends on proper analysis of past data. It is here that statistical data and statistical tools such as averages, variation, tests of significance, forecasting techniques, and time series play an important role.
4
Importance of Statistics (Examples)
• Statistics is used in:
✓Establishing relationships, as in
➢Input and output analysis.
➢Cost benefit analysis.
✓Resource allocation.
✓National income accounting.
✓Population surveys.
✓Measurement of profit, wages, and inventory management, and so on.
✓Assessing the quality of products.
5
Limitations of Statistics
✓Statistics deals only with quantitative phenomena.
✓Statistics does not deal with individual observations; it deals with aggregates.
✓Anyone who is not familiar with statistical methods cannot make the best possible use of the available data.
6
1.2 Classification of Statistics
⚫ Descriptive Statistics
✓ Collection of data
✓ Summarization of data
✓ Presentation of data
✓ Analysis of data
⚫ Inferential Statistics
✓ Predict and forecast values
of population parameters
✓ Test hypotheses about
values of population
parameters
✓ Make decisions
7
1.3 Some Basic terms
• Population: the aggregate of all elements (individuals, items, or objects) whose characteristics are being studied. The population being studied is also called the target population.
• Sample: a finite subset of the population, selected with the objective of investigating the properties of the population.
8
Some basic terms (cont)
• Sampling: a tool that enables us to draw conclusions about the characteristics of the population after studying only those objects or items that are included in the sample.
• Survey: the collection of information from the elements of a population or a sample.
• Census: a survey that includes every element of the target population. It is usually expensive and time consuming.
9
Some basic terms (cont)
Census of a population may be:
✓ Impossible
✓ Impractical
✓ Too costly
• Sample survey: is a survey that is conducted on a
sample.
• Parameter: a numerical summary (a constant) obtained from the population data set.
• Statistic: the corresponding numerical summary obtained from the sample data.
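The distinction between a parameter and a statistic can be illustrated with a minimal sketch. The population below is made up for illustration (hypothetical daily incomes), and the sample size of 30 is an arbitrary assumption.

```python
import random
from statistics import mean

# Hypothetical population: daily incomes of 1,000 workers (made-up values).
random.seed(42)
population = [random.randint(50, 150) for _ in range(1000)]

# Parameter: a summary computed from the whole population.
population_mean = mean(population)

# Statistic: the same summary computed from a sample drawn from the population.
sample = random.sample(population, 30)
sample_mean = mean(sample)

print(f"Population mean (parameter): {population_mean:.2f}")
print(f"Sample mean (statistic):     {sample_mean:.2f}")
```

Drawing a different sample would usually give a different sample mean, while the population mean stays fixed.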
10
Classification of Data
• Data can be classified on the basis of time or place of occurrence when they are recorded for a series of time periods or for different places.
• Depending on the nature of the variable or criterion used for classification, four types of classification can be identified.
a) Chronological classification: the formation of groups is based on the time or sequence of occurrence.
11
Classification of Data
b) Geographical classification: the formation of groups is based on the place of occurrence or difference in location.
c) Qualitative classification: the formation of groups or categories is based on some attribute or qualitative characteristic of the population. An attribute is a qualitative feature of the population units which cannot be measured or counted.
d) Quantitative classification: the formation of groups or categories is based on quantitative (numerical) characteristics of the population. A numerical characteristic that takes varying values for the different elements of the population is called a variable.
12
Types of variables
• Discrete variable: a variable that assumes a finite or countably infinite number of values, such as 0, 1, 2, 3, 4, ...
Example: Number of children in a family,
Number of goals scored in a football match,
Number of times a machine fails per day in a factory, etc.
• Continuous variable: a variable that can assume any of the infinitely many values within the range it covers.
Example: Weight and height of children,
Length of time of a telephone call you make,
Amount of water you drink per day, etc.
13
Measurement scales
Variables differ in “how well” they can be measured, i.e., in how much measurable information their measurement scale can provide. Measurement scales are classified as:
•Nominal Scale - groups or classes
✓Gender, race, etc.
•Ordinal Scale - order matters
✓Ranks (top ten videos)
•Interval Scale - difference or distance matters; has an arbitrary zero value.
✓Temperatures (°F, °C)
•Ratio Scale - ratio matters; has a natural zero value.
✓Salaries
14
2. Methods of Data Collection & Presentation
2.1 Methods of data collection
✓ Source of data
✓ Method of collection
2.2 Methods of data presentation
✓ Tables
✓ Diagrams
✓ Graphs
15
LEARNING OBJECTIVES
After studying this chapter, the participants should be able to:
• Identify sources of data and collect data.
• Create different types of tables and charts that describe data sets.
16
2.1 Method of data collection
There are two types of data: secondary data and primary data.
• Secondary data: data that have already been collected by some other source but can be used for the immediate purpose of the investigator.
Sources of secondary data:
✓Government publications,
✓Journals and reports,
✓Publications of research organizations,
✓Internal records of organizations, etc.
17
Method of data collection (cont.)
• Primary data: data collected directly by the investigator for the immediate purpose of the investigation.
Ways of collecting primary data:
✓ Questionnaire: people who have first-hand information are asked to fill in a written list of questions.
✓ Interview: one-to-one communication between the investigator (data collector) and the respondent.
✓ Experiment and observation: the investigator does not question anyone but conducts an experiment and records the result, or observes a certain phenomenon and records what happens.
18
2.2 Presentation of Data
Major means of presentation of data are:
• Tables
• Graphs
• Diagrams
19
Tables
Table: a systematic arrangement of a group of data in rows and columns.
A complete table consists of:
✓A self-explanatory title;
✓Row and column headings;
✓Row and column totals, where appropriate;
✓Units of measurement of the data;
✓Footnotes;
✓A table number;
✓The source of the data.
20
Example of table
21
Frequency Distribution
• Frequency distribution: a tabular summary of raw data formed by listing the distinct values of a variable, or the classes into which the values are grouped, together with their corresponding frequencies.
• Ungrouped frequency distribution: a distribution that shows the frequencies of single distinct values of a variable.
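An ungrouped frequency distribution can be built by counting each distinct value. The sketch below uses made-up data (number of children in 12 hypothetical families); it is an illustration, not data from the slides.

```python
from collections import Counter

# Hypothetical raw data: number of children in each of 12 families.
children = [2, 0, 1, 3, 2, 2, 1, 0, 4, 2, 1, 3]

# Count how often each distinct value occurs.
frequency = Counter(children)

print("Value  Frequency")
for value in sorted(frequency):
    print(f"{value:>5}  {frequency[value]:>9}")
```

The frequencies add up to the number of observations, here 12.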
22
Frequency Distribution (cont.)
⚫ Dividing data into groups or classes or
intervals
⚫ Groups should be:
✓Mutually exclusive
• Not overlapping - every observation is assigned to only
one group
✓Exhaustive
• Every observation is assigned to a group
✓Equal-width (if possible)
• First or last group may be open-ended
23
Frequency Distribution (cont.)
• Grouped frequency distribution:-a distribution
that has a list of classes or groups of values of a
variable with the corresponding frequencies.
24
Frequency Distribution (cont.)
⚫ Table with two columns listing:
✓Each and every group or class or interval of values
✓Associated frequency of each group
• Number of observations assigned to each group
• Sum of frequencies is number of observations
– N for population
– n for sample
• Class or class interval: a range of values of a variable used in grouping.
• Class limits: the least and greatest values assigned to a class, called the lower and upper class limits, respectively.
25
Frequency Distribution (cont.)
⚫ Class boundary: if the classes are of the inclusive type, there is always a gap between successive classes. To close such gaps we form class boundaries using the relations
LCB = LCL - d/2 and UCB = UCL + d/2
where LCL and UCL are the lower and upper class limits, LCB and UCB are the lower and upper class boundaries, and d is the gap between successive classes.
⚫ Class midpoint (class mark): the middle value of a class interval, used as a representative of the corresponding class.
⚫ Relative frequency: the ratio of the frequency of a class to the total number of observations.
Sum of relative frequencies = 1
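The class boundary, class midpoint, and relative frequency definitions can be sketched in a few lines. The classes and frequencies below are hypothetical, chosen only so that the gap d between successive classes is 1.

```python
# Hypothetical inclusive classes (lower limit, upper limit) and their frequencies.
classes = [(10, 19), (20, 29), (30, 39)]
frequencies = [4, 9, 7]

# d is the gap between the upper limit of one class and the lower limit of the next.
d = classes[1][0] - classes[0][1]
total = sum(frequencies)

rows = []
for (lcl, ucl), f in zip(classes, frequencies):
    lcb = lcl - d / 2            # lower class boundary
    ucb = ucl + d / 2            # upper class boundary
    midpoint = (lcl + ucl) / 2   # class mark
    rel_freq = f / total         # relative frequency
    rows.append((lcb, ucb, midpoint, rel_freq))
    print(f"{lcl}-{ucl}: boundaries {lcb}-{ucb}, "
          f"midpoint {midpoint}, relative frequency {rel_freq:.2f}")
```

With boundaries in place, the upper boundary of each class coincides with the lower boundary of the next, so no gap remains.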
26
Frequency Distribution (cont.)
• Example of relative frequency of the 1st class: 4/55 = 0.07
• Sum of relative frequencies = 1
27
Frequency Distribution (Cont.)
The cumulative frequency of each class /group is the sum of the
frequencies of that and all preceding classes/groups.
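As a small illustration of this definition, cumulative frequencies are running totals of the class frequencies. The frequencies below are hypothetical.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed in class order.
frequencies = [4, 9, 7]

# Each entry is the frequency of that class plus all preceding classes.
cumulative = list(accumulate(frequencies))
print(cumulative)
```

The last cumulative frequency always equals the total number of observations.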
28
Histogram
⚫ A histogram is a chart made of bars of
different heights.
✓Widths and locations of bars correspond to widths
and locations of data groupings
✓Heights of bars correspond to frequencies or
relative frequencies of data groupings
✓There is no gap between successive rectangular
bars.
29
Histogram example
30
Histogram example
Relative frequency histogram
31
Methods of Displaying Data
⚫ Pie Charts
✓Categories represented as percentages of total
⚫ Bar Graphs
✓Heights of rectangles represent group frequencies
⚫ Frequency Polygons
✓Height of line represents frequency
⚫ Ogives
✓Height of line represents cumulative frequency
⚫ Time Plots
✓Represents values over time
32
Pie Chart
Central angle = relative frequency × 360°
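A minimal sketch of the central-angle rule, assuming three hypothetical categories with made-up frequencies:

```python
# Hypothetical category frequencies for a pie chart.
frequencies = {"A": 4, "B": 9, "C": 7}
total = sum(frequencies.values())

# Central angle = relative frequency x 360 degrees.
angles = {name: 360 * f / total for name, f in frequencies.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

The angles of all sectors add up to 360 degrees.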
33
Bar Chart
[Figure 1-11: SHIFTING GEARS. Bar chart of quarterly net income for General Motors (in billions), 2003 and 2004.]
34
Frequency Polygon
35
3. Summary Measures
3.1 Measures of Central
Tendency
✓ Median
✓ Mean
✓ Mode
3.2 Measures of Variability
✓ Range
✓ Interquartile range
✓ Variance
✓ Standard Deviation
✓ Coefficient of variation
3.3 Other summary
measures:
✓ Skewness
✓ Kurtosis
36
LEARNING OBJECTIVE
After studying this chapter, the participants should be able to:
⚫ Calculate and interpret percentiles and quartiles.
⚫ Explain measures of central tendency and how to compute them.
⚫ Explain measures of variation and how to compute them.
37
3.1 Measures of Central Tendency
or Location
• Median
➢ Middle value when sorted in order of magnitude
➢ 50th percentile
➢ 2nd quartile
• Mode
➢ Most frequent occurring value
• Mean
➢ Average
38
Example 3-1
A large department store collects data on sales made by each of its salespeople. The number of sales made on a given day by each of 20 salespeople is shown on the next slide. The data have also been sorted in order of magnitude.
39
Example 3-1 (Continued) - Sales and
Sorted Sales
Sales Sorted Sales
9 6
6 9
12 10
10 12
13 13
15 14
16 14
14 15
14 16
16 16
17 16
16 17
24 17
21 18
22 18
18 19
19 20
18 21
20 22
17 24
40
Example 3-1 (Continued) Percentiles
⚫ Find the 50th, 80th, and 90th percentiles of this data set.
⚫ To find the 50th percentile, determine the data point in position
(n + 1)P/100 = (20 + 1)(50/100) = 10.5
⚫ Thus, the percentile is located at the 10.5th position.
⚫ The 10th observation is 16, and the 11th observation is also 16.
⚫ The 50th percentile lies halfway between the 10th and 11th values, so it equals (16 + 16)(1/2) = 16.
41
Example 3-1 (Continued) Percentiles
⚫ To find the 80th percentile, determine the data point in position
(n + 1)P/100 = (20 + 1)(80/100) = 16.8
⚫ Thus, the percentile is located at the 16.8th position.
⚫ The 16th observation is 19, and the 17th observation is 20.
⚫ The 80th percentile is a point lying 0.8 of the way from 19 to 20 and is thus 19 + (0.8)(20 - 19) = 19.8.
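The position rule (n + 1)P/100 with linear interpolation can be sketched as a small function. The function name `percentile` and the clamping at the ends of the data are assumptions made for this illustration; statistical packages often use other percentile conventions and may give slightly different values.

```python
import math

def percentile(sorted_data, p):
    """Percentile by the (n + 1)P/100 position rule with linear interpolation."""
    n = len(sorted_data)
    position = (n + 1) * p / 100      # 1-based position in the sorted data
    lower = math.floor(position)      # whole part of the position
    fraction = position - lower       # fractional part of the position
    # Assumption: positions outside the data fall back to the extreme values.
    if lower < 1:
        return sorted_data[0]
    if lower >= n:
        return sorted_data[-1]
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + fraction * (above - below)

# Sales data from Example 3-1.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
sorted_sales = sorted(sales)

for p in (50, 80, 90):
    print(f"P{p} = {percentile(sorted_sales, p):.2f}")
```

Applied to the 90th percentile, the same rule gives position 18.9, which lies 0.9 of the way from the 18th observation (21) to the 19th (22), that is 21.9.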
42
Quartiles – Special Percentiles
⚫ Quartiles are the percentage points that
break down the ordered data set into
quarters.
⚫ The first quartile is the 25th percentile. It is
the point below which lie 1/4 of the data.
⚫ The second quartile is the 50th percentile.
It is the point below which lie 1/2 of the
data. This is also called the median.
⚫ The third quartile is the 75th percentile. It
is the point below which lie 3/4 of the data.
43
Quartiles and Interquartile Range
⚫ The first quartile, Q1, (25th percentile) is
often called the lower quartile.
⚫ The second quartile, Q2, (50th
percentile) is often called the median
or the middle quartile.
⚫ The third quartile, Q3, (75th percentile)
is often called the upper quartile.
⚫ The interquartile range is the difference between the third and the first quartiles, Q3 - Q1.
44
Example 3-1: Finding Quartiles
Sales Sorted Sales
9 6
6 9
12 10
10 12
13 13
15 14
16 14
14 15
14 16
16 16
17 16
16 17
24 17
21 18
22 18
18 19
19 20
18 21
20 22
17 24
Quartile          Position (n+1)P/100        Value
First quartile    (20+1)(25)/100 = 5.25      13 + (0.25)(1) = 13.25
Second quartile   (20+1)(50)/100 = 10.5      16 + (0.5)(0) = 16
Third quartile    (20+1)(75)/100 = 15.75     18 + (0.75)(1) = 18.75
45
Example – Median (Data is used from
Example 3-1)
Sales Sorted Sales
9 6
6 9
12 10
10 12
13 13
15 14
16 14
14 15
14 16
16 16
17 16
16 17
24 17
21 18
22 18
18 19
19 20
18 21
20 22
17 24
Median = 50th percentile
Position: (20+1)(50)/100 = 10.5
Median = 16 + (.5)(0) = 16
The median is the middle value of data sorted in order of magnitude. It is the 50th percentile.
46
Example - Mode (Data is used from
Example 3-1)
                       .
                 .     .  .  .
  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
---------------------------------------------
  6  9 10 12 13 14 15 16 17 18 19 20 21 22 24
The mode is the most frequently
occurring value. It is the value with the
highest frequency.
Mode = 16
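The median and mode of the Example 3-1 data can be cross-checked with Python's standard `statistics` module. This is only a quick check, not the method used on the slides; `multimode` (available from Python 3.8) is used so that ties, if any, would all be reported.

```python
from statistics import median, multimode

# Sales data from Example 3-1.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

# With an even number of observations, median() averages the two middle values.
print("Median:", median(sales))

# multimode() returns every value that attains the highest frequency.
print("Mode(s):", multimode(sales))
```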
47
Arithmetic Mean or Average
• The arithmetic mean or mean of a set of
observations is their average - the sum of the
observed values divided by the number of
observations.
Sample mean:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
Population mean:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
48
Example – Mean (Data is used from
Example 3-1)
Sales
9
6
12
10
13
15
16
14
14
16
17
16
24
21
22
18
19
18
20
17
Total: 317
The mean is a computed average:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{1}{20}(317) = 15.85$$
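The same calculation as a short sketch, using the sales data from Example 3-1:

```python
# Sales data from Example 3-1.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

total = sum(sales)       # sum of the observed values
n = len(sales)           # number of observations
sample_mean = total / n  # arithmetic mean

print(f"Sum = {total}, n = {n}, mean = {sample_mean:.2f}")
```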
49
Mean for ungrouped frequency
distribution
• The following is the frequency distribution of the number of telephone calls received in 245 successive one-minute intervals at an exchange.
Obtain the mean number of calls per minute.
• Solution
# of calls (xi)   0   1   2   3   4   5   6   7
Frequency (fi)    14  21  25  43  51  40  39  12
$$\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} = \frac{0(14) + 1(21) + 2(25) + 3(43) + 4(51) + 5(40) + 6(39) + 7(12)}{14 + 21 + 25 + 43 + 51 + 40 + 39 + 12} = \frac{922}{245} \approx 3.763$$
50
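The mean of an ungrouped frequency distribution weights each value by its frequency. A minimal sketch with the telephone-call data above:

```python
# Number of calls per minute (x_i) and the number of minutes with that count (f_i).
calls = [0, 1, 2, 3, 4, 5, 6, 7]
freq = [14, 21, 25, 43, 51, 40, 39, 12]

total_calls = sum(x * f for x, f in zip(calls, freq))  # sum of f_i * x_i
n = sum(freq)                                          # sum of f_i
mean_calls = total_calls / n

print(f"Total calls = {total_calls}, intervals = {n}, mean = {mean_calls:.3f}")
```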
Weighted Mean
• If the values in the data set are not all of equal importance, we use the weighted mean.
• If the values $x_1, x_2, \ldots, x_n$ of a data set have weights $w_1, w_2, \ldots, w_n$ respectively, then the weighted mean is given by:
$$\bar{x}_W = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$$
51
Example on weighted average
• If an instructor counts the final examination in
a course four times as much as each 1-hour
examination, what is the weighted average
grade of a student who obtained grades of 69,
75, 56, and 72 in 1-hour examinations and a
final examination grade of 78?
• Solution
$$\bar{x}_W = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{69(1) + 75(1) + 56(1) + 72(1) + 78(4)}{1 + 1 + 1 + 1 + 4} = \frac{584}{8} = 73$$
52
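The weighted average of the grades can be reproduced with a short sketch. The helper name `weighted_mean` is an assumption introduced here for illustration.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Four 1-hour examinations (weight 1 each) and the final examination (weight 4).
grades = [69, 75, 56, 72, 78]
weights = [1, 1, 1, 1, 4]

print(weighted_mean(grades, weights))
```

Because the final examination carries four times the weight, the result is pulled toward 78 and is higher than the unweighted average of the five grades.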
Example - Mode (Data is used from
Example 3-1)
Mean = 15.85
Median and Mode = 16
Mean < Median
                       .
                 .     .  .  .
  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .
---------------------------------------------
  6  9 10 12 13 14 15 16 17 18 19 20 21 22 24
53
3-2 Measures of Variability or Dispersion
⚫ Range
✓Difference between maximum and minimum
values
⚫ Interquartile Range
✓Difference between third and first quartile(Q3 - Q1)
⚫ Variance
✓Average of the squared deviations from the mean (for sample data the divisor is n - 1)
⚫ Standard Deviation
✓Square root of the variance
54
Example - Range and Interquartile
Range (Data is used from Example 3-1)
Sales Sorted Sales Rank
9 6 1
6 9 2
12 10 3
10 12 4
13 13 5
15 14 6
16 14 7
14 15 8
14 16 9
16 16 10
17 16 11
16 17 12
24 17 13
21 18 14
22 18 15
18 19 16
19 20 17
18 21 18
20 22 19
17 24 20
Minimum = 6 (rank 1), Maximum = 24 (rank 20)
Range = Maximum - Minimum = 24 - 6 = 18
Q1 (first quartile) = 13 + (.25)(1) = 13.25
Q2 = Median = P50 = 16
Q3 (third quartile) = 18 + (.75)(1) = 18.75
Interquartile Range (IQR) = Q3 - Q1 = 18.75 - 13.25 = 5.5
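The range and interquartile range above can be checked with a short sketch. The helper `percentile` is a hypothetical function implementing the (n + 1)P/100 position rule used in this example; it assumes the position falls strictly inside the data, which holds for the quartiles here.

```python
import math

def percentile(sorted_data, p):
    """Value at 1-based position (n + 1)P/100, interpolating linearly."""
    position = (len(sorted_data) + 1) * p / 100
    lower = math.floor(position)
    fraction = position - lower
    # Assumes 1 <= lower < n (true for the quartiles of this data set).
    below = sorted_data[lower - 1]
    above = sorted_data[lower]
    return below + fraction * (above - below)

# Sales data from Example 3-1.
sorted_sales = sorted([9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
                       17, 16, 24, 21, 22, 18, 19, 18, 20, 17])

data_range = sorted_sales[-1] - sorted_sales[0]  # maximum minus minimum
q1 = percentile(sorted_sales, 25)
q3 = percentile(sorted_sales, 75)
iqr = q3 - q1

print(f"Range = {data_range}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```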
55
Variance and Standard Deviation
• The variance is one of the most useful measures of variation. It is defined as the mean of the squared deviations of the observations from their mean.
➢For population data the variance ($\sigma^2$) is
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$$
where
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
56
Variance and Standard Deviation
➢For sample data the variance ($s^2$) is
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1}$$
• The positive square root of the variance is called the standard deviation.
• The value of the standard deviation tells us how closely the values of a data set are clustered around the mean.
• If the standard deviation is small (large), then the values of the data set are spread over a relatively smaller (larger) range around the mean.
57
Calculation of Variance
x_i    x_i - x̄    (x_i - x̄)^2    x_i^2
6      -9.85      97.0225        36
9      -6.85      46.9225        81
10     -5.85      34.2225        100
12     -3.85      14.8225        144
13     -2.85      8.1225         169
14     -1.85      3.4225         196
14     -1.85      3.4225         196
15     -0.85      0.7225         225
16     0.15       0.0225         256
16     0.15       0.0225         256
16     0.15       0.0225         256
17     1.15       1.3225         289
17     1.15       1.3225         289
18     2.15       4.6225         324
18     2.15       4.6225         324
19     3.15       9.9225         361
20     4.15       17.2225        400
21     5.15       26.5225        441
22     6.15       37.8225        484
24     8.15       66.4225        576
Total  317   0    378.5500       5403
$$s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} = \frac{378.55}{20 - 1} = \frac{378.55}{19} = 19.923684$$
or
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n - 1} = \frac{5403 - 20(15.85)^2}{20 - 1} = \frac{5403 - 5024.45}{19} = \frac{378.55}{19} = 19.923684$$
$$s = \sqrt{s^2} = \sqrt{19.923684} = 4.46$$
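Both forms of the sample variance can be checked numerically. The sketch below computes the definition form and the shortcut form for the sales data and compares them with `statistics.variance` from the Python standard library, which also uses the n - 1 divisor.

```python
import math
from statistics import variance

# Sales data from Example 3-1.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
n = len(sales)
mean = sum(sales) / n

# Definition: sum of squared deviations from the mean, divided by n - 1.
s2_definition = sum((x - mean) ** 2 for x in sales) / (n - 1)

# Shortcut: (sum of squares - n * mean^2) / (n - 1).
s2_shortcut = (sum(x * x for x in sales) - n * mean ** 2) / (n - 1)

# Standard deviation: positive square root of the variance.
s = math.sqrt(s2_definition)

print(f"s^2 (definition) = {s2_definition:.6f}")
print(f"s^2 (shortcut)   = {s2_shortcut:.6f}")
print(f"s^2 (statistics) = {variance(sales):.6f}")
print(f"s = {s:.2f}")
```

The two hand formulas agree up to floating-point rounding; the shortcut form avoids computing each deviation separately.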
58
Coefficients of variation
• The standard deviation is an absolute measure of variation. One relative measure of variation is the coefficient of variation (CV).
• The coefficient of variation is used to compare the variability of two or more data sets.
• The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, usually expressed as a percentage.
• Of the data sets being compared, the distribution with the smaller coefficient of variation is said to be less variable, more consistent, or more uniform.
$$CV = \frac{\text{S.d}}{\text{Mean}} \times 100\% = \frac{\sigma}{\mu} \times 100\%$$
59
Coefficients of variation (Example)
• Suppose you are offered the opportunity to invest money in one of the two projects listed below. Both projects involve risk. Which project is financially more attractive to you?
Projected return (%)
Project   Mean   Standard deviation
A         7.6    3.2
B         6.8    2.5
• Solution
$$CV_A = \frac{\sigma_A}{\mu_A} \times 100\% = \frac{3.2}{7.6} \times 100\% = 42.11\%$$
$$CV_B = \frac{\sigma_B}{\mu_B} \times 100\% = \frac{2.5}{6.8} \times 100\% = 36.76\%$$
Project B has the smaller coefficient of variation, so its return is less variable relative to its mean.
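The comparison can be sketched in a few lines. The function name `coefficient_of_variation` is an assumption for illustration, and the inputs are the values that appear in the worked computation (standard deviation 3.2 with mean 7.6 for project A, 2.5 with mean 6.8 for project B).

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100

# Values taken from the worked computation for projects A and B.
cv_a = coefficient_of_variation(3.2, 7.6)
cv_b = coefficient_of_variation(2.5, 6.8)

print(f"CV_A = {cv_a:.2f}%")
print(f"CV_B = {cv_b:.2f}%")
print("Less variable relative to its mean:", "A" if cv_a < cv_b else "B")
```

A lower coefficient of variation means less variability per unit of expected return; it does not by itself account for the difference in mean return between the projects.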
60
61
3.3 Z-score or standard score
62
63
3.3 Skewness and Kurtosis
⚫ Skewness
– Measure of asymmetry of a frequency distribution
• Skewed to left
• Symmetric or unskewed
• Skewed to right
⚫ Kurtosis
– Measure of flatness or peakedness of a frequency
distribution
• Platykurtic (relatively flat)
• Mesokurtic (normal)
• Leptokurtic (relatively peaked)
64
Skewness
• Skewed to right
65
Skewness
• Skewed to left
66
Skewness
• Symmetric
67
Kurtosis
• Platykurtic - flat distribution
68
Kurtosis
• Mesokurtic - not too flat and not too peaked
69
Kurtosis
• Leptokurtic - peaked distribution
70
Properties of the mean
• The mean of a numerical data set can always be computed, and it is unique.
• The mean is highly affected by extreme values or outliers.
• If we add (subtract) a constant to (from) each value of a data set, the mean will increase (decrease) by that constant.
• If we multiply each value of a data set by a constant, then the mean will be multiplied by that constant.
• The sum of the deviations of the observations from their mean is zero.
• The sum of the squared deviations of the observations is a minimum when the deviations are taken from the mean.
71
Properties of the standard deviation
• If we add (subtract) a constant to (from) each
value of a data set, the standard deviation will
not be affected.
• If we multiply each value of a data set by a constant, then the standard deviation will be multiplied by the absolute value of that constant.
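These properties of the mean and the standard deviation can be checked numerically on the Example 3-1 sales data. The constant c = 5 is an arbitrary choice for illustration; `abs(c)` is used for the standard deviation because a standard deviation is never negative.

```python
import math
from statistics import mean, stdev

# Sales data from Example 3-1.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
c = 5  # arbitrary constant used for the check

shifted = [x + c for x in sales]  # add a constant to every value
scaled = [c * x for x in sales]   # multiply every value by a constant

# Adding a constant shifts the mean and leaves the standard deviation unchanged.
assert math.isclose(mean(shifted), mean(sales) + c)
assert math.isclose(stdev(shifted), stdev(sales))

# Multiplying by a constant scales the mean by c and the standard deviation by |c|.
assert math.isclose(mean(scaled), c * mean(sales))
assert math.isclose(stdev(scaled), abs(c) * stdev(sales))

# Deviations from the mean sum to zero (up to floating-point rounding).
deviation_sum = sum(x - mean(sales) for x in sales)
assert math.isclose(deviation_sum, 0.0, abs_tol=1e-9)

print("All listed properties hold for this data set.")
```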
72

More Related Content

PPTX
CHAPONE edited Stat.pptx
PDF
CHAPTER 1.pdf Probability and Statistics for Engineers
PDF
CHAPTER 1.pdfProbability and Statistics for Engineers
PPT
Introduction To Statistics.ppt
PPT
Introduction to statistics
PPTX
Medical Statistics.pptx
PPTX
statistics chp 1&2.pptx statistics in veterinary
PPT
Introduction-To-Statistics-18032022-010747pm (1).ppt
CHAPONE edited Stat.pptx
CHAPTER 1.pdf Probability and Statistics for Engineers
CHAPTER 1.pdfProbability and Statistics for Engineers
Introduction To Statistics.ppt
Introduction to statistics
Medical Statistics.pptx
statistics chp 1&2.pptx statistics in veterinary
Introduction-To-Statistics-18032022-010747pm (1).ppt

Similar to Distinguish between qualitative data and quantitative data. (20)

PPTX
Stat-Lesson.pptx
PPTX
1. Introduction To Statistics in computing.pptx
PPT
Chapter 1
PPT
Manpreet kay bhatia Business Statistics.ppt
PDF
INTRO to STATISTICAL THEORY.pdf
PPTX
Biostatistics ppt itroductionchapter 1.pptx
PDF
Introduction.pdf
PPTX
Meaning and Importance of Statistics
PPT
Chapter 1: Statistics
PPTX
Biostatistics ppt
PPTX
Short notes on Statistics PPT
PPTX
Topic 1 ELEMENTARY STATISTICS.pptx
PPT
Statistics.ppt
PPT
Chapter 1 A.pptkgcludkyfo6r6idi5dumtdyrsys4y
PPT
NOTES1.ppt
PPTX
Module 0. REVIEW ON STATISTICS POWERPOINT
PDF
PPTX
Chapter_1_Lecture.pptx
PPTX
Statistical techniques for interpreting and reporting quantitative data i
Stat-Lesson.pptx
1. Introduction To Statistics in computing.pptx
Chapter 1
Manpreet kay bhatia Business Statistics.ppt
INTRO to STATISTICAL THEORY.pdf
Biostatistics ppt itroductionchapter 1.pptx
Introduction.pdf
Meaning and Importance of Statistics
Chapter 1: Statistics
Biostatistics ppt
Short notes on Statistics PPT
Topic 1 ELEMENTARY STATISTICS.pptx
Statistics.ppt
Chapter 1 A.pptkgcludkyfo6r6idi5dumtdyrsys4y
NOTES1.ppt
Module 0. REVIEW ON STATISTICS POWERPOINT
Chapter_1_Lecture.pptx
Statistical techniques for interpreting and reporting quantitative data i
Ad

Recently uploaded (20)

PDF
Vision Prelims GS PYQ Analysis 2011-2022 www.upscpdf.com.pdf
PDF
احياء السادس العلمي - الفصل الثالث (التكاثر) منهج متميزين/كلية بغداد/موهوبين
PDF
Weekly quiz Compilation Jan -July 25.pdf
PPTX
A powerpoint presentation on the Revised K-10 Science Shaping Paper
PDF
CISA (Certified Information Systems Auditor) Domain-Wise Summary.pdf
PPTX
Computer Architecture Input Output Memory.pptx
PPTX
ELIAS-SEZIURE AND EPilepsy semmioan session.pptx
PPTX
Introduction to pro and eukaryotes and differences.pptx
PPTX
Virtual and Augmented Reality in Current Scenario
PDF
AI-driven educational solutions for real-life interventions in the Philippine...
PDF
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
PDF
IGGE1 Understanding the Self1234567891011
PDF
What if we spent less time fighting change, and more time building what’s rig...
PDF
ChatGPT for Dummies - Pam Baker Ccesa007.pdf
PDF
Computing-Curriculum for Schools in Ghana
PDF
RTP_AR_KS1_Tutor's Guide_English [FOR REPRODUCTION].pdf
PDF
Practical Manual AGRO-233 Principles and Practices of Natural Farming
PDF
BP 704 T. NOVEL DRUG DELIVERY SYSTEMS (UNIT 1)
PDF
Indian roads congress 037 - 2012 Flexible pavement
PPTX
B.Sc. DS Unit 2 Software Engineering.pptx
Vision Prelims GS PYQ Analysis 2011-2022 www.upscpdf.com.pdf
احياء السادس العلمي - الفصل الثالث (التكاثر) منهج متميزين/كلية بغداد/موهوبين
Weekly quiz Compilation Jan -July 25.pdf
A powerpoint presentation on the Revised K-10 Science Shaping Paper
CISA (Certified Information Systems Auditor) Domain-Wise Summary.pdf
Computer Architecture Input Output Memory.pptx
ELIAS-SEZIURE AND EPilepsy semmioan session.pptx
Introduction to pro and eukaryotes and differences.pptx
Virtual and Augmented Reality in Current Scenario
AI-driven educational solutions for real-life interventions in the Philippine...
OBE - B.A.(HON'S) IN INTERIOR ARCHITECTURE -Ar.MOHIUDDIN.pdf
IGGE1 Understanding the Self1234567891011
What if we spent less time fighting change, and more time building what’s rig...
ChatGPT for Dummies - Pam Baker Ccesa007.pdf
Computing-Curriculum for Schools in Ghana
RTP_AR_KS1_Tutor's Guide_English [FOR REPRODUCTION].pdf
Practical Manual AGRO-233 Principles and Practices of Natural Farming
BP 704 T. NOVEL DRUG DELIVERY SYSTEMS (UNIT 1)
Indian roads congress 037 - 2012 Flexible pavement
B.Sc. DS Unit 2 Software Engineering.pptx
Ad

Distinguish between qualitative data and quantitative data.

  • 1. 1. Introduction to Statistics 1.1 What is Statistics? 1.2 Classification of Statistics 1.3 Some Basic terms 1.4 Types of variables and measurement of scales 1
  • 2. LEARNING OBJECTIVES ⚫ Distinguish between qualitative data and quantitative data. ⚫ Describe the difference between population and sample. ⚫ Describe nominal, ordinal, interval, and ratio scales of measurements. After studying this chapter, the participants should be able to: 2
  • 3. 1.1 What is statistics? ⚫ In more common usage statistics refers to numerical facts. Example: daily income of labour workers, number of products produced. Number of students enrolled in a university every year, number of graduate students from universities of a country. (In plural sense) ⚫ Statistics: is a method and procedure which deals with collecting, organizing, summarizing, analyzing and interpreting of data based on the analysis of the data. (Singular sense) ⚫ Statistics is a science that helps us make better decisions in business and economics as well as in engineering fields. 3
  • 4. Importance of statistics ⚫ Statistics teaches us how to summarize, analyze, and draw meaningful inferences from data that then lead to improve decisions or policy. • In business, industry and management: The commitment of large amount of capital and human resource business became very complex and competitive and there came a direct need for sound decision. • Any decision or policy becomes successful if the forecasts is made at time of decision prove to be accurate. The degree of accuracy depends upon the proper analysis of the past data. It is here that statistical data and statistical tools of average, variation, test of significance, forecasting techniques, time series and soon played an important role. 4
  • 5. Importance of Statistics in (Example) • Importance of statistics : ✓Statistics is used in establishing the relationship between ➢Input and output analysis. ➢Cost benefit analysis. ✓Resource allocation. ✓National income in accounting. ✓Population survey. ✓Measurements of profit, wage, and inventory management, and so on. ✓The quality of products 5
  • 6. Limitation of Statistics ✓Deals only with quantitative phenomenon ✓Statistics do not deal with individual ✓If one is not aware of statistical methods he can not make the best possible use of available data. 6
  • 7. 1.2 Classification of Statistics ⚫ Descriptive Statistics ✓ Collection of data ✓ Summarization of data ✓ Presentation of data ✓ Analysis of data ⚫ Inferential Statistics ✓ Predict and forecast values of population parameters ✓ Test hypotheses about values of population parameters ✓ Make decisions 7
  • 8. 1.3 Some Basic terms •Population: is the aggregate of all elements, individuals, items, objects whose characteristic are being studied. The population that is being studied is also called target population. • Sample: is a finite subset of the population selected from it with the objective of investigating its properties. 8
  • 9. Some basic terms (cont) • Sampling: is a tool which enables us to draw conclusion about the characteristic of the population after studying only those objects or items that are included in the sample. • Survey: is the collection of information from the elements of a population or sample. • Census: is the survey that includes every element of the target population, although it is expensive and time consuming. 9
  • 10. Some basic terms (cont) Census of a population may be: ✓ Impossible ✓ Impractical ✓ Too costly • Sample survey: is a survey that is conducted on a sample. • Parameter: is statistical constant obtained from the population data set. • Statistic: is a statistical constants obtained from the sample data. 10
  • 11. Classification of Data • Data can be classified on the bases of time or place of happening , if the data are recorded for series of time periods or different places. • Depending on the nature of the variable or criterion used for classification, four types of classifications can be identified. a) Chronological classifications: - the formation of groups is based on the time or sequence of happening. 11
  • 12. Classification of Data b) Geographical classifications: - the formation of groups is based on the place of happening or location difference. c) Qualitative Classification: - the formation of groups or categories is based on some attribute or qualitative characteristics of the population. An attribute is a qualitative feature of the population units which cannot be measured or counted. d) Quantitative classifications: - the formation of groups or categories is based on quantitative (or numerical) characteristics of the population. Such numerical characteristic that takes varying values for the different elements of the population is called variable. 12
  • 13. Types of variables • Discrete variable: is a variable that assume finite or countable infinite values such as 0, 1, 2, 3, 4, ….. Example: Number of children in a family, Number of goal scored in a football match, Number of times a machine fails per day in a factory, etc • Continuous variable: is a random variable that assumes infinitely many values in a range of values it attains. Example: Weight, height of children Length of time in a telephone call you make Amount of water you drink per day, etc 13
  • 14. Measurement scales Variables differ in “how well” they can be measured, i.e., how much measurable information their measurement scale can provide. Variables are classified as: •Nominal Scale - groups or classes ✓Gender, race, etc. •Ordinal Scale - order matters ✓Ranks (top ten videos) •Interval Scale - difference or distance matters – has arbitrary zero value. ✓Temperatures (0F, 0C) •Ratio Scale - Ratio matters – has a natural zero value. ✓Salaries 14
  • 15. 2. Method of Data collection & Presentation 2.1Methods of data collection ✓ Source of data ✓ Method of collection 2.2Methods of Data Presentation ✓ Table ✓ Diagram ✓ Graphical 15
  • 16. LEARNING OBJECTIVES • Identify sources of data and collect data • Create different types of tables and charts that describe data sets. After studying this chapter, the participants should be able to: 16
  • 17. 2.1 Method of data collection There are two types of data: • Secondary data:- are data obtained from some other sources (which are already collected) but can be used for immediate purpose of the investigator. Source of secondary data ✓government publications, ✓Journals and reports, ✓publications of research organization, ✓internal records of organizations, etc. 17
  • 18. Method of data collection (cont.) • Primary data:-are those data collected directly by the investigator for the immediate purpose of his investigation. Ways of collecting primary data: ✓ Questionnaire:-Ask concerned people to have first hand information to fill written list of questions ✓ Interview:- One to one communication between the investigator (data collector) and the respondent. ✓ Experiment and observation:- The investigator may not question anyone but conduct an experiment him/herself and record the result or observe a certain phenomenon and record the happening for him/herself. 18
  • 19. 2.2 Presentation of Data Major means of presentation of data are: • Tables • Graphs • Diagrams 19
  • 20. Tables Table: a systematic arrangement of a group of data in rows and columns. A complete table consists of: ✓ Self-explanatory title ✓ Row and column headings ✓ Row and column totals, where appropriate ✓ Units of measurement of the data ✓ Footnotes ✓ Table number ✓ Source of data
  • 22. Frequency Distribution • Frequency distribution:- is a tabular summary of raw data formed by listing distinct values of a variable or the values grouped in a class with their corresponding frequencies. • Ungrouped frequency distribution:- a distribution that shows frequencies of single distinct values of a variable. 22
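An ungrouped frequency distribution can be sketched in a few lines of Python. The data below (number of children in 12 families) are hypothetical and only illustrate the idea of listing each distinct value with its frequency.

```python
from collections import Counter

# Hypothetical raw data: number of children in each of 12 families.
children = [2, 0, 3, 1, 2, 2, 4, 1, 0, 2, 3, 1]

def ungrouped_frequency(values):
    """Return (value, frequency) pairs, one per distinct value, sorted by value."""
    return sorted(Counter(values).items())

freq_table = ungrouped_frequency(children)
for value, freq in freq_table:
    print(value, freq)
```

The frequencies always add up to the number of observations, which is a quick check that no value was lost.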
  • 23. Frequency Distribution (cont.) ⚫ Dividing data into groups or classes or intervals ⚫ Groups should be: ✓Mutually exclusive • Not overlapping - every observation is assigned to only one group ✓Exhaustive • Every observation is assigned to a group ✓Equal-width (if possible) • First or last group may be open-ended 23
  • 24. Frequency Distribution (cont.) • Grouped frequency distribution:-a distribution that has a list of classes or groups of values of a variable with the corresponding frequencies. 24
  • 25. Frequency Distribution (cont.) ⚫ Table with two columns listing: ✓ Each and every group or class or interval of values ✓ Associated frequency of each group • Number of observations assigned to each group • Sum of frequencies is the number of observations – N for a population – n for a sample • Class or class interval: a range of values of a variable used in grouping • Class limits: the least and greatest values assigned to a class, called the lower and upper class limits, respectively.
  • 26. Frequency Distribution (cont.) ⚫ Class boundary: if the classes are of the inclusive type, there is always a gap between successive classes. To avoid such gaps we form class boundaries using the relations LCB = LCL - d/2 and UCB = UCL + d/2, where LCL and UCL are the lower and upper class limits and d is the gap between successive classes. ⚫ Class midpoint (class mark): the middle value of a class interval, used as a representative of the class. ⚫ Relative frequency: the ratio of the frequency of a class to the total number of observations. Sum of relative frequencies = 1
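The class boundary, class midpoint, and relative frequency definitions can be checked with a small sketch. The classes and frequencies here are hypothetical; `class_summary` is an illustrative helper name, and the gap d is taken from the first two classes on the assumption that all gaps are equal.

```python
# Hypothetical inclusive classes (lower limit, upper limit) and their frequencies.
classes = [(10, 19), (20, 29), (30, 39), (40, 49)]
frequencies = [4, 9, 5, 2]

def class_summary(classes, frequencies):
    """Compute boundaries, midpoint and relative frequency for each class."""
    # d: gap between the upper limit of one class and the lower limit of the next
    # (assumed to be the same for every pair of successive classes).
    d = classes[1][0] - classes[0][1]
    total = sum(frequencies)
    rows = []
    for (lcl, ucl), f in zip(classes, frequencies):
        rows.append({
            "lcb": lcl - d / 2,           # lower class boundary
            "ucb": ucl + d / 2,           # upper class boundary
            "midpoint": (lcl + ucl) / 2,  # class mark
            "rel_freq": f / total,        # relative frequency
        })
    return rows

summary = class_summary(classes, frequencies)
for row in summary:
    print(row)
```

With boundaries in place, the upper boundary of each class coincides with the lower boundary of the next, so the gaps disappear.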
  • 27. Frequency Distribution (cont.) • Example of relative frequency of the 1st class: 4/55 = 0.07 • Sum of relative frequencies = 1 27
  • 28. Frequency Distribution (Cont.) The cumulative frequency of each class /group is the sum of the frequencies of that and all preceding classes/groups. 28
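Cumulative frequencies are running totals of the class frequencies. A minimal sketch, using hypothetical frequencies for four classes:

```python
from itertools import accumulate

# Hypothetical class frequencies, listed in class order.
frequencies = [4, 9, 5, 2]

# Each entry is the sum of that class's frequency and all preceding ones.
cumulative = list(accumulate(frequencies))
print(cumulative)
```

The last cumulative frequency equals the total number of observations.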
  • 29. Histogram ⚫ A histogram is a chart made of bars of different heights. ✓Widths and locations of bars correspond to widths and locations of data groupings ✓Heights of bars correspond to frequencies or relative frequencies of data groupings ✓There is no gap between successive rectangular bars. 29
  • 32. Methods of Displaying Data ⚫ Pie Charts ✓Categories represented as percentages of total ⚫ Bar Graphs ✓Heights of rectangles represent group frequencies ⚫ Frequency Polygons ✓Height of line represents frequency ⚫ Ogives ✓Height of line represents cumulative frequency ⚫ Time Plots ✓Represents values over time 32
  • 34. Bar Chart [Figure 1-11: Shifting Gears. Bar chart of quarterly net income for General Motors (in billions), by quarter, 2003 to 2004]
  • 36. 3. Summary Measures 3.1 Measures of Central Tendency ✓ Median ✓ Mean ✓ Mode 3.2 Measures of Variability ✓ Range ✓ Interquartile range ✓ Variance ✓ Standard Deviation ✓ Coefficient of variation 3.3 Other summary measures: ✓ Skewness ✓ Kurtosis 36
  • 37. LEARNING OBJECTIVES After studying this chapter, the participants should be able to: ⚫ Calculate and interpret percentiles and quartiles. ⚫ Explain measures of central tendency and how to compute them. ⚫ Explain measures of variation and how to compute them.
  • 38. 3.1 Measures of Central Tendency or Location • Median ➢ Middle value when sorted in order of magnitude ➢ 50th percentile ➢ 2nd quartile • Mode ➢ Most frequently occurring value • Mean ➢ Average
  • 39. Example 3-1 A large department store collects data on the sales made by each of its salespeople. The number of sales made on a given day by each of 20 salespeople is shown on the next slide, together with the same data sorted in order of magnitude.
  • 40. Example 3-1 (Continued) - Sales and Sorted Sales
    Sales: 9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17
    Sorted Sales: 6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24
  • 41. Example 3-1 (Continued) Percentiles ⚫ Find the 50th, 80th, and 90th percentiles of this data set. ⚫ To find the 50th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(50/100) = 10.5. ⚫ Thus, the percentile is located at the 10.5th position. ⚫ The 10th observation is 16, and the 11th observation is also 16. ⚫ The 50th percentile lies halfway between the 10th and 11th values and thus equals (16 + 16)(1/2) = 16.
  • 42. Example 3-1 (Continued) Percentiles ⚫ To find the 80th percentile, determine the data point in position (n + 1)P/100 = (20 + 1)(80/100) = 16.8. ⚫ Thus, the percentile is located at the 16.8th position. ⚫ The 16th observation is 19, and the 17th observation is 20. ⚫ The 80th percentile is a point lying 0.8 of the way from 19 to 20 and is thus 19 + (0.8)(20 - 19) = 19.8. ⚫ Similarly, the 90th percentile is located at position (20 + 1)(90/100) = 18.9; the 18th observation is 21 and the 19th is 22, so it is 21 + (0.9)(22 - 21) = 21.9.
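The position rule (n + 1)P/100 with linear interpolation can be sketched as follows, using the sorted sales data from Example 3-1. The function name `percentile` and the clamping at the ends of the data are assumptions made for this illustration, not part of the slides.

```python
import math

# Sorted sales data from Example 3-1.
sorted_sales = [6, 9, 10, 12, 13, 14, 14, 15, 16, 16,
                16, 17, 17, 18, 18, 19, 20, 21, 22, 24]

def percentile(sorted_data, p):
    """P-th percentile at position (n + 1) * p / 100, interpolating linearly."""
    n = len(sorted_data)
    position = (n + 1) * p / 100
    k = math.floor(position)   # rank of the observation just below the position
    fraction = position - k    # how far to move towards the next observation
    # Assumption: positions outside 1..n are clamped to the extreme values.
    if k < 1:
        return sorted_data[0]
    if k >= n:
        return sorted_data[-1]
    lower, upper = sorted_data[k - 1], sorted_data[k]
    return lower + fraction * (upper - lower)

for p in (50, 80, 90):
    print(p, round(percentile(sorted_sales, p), 2))
```

The same function gives the quartiles when called with p = 25, 50, and 75.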
  • 43. Quartiles – Special Percentiles ⚫ Quartiles are the percentage points that break down the ordered data set into quarters. ⚫ The first quartile is the 25th percentile. It is the point below which lie 1/4 of the data. ⚫ The second quartile is the 50th percentile. It is the point below which lie 1/2 of the data. This is also called the median. ⚫ The third quartile is the 75th percentile. It is the point below which lie 3/4 of the data. 43
  • 44. Quartiles and Interquartile Range ⚫ The first quartile, Q1, (25th percentile) is often called the lower quartile. ⚫ The second quartile, Q2, (50th percentile) is often called the median or the middle quartile. ⚫ The third quartile, Q3, (75th percentile) is often called the upper quartile. ⚫ The interquartile range is the difference between the third and the first quartiles, Q3 - Q1.
  • 45. Example 3-1: Finding Quartiles
    Sales: 9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17
    Sorted Sales: 6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24
    Quartile           Position (n+1)P/100       Value
    First quartile     (20+1)25/100 = 5.25       13 + (0.25)(1) = 13.25
    Second quartile    (20+1)50/100 = 10.5       16 + (0.5)(0) = 16
    Third quartile     (20+1)75/100 = 15.75      18 + (0.75)(1) = 18.75
  • 46. Example – Median (Data is used from Example 3-1)
    Sales: 9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17
    Sorted Sales: 6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24
    Median = 50th percentile: position (20+1)50/100 = 10.5, value 16 + (.5)(0) = 16
    The median is the middle value of data sorted in order of magnitude. It is the 50th percentile.
  • 47. Example - Mode (Data is used from Example 3-1) [Dot plot of the sales data on an axis from 6 to 24] The mode is the most frequently occurring value. It is the value with the highest frequency. Mode = 16 (16 occurs three times)
  • 48. Arithmetic Mean or Average • The arithmetic mean or mean of a set of observations is their average - the sum of the observed values divided by the number of observations. Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
  • 49. Example – Mean (Data is used from Example 3-1) Sales: 9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17; total = 317. The mean is a computed average: $\bar{x} = \frac{1}{n}\sum_{i=1}^{20} x_i = \frac{1}{20}(317) = 15.85$
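The median and mode of the Example 3-1 data can be cross-checked with Python's standard `statistics` module. This is only a quick verification sketch; `multimode` is used so that ties, if any, would be reported rather than hidden.

```python
import statistics

# Sales data from Example 3-1 (unsorted, as recorded).
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

median_sales = statistics.median(sales)    # average of the two middle values for even n
modes_sales = statistics.multimode(sales)  # list of all most frequent values

print(median_sales, modes_sales)
```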
  • 50. Mean for ungrouped frequency distribution • The following is the frequency distribution of the number of telephone calls received in 245 successive one-minute intervals at an exchange. Obtain the mean number of calls per minute.
    Number of calls ($x_i$):   0   1   2   3   4   5   6   7
    Frequency ($f_i$):        14  21  25  43  51  40  39  12
  • Solution $\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{\sum_{i=1}^{n} f_i} = \frac{0(14) + 1(21) + 2(25) + 3(43) + 4(51) + 5(40) + 6(39) + 7(12)}{14 + 21 + 25 + 43 + 51 + 40 + 39 + 12} = \frac{922}{245} = 3.763$
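The mean of an ungrouped frequency distribution is the sum of value times frequency divided by the sum of frequencies. A short sketch with the telephone-call data from this slide (variable names are illustrative):

```python
# Number of calls per minute and how many one-minute intervals had that count.
calls = [0, 1, 2, 3, 4, 5, 6, 7]
freq = [14, 21, 25, 43, 51, 40, 39, 12]

total_obs = sum(freq)                                   # sum of f_i
weighted_sum = sum(x * f for x, f in zip(calls, freq))  # sum of x_i * f_i
mean_calls = weighted_sum / total_obs

print(total_obs, weighted_sum, round(mean_calls, 3))
```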
  • 51. Weighted Mean • If the values in a data set are not all equally important, we use the weighted mean. • If the values $x_1, x_2, \ldots, x_n$ have weights $w_1, w_2, \ldots, w_n$ respectively, then the weighted mean is given by: $\bar{x}_W = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n}$
  • 52. Example on weighted average • If an instructor counts the final examination in a course four times as much as each 1-hour examination, what is the weighted average grade of a student who obtained grades of 69, 75, 56, and 72 in 1-hour examinations and a final examination grade of 78? • Solution $\bar{x}_W = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{w_1 + w_2 + \cdots + w_n} = \frac{69(1) + 75(1) + 56(1) + 72(1) + 78(4)}{1 + 1 + 1 + 1 + 4} = \frac{584}{8} = 73$
  • 53. Example - Mode (Data is used from Example 3-1) Mean = 15.85; Median and Mode = 16; Mean < Median. [Dot plot of the sales data on an axis from 6 to 24]
  • 54. 3.2 Measures of Variability or Dispersion ⚫ Range ✓ Difference between maximum and minimum values ⚫ Interquartile Range ✓ Difference between third and first quartile (Q3 - Q1) ⚫ Variance ✓ Average of the squared deviations from the mean ⚫ Standard Deviation ✓ Square root of the variance
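The weighted-average example can be reproduced with a small helper. `weighted_mean` is an illustrative name; the grades and weights are the ones given in the example.

```python
def weighted_mean(values, weights):
    """Sum of w_i * x_i divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Four 1-hour examinations (weight 1 each) and the final examination (weight 4).
grades = [69, 75, 56, 72, 78]
weights = [1, 1, 1, 1, 4]

print(weighted_mean(grades, weights))
```

With all weights equal, the weighted mean reduces to the ordinary arithmetic mean.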
  • 55. Example - Range and Interquartile Range (Data is used from Example 3-1)
    Sales: 9, 6, 12, 10, 13, 15, 16, 14, 14, 16, 17, 16, 24, 21, 22, 18, 19, 18, 20, 17
    Sorted Sales (ranks 1 to 20): 6, 9, 10, 12, 13, 14, 14, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21, 22, 24
    Minimum = 6 (rank 1), Maximum = 24 (rank 20)
    Range = Maximum - Minimum = 24 - 6 = 18
    First quartile: Q1 = 13 + (.25)(1) = 13.25
    Q2 = Median = P50
    Third quartile: Q3 = 18 + (.75)(1) = 18.75
    Interquartile Range (IQR) = Q3 - Q1 = 18.75 - 13.25 = 5.5
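The range and interquartile range can be cross-checked with the standard library. In Python 3.8 and later, `statistics.quantiles` with `method="exclusive"` places the quartiles using (n + 1) positions, which agrees with the (n + 1)P/100 rule used in these slides for this data set.

```python
import statistics

# Sales data from Example 3-1.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

data_range = max(sales) - min(sales)

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(sales, n=4, method="exclusive")
iqr = q3 - q1

print(data_range, q1, q2, q3, iqr)
```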
  • 56. Variance and Standard Deviation • The variance is one of the most useful measures of variation, defined as the mean of the squared deviations of the observations from their mean. ➢ For population data the variance ($\sigma^2$) is $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$, where $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
  • 57. Variance and Standard Deviation ➢ For sample data the variance ($s^2$) is $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}$ • The positive square root of the variance is called the standard deviation. • The value of the standard deviation tells us how closely the values of a data set are clustered around the mean. • If the standard deviation is small (large), then the values of the data set are spread over a relatively smaller (larger) range around the mean.
  • 58. Calculation of Variance
    $x_i$    $x_i - \bar{x}$    $(x_i - \bar{x})^2$    $x_i^2$
      6      -9.85      97.0225       36
      9      -6.85      46.9225       81
     10      -5.85      34.2225      100
     12      -3.85      14.8225      144
     13      -2.85       8.1225      169
     14      -1.85       3.4225      196
     14      -1.85       3.4225      196
     15      -0.85       0.7225      225
     16       0.15       0.0225      256
     16       0.15       0.0225      256
     16       0.15       0.0225      256
     17       1.15       1.3225      289
     17       1.15       1.3225      289
     18       2.15       4.6225      324
     18       2.15       4.6225      324
     19       3.15       9.9225      361
     20       4.15      17.2225      400
     21       5.15      26.5225      441
     22       6.15      37.8225      484
     24       8.15      66.4225      576
    317       0        378.5500     5403    (totals)
    $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \frac{378.55}{20-1} = \frac{378.55}{19} = 19.923684$
    or
    $s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1} = \frac{5403 - 20(15.85)^2}{20-1} = \frac{5403 - 5024.45}{19} = \frac{378.55}{19} = 19.923684$
    $s = \sqrt{s^2} = \sqrt{19.923684} = 4.46$
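Both forms of the sample variance formula can be verified numerically on the Example 3-1 data. The sketch below computes the definitional form and the shortcut form and compares them with `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

# Sales data from Example 3-1.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

n = len(sales)
mean = sum(sales) / n

# Definitional form: sum of squared deviations divided by n - 1.
var_definition = sum((x - mean) ** 2 for x in sales) / (n - 1)

# Shortcut form: (sum of squares - n * mean^2) divided by n - 1.
var_shortcut = (sum(x * x for x in sales) - n * mean ** 2) / (n - 1)

std_dev = math.sqrt(var_definition)

print(round(var_definition, 6), round(var_shortcut, 6), round(std_dev, 2))
```

The shortcut form needs only the running totals of x and x squared, which is why it suits hand calculation, although the definitional form is usually less sensitive to rounding.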
  • 59. Coefficient of variation • The standard deviation is an absolute measure of variation. One relative measure of variation is the coefficient of variation (CV). • The coefficient of variation is used to compare the variability of two or more data sets. • The coefficient of variation is the ratio of the standard deviation to the arithmetic mean, usually expressed in percent: $CV = \frac{S.d}{Mean} \times 100\% = \frac{\sigma}{\mu} \times 100\%$ • Of two data sets, the one with the smaller coefficient of variation is said to be less variable, more consistent, or more uniform.
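Because the coefficient of variation is a ratio, it does not change when every value is multiplied by the same positive constant (for example, a change of units). The sketch below illustrates this with the Example 3-1 sales data; using the sample standard deviation is an assumption made here, and `coefficient_of_variation` is an illustrative name.

```python
import statistics

def coefficient_of_variation(data):
    """Standard deviation divided by the mean, expressed in percent.

    Assumption: the sample standard deviation (n - 1 divisor) is used.
    """
    return statistics.stdev(data) / statistics.mean(data) * 100

# Sales data from Example 3-1, and the same data rescaled by a factor of 10.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]
sales_scaled = [10 * x for x in sales]

cv_sales = coefficient_of_variation(sales)
cv_scaled = coefficient_of_variation(sales_scaled)

print(round(cv_sales, 2), round(cv_scaled, 2))
```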
  • 60. Coefficients of variation (Example) • Suppose you are offered the opportunity to invest money in one of the two projects listed below. Both projects involve risk. Which project is financially attractive to you?
    Projected return (%)
    Project    Mean    Standard deviation
    A          7.6     3.2
    B          6.8     2.5
  • Solution $CV_A = \frac{\sigma}{\mu} \times 100\% = \frac{3.2}{7.6} \times 100\% = 42.11\%$ and $CV_B = \frac{\sigma}{\mu} \times 100\% = \frac{2.5}{6.8} \times 100\% = 36.76\%$. Project B has the smaller coefficient of variation, so its projected return is relatively less variable.
  • 61. 3.3 Z-score or standard score
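As a rough illustration, and assuming the usual definition z = (x - mean) / standard deviation, a z-score expresses how many standard deviations a value lies above or below the mean. The sketch below applies it to the Example 3-1 sales data with the sample standard deviation; the function name `z_score` is illustrative.

```python
import statistics

def z_score(x, mean, std_dev):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Sales data from Example 3-1.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

mean = statistics.mean(sales)
std_dev = statistics.stdev(sales)  # assumption: sample standard deviation

z_scores = [z_score(x, mean, std_dev) for x in sales]
print(round(z_score(24, mean, std_dev), 2), round(z_score(6, mean, std_dev), 2))
```

Values above the mean get positive z-scores and values below it get negative ones, and the z-scores of a data set always sum to zero.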
  • 64. 3.3 Skewness and Kurtosis ⚫ Skewness – Measure of asymmetry of a frequency distribution • Skewed to left • Symmetric or unskewed • Skewed to right ⚫ Kurtosis – Measure of flatness or peakedness of a frequency distribution • Platykurtic (relatively flat) • Mesokurtic (normal) • Leptokurtic (relatively peaked) 64
  • 68. Kurtosis • Platykurtic - flat distribution 68
  • 69. Kurtosis • Mesokurtic - not too flat and not too peaked 69
  • 70. Kurtosis • Leptokurtic - peaked distribution
  • 71. Properties of the mean • The mean of a numerical data set can always be computed, and it is unique. • The mean is highly affected by extreme values or outliers. • If we add (subtract) a constant to (from) each value of a data set, the mean will increase (decrease) by that constant. • If we multiply each value of a data set by a constant, then the mean will be multiplied by that constant. • The sum of the deviations of the observations from their mean is zero. • The sum of the squared deviations of the observations is minimum when the deviations are taken from the mean.
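The slides describe skewness and kurtosis only qualitatively. One common way to put numbers on them, assumed here purely for illustration, uses central moments: skewness = m3 / m2^1.5 and excess kurtosis = m4 / m2^2 - 3, where m_k is the average of the k-th powers of the deviations from the mean. Negative skewness indicates a left-skewed distribution; excess kurtosis below, near, or above zero corresponds to platykurtic, mesokurtic, or leptokurtic shapes.

```python
def central_moment(data, k):
    """Average of the k-th powers of the deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    """Moment-based skewness: m3 / m2^1.5 (an assumed definition)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5

def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2^2 - 3 (an assumed definition)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3

# Sales data from Example 3-1.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

print(round(skewness(sales), 3), round(excess_kurtosis(sales), 3))
```

For the sales data the skewness comes out negative, which is consistent with the mean (15.85) lying below the median (16).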
  • 72. Properties of the standard deviation • If we add (subtract) a constant to (from) each value of a data set, the standard deviation will not be affected. • If we multiply each value of a data set by a constant, then the standard deviation will be multiplied by the absolute value of that constant.
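The properties of the mean and the standard deviation listed above can be checked numerically. The sketch below uses the Example 3-1 sales data and arbitrarily chosen constants (5 and -3), comparing results within a small tolerance because of floating-point rounding.

```python
import statistics

# Sales data from Example 3-1.
sales = [9, 6, 12, 10, 13, 15, 16, 14, 14, 16,
         17, 16, 24, 21, 22, 18, 19, 18, 20, 17]

c = 5   # arbitrary constant to add
k = -3  # arbitrary (negative) constant to multiply by

mean = statistics.mean(sales)
sd = statistics.stdev(sales)

shifted = [x + c for x in sales]
scaled = [x * k for x in sales]

# Adding a constant shifts the mean by that constant and leaves the spread unchanged.
mean_shift_ok = abs(statistics.mean(shifted) - (mean + c)) < 1e-9
sd_shift_ok = abs(statistics.stdev(shifted) - sd) < 1e-9

# Multiplying by a constant multiplies the mean by it and the
# standard deviation by its absolute value.
mean_scale_ok = abs(statistics.mean(scaled) - k * mean) < 1e-9
sd_scale_ok = abs(statistics.stdev(scaled) - abs(k) * sd) < 1e-9

# Deviations from the mean sum to zero.
deviation_sum = sum(x - mean for x in sales)

print(mean_shift_ok, sd_shift_ok, mean_scale_ok, sd_scale_ok, abs(deviation_sum) < 1e-9)
```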