1
Statistics and Research Methods
2
Topics: Statistics
• Descriptive Statistics
• Probability Theory and Probability
Distributions
• Hypothesis Testing
• Confidence Interval
• Analysis Of Variance (ANOVA)
• Regression and Correlation
• Chi-Squared
3
Topics: Research Methods
• Research Design
• Literature Review
• Sampling
• Data Collection Methods
• Ethical Issues in Research Resource
• IT role in research & Formatting
4
Assessment
• One test – 15%
• One Individual Assignment – 10%
• One Group Assignment – 15%
• Final Exam – 60%
• Note: Pass Mark is B (50%)
5
References
• Probability and Statistics for Engineering and the Sciences, by
Jay L. Devore, Monterey, California.
• Basic Business Statistics, by M.L. Berenson, D.M. Levine and
T.C. Krehbiel
• Research Methodology, Methods and Techniques, by C.R.
Kothari
• Research Methods for Business students, by Mark Saunders,
Philip Lewis and Adrian Thornhill
• Plenty of Websites
6
7
Introduction and Descriptive Statistics
8
The Science of Statistics
• Statistics is the science of data. This involves
collecting, classifying, summarizing, organizing,
analyzing and interpreting numerical information.
[Diagram: Statistics branches into Descriptive Statistics and Inferential Statistics]
9
Types of Statistical Applications
• Descriptive statistics utilizes numerical and
graphical methods to look for patterns in a data
set, to summarize the information revealed in a
data set and to present that information in a
convenient form.
• Inferential statistics utilizes sample data to make
estimates, decisions, predictions or other
generalizations about a larger set of data.
10
Descriptive Statistics
• Collect data
– e.g. Survey
• Present data
– e.g. Tables and graphs
• Characterize data
– e.g. Sample mean = $\bar{X} = \frac{\sum X_i}{n}$
11
Inferential Statistics
• Estimation
– e.g.: Estimate the population mean weight
using the sample mean weight
• Hypothesis testing
– e.g.: Test the claim that the population mean
weight is 120 pounds
Drawing conclusions and/or making decisions
concerning a population based on sample results.
12
Fundamental Elements of Statistics
• An experimental unit is an object about which
we collect data.
– Person
– Place
– Thing
– Event
13
Fundamental Elements of Statistics
• A population is a set of units in which we are
interested.
– Typically, there are too many experimental units in
a population to consider every one.
• If we can examine every single one, we conduct a
census.
14
Fundamental Elements of Statistics
• A sample is a subset of the population.
• A variable is a characteristic or property of an
individual unit.
– The values of these characteristics will, not
surprisingly, vary.
– A measure of reliability is a statement about the
degree of uncertainty associated with a statistical
inference. (Based on our analysis, we think 56%
of soda drinkers prefer Pepsi to Coke, ± 5%.)
Fundamental Elements of Statistics
Descriptive Statistics
• The population or sample of
interest
• One or more variables to be
investigated
• Tables, graphs or numerical
summary tools
• Identification of patterns in
the data
Inferential Statistics
• Population of interest
• One or more variables to be
investigated
• The sample of population
units
• The inference about the
population based on the
sample data
• A measure of reliability of
the inference
15
Types of Data
• Quantitative Data are measurements that are
recorded on a naturally occurring numerical
scale. e.g. Age, GPA, Salary, Cost of books this
semester
• Categorical (Qualitative) Data are measurements
that cannot be recorded on a natural numerical
scale, but are recorded in categories e.g. Live
on/off campus, Major, Gender
16
17
Methods for Describing Sets of Data
18
Data Presentation
[Diagram: Numerical Data is presented as an Ordered Array, a Stem-&-Leaf Display, or Frequency Distributions (Histogram, Polygon, Ogive)]
19
Ordered Array
• 1. Organizes Data to Focus on Major
Features
• 2. Data Placed in Rank Order
– Smallest to Largest
• 3. Data in Raw Form (as Collected)
– 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
• 4. Data in Ordered Array
– 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
20
Stem-and-Leaf Display
• A Stem-and-Leaf Display
shows the number of
observations that share a
common value (the stem)
and the precise value of
each observation (the leaf)
Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Stem | Leaves
2    | 144677
3    | 028
4    | 1
(e.g. stem 2 with leaf 6 represents the observation 26)
21
Frequency Distribution Table
Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Class         Frequency
15 but < 25   3
25 but < 35   5
35 but < 45   2
22
Frequency Distribution Table Steps
• 1. Determine Range
• 2. Select Number of Classes
– Usually Between 5 & 15 Inclusive
• 3. Compute Class Intervals (Width)
• 4. Determine Class Boundaries (Limits)
• 5. Compute Class Midpoints
• 6. Count Observations & Assign to Classes
23
Frequency Distribution Table Example
Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Class (Boundaries)   Midpoint   Frequency
15 but < 25          20         3
25 but < 35          30         5
35 but < 45          40         2
Midpoint = (Upper + Lower Boundaries) / 2
Width = Upper Boundary − Lower Boundary = 10
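The table above can be reproduced with a short sketch. The function name `frequency_table` and the variable names are illustrative assumptions, not part of the slides; the class boundaries and width are the ones the slide chose.

```python
# Raw data from the slide.
data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]

# Class boundaries used on the slide: [15, 25), [25, 35), [35, 45).
lower_bounds = [15, 25, 35]
width = 10


def frequency_table(values, lowers, class_width):
    """Return one (lower, upper, midpoint, frequency) row per class."""
    rows = []
    for lower in lowers:
        upper = lower + class_width
        midpoint = (lower + upper) / 2
        # An observation belongs to a class if lower <= value < upper.
        count = sum(1 for v in values if lower <= v < upper)
        rows.append((lower, upper, midpoint, count))
    return rows


table = frequency_table(data, lower_bounds, width)
for lower, upper, midpoint, count in table:
    print(f"{lower} but < {upper}  midpoint={midpoint:g}  frequency={count}")
```

Using half-open classes ("15 but < 25") means every observation falls into exactly one class, even when it sits on a boundary.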
24
Relative Frequency &
% Distribution Tables
Relative Frequency Distribution
Class         Prop.
15 but < 25   .3
25 but < 35   .5
35 but < 45   .2
Percentage Distribution
Class         %
15 but < 25   30.0
25 but < 35   50.0
35 but < 45   20.0
$\text{class relative frequency} = \frac{\text{class frequency}}{n}$
$\text{class percentage} = (\text{class relative frequency}) \times 100$
25
Cumulative Percentage Distribution
Table
Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38
Cumulative percentage = percentage of observations less than the lower class boundary
Class         Cumulative Percentage
15 but < 25   0.0
25 but < 35   30.0
35 but < 45   80.0   (30% + 50%)
45 but < 55   100.0  (80% + 20%)
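As a rough sketch, the relative frequency, percentage and cumulative percentage columns can all be derived from the class frequencies. The dictionary layout and variable names are assumptions made for illustration.

```python
# Class frequencies from the frequency distribution table on the slides.
frequencies = {"15 but < 25": 3, "25 but < 35": 5, "35 but < 45": 2}
n = sum(frequencies.values())

# class relative frequency = class frequency / n
relative = {cls: f / n for cls, f in frequencies.items()}

# class percentage = relative frequency x 100 (computed as 100 * f / n)
percent = {cls: 100 * f / n for cls, f in frequencies.items()}

# Cumulative percentage below each lower class boundary (15, 25, 35),
# followed by the percentage below the final boundary, 45.
cumulative_below = []
running = 0.0
for pct in percent.values():
    cumulative_below.append(running)
    running += pct
cumulative_below.append(running)

print(relative)
print(percent)
print(cumulative_below)
```

The last entry of `cumulative_below` corresponds to the extra "45 but < 55" row: everything in the data lies below 45, so the cumulative percentage there is 100.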
26
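Histogram
[Figure: histogram of the frequency distribution below. Horizontal axis: lower class boundaries (0, 15, 25, 35, 45, 55); vertical axis: count from 0 to 5 (frequency, relative frequency or percent). Bars touch.]
Class         Freq.
15 but < 25   3
25 but < 35   5
35 but < 45   2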
• Histograms are graphs of the frequency or relative
frequency of a variable.
– Class intervals make up the horizontal axis
– The frequencies or relative frequencies are displayed on the
vertical axis.
27
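Polygon
[Figure: frequency polygon of the distribution below, plotted at the class midpoints. Horizontal axis: midpoints (0 to 60); vertical axis: count from 0 to 5 (frequency, relative frequency or percent). A fictitious class closes the polygon.]
Class         Freq.
15 but < 25   3
25 but < 35   5
35 but < 45   2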
28
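Cumulative % Polygon (Ogive)
[Figure: ogive of the distribution below, plotted at the lower class boundaries (0, 15, 25, 35, 45, 55). Vertical axis: cumulative % from 0% to 100%. A fictitious class (45 but < 55) carries the curve to 100%.]
Class         Cum. %
15 but < 25   0%
25 but < 35   30%
35 but < 45   80%
45 but < 55   100%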
29
Categorical Data Presentation
[Diagram: Categorical Data is presented as a Summary Table, Bar Chart, Pie Chart or Pareto Diagram]
30
Summary Table
• 1. Lists Categories & No. Elements in Category
• 2. Obtained by Tallying Responses in Category
• 3. May Show Frequencies (Counts), % or Both
Each row is a category; counts are obtained by tallying.
Major        Count
Accounting   130
Economics    20
Management   50
Total        200
31
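Bar Chart
[Figure: horizontal bar chart of Major (Acct., Econ., Mgmt.) against Frequency, axis from 0 to 150]
• Horizontal bars for categorical variables
• Bar length shows frequency or % (percent used also)
• Equal bar widths
• 1/2 to 1 bar width between bars
• Frequency axis starts at the zero point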
32
Pie Chart
• 1. Shows Breakdown of Total Quantity into Categories
• 2. Useful for Showing Relative Differences
• 3. Angle Size
– (360°)(Percent)
– e.g. Econ.: (360°)(10%) = 36°
[Figure: pie chart of Majors: Acct. 65%, Mgmt. 25%, Econ. 10%]
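A minimal sketch of the angle-size rule, using the counts from the summary table of majors. The dictionary and variable names are illustrative assumptions.

```python
# Counts from the summary table of majors.
counts = {"Accounting": 130, "Economics": 20, "Management": 50}
total = sum(counts.values())

# Percentage share of each category.
percent = {major: 100 * c / total for major, c in counts.items()}

# Angle size = (360 degrees) x (percent), computed here as 360 * count / total.
angles = {major: 360 * c / total for major, c in counts.items()}

for major in counts:
    print(f"{major}: {percent[major]:g}% -> {angles[major]:g} degrees")
```

The angles always add up to 360°, which is a quick check that no category was dropped.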
33
Pareto Diagram
[Figure: Pareto diagram of Major. Vertical bar chart with equal bar widths and categories in descending order (Acct., Mgmt., Econ.); vertical axis: percent, 0% to 100% (always %). A cumulative polygon (ogive) is plotted at the bar midpoints.]
34
Numerical Descriptive
Measures
35
Summary Measures
• Central Tendency: Mean, Median, Mode, Geometric Mean
• Quartile
• Variation: Range, Variance, Standard Deviation, Coefficient of Variation
36
Measures of Central Tendency
• Various ways to describe the central,
most common or middle value in a
distribution or set of data
– The Mean (Arithmetic Mean)
– The Median
– The Mode
– The Geometric Mean
37
Numerical Measures of
Central Tendency
• Summarizing data sets numerically
– Are there certain values that seem more
typical for the data?
– How typical are they?
38
Numerical Measures of
Central Tendency
• Central tendency is the value or values around
which the data tend to cluster
• Variability shows how strongly the data
cluster around that (those) value(s)
39
Numerical Measures of
Central Tendency
• The mean of a set of quantitative data is the sum
of the observed values divided by the number of
values
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
40
Numerical Measures of
Central Tendency
• The mean of a sample is typically denoted by x-bar,
but the population mean is denoted by the Greek
symbol μ.
– Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
– Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
41
Numerical Measures of
Central Tendency
• If x1 = 1, x2 = 2, x3 = 3 and x4 = 4,
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1 + 2 + 3 + 4}{4} = \frac{10}{4} = 2.5$
42
Numerical Measures of
Central Tendency
• The median of a set of quantitative data is the
value which is located in the middle of the data,
arranged from lowest to highest values (or vice
versa), with 50% of the observations above and
50% below.
43
Numerical Measures of
Central Tendency
50% 50%
Lowest Value Highest Value
Median
44
Numerical Measures of
Central Tendency
• Finding the Median, M:
– Arrange the n measurements from smallest to
largest
• If n is odd, M is the middle number
• If n is even, M is the average of the middle two
numbers
45
Numerical Measures of
Central Tendency
• The mode is the most frequently observed
value.
• The modal class is the class with the highest relative
frequency; for grouped data, the mode is often taken to be the
midpoint of that class.
46
Geometric Mean
• Equals the nth root of the product of all
observations or values
• For a set of values: x1, x2, x3, ........., xn
• Geometric mean = $\sqrt[n]{x_1 \cdot x_2 \cdots x_n}$
47
Example
Jim has 20 problems to do for homework. Some are harder than others and
take more time to solve. We take a random sample of 9 problems.
Find the mean (arithmetic and geometric), median and mode for the number
of minutes Jim spends on his homework.
Problem #   Time Spent (Minutes)
1           12
2           4
3           3
4           8
5           7
6           5
7           4
8           9
9           11
48
Solution: Mean
Times spent (minutes): 12, 4, 3, 8, 7, 5, 4, 9, 11
Sample size (n) = 9
Problems 1 through 9 = x1, x2, x3 … x9, respectively.
Σx = (12 + 4 + 3 + 8 + 7 + 5 + 4 + 9 + 11) = 63 minutes
Mean = Σx/n = 63/9 = 7 minutes
49
Solution: Geometric Mean
Times spent (minutes): 12, 4, 3, 8, 7, 5, 4, 9, 11
Sample size (n) = 9
Problems 1 through 9 = x1, x2, x3 … x9, respectively.
GM = $\sqrt[9]{12 \cdot 4 \cdot 3 \cdot 8 \cdot 7 \cdot 5 \cdot 4 \cdot 9 \cdot 11} \approx 6.31$ minutes
50
Solution: Median
Place the data in ascending order: 3, 4, 4, 5, 7, 8, 9, 11, 12
Position of the median = (n+1)/2 = (9+1)/2 = 5
The 5th ordered observation is 7, so the Median is 7.
51
Solution: Mode
Since the data is already arranged in order from smallest to largest,
we will keep it that way: 3, 4, 4, 5, 7, 8, 9, 11, 12
Only the value 4 occurs more than once.
The Mode is 4.
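The four solutions for Jim's homework times can be checked with Python's standard `statistics` module. This is a sketch that assumes Python 3.8 or later (needed for `statistics.geometric_mean`); the variable names are illustrative.

```python
import statistics

# Time spent (minutes) on the 9 sampled homework problems.
times = [12, 4, 3, 8, 7, 5, 4, 9, 11]

mean = statistics.mean(times)                      # sum of values / n
geometric_mean = statistics.geometric_mean(times)  # nth root of the product
median = statistics.median(times)                  # middle ordered value
mode = statistics.mode(times)                      # most frequent value

print(f"mean={mean}, geometric mean={geometric_mean:.2f}, "
      f"median={median}, mode={mode}")
```

The geometric mean comes out below the arithmetic mean here, which is always the case for positive values that are not all equal.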
52
Approximating the Mean from a Frequency
Distribution
• Used when the only source of data is a
frequency distribution
$\bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n}$
where
n = sample size
c = number of classes in the frequency distribution
$m_j$ = midpoint of the jth class
$f_j$ = frequency of the jth class
53
Example
$\bar{X} = \frac{\sum_{j=1}^{c} m_j f_j}{n}$, where n = sample size, c = number of classes, $m_j$ = midpoint of the jth class and $f_j$ = frequency of the jth class
Class         MP   Freq.
10 but < 20   15   3
20 but < 30   25   6
30 but < 40   35   5
40 but < 50   45   4
50 but < 60   55   2
Total              20
$\bar{X}$ = ((15*3) + (25*6) + (35*5) + (45*4) + (55*2))/20
= (45 + 150 + 175 + 180 + 110)/20
= 660/20
= 33
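The grouped-data approximation above can be sketched as follows; the function name `grouped_mean` is a hypothetical helper, and the midpoints and frequencies are those of the example table.

```python
def grouped_mean(midpoints, frequencies):
    """Approximate the mean from class midpoints m_j and frequencies f_j."""
    n = sum(frequencies)  # sample size
    weighted_sum = sum(m * f for m, f in zip(midpoints, frequencies))
    return weighted_sum / n


midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 6, 5, 4, 2]

approx_mean = grouped_mean(midpoints, frequencies)
print(approx_mean)
```

The result is only an approximation of the true sample mean, because every observation in a class is treated as if it sat at the class midpoint.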
54
Numerical Measures of
Central Tendency
• Perfectly symmetric data set:
– Mean = Median = Mode
• Extremely high value in the data set:
– Mean > Median > Mode
(Rightward skewness)
• Extremely low value in the data set:
– Mean < Median < Mode
(Leftward skewness)
55
Numerical Measures of
Central Tendency
• A data set is skewed if one tail of the
distribution has more extreme observations
than the other tail.
56
Numerical Measures of
Variability
• The mean, median and mode give us an idea
of the central tendency, or where the
“middle” of the data is
• Variability gives us an idea of how spread out
the data are around that middle
57
Measures of Variation
• Range
• Interquartile Range
• Variance: Population Variance, Sample Variance
• Standard Deviation: Population Standard Deviation, Sample Standard Deviation
• Coefficient of Variation
58
Numerical Measures of
Variability
• The range is equal to the largest measurement
minus the smallest measurement
– Easy to compute, but not very informative
– Considers only two observations (the smallest and
largest)
59
Quartiles
• Quartiles Split Ordered Data into 4 equal
portions
• Q1 and Q3 are Measures of Non-central Location
– Q2 = the Median
[Figure: ordered data split into four portions of 25% each by Q1, Q2 and Q3]
60
Quartiles
• Each Quartile has position and value
– With the data in an ordered array, the position of Qi
is: $\text{Position of } Q_i = \frac{i(n+1)}{4}$
– The value of Qi is the value associated with that
position in the ordered array
• Example:
Data in Ordered Array: 11 12 13 16 16 17 18 21 22
Position of $Q_1 = \frac{1(9+1)}{4} = 2.5$, so $Q_1 = \frac{12 + 13}{2} = 12.5$
61
Quartiles Example
Find the 1st and 3rd Quartiles in the ordered observations:
3, 4, 4, 5, 7, 8, 9, 11, 12
Position of Q1 = 1(9+1)/4 = 2.5
The 2.5th observation = (4+4)/2 = 4
Position of Q3 = 3(9+1)/4 = 3 × (position of Q1) = 7.5
The 7.5th observation = (9+11)/2 = 10
62
Interquartile Range (IQR)
• The difference between Q3 and Q1: IQR = Q3 − Q1
– The spread of the middle 50% of the values
– Also known as midspread
– Resistant to extreme values
• Example:
– Data in ordered array: 11 12 13 16 16 17 17 18 21
– Q1 = 12.5, Q3 = 17.5
– 17.5 – 12.5 = 5
– IQR = 5
63
Range and IQR Example
Find the Range and the Interquartile Range in this distribution:
3, 4, 4, 5, 7, 8, 9, 11, 12
Range = largest – smallest = 12 – 3 = 9.
Position of Q1 = 1(9+1)/4 = 2.5
The 2.5th observation = (4+4)/2 = 4
Position of Q3 = 3(9+1)/4 = 3 × (position of Q1) = 7.5
The 7.5th observation = (9+11)/2 = 10
IQR = 10 – 4 = 6
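The quartile-position rule i(n + 1)/4 and the resulting range and IQR can be sketched as below. The helper `quartile` is hypothetical, and one assumption goes beyond the slides: fractional positions are handled by linear interpolation, which matches the "average the two neighbours" rule at .5 positions but may differ from a textbook's rounding rule at .25 or .75 positions.

```python
def quartile(ordered, i):
    """Value of the i-th quartile (i = 1, 2, 3) at position i(n + 1)/4."""
    n = len(ordered)
    position = i * (n + 1) / 4  # 1-based position in the ordered array
    lower = int(position)       # whole part of the position
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    # Interpolate between neighbours; at a .5 position this is their average.
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


times = sorted([12, 4, 3, 8, 7, 5, 4, 9, 11])

q1 = quartile(times, 1)
q2 = quartile(times, 2)
q3 = quartile(times, 3)
data_range = times[-1] - times[0]
iqr = q3 - q1

print(f"Q1={q1}, Q2={q2}, Q3={q3}, range={data_range}, IQR={iqr}")
```

Other software uses slightly different quartile conventions, so results for small samples can differ from the slide's values.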
64
Numerical Measures of
Variability
• The sample variance, s2, for a sample of n
measurements is equal to the sum of the
squared distances from the mean, divided
by (n – 1).
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
65
Numerical Measures of
Variability
• The sample standard deviation, s, for a
sample of n measurements is equal to the
square root of the sample variance.
$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
66
Numerical Measures of Variability
• Say a small data set consists of the measurements 1, 2
and 3.
– $\bar{x} = 2$
– $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(3-2)^2 + (2-2)^2 + (1-2)^2}{3 - 1} = \frac{1 + 0 + 1}{2} = \frac{2}{2} = 1$
– $s = \sqrt{s^2} = \sqrt{1} = 1$
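The small example can be verified with a direct implementation of the sample variance formula, cross-checked against the standard `statistics` module. The function name `sample_variance` is an illustrative assumption.

```python
import math
import statistics


def sample_variance(values):
    """Sum of squared distances from the mean, divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [1, 2, 3]

s2 = sample_variance(data)  # sample variance
s = math.sqrt(s2)           # sample standard deviation

print(s2, s)
# The standard library uses the same (n - 1) divisor for samples.
print(statistics.variance(data), statistics.stdev(data))
```

For population data, `statistics.pvariance` and `statistics.pstdev` divide by N instead of n − 1.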
67
Numerical Measures of Variability
• As before, Greek letters are used for
populations and Roman letters for samples
$s^2$ = sample variance
$s$ = sample standard deviation
$\sigma^2$ = population variance
$\sigma$ = population standard deviation
68
Comparing Standard Deviations
[Figure: three dot plots, Data A, Data B and Data C, on a common axis from 11 to 21]
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = .9258
Data C: Mean = 15.5, s = 4.57
• Greater S (or σ) = more dispersion of data
69
Interpreting the Standard Deviation
• Chebyshev’s Rule
• The Empirical Rule
Both tell us something about where
the data will be relative to the mean.
70
Interpreting the Standard Deviation
• Chebyshev’s Rule
– Valid for any data set
– For any number k > 1, at least $(1 - 1/k^2) \times 100\%$ of the
observations will lie within k standard deviations of the mean
k    k²    1/k²     (1 − 1/k²) × 100%
2    4     .25      75%
3    9     .11      89%
4    16    .0625    93.75%
71
The Bienayme-Chebyshev Rule
• At least (≥) 75% of the observations must be
contained within distances of 2 SD around the
mean
• At least (≥) 88.89% of the observations must be
contained within distances of 3 SD around the
mean
• At least (≥) 93.75% of the observations must be
contained within distances of 4 SD around the
mean
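As an illustration, the bound can be computed for any k > 1 with a one-line function. The name `chebyshev_bound` is a hypothetical helper, not something from the slides.

```python
def chebyshev_bound(k):
    """Minimum fraction of observations within k standard deviations."""
    if k <= 1:
        raise ValueError("Chebyshev's rule requires k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3, 4):
    print(f"k={k}: at least {100 * chebyshev_bound(k):.2f}%")
```

These are lower bounds that hold for any data set, so the actual percentage within k standard deviations is often much higher.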
72
The Bienayme-Chebyshev Rule
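[Figure: axis marked µ − 4sd, µ − 3sd, µ − 2sd, µ, µ + 2sd, µ + 3sd, µ + 4sd, showing ≥ 75% of observations within ±2sd, ≥ 88.89% within ±3sd and ≥ 93.75% within ±4sd]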
73
Interpreting the Standard Deviation
• The Empirical Rule
– Useful for mound-shaped, symmetrical distributions
– If not perfectly mounded and symmetrical, the values are
approximations
• For a perfectly symmetrical and mound-shaped distribution,
– ~68% will be within the range $(\bar{x} - s,\ \bar{x} + s)$
– ~95% will be within the range $(\bar{x} - 2s,\ \bar{x} + 2s)$
– ~99.7% will be within the range $(\bar{x} - 3s,\ \bar{x} + 3s)$
74
Interpreting the Standard Deviation
• Hummingbirds beat their
wings in flight an average of 55
times per second.
• Assume the standard deviation
is 10, and that the distribution
is symmetrical and mounded.
– Approximately what percentage
of hummingbirds beat their
wings between 45 and 65 times
per second?
– Between 55 and 65?
– Less than 45?
75
Interpreting the Standard Deviation
• Approximately what percentage of hummingbirds beat their wings
between 45 and 65 times per second?
– Since 45 and 65 are exactly one standard deviation below and above
the mean (55 ± 10), the empirical rule says that about 68% of the
hummingbirds will be in this range.
76
Interpreting the Standard Deviation
• Between 55 and 65?
– This range of numbers is from the mean to one standard deviation
above it, or one-half of the 45-to-65 range. So, about one-half of
68%, or 34%, of the hummingbirds will be in this range.
77
Interpreting the Standard Deviation
• Less than 45?
– Half of the entire data set lies above the mean, and ~34% lies
between 45 and 55 (between one standard deviation below the mean and
the mean), so ~84% (~34% + 50%) are above 45, which means ~16% are
below 45.
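The three hummingbird answers can be compared with an exact normal model. This sketch makes an assumption stronger than the slides: it treats the "symmetrical and mounded" distribution as exactly normal with mean 55 and standard deviation 10, using `statistics.NormalDist` (Python 3.8+).

```python
from statistics import NormalDist

# Assumed model: wing beats per second ~ Normal(mean=55, sd=10).
wingbeats = NormalDist(mu=55, sigma=10)

between_45_65 = wingbeats.cdf(65) - wingbeats.cdf(45)  # within 1 sd of mean
between_55_65 = wingbeats.cdf(65) - wingbeats.cdf(55)  # mean to +1 sd
below_45 = wingbeats.cdf(45)                           # more than 1 sd below

print(f"45 to 65: {between_45_65:.1%}")
print(f"55 to 65: {between_55_65:.1%}")
print(f"below 45: {below_45:.1%}")
```

Under this model the values land close to the empirical-rule figures of about 68%, 34% and 16%.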
78
Exercise
A manufacturer of automobile batteries claims that the average length of
life of its grade A battery is 60 months. However, the guarantee on this
brand is for just 36 months. Suppose the standard deviation of the life
length is known to be 10 months and the frequency distribution of the life-
length data is known to be mound shaped.
• Approximately what percentage of the manufacturer’s grade A batteries
will last more than 50 months, assuming that the manufacturer’s claim is
true?
• Approximately what percentage of the manufacturer’s batteries will last
less than 40 months, assuming that the manufacturer’s claim is true?
• Suppose your battery lasts 37 months. What could you infer about the
manufacturer’s claim?
79
Coefficient of Variation
• Measure of Relative Variation
• Shows Variation Relative to the Mean
• Used to Compare Two or More Sets of Data Measured in
Different Units
$CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
S = Sample Standard Deviation
$\bar{X}$ = Sample Mean
80
Comparing Coefficient
of Variation
• Stock A:
– Average price last year = $50
– Standard deviation = $5
– $CV_A = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \left(\frac{\$5}{\$50}\right) \cdot 100\% = 10\%$
• Stock B:
– Average price last year = $100
– Standard deviation = $5
– $CV_B = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \left(\frac{\$5}{\$100}\right) \cdot 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less
variable relative to its price
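A minimal sketch of the stock comparison; the function name `coefficient_of_variation` is an illustrative assumption.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (S / X-bar) * 100%, returned as a percentage."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)   # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)  # Stock B

print(f"CV_A = {cv_a:g}%, CV_B = {cv_b:g}%")
```

Because the CV is unit-free, the same function can compare data sets measured in different units, which the standard deviation alone cannot do.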
81
Numerical Measures of Relative Standing
• The z-score tells us
how many standard
deviations above or
below the mean a
particular
measurement is.
• Sample z-score: $z = \frac{x - \bar{x}}{s}$
• Population z-score: $z = \frac{x - \mu}{\sigma}$
82
Interpreting the Standard Deviation
• Hummingbirds beat their
wings in flight an average of 55
times per second.
• Assume the standard deviation
is 10, and that the distribution
is symmetrical and mounded.
An individual hummingbird is measured with 75 beats per
second. What is this bird’s z-score?
$z = \frac{x - \bar{x}}{s} = \frac{75 - 55}{10} = 2.0$
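The z-score calculation can be sketched as a small function. The name `z_score` is a hypothetical helper; the second call uses a mean of 14.0 and a standard deviation of 3.0 purely as another illustration.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


# Hummingbird measured at 75 beats per second (mean 55, standard deviation 10).
print(z_score(75, mean=55, std_dev=10))

# A value of 18.5 when the mean is 14.0 and the standard deviation is 3.0.
print(z_score(18.5, mean=14.0, std_dev=3.0))

# A value below the mean gives a negative z-score.
print(z_score(45, mean=55, std_dev=10))
```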
83
Z Scores
Example:
• If the mean is 14.0 and the standard deviation is 3.0, what is
the Z score for the value 18.5?
$Z = \frac{X - \bar{X}}{S} = \frac{18.5 - 14.0}{3.0} = 1.5$
• The value 18.5 is 1.5 standard deviations above the mean
• (A negative Z-score would mean that a value is less than the
mean)
84
Interpreting the Standard Deviation
• Since ~95% of all the
measurements will be within 2
standard deviations of the
mean, only ~5% will be more
than 2 standard deviations
from the mean.
• About half of this 5% will be
far below the mean, leaving
only about 2.5% of the
measurements at least 2
standard deviations above the
mean.
85
Numerical Measures of Relative Standing
• Z scores are related to the empirical rule:
For a perfectly symmetrical and mound-
shaped distribution,
– ~68 % will have z-scores between -1 and 1
– ~95 % will have z-scores between -2 and 2
– ~99.7% will have z-scores between -3 and 3
86
Methods for Determining Outliers
• An outlier is a measurement that is unusually
large or small relative to the other values.
• Three possible causes:
– Observation, recording or data entry error
– Item is from a different population
– A rare, chance event
87
The Box Plot (“Box-and-Whisker”)
• The box plot is a graph representing
information about certain percentiles for a
data set and can be used to identify outliers
• 5-number summary
– Median, Q1, Q3, Xsmallest, Xlargest
• Box Plot
– Graphical display of data using the 5-number summary
[Figure: box plot on an axis from 4 to 12, marking Xsmallest, Q1, the Median (Q2), Q3 and Xlargest]
88
Distribution Shape & Box Plot
[Figure: three box plots showing the relative positions of Q1, Q2 and Q3 for Left-Skewed, Symmetric and Right-Skewed distributions]
89
Methods for Determining Outliers
• Outliers and z-scores
– The chance that a z-score is between -3 and +3 is
over 99%.
– Any measurement with |z| > 3 is considered an
outlier.
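The |z| > 3 rule can be sketched as follows. The data set `readings` is made up purely for illustration and does not come from the slides; the function name `z_outliers` is likewise hypothetical.

```python
import statistics


def z_outliers(values, threshold=3.0):
    """Return the values whose sample z-score exceeds the threshold in size."""
    mean = statistics.mean(values)
    std_dev = statistics.stdev(values)  # sample standard deviation
    return [x for x in values if abs((x - mean) / std_dev) > threshold]


# Hypothetical readings: 19 ordinary values near 10 and one extreme value.
readings = [9, 10, 11] * 6 + [10, 100]

print(z_outliers(readings))
```

In very small samples no value can reach |z| > 3, because a single extreme value also inflates the standard deviation, so this rule is most useful for larger data sets.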
90
Correlation Coefficient
• Correlation Coefficient = r
– Unit Free
– Measures the strength of the linear relationship
between 2 quantitative variables
• Ranges between –1 and 1
– The Closer to –1, the stronger the negative linear
relationship becomes
– The Closer to 1, the stronger the positive linear
relationship becomes
– The Closer to 0, the weaker any linear relationship
becomes
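The slide does not state a formula for r. As a sketch, the block below uses the standard Pearson product-moment definition, which is an assumption about which coefficient is meant; the function name `pearson_r` and the small data sets are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises exactly in line with x
y_down = [10, 8, 6, 4, 2]  # falls exactly in line with x

print(pearson_r(x, y_up))
print(pearson_r(x, y_down))
```

Because the deviations are divided by their own spread, r is unit-free and always lies between -1 and 1.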
91
Scatter Plots of Data with Various Correlation Coefficients
[Figure: five scatter plots of Y against X with r = −1, r = −.6, r = 0, r = .6 and r = 1]
• Scattergram (or scatterplot) shows the relationship between two
quantitative variables
92
Distorting the Truth with Deceptive Statistics
• Distortions
– Stretching the axis (and the truth)
– Is average relevant?
• Mean, median or mode?
– Is average relevant?
• What about the spread?

More Related Content

PDF
basic statisticsfor stastics basic knolege
PPTX
Lecture 1 - Overview.pptx
PPTX
Session 3&4.pptx
PPTX
Biostatistics mean median mode unit 1.pptx
PDF
Measure of central tendency
PPTX
Introduction to Biostatistics in medical research
PPTX
LECTURE 3 - inferential statistics bmaths
PPTX
determinatiion of
basic statisticsfor stastics basic knolege
Lecture 1 - Overview.pptx
Session 3&4.pptx
Biostatistics mean median mode unit 1.pptx
Measure of central tendency
Introduction to Biostatistics in medical research
LECTURE 3 - inferential statistics bmaths
determinatiion of

Similar to 1. Descriptive statistics.pptx engineering (20)

PPTX
Week 2 measures of disease occurence
PPTX
Statistics 000000000000000000000000.pptx
PDF
Biostatistics CH Lecture Pack
PPT
Univariate, bivariate analysis, hypothesis testing, chi square
PPT
Intro statistics
PPTX
Data Wrangling_1.pptx
PDF
Introduction to Statistics .pdf
PPTX
050325Online SPSS.pptx spss social science
PPT
Analysis
PPTX
Measure of Variability Report.pptx
PPTX
Business Statistics for Managers with SPSS[1].pptx
PPTX
Dscriptive statistics
PPTX
Biostatistics Basics Descriptive and Estimation Methods
PPTX
Statistics with R
PDF
Res701 research methodology lecture 7 8-devaprakasam
PPTX
Introduction to statistics
PPTX
Quantitative research
PDF
Spss basic Dr Marwa Zalat
PPTX
Data analytics course notes of Unit-1.pptx
PPTX
Measure of central tendency grouped data.pptx
Week 2 measures of disease occurence
Statistics 000000000000000000000000.pptx
Biostatistics CH Lecture Pack
Univariate, bivariate analysis, hypothesis testing, chi square
Intro statistics
Data Wrangling_1.pptx
Introduction to Statistics .pdf
050325Online SPSS.pptx spss social science
Analysis
Measure of Variability Report.pptx
Business Statistics for Managers with SPSS[1].pptx
Dscriptive statistics
Biostatistics Basics Descriptive and Estimation Methods
Statistics with R
Res701 research methodology lecture 7 8-devaprakasam
Introduction to statistics
Quantitative research
Spss basic Dr Marwa Zalat
Data analytics course notes of Unit-1.pptx
Measure of central tendency grouped data.pptx
Ad

Recently uploaded (20)

PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
PPTX
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
PDF
R24 SURVEYING LAB MANUAL for civil enggi
PPTX
Welding lecture in detail for understanding
PPTX
CH1 Production IntroductoryConcepts.pptx
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PPTX
Lecture Notes Electrical Wiring System Components
PPTX
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
PDF
Operating System & Kernel Study Guide-1 - converted.pdf
PPTX
OOP with Java - Java Introduction (Basics)
PPTX
Construction Project Organization Group 2.pptx
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PDF
composite construction of structures.pdf
PDF
PPT on Performance Review to get promotions
PPT
Project quality management in manufacturing
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
CYBER-CRIMES AND SECURITY A guide to understanding
Foundation to blockchain - A guide to Blockchain Tech
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
KTU 2019 -S7-MCN 401 MODULE 2-VINAY.pptx
R24 SURVEYING LAB MANUAL for civil enggi
Welding lecture in detail for understanding
CH1 Production IntroductoryConcepts.pptx
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
Lecture Notes Electrical Wiring System Components
MET 305 2019 SCHEME MODULE 2 COMPLETE.pptx
Operating System & Kernel Study Guide-1 - converted.pdf
OOP with Java - Java Introduction (Basics)
Construction Project Organization Group 2.pptx
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
composite construction of structures.pdf
PPT on Performance Review to get promotions
Project quality management in manufacturing
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Model Code of Practice - Construction Work - 21102022 .pdf
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
Ad

1. Descriptive statistics.pptx engineering

  • 2. 2 Topics: Statistics • Descriptive Statistics • Probability Theory and Probability Distributions • Hypothesis Testing • Confidence Interval • Analysis Of Variance (ANOVA) • Regression and Correlation • Chi-Squared
  • 3. 3 Topics: Research Methods • Research Design • Literature Review • Sampling • Data Collection Methods • Sampling • Ethical Issues in Research Resource • IT role in research & Formatting
  • 4. 4 Assessment • One test – 15% • One Individual Assignment – 10% • One Group Assignment – 15% • Final Exam – 60% • Note: Pass Mark is B (50%)
  • 5. 5 References References • Probability and Statistics for Engineering and the Sciences, by Jay L. Devore, Monterey, California. • Basic Business Statistics Berenson M.L, Levine D.M, Krehbiel, T.C • Research Methodology, Methods and Techniques, by C.R. Kothari • Research Methods for Business students, by Mark Saunders, Philip Lewis and Adrian Thornhill • Plenty of Websites
  • 6. 6
  • 8. 8 The Science of Statistics • Statistics is the science of data. This involves collecting, classifying, summarizing, organizing, analyzing and interpreting numerical information. Statistics Descriptiv e Statistics Inferential Statistics
  • 9. 9 Types of Statistical Applications • Descriptive statistics utilizes numerical and graphical methods to look for patterns in a data set, to summarize the information revealed in a data set and to present that information in a convenient form. • Inferential statistics utilizes sample data to make estimates, decisions, predictions or other generalizations about a larger set of data.
  • 10. 10 Descriptive Statistics • Collect data – e.g. Survey • Present data – e.g. Tables and graphs • Characterize data – e.g. Sample mean = i X n  Descriptive statistics utilizes numerical and graphical methods to look for patterns in a data set, to summarize the information revealed in a data set and to present that information in a convenient form.
  • 11. 11 Inferential Statistics • Estimation – e.g.: Estimate the population mean weight using the sample mean weight • Hypothesis testing – e.g.: Test the claim that the population mean weight is 120 pounds Drawing conclusions and/or making decisions concerning a population based on sample results. Inferential statistics utilizes sample data to make estimates, decisions, predictions or other generalizations about a larger set of data.
  • 12. 12 Fundamental Elements of Statistics • An experimental unit is an object about which we collect data. – Person – Place – Thing – Event
  • 13. 13 Fundamental Elements of Statistics • An population is a set of units in which we are interested. – Typically, there are too many experimental units in a population to consider every one. • If we can examine every single one, we conduct a census.
  • 14. 14 Fundamental Elements of Statistics • A sample is a subset of the population. • A variable is a characteristic or property of an individual unit. – The values of these characteristics will, not surprisingly, vary. – A measure of reliability is a statement about the degree of uncertainty associated with a statistical inference. (Based on our analysis, we think 56% of soda drinkers prefer Pepsi to Coke, ± 5%.)
  • 15. Fundamental Elements of Statistics Descriptive Statistics • The population or sample of interest • One or more variables to be investigated • Tables, graphs or numerical summary tools • Identification of patterns in the data Inferential Statistics • Population of interest • One or more variables to be investigated • The sample of population units • The inference about the population based on the sample data • A measure of reliability of the inference 15
  • 16. Types of Data • Quantitative Data are measurements that are recorded on a naturally occurring numerical scale. e.g. Age, GPA, Salary, Cost of books this semester • Categorical (Qualitative) Data are measurements that cannot be recorded on a natural numerical scale, but are recorded in categories e.g. Live on/off campus, Major, Gender 16
  • 19. 19 Ordered Array • 1. Organizes Data to Focus on Major Features • 2. Data Placed in Rank Order – Smallest to Largest • 3. Data in Raw Form (as Collected) – 24, 26, 24, 21, 27, 27, 30, 41, 32, 38 • 4. Data in Ordered Array – 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
  • 20. 20 Stem-and-Leaf Display • A Stem-and-Leaf Display shows the number of observations that share a common value (the stem) and the precise value of each observation (the leaf) Data: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41 26 2 144677 3 028 4 1
  • 21. 21 Frequency Distribution Table Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38 Class Frequency 15 but < 25 3 25 but < 35 5 35 but < 45 2
  • 22. 22 Frequency Distribution Table Steps • 1. Determine Range • 2. Select Number of Classes – Usually Between 5 & 15 Inclusive • 3. Compute Class Intervals (Width) • 4. Determine Class Boundaries (Limits) • 5. Compute Class Midpoints • 6. Count Observations & Assign to Classes
  • 23. Frequency Distribution Table Example. Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38. Class midpoint = (Upper Boundary + Lower Boundary) / 2; class width = 10. Class | Midpoint | Frequency: 15 but < 25 | 20 | 3; 25 but < 35 | 30 | 5; 35 but < 45 | 40 | 2
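As a rough sketch of the steps listed above (boundaries, midpoints, counting), the following Python builds the same table from the raw data. The function name `frequency_table` and its parameters are assumptions made for illustration; the starting boundary 15 and width 10 are taken from the example.

```python
def frequency_table(values, lower, width, n_classes):
    """Count observations in classes [lo, hi) of equal width, starting at `lower`."""
    rows = []
    for k in range(n_classes):
        lo = lower + k * width
        hi = lo + width
        freq = sum(1 for v in values if lo <= v < hi)  # "lo but < hi"
        rows.append({"class": (lo, hi), "midpoint": (lo + hi) / 2, "frequency": freq})
    return rows


raw = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]
table = frequency_table(raw, lower=15, width=10, n_classes=3)
for row in table:
    lo, hi = row["class"]
    print(f"{lo} but < {hi}: midpoint {row['midpoint']}, frequency {row['frequency']}")
```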
  • 24. Relative Frequency & % Distribution Tables. Relative Frequency Distribution: class relative frequency = class frequency / n. Class | Prop.: 15 but < 25 | .3; 25 but < 35 | .5; 35 but < 45 | .2. Percentage Distribution: class percentage = (class relative frequency) × 100. Class | %: 15 but < 25 | 30.0; 25 but < 35 | 50.0; 35 but < 45 | 20.0
  • 25. Cumulative Percentage Distribution Table. Raw Data: 24, 26, 24, 21, 27, 27, 30, 41, 32, 38. Each entry is the percentage of observations less than the lower class boundary. Class | Cumulative Percentage: 15 but < 25 | 0.0; 25 but < 35 | 30.0; 35 but < 45 | 80.0 (30% + 50%); 45 but < 55 | 100.0 (80% + 20%)
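The relative, percentage and cumulative columns follow directly from the class frequencies. A minimal Python sketch, assuming the three class frequencies from the example (variable names are illustrative):

```python
freqs = [3, 5, 2]  # frequencies for 15-25, 25-35, 35-45
n = sum(freqs)

relative = [f / n for f in freqs]       # class frequency / n
percent = [100 * f / n for f in freqs]  # relative frequency x 100

# Cumulative percentage below each lower boundary (15, 25, 35, 45):
# start at 0 and add each class percentage in turn.
cumulative = [0.0]
for p in percent:
    cumulative.append(cumulative[-1] + p)

print(relative, percent, cumulative)
```

The cumulative list has one more entry than there are classes, which is why the slide adds the extra row "45 but < 55".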
  • 26. Histogram • Histograms are graphs of the frequency or relative frequency of a variable. – Class intervals make up the horizontal axis – The frequencies or relative frequencies are displayed on the vertical axis. [Figure: histogram of the classes 15 but < 25 (3), 25 but < 35 (5), 35 but < 45 (2); horizontal axis marked at the lower class boundaries 15 to 55; vertical axis shows count (frequency, relative frequency or percent); bars touch.]
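  • 27. Polygon [Figure: frequency polygon of the classes 15 but < 25 (3), 25 but < 35 (5), 35 but < 45 (2), plotted at the class midpoints and closed with a fictitious class at each end; horizontal axis from 0 to 60; vertical axis shows count (frequency, relative frequency or percent).]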
  • 28. Cumulative % Polygon (Ogive) [Figure: cumulative percentage plotted at the lower class boundaries 15 to 55, with a fictitious class added; vertical axis from 0% to 100%.] Class | Cum. %: 15 but < 25 | 0%; 25 but < 35 | 30%; 35 but < 45 | 80%; 45 but < 55 | 100%
  • 30. Summary Table • 1. Lists Categories & No. Elements in Category • 2. Obtained by Tallying Responses in Category • 3. May Show Frequencies (Counts), % or Both. Each row is a category. Major | Count: Accounting | 130; Economics | 20; Management | 50; Total | 200
  • 31. Bar Chart [Figure: horizontal bar chart of Major (Acct., Econ., Mgmt.) against Frequency, axis from 0 to 150. Callouts: Horizontal Bars for Categorical Variables; Bar Length Shows Frequency or % (Percent Used Also); Equal Bar Widths; 1/2 to 1 Bar Width; Zero Point.]
  • 32. Pie Chart • 1. Shows Breakdown of Total Quantity into Categories • 2. Useful for Showing Relative Differences • 3. Angle Size = (360°)(Percent), e.g. (360°)(10%) = 36° [Figure: pie chart of Majors: Acct. 65%, Mgmt. 25%, Econ. 10% (36°).]
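The angle rule on the pie chart slide can be checked against the summary table counts. A small Python sketch, assuming the counts from the summary table (variable names are illustrative):

```python
counts = {"Accounting": 130, "Economics": 20, "Management": 50}
total = sum(counts.values())

# Percentage share and pie-slice angle (360 degrees x share) for each major.
percent = {major: 100 * c / total for major, c in counts.items()}
angles = {major: 360 * c / total for major, c in counts.items()}

for major in counts:
    print(f"{major}: {percent[major]}% -> {angles[major]} degrees")
```

The angles add up to 360°, which is a quick sanity check on any pie chart.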
  • 33. Pareto Diagram [Figure: vertical bar chart of Major in descending order (Acct., Mgmt., Econ.) with equal bar widths, overlaid with a cumulative polygon (ogive) plotted at the bar midpoints; the vertical axis is always percent, from 0% to 100%.]
  • 35. Summary Measures. Central Tendency: Mean, Median, Mode, Geometric Mean. Quartile. Variation: Range, Variance, Standard Deviation, Coefficient of Variation.
  • 36. 36 Measures of Central Tendency • Various ways to describe the central, most common or middle value in a distribution or set of data – The Mean (Arithmetic Mean) – The Median – The Mode – The Geometric Mean
  • 37. 37 Numerical Measures of Central Tendency • Summarizing data sets numerically – Are there certain values that seem more typical for the data? – How typical are they?
  • 38. 38 Numerical Measures of Central Tendency • Central tendency is the value or values around which the data tend to cluster • Variability shows how strongly the data cluster around that (those) value(s)
  • 39. Numerical Measures of Central Tendency • The mean of a set of quantitative data is the sum of the observed values divided by the number of values: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
  • 40. Numerical Measures of Central Tendency • The mean of a sample is typically denoted by x-bar, but the population mean is denoted by the Greek symbol μ. Sample mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$. Population mean: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
  • 41. Numerical Measures of Central Tendency • If x1 = 1, x2 = 2, x3 = 3 and x4 = 4, then $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ = (1 + 2 + 3 + 4)/4 = 10/4 = 2.5
  • 42. 42 Numerical Measures of Central Tendency • The median of a set of quantitative data is the value which is located in the middle of the data, arranged from lowest to highest values (or vice versa), with 50% of the observations above and 50% below.
  • 43. 43 Numerical Measures of Central Tendency 50% 50% Lowest Value Highest Value Median
  • 44. 44 Numerical Measures of Central Tendency • Finding the Median, M: – Arrange the n measurements from smallest to largest • If n is odd, M is the middle number • If n is even, M is the average of the middle two numbers
  • 45. Numerical Measures of Central Tendency • The mode is the most frequently observed value. • The modal class is the class with the highest relative frequency; for grouped data, the mode is taken to be the midpoint of the modal class.
  • 46. Geometric Mean • Equals the nth root of the product of all observations or values • For a set of values: x1, x2, x3, ..., xn • Geometric mean = $\sqrt[n]{x_1 \cdot x_2 \cdots x_n}$
  • 47. Example. Jim has 20 problems to do for homework. Some are harder than others and take more time to solve. We take a random sample of 9 problems. Find the mean (arithmetic and geometric), median and mode for the number of minutes Jim spends on his homework. Problem # | Time Spent (Minutes): 1 | 12; 2 | 4; 3 | 3; 4 | 8; 5 | 7; 6 | 5; 7 | 4; 8 | 9; 9 | 11
  • 48. Solution: Mean. Times spent on problems 1 through 9 (minutes): 12, 4, 3, 8, 7, 5, 4, 9, 11. Sample size (n) = 9. Problems 1 through 9 = x1, x2, x3 … x9, respectively. Σx = (12 + 4 + 3 + 8 + 7 + 5 + 4 + 9 + 11) = 63 minutes. Σx/n = 63/9 = 7 minutes
  • 49. Solution: Geometric Mean. Times spent on problems 1 through 9 (minutes): 12, 4, 3, 8, 7, 5, 4, 9, 11. Sample size (n) = 9. Problems 1 through 9 = x1, x2, x3 … x9, respectively. GM = (12 × 4 × 3 × 8 × 7 × 5 × 4 × 9 × 11)^(1/9) = (15,966,720)^(1/9) ≈ 6.31 minutes
  • 50. Solution: Median. Place the data in ascending order: 3, 4, 4, 5, 7, 8, 9, 11, 12. Position of the median = (n+1)/2 = (9+1)/2 = 5. The 5th ordered observation is 7, so the Median is 7.
  • 51. Solution: Mode. Keep the data arranged in order from smallest to largest: 3, 4, 4, 5, 7, 8, 9, 11, 12. Only the value 4 occurs more than once. The Mode is 4.
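The four answers for Jim's homework times can be reproduced with Python's standard `statistics` module (`geometric_mean` needs Python 3.8 or later). This is just a cross-check of the worked solutions, not part of the original slides.

```python
import statistics

minutes = [12, 4, 3, 8, 7, 5, 4, 9, 11]  # sample of 9 problems

mean = statistics.mean(minutes)              # arithmetic mean: 63 / 9
gmean = statistics.geometric_mean(minutes)   # 9th root of the product
median = statistics.median(minutes)          # 5th value of the ordered data
mode = statistics.mode(minutes)              # most frequent value

print(f"mean={mean}, geometric mean={gmean:.2f}, median={median}, mode={mode}")
```

The geometric mean comes out below the arithmetic mean, which is always the case for positive values that are not all equal.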
  • 52. Approximating the Mean from a Frequency Distribution • Used when the only source of data is a frequency distribution: $\bar{X} \approx \frac{\sum_{j=1}^{c} m_j f_j}{n}$, where n = sample size, c = number of classes in the frequency distribution, $m_j$ = midpoint of the jth class, $f_j$ = frequency of the jth class
  • 53. Example. $\bar{X} \approx \frac{\sum_{j=1}^{c} m_j f_j}{n}$, where n = sample size, c = number of classes in the frequency distribution, $m_j$ = midpoint of the jth class, $f_j$ = frequency of the jth class. Class | MP | Freq.: 10 but < 20 | 15 | 3; 20 but < 30 | 25 | 6; 30 but < 40 | 35 | 5; 40 but < 50 | 45 | 4; 50 but < 60 | 55 | 2; Total | | 20. X̄ ≈ ((15*3) + (25*6) + (35*5) + (45*4) + (55*2))/20 = (45 + 150 + 175 + 180 + 110)/20 = 660/20 = 33
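The grouped-mean formula is a weighted average of the class midpoints. A short Python sketch of the example above; the function name `grouped_mean` is an illustrative assumption.

```python
def grouped_mean(midpoints, freqs):
    """Approximate the mean as sum(m_j * f_j) / n, with n = sum of frequencies."""
    if len(midpoints) != len(freqs):
        raise ValueError("midpoints and freqs must have the same length")
    n = sum(freqs)
    return sum(m * f for m, f in zip(midpoints, freqs)) / n


midpoints = [15, 25, 35, 45, 55]
freqs = [3, 6, 5, 4, 2]
approx_mean = grouped_mean(midpoints, freqs)
print(approx_mean)
```

The result is only an approximation because every observation in a class is treated as if it sat at the class midpoint.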
  • 54. 54 Numerical Measures of Central Tendency • Perfectly symmetric data set: – Mean = Median = Mode • Extremely high value in the data set: – Mean > Median > Mode (Rightward skewness) • Extremely low value in the data set: – Mean < Median < Mode (Leftward skewness)
  • 55. Numerical Measures of Central Tendency • A data set is skewed if one tail of the distribution has more extreme observations than the other tail.
  • 56. 56 Numerical Measures of Variability • The mean, median and mode give us an idea of the central tendency, or where the “middle” of the data is • Variability gives us an idea of how spread out the data are around that middle
  • 57. Measures of Variation. Variation: Range; Interquartile Range; Variance (Population Variance, Sample Variance); Standard Deviation (Population Standard Deviation, Sample Standard Deviation); Coefficient of Variation.
  • 58. 58 Numerical Measures of Variability • The range is equal to the largest measurement minus the smallest measurement – Easy to compute, but not very informative – Considers only two observations (the smallest and largest)
  • 59. Quartiles • Quartiles Split Ordered Data into 4 equal portions • Q1 and Q3 are Measures of Non-central Location – Q2 = the Median [Figure: ordered data divided into four 25% portions by Q1, Q2 and Q3.]
  • 60. Quartiles • Each Quartile has position and value – With the data in an ordered array, the position of Qi is: $\text{Position of } Q_i = \frac{i(n+1)}{4}$ – The value of Qi is the value associated with that position in the ordered array • Example: Data in Ordered Array: 11 12 13 16 16 17 18 21 22. Position of Q1 = 1(9 + 1)/4 = 2.5, so Q1 = (12 + 13)/2 = 12.5
  • 61. Quartiles Example. Find the 1st and 3rd Quartiles in the ordered observations 3, 4, 4, 5, 7, 8, 9, 11, 12. Position of Q1 = 1(9+1)/4 = 2.5. The 2.5th observation = (4+4)/2 = 4. Position of Q3 = 3(9+1)/4 = 3 × (position of Q1) = 7.5. The 7.5th observation = (9+11)/2 = 10
  • 62. Interquartile Range (IQR) • The difference between Q3 and Q1 (IQR = Q3 − Q1) – Covers the middle 50% of the values – Also known as midspread – Resistant to extreme values • Example: Data in Ordered Array: 11 12 13 16 16 17 17 18 21 – Q1 = 12.5, Q3 = 17.5 – IQR = 17.5 – 12.5 = 5
  • 63. Range and IQR Example. Find the Range and the Interquartile Range in this distribution: 3, 4, 4, 5, 7, 8, 9, 11, 12. Range = largest – smallest = 12 – 3 = 9. Position of Q1 = 1(9+1)/4 = 2.5. The 2.5th observation = (4+4)/2 = 4. Position of Q3 = 3(9+1)/4 = 3 × (position of Q1) = 7.5. The 7.5th observation = (9+11)/2 = 10. IQR = 10 – 4 = 6
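The quartile position rule i(n+1)/4 used in these examples can be sketched in Python as follows. The function name `quartile` is hypothetical, and linear interpolation between neighbouring observations is an assumption: it reproduces the "average the two neighbours" step for the half-way positions in the slides, but other textbooks and software use different conventions for fractional positions.

```python
def quartile(ordered, i):
    """Value at 1-based position i*(n+1)/4 in an ordered list, interpolating linearly."""
    n = len(ordered)
    pos = i * (n + 1) / 4
    lower = int(pos)      # 1-based index of the observation at or below pos
    frac = pos - lower    # how far pos lies towards the next observation
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


data = sorted([12, 4, 3, 8, 7, 5, 4, 9, 11])
q1, q3 = quartile(data, 1), quartile(data, 3)
data_range = data[-1] - data[0]
iqr = q3 - q1
print(f"Q1={q1}, Q3={q3}, range={data_range}, IQR={iqr}")
```

Because the IQR ignores the lowest and highest quarters of the data, replacing 12 with a much larger value would change the range but not the IQR.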
  • 64. Numerical Measures of Variability • The sample variance, s², for a sample of n measurements is equal to the sum of the squared distances from the mean, divided by (n – 1): $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
  • 65. Numerical Measures of Variability • The sample standard deviation, s, for a sample of n measurements is equal to the square root of the sample variance: $s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
  • 66. Numerical Measures of Variability • Say a small data set consists of the measurements 1, 2 and 3, so $\bar{x} = 2$. Then $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{(3 - 2)^2 + (2 - 2)^2 + (1 - 2)^2}{3 - 1} = \frac{1 + 0 + 1}{2} = \frac{2}{2} = 1$ and $s = \sqrt{s^2} = \sqrt{1} = 1$
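The small example can be verified in Python. The hand-written `sample_variance` below is an illustrative helper that follows the formula term by term; `statistics.variance` and `statistics.pvariance` are the standard-library sample (n − 1) and population (n) versions.

```python
import math
import statistics


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


data = [1, 2, 3]
s2 = sample_variance(data)
s = math.sqrt(s2)

print(s2, s)
print(statistics.variance(data))   # same sample variance from the library
print(statistics.pvariance(data))  # population variance divides by n instead
```

The population variance of the same three numbers is 2/3, smaller than the sample variance of 1, because it divides by n rather than n − 1.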
  • 67. Numerical Measures of Variability • As before, Greek letters are used for populations and Roman letters for samples: s² = sample variance; s = sample standard deviation; σ² = population variance; σ = population standard deviation
  • 68. Comparing Standard Deviations • Greater S (or σ) = more dispersion of data [Figure: three dot plots on a common axis from 11 to 21, each with Mean = 15.5: Data A, s = 3.338; Data B, s = .9258; Data C, s = 4.57.]
  • 69. 69 Interpreting the Standard Deviation • Chebyshev’s Rule • The Empirical Rule Both tell us something about where the data will be relative to the mean.
  • 70. Interpreting the Standard Deviation • Chebyshev’s Rule – Valid for any data set – For any number k > 1, at least (1 − 1/k²) × 100% of the observations will lie within k standard deviations of the mean. k | k² | 1/k² | (1 − 1/k²) × 100%: 2 | 4 | .25 | 75%; 3 | 9 | .11 | 89%; 4 | 16 | .0625 | 93.75%
  • 71. 71 The Bienayme-Chebyshev Rule • At least (≥) 75% of the observations must be contained within distances of 2 SD around the mean • At least (≥) 88.89% of the observations must be contained within distances of 3 SD around the mean • At least (≥) 93.75% of the observations must be contained within distances of 4 SD around the mean
  • 72. The Bienayme-Chebyshev Rule [Figure: distribution centred at µ with intervals marked at ±2 sd (≥ 75%), ±3 sd (≥ 88.89%) and ±4 sd (≥ 93.75%).]
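The Chebyshev percentages quoted above all come from the single expression (1 − 1/k²) × 100%. A small Python sketch (the function name `chebyshev_bound` is illustrative):

```python
def chebyshev_bound(k):
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's rule is only informative for k > 1")
    return (1 - 1 / k**2) * 100


for k in (2, 3, 4):
    print(f"k={k}: at least {chebyshev_bound(k):.2f}%")
```

These are lower bounds that hold for any data set; for mound-shaped data the actual percentages are usually much higher, which is what the Empirical Rule describes.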
  • 73. Interpreting the Standard Deviation • The Empirical Rule – Useful for mound-shaped, symmetrical distributions – If not perfectly mounded and symmetrical, the values are approximations • For a perfectly symmetrical and mound-shaped distribution, – ~68% will be within the range $(\bar{x} - s, \bar{x} + s)$ – ~95% will be within the range $(\bar{x} - 2s, \bar{x} + 2s)$ – ~99.7% will be within the range $(\bar{x} - 3s, \bar{x} + 3s)$
  • 74. 74 Interpreting the Standard Deviation • Hummingbirds beat their wings in flight an average of 55 times per second. • Assume the standard deviation is 10, and that the distribution is symmetrical and mounded. – Approximately what percentage of hummingbirds beat their wings between 45 and 65 times per second? – Between 55 and 65? – Less than 45?
  • 75. Interpreting the Standard Deviation • Between 45 and 65 beats per second: since 45 and 65 are exactly one standard deviation below and above the mean of 55, the empirical rule says that about 68% of the hummingbirds will be in this range.
  • 76. Interpreting the Standard Deviation • Between 55 and 65 beats per second: this range of numbers is from the mean to one standard deviation above it, or one-half of the range in the previous question. So, about one-half of 68%, or 34%, of the hummingbirds will be in this range.
  • 77. Interpreting the Standard Deviation • Less than 45 beats per second: half of the entire data set lies above the mean, and ~34% lies between 45 and 55 (between one standard deviation below the mean and the mean), so ~84% (~34% + 50%) is above 45, which means ~16% is below 45.
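The three hummingbird answers can be compared with an exactly normal model. The sketch below uses `statistics.NormalDist` from the Python standard library; treating wing beats as exactly normal with mean 55 and standard deviation 10 is an assumption that goes beyond the "symmetrical and mounded" wording of the slides, so the figures only confirm that the empirical-rule values are good approximations.

```python
from statistics import NormalDist

wing_beats = NormalDist(mu=55, sigma=10)  # assumed normal model

between_45_65 = wing_beats.cdf(65) - wing_beats.cdf(45)  # within one sd of the mean
between_55_65 = wing_beats.cdf(65) - wing_beats.cdf(55)  # mean to one sd above
below_45 = wing_beats.cdf(45)                            # more than one sd below

print(f"{between_45_65:.3f} {between_55_65:.3f} {below_45:.3f}")
```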
  • 78. Exercise. A manufacturer of automobile batteries claims that the average length of life of its grade A battery is 60 months. However, the guarantee on this brand is for just 36 months. Suppose the standard deviation of the life length is known to be 10 months and the frequency distribution of the life-length data is known to be mound shaped. • Approximately what percentage of the manufacturer’s grade A batteries will last more than 50 months, assuming that the manufacturer’s claim is true? • Approximately what percentage of the manufacturer’s batteries will last less than 40 months, assuming that the manufacturer’s claim is true? • Suppose your battery lasts 37 months. What could you infer about the manufacturer’s claim?
  • 79. Coefficient of Variation • Measure of Relative Variation • Shows Variation Relative to the Mean • Used to Compare Two or More Sets of Data Measured in Different Units: $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$, where S = Sample Standard Deviation and $\bar{X}$ = Sample Mean
  • 80. Comparing Coefficient of Variation • Stock A: – Average price last year = $50 – Standard deviation = $5 – CV_A = (S / X̄) × 100% = ($5 / $50) × 100% = 10% • Stock B: – Average price last year = $100 – Standard deviation = $5 – CV_B = (S / X̄) × 100% = ($5 / $100) × 100% = 5% • Both stocks have the same standard deviation, but stock B is less variable relative to its price
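The stock comparison can be reproduced with a one-line function. A minimal Python sketch; the function name `coefficient_of_variation` is an illustrative choice.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(std_dev=5, mean=50)   # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)  # Stock B
print(f"CV_A = {cv_a}%, CV_B = {cv_b}%")
```

Because the CV is unit free, the same function could compare, say, prices in dollars with weights in kilograms.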
  • 81. Numerical Measures of Relative Standing • The z-score tells us how many standard deviations above or below the mean a particular measurement is. • Sample z-score: $z = \frac{x - \bar{x}}{s}$ • Population z-score: $z = \frac{x - \mu}{\sigma}$
  • 82. Interpreting the Standard Deviation • Hummingbirds beat their wings in flight an average of 55 times per second. • Assume the standard deviation is 10, and that the distribution is symmetrical and mounded. An individual hummingbird is measured with 75 beats per second. What is this bird’s z-score? $z = \frac{x - \bar{x}}{s} = \frac{75 - 55}{10} = 2.0$
  • 83. Z Scores Example: • If the mean is 14.0 and the standard deviation is 3.0, what is the Z score for the value 18.5? $Z = \frac{X - \bar{X}}{S} = \frac{18.5 - 14.0}{3.0} = 1.5$ • The value 18.5 is 1.5 standard deviations above the mean • (A negative Z-score would mean that a value is less than the mean)
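Both z-score examples use the same formula, which can be sketched in Python as follows (the function name `z_score` is illustrative):

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


z_bird = z_score(75, mean=55, std_dev=10)        # hummingbird example
z_value = z_score(18.5, mean=14.0, std_dev=3.0)  # Z Scores example
z_below = z_score(45, mean=55, std_dev=10)       # a value below the mean

print(z_bird, z_value, z_below)
```

The third call shows the sign convention: a measurement one standard deviation below the mean has a z-score of −1.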
  • 84. 84 Interpreting the Standard Deviation • Since ~95% of all the measurements will be within 2 standard deviations of the mean, only ~5% will be more than 2 standard deviations from the mean. • About half of this 5% will be far below the mean, leaving only about 2.5% of the measurements at least 2 standard deviations above the mean.
  • 85. 85 Numerical Measures of Relative Standing • Z scores are related to the empirical rule: For a perfectly symmetrical and mound- shaped distribution, – ~68 % will have z-scores between -1 and 1 – ~95 % will have z-scores between -2 and 2 – ~99.7% will have z-scores between -3 and 3
  • 86. 86 Methods for Determining Outliers • An outlier is a measurement that is unusually large or small relative to the other values. • Three possible causes: – Observation, recording or data entry error – Item is from a different population – A rare, chance event
  • 87. The Box Plot (“Box-and-Whisker”) • The box plot is a graph representing information about certain percentiles for a data set and can be used to identify outliers • 5 number summary – Median, Q1, Q3, Xsmallest, Xlargest • Box Plot – Graphical display of data using 5-number summary [Figure: box plot on an axis from 4 to 12 marking Xsmallest, Q1, Median (Q2), Q3 and Xlargest.]
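A five-number summary can be sketched with the Python standard library. The example below reuses the homework times from the earlier slides, since the box plot slide gives no data of its own, and it assumes that `statistics.quantiles` with `method="exclusive"` matches the i(n+1)/4 position rule used in this deck (it does for this data set). The function name `five_number_summary` is illustrative.

```python
import statistics


def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) for a list of numbers."""
    ordered = sorted(values)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


minutes = [12, 4, 3, 8, 7, 5, 4, 9, 11]
summary = five_number_summary(minutes)
print(summary)
```

These five values are exactly what a box plot draws: the whisker ends, the box edges and the line inside the box.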
  • 88. Distribution Shape & Box Plot [Figure: three box plots, labelled Right-Skewed, Left-Skewed and Symmetric, each marking Q1, Q2 and Q3.]
  • 89. Methods for Determining Outliers • Outliers and z-scores – The chance that a z-score is between -3 and +3 is over 99%. – Any measurement with |z| > 3 is considered an outlier.
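The |z| > 3 rule can be sketched in Python as below. The data set is hypothetical and invented purely for illustration, and the function name `z_outliers` is not from the slides. Note that in very small samples no value can reach |z| > 3 (the largest possible sample z-score is (n − 1)/√n), so the example uses 20 observations.

```python
import statistics


def z_outliers(values, threshold=3.0):
    """Return the values whose sample z-score exceeds the threshold in absolute value."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [x for x in values if abs((x - mean) / s) > threshold]


# Hypothetical measurements: nineteen values near 50 and one extreme value.
readings = [48, 49, 50, 51, 52, 49, 50, 51, 50, 50,
            48, 52, 49, 51, 50, 50, 49, 51, 50, 120]
outliers = z_outliers(readings)
print(outliers)
```

A flagged value still has to be investigated: it may be a recording error, a unit from a different population, or a genuine rare event.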
  • 90. 90 Correlation Coefficient • Correlation Coefficient = r – Unit Free – Measures the strength of the linear relationship between 2 quantitative variables • Ranges between –1 and 1 – The Closer to –1, the stronger the negative linear relationship becomes – The Closer to 1, the stronger the positive linear relationship becomes – The Closer to 0, the weaker any linear relationship becomes
  • 91. Scatter Plots of Data with Various Correlation Coefficients • Scattergram (or scatterplot) shows the relationship between two quantitative variables [Figure: five scatter plots of Y against X with r = -1, r = -.6, r = 0, r = .6 and r = 1.]
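The slides describe r but do not give its formula. A minimal Python sketch using the standard Pearson definition, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²), is shown below; the function name `pearson_r` and the small data sets are hypothetical and chosen only to hit the extreme cases r = 1 and r = −1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
r_pos = pearson_r(x, [2, 4, 6, 8, 10])   # perfectly increasing line
r_neg = pearson_r(x, [10, 8, 6, 4, 2])   # perfectly decreasing line
r_mixed = pearson_r(x, [2, 1, 4, 3, 5])  # positive but imperfect relationship
print(r_pos, r_neg, r_mixed)
```

Because r is built from deviations divided by their overall spread, it has no units and always falls between −1 and 1.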
  • 92. Distorting the Truth with Deceptive Statistics • Distortions – Stretching the axis (and the truth) – Is average relevant? • Mean, median or mode? • What about the spread?

Editor's Notes

  • #21: The number of classes is usually between 5 and 15. Only 3 are used here for illustration purposes.
  • #25: The number of classes is usually between 5 and 15. Only 3 are used here for illustration purposes.
  • #31: Horizontal bars are used for categorical variables. Vertical bars are used for numerical variables. Still, some variation exists on this point in the literature. Also, there are many variations on the bar (e.g., stacked bar)