Descriptive Statistics
by Indramani Tripathi
Measures of Central Tendency
And Dispersion
Measures of Central Tendency
1. Mode = can be used for any kind of data, but it is the only
measure of central tendency available for nominal (qualitative) data.
- Formula: the value that occurs most often, or the
category or interval with the highest frequency.
- Note: Omit Formula 3.1 (Variation Ratio) in
Healey and Prus 2nd Cdn. Edition.
Example for Nominal Variables:

Religion     Frequency   cf   Proportion    %    Cum%
Catholic        17       17      .41        41    41
Protestant       4       21      .10        10    51
Jewish           2       23      .05         5    56
Muslim           1       24      .02         2    58
Other            9       33      .22        22    80
None             8       41      .20        20   100

Total           41              1.00       100%

Central Tendency: MODE = largest category = Catholic
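As a minimal sketch of the table above, the mode of a nominal variable can be read off a frequency count. The variable names (`religion_counts`, `mode_category`, `percentages`) are illustrative and not part of the slides.

```python
from collections import Counter

# Frequencies taken from the religion table above.
religion_counts = Counter({
    "Catholic": 17, "Protestant": 4, "Jewish": 2,
    "Muslim": 1, "Other": 9, "None": 8,
})

total = sum(religion_counts.values())  # N = 41

# most_common(1) returns the single category with the highest frequency.
mode_category, mode_freq = religion_counts.most_common(1)[0]

# Percentage of cases in each category, rounded to whole percents.
percentages = {cat: round(100 * f / total) for cat, f in religion_counts.items()}

print(mode_category, mode_freq)  # Catholic 17
print(percentages["Other"])      # 22
```

The mode is a category label, not a number, which is why it remains usable at the nominal level.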
Central Tendency (cont.)
2. Median = exact centre or middle of the
ordered data. The 50th percentile.
- Formula:
- Array the data (put the values in order).
- When the sample size is even, the median falls
halfway between the two middle numbers.
- To calculate: find the values in positions (n/2) and (n/2)+1,
add them, and divide the total by 2 to find the exact median.
- When the sample size is odd, the median is the exact
middle value, in position (n+1)/2.
Example for Raw Data:
- Suppose you have the following set of test
scores:
- 66, 89, 41, 98, 76, 77, 68, 60, 60, 67, 69, 66,
98, 52, 74, 66, 89, 95, 66, 69
- 1. Array (put in order) your data:
- 98 98 95 89 89 77 76 74 69 69
68 67 66 66 66 66 60 60 52 41
N = 20 (N is even)
To calculate:
- find the positions of the two middle numbers: (n/2) and (n/2)+1
- add together the two middle numbers
- divide the total by 2
- First middle number: (20/2) = the 10th number
- 2nd middle number: (20/2)+1 = the 11th number
- Look at the data:
the middle numbers are 69 and 68
- The median would be (69+68)/2 = 68.5
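The even and odd rules above can be sketched as a small function. The name `median` and its structure are illustrative only. The function sorts in ascending order, which gives the same two middle values as the descending array on the slide.

```python
scores = [66, 89, 41, 98, 76, 77, 68, 60, 60, 67,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]

def median(values):
    """Return the median using the positional rules from the slides."""
    ordered = sorted(values)          # array the data
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: the single middle value sits in position (n+1)/2.
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the values in positions n/2 and (n/2)+1.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2

print(median(scores))  # 68.5
```

Positions on the slides are counted from 1, while Python lists are indexed from 0, hence the `- 1` adjustments.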
Median for Aggregate (grouped) Data
- This formula is shown in Healey 1st Cdn. Edition but NOT in 2/3 Cdn.
- We will NOT COVER this one!
Properties of median:
- for numerical data at the interval or ordinal level
- "balance point"
- not affected by outliers
- the median is appropriate when the distribution is
highly skewed.
3. Mean for Raw Data
- The mean is the sum of the measurements /
number of subjects
- Formula: X-bar = ΣXi / N
- Data (from above):
66, 89, 41, 98, 76, 77, 68, 60, 60, 67, 69, 66,
98, 52, 74, 66, 89, 95, 66, 69
Example for Mean
- Formula: X-bar = ΣXi / N
= 1446 / 20
= 72.3
The mean for these test scores is 72.30
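The arithmetic above can be checked with a few lines of Python. This is only a sketch of the formula X-bar = ΣXi / N, and the variable names are illustrative.

```python
scores = [66, 89, 41, 98, 76, 77, 68, 60, 60, 67,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]

total = sum(scores)   # ΣXi
n = len(scores)       # N
mean = total / n      # X-bar

print(total, n, mean)  # 1446 20 72.3
```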
Mean for Aggregate (Grouped) Data
(Note: not in text but covered in class)
- To calculate the mean for grouped data, you
need a frequency table that includes a
column for the midpoints (m) and a column for the product of
the frequencies times the midpoints (fm).
Formula: X-bar = Σ(fm) / N
Frequency table:
Score      f     m*      fm
41-50      1    45.5     45.5
51-60      3    55.5    166.5
61-70      8    65.5    524
71-80      3    75.5    226.5
81-90      2    85.5    171
91-100     3    95.5    286.5
N = 20          Σ(fm) = 1420
* Find midpoints first
Calculating Mean for Grouped Data:
Formula: X-bar = Σ(fm) / N
= 1420 / 20
= 71
The mean for the grouped data is 71.
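The grouped calculation can be sketched as follows, with each midpoint taken as the average of the stated interval limits (as in the table). The tuple layout and names are assumptions made for illustration.

```python
# (lower limit, upper limit, frequency) for each score interval.
intervals = [
    (41, 50, 1), (51, 60, 3), (61, 70, 8),
    (71, 80, 3), (81, 90, 2), (91, 100, 3),
]

n = sum(f for _, _, f in intervals)                          # N
sum_fm = sum(f * (lo + hi) / 2 for lo, hi, f in intervals)   # Σ(fm)
grouped_mean = sum_fm / n

print(n, sum_fm, grouped_mean)  # 20 1420.0 71.0
```

The grouped mean (71) differs from the raw-data mean (72.3) because every score in an interval is treated as if it equalled the midpoint.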
Properties of the Mean:
- only for numerical data at the interval (or ratio) level
- "balance point"
- can be affected by outliers (skewed distribution)
- the tail becomes elongated and the mean is pulled in the
direction of the outlier.
Example:
no outlier:
$30000, 30000, 35000, 25000, 30000 then mean = $30000
but if an outlier is present:
$130000, 30000, 35000, 25000, 30000 then mean = $50000
(the mean is pulled up or down in the direction of the outlier)
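A short sketch comparing the mean and the median on the two income lists above, using Python's standard `statistics` module. The list names are illustrative.

```python
from statistics import mean, median

incomes = [30000, 30000, 35000, 25000, 30000]
incomes_with_outlier = [130000, 30000, 35000, 25000, 30000]

mean_plain = mean(incomes)
mean_outlier = mean(incomes_with_outlier)
median_plain = median(incomes)
median_outlier = median(incomes_with_outlier)

# The mean moves toward the outlier; the median does not move at all.
print(mean_plain, mean_outlier)
print(median_plain, median_outlier)
```

The mean rises from $30000 to $50000, while the median stays at $30000 in both lists, which illustrates why the median is preferred for highly skewed data.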
NOTE:
- When the distribution is symmetric (and has a single peak),
mean = median = mode
- For a skewed distribution, the mean will lie in the direction of the skew.
- i.e. skewed to the right (tail pulled to right):
mean > median (positive skew)
- skewed to the left (tail pulled to left):
median > mean (negative skew)
Measures of Dispersion
- Describe how variable the data are.
- i.e. how spread out around the mean
- Also called measures of variation or
variability
Variability for Non-numerical Data
(Nominal or Ordinal Level Data)
- Measures of variability for non-numerical
(nominal or ordinal) data are rarely used
- We will not be covering these in class
- Omit Formula 4.1 IQV in Healey and Prus
1st Canadian Edition
- Omit Formula 3.1 Variation Ratio in Healey
and Prus 2/3 Canadian Edition
2. Range (for numerical data)
Range = difference between the largest and
smallest observations
e.g. if the data are $130000, 35000, 30000, 30000,
30000, 30000, 25000, 25000
then range = 130000 - 25000 = $105000
Interquartile Range (Q):
- This is the difference between the 75th and the 25th
percentiles (the middle 50%)
- Gives a better idea than the range of what the middle of
the distribution looks like.
Formula: Q = Q3 - Q1 (where Q3 is the case in position N x .75
and Q1 is the case in position N x .25, counting up from the lowest value)
Using above data (N = 8): Q = Q3 - Q1 = (6th case - 2nd case)
= $30000 - $25000 = $5000
The interquartile range (Q) is $5000.
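The range and the interquartile range for the income data can be sketched as below. This follows the positional rule on the slide (cases N x .25 and N x .75, counted from the lowest value) and assumes N is a multiple of 4, so that both positions are whole numbers. The slides do not say how to handle fractional positions, and statistical software may use other quartile conventions.

```python
incomes = [130000, 35000, 30000, 30000, 30000, 30000, 25000, 25000]

ordered = sorted(incomes)   # lowest to highest
n = len(ordered)

data_range = ordered[-1] - ordered[0]   # largest minus smallest

# Case positions as on the slide (1-based); valid here because n = 8.
q1_position = int(n * 0.25)   # 2nd case
q3_position = int(n * 0.75)   # 6th case

q1 = ordered[q1_position - 1]
q3 = ordered[q3_position - 1]
iqr = q3 - q1

print(data_range)   # 105000
print(q1, q3, iqr)  # 25000 30000 5000
```

Unlike the range, Q ignores the $130000 outlier entirely.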
3. Variance and Standard Deviation:
- For raw data at the interval/ratio level.
- Most common measure of variation.
- The numerator in the formula is known as
the sum of squares, and the denominator is
either the population size N or the sample
size minus one (n-1)
- The variance is denoted by S² and the
standard deviation, which is the square root
of the variance, by S
Definitional Formula for Variance and
Standard Deviation:
- Variance: $s^2 = \frac{\sum (X_i - \bar{X})^2}{N}$
- Standard Deviation: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{N}}$
- (the standard deviation is the square root of
the variance; the variance is simply the
standard deviation squared)
Example for S and S²:
- Data: 66, 89, 41, 98, 76, 77, 68, 60, 60, 67,
69, 66, 98, 52, 74, 66, 89, 95, 66, 69
1. Find ΣXi²: square each Xi and find the total.
2. Find (ΣXi)²: find the total of all Xi and square it.
3. Substitute the above and N into the formula for S.
4. For S², simply square S (before rounding).
S = 14.75   S² = 217.71
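The four steps above can be sketched in Python. The substitution in step 3 assumes the common computational form S² = (ΣXi² - (ΣXi)²/N) / N, which is algebraically equivalent to the definitional formula with N in the denominator. The slide text does not show the exact expression, so treat that form as an assumption.

```python
import math

scores = [66, 89, 41, 98, 76, 77, 68, 60, 60, 67,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]
n = len(scores)

sum_x_squared = sum(x * x for x in scores)   # step 1: ΣXi²
squared_sum_x = sum(scores) ** 2             # step 2: (ΣXi)²

# Step 3 (assumed computational form, N in the denominator).
variance = (sum_x_squared - squared_sum_x / n) / n
s = math.sqrt(variance)

print(sum_x_squared, squared_sum_x)      # 108900 2090916
print(round(s, 2), round(variance, 2))   # 14.75 217.71
```

Squaring the rounded value 14.75 gives about 217.56 rather than 217.71, so the variance should be taken from the unrounded S.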
A working formula for the standard
deviation:
Note: the definitional formula for standard deviation is
not practical for use with data when N>10.
The working formula, which is much easier to do on
your calculator, should be used instead.
Both formulae give exactly the same result. Try it!
$S = \sqrt{\frac{\sum X_i^2}{N} - \bar{X}^2}$
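To "try it", the sketch below computes the standard deviation of the test scores both ways, using N in the denominator in each case. The names `definitional` and `working` are illustrative. In floating-point arithmetic the two results agree to within rounding error rather than bit for bit, so they are compared with `math.isclose`.

```python
import math

scores = [66, 89, 41, 98, 76, 77, 68, 60, 60, 67,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]
n = len(scores)
mean = sum(scores) / n

# Definitional formula: square root of the mean squared deviation.
definitional = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)

# Working formula: square root of (mean of the squares minus squared mean).
working = math.sqrt(sum(x * x for x in scores) / n - mean ** 2)

print(round(definitional, 2), round(working, 2))  # 14.75 14.75
print(math.isclose(definitional, working))        # True
```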
Properties of S:
- always greater than or equal to 0
- the greater the variation about the mean,
the greater S is
- n-1 corrects for bias when using sample data. S
tends to underestimate the real population standard
deviation when based on sample data, so to correct
for this we use n-1. The larger the sample size, the
smaller the difference this correction makes. When
calculating the standard deviation for the whole
population, use N in the denominator.
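The effect of the denominator can be seen with Python's standard `statistics` module, where `pstdev` divides by N and `stdev` divides by n-1. Treating the 20 test scores as a sample here is only for illustration.

```python
from statistics import pstdev, stdev

scores = [66, 89, 41, 98, 76, 77, 68, 60, 60, 67,
          69, 66, 98, 52, 74, 66, 89, 95, 66, 69]

population_sd = pstdev(scores)   # N in the denominator
sample_sd = stdev(scores)        # n-1 in the denominator

print(round(population_sd, 2))   # 14.75
print(round(sample_sd, 2))       # 15.14
```

The n-1 version is always slightly larger, and the gap shrinks as the number of cases grows.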
NOTE:
- σ, N and mu (µ) denote population
parameters
- s, n and X-bar denote sample statistics
Remember the Rounding Rules!
- Always use as many decimal places as your
calculator can handle.
- Round your final answer to 2 decimal places,
rounding to the nearest number.
- Engineer's Rule: When the last digit is exactly 5
(followed only by 0's), round the digit before the
last digit to the nearest EVEN number.
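The Engineer's Rule corresponds to what is usually called round-half-to-even. A minimal sketch using Python's `decimal` module follows. The helper name `round_half_even` is hypothetical, and values are passed as strings so that binary floating-point representation does not disturb the trailing 5.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_half_even(value, places="0.01"):
    """Round a decimal string to 2 places, sending exact halves to the even digit."""
    return Decimal(value).quantize(Decimal(places), rounding=ROUND_HALF_EVEN)

print(round_half_even("2.345"))  # 2.34 (4 is already even)
print(round_half_even("2.355"))  # 2.36 (5 is odd, so round up to 6)
print(round_half_even("72.3"))   # 72.30
```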
Homework Questions
- Healey and Prus 1e:
- #3.1, #3.5, #3.11, #4.9, #4.15
- Healey and Prus 2/3e:
- #3.1, #3.5, #3.11 (compute s for 8 nations also), #3.15
- SPSS:
- Read the SPSS sections for Ch. 3 and 4 in 1st Cdn. Edition
and for Ch. 4 in 2/3 Cdn. Edition
- Try some of the SPSS exercises for practice