SUMMARIZING DATA
Dr Lipilekha Patnaik
Professor, Community Medicine
Institute of Medical Sciences & SUM Hospital
Siksha ‘O’ Anusandhan deemed to be University
Bhubaneswar, Odisha, India
Email: drlipilekha@yahoo.co.in
Session Objectives
• Measures of central tendency – mean, median, mode
• Measures of dispersion – range, standard deviation, standard error
Descriptive Measures for continuous data
•Central tendency measures – They are
computed to give a “center” around
which the measurements in the data
are distributed.
•Variation or variability measures –
They describe data spread or how far
away the measurements are from the
center.
Statistics related to continuous variables
• Mean
• Median
• Mode
• Range
• Standard Deviation
• Standard Error
Measures of Central Tendency
Central tendency measures
•Mean – The average value
Affected by extreme values
•Median – The middle value
Not affected by extremes
•Mode – Most frequently occurring
observation, there may be
more than one mode.
Mean
• Average
• Arithmetic mean (x̄) = sum of individual values / number of observations:
$$\bar{x} = \frac{\sum x}{n}$$
Exercise
• The diastolic blood pressure of 10 individuals was
83, 75, 81, 79, 71, 95, 75, 77, 84, 90.
• Arithmetic mean = (83+75+81+79+71+95+75+77+84+90) / 10 = 810 / 10 = 81
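The arithmetic in this exercise can be checked with a few lines of Python. This is only an illustrative sketch; the variable names are chosen here and are not part of the original slides.

```python
from statistics import mean

# Diastolic blood pressure readings from the exercise
diastolic_bp = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]

total = sum(diastolic_bp)        # sum of individual values
n = len(diastolic_bp)            # number of observations
arithmetic_mean = total / n      # mean = sum / n

print(total, n, arithmetic_mean)  # 810 10 81.0

# The standard library gives the same answer
assert arithmetic_mean == mean(diastolic_bp)
```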
Median
§ The data are first arranged in an ascending or
descending order of magnitude
§ Middle observation is located, which is called
median.
§If the number of values is odd,
Median = middle value
§If the number of values is even,
Median = average of the two middle values
Median divides the data into two equal parts
with 50% of the observations above the median
and 50% below it.
[Figure: the same observations shown unsorted and sorted in ascending order]
Exercise
• Exercise 1: odd number (11) of observations
• 11, 13, 15, 12, 10, 9, 12, 8, 12, 11, 10
• Arranged in ascending order
• 8, 9, 10, 10, 11, 11, 12, 12, 12, 13, 15
• Median = 6th (middle) value = 11
• Exercise 2: even number (12) of observations
• 11, 13, 15, 12, 10, 9, 12, 8, 12, 11, 10, 12
• Arranged in ascending order
• 8, 9, 10, 10, 11, 11, 12, 12, 12, 12, 13, 15
• Median = (11 + 12) / 2 = 11.5
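The odd/even rule for the median can be sketched in Python as follows. The helper name `median_by_hand` is hypothetical. The odd-sized list is taken from the sorted values of Exercise 1, and the even-sized list from Exercise 2.

```python
from statistics import median

def median_by_hand(values):
    """Sort the data, then take the middle value (odd n)
    or the average of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [8, 9, 10, 10, 11, 11, 12, 12, 12, 13, 15]        # 11 observations
even_data = [11, 13, 15, 12, 10, 9, 12, 8, 12, 11, 10, 12]   # 12 observations

print(median_by_hand(odd_data))   # 11
print(median_by_hand(even_data))  # 11.5

# Cross-check against the standard library
assert median_by_hand(odd_data) == median(odd_data)
assert median_by_hand(even_data) == median(even_data)
```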
Mode
•Most frequent observation.
•The value that appears most frequently in the data set.
Exercise
11, 13, 15, 12, 10, 9, 12, 8, 12, 11, 10
Mode = 12
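A short Python sketch of finding the mode, using `statistics.multimode` so that data with more than one mode are handled as well. The second list is a made-up example, added here only to show the multi-mode case.

```python
from statistics import multimode

# Data from the exercise
data = [11, 13, 15, 12, 10, 9, 12, 8, 12, 11, 10]
modes = multimode(data)
print(modes)  # [12]

# Hypothetical data with two modes (not from the slides)
two_modes = multimode([1, 1, 2, 3, 3])
print(two_modes)  # [1, 3]
```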
Number of seizures/month:
3, 3, 1, 2, 4, 7, 9
• Mean? 4.1
• Median? 3
• Mode? 3
[Figure: bar chart of the number of seizures for each of the 7 observations]
What’s wrong with a mean?
• Mean is sensitive to outliers (values far from the middle of the distribution)
– Provides a falsely high or low measure of central tendency when outliers exist.
– In such cases (look at your data), use the median as the preferred measure of central tendency.
Number of seizures/month: 100, 2, 3, 3, 4, 7, 1
• Mean? 17.14
• Median? 3
• Mode? 3
[Figure: bar chart of the 7 observations; the value 100 is marked as an outlier]
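The effect of a single outlier can be seen by comparing the two seizure data sets side by side. This is a minimal sketch; the labels and variable names are chosen here for illustration.

```python
from statistics import mean, median

without_outlier = [3, 3, 1, 2, 4, 7, 9]
with_outlier = [100, 2, 3, 3, 4, 7, 1]

summary = {}
for label, values in [("without outlier", without_outlier),
                      ("with outlier", with_outlier)]:
    summary[label] = (round(mean(values), 2), median(values))
    print(f"{label}: mean={mean(values):.2f}, median={median(values)}")

# without outlier: mean=4.14, median=3
# with outlier: mean=17.14, median=3
```

The mean moves from about 4 to about 17 because of one value, while the median stays at 3.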
Measures of Dispersion
Measures of dispersion
• “Dispersion” is also called variability, scatter, or spread.
•Measure how spread out a set of data is.
•Dispersion is the scatteredness of the data series
around its average.
Measures of dispersion
•Range
•Standard deviation
•Variance
•Interquartile range
Range
• The difference between the values of the two extreme items of a series,
• i.e., the difference between the maximum and minimum values in a set of observations.
Exercise
• For example, from the following record of diastolic blood pressure of 10 individuals:
93, 75, 81, 79, 71, 90, 75, 95, 77, 94.
• Highest value = 95
• Lowest value = 71
• The range is expressed as 95 – 71 = 24, or as 71 to 95.
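The range calculation can be sketched in Python. The fifth reading is taken as 71 here, an assumption made so that the data agree with the stated lowest value.

```python
# Diastolic blood pressure of 10 individuals
# (fifth value assumed to be 71, matching "Lowest value = 71")
diastolic_bp = [93, 75, 81, 79, 71, 90, 75, 95, 77, 94]

highest = max(diastolic_bp)
lowest = min(diastolic_bp)
data_range = highest - lowest

print(f"Range = {highest} - {lowest} = {data_range} ({lowest} to {highest})")
# Range = 95 - 71 = 24 (71 to 95)
```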
Characteristics of Range
• Simplest and most crude measure of dispersion.
• Affected by the extreme values.
• Gives an idea of the variability very quickly.
Standard deviation
• Tells us how far individual values deviate from the mean in the sample.
• Provides an index of variability.
Characteristics of Standard Deviation
• Very satisfactory and most widely used measure of
dispersion.
• If the SD is small, there is a high probability of getting a value close to the mean.
• If it is large, values are more likely to lie farther away from the mean.
• It is less affected by fluctuations of sampling.
How to determine a SD
1. Calculate the mean.
2. Calculate the difference between each value and the mean.
3. Square each of the differences and sum them.
4. Divide the sum by one less than the number of observations (n – 1) if n < 30, or by the number of observations (n) if n ≥ 30; for large n the two results are nearly identical.
5. Take the square root of the result.
Standard deviation
$$SD = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Standard deviation
• The diastolic blood pressure of 10 individuals was as follows: 83, 75, 81, 79, 71, 95, 75, 77, 84, 90.

| x | x – x̄ | (x – x̄)² |
|---|---|---|
| 83 | 2 | 4 |
| 75 | -6 | 36 |
| 81 | 0 | 0 |
| 79 | -2 | 4 |
| 71 | -10 | 100 |
| 95 | 14 | 196 |
| 75 | -6 | 36 |
| 77 | -4 | 16 |
| 84 | 3 | 9 |
| 90 | 9 | 81 |
| Ʃx = 810 | | Ʃ(x – x̄)² = 482 |

n = 10
Mean = 810 / 10 = 81
SD = √(482 / (10 – 1)) = √53.56 ≈ 7.32
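The worked table can be reproduced step by step in Python. This is a sketch that assumes the n – 1 divisor, since n = 10 is a small sample; the variable names are illustrative.

```python
from math import isclose, sqrt
from statistics import stdev

diastolic_bp = [83, 75, 81, 79, 71, 95, 75, 77, 84, 90]

n = len(diastolic_bp)
mean_bp = sum(diastolic_bp) / n                    # step 1: the mean
deviations = [x - mean_bp for x in diastolic_bp]   # step 2: x - mean
sum_sq = sum(d ** 2 for d in deviations)           # step 3: square and sum
variance = sum_sq / (n - 1)                        # step 4: divide by n - 1
sd = sqrt(variance)                                # step 5: square root

print(sum_sq)                             # 482.0
print(round(variance, 2), round(sd, 2))   # 53.56 7.32

# statistics.stdev also uses the n - 1 divisor
assert isclose(sd, stdev(diastolic_bp))
```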
Uses of the standard deviation
•The standard deviation enables us to determine,
with a great deal of accuracy, where the values
of a frequency distribution are located in relation
to the mean.
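Standard Deviation (SD) – for ‘Normal distribution’
Mean birth weight = 3.5 kg
Std dev. = 1.0 kg
Mean ± 1 SD: 3.5 ± 1 kg = 2.5 – 4.5 kg, covering about 68% of values
Mean ± 2 SD: 3.5 ± 2 kg = 1.5 – 5.5 kg, covering about 95% of values
[Figure: normal curve of birth weight (kg) against number of babies [N], marked at 1.5, 2.5, 3.5, 4.5 and 5.5 kg]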
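The 68% and 95% figures can be checked against an exact normal distribution using Python's `statistics.NormalDist`. This sketch assumes birth weight is exactly normal with mean 3.5 kg and SD 1.0 kg; the helper `share_within` is a hypothetical name.

```python
from statistics import NormalDist

birth_weight = NormalDist(mu=3.5, sigma=1.0)  # kg

def share_within(k):
    """Proportion of values within k standard deviations of the mean."""
    low = birth_weight.mean - k * birth_weight.stdev
    high = birth_weight.mean + k * birth_weight.stdev
    return birth_weight.cdf(high) - birth_weight.cdf(low)

print(f"2.5 - 4.5 kg: {share_within(1):.1%}")  # 2.5 - 4.5 kg: 68.3%
print(f"1.5 - 5.5 kg: {share_within(2):.1%}")  # 1.5 - 5.5 kg: 95.4%
```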
Variance
• Variance = (SD)²:
$$\text{Variance} = \frac{\sum (x - \bar{x})^2}{n - 1}$$
• Indicates the degree of variability among the observations for a given variable.
Percentiles
• The p-th percentile is a number such that at most p% of the measurements are below it and at most (100 – p)% of the measurements are above it.
• Example: if the 85th percentile of a data set is 520, then 85% of the measurements are below 520 and 15% are above it.
Percentiles – for non-normally distributed data
A percentile gives the % of data that fall below a specific value.
The 50th percentile is the MEDIAN.
The 25th to the 75th percentile is the INTERQUARTILE RANGE (IQR).
[Figure: skewed distribution of diastolic BP (50 to 120) against [N], divided into four 25% segments by the 25th, 50th and 75th percentiles]
INTERQUARTILE RANGE
[Figure: data divided into four 25% segments by the quartiles Q1, Q2 and Q3]
The “interquartile range” is from Q1 to Q3.
Interquartile range = Q3 – Q1
To calculate it, just subtract quartile 1 from quartile 3.
Example: 5, 8, 4, 4, 6, 3, 8.
• First put the list of numbers in order: 3, 4, 4, 5, 6, 8, 8
• Then cut the list into 4 equal parts.
• The quartiles are the cuts.
Lower quartile (Q1) = 4
Middle quartile (Q2) = median = 5
Upper quartile (Q3) = 8
Interquartile range is Q3 – Q1 = 8 – 4 = 4
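One way to reproduce these quartiles in Python is to take the median of the lower and upper halves of the sorted data, leaving out the overall median when n is odd. This is only one of several quartile conventions, so other software may give slightly different values on other data sets. The function name `quartiles` is hypothetical.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Skip the middle value when n is odd
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

data = [5, 8, 4, 4, 6, 3, 8]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 4 5 8 4
```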
Standard Error
• If we take a random sample (n) from the population, and similar samples over and over again, we will find that every sample will have a different mean (x̄).
• If we make a frequency distribution of all the sample means drawn from the same population, we will find that the distribution of the means is nearly a normal distribution and the mean of the sample means is practically the same as the population mean (μ).
• This is a very important observation: the sample means are distributed normally about the population mean (μ).
• The standard deviation of the sample means is a measure of the sampling error and is given by the formula σ/√n, which is called the standard error or the standard error of the mean.
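The idea of repeated sampling can be illustrated with a small simulation. The population mean (150), SD (30), sample size (25), number of samples (2000) and the random seed are all assumptions chosen for this sketch, not values prescribed by the slides.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(0)           # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 150, 30       # assumed normal population
SAMPLE_SIZE, N_SAMPLES = 25, 2000

# Draw many samples and record the mean of each one
sample_means = []
for _ in range(N_SAMPLES):
    sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    sample_means.append(sum(sample) / SAMPLE_SIZE)

mean_of_means = mean(sample_means)             # should be close to POP_MEAN
observed_se = stdev(sample_means)              # SD of the sample means
theoretical_se = POP_SD / sqrt(SAMPLE_SIZE)    # sigma / sqrt(n) = 6.0

print(f"mean of sample means: {mean_of_means:.2f}")
print(f"SD of sample means:   {observed_se:.2f}")
print(f"sigma / sqrt(n):      {theoretical_se:.2f}")
```

The observed SD of the sample means should land close to σ/√n, which is what the standard error formula predicts.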
95% confidence interval
•Approximately 2 standard errors above and below
the estimate
•The range within which 95% of estimates from
multiple samples would be expected to lie
•Regarded as the range within which the “true
population” value probably lies (with 95% certainty)
95% confidence interval of the mean
The SEM is used to describe a 95% confidence interval for an observed mean (95% CI = mean ± 2 SEM).
This confidence interval narrows with larger sample size, since
$$SE = \frac{SD}{\sqrt{n}}$$
95% CI of the mean
Mean = 150
S.D. = 30
If based on 4 values,
95% CI is mean ± 2 SE
150 ± 2 × 30/√4
150 ± 2 × 15
= 120 – 180
If based on 100 values,
95% CI is mean ± 2 SE
150 ± 2 × 30/√100
150 ± 2 × 3
= 144 – 156
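The two intervals can be reproduced with a small helper. The function name `ci95` is hypothetical, and the sketch uses the slide's approximation of 2 standard errors rather than the more exact 1.96.

```python
from math import sqrt

def ci95(mean, sd, n):
    """Approximate 95% CI of the mean: mean +/- 2 * SD / sqrt(n)."""
    se = sd / sqrt(n)
    return mean - 2 * se, mean + 2 * se

ci_small = ci95(150, 30, 4)
ci_large = ci95(150, 30, 100)

print(ci_small)  # (120.0, 180.0)
print(ci_large)  # (144.0, 156.0)
```

With the same mean and SD, going from 4 to 100 observations narrows the interval from 60 units wide to 12.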
Interpreting Estimates with Confidence
Intervals
• With repeated samples of the given size, about 95% of the confidence intervals calculated in this way are expected to contain the true population mean.
Categorical data
• For categorical data
Compare groups
Use proportions
Example
• In a prevalence study of hypertension, we found that:

| | Hypertension | No hypertension |
|---|---|---|
| Non-smokers | 10 (10%) | 90 |
| Smokers | 26 (26%) | 74 |

• It is visible from the table that the proportion of HTN was higher among smokers. The question that arises is whether HTN was really higher among smokers or the difference was merely due to chance.
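The proportions in the table can be computed from the raw counts. This sketch only reproduces the percentages; it does not test whether the difference is due to chance. The dictionary layout is an illustrative choice.

```python
# Counts from the table: (hypertension, no hypertension)
counts = {
    "Non-smokers": (10, 90),
    "Smokers": (26, 74),
}

prevalence = {}
for group, (htn, no_htn) in counts.items():
    total = htn + no_htn
    prevalence[group] = htn / total
    print(f"{group}: {htn}/{total} = {prevalence[group]:.0%}")

# Non-smokers: 10/100 = 10%
# Smokers: 26/100 = 26%
```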
Take-home messages:
§Look at your data
§For continuous data, summarize with mean (for
central tendency) and SD (for dispersion) only
for normal bell – shaped distributions
(otherwise, use median and percentiles)
§Interpret mean with confidence interval while
inferring to population
§For categorical data, use proportions.
