Murtaza Kamal
Dec 22, 2018
murtaza.vmmc@gmail.com
1

 Definition of biostatistics
 Data and its types: Qualitative/ Quantitative
 Variable and its types
 Mean/ median/ mode
 Normal curve
 Sn/ Sp/ PV
 Sample and its types and calculation of sample size
Points to be covered…
2
Science dealing with methods of data
collection, compilation, tabulation and analysis
to provide meaningful and valid interpretation
3
Bio-statistics
Application of statistical methods in field of
biology, public health and medicine
4
Remember this…
• The quality of clinical and health planning decisions depends on the quality of the information on which they are based
• Medicine: A science in which chance plays a very significant role
• Statistics helps to quantify the contribution of chance and helps the individual clinician make valid diagnostic, prognostic or therapeutic decisions
• It helps programme managers and policy planners to plan, monitor and evaluate public health initiatives
5
Data
Datum: Latin for "something given"; a single recorded fact or observation
Data: A collection of facts or observations, which yields information once processed
Sources of Data:
Primary data: Collected and recorded by the investigator(s) themselves through observation, interviews or measuring instruments, usually systematically and for defined purposes
Secondary data: Collected by somebody else or for other purposes, e.g. information derived from hospital data
6
Variable
• The choice of statistical test depends on the kind of variable studied
• An attribute, quality, characteristic or property of the persons or things being studied that can be quantitatively measured or enumerated
• Varies from person to person, or from time to time in the same person
• E.g. height, weight, age, gender, blood pressure, pulse rate, smoking status
7
 A variable that is manipulated or applied by
investigator or explains outcome
 Eg: Maternal age, age at marriage, spacing between
successive pregnancies, pre-pregnancy weight,
weight gain during pregnancy
Independent (stimulus/ explanatory)
variable
8
Dependent (outcome/ response) variable
• The resulting response or behaviour observed on exposure to the independent variable, e.g.

Independent      Dependent
Maternal age     Birth weight
Birth spacing    Birth weight
PIH              Perinatal mortality
9
10
Qualitative data
Nominal: Categories are mutually exclusive and unordered
E.g. Blood group: A, B, AB, O
Marital status: Unmarried, Married, Divorced, Widowed
Ordinal: Categories are mutually exclusive and ordered
E.g. Disease severity: Mild, Moderate, Severe
11
Quantitative data
Discrete: Often represents counts
e.g. Number of children
Number of times admitted to hospital in the
last 5 years
Continuous: Can take any value within a range of
values
e.g. Height in cm
Weight in kg
Distance from home to work in km
12
Importance of data type
Type of data: Critically important in determining which
methods of analysis will be appropriate and valid
13
 Descriptive statistics
 Describes basic features of data in a study
 Provide summaries about sample
 Inferential statistics
 Investigate questions, models, and hypotheses
 Infer population characteristics based on
sample
 Make judgments about what we observe
Types of Statistics
14
Descriptive Statistics
Univariate analysis (one variable at a time)
•Qualitative Data:
• Proportions or percentages
•Quantitative Data:
• Central tendencies
(Mean, Median, Mode)
• Measures of dispersion
(Range, Standard deviation, Coefficient of
variation, Percentiles, Interquartile range)
15
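The summary measures listed above can be sketched with Python's standard `statistics` module. The blood pressure readings are hypothetical values invented for illustration, and the quartiles use the module's default ("exclusive") method, which is only one of several conventions.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg), for illustration only.
readings = [110, 120, 120, 125, 130, 135, 150]

# Central tendency
mean = statistics.mean(readings)
median = statistics.median(readings)
mode = statistics.mode(readings)

# Dispersion
value_range = max(readings) - min(readings)
sd = statistics.stdev(readings)        # sample SD (n - 1 in the denominator)
cv_percent = 100 * sd / mean           # coefficient of variation, in percent
q1, q2, q3 = statistics.quantiles(readings, n=4)  # quartiles (default method)
iqr = q3 - q1                          # interquartile range

print(mean, median, mode, value_range, sd, cv_percent, iqr)
```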
Two approaches
1. Estimating Parameters: Process of using sample
information to draw conclusions about the value of a
population parameter
e.g., Proportion, mean, SD, Correlation
2. Testing Hypothesis
Inferential Statistics
16
Estimating Parameters
1. Point estimates: Proportion, mean, SD, correlation
2. Interval estimates:
• Confidence intervals: Define a lower limit and an upper limit with an associated probability
• The ends of a confidence interval are called confidence limits
• 95% confidence interval: built so that, over repeated sampling, 95% of such intervals contain the population mean
• 99% confidence interval: built so that, over repeated sampling, 99% of such intervals contain the population mean
• A wider range of values must be included for greater confidence
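As a rough sketch of interval estimation, the function below computes a confidence interval for a mean using the normal (z) multiplier, which assumes a reasonably large sample; for small samples a t multiplier would normally be used instead. The function name and the example figures (mean 120, SD 15, n = 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sd, n, confidence=0.95):
    """Return (lower limit, upper limit) of a z-based interval for the mean."""
    # Two-sided multiplier, e.g. about 1.96 for 95% and about 2.58 for 99%.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sd / sqrt(n)  # multiplier times the standard error
    return sample_mean - margin, sample_mean + margin


# Hypothetical sample: mean 120, SD 15, n = 100.
ci95 = mean_confidence_interval(120, 15, 100, confidence=0.95)
ci99 = mean_confidence_interval(120, 15, 100, confidence=0.99)
print(ci95)
print(ci99)  # wider than the 95% interval
```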
17
 To permit generalizations from a sample to the population
from which it came
 Steps in Hypothesis Testing
1. State the research question in terms of Statistical hypothesis
2. Decide on appropriate test statistic
3. Select level of significance
4. Determine value the test statistic must attain to be declared
significant
5. Perform calculations
6. Draw and State conclusions
Hypothesis Testing
18
Step 1: State the research question in terms of a statistical hypothesis
Null hypothesis, H0: Statement that there is no difference or relationship
If the data show a difference or relationship, H0 is rejected
If they do not, H0 is retained (not accepted!)
Alternative hypothesis, Ha: Disagrees with H0
19
 Statistics whose primary use is in testing
hypotheses are called test statistics
 Parametric & non-parametric tests
Step 2: Decide on appropriate Test Statistic
20
Parametric Statistics
• Some assumptions must be met before a particular test of significance can be applied to a set of data
• The sample measurements should be drawn at random from a normally distributed population of measurements
• Parametric tests include Student's t test (paired and unpaired), the F test for analysis of variance, and correlation and regression analyses
21
 Many naturally occurring events follow
a pattern with:
 Many observations clustered around the
mean
 Few observations with values away from
the mean
• This bell-shaped curve is called the Normal distribution, also known as the Gaussian distribution after the mathematician Gauss
Normal distribution
22
 Normal distribution
 The symmetrical clustering of values
around a central location
 Normal curve
 The bell-shaped curve that results when a
normal distribution is graphed
Normal distribution
23
Normal Distribution
• Described by Carl Friedrich Gauss (1777-1855), hence 'Gaussian distribution'
 Called ‘normal’ because many continuous variables in biology
and other sciences follow this particular distribution
24
The Normal Distribution: Properties
[Figure: normal curve, f(x) plotted against x from -3 to 3]
1- Symmetric about the mean
25
The Normal Distribution: Properties
[Figure: normal curve, f(x) plotted against x from -3 to 3]
2- Mean, median and mode coincide (Mean = Median = Mode)
26
27
The Normal Distribution: Properties
3- Area property
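The area property can be checked numerically. The sketch below uses the standard normal curve (mean 0, SD 1) and reports the share of the area lying within 1, 2 and 3 standard deviations of the mean; the helper name `area_within` is a hypothetical one chosen for this illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Area under the normal curve between mean - k SD and mean + k SD."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    # Roughly 68%, 95% and 99.7% of the area, respectively.
    print(k, round(100 * area_within(k), 1))
```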
Skewed (or Asymmetric) Data
When the left and right sides of a frequency distribution are not approximate mirror images, the data are said to be skewed or asymmetrical
[Figure: Curve A and Curve B, one negatively skewed and one positively skewed]
Negative skew (longer left tail): Mean < Median < Mode
Positive skew (longer right tail): Mean > Median > Mode
28
Non-Parametric Statistics or Distribution-Free Methods
• A suitable alternative, particularly when data are in the form of ranks or counts
• Chi-squared test: the most commonly employed nonparametric test
• Wilcoxon rank sum, Mann-Whitney U or median test, Kruskal-Wallis one-way and Friedman two-way analysis of variance
29
Step 3: Select the level of significance for the statistical test
• p value: Related to the hypothesis test
• Probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true, i.e. by chance alone
• Calculated after the test has been performed
• A small p value (typically ≤ 0.05) indicates strong evidence against the null hypothesis, so the null hypothesis is rejected
• The 5% cut-off is not a rule set in stone
• More generous (p = 0.1)
• More strict (p = 0.01)
30
Example
Study of the effects of anticonvulsant therapy on serum calcium
concentration in the elderly.
• group of treated patients
• group of untreated patients
Outcome variable: serum calcium concentration
Independent variable: Anticonvulsant therapy
Null Hypothesis:
Both the groups (treated and untreated) have same mean serum calcium
conc.
Test of significance: t test
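A minimal sketch of the unpaired Student t statistic for this kind of two-group comparison follows. It assumes equal variances in the two groups (pooled variance), and the serum calcium values are hypothetical numbers invented for illustration, not data from the study described.

```python
from math import sqrt
from statistics import mean, variance


def unpaired_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for Student's unpaired t test."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, n_a + n_b - 2


# Hypothetical serum calcium concentrations (mmol/L), for illustration only.
treated = [2.1, 2.2, 2.3]
untreated = [2.4, 2.5, 2.6]

t_stat, df = unpaired_t(treated, untreated)
print(t_stat, df)
```

The absolute value of t is then compared with the tabulated critical value for the chosen level of significance and degrees of freedom.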
31
Case-control study (retrospective)
[Figure: the study starts with cases and controls; the direction of inquiry runs backward in time to find who was exposed and who was unexposed in each group]
32
Case-control study: Outbreak of Diarrheal Disease at a Resort Club

Ate raw hamburger    Cases     Controls   Total
Yes                  17 (a)    7 (c)      24
No                   20 (b)    26 (d)     46
Total                37        33         70

Cross product ratio:

$$\widehat{OR} = \frac{\widehat{\text{Odds}}_{\text{Cases}}}{\widehat{\text{Odds}}_{\text{Controls}}} = \frac{a \times d}{b \times c} = \frac{(17)(26)}{(20)(7)} = 3.2$$
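The cross product ratio for the hamburger table can be reproduced in a few lines. The cell labels follow the slide (a and b are exposed and unexposed cases, c and d are exposed and unexposed controls); the function name is a hypothetical one chosen for this sketch.

```python
def odds_ratio(a, b, c, d):
    """Cross product ratio (a x d) / (b x c) for a case-control 2x2 table."""
    return (a * d) / (b * c)


# Counts from the resort club outbreak table.
a, b = 17, 20   # cases: ate raw hamburger, did not
c, d = 7, 26    # controls: ate raw hamburger, did not

odds_cases = a / b        # odds of exposure among cases
odds_controls = c / d     # odds of exposure among controls

print(round(odds_ratio(a, b, c, d), 1))  # 3.2
```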
33

 Odds ratio is a ratio of two odds
 Relative risk is a ratio of two probabilities
ODDS RATIO & RR
34
35
THREE KEY MEASURES OF
VALIDITY
1. SENSITIVITY
2. SPECIFICITY
3. PREDICTIVE VALUE
36
Outcomes of a Screening/ Diagnostic Test

                              True Disease Status
Screening/ Diagnostic Test    Positive               Negative               Total
Positive                      True Positives (TP)    False Positives (FP)   TP+FP
Negative                      False Negatives (FN)   True Negatives (TN)    FN+TN
Total                         TP+FN                  FP+TN                  TP+FP+FN+TN
37
38
What is used as a “gold standard”
1. Most definitive diagnostic procedure
e.g. microscopic examination of a tissue
specimen
2. Best available laboratory test
e.g. polymerase chain reaction (PCR)
for HIV virus
3. Comprehensive clinical evaluation
e.g. clinical assessment of arthritis
39
                          True Disease Status
Screening Test Results    Cases                 Non-cases             Total
Positive                  a (true positive)     b (false positive)    a + b
Negative                  c (false negative)    d (true negative)     c + d
Total                     a + c                 b + d

Sensitivity = True positives / All cases = a / (a + c)
Specificity = True negatives / All non-cases = d / (b + d)
40
                          True Disease Status
Screening Test Results    Cases      Non-cases     Total
Positive                  140 (a)    1,000 (b)     1,140
Negative                  60 (c)     19,000 (d)    19,060
Total                     200        20,000

Sensitivity = True positives / All cases = 140 / 200 = 70%
Specificity = True negatives / All non-cases = 19,000 / 20,000 = 95%
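The same arithmetic, written as a short sketch with the counts from the worked example above; the function names are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of true cases that the test labels positive: a / (a + c)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of non-cases that the test labels negative: d / (b + d)."""
    return true_neg / (true_neg + false_pos)


# Cell counts from the worked example: a, b, c, d.
a, b, c, d = 140, 1000, 60, 19000

print(round(100 * sensitivity(a, c)))  # 70
print(round(100 * specificity(d, b)))  # 95
```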
41
Interpreting test results: predictive value
Probability (proportion) of those tested who are correctly classified
PPV = Cases identified / All positive tests
NPV = Non-cases identified / All negative tests

 Positive predictive value: Probability that subjects
with a positive screening test truly have disease
 Negative predictive value: Probability that subjects
with a negative screening test truly don't have
disease
42
43
                          True Disease Status
Screening Test Results    Cases                 Non-cases             Total
Positive                  a (true positive)     b (false positive)    a + b
Negative                  c (false negative)    d (true negative)     c + d
Total                     a + c                 b + d

PPV = True positives / All positives = a / (a + b)
NPV = True negatives / All negatives = d / (c + d)
44
                          True Disease Status
Screening Test Results    Cases      Non-cases     Total
Positive                  140 (a)    1,000 (b)     1,140
Negative                  60 (c)     19,000 (d)    19,060
Total                     200        20,000

PPV = True positives / All positives = 140 / 1,140 = 12.3%
NPV = True negatives / All negatives = 19,000 / 19,060 = 99.7%
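A short sketch of the predictive value calculation, using the same cell counts as the worked example; the function names are hypothetical.

```python
def positive_predictive_value(true_pos, false_pos):
    """Proportion of positive tests that are true cases: a / (a + b)."""
    return true_pos / (true_pos + false_pos)


def negative_predictive_value(true_neg, false_neg):
    """Proportion of negative tests that are true non-cases: d / (c + d)."""
    return true_neg / (true_neg + false_neg)


# Cell counts from the worked example: a, b, c, d.
a, b, c, d = 140, 1000, 60, 19000

print(round(100 * positive_predictive_value(a, b), 1))  # 12.3
print(round(100 * negative_predictive_value(d, c), 1))  # 99.7
```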
45
Positive predictive value, sensitivity, specificity, and prevalence

Prevalence (%)    PV+ (%)    Se (%)    Sp (%)
0.1               1.4        70        95
1.0               12.3       70        95
5.0               42.4       70        95
50.0              93.3       70        95
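The dependence of PV+ on prevalence follows from Bayes' theorem. The sketch below recomputes the column for a test with 70% sensitivity and 95% specificity; the function name is hypothetical. The 1.0% row may differ from the table in the last digit, since the worked example behind it has a prevalence of 200/20,200 rather than exactly 1%.

```python
def ppv_from_prevalence(prevalence, sensitivity, specificity):
    """Positive predictive value from prevalence, sensitivity and specificity.

    All three arguments are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence                 # diseased and test positive
    false_pos = (1 - specificity) * (1 - prevalence)    # healthy but test positive
    return true_pos / (true_pos + false_pos)


for prevalence_percent in (0.1, 1.0, 5.0, 50.0):
    ppv = ppv_from_prevalence(prevalence_percent / 100, 0.70, 0.95)
    print(prevalence_percent, round(100 * ppv, 1))
```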
46
Why Sampling?
• Study populations are large: it is not possible to reach every member (neither practically feasible nor affordable)
• Ideally: study everyone, so that a finding can be generalized to the study population
• In practice we cannot study everyone because time and resources are limited
• If we cannot study everyone, we sample the population, but in a manner that still lets us generalize the findings
47
Sampling Process
1. Defining the population
2. Developing a sampling frame
3. Determining sample size
4. Specifying sample method
5. Selecting the sample
48
TYPES OF SAMPLES
49
50
Sample Size Determination
• The most commonly asked question when planning any scientific study: what should the minimum sample size be?
• If the sample is too small: fails to detect a true difference
• If the sample is exceedingly large:
• Wastes time and money
• Will report the tiniest relation/difference as significant
• Sample size should be calculated at the planning stage
Neither too small nor too large
51
Sample size calculation
Need to know the following:
• Estimated prevalence/ SD
• Confidence interval (95% CI)
• Power: Ability to find significance when two groups are really different (80%)
• Allowable error or precision (5-10%)
52
Single Group Studies
53
Sample Size for Qualitative outcome variable

$$n = \frac{4PQ}{L^2}$$

• P = Estimated prevalence (percentage)
• Q = 100 - P (equivalently 1 - P when P is expressed as a proportion)
• L = Allowable error
54
Definition
• P = Estimated prevalence (percentage)
• From pilot study, published papers, experience
• Q = 100 - P
• L = Allowable error
• L, Q and P are in the same unit
55
L: Allowable Error
• Suppose the survey aims to estimate the true prevalence of a disease in a population
• The estimate we get from the survey will be within ± L% of the true prevalence
56
Example
• A survey is to estimate the prevalence of influenza virus infection in school kids
• Suppose the available evidence suggests that approximately 20% (P = 20) of the children will have antibodies to the virus
• Assume the investigator wants to estimate the prevalence within 6% of the true value (6% is the allowable error, L)
57
Example
• The required sample size is

$$n = \frac{4PQ}{L^2} = \frac{4 \times 20 \times 80}{6 \times 6} = 177.78$$

• Thus approximately 180 kids would be needed for the survey
Note: population size is not involved in the formula
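The calculation can be sketched as a small function. The function name is hypothetical; P and L are entered in percent, so Q is taken as 100 - P. The factor 4 is the usual rounding of 1.96 squared, so the formula corresponds to roughly 95% confidence.

```python
import math


def sample_size_for_prevalence(p_percent, allowable_error_percent):
    """n = 4PQ / L^2, with P, Q and L all expressed in percent."""
    q_percent = 100 - p_percent
    return 4 * p_percent * q_percent / allowable_error_percent ** 2


n = sample_size_for_prevalence(20, 6)
print(round(n, 2))   # 177.78
print(math.ceil(n))  # 178, which the slide rounds up to about 180
```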
58
Sample Size for Estimation of the Mean (Quantitative outcome variable)

$$n = \frac{4S^2}{L^2}$$

• S = Standard deviation of the parameter
• L = Allowable error
• S and L are in the same unit
• The average we find in the survey will be within ± L of the true average
59
Example
 Suppose an investigator has some
evidence suggesting that the
standard deviation of rat weight
is about 455 g
 He wishes to provide an estimate
within 80 g of the true average
(80 g is the allowable error; L)
60
Example
• The required sample size is

$$n = \frac{4S^2}{L^2} = \frac{4 \times (455)^2}{(80)^2} = 129.39$$

• Thus approximately 130 rats would be needed.
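The same calculation as a short sketch; the function name is hypothetical, and S and L must be given in the same unit (grams here).

```python
import math


def sample_size_for_mean(sd, allowable_error):
    """n = 4 S^2 / L^2, with S and L in the same unit."""
    return 4 * sd ** 2 / allowable_error ** 2


n = sample_size_for_mean(455, 80)
print(round(n, 2))   # 129.39
print(math.ceil(n))  # 130
```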
61

THANKS…
62

BIOSTATISTICS

  • 1. Murtaza KamalMurtaza Kamal Dec 22, 2018Dec 22, 2018 murtaza.vmmc@gmail.commurtaza.vmmc@gmail.com 1
  • 2.   Definition of biostatistics  Data and its types: Qualitative/ Quantitative  Variable and its types  Mean/ median/ mode  Normal curve  Sn/ Sp/ PV  Sample and its types and calculation of sample size Points to be covered… 2
  • 3. Science dealing with methods of data collection, compilation, tabulation and analysis to provide meaningful and valid interpretation 3
  • 4. Bio-statisticsBio-statistics Application of statistical methods in field of biology, public health and medicine 4
  • 5.  Quality of clinical & health planning decision depends on Quality of information on which they are based • Medicine: A science with chance playing very significant role • Statistics help to quantify contribution of chance and helps individual clinician make valid diagnostic, prognostic or therapeutic decisions • Helps programme managers+ policy planners to plan, monitor+ evaluate public health initiatives Remember this… 5
  • 6. Datum: Latin "fact" Collection of processed information Sources of Data: Primary Data : Collected and recorded by investigator/s themselves by observation, interviews or measuring instruments usually systematically and for defined purposes Secondary Data : Collected by somebody else or for other purposes e.g. information derived from hospital Data 6
  • 7. • Choice of statistical tests to be used depend on kind of variable studied • An attribute, quality, characteristic or property of persons or things being studied that can be quantitatively measured or enumerated • Varies from person to person or from time to time in same person • Ex: Height, weight, age , gender, blood pressure, pulse rate, smoking status Variable 7
  • 8.  A variable that is manipulated or applied by investigator or explains outcome  Eg: Maternal age, age at marriage, spacing between successive pregnancies, pre-pregnancy weight, weight gain during pregnancy Independent (stimulus/ explanatory) variable 8
  • 9. • Resulting response or behaviour that is observed when exposed to independent variable e. g. Independent Dependent Maternal age Birth weight Birth spacing Birth weight PIH Perinatal mortality Dependent (Outcome/ response) variable 9
  • 10. 10
  • 11. Nominal: This variable has mutually exclusive categories and unordered E.g. Blood group: A, B, AB, O, Marital status: Unmarried, Married, Divorced, Widowed Ordinal: This variable has mutually exclusive categories and ordered E.g. Disease severity: Mild, Moderate, Severe, Qualitative data 11
  • 12. Quantitative data Discrete: Often represents counts e.g. Number of children Number of times admitted to hospital in the last 5 years Continuous: Can take any value within a range of values e.g. Height in cm Weight in kg Distance from home to work in km 12
  • 13. Importance of data type Type of data: Critically important in determining which methods of analysis will be appropriate and valid 13
  • 14.  Descriptive statistics  Describes basic features of data in a study  Provide summaries about sample  Inferential statistics  Investigate questions, models, and hypotheses  Infer population characteristics based on sample  Make judgments about what we observe Types of Statistics 14
  • 15. Descriptive Statistics Univariate analysis (one variable at a time) •Qualitative Data: • Proportions or percentages •Quantitative Data: • Central tendencies (Mean, Median, Mode) • Measures of dispersion (Range, Standard deviation, Coefficient of variation, Percentiles, interqurtile range) 15
  • 16. Two approaches 1. Estimating Parameters: Process of using sample information to draw conclusions about the value of a population parameter e.g., Proportion, mean, SD, Correlation 2. Testing Hypothesis Inferential Statistics 16
  • 17. 1. Point Estimates : Proportion, mean, SD, correlation 2. Interval Estimates:  Confidence Intervals: Define an upper limit & lower limit with an associated probability Ends of confidence intervals Confidence limits  95% confidence interval – 95% probability of containing the population mean  99% confidence interval – 99% probability of containing the population mean Wider/ greater range of values must be included for greater confidence Estimating Parameters 17
  • 18.  To permit generalizations from a sample to the population from which it came  Steps in Hypothesis Testing 1. State the research question in terms of Statistical hypothesis 2. Decide on appropriate test statistic 3. Select level of significance 4. Determine value the test statistic must attain to be declared significant 5. Perform calculations 6. Draw and State conclusions Hypothesis Testing 18
  • 19. Null hypothesis, Ho  Statement that no difference or relationship If related then Ho is rejected If unrelated then Ho is retained (not accepted!!) Alternative hypothesis, Ha  Disagrees with Ho Step 1: State the research question in terms of Statistical hypothesis 19
  • 20.  Statistics whose primary use is in testing hypotheses are called test statistics  Parametric & non-parametric tests Step 2: Decide on appropriate Test Statistic 20
  • 21. • Some assumptions are to be met before a particular test of significance can be applied to a set of data • Sample measurements drawn from normally distributed population of measurements in a random manner • Parametric tests are Student 't' test (paired and unpaired), F test for analysis of variance, correlation and regression analyses Parametric Statistics 21
  • 22.  Many naturally occurring events follow a pattern with:  Many observations clustered around the mean  Few observations with values away from the mean  This bell-shaped curve was named Normal distribution by a mathematician Gauss Normal distribution 22
  • 23.  Normal distribution  The symmetrical clustering of values around a central location  Normal curve  The bell-shaped curve that results when a normal distribution is graphed Normal distribution 23
  • 24. Normal Distribution  Developed by Karl F. Gauss (1777-1855) ‘Gaussian distribution’  Called ‘normal’ because many continuous variables in biology and other sciences follow this particular distribution 24
  • 25. The Normal Distribution Properties 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3 x f(x)1- Symmetric about mean 25
  • 26. The Normal Distribution Properties 0.00 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3 x f(x)2- Mean, median and mode coincides Mean = Median = Mode 26
  • 28. Skewed (or Asymmetric) Data When the left and right-side sides of a frequency distribution do not approximate mirror images, the data are said to be skewed or asymmetrical Curve A Curve B negative skew positive skew Mean>Median>Mode Mean<Median<Mode 28
  • 29. • Suitable alternative particularly when data is in form of ranks or counts • Chi- squared test: MC employed nonparametric test • Wilcoxan Rank Sum, Mann Whitney U or median test , Kruskal-Wallis 1-way and Friedman 2-way analysis of variance Non-Parametric Statistics or Distribution Free Methods 29
  • 30.  p value: Related to hypothesis test  Probability that the observed result is due to chance alone  Calculated after test has been performed  Small p-value (typically ≤ 0.05) indicates strong evidence against null hypothesis, so you reject null hypothesis  P = 5% is not a rule written on stone  More generous (p=0.1)  More strict (p =0.01) Step 3: Select the level of significance for the statistical test 30
  • 31. Example Study of the effects of anticonvulsant therapy on serum calcium concentration in the elderly. • group of treated patients • group of untreated patients Outcome variable: serum calcium concentration Independent variable: Anticonvulsant therapy Null Hypothesis: Both the groups (treated and untreated) have same mean serum calcium conc. Test of Significance : . t test 31
  • 32. Direction of inquiry Onset of study Time Exposed Unexposed Cases Controls Exposed Unexposed Case-control study (retrospective) 32
  • 33. Cases Controls Total Ate Raw Yes 17 (a) 7 (c) 24 Hamburger No 20 (b) 26 (d) 46 Total 37 33 70 Cross Product Ratio 2.3 )7)(20( )26)(17(^ == × × = cb da OR Controls Cases ddsO ddsO ^ ^ = Case-control study: Outbreak of Diarrheal Disease at a Resort Club 33
  • 34.   Odds ratio is a ratio of two odds  Relative risk is a ratio of two probabilities ODDS RATIO & RR 34
  • 35. 35
  • 36. THREE KEY MEASURES OF VALIDITY 1. SENSITIVITY 2. SPECIFICITY 3. PREDICTIVE VALUE 36
  • 37. True Disease Status Screening/ Diagnostic Test Positive Negative Total Positive True Positives (TP) False Positives (FP) TP+FP Negative False Negatives (FN) True Negatives (TN) FN+TN Total TP+FN FP+TN TP+FP+FN+TN Outcomes of a Screening/ Diagnostic Test 37
  • 38. 38 What is used as a “gold standard” 1. Most definitive diagnostic procedure e.g. microscopic examination of a tissue specimen 2. Best available laboratory test e.g. polymerase chain reaction (PCR) for HIV virus 3. Comprehensive clinical evaluation e.g. clinical assessment of arthritis
  • 39. 39 True positive True negative False positive False negative Sensitivity = True positives All cases a + c b + d = a a + c Specificity = True negatives All non-cases = d b + d a + b c + d True Disease Status Cases Non-cases Positive Negative Screening Test Results a d b c
  • 40. 40 True Disease Status Cases Non-cases Positive Negative Screening Test Results a d 1,000 b c 60 Sensitivity = True positives All cases 200 20,000 = 140 200 Specificity = True negatives All non-cases = 19,000 20,000 1,14 0 19,060 140 19,000 = = 70% 95%
  • 41. 41 Interpreting test results: predictive value Probability (proportion) of those tested who are correctly classified PPV = Cases identified /all positive tests NPV = Non-cases identified /all negative tests
  • 42.   Positive predictive value: Probability that subjects with a positive screening test truly have disease  Negative predictive value: Probability that subjects with a negative screening test truly don't have disease 42
  • 43. 43 True positive True negative False positive False negative PPV = True positives All positives a + c b + d = a a + b NPV = True negatives All negatives = d c + d a + b c + d True Disease Status Cases Non-cases Positive Negative Screening Test Results a d b c
  • 44. 44 True Disease StatusCases Non-cases Positive Negative Screening Test Results a d 1,000 b c 60 PPV = True positives All positives 200 20,000 = 140 1,140 NPV = True negatives All negatives = 19,000 19,060 1,14 0 19,060 140 19,000 = = 12.3% 99.7%
  • 45. 45 Positive predictive value, Sensitivity, specificity, and prevalence Prevalence (%) PV+ (%) Se (%) Sp (%) 0.1 1.4 70 95 1.0 12.3 70 95 5.0 42.4 70 95 50.0 93.3 70 95
  • 46. 46
  • 47.  Study population are large: Not possible to meet all population members (Not feasible practically and costly)  Ideally: Study everyone so that we can generalize a finding to study population  Cannot study everyone due to limited time and resources  If we cannot study everyone then we sample the population, but in manner so that we can generalize the findings Why Sampling? 47
  • 51. Sample Size Determination: neither too small nor too large. In any scientific study, the most commonly asked question is: what should the minimum sample size be? If the sample is too small, the study fails to detect a true difference. An exceedingly large sample wastes time and money, and will report the tiniest relation or difference as significant. Sample size should be calculated at the planning stage.
  • 52. Sample size calculation. Need to know the following: estimated prevalence or SD; confidence interval (95% CI); power, the ability to find significance when two groups are really different (80%); allowable error or precision (5-10%).
  • 54. Sample Size for Qualitative outcome variable: $n = \frac{4PQ}{L^2}$, where P = estimated prevalence (percentage), Q = 100 - P (equivalently 1 - P when P is expressed as a proportion), and L = allowable error.
  • 55. Definition. P = estimated prevalence (percentage), taken from a pilot study, published papers, or experience. Q = 100 - P (equivalently 1 - P when P is expressed as a proportion). L = allowable error. L, Q, and P are in the same unit.
  • 56. L: Allowable Error. Suppose the survey aims to estimate the true prevalence of a disease in the population. The estimate we get from the survey will be within +/- L% of the true prevalence (from -L to +L).
  • 57. Example. A survey is to estimate the prevalence of influenza virus infection in school kids. Suppose the available evidence suggests that approximately 20% (P = 20) of the children will have antibodies to the virus. Assume the investigator wants to estimate the prevalence within 6% of the true value (6% is the allowable error, L).
  • 58. Example. The required sample size is $n = \frac{4PQ}{L^2} = \frac{4 \times 20 \times 80}{6 \times 6} = 177.78$. Thus approximately 180 kids would be needed for the survey. Note: population size is not involved in the formula.
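The influenza example can be reproduced in a few lines. This is a minimal sketch assuming P and L are both given as percentages, as in the slides; the function name `sample_size_proportion` is hypothetical. The factor 4 is commonly read as a rounding of 1.96 squared (about 3.84) for 95% confidence, which the deck does not state explicitly.

```python
import math


def sample_size_proportion(p_percent, allowable_error_percent):
    """n = 4PQ / L^2 with P, Q and L all expressed as percentages."""
    q_percent = 100 - p_percent  # Q is the complement of P on the same scale
    return 4 * p_percent * q_percent / allowable_error_percent ** 2


# Slide 57/58 example: P = 20%, L = 6%.
n_flu = sample_size_proportion(p_percent=20, allowable_error_percent=6)
print(f"calculated n = {n_flu:.2f}, rounded up to {math.ceil(n_flu)}")
```

The calculation gives 177.78, so at least 178 children are needed; the slide rounds this further to approximately 180.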
  • 59. Sample Size for Estimation of the Mean (Quantitative outcome variable): $n = \frac{4S^2}{L^2}$, where S = standard deviation of the parameter and L = allowable error. S and L are in the same unit. The average we find in the survey will be within +/- L of the true average (from -L to +L).
  • 60. Example. Suppose an investigator has some evidence suggesting that the standard deviation of rat weight is about 455 g. He wishes to provide an estimate within 80 g of the true average (80 g is the allowable error, L).
  • 61. Example. The required sample size is $n = \frac{4S^2}{L^2} = \frac{4 \times 455^2}{80^2} = 129.39$. Thus approximately 130 rats would be needed.
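The rat-weight example follows the same pattern for a quantitative outcome. This is a small sketch, with the hypothetical function name `sample_size_mean`, assuming S and L are in the same unit (grams here).

```python
import math


def sample_size_mean(sd, allowable_error):
    """n = 4 S^2 / L^2, with S and L in the same unit."""
    return 4 * sd ** 2 / allowable_error ** 2


# Slide 60/61 example: S = 455 g, L = 80 g.
n_rats = sample_size_mean(sd=455, allowable_error=80)
print(f"calculated n = {n_rats:.2f}, rounded up to {math.ceil(n_rats)}")
```

The calculated value is 129.39, which rounds up to the 130 rats quoted on the slide.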

Editor's Notes

  • #15: Descriptive Statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures. Together with simple graphics analysis, they form the basis of virtually every quantitative analysis of data. With descriptive statistics you are simply describing what is, what the data shows. Inferential Statistics investigate questions, models and hypotheses. In many cases, the conclusions from inferential statistics extend beyond the immediate data alone. For instance, we use inferential statistics to try to infer from the sample data what the population thinks. Or, we use inferential statistics to make judgments of the probability that an observed difference between groups is a dependable one or one that might have happened by chance in this study. Thus, we use inferential statistics to make inferences from our data to more general conditions; we use descriptive statistics simply to describe what's going on in our data.
  • #17: Researchers focus on probabilities (often called p values) that fall at the lower end of the continuum. The reason for this is partly intuitive and partly historic.
  • #18: Researchers focus on probabilities (often called p values) that fall at the lower end of the continuum. The reason for this is partly intuitive and partly historic.
  • #19: Given an underlying theoretical structure, a representative sample, and an appropriate research design, the researcher can test a hypothesis. We test to see whether the data support the hypothesis. The alternative hypothesis is also called the research hypothesis.
  • #20: Given an underlying theoretical structure, a representative sample, and an appropriate research design, the researcher can test a hypothesis. We test to see whether the data support the hypothesis. The alternative hypothesis is also called the research hypothesis.
  • #21: Given an underlying theoretical structure, a representative sample, and an appropriate research design, the researcher can test a hypothesis. We test to see whether the data support the hypothesis. The alternative hypothesis is also called the research hypothesis.
  • #39: As noted, calculation of sensitivity and specificity, and therefore calculation of predictive value, requires a way to determine authoritatively who has and does not have the condition of interest. This “gold standard” is typically the most definitive diagnostic procedure (for example, the definitive diagnosis of cancer is generally based on microscopic examination of a tissue specimen), the best available laboratory test (for example, a polymerase chain reaction (PCR) test for the actual virus, as opposed to a test for antibody to the virus), or a comprehensive clinical evaluation, where there is no definitive laboratory test. For example, the best diagnosis for arthritis might be obtained through an examination.
  • #40: Data for estimating sensitivity and specificity are typically displayed in a 2 x 2 table that classifies people according to their disease status and test results. The above table has the True disease status along one dimension, with a column for cases and a column for non-cases, and the Test results on the other dimension, with a row for people who tested positive and a row for people who tested negative. In the top left-hand corner – the “a” cell – are the people who have the disease and whose test came up positive. They are “true positives”, cases who were correctly classified. In the lower right-hand corner – the “d” cell – are the people who do not have the disease and whose test came up negative. They are “true negatives”, non-cases who were correctly classified. The other two cells, b and c, contain people who were misclassified. Non-cases who nevertheless received a positive test are often called “false positives”, and cases who received a negative test are often called “false negatives”, but these terms are not always employed with these meanings. If cell “c” is in the lower left-hand corner of the table, then the left-hand column – the cases – has a total of (a + c) people, and we can write the formula for sensitivity as a / (a+c): the number of cases correctly classified divided by the total number of cases. Similarly, the formula for specificity is d / (b+d): the number of correctly classified non-cases divided by the total number of non-cases.
  • #41: If a population has a total of 200 cases, and the test correctly identifies 140 of them as cases, then a = 140, a+c = 200, and the sensitivity is: a / (a+c) = 140 / 200 = 70% If there are 20,000 people without the disease, and the test correctly classifies 19,000 of them as non-cases, then d = 19,000, b+d = 20,000, and the specificity is: d / (b+d) = 19,000 / 20,000 = 95%. As is often the case for a rare disease, even with what seems like a high specificity (95%), the number of false positives can easily exceed the number of true positives. This observation brings us to the concept of predictive value.
  • #42: Sensitivity and specificity tell us what happens to cases and non-cases, respectively. However, appropriate interpretation of the results of a test – both screening tests and diagnostic tests – makes use of another concept that is very important for both the epidemiologic and the clinical perspectives, predictive value. Predictive value is also a probability of correct classification, but here the starting point, the denominator for the probability, is the way people have been classified by the test. There are two types of predictive value – predictive value of a positive test and predictive value of a negative test. Predictive value tells us the probability that the test was correct. This is obviously a key question for the clinician (and the patient), since we generally do not know whether someone is a case or not, but we do know whether the person tests positive or negative. In clinical epidemiology, the prevalence of a disease is referred to as the “prior probability” or “pretest probability”, since it is our initial estimate of the probability that the condition is present. Predictive values are referred to as posterior or posttest probabilities, since they provide estimates of probability that take into account the result of the screening or diagnostic test. The relation of the posttest and pretest probabilities indicates the informativeness of the test.
  • #44: The table for examining predictive value is the same as that for sensitivity and specificity. Instead of using the total numbers of cases and non-cases, though, predictive value involves the total number of people with a positive test and the total number with a negative test. Positive predictive value, abbreviated PPV or PV+, is the proportion of all people with positive tests who truly have the condition – a / (a+b) in the above table. Negative predictive value (NPV or PV-) is the proportion of all people with negative tests who truly do not have the condition – d / (c+d) in the above table.
  • #45: Using the same numbers as in our example for calculating sensitivity and specificity, we find that the predictive value of a positive test (PPV) is only 140 / 1,140 = 12.3%. The predictive value of a negative test (NPV) is 19,000 / 19,060 = 99.7%. Although the NPV is very high, that is not such an impressive result in this population, since the prevalence of the condition is only 200 / 20,200, which is not quite 1%. That means that if we select a person at random from the population, there is a 1% probability that the person will be a case (the pretest probability). The probability that a person who tests positive actually is a case is 12.3% (the posterior probability), so the test raises the probability substantially. On the other hand, the probability that a person randomly selected from the population does not actually have the condition is already 99%, so the additional information that a person tested negative cannot shift that estimate significantly. However, the PPV of 12.3% poses a dilemma. Of the 1,140 people who tested positive, the vast majority – 87.7% – are falsely positive. They do not have the disease. Thus, for every person whose disease is detected and who may therefore be helped, 7 people who do not have the disease and will therefore not derive any benefit will undergo a diagnostic workup that may be costly, uncomfortable, and possibly harmful. This tradeoff is the dilemma in population screening for a rare disease.
  • #46: The above table illustrates the relation among positive predictive value (PPV), sensitivity, specificity, and prevalence of the condition. Note that sensitivity and specificity are being regarded as properties of the test, unaffected – in principle – by the rarity of the condition. In contrast, prevalence is a property of the population in which the test is being used, and PPV shows the result of applying a test with given sensitivity and specificity to a population with a given prevalence. For sensitivity held constant at 70% and specificity held constant at 95%, PPV is only 1.4% for a disease with a prevalence of 1 in 1,000, but rises to over 40% when the prevalence is 5%. This table illustrates the difference between using a test for screening and for diagnosis. Using the test in the general population, where the disease is rare (say, less than 1%), will result in a positive predictive value below 15% – the large majority of people who test positive will not have the condition. In contrast, people with symptoms are much more likely to have the condition. If the prevalence among them is above 5%, then the proportion of false positive tests is greatly reduced. The challenge in population screening is to try to target a population at sufficiently high risk that the number of false positives is acceptable and yet a sufficient proportion of the cases are included. A point often not mentioned in introductory presentations is that while sensitivity and specificity are in principle fixed properties of the test, in practice a test is not a fixed entity. Various factors can affect the sensitivity and specificity of a test when it is actually implemented, since there are often human factors involved in interpreting test results, equipment may require frequent calibration, etc.
  • #48: Suggestions to the facilitator Ask participants why and when sampling is required and explain.