PROBABILITY
GR Miyambu
School of Science & Technology
Department of Statistical Sciences
Types of Probability
Frequency
What is the probability of a
randomly chosen person
dying in the next year?
Model-based
What is the probability of a
child being affected by
cystic fibrosis given one of
the parents is a carrier of
the disease?
Subjective
What is the probability that
a particular patient has
heart disease given they
have chest pain?
Properties of Probability
The three types of probability all have the following
properties.
1. All probabilities lie between 0 and 1.
2. When the outcome can never happen the probability is 0.
3. When the outcome will definitely happen the probability
is 1.
Diagnostic Test
• A diagnostic test is any approach used to gather clinical
information for the purpose of making a clinical decision
(i.e., diagnosis).
• Some examples of diagnostic tests include X-rays, biopsies,
pregnancy tests, medical histories, and results from
physical examinations.
• From a statistical point of view there are two points to keep
in mind:
1. the clinical decision-making process is based on
probability;
2. the goal of a diagnostic test is to move the estimated
probability of disease toward either end of the
probability scale (i.e., 0 rules out disease, 1 confirms the
disease).
Uses of Diagnostic Test
• In making a diagnosis, a clinician first establishes a possible set of
diagnostic alternatives and then attempts to reduce these by
progressively ruling out specific diseases or conditions.
• Alternatively, the clinician may have a strong hunch that the patient
has one particular disease and he then sets about confirming it.
• Given a particular diagnosis, a good diagnostic test should indicate
either that the disease is very unlikely or that it is very probable.
• In a practical sense it is important to realize that a diagnostic test is
useful only if the result influences patient management since, if the
management is the same for two different conditions, there is little
point in trying strenuously to distinguish between them.
Analysis of Diagnostic Test
              | Disease             | No Disease
Test Positive | a (true positives)  | b (false positives)
Test Negative | c (false negatives) | d (true negatives)
Gold Standard
• The "Gold Standard" is the method used to obtain a definitive diagnosis for a
particular disease; it may be biopsy, surgery, autopsy or an acknowledged standard.
• Gold Standards are used to define true disease status against which the results of a
new diagnostic test are compared.
• Here are a number of definitive diagnostic tests that will confirm whether or not
you have the disease.
• Some of these are quite invasive and this is a major reason why new diagnostic
procedures are being developed.
Target Disorder       | Gold Standard
Breast cancer         | Excisional biopsy
Prostate cancer       | Transrectal biopsy
Coronary stenosis     | Coronary angiography
Myocardial infarction | Catheterization
Strep throat          | Throat culture
Sensitivity and specicity
• Many diagnostic test results are given in the form of a
continuous variable (that is one that can take any value
within a given range), such as diastolic blood pressure or
haemoglobin level.
• However, for ease of discussion we will first assume that
these have been divided into positive or negative results.
• For example, a positive diagnostic result of ‘hypertension’ is
a diastolic blood pressure greater than 90 mmHg;
• whereas for ‘anaemia’, a haemoglobin level less than 10 g/dl is
required.
• For every diagnostic procedure (which may involve a
laboratory test of a sample taken) there is a set of
fundamental questions that should be asked.
Sensitivity and specicity
• First, if the d
isease is present, what is the prob
a b
ility that the test
result will be positive?
• This leads to the notion of the sensitivity of the test.
• Secon d
, if the d
isease is a b
sent, what is the pro b
a b
ility that the test
result will be negative?
• This question refers to the specificity of the test.
• These questions can b
e answere donly if it is known what the ‘true’
diagnosis is.
• In the case of organic d
isease this can b
e d
etermine d b
y b
iopsy or, for
example, an expensive an drisky proce d
ure such as angiography for
heart disease.
• In other situations it may b
e b
y ‘expert’ opinion. Such tests provi d
e
the so-called ‘gold standard’.
Example: Diagnosis of heart disease
• Consider the results of an assay of N-terminal pro-brain natriuretic
peptide (NT-proBNP) for diagnosis of heart failure in a general
population survey in those over 45 years of age and in patients with
existing diagnosis of heart failure obtained by Hobbs et al (2002) and
summarised in the table.
• Heart failure was identified when NT-proBNP >36 pmol/l.
NT-proBNP (pmol/l) > 36 | Confirmed Diagnosis of Heart Failure
                        | Present (D+) | Absent (D-) | Total
Positive (T+)           | 35 (a)       | 7 (b)       | 42
Negative (T-)           | 68 (c)       | 300 (d)     | 368
Total                   | 103          | 307         | 410
Sensitivity and Specicity
• The prevalence of heart failure in these subjects is (a + c)/(a + b + c + d)
• P(D+)=??
• The sensitivity of a test is theproportion of those with thedisease who also have a
positive test result.
• The sensitivity is a/(a + c)=??
• N o
w sensitivity is the pr o
bability o
f a p o
s it iv
e test result (e v
ent T
+) g
iv
en that the d
isease is
present (event D+) and can be written as p(T+|D+)=??, where the ‘|’ is read as ‘given’.
• The specificity of the test is the proportion of those without disease who give a
negative test result.
• Thus the specificity is d/(b + d)=??
• N o
w specicity is the pr o
bability o
f a ne g
ativ
e test result (e v
ent T
-) g
iv
en that the d
isease is absent
(event D-) and can be written as p(T-|D-)=??
• Since sensitivity is conditional on the disease being present, and specificity on the
disease being absent, in theory, they are unaffected by disease prevalence.
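The ‘??’ entries on this slide can be worked out directly from the NT-proBNP table. The following is a minimal Python sketch using the counts a = 35, b = 7, c = 68 and d = 300 from that table; the variable names are illustrative and not part of the original slides.

```python
# Counts from the NT-proBNP table (Hobbs et al, 2002), using the slide's labels.
a, b, c, d = 35, 7, 68, 300  # true pos, false pos, false neg, true neg

total = a + b + c + d
prevalence = (a + c) / total   # P(D+)
sensitivity = a / (a + c)      # p(T+|D+)
specificity = d / (b + d)      # p(T-|D-)

print(f"prevalence  P(D+)    = {prevalence:.3f}")
print(f"sensitivity p(T+|D+) = {sensitivity:.3f}")
print(f"specificity p(T-|D-) = {specificity:.3f}")
```

With these counts the prevalence is about 0.25, the sensitivity about 0.34 and the specificity about 0.98, so this cut-off misses many true cases but rarely flags a healthy subject.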
Sensitivity and Specicity
• Sensitivity and specificity are useful statistics because they will
yield consistent results for the diagnostic test in a variety of
patient groups with different disease prevalences.
• This is an important point; sensitivity and specificity are
characteristics of the test, not the population to which the test
is applied.
• Although indeed they are independent of disease prevalence, in
practice if the disease is very rare, the accuracy with which one
can estimate the sensitivity will be limited.
• Two other terms in common use are: the false negative rate (or
probability of a false negative), which is given by c/(a + c) =
1 - Sensitivity, and the false positive rate (or probability of
a false positive), which is given by b/(b + d) = 1 - Specificity.
• Since sensitivity = 1 - Probability(false negative) and specificity =
1 - Probability(false positive), a possibly useful mnemonic to
recall this is that ‘sensitivity’ and ‘negative’ have ‘n’s in them
and ‘specificity’ and ‘positive’ have ‘p’s in them.
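As a quick check of the two complement relations, the sketch below computes both error rates from the NT-proBNP counts and compares them with 1 - Sensitivity and 1 - Specificity. The variable names are illustrative only.

```python
# Counts from the NT-proBNP table, using the slide's a, b, c, d labels.
a, b, c, d = 35, 7, 68, 300

sensitivity = a / (a + c)
specificity = d / (b + d)

false_negative_rate = c / (a + c)   # diseased subjects with a negative test
false_positive_rate = b / (b + d)   # disease-free subjects with a positive test

print(f"false negative rate = {false_negative_rate:.3f}")
print(f"false positive rate = {false_positive_rate:.3f}")
```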
Summary of Definitions of Sensitivity and Specificity
            | True diagnosis
Test result | Disease present                 | Disease absent
Positive    | Sensitivity                     | Probability of a false positive
Negative    | Probability of a false negative | Specificity
Rates Assuming a Predicted Condition
• The predictive value refers to the likelihood for determining an outbreak or
non-outbreak of an infectious disease based on early warning results.
• Predictive values can be classified into the positive predictive value (PPV) and
the negative predictive value (NPV).
• Positive predictive value is the proportion of individuals with positive test
results that are correctly diagnosed and actually have the disease.
$\mathrm{PPV} = p(D+ \mid T+) = \dfrac{a}{a + b}$
• Negative predictive value is the proportion of individuals with negative test
results that are correctly diagnosed and do not have the disease.
$\mathrm{NPV} = p(D- \mid T-) = \dfrac{d}{c + d}$
• The false omission rate (FOR) is the proportion of the individuals with a negative
test result for which the true condition is positive.
$\mathrm{FOR} = \dfrac{c}{c + d}$
• The false discovery rate (FDR) is the proportion of the individuals with a positive
test result for which the true condition is negative.
$\mathrm{FDR} = \dfrac{b}{a + b}$
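The four rates on this slide all condition on the test result (the rows of the 2 x 2 table) rather than on the true diagnosis. A small sketch, reusing the NT-proBNP counts as an illustration (the slide itself gives no worked numbers for these rates):

```python
# Counts from the NT-proBNP table, using the slide's a, b, c, d labels.
a, b, c, d = 35, 7, 68, 300

ppv = a / (a + b)                   # p(D+|T+): positives who have the disease
npv = d / (c + d)                   # p(D-|T-): negatives who are disease-free
false_omission_rate = c / (c + d)   # negatives who do have the disease
false_discovery_rate = b / (a + b)  # positives who are disease-free

for name, value in [("PPV", ppv), ("NPV", npv),
                    ("FOR", false_omission_rate), ("FDR", false_discovery_rate)]:
    print(f"{name} = {value:.3f}")
```

Each pair shares a denominator, so FDR = 1 - PPV and FOR = 1 - NPV.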
Whole Table Rates
• Prevalence is the proportion of a population who have a
specific characteristic in a given time period.
• The prevalence may be estimated from the table if all the
individuals are randomly sampled from the population.
$\mathrm{Prevalence} = \dfrac{a + c}{a + b + c + d}$
• The Accuracy or Proportion Correctly Classified (PCC) reflects
the total proportion of individuals that are correctly
classified.
$\mathrm{Accuracy} = \mathrm{PCC} = \dfrac{a + d}{a + b + c + d}$
• The proportion incorrectly classified (PIC) reflects the total
proportion of individuals that are incorrectly classified.
$\mathrm{PIC} = \dfrac{b + c}{a + b + c + d}$
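The whole-table rates use all four cells in the denominator. As an illustration only, the sketch below applies them to the NT-proBNP counts; the names `accuracy` and `proportion_incorrect` are chosen here and do not come from the slides.

```python
# Counts from the NT-proBNP table, using the slide's a, b, c, d labels.
a, b, c, d = 35, 7, 68, 300
total = a + b + c + d

prevalence = (a + c) / total
accuracy = (a + d) / total              # proportion correctly classified
proportion_incorrect = (b + c) / total  # proportion incorrectly classified

print(f"prevalence           = {prevalence:.3f}")
print(f"accuracy             = {accuracy:.3f}")
print(f"proportion incorrect = {proportion_incorrect:.3f}")
```

Every subject is classified either correctly or incorrectly, so the last two values sum to 1.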
Likelihood Ratio
• The clear simplicity of diagnostic test data, particularly when presented as a 2 x 2
table, is confounded by many ways of reporting the results.
• The likelihood ratio (LR) is a simple measure combining sensitivity and specificity.
• We have positive likelihood ratio (LR+) defined as
$\mathrm{LR}+ = \dfrac{\mathrm{Sensitivity}}{1 - \mathrm{Specificity}} = \dfrac{\mathrm{TPR}}{\mathrm{FPR}}$
• This gives a ratio of the test being positive for patients with disease compared with those without
disease. Aim to be much greater than 1 for a good test.
• And negative likelihood ratio (LR-) defined as
$\mathrm{LR}- = \dfrac{1 - \mathrm{Sensitivity}}{\mathrm{Specificity}} = \dfrac{\mathrm{FNR}}{\mathrm{TNR}}$
• This gives a ratio of the test being negative for patients with disease compared with those without
disease. Aim to be considerably less than 1 for a good test.
• The likelihood odds ratio (LOR) is the ratio of the positive likelihood ratio to the negative
likelihood ratio.
$\mathrm{LOR} = \dfrac{\mathrm{LR}+}{\mathrm{LR}-}$
• In some calculation methods, ½ is added to all counts before the calculation of LOR, to avoid
dividing by 0.
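A short sketch of the three ratios, again using the NT-proBNP counts as an illustration. The helper `lor_with_half` is a hypothetical name; it uses the shortcut ad/(bc), which is algebraically equal to LR+ divided by LR-, after adding ½ to every count.

```python
# Counts from the NT-proBNP table, using the slide's a, b, c, d labels.
a, b, c, d = 35, 7, 68, 300

sensitivity = a / (a + c)
specificity = d / (b + d)

lr_pos = sensitivity / (1 - specificity)   # TPR / FPR
lr_neg = (1 - sensitivity) / specificity   # FNR / TNR
lor = lr_pos / lr_neg                      # likelihood odds ratio


def lor_with_half(a, b, c, d):
    """LOR after adding 1/2 to every count, so no cell can be zero."""
    a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, LOR = {lor:.2f}")
print(f"LOR with 1/2 added = {lor_with_half(a, b, c, d):.2f}")
```

Here LR+ is well above 1 and LR- is below 1, as the slide asks of a useful test; the ½ correction matters mainly when a cell count is zero or very small.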
Distributions
Types of Distributions
• Binomial
• Poisson
• Normal
The Normal Distribution
Properties of Normal Distribution
How do we use the Normal distribution?
• The Normal probability distribution can be used to calculate
the probability of different values occurring.
• We could be interested in: what is the probability of being
within 1 standard deviation of the mean (or outside it)?
• We can use a Normal distribution table which tells us the
probability of being outside this value.
• The Normal distribution also has other uses in statistics and
is often used as an approximation to the Binomial and
Poisson distributions.
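In place of a printed Normal table, the same probabilities can be obtained in code. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `prob_within` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def prob_within(k):
    """Probability of lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


within_1sd = prob_within(1.0)
outside_1sd = 1.0 - within_1sd

print(f"P(within 1 SD)  = {within_1sd:.4f}")
print(f"P(outside 1 SD) = {outside_1sd:.4f}")
print(f"P(within 1.96 SD) = {prob_within(1.96):.4f}")
```

Roughly 68% of values fall within 1 standard deviation of the mean and about 95% within 1.96 standard deviations, which is where the multiplier 1.96 used for 95% confidence intervals comes from.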
Populations and Samples
• In the statistical sense a population is a theoretical concept used
to describe an entire group of individuals in whom we are
interested.
• Examples are the population of all patients with diabetes
mellitus, or the population of all middle-aged men.
• Parameters are quantities used to describe characteristics of
such populations.
• Thus the proportion of diabetic patients with nephropathy, or
the mean blood pressure of middle-aged men, are characteristics
describing the two populations.
• Generally, it is costly and labour intensive to study the entire
population.
• Therefore we collect data on a sample of individuals from the
population who we believe are representative of that
population, that is, they have similar characteristics to the
individuals in the population.
• We then use them to draw conclusions, technically make
inferences, about the population as a whole.
Populations and Samples
• The process is represented schematically in the figure.
• So, samples are taken from populations to provide estimates
of population parameters.
• It is important to note that although the study populations
are unique, samples are not as we could take more than one
sample from the target population if we wished.
• Thus for middle-aged men there is only one normal range
for blood pressure.
• However, one investigator taking a random sample from a
population of middle-aged men and measuring their blood
pressure may obtain a different normal range from another
investigator who takes a different random sample from the
same population of such men.
• By studying only some of the population we have
introduced a sampling error.
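Sampling error can be seen in a small simulation. The sketch below builds a purely hypothetical population of diastolic blood pressures (Normal with mean 80 mmHg and SD 10 mmHg, both values assumed for illustration) and lets two "investigators" each draw a random sample of 50.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: diastolic blood pressure (mmHg) of 10,000 men.
population = [rng.gauss(80, 10) for _ in range(10_000)]
population_mean = statistics.fmean(population)

# Two investigators each take their own random sample from the same population.
sample_a = rng.sample(population, 50)
sample_b = rng.sample(population, 50)
mean_a = statistics.fmean(sample_a)
mean_b = statistics.fmean(sample_b)

print(f"population mean = {population_mean:.2f}")
print(f"sample A mean   = {mean_a:.2f}")
print(f"sample B mean   = {mean_b:.2f}")
```

The population mean is a single fixed value, but the two sample means differ from it and from each other; that difference is the sampling error.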
Sample
• In some circumstances the sample may consist of all the members of a specifically
defined population.
• For practical reasons, this is only likely to be the case if the population of interest is
not too large.
• If all members of the population can be assessed, then the estimate of the
parameter concerned is derived from information obtained on all members and so
its value will be the population parameter itself.
• In this idealised situation we know all about the population as we have examined all
its members and the parameter is estimated with no bias.
• The dotted arrow in Figure 6.1 connecting the population ellipse to population
parameter box illustrates this.
• However, this situation will rarely be the case so, in practice, we take a sample which
is often much smaller in size than the population under study.
Sample
• Ideally we should aim for a random sample.
• A list of all individuals from the population is drawn up (the
sampling frame), and individuals are selected randomly
from this list, that is, every possible sample of a given size in
the population has an equal chance of being chosen.
• Sometimes, there may be difficulty in constructing this list
or we may have to ‘make-do’ with those subjects who
happen to be available or what is termed a convenience
sample.
• Essentially, if we take a random sample then we obtain an
unbiased estimate of the corresponding population
parameter, whereas a convenience sample may provide a
biased estimate, but by how much we will not know.
Properties of the distribution of sample
means
• The mean of all the sample means will be the same as the
population mean.
• The standard deviation of all the sample means is known as
the standard error (SE) of the mean or SEM.
• Given a large enough sample size, the distribution of sample
means will be roughly Normal regardless of the distribution
of the variable.
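These properties can be checked by simulation. The sketch below uses a deliberately skewed, hypothetical population (exponential with mean 1), draws 2,000 samples of size 40 and compares the mean and standard deviation of the sample means with the population mean and with the population SD divided by the square root of n. All sizes and the seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical skewed population (exponential, mean about 1).
population = [rng.expovariate(1.0) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 40  # size of each sample
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(2_000)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)   # empirical standard error
theoretical_se = pop_sd / n ** 0.5             # population SD / sqrt(n)

print(f"population mean      = {pop_mean:.3f}")
print(f"mean of sample means = {mean_of_means:.3f}")
print(f"SD of sample means   = {sd_of_means:.3f}")
print(f"population SD / sqrt(n) = {theoretical_se:.3f}")
```

The mean of the sample means sits close to the population mean, and their standard deviation is close to the standard error predicted from the population SD, even though the underlying variable is far from Normal.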
Properties of standard errors
• The standard error (SE) is a measure of the precision
of a sample estimate.
• It provides a measure of how far from the true value
in the population the sample estimate is likely to be.
• All standard errors have the following interpretation:
• A large standard error indicates that the estimate is
imprecise.
• A small standard error indicates that the estimate is
precise.
• The standard error is reduced, that is, we obtain a more
precise estimate, if the size of the sample is increased.
Standard errors
Worked example: Standard error of a mean
– birthweight of preterm infants
• Simpson (2004) reported the birthweights of 98 infants who
were born prematurely, for which n = 98, $\bar{x}$ = 1.31 kg,
s = 0.42 kg and $\mathrm{SE}(\bar{x})$ = ??
• The standard error provides a measure of the precision of
our sample estimate of the population mean birthweight.
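The missing value can be computed with the usual formula for the standard error of a mean, the sample standard deviation divided by the square root of n. A minimal sketch with the figures quoted from Simpson (2004):

```python
import math

n = 98        # number of preterm infants
mean = 1.31   # sample mean birthweight, kg
s = 0.42      # sample standard deviation, kg

se_mean = s / math.sqrt(n)   # SE(x-bar) = s / sqrt(n)

print(f"SE of the mean = {se_mean:.3f} kg")
```

The result is about 0.04 kg, small relative to the mean of 1.31 kg, so the mean birthweight is estimated fairly precisely.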
Worked example: Standard error of a
proportion – acupuncture and headache
• Melchart et al (2005) give the proportion who responded to
acupuncture treatment in 124 patients with tension type
headache as p = 0.46.
• We assume the numbers who respond have a Binomial
distribution and from Table 6.3 we find the standard error is
$\mathrm{SE}(p)$ = ??
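Table 6.3 is not reproduced in these slides, so the sketch below assumes it gives the usual Binomial standard error of a proportion, the square root of p(1 - p)/n.

```python
import math

n = 124    # patients with tension type headache
p = 0.46   # proportion who responded to acupuncture

# Assumed formula (standard Binomial result): SE(p) = sqrt(p * (1 - p) / n)
se_prop = math.sqrt(p * (1 - p) / n)

print(f"SE of the proportion = {se_prop:.3f}")
```

Under that assumption the standard error is about 0.045, or 4.5 percentage points.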
Worked example: Standard error of a rate –
cadaveric heart donors
• The study of Wight et al (2004) gave the number of organ
donations calculated over a two-year period as r = 1.82 per
day.
• We assume the number of donations follows a Poisson
distribution and from Table 6.3 we find the standard error is
$\mathrm{SE}(r)$ = ??
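Table 6.3 is not reproduced here either. The sketch below assumes the usual Poisson standard error of a rate, the square root of r/n, where n is the number of days of observation. The slide says only "a two-year period", so `n_days = 730` is an assumption, not a figure from the study.

```python
import math

r = 1.82       # organ donations per day
n_days = 730   # ASSUMPTION: two years taken as 2 x 365 days

# Assumed formula (standard Poisson result): SE(r) = sqrt(r / n)
se_rate = math.sqrt(r / n_days)

print(f"SE of the rate = {se_rate:.3f} donations per day")
```

Changing `n_days` by a day or two has very little effect on the result, which is roughly 0.05 donations per day.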
Standard Errors of Differences
Example: Difference in means – physiotherapy for patients with lung
disease
• Griffiths et al (2000) report the results of a randomised
controlled trial to compare a pulmonary rehabilitation
programme (Intervention) with standard medical management
(Control) for the treatment of chronic obstructive pulmonary
disease.
• One outcome measure was the walking capacity (distance
walked in metres from a standardised test) of the patient
assessed 6 weeks after randomisation.
• Further suppose such measurements can be assumed to follow a
Normal distribution.
• The results from the 184 patients are expressed using the group
means and standard deviations (SD) as follows:
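$n_{\mathrm{Int}} = 93$, $\bar{x}_{\mathrm{Int}} = 211$, $\mathrm{SD}(x_{\mathrm{Int}}) = s_{\mathrm{Int}} = 118$
$n_{\mathrm{Con}} = 91$, $\bar{x}_{\mathrm{Con}} = 123$, $\mathrm{SD}(x_{\mathrm{Con}}) = s_{\mathrm{Con}} = 99$.
• From these data $\bar{d} = \bar{x}_{\mathrm{Int}} - \bar{x}_{\mathrm{Con}} = 211 - 123 = 88$ m and the
corresponding standard error is $\mathrm{SE}(\bar{d})$ = ??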
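The slide leaves the standard error of the difference blank. One common way to compute it, assumed here because the source's own formula (Table 6.4) is not shown, is the unpooled form: the square root of the sum of each group's variance divided by its sample size. A pooled-variance formula would give a slightly different value.

```python
import math

# Walking distance (m) at 6 weeks, Griffiths et al (2000), as quoted on the slide.
n_int, mean_int, sd_int = 93, 211, 118   # Intervention group
n_con, mean_con, sd_con = 91, 123, 99    # Control group

diff = mean_int - mean_con

# ASSUMPTION: unpooled standard error of a difference in means.
se_diff = math.sqrt(sd_int ** 2 / n_int + sd_con ** 2 / n_con)

print(f"difference in means = {diff} m")
print(f"SE of difference    = {se_diff:.2f} m")
```

Under this assumption the standard error is about 16 m, so the observed difference of 88 m is more than five standard errors from zero.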
Worked example: Difference in proportions – post-natal
urinary incontinence
• The results of a randomised controlled trial conducted by
Glazener et al (2001) to assess the effect of nurse
assessment with reinforcement of pelvic floor muscle
training exercises and bladder training (Intervention)
compared with standard management (Control) among
women with persistent incontinence three months
postnatally are summarised in Table 6.2.
Condence intervals for an estimate
• A confidence interval defines a range of values within which
our population parameter is likely to lie
• Such an interval for the population mean 𝜇𝜇 is defined by
̅
𝑥𝑥 − 1,96 × 𝑆𝑆𝑆𝑆 ̅
𝑥𝑥 𝑡𝑡𝑡𝑡 ̅
𝑥𝑥 + 1,96 × 𝑆𝑆𝑆𝑆 ̅
𝑥𝑥
and, in this case, is termed a 95% condence interval as it
includes the multiplier 1.96.
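As an illustration, the interval can be computed for the birthweight data from the earlier worked example (n = 98, mean 1.31 kg, s = 0.42 kg). Applying the formula to that example is a choice made here; the slides do not give this result.

```python
import math

n, mean, s = 98, 1.31, 0.42   # birthweight example, kg

se_mean = s / math.sqrt(n)
lower = mean - 1.96 * se_mean
upper = mean + 1.96 * se_mean

print(f"95% CI for the mean: {lower:.2f} to {upper:.2f} kg")
```

The interval runs from about 1.23 kg to 1.39 kg.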
Condence intervals for an mean
Confidence Intervals for Differences
• To calculate a confidence interval for a difference in
means, for example $d = \mu_A - \mu_B$, the same structure for
the confidence interval of a single mean is used but
with $\bar{x}$ replaced by $\bar{x}_1 - \bar{x}_2$ and $\mathrm{SE}(\bar{x})$ replaced by
$\mathrm{SE}(\bar{x}_1 - \bar{x}_2)$.
• Algebraic expressions for these standard errors are
given in Table 6.4 (Section 6.10).
• Thus the 95% CI is given by
$(\bar{x}_1 - \bar{x}_2) - 1.96 \times \mathrm{SE}(\bar{x}_1 - \bar{x}_2)$ to $(\bar{x}_1 - \bar{x}_2) + 1.96 \times \mathrm{SE}(\bar{x}_1 - \bar{x}_2)$
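A sketch of this interval for the pulmonary rehabilitation example (Intervention mean 211 m, Control mean 123 m). Table 6.4 is not reproduced in the slides, so the unpooled standard error used below is an assumption; a pooled formula would shift the limits slightly.

```python
import math

n1, mean1, sd1 = 93, 211, 118   # Intervention group, walking distance (m)
n2, mean2, sd2 = 91, 123, 99    # Control group

diff = mean1 - mean2

# ASSUMPTION: unpooled standard error of the difference in means.
se_diff = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)

lower = diff - 1.96 * se_diff
upper = diff + 1.96 * se_diff

print(f"difference = {diff} m, 95% CI {lower:.1f} to {upper:.1f} m")
```

The interval excludes zero, which is consistent with a real difference in walking capacity between the two groups.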
More Accurate Confidence Intervals for a Proportion
• To use this method, we first need to calculate three
quantities: $A = 2r + z^2$; $B = z\sqrt{z^2 + 4r(1 - p)}$; and $C = 2(n + z^2)$,
where $z$ is from the standard Normal table, $r$ is the number of
observed events out of $n$, and $p = r/n$.
• The recommended confidence interval is given by
$\dfrac{A - B}{C}$ to $\dfrac{A + B}{C}$
• When there are no observed events, r = 0 and hence $p = 0/n = 0$,
the recommended CI simplifies to
$0$ to $\dfrac{z^2}{n + z^2}$
• While when r = n so that p = 1, the CI becomes
$\dfrac{n}{n + z^2}$ to $1$
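The three quantities translate directly into code. In the sketch below the function name `accurate_ci` and the example counts (57 responders out of 124, chosen to match p of about 0.46) are assumptions for illustration; the square root in B is reconstructed so that the r = 0 and r = n cases reduce to the limits given on the slide.

```python
import math


def accurate_ci(r, n, z=1.96):
    """Confidence interval for a proportion from the A, B, C quantities."""
    p = r / n
    A = 2 * r + z ** 2
    B = z * math.sqrt(z ** 2 + 4 * r * (1 - p))
    C = 2 * (n + z ** 2)
    return (A - B) / C, (A + B) / C


# Hypothetical example: 57 responders out of 124 patients (p is about 0.46).
lower, upper = accurate_ci(57, 124)
print(f"95% CI: {lower:.3f} to {upper:.3f}")

# Edge cases from the slide: no events, and all events.
print(accurate_ci(0, 20))
print(accurate_ci(20, 20))
```

Unlike the simple interval based on p plus or minus 1.96 standard errors, these limits always stay between 0 and 1, which is why the method is preferred for proportions near 0 or 1 or for small samples.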
Probability.pdf.pdf and Statistics for R
Probability.pdf.pdf and Statistics for R

More Related Content

PPT
05 diagnostic tests cwq
PPTX
Evaluating diagnostic tests.pptx
PPT
Diagnostic tests - research .-6thoct.ppt
PPT
Diagnostic testing 2009
PPTX
OAJ presentation final draft
PPTX
Evidence based diagnosis
PDF
Clinical evaluation Statistics with expanded formulas.pdf
PPTX
Chapter- 1 Introduction (1).pptx to hematolooogyy
05 diagnostic tests cwq
Evaluating diagnostic tests.pptx
Diagnostic tests - research .-6thoct.ppt
Diagnostic testing 2009
OAJ presentation final draft
Evidence based diagnosis
Clinical evaluation Statistics with expanded formulas.pdf
Chapter- 1 Introduction (1).pptx to hematolooogyy

Similar to Probability.pdf.pdf and Statistics for R (20)

PPTX
Tests of diagnostic accuracy
PPTX
Dr Amit Diagnostic Tests.pptx
PPT
Evidence Based Diagnosis
PPSX
Validity and Reliability
DOC
Ch 14 diagn test.doc
PPTX
Epidemiological Approaches for Evaluation of diagnostic tests.pptx
PPTX
Diagnostic Tests for PGs
PPT
Lecture 3 -Screening tests.ppt presentation
PPTX
Diagnotic and screening tests
PPTX
Epidemiological method to determine utility of a diagnostic test
PPTX
Screening test (basic concepts)
PPTX
Screening and diagnostic testing
PPT
Evidence-based diagnosis
PDF
Bayesian clinical test
PPT
Validity and reliability of screening/ diagnostic tests
PPT
Validity of a screening test
PPTX
clinical trial and study design copy 2.pptx
PPTX
Testing and Screening Nov 2021-undergraduate.pptx
PPTX
session three epidemiology.pptx
PPTX
session three epidemiology.pptx
Tests of diagnostic accuracy
Dr Amit Diagnostic Tests.pptx
Evidence Based Diagnosis
Validity and Reliability
Ch 14 diagn test.doc
Epidemiological Approaches for Evaluation of diagnostic tests.pptx
Diagnostic Tests for PGs
Lecture 3 -Screening tests.ppt presentation
Diagnotic and screening tests
Epidemiological method to determine utility of a diagnostic test
Screening test (basic concepts)
Screening and diagnostic testing
Evidence-based diagnosis
Bayesian clinical test
Validity and reliability of screening/ diagnostic tests
Validity of a screening test
clinical trial and study design copy 2.pptx
Testing and Screening Nov 2021-undergraduate.pptx
session three epidemiology.pptx
session three epidemiology.pptx
Ad

Recently uploaded (20)

PPTX
climate analysis of Dhaka ,Banglades.pptx
PPTX
1_Introduction to advance data techniques.pptx
PPTX
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
PDF
annual-report-2024-2025 original latest.
PDF
Lecture1 pattern recognition............
PPTX
IBA_Chapter_11_Slides_Final_Accessible.pptx
PDF
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
PPTX
oil_refinery_comprehensive_20250804084928 (1).pptx
PPTX
Market Analysis -202507- Wind-Solar+Hybrid+Street+Lights+for+the+North+Amer...
PPTX
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
PDF
Clinical guidelines as a resource for EBP(1).pdf
PPTX
STERILIZATION AND DISINFECTION-1.ppthhhbx
PPT
Reliability_Chapter_ presentation 1221.5784
PPTX
Business Ppt On Nestle.pptx huunnnhhgfvu
PDF
Introduction to the R Programming Language
PPTX
Introduction to machine learning and Linear Models
PPT
Quality review (1)_presentation of this 21
PDF
Galatica Smart Energy Infrastructure Startup Pitch Deck
PPTX
Data_Analytics_and_PowerBI_Presentation.pptx
climate analysis of Dhaka ,Banglades.pptx
1_Introduction to advance data techniques.pptx
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
annual-report-2024-2025 original latest.
Lecture1 pattern recognition............
IBA_Chapter_11_Slides_Final_Accessible.pptx
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
oil_refinery_comprehensive_20250804084928 (1).pptx
Market Analysis -202507- Wind-Solar+Hybrid+Street+Lights+for+the+North+Amer...
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
Clinical guidelines as a resource for EBP(1).pdf
STERILIZATION AND DISINFECTION-1.ppthhhbx
Reliability_Chapter_ presentation 1221.5784
Business Ppt On Nestle.pptx huunnnhhgfvu
Introduction to the R Programming Language
Introduction to machine learning and Linear Models
Quality review (1)_presentation of this 21
Galatica Smart Energy Infrastructure Startup Pitch Deck
Data_Analytics_and_PowerBI_Presentation.pptx
Ad

Probability.pdf.pdf and Statistics for R

  • 1. PROBABILITY GR Miyambu School of Science & Technology Department of Statistical Sciences
  • 2. Types of Probability Types of Probability Frequency What is the probability of a randomly chosen person dying in the next year? Model-based What is the probability of a child being affected by cystic fibrosis given one of the parents is a carrier of the disease? Subjective What is the probability that a particular patient has heart disease given they have chest pain?
  • 3. Properties of Probability The three types of probability all have the following properties. 1. All probabilities lie between 0 and 1. 2. When the outcome can never happen the probability is 0. 3. When the outcome will denitely happen the probability is 1.
  • 4. Diagnostic Test • A diagnostic test is any approach used to gather clinical information for the purpose of making a clinical decision (i.e., diagnosis). • Some examples of diagnostic tests include X-rays, biopsies, pregnancy tests, medical histories, and results from physical examinations. • From a statistical point of view there are two points to keep in mind: 1. the clinical decision-making process is based on probability; 2. the goal of a diagnostic test is to move the estimated probability of disease toward either end of the probability scale (i.e., 0 rules out disease, 1 confirms the disease).
  • 5. Uses of Diagnostic Test • In making a diagnosis, a clinician rst establishes a possible set of diagnostic alternatives and then attempts to reduce these by progressively ruling out specic diseases or conditions. • Alternatively, the clinician may have a strong hunch that the patient has one particular disease and he then sets about conrming it. • Given a particular diagnosis, a good diagnostic test should indicate either that the disease is very unlikely or that it is very probable. • In a practical sense it is important to realize that a diagnostic test is useful only if the result influences patient management since, if the management is the same for two different conditions, there is little point in trying strenuously to distinguish between them.
  • 6. Analysis of Diagnostic Test Disease No Disease Test Positive a (true positives) b (false positives) Test Negative c (false negatives) d (true negatives)
  • 7. Gold Standard • The "Gold Standard" is the method used to obtain a definitive diagnosis for a particular disease; it may be biopsy, surgery, autopsy or an acknowledged standard. • Gold Standards are used to define true disease status against which the results of a new diagnostic test are compared. • Here are a number of definitive diagnostic tests that will confirm whether or not you have the disease. • Some of these are quite invasive and this is a major reason why new diagnostic procedures are being developed. Target Disorder Gold Standard Breast cancer Excisional biopsy Prostate cancer Transrectal biopsy Coronar stenosis Coronary angiography Myocardial infarction Catheterization Strep throat Throat culture
  • 8. Sensitivity and specicity • Many diagnostic test results are given in the form of a continuous variable (that is one that can take any value within a given range), such as diastolic blood pressure or haemoglobin level. • However, for ease of discussion we will rst assume that these have been divided into positive or negative results. • For example, a positive diagnostic result of ‘hypertension’ is a diastolic blood pressure greater than 90 mmHg; • whereas for ‘anaemia’, a haemoglob in level less than 10 g/ d l is required. • For every diagnostic procedure (which may involve a laboratory test of a sample taken) there is a set of fundamental questions that should be asked.
  • 9. Sensitivity and specicity • First, if the d isease is present, what is the prob a b ility that the test result will be positive? • This leads to the notion of the sensitivity of the test. • Secon d , if the d isease is a b sent, what is the pro b a b ility that the test result will be negative? • This question refers to the specicity of the test. • These questions can b e answere donly if it is known what the ‘true’ diagnosis is. • In the case of organic d isease this can b e d etermine d b y b iopsy or, for example, an expensive an drisky proce d ure such as angiography for heart disease. • In other situations it may b e b y ‘expert’ opinion. Such tests provi d e the so-called ‘gold standard’.
  • 10. Example Diagnosis of heart disease • Consider the results of an assay of N-terminal pro-brain natriuretic peptide (NT- proBNP) for diagnosis of heart failure in a general population survey in those over 45 years of age and in patients with existing diagnosis of heart failure obtained by Hobbs et al (2002) and summarised in the table. • Heart failure was identied when NT-proBNP >36 pmol/l. NT-proBNP (pmol/l) Confirmed Diagnosis of Heart Failure Present Absent Total (D+) (D-) > 36 Positive (T+) 35 (a) 7 (b) 42 Negative (T-) 68 (c) 300 (d) 368 Total 103 307 410
  • 11. Sensitivity and Specicity • The prevalence of heart failure in these subjects is (a + c)/(a + b + c + d) • P(D+)=?? • The sensitivity of a test is theproportion of those with thedisease who also have a positive test result. • The sensitivity is a/(a + c)=?? • N o w sensitivity is the pr o bability o f a p o s it iv e test result (e v ent T +) g iv en that the d isease is present (event D+) and can be written as p(T+|D+)=??, where the ‘|’ is read as ‘given’. • The specicity of the test is the proportion of those without disease who give a negative test result. • Thus the specicity is d/(b + d)=?? • N o w specicity is the pr o bability o f a ne g ativ e test result (e v ent T -) g iv en that the d isease is absent (event D-) and can be written as p(T-|D-)=?? • Since sensitivity is conditional on the disease being present, and specicity on the disease being absent, in theory, they are unaffected by disease prevalence.
  • 12. Sensitivity and Specicity • Sensitivity and specicity are useful statistics because they will yield consistent results for the diagnostic test in a variety of patient groups with different disease prevalences. • This is an important point; sensitivity and specicity are characteristics of the test, not the population to which the test is applied. • Although indeed they are independent of disease prevalence, in practice if the disease is very rare, the accuracy with which one can estimate the sensitivity will be limited. • Two other terms in common use are: the false negative rate (or probability of a false negative) which is given by c/(a + c) = 1 - Sensitivity, and the false positive rate (or probability of a false positive) or b/(b + d) = 1 - Specicity. • Since sensitivity = 1 - Probability(false negative) and specicity = 1 - Probability(false positive), a possibly useful mnemonic to recall this is that ‘sensitivity’ and ‘negative’ have ‘n’s in them and ‘specicity’ and ‘positive’ have ‘p’s in them.
  • 13. Summary of Definitions of Sensitivity and Specificity Test result True diagnosis Disease present Disease absent Positive Sensitivity Probability of a false positive Negative Probability of a false negative Specicity
  • 14. Rates Assuming a Predicted Condition • The predictive value refers to the likelihood for determining an outbreak or non-outbreak of an infectious disease based on early warning results. • Predictive values can be classified into the positive predictive value (PPV) and the negative predictive value (PNV). • Positive predictive value is the proportion of individuals with positive test results that are correctly diagnosed and actually have the disease. 𝑃𝑃𝑃𝑃𝑃𝑃 = 𝑝𝑝 𝐷𝐷 + 𝑇𝑇 + = 𝑎𝑎 𝑎𝑎 + 𝑏𝑏 • Negative predictive value is the proportion of individuals with negative test results that are correctly diagnosed and do not have the disease. 𝑁𝑁𝑁𝑁𝑁𝑁 = 𝑝𝑝 𝐷𝐷 − 𝑇𝑇 − = 𝑑𝑑 𝑐𝑐 + 𝑑𝑑 • False Omission Rate is the proportion of the individuals with a negative test result for which the true condition is positive. 𝐹𝐹𝐹𝐹𝐹𝐹 = 𝑐𝑐 𝑐𝑐 + 𝑑𝑑 • The false discovery rate is the proportion of the individuals with a positive test result for which the true condition is negative. 𝐹𝐹𝐹𝐹𝐹𝐹 = 𝑏𝑏 𝑎𝑎 + 𝑏𝑏
  • 15. Whole Table Rates • P r evalence is the proportion of a population who have a specific characteristic in a given time period. • The prevalence may be estimated from the table if all the individuals are randomly sampled from the population. 𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃𝑃 = 𝑎𝑎 + 𝑐𝑐 𝑎𝑎 + 𝑏𝑏 + 𝑐𝑐 + 𝑑𝑑 • The Accu racy o r P ro p orti o n C o rrectly Cla ss ifie d reflects the total proportion of individuals that are correctly classified. 𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴𝐴 𝑃𝑃𝑃𝑃𝑃𝑃 = 𝑎𝑎 + 𝑑𝑑 𝑎𝑎 + 𝑏𝑏 + 𝑐𝑐 + 𝑑𝑑 • The proportion incorrectly classified reflects the total proportion of individuals that are incorrectly classified. 𝑃𝑃𝑃𝑃𝑃𝑃 = 𝑏𝑏 + 𝑐𝑐 𝑎𝑎 + 𝑏𝑏 + 𝑐𝑐 + 𝑑𝑑
  • 16. Likelihood Ratio • The clear simplicity of diagnostic test data, particularly when presented as a 2 x 2 table, is confounded by many ways of reporting the results. • The likelihood ratio (LR) is a simple measure combining sensitivity and specificity • We have positive likelihood ratio (LR+) defined as 𝐿𝐿𝐿𝐿 += 𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆 1 − 𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆 = 𝑇𝑇𝑇𝑇𝑇𝑇 𝐹𝐹𝐹𝐹𝐹𝐹 • This gives a ratio of the test being positive for patients with disease compared with those without disease. Aim to be much greater than 1 for a good test. • And negative likelihood ratio (LR-) defined as 𝐿𝐿𝐿𝐿 −= 1 − 𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆 𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆𝑆 = 𝐹𝐹𝐹𝐹𝐹𝐹 𝑇𝑇𝑇𝑇𝑇𝑇 • This gives a ratio of the test being negative for patients with disease compared with those without disease. Aim to be considerably less than 1 for a good test. • The likelihood odds ratio is the ratio of the positive likelihood ratio to the negative likelihood ratio. • In some calculation methods, ½ is added to all counts before the calculation of LOR, to avoid dividing by 0. 𝐿𝐿𝐿𝐿𝐿𝐿 = 𝐿𝐿𝐿𝐿 + 𝐿𝐿𝐿𝐿 −
  • 17. Distributions Types of Distributions • Binomial • Poisson • Normal
  • 19. Properties of Normal Distribution
  • 20. How do we use the Normal distribution? • The Normal probability distribution can be used to calculate the probability of different values occurring. • We could be interested in: what is the probability of being within 1 standard deviation of the mean (or outside it)? • We can use a Normal distribution table which tells us the probability of being outside this value. • The Normal distribution also has other uses in statistics and is often used as an approximation to the Binomial and Poisson distributions.
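In place of a printed Normal table, the probability of lying within (or outside) one standard deviation of the mean can be sketched with the error function from Python's standard library. The helper name `normal_cdf` is an assumption made for this illustration.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative probability P(X <= x) for a Normal(mean, sd) variable."""
    z = (x - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


within_1sd = normal_cdf(1.0) - normal_cdf(-1.0)
outside_1sd = 1.0 - within_1sd
print(f"within 1 SD:  {within_1sd:.4f}")   # about 0.68
print(f"outside 1 SD: {outside_1sd:.4f}")  # about 0.32
```

The same function with ±1.96 in place of ±1 gives a probability of about 0.95, which is where the multiplier 1.96 in a 95% confidence interval comes from.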
  • 21. Populations and Samples • In the statistical sense a population is a theoretical concept used to describe an entire group of individuals in whom we are interested. • Examples are the population of all patients with diabetes mellitus, or the population of all middle-aged men. • Parameters are quantities used to describe characteristics of such populations. • Thus the proportion of diabetic patients with nephropathy, or the mean blood pressure of middle-aged men, are characteristics describing the two populations. • Generally, it is costly and labour intensive to study the entire population. • Therefore we collect data on a sample of individuals from the population who we believe are representative of that population, that is, they have similar characteristics to the individuals in the population. • We then use them to draw conclusions, technically make inferences, about the population as a whole.
  • 23. Populations and Samples • The process is represented schematically in the figure. • So, samples are taken from populations to provide estimates of population parameters. • It is important to note that although the study populations are unique, samples are not, as we could take more than one sample from the target population if we wished. • Thus for middle-aged men there is only one normal range for blood pressure. • However, one investigator taking a random sample from a population of middle-aged men and measuring their blood pressure may obtain a different normal range from another investigator who takes a different random sample from the same population of such men. • By studying only some of the population we have introduced a sampling error.
  • 25. Sample • In some circumstances the sample may consist of all the members of a specifically defined population. • For practical reasons, this is only likely to be the case if the population of interest is not too large. • If all members of the population can be assessed, then the estimate of the parameter concerned is derived from information obtained on all members and so its value will be the population parameter itself. • In this idealised situation we know all about the population as we have examined all its members and the parameter is estimated with no bias. • The dotted arrow in Figure 6.1 connecting the population ellipse to the population parameter box illustrates this. • However, this situation will rarely be the case so, in practice, we take a sample which is often much smaller in size than the population under study.
  • 26. Sample • Ideally we should aim for a random sample. • A list of all individuals from the population is drawn up (the sampling frame), and individuals are selected randomly from this list, that is, every possible sample of a given size in the population has an equal chance of being chosen. • Sometimes there may be difficulty in constructing this list, or we may have to 'make do' with those subjects who happen to be available, which is termed a convenience sample. • Essentially, if we take a random sample then we obtain an unbiased estimate of the corresponding population parameter, whereas a convenience sample may provide a biased estimate, but by how much we will not know.
  • 27. Properties of the distribution of sample means • The mean of all the sample means will be the same as the population mean. • The standard deviation of all the sample means is known as the standard error (SE) of the mean or SEM. • Given a large enough sample size, the distribution of sample means will be roughly Normal regardless of the distribution of the variable.
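These properties can be illustrated with a small simulation. The sketch below draws repeated samples from a skewed (exponential) population with mean 1 and standard deviation 1; the population, sample size, number of samples and seed are all arbitrary choices made for this illustration.

```python
import random
import statistics

random.seed(1)       # fixed seed so the run is repeatable
sample_size = 50     # observations per sample
n_samples = 5000     # number of repeated samples

# Mean of each repeated sample from an exponential population (mean 1, SD 1).
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(sample_size)])
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
theoretical_se = 1.0 / sample_size ** 0.5  # population SD / sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1.0)")
print(f"SD of sample means:   {sd_of_means:.3f} (SD/sqrt(n) = {theoretical_se:.3f})")
```

Even though the population is strongly skewed, the mean of the sample means sits close to the population mean and their standard deviation is close to the population standard deviation divided by the square root of the sample size.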
  • 28. Properties of standard errors • The standard error (SE) is a measure of the precision of a sample estimate. • It provides a measure of how far from the true value in the population the sample estimate is likely to be. • All standard errors have the following interpretation: • A large standard error indicates that the estimate is imprecise. • A small standard error indicates that the estimate is precise. • The standard error is reduced, that is, we obtain a more precise estimate, if the size of the sample is increased.
  • 30. Worked example: Standard error of a mean – birthweight of preterm infants • Simpson (2004) reported the birthweights of 98 infants who were born prematurely, for which n = 98, $\bar{x}$ = 1.31 kg, s = 0.42 kg and $SE(\bar{x})$ = ? • The standard error provides a measure of the precision of our sample estimate of the population mean birthweight.
  • 31. Worked example: Standard error of a proportion – acupuncture and headache • Melchart et al (2005) give the proportion who responded to acupuncture treatment in 124 patients with tension type headache as p = 0.46. • We assume the numbers who respond have a Binomial distribution and from Table 6.3 we find the standard error is $SE(p)$ = ?
  • 32. Worked example: Standard error of a rate – cadaveric heart donors • The study of Wight et al (2004) gave the number of organ donations calculated over a two-year period as r = 1.82 per day. • We assume the number of donations follows a Poisson distribution and from Table 6.3 we find the standard error is $SE(r)$ = ?
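The three worked examples above can be sketched in Python. Table 6.3 is not reproduced in these slides, so the formulas below are the usual textbook expressions and are an assumption about what the table contains: s/√n for a mean, √(p(1 − p)/n) for a Binomial proportion, and √(r/n) for a Poisson rate. The number of days in the two-year period (731) is also an assumption, since the slide does not state it.

```python
import math


def se_mean(s, n):
    """Standard error of a mean: s / sqrt(n)."""
    return s / math.sqrt(n)


def se_proportion(p, n):
    """Standard error of a proportion (Binomial): sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def se_rate(r, n_units):
    """Standard error of a rate (Poisson): sqrt(r / number of time units)."""
    return math.sqrt(r / n_units)


se_birthweight = se_mean(s=0.42, n=98)          # kg
se_acupuncture = se_proportion(p=0.46, n=124)
se_donations = se_rate(r=1.82, n_units=731)     # 731 days is an assumption

print(f"SE(mean birthweight) = {se_birthweight:.3f} kg")
print(f"SE(proportion)       = {se_acupuncture:.3f}")
print(f"SE(rate)             = {se_donations:.3f} per day")
```

Under these assumptions the three standard errors come out at roughly 0.04 kg, 0.04 and 0.05 per day respectively.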
  • 33. Standard Errors of Differences
  • 34. Example: Difference in means – physiotherapy for patients with lung disease • Griffiths et al (2000) report the results of a randomised controlled trial to compare a pulmonary rehabilitation programme (Intervention) with standard medical management (Control) for the treatment of chronic obstructive pulmonary disease. • One outcome measure was the walking capacity (distance walked in metres from a standardised test) of the patient assessed 6 weeks after randomisation. • Further suppose such measurements can be assumed to follow a Normal distribution. • The results from the 184 patients are expressed using the group means and standard deviations (SD) as follows: $n_{Int} = 93$, $\bar{x}_{Int} = 211$, $SD(x_{Int}) = s_{Int} = 118$; $n_{Con} = 91$, $\bar{x}_{Con} = 123$, $SD(x_{Con}) = s_{Con} = 99$. • From these data $d = \bar{x}_{Int} - \bar{x}_{Con} = 211 - 123 = 88$ m and the corresponding standard error is $SE(d) =$
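The slide's expression for this standard error is cut off in the transcript. As a hedged sketch, assuming the usual unpooled formula for the standard error of a difference between two independent means (the slides may instead use a pooled version), the calculation with the figures above would be:

```latex
SE(d) = \sqrt{\frac{s_{Int}^{2}}{n_{Int}} + \frac{s_{Con}^{2}}{n_{Con}}}
      = \sqrt{\frac{118^{2}}{93} + \frac{99^{2}}{91}}
      \approx \sqrt{149.72 + 107.70}
      \approx 16.04 \text{ m}
```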
  • 35. Worked example: Difference in proportions – post-natal urinary incontinence • The results of a randomised controlled trial conducted by Glazener et al (2001) to assess the effect of nurse assessment with reinforcement of pelvic floor muscle training exercises and bladder training (Intervention) compared with standard management (Control) among women with persistent incontinence three months postnatally are summarised in Table 6.2.
  • 36. Condence intervals for an estimate • A condence interval denes a range of values within which our population parameter is likely to lie • Such an interval for the population mean 𝜇𝜇 is dened by ̅ 𝑥𝑥 − 1,96 × 𝑆𝑆𝑆𝑆 ̅ 𝑥𝑥 𝑡𝑡𝑡𝑡 ̅ 𝑥𝑥 + 1,96 × 𝑆𝑆𝑆𝑆 ̅ 𝑥𝑥 and, in this case, is termed a 95% condence interval as it includes the multiplier 1.96.
  • 40. Confidence Intervals for Differences • To calculate a confidence interval for a difference in means, for example $d = \mu_A - \mu_B$, the same structure for the confidence interval of a single mean is used but with $\bar{x}$ replaced by $\bar{x}_1 - \bar{x}_2$ and $SE(\bar{x})$ replaced by $SE(\bar{x}_1 - \bar{x}_2)$. • Algebraic expressions for these standard errors are given in Table 6.4 (Section 6.10). • Thus the 95% CI is given by $(\bar{x}_1 - \bar{x}_2) - 1.96 \times SE(\bar{x}_1 - \bar{x}_2)$ to $(\bar{x}_1 - \bar{x}_2) + 1.96 \times SE(\bar{x}_1 - \bar{x}_2)$
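A sketch of this interval, applied to the walking-distance data from the physiotherapy example (Intervention: n = 93, mean 211 m, SD 118; Control: n = 91, mean 123 m, SD 99), is given below. Table 6.4 is not reproduced in the slides, so the unpooled standard error used here is an assumption, and the function name is hypothetical.

```python
import math


def ci_diff_means(m1, s1, n1, m2, s2, n2, z=1.96):
    """CI for a difference in two independent means.

    Assumes the unpooled standard error sqrt(s1^2/n1 + s2^2/n2);
    z = 1.96 gives a 95% interval.
    """
    diff = m1 - m2
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    return diff - z * se, diff + z * se


lower_d, upper_d = ci_diff_means(211, 118, 93, 123, 99, 91)
print(f"95% CI for difference: {lower_d:.1f} to {upper_d:.1f} m")
```

Under this assumption the interval is roughly 57 m to 119 m, which excludes zero.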
  • 43. More Accurate Confidence Intervals for a Proportion • To use this method, we first need to calculate three quantities: $A = 2r + z^2$; $B = z\sqrt{z^2 + 4r(1 - p)}$; and $C = 2(n + z^2)$, where $r$ is the number of observed events, $p = r/n$, and $z$ is the appropriate value from the standard Normal table (1.96 for a 95% CI). • The recommended confidence interval is given by $\dfrac{A - B}{C}$ to $\dfrac{A + B}{C}$ • When there are no observed events, $r = 0$ and hence $p = 0/n = 0$, the recommended CI simplifies to $0$ to $\dfrac{z^2}{n + z^2}$ • While when $r = n$ so that $p = 1$, the CI becomes $\dfrac{n}{n + z^2}$ to $1$
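The three quantities A, B and C can be turned into a short function. This is a sketch assuming B = z√(z² + 4r(1 − p)), the form that reproduces both special cases on the slide (this interval is commonly known as the Wilson score interval); the function name and the example counts are hypothetical.

```python
import math


def recommended_ci(r, n, z=1.96):
    """Confidence interval for a proportion from r events in n subjects.

    Uses A = 2r + z^2, B = z * sqrt(z^2 + 4r(1 - p)), C = 2(n + z^2)
    and returns ((A - B) / C, (A + B) / C). z = 1.96 gives a 95% interval.
    """
    p = r / n
    A = 2 * r + z ** 2
    B = z * math.sqrt(z ** 2 + 4 * r * (1 - p))
    C = 2 * (n + z ** 2)
    return (A - B) / C, (A + B) / C


# Hypothetical example: 5 events in 20 subjects.
print(recommended_ci(5, 20))

# Special cases from the slide: no events, and all events.
print(recommended_ci(0, 10))    # lower limit 0, upper limit z^2 / (n + z^2)
print(recommended_ci(10, 10))   # lower limit n / (n + z^2), upper limit 1
```

Unlike the simple p ± 1.96 × SE(p) interval, the limits from this method always stay between 0 and 1.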