Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012
Descriptive analysis
of categorical variables
Tuan V. Nguyen
Professor and NHMRC Senior Research Fellow
Garvan Institute of Medical Research
University of New South Wales
Sydney, Australia
What we are going to learn
•  Categorical data
•  Probability
•  Statistical description of
–  Prevalence
–  Incidence
–  Rate
Measurement and comparison
To find out whether a community is healthy or
unhealthy:
•  first measure one or more indicators of health
(deaths, new cases of disease, etc.)
•  compare the results with another community or
group.
Measures of Disease Occurrence
•  Incidence proportion (risk)
•  Incidence rate (density)
•  Prevalence
All three are loosely called “rates” (but only the
second is a true rate)
Types of populations
We measure disease occurrence in two types of
populations:
•  Closed populations = “cohorts”
•  Open populations
Closed population = cohort
•  Cohort word origin: (Latin cohors) basic tactical unit of a Roman legion
•  Epi cohort ≡ a group of individuals followed over time
Open population
•  Inflow (immigration, births)
•  Outflow (emigration, death)
•  An open population in “steady state” (constant size) is said to be stationary
Numerators and denominators
•  “Rates” are composed of numerators and denominators
•  Numerator → case count
–  Incidence count → onsets
–  Prevalence count → old + new cases
•  Denominators → reflection of population size
Denominators
Denominators: reflection of population size
Incidence proportion
•  Synonyms: risk, cumulative incidence, attack rate
•  Interpretation: average risk
$$\mathrm{IP} = \frac{\text{no. of onsets over time}}{\text{no. at risk at beginning of study}}$$
Can be calculated only in cohorts
Example of IP
•  Objective: estimate risk of uterine cancer
•  Recruit cohort of 1000 women
•  100 had hysterectomies, leaving 900 at risk
•  Follow at-risk individuals for 10 years
•  Observe 10 onsets of uterine cancer
$$\mathrm{IP} = \frac{\text{no. of onsets}}{\text{no. at risk}} = \frac{10\ \text{women}}{900\ \text{women}} = 0.0111$$
10-year average risk is .011 or 1.1%.
Incidence rate
•  Synonyms: incidence density, person-time rate
•  Interpretation A: “Speed” at which events occur
•  Interpretation B: When disease is rare: rate per person-year ≈ one-year risk
•  Calculated differently in closed and open populations
$$\mathrm{IR} = \frac{\text{no. of onsets}}{\text{Sum of person-time at risk}}$$
Example of IR
•  Objective: estimate rate of uterine cancer
•  Recruit cohort of 1000 women
•  100 had hysterectomies, leaving 900 at risk
•  Follow at-risk individuals for 10 years
•  Observe 10 onsets of uterine cancer
$$\mathrm{IR} = \frac{\text{no. of onsets}}{\text{person-time}} = \frac{10\ \text{women}}{900\ \text{women} \times 10\ \text{years}} = \frac{10}{9000\ \text{years}} = \frac{0.00111}{\text{year}}$$
Rate is .00111 per person-year, or 11.1 per 10,000 person-years.
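The two uterine cancer examples can be checked with a few lines of R. This is only a minimal sketch: the variable names are illustrative, and, as on the slides, every woman at risk is assumed to contribute the full 10 years of follow-up.

```r
# Cohort from the examples: 1000 women recruited, 100 with prior hysterectomy
n_recruited <- 1000
n_not_at_risk <- 100
n_at_risk <- n_recruited - n_not_at_risk    # 900 women at risk
onsets <- 10
follow_up_years <- 10

# Incidence proportion: onsets divided by the number at risk at the start
ip <- onsets / n_at_risk

# Incidence rate: onsets divided by person-time
# (assumption: full follow-up for every woman at risk)
person_years <- n_at_risk * follow_up_years
ir <- onsets / person_years

round(ip, 4)           # 10-year average risk
round(ir * 10000, 1)   # rate per 10,000 person-years
```

The same cohort gives a proportion (no units, tied to a 10-year window) and a rate (per unit of person-time), which is why only the second is a true rate.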
Individual follow-up over time
$$\mathrm{IR} = \frac{\text{onsets}}{\sum \text{person-time}} = \frac{2\ \text{onsets}}{25\ \text{years} + 50\ \text{years}} = \frac{2\ \text{onsets}}{75\ \text{years}}$$
$$= 0.0267\ \text{per person-year} = 2.67\ \text{per 100 person-years}$$
Mortality and life expectancy
In stationary populations, and in cohorts with complete follow-up, the mortality rate is the reciprocal of life expectancy (and vice versa).
$$\text{Life expectancy} = \frac{1}{\text{Mortality rate}}$$
Example: for a mortality rate of .0267 per year
$$\text{Life expectancy} = \frac{1}{0.0267 / \text{year}} = 37.5\ \text{years}$$
This cohort has life expectancy
$$\frac{(25 + 50)\ \text{years}}{2} = 37.5\ \text{years}$$
This cohort has a mortality rate of
$$\frac{2\ \text{deaths}}{(25 + 50)\ \text{years}} = 0.0267\ \text{year}^{-1}$$
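The reciprocal relation between mortality rate and life expectancy can be verified for the two-person cohort. The sketch below assumes, as the example does, that both individuals were followed until death (25 and 50 years); the variable names are illustrative.

```r
# Follow-up time in years for each cohort member; both died during follow-up
follow_up <- c(25, 50)
deaths <- length(follow_up)

# Person-time rate: deaths per person-year
mortality_rate <- deaths / sum(follow_up)

# Life expectancy: average number of years lived
life_expectancy <- mean(follow_up)

round(mortality_rate, 4)
life_expectancy

# With complete follow-up the two quantities are reciprocals
all.equal(life_expectancy, 1 / mortality_rate)
```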
Incidence rate in open population
$$\mathrm{IR} = \frac{\text{onsets}}{\text{Avg population size} \times \text{duration of observation}}$$
Example: 2,391,630 deaths in 1999 (one year)
Population size = 272,705,815
$$\mathrm{IR} = \frac{2{,}391{,}630\ \text{deaths}}{272{,}705{,}815\ \text{persons} \times 1\ \text{year}} = 0.008770\ \text{deaths}\ \text{year}^{-1} = 877\ \text{per 100,000 person-years}$$
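For an open population the same calculation takes the average population size times the observation period as the person-time. A minimal sketch using the 1999 figures from the example (variable names are illustrative):

```r
# Figures quoted in the example
deaths <- 2391630
avg_population <- 272705815
years_observed <- 1

# Person-time is approximated by average population size x duration
person_years <- avg_population * years_observed
ir <- deaths / person_years

round(ir, 6)          # deaths per person-year
round(ir * 100000)    # deaths per 100,000 person-years
```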
Prevalence
•  Point prevalence ≡ prevalence at a particular point in time
•  Period prevalence ≡ prevalence over a period of time
•  Interpretation A: proportion with condition
•  Interpretation B: probability a person selected at random will have the condition
$$\text{Prevalence} = \frac{\text{no. of old and new cases}}{\text{no. of people}}$$
Example of prevalence
•  Recruit 1000 women
•  Ascertain: 100 with hysterectomies
$$\text{Prevalence} = \frac{\text{no. of cases}}{\text{no. of people}} = \frac{100\ \text{people}}{1000\ \text{people}} = 0.10$$
Prevalence in sample is 10%
Dynamic prevalence
Ways to increase prevalence:
•  Increase incidence → increase inflow
•  Increase average duration of disease → decreased outflow
Prevalence and incidence
When disease is rare and the population is stationary:
$$\text{prevalence} \approx (\text{incidence rate}) \times (\text{average duration})$$
Example:
•  Incidence rate = 0.01 / year
•  Average duration of the illness = 2 years.
•  Prevalence ≈ 0.01 / year × 2 years = 0.02
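The sketch below repeats the arithmetic of the example and, as an added comparison that is not on the slide, sets it beside the steady-state relation usually written as prevalence odds = incidence rate × average duration. The variable names are illustrative.

```r
incidence_rate <- 0.01   # per person-year
avg_duration <- 2        # years

# Rare-disease approximation used in the example
prevalence_approx <- incidence_rate * avg_duration

# Steady-state relation: P / (1 - P) = incidence rate x duration
# (added for comparison; an assumption beyond the slide)
odds <- incidence_rate * avg_duration
prevalence_steady <- odds / (1 + odds)

round(c(approx = prevalence_approx, steady_state = prevalence_steady), 4)
```

When the product is small the two values nearly coincide, which is why the approximation is restricted to rare diseases.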
Estimation of 95% confidence interval
Proportions
•  Proportion of event in the sample, denoted “p hat”:
$$\hat{p} = \frac{x}{n}$$
where x = no. of events and n = sample size
Proportion, cont
Two of 10 individuals in the sample have a risk factor for disease X
The prevalence of this risk factor in the sample is:
$$\hat{p} = \frac{x}{n} = \frac{2}{10} = 0.2\ \text{(or 20\%)}$$
Inference about a Proportion
How good is the sample proportion at estimating the population proportion p?
Consider what would happen if we took repeated samples, each of size n, from the population. How would the sample proportions be distributed?
Normal Approximation for Proportions
$$\hat{p} \sim N\!\left(p,\ \sqrt{\frac{pq}{n}}\right), \quad \text{where } q = 1 - p$$
Here the second argument of N is the standard deviation of the sample proportion.
Normal approximation
H0: p = p0 vs. Ha: p ≠ p0 where p0 represents the proportion specified by the null hypothesis
Test statistic
$$z_{\text{stat}} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$$
Example
A sample of n = 57 finds 17 smokers (p-hat = 17 / 57 = 0.2982).
The national average for smoking prevalence is 0.25.
Is the proportion in the sample significantly different from the national average?
H0: p = 0.25 vs. Ha: p ≠ 0.25
$$z_{\text{stat}} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = \frac{.2982 - .25}{\sqrt{\dfrac{.25 \cdot .75}{57}}} = 0.84$$
The sample proportion is not significantly different from the national average.
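The smoking example can be reproduced in base R. This is a sketch under the same Normal approximation; the variable names are illustrative, and the two-sided P value is an addition that the slide does not report.

```r
x <- 17      # smokers in the sample
n <- 57      # sample size
p0 <- 0.25   # proportion under the null hypothesis

p_hat <- x / n
se0 <- sqrt(p0 * (1 - p0) / n)    # standard error under H0
z_stat <- (p_hat - p0) / se0

# Two-sided P value from the standard Normal distribution
p_value <- 2 * pnorm(-abs(z_stat))

round(z_stat, 2)
p_value > 0.05    # TRUE means "not significant at the 5% level"
```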
Confidence Interval for Proportion
A (1 − α)100% confidence interval for p is:
$$\tilde{p} \pm z_{1-\alpha/2} \cdot \sqrt{\frac{\tilde{p}\tilde{q}}{\tilde{n}}}$$
where $\tilde{x} = x + 2$, $\tilde{n} = n + 4$, $\tilde{p} = \tilde{x}/\tilde{n}$, and $\tilde{q} = 1 - \tilde{p}$
This method is called the “plus four method” because it adds four imaginary points during calculations. Its actual coverage is usually much closer to the nominal level than that of the traditional Normal method, especially in small samples.
Confidence Interval, example
Based on n = 57 and x = 17, the 95% CI for the prevalence of smoking in the population is:
$$\tilde{x} = x + 2 = 17 + 2 = 19; \quad \tilde{n} = n + 4 = 57 + 4 = 61$$
$$\tilde{p} = \frac{\tilde{x}}{\tilde{n}} = \frac{19}{61} = .3115; \quad \tilde{q} = 1 - \tilde{p} = 1 - .3115 = .6885$$
$$SE_{\tilde{p}} = \sqrt{\frac{\tilde{p}\tilde{q}}{\tilde{n}}} = \sqrt{\frac{(.3115)(.6885)}{61}} = .0593$$
$$z = 1.96 \text{ for 95\% confidence}$$
$$\text{95\% CI for } p = \tilde{p} \pm z \cdot SE_{\tilde{p}} = .3115 \pm (1.96)(.0593) = .3115 \pm .1162 = (.1953,\ .4277)$$
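The plus-four steps can be wrapped in a small function. The sketch below is one possible implementation in base R; `plus_four_ci` is a hypothetical helper name, not a function from any package.

```r
# Plus-four confidence interval for a proportion
# (hypothetical helper; adds 2 successes and 2 failures before computing)
plus_four_ci <- function(x, n, conf_level = 0.95) {
  x_tilde <- x + 2
  n_tilde <- n + 4
  p_tilde <- x_tilde / n_tilde
  se <- sqrt(p_tilde * (1 - p_tilde) / n_tilde)
  z <- qnorm(1 - (1 - conf_level) / 2)
  c(estimate = p_tilde,
    lower = p_tilde - z * se,
    upper = p_tilde + z * se)
}

ci <- plus_four_ci(17, 57)
round(ci, 4)
```

Using `qnorm()` rather than the rounded value 1.96 changes the limits only beyond the fourth decimal place in this example.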
Sample Size and Power
Three approaches:
•  n needed to estimate p with margin of error m (for
confidence interval)
•  n needed to test H0 at given α level and power
•  The power of testing H0 under stated conditions
n needed to achieve margin of error m
$$n = \frac{z_{1-\alpha/2}^{2}\, p^{*} q^{*}}{m^{2}}$$
•  where p* represents an educated guess for the population proportion p and q* = 1 − p* (when no educated guess is available, let p* = .5)
•  Round up to next integer to ensure stated precision
n needed to achieve m, example
Suppose our educated guess for the proportion is p* = 0.30
For margin of error of .03, use:
$$n = \frac{(1.96)^{2}(.30)(.70)}{.03^{2}} = 896.4 \Rightarrow 897$$
For margin of error of .05, use:
$$n = \frac{(1.96)^{2}(.30)(.70)}{.05^{2}} = 322.7 \Rightarrow 323$$
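The margin-of-error formula translates directly into R. This is a minimal sketch; `n_for_margin` is a hypothetical helper name, and the default guess of 0.5 follows the advice given for the case where no educated guess is available.

```r
# Sample size needed to estimate p within margin of error m
# (hypothetical helper; p_star is the educated guess for p)
n_for_margin <- function(m, p_star = 0.5, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  n_raw <- z^2 * p_star * (1 - p_star) / m^2
  ceiling(n_raw)    # round up to guarantee the stated precision
}

n_for_margin(0.03, p_star = 0.30)
n_for_margin(0.05, p_star = 0.30)
```

Halving the margin of error roughly quadruples the required sample size, because m enters the formula squared.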
n to test H0: p = p0
$$n = \left( \frac{z_{1-\alpha/2}\sqrt{p_0 q_0} + z_{1-\beta}\sqrt{p_1 q_1}}{p_1 - p_0} \right)^{2}$$
where
•  α ≡ alpha level of the test (two-sided)
•  1 – β ≡ power of the test
•  p0 ≡ proportion under the null hypothesis
•  p1 ≡ proportion under the alternative hypothesis
n to test H0: p = p0, example
How large a sample is needed to test H0: p = 0.21 against Ha: p = 0.31 at α = 0.05 (two-sided) with 90% power?
$$n = \left( \frac{1.96\sqrt{(0.21)(0.79)} + 1.28\sqrt{(0.31)(0.69)}}{0.31 - 0.21} \right)^{2} = 193.3 \Rightarrow 194$$
⇒ means round up to ensure stated power
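The same calculation in R, as a sketch: `n_for_test` is a hypothetical helper name, and it uses unrounded Normal quantiles from `qnorm()` instead of 1.96 and 1.28, so the value before rounding up differs slightly from 193.3.

```r
# Sample size to test H0: p = p0 against the alternative p = p1
# (hypothetical helper; two-sided alpha, power = 1 - beta)
n_for_test <- function(p0, p1, alpha = 0.05, power = 0.90) {
  z_alpha <- qnorm(1 - alpha / 2)
  z_beta <- qnorm(power)
  numerator <- z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))
  ceiling((numerator / (p1 - p0))^2)    # round up to ensure stated power
}

n_for_test(0.21, 0.31)
```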
Conditions for Inference
•  Sampling independence
•  Valid information
•  The plus-four confidence interval requires at least 10 observations
•  The z test of H0: p = p0 requires np0q0 ≥ 5
“I'd rather have a sound judgment than a talent.” (Mark Twain)
Bayesian analysis of proportion
Review
•  When X ∼ Binomial(n, p) we know that
•  $\hat{p} = X/n$ is the MLE for p
•  $\mathrm{Var}(\hat{p}) = p(1 - p)/n$
•  Wald interval for p
$$\hat{p} \pm Z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
Problems of Wald CI
•  The Wald interval performs terribly
•  Coverage probability varies wildly, sometimes being quite low for certain values of n even when p is not near the boundaries
–  Example: when p = .5 and n = 40 the actual coverage of a 95% interval is only 92%
•  When p is small or large, coverage can be quite poor even for extremely large values of n
–  Example: when p = .005 and n = 1,876 the actual coverage rate of a 95% interval is only 90%
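The coverage figures quoted above can be checked exactly, because for fixed n and p there are only n + 1 possible outcomes. The sketch below sums the binomial probabilities of every outcome whose Wald interval contains the true p; `wald_coverage` is a hypothetical helper name.

```r
# Exact coverage probability of the Wald interval for given n and p
# (hypothetical helper; enumerates all outcomes x = 0, 1, ..., n)
wald_coverage <- function(n, p, conf_level = 0.95) {
  z <- qnorm(1 - (1 - conf_level) / 2)
  x <- 0:n
  p_hat <- x / n
  half_width <- z * sqrt(p_hat * (1 - p_hat) / n)
  covers <- (p_hat - half_width <= p) & (p <= p_hat + half_width)
  sum(dbinom(x, n, p)[covers])
}

round(wald_coverage(40, 0.5), 3)
round(wald_coverage(1876, 0.005), 3)
```

Both values fall short of the nominal 95%, in line with the two examples.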
Simple adjustment
•  A simple fix for the problem is to add 2 successes and 2 failures
•  That is, let $\tilde{p} = (X + 2)/(n + 4)$ and $\tilde{n} = n + 4$
•  This leads to the Agresti-Coull interval
$$\tilde{p} \pm Z_{1-\alpha/2} \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{\tilde{n}}}$$
Bayesian analysis
•  Bayesian statistics posits a prior on the parameter of
interest
•  All inferences are then performed on the distribution
of the parameter given the data, called the posterior
•  In general
Posterior ∝ Likelihood × Prior
•  The likelihood is the factor by which our prior beliefs
are updated to produce conclusions in the light of the
data
Beta priors
•  The beta distribution is the default prior for parameters between 0 and 1
•  The beta density depends on two parameters α and β
$$\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha - 1}(1 - p)^{\beta - 1} \quad \text{for } 0 \le p \le 1$$
•  The mean of the beta density is α/(α + β)
•  The variance of the beta density is
$$\frac{\alpha\beta}{(\alpha + \beta)^{2}(\alpha + \beta + 1)}$$
•  The uniform density is the special case where α = β = 1
Some beta distributions
[Figure: nine panels of beta densities (density against p), one for each combination of alpha = 0.5, 1, 2 and beta = 0.5, 1, 2]
Posterior
•  Suppose that we chose values of α and β so that the beta prior is indicative of our degree of belief regarding p in the absence of data
•  Then using the rule that
Posterior ∝ Likelihood × Prior
and throwing out anything that doesn’t depend on p, we have that
$$\text{Posterior} \propto p^{x}(1 - p)^{n - x} \times p^{\alpha - 1}(1 - p)^{\beta - 1} = p^{x + \alpha - 1}(1 - p)^{n - x + \beta - 1}$$
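Posterior mean
•  This density is just another beta density with parameters $\tilde{\alpha} = x + \alpha$ and $\tilde{\beta} = n - x + \beta$
•  Posterior mean
$$E[p \mid X] = \frac{\tilde{\alpha}}{\tilde{\alpha} + \tilde{\beta}} = \frac{x + \alpha}{x + \alpha + n - x + \beta} = \frac{x + \alpha}{n + \alpha + \beta}$$
$$= \frac{x}{n} \times \frac{n}{n + \alpha + \beta} + \frac{\alpha}{\alpha + \beta} \times \frac{\alpha + \beta}{n + \alpha + \beta}$$
$$= \text{MLE} \times \pi + \text{Prior Mean} \times (1 - \pi), \quad \text{where } \pi = \frac{n}{n + \alpha + \beta}$$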
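The weighted-average form of the posterior mean can be confirmed numerically. The sketch below assumes x = 13 events in n = 20 trials (the figures used in the R code slide) and a Jeffreys prior; the variable names are illustrative.

```r
x <- 13; n <- 20            # data (assumed for illustration)
alpha <- 0.5; beta <- 0.5   # Jeffreys prior

# Posterior is Beta(x + alpha, n - x + beta)
post_alpha <- x + alpha
post_beta <- n - x + beta
post_mean <- post_alpha / (post_alpha + post_beta)

# Weighted average of the MLE and the prior mean
mle <- x / n
prior_mean <- alpha / (alpha + beta)
weight <- n / (n + alpha + beta)    # pi in the formula
weighted <- mle * weight + prior_mean * (1 - weight)

round(post_mean, 4)
all.equal(post_mean, weighted)
```

As n grows the weight approaches 1, so the data dominate the prior.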
Posterior variance
•  The posterior variance is
$$\mathrm{Var}(p \mid x) = \frac{\tilde{\alpha}\tilde{\beta}}{(\tilde{\alpha} + \tilde{\beta})^{2}(\tilde{\alpha} + \tilde{\beta} + 1)} = \frac{(x + \alpha)(n - x + \beta)}{(n + \alpha + \beta)^{2}(n + \alpha + \beta + 1)}$$
•  Let $\tilde{p} = (x + \alpha)/(n + \alpha + \beta)$ and $\tilde{n} = n + \alpha + \beta$; then we have
$$\mathrm{Var}(p \mid x) = \frac{\tilde{p}(1 - \tilde{p})}{\tilde{n} + 1}$$
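A short numerical check that the two expressions for the posterior variance agree, again assuming x = 13, n = 20, and a Jeffreys prior purely for illustration:

```r
x <- 13; n <- 20            # data (assumed for illustration)
alpha <- 0.5; beta <- 0.5   # Jeffreys prior

a <- x + alpha              # posterior alpha
b <- n - x + beta           # posterior beta

# Variance of a Beta(a, b) distribution
var_beta <- a * b / ((a + b)^2 * (a + b + 1))

# Shortcut form: p~ (1 - p~) / (n~ + 1)
p_tilde <- a / (a + b)
n_tilde <- n + alpha + beta
var_short <- p_tilde * (1 - p_tilde) / (n_tilde + 1)

round(var_beta, 5)
all.equal(var_beta, var_short)
```

The shortcut has the same shape as the usual binomial variance p(1 − p)/n, with the prior acting like extra observations.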
Jeffreys prior
•  The Jeffreys prior, which has some theoretical benefits, puts α = β = 0.5
[Figure: prior, likelihood, and posterior plotted against p; alpha = 0.5, beta = 0.5]
[Figure: prior, likelihood, and posterior plotted against p; alpha = 1, beta = 1]
[Figure: prior, likelihood, and posterior plotted against p; alpha = 2, beta = 2]
R code
•  Install the binom package, then the commands
# install.packages("binom")   # run once if the package is not installed
library(binom)
binom.bayes(13, 20, type = "highest")
give the HPD interval. The default credible level is 95% and the default prior is the Jeffreys prior.
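If the binom package is not available, a credible interval can also be sketched in base R with `qbeta()`. Note the difference: the code below gives the central (equal-tailed) interval from the Beta posterior, not the HPD interval, so its limits will not match the HPD output exactly.

```r
x <- 13; n <- 20            # same data as the binom.bayes() call
alpha <- 0.5; beta <- 0.5   # Jeffreys prior

# Posterior is Beta(x + alpha, n - x + beta)
post_alpha <- x + alpha
post_beta <- n - x + beta

# Central 95% credible interval: 2.5% and 97.5% posterior quantiles
central <- qbeta(c(0.025, 0.975), post_alpha, post_beta)
post_mean <- post_alpha / (post_alpha + post_beta)

round(central, 3)
round(post_mean, 3)
```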

More Related Content

PPTX
Deciding on a medical research topic: your first challenge
PPTX
Choosing your study design
PPT
Statistical errors can cause deaths
PDF
RSS 2013 - A re-analysis of the Cochrane Library data]
PPTX
Sample size calculation
PPT
The ABC of Evidence-Base Medicine
PDF
Re-analysis of the Cochrane Library data and heterogeneity challenges
PPTX
Home Telehealth Monitoring Outcome Assessment - Kings Fund
Deciding on a medical research topic: your first challenge
Choosing your study design
Statistical errors can cause deaths
RSS 2013 - A re-analysis of the Cochrane Library data]
Sample size calculation
The ABC of Evidence-Base Medicine
Re-analysis of the Cochrane Library data and heterogeneity challenges
Home Telehealth Monitoring Outcome Assessment - Kings Fund

What's hot (20)

PPTX
Has modelling killed randomisation inference frankfurt
PPTX
Correlation Studies - Descriptive Studies
PPTX
PPTX
Clinical trials: quo vadis in the age of covid?
PPT
5. Calculate samplesize for case-control studies
PDF
Reliability of a German Questionnaire about General Practitioners? Handling o...
PPTX
Types of Research Studies
PPTX
MD Paediatricts (Part 2) - Epidemiology and Statistics
PPTX
Approximate ANCOVA
PPT
Epidemiology Lectures for UG
PPT
Designs and sample size in medical resarch
PPTX
To infinity and beyond v2
PPTX
Study designs
PPTX
Research and methodology 2
PDF
Mantel Haenszel methods in epidemiology (Stratification)
PPTX
The challenge of small data
PPT
Fallacies indrayan
PPT
Study Designs - Case control design by Dr Amita Kashyap
PPT
First in man tokyo
PPTX
To infinity and beyond
Has modelling killed randomisation inference frankfurt
Correlation Studies - Descriptive Studies
Clinical trials: quo vadis in the age of covid?
5. Calculate samplesize for case-control studies
Reliability of a German Questionnaire about General Practitioners? Handling o...
Types of Research Studies
MD Paediatricts (Part 2) - Epidemiology and Statistics
Approximate ANCOVA
Epidemiology Lectures for UG
Designs and sample size in medical resarch
To infinity and beyond v2
Study designs
Research and methodology 2
Mantel Haenszel methods in epidemiology (Stratification)
The challenge of small data
Fallacies indrayan
Study Designs - Case control design by Dr Amita Kashyap
First in man tokyo
To infinity and beyond
Ad

Similar to Ct lecture 5. descriptive analysis of categorical variables (20)

PDF
Ct lecture 17. introduction to logistic regression
PDF
Ct lecture 1. theory of measurements
PDF
Introduction to stats important for third proff and hivhly recommended for no...
PDF
Ct lecture 6. test of significance and test of h
PDF
Ct lecture 4. descriptive analysis of cont variables
PPTX
Research Designs
PPT
CHAPTER FOUR. Epidemiological study design
PPTX
Exploring Types of Study Design in Research
PPTX
Evidence based Surgery bedah dasaraa.pptx
PPTX
study design (post)المحاضرة دي من أهم المحاضرات اللي بييجي منها في الامتحان.pptx
PPT
Displaying your results
PDF
Lemeshow samplesize
PPT
Malimu cross sectional studies.
PPTX
Mesurement of morbidity (prevalence) presentation
PPT
Epidemiological studies
PDF
Ct lecture 8. comparing two groups categorical data
PPT
Malimu introduction to study designs
PPTX
P-values the gold measure of statistical validity are not as reliable as many...
PPT
2010-Epidemiology (Dr. Sameem) basics and priciples.ppt
PPTX
Cross sectional study.pptx community medicine
Ct lecture 17. introduction to logistic regression
Ct lecture 1. theory of measurements
Introduction to stats important for third proff and hivhly recommended for no...
Ct lecture 6. test of significance and test of h
Ct lecture 4. descriptive analysis of cont variables
Research Designs
CHAPTER FOUR. Epidemiological study design
Exploring Types of Study Design in Research
Evidence based Surgery bedah dasaraa.pptx
study design (post)المحاضرة دي من أهم المحاضرات اللي بييجي منها في الامتحان.pptx
Displaying your results
Lemeshow samplesize
Malimu cross sectional studies.
Mesurement of morbidity (prevalence) presentation
Epidemiological studies
Ct lecture 8. comparing two groups categorical data
Malimu introduction to study designs
P-values the gold measure of statistical validity are not as reliable as many...
2010-Epidemiology (Dr. Sameem) basics and priciples.ppt
Cross sectional study.pptx community medicine
Ad

More from Hau Pham (12)

PDF
Introductory Biostatistics_ Chap T Le_Wiley 2003.pdf
PPT
2008_Plague-Slide_Ref SR20080097
PDF
Thuc hanh Dich Te Hoc Y Ha Noi 2003
PDF
Ct lecture 20. survival analysis (part 2)
PDF
Lecture 3. planning data analysis
PDF
Ct lecture 16. model selection
PDF
Ct lecture 13. more on linear regression analysis
PDF
Ct lecture 12. simple linear regression analysis
PDF
Ct lecture 11. correlation analysis
PDF
Ct lecture 7. comparing two groups cont data
PDF
Ct lecture 2. questionnaire deisgn
PDF
ThongKe Y-Sinh Hoc_Bài 1 một số kiến thức toán cơ bản
Introductory Biostatistics_ Chap T Le_Wiley 2003.pdf
2008_Plague-Slide_Ref SR20080097
Thuc hanh Dich Te Hoc Y Ha Noi 2003
Ct lecture 20. survival analysis (part 2)
Lecture 3. planning data analysis
Ct lecture 16. model selection
Ct lecture 13. more on linear regression analysis
Ct lecture 12. simple linear regression analysis
Ct lecture 11. correlation analysis
Ct lecture 7. comparing two groups cont data
Ct lecture 2. questionnaire deisgn
ThongKe Y-Sinh Hoc_Bài 1 một số kiến thức toán cơ bản

Recently uploaded (20)

PPTX
NRPchitwan6ab2802f9.pptxnepalindiaindiaindiapakistan
PPTX
Anatomy and physiology of the digestive system
PPTX
anaemia in PGJKKKKKKKKKKKKKKKKHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH...
PDF
شيت_عطا_0000000000000000000000000000.pdf
PPTX
obstructive neonatal jaundice.pptx yes it is
PPT
1b - INTRODUCTION TO EPIDEMIOLOGY (comm med).ppt
PPTX
Neuropathic pain.ppt treatment managment
PPTX
History and examination of abdomen, & pelvis .pptx
PPTX
Chapter-1-The-Human-Body-Orientation-Edited-55-slides.pptx
PPTX
Stimulation Protocols for IUI | Dr. Laxmi Shrikhande
PDF
Intl J Gynecology Obste - 2021 - Melamed - FIGO International Federation o...
PPTX
antibiotics rational use of antibiotics.pptx
PPTX
Acid Base Disorders educational power point.pptx
PPTX
NASO ALVEOLAR MOULDNIG IN CLEFT LIP AND PALATE PATIENT
PDF
Copy of OB - Exam #2 Study Guide. pdf
PPTX
Cardiovascular - antihypertensive medical backgrounds
PDF
Handout_ NURS 220 Topic 10-Abnormal Pregnancy.pdf
PPTX
surgery guide for USMLE step 2-part 1.pptx
PPT
genitourinary-cancers_1.ppt Nursing care of clients with GU cancer
PPT
MENTAL HEALTH - NOTES.ppt for nursing students
NRPchitwan6ab2802f9.pptxnepalindiaindiaindiapakistan
Anatomy and physiology of the digestive system
anaemia in PGJKKKKKKKKKKKKKKKKHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH...
شيت_عطا_0000000000000000000000000000.pdf
obstructive neonatal jaundice.pptx yes it is
1b - INTRODUCTION TO EPIDEMIOLOGY (comm med).ppt
Neuropathic pain.ppt treatment managment
History and examination of abdomen, & pelvis .pptx
Chapter-1-The-Human-Body-Orientation-Edited-55-slides.pptx
Stimulation Protocols for IUI | Dr. Laxmi Shrikhande
Intl J Gynecology Obste - 2021 - Melamed - FIGO International Federation o...
antibiotics rational use of antibiotics.pptx
Acid Base Disorders educational power point.pptx
NASO ALVEOLAR MOULDNIG IN CLEFT LIP AND PALATE PATIENT
Copy of OB - Exam #2 Study Guide. pdf
Cardiovascular - antihypertensive medical backgrounds
Handout_ NURS 220 Topic 10-Abnormal Pregnancy.pdf
surgery guide for USMLE step 2-part 1.pptx
genitourinary-cancers_1.ppt Nursing care of clients with GU cancer
MENTAL HEALTH - NOTES.ppt for nursing students

Ct lecture 5. descriptive analysis of categorical variables

  • 1. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 Descriptive analysis of categorical variables Tuan V. Nguyen Professor and NHMRC Senior Research Fellow Garvan Institute of Medical Research University of New South Wales Sydney, Australia
  • 2. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 What we are going to learn •  Categorical data •  Probability •  Statistical description of –  Prevalence –  Incidence –  Rate
  • 3. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 Measurement and comparison To find out whether a community is healthy or unhealthy: •  first measure one or more indicators of health (deaths, new cases of disease, etc) •  compare the results with another community or group.
  • 4. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 Measures of Disease Occurrence •  Incidence proportion (risk) •  Incidence rate (density) •  Prevalence All three are loosely called “rates” (but only the second is a true rate)
  • 5. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 Types of populations We measure disease occurrence in two types of populations: •  Closed populations ! “cohorts” •  Open populations
  • 6. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 6 Cohort word origin (Latin cohors) basic tactical unit of a Roman legion Epi cohort ≡ a group of individuals followed over time Closed population = cohort
  • 7. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 •  Inflow (immigration, births) •  Outflow (emigration, death) •  An open population in “steady state” (constant size) is said to be stationary Open population
  • 8. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 •  “Rates” are composed of numerators and denominators •  Numerator ! case count Incidence count ! onsets Prevalence count ! old + new cases •  Denominators ! reflection of population size Numerators and denominators
  • 9. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 Denominators Denominators: reflection of population size
  • 10. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 •  Synonyms: risk, cumulative incidence, attack rate •  Interpretation: average risk study of beginning at risk @ no. over time onsets of no. IP = Can be calculated only in cohorts Incidence proportion
  • 11. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 •  Objective: estimate risk of uterine cancer •  Recruit cohort of 1000 women •  100 had hysterectomies, leaving 900 at risk •  Follow at risk individuals for 10 years •  Observe 10 onsets of uterine cancer women 900 women 10 risk @ no. onsets of no. IP = = 10-year average risk is .011 or 1.1%. 0111 . 0 = Example of IP
  • 12. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 •  Synonyms: incidence density, person-time rate •  Interpretation A: “Speed” at which events occur •  Interpretation B: When disease is rare: rate per person-year ≈ one-year risk •  Calculated differently in closed and open populations risk @ time - person of Sum onsets no. IR = Incidence rate
  • 13. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 •  Objective: estimate rate of uterine cancer •  Recruit cohort of 1000 women •  100 had hysterectomies, leaving 900 at risk •  Follow at risk individuals for 10 years •  Observe 10 onsets of uterine cancer time - person onsets of no. IR = Rate is .00111 per year or 11.1 per 10,000 years years 9000 10 = years 10 women 900 women 10 × = year .00111 = Example of IP
  • 14. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 Individual follow-up over time years 50 years 25 onsets 2 + = time - person onsets IR ∑ = years - person 100 per 2.67 years - person per 0267 . 0 = = years 75 onsets 2 =
  • 15. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 Rate Mortality 1 expectacy Life = In stationary populations, and in cohorts with complete follow-up, the mortality rate is the reciprocal of life expectancy (and vice versa). Example: for a mortality rate of .0267 per year years 5 . 37 year .0267 1 expectacy Life = = Mortality and life expectancy
  • 16. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 years 37.5 2 years 50) (25 expectancy life has cohort This = + year 0267 . 0 years 50) (25 deaths 2 of rate mortality a has cohort This 1 − = +
  • 17. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 years - person 100,000 per 877 = n observatio of duration size population Avg onsets IR × = -1 year deaths 008770 . 0 = Example: 2,391,630 deaths in 1999 (one year) Population size = 272,705,815 year 1 persons 5 272,705,81 deaths 2,391,630 IR × = Incidence rate in open population
  • 18. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 •  Point prevalence ≡ prevalence at a particular point in time •  Period prevalence ≡ prevalence over a period of time •  Interpretation A: proportion with condition •  Interpretation B: probability a person selected at random will have the condition people of no. cases new and old no. Prevalence= Prevalence
  • 19. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 •  Recruit 1000 women •  Ascertain: 100 with hysterectomies people of no. cases no. Prevalence= Prevalence in sample is 10% 10 . 0 = people 1000 people 100 = Example of prevalence
  • 20. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 Increase incidence ! increase inflow Increase average duration of disease ! decreased outflow Ways to increase prevalence Dynamic prevalence
  • 21. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 duration) (average rate) (incidence prevalence × ≈ Example: •  Incidence rate = 0.01 / year •  Average duration of the illness = 2 years. •  Prevalence ≈ 0.01 / year × 2 years = 0.02 When disease rare & population stationary Prevalence and incidence
  • 22. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 Estimation of 95% confidence interval
  • 23. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 Proportions •  Proportion of event in the sample, denoted “p hat”: where x = no. of events and n = sample size n x p = ˆ
  • 24. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 Proportion, cont Two of 10 individuals in the sample have a risk factor for disease X The prevalence of this risk factor in the sample is: (or10%) 1 . 0 10 2 ˆ = = = n x p
  • 25. Workshop on Analysis of Clinical Studies – Can Tho University of Medicine and Pharmacy – April 2012 Inference about a Proportion How good is sample proportion at estimating population proportion p? Consider what would happen if we took repeated samples, each of size n, from the population? How would sample proportions be distributed?
  • 26. Normal Approximation for Proportions
    $$\hat{p} \sim N\!\left(p,\ \sqrt{\frac{pq}{n}}\right), \quad \text{where } q = 1 - p$$
    (the second argument is the standard deviation of $\hat{p}$)
  • 27. Normal approximation
    H0: p = p0 vs. Ha: p ≠ p0, where p0 represents the proportion specified by the null hypothesis
    Test statistic:
    $$z_{\text{stat}} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}, \quad \text{where } q_0 = 1 - p_0$$
  • 28. Example
    A sample of n = 57 finds 17 smokers (p-hat = 17 / 57 = 0.2982). The national average for smoking prevalence is 0.25. Is the proportion in the sample significantly different from the national average?
    H0: p = 0.25 vs. Ha: p ≠ 0.25
    $$z_{\text{stat}} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = \frac{0.2982 - 0.25}{\sqrt{\dfrac{0.25 \cdot 0.75}{57}}} = 0.84$$
    The sample proportion is not significantly different from the national average.
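The arithmetic in the smoking example can be checked with a few lines of base R. This is a minimal sketch, not part of the original slides; the variable names are illustrative, and the two-sided P value is computed from the standard Normal distribution.

```r
# One-sample z test for a proportion, using the numbers from the example.
x  <- 17      # smokers observed
n  <- 57      # sample size
p0 <- 0.25    # proportion under H0 (national average)

p_hat  <- x / n
z_stat <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_val  <- 2 * pnorm(-abs(z_stat))   # two-sided P value

round(c(p_hat = p_hat, z = z_stat, p_value = p_val), 4)
```

The same test is available as `prop.test(17, 57, p = 0.25, correct = FALSE)`, which reports the chi-square statistic, equal to the square of `z_stat`.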
  • 29. Confidence Interval for Proportion
    A $(1-\alpha)100\%$ confidence interval for p is:
    $$\tilde{p} \pm z_{1-\alpha/2} \cdot \sqrt{\frac{\tilde{p}\,\tilde{q}}{\tilde{n}}}$$
    where $\tilde{x} = x + 2$, $\tilde{n} = n + 4$, $\tilde{p} = \tilde{x}/\tilde{n}$, and $\tilde{q} = 1 - \tilde{p}$
    This method is called the “plus four method” because it adds four imaginary observations (two successes and two failures) during calculations. It is much more accurate than the traditional Normal method.
  • 30. Confidence Interval, example
    Based on n = 57 and x = 17, the 95% CI for the prevalence of smoking in the population is:
    $$\tilde{x} = x + 2 = 17 + 2 = 19; \quad \tilde{n} = n + 4 = 57 + 4 = 61$$
    $$\tilde{p} = \frac{\tilde{x}}{\tilde{n}} = \frac{19}{61} = 0.3115; \quad \tilde{q} = 1 - 0.3115 = 0.6885$$
    $$SE_{\tilde{p}} = \sqrt{\frac{\tilde{p}\,\tilde{q}}{\tilde{n}}} = \sqrt{\frac{(0.3115)(0.6885)}{61}} = 0.0593$$
    $$z = 1.96 \text{ for 95\% confidence}$$
    $$95\%\ \text{CI for } p = \tilde{p} \pm z \cdot SE_{\tilde{p}} = 0.3115 \pm (1.96)(0.0593) = 0.3115 \pm 0.1162 = (0.1953,\ 0.4277)$$
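The plus-four calculation above can be wrapped in a small helper. This is a sketch in base R; the function name `plus_four_ci` is hypothetical (it is not from the slides or from any package), and it uses the exact Normal quantile from `qnorm` rather than the rounded 1.96.

```r
# Plus-four confidence interval: add 2 successes and 2 failures,
# then apply the usual Normal-approximation formula.
plus_four_ci <- function(x, n, conf = 0.95) {
  x_t <- x + 2                      # adjusted number of events
  n_t <- n + 4                      # adjusted sample size
  p_t <- x_t / n_t                  # adjusted proportion
  se  <- sqrt(p_t * (1 - p_t) / n_t)
  z   <- qnorm(1 - (1 - conf) / 2)  # about 1.96 for 95% confidence
  c(estimate = p_t, lower = p_t - z * se, upper = p_t + z * se)
}

ci <- plus_four_ci(17, 57)
round(ci, 4)
```

With x = 17 and n = 57 this reproduces the interval on the slide to four decimal places.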
  • 31. Sample Size and Power
    Three approaches:
    •  n needed to estimate p with margin of error m (for confidence interval)
    •  n needed to test H0 at given α level and power
    •  The power of testing H0 under stated conditions
  • 32. n needed to achieve margin of error m
    $$n = \frac{z_{1-\alpha/2}^{2}\; p^{*} q^{*}}{m^{2}}$$
    •  where p* represents an educated guess for the population proportion p and q* = 1 − p* (when no educated guess for p is available, let p* = .5)
    •  Round up to the next integer to ensure the stated precision
  • 33. n needed to achieve m, example
    Suppose our educated guess for the proportion is p* = 0.30
    For a margin of error of .03, use:
    $$n = \frac{(1.96)^{2}(0.30)(0.70)}{0.03^{2}} = 896.4 \Rightarrow 897$$
    For a margin of error of .05, use:
    $$n = \frac{(1.96)^{2}(0.30)(0.70)}{0.05^{2}} = 322.7 \Rightarrow 323$$
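The margin-of-error formula is easy to script. The following base R sketch uses a hypothetical helper name, `n_for_margin`, and defaults to p* = 0.5 when no educated guess is supplied, as the slide suggests.

```r
# Sample size needed to estimate a proportion with margin of error m.
n_for_margin <- function(m, p_star = 0.5, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)               # about 1.96 for 95% confidence
  ceiling(z^2 * p_star * (1 - p_star) / m^2)   # round up to ensure precision
}

n_03 <- n_for_margin(0.03, p_star = 0.30)
n_05 <- n_for_margin(0.05, p_star = 0.30)
c(n_03, n_05)
```

Both calls agree with the rounded-up values on the slide (897 and 323).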
  • 34. n to test H0: p = p0
    $$n = \left( \frac{z_{1-\alpha/2}\sqrt{p_0 q_0} + z_{1-\beta}\sqrt{p_1 q_1}}{p_1 - p_0} \right)^{2}$$
    where
    •  α ≡ alpha level of the test (two-sided)
    •  1 – β ≡ power of the test
    •  p0 ≡ proportion under the null hypothesis (q0 = 1 − p0)
    •  p1 ≡ proportion under the alternative hypothesis (q1 = 1 − p1)
  • 35. n to test H0: p = p0, example
    How large a sample is needed to test H0: p = 0.21 against Ha: p = 0.31 at α = 0.05 (two-sided) with 90% power?
    $$n = \left( \frac{1.96\sqrt{(0.21)(0.79)} + 1.28\sqrt{(0.31)(0.69)}}{0.31 - 0.21} \right)^{2} = 193.3 \Rightarrow 194$$
    ⇒ means round up to ensure stated power
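The same calculation can be sketched in base R. The helper name `n_for_test` is hypothetical; it uses exact Normal quantiles (`qnorm(0.975)` and `qnorm(0.90)`) instead of the rounded 1.96 and 1.28, so the unrounded value differs slightly from 193.3 but rounds up to the same n.

```r
# Sample size for a two-sided one-sample test of H0: p = p0 against p = p1.
n_for_test <- function(p0, p1, alpha = 0.05, power = 0.90) {
  z_a <- qnorm(1 - alpha / 2)   # critical value for the two-sided test
  z_b <- qnorm(power)           # quantile corresponding to the desired power
  num <- z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p1 * (1 - p1))
  ceiling((num / (p1 - p0))^2)  # round up to ensure stated power
}

n_needed <- n_for_test(p0 = 0.21, p1 = 0.31)
n_needed
```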
  • 36. Conditions for Inference
    •  Sampling independence
    •  Valid information
    •  The plus-four confidence interval requires at least 10 observations
    •  The z test of H0: p = p0 requires np0q0 ≥ 5
    “I'd rather have a sound judgment than a talent.” Mark Twain
  • 37. Bayesian analysis of proportion
  • 38. Review
    •  When X ∼ Binomial(n, p) we know that
    •  $\hat{p} = X/n$ is the MLE for p
    •  $\mathrm{Var}(\hat{p}) = p(1 - p)/n$
    •  Wald interval for p:
    $$\hat{p} \pm Z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$
  • 39. Problems of Wald CI
    •  The Wald interval performs terribly
    •  Coverage probability varies wildly, sometimes being quite low for certain values of n even when p is not near the boundaries
       –  Example: when p = .5 and n = 40 the actual coverage of a 95% interval is only 92%
    •  When p is small or large, coverage can be quite poor even for extremely large values of n
       –  Example: when p = .005 and n = 1,876 the actual coverage rate of a 95% interval is only 90%
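The coverage figures quoted above can be computed exactly rather than simulated: for fixed n and p, sum the binomial probabilities of every outcome x whose Wald interval contains p. The base R sketch below does this; the function name `wald_coverage` is hypothetical.

```r
# Exact coverage probability of the nominal 95% Wald interval.
wald_coverage <- function(n, p, conf = 0.95) {
  z      <- qnorm(1 - (1 - conf) / 2)
  x      <- 0:n                               # every possible outcome
  p_hat  <- x / n
  half   <- z * sqrt(p_hat * (1 - p_hat) / n) # Wald half-width for each x
  covers <- (p_hat - half <= p) & (p <= p_hat + half)
  sum(dbinom(x, n, p)[covers])                # probability of covering p
}

cov_40   <- wald_coverage(40, 0.5)
cov_1876 <- wald_coverage(1876, 0.005)
round(c(cov_40, cov_1876), 3)
```

Both values fall below the nominal 95%, in line with the roughly 92% and 90% stated on the slide.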
  • 40. Simple adjustment
    •  A simple fix for the problem is to add 2 successes and 2 failures
    •  That is, let $\tilde{p} = (X + 2)/(n + 4)$ and $\tilde{n} = n + 4$
    •  This leads to the Agresti-Coull interval:
    $$\tilde{p} \pm Z_{1-\alpha/2} \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{\tilde{n}}}$$
  • 41. Bayesian analysis
    •  Bayesian statistics posits a prior on the parameter of interest
    •  All inferences are then performed on the distribution of the parameter given the data, called the posterior
    •  In general, Posterior ∝ Likelihood × Prior
    •  The likelihood is the factor by which our prior beliefs are updated to produce conclusions in the light of the data
  • 42. Beta priors
    •  The beta distribution is the default prior for parameters between 0 and 1
    •  The beta density depends on two parameters α and β:
    $$\frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha - 1} (1 - p)^{\beta - 1} \quad \text{for } 0 \le p \le 1$$
    •  The mean of the beta density is α/(α + β)
    •  The variance of the beta density is
    $$\frac{\alpha\beta}{(\alpha + \beta)^{2}(\alpha + \beta + 1)}$$
    •  The uniform density is the special case where α = β = 1
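As a quick sanity check on the mean and variance formulas, they can be compared with numerical integration of the beta density in base R. The shape parameters below (α = 2, β = 5) are arbitrary values chosen only for illustration; they do not come from the slides.

```r
# Compare the closed-form beta mean and variance with numerical integration.
a <- 2   # alpha (illustrative value, not from the slides)
b <- 5   # beta  (illustrative value, not from the slides)

mean_formula <- a / (a + b)
var_formula  <- a * b / ((a + b)^2 * (a + b + 1))

# E[p] and E[(p - mean)^2] under the Beta(a, b) density
mean_numeric <- integrate(function(p) p * dbeta(p, a, b), 0, 1)$value
var_numeric  <- integrate(function(p) (p - mean_formula)^2 * dbeta(p, a, b), 0, 1)$value

rbind(formula = c(mean = mean_formula, var = var_formula),
      numeric = c(mean = mean_numeric, var = var_numeric))
```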
  • 43. Some beta distributions
    [Figure: 3 × 3 grid of beta densities (density against p) for alpha = 0.5, 1, 2 and beta = 0.5, 1, 2]
  • 44. Posterior
    •  Suppose that we chose values of α and β so that the beta prior is indicative of our degree of belief regarding p in the absence of data
    •  Then, using the rule that Posterior ∝ Likelihood × Prior and throwing out anything that doesn’t depend on p, we have that
    $$\text{Posterior} \propto p^{x}(1 - p)^{n - x} \times p^{\alpha - 1}(1 - p)^{\beta - 1} = p^{x + \alpha - 1}(1 - p)^{n - x + \beta - 1}$$
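  • 45. Posterior mean
    •  This density is just another beta density with parameters $\tilde{\alpha} = x + \alpha$ and $\tilde{\beta} = n - x + \beta$
    •  Posterior mean:
    $$E[p \mid X] = \frac{\tilde{\alpha}}{\tilde{\alpha} + \tilde{\beta}} = \frac{x + \alpha}{x + \alpha + n - x + \beta} = \frac{x + \alpha}{n + \alpha + \beta}$$
    $$= \frac{x}{n} \times \frac{n}{n + \alpha + \beta} + \frac{\alpha}{\alpha + \beta} \times \frac{\alpha + \beta}{n + \alpha + \beta}$$
    $$= \text{MLE} \times \pi + \text{Prior Mean} \times (1 - \pi), \quad \text{where } \pi = \frac{n}{n + \alpha + \beta}$$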
  • 46. Posterior variance
    •  The posterior variance is
    $$\mathrm{Var}(p \mid x) = \frac{\tilde{\alpha}\tilde{\beta}}{(\tilde{\alpha} + \tilde{\beta})^{2}(\tilde{\alpha} + \tilde{\beta} + 1)} = \frac{(x + \alpha)(n - x + \beta)}{(n + \alpha + \beta)^{2}(n + \alpha + \beta + 1)}$$
    •  Let $\tilde{p} = (x + \alpha)/(n + \alpha + \beta)$ and $\tilde{n} = n + \alpha + \beta$; then we have
    $$\mathrm{Var}(p \mid x) = \frac{\tilde{p}(1 - \tilde{p})}{\tilde{n} + 1}$$
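The beta-binomial update and the two posterior summaries can be sketched in base R as follows. The function name `beta_binomial_posterior` is hypothetical, and the inputs (x = 13 events in n = 20 with α = β = 0.5) are borrowed from the R example at the end of this deck purely for illustration.

```r
# Posterior for a binomial proportion under a Beta(a, b) prior.
beta_binomial_posterior <- function(x, n, a, b) {
  a_post <- x + a            # posterior alpha
  b_post <- n - x + b        # posterior beta
  n_t    <- n + a + b
  p_t    <- a_post / n_t     # posterior mean
  w      <- n / n_t          # weight on the MLE (pi in the slide)
  list(
    shape1        = a_post,
    shape2        = b_post,
    mean          = p_t,
    weighted_mean = w * (x / n) + (1 - w) * (a / (a + b)),
    var           = p_t * (1 - p_t) / (n_t + 1)
  )
}

post <- beta_binomial_posterior(x = 13, n = 20, a = 0.5, b = 0.5)
str(post)
```

The `mean` and `weighted_mean` entries agree, which is the identity Posterior mean = MLE × π + Prior mean × (1 − π).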
  • 47. Jeffreys prior
    •  The Jeffreys prior, which has some theoretical benefits, puts α = β = 0.5
    [Figure: prior, likelihood, and posterior plotted against p for alpha = 0.5, beta = 0.5]
  • 48. [Figure: prior, likelihood, and posterior plotted against p; two panels, alpha = 1, beta = 1 and alpha = 2, beta = 2]
  • 49. R code
    •  Install the binom package, then run the commands
         library(binom)
         binom.bayes(13, 20, type = "highest")
       which give the HPD (highest posterior density) interval. The default credible level is 95% and the default prior is the Jeffreys prior.
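If the binom package is not available, a comparable interval can be obtained from base R alone. The sketch below computes the equal-tailed 95% credible interval from the Beta(x + 0.5, n − x + 0.5) posterior using `qbeta`. Note that this is an equal-tailed interval, not the HPD interval returned by the command above, so the endpoints will differ somewhat.

```r
# Equal-tailed 95% credible interval under the Jeffreys prior (base R only).
x <- 13    # events
n <- 20    # sample size
a <- 0.5   # Jeffreys prior: alpha = 0.5
b <- 0.5   # Jeffreys prior: beta  = 0.5

post_mean <- (x + a) / (n + a + b)
ci_equal  <- qbeta(c(0.025, 0.975), x + a, n - x + b)

round(c(mean = post_mean, lower = ci_equal[1], upper = ci_equal[2]), 3)
```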