Imputation Techniques For Missing
Data In Clinical Trials
Presentation by
NITHIN GEORGE VINOD
PROJECT ASSOCIATE
CENTRE FOR LIVESTOCK DEVELOPMENT AND POLICY RESEARCH
KERALA VETERINARY AND ANIMAL SCIENCES UNIVERSITY
Contents
• Objectives
• Introduction to missing data
• Reasons for missing data
• Missing data mechanism
• Simple methods
• Single imputation
»Last observation carried forward (LOCF)
»Hot-deck imputation
»Arithmetic mean imputation
»Regression imputation
»Stochastic imputation
• Multiple imputation
Objectives
To introduce different imputation techniques for handling
missing data under each missing data mechanism.
Introduction to missing data
Missing data
Values in the data set that are lost, not observed, or not
available, for natural or non-natural reasons.
(James R. Carpenter: Missing data in randomized controlled trials)
Reasons for missing data
• Patients are in a very critical condition.
• Patients want to change treatment.
• Equipment breaks down.
• Patients fail to continue follow-up.
• Patients fail to answer some questionnaires.
• Patients are cured or die before the end of the study.
• The investigator forgets to collect the data.
• The family has migrated.
• The patient's profile may be missing.
Effect of missing data
• Biased parameter estimates
• Reduced statistical power and distorted variability
• Inaccurate results
Missing data mechanism
(Rubin 1976)
• Missing Completely At Random (MCAR)
• Missing At Random (MAR)
• Missing Not At Random (MNAR)
Missing At Random (MAR)
The probability of missing data on a variable Y is related to
some other measured variables in the analysis model but not
to the values of Y itself.
Examples
• Missing blood pressure values may be lower on average than
the measured values because younger people are more likely to
have a missing blood pressure measurement. The missingness
depends on age, not on blood pressure itself.
• In a study of quality of life (QL), the psychologist finds that
elderly patients and patients with less education are more
likely to refuse the QL questionnaire.
Missing Completely At Random (MCAR)
The probability of missing data on a variable Y is
unrelated to other measured variables and unrelated
to the values of Y itself.
Examples
• A blood pressure measurement is missing because an
automatic sphygmomanometer broke down.
• A psychologist studying quality of life in a group of
cancer patients finds that some patients' data are
missing because they moved to another place.
Missing Not At Random (MNAR)
The probability of missing data on a variable Y
is related to the values of Y itself, even after
controlling for other variables.
Examples
• Suppose the treatment under study is not effective in
reducing blood pressure. Subjects whose blood pressure
stays high may then drop out, so the missingness depends
on the unobserved blood pressure values.
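The three mechanisms differ only in what the probability of missingness is allowed to depend on. A minimal sketch of that idea follows; the function name `make_missing`, the records, and the probabilities are all invented for illustration and do not come from the slides.

```python
import random

def make_missing(records, prob_missing, seed=0):
    """Return the blood pressure values with some replaced by None.

    records is a list of (age, bp) pairs; prob_missing(age, bp) gives the
    probability that a record's bp is missing (hypothetical helper).
    """
    rng = random.Random(seed)
    return [None if rng.random() < prob_missing(age, bp) else bp
            for age, bp in records]

# Hypothetical (age, blood pressure) records, invented for illustration.
records = [(25, 118), (31, 122), (38, 125), (44, 131),
           (52, 138), (59, 144), (66, 150), (73, 157)]

# MCAR: the same probability for everyone.
mcar = make_missing(records, lambda age, bp: 0.3)
# MAR: depends on the observed age only, not on blood pressure.
mar = make_missing(records, lambda age, bp: 0.6 if age < 40 else 0.1)
# MNAR: depends on the blood pressure value itself.
mnar = make_missing(records, lambda age, bp: 0.6 if bp > 140 else 0.1)
```

In real data only the resulting pattern is seen, not the rule behind it, which is why MNAR generally cannot be confirmed from the observed data alone.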
Different methods to deal with missing data
• Listwise deletion
• Pairwise deletion
• Last observation carried forward
• Hot-deck imputation
• Arithmetic mean imputation
• Regression imputation
• Stochastic regression imputation
• Cold-deck imputation
• Averaging the available pattern imputation
• Maximum likelihood estimation
• Markov chain Monte Carlo method
Simple techniques
• Listwise deletion
Discards the data for any case that has one or
more missing values.
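As a small illustration of listwise deletion, the sketch below keeps only complete rows. The function name `listwise_delete` and the example rows are assumptions made for this example; missing values are represented as `None`.

```python
def listwise_delete(rows):
    """Keep only the rows in which no value is missing (None)."""
    return [row for row in rows if all(v is not None for v in row)]

# Hypothetical (AGE, QL) rows; None marks a missing value.
rows = [(35, 90), (62, None), (41, 82), (None, 70)]
complete = listwise_delete(rows)
print(complete)  # [(35, 90), (41, 82)]
```

Half of the rows are discarded here even though each incomplete row lacks only one value, which shows how quickly the method can reduce the sample size.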
Single Imputation
A method that replaces each missing value with a
single, seemingly suitable replacement value.
Last Observation Carried Forward (LOCF)
LOCF takes the last available response and
substitutes that value for all subsequent
missing values.
Advantages
• It generates a complete data set.
• It is easy to implement.
Disadvantages
• It produces biased estimates.
• It can be biased even when the data are MCAR,
because it assumes no change after the last observation.
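LOCF can be sketched in a few lines. The function name `locf` and the visit values below are hypothetical; `None` marks a missed visit.

```python
def locf(values):
    """Replace each None with the most recent observed value.

    Leading missing values stay None because nothing precedes them.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# Hypothetical blood pressure readings over six visits.
visits = [142, 138, None, 135, None, None]
print(locf(visits))  # [142, 138, 138, 135, 135, 135]
```

The last two visits simply repeat 135, which makes the assumption behind the method visible: the outcome is treated as frozen after dropout.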
Hot-deck Imputation (Scheuren, 2005)
Replaces each missing value with a random draw
from a subsample of respondents who scored
similarly on a set of matching variables.
Advantages
• It generates a complete data set.
Disadvantages
• It is not well suited for estimating measures of
association.
• It can produce substantially biased estimates of
correlation and regression coefficients.
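A minimal sketch of hot-deck imputation follows, assuming a single categorical matching variable (here an age group). The function name `hot_deck` and the records are invented for illustration; real applications usually match on several variables.

```python
import random

def hot_deck(records, seed=0):
    """Impute missing values by drawing from donors in the same group.

    records is a list of (group, value) pairs; value is None when missing.
    A record with no donor in its group is left as None.
    """
    rng = random.Random(seed)
    donors = {}
    for group, value in records:
        if value is not None:
            donors.setdefault(group, []).append(value)
    imputed = []
    for group, value in records:
        if value is None and donors.get(group):
            value = rng.choice(donors[group])  # random draw from donors
        imputed.append((group, value))
    return imputed

# Hypothetical (age group, QL score) records.
records = [("young", 88), ("young", 90), ("young", None),
           ("old", 60), ("old", None)]
completed = hot_deck(records)
```

The missing "young" score becomes either 88 or 90, and the missing "old" score becomes 60, the only available donor.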
Arithmetic Mean Imputation (Wilks, 1932)
Fills in the missing values with the arithmetic
mean of the available cases.
Advantages
• It can be applied to any type of missingness.
• It generates a complete data set.
Disadvantages
• It reduces the variability of the data.
• It distorts the measures of association.
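The loss of variability can be checked directly. The sketch below uses the twelve observed QL scores from the worked example in this presentation and treats the remaining eight as missing; the function name `mean_impute` is an assumption made for this example.

```python
from statistics import mean, stdev

def mean_impute(values):
    """Replace every None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]

# Twelve observed QL scores followed by eight missing ones.
ql_missing = [90, 89, 88, 87, 82, 80, 78, 76, 71, 73, 70, 70] + [None] * 8
imputed = mean_impute(ql_missing)

observed = [v for v in ql_missing if v is not None]
sd_before = stdev(observed)  # spread of the observed scores only
sd_after = stdev(imputed)    # spread after filling in the mean
```

Every missing score becomes 79.5, so `sd_after` is smaller than `sd_before`: the imputed values add cases without adding any spread.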
Regression Imputation (Buck, 1960)
Replaces missing values with predicted scores from
a regression equation, using information from
the complete variables.
Advantages
• It generates a complete data set.
• It uses the correlation between variables to predict the missing values.
Disadvantages
• It imputes values that fall exactly on the regression line,
so the imputed scores are perfectly correlated with the predictors.
• It overestimates correlations.
• It produces biased estimates.
AGE   QL   QL_missing   RI
35    90   90
36    89   89
38    88   88
38    87   87
41    82   82
45    80   80
47    78   78
48    76   76
49    71   71
55    73   73
57    70   70
59    70   70
62    68   __           65.03
65    67   __           62.37
68    67   __           59.71
72    63   __           56.17
72    60   __           56.17
73    59   __           55.28
75    52   __           53.51
76    51   __           52.63
RI = regression-imputed value

        QL      RI
mean    72.05   70.74
SD      11.74   12.726

QL = β0 + β1*AGE
QL = 119.950 - 0.886*AGE
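The regression equation and the imputed values in the table can be reproduced from the twelve complete cases. This is a sketch using ordinary least squares written out by hand; the function name `fit_line` is an assumption made for this example.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

age = [35, 36, 38, 38, 41, 45, 47, 48, 49, 55,
       57, 59, 62, 65, 68, 72, 72, 73, 75, 76]
ql_missing = [90, 89, 88, 87, 82, 80, 78, 76, 71, 73, 70, 70] + [None] * 8

# Fit the line on the complete cases only.
pairs = [(a, q) for a, q in zip(age, ql_missing) if q is not None]
b0, b1 = fit_line([a for a, _ in pairs], [q for _, q in pairs])

# Impute each missing QL with its predicted value.
imputed = [b0 + b1 * a if q is None else q for a, q in zip(age, ql_missing)]
print(round(b0, 2), round(b1, 3))  # 119.95 -0.886
```

All eight imputed scores lie exactly on the fitted line, which is the source of the overstated correlation listed among the disadvantages.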
Stochastic Regression Imputation
Uses a regression equation to predict the
incomplete data and adds a normally distributed
residual term to each predicted value.
Advantages
• It is the most appropriate single imputation method.
• It gives results approximately equal to those of multiple imputation.
• It gives unbiased parameter estimates under an MAR
mechanism.
Disadvantage
• It underestimates standard errors.
AGE   QL   QL_missing   RV      SI
35    90   90
36    89   89
38    88   88
38    87   87
41    82   82
45    80   80
47    78   78
48    76   76
49    71   71
55    73   73
57    70   70
59    70   70
62    68   __           5.67    70.69
65    67   __           3.72    66.08
68    67   __           -4.13   55.57
72    63   __           -0.39   55.77
72    60   __           -7.20   48.96
73    59   __           2.39    57.66
75    52   __           -6.64   46.86
76    51   __           1.84    54.45
RV = random residual value (z_i)
SI = stochastically imputed value (regression prediction + RV)
RI = regression-imputed value (prediction without the residual)

QL = β0 + β1*AGE + z_i
QL = 119.950 - 0.886*AGE + z_i

        RI       SI
mean    70.74    70.50
SD      12.726   13.61
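A sketch of stochastic regression imputation on the same data follows. It assumes the residual term is drawn from a normal distribution with mean zero and standard deviation equal to the residual standard deviation of the fitted line; the slides do not say how their RV column was generated, so the numbers will differ from the table. The function name is hypothetical.

```python
import random
from statistics import mean

def stochastic_regression_impute(x, y, seed=0):
    """Impute None entries of y with b0 + b1*x plus a normal residual."""
    rng = random.Random(seed)
    obs = [(a, b) for a, b in zip(x, y) if b is not None]
    xs = [a for a, _ in obs]
    ys = [b for _, b in obs]
    mx, my = mean(xs), mean(ys)
    b1 = (sum((a - mx) * (b - my) for a, b in obs)
          / sum((a - mx) ** 2 for a in xs))
    b0 = my - b1 * mx
    # Residual standard deviation of the fitted line (n - 2 degrees of freedom).
    sse = sum((b - (b0 + b1 * a)) ** 2 for a, b in obs)
    resid_sd = (sse / (len(obs) - 2)) ** 0.5
    return [b0 + b1 * a + rng.gauss(0, resid_sd) if b is None else b
            for a, b in zip(x, y)]

age = [35, 36, 38, 38, 41, 45, 47, 48, 49, 55,
       57, 59, 62, 65, 68, 72, 72, 73, 75, 76]
ql_missing = [90, 89, 88, 87, 82, 80, 78, 76, 71, 73, 70, 70] + [None] * 8
imputed = stochastic_regression_impute(age, ql_missing, seed=1)
```

Changing `seed` gives a different, equally plausible completed data set, which is the idea that multiple imputation builds on.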
Multiple imputation
Creates several copies of the data set and imputes
each copy with different plausible estimates of
the missing values.
Procedure
I. Imputation phase
• Data augmentation
» I-step
» P-step
II. Analysis phase
• The same statistical analysis is performed on each
imputed data set.
III. Pooling phase
• The estimates and their standard errors are combined
into a single set of values.
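The pooling phase can be sketched as follows. The slides do not name a pooling rule; this example assumes the commonly used Rubin's rules, in which the pooled estimate is the average of the estimates and the pooled variance adds the between-imputation variance to the average within-imputation variance. The function name `pool` and the three estimates are invented for illustration.

```python
from statistics import mean, variance

def pool(estimates, std_errors):
    """Combine m estimates and standard errors (Rubin's rules, assumed here).

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    q_bar = mean(estimates)                        # pooled estimate
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # variance across imputations
    total = within + (1 + 1 / m) * between
    return q_bar, total ** 0.5

# Hypothetical mean QL estimates from three imputed data sets.
estimates = [70.5, 71.2, 69.8]
std_errors = [2.9, 3.1, 3.0]
pooled_estimate, pooled_se = pool(estimates, std_errors)
```

The pooled standard error is larger than a plain average of the three standard errors because it also reflects how much the estimates vary from one imputed data set to the next.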
Data augmentation
[Diagram: the I-step (stochastic imputation) produces a new data set, which feeds the P-step; repeating the cycle yields Data set 1, Data set 2, ..., Data set 20.]
Conclusion
• Imputation is an attractive idea because it produces a
complete data set and makes the data usable.
• Most single imputation methods produce biased parameter
estimates.
• Stochastic regression imputation is the only traditional
approach that yields unbiased estimates under an MAR
mechanism.
• Multiple imputation produces similar estimates.
• Few techniques are available when the data are categorical
or when the missing data mechanism is MNAR.
References
• Amanda N. Baraldi & Craig K. Enders: An introduction to missing data analysis. Journal of School Psychology. 2009; 9-18.
• Craig K. Enders: Applied missing data analysis. The Guilford Press. New York, London. 2010; 2-85.
• James R. Carpenter: Missing data in randomized controlled trials - A practical guide. 2007; 4-16.
• Schafer J L & Graham J W: Missing data: Our view of the state of the art. 2002; 147-77.
