Anuj Vijay Bhatia
FPRM 14
Institute of Rural Management Anand
NON RESPONSE
ERROR
HOW TO HANDLE IT?
Research Methodology
 Nonresponse occurs when a sampled respondent does not reply to
the mail survey, does not find time for the interview, or cannot be
contacted. There can be many such reasons for
nonresponse.
 A high rate of nonresponse is serious.
 Research may lose:
 Credibility
 Acceptability
 Accuracy and professional soundness
 The methodology used should be described completely.
 It is the researcher's responsibility to establish external
validity.
 An appropriate sample size and an acceptable response
rate must be achieved.
NON RESPONSE ERROR
 Nonresponse error exists to the extent that subjects
included in the sample fail to provide usable responses.
 Research marked by high nonresponse loses
validity and reliability.
 Many research articles:
 Do not mention nonresponse as a threat to external validity.
 Do not attempt to control for nonresponse error.
 Do not reference the literature on handling
nonresponse.
 Nonresponse error limits the researcher's ability to generalize.
NON RESPONSE ERROR
 In survey research, the ability to generalize is critical.
 There is a risk that non-respondents will be
systematically different from respondents.
 Response rates are higher (often 100%) when
purposive or convenience sampling is used.
 When probability sampling is used, however, response rates are
typically lower.
 The ability to generalize is limited when purposive or
convenience sampling is used.
 In that case the threat to validity comes not from the response rate but
from the nonrepresentative sampling procedure.
 To assess external validity, ask: would your results be the
same if a 100% response rate had been achieved?
SAMPLING PROCEDURES AND NON-
RESPONSE
 Suppose the population is divided into two strata: the
respondents (r) and the non-respondents, whose data are
missing (m). Suppose we want to determine $\bar{Y}$, the total
population mean.
 $\bar{Y} = W_r \bar{Y}_r + W_m \bar{Y}_m$
 $\bar{Y}_r$ and $\bar{Y}_m$ are the means of respondents and
non-respondents respectively. $W_r$ and $W_m$ are the weights (population shares) of the two strata, so $W_r + W_m = 1$.
 If the survey fails to collect data from non-respondents, it will
produce an estimate equal to $\bar{Y}_r$.
 The bias is the difference between $\bar{Y}_r$ and $\bar{Y}$:
$$
\begin{aligned}
\bar{Y}_r - \bar{Y} &= \bar{Y}_r - (W_r \bar{Y}_r + W_m \bar{Y}_m) \\
&= \bar{Y}_r (1 - W_r) - W_m \bar{Y}_m \\
&= W_m (\bar{Y}_r - \bar{Y}_m)
\end{aligned}
$$
A SIMPLE LOGIC
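The bias formula above can be checked numerically. The following is a minimal sketch; the function names and the figures (respondent mean 4.0, non-respondent mean 3.0, 40% non-respondents) are hypothetical and chosen only for illustration.

```python
def nonresponse_bias(mean_resp, mean_nonresp, w_nonresp):
    """Bias of the respondent mean: W_m * (Ybar_r - Ybar_m)."""
    if not 0.0 <= w_nonresp <= 1.0:
        raise ValueError("w_nonresp must be a proportion between 0 and 1")
    return w_nonresp * (mean_resp - mean_nonresp)


def population_mean(mean_resp, mean_nonresp, w_nonresp):
    """Population mean: W_r * Ybar_r + W_m * Ybar_m, with W_r = 1 - W_m."""
    return (1.0 - w_nonresp) * mean_resp + w_nonresp * mean_nonresp


# Hypothetical numbers: respondents average 4.0, non-respondents 3.0,
# and 40% of the population does not respond.
bias = nonresponse_bias(4.0, 3.0, 0.4)
true_mean = population_mean(4.0, 3.0, 0.4)
print(bias, 4.0 - true_mean)  # both are about 0.4
```

The bias vanishes either when nobody is missing ($W_m = 0$) or when respondents and non-respondents have the same mean.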
 Control begins with the design and implementation of the study.
 Appropriate sampling protocols and procedures
should be used to maximize participation.
 Ensure that the response rate is high enough to conclude that
non-response is not a threat to external validity.
 If required, use additional procedures to
establish that non-response is not a threat to
external validity.
CONTROLLING NON-RESPONSE ERROR
Methods for Handling Non-Response
1. Comparison of Early to Late Respondents
2. Using “Days to Respond” as a Regression Variable
3. Compare Respondents to Non-Respondents
4. Compare Respondents on Characteristics known a
priori
5. Ignore Non-Response as a Threat to External
Validity
RECOMMENDATIONS FOR HANDLING
NON-RESPONSE
Method 1: Comparison of Early to Late Respondents
 Extrapolation based on statistical inference
 Operationally define "late respondents"
 Late respondents: the last wave of respondents
 Compare early and late respondents on the key
variables of interest.
 If no difference is found, results can be generalized to the larger
population.
METHODS FOR HANDLING
NON-RESPONSE
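One way to compare early and late respondents on a key variable is a two-sample t statistic. The sketch below uses Welch's version, which is an assumption on my part (the slides do not name a specific test), and the scores are made up. It returns the statistic and approximate degrees of freedom; the p-value would be read from a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, approximate degrees of freedom) for Welch's t-test."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a) / na, variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical scores on one key variable.
early = [4, 5, 3, 4, 5, 4]   # first wave of respondents
late = [4, 4, 3, 5, 4, 4]    # last wave, treated as "late respondents"
t_stat, df = welch_t(early, late)
print(round(t_stat, 3), round(df, 2))
```

A small absolute t value, as here, is consistent with no difference between the waves on this variable.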
Method 2: Using “Days to Respond” as a Regression
Variable
 “Days to respond” is coded as a continuous variable and
used as the independent variable (IV) in a regression equation.
 The primary variables of interest are regressed on
“Days to Respond”.
 If the coefficient is not statistically significant, assume that respondents
are not different from non-respondents.
METHODS FOR HANDLING
NON-RESPONSE
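Method 2 can be sketched as a simple least-squares regression of a variable of interest on days to respond. The data and the helper name `simple_ols` are hypothetical; the t statistic of the slope would be compared with a critical value from a t table with n - 2 degrees of freedom.

```python
from math import sqrt


def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, t statistic of b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    # Residual sum of squares and the standard error of the slope.
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = sqrt(sse / (n - 2) / sxx)
    return a, b, b / se_b


# Hypothetical data: days each respondent took to reply, and their score.
days = [1, 2, 3, 5, 8, 10, 14, 21]
score = [4.2, 3.9, 4.1, 4.0, 4.3, 3.8, 4.1, 4.0]
intercept, slope, t_slope = simple_ols(days, score)
print(round(slope, 4), round(t_slope, 3))
```

Here the slope is close to zero and its t statistic is small, so response timing does not appear to be related to the score in this made-up sample.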
Method 3: Compare Respondents to Non-Respondents
Draw a sample of nonrespondents, work extra diligently
to get their responses, and compare them with respondents.
Responses from a minimum of 20% of nonrespondents
should be obtained.
If fewer than 20% respond, Method 1
or 2 should be used, combining the results.
METHODS FOR HANDLING
NON-RESPONSE
Method 4: Compare Respondents on Characteristics
known a priori
 Compare respondents to the population on
characteristics known in advance.
 Describe similarities and differences.
Method 5: Ignore Non-Response as a Threat to External
Validity
 If above methods are you can choose to ignore.
METHODS FOR HANDLING
NON-RESPONSE
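For Method 4, one common way to compare respondents with known population characteristics is a chi-square goodness-of-fit statistic. This is an assumed choice of test, not one named in the slides, and the regional counts and shares below are hypothetical.

```python
def chi_square_gof(observed_counts, population_shares):
    """Chi-square statistic comparing respondent counts with known population shares."""
    n = sum(observed_counts)
    expected = [n * p for p in population_shares]
    return sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))


# Hypothetical: respondents by region vs. the known regional shares.
respondents_by_region = [48, 32, 20]
known_shares = [0.5, 0.3, 0.2]
chi2 = chi_square_gof(respondents_by_region, known_shares)
print(round(chi2, 4))
```

With three categories there are 2 degrees of freedom, and the 5% critical value is about 5.99; a statistic well below that suggests the respondents resemble the population on this characteristic.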
Anuj Vijay Bhatia
FPRM 14
Institute of Rural Management Anand
MISSING DATA
IN QUANTITATIVE RESEARCH
Research Methodology
 What is certain in life?
 Death
 Taxes
 What is certain in research?
 Measurement error
 Missing data
 Missing data can be:
 Due to preventable errors, mistakes, or lack of foresight by the
researcher
 Due to problems outside the control of the researcher
 Deliberate, intended, or planned by the researcher to reduce
cost or respondent burden
 Due to differential applicability of some items to subsets of
respondents, etc.
FOOD FOR THOUGHT
• Non-Response vs. Missing Data
• Missing Data: Where valid values on one or more
variables are not available for analysis.
• The researcher's primary concern is to identify the
patterns and relationships underlying the missing
data.
• We need to understand the process leading to missing
data to take an appropriate course of action.
• Common in social research
• More acute in experiments and surveys
• The best remedy is to avoid it through planning and conscientious
data collection.
• It is not uncommon to have some level of missing data.
MISSING DATA
Lost data
Reduces Statistical Power
Meaningfully diminishes sample size
Bias Parameter Estimates
Correlations biased downwards
Predictor scores affected
Restrict Variance
Central Tendency Biased
PRIMARY PROBLEMS
Simple Techniques
Listwise Deletion
Pairwise Deletion
Mean Substitution
Regression Imputation
Hot-Deck Imputation
Maximum Likelihood and Related Methods
Maximum Likelihood
Expectation Maximization
Repeated Measures and Time Series Designs
TECHNIQUES TO DEAL WITH
MISSING DATA
Eliminate all cases with missing data on any
predictor or criterion.
Sacrifices a large amount of data
Decreases statistical power
May introduce bias in parameter estimates
Default option in many statistical packages
LISTWISE DELETION
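A minimal sketch of listwise deletion, using hypothetical records with `None` marking a missing value. It shows how a modest share of missing values can remove a much larger share of cases.

```python
# Hypothetical data set: None marks a missing value.
rows = [
    {"age": 34, "income": 52.0, "score": 3.5},
    {"age": None, "income": 48.0, "score": 4.0},
    {"age": 29, "income": None, "score": 3.0},
    {"age": 41, "income": 61.0, "score": None},
    {"age": 38, "income": 57.0, "score": 4.5},
]


def listwise_delete(records):
    """Keep only the cases that have no missing value on any variable."""
    return [r for r in records if all(v is not None for v in r.values())]


complete = listwise_delete(rows)
share_lost = 1 - len(complete) / len(rows)
print(len(complete), share_lost)
```

Only 3 of the 15 values (20%) are missing here, yet 3 of the 5 cases (60%) are dropped.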
Deletes cases only from those statistics
that “need” the missing information.
Preserves a great deal more information than
listwise deletion.
Interpretation becomes difficult, because each statistic may rest on a different set of cases.
May lead to mathematically inconsistent
correlations.
PAIRWISE DELETION
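Pairwise deletion can be sketched as computing each correlation from the cases available for that particular pair. The data and the helper name `pairwise_corr` are hypothetical; note that the two correlations below rest on different cases and different sample sizes.

```python
from math import sqrt


def pairwise_corr(x, y):
    """Pearson r using only cases where both x and y are observed; returns (r, n)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy), n


# Hypothetical variables with different missing cases (None = missing).
x = [1, 2, 3, 4, 5, None]
y = [2, 4, None, 8, 10, 12]
z = [5, 4, 3, 2, 1, 0]
r_xy, n_xy = pairwise_corr(x, y)   # uses 4 cases
r_xz, n_xz = pairwise_corr(x, z)   # uses 5 cases
print(n_xy, n_xz)
```

Because each entry of the correlation matrix can come from a different subsample, the matrix as a whole may not be internally consistent.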
Use the variable's mean in place of missing data
Allows use of the rest of the individual’s data
Preserves data
Easy to use
Attenuates variance and covariance estimates
Useful when correlations between variables are
low and less than 10% of data are missing.
MEAN SUBSTITUTION
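The attenuation of variance under mean substitution is easy to see with a small example. The scores below are hypothetical.

```python
from statistics import mean, variance

observed = [2.0, 4.0, 6.0, 8.0]   # hypothetical observed scores
n_missing = 2                      # two cases with no score

# Mean substitution: every missing score is replaced by the observed mean.
filled = observed + [mean(observed)] * n_missing

var_before = variance(observed)    # sample variance of the observed scores
var_after = variance(filled)       # sample variance after substitution
print(var_before, var_after)
```

The mean is unchanged, but the imputed values add no spread, so the variance estimate shrinks.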
 Estimate missing data based on other variables in the
data set.
 Advantages:
 Preserves data
 Better than listwise and pairwise deletion
 Preserves the deviation from the mean
 Does not attenuate correlations as mean substitution does.
 Variants:
 Simple regression strategy
 Only one iteration
 Estimate relationships in variables and estimate missing data
 Stepwise/Iterative Regression
 Isolate a few key variables, prepare correlation matrix.
 Estimate regression equation and predict missing values
REGRESSION IMPUTATION
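The simple (single-iteration) regression strategy can be sketched as follows: fit a line on the complete cases, then predict the missing values from it. The data and the helper name `fit_line` are hypothetical, and only one predictor is used to keep the example short.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


# Hypothetical data: x is fully observed, y has two missing values (None).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, None, 8.1, None]

# Step 1: estimate the relationship from the complete cases only.
complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
a, b = fit_line([p[0] for p in complete], [p[1] for p in complete])

# Step 2: replace each missing y with its predicted value.
y_imputed = [yi if yi is not None else a + b * xi for xi, yi in zip(x, y)]
print([round(v, 3) for v in y_imputed])
```

Imputed values fall exactly on the fitted line, so observed values keep their deviation from the mean while the imputed ones carry no residual variation.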
 Replace missing value with actual score from similar
case in current data set.
 Hot-deck? What is so hot about it?
 What is Cold-Deck then?
 Missing values are replaced with a reasonable estimate
from similar individual.
 Accurate: Real values are imputed
 May not distort distributions.
 Helpful when data is missing in patterns.
 Little literature backing the accuracy claim.
 Problematic when there are many classification variables.
 Categorizing variables sacrifices information.
 Estimating standard errors is difficult.
HOT-DECK IMPUTATION
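A minimal hot-deck sketch, assuming a nearest-neighbour rule for picking the "similar case": the donor is the complete case closest on one matching variable. The slides do not specify how similarity is defined, so this rule, the function name `hot_deck`, and the records are all illustrative assumptions.

```python
def hot_deck(records, target, match_on):
    """Fill missing `target` values with the value from the most similar donor case."""
    donors = [r for r in records if r[target] is not None]
    filled = []
    for r in records:
        if r[target] is None:
            # Donor = complete case closest to this one on the matching variable.
            donor = min(donors, key=lambda d: abs(d[match_on] - r[match_on]))
            r = {**r, target: donor[target]}
        filled.append(r)
    return filled


# Hypothetical records: income is missing for two respondents.
people = [
    {"age": 25, "income": 30.0},
    {"age": 47, "income": 62.0},
    {"age": 26, "income": None},
    {"age": 50, "income": None},
]
result = hot_deck(people, target="income", match_on="age")
print([r["income"] for r in result])
```

Every imputed value is a real score taken from the current data set, which is the defining feature of the hot-deck approach.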
 Assume: The observed data are a sample drawn from a
multivariate normal distribution.
 Parameters are estimated from the available data, and then
missing scores are estimated based on the parameters
just estimated.
 The missing values are predicted using the conditional
distribution given the variables on which data are available.
 ML provides explicit modeling of the imputation process
that is open to scientific analysis and critique.
 More accurate than listwise deletion and better than ad
hoc approaches like mean substitution.
 However, the differences may be small,
and the distributional assumptions of this method are
relatively strict.
MAXIMUM LIKELIHOOD
 Uses the Expectation Maximization (EM) algorithm.
 Iterates through the process of estimating missing data and parameters.
 The first iteration involves estimating the missing data and then
estimating the parameters using the ML method.
 The second iteration re-estimates the missing
data based on the new parameter estimates and then
recalculates the parameter estimates.
 This process continues until the
parameter estimates converge.
 Produces less biased estimates, more accurate.
 Open to scientific analysis and critique.
 Lengthy and complex.
EXPECTATION MAXIMIZATION
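The iteration described above can be sketched for the simplest case: two variables assumed to be bivariate normal, with x fully observed and y partly missing. This is a minimal illustration under those assumptions, not a general-purpose implementation; the function name `em_bivariate` and the data are hypothetical.

```python
def em_bivariate(x, y, iterations=500, tol=1e-12):
    """EM estimates of (mean of y, cov(x, y), var(y)) when some y are None."""
    n = len(x)
    mu_x = sum(x) / n
    s_xx = sum((xi - mu_x) ** 2 for xi in x) / n
    # Starting values taken from the observed y only.
    obs = [yi for yi in y if yi is not None]
    mu_y = sum(obs) / len(obs)
    s_yy = sum((yi - mu_y) ** 2 for yi in obs) / len(obs)
    s_xy = 0.0
    for _ in range(iterations):
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        # E-step: expected y and y^2 for each case given current parameters.
        ey, ey2 = [], []
        for xi, yi in zip(x, y):
            if yi is None:
                pred = mu_y + beta * (xi - mu_x)
                ey.append(pred)
                ey2.append(pred ** 2 + resid_var)
            else:
                ey.append(yi)
                ey2.append(yi ** 2)
        # M-step: re-estimate the parameters from the completed statistics.
        new_mu_y = sum(ey) / n
        new_s_xy = sum(xi * e for xi, e in zip(x, ey)) / n - mu_x * new_mu_y
        new_s_yy = sum(ey2) / n - new_mu_y ** 2
        change = max(abs(new_mu_y - mu_y), abs(new_s_xy - s_xy),
                     abs(new_s_yy - s_yy))
        mu_y, s_xy, s_yy = new_mu_y, new_s_xy, new_s_yy
        if change < tol:  # stop once the estimates have converged
            break
    return mu_y, s_xy, s_yy


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, None, 8.1, None]
mu_y, s_xy, s_yy = em_bivariate(x, y)
print(round(mu_y, 4), round(s_xy, 4))
```

Convergence is checked on all three estimates rather than on the mean alone, because the mean does not move in the first pass when the starting covariance is zero.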
 The problem of missing data is more severe.
 Listwise deletion: more data are lost because of repeated
measures.
 Additional data are collected on the same measures at
different times.
 This gives an opportunity to use strongly correlated variables to
impute missing data.
 Linear regression and the subject mean can be used to
predict missing values, but the estimates may be biased.
 Interpolation and extrapolation can produce
relatively unbiased estimates.
REPEATED MEASURES AND TIME SERIES
DESIGN
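For repeated measures, interpolation can be sketched as filling each interior gap on a straight line between the nearest observed time points. The series is hypothetical, and leading or trailing gaps, which would need extrapolation, are deliberately left untouched here.

```python
def interpolate(series):
    """Linearly interpolate interior None values between observed neighbours."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


# Hypothetical scores for one subject over six measurement occasions.
scores = [10.0, None, None, 16.0, None, 20.0]
print(interpolate(scores))
```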
 The data can be missing at three levels:
1. Item-level missingness
2. Construct-level missingness
3. Person-level missingness
LEVELS OF MISSINGNESS
(Adapted from: Newman, D. A. (2014). Missing Data: Five Practical Guidelines, Sage Publications.)
Data can be missing randomly or
systematically.
Random Missingness:
Missing Completely at Random (MCAR)
Systematic Missingness:
Missing at Random (MAR)
Missing Not at Random (MNAR)
MECHANISMS OF MISSING DATA
 MCAR (Missing Completely at Random)
 The probability that a variable value is missing does not depend on
the observed data values nor the missing data values.
 P ( missing | complete data ) = P (missing)
 MAR (Missing at Random)
 The probability that a variable value is missing partly depends on
other data that are observed in the dataset but does not depend on
any of the values that are missing.
 P(missing | complete data ) = P (missing | observed data)
 MNAR (Missing Not at Random)
 The probability that a variable value is missing depends on the
missing data values themselves.
 P (missing | complete data ) ≠ P (missing | observed data)
(Adapted from: Newman, D. A. (2014). Missing Data: Five Practical Guidelines, Sage Publications.)
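The three mechanisms can be illustrated with a small simulation. Everything here is an assumption made for the example: the data-generating model (y = x + noise), the 30% and 80% missingness rates, the 0.5 cut-off, and the seed.

```python
import random

random.seed(0)
n = 20000
x = [random.gauss(0, 1) for _ in range(n)]   # always observed
y = [xi + random.gauss(0, 1) for xi in x]    # subject to missingness


def observed_mean(values, is_missing):
    """Mean of the values that remain after applying a missingness mask."""
    kept = [v for v, m in zip(values, is_missing) if not m]
    return sum(kept) / len(kept)


# MCAR: every y has the same 30% chance of being missing.
mcar = [random.random() < 0.3 for _ in range(n)]
# MAR: missingness depends only on the observed x.
mar = [xi > 0.5 and random.random() < 0.8 for xi in x]
# MNAR: missingness depends on the unobserved y itself.
mnar = [yi > 0.5 and random.random() < 0.8 for yi in y]

full_mean = sum(y) / n
mcar_mean = observed_mean(y, mcar)
mar_mean = observed_mean(y, mar)
mnar_mean = observed_mean(y, mnar)
print(round(full_mean, 2), round(mcar_mean, 2),
      round(mar_mean, 2), round(mnar_mean, 2))
```

Under MCAR the observed mean stays close to the full-data mean. Under MAR and MNAR the raw observed mean is pulled downward; the difference is that under MAR the distortion can in principle be corrected using the observed x, whereas under MNAR it cannot be corrected from the observed data alone.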
BIAS AND INACCURATE STANDARD
ERRORS
CHOOSING MISSING DATA TREATMENTS
(Adapted from: Newman, D. A. (2014). Missing Data: Five Practical Guidelines, Sage Publications.)
STEP 1: DETERMINE THE TYPE OF MISSING DATA
 Is it under the control of the researcher?
 Is it ignorable?
 Ignorable Missing Data
 Expected
 Remedies not needed
 Allowance for missing data is inherent in the technique
 Missing data is operating at random
 Non-Ignorable Missing Data
 Known to the researcher: some remedies apply if the missingness is random
 Unknown missing data processes: harder to identify, but remedies are
available
 Whether the missing data process is known or unknown: proceed to the next step
A FOUR STEP PROCESS FOR IDENTIFYING
MISSING DATA AND APPLYING REMEDIES
STEP 2: DETERMINE THE EXTENT OF MISSING DATA
 Determine the extent of missing data
 Patterns for individual variables, individual cases, and
overall.
 Is it low enough not to affect the results?
 Is it random?
 If sufficiently low: apply any remedy
 If not low: determine the randomness before applying the
remedy
 Assessing the Extent and Pattern of Missing data:
 Tabulate
 Number of cases with missing data
 Percentage of variables with missing data in each case.
 Look for non-random pattern
 Also determine number of cases with no missing data (100%
complete)
 Is missing data too high to create a bias? (Rule of Thumb 1)
 Can deletion be used? (Rule of Thumb 2)
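The tabulation described in Step 2 can be sketched as below. The records and the function name `missing_summary` are hypothetical; `None` marks a missing value.

```python
def missing_summary(records):
    """Tabulate the share missing per variable and per case, plus complete cases."""
    n_cases = len(records)
    variables = list(records[0].keys())
    # Share of cases with a missing value, for each variable.
    per_variable = {
        v: sum(r[v] is None for r in records) / n_cases for v in variables
    }
    # Share of variables with a missing value, for each case.
    per_case = [
        sum(val is None for val in r.values()) / len(variables) for r in records
    ]
    complete_cases = sum(share == 0 for share in per_case)
    return per_variable, per_case, complete_cases


survey = [
    {"age": 34, "income": 52.0, "score": 3.5},
    {"age": None, "income": 48.0, "score": 4.0},
    {"age": 29, "income": None, "score": 3.0},
    {"age": 41, "income": 61.0, "score": None},
    {"age": 38, "income": 57.0, "score": 4.5},
]
per_variable, per_case, complete_cases = missing_summary(survey)
print(per_variable, complete_cases)
```

The per-variable shares feed the question of whether missingness is low enough to ignore, and the count of complete cases feeds the question of whether deletion leaves enough data for the chosen analysis.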
Missing data under 10% can generally be
ignored when it happens in random fashion.
The number of cases with no missing data
should be sufficient for the selected analysis
technique if replacement values will not be
substituted (imputed) for the missing data.
RULE OF THUMB 1
HOW MUCH MISSING DATA IS TOO MUCH?
 Variables with as little as 15% missing data are candidates for deletion.
 Higher levels of missingness, such as 20-30%, can often be
remedied.
 Deletion of a large amount of data should be justifiable.
 Cases with missing data on the dependent variable typically
are deleted to avoid any artificial increase in its relationship with the
independent variables.
 When deleting a variable, ensure that a highly correlated
variable is available to represent the intent of the original
variable.
 Always perform the analysis both with and without the deleted
cases or variables to identify any marked differences.
RULE OF THUMB 2
DELETION BASED ON MISSING DATA
STEP 3: DIAGNOSE THE RANDOMNESS OF THE
MISSING DATA PROCESSES.
 The degree of randomness determines the appropriate
remedy.
Level of Randomness
 Random: MCAR
 Observed values of Y are truly a random sample of all Y values.
 No underlying process tends to bias the observed data.
 Cases with missing data are indistinguishable from cases with complete data.
 Non-Random: MAR
 Whether Y is missing depends on X but not on Y itself.
 Observed values of Y represent a random sample of Y for each
value of X.
 The observed Y values cannot be generalized to the population without accounting for X.
Diagnostic Tests for Level of Randomness
 Form two groups, cases with and without missing data on a variable, and compare them on other variables with a t-test.
 Overall test of randomness for MCAR
STEP 4: SELECT THE IMPUTATION METHOD
UNDER 10%
Any imputation method can be applied.
10% - 20%
For MCAR
 Hot-Deck Case Substitution and Regression Imputation
For MAR
 Model Based Methods
Over 20%
Regression method for MCAR
Model Based method for MAR
RULE OF THUMB 3
IMPUTATION OF MISSING DATA
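Rule of Thumb 3 can be written as a small lookup. The function name and return strings are hypothetical; the thresholds and recommendations are the ones listed on the slide, and a share of exactly 10% or 20% is assigned to the middle band as an assumption, since the slide does not say which side the boundaries fall on.

```python
def suggest_imputation(share_missing, mechanism):
    """Map Rule of Thumb 3 to a suggested family of imputation methods."""
    if mechanism not in ("MCAR", "MAR"):
        raise ValueError("mechanism must be 'MCAR' or 'MAR'")
    if share_missing < 0.10:
        return "any imputation method"
    if share_missing <= 0.20:
        if mechanism == "MCAR":
            return "hot-deck case substitution or regression imputation"
        return "model-based methods"
    # Over 20% missing.
    if mechanism == "MCAR":
        return "regression method"
    return "model-based methods"


print(suggest_imputation(0.15, "MCAR"))
print(suggest_imputation(0.25, "MAR"))
```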
1. Dooley, L. M., & Lindner, J. R. (2003). The handling of nonresponse error. Human Resource Development Quarterly, 14(1), 99-110.
2. Roth, P. L. (1994). Missing data: A conceptual review for applied psychologists. Personnel Psychology, 47(3), 537-560.
3. Blair, E., & Zinkhan, G. M. (2006). Nonresponse and generalizability in academic research. Journal of the Academy of Marketing Science, 34(1), 4-7.
4. Newman, D. A. (2014). Missing data: Five practical guidelines. Organizational Research Methods, 17(4), 372-411.
5. Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E., & Tatham, R. L. (2006). Multivariate data analysis (6th ed.). New Jersey: Pearson Education.
REFERENCES
  • 1. Anuj Vijay Bhatia FPRM 14 Institute of Rural Management Anand NON RESPONSE ERROR HOW TO HANDLE IT? ResearchMethodology
  • 2.  The respondent has not replied to the mail or did not find time to give the interview or cannot be contacted. There can be many such reasons for nonresponse.  High rate of non response is serious.  Research may lose:  Credibility  Acceptability  Accuracy and Professional Soundness  Methodology used should be described completely.  Researchers responsibility to establish external validity.  Appropriate sample size and acceptable response rate must be achieved. NON RESPONSE ERROR
  • 3.  Nonresponse error exist to the extent that subjects included in sample fail to provide usable responses.  Research manifested by high nonresponse loses Validity and Reliability.  Many research articles:  Do not mention nonresponse as a threat to external validity.  Do not attempt to control for non response error.  Do not provide reference to the literature of handling nonresponse.  It limits the ability of the researcher to generalize. NON RESPONSE ERROR
  • 4.  In a survey research, the ability to generalize is critical.  There is a risk that non-respondents will be systematically different from respondents.  Response rate is higher (100% many times) when purposive or convenience sampling is used.  However, probability sampling is used, response rates are low.  Ability to generalize is limited when purposive or convenience sampling is used.  The threat to validity is not due to response rate but due to nonrepresentataive sampling procedures.  To ensure external validity answer: Will your results be same if a 100% response rate was achieved? SAMPLING PROCEDURES AND NON- RESPONSE
  • 5.  Suppose the population is divided into two strata i.e., the respondents ( r ) and the non-respondents whose data is missing (m). Suppose we want to determine 𝑌 , the total population mean.  𝒀 = Wr 𝒀 𝒓 + Wm 𝒀 𝒎  Yr and Ym are the means of respondents and non— respondents respectively. Wr and Wm are weights.  If the survey fails to collect data from non-respondents, it will produce result estimate equal to 𝑌 𝑟.  The bias will be the difference between 𝑌 𝑟 𝑎𝑛𝑑 𝑌  𝒀 𝒓 − 𝒀 = 𝒀 𝒓 − ( Wr 𝒀 𝒓 + Wm 𝒀 𝒎 ) = 𝒀 𝒓 𝟏 − 𝑾𝒓 − 𝑾𝒎 𝒀 𝒎 = Wm (𝒀 𝒓 − 𝒀 𝒎) A SIMPLE LOGIC
  • 6.  Begins with designing and implementation.  Appropriate sampling protocols and procedures should be used to maximize participation.  Ensure that response rate is enough to conclude that non-response is not a threat to external validity.  If required go for some additional procedures to establish that non-response is not a threat to external validity. CONTROLLING NON-RESPONSE ERROR
  • 7. Methods for Handling Non-Response 1. Comparison of Early to Late Respondents 2. Using “Days to Respond” as a Regression Variable 3. Compare Respondents to Non-Respondents 4. Compare Respondents on Characteristics known a priori 5. Ignore Non-Response as a Threat to External Validity RECOMMENDATIONS FOR HANDLING NON-RESPONSE
  • 8. Method 1: Comparison of Early to Late Respondents  Extrapolation based on statistical inferences  Operationally define ‘Late Respondents’  Last wave of respondents: Late Respondents  Compare early and late respondents based on key variables of interest.  If no difference, results can be generalized to larger population. METHODS FOR HANDLING NON-RESPONSE
  • 9. Method 2: Using “Days to Respond” as a Regression Variable  “Days to respond” is coded as continuous variable and used as IV in regression equation.  Primary variables of interest are regressed on variable “Days to Respond”.  If not statistically significant: Assume that respondents are not different from non-respondents. METHODS FOR HANDLING NON-RESPONSE
  • 10. Method 3: Compare Respondents to Non-Respondents Compute differences by sampling nonrespondents and working extra diligently to get their responses. Minimum 20% of responses from nonrespondents should be obtained. If fewer than 20% responses are obtained, Method 1 or 2 should be used by combining the results. METHODS FOR HANDLING NON-RESPONSE
  • 11. Method 4: Compare Respondents on Characteristics known a priori  Compare respondents to population or characteristics known in advance  Describe similarities and differences. Method 5: Ignore Non-Response as a Threat to External Validity  If above methods are you can choose to ignore. METHODS FOR HANDLING NON-RESPONSE
  • 12. Anuj Vijay Bhatia FPRM 14 Institute of Rural Management Anand MISSING DATA IN QUANTITATIVE RESEARCH ResearchMethodology
  • 13.  What is certain in life?  Death  Taxes  What is certain in research?  Measurement error  Missing data  Missing data can be:  Due to preventable errors, mistakes, or lack of foresight by the researcher  Due to problems outside the control of the researcher  Deliberate, intended, or planned by the researcher to reduce cost or respondent burden  Due to differential applicability of some items to subsets of respondents Etc. A FOOD FOR THOUGHT
  • 15. • Non-Response v/s Missing Data • Missing Data: Where valid values on one or more variables are not available for analysis. • Researchers primary concern is to identify the patterns and relationships underlying the missing data. • we need to understand process leading to missing data to take appropriate course of action. • Common in Social Research • More acute in experiments and surveys • Best way is to avoid it by planning and conscientious data collection. • Not uncommon to have some level of missing data. MISSING DATA
  • 16. Lost data Reduces Statistical Power Meaningfully diminishes sample size Bias Parameter Estimates Correlations biased downwards Predictor scores affected Restrict Variance Central Tendency Biased PRIMARY PROBLEMS
  • 17. Simple Techniques Listwise Deletion Pairwise Deletion Mean Substitution Regression Imputation Hot-Deck Imputation Maximum Likelihood and Related Methods Maximum Likelihood Expectation Maximization Repeated Measures and Time Series Designs TECHNIQUES TO DEAL WITH MISSING DATA
  • 18. Eliminate all cases with missing data on any predictor or criterion. Sacrifices large amount of data Decreases statistical power May introduce bias in parameter Default option in many statistical packages LISTWISE DELETION
  • 19. Deletes information only from those statistics that “need” information. Preserves great deal of information than listwise deletion. Interpretation becomes difficult. May lead to mathematically inconsistent correlations. PAIRWISE DELETION
  • 20. Use means in place of missing data Allows to use rest of individual’s data Preserves data Easy to use Attenuate variance and covariance estimates Useful when correlations between variables is low and less than 10% of data are missing. MEAN SUBSTITUTION
  • 21.  Estimate missing data based on other variables in data set.  Advantages:  Preserves data  Better than Listwise and Pairwise deletion  Preserves the deviation from the mean  Doesn’t attune correlations like mean substitution.  Variants:  Simple regression strategy  Only one iteration  Estimate relationships in variables and estimate missing data  Stepwise/Iterative Regression  Isolate a few key variables, prepare correlation matrix.  Estimate regression equation and predict missing values REGRESSION IMPUTATION
  • 22.  Replace missing value with actual score from similar case in current data set.  Hot-deck? What is so hot about it?  What is Cold-Deck then?  Missing values are replaced with a reasonable estimate from similar individual.  Accurate: Real values are imputed  May not distort distributions.  Helpful when data is missing in patterns.  Little literature backing the accuracy claim.  Problematic when there are large classification variables.  Categorizing variables sacrifices information.  Estimating Standard Errors Difficult. HOT-DECK IMPUTATION
  • 23.  Assume: The observed data are a sample drawn from multivariate normal distribution.  Parameters are estimated by available data and then missing scores are estimated based on the parameters just estimated.  The missing values are predicted by using conditional distribution of variables on which data is available.  ML provides explicit modeling of the imputation process that is open to scientific analysis and critique.  More accurate then Listwise deletion and better than ad hoc approaches like mean substitution.  However, it may be possible that differences are small and the distributional assumptions in this method are relatively strict. MAXIMUM LIKELIHOOD
  • 24.  Uses Expectation Maximization Algorithm  Iterations through process of estimating missing data  First iteration involves estimating missing data and then estimating parameters using ML method.  Second iteration would require re-estimating the missing data based on new parameter estimates and then recalculating the parameter estimates.  This process continues till there is convergence in the parameter estimates.  Produces less biased estimates, more accurate.  Open to scientific analysis and critique.  Lengthy and complex. EXPECTATION MAXIMIZATION
  • 25.  Problem of Missing Data more severe  Listwise deletion: Loss of more data due to repeated measures.  Additional data is collected on same measures at different time.  Opportunity to use strongly correlated variables to impute missing data.  Linear regression and subject mean can be used to predict missing values, but it may be biased.  Interpolation and Extrapolation can produced relatively unbiased estimates. REPEATED MEASURES AND TIME SERIES DESIGN
  • 26. The data can be missing at three levels: 1. Item-level missingness 2. Construct-level missingness 3. Person-level missingness. LEVELS OF MISSINGNESS (Adopted from: Newman, D. A. (2014). Missing Data: Five Practical Guidelines, Sage Publications.)
  • 27. Data can be missing randomly or systematically. Random missingness: Missing Completely at Random (MCAR). Systematic missingness: Missing at Random (MAR) and Missing Not at Random (MNAR). MECHANISMS OF MISSING DATA
  • 28. MCAR (Missing Completely at Random): the probability that a variable value is missing depends neither on the observed data values nor on the missing data values. P(missing | complete data) = P(missing). MAR (Missing at Random): the probability that a variable value is missing partly depends on other data that are observed in the dataset but does not depend on any of the values that are missing. P(missing | complete data) = P(missing | observed data). MNAR (Missing Not at Random): the probability that a variable value is missing depends on the missing data values themselves. P(missing | complete data) ≠ P(missing | observed data). (Adopted from: Newman, D. A. (2014). Missing Data: Five Practical Guidelines, Sage Publications.)
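The three definitions can be made concrete with a small simulation. Everything specific here is an assumption chosen for illustration: the variable names, the missingness probabilities (0.3, 0.6, 0.1), the cut-off at zero, and the data-generating model in which `y` equals `x` plus noise.

```python
import random

def p_missing(mechanism, x, y):
    """Probability that y is missing, given observed x and the value of y."""
    if mechanism == "MCAR":
        return 0.3                      # depends on nothing
    if mechanism == "MAR":
        return 0.6 if x > 0 else 0.1    # depends only on the observed x
    if mechanism == "MNAR":
        return 0.6 if y > 0 else 0.1    # depends on the value that goes missing
    raise ValueError(mechanism)

def simulate(mechanism, n=5000, seed=1):
    """Return (mean of all y, mean of the y values that remain observed)."""
    rng = random.Random(seed)
    full, observed = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = x + rng.gauss(0, 1)         # y is correlated with x
        full.append(y)
        if rng.random() >= p_missing(mechanism, x, y):
            observed.append(y)
    return sum(full) / n, sum(observed) / len(observed)

for mech in ("MCAR", "MAR", "MNAR"):
    full_mean, observed_mean = simulate(mech)
    print(mech, round(full_mean, 3), round(observed_mean, 3))
```

In this setup the mean of the observed `y` values is expected to shift under both MAR and MNAR, because `x` and `y` are correlated. The difference is that under MAR the shift can in principle be corrected using the observed `x`, whereas under MNAR the information needed for the correction is itself missing.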
  • 29. BIAS AND INACCURATE STANDARD ERRORS
  • 30. CHOOSING MISSING DATA TREATMENTS (Adopted from: Newman, D. A., (2014). Missing Data: Five Practical Guidelines, Sage Publications.)
  • 31. STEP 1: DETERMINE THE TYPE OF MISSING DATA. Is it under the control of the researcher? Is it ignorable? Ignorable missing data: expected; remedies not needed; allowance for missing data is inherent in the technique; missing data are operating at random. Non-ignorable missing data: if the process is known to the researcher, some remedies apply if it is random; if the process is unknown, it is harder to handle, but remedies are available. Whether the missing data process is known or unknown, proceed to the next step. A FOUR STEP PROCESS FOR IDENTIFYING MISSING DATA AND APPLYING REMEDIES
  • 32. STEP 2: DETERMINE THE EXTENT OF MISSING DATA. Determine the extent of missing data: examine patterns for individual variables, individual cases, and overall. Is it low enough not to affect the results? Is it random? If sufficiently low: apply any remedy. If not low: determine the randomness before applying a remedy. Assessing the extent and pattern of missing data: tabulate the number of cases with missing data and the percentage of variables with missing data in each case; look for non-random patterns; also determine the number of cases with no missing data (100% complete). Is the amount of missing data high enough to create a bias? (Rule of Thumb 1) Can deletion be used? (Rule of Thumb 2)
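The tabulation described in Step 2 can be sketched as follows. The data layout (a list of dictionaries with `None` for missing entries), the function name `missing_summary`, and the sample values are assumptions made for the example.

```python
def missing_summary(rows, variables):
    """Tabulate the extent of missing data by variable and by case."""
    n_cases = len(rows)
    # Percentage of cases with a missing value, for each variable.
    per_variable = {
        v: 100.0 * sum(1 for r in rows if r[v] is None) / n_cases
        for v in variables
    }
    # Percentage of variables with a missing value, for each case.
    per_case = [
        100.0 * sum(1 for v in variables if r[v] is None) / len(variables)
        for r in rows
    ]
    # Number of cases with no missing data (100% complete).
    complete_cases = sum(1 for pct in per_case if pct == 0)
    return per_variable, per_case, complete_cases

# Hypothetical data set with four cases and three variables.
data = [
    {"age": 34, "income": 50, "score": 7},
    {"age": None, "income": 42, "score": 6},
    {"age": 29, "income": None, "score": None},
    {"age": 41, "income": 61, "score": 8},
]
per_variable, per_case, complete_cases = missing_summary(
    data, ["age", "income", "score"]
)
```

Here each variable is missing in 25% of cases, the third case is missing two of its three variables, and two cases are fully complete. These figures are the inputs to the two rules of thumb named on the slide.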
  • 33. Missing data under 10% can generally be ignored when it happens in random fashion. The number of cases with no missing data should be sufficient for the selected analysis technique if replacement values will not be substituted (imputed) for the missing data. RULE OF THUMB 1 HOW MUCH MISSING DATA IS TOO MUCH?
  • 34. Variables with as little as 15% missing data are candidates for deletion. Higher levels of missingness, such as 20-30%, can often be remedied. Deletion of a large amount of data should be justifiable. Cases with missing data for dependent variables are typically deleted to avoid any artificial increase in relationships with independent variables. When deleting a variable, ensure that a highly correlated variable is available to represent the intent of the original variable. Always perform the analysis both with and without the deleted cases or variables to identify any marked differences. RULE OF THUMB 2 DELETION BASED ON MISSING DATA
  • 35. STEP 3: DIAGNOSE THE RANDOMNESS OF THE MISSING DATA PROCESSES. The degree of randomness determines the appropriate level of remedy. Level of randomness. Random (MCAR): observed values of Y are truly a random sample of all Y values; no underlying process tends to bias the observed data; cases with missing data are indistinguishable from cases with complete data. Non-random (MAR): missing values of Y depend on X but not on Y; observed values of Y represent a random sample of Y for each value of X; the observed values cannot be generalized. Diagnostic tests for level of randomness: form two groups, with and without missing data, and compare them with a t-test; apply an overall test of randomness for MCAR.
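The two-group diagnostic can be sketched as below: cases are split by whether Y is missing, and the groups are compared on an observed variable X. Welch's unequal-variance form of the t-test is used here as one reasonable choice; the slide does not say which variant is intended. The function names and the sample data are hypothetical, and the overall MCAR test is not sketched.

```python
import math

def split_by_missingness(rows, y_key, x_key):
    """Return x values for cases where y is missing and where it is observed."""
    with_missing = [r[x_key] for r in rows if r[y_key] is None]
    without_missing = [r[x_key] for r in rows if r[y_key] is not None]
    return with_missing, without_missing

def welch_t(sample_a, sample_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    ma, mb = sum(sample_a) / na, sum(sample_b) / nb
    va = sum((v - ma) ** 2 for v in sample_a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in sample_b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical cases: income is missing mostly for older respondents.
cases = [
    {"age": 25, "income": 40}, {"age": 30, "income": 45},
    {"age": 28, "income": 42}, {"age": 33, "income": 50},
    {"age": 52, "income": None}, {"age": 58, "income": None},
    {"age": 61, "income": None}, {"age": 55, "income": None},
]
missing_ages, observed_ages = split_by_missingness(cases, "income", "age")
t_stat, dof = welch_t(missing_ages, observed_ages)
```

A large absolute t statistic, as in this constructed example, indicates that missingness on income is related to age, which argues against MCAR. The statistic would then be compared against a t distribution with `dof` degrees of freedom to obtain a p-value.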
  • 36. STEP 4: SELECT THE IMPUTATION METHOD
  • 37. Under 10%: any imputation method can be applied. 10% to 20%: for MCAR, hot-deck case substitution and regression imputation; for MAR, model-based methods. Over 20%: regression method for MCAR; model-based methods for MAR. RULE OF THUMB 3 IMPUTATION OF MISSING DATA
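The rule of thumb reads naturally as a small decision function. The sketch below is only a restatement of the slide: the function name `suggest_imputation` is hypothetical, and the treatment of the exact boundaries (10% falls in the middle band, 20% also falls in the middle band) is an assumption, since the slide does not specify it.

```python
def suggest_imputation(pct_missing, mechanism):
    """Map Rule of Thumb 3 to a suggestion, given the percentage of
    missing data and the diagnosed mechanism ("MCAR" or "MAR")."""
    if mechanism not in ("MCAR", "MAR"):
        raise ValueError("the rule of thumb covers only MCAR and MAR")
    if pct_missing < 10:
        return "any imputation method"
    if pct_missing <= 20:  # boundary handling is an assumption
        if mechanism == "MCAR":
            return "hot-deck case substitution or regression imputation"
        return "model-based methods"
    if mechanism == "MCAR":
        return "regression method"
    return "model-based methods"

print(suggest_imputation(5, "MAR"))    # any imputation method
print(suggest_imputation(15, "MCAR"))  # hot-deck case substitution or regression imputation
print(suggest_imputation(35, "MAR"))   # model-based methods
```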
  • 38. 1. Dooley, L. M., & Lindner, J. R. (2003). The handling of nonresponse error. Human Resource Development Quarterly, 14(1), 99-110. 2. Roth, P. L. (1994). Missing data: A conceptual review for applied psychologists. Personnel Psychology, 47(3), 537-560. 3. Blair, E., & Zinkhan, G. M. (2006). Nonresponse and generalizability in academic research. Journal of the Academy of Marketing Science, 34(1), 4-7. 4. Newman, D. A. (2014). Missing data: Five practical guidelines. Organizational Research Methods, 17(4), 372-411. 5. Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E., & Tatham, R. L. (2006). Multivariate data analysis (6th ed.). New Jersey: Pearson Education. REFERENCES