Sampling Bias
Dr.K.Prabhakar
Bias
• Once we have collected the data, we represent it by way of a model.
Let us assume a linear model.
• This may be written as y (outcome) = a1x1 + a2x2 + a3x3 + … + anxn + error
• The outcome is expressed as a set of predictor variables multiplied by
a set of coefficients. These coefficients (the a's in the equation) are
the parameters, and they tell us about the relationship between each
predictor and the outcome variable.
• The prediction will not be perfect: there will be an error, because we
are using sample data to estimate the parameters and predict the
outcome variable.
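As a rough illustration of the model above, the following sketch simulates data from a two-predictor version of the equation and estimates the coefficients by least squares. The function name fit_two_predictors, the true values 2.0 and -1.0, and the simulated data are all assumptions made for this example; they do not come from the slides.

```python
import random

def fit_two_predictors(x1, x2, y):
    """Least-squares estimates of a1 and a2 in y = a1*x1 + a2*x2 + error."""
    s11 = sum(u * u for u in x1)
    s22 = sum(v * v for v in x2)
    s12 = sum(u * v for u, v in zip(x1, x2))
    s1y = sum(u * t for u, t in zip(x1, y))
    s2y = sum(v * t for v, t in zip(x2, y))
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 * s12
    a1 = (s1y * s22 - s2y * s12) / det
    a2 = (s2y * s11 - s1y * s12) / det
    return a1, a2

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 200
x1 = [rng.uniform(0, 10) for _ in range(n)]
x2 = [rng.uniform(0, 10) for _ in range(n)]
true_a1, true_a2 = 2.0, -1.0  # assumed "population" parameters
y = [true_a1 * u + true_a2 * v + rng.gauss(0, 1) for u, v in zip(x1, x2)]

a1_hat, a2_hat = fit_two_predictors(x1, x2, y)
# The residuals are the part of the outcome the model does not predict.
residuals = [t - (a1_hat * u + a2_hat * v) for u, v, t in zip(x1, x2, y)]

print("estimated a1:", a1_hat, "estimated a2:", a2_hat)
print("largest absolute residual:", max(abs(r) for r in residuals))
```

The estimates land near the assumed parameters but not exactly on them, and the residuals are not zero, which is the imperfect prediction the last bullet describes.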
The contexts for bias
• Things that bias the parameter estimates
• Things that bias standard errors and confidence intervals
• Things that bias test statistics and p-values. These biases are
related: confidence intervals and test statistics are both computed
from the standard error, so a biased standard error biases both.
• If the test statistics are biased, then the conclusions drawn from
them will be biased, so we need to identify and eliminate the biases
as far as possible.
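The link between these three contexts can be sketched for the simplest case, a test of a sample mean. In the example below, the marks data, the null value of 60, and the se_scale parameter (a hypothetical knob that mimics a biased standard error) are assumptions for illustration, and the interval uses the normal value 1.96 rather than a t critical value.

```python
import math
import statistics

def summarize(sample, mu0, se_scale=1.0):
    """Return (standard error, approximate 95% CI, test statistic).

    se_scale is a hypothetical multiplier that mimics a biased standard error.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    se = se_scale * statistics.stdev(sample) / math.sqrt(n)
    ci = (mean - 1.96 * se, mean + 1.96 * se)  # normal approximation
    t_stat = (mean - mu0) / se
    return se, ci, t_stat

marks = [52, 58, 61, 64, 66, 59, 63, 70, 55, 62]  # made-up data
se_ok, ci_ok, t_ok = summarize(marks, mu0=60)
se_bad, ci_bad, t_bad = summarize(marks, mu0=60, se_scale=2.0)

print("unbiased SE:", se_ok, "CI:", ci_ok, "test statistic:", t_ok)
print("inflated SE:", se_bad, "CI:", ci_bad, "test statistic:", t_bad)
```

Doubling the standard error doubles the width of the interval and halves the test statistic at the same time, because both are built from the same quantity.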
Assumptions that lead to bias
1. Presence of outliers
2. Additivity and linearity
3. Normality
4. Homoscedasticity or homogeneity of variance
5. Independence
Outliers
• Outliers in the data can bias the estimates computed from it.
• For example, if the class average is 60 marks and the standard
deviation is 10 marks, then a few students scoring zero or 100 marks
may bias the results.
• The outliers need to be identified and removed or replaced to obtain a
better representation of the data. They generally affect the mean of
the data as well as the sum of squared errors. The sum of squares is
used to compute the standard deviation, which in turn is used to
estimate the standard error. The standard error is used for confidence
intervals around the parameter estimates. Thus outliers have a domino
effect on the results.
Additivity and Linearity
• The assumption is that the outcome variable is linearly related to
each predictor. That means the relationship can be summed up as a
straight line.
• If there are several predictors, as we have seen in the equation
y (outcome) = a1x1 + a2x2 + a3x3 + … + anxn + error
their combined effect is described by adding their effects together.
If the assumption holds, the model is described accurately by the
equation given here.
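One way to see what happens when linearity fails is to fit a straight line to data that is actually curved and look at the residuals. This is only a sketch: the helper names fit_line and residuals are invented here, the data is artificial and noise-free, and the fit includes an intercept, which the equation on the slide does not show.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((u - mean_x) ** 2 for u in x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def residuals(x, y):
    """Observed outcome minus the straight-line prediction."""
    intercept, slope = fit_line(x, y)
    return [v - (intercept + slope * u) for u, v in zip(x, y)]

x = list(range(11))
linear_y = [3 * u for u in x]   # outcome truly linear in x
curved_y = [u ** 2 for u in x]  # outcome curved in x

res_linear = residuals(x, linear_y)
res_curved = residuals(x, curved_y)

print("residuals, linear data:", res_linear)
print("residuals, curved data:", res_curved)
```

For the linear data the residuals are essentially zero. For the curved data they are positive at both ends and negative in the middle, a systematic pattern that signals the straight-line description is not adequate.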
Assumption of Normality
• There is a mistaken belief that the assumption of normality means the data
need to be normally distributed. This misconception stems from the fact that
if the data are normally distributed, then the errors in the model and the
sampling distribution are likely to be normally distributed as well.
• The central limit theorem means that there are a variety of situations in
which we can assume normality regardless of the shape of the sample data.
• Normality matters when you construct confidence intervals around parameters
of the model or compute significance tests relating to those parameters, and
even then it matters mainly in small samples.
• As long as the sample size is fairly large and outliers are taken into
account, the assumption of normality will not be a pressing concern.
• Lumley, T., Diehr, P., Emerson, S., & Chen, L. (2002). The importance of
the normality assumption in large public health data sets. Annual Review of
Public Health, 23(1), 151-169.
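The central limit theorem point can be checked with a small simulation. The sketch below draws from a strongly skewed (exponential) population and compares the skewness of the raw values with the skewness of means of samples of size 50. The population, the sample size, the number of repetitions, and the skewness helper are all choices made for this illustration.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean([(v - mean) ** 2 for v in values])
    m3 = statistics.fmean([(v - mean) ** 3 for v in values])
    return m3 / m2 ** 1.5

rng = random.Random(42)  # fixed seed so the run is repeatable

# Raw data from a skewed population (exponential with rate 1).
raw = [rng.expovariate(1.0) for _ in range(5000)]

# Sampling distribution of the mean: 2000 samples of size 50 each.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(50)])
    for _ in range(2000)
]

skew_raw = skewness(raw)
skew_means = skewness(sample_means)
print("skewness of raw data:", skew_raw)
print("skewness of sample means:", skew_means)
```

The raw values are heavily skewed, while the sample means are much closer to symmetric, even though no individual sample looks normal. This is why the shape of the sample data matters less as the sample size grows.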
Homoscedasticity or homogeneity of variance

Bias in Research Methods
