A lesson on Statistics:
Data – Types, description and
interpretation
Dr Andrea Josephine R,
2nd year MD PG,
Department of Pediatrics,
ESIC Medical College & PGIMSR, Chennai.
Topics
 Types of data
 Measures of central tendency
 Measures of dispersion
 Measures of distribution
 Characterizing diagnostic tests – The test of a test
Types of data
1. Nominal:
 Qualitative data
 Characteristics of a variable – Categories
 Mutually exclusive, exhaustive
 No implied order
 E.g. Sex : Male/Female, Demographics
(Urban/Suburban/Rural)
Types of data
2. Ordinal:
 Qualitative data – Categories
 Rank/Order into a progression, mutually exclusive,
exhaustive
 Size of the interval not measurable or equal
 E.g. Satisfaction with treatment – Very satisfied /
Somewhat satisfied / Somewhat dissatisfied / Very
dissatisfied
Types of data
3. Interval:
 Quantitative data
 Meaningful intervals
 No absolute zero
 Ratio between 2 measurements not meaningful
 E.g. Temperature in degrees Celsius: the difference
between 2 measurements is quantifiable, but their ratio is not
meaningful; 0 °C does not imply a total absence of heat
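The interval-versus-ratio distinction can be checked numerically. The sketch below is illustrative only: the two readings are hypothetical values, and the Kelvin conversion is brought in as a scale with an absolute zero for comparison.

```python
# Two temperature readings in degrees Celsius (hypothetical values).
t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on an interval scale.
difference = t2_c - t1_c

# The Celsius ratio suggests "twice as hot", but it depends on an arbitrary zero.
ratio_celsius = t2_c / t1_c

# Kelvin has an absolute zero, so the ratio of the same two readings is meaningful.
ratio_kelvin = (t2_c + 273.15) / (t1_c + 273.15)

print(difference, ratio_celsius, round(ratio_kelvin, 3))
```

The same two readings give a ratio of 2 in Celsius but only about 1.04 in Kelvin, which is why ratios on an interval scale should not be interpreted.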
Types of data
4. Ratio:
 Quantitative data
 Absolute zero
 Meaningful ratios
 E.g. Age (years), Weight (kg), Blood pressure (mmHg)
Types of data
1. Discrete:
 Only whole numbers possible / distinct categories
 E.g. Number of patients, number of syringes used, Gender,
hair colour
2. Continuous:
 Any value in a continuum
 E.g. Weight, Height, Serum creatinine
Measures of central tendency
1. Mean:
 Used for interval & ratio data
 Summation of all values divided by number of values in
the sample
 $\bar{x} = \frac{\sum x}{n}$
Measures of central tendency
2. Median:
 Used for ordinal data; also preferred for skewed interval or ratio data
 Half of the values lie above it, half below it
 If n is odd, arrange in order: Middle value = median
 If n is even, arrange and take mean of middle 2 values
Measures of central tendency
3. Mode:
 Used for nominal data
 Most frequently appearing category
 If 2 categories are tied for most frequent, the data are bimodal
 With more than 2 tied categories, the data are multimodal
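As a minimal sketch, the three measures can be computed with Python's standard `statistics` module. The numeric values are the ones used in the mean deviation example later in the deck; the blood group list is a hypothetical nominal variable added for illustration.

```python
import statistics

values = [3, 6, 6, 7, 8, 11, 15, 16]

mean_value = statistics.mean(values)      # sum of all values divided by n
median_value = statistics.median(values)  # n is even: mean of the two middle values
modes = statistics.multimode(values)      # every value tied for most frequent

# Hypothetical nominal data: only the mode is meaningful here.
blood_groups = ["A", "O", "B", "O", "A"]
group_modes = statistics.multimode(blood_groups)

print(mean_value, median_value, modes, group_modes)
```

Here `group_modes` contains two categories, which is the bimodal case described above.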
Measures of dispersion
1. Range:
 Difference between highest and lowest values
 E.g. A set of values 102, 105, 109, 111 and 120. Range is
not 102-120. Range = 120-102 = 18.
Measures of dispersion
2. Interquartile range:
 Range of the middle 50% of the data
 Difference between the upper and lower quartile
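A short sketch of both measures, using the values from the range example. Note the assumption: several conventions exist for locating quartiles in a small sample, so the interquartile range from `statistics.quantiles` (default method) may differ slightly from a hand calculation or from other software.

```python
import statistics

values = [102, 105, 109, 111, 120]

# Range: a single number, highest minus lowest.
value_range = max(values) - min(values)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1  # spread of the middle 50% of the data

print(value_range, q1, q3, iqr)
```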
Measures of dispersion
3. Mean deviation:
 Average of the absolute deviations from mean.
 Mean deviation = $\frac{\sum |x - \bar{x}|}{n}$
Example
 Mean Deviation of 3, 6, 6, 7, 8, 11, 15, 16
 Step 1: Find the mean: (3 + 6 + 6 + 7 + 8 + 11 + 15 + 16)/8
= 72/8 = 9
 Step 2: Find the distance of each value from that mean: 6, 3, 3, 2, 1, 2, 6, 7
Example (Contd.)
 Step 3. Find the mean of those distances:
 Mean Deviation = (6 + 3 + 3 + 2 + 1 + 2 + 6 + 7)/8 = 30/8 =
3.75
 So, the mean = 9, and the mean deviation = 3.75
 On average, the values lie 3.75 away from the mean
 Why take the absolute value? Because the signed deviations from the mean always sum to zero
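The worked example can be reproduced in a few lines. This sketch also computes the sum of the signed deviations, which shows why absolute values are needed.

```python
values = [3, 6, 6, 7, 8, 11, 15, 16]
n = len(values)

# Step 1: the mean.
mean_value = sum(values) / n

# Step 2: signed and absolute distances from the mean.
signed = [x - mean_value for x in values]
absolute = [abs(d) for d in signed]

# Signed deviations cancel out; absolute deviations do not.
sum_signed = sum(signed)

# Step 3: the mean of the absolute distances.
mean_deviation = sum(absolute) / n

print(mean_value, sum_signed, mean_deviation)
```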
Measures of dispersion
4. Variance and Standard deviation:
 Variance ($s^2$) = Mean of the squares of the deviations from the mean
$s^2 = \frac{\sum (x - \bar{x})^2}{n}$
 Standard deviation (SD) = $\sqrt{\frac{\sum (x - \bar{x})^2}{n}}$
The smaller the SD, the closer the values cluster around
the mean
If a constant is added to all values, mean changes; Variance
and SD remain the same.
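A minimal check of these statements, reusing the values from the mean deviation example. The sketch uses `statistics.pvariance` and `statistics.pstdev`, which divide by n as on the slide; many texts divide by n − 1 when estimating from a sample, which this sketch does not do.

```python
import statistics

values = [3, 6, 6, 7, 8, 11, 15, 16]
shifted = [x + 10 for x in values]  # add a constant to every value

variance = statistics.pvariance(values)  # mean of squared deviations (divides by n)
sd = statistics.pstdev(values)           # square root of that variance

# The mean moves by the constant; variance and SD do not change.
print(statistics.mean(values), statistics.mean(shifted))
print(variance, statistics.pvariance(shifted))
print(sd, statistics.pstdev(shifted))
```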
Measures of dispersion
5. Coefficient of variation(CV):
 CV = SD/Mean
 The units of SD and mean are the same, hence CV is a
unitless value.
 If both SD and mean are multiplied by a constant, CV
remains the same (Useful in ratio measurements).
 Not useful for interval-level data, as CV changes when a
constant is added to each value (the SD stays the same while the mean shifts).
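Both properties can be verified with a small sketch. The helper name `coefficient_of_variation` is an illustrative choice, and the SD here divides by n, as in the variance slide.

```python
import statistics

def coefficient_of_variation(values):
    """CV = SD / mean, using the SD that divides by n."""
    return statistics.pstdev(values) / statistics.mean(values)

values = [3, 6, 6, 7, 8, 11, 15, 16]

cv = coefficient_of_variation(values)
cv_scaled = coefficient_of_variation([2 * x for x in values])    # change of unit
cv_shifted = coefficient_of_variation([x + 10 for x in values])  # change of zero point

print(round(cv, 3), round(cv_scaled, 3), round(cv_shifted, 3))
```

Multiplying by a constant (a change of unit) leaves CV unchanged, while adding a constant (a change of zero point) alters it, which is why CV suits ratio data only.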
Skewness
 Refers to the symmetry of the frequency-distribution
curve.
 Value of 0 – Unskewed, Positive value – skewed to the
right, Negative value – skewed to the left.
 Refers to the side of the longer tail, NOT that of the bulk
of the data.
Kurtosis
 Refers to the peak of the frequency-distribution curve.
 Mesokurtosis – Normal distribution curve
 Leptokurtosis – Peaked; Platykurtosis - Flat
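The slides do not give formulas for skewness or kurtosis. One common choice, assumed here, is the moment-based coefficients: skewness as the third central moment divided by the 1.5th power of the second, and excess kurtosis as the fourth central moment divided by the square of the second, minus 3 (so that a normal, mesokurtic curve scores 0). Statistical packages often apply sample-size corrections that this sketch omits, and the data sets are hypothetical.

```python
def central_moment(values, k):
    """k-th central moment: mean of (x - mean) ** k."""
    n = len(values)
    mean_value = sum(values) / n
    return sum((x - mean_value) ** k for x in values) / n

def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 3, 10]           # long tail towards high values
left_tailed = [-x for x in right_tailed]     # mirror image: long tail towards low values

for name, data in [("symmetric", symmetric),
                   ("right_tailed", right_tailed),
                   ("left_tailed", left_tailed)]:
    print(name, round(skewness(data), 3), round(excess_kurtosis(data), 3))
```

The sign of the skewness follows the side of the longer tail, and a negative excess kurtosis corresponds to a flatter (platykurtic) shape.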
Sensitivity
 Ability of a test to correctly identify patients with disease
 Sensitivity = $\frac{\text{True positives}}{\text{True positives} + \text{False negatives}}$
[Chart: Sensitivity. Patients picked up, 80; undiagnosed diseased population, 20]
Specificity
 Ability of the test to correctly identify patients who are
disease-free/healthy
 Specificity = $\frac{\text{True negatives}}{\text{True negatives} + \text{False positives}}$
[Chart: Specificity. Healthy, 80; healthy mislabelled diseased, 20]
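The sensitivity and specificity definitions can be sketched as two small functions. The counts are the ones shown on the slides' charts (80 and 20 in each group); the function and argument names are illustrative.

```python
def sensitivity(tp, fn):
    """Proportion of diseased patients the test correctly identifies."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of disease-free patients the test correctly identifies."""
    return tn / (tn + fp)

# 100 diseased patients: 80 picked up, 20 missed.
sens = sensitivity(tp=80, fn=20)

# 100 healthy people: 80 correctly labelled, 20 mislabelled as diseased.
spec = specificity(tn=80, fp=20)

print(sens, spec)
```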
Positive predictive value
 Proportion of patients with positive test results who truly have
disease.
 PPV = $\frac{\text{True positives}}{\text{True positives} + \text{False positives}}$
 Answers the question: “I have tested
positive. Am I really diseased?”
[Chart: PPV. Truly diseased, 80%; healthy mislabelled diseased, 20%]
Negative predictive value
 Proportion of patients with negative test results who are
truly disease-free
 NPV = $\frac{\text{True negatives}}{\text{True negatives} + \text{False negatives}}$
 Answers the question: “I have tested
negative. Am I really disease-free?”
[Chart: NPV. Truly healthy, 80%; diseased mislabelled healthy, 20%]
PPV and NPV
 Highly dependent on the prevalence of a disease in a
given population.
 Less reliable in rare diseases: as prevalence falls, PPV falls.
 Less transferable from one population to another.
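The dependence on prevalence can be illustrated by holding the test fixed and varying only the prevalence. This is a sketch under assumed values: sensitivity and specificity of 0.8 each, and three hypothetical prevalences.

```python
def predictive_values(sens, spec, prevalence):
    """PPV and NPV for a test applied to a population with the given prevalence."""
    tp = sens * prevalence                # diseased, test positive
    fn = (1 - sens) * prevalence          # diseased, test negative
    tn = spec * (1 - prevalence)          # healthy, test negative
    fp = (1 - spec) * (1 - prevalence)    # healthy, test positive
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv

for prevalence in (0.5, 0.1, 0.01):
    ppv, npv = predictive_values(0.8, 0.8, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With the same test, PPV drops sharply as the disease becomes rarer while NPV rises, so predictive values measured in one population should not be carried over to another with a different prevalence.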
Likelihood ratio
 Combines sensitivity and specificity
 Positive likelihood ratio defines the extent to which a
positive test result increases the likelihood of having
disease.
 LR+ = $\frac{\text{Sensitivity}}{1 - \text{Specificity}}$
 If the LR+ of a test is 1.36, a positive result is 1.36 times
as likely in a patient with the disease as in a patient
without it.
LR – Interpretation:
 LR over 5–10: Significantly increases the likelihood of the
disease
 LR between 0.2 and 5 (esp. if close to 1): Does not meaningfully modify
the likelihood of the disease
 LR below 0.1–0.2: Significantly decreases the likelihood
of the disease
Likelihood ratio
 Negative likelihood ratio defines the extent to which a
negative test result decreases the likelihood of having
disease.
 LR− = $\frac{1 - \text{Sensitivity}}{\text{Specificity}}$
 LR− compares how likely a negative result is in a patient with
the disease and in a disease-free patient. A useful test has LR− below 1:
if LR− is 0.5, a negative result is half as likely in a diseased patient.
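Both ratios follow directly from sensitivity and specificity. A minimal sketch, assuming a test with sensitivity and specificity of 0.8 each (illustrative values):

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_positive = sens / (1 - spec)        # true positive rate / false positive rate
    lr_negative = (1 - sens) / spec        # false negative rate / true negative rate
    return lr_positive, lr_negative

lr_pos, lr_neg = likelihood_ratios(sens=0.8, spec=0.8)
print(round(lr_pos, 2), round(lr_neg, 2))
```

For this test a positive result is about 4 times as likely in a diseased patient as in a healthy one, and a negative result about a quarter as likely.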
LR – Points:
 Independent of disease prevalence
 Specific to the test being used
 Can be applied to the individual patient to evaluate how
worthwhile it is to perform a given test
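Applying a likelihood ratio to an individual patient is usually done through odds: post-test odds = pre-test odds × LR. This step is not spelled out on the slides, so the sketch below is offered as the standard odds form of Bayes' theorem, with a hypothetical pre-test probability of 10% and the illustrative ratios LR+ = 4 and LR− = 0.25.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

pre_test = 0.10  # hypothetical clinical estimate before testing

after_positive = post_test_probability(pre_test, 4.0)    # using LR+
after_negative = post_test_probability(pre_test, 0.25)   # using LR-

print(round(after_positive, 3), round(after_negative, 3))
```

If neither result would move the probability across a treatment or rule-out threshold, the test adds little for that patient.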
Receiver operating characteristic (ROC) curve
 If the cut-off for a positive test is raised, both the true
positive rate and the false positive rate decrease.
 True positive rate = Sensitivity
 False positive rate = 1 – Specificity.
 The ROC curve plots the true positive rate (y-axis) against the false positive rate (x-axis) across all cut-offs.
ROC curve – Area under curve
 Area under curve – Used to assess overall accuracy of a
test
 Value of 1 – Perfect test: 100% sensitivity and specificity
 Value of 0.5 – Zero diagnostic capability, Line
of zero discrimination, no better than tossing
a coin.
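A rough sketch of how an ROC curve and its area can be computed by sweeping the cut-off. The scores and disease labels are hypothetical, higher scores are assumed to indicate disease, and the area is approximated with the trapezoidal rule; dedicated statistical libraries provide more complete implementations.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per cut-off, from strictest to most lenient."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # cut-off above every score: nobody tests positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Trapezoidal area under (x, y) points that are ordered by x."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical test results: label 1 = diseased, 0 = disease-free.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3]
labels = [1, 1, 0, 1, 1, 0, 0, 0]

points = roc_points(scores, labels)
auc = area_under_curve(points)
print(points)
print(auc)
```

Lowering the cut-off moves along the curve from (0, 0) towards (1, 1); an area close to 1 indicates good discrimination and an area near 0.5 indicates none.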
Using ROC curve and AUC to choose
between tests
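 ROC curves: the test whose curve lies closer to the top-left corner, and so has the larger AUC, is the more accurate overall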
Thank You
