GOOD MORNING
DESCRIPTIVE DATA ANALYSIS
Presented by
Dr. P. Gnana Sarita Kumari
I MDS
Department of Public Health Dentistry
CONTENTS:
• INTRODUCTION
• TYPES OF VARIABLES AND LEVELS OF MEASUREMENT
• MEASURES OF CENTRAL TENDENCY
• MEASURES OF DISPERSION
• NORMAL DISTRIBUTION
• MEASURES OF ASYMMETRY
• MEASURES OF RELATIONSHIP
• CONCLUSION
• REFERENCES
INTRODUCTION
DESCRIPTIVE ANALYSIS:
• The data describe one group and that group only.
• Descriptive data analysis limits generalization to the particular group of individuals observed.
• No conclusions are extended beyond this group.
• It provides valuable information about the nature of a particular group of individuals.
CLASSIFICATION OF VARIABLES
• QUALITATIVE: NOMINAL, ORDINAL
• QUANTITATIVE: DISCRETE, CONTINUOUS
LEVELS OF MEASUREMENT
• Introduced by S. S. Stevens (1946), who described four levels: nominal, ordinal, interval and ratio.
NOMINAL MEASUREMENT SCALE
• Represents the simplest level of data.
• Values fall into unordered categories.
• There is no quantitative relationship between the categories.
• Numbers, if used, serve only as convenient labels.
ORDINAL MEASUREMENT SCALE
• Values can be ordered or ranked.
• Although the categories are ordered, the order is not quantified.
• The number or label assigned indicates relative magnitude (rank), not an amount.
• Precise measurement of the differences between categories does not exist.
INTERVAL MEASUREMENT SCALE
• Observations can be ordered.
• Precise differences between units of measure exist.
• There is no meaningful absolute zero (e.g., temperature in degrees Celsius).
RATIO MEASUREMENT SCALE
• Possesses the same properties as the interval scale.
• In addition, a true zero exists.
• Highest level of measurement.
MEASURES OF CENTRAL TENDENCY
• Mean
• Median
• Mode
TYPES OF MEAN
• Sample mean
• Weighted mean
• Geometric mean
• Harmonic mean
• Mean of two or more means
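SAMPLE MEAN
Mean = Sum of observations / Number of observations
For an ungrouped series it is calculated by:
1. DIRECT METHOD: $\bar{X} = \dfrac{\sum X}{n}$
2. ASSUMED MEAN METHOD: $\bar{X} = A + \dfrac{\sum d}{n}$
Where, $X$ = each observation, $n$ = number of observations, $A$ = assumed mean and $d = X - A$ = deviation of each observation from the assumed mean.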
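The two methods can be sketched in Python as follows. This is a minimal illustration, not the presenter's worked example: the scores are borrowed from the tied-score series used later in this deck, and the assumed mean A = 75 is an arbitrary choice (any A gives the same answer, because the deviations d = X − A are added back).

```python
def direct_mean(values):
    """Direct method: sum of observations divided by their number."""
    return sum(values) / len(values)


def assumed_mean(values, a):
    """Assumed mean method: X-bar = A + (sum of d) / n, where d = X - A."""
    deviations = [x - a for x in values]
    return a + sum(deviations) / len(values)


scores = [70, 73, 74, 75, 75, 75, 75, 80]
print(direct_mean(scores))       # 74.625
print(assumed_mean(scores, 75))  # 74.625
```

The assumed mean method was designed for hand calculation, where small deviations are easier to add than large raw values; on a computer both give the same result.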
WEIGHTED MEAN
• Let $X_1, X_2, \ldots, X_n$ be n measurements whose relative importance is expressed by a corresponding set of numbers $w_1, w_2, \ldots, w_n$.
• Also called GRAND MEAN.
Calculation:
$$\bar{X}_w = \frac{w_1X_1 + w_2X_2 + \cdots + w_nX_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum_{i=1}^{n} w_iX_i}{\sum_{i=1}^{n} w_i}$$
• Grouped data with a range of values:
o By the mid-point method
o By the alternative method
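A minimal Python sketch of the weighted mean and of the mid-point method for grouped data, in which each class is represented by its mid-point and the class frequencies act as the weights. The function names and the class data are hypothetical; the "alternative method" mentioned on the slide is not reproduced because its details are not given.

```python
def weighted_mean(values, weights):
    """Sum of w_i * X_i divided by the sum of the weights w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


def grouped_mean_midpoint(classes):
    """Mid-point method: each (lower, upper, frequency) class is represented
    by its mid-point, and the class frequencies are used as weights."""
    midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
    freqs = [f for _, _, f in classes]
    return weighted_mean(midpoints, freqs)


# Hypothetical grouped data: classes 0-10, 10-20, 20-30 with frequencies 5, 10, 5
print(grouped_mean_midpoint([(0, 10, 5), (10, 20, 10), (20, 30, 5)]))  # 15.0
# Hypothetical marks 80 and 90 with weights 1 and 3
print(weighted_mean([80, 90], [1, 3]))  # 87.5
```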
GEOMETRIC MEAN
• The sample geometric mean of n non-negative observations, $X_1, X_2, \ldots, X_n$, is defined as the $n$th root of their product:
$$\bar{X}_G = \sqrt[n]{X_1 \cdot X_2 \cdots X_n} = (X_1 \cdot X_2 \cdots X_n)^{1/n}$$
• If there are any negative measurements in a data set, the geometric mean cannot be used.
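A minimal Python sketch of the geometric mean. It computes the nth root of the product through logarithms, exp(mean of log X), which avoids overflow when many values are multiplied; because the logarithm is undefined at zero, this sketch accepts strictly positive values only (a zero observation would simply make the geometric mean 0). The data are hypothetical.

```python
import math


def geometric_mean(values):
    """n-th root of the product, computed as exp(mean(log X)) to avoid overflow.
    Every observation must be strictly positive."""
    if not values:
        raise ValueError("need at least one observation")
    if any(x <= 0 for x in values):
        raise ValueError("this sketch requires strictly positive values")
    return math.exp(math.fsum(math.log(x) for x in values) / len(values))


# Hypothetical example: 1, 10 and 100 -> cube root of 1000 = 10
print(round(geometric_mean([1, 10, 100]), 6))  # 10.0
```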
HARMONIC MEAN
• The harmonic mean is defined as the reciprocal of the average of the reciprocals of the values of the items of a series:
$$\bar{X}_H = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
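A minimal Python sketch of the harmonic mean, n divided by the sum of the reciprocals. The speed example is hypothetical: when equal distances are covered at different speeds, the average speed is the harmonic mean of those speeds, not their arithmetic mean.

```python
def harmonic_mean(values):
    """Reciprocal of the average of the reciprocals: n / sum(1 / X_i)."""
    if not values:
        raise ValueError("need at least one observation")
    if any(x <= 0 for x in values):
        raise ValueError("this sketch is restricted to positive values")
    return len(values) / sum(1 / x for x in values)


# Hypothetical: equal distances driven at 40 km/h and at 60 km/h.
# Average speed is 48 km/h (harmonic mean), not 50 km/h (arithmetic mean).
print(round(harmonic_mean([40, 60]), 6))  # 48.0
```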
MEAN OF TWO OR MORE MEANS
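A common way to combine group means is to weight each mean by its group size: X̄c = (n1X̄1 + n2X̄2 + … + nkX̄k) / (n1 + n2 + … + nk). The sketch below assumes that formula; the group sizes and means are hypothetical. Note that simply averaging the two means (75) would be wrong here, because the groups differ in size.

```python
def combined_mean(group_sizes, group_means):
    """Mean of two or more means, weighting each group mean by its size."""
    if len(group_sizes) != len(group_means):
        raise ValueError("one size is needed for each mean")
    total_n = sum(group_sizes)
    return sum(n * m for n, m in zip(group_sizes, group_means)) / total_n


# Hypothetical: 10 students averaging 70 and 30 students averaging 80
print(combined_mean([10, 30], [70, 80]))  # 77.5
```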
MEDIAN
• The median is the value that divides the distribution of data points into two equal parts, that is, the value at which 50% of the data points lie above it and 50% lie below it.
• The median is the middle quartile, Q2 (quartiles divide the series into quarters), and the 50th percentile (percentiles divide the series into defined percentages).
Calculation:
Median for an ungrouped series:
a) In a series with an odd number of untied values, the values are arranged from lowest to highest, and the value that divides the series in half is the median.
b) In a series with an even number of untied values, the two values that divide the series in half are determined, and the arithmetic mean of these two values is the median.
c) Alternatively, the median can be read as the 50% value on a cumulative frequency curve.
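Rules (a) and (b) can be sketched in a few lines of Python; the series are hypothetical. The standard library's `statistics.median` applies the same odd/even rule and is shown as a cross-check.

```python
import statistics


def median_ungrouped(values):
    """Odd n: the middle value. Even n: the mean of the two middle values."""
    ordered = sorted(values)  # arrange from lowest to highest
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_ungrouped([7, 3, 9, 1, 5]))        # 5
print(median_ungrouped([7, 3, 9, 1, 5, 11]))    # 6.0
print(statistics.median([7, 3, 9, 1, 5, 11]))   # 6.0
```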
d) If the data include tied scores at the median point, interpolation within the tied scores is necessary.
• Consider the series 70, 73, 74, 75, 75, 75, 75, 80, in which the middle observations are tied.
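One common way to carry out this interpolation (an assumption, since the slide's own working is not shown) treats each integer score as occupying the interval from 0.5 below to 0.5 above it, so the four tied 75s span 74.5 to 75.5. With n = 8, three observations lie below 74.5 (cf = 3), the tied value has frequency f = 4 and the interval width is i = 1:

```latex
\begin{aligned}
\text{Median} &= L + \frac{n/2 - cf}{f} \times i \\
              &= 74.5 + \frac{8/2 - 3}{4} \times 1 \\
              &= 74.75
\end{aligned}
```

The simple even-series rule (the mean of the two middle values) would give 75; the interpolated 74.75 reflects that only one of the four tied 75s is needed to reach the halfway point.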
Median for grouped data:
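For grouped data, the median is commonly obtained by interpolating within the median class: Median = L + ((n/2 − cf) / f) × h, where L is the lower limit of the median class, n the total frequency, cf the cumulative frequency below that class, f its frequency and h its width. The sketch below assumes classes are given as hypothetical (lower, upper, frequency) tuples in ascending order. Treating each tied score as a class of width 1 (74.5 to 75.5 for the value 75) reproduces the interpolation for tied scores described above.

```python
def grouped_median(classes):
    """Median of grouped data by interpolation within the median class.

    classes: list of (lower, upper, frequency) tuples in ascending order.
    """
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("empty distribution")
    half = n / 2
    cumulative = 0  # cf: total frequency of the classes below the current one
    for lower, upper, f in classes:
        if f > 0 and cumulative + f >= half:
            return lower + (half - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("median class not found")


# Hypothetical grouped data: 0-10 (5), 10-20 (10), 20-30 (5)
print(grouped_median([(0, 10, 5), (10, 20, 10), (20, 30, 5)]))  # 15.0

# Tied scores 70, 73, 74, 75, 75, 75, 75, 80 treated as width-1 classes
ties = [(69.5, 70.5, 1), (72.5, 73.5, 1), (73.5, 74.5, 1), (74.5, 75.5, 4), (79.5, 80.5, 1)]
print(grouped_median(ties))  # 74.75
```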
MODE
• The mode of a data set is the value that occurs with the greatest frequency.
• When two non-adjacent scores share the same, highest frequency in the distribution, each may be referred to as a 'mode' and the distribution is 'bimodal'.
• In a truly bimodal distribution, the population contains two subgroups, each with its own distribution peaking at a different point.
• Calculation (an empirical approximation for moderately skewed distributions):
Mode = Mean − 3 (Mean − Median), or equivalently
Mode = 3 Median − 2 Mean
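A minimal Python sketch contrasting the mode found by counting with the empirical formula. Counting returns every value that shares the highest frequency, so a bimodal series yields two modes; the empirical formula is only a rough guide for moderately skewed, unimodal data. The bimodal series is hypothetical; the other series is the tied-score example from the median slides.

```python
import statistics
from collections import Counter


def modes(values):
    """All values sharing the greatest frequency (two or more => multimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


def empirical_mode(values):
    """Empirical approximation: Mode = 3 * Median - 2 * Mean."""
    return 3 * statistics.median(values) - 2 * statistics.mean(values)


scores = [70, 73, 74, 75, 75, 75, 75, 80]
print(modes(scores))              # [75]
print(modes([1, 1, 2, 5, 5, 6]))  # [1, 5] -> bimodal
print(empirical_mode(scores))     # 75.75
```

The gap between 75 (counted) and 75.75 (empirical) shows why the formula should be treated as an approximation.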
MEASURES OF DISPERSION
• Based on percentiles: percentile, range, inter-quartile range
• Based on the mean: mean deviation, standard deviation, variance, coefficient of variation
• To understand the data more completely, it is necessary to know how the members of the data set arrange themselves about the central or typical value.
• The following questions must be answered:
1. How spread out are the data points?
2. How stable are the values in the group?
RANGE
• The range is the difference between the highest and lowest values in a series.
Range = Maximum − Minimum
• For example, in the following series (in minutes):
8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 58
Range = 58 − 8 = 50 min
PERCENTILE
• A percentile is the value below which a given percentage of the observations lie when all of the observations are ranked in ascending order.
• The median is the 50th percentile.
• The 75th percentile is the point below which 75% of the observations lie, while the 25th percentile is the point below which 25% of the observations lie.
INTER-QUARTILE RANGE
• The range of a variable between the first quartile (Q1) and the third quartile (Q3) is called the inter-quartile range.
• Inter-quartile range = Q3 − Q1
• The median is the second quartile (Q2).
• Half of the inter-quartile range, (Q3 − Q1)/2, is called the semi-inter-quartile range or quartile deviation; it is a measure of dispersion around the median.
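The percentile-based measures can be sketched with Python's standard `statistics.quantiles` (Python 3.8+). The `method="inclusive"` choice interpolates linearly between ranked observations; other conventions can give slightly different quartiles for small samples, so treat the exact Q1 and Q3 below as one reasonable convention rather than the only answer. The data are the series from the range slide.

```python
import statistics


def dispersion_by_percentiles(values):
    """Range, quartiles, inter-quartile range and semi-inter-quartile range."""
    ordered = sorted(values)
    value_range = ordered[-1] - ordered[0]  # maximum - minimum
    # Q1, Q2, Q3 are the 25th, 50th and 75th percentiles
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "range": value_range,
        "Q1": q1,
        "median (Q2)": q2,
        "Q3": q3,
        "IQR": iqr,
        "semi-IQR": iqr / 2,
    }


series = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 58]
print(dispersion_by_percentiles(series))
# {'range': 50, 'Q1': 10.0, 'median (Q2)': 12.0, 'Q3': 14.5, 'IQR': 4.5, 'semi-IQR': 2.25}
```

The single extreme value (58) inflates the range to 50 but barely affects the IQR, which is why percentile-based measures are preferred for skewed data.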
MEAN DEVIATION
• Because the mean has several advantages, it might seem logical to measure dispersion by taking the "average deviation" from the mean. That proves to be useless, because the sum of the deviations from the mean is always 0.
• However, this inconvenience is easily solved by computing the mean deviation, which is the average of the absolute values of the deviations from the mean:
$$\text{Mean deviation} = \frac{\sum |X - \bar{X}|}{n}$$
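A minimal Python sketch, on hypothetical data with a mean of 5, showing both points made above: the raw deviations cancel to zero, while their absolute values give a usable average.

```python
def mean_deviation(values):
    """Average absolute deviation from the arithmetic mean: sum|X - mean| / n."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
print(sum(x - 5 for x in data))  # 0 -> raw deviations always cancel out
print(mean_deviation(data))      # 1.5
```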
VARIANCE
• The variance is the sum of the squared deviations from the mean divided by the number of values in the series minus 1.
• Variance is symbolized by $S^2$ or V:
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
where $\sum (X - \bar{X})^2$ is called the sum of squares.
• Dividing by n − 1 (called the degrees of freedom) instead of by n is necessary for the sample variance to be an unbiased estimator of the population variance.
• The numerator of the variance (i.e., the sum of the squared deviations of the observations from the mean) is an extremely important entity in statistics. It is usually called either the sum of squares (abbreviated SS) or the total sum of squares.
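The chain from sum of squares to sample variance to standard deviation can be sketched in Python as below, using hypothetical data. The last line cross-checks the result against `statistics.variance` and `statistics.stdev`, which also divide by n − 1 (whereas `statistics.pvariance` divides by n).

```python
import math
import statistics


def sum_of_squares(values):
    """SS: sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)


def sample_variance(values):
    """S^2 = SS / (n - 1); n - 1 is the degrees of freedom."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    return sum_of_squares(values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5
ss = sum_of_squares(data)
s2 = sample_variance(data)
sd = math.sqrt(s2)  # the standard deviation is the square root of the variance
print(ss)                  # 32.0
print(s2, ss / len(data))  # divided by n - 1 versus divided by n
print(round(sd, 4))        # 2.1381
print(math.isclose(s2, statistics.variance(data)), math.isclose(sd, statistics.stdev(data)))
```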
STANDARD DEVIATION
• The standard deviation is a measure of the variability among the individual values within a group.
• Loosely defined, it is a description of the average distance of individual observations from the group mean.
• From one point of view, however, the variance is similar to a mean: it is essentially the mean of the squared deviations, and the standard deviation (s) is its square root.
• Taking the mean and the standard deviation together, a sample can be described in terms of its average score and its average variation.
• If more samples were taken from the same population, it would be possible to predict with some accuracy the average score of these samples and also the amount of variation.
• The mathematical derivation of the standard deviation is presented in some detail because the intermediate steps in its calculation (1) create a theme (called the "sum of squares") that is repeated over and over in statistical arithmetic and (2) create the quantity known as the sample variance.
• The standard deviation is reported along with the sample mean, usually in the format mean ± SD.
• This format serves as a pertinent reminder that the SD measures the variability of values surrounding the middle of the data set.
• It also leads to the following practical rules of thumb for the mean (X̄) and standard deviation of approximately normally distributed data:
X̄ ± 1 SD encompasses approximately 68% of the values in a group.
X̄ ± 2 SD encompasses approximately 95% of the values in a group.
X̄ ± 3 SD encompasses approximately 99.7% of the values in a group.
• These rules of thumb are useful when deciding whether to report the mean ± SD or the median and range as the appropriate descriptive statistics for a group of data points.
• If roughly 95% of the values in a group are contained in the interval X̄ ± 2 SD, researchers tend to use mean ± SD; otherwise, the median and the range are perhaps more appropriate.
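The decision rule above can be sketched in Python. The 0.95 cut-off is applied literally here, which is an assumption; in practice "roughly 95%" is a judgement call. The compact data are hypothetical, and the skewed data are the series from the range slide.

```python
import statistics


def share_within(values, k):
    """Fraction of observations inside mean +/- k * SD (sample SD)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    low, high = mean - k * sd, mean + k * sd
    return sum(low <= x <= high for x in values) / len(values)


def suggested_summary(values, threshold=0.95):
    """Report mean +/- SD if about 95% of values fall within mean +/- 2 SD,
    otherwise median and range (threshold taken literally in this sketch)."""
    if share_within(values, 2) >= threshold:
        return "mean ± SD"
    return "median and range"


compact = [2, 4, 4, 4, 5, 5, 7, 9]                    # hypothetical data
skewed = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 58]   # one extreme value
print(suggested_summary(compact))  # mean ± SD
print(suggested_summary(skewed))   # median and range
```

In the skewed series, 58 lies beyond X̄ + 2 SD, so only 10 of 11 values (about 91%) fall in the interval.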
Applications and characteristics
1. The standard deviation is extremely important in sampling theory, in correlational analysis, in estimating the reliability of measures, and in determining the relative position of an individual within a distribution of scores and between distributions of scores.
2. The standard deviation is the most widely used estimate of variation because of its known algebraic properties and its amenability to use with other statistics.
3. It also provides a better estimate of variation in the population than the other indexes.
4. When the standard deviation of a sample is small, the sample mean is close to each individual value.
5. When the standard deviation of a random sample is small, the sample mean is likely to be close to the mean of all the data in the population.
6. As the sample size increases, the standard deviation becomes a more stable estimate of the population SD; it is the standard error of the mean (SD/√n), not the SD itself, that decreases.
COEFFICIENT OF VARIATION
• The coefficient of variation is the ratio of the standard deviation of a series to the arithmetic mean of the series.
• The coefficient of variation is unitless and is expressed as a percentage.
Application and characteristics
The coefficient of variation is used to compare the relative variation, or spread, of the distributions of different series, samples, or populations, or of the distributions of different characteristics of a single series.
Calculation:
• The coefficient of variation (CV) is calculated as CV (%) = (SD / X̄) × 100
• For example, in a typical medical school, the mean weight of 100 fourth-year medical students is 140 lb, with a standard deviation of 28 lb.
CV (%) = (28 / 140) × 100 = 20%
The coefficient of variation for weight is 28 lb divided by 140 lb, or 20%.
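A minimal Python sketch of the CV calculation, reproducing the weight example and adding a hypothetical height example for the same students to show how the unitless CV compares characteristics measured in different units.

```python
def coefficient_of_variation(sd, mean):
    """CV (%) = SD / mean x 100; unitless, so different units can be compared."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * sd / mean


print(coefficient_of_variation(28, 140))   # 20.0 (weight: SD 28 lb, mean 140 lb)
print(coefficient_of_variation(8.5, 170))  # 5.0  (hypothetical height: SD 8.5 cm, mean 170 cm)
```

Under these hypothetical figures, weight varies relatively four times as much as height, even though pounds and centimetres cannot be compared directly.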
NORMAL DISTRIBUTION
• The normal distribution, also called the Gaussian distribution, is a continuous, symmetric, bell-shaped distribution that can be defined by a number of measures.
• The majority of measurements of continuous data in medicine and biology tend to approximate this theoretical distribution, which is named after Carl Friedrich Gauss, the person who best described it.
• The normal distribution is one of the most frequently used distributions in biomedical and dental research.
• The normal distribution is a population frequency distribution.
• It is characterized by a bell-shaped curve that is unimodal and symmetric around the mean of the distribution.
• The normal curve depends on two parameters: the population mean and the population standard deviation.
• To discuss the area under the normal curve in terms of easily seen percentages of the population distribution, the normal distribution has been standardized to the standard normal distribution, in which the population mean is 0 and the population standard deviation is 1.
• The area under the normal curve can be segmented by starting with the mean at the center (on the x axis) and moving in increments of 1 SD above and below the mean.
The figure shows a standard normal distribution (mean = 0; SD = 1) and the percentage of the area under the curve at each increment of SD.
• The total area beneath the normal curve is 1, or 100% of the observations in the population represented by the curve.
• As indicated in the figure, the portion of the area under the curve between the mean and 1 SD above it is 34.13% of the total area.
• The same area is found between the mean and 1 SD below the mean.
• Moving from 1 SD to 2 SD above the mean cuts off an additional 13.59% of the area, and moving from 2 SD to 3 SD above the mean cuts off another 2.14%.
• The theory of the standard normal distribution therefore leads to the following property of a normally distributed variable:
Approximately 68.26% of the observations lie within 1 SD of the mean.
Approximately 95.45% of the observations lie within 2 SD of the mean.
Approximately 99.73% of the observations lie within 3 SD of the mean.
• Virtually all of the observations are contained within 3 SD of the mean. This is the justification used by those who label values outside the interval X̄ ± 3 SD as "outliers" or unlikely values.
• Incidentally, the number of standard deviations by which a value lies away from the mean is called its Z score: $Z = (X - \bar{X}) / SD$.
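The areas quoted above can be recomputed from the standard normal curve: the share within ±k SD equals erf(k / √2), available as `math.erf` in Python's standard library. This sketch also computes a Z score for a hypothetical 196 lb student in a group with mean 140 lb and SD 28 lb. Note that the more precise first figure is 68.27%; 68.26% is 2 × 34.13%.

```python
import math


def z_score(x, mean, sd):
    """Number of SDs by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def share_within_k_sd(k):
    """Area of the standard normal curve between -k SD and +k SD."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within ±{k} SD: {100 * share_within_k_sd(k):.2f}%")
# within ±1 SD: 68.27%
# within ±2 SD: 95.45%
# within ±3 SD: 99.73%

# Extra area added on one side between 2 SD and 3 SD
print(f"{100 * (share_within_k_sd(3) - share_within_k_sd(2)) / 2:.2f}%")  # 2.14%

print(z_score(196, 140, 28))  # 2.0 (hypothetical student)
```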
MEASURES OF ASYMMETRY
• Skewness
• Kurtosis
SKEWNESS
A horizontal stretching of a frequency distribution to one side or the other, so that one tail of observations is longer and has more observations than the other tail, is called skewness.
• If a distribution is skewed, the mean moves farther in the direction of the
long tail than does the median, because the mean is more heavily
influenced by extreme values.
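A minimal Python sketch of one common skewness measure, the moment coefficient g1 = m3 / m2^(3/2) computed from population moments. Statistical packages often apply a small-sample adjustment, so their values can differ slightly. The skewed series is the one from the range slide; the symmetric series is hypothetical.

```python
import statistics


def moment_skewness(values):
    """g1 = m3 / m2**1.5. g1 > 0: long right tail; g1 < 0: long left tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


series = [8, 8, 10, 10, 10, 12, 13, 14, 15, 16, 58]  # one extreme high value
print(statistics.mean(series) > statistics.median(series))  # True: mean pulled toward the tail
print(moment_skewness(series) > 0)                          # True: positive (right) skew
print(moment_skewness([1, 2, 3, 4, 5]))                     # 0.0: symmetric
```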
KURTOSIS
• Kurtosis is characterized by a vertical stretching of the frequency distribution.
• It is a measure of the peakedness of a probability distribution.
• As shown in the figure, a kurtotic distribution can look more peaked or more flattened than the bell-shaped normal distribution.
• A normal distribution has a kurtosis of 3, that is, an excess kurtosis (kurtosis − 3) of zero.
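• Any distribution with a kurtosis of 3 (like the normal distribution) is called mesokurtic.
• In a leptokurtic distribution (kurtosis > 3), the central peak is higher and sharper, and the tails are longer and fatter.
• In a platykurtic distribution (kurtosis < 3), the central peak is lower and broader, and the tails are shorter and thinner.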
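A minimal Python sketch of the moment coefficient of kurtosis, b2 = m4 / m2², with the labels used above. Some software reports excess kurtosis (b2 − 3) instead; for example, SciPy's `scipy.stats.kurtosis` does so by default, which is why both "0" and "3" are quoted for the normal curve. Both data sets are hypothetical.

```python
def moment_kurtosis(values):
    """b2 = m4 / m2**2 (population moments); a normal curve has b2 = 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2


def kurtosis_type(values):
    """Label the shape relative to the normal curve's b2 = 3."""
    b2 = moment_kurtosis(values)
    if b2 > 3:
        return "leptokurtic"
    if b2 < 3:
        return "platykurtic"
    return "mesokurtic"


flat = [1, 2, 3, 4, 5]                      # hypothetical, evenly spread
peaked = [0, 0, 0, 0, 0, 0, 0, 0, 10, -10]  # hypothetical, sharp peak and long tails
print(moment_kurtosis(flat), kurtosis_type(flat))      # 1.7 platykurtic
print(moment_kurtosis(peaked), kurtosis_type(peaked))  # 5.0 leptokurtic
```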
MEASURES OF RELATIONSHIP
Correlation:
• This is used to assess the relationship between two continuous variables within a group of subjects.
• It quantifies the association between two continuous variables, but it does not prove that one particular variable alone causes the change in the other.
Correlation coefficient:
• This is a measure of the degree of straight-line (linear) association between two continuous variables.
• It is denoted by 'r', which may vary from −1 to +1.
• Its value can be interpreted in five ways:
r = +1 [ perfect positive correlation ]
r = −1 [ perfect negative correlation ]
r = 0 [ no linear correlation ]
0 < r < +1 [ partial positive correlation ]
−1 < r < 0 [ partial negative correlation ]
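A minimal Python sketch of the Pearson correlation coefficient, r = Σ(dx·dy) / √(Σdx² · Σdy²), where dx and dy are deviations from each variable's mean. The paired data are hypothetical. Python 3.10 and later also provide `statistics.correlation`, which should agree with this function.

```python
import math


def pearson_r(x, y):
    """Pearson r: sum(dx*dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired series of equal length (n >= 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable does not vary")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))            # 1.0  (perfect positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))            # -1.0 (perfect negative)
print(round(pearson_r(x, [2, 1, 4, 3, 5]), 2))   # 0.8  (partial positive)
```

As the slide stresses, even r = 1.0 describes a straight-line association only; it says nothing about which variable, if either, causes the other.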
Types of correlation
CONCLUSION
• In conclusion, the best research studies are initiated with a statistical plan already in place.
• This plan may or may not have been developed with the assistance of a statistician.
• The first step of data analysis is usually to describe the sample and then the subgroups within the sample. Frequency distributions, the mean, median, mode, range and standard deviation are the statistics most commonly used for this task.
• This information also serves as background for the discussion of inferential statistics.
REFERENCES:
• Sanjeev B. Sarmukaddam. Fundamentals of Biostatistics. 1st ed. New Delhi: Jitendra P.; 2006.
• John W. Best and James V. Kahn. Research in Education. 9th ed. New Delhi: Asoke K. Ghosh; 2006.
• Jay S. Kim and Ronald J. Dailey. Biostatistics for Oral Health Care. 1st ed. New Delhi: Blackwell; 2008.
• C. R. Kothari. Research Methodology. 2nd ed. New Delhi: New Age International Limited; 2004.
• Ronald N. Forthofer. Introduction to Biostatistics. London: Academic Press; 1995.
• Bratati Banerjee. Mahajan's Methods in Biostatistics. 9th ed. New Delhi: Jaypee Brothers; 2018.
• F. Gao Smith and J. E. Smith. Clinical Research. 2nd ed. UK: BIOS Scientific Publishers Limited; 2005.
• James F. Jekel. Epidemiology, Biostatistics and Preventive Medicine. 3rd ed. Saunders, Elsevier Publications; 2007.
• Cheryl Bagley Thompson. Descriptive data analysis. Air Medical Journal. 2009; 28(2): 56–59.

More Related Content

PPTX
Descriptive statistics
PPTX
Measures of relationship
PPTX
Descriptive Statistics
PPTX
PPTX
A.1 properties of point estimators
PPTX
Anova (f test) and mean differentiation
PPTX
SAMPLING AND ESTIMATION PPT.pptx
PPTX
Descriptive statistics
Descriptive statistics
Measures of relationship
Descriptive Statistics
A.1 properties of point estimators
Anova (f test) and mean differentiation
SAMPLING AND ESTIMATION PPT.pptx
Descriptive statistics

What's hot (20)

PPT
Descriptive Statistics
PPTX
Statistical inference
PPTX
quartiles,deciles,percentiles.ppt
PPT
Measure of central tendency
PPTX
Population vs sample
PPTX
Multiple linear regression
PPTX
Lecture 6. univariate and bivariate analysis
PPT
PPTX
Measures of central tendency and dispersion
PPTX
Correlation analysis
PPT
PPTX
Levels of measurement
PPTX
Basics of Statistical Analysis
PPTX
Measures of central tendency ppt
PPTX
Frequency Distributions
PPT
Generating the research hypothesis
PPTX
Correlation Analysis
PPTX
Multivariate data analysis
PPTX
Regression Analysis
PPT
Measures of dispersion
Descriptive Statistics
Statistical inference
quartiles,deciles,percentiles.ppt
Measure of central tendency
Population vs sample
Multiple linear regression
Lecture 6. univariate and bivariate analysis
Measures of central tendency and dispersion
Correlation analysis
Levels of measurement
Basics of Statistical Analysis
Measures of central tendency ppt
Frequency Distributions
Generating the research hypothesis
Correlation Analysis
Multivariate data analysis
Regression Analysis
Measures of dispersion
Ad

Similar to descriptive data analysis (20)

PPT
Ch2 Data Description
PPT
Business statistics
PPT
UNIT III -Central Tendency.ppt
PPTX
chapter3 Central Tendency statistics.ppt
PPT
chapter3.ppt
PPTX
Measure OF Central Tendency
PPTX
Statistics for Medical students
PPTX
Unit 3_1.pptx
PPTX
Statr sessions 4 to 6
PDF
SUMMARY MEASURES.pdf
PPTX
3. Statistical Analysis.pptx
PDF
Upload 140103034715-phpapp01 (1)
PPTX
computation of measures of central tendency
PPTX
Measures of central tendency
PPTX
Biostatistics mean median mode unit 1.pptx
PPTX
Descriptive Statistics.pptx
PPT
2. Descriptive Numerical Summary Measures-2023(2).ppt
PPTX
Measures of central tendancy
PPTX
Descriptive statistics
PPTX
2. chapter ii(analyz)
Ch2 Data Description
Business statistics
UNIT III -Central Tendency.ppt
chapter3 Central Tendency statistics.ppt
chapter3.ppt
Measure OF Central Tendency
Statistics for Medical students
Unit 3_1.pptx
Statr sessions 4 to 6
SUMMARY MEASURES.pdf
3. Statistical Analysis.pptx
Upload 140103034715-phpapp01 (1)
computation of measures of central tendency
Measures of central tendency
Biostatistics mean median mode unit 1.pptx
Descriptive Statistics.pptx
2. Descriptive Numerical Summary Measures-2023(2).ppt
Measures of central tendancy
Descriptive statistics
2. chapter ii(analyz)
Ad

Recently uploaded (20)

PPTX
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
PPTX
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
PPTX
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
PDF
Fluorescence-microscope_Botany_detailed content
PDF
Introduction to Business Data Analytics.
PPTX
IB Computer Science - Internal Assessment.pptx
PPT
Chapter 3 METAL JOINING.pptnnnnnnnnnnnnn
PPTX
Major-Components-ofNKJNNKNKNKNKronment.pptx
PPT
Quality review (1)_presentation of this 21
PPTX
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
PDF
Clinical guidelines as a resource for EBP(1).pdf
PPTX
Introduction-to-Cloud-ComputingFinal.pptx
PPTX
Supervised vs unsupervised machine learning algorithms
PPTX
oil_refinery_comprehensive_20250804084928 (1).pptx
PPTX
Computer network topology notes for revision
PPTX
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
PPTX
Introduction to Knowledge Engineering Part 1
PPTX
Data_Analytics_and_PowerBI_Presentation.pptx
PPT
Chapter 2 METAL FORMINGhhhhhhhjjjjmmmmmmmmm
PDF
Mega Projects Data Mega Projects Data
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
iec ppt-1 pptx icmr ppt on rehabilitation.pptx
Fluorescence-microscope_Botany_detailed content
Introduction to Business Data Analytics.
IB Computer Science - Internal Assessment.pptx
Chapter 3 METAL JOINING.pptnnnnnnnnnnnnn
Major-Components-ofNKJNNKNKNKNKronment.pptx
Quality review (1)_presentation of this 21
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
Clinical guidelines as a resource for EBP(1).pdf
Introduction-to-Cloud-ComputingFinal.pptx
Supervised vs unsupervised machine learning algorithms
oil_refinery_comprehensive_20250804084928 (1).pptx
Computer network topology notes for revision
Introduction to Firewall Analytics - Interfirewall and Transfirewall.pptx
Introduction to Knowledge Engineering Part 1
Data_Analytics_and_PowerBI_Presentation.pptx
Chapter 2 METAL FORMINGhhhhhhhjjjjmmmmmmmmm
Mega Projects Data Mega Projects Data

descriptive data analysis

  • 2. DESCRIPTIVE DATA Presented by Dr. P. Gnana Sarita Kumari I MDS Department of Public Health Dentistry 2
  • 3. CONTENTS : INTRODUCTION  TYPES OF VARIABLES AND LEVELS OF MEASUREMENT  MEASURES OF CENTRAL TENDENCY  MEASURES OF DISPERSION  NORMAL DISTRIBUTION  MEASURES OF ASYMMETRY  MEASURES OF RELATIONSHIP  CONCLUSION  REFERENCES 3
  • 4. DESCRIPTIVE ANALYSIS : • The data describe one group and that group only. • Descriptive data analysis limits generalization to the particular group of individuals observed. • No conclusions are extended beyond this group. • It provides valuable information about the nature of a particular group of individuals. INTRODUCTION 4
  • 5. CLASSIFICATION OF VARIABLES QUALITATIVE QUANTITATIVE NOMINAL ORDINAL DISCRETE CONTINUOUS 5
  • 6. LEVELS OF MEASUREMENT • Introduced by STEVENS 6
  • 7. NOMINAL MEASUREMENT SCALE Nomina scale Represents Simplest of data Values in unordered categories No quantitative relationship Numbers are used for the sake of convenience 7
  • 8. ORDINAL MEASUREMENT SCALE Ordina scale Can be ordered or ranked Though ordered is not quantified Number or label assigned does indicate magnitude Precise measurement of differences does not exist 8
  • 9. INTERVAL MEASUREMENT SCALE Interva scale Observations can be ordered Precise differences between units of measure exist No meaningful absolute zero 9
  • 10. RATIO MEASUREMENT SCALE Possess same properties as that of interval scale • Highest level of measurement In this a true zero exist 10
  • 11. MEASURES OF CENTRAL TENDENCY • Mean • Median • Mode 11
  • 12. TYPES OF MEAN • Sample mean • Weighted mean • Geometric mean • Harmonic mean • Mean of two or more means 12
  • 13. SAMPLE MEAN Mean = Total or Sum of observations Number of observations For ungrouped series it is Calculated by : 1. DIRECT METHOD 2. ASSUMED MEAN METHOD Where, 13
  • 14. WEIGHTED MEAN • Grouped data with a range of values :  Also called GRAND MEAN Calculation : 𝑋 𝑤= 𝑤1 𝑋1 + 𝑤2 𝑋2 + …….. + 𝑤 𝑛 𝑋 𝑛 = 𝑖=1 𝑛 𝑤𝑖𝑋𝑖 o By middle point method o By alternative method Let 𝑋1, 𝑋2,….., 𝑋 𝑛 be n measurements, and their relative importance be expressed by a corresponding set of numbers 𝑤1, 𝑤2,…..., 𝑤 𝑛 14
  • 15. GEOMETRIC MEAN • The sample geometric mean of n non-negative observations, 𝑋1, 𝑋2,….., 𝑋 𝑛, in a sample is defined by 𝒏 𝒕𝒉 root of the product. 𝑋 𝐺 = 𝑛 𝑋1. 𝑋2….. 𝑋 𝑛 = [𝑋1, 𝑋2,….., 𝑋 𝑛]1/𝑛 • If there are any negative measurements in a data set, the geometric mean cannot be used. 15
  • 16. HARMONIC MEAN • Harmonic mean is defined as the reciprocal of the average of reciprocals of the values of items of a series. • Harmonic mean 16
  • 17. MEAN OF TWO OR MORE MEANS 17
  • 18. MEDIAN • The median is the value that divides the distribution of data points into two equal parts, that is, the value at which 50% of the data points lie above it and 50% lie below it. • The median is the middle of the quartiles (the values that divide the series into quarters) and the middle of the percentiles (the values that divide the series into defined percentages). 18
  • 19. Calculation : Median for ungrouped series : a) In a series with an odd number of untied values, the values in the series are arranged from lowest to highest, and the value that divides the series in half is the median. b) In a series with even number of untied values, the two values that divide the series in half are determined, and the arithmetic mean of these values is the median. c) An alternative method for calculating the median is to determine the 50% value on a cumulative frequency curve. 19
  • 20. d] If the data include tied scores at the median point, interpolation within the tied scores is necessary. • Lets consider a series of 70, 73, 74, 75, 75, 75, 75, 80 in which the mid point observations were tied. 20
  • 21. Median for grouped data : 21
  • 22. MODE • The mode of a data set is that value that occurs with the greatest frequency. • Whenever there are two non-adjacent scores with the same frequency and they are the highest in the distribution, each score may be referred to as the ‘mode’ and the distribution is ‘bimodal’. • In truly bimodal distribution, the population contains two sub-groups, each of which has a different distribution that peaks at a different point. • Calculation : Mode = Mean – 3 [ Mean – Median ] or = 3 Median – 2 Mean 22
  • 23. MEASURES OF DISPERSION • Percentile • Range • Inter-quartile range • Mean deviation • Standard deviation • Variance • Coefficient of variation • To understand the data more completely, it is necessary to know how the members of the data set arrange themselves about the central or typical value. • The following questions must be answered: 1. How spread out are the data points? 2. How stable are the values in the group? Based on percentiles Based on mean 23
  • 24. RANGE • The range is the difference between the highest and lowest values in a series. Range = Maximum – Minimum. • For example in the following series : 8, 8,10,10,10,12,13,14,15,16,58 Range = 18-8 = 10 min 24
  • 25. PERCENTILE • These are the percentage of observations below the point indicated when all of the observations are ranked in ascending order. • The median is the 50th percentile. • The 75th percentile is the point below which 75% of the observations lie, while the 25th percentile is the point below which 25% of the observations lie. 25
  • 26. INTER-QUARTILE RANGE • The range of a variable between first quartile and the third quartile is called inter-quartile range. • Interquartile range = Q3 – Q1 • Median is the second quartile. • Half of the median is called semi – interquartile range or sometimes quartile deviation which is a measure of dispersion around the mean. 26
  • 27. MEAN DEVIATION • Because the mean has several advantages, it might seem logical to measure dispersion by taking the “average deviation” from the mean. That proves to be useless, because the sum of the deviations from the mean is 0. • However, this inconvenience can easily be solved by computing the mean deviation, which is the average of the absolute value of the deviations from the mean, as shown in the following formula: Mean deviation = |𝑋 − 𝑋| n 27
  • 28. VARIANCE • The variance is the sum of the squared deviations from the mean divided by the number of values in the series minus 1. • Variance is symbolized by 𝑆2 or V. 𝑆2 = Σ(X − X)2 /n where Σ(X − X)2 is called sum of squares. • Dividing by N-1 (called degrees of freedom), instead of dividing by N, is necessary for the sample variance to be an unbiased estimator of the population variance. • The numerator of the variance (i.e., the sum of the squared deviations of the observations from the mean) is an extremely important entity in statistics. It is usually called either the sum of squares (abbreviated SS) or the total sum of squares. 28
  • 29. STANDARD DEVIATION • The standard deviation is a measure of the variability among the individual values within a group. • Loosely defined, it is a description of the average distance of individual observations from the group mean. • From one point of view, however, the s is similar to the mean; that is; it represents the mean of the squared deviations. 29
  • 30. • Taking the mean and the standard deviation together, a sample can be described in terms of its average score and in terms of its average variation. • If more samples were taken from the same population it would be possible to predict with some accuracy the average score of these samples and also the amount of variation. • The mathematical derivation of the standard deviation is presented here in some detail because the intermediate steps in its calculation. • (1) create a theme (called “sum of squares”) that is repeated over and over in statistical arithmetic and (2) create the quantity known as the sample variance. 30
  • 31. • The standard deviation is reported along with the sample mean, usually in the following format: mean ± SD. • This format serves as a pertinent reminder that the SD measures the variability of values surrounding the middle of the data set. • It also leads us to the practical application of the concepts of mean and standard deviation shown in the following rules of thumb: X ± 1 SD encompasses approximately 68% of the values in a group. X ± 2 SD encompasses approximately 95% of the values in a group. X ± 3 SD encompasses approximately 99% of the values in a group. 31
  • 32. • These rules of thumb are useful when deciding whether to report the mean ± SD or the median and range as the appropriate descriptive statistics for a group of data points. • If roughly 95% of the values in a group are contained in the interval ‘X’ ± 2SD, researchers tend to use mean ± SD. Otherwise the median and the range are perhaps more appropriate. 32
  • 33. Applications and characteristics 1. The standard deviation is extremely important in sampling theory, in co relational analysis, in estimating reliability of measures, and in determining relative position of an individual within a distribution of scores and between distributions of scores. 2. The standard deviation is the most widely used estimate of variation because of its known algebraic properties and its amenability to use with other statistics. 3. It also provides a better estimate of variation in the population than the other indexes. 33
  • 34. 4. When the standard deviation of any sample is small, the sample mean is close to any individual value. 5. When standard deviation of a random sample is small, the sample mean is likely to be close to the mean of all the data in the population. 6. The standard deviation decreases when the sample size increases. 34
  • 35. COEFFICIENT OF VARIATION • The coefficient of variation is the ratio of the standard deviation of a series to the arithmetic mean of the series. • The coefficient of variation is unit less and is expressed as a percentage. Application and characteristics The co efficient of variation is used to compare the relative variation, or spread, of the distributions of different series, samples, or populations or of the distributions of different characteristics of a single series. 35
  • 36. Calculation: • The coefficient of variation (CV) is calculated as CV (%) = SD / X х100 • For example, In a typical medical school, the mean weight of 100 fourth-year medical students is 140 lb, with a standard deviation of 28 lb. CV (%) = 28 / 140 х 100 = 20% The coefficient of variation for weight is 28 lb divided by 140 lb, or 20%. 36
  • 37. NORMAL DISTRIBUTION • Normal distribution, also called Gaussian distribution, is a continuous, symmetric, bell shaped distribution and can be defined by a number of measures. • The majority of measurements of continuous data in medicine and biology tend to approximate the theoretical distribution that is known as the normal distribution and is also called the Gaussian distribution (named after Johann Karl Gauss, the person who best described it). 37
  • 38. • The normal distribution is one of the most frequently used distributions in biomedical and dental research. • The normal distribution is a population frequency distribution. • It is characterized by a bell-shaped curve that is unimodal and is symmetric around the mean of the distribution. • The normal curve depends on two parameters: the population mean and the population standard deviation. • In order to discuss the area under the normal curve in terms of easily seen percentages of the population distribution, the normal distribution has been standardized to the normal distribution in which the population mean is 0 and the population standard deviation is 1. • The area under the normal curve can be segmented starting with the mean in the center (on the x axis) and moving by increments of 1 SD above and below the mean. 38
  • 39. Figure: a standard normal distribution (mean = 0; SD = 1) and the percentage of area under the curve in each 1-SD increment.
  • 40. • The total area beneath the normal curve is 1, i.e., 100% of the observations in the population represented by the curve. • As indicated in the figure, the area under the curve between the mean and 1 SD above it is 34.13% of the total area. • By symmetry, the same area lies between the mean and 1 SD below it. • Moving out to 2 SD above the mean cuts off an additional 13.59% of the area, and moving to 3 SD above the mean cuts off a further 2.14%.
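The band areas quoted above can be checked numerically. This is a minimal sketch, assuming only the standard library: the standard normal cumulative distribution is written in terms of `math.erf`, and the helper names are illustrative.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """P(Z <= z) for the standard normal distribution (mean 0, SD 1)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def area_between(a: float, b: float) -> float:
    """Area under the standard normal curve between z = a and z = b."""
    return standard_normal_cdf(b) - standard_normal_cdf(a)


# Area in each 1-SD band above the mean; by symmetry the same areas lie below it
for k in range(3):
    print(f"{k} to {k + 1} SD: {area_between(k, k + 1) * 100:.2f}%")
# 0 to 1 SD: 34.13%
# 1 to 2 SD: 13.59%
# 2 to 3 SD: 2.14%
```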
  • 41. • The theory of the standard normal distribution therefore leads to the following property of a normally distributed variable: approximately 68.27% of the observations lie within 1 SD of the mean; approximately 95.45% lie within 2 SD of the mean; approximately 99.73% lie within 3 SD of the mean. • Virtually all of the observations are contained within 3 SD of the mean. This is the justification used by those who label values outside the interval X̄ ± 3 SD as "outliers" or unlikely values. • The number of standard deviations an observation lies away from the mean is called its Z score: Z = (X − X̄) / SD.
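A minimal sketch of the properties above, assuming only the standard library. `proportion_within` computes the share of a normal population within k SD of the mean; `z_scores` and `flag_outliers` are hypothetical helpers that apply the Z score and the X̄ ± 3 SD convention to a sample, using the sample mean and sample SD.

```python
import math
import statistics


def proportion_within(k: float) -> float:
    """P(|Z| <= k): share of a normal population within k SD of the mean."""
    return math.erf(k / math.sqrt(2))


def z_scores(values):
    """Number of sample SDs each value lies from the sample mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("Z scores are undefined when all values are equal")
    return [(x - mean) / sd for x in values]


def flag_outliers(values, k=3):
    """Values lying outside mean +/- k SD (the '3 SD' outlier convention)."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > k]


for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k) * 100:.2f}%")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```

In a small sample the largest possible |Z| is limited by the sample size (below 3 when n is 10), so the 3 SD rule can only flag values in reasonably large samples.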
  • 42. MEASURES OF ASYMMETRY • Skewness • Kurtosis
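  • 43. SKEWNESS • Skewness is a horizontal stretching of a frequency distribution to one side or the other, so that one tail of observations is longer than the other tail. • When the long tail extends to the right (towards higher values), the skew is positive; when it extends to the left, the skew is negative.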
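  • 44. • If a distribution is skewed, the mean moves farther in the direction of the long tail than the median does, because the mean is more heavily influenced by extreme values. In a positively skewed distribution the mean therefore usually exceeds the median, and in a negatively skewed distribution it usually falls below it.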
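A minimal sketch of the mean-versus-median behaviour described above. The data are hypothetical, and the skewness measure used here is the moment coefficient m3 / m2^1.5 (one common definition, chosen for illustration; the slides do not specify a formula).

```python
import statistics


def skewness(values):
    """Moment coefficient of skewness m3 / m2**1.5 (population moments).
    Positive -> long right tail; negative -> long left tail; 0 -> symmetric."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed data: one large value forms a long right tail
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 15]
print(statistics.mean(data), statistics.median(data))  # 5.1 4.0  (mean > median)
print(skewness(data) > 0)  # True
```

The single value of 15 pulls the mean up to 5.1 while the median stays at 4.0, matching the statement that the mean moves farther towards the long tail.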
  • 45. KURTOSIS • Kurtosis is characterized by a vertical stretching of the frequency distribution. • It is a measure of the peakedness of a probability distribution. • As shown in the figure, a kurtotic distribution can look more peaked or more flattened than the bell-shaped normal distribution. • A normal distribution has a kurtosis of 3, i.e., an excess kurtosis (kurtosis − 3) of zero.
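  • 46. • A distribution with a kurtosis of 3 (excess kurtosis of 0), such as the normal distribution, is called mesokurtic. • In a leptokurtic distribution (kurtosis > 3), the central peak is higher and sharper, and the tails are longer and fatter. • In a platykurtic distribution (kurtosis < 3), the central peak is lower and broader, and the tails are shorter and thinner.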
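A minimal sketch of the kurtosis classification above, assuming the moment coefficient m4 / m2^2 (which equals 3 for a normal distribution). The tolerance used to call a sample "mesokurtic" is a hypothetical cut-off chosen for illustration, not a standard value, and the example data are made up.

```python
def kurtosis(values):
    """Moment coefficient of kurtosis m4 / m2**2 (population moments; normal = 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2


def excess_kurtosis(values):
    """Kurtosis minus 3, so that a normal distribution scores 0."""
    return kurtosis(values) - 3


def classify_kurtosis(values, tol=0.1):
    """Label a sample; tol is a hypothetical cut-off around excess kurtosis 0."""
    ex = excess_kurtosis(values)
    if ex > tol:
        return "leptokurtic"
    if ex < -tol:
        return "platykurtic"
    return "mesokurtic"


flat = [1, 2, 3, 4, 5]          # evenly spread, no tails: kurtosis 1.7
heavy = [0] * 8 + [-10, 10]     # sharp centre, extreme tails: kurtosis 5.0
print(classify_kurtosis(flat), classify_kurtosis(heavy))  # platykurtic leptokurtic
```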
  • 47. MEASURES OF RELATIONSHIP • Correlation: • Correlation is used to assess the relationship between two continuous variables within a group of subjects. • It quantifies any association between two continuous variables, but it does not prove that one particular variable alone causes the change in the other.
  • 48. Correlation coefficient: • This is a measure of the degree of straight-line (linear) association between two continuous variables. • It is denoted by r and may vary from -1 to +1. • Five cases can be distinguished: r = +1 [perfect positive correlation]; r = -1 [perfect negative correlation]; r = 0 [no linear correlation]; 0 < r < +1 [partial positive correlation]; -1 < r < 0 [partial negative correlation].
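The slides do not give a formula for r; this minimal sketch assumes the Pearson product-moment correlation coefficient, r = Sxy / sqrt(Sxx × Syy), which is the usual measure of straight-line association. `describe_r` is a hypothetical helper mapping r onto the five cases listed above, with a small tolerance for floating-point rounding. (Python 3.10+ also provides `statistics.correlation`, which computes the same Pearson r.)

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has zero variance")
    return sxy / math.sqrt(sxx * syy)


def describe_r(r, tol=1e-12):
    """Map r onto the five cases on the slide (hypothetical helper)."""
    if math.isclose(r, 1.0, abs_tol=tol):
        return "perfect positive correlation"
    if math.isclose(r, -1.0, abs_tol=tol):
        return "perfect negative correlation"
    if math.isclose(r, 0.0, abs_tol=tol):
        return "no linear correlation"
    return "partial positive correlation" if r > 0 else "partial negative correlation"


r = pearson_r([1, 2, 3, 4], [2, 3, 5, 4])
print(r, describe_r(r))  # 0.8 partial positive correlation
```

A strong r here still says nothing about causation; it only measures how closely the points follow a straight line.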
  • 50. CONCLUSION • In conclusion, the best research studies are initiated with a statistical plan already in place. • This plan may or may not have been developed with the assistance of a statistician. • The first step of data analysis is usually to describe the sample and then the subgroups within it. Frequency distributions, the mean, median, mode, range, and standard deviation are the statistics most commonly used for this task. • This information also provides a background for the discussion of inferential statistics.
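As a sketch of that first descriptive step, the hypothetical `describe` helper below gathers the statistics named above for one sample using only Python's standard library (`statistics.multimode` needs Python 3.8+). The scores are made-up illustration data, and the SD is the sample SD (n - 1 denominator).

```python
import statistics
from collections import Counter


def describe(values):
    """Basic descriptive statistics for one sample (hypothetical helper)."""
    return {
        "n": len(values),
        "frequency": dict(sorted(Counter(values).items())),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.multimode(values),  # list, since a sample may have several modes
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),        # sample SD, n - 1 denominator
    }


scores = [0, 1, 1, 2, 2, 2, 3, 4]  # hypothetical scores
summary = describe(scores)
print(summary["mean"], summary["median"], summary["mode"], summary["range"])  # 1.875 2.0 [2] 4
```

Running the same function on each subgroup gives the per-subgroup description that typically precedes inferential analysis.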