1. Data Science - Normal Distribution
Dr.M.Pyingkodi
Associate Professor
Dept. of MCA
Kongu Engineering College
Erode, Tamil Nadu, India
2. • A Normal Distribution
• A normal distribution is a continuous probability distribution whose graph is a bell-shaped curve, symmetric about the mean: most data points cluster around the center, and fewer are found as you move away from the mean.
• Characteristics
• 1. Symmetry
• The distribution is symmetric about the mean.
• This means that the left side of the distribution is a mirror image of the
right side.
• 2. Bell-Shaped Curve
• The graph of a normal distribution is shaped like a bell, with a single
peak at the mean.
• It rises gradually toward the peak and tails off symmetrically on both sides.
• 3. Mean, Median, and Mode Equality
• In a normal distribution, the mean, median, and mode are all equal and
located at the center of the distribution.
• This central point is also known as the peak of the curve.
Dr.M.Pyingkodi, Associate Professor, Dept of MCA , Kongu Engineering College, Tamil Nadu, India 2
3. • A Normal Distribution
• 4. 68-95-99.7 Rule
• Approximately 68% of the data falls within one standard deviation of the
mean, about 95% falls within two standard deviations, and about 99.7%
falls within three standard deviations.
• This is often referred to as the empirical rule.
• 5. Asymptotic Nature
• The tails of the normal distribution approach, but never actually touch,
the horizontal axis.
• This means that there is a theoretical possibility of observing values far
from the mean, though these occurrences become increasingly rare.
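The characteristics above can be checked numerically. The following is a minimal Python sketch using only the standard library; the mean of 70 and standard deviation of 10 are illustrative values, and `normal_pdf`, `normal_cdf` and `within_k_sd` are hypothetical helper names. It uses the standard normal density formula and the identity P(|X − μ| < kσ) = erf(k/√2) to confirm symmetry, the single peak at the mean, the median at the mean, and the 68-95-99.7 figures.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density: exp(-((x - mu) / sigma)**2 / 2) / (sigma * sqrt(2*pi))."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Probability that a normal value is at most x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def within_k_sd(k: float) -> float:
    """P(|X - mu| < k * sigma); the result does not depend on mu or sigma."""
    return math.erf(k / math.sqrt(2.0))

mu, sigma = 70.0, 10.0  # illustrative parameters (assumed, not from any dataset)

# 1. Symmetry: the density at mu - d equals the density at mu + d.
for d in (1.0, 5.0, 12.5):
    assert math.isclose(normal_pdf(mu - d, mu, sigma), normal_pdf(mu + d, mu, sigma))

# 2. Single peak at the mean: the density is lower on either side of mu.
peak = normal_pdf(mu, mu, sigma)
assert peak > normal_pdf(mu - 0.1, mu, sigma) and peak > normal_pdf(mu + 0.1, mu, sigma)

# 3. Median equals the mean: half of the probability lies below mu.
assert normal_cdf(mu, mu, sigma) == 0.5

# 4. Empirical rule: share of values within 1, 2 and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} sd: {within_k_sd(k):.4f}")
```

The three printed shares are 0.6827, 0.9545 and 0.9973, which is where the 68%, 95% and 99.7% of the empirical rule come from.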
4. z-score
• A z-score is a statistical measure that indicates how many standard
deviations a data point is from the mean of a dataset.
• It is a way to standardize scores from different distributions, allowing for
comparison.
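Standardizing a whole dataset applies the same idea to every value: subtract the mean, then divide by the standard deviation. A minimal Python sketch, assuming the dataset's own mean and population standard deviation (`statistics.pstdev`) are used; the sample standard deviation (`statistics.stdev`) is a common alternative. The function name `standardize` and the scores are hypothetical.

```python
from statistics import mean, pstdev

def standardize(values: list[float]) -> list[float]:
    """Convert raw values to z-scores using the data's mean and population SD."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("all values are equal; z-scores are undefined")
    return [(x - mu) / sigma for x in values]

scores = [60.0, 65.0, 70.0, 75.0, 80.0]  # hypothetical test scores
z = standardize(scores)
print([round(v, 3) for v in z])  # [-1.414, -0.707, 0.0, 0.707, 1.414]
```

After standardization the z-scores have mean 0 and standard deviation 1, whatever the original units were, which is what makes scores from different distributions comparable.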
What a Z-Score Indicates
• 1. Position Relative to the Mean
• A z-score tells you how far a specific value (data point) is from the mean.
• A positive z-score indicates that the data point is above the mean.
• A negative z-score indicates that the data point is below the mean.
• A z-score of 0 means the data point is exactly at the mean.
5. • 2. Standard Deviations
• The value of the z-score represents the number of standard deviations
the data point is from the mean.
• For example, a z-score of 2 means the data point is two standard deviations above the mean, while a z-score of -1.5 means it is one and a half standard deviations below the mean.
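The sign and magnitude rules above can be captured in a small helper. This is an illustrative Python sketch; `describe_z` is a hypothetical name, and the inputs are the example z-scores from the text.

```python
def describe_z(z: float) -> str:
    """Describe where a value with z-score z sits relative to the mean."""
    if z == 0:
        return "exactly at the mean"
    direction = "above" if z > 0 else "below"  # the sign gives the direction
    unit = "standard deviation" if abs(z) == 1 else "standard deviations"
    return f"{abs(z):g} {unit} {direction} the mean"  # the magnitude gives the distance

for z in (2, -1.5, 0):
    print(f"z = {z}: {describe_z(z)}")
```

This prints "2 standard deviations above the mean", "1.5 standard deviations below the mean" and "exactly at the mean" for the three inputs.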
• Example
• Consider a dataset of test scores for a class, with the following
characteristics:
• Mean (μ) = 70
• Standard Deviation (σ) = 10
• Suppose a student scores 85 on the test.
• To calculate the z-score for this student's score, we use the formula
• $z = \frac{X - \mu}{\sigma}$
• Where
• X = the data point (student's score)
• μ = the mean
• σ = the standard deviation
6. • Substituting in the values: $z = \frac{85 - 70}{10} = \frac{15}{10} = 1.5$
• In this example, the student's z-score is 1.5.
• This means that the student scored 1.5 standard deviations above the
mean score of the class.
• This indicates a relatively high performance compared to the average
student in the class.
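The same calculation in Python, as a minimal sketch; `z_score` is a hypothetical helper that rejects a non-positive standard deviation, since the z-score is undefined when σ is zero.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return (x - mu) / sigma, the number of standard deviations x lies from mu."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

z = z_score(85, mu=70, sigma=10)  # the student's score from the example
print(z)  # 1.5
```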
• Using z-scores allows us to understand the position of a data point
within the context of the overall dataset, making it easier to compare
scores across different distributions.
• Why Use Z-Scores?
• Comparison Across Different Datasets
• Z-scores allow us to compare values from different datasets, even if they
have different units or scales.
7. • For instance, if you're comparing the performance of two students on
different tests, the z-scores normalize their scores, making it easier to
compare their relative performance.
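As an illustration of comparing across tests, the sketch below uses invented numbers: student A scores 85 on a test with mean 70 and standard deviation 10, and student B scores 78 on a harder test with mean 60 and standard deviation 8. The helper `z_score` is hypothetical.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies from mu."""
    return (x - mu) / sigma

# Hypothetical results on two different tests (numbers invented for illustration).
z_a = z_score(85, mu=70, sigma=10)  # student A: 15 points above a mean of 70
z_b = z_score(78, mu=60, sigma=8)   # student B: 18 points above a mean of 60

stronger = "A" if z_a > z_b else "B"
print(f"A: z = {z_a}, B: z = {z_b}; relatively stronger: {stronger}")
```

Student A has the higher raw score (85 vs. 78), but student B is further above the mean of their own test (z = 2.25 vs. z = 1.5), so the two rankings disagree.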
• Probability and Percentiles
• In a normal distribution, z-scores are tied to probabilities.
• For example, a z-score of +1 corresponds to about the 84th percentile, meaning that about 84% of the data points fall below this value.
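The link between z-scores and percentiles is the standard normal cumulative distribution function, Φ(z). A minimal Python sketch computing it from `math.erf`; `normal_cdf` is a hypothetical name, and the mapping assumes the data really are normally distributed.

```python
import math

def normal_cdf(z: float) -> float:
    """Standard normal CDF: the fraction of values expected below z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in (-1.0, 0.0, 1.0, 1.5):
    print(f"z = {z:+.1f}: {100 * normal_cdf(z):.1f}% of values lie below")
```

For z = +1 this gives 84.1%, matching the 84th percentile above. Python 3.8+ also provides `statistics.NormalDist().cdf(z)`, which computes the same quantity.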
• Identifying Outliers
• Z-scores can highlight extreme values. If a z-score is very high or low
(e.g., beyond ±3), the value may be considered an outlier in the dataset.
• Imagine the average height of adult men in a population is 175 cm with a standard deviation of 10 cm.
• A man who is 190 cm tall has a z-score of $z = \frac{190 - 175}{10} = 1.5$.
• This means he is 1.5 standard deviations taller than the average.
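Outlier screening with the height figures can be sketched as follows. The population mean of 175 cm and standard deviation of 10 cm come from the example; the individual heights are hypothetical, `flag_outliers` is a hypothetical helper, and the ±3 cutoff is a common convention rather than a fixed rule.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies from mu."""
    return (x - mu) / sigma

def flag_outliers(values: list[float], mu: float, sigma: float,
                  threshold: float = 3.0) -> list[float]:
    """Return the values whose z-score exceeds the threshold in absolute value."""
    return [x for x in values if abs(z_score(x, mu, sigma)) > threshold]

MU_CM, SIGMA_CM = 175.0, 10.0  # population figures from the height example
heights_cm = [170.0, 190.0, 212.0, 140.0]  # hypothetical individuals

for h in heights_cm:
    print(f"{h:.0f} cm: z = {z_score(h, MU_CM, SIGMA_CM):+.2f}")
print("possible outliers:", flag_outliers(heights_cm, MU_CM, SIGMA_CM))
```

Here 212 cm (z = +3.7) and 140 cm (z = −3.5) are flagged, while 190 cm (z = +1.5) is unusual but within the cutoff. Because μ and σ are fixed population figures rather than estimated from these few values, an extreme value cannot inflate σ and hide itself.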
• Z-scores make it easier to understand how unusual or typical a value is
relative to the overall data distribution.