Presented by,
Dr. J.C. Miraclin Joyce Pamila,
Professor & HOD,
Department of CSE,GCT, CBE-13.
miraclin@gct.ac.in
1 08/11/25
FOUNDATIONS OF DATA SCIENCE
DESCRIBING DATA
Level of Measurement
• Specifies the extent to which a number (or word or letter)
actually represents some attribute and, therefore, has
implications for the appropriateness of various arithmetic
operations and statistical procedures.
• Nominal: Categorized
• Ordinal: Categorized, Ranked
• Interval: Categorized, Ranked, Equally spaced
• Ratio: Categorized, Ranked, Equally spaced, and has a natural zero
Data Types:
Nominal Vs Numeric
Variable Types:
Discrete Vs Continuous
DESCRIPTIVE STATISTICS
• Describing Data with Tables and Graphs
• Describing Data with Averages
• Describing Variability
• Normal Distributions and Standard (z) Scores
• Describing Relationships: Correlation
• Regression
Descriptive Statistics: statistics provides us with tools—tables, graphs,
averages, ranges, correlations—for organizing and summarizing the
inevitable variability in collections of actual observations or scores.
Inferential Statistics: Statistics also provides tools—a variety of tests and
estimates—for generalizing beyond collections of actual observations.
Examples:
(a) Students in my statistics class are, on average, 23 years old.
(b) The population of the world exceeds 7 billion.
(c) Either four or eight years have been the most frequent terms of office
actually served by U.S. presidents.
(d) Sixty-four percent of all college students favor right-to-abortion laws.
OUTLIERS
• Check for accuracy
• Might exclude from summaries
• Might enhance understanding
Relative Frequency Distribution
• Relative Frequency
Distributions show the
frequency of each class
as a part or fraction of
the total frequency for
the entire distribution.
• Percentage or
Proportion?
• Improves understanding
• Proportions may not add up to exactly
one because of rounding.
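As a minimal sketch of the conversion described above, the snippet below turns raw frequencies into proportions and percentages. The `scores` list is hypothetical data chosen only for illustration; it does not come from the slides.

```python
from collections import Counter

# Hypothetical scores, used only for illustration (not from the slides).
scores = [3, 5, 4, 3, 2, 5, 4, 3, 3, 1]

freq = Counter(scores)          # frequency of each class (here, each value)
total = sum(freq.values())      # total frequency for the entire distribution

# Proportion = class frequency / total frequency.
rel_freq = {value: freq[value] / total for value in sorted(freq)}

for value, proportion in rel_freq.items():
    # Percentage = proportion * 100.
    print(f"{value}: f={freq[value]}  proportion={proportion:.2f}  percent={proportion * 100:.0f}%")
```

With rounded proportions, the printed column can sum to slightly more or less than 1.00 even though the unrounded proportions sum to one.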
EXERCISE
Convert the frequencies to relative
frequencies and express them as
percentages.
Cumulative Frequency Distribution
• A frequency distribution showing the total
number of observations in each class and
all lower-ranked classes.
• Cumulative percentages are often referred
to as percentile ranks.
• To obtain the cumulative frequency
distribution, add to the frequency of each
class the frequencies of all lower-ranked
classes.
• Percentile Rank: Percentage of scores in
the entire distribution with equal or
smaller values than that score.
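The running-total procedure above can be sketched as follows. The data and the helper name `percentile_rank` are assumptions made for illustration only.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical scores, used only for illustration (not from the slides).
scores = [3, 5, 4, 3, 2, 5, 4, 3, 3, 1]

freq = Counter(scores)
values = sorted(freq)                      # classes ordered from lowest to highest
counts = [freq[v] for v in values]
cum_freq = list(accumulate(counts))        # each class plus all lower-ranked classes
total = cum_freq[-1]
cum_percent = [100 * c / total for c in cum_freq]


def percentile_rank(score, data):
    """Percentage of observations with values equal to or smaller than score."""
    return 100 * sum(1 for x in data if x <= score) / len(data)


for v, c, p in zip(values, cum_freq, cum_percent):
    print(f"{v}: cumulative f={c}  cumulative percent={p:.0f}%")
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the arithmetic.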
FREQUENCY DISTRIBUTION FOR QUALITATIVE DATA
• Ordered Qualitative Data: The ordering of the data needs to be
preserved in the frequency table.
• Relative and Cumulative Frequency Distribution: Can be constructed
in the same way as for numeric data.
GRAPHS FOR QUANTITATIVE DATA
Histogram:
A Bar Graph for Quantitative Data
Common boundaries between adjacent
bars emphasize the continuity of the data, as
with continuous variables.
X-axis: class intervals;
Y-axis: class frequency;
wiggly lines show breaks in scale.
Frequency Polygon – Line Graph
• A line graph for quantitative data that
emphasizes the continuity of
continuous variables.
• In the histogram, place a dot at the
midpoint of each bar top or, in the
absence of bar tops, at the midpoints
on the horizontal axis. Connect all the
dots to get a line graph.
• Extend the lower and upper tails to
the midpoints of the previous and next
classes, respectively.
Stem and Leaf Displays
• Sorts quantitative data on
the basis of leading and
trailing digits.
• Draw a vertical line to separate
the stem (multiples of 10)
from the leaf (multiples of 1).
• Selection of stems: the stem may
represent other units (thousands,
hundreds, tenths, …).
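A minimal sketch of a stem and leaf display, assuming non-negative whole numbers with stems in multiples of 10 and leaves in units. The function name `stem_and_leaf` and the `weights` data are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return {stem: [leaves]} where the leaf is the units digit of each value."""
    display = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)   # leading digits, trailing digit
        display[stem].append(leaf)
    return dict(display)


# Hypothetical weights, used only for illustration (not from the slides).
weights = [152, 135, 148, 160, 157, 133, 149, 168, 141, 155]

for stem, leaves in stem_and_leaf(weights).items():
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because the values are sorted first, the leaves within each stem come out in ascending order.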
EXERCISE
DRAW A STEM AND LEAF DISPLAY
SHAPES OF FREQUENCY DISTRIBUTION
GRAPH FOR QUALITATIVE DATA
EXERCISE
Measures of Central Tendency
Mode: The value of the most frequently occurring score.
Bimodal: Distribution with two obvious peaks.
Multimodal: Distribution with more than two obvious peaks.
MEDIAN
Middle value when observations are ordered from least to most.
EXERCISES
MEAN
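The three averages named in this part of the deck (mode, median, mean) can be computed with Python's standard `statistics` module. This is a sketch on hypothetical scores, not data from the slides; the mean is the sum of all scores divided by the number of scores.

```python
import statistics

# Hypothetical scores, used only for illustration (not from the slides).
scores = [2, 3, 3, 4, 5, 7, 11]

mode = statistics.mode(scores)       # most frequently occurring score
median = statistics.median(scores)   # middle value of the ordered scores
mean = statistics.mean(scores)       # sum of scores / number of scores

# multimode returns every value tied for the highest frequency,
# which helps spot a bimodal set of scores.
peaks = statistics.multimode([1, 1, 2, 3, 3])

print(mode, median, mean, peaks)
```

In this example the large score of 11 pulls the mean above the median, which is typical of a positively skewed set of scores.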
MEASURES OF VARIABILITY
• Measures the amount the values are dispersed or scattered in
the distribution
• Range, IQR, Variance and Standard Deviation
The mean deviation doesn’t help: deviations from the mean always sum to zero.
RANGE & VARIANCE
• Range: Difference between the maximum and minimum values.
• The size of the range tends to vary with the size of the group.
• Deviation from the mean: Distance between a value and the
mean.
• Deviations above the mean have positive values, while deviations
below the mean have negative values.
• The sum of all these deviations is zero, because they cancel each other.
Variance: Mean of all squared deviations from the mean.
NOT VARIANCE BUT STANDARD DEVIATION
• Variance is expressed in squared
units, which are hard to
interpret.
• Standard Deviation: Square root of the
variance, that is, of the mean of all
squared deviations.
• It is a rough measure of the average
amount by which values deviate on
either side of the mean.
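Written out as formulas, the definitions above take the following form. The symbols are conventional notation assumed here rather than taken from the slides: X is a score, μ and N are the population mean and size, and X̄ and n are the sample mean and size. The sample versions divide by n − 1 rather than n.

```latex
% Population variance and standard deviation
\[
\sigma^2 = \frac{\sum (X - \mu)^2}{N}, \qquad \sigma = \sqrt{\sigma^2}
\]
% Sample variance and standard deviation
\[
s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}, \qquad s = \sqrt{s^2}
\]
```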
STANDARD DEVIATION
• For most frequency distributions, a majority (often as many as 68%) of the
values are within one standard deviation on either side of the mean.
• For most frequency distributions, a small minority (often as few as 5%) of the
values deviate more than two standard deviations on either side of the mean.
SOLVE:
You grow 20 crystals from a solution and measure the length of each
crystal in millimeters.
9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4
Calculate the range, mean, sample standard deviation of the length of the
crystals.
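One way to check a hand calculation for this exercise is the sketch below, which uses Python's standard `statistics` module. `statistics.stdev` computes the sample standard deviation (dividing by n − 1); the variable names are illustrative.

```python
import statistics

# Crystal lengths in millimeters, as listed in the exercise.
lengths_mm = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4]

data_range = max(lengths_mm) - min(lengths_mm)   # maximum minus minimum
mean = statistics.mean(lengths_mm)               # sum of lengths / 20
sample_sd = statistics.stdev(lengths_mm)         # sqrt(sum of squared deviations / (n - 1))

print(f"range = {data_range} mm")
print(f"mean = {mean} mm")
print(f"sample standard deviation = {sample_sd:.3f} mm")
```

The sum of squared deviations from the mean is 178, so the sample variance is 178/19 and the standard deviation is roughly 3.06 mm.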
NORMAL DISTRIBUTION
NORMAL DISTRIBUTIONS AND STANDARD Z
SCORES
Properties of the Normal Curve
Obtained from a mathematical equation, the normal curve is a
theoretical curve defined for a continuous variable and noted for its
symmetrical bell-shaped form.
■ The normal curve is symmetrical; its lower half is the mirror image of
its upper half.
■ Being bell shaped, the normal curve peaks above a point midway
along the horizontal spread and then tapers off gradually in either
direction from the peak (without actually touching the horizontal
axis, the tails of a normal curve extend infinitely far).
■ The values of the mean, median and mode, located at a point
midway along the horizontal spread, are the same.
NORMAL CURVE
Z - SCORE
A z score is a unit-free, standardized score that, regardless of the original units
of measurement, indicates how many standard deviations a score is above or
below the mean of its distribution.
A z score consists of two parts:
1. a positive or negative sign indicating whether it’s above or
below the mean; and
2. a number indicating the size of its deviation from the mean in
standard deviation units.
(a) Margaret’s IQ of 135, given a mean of 100 and a standard deviation of 15:
z = (135 − 100)/15 = 2.33
(b) a score of 470 on the SAT math test, given a mean of 500 and a standard
deviation of 100: z = (470 − 500)/100 = −0.30
(c) a daily production of 2100 loaves of bread by a bakery, given a mean of 2180
and a standard deviation of 50: z = (2100 − 2180)/50 = −1.60
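The three conversions above follow the same rule: subtract the mean from the score first, then divide by the standard deviation. A minimal sketch, with `z_score` as a hypothetical helper name:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


print(round(z_score(135, 100, 15), 2))     # Margaret's IQ
print(round(z_score(470, 500, 100), 2))    # SAT math score
print(round(z_score(2100, 2180, 50), 2))   # daily loaves of bread
```

The parentheses around `x - mean` matter: without them the division would be applied to the mean alone.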
STANDARD NORMAL CURVE
If the original distribution approximates a normal curve, then the shift to
standard or z scores will always produce a new distribution that
approximates the standard normal curve.
Standard Normal Curve: The tabled normal curve for z scores, with a
mean of 0 and a standard deviation of 1.
Although there is an infinite number of different normal curves, each
with its own mean and standard deviation, there is only one standard
normal curve, with a mean of 0 and a standard deviation of 1.
STANDARD NORMAL TABLE
FINDING PROPORTIONS
Using Table A in Appendix C, find the proportion of the total area identified with
the following statements:
(a) above a z score of 1.80: 0.0359
(b) between the mean and a z score of –0.43: 0.1664
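The two table lookups above can be cross-checked in software. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8+) in place of the printed table; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal curve: mean 0, standard deviation 1

# (a) Area above z = 1.80 is one minus the area at or below 1.80.
above_1_80 = 1 - std_normal.cdf(1.80)

# (b) Area between the mean (z = 0) and z = -0.43.
mean_to_neg_0_43 = std_normal.cdf(0) - std_normal.cdf(-0.43)

print(round(above_1_80, 4), round(mean_to_neg_0_43, 4))
```

Because the curve is symmetrical, the area between the mean and z = –0.43 equals the area between the mean and z = +0.43.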
Assume that GRE scores
approximate a normal curve with a
mean of 500 and a standard
deviation of 100.
(a) Sketch a normal curve and
shade in the target area described
by each of the following statements:
(i) less than 400
(ii) more than 650
(iii) less than 700
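Beyond sketching the curve, the shaded areas can also be quantified. The sketch below is an extension of the exercise rather than part of it: it assumes the stated mean of 500 and standard deviation of 100 and uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

gre = NormalDist(mu=500, sigma=100)   # parameters as given in the exercise

less_than_400 = gre.cdf(400)          # z = -1.00, area in the lower tail
more_than_650 = 1 - gre.cdf(650)      # z = +1.50, area in the upper tail
less_than_700 = gre.cdf(700)          # z = +2.00, everything below that score

print(round(less_than_400, 4), round(more_than_650, 4), round(less_than_700, 4))
```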
EXERCISE
Assume that SAT math scores approximate a normal curve with a mean of 500
and a standard deviation of 100.
(a) Sketch a normal curve and shade in the target area(s) described by each of
the following statements:
(i) more than 570
(ii) less than 515
(iii) between 520 and 540
FINDING SCORES
Exam scores for a large psychology class approximate a normal curve with a
mean of 230 and a standard deviation of 50. Furthermore, students are graded
“on a curve,” with only the upper 20 percent being awarded grades of A. What is
the lowest score on the exam that receives an A?
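Finding a score reverses the earlier procedure: start from a proportion, find the z score that cuts it off, then convert back with score = mean + z × standard deviation. A sketch under the stated assumptions (mean 230, standard deviation 50), using `NormalDist.inv_cdf` instead of reading the table backwards:

```python
from statistics import NormalDist

exam = NormalDist(mu=230, sigma=50)   # parameters as given in the problem

# The upper 20 percent begins where 80 percent of the area lies below.
z_cut = NormalDist().inv_cdf(0.80)

# Convert the z score back to the original units.
lowest_a = exam.mean + z_cut * exam.stdev

print(round(z_cut, 2), round(lowest_a, 2))
```

A printed table typically gives z ≈ 0.84 and therefore a cutoff of about 272; the small difference from the software value comes from rounding z to two decimals.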
EXERCISE
Assume that the annual rainfall in the San Francisco area approximates a
normal curve with a mean of 22 inches and a standard deviation of 4
inches. What are the rainfalls for the more atypical years, defined as the
driest 2.5 percent of all years and the wettest 2.5 percent of all years?
THANK YOU