Introduction to
Basic Statistical Concepts
Statistics is a branch of mathematics that deals with the collection,
organization, analysis, interpretation, and presentation of data. It is used in
various fields such as business, economics, sociology, and more.
Understanding statistical concepts is essential for making informed decisions
and drawing meaningful conclusions.
Descriptive Statistics
Mean, Median, Mode
Descriptive statistics comprises methods used
to summarize and describe data. It includes
measures of central tendency such as the
mean, median, and mode.
Variability Measures
Descriptive statistics also include measures
of variability, which provide insights into the
spread and dispersion of data.
Measures of Central Tendency
1 Mean
The mean is the average of a set of numbers and is calculated by summing all the
numbers and then dividing by the count of numbers.
2 Median
The median is the middle value when the numbers are arranged in ascending order. With an
even count of numbers, it is the average of the two middle values.
3 Mode
The mode is the value that appears most frequently in a set of data. It indicates the
most common observation.
Measures of Variability
1 Range
The range is the difference between
the highest and lowest values in a
dataset. It provides a simple measure
of variability.
2 Variance
Variance is the average of the
squared differences between each
point in a dataset and the mean. It
shows how much the data points are
spread out.
3 Standard Deviation
Standard deviation is a measure of the amount of variation or dispersion of a set of
values. It is the square root of the variance.
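Written out as formulas, the three measures look as follows. The symbols are introduced here for illustration and do not appear on the slides: x_1, ..., x_N are the data values and μ is their mean.

```latex
\text{Range} = x_{\max} - x_{\min}, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2, \qquad
\sigma = \sqrt{\sigma^2}
```

This is the population form. When the data are a sample drawn from a larger population, the sum is commonly divided by N − 1 instead of N.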
Inferential Statistics
Definition
Inferential statistics involves using data from a
sample to make predictions or inferences about
a population.
Applications
It is used to estimate the probability of
an event or how accurate a prediction
made from a sample is likely to be.
Hypothesis Testing
Formulate Hypothesis
The first step involves
stating a clear hypothesis
that you want to test based
on existing knowledge or
observations.
Collect Data
After formulating the
hypothesis, data is collected
to test and analyze the
validity of the hypothesis.
Analyze Results
The results are statistically
analyzed to determine
whether to reject or fail to
reject the null hypothesis.
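The three steps can be sketched with a small exact binomial test. Everything specific here is an assumption made for illustration: the coin-flip scenario, the observed counts, the 0.05 significance level, and the helper name `binomial_two_sided_p`.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int, p_null: float = 0.5) -> float:
    """Exact two-sided p-value: total probability of outcomes no more likely than the observed one."""
    pmf = [
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[successes]
    # Small relative tolerance so that ties survive floating-point rounding.
    return sum(p for p in pmf if p <= observed * (1 + 1e-9))


# Step 1: hypothesis. Null hypothesis H0: the coin is fair (p = 0.5).
# Step 2: data. Hypothetical observation: 9 heads in 10 flips.
p_value = binomial_two_sided_p(9, 10)

# Step 3: analysis. Compare the p-value with a significance level chosen in advance.
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.4f} -> {decision}")
```

A small p-value means the observed data would be unusual if the null hypothesis were true. It does not give the probability that the hypothesis itself is true.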
Confidence Intervals
1 Upper Bound
The upper bound of a confidence
interval is the high end of the
interval: the largest value of the
parameter that remains plausible at
the chosen confidence level.
2 Lower Bound
The lower bound of a confidence
interval is the low end of the
interval: the smallest value of the
parameter that remains plausible at
the chosen confidence level.
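As a rough sketch, both bounds can be computed for a sample mean with a normal approximation. The sample values, the 95% level, and the function name `mean_confidence_interval` are assumptions for illustration. For a sample this small, a t-distribution critical value would usually be preferred over the normal one used here.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation interval for the population mean: mean ± z * s / sqrt(n)."""
    n = len(sample)
    centre = mean(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 (sample) formula
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * standard_error
    return centre - margin, centre + margin


# Hypothetical sample, for illustration only.
sample = [10, 15, 20, 25, 30]
lower, upper = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

Raising the confidence level widens the interval, and a larger sample narrows it.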
Types of Data
1 Nominal Data
Nominal data represents categories
without any order or sequence.
Examples include gender, colors, and
names.
2 Ordinal Data
Ordinal data represents categories
with a specific order or rank.
Examples include education levels
and survey ratings.
Sampling Methods
Simple Random Sampling
Every member of the population has an
equal chance of being selected.
Stratified Sampling
The population is divided into subgroups,
and samples are then randomly selected
from each subgroup.
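A minimal sketch of both methods using Python's standard `random` module. The population of (id, department) pairs, the sample sizes, and the helper names are hypothetical.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k members without replacement; every member is equally likely to be chosen."""
    return random.Random(seed).sample(population, k)


def stratified_sample(population, stratum_of, per_stratum, seed=None):
    """Group members by stratum, then draw per_stratum members at random from each group."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    return {name: rng.sample(members, per_stratum) for name, members in strata.items()}


# Hypothetical population: 20 people in sales and 10 in engineering.
population = [(i, "sales" if i % 3 else "engineering") for i in range(30)]

srs = simple_random_sample(population, 6, seed=0)
stratified = stratified_sample(population, lambda member: member[1], 3, seed=0)
print(srs)
print(stratified)
```

The simple random sample may by chance contain few or no engineers, while the stratified sample always contains three members from each department.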
Common Statistical Distributions
Normal Distribution
The bell-shaped curve
represents a symmetrical
distribution with most values
clustered around the mean.
Binomial Distribution
It represents the number of
successes in a fixed number
of independent trials with the
same probability of success in
each trial.
Poisson Distribution
It models the number of
events that occur in a fixed
interval of time or space when
events happen independently
at a constant average rate.
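The three distributions can be evaluated with the standard library alone. The helper names and the example parameters (5 fair coin flips, an average rate of 3 events per interval) are assumptions chosen for illustration.

```python
from math import comb, exp, factorial
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, rate):
    """P(exactly k events in an interval when the average count per interval is `rate`)."""
    return exp(-rate) * rate**k / factorial(k)


standard_normal = NormalDist(mu=0, sigma=1)
# Probability of falling within one standard deviation of the mean.
within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)

print(f"Normal:   P(-1 < Z < 1)           = {within_one_sd:.4f}")
print(f"Binomial: P(3 heads in 5 flips)   = {binomial_pmf(3, 5, 0.5):.4f}")
print(f"Poisson:  P(2 events | rate = 3)  = {poisson_pmf(2, 3):.4f}")
```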
Introduction to
Descriptive Statistics
Using Python
Descriptive statistics is a branch of statistics that involves the collection,
analysis, interpretation, and presentation of data. Its primary focus is on
summarizing and describing the main features of a dataset, providing a
comprehensive and meaningful overview.
Purpose and Goals
Summarization:
Condensing large amounts of data
into key insights.
Exploration:
Identifying patterns, trends, and
outliers within the data.
Communication:
Presenting findings in a clear and understandable manner to
facilitate decision-making.
Types of Descriptive Statistics
Measures of Central Tendency
Provide a central or typical value in a dataset.
Common measures include:
• Mean: Average of all values.
• Median: Middle value in a sorted dataset.
• Mode: Most frequently occurring value.
Measures of Dispersion
Indicate the spread or variability of the data.
Common measures include:
• Range: Difference between the maximum and minimum values.
• Variance: Average of the squared differences from the mean.
• Standard Deviation: Square root of the variance.
Measures of Shape
Describe the distribution or shape of the data.
Common measures include:
• Skewness: Indicates the asymmetry of the data distribution.
• Kurtosis: Measures the "tailedness" or sharpness of the data distribution.
Measures of Central Tendency
Mean
The mean is the average of a set of numbers, calculated by adding all the numbers together and then dividing by the count of numbers.
Consider the following dataset:
[10, 15, 20, 25, 30]
• (10 + 15 + 20 + 25 + 30) / 5 = 20
Measures of Central Tendency
Median
The median is the middle value of a data set when it is ordered from least to greatest. It represents the 50th
percentile of the data.
Consider the following dataset:
[10, 15, 20, 25, 30]
• The middle (third) value is 20, which here equals the mean.
Measures of Central Tendency
Mode
The mode is the value that appears most frequently in a given data set. It's the most common observation in the
data.
Consider the following dataset:
[10, 15, 20, 25, 30]
• Every value appears exactly once, so there is no mode in this case.
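The three measures can be checked on this dataset with Python's built-in `statistics` module. The "no mode" rule below, where a mode is reported only if some value repeats, is a convention adopted for this sketch. `statistics.mode` itself returns the first of the tied values (10 here) on Python 3.8 and later.

```python
from statistics import mean, median, multimode

data = [10, 15, 20, 25, 30]

modes = multimode(data)  # all values tied for the highest count
# Convention used here: if every value occurs exactly once, report no mode.
has_mode = len(set(data)) < len(data)

print("mean  :", mean(data))
print("median:", median(data))
print("mode  :", modes if has_mode else "none")
```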
Measures of Dispersion
Range
The range is the difference
between the largest and the
smallest values within a
dataset. It provides a simple
measure of variability.
Variance
Variance is the average of the
squared differences between
each point in a dataset and the
mean. It indicates the spread
of the data.
Standard Deviation
The standard deviation is a
measure of the amount of
variation or dispersion of a set
of values. It is the square root
of the variance.
Measures of Dispersion
Example:
Indicate the spread or variability of the data.
Consider two datasets:
• Dataset A: [5,5,5,5,5]
• Dataset B: [0,10,0,10,0]
• Dataset A has a mean of 5 and Dataset B has a mean of 4. The means are close, but Dataset B has far higher dispersion: its population variance is 24, while Dataset A's is 0.
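A short sketch that computes the mean next to each dispersion measure for the two datasets, so they can be compared directly. It uses the population forms (`pvariance`, `pstdev`), which is an assumption, since the slide does not say whether the data are a sample or a full population.

```python
from statistics import mean, pstdev, pvariance

datasets = {"A": [5, 5, 5, 5, 5], "B": [0, 10, 0, 10, 0]}

summary = {}
for name, values in datasets.items():
    summary[name] = {
        "mean": mean(values),
        "range": max(values) - min(values),
        "variance": pvariance(values),  # population variance (divide by N)
        "std_dev": pstdev(values),      # square root of the population variance
    }
    print(name, summary[name])
```

Dataset A has a range and variance of 0 because every value is identical. Dataset B has a range of 10 and a population variance of 24.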
Measures of Shape
Example:
Describe the distribution or shape of the data.
Consider two datasets:
• Dataset C: [10,15,20,25,30]
• Dataset D: [10,10,20,30,30]
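• Both datasets have the same mean and median (20), and both are symmetric about 20, so neither is skewed. They differ in shape: Dataset C is evenly spaced, while Dataset D's values pile up at the extremes, giving it a larger variance and a lower kurtosis.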
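Shape measures for the two datasets can be computed from central moments. The sketch below uses the moment-based (population) definitions of skewness and non-excess kurtosis. Other conventions exist, such as sample-corrected versions or excess kurtosis, and the function names are chosen for illustration.

```python
def central_moment(values, order):
    """Average of (x - mean) ** order over all values (population form)."""
    centre = sum(values) / len(values)
    return sum((x - centre) ** order for x in values) / len(values)


def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis(values):
    """Moment-based (non-excess) kurtosis: fourth central moment over variance squared."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2


datasets = {"C": [10, 15, 20, 25, 30], "D": [10, 10, 20, 30, 30]}
for name, values in datasets.items():
    print(name, "skewness:", skewness(values), "kurtosis:", kurtosis(values))
```

Both functions divide by a power of the variance, so they are undefined for a dataset whose values are all equal.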
Interquartile Range (IQR)
1 Definition
The interquartile range (IQR) is a measure of statistical dispersion, that is, of how
spread out the values in a dataset are.
2 Calculation
It is calculated as the difference between the third quartile (Q3) and the first
quartile (Q1) in a dataset.
Interquartile Range (IQR)
IQR = Q3 – Q1
• Interquartile range is the amount of spread in the middle 50% of a dataset.
• In other words, it is the distance between the first quartile (Q1) and the third quartile (Q3).
How to Find IQR?
Here's how to find the IQR:
Step 1: Put the data in order from least to greatest.
Step 2: Find the median. If the number of data points is odd, the median is the middle data point. If the number of data points is
even, the median is the average of the middle two data points.
Step 3: Find the first quartile (Q1). The first quartile is the median of the data points to the left of the median in the ordered list.
Step 4: Find the third quartile (Q3). The third quartile is the median of the data points to the right of the median in the ordered
list.
Step 5: Calculate the IQR by subtracting Q1 from Q3 (IQR = Q3 – Q1).
Find the IQR of these scores:
1,3,3,3,4,4,4,6,6
Step 1: The data is already in order.
Step 2: Find the median. There are 9 scores, so the median is the middle score.
The median is 4.
Step 3: Find Q1, which is the median of the data to the left of the median.
There is an even number of data points to the left of the median, so we need the average of
the middle two data points.
1,3,3,3
Q1 = (3+3)/2 = 3
The first Quartile (Q1) is 3.
Step 4: Find Q3, which is the median of the data to the right of the median.
There is an even number of data points to the right of the median, so we need the average of
the middle two data points.
4,4,6,6
Q3 = (4+6)/2 = 5
The Third Quartile (Q3) is 5.
Step 5: Calculate the IQR.
IQR = Q3 - Q1
= 5 – 3
= 2
The IQR is 2 points.
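The same steps can be sketched in Python. The function name `quartiles_by_halves` is hypothetical, and the code follows the median-of-halves method described above. Library quantile functions may use other interpolation conventions and can return different quartiles on other datasets.

```python
from statistics import median


def quartiles_by_halves(values):
    """Q1 and Q3 as the medians of the lower and upper halves of the ordered data.

    When the number of points is odd, the middle point belongs to neither half.
    """
    ordered = sorted(values)                 # Step 1: order the data
    half = len(ordered) // 2                 # Step 2: locate the median position
    lower = ordered[:half]                   # Step 3: points left of the median
    upper = ordered[len(ordered) - half:]    # Step 4: points right of the median
    return median(lower), median(upper)


scores = [1, 3, 3, 3, 4, 4, 4, 6, 6]
q1, q3 = quartiles_by_halves(scores)
iqr = q3 - q1                                # Step 5: IQR = Q3 - Q1
print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
```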

Basic Statistical Concepts in Machine Learning.pptx
