Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423 603
(An Autonomous Institute, Affiliated to Savitribai Phule Pune University, Pune)
NAAC ‘A’ Grade Accredited, ISO 9001:2015 Certified
Department of Computer Engineering
(NBA Accredited)
Prof. S.A.Shivarkar
Assistant Professor
Contact No.8275032712
Email- shivarkarsandipcomp@sanjivani.org.in
Subject- Foundation of Data Science (PECO311B)
Unit –III: Measures of Data: Scale, Tendency, Variation Shape
DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 2
Measures of Data: Scale, Tendency, Variation, Shape
• Data measurement scales: nominal scale, ordinal scale, interval scale, ratio scale
• Measures of central tendency: mean, median, mode
• Percentile, decile, quartile
• Measures of variation: range, interquartile distance, variance and standard deviation
• Measures of shape: skewness and kurtosis
“ Data have an important story to tell.
They rely on you to give them a voice ”
Before you give them a voice, you have to understand the different
data types.
• There are different ways to categorize data: by the way it has been collected or by its structure.
Data Types and Data Measurement Scale
• Based on Structure: Data can be categorized by its structure into two types.
1) Structured Data: Data points that follow a specific structure and can be arranged in tabular form (also known as a matrix), with rows and columns, are called structured data.
Ex: Salaries of employees arranged by employee ID.
2) Unstructured Data: Data points that are not arranged in any tabular format are called unstructured data.
Ex: Emails, videos, clickstream data, etc.
• Based on Data Collection: Data can be categorized into three types based on how it has been collected.
1) Cross-Sectional Data
2) Time-Series Data
3) Panel Data
• Cross-Sectional Data: Data points/values captured on multiple variables at one specific point or period of time are termed cross-sectional data.
Ex: Attributes of employees, such as age, salary, level and team, for the year 2019.
• Time-Series Data: Data points/values captured on a single variable over multiple time periods are called time-series data.
Ex: Sales of smartphones on a monthly, quarterly or yearly basis.
• Panel Data: A combination of cross-sectional and time-series data is known as panel data.
Ex: GDP of various countries over different periods.
Data Measurement Scale:
Scales of measurement in research and statistics are the different ways in which variables are defined and grouped into categories.
• A scale describes the nature of the values assigned to the variables in a data set. Measurement is the process of recording the observations collected as part of the research.
• Scaling is the assignment of objects to numbers or semantic categories.
• A measurement scale is used to qualify or quantify data variables.
• The properties evaluated are identity, magnitude, equal intervals and a minimum value of zero.
• Identity: Each value has a unique meaning.
• Magnitude: The values have an ordered relationship to one another, so there is a specific order to the variables.
• Equal intervals: The distance between adjacent points along the scale is the same, so the difference between data points one and two is the same as the difference between data points five and six.
• A minimum value of zero: The scale has a true zero point, where zero means the complete absence of the quantity being measured. Temperature in degrees, for example, can fall below zero and still have meaning, whereas a weight of zero means there is nothing to weigh, and a weight can never be negative.
Data can be divided into four types based on the measurement scale: 1) nominal, 2) ordinal, 3) interval and 4) ratio.
1) Nominal Scale:
• The nominal scale is a scale of measurement used for identification purposes.
It is also known as the categorical scale; it assigns numbers to attributes for easy identification.
These numbers are not quantitative in nature and only act as labels.
The only statistical analyses that can be performed on a nominal scale are frequency counts and percentages (and hence the mode).
• It can be analyzed graphically using a bar chart or pie chart.
• Basic mathematical operations are meaningless on a nominal scale (e.g. subtraction: married - unmarried, or ratio: married/unmarried).
In the example below, political party affiliation is measured on a nominal scale.
Which political party are you affiliated with?
Independent
Republican
Democrat
Labeling Independent as “1”, Republican as “2” and Democrat as “3” does not in any way mean that any of the attributes is better than the others. The numbers are just used as identifiers for easy data analysis.
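Since counting is the only meaningful arithmetic on nominal data, a frequency table is the natural summary. The sketch below tallies a hypothetical list of survey answers (the `responses` list and the `codes` mapping are invented for illustration) using Python's `collections.Counter` and reports each party's count and percentage. The numeric codes are only carried along as labels.

```python
from collections import Counter

# Hypothetical survey answers: category labels, not quantities.
responses = ["Democrat", "Republican", "Independent", "Democrat",
             "Democrat", "Republican", "Independent", "Democrat"]

# Numeric codes act purely as identifiers; 3 is not "more" than 1.
codes = {"Independent": 1, "Republican": 2, "Democrat": 3}

counts = Counter(responses)          # frequency count per category
total = sum(counts.values())
percentages = {party: 100 * n / total for party, n in counts.items()}

for party, n in counts.most_common():
    print(f"code {codes[party]} {party:<12} count={n} ({percentages[party]:.1f}%)")
```

Adding or averaging the codes would run without error but would produce a meaningless number, which is exactly why the nominal scale is limited to counts, percentages and the mode.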
2) Ordinal Scale:
The ordinal scale involves ranking or ordering the attributes according to the variable being scaled.
The items on this scale are classified according to the degree of occurrence of the variable in question.
The attributes on an ordinal scale are usually arranged in ascending or descending order.
Ordinal scales are used in market research, advertising and customer satisfaction surveys.
• They use qualifiers like very, highly, more, less, etc. to depict a degree.
Statistics such as the median and mode can be computed on an ordinal scale, but not the mean, because the distances between categories are not known to be equal.
• For example, a software company may ask its users:
• How would you rate our app?
Excellent
Very Good
Good
Bad
Poor
• The attributes in this example are listed in descending order.
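To illustrate why the median and mode suit ordinal data, the sketch below maps the app-rating categories to ranks (Poor = 1 up to Excellent = 5) and takes the median of the ranks. The seven `responses` are hypothetical. `statistics.median_low` is used so that the result is always an observed rank and can be mapped back to a category label.

```python
import statistics

# Rating categories in ascending order; only the order carries meaning.
scale = ["Poor", "Bad", "Good", "Very Good", "Excellent"]
rank = {label: i for i, label in enumerate(scale, start=1)}

# Hypothetical answers from seven users.
responses = ["Good", "Excellent", "Very Good", "Good", "Poor", "Good", "Bad"]
ranks = [rank[r] for r in responses]

# median_low never interpolates between categories, so it maps back to a label.
median_label = scale[statistics.median_low(ranks) - 1]
# The mode works directly on the labels.
mode_label = statistics.mode(responses)

print("median rating:", median_label)
print("most common rating:", mode_label)
```

Averaging the ranks would silently assume that the gap between "Poor" and "Bad" equals the gap between "Very Good" and "Excellent", which the ordinal scale does not guarantee.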
3) Interval Scale:
The interval scale is a scale in which the levels are ordered and numerically equal distances on the scale represent equal differences in the quantity being measured.
It is an extension of the ordinal scale, the main difference being the existence of equal intervals.
With an interval scale, you not only know that a given attribute A is bigger than another attribute B, but also by how much A is larger than B.
Also, unlike ordinal and nominal scales, addition and subtraction can be performed on an interval scale.
It is used in various sectors such as education, medicine and engineering, for example when calculating a student’s CGPA or measuring a patient’s temperature.
• Example: A common example is measuring temperature on the Fahrenheit scale. Interval data can be summarized with the mean, median, mode, range and standard deviation.
• Example: Temperature (in degrees Celsius), IQ level.
For such variables, addition and subtraction can be performed, but division does not make sense. You can say that Mumbai is 10 °C warmer than Bangalore, but saying that Mumbai is twice as hot as Bangalore is not correct, because 0 °C is an arbitrary point rather than a true absence of heat. Thus ratios do not make sense here.
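The arbitrary zero can be made concrete by converting the same two readings from Celsius to Fahrenheit. In this sketch the temperatures 20 °C and 10 °C are hypothetical values chosen so that the Celsius ratio is exactly 2. The difference converts consistently between the two scales, but the ratio does not.

```python
def celsius_to_fahrenheit(c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

# Hypothetical readings: Mumbai is 10 degrees C warmer than Bangalore.
mumbai_c, bangalore_c = 20.0, 10.0
mumbai_f = celsius_to_fahrenheit(mumbai_c)
bangalore_f = celsius_to_fahrenheit(bangalore_c)

# Differences transform consistently: a 10 C gap is always an 18 F gap.
diff_c = mumbai_c - bangalore_c
diff_f = mumbai_f - bangalore_f

# Ratios depend on where each scale puts its zero, so they disagree.
ratio_c = mumbai_c / bangalore_c
ratio_f = mumbai_f / bangalore_f

print(f"difference: {diff_c} C = {diff_f} F")
print(f"ratio in C: {ratio_c:.2f}, ratio in F: {ratio_f:.2f}")
```

Because "twice as hot" gives one answer in Celsius and another in Fahrenheit, the statement has no scale-independent meaning, which is the defining limitation of interval data.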
4) Ratio Scale:
The ratio scale is the highest level of data measurement. It is an extension of the interval scale and therefore satisfies all four characteristics of a measurement scale: identity, magnitude, equal intervals and an absolute zero.
• This level of data measurement allows the researcher to compare both the differences and the relative magnitudes (ratios) of numbers. Some examples of ratio scales are length, weight and time.
Most quantitative measurements that have a true zero fall into this category.
In market research, common ratio scale examples are price, number of customers and number of competitors. The ratio scale is used extensively in marketing, advertising and business sales.
The ratio scale is compatible with all statistical analysis methods, such as the measures of central tendency (mean, median, mode, etc.) and the measures of dispersion (range, standard deviation, etc.).
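For contrast with the temperature case, the sketch below changes the unit of a ratio-scale variable. The two weights are hypothetical, and the kg-per-pound constant is the exact definition of the international pound. Because 0 kg and 0 lb both mean "no weight", the statement "twice as heavy" survives the change of unit.

```python
# The international pound is defined as exactly 0.45359237 kg.
KG_PER_LB = 0.45359237

def kg_to_lb(kg: float) -> float:
    """Convert a weight in kilograms to pounds."""
    return kg / KG_PER_LB

# Hypothetical weights of two survey respondents.
heavier_kg, lighter_kg = 80.0, 40.0

ratio_in_kg = heavier_kg / lighter_kg
ratio_in_lb = kg_to_lb(heavier_kg) / kg_to_lb(lighter_kg)

# True zero: 0 kg is also 0 lb, so the ratio does not depend on the unit.
print(kg_to_lb(0.0))
print(ratio_in_kg, round(ratio_in_lb, 9))
```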
• For example: A survey that collects the weights of the respondents.
Which of the following categories do you fall in? Weight:
More than 100 kg
81 – 100 kg
61 – 80 kg
40 – 60 kg
Less than 40 kg
Measures of central tendency
• A measure of central tendency (also referred to as a measure of centre or central location) is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or centre of its distribution.
Therefore, a measure of central tendency is a way to summarize a large set of numbers using one single score.
• There are three main measures of central tendency:
1. Mean
2. Median
3. Mode
• Mean: The mean is the sum of the values of all observations in a dataset divided by the number of observations, i.e. mean = (x1 + x2 + ... + xn) / n. It is also known as the arithmetic average.
• Advantage of the mean:
The mean can be used for both continuous and discrete numeric data.
• Limitations of the mean:
• The mean cannot be calculated for categorical data, as the values cannot be summed.
• As the mean includes every value in the distribution, it is influenced by outliers.
• Example: 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
The mean is 623/11 ≈ 56.6.
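The arithmetic above can be checked directly. This sketch computes the mean of the same eleven values both by hand (sum divided by count) and with the standard library's `statistics.mean`, which should agree.

```python
import statistics

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# Arithmetic mean: sum of all observations divided by their count.
manual_mean = sum(ages) / len(ages)
library_mean = statistics.mean(ages)

print(sum(ages), len(ages), round(manual_mean, 1))  # 623 11 56.6
print(library_mean)
```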
• Median: The median is the middle value of a distribution when the values are arranged in ascending or descending order. When there is an even number of values, the median is the mean of the two middle values.
• Advantage of the median:
The median is less affected by outliers and skewed data than the mean, and is usually the preferred measure of central tendency when the distribution is not symmetrical.
• Limitation of the median:
The median cannot be identified for nominal categorical data, as such data cannot be logically ordered.
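The robustness claim can be seen by adding one extreme value to the retirement-age data used in the mean example. The appended 95 is a hypothetical outlier. With twelve values the median becomes the mean of the two middle values, which are both 57, while the mean is pulled upwards.

```python
import statistics

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
# Same data with one hypothetical extreme value appended.
ages_with_outlier = ages + [95]

for label, data in [("original", ages), ("with outlier", ages_with_outlier)]:
    print(f"{label:>12}: mean={statistics.mean(data):.2f}, "
          f"median={statistics.median(data)}")
```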
• Mode: The mode is the most frequently occurring value in a distribution.
• Consider this dataset showing the retirement age of 11 people, in whole years:
54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60
The simple frequency distribution of the retirement age data is:
54 – 3, 55 – 1, 56 – 1, 57 – 2, 58 – 2, 60 – 2
The most commonly occurring value is 54; therefore the mode of this distribution is 54 years.
• Advantage of the mode:
The mode has an advantage over the median and the mean, as it can be found for both numerical and categorical (non-numerical) data.
• Limitations of the mode:
There can be more than one mode for the same distribution of data (bimodal or multimodal). The presence of more than one mode limits the ability of the mode to describe the centre or typical value of the distribution, because no single value describing the centre can be identified.
• In some cases, particularly when the data are continuous, the distribution may have no mode at all (i.e. if all values are different).
• In such cases, it may be better to use the median or mean, or to group the data into appropriate intervals and find the modal class.
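The sketch below exercises the cases described above with Python's `statistics` module, assuming Python 3.8 or later (where `statistics.multimode` exists). The small `bimodal`, `all_distinct` and `colours` lists are hypothetical examples; `multimode` returns every value tied for the highest frequency, in order of first appearance.

```python
import statistics

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]
print(statistics.mode(ages))                 # single mode: 54

# Bimodal data: two values share the highest frequency.
bimodal = [1, 1, 2, 3, 3]
print(statistics.multimode(bimodal))         # [1, 3]

# All values distinct: every value "ties", so no useful mode exists.
all_distinct = [2.5, 3.1, 4.7]
print(statistics.multimode(all_distinct))    # [2.5, 3.1, 4.7]

# The mode also works for categorical (non-numerical) data.
colours = ["red", "blue", "red", "green"]
print(statistics.mode(colours))              # red
```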
Percentile, decile, quartile:
By definition, the median is the middle point on the axis of the frequency distribution curve: it divides the area under the curve into two parts, with equal areas to its left and to its right.
In the same way, the area under the curve can be divided into four equal parts; the three dividing points are called quartiles (Q1, Q2, Q3).
• Following the same procedure, dividing the area into ten equal parts gives the deciles (D1 to D9).
• Finally, dividing the area into one hundred equal parts gives the percentiles (P1 to P99).
The median is therefore the same as Q2, D5 and P50.
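Python's `statistics.quantiles` (Python 3.8+) returns the n - 1 cut points that split data into n equal groups, so n=4, n=10 and n=100 give quartiles, deciles and percentiles respectively. This sketch reuses the retirement-age data from the mode example. It relies on the function's default `method='exclusive'`; other tools (spreadsheets, NumPy) may use different interpolation rules and give slightly different cut points for small data sets.

```python
import statistics

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# n=4 -> 3 quartiles, n=10 -> 9 deciles, n=100 -> 99 percentiles.
quartiles = statistics.quantiles(ages, n=4)
deciles = statistics.quantiles(ages, n=10)
percentiles = statistics.quantiles(ages, n=100)

print("Q1, Q2, Q3:", quartiles)          # [54.0, 57.0, 58.0]
print("D5:", deciles[4])                 # fifth decile
print("P50:", percentiles[49])           # fiftieth percentile
print("median:", statistics.median(ages))
```

All of Q2, D5 and P50 coincide with the median, as the definitions above require.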
Measures of variation
Variation or Dispersion:
The degree to which numerical data tend to spread about an average value is called the dispersion, or variation, of the data.
Various measures of dispersion (or variation) are available; the most common are:
1. Range
2. Interquartile Distance (IQD)
3. Variance
4. Standard deviation
Range:
The range of a set of numbers is the difference between the largest and smallest numbers in the set.
• Example:
The range of the set 2, 3, 3, 5, 5, 5, 8, 10, 12 is 12 - 2 = 10.
Sometimes the range is given simply by quoting the smallest and largest numbers; in the above set, for instance, the range could be indicated as 2 to 12, or 2–12.
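The range example translates directly into code: subtract the smallest value from the largest. The sketch uses the same nine numbers as the example above.

```python
data = [2, 3, 3, 5, 5, 5, 8, 10, 12]

# Range: largest value minus smallest value.
value_range = max(data) - min(data)

print(f"range = {value_range} (from {min(data)} to {max(data)})")
```

Because only the two extreme values enter the calculation, a single outlier changes the range completely, which motivates the interquartile distance.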
Interquartile Distance (IQD):
• The interquartile distance (IQD), also called the interquartile range (IQR), is the distance between the upper quartile (Q3) and the lower quartile (Q1). It spans the middle 50% of the data.
• The IQD measures how spread out the central half of the data is around the median.
• The formula for the interquartile distance is:
Interquartile Distance (IQD) = Q3 − Q1
• Because it ignores the lowest and highest quarters of the data, the IQD is not affected by extreme values, which makes it a useful measure for identifying outliers.
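A minimal sketch of the IQD and its use for outlier detection, reusing the retirement-age data and `statistics.quantiles` (Python 3.8+, default method). The 1.5 × IQD fences are Tukey's widely used convention, an assumption added here rather than something stated above, and `flag_outliers` is a hypothetical helper name.

```python
import statistics

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

q1, q2, q3 = statistics.quantiles(ages, n=4)
iqd = q3 - q1

# Tukey's convention (assumed): values beyond 1.5 * IQD from the quartiles are outliers.
lower_fence = q1 - 1.5 * iqd
upper_fence = q3 + 1.5 * iqd

def flag_outliers(values):
    """Return the values lying outside the fences computed from `ages`."""
    return [v for v in values if v < lower_fence or v > upper_fence]

print("IQD =", iqd)
print("fences:", lower_fence, upper_fence)
print("outliers among new values:", flag_outliers([50, 62, 95]))
```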
Variance:
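• Variance measures the variability of the data around the mean value: it is based on the squared deviations of the values from the mean (for a whole population, it is their average).
• Because it compares every value to the mean, variance differs from the other measures of variation (range and IQD), which depend on only a few values.
• Variance therefore describes the spread of the whole data set.
• Variance can be used to compare data sets with one another to see how their spreads relate.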
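The sketch below computes variance from the squared deviations for a small hypothetical data set and checks it against the standard library. It shows both common conventions: population variance divides by N, while sample variance divides by n - 1 (Bessel's correction); which one a given course or tool uses should be checked rather than assumed.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical example values

mean = sum(data) / len(data)
squared_deviations = [(x - mean) ** 2 for x in data]

# Population variance: sum of squared deviations divided by N.
population_variance = sum(squared_deviations) / len(data)
# Sample variance: divide by n - 1 when the data are a sample.
sample_variance = sum(squared_deviations) / (len(data) - 1)

print(population_variance, statistics.pvariance(data))
print(round(sample_variance, 3), round(statistics.variance(data), 3))
```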
Standard Deviation:
• Standard deviation is the square root of the variance; taking the square root brings the measure back to the original units of the data.
• A low standard deviation indicates that data points are close to the mean; a high standard deviation indicates that they are spread out.
Standard deviation measures the extent to which the values differ from the average.
Standard deviation, the most widely used measure of dispersion, is based on all values.
The procedure to calculate the standard deviation is given below:
Step 1: Compute the mean of the given data set.
Step 2: Subtract the mean from each observation and square each difference.
Step 3: Find the mean of these squared deviations.
Step 4: Finally, take the square root of the mean obtained in Step 3 to get the standard deviation.
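The four steps map one-to-one onto a short function. Because Step 3 averages over all N values, the steps give the population standard deviation, which the sketch compares with `statistics.pstdev`; `statistics.stdev` is shown for the sample version that divides by n - 1 instead. The data values are hypothetical, and `population_std_dev` is an illustrative name.

```python
import math
import statistics

def population_std_dev(values):
    """Standard deviation computed with the four steps listed above."""
    n = len(values)
    mean = sum(values) / n                                   # Step 1
    squared_deviations = [(x - mean) ** 2 for x in values]   # Step 2
    mean_squared_deviation = sum(squared_deviations) / n     # Step 3
    return math.sqrt(mean_squared_deviation)                 # Step 4

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical example values

print(population_std_dev(data))   # 2.0
print(statistics.pstdev(data))    # library equivalent of the four steps
print(statistics.stdev(data))     # sample version (divides by n - 1)
```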
Measures of Shape
Skewness:
Skewness is a statistical measure that assesses the asymmetry of a probability distribution. It quantifies the extent to which the data are skewed, or shifted, to one side.
Positive skewness indicates a longer tail on the right side of the distribution, while negative skewness indicates a longer tail on the left side.
Skewness helps in understanding the shape of a dataset and the presence of outliers.
If the values of a specific independent variable (feature) are skewed, then, depending on the model, the skewness may violate model assumptions or reduce the interpretability of feature importance.
A symmetrical distribution has zero skewness, as the mean and median coincide at the centre (and, for a unimodal distribution, so does the mode).
Types of Skewness
• Positively Skewed or Right-Skewed (Positive Skewness): In statistics, a positively skewed or right-skewed distribution has a long right tail.
• Negatively Skewed or Left-Skewed (Negative Skewness): A negatively skewed or left-skewed distribution has a long left tail.
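One common way to quantify skewness is the moment coefficient g1 = m3 / m2^1.5, where m2 and m3 are the second and third central moments. The sketch below implements that definition without any small-sample correction (library implementations may apply one), and applies it to hypothetical symmetric, right-skewed and left-skewed lists to show the sign convention described above.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]            # hypothetical: long right tail
left_skewed = [-x for x in right_skewed]   # mirror image: long left tail

print(skewness(symmetric), skewness(right_skewed), skewness(left_skewed))
```

Cubing the deviations keeps their sign, so a long tail on one side dominates the third moment and sets the sign of the coefficient.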
Kurtosis:
Kurtosis measures how heavy the tails of a distribution are. The excess kurtosis (kurtosis minus 3, the kurtosis of the normal distribution) is used in statistics and probability theory to compare a distribution's kurtosis with that of the normal distribution.
Excess kurtosis can be positive (leptokurtic distribution), negative (platykurtic distribution), or near zero (mesokurtic distribution):
• Leptokurtic, or heavy-tailed, distribution: kurtosis greater than that of the normal distribution.
• Mesokurtic distribution: kurtosis the same as that of the normal distribution.
• Platykurtic, or short-tailed, distribution: kurtosis less than that of the normal distribution.
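A minimal sketch of excess kurtosis using the moment definition g2 = m4 / m2^2 - 3, again without a small-sample correction. The `classify` helper and its ±0.5 tolerance for "near zero" are hypothetical choices made only for illustration; there is no single standard cutoff. The two data lists are hypothetical: one evenly spread, one with most values at the centre and two extreme values.

```python
def excess_kurtosis(values):
    """Excess kurtosis g2 = m4 / m2**2 - 3 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    if m2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return m4 / m2 ** 2 - 3

def classify(g2, tol=0.5):
    """Hypothetical labelling rule; the tolerance is an arbitrary illustration."""
    if g2 > tol:
        return "leptokurtic"
    if g2 < -tol:
        return "platykurtic"
    return "mesokurtic"

flat = [1, 2, 3, 4, 5]                 # evenly spread, short tails
heavy = [0] * 8 + [-10, 10]            # peaked centre with extreme tails

for name, data in [("flat", flat), ("heavy", heavy)]:
    g2 = excess_kurtosis(data)
    print(f"{name}: excess kurtosis = {g2:.2f} ({classify(g2)})")
```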
Measures of Shape
Kurtosis:

More Related Content

PDF
2. Numerical Descriptive Measures[1].pdf
PPTX
Types of Data, Key Concept
PPTX
Data sources and data Types of stat.pptx
PPT
Statistics for business and economics Measurement scales.ppt
PPT
Measurement scales on the assessment and intterpreatations of test scores in ...
PDF
Mba724 s2 w2 spss intro & daya types
PDF
Mba724 s2 w2 spss intro & daya types
PPTX
Scale presentations in research methodology
2. Numerical Descriptive Measures[1].pdf
Types of Data, Key Concept
Data sources and data Types of stat.pptx
Statistics for business and economics Measurement scales.ppt
Measurement scales on the assessment and intterpreatations of test scores in ...
Mba724 s2 w2 spss intro & daya types
Mba724 s2 w2 spss intro & daya types
Scale presentations in research methodology

Similar to MEASURES OF DATA: SCALE, TENDENCY, VARIATION SHAPE (20)

PPTX
Statistics 000000000000000000000000.pptx
PPTX
Different types of data in medicine.pptx
PPTX
Statistics (All About Data)
PPTX
Basic statistics
PPT
Data analysis
PPTX
Data Analysis.pptx
PPTX
business Statistics project assingment
PPTX
Four data types Data Scientist should know
PPTX
Unit #1.Introduction to Biostatistics.pptx
PPTX
Introduction to statistics
PPTX
Unit 4 RM.pptx for research methodology for my project work on disaster manag...
PPTX
Data presentation. Faculty will demonstrate use of MS excel in preparing var...
PPT
Final Lecture - 1.ppt
PDF
PDF
Day1, session i- spss
PPTX
Data measurement techniques
PPTX
edu.Chapter-3.1-Statistic-Refresher-1.pptx
PPTX
Introduction to Stats, basic of statistics, z-score, (1).pptx
PPTX
543957106-Introduction-Basic-Concepts-in-Statistics-PPT - Copy.pptx
PPTX
Statistics and Business Research Methods
Statistics 000000000000000000000000.pptx
Different types of data in medicine.pptx
Statistics (All About Data)
Basic statistics
Data analysis
Data Analysis.pptx
business Statistics project assingment
Four data types Data Scientist should know
Unit #1.Introduction to Biostatistics.pptx
Introduction to statistics
Unit 4 RM.pptx for research methodology for my project work on disaster manag...
Data presentation. Faculty will demonstrate use of MS excel in preparing var...
Final Lecture - 1.ppt
Day1, session i- spss
Data measurement techniques
edu.Chapter-3.1-Statistic-Refresher-1.pptx
Introduction to Stats, basic of statistics, z-score, (1).pptx
543957106-Introduction-Basic-Concepts-in-Statistics-PPT - Copy.pptx
Statistics and Business Research Methods
Ad

More from ShivarkarSandip (20)

PDF
STATISTICS AND PROBABILITY FOR DATA SCIENCE,
PDF
Introduction to Data Science: data science process
PDF
Prerquisite for Data Sciecne, KDD, Attribute Type
PDF
NBaysian classifier, Naive Bayes classifier
PDF
Supervised Learning Ensemble Techniques Machine Learning
PDF
Microcontroller 8051- Architecture Memory Organization
PDF
Data Preprocessing -Data Quality Noisy Data
PDF
Supervised Learning Decision Trees Review of Entropy
PDF
Supervised Learning Decision Trees Machine Learning
PDF
Cluster Analysis: Measuring Similarity & Dissimilarity
PDF
Classification, Attribute Selection, Classifiers- Decision Tree, ID3,C4.5,Nav...
PDF
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
PDF
Data Warehouse and Architecture, OLAP Operation
PDF
Data Preparation and Preprocessing , Data Cleaning
PDF
Introduction to Data Mining, KDD Process, OLTP and OLAP
PDF
Introduction to Data Mining KDD Process OLAP
PDF
Issues in data mining Patterns Online Analytical Processing
PDF
Introduction to data mining which covers the basics
PDF
Introduction to Data Communication.pdf
PDF
Classification of Signal.pdf
STATISTICS AND PROBABILITY FOR DATA SCIENCE,
Introduction to Data Science: data science process
Prerquisite for Data Sciecne, KDD, Attribute Type
NBaysian classifier, Naive Bayes classifier
Supervised Learning Ensemble Techniques Machine Learning
Microcontroller 8051- Architecture Memory Organization
Data Preprocessing -Data Quality Noisy Data
Supervised Learning Decision Trees Review of Entropy
Supervised Learning Decision Trees Machine Learning
Cluster Analysis: Measuring Similarity & Dissimilarity
Classification, Attribute Selection, Classifiers- Decision Tree, ID3,C4.5,Nav...
Frequent Pattern Analysis, Apriori and FP Growth Algorithm
Data Warehouse and Architecture, OLAP Operation
Data Preparation and Preprocessing , Data Cleaning
Introduction to Data Mining, KDD Process, OLTP and OLAP
Introduction to Data Mining KDD Process OLAP
Issues in data mining Patterns Online Analytical Processing
Introduction to data mining which covers the basics
Introduction to Data Communication.pdf
Classification of Signal.pdf
Ad

Recently uploaded (20)

PDF
Well-logging-methods_new................
PDF
composite construction of structures.pdf
PDF
Digital Logic Computer Design lecture notes
PDF
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
PPTX
Construction Project Organization Group 2.pptx
DOCX
573137875-Attendance-Management-System-original
PDF
Embodied AI: Ushering in the Next Era of Intelligent Systems
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PDF
R24 SURVEYING LAB MANUAL for civil enggi
PPTX
OOP with Java - Java Introduction (Basics)
PPTX
additive manufacturing of ss316l using mig welding
PPTX
bas. eng. economics group 4 presentation 1.pptx
PPTX
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
PPTX
Geodesy 1.pptx...............................................
PDF
Model Code of Practice - Construction Work - 21102022 .pdf
PPT
Mechanical Engineering MATERIALS Selection
PDF
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PPT
Project quality management in manufacturing
PPTX
Foundation to blockchain - A guide to Blockchain Tech
Well-logging-methods_new................
composite construction of structures.pdf
Digital Logic Computer Design lecture notes
The CXO Playbook 2025 – Future-Ready Strategies for C-Suite Leaders Cerebrai...
Construction Project Organization Group 2.pptx
573137875-Attendance-Management-System-original
Embodied AI: Ushering in the Next Era of Intelligent Systems
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
R24 SURVEYING LAB MANUAL for civil enggi
OOP with Java - Java Introduction (Basics)
additive manufacturing of ss316l using mig welding
bas. eng. economics group 4 presentation 1.pptx
CARTOGRAPHY AND GEOINFORMATION VISUALIZATION chapter1 NPTE (2).pptx
Geodesy 1.pptx...............................................
Model Code of Practice - Construction Work - 21102022 .pdf
Mechanical Engineering MATERIALS Selection
TFEC-4-2020-Design-Guide-for-Timber-Roof-Trusses.pdf
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
Project quality management in manufacturing
Foundation to blockchain - A guide to Blockchain Tech

MEASURES OF DATA: SCALE, TENDENCY, VARIATION SHAPE

  • 1. Sanjivani Rural Education Society’s Sanjivani College of Engineering, Kopargaon-423 603 (An Autonomous Institute, Affiliated to Savitribai Phule Pune University, Pune) NACC ‘A’ Grade Accredited, ISO 9001:2015 Certified Department of Computer Engineering (NBA Accredited) Prof. S.A.Shivarkar Assistant Professor Contact No.8275032712 Email- shivarkarsandipcomp@sanjivani.org.in Subject- Foundation of Data Science (PECO311B) Unit –III: Measures of Data: Scale, Tendency, Variation Shape
  • 2. DEPARTMENT OF COMPUTER ENGINEERING, Sanjivani COE, Kopargaon 2 Measures of Data: Scale, Tendency, Variation Shape  Data measurements scale: nominal scale, ordinal scale, interval scale  Ratio scale. Measures of central tendency: mean, median, mode.  Percentile, decile, quartile. Measures of variation: range, inter- quartile  Distance, variance and standard deviation  Measures of shape: skewness and kurtosis
  • 3. “ Data have an important story to tell. They rely on you to give them a voice ” Before you give them a voice, you have to understand the different data types.  There are different ways to categorize data based on the way it has been collected or its structure. Data Types and Data Measurement Scale
  • 4. There are different ways to categorize data based on the way it has been collected or its structure, Based on Structure.Another important way to classify data is based on their structure. It can be categorized into two types. 1) Structured Data: All the data points which have a specific structure and can be arranged in tabular form (also known as a matrix) with rows and columns are called structured data. Ex: Salary of employees arranged with employee id. Data Types and Data Measurement Scale
  • 5. 2) Unstructured Data: All the data points which are not arranged into any tabular format are unstructured data. Ex: Emails, videos, clickstream data, etc. Data Types and Data Measurement Scale
  • 6.  Based on Data Collection: Data can be categorized into three types based on how data has been collected. 1) Cross-Sectional Data 2) Time-Series Data 3) Panel Data  Cross-Sectional Data: Any data points/values captured on multiple variables over one specific time period is termed as cross- sectional data. Ex: attributes of the employee such as age, salary, level, team for the year 2019. Data Types and Data Measurement Scale
  • 7. Time-Series Data: Any data points/values captured on a single variable over multiple periods is called time-series data. Ex: sales of smartphones on a monthly, quarterly, yearly basis. Panel Data: A combination of both the cross-sectional and time-series data is known as Panel data. Ex: GDP of the various country over different periods Data Types and Data Measurement Scale
  • 8. Data Types and Data Measurement Scale Data Measurement Scale: Scales of measurement in research and statistics are the different ways in which variables are defined and grouped into different categories,  It describes the nature of the values assigned to the variables in a data set, Measurement is the process of recording observations collected as part of the research.  Scaling is the assignment of objects to numbers or semantics.
  • 9. Data Types and Data Measurement Scale Data Measurement Scale:  A measurement scale is used to qualify or quantify data variables,  The properties evaluated are identity, magnitude, equal intervals and a minimum value of zero  Identity: Identity refers to each value having a unique meaning.  Magnitude: Magnitude means that the values have an ordered relationship to one another, so there is a specific order to the variables.
  • 10. Data Types and Data Measurement Scale Data Measurement Scale:  Equal intervals: Equal intervals mean that data points along the scale are equal, so the difference between data points one and two will be the same as the difference between data points five and six.  A minimum value of zero:A minimum value of zero means the scale has a true zero point. Degrees, for example, can fall below zero and still have meaning. But if you weigh nothing, you don’t exist.
  • 11. Data Types and Data Measurement Scale Data Measurement Scale: Data can be divided into four parts based on a measurement scale- D E P A R M E N T O F I N F O R M A T I O N T E C H N O L O G Y , S C O E , K O P A R G A
  • 12. Data Types and Data Measurement Scale Data Measurement Scale: 1) Nominal Scale:  The nominal scale is a scale of measurement that is used for identification purposes. It is also known as categorical scale, it assigns numbers to attributes for easy identity. These numbers are however not qualitative in nature and only act as labels.
  • 13. Data Types and Data Measurement Scale Data Measurement Scale: 1) Nominal Scale: The only statistical analysis that can be performed on a nominal scale is the percentage or frequency count.  It can be analyzed graphically using a bar chart and pie chart.  Basic mathematical operations are meaningless on Nominal scale (e.g. subtraction: married -unmarried or ratio: married/unmarried)
  • 14. Data Types and Data Measurement Scale Data Measurement Scale: 1) Nominal Scale: In the example below, the measurement of the popularity of a political party is measured on a nominal scale. Which political party are you affiliated with? Independent Republican Democrat Labeling Independent as “1”, Republican as “2” and Democrat as “3” does not in any way mean any of the attributes are better than the other. They are just used as an identity for easy data analysis.
  • 15. Data Types and Data Measurement Scale 2) Ordinal Scale: Ordinal Scale involves the ranking or ordering of the attributes depending on the variable being scaled. The items in this scale are classified according to the degree of occurrence of the variable in question. The attributes on an ordinal scale are usually arranged in ascending or descending order. It measures the degree of occurrence of the variable.
  • 16. Data Types and Data Measurement Scale 2) Ordinal Scale: Ordinal scale can be used in market research, advertising, and customer satisfaction surveys.  It uses qualifiers like very, highly, more, less, etc. to depict a degree. We can perform statistical analysis like median and mode using the ordinal scale, but not mean.
  • 17. Data Types and Data Measurement Scale 2) Ordinal Scale:  For example: A software company may need to ask its users:  How would you rate our app? Excellent Very Good Good Bad Poor  The attributes in this example are listed in descending order.
  • 18. Data Types and Data Measurement Scale Data Measurement Scale: 3) Interval Scale: The interval scale of data measurement is a scale in which the levels are ordered and each numerically equal distances on the scale have equal interval difference. If it is an extension of the ordinal scale, with the main difference being the existence of equal intervals.
  • 19. Data Types and Data Measurement Scale Data Measurement Scale: 3) Interval Scale: With an interval scale, you not only know that a given attributeA is bigger than another attribute B, but also the extent at which A is larger than B. Also, unlike ordinal and nominal scale, arithmetic operations can be performed on an interval scale. It is used in various sectors like in education, medicine, engineering, etc. Some of these uses include calculating a student’s CGPA, measuring a patient’s temperature, etc.
  • 20. Data Measurement Scale 3) Interval Scale:  Example: A common example is measuring temperature on the Fahrenheit scale. It can be used in calculating mean, median, mode, range, and standard deviation.  Example : Temperature (in centigrade), IQ level. In such variables, addition or subtraction can be performed but division doesn’t make sense. As you can say Mumbai has 10 centigrade more than Bangalore, but you saying that Mumbai is twice hotter than Bangalore is not right, thus ratios don’t make sense here.
  • 21. Data Types and Data Measurement Scale Data Measurement Scale: 4) Ratio Scale: Ratio Scale is the peak level of data measurement. It is an extension of the interval scale, therefore satisfying the four characteristics of the measurement scale; identity, magnitude, equal interval, and the absolute zero property.  This level of data measurement allows the researcher to compare both the differences and the relative magnitude of numbers. Some examples of ratio scales include length, weight, time, etc. All the data points which are quantitative in nature falls in this category
  • 22. Data Types and Data Measurement Scale Data Measurement Scale: 4) Ratio Scale: With respect to market research, the common ratio scale examples are price, number of customers, competitors, etc. It is extensively used in marketing, advertising, and business sales. The ratio scale of data measurement is compatible with all statistical analysis methods like the measures of central tendency (mean, median, mode, etc.) and measures of dispersion (range, standard deviation, etc.).
  • 23. Data Types and Data Measurement Scale Data Measurement Scale: 4) Ratio Scale:  For example: A survey that collects the weights of the respondents. Which of the following category do you fall in? Weight more than 100 kgs 81 – 100 kgs 61 – 80 kgs 40 – 60 kgs Less than 40 kgs
  • 24. Measures of central tendency Measures of central tendency:  A measure of central tendency (also referred to as measures of centre or central location) is a summary measure that attempts to describe a whole set of data with a single value that represents the middle or centre of its distribution. Therefore, a measure of central tendency is a way to summarize a large set of numbers using one single score,  There are three main measures of central tendency: 1. Mean 2.Median 3. Mode
  • 25. Measures of central tendency Measures of central tendency:  Mean: The mean is the sum of the value of each observation in a dataset divided by the number of observations. This is also known as the arithmetic average.  Advantage of the mean: The mean can be used for both continuous and discrete numeric data.
  • 26. Measures of central tendency Measures of central tendency:  Limitations of the mean:  The mean cannot be calculated for categorical data, as the values cannot be summed,  As the mean includes every value in the distribution the mean is influenced by outliers,  Example: 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60 The mean is (623/11)= 56.6
  • 27. Measures of central tendency Measures of central tendency:  Median: The median is the middle value in distribution when the values are arranged in ascending or descending order.
  • 28. Measures of central tendency Measures of central tendency:  Advantage of the median: The median is less affected by outliers and skewed data than the mean and is usually the preferred measure of central tendency when the distribution is not symmetrical.  Limitation of the median: The median cannot be identified for categorical nominal data, as it cannot be logically ordered.
  • 30. Measures of central tendency Measures of central tendency:  Mode: The mode is the most commonly occurring value in a distribution.  Consider this dataset showing the retirement age of 11 people, in whole years: 54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60 This list shows a simple frequency distribution of the retirement age data. 54 – 3, 55 – 1, 56 – 1, 57 – 2, 58 – 2, 60 – 2 The most commonly occurring value is 54, therefore the mode of this distribution is 54 years.
  • 31. Measures of central tendency Measures of central tendency:  Advantage of the mode: The mode has an advantage over the median and the mean as it can be found for both numerical and categorical (non-numerical) data.  Limitations of Mode: It is also possible for there to be more than one mode for the same distribution of data, (bi-modal, or multi-modal). The presence of more than one mode can limit the ability of the mode in describing the centre or typical value of the distribution because a single value to describe the centre cannot be identified.
  • 32. Measures of central tendency Measures of central tendency:  Limitations of Mode:  In some cases, particularly where the data are continuous, the distribution may have no mode at all (i.e. if all values are different).  In cases such as these, it may be better to consider using the median or mean or group the data into appropriate intervals and find the modal class.
  • 33. Measures of central tendency Percentile, decile, quartile: From the definition of median that it’s the middle point in the axis frequency distribution curve, and it is divided the area under the curve for two areas have the same area in the left, and in the right. From this may be divided the area under the curve for four equally area and this called quartiles,  In the same procedure divided the area for ten equally pieces of area is called deciles,  Finally where divided the area for hundred equally pieces of area is called percentiles,
  • 34. Measures of central tendency Percentile, decile, quartile:  Quartile Example:
  • 35. Measures of central tendency Percentile, decile, quartile:  Decile Example:
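Quartiles, deciles and percentiles are all cut points that split sorted data into equal-sized parts, so the standard-library `statistics.quantiles` can compute all three by changing `n`. It uses the "exclusive" method by default, which places the i-th cut point at position i(n + 1)/k in the sorted data for k parts; other textbooks and libraries use slightly different position rules, so hand-calculated values can differ a little for small data sets. The retirement-age data from the mode example is reused here.

```python
import statistics

ages = [54, 54, 54, 55, 56, 57, 57, 58, 58, 60, 60]

# n=4 -> 3 quartiles, n=10 -> 9 deciles, n=100 -> 99 percentiles.
quartiles = statistics.quantiles(ages, n=4)
deciles = statistics.quantiles(ages, n=10)
percentiles = statistics.quantiles(ages, n=100)

print(quartiles)                       # [54.0, 57.0, 58.0]
print(len(deciles), len(percentiles))  # 9 99

# Q2, D5 and P50 all coincide with the median.
print(quartiles[1] == deciles[4] == percentiles[49] == statistics.median(ages))  # True
```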
  • 38. Measures of variation Variation or Dispersion: The degree to which numerical data tend to spread about an average value is called the dispersion, or variation, of the data. Several measures of dispersion are available; the most common are 1. Range, 2. Interquartile Distance (IQD), 3. Variance, and 4. Standard deviation.
  • 39. Measures of variation Range: The range of a set of numbers is the difference between the largest and smallest numbers in the set.  Example: The range of the set 2, 3, 3, 5, 5, 5, 8, 10, 12 is 12 - 2 = 10. Sometimes the range is given by simply quoting the smallest and largest numbers; In the above set, for instance, the range could be indicated as 2 to 12, or 2–12.
  • 40. Measures of variation Interquartile Distance (IQD):  The midpoint of data distribution, or the middle of your four quartiles, is referred to as the interquartile range (IQR), which is in the middle of the lower and upper quartiles.  The IQD is a measurement of how evenly the data is distributed around the average.  The formula for Interquartile Range is given below: Interquartile Distance(IQD)= 𝑄3 − 𝑄1  The IQD is a useful measure for identifying outliers in data,
  • 42. Measures of variation Variance:  Variance is a measure of variability in the data from mean value,  It compares every piece of value to the mean, which is why variance differs from the other measures of variation.  Variance also displays the spread of the data set,  variance to compare pieces of data to one another to see how they relate,
  • 44. Measures of variation Standard Deviation:  Standard deviation is a squared root of the variance to get original values.  Low standard deviation indicates data points close to mean.  Standard deviation uses the square root of the variance to get original values. Standard deviation calculates the extent to which the values differ from the average. Standard Deviation, the most widely used measure of dispersion, is based on all values
  • 47. Measures of variation Standard Deviation: The procedure to calculate the standard deviation is given below: Step 1: Compute the mean for the given data set. Step 2: Subtract the mean from each observation and calculate the square in each instance. Step 3: Find the mean of those squared deviations. Step 4: Finally, take the square root obtained mean to get the standard deviation.
  • 50. Measures of Shape Skewness: Skewness is a statistical measure that assesses the asymmetry of a probability distribution. It quantifies the extent to which the data is skewed or shifted to one side. Positive skewness indicates a longer tail on the right side of the distribution, while negative skewness indicates a longer tail on the left side. Skewness helps in understanding the shape and outliers in a dataset If the values of a specific independent variable (feature) are skewed, depending on the model, skewness may violate model assumptions or may reduce the interpretation of feature importance.
  • 51. Measures of Shape Skewness: The symmetrical distribution has zero skewness as all measures of a central tendency lies in the middle.
  • 52. Measures of Shape Types of Skewness Positive Skewed or Right-Skewed (Positive Skewness) In statistics, a positively skewed or right-skewed distribution has a long right tail
  • 53. Measures of Shape Types of Skewness Negative Skewed or Left-Skewed (Negative Skewness) A negatively skewed or left-skewed distribution has a long left tail;
  • 54. Measures of Shape Kurtosis: The excess kurtosis is used in statistics and probability theory to compare the kurtosis coefficient with that normal distribution. Excess kurtosis can be positive (Leptokurtic distribution), negative (Platykurtic distribution), or near zero (Mesokurtic distribution), Leptokurtic or heavy-tailed distribution (kurtosis more than normal distribution).  Mesokurtic (kurtosis same as the normal distribution).  Platykurtic or short-tailed distribution (kurtosis less than normal distribution)