Discrete Data Mapping: Problem of HR-Analytics
Debdulal Dutta Roy, Ph.D. (Psy.)
Psychology Research Unit
INDIAN STATISTICAL INSTITUTE, KOLKATA
Workshop: QIP-STC (AICTE) on HR Analytics, Hands-on Training.
VGSOM, IIT Kharagpur
11.5.2015
HR analytics and Discrete data
• HR analytics broadly covers two approaches: association and
prediction. Discrete data mapping follows the former. It is a
multivariate statistical model for exploring associations among
data points. Associated discrete data form a neighbourhood. The
map shows the distances among neighbourhoods, e.g., neighbourhoods
of human resource activities (recruitment, training, placement,
promotion, incentives, etc.) and those of employee performance
(attrition, engagement, etc.). The model is useful for big data
(e.g., data from multiple companies). In this model,
multidimensional data are plotted on a two-dimensional plot. This
technique helps organizations identify relationships and trends
and predict future behaviours or events.
Truth is what you can measure
• Truth = Response - Error
• Any response is affected by fixed or random errors.
• Errors can be controlled through sampling, control of the
environment, instruments, and statistical methods.
• Any response can be measured as discrete or continuous
data.
• Discrete data cannot be divided into fractions, whereas
continuous data can.
• Discrete data can be summarized as frequencies or
percentages.
• Either type of data can be converted to the other by
transformation.
• Transformation loses important properties of the original
data.
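The last two points can be illustrated with a minimal sketch. The salary values and the cut points below are hypothetical, chosen only to show what recoding a continuous measure into discrete bands keeps and what it discards.

```python
# Hypothetical monthly salaries (continuous), in thousands; illustrative values only.
salaries = [21.5, 24.9, 25.1, 39.8, 40.2, 75.0]

def to_band(value, low_cut=25.0, high_cut=40.0):
    """Recode a continuous value into one of three discrete bands.

    The cut points are assumptions made for this example.
    """
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "medium"
    return "high"

bands = [to_band(s) for s in salaries]

# The discrete version is summarized by frequency and percentage.
counts = {band: bands.count(band) for band in ("low", "medium", "high")}
percentages = {band: 100.0 * n / len(bands) for band, n in counts.items()}

# What is lost: 21.5 and 24.9 become indistinguishable ("low"), while
# 24.9 and 25.1, only 0.2 apart, end up in different bands.
print(bands)
print(counts)
print(percentages)
```

The band labels can be counted and cross-tabulated, but the original distances between salaries cannot be recovered from them.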
D. Dutta Roy, ISI., Kolkata
Discrete VS Continuous
• Discrete data can be numeric, like the number of apples,
but they can also be categorical, like red or blue,
male or female, or good or bad.
• Continuous data are not restricted to separate,
predefined values; they can take any value
over a continuous range.
HR Analytics
• HR analytics data include head counts (numbers of
people) for recruitment, training, placement,
promotion, incentives, etc., and for performance
outcomes such as attrition and engagement.
• Analytics can produce one-way, two-way, or multi-way
tables.
• A stem-and-leaf plot can be used to map discrete
data.
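A minimal sketch of a one-way table and a stem-and-leaf display, assuming hypothetical data: the activity labels and training hours below are invented for illustration and do not come from the workshop.

```python
from collections import Counter, defaultdict

# Hypothetical HR activity recorded for each of 10 employees.
activities = ["training", "promotion", "training", "recruitment", "training",
              "recruitment", "promotion", "training", "recruitment", "training"]

# One-way table: head count per category.
one_way = Counter(activities)

# Hypothetical training hours logged by 12 employees.
hours = [12, 15, 21, 24, 24, 27, 30, 33, 35, 41, 46, 48]

def stem_and_leaf(values):
    """Group two-digit integers by tens digit (stem); units digits are the leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

plot = stem_and_leaf(hours)

for category, heads in one_way.most_common():
    print(f"{category:12} {heads}")
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Each stem row keeps the individual values visible while still showing the shape of the distribution, which a plain frequency count does not.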
Stem-Leaf Plot of One-way table of Discrete data
Two-Way table or Crosstabulation
• Crosstabulation is a combination of two (or more) frequency tables
arranged such that each cell in the resulting table represents a
unique combination of specific values of crosstabulated variables.
• Thus, crosstabulation allows us to examine frequencies of
observations that belong to specific categories on more than one
variable.
• By examining these frequencies, we can identify relations between
crosstabulated variables. Only categorical (nominal) variables or
variables with a relatively small number of different meaningful
values should be crosstabulated.
• Note that in the cases where we do want to include a continuous
variable in a crosstabulation (e.g., income), we can first recode it
into a particular number of distinct ranges (e.g., low, medium,
high).
• Crosstabulation can be computed with a PivotTable in MS Excel.
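The same idea can be sketched outside a spreadsheet. The records, department names and income cut points below are hypothetical; the sketch recodes a continuous variable (income) into three ranges and then counts each combination of categories.

```python
from collections import Counter

# Hypothetical records: (department, monthly income in thousands).
records = [
    ("Sales", 18), ("Sales", 32), ("Sales", 55), ("Sales", 21),
    ("R&D", 47), ("R&D", 62), ("R&D", 35), ("R&D", 58),
]

def income_band(income, low_cut=30, high_cut=50):
    """Recode continuous income into low / medium / high (cut points are assumptions)."""
    if income < low_cut:
        return "low"
    if income < high_cut:
        return "medium"
    return "high"

# Each cell counts one unique (department, income band) combination.
cells = Counter((dept, income_band(inc)) for dept, inc in records)

rows = ["Sales", "R&D"]
cols = ["low", "medium", "high"]
table = [[cells[(r, c)] for c in cols] for r in rows]

# Print the two-way table with a header row.
print(f"{'':8}" + "".join(f"{c:>8}" for c in cols))
for r, row_counts in zip(rows, table):
    print(f"{r:8}" + "".join(f"{n:>8}" for n in row_counts))
```

A `Counter` returns 0 for combinations that never occur, so empty cells appear in the table without special handling.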
Histogram of Two-way table
Test of Significance
• The Pearson Chi-square is the most common
test for significance of the relationship
between categorical variables.
• Phi coefficient: a measure of correlation
between two categorical variables in a 2 x 2
table. Computed as the square root of Chi-square
divided by N, it ranges from 0 (no relation
between factors; Chi-square = 0.0) to 1 (perfect
relation between the two factors in the table).
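A minimal sketch of both statistics, assuming a hypothetical 2 x 2 table of training by retention (the counts are invented). The function expects every row and column total to be non-zero.

```python
import math

def chi_square(table):
    """Pearson Chi-square statistic for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = trained / not trained, columns = stayed / left.
observed = [[20, 10],
            [10, 20]]

n = sum(sum(row) for row in observed)
chi2 = chi_square(observed)

# Phi for a 2 x 2 table: square root of Chi-square over the sample size.
phi = math.sqrt(chi2 / n)

# With 1 degree of freedom, the 5% critical value of Chi-square is 3.841.
print(f"Chi-square = {chi2:.3f}, significant at 5%: {chi2 > 3.841}")
print(f"Phi = {phi:.3f}")
```

Chi-square grows with the sample size, whereas Phi divides that growth out, so Phi is the value to compare across tables of different N.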
Coefficient of Contingency
• The coefficient of contingency is a Chi-square-based
measure of the relation between two
categorical variables (proposed by Pearson,
the originator of the Chi-square test). Its
advantage over the ordinary Chi-square is that
it is more easily interpreted, since its range is
always limited to 0 through 1 (where 0 means
complete independence). Its attainable maximum
depends on the table size and stays below 1
for any finite table.
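A short sketch of the coefficient, C = sqrt(Chi-square / (Chi-square + N)), on two hypothetical 2 x 2 tables: one with independent rows and columns and one with a perfect relation. The counts are invented for illustration.

```python
import math

def contingency_coefficient(table):
    """Pearson's contingency coefficient C for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (chi2 + n))

# Rows and columns are independent: every cell equals its expected count.
independent = [[20, 20],
               [10, 10]]

# Perfect relation: each row category goes with exactly one column category.
perfect = [[30, 0],
           [0, 30]]

c_independent = contingency_coefficient(independent)
c_perfect = contingency_coefficient(perfect)

print(f"C (independent) = {c_independent:.4f}")
print(f"C (perfect 2 x 2) = {c_perfect:.4f}")
```

Even the perfect 2 x 2 table gives a value below 1, which is why C values should only be compared between tables of the same size.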
Correspondence Analysis
• The Crosstabs procedure offers several
measures of association and tests of
association but cannot graphically represent
any relationships between the variables.
• Correspondence analysis describes the
relationships between two nominal variables
in a correspondence table in a
low-dimensional space.
Frequency Table (N=902 respondents)
Reasons for work preference (rows) by score (columns)
                     0     1     2     3     4     5     6   Total
Achievement          6    31   115   236   265   201    48     902
Application          1    20    50   126   274   296   135     902
Knowledge            3    22    68   156   239   304   110     902
Aesthetic           29   146   249   270   155    43    10     902
Affiliation         29   219   320   202   109    23     0     902
Harm avoidance      85   417   239   100    45    13     3     902
Recognition         10   108   258   299   141    72    14     902
0: Least important; 1: Less important; 2: Important; 4: More important; 5: Most important
Frequency distribution provides
information about data grouping
Neighbourhood
• The frequency table has 7 column categories (scores 0 to 6)
and 7 row categories (reasons). Neighbourhoods can be
formed by clustering the rows, the columns, and the
row-column correspondences.
• So, partitioning the row and column
variables is important.
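One way to form row neighbourhoods can be sketched as follows. The counts are those of the frequency table of reasons for work preference in these slides; the use of the chi-square distance between row profiles and the nearest-neighbour rule are assumptions of this sketch, not necessarily the clustering used in the lecture.

```python
import math

labels = ["Achievement", "Application", "Knowledge", "Aesthetic",
          "Affiliation", "Harm avoidance", "Recognition"]
# Rows: reasons for work preference; columns: scores 0..6 (from the slides).
counts = [
    [6, 31, 115, 236, 265, 201, 48],
    [1, 20, 50, 126, 274, 296, 135],
    [3, 22, 68, 156, 239, 304, 110],
    [29, 146, 249, 270, 155, 43, 10],
    [29, 219, 320, 202, 109, 23, 0],
    [85, 417, 239, 100, 45, 13, 3],
    [10, 108, 258, 299, 141, 72, 14],
]

total = sum(map(sum, counts))
# Column masses: share of all responses falling in each score category.
col_mass = [sum(col) / total for col in zip(*counts)]
# Row profiles: each row divided by its own total.
profiles = [[x / sum(row) for x in row] for row in counts]

def chi_square_distance(p, q):
    """Distance between two profiles, weighting each column by 1 / column mass."""
    return math.sqrt(sum((a - b) ** 2 / c for a, b, c in zip(p, q, col_mass)))

# Nearest neighbour of every row: the row with the most similar profile.
nearest = {}
for i, name in enumerate(labels):
    others = [(chi_square_distance(profiles[i], profiles[k]), labels[k])
              for k in range(len(labels)) if k != i]
    nearest[name] = min(others)[1]

for name, neighbour in nearest.items():
    print(f"{name:15} -> {neighbour}")
```

On these counts, Application and Knowledge are each other's nearest neighbour, as are Aesthetic and Recognition, which suggests two candidate row neighbourhoods.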
Correspondence of row and column variables
Scoring categories (f = frequency; % = percentage of the column total)
                    0           1           2           3           4           5           6         Total
                    f      %    f      %    f      %    f      %    f      %    f      %    f      %
Achievement         6   3.68   31   3.22  115   8.85  236  16.99  265  21.58  201  21.11   48  15.00    902
Application         1   0.61   20   2.08   50   3.85  126   9.07  274  22.31  296  31.09  135  42.19    902
Knowledge           3   1.84   22   2.28   68   5.23  156  11.23  239  19.46  304  31.93  110  34.38    902
Aesthetic          29  17.79  146  15.16  249  19.17  270  19.44  155  12.62   43   4.52   10   3.13    902
Affiliation        29  17.79  219  22.74  320  24.63  202  14.59  109   8.88   23   2.42    0   0.00    902
Harm avoidance     85  52.15  417  43.30  239  18.40  100   7.20   45   3.66   13   1.37    3   0.94    902
Recognition        10   6.13  108  11.21  258  19.86  299  21.53  141  11.48   72   7.59   14   4.38    902
Total             163 100.00  963 100.00 1299 100.00 1389 100.00 1228 100.00  952 100.00  320 100.00   6314
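The percentages in this table are column percentages: each frequency divided by its column total. A short sketch that recomputes them from the counts in the slides (variable names are illustrative):

```python
labels = ["Achievement", "Application", "Knowledge", "Aesthetic",
          "Affiliation", "Harm avoidance", "Recognition"]
# Rows: reasons for work preference; columns: scores 0..6 (from the slides).
counts = [
    [6, 31, 115, 236, 265, 201, 48],
    [1, 20, 50, 126, 274, 296, 135],
    [3, 22, 68, 156, 239, 304, 110],
    [29, 146, 249, 270, 155, 43, 10],
    [29, 219, 320, 202, 109, 23, 0],
    [85, 417, 239, 100, 45, 13, 3],
    [10, 108, 258, 299, 141, 72, 14],
]

# Column totals: how many responses each score category received overall.
col_totals = [sum(col) for col in zip(*counts)]

# Column percentage: share of a score category contributed by each reason.
col_pct = [[100.0 * x / t for x, t in zip(row, col_totals)] for row in counts]

for name, row in zip(labels, col_pct):
    print(f"{name:15}" + "".join(f"{p:8.2f}" for p in row))
```

Reading down a column shows which reasons dominate a score category: Harm avoidance supplies more than half of all "0" ratings, for example.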
Neighbourhood Data Mapping
(N=902)
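A map of this kind can be approximated with a correspondence analysis of the N=902 frequency table. The sketch below is not the procedure used for the original figure: it is a minimal pure-Python version in which power iteration with deflation stands in for the singular value decomposition a statistics package would use, and it computes row coordinates only.

```python
import math

labels = ["Achievement", "Application", "Knowledge", "Aesthetic",
          "Affiliation", "Harm avoidance", "Recognition"]
# Rows: reasons for work preference; columns: scores 0..6 (from the slides).
counts = [
    [6, 31, 115, 236, 265, 201, 48],
    [1, 20, 50, 126, 274, 296, 135],
    [3, 22, 68, 156, 239, 304, 110],
    [29, 146, 249, 270, 155, 43, 10],
    [29, 219, 320, 202, 109, 23, 0],
    [85, 417, 239, 100, 45, 13, 3],
    [10, 108, 258, 299, 141, 72, 14],
]

total = sum(map(sum, counts))
row_mass = [sum(row) / total for row in counts]
col_mass = [sum(col) / total for col in zip(*counts)]

# Standardized residuals: S[i][j] = (p_ij - r_i * c_j) / sqrt(r_i * c_j).
S = [[(counts[i][j] / total - r * c) / math.sqrt(r * c)
      for j, c in enumerate(col_mass)]
     for i, r in enumerate(row_mass)]

# Total inertia equals the Pearson Chi-square divided by the grand total.
total_inertia = sum(s * s for row in S for s in row)

# A = S S^T is symmetric; its leading eigenvectors are the row axes.
n_rows = len(S)
A = [[sum(a * b for a, b in zip(S[i], S[k])) for k in range(n_rows)]
     for i in range(n_rows)]

def mat_vec(matrix, vec):
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]

def leading_eigenpair(matrix, iterations=2000):
    """Power iteration for the largest eigenvalue and its unit eigenvector."""
    vec = [float(i + 1) for i in range(len(matrix))]
    for _ in range(iterations):
        vec = mat_vec(matrix, vec)
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    value = sum(v * w for v, w in zip(vec, mat_vec(matrix, vec)))
    return value, vec

eigenvalues = []
row_coords = [[] for _ in range(n_rows)]
for _ in range(2):  # two axes are enough for a flat map
    lam, u = leading_eigenpair(A)
    eigenvalues.append(lam)
    for i in range(n_rows):
        # Principal coordinate of row i on this axis.
        row_coords[i].append(u[i] * math.sqrt(lam) / math.sqrt(row_mass[i]))
    # Deflate: remove the axis just found so the next pass finds the following one.
    A = [[A[i][k] - lam * u[i] * u[k] for k in range(n_rows)]
         for i in range(n_rows)]

for label, (x, y) in zip(labels, row_coords):
    print(f"{label:15} {x:8.3f} {y:8.3f}")
print(f"share of inertia on two axes: {sum(eigenvalues) / total_inertia:.3f}")
```

Plotting the two coordinates of each reason as a scatter plot gives the two-dimensional map; reasons that lie close together have similar score profiles. The signs of the axes are arbitrary, so only relative positions should be interpreted.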
Where Chi-square fails, this model works
(Job Analysis Data, N=200)
Thank You
