Discrete data mapping

Discrete Data Mapping: Problem of HR Analytics
Debdulal Dutta Roy, Ph.D. (Psy.)
Psychology Research Unit
INDIAN STATISTICAL INSTITUTE, KOLKATA
Workshop: QIP-STC (AICTE) on HR Analytics, Hands-on Training
VGSOM, IIT Kharagpur
11.5.2015

HR analytics and Discrete data
• HR analytics broadly covers two approaches: association and
predictive. Discrete data mapping follows the former. It is a
multivariate statistical model for exploring associations among
data points. Associated discrete data form a neighbourhood. The
map shows the distances among neighbourhoods, e.g., neighbourhoods
of human resource activities (recruitment, training, placement,
promotion, incentives etc.) and those of employee performance
(attrition, engagement etc.). The model is useful for big data
(data from multiple companies). In this model, multidimensional
data are plotted on a two-dimensional plot. This technique helps
organizations identify relationships and trends and predict future
behaviours or events.

Truth is that you can measure
• Truth=Response – Error
• Any response is affected by fixed or random errors.
• Errors can be controlled through sampling, control of the
environment, instruments, and statistics.
• Any response can be measured as discrete or continuous
data.
• Discrete data cannot be divided into fractions, but
continuous data can.
• Discrete data can be summarized by frequency or
percentage.
• Either type of data can be converted into the other by
transformation.
• Transformation loses important properties of the original
data.

Discrete vs. Continuous
• Discrete data can be numeric (like the number of apples)
or categorical (like red or blue, male or female, good or bad).
• Continuous data are not restricted to defined separate
values but can take any value over a continuous range.

HR Analytics
• HR analytics data include head counts (numbers of
people) for recruitment, training, placement,
promotion, incentives etc. and for performance
outcomes such as attrition, engagement etc.
• Analytics can prepare one-way, two-way or multi-way
tables.
• A stem-and-leaf plot can be used to map discrete
data.
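
As an illustration of the last bullet, here is a minimal sketch of a stem-and-leaf plot for a one-way table of head counts. The monthly recruitment figures and the function name `stem_and_leaf` are hypothetical and are not taken from the workshop data.

```python
from collections import defaultdict

def stem_and_leaf(values, stem_width=10):
    """Group non-negative integers by stem (tens) and leaf (units)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, stem_width)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical monthly recruitment head counts (assumed for illustration only).
recruits = [12, 15, 21, 23, 23, 30, 34, 8, 17, 26]
plot = stem_and_leaf(recruits)

# Print every stem between the smallest and largest, even if it has no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

Each row of the printout shows one stem with its leaves, so the length of a row gives the frequency of that class at a glance.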

Stem-Leaf Plot of One-way table of Discrete data

Two-Way table or Crosstabulation
• Cross tabulation is a combination of two (or more) frequency tables
arranged such that each cell in the resulting table represents a
unique combination of specific values of crosstabulated variables.
• Thus, crosstabulation allows us to examine frequencies of
observations that belong to specific categories on more than one
variable.
• By examining these frequencies, we can identify relations between
crosstabulated variables. Only categorical (nominal) variables or
variables with a relatively small number of different meaningful
values should be crosstabulated.
• Note that in the cases where we do want to include a continuous
variable in a crosstabulation (e.g., income), we can first recode it
into a particular number of distinct ranges (e.g., low, medium,
high).
• Cross tabulation can be computed with a PivotTable in MS Excel.
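
Outside Excel, the same two-way table can be sketched in a few lines of Python. The example below first recodes a continuous variable (income) into low, medium and high, as the bullets above suggest, and then counts each combination. The employee records, the cut-off values and the name `recode_income` are assumptions made for illustration only.

```python
from collections import Counter

def recode_income(income, low_cut=30000, high_cut=60000):
    """Recode a continuous income into three ranges (cut-offs are assumed)."""
    if income < low_cut:
        return "low"
    if income < high_cut:
        return "medium"
    return "high"

# Hypothetical (income, attrition status) records.
employees = [
    (25000, "left"), (28000, "left"), (45000, "stayed"), (52000, "stayed"),
    (75000, "stayed"), (31000, "left"), (90000, "stayed"), (22000, "stayed"),
]

# Each cell of the crosstab is the count of one (income range, status) pair.
cells = Counter((recode_income(income), status) for income, status in employees)

statuses = ["left", "stayed"]
print(f"{'income':<8}" + "".join(f"{s:>8}" for s in statuses))
for level in ["low", "medium", "high"]:
    print(f"{level:<8}" + "".join(f"{cells[(level, s)]:>8}" for s in statuses))
```

Recoding discards the exact income values, which is one instance of a transformation losing properties of the original data.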

Test of Significance
• The Pearson Chi-square is the most common
test for significance of the relationship
between categorical variables.
• Coefficient Phi: It is a measure of correlation
between two categorical variables in a 2 x 2
table. Its value can range from 0 (no relation
between factors; Chi-square=0.0) to 1 (perfect
relation between the two factors in the table).
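
A minimal sketch of both statistics for a 2 x 2 table follows. The counts (training attended or not, by left or stayed) are hypothetical. Phi is computed here as the square root of Chi-square divided by N, and 3.841 is the 5% critical value of Chi-square with 1 degree of freedom.

```python
import math

def chi_square(table):
    """Pearson Chi-square for a two-way table of counts; returns (chi2, N)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

# Hypothetical counts: rows = trained / not trained, columns = left / stayed.
table = [[10, 40],
         [30, 20]]

chi2, n = chi_square(table)
phi = math.sqrt(chi2 / n)        # Phi for a 2 x 2 table
significant = chi2 > 3.841       # 5% level, 1 degree of freedom

print(f"Chi-square = {chi2:.3f}, Phi = {phi:.3f}, significant: {significant}")
```

The sketch assumes every expected frequency is greater than zero. A table with an empty row or column would need to be handled separately.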

Coefficient of Contingency
• The coefficient of contingency is a Chi-square
based measure of the relation between two
categorical variables (proposed by Pearson,
the originator of the Chi-square test). Its
advantage over the ordinary Chi-square is that
it is more easily interpreted, since its value
always lies between 0 and 1 (where 0 means
complete independence), although its attainable
maximum depends on the table size and stays below 1.
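
The coefficient can be sketched with the usual formula C = sqrt(Chi-square / (Chi-square + N)). The two tables below are hypothetical. One is perfectly associated and the other is exactly independent, so they show the two ends of the range. For a k x k table the largest attainable value is sqrt((k - 1) / k).

```python
import math

def contingency_coefficient(table):
    """Pearson's contingency coefficient C = sqrt(chi2 / (chi2 + N))."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = sum(
        (table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
        / (row_totals[i] * col_totals[j] / n)
        for i in range(len(table))
        for j in range(len(table[0]))
    )
    return math.sqrt(chi2 / (chi2 + n))

# Hypothetical tables chosen to show the two extremes.
perfect = [[30, 0, 0],
           [0, 30, 0],
           [0, 0, 30]]          # each row category maps to one column category
independent = [[10, 20],
               [20, 40]]        # rows are proportional: no association

c_perfect = contingency_coefficient(perfect)
c_independent = contingency_coefficient(independent)
c_max_3x3 = math.sqrt((3 - 1) / 3)   # upper limit for a 3 x 3 table

print(f"perfect: {c_perfect:.4f} (limit {c_max_3x3:.4f}), "
      f"independent: {c_independent:.4f}")
```

Because the upper limit changes with the number of categories, values of C are best compared only between tables of the same size.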

Correspondence Analysis
• The Crosstabs procedure offers several
measures and tests of association but cannot
graphically represent any relationships
between the variables.
• Correspondence analysis describes the
relationships between two nominal variables
in a correspondence table in a
low-dimensional space.

Frequency Table (N=902 respondents)
Reasons for work preference     0     1     2     3     4     5     6   Total
Achievement                     6    31   115   236   265   201    48     902
Application                     1    20    50   126   274   296   135     902
Knowledge                       3    22    68   156   239   304   110     902
Aesthetic                      29   146   249   270   155    43    10     902
Affiliation                    29   219   320   202   109    23     0     902
Harm avoidance                 85   417   239   100    45    13     3     902
Recognition                    10   108   258   299   141    72    14     902
0: Least important; 1: Less important; 2: Important; 4: More important; 5: Most important

Frequency distribution provides information about data grouping

Neighbourhood
• In the frequency table, there are seven column categories
(scores 0 to 6) and seven row categories (reasons for work
preference). Neighbourhoods can be formed by clustering the
rows, the columns, and the row-column correspondence.
• So, partitioning of the row and column
variables is important.
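
The row-column correspondence can be sketched numerically. The code below assumes that the map is produced by simple correspondence analysis of the frequency table (N=902) and uses power iteration in plain Python to extract the first two dimensions. The function name `correspondence_analysis` and the choice of two dimensions are illustrative assumptions, not the workshop's own procedure.

```python
import math

reasons = ["Achievement", "Application", "Knowledge", "Aesthetic",
           "Affiliation", "Harm avoidance", "Recognition"]
counts = [
    [6, 31, 115, 236, 265, 201, 48],
    [1, 20, 50, 126, 274, 296, 135],
    [3, 22, 68, 156, 239, 304, 110],
    [29, 146, 249, 270, 155, 43, 10],
    [29, 219, 320, 202, 109, 23, 0],
    [85, 417, 239, 100, 45, 13, 3],
    [10, 108, 258, 299, 141, 72, 14],
]

def correspondence_analysis(table, n_dims=2, n_iter=1000):
    """Return total inertia, inertia per dimension, row and column coordinates."""
    n = sum(map(sum, table))
    n_rows, n_cols = len(table), len(table[0])
    r = [sum(row) / n for row in table]              # row masses
    c = [sum(col) / n for col in zip(*table)]        # column masses
    # Standardized residuals from the independence model.
    S = [[(table[i][j] / n - r[i] * c[j]) / math.sqrt(r[i] * c[j])
          for j in range(n_cols)] for i in range(n_rows)]
    total_inertia = sum(s * s for row in S for s in row)   # equals chi2 / n
    # Symmetric matrix S'S; its eigenvectors are the column axes.
    A = [[sum(S[i][j] * S[i][k] for i in range(n_rows))
          for k in range(n_cols)] for j in range(n_cols)]
    inertias, row_coords, col_coords = [], [], []
    for _dim in range(n_dims):
        v = [(-1) ** j * (j + 1.0) for j in range(n_cols)]  # fixed start vector
        for _step in range(n_iter):                          # power iteration
            w = [sum(A[j][k] * v[k] for k in range(n_cols))
                 for j in range(n_cols)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        Sv = [sum(S[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        lam = sum(x * x for x in Sv)                         # principal inertia
        inertias.append(lam)
        row_coords.append([Sv[i] / math.sqrt(r[i]) for i in range(n_rows)])
        col_coords.append([math.sqrt(lam) * v[j] / math.sqrt(c[j])
                           for j in range(n_cols)])
        # Deflate so the next pass finds the next dimension.
        A = [[A[j][k] - lam * v[j] * v[k] for k in range(n_cols)]
             for j in range(n_cols)]
    return total_inertia, inertias, row_coords, col_coords

total_inertia, inertias, row_coords, col_coords = correspondence_analysis(counts)

for i, name in enumerate(reasons):
    print(f"{name:<15} {row_coords[0][i]:>8.3f} {row_coords[1][i]:>8.3f}")
for j in range(7):
    print(f"score {j:<9} {col_coords[0][j]:>8.3f} {col_coords[1][j]:>8.3f}")
```

Plotting the row and column coordinates on the same two axes gives the map. Points that lie close together can be read as one neighbourhood. The sign of each axis is arbitrary, so only the relative positions matter.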

Correspondence of row and column variables
Scoring categories (f = frequency; % = percentage of column total)
                       0           1           2           3           4           5           6        Total
                    f      %    f      %    f      %    f      %    f      %    f      %    f      %
Achievement         6   3.68   31   3.22  115   8.85  236  16.99  265  21.58  201  21.11   48  15.00    902
Application         1   0.61   20   2.08   50   3.85  126   9.07  274  22.31  296  31.09  135  42.19    902
Knowledge           3   1.84   22   2.28   68   5.23  156  11.23  239  19.46  304  31.93  110  34.38    902
Aesthetic          29  17.79  146  15.16  249  19.17  270  19.44  155  12.62   43   4.52   10   3.13    902
Affiliation        29  17.79  219  22.74  320  24.63  202  14.59  109   8.88   23   2.42    0   0.00    902
Harm avoidance     85  52.15  417  43.30  239  18.40  100   7.20   45   3.66   13   1.37    3   0.94    902
Recognition        10   6.13  108  11.21  258  19.86  299  21.53  141  11.48   72   7.59   14   4.38    902
Total             163 100.00  963 100.00 1299 100.00 1389 100.00 1228 100.00  952 100.00  320 100.00   6314
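
The percentages in this table are column percentages: each frequency is divided by the total of its scoring category. A minimal sketch of that calculation follows. The helper name `column_percent` is hypothetical, and half-up rounding to two decimals is an assumption chosen because it matches the printed values that were checked.

```python
from decimal import Decimal, ROUND_HALF_UP

counts = {
    "Achievement":    [6, 31, 115, 236, 265, 201, 48],
    "Application":    [1, 20, 50, 126, 274, 296, 135],
    "Knowledge":      [3, 22, 68, 156, 239, 304, 110],
    "Aesthetic":      [29, 146, 249, 270, 155, 43, 10],
    "Affiliation":    [29, 219, 320, 202, 109, 23, 0],
    "Harm avoidance": [85, 417, 239, 100, 45, 13, 3],
    "Recognition":    [10, 108, 258, 299, 141, 72, 14],
}

def column_percent(frequency, column_total):
    """Percentage of the column total, rounded half-up to two decimals."""
    value = Decimal(100 * frequency) / Decimal(column_total)
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Total of each scoring category (0 to 6) across the seven reasons.
col_totals = [sum(col) for col in zip(*counts.values())]

col_pct = {
    reason: [column_percent(f, t) for f, t in zip(row, col_totals)]
    for reason, row in counts.items()
}

for reason, row in col_pct.items():
    print(f"{reason:<15}" + "".join(f"{p:>8}" for p in row))
```

Read down a column, the percentages show which reasons dominate each scoring category. For example, Harm avoidance accounts for more than half of all 0 scores.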

Neighbourhood Data Mapping
(N=902)

Where Chi-square fails, this model works
(Job Analysis Data, N=200)
