Models Can Lie

An Illustration from Heart Rate Variability Data
Raju Rimal and Veronika Lindberg
2015/12/09
Norwegian University of Science and Technology
Norwegian University of Life Sciences

Overview
1. Background
2. How the data looks
3. Classification with series stack
4. Classification with series averaged over Series repetition
5. Classification with series averaged over Person-Event Combination
6. Some Comparisons

Some Background
1. PCR, PLS and Canonical Powered PLS (CPPLS) are used in the analysis
2. CPPLS integrates CCA with PLS to select relevant variables for the response
3. Cross-validation is performed over the observations of each a) frequency window b) series c) person-event combination
4. Three variations of the dataset are used
◦ Transpose of each frequency window, stacked together
◦ The average frequencies over time for each series
◦ The average frequencies over time for each person-event combination
5. An LDA model is used for discriminant analysis, using the scores obtained from each of the latent variable models, with cross-validation implemented

How the data looks
[Figure: heat maps of six example series. P11 (Person1, Gym), P12 (Person1, Gym), P127 (Person1, Sauna), P129 (Person1, Sauna), P131 (Person1, Sauna), P23 (Person3, Sauna). Axis: Time (s). Colour scale: Frequency Window (dB), 0.00 to 1.00.]

Each block represents a series divided into several windows (rows), each with 128 columns and an overlap of 16. Each cell contains the frequency value obtained from a fast Fourier transform.
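The windowing step described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "16 overlaps" means consecutive windows share 16 samples, that each cell holds a transform magnitude, and it uses a naive discrete Fourier transform in place of a fast one. The function names `split_windows` and `dft_magnitudes` and the toy sine signal are hypothetical.

```python
import cmath
import math


def split_windows(series, width=128, overlap=16):
    """Cut a 1-D series into windows of `width` samples.

    Consecutive windows share `overlap` samples (an assumed reading of
    "16 overlaps"); a trailing partial window is dropped.
    """
    step = width - overlap
    return [series[start:start + width]
            for start in range(0, len(series) - width + 1, step)]


def dft_magnitudes(window):
    """Naive O(n^2) discrete Fourier transform magnitudes of one window.

    Stands in for a fast Fourier transform: same values, slower to compute.
    """
    n = len(window)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window)))
            for k in range(n)]


# Toy signal (hypothetical): a sine with 8 cycles per 128 samples.
toy_signal = [math.sin(2 * math.pi * 8 * t / 128) for t in range(464)]
windows = split_windows(toy_signal)
spectrum_rows = [dft_magnitudes(w) for w in windows]  # one row per window
```

Each entry of `spectrum_rows` plays the role of one row of a block: 128 values for one window of one series.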

Each person may have been involved in multiple activities, which may have replications.

Set 1: The windows are stacked row-wise to form one big matrix (which may suffer from repeated measurements). Different rows therefore contain different parts of the same series.

Set 2: Each series is averaged over its time points. Each row corresponds to one series.

Set 3: A person can have multiple series of the same activity (replication); the third set is averaged over each person-event combination. In this case each row corresponds to a specific event for a specific person.
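The three dataset variants can be sketched on toy data as below. The values are made up and use two columns instead of 128 for readability; only the series labels (P11, P12, P23) and their person and event come from the figure. Averaging the per-series means for Set 3, rather than pooling all windows, is an assumption, since the slides do not say which was used.

```python
from collections import defaultdict


def column_means(rows):
    """Average a list of equal-length rows column by column."""
    return [sum(col) / len(rows) for col in zip(*rows)]


# Hypothetical toy data: (person, event, series_id) -> window rows.
series = {
    ("Person1", "Gym", "P11"): [[1.0, 2.0], [3.0, 4.0]],
    ("Person1", "Gym", "P12"): [[5.0, 6.0], [7.0, 8.0]],
    ("Person3", "Sauna", "P23"): [[0.0, 2.0], [2.0, 4.0]],
}

# Set 1: every window is its own row, labelled with its event.
set1 = [(event, row) for (_, event, _), rows in series.items() for row in rows]

# Set 2: one row per series, averaged over time (its windows).
set2 = {key: column_means(rows) for key, rows in series.items()}

# Set 3: one row per person-event combination, here the mean of the
# Set 2 rows of its replicate series (assumed, see the note above).
groups = defaultdict(list)
for (person, event, _), mean_row in set2.items():
    groups[(person, event)].append(mean_row)
set3 = {key: column_means(rows) for key, rows in groups.items()}
```

Set 1 has six rows here, Set 2 three and Set 3 two, which shows how quickly the number of observations shrinks from one variant to the next.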

Cross-Validation
Cross-validation loop, for each split j = 1, ..., 10:
1. cvTest: fold j; cvTrain: folds 1, ..., j − 1, j + 1, ..., 10
2. cvTrain → MVR models (PCR, PLS, CPPLS) → Scores for components 1, 2, ..., 128
3. Loop over each additional score: LDA model y ~ f(X), with y the categories and X the scores from 1 to i
4. Error (cvTest and Train) for the ith score and jth cv split
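The loop structure above can be sketched as follows. This is a simplified illustration under stated assumptions: the score matrix is taken as given (the MVR fitting step is skipped), and a nearest-centroid rule stands in for the LDA model. All function names and the toy data are hypothetical.

```python
import random


def kfold_indices(n_obs, n_folds=10, seed=0):
    """Shuffle observation indices and deal them into n_folds test folds."""
    order = list(range(n_obs))
    random.Random(seed).shuffle(order)
    return [order[j::n_folds] for j in range(n_folds)]


def centroid_classifier(train_x, train_y):
    """Nearest-centroid rule, a simple stand-in for the LDA model."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

    def predict(x):
        return min(centroids, key=lambda y: sum(
            (a - b) ** 2 for a, b in zip(x, centroids[y])))
    return predict


def error_rate(predict, xs, ys):
    """Share of observations whose predicted category is wrong."""
    return sum(predict(x) != y for x, y in zip(xs, ys)) / len(ys)


def cv_errors(scores, labels, max_comp, n_folds=10):
    """Return {(i, j): (train_error, test_error)} for i scores, split j."""
    errors = {}
    for j, test_idx in enumerate(kfold_indices(len(labels), n_folds)):
        held_out = set(test_idx)
        train_idx = [k for k in range(len(labels)) if k not in held_out]
        for i in range(1, max_comp + 1):  # loop over each additional score
            train_x = [scores[k][:i] for k in train_idx]
            train_y = [labels[k] for k in train_idx]
            test_x = [scores[k][:i] for k in test_idx]
            test_y = [labels[k] for k in test_idx]
            predict = centroid_classifier(train_x, train_y)
            errors[(i, j)] = (error_rate(predict, train_x, train_y),
                              error_rate(predict, test_x, test_y))
    return errors


# Hypothetical toy scores: two events, well separated in the first score.
scores = [[float(k % 2) * 10 + 0.1 * k, 0.0] for k in range(20)]
labels = ["Gym" if k % 2 else "Sauna" for k in range(20)]
errors = cv_errors(scores, labels, max_comp=2)
```

Averaging the test errors over j for each i gives one cross-validation error curve per model; the component count with the lowest value is the kind of point annotated in the error plots.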

Classification with series stack

Training and Cross-validation Errors
[Figure: misclassification error against number of components for PLS, PCR and CPLS, with separate cvtest and train rows. Annotated points:]
Model   cvtest (Comp, Error)   train (Comp, Error)
PLS     4, 0.89                79, 0.35
PCR     56, 0.82               128, 0.3
CPLS    29, 0.81               70, 0.35

Misclassifications
Training Misclassifications
[Figure: proportion of correct (TRUE) and incorrect (FALSE) training predictions per event, with counts, in panels PLS, PCR and CPLS. Events: Light Jogging, Walking, Walking In Stairs, Alcohol, Cold Shower, Cycling, Gym, Gym Restitution, Morning Before Exercise, Morning Test, Sauna, Sauna Ice Bath, Shower, Sleeping, Unknown.]
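The per-event TRUE/FALSE counts shown in such a figure can be tallied as in the sketch below. The labels are hypothetical and not taken from the slides, and the function names are made up for illustration.

```python
from collections import Counter


def tally_predictions(actual, predicted):
    """Count correct (True) and incorrect (False) predictions per event."""
    counts = {}
    for truth, guess in zip(actual, predicted):
        counts.setdefault(truth, Counter())[truth == guess] += 1
    return counts


def misclassification_error(actual, predicted):
    """Overall share of wrong predictions."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


# Hypothetical labels, not taken from the slides.
actual = ["Gym", "Gym", "Sauna", "Sauna", "Sauna", "Walking"]
predicted = ["Gym", "Sauna", "Sauna", "Sauna", "Gym", "Sauna"]
per_event = tally_predictions(actual, predicted)
overall = misclassification_error(actual, predicted)
```

Dividing each event's counts by their sum gives the 0.00 to 1.00 bar proportions, which makes rare events with a single observation easy to spot.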

Misclassifications
Test Misclassification
[Figure: proportion of correct (TRUE) and incorrect (FALSE) test predictions per event, with counts, in panels PLS, PCR and CPLS. Events: Light Jogging, Resting, Walking, Walking In Stairs.]

Plotting Scores
[Figure: score plot for the PCR model, coloured by event. Panels: Comp1 (59.39%) vs Comp2 (7.38%); Comp2 (7.38%) vs Comp3 (4.31%).]
[Figure: score plot for the PLS model, coloured by event. Panels: Comp1 (59.27%) vs Comp2 (7.34%); Comp2 (7.34%) vs Comp3 (2.31%).]
[Figure: score plot for the CPPLS model, coloured by event. Panels: Comp1 (8.13%) vs Comp2 (20.65%); Comp2 (20.65%) vs Comp3 (3.89%).]
Events in the legend: Alcohol, Cold Shower, Cycling, Gym, Gym Restitution, Light Jogging, Morning Test, Sauna, Sauna Ice Bath, Shower, Sleeping, Unknown, Walking, Walking In Stairs.

Classification with series averaged over Series repetition

[Figure: misclassification error against number of components for PLS, PCR and CPLS, with separate cvtest and train rows. Annotated points:]
Model   cvtest (Comp, Error)   train (Comp, Error)
PLS     15, 0.98               114, 0.01
PCR     128, 0.99              127, 0.02
CPLS    68, 0.99               99, 0.01

Misclassifications
[Figure: proportion of correct (TRUE) and incorrect (FALSE) predictions per event, with counts, in panels PLS, PCR and CPLS. Events: Light Jogging, Walking, Walking In Stairs, Alcohol, Cold Shower, Cycling, Gym, Gym Restitution, Morning Test, Sauna, Sauna Ice Bath, Shower, Sleeping, Unknown.]

Misclassifications
[Figure: proportion of correct (TRUE) and incorrect (FALSE) predictions per event, with counts, in panels PLS, PCR and CPLS. Events: Light Jogging, Resting, Walking, Walking In Stairs.]

Plotting Scores
[Figure: score plot for the PCR model, coloured by event. Panels: Comp1 (74.44%) vs Comp2 (9.41%); Comp2 (9.41%) vs Comp3 (3.83%).]
[Figure: score plot for the PLS model, coloured by event. Panels: Comp1 (63.41%) vs Comp2 (20.2%); Comp2 (20.2%) vs Comp3 (4.03%).]
[Figure: score plot for the CPPLS model, coloured by event. Panels: Comp1 (15.39%) vs Comp2 (1.79%); Comp2 (1.79%) vs Comp3 (2.21%).]
Events in the legend: Alcohol, Cold Shower, Cycling, Gym, Gym Restitution, Light Jogging, Morning Test, Sauna, Sauna Ice Bath, Shower, Sleeping, Unknown, Walking, Walking In Stairs.

Classification with series averaged over Person-Event Combination

[Figure: misclassification error against number of components for PLS, PCR and CPLS, with separate cvtest and train rows. Annotated points:]
Model   cvtest (Comp, Error)   train (Comp, Error)
PLS     1, 1                   5, 0.26
PCR     1, 1                   5, 0.38
CPLS    1, 1                   1, 0.87

Misclassifications
[Figure: proportion of correct (TRUE) and incorrect (FALSE) predictions per event, with counts, in panels PLS, PCR and CPLS. Events: Light Jogging, Walking, Walking In Stairs, Alcohol, Cold Shower, Cycling, Gym, Gym Restitution, Morning Test, Sauna, Sauna Ice Bath, Shower, Sleeping, Unknown.]

Misclassifications
[Figure: proportion of correct (TRUE) and incorrect (FALSE) predictions per event, with counts, in panels PLS, PCR and CPLS. Events: Light Jogging, Resting, Walking, Walking In Stairs.]

Plotting Scores
[Figure: score plot for the PCR model, coloured by event. Panels: Comp1 (72.28%) vs Comp2 (13.79%); Comp2 (13.79%) vs Comp3 (7.15%).]
[Figure: score plot for the PLS model, coloured by event. Panels: Comp1 (72.27%) vs Comp2 (13.72%); Comp2 (13.72%) vs Comp3 (7.12%).]
[Figure: score plot for the CPPLS model, coloured by event. Panels: Comp1 (2.32%) vs Comp2 (2.57%); Comp2 (2.57%) vs Comp3 (0.26%).]
Events in the legend: Alcohol, Cold Shower, Cycling, Gym, Gym Restitution, Light Jogging, Morning Test, Sauna, Sauna Ice Bath, Shower, Sleeping, Unknown, Walking, Walking In Stairs.

Misclassification Errors
[Figure: test and training misclassification error of the CPLS, PCR and PLS models on Set 1, Set 2 and Set 3. Number of components above the points: Set 1: CPLS 29, PCR 56, PLS 4; Set 2: CPLS 68, PCR 128, PLS 15; Set 3: CPLS 1, PCR 1, PLS 1. Error-axis labels: 0.0175, 0.0219, 0.0789, 0.3124, 0.3276, 0.3733, 0.6667, 0.875, 0.9315, 0.9521, 0.9932, 1.]
Figure: Training and test misclassification error for all three models. The LDA models were fitted with the scores obtained from the three models, using the number of components (shown above each point) needed to reach the minimum cross-validation error.

