Models Can Lie
An Illustration from Heart Rate Variability Data
Raju Rimal and Veronika Lindberg
2015/12/09
Norwegian University of Science and Technology
Norwegian University of Life Sciences
Overview
1. Background
2. How the data looks
3. Classification with series stack
4. Classification with series averaged over Series repetition
5. Classification with series averaged over Person-Event
Combination
6. Some Comparison
Background
Some Background
1. PCR, PLS and Canonically Powered PLS (CPPLS) are used in
the analysis
2. CPPLS integrates CCA with PLS to select the variables
relevant for the response
3. Cross-validation is performed over the observations, which
are a) frequency windows, b) series or c) person-event
combinations, depending on the dataset
4. Three variations of the dataset are used
◦ The transposed frequency windows of each series, stacked together
◦ The frequencies averaged over time for each series
◦ The frequencies averaged over time for each person-event
combination
5. An LDA model is used for discriminant analysis on the
scores obtained from each latent variable model, with
cross-validation implemented
How the data looks
[Figure: heat maps of six example series, one panel each: Person1/P11/Gym, Person1/P12/Gym, Person1/P127/Sauna, Person1/P129/Sauna, Person1/P131/Sauna, Person3/P23/Sauna. Axes: Frequency Window (dB) and Time (s).]
Each block represents a series divided into several windows
(rows) of 128 columns each, with an overlap of 16. Each cell
contains the frequency value obtained from a fast Fourier
transform.
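The windowing described above can be sketched as follows. This is only an illustration under stated assumptions: the slides give the window width (128) and the overlap (16) but not the exact procedure, so the step of 128 − 16 samples, the function names `split_windows` and `dft_magnitudes`, and the toy sine signal are all hypothetical. A naive discrete Fourier transform stands in for the fast Fourier transform to keep the sketch free of dependencies; it yields the same coefficients, only more slowly.

```python
import cmath
import math


def split_windows(series, width=128, overlap=16):
    """Cut a series into windows of `width` samples sharing `overlap` samples."""
    step = width - overlap  # assumption: consecutive windows share 16 samples
    return [series[start:start + width]
            for start in range(0, len(series) - width + 1, step)]


def dft_magnitudes(window):
    """Magnitude of each discrete Fourier coefficient (naive O(n^2) DFT)."""
    n = len(window)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(window)))
            for k in range(n)]


# Hypothetical signal: a sine completing 8 cycles per 128 samples.
signal = [math.sin(2 * math.pi * 8 * t / 128) for t in range(128 + 2 * 112)]
windows = split_windows(signal)
# One row per window, 128 columns per row, as in the blocks on the slide.
spectrum_rows = [dft_magnitudes(w) for w in windows]
```

Each element of `spectrum_rows` plays the role of one row of a block in the figure.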
Each person may have been involved in multiple activities,
which may have replications.
Set 1: All windows are stacked row-wise to form one big
matrix (which may suffer from repeated measurements).
Different parts of the same series end up in different rows.
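The repeated-measurement concern can be made concrete with a small sketch. The slides do not say how the cross-validation folds were formed for Set 1. The hypothetical helper `grouped_folds` below shows one common precaution, assumed here purely for illustration: every window of a series is given the same fold, so that parts of one series never sit on both sides of a split.

```python
from collections import defaultdict


def grouped_folds(series_ids, n_folds=10):
    """Assign a fold to every row so that rows sharing a series id stay together."""
    fold_of_series = {}
    for sid in series_ids:
        if sid not in fold_of_series:
            # Series are dealt out to folds in order of first appearance.
            fold_of_series[sid] = len(fold_of_series) % n_folds
    return [fold_of_series[sid] for sid in series_ids]


# Hypothetical row labels: the series each stacked window came from.
rows = ["P11", "P11", "P11", "P12", "P12", "P127", "P127", "P129"]
folds = grouped_folds(rows, n_folds=3)

# Collect the folds each series ended up in; every set should hold one fold.
members = defaultdict(set)
for sid, fold in zip(rows, folds):
    members[sid].add(fold)
```

If windows were instead assigned to folds row by row, neighbouring windows of one series could appear in both cvTrain and cvTest, which tends to make the test error look better than it is.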
Set 2: Each series is averaged over its time points. Each
row corresponds to one series.
Set 3: A person can have multiple series of the same
activity (replication); the third set is averaged over each
person-event combination. In this case each row corresponds
to a specific event for a specific person.
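The three sets described above can be sketched in a few lines. The tuple layout, the names `build_sets` and `column_means`, and the toy values are hypothetical. Set 3 is computed here by averaging the series-level rows of Set 2 within each person-event pair, which is an assumption: the slides do not say whether series means or raw windows were averaged.

```python
from collections import defaultdict


def column_means(rows):
    """Average a list of equal-length rows column by column."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def build_sets(windows):
    """windows: list of (person, event, series_id, values), one tuple per window."""
    # Set 1: every window becomes a row of the stacked matrix.
    set1 = [values for _, _, _, values in windows]

    # Set 2: one row per series, averaged over its windows (time points).
    by_series = defaultdict(list)
    for person, event, series_id, values in windows:
        by_series[(person, event, series_id)].append(values)
    set2 = {key: column_means(rows) for key, rows in by_series.items()}

    # Set 3: one row per person-event pair (assumption: mean of series rows).
    by_pair = defaultdict(list)
    for (person, event, _series_id), mean_row in set2.items():
        by_pair[(person, event)].append(mean_row)
    set3 = {key: column_means(rows) for key, rows in by_pair.items()}
    return set1, set2, set3


# Hypothetical toy data with 2 frequency columns instead of 128.
toy = [
    ("Person1", "Gym", "P11", [1.0, 2.0]),
    ("Person1", "Gym", "P11", [3.0, 4.0]),
    ("Person1", "Gym", "P12", [5.0, 6.0]),
    ("Person1", "Sauna", "P127", [7.0, 8.0]),
]
set1, set2, set3 = build_sets(toy)
```

The number of rows shrinks from set to set (4, 3 and 2 in the toy data), which is why the later sets leave far fewer observations for cross-validation.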
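Cross-Validation
1. Split the observations into 10 parts: part j is cvTest and parts
1, ..., j − 1, j + 1, ..., 10 are cvTrain
2. Fit the MVR models (PCR, PLS, CPPLS) and extract the scores
1, 2, ..., 128
3. Fit the LDA model y = f(X), where y holds the categories and X the
scores from 1 to i, looping over each additional score
4. Record the error (cvTest and Train) for the i-th scores and the
j-th cv split
5. Repeat for every split (cross-validation loop)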
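The loop above can be sketched as follows, with two loud simplifications. First, a nearest-centroid classifier stands in for LDA, so this is not the authors' classifier. Second, the scores are taken as given, whereas the slides fit the MVR model on cvTrain inside each split. The fold assignment, the function names and the toy data are hypothetical.

```python
def nearest_centroid_fit(X, y):
    """Stand-in classifier (not LDA): one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for k, value in enumerate(row):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def nearest_centroid_predict(model, row):
    """Return the label whose centroid is closest to `row`."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(row, center))
    return min(model, key=lambda label: sq_dist(model[label]))


def cv_errors(scores, y, n_folds, max_comp):
    """errors[i - 1][j] = (train error, test error) using scores 1..i on split j."""
    folds = [r % n_folds for r in range(len(y))]  # assumption: interleaved folds
    errors = []
    for i in range(1, max_comp + 1):  # loop over each additional score
        per_split = []
        for j in range(n_folds):  # cross-validation loop
            train = [r for r in range(len(y)) if folds[r] != j]
            test = [r for r in range(len(y)) if folds[r] == j]
            model = nearest_centroid_fit([scores[r][:i] for r in train],
                                         [y[r] for r in train])

            def error(rows):
                wrong = sum(nearest_centroid_predict(model, scores[r][:i]) != y[r]
                            for r in rows)
                return wrong / len(rows)

            per_split.append((error(train), error(test)))
        errors.append(per_split)
    return errors


# Hypothetical scores (2 components) for two well-separated events.
scores = [[-2.0, 0.3], [-1.5, -0.2], [-1.8, 0.1], [-2.2, 0.0],
          [2.1, 0.2], [1.7, -0.1], [1.9, 0.4], [2.3, -0.3]]
labels = ["Gym", "Gym", "Gym", "Gym", "Sauna", "Sauna", "Sauna", "Sauna"]
errors = cv_errors(scores, labels, n_folds=4, max_comp=2)
```

Averaging the test errors over the splits for each i gives one cross-validation error per number of components.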
Classification with series stack
Training and Cross-validation Errors
[Figure: misclassification error against number of components for PLS, PCR and CPLS; panels: cvtest and train.]
Annotated points (Comp, Error):
Model   cvtest      train
PLS     4, 0.89     79, 0.35
PCR     56, 0.82    128, 0.3
CPLS    29, 0.81    70, 0.35
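Reading the number of components off an error curve can be sketched as below. The curves are hypothetical placeholders, not values taken from the slides, and `best_component` is an illustrative name. The sketch only shows the selection rule of taking the component count with the lowest cross-validation error and then comparing it with the training error at that point.

```python
def best_component(cv_error_by_comp):
    """Return (number of components, error) at the lowest cross-validation error."""
    comp = min(cv_error_by_comp, key=cv_error_by_comp.get)
    return comp, cv_error_by_comp[comp]


# Hypothetical error curves keyed by number of components.
cv_curve = {1: 0.95, 2: 0.90, 3: 0.86, 4: 0.84, 5: 0.88}
train_curve = {1: 0.90, 2: 0.80, 3: 0.70, 4: 0.60, 5: 0.50}

comp, cv_err = best_component(cv_curve)
# Gap between cross-validation and training error at the chosen component.
gap = cv_err - train_curve[comp]
```

A large gap means the training error flatters the model, which is the sense in which a model can lie.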
Misclassifications
[Figure: Training Misclassifications. Share of correct (TRUE) and incorrect (FALSE) predictions per event for PLS, PCR and CPLS, with counts on the bars. Events: Alcohol, Cold Shower, Cycling, Gym, Gym Restitution, Light Jogging, Morning Before Exercise, Morning Test, Sauna, Sauna Ice Bath, Shower, Sleeping, Unknown, Walking, Walking In Stairs.]
[Figure: Test Misclassification. Same layout for the events Light Jogging, Resting, Walking and Walking In Stairs.]
Plotting Scores
[Figure: Score plot for the PCR model, coloured by event. Panels: Comp1 (59.39%) vs Comp2 (7.38%); Comp2 (7.38%) vs Comp3 (4.31%).]
[Figure: Score plot for the PLS model, coloured by event. Panels: Comp1 (59.27%) vs Comp2 (7.34%); Comp2 (7.34%) vs Comp3 (2.31%).]
[Figure: Score plot for the CPPLS model, coloured by event. Panels: Comp1 (8.13%) vs Comp2 (20.65%); Comp2 (20.65%) vs Comp3 (3.89%).]
Classification with series averaged over Series repetition
Training and Cross-validation Errors
[Figure: misclassification error against number of components for PLS, PCR and CPLS; panels: cvtest and train.]
Annotated points (Comp, Error):
Model   cvtest       train
PLS     15, 0.98     114, 0.01
PCR     128, 0.99    127, 0.02
CPLS    68, 0.99     99, 0.01
Misclassifications
[Figure: Training Misclassifications. Share of correct (TRUE) and incorrect (FALSE) predictions per event for PLS, PCR and CPLS, with counts on the bars. Events: Alcohol, Cold Shower, Cycling, Gym, Gym Restitution, Light Jogging, Morning Before Exercise, Morning Test, Sauna, Sauna Ice Bath, Shower, Sleeping, Unknown, Walking, Walking In Stairs.]
[Figure: Test Misclassification. Same layout for the events Light Jogging, Resting, Walking and Walking In Stairs.]
Plotting Scores
[Figure: Score plot for the PCR model, coloured by event. Panels: Comp1 (74.44%) vs Comp2 (9.41%); Comp2 (9.41%) vs Comp3 (3.83%).]
[Figure: Score plot for the PLS model, coloured by event. Panels: Comp1 (63.41%) vs Comp2 (20.2%); Comp2 (20.2%) vs Comp3 (4.03%).]
[Figure: Score plot for the CPPLS model, coloured by event. Panels: Comp1 (15.39%) vs Comp2 (1.79%); Comp2 (1.79%) vs Comp3 (2.21%).]
Classification with series averaged over Person-Event Combination
Training and Cross-validation Errors
[Figure: misclassification error against number of components (1 to 5) for PLS, PCR and CPLS; panels: cvtest and train.]
Annotated points (Comp, Error):
Model   cvtest    train
PLS     1, 1      5, 0.26
PCR     1, 1      5, 0.38
CPLS    1, 1      1, 0.87
Misclassifications
[Figure: Training Misclassifications. Share of correct (TRUE) and incorrect (FALSE) predictions per event for PLS, PCR and CPLS, with counts on the bars. Events: Alcohol, Cold Shower, Cycling, Gym, Gym Restitution, Light Jogging, Morning Before Exercise, Morning Test, Sauna, Sauna Ice Bath, Shower, Sleeping, Unknown, Walking, Walking In Stairs.]
[Figure: Test Misclassification. Same layout for the events Light Jogging, Resting, Walking and Walking In Stairs.]
Plotting Scores
[Figure: Score plot for the PCR model, coloured by event. Panels: Comp1 (72.28%) vs Comp2 (13.79%); Comp2 (13.79%) vs Comp3 (7.15%).]
[Figure: Score plot for the PLS model, coloured by event. Panels: Comp1 (72.27%) vs Comp2 (13.72%); Comp2 (13.72%) vs Comp3 (7.12%).]
[Figure: Score plot for the CPPLS model, coloured by event. Panels: Comp1 (2.32%) vs Comp2 (2.57%); Comp2 (2.57%) vs Comp3 (0.26%).]
Some Comparison
Misclassification Errors
Number of components used (shown above each point in the figure):
        CPLS   PCR   PLS
Set 1   29     56    4
Set 2   68     128   15
Set 3   1      1     1
Error values marked on the y-axis: 0.0175, 0.0219, 0.0789, 0.3124, 0.3276, 0.3733, 0.6667, 0.875, 0.9315, 0.9521, 0.9932, 1.
Figure: Training and test misclassification error for all three models. The LDA
models were fitted with the scores obtained from the three models, using the number
of components (shown above each point) needed to reach the minimum cross-validation error.