Constrained Support Vector Quantile Regression
for Conditional Quantile Estimation
By Kostas Hatalis
hatalis@gmail.com
Dept. of Electrical & Computer Engineering
Lehigh University, Bethlehem, PA
2016
Challenges in nonparametric probabilistic forecasting
Overall there are several challenges to consider when developing a
forecasting method:
Quantile cross-over problem
Multi-step forecasting
Rolling forecasting
Handle multidimensional features
Support Vector Machines
To address these challenges, I developed a support vector machine (SVM) formulation for forecasting:
$$\min_{w} \; \lambda \|w\|^2 + \frac{1}{N} \sum_{i=1}^{N} L\big(y_i, f(x_i, w)\big) \tag{1}$$

where $L(y_i, f(x_i, w))$ is a loss function. Why SVM?
good method to start before studying neural networks
robust to outliers
prevents over-fitting by regularization
handles multidimensional features
supports kernels for nonlinear modeling
Support Vector Machines

(Figure slide; image not preserved in this export.)
Objective Function
An SVM classifier amounts to minimizing the hinge loss function:
$$\min_{w,b} \; \frac{1}{n} \sum_{i=1}^{n} \max\big(0,\, 1 - y_i (w \cdot x_i + b)\big) + \lambda \|w\|^2 \tag{2}$$
where the parameter $\lambda$ determines the tradeoff between increasing the margin size and ensuring that each $x_i$ lies on the correct side of the margin. We can rewrite the optimization problem as a differentiable objective function:
$$\min_{w,\,b,\,\zeta} \; \frac{1}{n} \sum_{i=1}^{n} \zeta_i + \lambda \|w\|^2 \tag{3}$$

subject to

$$y_i (x_i \cdot w + b) \geq 1 - \zeta_i, \qquad \zeta_i \geq 0, \quad \text{for all } i.$$
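As a concrete illustration, here is a minimal NumPy sketch of minimizing the hinge-loss objective (2) by subgradient descent; the learning rate, iteration count, and data shapes are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def fit_linear_svm(X, y, lam=0.01, eta=0.1, iters=1000):
    """Subgradient descent on (1/n) sum max(0, 1 - y_i(w.x_i + b)) + lam*||w||^2.
    X: (n, d) features; y: (n,) labels in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(iters):
        margins = y * (X @ w + b)
        active = margins < 1.0                      # points violating the margin
        grad_w = 2.0 * lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b
```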
SVQR - Support Vector Regression
Support vector regression:
$$\min_{w,b} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} (\xi_i^- + \xi_i^+) \tag{4}$$

subject to

$$\begin{cases} y_i - f(x_i) \leq \varepsilon + \xi_i^- & \forall i \\ f(x_i) - y_i \leq \varepsilon + \xi_i^+ & \forall i \\ \xi_i^-,\, \xi_i^+ \geq 0 & \forall i \end{cases}$$
The constant $C > 0$ determines the trade-off between the flatness of $f$ and the amount up to which deviations larger than $\varepsilon$ are tolerated.
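For reference, the standard $\varepsilon$-insensitive SVR in (4) is available off the shelf; a minimal scikit-learn sketch follows (the toy data and hyperparameter values are illustrative assumptions only):

```python
from sklearn.svm import SVR
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 4))                 # toy features
y = X @ np.array([1.0, -0.5, 2.0, 0.0]) + 0.1 * rng.normal(size=200)

# C trades off flatness of f against tolerated deviations beyond epsilon.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X, y)
y_hat = model.predict(X)
```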
SVQR - Nonlinear Quantile Regression
Nonlinear Quantile Regression (NQR) projects an input vector $x$ into a potentially higher-dimensional feature space $\mathcal{F}$ using a nonlinear mapping function $\varphi(\cdot)$:

$$Q_y(\tau \mid x) = f_\tau(x) = w_\tau^\top \varphi(x)$$

where $Q_y(\tau \mid x)$ is the $\tau$-th quantile of the distribution of $y$ conditional on the values of $x$, and $w_\tau$ is a vector of parameters.
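Estimating $w_\tau$ rests on the pinball (quantile) loss, which later slides smooth; a minimal NumPy sketch of the plain loss (the vectorized form is an assumption of convenience):

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Average pinball (quantile) loss of predictions q at level tau.
    Penalizes under-prediction by tau and over-prediction by (1 - tau)."""
    u = y - q
    return np.mean(np.maximum(tau * u, (tau - 1.0) * u))
```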
SVQR - Primal
To solve the NQR problem, it can be expressed as a support vector regression formulation with non-crossing constraints:
$$\min_{w,\,\xi^-,\,\xi^+} \; \sum_{m=1}^{M} \left[ \frac{1}{2} \|w_m\|^2 + C \sum_{i=1}^{N} \big( \tau_m \xi_{mi}^+ + (1 - \tau_m)\, \xi_{mi}^- \big) \right]$$

s.t.

$$\begin{cases} y_i - w_m^\top \varphi(x_i) - \xi_{mi}^+ \leq 0, & \forall m, \forall i \\ -y_i + w_m^\top \varphi(x_i) - \xi_{mi}^- \leq 0, & \forall m, \forall i \\ \xi_{mi}^-,\, \xi_{mi}^+ \geq 0, & \forall m, \forall i \\ w_m^\top \varphi(x_i) - w_{m+1}^\top \varphi(x_i) \leq 0, & \forall m, \forall i \end{cases}$$
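A minimal CVXPY sketch of this primal, using a linear feature map $\varphi(x) = x$ for brevity; the toy data, $C$, and quantile levels are illustrative assumptions, and a kernelized version would replace $X$ with the actual feature map:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
N, d = 60, 3
taus = np.array([0.1, 0.5, 0.9])              # tau_1 < ... < tau_M
M = len(taus)
X = rng.normal(size=(N, d))                   # stands in for phi(x_i)
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=N)
C = 10.0
T = np.tile(taus[:, None], (1, N))            # tau_m repeated per sample
Y = np.tile(y, (M, 1))

W = cp.Variable((M, d))                       # one w_m per quantile level
xi_p = cp.Variable((M, N), nonneg=True)       # xi+_{mi}
xi_n = cp.Variable((M, N), nonneg=True)       # xi-_{mi}
F = W @ X.T                                   # F[m, i] = w_m^T phi(x_i)

obj = 0.5 * cp.sum_squares(W) + C * cp.sum(cp.multiply(T, xi_p)
                                           + cp.multiply(1 - T, xi_n))
cons = [Y - F <= xi_p,                        # y_i - w_m^T phi(x_i) <= xi+_{mi}
        F - Y <= xi_n]                        # w_m^T phi(x_i) - y_i <= xi-_{mi}
cons += [F[m, :] <= F[m + 1, :] for m in range(M - 1)]  # non-crossing
cp.Problem(cp.Minimize(obj), cons).solve()
```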
SVQR - Dual
This primal is not easy to solve directly, so I form the Lagrangian dual problem, which is conveniently a quadratic programming problem:
$$\begin{aligned} \min_{\alpha^+,\,\alpha^-,\,\lambda} \; \sum_{m=1}^{M} \Bigg[ & -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_{mi}^+ - \alpha_{mi}^-)(\alpha_{mj}^+ - \alpha_{mj}^-)\, K(x_i, x_j) + \sum_{i=1}^{N} (\alpha_{mi}^+ - \alpha_{mi}^-)\, y_i \\ & - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\lambda_{mi} - \lambda_{m-1,i})(\lambda_{mj} - \lambda_{m-1,j})\, K(x_i, x_j) \\ & + \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_{mi}^+ - \alpha_{mi}^-)(\lambda_{mj} - \lambda_{m-1,j})\, K(x_i, x_j) \Bigg] \end{aligned}$$

subject to

$$\begin{cases} \lambda_{mi} \geq 0, & \forall m, \forall i \\ \alpha_{mi}^+ \in [0,\, \tau_m C], & \forall m, \forall i \\ \alpha_{mi}^- \in [0,\, (1 - \tau_m) C], & \forall m, \forall i \end{cases}$$
SVQR - Quantile Estimation
From this dual formulation, the conditional quantile at level $\tau_m$ is then given by
$$Q_y(\tau_m \mid x) = f_{\tau_m}(x) = \sum_{i=1}^{N} (\alpha_{mi}^+ - \alpha_{mi}^-)\, K(x, x_i) - \sum_{i=1}^{N} (\lambda_{mi} - \lambda_{m-1,i})\, K(x, x_i)$$
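Given solved dual variables, evaluating this estimator is just a kernel-weighted sum; a minimal sketch, where the variable names are assumptions for illustration:

```python
import numpy as np

def predict_quantile(alpha_p, alpha_n, lam_m, lam_prev, K_test):
    """f_{tau_m}(x_j) for each test point x_j.
    alpha_p, alpha_n: dual vectors alpha+_{mi}, alpha-_{mi} of length N.
    lam_m, lam_prev: lambda_{mi} and lambda_{m-1,i} (lam_prev = 0 for m = 1).
    K_test[j, i] = K(x_j, x_i) between test and training points."""
    return K_test @ ((alpha_p - alpha_n) - (lam_m - lam_prev))
```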
SVQR - Kernel and Optimization
Given two samples $x$ and $x'$ represented as feature vectors, the radial basis function (RBF) kernel is calculated as

$$K(x, x') = \varphi(x)^\top \varphi(x') = \exp\left( -\frac{\|x - x'\|^2}{2\sigma^2} \right)$$
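The Gram matrix for this kernel vectorizes neatly; a minimal NumPy sketch (the squared-distance expansion is standard):

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    """K[i, j] = exp(-||x_i - x'_j||^2 / (2 sigma^2)) for rows of X1, X2."""
    sq_dist = (np.sum(X1**2, axis=1)[:, None]
               + np.sum(X2**2, axis=1)[None, :]
               - 2.0 * X1 @ X2.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))  # clip tiny negatives
```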
To solve quickly for the conditional quantile estimates, sequential minimal optimization (SMO) is applied. Previously I tried the interior-point convex method (common for QP), but it was very slow.
SVQR - Wind Features and Data Selection
Case study: rolling forecasts using GEFCom2014 data.
Testing data: June 2013 to August 2013.
Training data: March 2013 to July 2013.
1. Raw wind speeds at 10 m and 100 m for the U and V directions.
2. Derived wind speeds at 10 m and 100 m.
3. Derived wind direction at 10 m and 100 m.
4. Derived wind energy at 10 m and 100 m.
5. Wind shear.
6. Wind energy difference.
7. Wind direction difference.
All features were normalized between 0 and 1.
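The normalization step is plain min–max scaling; a one-function NumPy sketch (column-wise, guarding against constant features):

```python
import numpy as np

def minmax_scale(F):
    """Scale each feature column of F to [0, 1]."""
    lo, hi = F.min(axis=0), F.max(axis=0)
    return (F - lo) / np.where(hi > lo, hi - lo, 1.0)
```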
SVQR - Benchmarks
Three naive models are commonly used for benchmarking in probabilistic wind forecasting applications (minimal sketches of all three follow the list):
Persistence distribution: formed by the most recent observations
such as the past 24 hours of wind power.
Climatology distribution: based on all available past wind power
observations.
Uniform distribution: assumes all wind power values at each time
step occur with equal probability.
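A NumPy sketch of the three benchmark quantile generators; the quantile levels and window length are assumptions for illustration:

```python
import numpy as np

taus = np.linspace(0.05, 0.95, 19)   # assumed quantile levels

def persistence_quantiles(power_history, window=24):
    """Empirical quantiles of the most recent `window` observations."""
    return np.quantile(power_history[-window:], taus)

def climatology_quantiles(power_history):
    """Empirical quantiles of all available past observations."""
    return np.quantile(power_history, taus)

def uniform_quantiles():
    """Uniform on [0, 1]: the tau-quantile of U(0, 1) is tau itself."""
    return taus.copy()
```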
SVQR - Results

(Results figure; image not preserved in this export.)
Future Work: Deep SVQR
What is Deep SVQR?
There is a lot of research interest in deep kernel learning and neural network hybrid SVMs.
The goal is to further enhance SVQR performance through better feature selection.
A combination of neural networks for feature learning and SVQR for forecasting quantiles.
Optimization of the neural networks is directly linked to the optimization of the SVQR objective.
Future Work: Deep SVQR

(Figure slide; image not preserved in this export.)
Smooth Pinball Function
The new objective function to minimize for smooth quantile regression is

$$\Phi_\alpha(w) = \frac{1}{n} \sum_{i=1}^{n} S_{\tau,\alpha}(y_i - x_i^\top w)$$

where $S_{\tau,\alpha}(u) = \tau u + \alpha \log\big(1 + \exp(-u/\alpha)\big)$ is the smooth approximation of the pinball loss (the same smooth term used on the next slide). The gradient vector of the above is

$$\nabla \Phi_\alpha(w) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{1 + \exp\!\big( \tfrac{y_i - x_i^\top w}{\alpha} \big)} - \tau \right) x_i$$

For a training iteration $m$, gradient descent can be applied as

$$w^{(m)} = w^{(m-1)} - \eta\, \nabla \Phi_\alpha\big(w^{(m-1)}\big)$$

where $\eta$ is the learning rate.
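A minimal NumPy sketch of this gradient descent for one quantile level; the step size, iteration count, and overflow clip are assumptions of convenience:

```python
import numpy as np

def smooth_pinball_grad(w, X, y, tau, alpha):
    """(1/n) sum_i (1/(1 + exp((y_i - x_i^T w)/alpha)) - tau) * x_i."""
    u = np.clip((y - X @ w) / alpha, -50.0, 50.0)   # clip to avoid overflow in exp
    return X.T @ (1.0 / (1.0 + np.exp(u)) - tau) / len(y)

def fit_smooth_qr(X, y, tau, alpha=0.1, eta=0.05, iters=2000):
    """Gradient descent w^(m) = w^(m-1) - eta * grad for quantile level tau."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= eta * smooth_pinball_grad(w, X, y, tau, alpha)
    return w
```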
Unconstrained Smooth ε-Insensitive SVQR
$$\min_{w} \; \sum_{m=1}^{M} \left[ \frac{1}{2} w_m^\top K w_m + C \sum_{i=1}^{N} \left( \tau_m u_i + \alpha \log\Big(1 + \exp\big(-\tfrac{u_i}{\alpha}\big)\Big) \right) \right]$$

where $u_i = y_i - K_i w_m - \varepsilon$ and $\alpha > 0$. Quantile estimates are then given by

$$f_\tau(x) = \sum_{i=1}^{N} w_{i,\tau}\, k(x_i, x)$$
There is no need for the dual; the primal can be solved with gradient methods.
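A gradient-descent sketch of this primal for a single quantile level; the derivative of the smooth term follows from the gradient on the previous slide, $K$ is assumed symmetric, and the step size and clip are assumptions:

```python
import numpy as np

def fit_smooth_svqr(K, y, tau, C=1.0, eps=0.0, alpha=0.1, eta=1e-4, iters=5000):
    """Minimize 0.5 w^T K w + C sum_i [tau*u_i + alpha*log(1 + exp(-u_i/alpha))]
    with u_i = y_i - K_i w - eps, by full-batch gradient descent."""
    w = np.zeros(len(y))
    for _ in range(iters):
        u = y - K @ w - eps
        # d/du of the smooth pinball term: tau - 1/(1 + exp(u/alpha))
        s = tau - 1.0 / (1.0 + np.exp(np.clip(u / alpha, -50.0, 50.0)))
        w -= eta * (K @ w - C * (K @ s))  # chain rule with du/dw = -K (symmetric)
    return w  # then f_tau(x) = sum_i w_i k(x_i, x)
```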
Work Done
Probabilistic Forecasting by Support Vector Machines
[1] Hatalis, Kostas, et al. "Constrained Support Vector Quantile Regression for Nonparametric Probabilistic Prediction of Wind Power." AAAI Conference on Artificial Intelligence, Workshop on AI for Smart Grids and Smart Buildings, July 2017.