Constrained Support Vector Quantile Regression
for Conditional Quantile Estimation
By Kostas Hatalis
hatalis@gmail.com
Dept. of Electrical & Computer Engineering
Lehigh University, Bethlehem, PA
2016
Challenges in nonparametric probabilistic forecasting
Overall there are several challenges to consider when developing a
forecasting method:
Quantile cross-over problem
Multi-step forecasting
Rolling forecasting
Handle multidimensional features
Support Vector Machines
To address these challenges, I developed a support vector machine (SVM) formulation for forecasting:
$$\min_{w} \; \lambda \|w\|^2 + \frac{1}{N} \sum_{i=1}^{N} L\big(y_i, f(x_i, w)\big) \tag{1}$$

where $L(y_i, f(x_i, w))$ is a loss function. Why SVM?
good method to start before studying neural networks
robust to outliers
prevents over-fitting by regularization
handles multidimensional features
supports kernels for nonlinear modeling
Support Vector Machines

(Figure slide; image not preserved in this export.)
Objective Function
An SVM classifier amounts to minimizing the hinge loss function:
$$\min_{w,b} \; \frac{1}{n} \sum_{i=1}^{n} \max\big(0,\, 1 - y_i (w \cdot x_i + b)\big) + \lambda \|w\|^2 \tag{2}$$
where the parameter $\lambda$ determines the tradeoff between increasing the margin size and ensuring that each $x_i$ lies on the correct side of the margin. We can rewrite the optimization problem as a differentiable objective function:
$$\min_{w,\,b,\,\zeta} \; \frac{1}{n} \sum_{i=1}^{n} \zeta_i + \lambda \|w\|^2 \tag{3}$$

subject to

$$y_i (x_i \cdot w + b) \geq 1 - \zeta_i, \qquad \zeta_i \geq 0, \quad \text{for all } i.$$
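As a concrete illustration, here is a minimal NumPy sketch of minimizing the hinge-loss objective (2) by subgradient descent; the learning rate, iteration count, and data shapes are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def fit_linear_svm(X, y, lam=0.01, eta=0.1, iters=1000):
    """Subgradient descent on (1/n) sum max(0, 1 - y_i(w.x_i + b)) + lam*||w||^2.
    X: (n, d) features; y: (n,) labels in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(iters):
        margins = y * (X @ w + b)
        active = margins < 1.0                      # points violating the margin
        grad_w = 2.0 * lam * w - (y[active, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= eta * grad_w
        b -= eta * grad_b
    return w, b
```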
SVQR - Support Vector Regression
Support vector regression:
$$\min_{w,b} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} (\xi_i^- + \xi_i^+) \tag{4}$$

subject to

$$\begin{cases} y_i - f(x_i) \leq \varepsilon + \xi_i^- & \forall i \\ f(x_i) - y_i \leq \varepsilon + \xi_i^+ & \forall i \\ \xi_i^-,\, \xi_i^+ \geq 0 & \forall i \end{cases}$$
The constant $C > 0$ determines the trade-off between the flatness of $f$ and the amount up to which deviations larger than $\varepsilon$ are tolerated.
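For reference, the standard $\varepsilon$-insensitive SVR in (4) is available off the shelf; a minimal scikit-learn sketch follows (the toy data and hyperparameter values are illustrative assumptions only):

```python
from sklearn.svm import SVR
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 4))                 # toy features
y = X @ np.array([1.0, -0.5, 2.0, 0.0]) + 0.1 * rng.normal(size=200)

# C trades off flatness of f against tolerated deviations beyond epsilon.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X, y)
y_hat = model.predict(X)
```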
SVQR - Nonlinear Quantile Regression
Nonlinear Quantile Regression (NQR) projects an input vector $x$ into a potentially higher-dimensional feature space $\mathcal{F}$ using a nonlinear mapping function $\varphi(\cdot)$:

$$Q_y(\tau \mid x) = f_\tau(x) = w_\tau^\top \varphi(x)$$

where $Q_y(\tau \mid x)$ is the $\tau$-th quantile of the distribution of $y$ conditional on the values of $x$, and $w_\tau$ is a vector of parameters.
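Estimating $w_\tau$ rests on the pinball (quantile) loss, which later slides smooth; a minimal NumPy sketch of the plain loss (the vectorized form is an assumption of convenience):

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Average pinball (quantile) loss of predictions q at level tau.
    Penalizes under-prediction by tau and over-prediction by (1 - tau)."""
    u = y - q
    return np.mean(np.maximum(tau * u, (tau - 1.0) * u))
```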
SVQR - Primal
To solve the NQR problem, it can be expressed as a support vector regression formulation with non-crossing constraints:
$$\min_{w,\,\xi^-,\,\xi^+} \; \sum_{m=1}^{M} \left[ \frac{1}{2} \|w_m\|^2 + C \sum_{i=1}^{N} \big( \tau_m \xi_{mi}^+ + (1 - \tau_m)\, \xi_{mi}^- \big) \right]$$

s.t.

$$\begin{cases} y_i - w_m^\top \varphi(x_i) - \xi_{mi}^+ \leq 0, & \forall m, \forall i \\ -y_i + w_m^\top \varphi(x_i) - \xi_{mi}^- \leq 0, & \forall m, \forall i \\ \xi_{mi}^-,\, \xi_{mi}^+ \geq 0, & \forall m, \forall i \\ w_m^\top \varphi(x_i) - w_{m+1}^\top \varphi(x_i) \leq 0, & \forall m, \forall i \end{cases}$$
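A minimal CVXPY sketch of this primal, using a linear feature map $\varphi(x) = x$ for brevity; the toy data, $C$, and quantile levels are illustrative assumptions, and a kernelized version would replace $X$ with the actual feature map:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
N, d = 60, 3
taus = np.array([0.1, 0.5, 0.9])              # tau_1 < ... < tau_M
M = len(taus)
X = rng.normal(size=(N, d))                   # stands in for phi(x_i)
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=N)
C = 10.0
T = np.tile(taus[:, None], (1, N))            # tau_m repeated per sample
Y = np.tile(y, (M, 1))

W = cp.Variable((M, d))                       # one w_m per quantile level
xi_p = cp.Variable((M, N), nonneg=True)       # xi+_{mi}
xi_n = cp.Variable((M, N), nonneg=True)       # xi-_{mi}
F = W @ X.T                                   # F[m, i] = w_m^T phi(x_i)

obj = 0.5 * cp.sum_squares(W) + C * cp.sum(cp.multiply(T, xi_p)
                                           + cp.multiply(1 - T, xi_n))
cons = [Y - F <= xi_p,                        # y_i - w_m^T phi(x_i) <= xi+_{mi}
        F - Y <= xi_n]                        # w_m^T phi(x_i) - y_i <= xi-_{mi}
cons += [F[m, :] <= F[m + 1, :] for m in range(M - 1)]  # non-crossing
cp.Problem(cp.Minimize(obj), cons).solve()
```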
SVQR - Dual
This primal is not easy to solve directly, so I form the Lagrangian dual problem, which is conveniently a quadratic programming problem:
$$\begin{aligned} \min_{\alpha^+,\,\alpha^-,\,\lambda} \; \sum_{m=1}^{M} \Bigg[ & -\frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_{mi}^+ - \alpha_{mi}^-)(\alpha_{mj}^+ - \alpha_{mj}^-)\, K(x_i, x_j) + \sum_{i=1}^{N} (\alpha_{mi}^+ - \alpha_{mi}^-)\, y_i \\ & - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} (\lambda_{mi} - \lambda_{m-1,i})(\lambda_{mj} - \lambda_{m-1,j})\, K(x_i, x_j) \\ & + \sum_{i=1}^{N} \sum_{j=1}^{N} (\alpha_{mi}^+ - \alpha_{mi}^-)(\lambda_{mj} - \lambda_{m-1,j})\, K(x_i, x_j) \Bigg] \end{aligned}$$

subject to

$$\begin{cases} \lambda_{mi} \geq 0, & \forall m, \forall i \\ \alpha_{mi}^+ \in [0,\, \tau_m C], & \forall m, \forall i \\ \alpha_{mi}^- \in [0,\, (1 - \tau_m) C], & \forall m, \forall i \end{cases}$$
SVQR - Quantile Estimation
From this dual formulation, the conditional quantile at level $\tau_m$ is then given by
$$Q_y(\tau_m \mid x) = f_{\tau_m}(x) = \sum_{i=1}^{N} (\alpha_{mi}^+ - \alpha_{mi}^-)\, K(x, x_i) - \sum_{i=1}^{N} (\lambda_{mi} - \lambda_{m-1,i})\, K(x, x_i)$$
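Given solved dual variables, evaluating this estimator is just a kernel-weighted sum; a minimal sketch, where the variable names are assumptions for illustration:

```python
import numpy as np

def predict_quantile(alpha_p, alpha_n, lam_m, lam_prev, K_test):
    """f_{tau_m}(x_j) for each test point x_j.
    alpha_p, alpha_n: dual vectors alpha+_{mi}, alpha-_{mi} of length N.
    lam_m, lam_prev: lambda_{mi} and lambda_{m-1,i} (lam_prev = 0 for m = 1).
    K_test[j, i] = K(x_j, x_i) between test and training points."""
    return K_test @ ((alpha_p - alpha_n) - (lam_m - lam_prev))
```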
SVQR - Kernel and Optimization
Given two samples $x$ and $x'$ represented as feature vectors, the radial basis function (RBF) kernel is calculated as

$$K(x, x') = \varphi(x)^\top \varphi(x') = \exp\left( -\frac{\|x - x'\|^2}{2\sigma^2} \right)$$
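The Gram matrix for this kernel vectorizes neatly; a minimal NumPy sketch (the squared-distance expansion is standard):

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    """K[i, j] = exp(-||x_i - x'_j||^2 / (2 sigma^2)) for rows of X1, X2."""
    sq_dist = (np.sum(X1**2, axis=1)[:, None]
               + np.sum(X2**2, axis=1)[None, :]
               - 2.0 * X1 @ X2.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))  # clip tiny negatives
```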
To solve quickly for the conditional quantile estimates, sequential minimal optimization (SMO) is applied. Previously I tried the interior-point convex method (common for QP), but it was very slow.
SVQR - Wind Features and Data Selection
Case study: rolling forecasts using GEFCom2014 data.
Testing data: June 2013 to August 2013.
Training data: March 2013 to July 2013.
1. Raw wind speeds at 10 m and 100 m for the U and V directions.
2. Derived wind speeds at 10 m and 100 m.
3. Derived wind direction at 10 m and 100 m.
4. Derived wind energy at 10 m and 100 m.
5. Wind shear.
6. Wind energy difference.
7. Wind direction difference.
All features were normalized between 0 and 1.
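The normalization step is plain min–max scaling; a one-function NumPy sketch (column-wise, guarding against constant features):

```python
import numpy as np

def minmax_scale(F):
    """Scale each feature column of F to [0, 1]."""
    lo, hi = F.min(axis=0), F.max(axis=0)
    return (F - lo) / np.where(hi > lo, hi - lo, 1.0)
```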
SVQR - Benchmarks
Three naive models are commonly used for benchmarking in probabilistic wind forecasting applications (minimal sketches of all three follow the list):
Persistence distribution: formed by the most recent observations
such as the past 24 hours of wind power.
Climatology distribution: based on all available past wind power
observations.
Uniform distribution: assumes all wind power values at each time
step occur with equal probability.
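A NumPy sketch of the three benchmark quantile generators; the quantile levels and window length are assumptions for illustration:

```python
import numpy as np

taus = np.linspace(0.05, 0.95, 19)   # assumed quantile levels

def persistence_quantiles(power_history, window=24):
    """Empirical quantiles of the most recent `window` observations."""
    return np.quantile(power_history[-window:], taus)

def climatology_quantiles(power_history):
    """Empirical quantiles of all available past observations."""
    return np.quantile(power_history, taus)

def uniform_quantiles():
    """Uniform on [0, 1]: the tau-quantile of U(0, 1) is tau itself."""
    return taus.copy()
```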
SVQR - Results

(Results figure; image not preserved in this export.)
Future Work: Deep SVQR
What is Deep SVQR?
There is a lot of research interest in deep kernel learning and neural network hybrid SVMs.
The goal is to further enhance SVQR performance through better feature selection.
A combination of neural networks for feature learning and SVQR for forecasting quantiles.
Optimization of the neural networks is directly linked to the optimization of the SVQR objective.
Future Work: Deep SVQR

(Figure slide; image not preserved in this export.)
Smooth Pinball Function
The new objective function to minimize for smooth quantile regression is

$$\Phi_\alpha(w) = \frac{1}{n} \sum_{i=1}^{n} S_{\tau,\alpha}(y_i - x_i^\top w)$$

where $S_{\tau,\alpha}(u) = \tau u + \alpha \log\big(1 + \exp(-u/\alpha)\big)$ is the smooth approximation of the pinball loss (the same smooth term used on the next slide). The gradient vector of the above is

$$\nabla \Phi_\alpha(w) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{1}{1 + \exp\!\big( \tfrac{y_i - x_i^\top w}{\alpha} \big)} - \tau \right) x_i$$

For a training iteration $m$, gradient descent can be applied as

$$w^{(m)} = w^{(m-1)} - \eta\, \nabla \Phi_\alpha\big(w^{(m-1)}\big)$$

where $\eta$ is the learning rate.
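A minimal NumPy sketch of this gradient descent for one quantile level; the step size, iteration count, and overflow clip are assumptions of convenience:

```python
import numpy as np

def smooth_pinball_grad(w, X, y, tau, alpha):
    """(1/n) sum_i (1/(1 + exp((y_i - x_i^T w)/alpha)) - tau) * x_i."""
    u = np.clip((y - X @ w) / alpha, -50.0, 50.0)   # clip to avoid overflow in exp
    return X.T @ (1.0 / (1.0 + np.exp(u)) - tau) / len(y)

def fit_smooth_qr(X, y, tau, alpha=0.1, eta=0.05, iters=2000):
    """Gradient descent w^(m) = w^(m-1) - eta * grad for quantile level tau."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        w -= eta * smooth_pinball_grad(w, X, y, tau, alpha)
    return w
```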
Unconstrained Smooth ε-Insensitive SVQR
$$\min_{w} \; \sum_{m=1}^{M} \left[ \frac{1}{2} w_m^\top K w_m + C \sum_{i=1}^{N} \left( \tau_m u_i + \alpha \log\Big(1 + \exp\big(-\tfrac{u_i}{\alpha}\big)\Big) \right) \right]$$

where $u_i = y_i - K_i w_m - \varepsilon$ and $\alpha > 0$. Quantile estimates are then given by

$$f_\tau(x) = \sum_{i=1}^{N} w_{i,\tau}\, k(x_i, x)$$
There is no need for the dual; the primal can be solved with gradient methods.
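A gradient-descent sketch of this primal for a single quantile level; the derivative of the smooth term follows from the gradient on the previous slide, $K$ is assumed symmetric, and the step size and clip are assumptions:

```python
import numpy as np

def fit_smooth_svqr(K, y, tau, C=1.0, eps=0.0, alpha=0.1, eta=1e-4, iters=5000):
    """Minimize 0.5 w^T K w + C sum_i [tau*u_i + alpha*log(1 + exp(-u_i/alpha))]
    with u_i = y_i - K_i w - eps, by full-batch gradient descent."""
    w = np.zeros(len(y))
    for _ in range(iters):
        u = y - K @ w - eps
        # d/du of the smooth pinball term: tau - 1/(1 + exp(u/alpha))
        s = tau - 1.0 / (1.0 + np.exp(np.clip(u / alpha, -50.0, 50.0)))
        w -= eta * (K @ w - C * (K @ s))  # chain rule with du/dw = -K (symmetric)
    return w  # then f_tau(x) = sum_i w_i k(x_i, x)
```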
Work Done
Probabilistic Forecasting by Support Vector Machines
[1] Hatalis, Kostas, et al. "Constrained Support Vector Quantile Regression for Nonparametric Probabilistic Prediction of Wind Power." AAAI Conference on Artificial Intelligence, Workshop on AI for Smart Grids and Smart Buildings, July 2017.