Dynamic Programming Control
for Smart Home
Xuejiao HAN
Institute for Data Processing
Technische Universität München
Master’s thesis
Dynamic Programming Control for Smart
Home
Xuejiao HAN
September 23, 2015
Xuejiao HAN. Dynamic Programming Control for Smart Home. Master's thesis, Technische Universität München, Munich, Germany, 2015.
Supervised by Prof. Dr.-Ing. K. Diepold and Johannes Feldmaier / Dominik Meyer; submitted on September 23, 2015 to the Department of Electrical Engineering and Information Technology of the Technische Universität München.
© 2015 Xuejiao HAN
Institute for Data Processing, Technische Universität München, 80290 München, Germany, http://www.ldv.ei.tum.de.
This work is licensed under the Creative Commons Attribution 3.0 Germany License. To view a copy of this licence, visit http://creativecommons.org/licenses/by/3.0/de/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California 94105, USA.
Preface
It brings me great pleasure to thank the people who helped to make this thesis possible.
I wish to express my sincere thanks to the Department of Electrical Engineering and Information Technology at TU München for hosting the master's thesis.
I am very grateful to Prof. Klaus Diepold, who gave me this opportunity to step into the field of data processing. I also want to thank my supervisors, Dominik Meyer and Johannes Feldmaier, for their supervision, support and valuable advice in many regards in elaborating this interesting thesis.
I would like to thank my parents and my friends for their love and support. A special thanks also goes to my friend Ke Wang, who spent a lot of time discussing the thesis with me and helped me a lot within this project. I could not have finished this thesis without their sacrifices and contributions.
Xuejiao HAN
September 23, 2015
Contents
1. Introduction
  1.1. Photovoltaic Generation in Germany
  1.2. Economics of Residential PV System
    1.2.1. Feed-in Tariff
    1.2.2. Cost Calculation of a Sample Residential PV System
  1.3. Energy Demand and Supply
  1.4. Energy Management
  1.5. Structure of the Work
2. Theories
  2.1. Markov Decision Process
  2.2. Linear Programming
  2.3. Dynamic Programming
  2.4. Approximate Dynamic Programming
    2.4.1. Policies
    2.4.2. Method for Approximating Functions
3. Problem Statement
  3.1. Electric System Model
    3.1.1. State of the System
    3.1.2. Decisions
    3.1.3. Transition Functions
    3.1.4. Objective Functions
    3.1.5. Constraints
  3.2. Boundary Conditions
    3.2.1. Princeton Energy Storage Benchmark Datasets
    3.2.2. KNUBIX Dataset
    3.2.3. Additional Boundary Conditions
4. Learning Methodology
  4.1. Overall Implemented Structure
  4.2. Rule-based Control
    4.2.1. Without Battery
    4.2.2. With Battery
  4.3. Simple Threshold Control
  4.4. Linear Programming Formulation
    4.4.1. Scenario A
    4.4.2. Scenario B
  4.5. Dynamic Programming Formulation
    4.5.1. DP Formulation for Deterministic Problems
    4.5.2. Scenario A
    4.5.3. Scenario B
    4.5.4. DP Formulation for Stochastic Problems
  4.6. Approximate Dynamic Programming Formulation
    4.6.1. A Linear Lookup Table Approximation
    4.6.2. SPAR Algorithm
5. Evaluation of the Approach
6. Conclusions and Future Work
A. Appendix
List of Figures
1.1. Global annual solar irradiance on a horizontal surface in Germany between 1981 and 2010 [DWD]
1.2. Munich Monthly PV Production for System Size 5.5 kW DC
1.3. Munich Monthly Solar Radiation for System Size 5.5 kW DC
1.4. Development of feed-in tariff for small rooftop PV systems under 10 kWp [IEA-PVPS]
1.5. Previous development of the feed-in tariff and retail consumer tariff (Sources: IEA-PVPS, BDEW; retail electricity price: average residential tariffs for a 3-person household consuming 3,500 kWh of electricity per year)
1.6. Average hourly load profile from Germany for different quarters in 2014 [ENTSO-E]
1.7. Structure of the average hourly load curve on Weekdays
1.8. Structure of the average hourly load curve on Weekends
2.1. The flow of interaction in MDP
3.1. Energy flow of the electric system
3.2. Electricity Price Tariff [SWM, 2015]
3.3. Spot market price on the European Power Exchange (EPEX) on 29.03.2015
4.1. Flow chart of the learning methodology
4.2. Profiles of power flow among PV system without battery, load and grid for rule-based algorithm for the 29th of March
4.3. Profiles of power flow among PV system with battery, load and grid for rule-based algorithm for the 29th of March
4.4. SOC schedule of batteries with rule-based algorithm against PV generation
4.5. SOC schedule of batteries with rule-based algorithm against variable electricity tariffs
4.6. Profiles of power flow among PV system, load and grid for threshold algorithm with battery for the 29th of March
4.7. SOC schedule of batteries for threshold algorithm against solar generation
4.8. SOC schedule of batteries for threshold algorithm against variable electricity tariffs
4.9. Profiles of power flow among PV system with battery, load and grid for LP algorithm for the 29th of March under scenario A
4.10. SOC schedule of batteries for LP algorithm against solar generation under scenario A
4.11. SOC schedule of batteries for LP algorithm against SWM electricity tariff
4.12. Profiles of power flow among PV system, load and grid for LP algorithm with battery for the 29th of March under scenario B
4.13. SOC schedule of batteries for LP algorithm against solar generation under scenario B
4.14. SOC schedule of batteries for LP algorithm against EPEX market price
4.15. Forward DP algorithm flowchart
4.16. Path search in forward DP algorithm
4.17. Optimal deterministic forward DP storage algorithm against electricity price for the system in different cases under scenario A
4.18. Optimal deterministic forward DP storage algorithm against PV generation for the system in different cases under scenario A
4.19. Optimal deterministic forward DP storage algorithm power profile for the system in different cases under scenario A
4.20. Optimal deterministic forward DP storage algorithm against electricity price for the system in different cases under scenario B
4.21. Optimal deterministic forward DP storage algorithm against PV generation for the system in different cases under scenario B
4.22. Optimal deterministic forward DP storage algorithm power profile for the system in different cases under scenario B
4.23. Results of linear ADP algorithm and sample path from test problem S1
4.24. Approximate path obtained by linear ADP vs. optimal path
4.25. Results of piecewise linear ADP algorithm (a=1) and sample path from test problem S1
4.26. Approximate path obtained by piecewise linear ADP (a=1) vs. optimal path
4.27. Results of piecewise linear ADP algorithm (a=10) and sample path from test problem S1
4.28. Approximate path obtained by piecewise linear ADP (a=10) vs. optimal path
4.29. Results of piecewise linear ADP algorithm (a=100) and sample path from test problem S1
4.30. Approximate path obtained by piecewise linear ADP (a=100) vs. optimal path
4.31. Objective values for different stepsize rule parameters
Abstract
The primary purpose of this thesis is to devise near-optimal control policies for a grid-connected residential photovoltaic (PV) system with a storage device. The task is formulated as a dynamic multi-period energy storage optimization problem and solved using different algorithms.
We begin with some simple algorithms, namely rule-based control, simple threshold control and linear programming, which do not consider the stochastic characteristics of solar energy, energy demand and electricity price. However, linear programming (LP) does provide an optimal result for deterministic problems and serves as a benchmark against which to compare the other algorithms. Then we implement a dynamic programming (DP) algorithm, which is formulated as a recursive process and proceeds one step at a time. To overcome the “curse of dimensionality” that occurs in dynamic programming, we construct two approximate dynamic programming (ADP) algorithms: one using a linear regression model and the other using a piecewise linear approximation model. Since the accuracy of the approximation in approximate dynamic programming is sample-based, we used the Princeton energy storage benchmark datasets to improve the policy and compared it to the optimal policy obtained from the benchmark dataset. The particularity of this thesis lies in the consideration of a real-life residential photovoltaic system. Simulations were carried out over one exemplary day, based on data from a real residential PV system with battery. The optimization policies are developed according to an analysis of Germany's current energy policy and have high application potential.
Computational results show that, compared to the DP algorithm, the ADP algorithms can achieve near-optimal performance with reasonable computational time. Comparative results of all methods are provided and analyzed.
1. Introduction
According to a study by Fraunhofer ISE, renewable energy (RE) as a whole reached approximately 31% of Germany's gross power consumption in 2014. The long-term minimum targets of the German government are 35% by 2020, 50% by 2030 and 80% by 2050, ultimately increasing the share of renewable energy in the country's overall electricity consumption.
Among all renewable energies, photovoltaics (PV) is regarded as a major part. In 2014, PV-generated power totaled 35.2 TWh and covered around 6.9% of Germany's net electricity consumption and roughly 6.1% of Germany's gross electricity consumption [BDEW, 2015]. On sunny weekdays, PV power can at times cover 35% of the momentary electricity demand, and on weekends and holidays up to 50%. A study conducted by Royal Dutch Shell, entitled “New Lens Scenarios”, projects that PV will grow into the most important primary energy source by 2060.
PV energy is attractive from economic and environmental perspectives on one hand, and on the other hand it reduces grid operating and transmission costs. A further advantage of feeding in PV is that, in addition to feeding in real power, PV plants may contribute towards improving grid stability and quality.
In this introduction, the current situation and state policies for residential rooftop photovoltaic systems are analyzed. Additionally, photovoltaic generation and consumption in Germany are discussed.
1.1. Photovoltaic Generation in Germany
Figure 1.1 shows levels of irradiance across Germany. The average total horizontal irradiance in Germany between 1981 and 2010 stands at 1,055 kWh/m² per year and fluctuates according to location between approximately 951 kWh/m² and 1,257 kWh/m² per year [DWD].
The average daily solar insolation value¹ on a flat-plate PV system determined using PVWatts for Munich is about 3.42 kWh/(m²·d). Figure 1.2 shows the monthly photovoltaic production and solar radiation for Munich with a system size of 5.5 kW DC (see Appendix A for details of the PV system).
Solar radiation in Munich ranges from 1 kWh/m² per day to 6 kWh/m² per day across the seasons, while PV production varies between 129 kWh and 768 kWh per month.
¹ It refers to the solar insolation which a particular location would receive if the sun were shining at its maximum value for a certain number of hours. Since the peak solar radiation is 1 kW/m², the number of peak sun hours is numerically identical to the average daily solar insolation.
Figure 1.1.: Global annual solar irradiance on a horizontal surface in Germany between 1981 and
2010 [DWD]
Calculations show that the yearly PV production of the sample PV system in Munich is 5,512 kWh, while an average 4-person household consumes about 5,009 kWh of electricity per year. This means that the electricity production of the sample PV system would be sufficient to supply the equivalent of a 4-person family's annual electricity needs.
1.2. Economics of Residential PV System
In recent years, the decrease in investment and electricity generation costs has made PV systems increasingly attractive. An analysis published by BSW-Solar, the German Solar Industry Association, demonstrates that system prices have fallen by more than 50% in the last few years, and the average price for PV rooftop systems of less than 10 kW arrived at around 1,640 EUR/kWp in 2014. Furthermore, the Levelized Cost of Energy (LCOE)² for a small rooftop PV system in Germany is around 0.16 EUR/kWh, whereas the electricity price for a private household is around 0.25 EUR/kWh. Moreover, PV energy should be regarded as an economical choice due to its negligible marginal costs.

Figure 1.2.: Munich Monthly PV Production for System Size 5.5 kW DC (PV production in kWh per month)
1.2.1. Feed-in Tariff
To encourage the development of renewable energy techniques, The German Renewable
Energy Sources Act (German: Erneuerbare-Energien-Gesetz, EEG) has been introduced
and came into force in the year of 2000. The EEG accelerated the German energy tran-
sition from fossil and atomic energy to green energy significantly. Figure 1.4 shows the
development of the feed-in tariff (FiT) for small rooftop systems (< 10 kW) since 2001. All
rates are guaranteed for an operation period of 20 years, independent of the start-up date.
Fifteen years have passed since Germany introduced the feed-in tariff (FiT) system in 2000. For a long time, feed-in tariffs were significantly higher than the average residential electricity tariff (see Figure 1.5), which resulted in a feed-in period from 2001 to 2011. During this period, private rooftop PV system owners preferred selling electricity to the power grid over consuming the electricity themselves, for they could buy electricity from the grid at a lower price and benefit from the feed-in tariff policy. In recent years, the EEG feed-in tariff for PV energy has decreased dramatically, while the electricity price has moved strongly in the opposite direction. Since the beginning of 2012, newly installed small rooftop installations (<10 kW) have achieved grid parity³.
² The LCOE represents the per-kilowatt-hour cost of building and operating a generating plant over an assumed financial life and duty cycle, and is often cited as a convenient summary measure of the overall competitiveness of different generating technologies.
³ Grid parity occurs when an alternative energy source can generate electricity at a levelized cost of energy that is less than or equal to the price of buying electricity from the power grid.
Figure 1.3.: Munich Monthly Solar Radiation for System Size 5.5 kW DC (solar radiation in kWh/m²/day per month)
From this intersection point (between 2011 and 2012) onwards, self-consumption has become the most attractive and profitable business model for every new PV system owner. In 2015, the feed-in tariff for one kilowatt-hour (kWh) of electricity from PV has decreased to 12.56 Eurocents, while one kilowatt-hour from the grid costs 28.81 Eurocents, more than twice the feed-in price. The increasing gap between the retail electricity price and the feed-in tariff encourages PV owners to maximize their self-consumption rate.
In order to increase the self-supply ratio, a smaller-sized PV system is expected to be an economical option for the future, owing to the inverse relationship between PV system size and self-consumption rate. Besides reducing the system size, a residential battery system (RBS), which allows a load shift between electricity peak and off-peak hours, can be regarded as a solution to realize a higher self-consumption rate. This kind of energy management is also able to cope with hourly, daily and seasonal fluctuations in PV power generation. The residential battery system is discussed further in Section 1.4.
1.2.2. Cost Calculation of a Sample Residential PV System
Depending on irradiance and performance ratio (PR), specific yields of around 900-950 kWh/kWp are typically generated in Germany, and up to 1,000 kWh/kWp in sunnier regions. To satisfy the electricity consumption of a 4-person household, about 20 typical “150 watt” PV modules are required, which corresponds to 20 square meters of panels.
According to the aforementioned information, the sample PV system size for a 4-person household in Munich is chosen to be 5.5 kW DC. A simplified cost calculation for this sample residential PV system is as follows.
Figure 1.4.: Development of feed-in tariff for small rooftop PV systems under 10 kWp [IEA-PVPS] (Eurocents/kWh per year, 2001-2015)

The investment time is assumed to be 20 years. To simplify the calculation process, the electricity price and feed-in tariff are assumed to be invariable, and the energy demand is covered by PV generation.
• Installation cost: EUR 18,467
  PV system (inverter efficiency 96%): 5.73 kWp × EUR 1,600 per kWp = EUR 9,167
  Battery system with 5.5 kWh (KNUT Basix): EUR 9,300
• Average energy consumption: 5,009 kWh per year
• Average energy generation (PVWatts): 5,512 kWh per year
• Self-consumption (without consideration of the 60% feed-in constraint):
  5,009 kWh × 0.35 EUR/kWh × 20 years = EUR 35,063
• Feed-in (5,512 kWh − 5,009 kWh = 503 kWh):
  503 kWh × 0.12 EUR/kWh × 20 years = EUR 1,207
• Profit: EUR 35,063 + EUR 1,207 − EUR 18,467 = EUR 17,803
However, in real situations the hourly, daily, weekly and seasonal fluctuations in PV power generation cannot be ignored. This means that a 100% utilization rate of PV generation is only an ideal level, which cannot be achieved in practice. To ensure a higher utilization rate of PV energy and a higher self-consumption rate, we need to manage the demand via signals from the PV system, the battery system and the power grid, as well as electricity price and feed-in tariff signals from the energy market.
Figure 1.5.: Previous development of the feed-in tariff and retail consumer tariff, in Eurocents/kWh, 2005-2015 (Sources: IEA-PVPS, BDEW; retail electricity price: average residential tariffs for a 3-person household consuming 3,500 kWh of electricity per year; the plot distinguishes a feed-in period, a grid-parity period and a self-consumption period)
1.3. Energy Demand and Supply
Knowledge of household electricity consumption is essential for the development of smart grid integration strategies. Most of the available data focus on aggregated results like total electricity demand or yearly residential electricity consumption. However, when managing a smart home with a photovoltaic (PV) system and a storage device, it is important to obtain detailed information.
Residential electricity consumer data are well protected in Europe due to privacy concerns. Only a few companies monitor electricity consumption at the residential level, and they are not keen on sharing these load profiles.
Figure 1.6 describes the average hourly load profile for Germany for each quarter in 2014. Figure 1.7 and Figure 1.8 present the results of the average load curve per household on weekdays and weekends respectively. The data originate from a household electricity usage report from Intertek, based on a survey of 251 households in England that was undertaken to monitor the electrical power demand and energy consumption during the period from May 2010 to July 2011. Compared to Figure 1.6, it can be seen that the overall patterns of the hourly residential load curve and the hourly national load curve are alike.
These figures show that the electricity consumption on weekends is notably higher than that on weekdays. The use of washing machines is presumed to be mainly responsible for this difference. The load peaks on weekdays and weekends both occur in the late afternoon, around 6 pm.
Figure 1.6.: Average hourly load profile from Germany for different quarters in 2014 [ENTSO-E] (power in MW per hour of day, one curve per quarter)
1.4. Energy Management
One of the most important fields of research and development (R&D) in the German PV industry is the economical operation of grid-connected and off-grid PV system solutions, including energy management and a storage system. Storage devices at home allow shifting the consumption of PV power, which reduces peak demand and increases self-consumption at the same time. A study from Fraunhofer ISE indicates that grid-optimized PV/battery operation reduces the feed-in peak of all systems by about 40% (1). The potential value of a storage device in managing the intermittency of renewable-source electricity is discussed in (2).
Therefore, research efforts today are concentrated on efficient energy storage systems (ESS). Much of the recent research seems to be focused on weather prediction and household electricity consumption estimation, besides energy flow optimization (3), (4). Different methods to predict electricity consumption are summarized and compared in (5). In this thesis, only the optimized control of energy storage and flow is discussed.
1.5. Structure of the Work
The structure of the thesis is as follows. Chapter 2 presents a literature overview of various optimization algorithms, focusing on those that are appropriate for energy management problems. As one representative, the approximate dynamic programming algorithm is introduced, along with its characteristics and realization methods. Afterwards, Chapter 3 specifies the analysis and mathematical modelling of a storage problem and provides a brief description of the boundary conditions for the proposed system. In Chapter 4, potential solutions to the problem are discussed. An overview of the implementations is given, using the datasets provided by Princeton and the data supplied by the manufacturer “KNUBIX”. Thereafter, the validity and quality of the different algorithms are evaluated in Chapter 5. Chapter 6 summarizes the main conclusions of the thesis and proposes topics for future work. The Appendix contains additional model information.

Figure 1.7.: Structure of the average hourly load curve on Weekdays
Figure 1.8.: Structure of the average hourly load curve on Weekends
2. Theories
In this chapter the theories used in this work are discussed. The first section comprises the features of a Markov decision process. The second part introduces the Linear Programming method, which is a simple static solution based on a simplification of the problem. The last part begins with a brief introduction to Dynamic Programming and then details the key components and basic concepts of Approximate Dynamic Programming.
2.1. Markov Decision Process
The Markov property, which is named after the Russian mathematician Andrey Markov, describes the memoryless property of a stochastic process: given the present state s and the action a, the next state s′ is independent of all previous states and actions. A Markov Decision Process (MDP) is a discrete-time stochastic control process which satisfies the Markov property and provides a framework for making optimal decisions.
An MDP model contains:
• a finite set of states S,
• a finite set of actions A,
• a reward function R(s, a),
• a state transition probability matrix Pa(s, s′), which describes the probability of moving from state s at time t to the next state s′ under action a.
For a discounted Markov decision process, another key ingredient is the discount factor γ ∈ [0, 1], which represents the effect of future rewards on the present decision.
In an MDP, the probabilistic sequential model can be described as follows (6). At each discrete time step, the decision maker measures the current state of the environment and executes an action according to the current policy. As a result, the process moves to a next state s′ with a certain probability, and an immediate reward is received by the decision maker after the transition from the current state s to the next state s′. The rewards and transition probabilities are functions of the current state s and the action a. In real situations, the reward can be the profit in asset acquisition problems, or the cost of time or the length of a path in transportation problems. It can also be a function that takes several factors with different weights into consideration. A sequence of rewards is received over the course of the simulation, and the goal of the algorithm is to maximize the cumulative reward at the end of the whole simulation period. This process is represented in Figure 2.1.
Figure 2.1.: The flow of interaction in MDP (the agent sends an action to the environment; the environment returns the next state and a reward)
According to the type of transition process, MDP problems can be divided into two classes: deterministic problems and stochastic problems. In deterministic problems, the next state s′ is determined once the present state s and the action a are given, while the next state in a stochastic MDP is uncertain even when the current state and the action are known. The one-step transition probability $P^a_{ss'}$, which can be described as
$$P^a_{ss'} = \Pr\{s_{t+1} = s' \mid s_t = s,\ a_t = a\}, \qquad (2.1)$$
and the expected value of the next reward $R^a_{ss'}$, which is
$$R^a_{ss'} = \mathbb{E}\{r_{t+1} \mid s_t = s,\ a_t = a,\ s_{t+1} = s'\}, \qquad (2.2)$$
completely specify the dynamics of a finite MDP (7). The MDP is very useful in dynamic programming, since the accumulated rewards are assumed to be a function of the current state only. MDPs can be solved via linear programming or dynamic programming.
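To make these ingredients concrete, the following Python sketch (not part of the original thesis; the two states, the transition probabilities and the rewards are all invented for illustration) simulates the agent-environment loop of Figure 2.1 for a toy MDP and accumulates the discounted return:

```python
import random

# A toy finite MDP. States coarsely describe a battery level; transition
# probabilities P[(s, a)] and rewards R[(s, a)] are made-up numbers.
states = ["low", "high"]
actions = ["charge", "discharge"]
P = {
    ("low", "charge"):     {"low": 0.2, "high": 0.8},
    ("low", "discharge"):  {"low": 1.0, "high": 0.0},
    ("high", "charge"):    {"low": 0.0, "high": 1.0},
    ("high", "discharge"): {"low": 0.7, "high": 0.3},
}
R = {("low", "charge"): -1.0, ("low", "discharge"): 0.0,
     ("high", "charge"): -0.5, ("high", "discharge"): 2.0}
gamma = 0.95  # discount factor

def step(s, a):
    """Sample the next state s' from P(s'|s, a) and return (s', reward)."""
    dist = P[(s, a)]
    s_next = random.choices(list(dist), weights=list(dist.values()))[0]
    return s_next, R[(s, a)]

# Agent-environment loop: measure the state, act according to a fixed policy,
# receive the reward, and accumulate the discounted return.
policy = {"low": "charge", "high": "discharge"}
s, total = "low", 0.0
for t in range(24):
    a = policy[s]
    s, r = step(s, a)
    total += gamma ** t * r
print("discounted return of this sample path:", total)
```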
2.2. Linear Programming
Linear programming (LP) is an approach used to optimize a linear objective function subject to constraints and bounds. The objective function, which is to be maximized or minimized, is formed as a linear combination of a series of decision variables $x = (x_1, x_2, \dots, x_n)$:
$$f = c_1 x_1 + c_2 x_2 + \dots + c_n x_n.$$
Constraints can be divided into linear equality and linear inequality constraints. The simplest and most popular constraint is the requirement that all decision variables be non-negative. Thus, to determine the optimal decision vector x, LP problems can be expressed in a standard form:
$$\begin{aligned}
\text{maximize}\quad & c_1 x_1 + c_2 x_2 + \dots + c_n x_n \\
\text{subject to}\quad & a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n \le b_1, \\
& a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n \le b_2, \\
& \quad\vdots \\
& a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n \le b_m, \\
\text{and}\quad & x_1, x_2, \dots, x_n \ge 0,
\end{aligned}$$
where $a_{11}, \dots, a_{mn}, b_1, \dots, b_m, c_1, \dots, c_n$ are constants.
A proposal of specific values for the decision variables is called a solution (8). A solution to an LP problem is feasible if it satisfies all constraints. Among all feasible solutions, the one that attains the maximum or minimum objective is called the optimal solution. Conversely, a solution is infeasible if it violates any of the constraints. A third situation is called unbounded, in which case the optimal objective value is infinitely large.
The LP optimization algorithm is applied essentially to one-stage problems, while another important field of application of LP is multi-stage optimization. A multi-stage linear program may be referred to as a dynamic model and can be formulated as a linear problem with a dynamic matrix. For deterministic problems, the sum of the sub-problems can be regarded as a new large-scale linear programming problem, and the effectiveness of this method is based on an accurate estimation and prediction of the energy price, the demand and the amount of exogenous energy information in the future (9). The principle of this method is simple, and it is easy to develop the algorithm for a multi-stage optimization problem. However, the drawback of this algorithm is obvious: when computing large-scale problems with many more periods, there will be numerous equality and inequality constraints with numerous parameters. In this case, extraordinarily high computational and time costs are incurred, which can make the problem intractable. For stochastic problems, in which case there is an unpredictable disturbance in the system, the problem needs to be solved over all possibilities, and this results in high computational cost. The elements in the constraint matrices will be functions of several parameters and vary stochastically from time to time. Stochastic linear programs were first introduced by (10).
The problem we consider here is a multi-period planning problem, which means the current decision cannot be decoupled from the decisions in future periods. Taking the electricity market as an example, if we produce more electricity than needed, the extra production might be stored and used in the next period, which results in a holding cost for the storage device and energy savings in the future.
There are different methods to deal with the linear optimization problem. The most common algorithms used in linear programming are the simplex method and interior point methods. Linear programming problems consist of continuous problems and discrete problems. To solve the discrete problems, mixed-integer linear programming (MILP) has to be used.
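As a minimal illustration of the standard form, the following sketch solves a small two-variable LP with SciPy's linprog (the thesis simulations were carried out in MATLAB; this Python example and all its coefficients are invented). Since linprog minimizes, the objective coefficients are negated:

```python
import numpy as np
from scipy.optimize import linprog

# Maximize f = 3*x1 + 2*x2  subject to  x1 + x2 <= 4,  x1 + 3*x2 <= 6,  x >= 0.
c = np.array([-3.0, -2.0])          # negate: linprog minimizes by default
A_ub = np.array([[1.0, 1.0],
                 [1.0, 3.0]])
b_ub = np.array([4.0, 6.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("optimal decision vector:", res.x)   # -> [4. 0.]
print("optimal objective:", -res.fun)      # -> 12.0
```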
2.3. Dynamic Programming
The term Dynamic Programming (DP) refers to a collection of algorithms which can be used to find an optimal policy that maximizes the cumulative return for a given model. Dynamic Programming was first developed by Professor Richard Bellman for solving multi-stage stochastic decision processes. Most early DP problems were formulated as calculus-of-variations problems and used a backward induction process to search for the optimal decisions; the application of DP to the control of deterministic processes was not anticipated (11).
There is a close relationship between dynamic programming and reinforcement learning. DP algorithms are used to solve optimization problems when a system model is available, while Reinforcement Learning (RL) algorithms are model-free and mainly focus on learning from the interactions between agents and environments. However, both DP problems and RL problems can be formulated as a Markov decision process (MDP) (12).
The principle of DP is to break a complex problem into a collection of simpler subproblems and store the results of the subproblems to avoid computing the same subproblem again. The overall optimal solution is the combination of the solutions of these subproblems. For a complex problem it is important to define the subproblems, as sometimes these smaller subproblems are not obvious. After division into a sequence of subproblems, the key to obtaining solutions in DP is the use of a value function to search for a good policy, as described in the following sections.
All dynamic programming problems can be written in a recursive way, relating the value of the current state at a particular point in time to the value of the state we transition to at the next time point. To do so, we define a value function $V_t(S_t)$, which represents the value of being in state $S_t$ at time point t. Compared to the reward $C_t(S_t, x_t)$, which evaluates the result of an action in an immediate sense, values can be considered as cumulative rewards in the long run. The basic idea of the recursive form is to take the effects of future states into consideration. This equation is known as the dynamic programming equation or Bellman equation and can be written as
$$V_t(S_t) = \max_{x} \Big( C_t(S_t, x_t) + \sum_{S_{t+1} \in \Omega} P(S_{t+1} \mid S_t)\, V_{t+1}(S_{t+1}) \Big), \qquad (2.3)$$
where $P(S_{t+1} \mid S_t)$ is the probability of transitioning from state $S_t$ to the next state $S_{t+1}$ at time point t, reflecting the uncertainty in stochastic problems, and $\Omega$ is the set of possible next states. For deterministic problems $P(S_{t+1} \mid S_t) = 1$ or $0$.
Normally, we need to discretize the state variable of the optimization problem and derive the optimal decision (policy) at each time point t using backward recursion. For deterministic problems, it is not difficult to use backward recursion to solve the equation. If both the current cost $C_t$ and the value $V_{t+1}(S_{t+1})$ of the next state at the next time point are known and can be written as functions of the current state $S_t$, we can solve the problem by differentiating the Bellman function with respect to the decision variable and setting the derivative to zero (assuming that we are maximizing a continuously differentiable, concave function). Given the initial state of the problem and the calculated optimal decisions, we can easily step the process forward, state after state, until the end of the simulation time.
The weakness of DP is its high memory requirement, especially for long periods or high time resolutions. Moreover, for almost all DP problems the state is not one-dimensional but a vector. For example, if we have n possible stocks to deal with and each stock has m possible prices, then we would have $m^n$ different states. In some cases, we even have multi-dimensional decisions to make. This problem limits the application of DP algorithms and is known as the “curse of dimensionality”, which describes the explosion of the state space size with a growing number of dimensions (see (13), Chapter 5).
However, despite its high memory cost, the DP algorithm has a low computational cost, since it stores previously computed values and avoids repeated recomputation. The principle of breaking down a complex problem into a sequence of much simpler subproblems provides a deeper insight into the nature of the problem and makes it simple to build the algorithm.
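The following sketch illustrates the backward recursion on the Bellman equation (2.3) for a deterministic toy storage problem; the horizon, capacity and prices are invented for illustration and are deliberately much simpler than the model used later in this thesis:

```python
import numpy as np

# Deterministic backward induction: battery level in {0,...,R_CAP}, one unit
# can be bought (+1), held (0) or sold (-1) per period at the given price.
T, R_CAP = 6, 3
price = [1.0, 4.0, 2.0, 5.0, 1.0, 3.0]     # made-up price path

V = np.zeros((T + 1, R_CAP + 1))           # V[t][R]: value of level R at time t
best = np.zeros((T, R_CAP + 1), dtype=int) # optimal action for each (t, R)

for t in range(T - 1, -1, -1):             # step backward in time
    for R in range(R_CAP + 1):
        candidates = {0: V[t + 1][R]}                          # hold
        if R < R_CAP:
            candidates[+1] = -price[t] + V[t + 1][R + 1]       # buy one unit
        if R > 0:
            candidates[-1] = price[t] + V[t + 1][R - 1]        # sell one unit
        a = max(candidates, key=candidates.get)
        best[t][R], V[t][R] = a, candidates[a]

# Roll the optimal policy forward from an empty battery.
R = 0
for t in range(T):
    a = int(best[t][R])
    print(f"t={t}  price={price[t]}  level={R}  action={a:+d}")
    R += a
```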
2.4. Approximate Dynamic Programming
Because of the requirement to compute and store the value of each discrete state, large-scale dynamic programming problems often become intractable. Potential solutions are provided by approximate dynamic programming (14), (13), which substitutes the exact value function with a statistical approximation. In exact dynamic programming, we generally step backward in time to compute the exact value function, use this knowledge to produce the optimal decisions, then move to the next state and repeat until the starting point. When we step forward in time, however, we need to make “approximate” decisions based on an approximation of the value function. An appropriate approximation of the value function is regarded as the key to solving an ADP problem. The essence of ADP is to replace the true value function with a statistical approximation, which is much easier to calculate and can be updated through iterations (15).
2.4.1. Policies
A policy can be regarded as a rule that determines a decision given the state of the system.
There is a range of policies in different forms that deal with dynamic programming. In (13)
the policies are basically grouped into four broad categories:
Myopic policies  Ignoring the effect of current decisions on the future, this can be seen as the most elementary form of policy. The value function in the Bellman equation is assumed to be zero. The principle of the most basic form of myopic policy is nothing more than to choose the decision that maximizes the contribution in an immediate sense, which is given by:
$$A(S_t) = \arg\max_{x} C(S_t, x).$$
Policy function approximations  With policy function approximations, a policy or decision is derived directly from the state without explicitly using forecasts. We might introduce a threshold price into our energy system, and the approximation here could be a simple function, such as a rule to store energy in the battery when prices are lowest during the day and release energy when prices are highest.
Value function approximations  Compared to policy function approximations, value function approximations return an estimated value of the function for a given state rather than for a state-action pair, the latter being the fundamental element of Q-learning. Since the complicated value function can be replaced with an approximation of some form, this policy is considered the most effective approach to solving the dynamic programming problem. The strategies used to approximate the value functions are described in the next part.
Lookahead policies  This is a method for optimizing the current decision based on future information over some horizon. The time horizon depends on the algorithm, and exogenous information is also taken into account to make a better approximation. The rolling horizon approximation is one of the most popular lookahead policies.
2.4.2. Method for Approximating Functions
The main idea of approximate dynamic programming is to approximate a value function for making decisions. This section addresses the three most popular ways to approximate the value function.
Lookup tables and aggregation  Lookup tables can only be applied to discrete state variables. A lookup table returns an approximation of the value function for a given state s. It is simple and effective, but sometimes it is not easy to initialize and improve the values in the lookup table. The most serious disadvantage of this method is its high memory requirement.
In order to mitigate the “curse of dimensionality” in the application of DP, we might aggregate the original state to lower the resolution of the state variable and decrease the dimension of the state space. This approach results in a simpler lookup table and a significant reduction in computational cost. The aggregation can be done by simply ignoring a dimension, discretizing it, aggregating the classification, or in any other way that reduces the complexity of the problem. For example, in an electricity management problem we might want to aggregate the state space by discretizing time from minutes to hours.
Aggregation is only used for the approximation of the value function. In the transition
function we still use the original, disaggregated state variable.
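A minimal sketch of this idea, assuming a 5-minute simulation grid whose value table is kept at hourly resolution (the bucket size and the smoothed update are illustrative choices, not taken from the thesis):

```python
from collections import defaultdict

# The value function is stored per (hour, storage bucket); the simulation
# itself keeps the exact 5-minute, continuous state.
SLOTS_PER_HOUR = 60 // 5
value_table = defaultdict(float)

def aggregate(t_slot, storage_level, bucket_size=0.5):
    """Map a disaggregated state (5-min slot, level in kWh) to a table key."""
    return (t_slot // SLOTS_PER_HOUR, int(storage_level / bucket_size))

key = aggregate(t_slot=137, storage_level=7.3)        # -> (11, 14)
value_table[key] += 0.1 * (42.0 - value_table[key])   # smoothed update toward an observation
print(key, value_table[key])
```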
Parametric models The most essential part of this method is to find a sequence of
proper basis functions and to optimize the parameter vector, which can be seen as a
process to determine the most important features and the corresponding weight of each
feature for the problem with sample realizations. Normally we might find the parameter
vector with a regression model by minimizing the mean square of the error between sample
observations and our predictions. The quality of the results is primarily based on the design
of basis functions.
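A minimal sketch of such a fit, assuming polynomial basis functions and synthetic sample observations (both invented for illustration; they are not the basis functions used later in the thesis):

```python
import numpy as np

# Parametric value-function model V(R) ~ theta . phi(R), fitted by least squares.
def phi(R):
    return np.array([1.0, R, R ** 2])   # constant, linear and quadratic features

rng = np.random.default_rng(0)
R_samples = rng.uniform(0.0, 10.0, size=50)             # observed storage levels
v_samples = (3.0 + 2.0 * R_samples - 0.1 * R_samples**2
             + rng.normal(0.0, 0.2, size=50))           # noisy sampled values

X = np.stack([phi(R) for R in R_samples])
theta, *_ = np.linalg.lstsq(X, v_samples, rcond=None)   # minimize mean squared error
print("fitted weights:", theta)                         # close to (3.0, 2.0, -0.1)
```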
Nonparametric models  The effectiveness of the parametric approach depends on an appropriate mathematical model. However, some problems might not correspond to any specific parametric model. The parametric model then acts as a restriction and can cause large errors between the approximations and the real observations. The fundamental purpose of nonparametric models is to obtain a well-built local approximation without being limited to a specific functional model.
3. Problem Statement
The goal of the electricity management optimization is to maximize the storage operation revenue and minimize the input power from the grid under a given energy price tariff. The fundamental element needed to achieve this target is a well-built mathematical model. In this chapter we first provide a mathematical description of the electric system; then the boundary conditions used in the simulations are shown.
3.1. Electric System Model
We consider the problem of managing the power flow among the solar element, the storage device, the grid and the consumer, while minimizing the energy expenses. The problem is described in more detail in (16).
Our system has three parts concerning electricity:
• Local generation. The house is equipped with a solar system. On one hand, electricity may flow directly from the solar panel to the storage device, or it may be used to satisfy the demand. On the other hand, excess energy may also be sold to the power grid at the spot price, which is also realized in our model.
• Consumption. The residential demand for electricity in our model can be satisfied by power from the grid, the local generator or the storage device.
• Storage device. To deal with the intermittency of renewable sources and the fluctuations of their output, a storage device is supposed to be an appropriate solution. When the electricity price in the energy market is low or the generation of the solar system exceeds the demand, the surplus can be stored in the storage device for later consumption. During periods when the price in the energy market is high or the load is greater than the generation, we can use the energy in the storage device to decrease the energy costs.
Figure 3.1 shows the electric system consisting of energy storage, local generation (here, a solar system), electric load and power grid. Arrows indicate the power flows among them. The green arrow describes the flow with which the householder can make a profit; the remaining power flows are represented by red arrows.
Figure 3.1.: Energy flow of the electric system (power grid, photovoltaic system, storage device Rt and demand, connected by the flows xGD, xSD, xRD, xSR, xRG and xGR)
3.1.1. State of the System
According to Warren B. Powell, a state variable is the minimally dimensioned function of history that is necessary and sufficient to compute the decision function, the transition function, and the contribution function.
In this storage optimization problem, the state of the system corresponds to the storage level of the battery Rt, which indicates the amount of energy in the storage device at time t. The current level of solar energy generation Et, the electricity demand Dt and the current price of electricity Pt are regarded as known information.
3.1.2. Decisions
For each state Rt at time t, we must decide how much electricity to consume and how
much to store in the battery to obtain the optimal result for the whole simulation time. The
decision can be written as follows:
$$x_t^T = (x_t^{SD},\ x_t^{GD},\ x_t^{RD},\ x_t^{SR},\ x_t^{GR},\ x_t^{RG}),$$
where $x_t^{IJ}$ is the amount of energy transferred from I to J at time t, with solar, demand, storage and grid denoted by S, D, R and G respectively.
Based on the energy estimation and the information from the energy market, the main
decision we should make to optimize the energy consumption is to determine the charge
or discharge operation of the storage device at time t.
3.1.3. Transition Functions
Since the problem we have discussed in this thesis is a Markov decision process, the next
state of the process depends entirely on the current state of the process and the current
decision taken. We can define a transition function such that, given the current state St ,
the subsequent state $S_{t+1}$ of the process is given by:
$$S_{t+1} = S^M(S_t, x_t, W_{t+1}),$$
where $x_t$ is the decision taken and $W_{t+1}$ is the new exogenous information that arrives between time t and t+1, such as a change in the electricity price or abrupt battery leakage. The nth sample realization of $W_t$ is denoted $W_t^n = \omega^n$ with sample path $\omega^n \in \Omega$. In approximate dynamic programming, for each test problem, K different sample paths $\{\omega^1, \dots, \omega^K\}$ are simulated to improve the statistical estimate of the value function iteratively.
The transition function for the energy in storage is:
$$R_{t+1} = R_t + \Phi^T x_t,$$
where $\Phi = (0,\ 0,\ -\eta_d,\ \eta_c,\ \eta_c,\ -\eta_d)^T$ is a column vector that models the flow of energy into and out of the storage device, with $\eta_c$ and $\eta_d$ denoting the charging and discharging efficiencies of the storage device respectively. The energy change in the storage device is described by $\Phi^T x_t = \eta_c (x_t^{SR} + x_t^{GR}) - \eta_d (x_t^{RD} + x_t^{RG})$; when $\Phi^T x_t$ is positive, energy flows into the storage device at time t, and when it is negative, the battery discharges at time t.
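A direct translation of this transition function into Python might look as follows (a sketch; the efficiency values are placeholders, not those used in the thesis simulations):

```python
import numpy as np

# Storage transition R_{t+1} = R_t + Phi^T x_t, with the decision vector
# ordered as (xSD, xGD, xRD, xSR, xGR, xRG). Efficiencies are placeholders.
ETA_C, ETA_D = 0.95, 0.95
PHI = np.array([0.0, 0.0, -ETA_D, ETA_C, ETA_C, -ETA_D])

def storage_transition(R_t, x_t):
    """Next storage level given the current level and the decision vector."""
    return R_t + PHI @ np.asarray(x_t)

# Example: charge 1.0 from solar and 0.5 from grid, discharge 0.2 to demand.
x = [0.3, 0.1, 0.2, 1.0, 0.5, 0.0]
print(storage_transition(2.0, x))   # -> 2.0 + 0.95*1.5 - 0.95*0.2 = 3.235
```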
3.1.4. Objective Functions
The cost function is composed of the benefits from selling excess energy, the cost of the electricity purchased from the grid and the holding cost of the storage device. The cost function is expressed as follows:
$$C(S_t, x_t) = P_t \eta_d x_t^{RG} - P_t (x_t^{GR} + x_t^{GD}) - c_h (R_t + \Phi^T x_t),$$
where the term $P_t \eta_d x_t^{RG}$ indicates the profit from selling energy to the grid, $P_t (x_t^{GR} + x_t^{GD})$ is the cost of buying energy from the grid, $c_h (R_t + \Phi^T x_t)$ describes the holding cost of the storage device, and $c_h$ is a constant.
The objective is to maximize the final value of the cost function over the entire simulation time:
$$F = \max \sum_{t=1}^{T} C(S_t, x_t). \qquad (3.1)$$
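A sketch of the per-period cost and the cumulative objective, again with invented prices, holding-cost constant and decisions:

```python
import numpy as np

# C(S_t, x_t) = P_t*eta_d*xRG - P_t*(xGR + xGD) - c_h*(R_t + Phi^T x_t),
# accumulated into F = sum_t C(S_t, x_t) along a short sample trajectory.
ETA_C, ETA_D, C_HOLD = 0.95, 0.95, 0.001
PHI = np.array([0.0, 0.0, -ETA_D, ETA_C, ETA_C, -ETA_D])

def cost(P_t, R_t, x_t):
    x = np.asarray(x_t)                 # (xSD, xGD, xRD, xSR, xGR, xRG)
    sell = P_t * ETA_D * x[5]           # revenue from selling to the grid
    buy = P_t * (x[4] + x[1])           # cost of buying from the grid
    hold = C_HOLD * (R_t + PHI @ x)     # holding cost of the storage device
    return sell - buy - hold

R, F = 2.0, 0.0
for P_t, x_t in [(0.29, [0.3, 0.1, 0.2, 1.0, 0.5, 0.0]),
                 (0.24, [0.0, 0.4, 0.5, 0.0, 0.0, 0.8])]:
    F += cost(P_t, R, x_t)
    R += PHI @ np.asarray(x_t)          # apply the storage transition
print("cumulative objective F:", F)
```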
3.1.5. Constraints
The following constraints apply to our model. They can be divided into three parts; all variables are assumed to be non-negative for all t:

Energy Storage Level  Denote $R^c$ the storage capacity, $\gamma^c$ the maximum charging rate and $\gamma^d$ the maximum discharging rate.
The energy supplied by the storage device is limited by the amount of energy currently available in the storage device and the maximum discharging rate:
$$x_t^{RD} + x_t^{RG} \le R_t, \qquad (3.2)$$
$$x_t^{RD} + x_t^{RG} \le \gamma^d. \qquad (3.3)$$
The energy charged into the storage device is limited by the storage capacity and the maximum charging rate:
$$x_t^{SR} + x_t^{GR} \le R^c - R_t, \qquad (3.4)$$
$$x_t^{SR} + x_t^{GR} \le \gamma^c. \qquad (3.5)$$

Demand Level  Denote $\eta_c$ the efficiency of the charging process with $0 < \eta_c < 1$ and $\eta_d$ the efficiency of the discharging process with $0 < \eta_d < 1$. The demand at time t must be satisfied and cannot be shifted to a later time point:
$$x_t^{SD} + \eta_d x_t^{RD} + x_t^{GD} = D_t. \qquad (3.6)$$

Local Generation Level  The energy supplied by the local generation cannot exceed the total generation of the solar system at time t:
$$x_t^{SR} + x_t^{SD} \le E_t. \qquad (3.7)$$
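For illustration, a candidate decision vector can be checked against constraints (3.2)-(3.7) as follows (the capacity, rate limits and sample numbers are placeholders, not the values used in the simulations):

```python
# Feasibility check for constraints (3.2)-(3.7); all parameters are placeholders.
ETA_D = 0.95
R_CAP, GAMMA_C, GAMMA_D = 11.0, 3.0, 3.0   # capacity, max charge/discharge rates

def is_feasible(x, R_t, D_t, E_t, tol=1e-9):
    xSD, xGD, xRD, xSR, xGR, xRG = x
    return (
        all(v >= -tol for v in x)                       # non-negativity
        and xRD + xRG <= R_t + tol                      # (3.2) available energy
        and xRD + xRG <= GAMMA_D + tol                  # (3.3) discharging rate
        and xSR + xGR <= R_CAP - R_t + tol              # (3.4) remaining capacity
        and xSR + xGR <= GAMMA_C + tol                  # (3.5) charging rate
        and abs(xSD + ETA_D * xRD + xGD - D_t) <= tol   # (3.6) demand balance
        and xSR + xSD <= E_t + tol                      # (3.7) solar availability
    )

x = (0.3, 0.51, 0.2, 1.0, 0.5, 0.0)
print(is_feasible(x, R_t=2.0, D_t=1.0, E_t=1.5))   # -> True
```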
3.2. Boundary Conditions
For the simulations, data of a sample day from a real residential photovoltaic system with battery, provided by KNUBIX GmbH, and the Princeton energy storage benchmark datasets are used.
3.2.1. Princeton Energy Storage Benchmark Datasets
The Princeton energy storage benchmark datasets are a series of finite horizon problems that consist of four components: renewable energy generator, load, storage device and power grid. All variables are presented as unitless and can be set by the users based on an appropriate understanding of the electric system.
Wind Data The wind is modeled using a first-order Markov chain.
Demand Data  Demand is assumed to be deterministic and given by $D_t = \max\big(0,\ 3 - 4\sin(2\pi t / T)\big)$ (see the sketch after this list).
Price Process Two different stochastic price processes are tested in the Princeton
datasets: sinusoidal and first-order Markov chain.
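For example, the deterministic demand curve can be reproduced with a one-line function (a sketch; the horizon length T = 96 in the printout is chosen purely for illustration):

```python
import math

# Princeton benchmark demand: D_t = max(0, 3 - 4*sin(2*pi*t/T)), unitless.
def demand(t, T):
    return max(0.0, 3.0 - 4.0 * math.sin(2.0 * math.pi * t / T))

T = 96
print([round(demand(t, T), 2) for t in (0, 12, 24, 48, 72)])
# -> [3.0, 0.17, 0.0, 3.0, 7.0]: demand dips to 0 around t = T/4
#    and peaks at 7 around t = 3T/4.
```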
3.2.2. KNUBIX Dataset
Data & Demand Data  To simulate realistic operation, the real recorded data of a five-person household with a 9.36 kWp PV system and an 11 kWh battery system on 29.03.2015 were used. The household and electricity system profile originate from a KNUT 3.3 Intelligence system user in Waldburg. The original one-day data of this five-person family were simulated at the original resolution of 5 minutes.
Battery  Since the focus of this work is on energy management, the storage system is considered as a black box and its chemical background is neglected. Only its electrical characteristics, such as capacity or maximum charging and discharging rate, are taken into account. A lithium iron phosphate battery with a capacity of 11 kWh is used by the KNUT system.
Tariffs  A variable tariff is a popular method to encourage customers to shift their power consumption. This kind of electricity tariff policy has come into force in many countries, such as China and Canada. Two kinds of electricity price models are used in this research. Figure 3.2 presents the first variable tariff model, which applies 0.2942 EUR/kWh during daytime from 6 a.m. to 10 p.m. and 0.2412 EUR/kWh during night time [SWM Profi, 2015]. The second tariff model is shown in Figure 3.3, which tracks the changes of the spot market price on the European Power Exchange (EPEX) on the sample day (29.03.2015).
3.2.3. Additional Boundary Conditions
Simulation  The simulations were carried out in MathWorks MATLAB R2014b for a time span of one day with a resolution of 5 minutes. The battery was built as a black-box model with the maximum capacity and efficiencies taken from the literature.
Figure 3.2.: Electricity Price Tariff [SWM, 2015] (price in Eurocents/kWh over the hours 0-24)
Figure 3.3.: Spot market price on the European Power Exchange (EPEX) on 29.03.2015 (price in Eurocents/kWh over the hours 0-23)
4. Learning Methodology
The methodologies applied to achieve the optimal results are chosen according to the nature of the problem. The problem discussed in this thesis is a finite-time optimization problem with constraints. The primary objective of the control algorithm is to satisfy the household's electricity demand at minimum cost. Environmental factors and effects on power grid stability are not considered here. A rule-based algorithm is a fundamental static control algorithm, which is based on human intelligence and can easily be applied for online control (17), (18), (19). Besides the rule-based control strategy, two other popular global optimization algorithms are developed: linear programming (20), (21) and dynamic programming (22), (23). Linear programming, which provides a global optimization result, is used as a reference against which to compare the different optimization algorithms. Although the flexibility of dynamic programming allows it to be applied to different kinds of optimization problems, it still suffers from the "curse of dimensionality" and is of limited applicability to complicated large problems. Thus, approximate dynamic programming is introduced to make optimal decisions by approximating the value function, in order to reduce the computational and time costs (24), (25), (26).
The methodologies based on the above-mentioned algorithms are designed in this chapter. Their results are compared and analysed, using the results of linear programming as a benchmark.
4.1. Overall Implemented Structure
Figure 4.1 shows the structure of the implemented methodology. The prediction of renewable energy generation (here we take PV energy as an example) is not realized in this thesis. The agent collects and aggregates the data on demand, PV generation and electricity price. An optimization algorithm is chosen among the rule-based control algorithm, the simple threshold control algorithm, linear programming, dynamic programming and approximate dynamic programming, based on the complexity and characteristics of the system. According to the chosen optimizer, the optimal decision, which determines the optimal power flow among the different parts of the electrical system, is found and applied to maximize the agent's expected discounted reward, considering the effects of future information and profits.
The basic principle of this smart agent is to minimize the energy costs of the household by shifting the load and optimizing the power flow. Different optimization algorithms are developed in order to maintain a better balance among the PV generation curve, the demand curve and the curve of the spot market price.
Figure 4.1.: Flow chart of the learning methodology (Input: demand data, PV generation data, electricity price data, PV system data; Data processing: aggregation; Optimization: rule-based, LP, DP, ADP; Output: power flow, storage level)
An important part here is the storage device, which serves as a load shifter. The battery can buy and sell energy, depending on the evolution of the spot market price and the solar energy generation.
The details of the electric system have already been described in Chapter 3. The simulation has been carried out over 24 hours, for the 29th of March as an exemplary day, to obtain the optimization results and to compare the different optimization algorithms.
4.2. Rule-based Control
A rule-based management control is simple and built from experience and heuristic knowledge. We design the optimization policy according to the nature of the problem and our objective. Though the optimality of this algorithm is not guaranteed, it can serve as a baseline against which to compare other algorithms. (27) presents a simple rule-based power management mechanism for grid-connected PV systems with storage and compares its results with the optimization using dynamic programming.
4.2.1. Without Battery
According to the analysis of the feed-in tariff and electricity retail price evolution presented in Chapter 1, a rule-based control algorithm has been developed. The goal of the algorithm is to increase the self-consumption rate, i.e. the difference between PV generation and the energy fed into the grid, expressed as a share of the PV generation.
The photovoltaic power is first used to cover the electricity demand. When the PV gen-
eration is higher than the load, the excess energy will be fed into the power grid. If the
generated solar energy is lower than the energy demand, electricity should be bought
from the grid to cover the residual demand.
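As a minimal MATLAB sketch of this rule (the vectors E and D and their values are our own illustration, not part of the KNUBIX data):

% Rule-based control without battery (illustrative values in W).
E = [0 0 1500 4200 5100 3000 0];        % PV generation per time step
D = [800 600 900 1200 2500 1800 1000];  % household demand per time step
solarToGrid    = max(E - D, 0);         % surplus PV fed into the grid
energyFromGrid = max(D - E, 0);         % residual demand bought from grid
selfConsumptionRate = sum(min(E, D)) / sum(E);  % share of PV used on-site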
[Figure: energy demand and solar energy (left); energy from grid and solar to grid (right)]
Figure 4.2.: Profiles of power flow among PV system without battery, load and grid for rule-based algorithm for the 29th of March
Figure 4.2 depicts the profiles of power flow among the PV system without battery, load and grid for the rule-based algorithm for the 29th of March. As there is no battery integrated in the system, the feed-in power equals the difference between PV generation and electricity demand. It is obvious that the self-consumption rate in this case is low.
4.2.2. With Battery
The difference between the rule-based algorithm with battery and the aforementioned al-
gorithm without battery is the integration of a storage device in the system.
The photovoltaic power is first used to cover the electricity demand. When the PV gener-
ation is higher than the load, the battery will be charged until maximum capacity is reached.
If the battery is full, the excess energy will be fed into the power grid. This control policy is not influenced by the electricity price signal, because the feed-in tariff for one kWh from PV (12.56 Eurocents in 2015) is much lower than the retail electricity price (28.81 Eurocents in 2015). If the generated solar energy is lower than the energy demand, especially during night hours, the energy in the storage device will be used to fulfill the requirements for household electricity consumption.
The rule-based algorithm is described in Algorithm 1. Here we introduce a term “flag” that indicates the difference between demand and PV generation over the whole day (demand and solar generation are assumed to be known), so flag > 0 means that the daily demand exceeds the daily PV generation.
Algorithm 1 Rule-based Algorithm
1: Et ← solar energy generation at time t
2: Dt ← demand at time t
3: Pt ← electricity price at time t
4: P0 ← threshold price
5: if Et > Dt then
6: solar energy alone will cover all demands
7: if Pt > P0 then
8: the rest of the solar energy will be sold to the grid
9: else
10: the rest of the solar energy will be stored in battery
11: if Et < Dt then
12: the demand will first be covered by solar production, then battery, then grid
13: if t belongs to off-peak hours and flag > 0 then
14: battery charges
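A minimal MATLAB sketch of one time step of this policy is given below; the function signature, the variable names and the omitted charge/discharge efficiencies and power limits are our own simplifications, not the exact implementation used in this thesis.

function [soc, fromGrid, toGrid] = ruleStep(soc, E, D, P, P0, socMax, offPeak, flag)
% One step of the rule-based policy of Algorithm 1 (simplified sketch).
fromGrid = 0; toGrid = 0;
if E > D                           % solar energy alone covers all demands
    surplus = E - D;
    if P > P0                      % high price: sell the surplus
        toGrid = surplus;
    else                           % low price: store the surplus
        charge = min(surplus, socMax - soc);
        soc = soc + charge;
        toGrid = surplus - charge; % battery full: feed in the rest
    end
else                               % deficit: solar, then battery, then grid
    deficit = D - E;
    discharge = min(deficit, soc);
    soc = soc - discharge;
    fromGrid = deficit - discharge;
end
if offPeak && flag > 0             % daily deficit expected: pre-charge
    charge = min(socMax - soc, flag);
    soc = soc + charge;
    fromGrid = fromGrid + charge;
end
end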
Profiles of power flow among the PV system with battery, load and grid for the rule-based algorithm for the 29th of March are presented in Figure 4.3. As expected, compared to the case without a storage device there is almost no feed-in power in this case; with the battery, the energy exchange between the PV system and the demand is kept at its maximum.
Figure 4.4 and Figure 4.5 show the SOC schedule of the battery with the rule-based algorithm against PV generation and the variable electricity tariff, respectively. During off-peak hours, the battery charges in order to supply electricity during peak hours. In the middle of the day, the battery still discharges even though the solar power reaches its peak, because the solar production is not sufficient to satisfy the electricity demand.
4.3. Simple Threshold Control
In this section we will discuss a simple threshold control policy, which attempts to achieve
a balance between power grid, storage device, demand and the photovoltaic system. The
energy management is performed at the customer level. The goal of the control is to
minimize the cost of the energy consumption while peak shaving is not considered here.
[Figure: energy demand and solar energy; energy exchange with grid (energy from grid, solar to grid); storage level and delta R]
Figure 4.3.: Profiles of power flow among PV system with battery, load and grid for rule-based algorithm for the 29th of March
There are also some studies that focus on the energy management on the utility operator
level and aim to minimize the grid operational cost (28).
It is assumed that the process of the electricity price is known in advance. The principle
of the policy is therefore simple: try to store the energy in the battery when the price of
electricity is low, and then use the energy to satisfy the demand when the price is high.
When the electricity is expensive, the battery could be discharged in order to minimize the
amount of energy purchased from the grid. The agent learns from the historical price information to determine a threshold price, which helps to optimize the charge and discharge operation of the battery. Although a threshold control policy is significantly simpler than other optimization algorithms like linear programming or dynamic programming, well-chosen threshold parameters can still yield an effective algorithm that is close to the optimal policy. We determine the threshold price in two ways based on the strategies in (29) (a small sketch follows the list):
• The maximum and minimum prices are determined for a certain period of the past:
Threshold Price = 30% · (Maximum Price − Minimum Price) + Minimum Price.
• The average price calculated from the historical data is determined to be the thresh-
old price.
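Both rules are straightforward to compute; a minimal MATLAB sketch with invented historical prices:

% Two ways to set the threshold price from historical prices (Euro/kWh).
histP    = [0.22 0.25 0.31 0.28 0.19 0.24 0.30];          % assumed history
P0_range = 0.3 * (max(histP) - min(histP)) + min(histP);  % 30% rule
P0_mean  = mean(histP);                                   % average rule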
The process of the simple threshold control is described in Algorithm 2.
[Figure: solar energy and SOC over the day]
Figure 4.4.: SOC schedule of batteries with rule-based algorithm against PV generation
In Figure 4.6 we show a plot of the storage level obtained by the threshold algorithm along with the solar energy generation and demand profiles corresponding to the KNUBIX test problem. Figures 4.7 and 4.8 present the SOC process against PV generation and electricity price, respectively. It is obvious that the threshold algorithm was not able to learn the behavior of the signals. Whether to charge or to discharge the battery depends not only on the energy spot price and the difference between demand and solar energy, but also on the storage level (SOC) of the battery. During high-price periods, the battery prefers discharging to charging, and vice versa during low-price periods. In our simulation, a minimum limit of 20% is set for the depth of discharge (DOD) to increase the battery lifetime.
Special attention was paid to the German EEG; therefore the 60% feed-in power limit was taken into account. This means that at any time no more than 60% of the maximum solar power may be sold to the grid, and the storage device has to absorb the excess energy, as sketched below.
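A sketch of this cap in MATLAB, where the PV peak power and the feed-in profile are only assumed examples:

% Enforce the EEG feed-in limit: at most 60% of the PV peak power may be
% sold to the grid; the battery has to absorb the excess (or it is curtailed).
pvPeak      = 11000;                           % assumed PV peak power in W
feedInCap   = 0.6 * pvPeak;
solarToGrid = [0 0 600 7400 8200 2100 0];      % illustrative feed-in profile
excess      = max(solarToGrid - feedInCap, 0); % power above the 60% cap
solarToGrid = min(solarToGrid, feedInCap);     % clipped feed-in power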
4.4. Linear Programming Formulation
In this section we formulate a multi-stage electricity portfolio optimization problem and show how it can be solved using a multi-period linear programming method.
With the aforementioned descriptions of the problem, it is evident that in the presented
formulation, equation (3.1) - (3.7) represent a linear optimization problem at time t. The
[Figure: SOC and electricity price over the day]
Figure 4.5.: SOC schedule of batteries with rule-based algorithm against variable electricity tariffs
optimal solution can be obtained by solving the linear programming problem for each t
given by equation (3.1) subject to equation (3.2) - (3.7).
In this thesis, we only use linear programming to solve deterministic problems. It is assumed that the forecasting information is already available a day before the simulation day, which is realistic, so the optimization results can be used to optimize the strategy for the next day. The effectiveness of this methodology depends not only on the accuracy of prediction and forecasting, but also on the level of time resolution. For a linear programming problem with KNUBIX data, which aims to achieve 24-hour optimization results with a resolution of 5 minutes, 288 time periods will be simulated. For each time point t, one equality and five inequality constraints are applied. This means that for a one-day optimization problem, 5 × 288 inequality constraints and 1 × 288 equality constraints will be computed. The dimension of the decision vector is 6 × 1, so in total 288 × 6 decisions will have been made by the end of the simulation time.
Here we use the linprog function in Matlab “Optimization Toolbox” to solve the LP prob-
lem, and interior point method is chosen to be the solution algorithm. Matlab Optimization
Toolbox includes solvers for linear programming, nonlinear optimization, quadratic pro-
gramming and mixed-integer linear programming, which can be used for different continu-
ous or discrete problems.
To solve the problem we first need to set up the Optimization Toolbox by choosing a solver and an algorithm. As inputs, the objective function, initial state, equality constraints, inequality constraints and bounds have to be provided. Settings for the iterations and tolerances are optional; the function can also be called directly from the editor to access the same outputs. A minimal sketch of this setup is given after Algorithm 2.
Algorithm 2 Threshold Algorithm
Et ← solar energy generation at time t
2: Dt ← demand at time t
Pt ← electricity price at time t
4: P0 ← threshold price
if Et > Dt then
6: solar energy alone will cover all demands
if Pt > P0 then
8: the rest of the solar energy will be sold to the grid
else
10: the rest of the solar energy will be stored in battery
if Et < Dt then
12: all solar energy will be used to cover demand
if Pt > P0 then
14: the rest of the demand should be covered first by battery
else
16: the rest of the demand should be covered by grid
if storage of the battery < demand - solar energy generation then
18: the rest of the demand will be covered by grid
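As announced above, a minimal sketch of this setup follows; only the problem dimensions are taken from the text, while the actual coefficient matrices would have to be assembled from equations (3.1)-(3.7):

% Multi-period LP skeleton for one day with 5-minute resolution.
T = 288; nx = 6;              % time periods and decisions per period
f   = zeros(T*nx, 1);         % objective coefficients (to be filled in)
A   = zeros(5*T, T*nx); b   = zeros(5*T, 1);   % 5 inequalities per period
Aeq = zeros(1*T, T*nx); beq = zeros(1*T, 1);   % 1 equality per period
lb  = zeros(T*nx, 1);   ub  = inf(T*nx, 1);    % bounds on the power flows
opts = optimoptions('linprog', 'Algorithm', 'interior-point');
[x, cost] = linprog(f, A, b, Aeq, beq, lb, ub, [], opts);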
We use the linear programming method for two electricity price scenarios. Scenario
A uses the day-and-night tariff model provided by Stadtwerke Muenchen (SWM), while
scenario B uses the spot market price on the European Power Exchange (EPEX) on the
sample day (29.03.2015). Detailed information of these two price models have already
been given in Section 3.2.
4.4.1. Scenario A
Figure 4.9 illustrates the profiles of power flow among the PV system with battery, load and grid for the LP algorithm for the 29th of March with KNUBIX data under scenario A, while the SOC schedules of the battery against electricity price and solar generation are shown in Figure 4.11 and Figure 4.10. As can be seen from Figure 4.11, the battery charges during off-peak hours at a lower electricity price and experiences a period of frequent electricity exchange during peak hours. The power exchanged between the electricity consumer and the power grid is kept within a certain range, which is favorable for power grid stability. At the end of the simulation time, the storage device discharges completely to reach the maximum return and the minimum cost, which can be observed as an arc in Figure 4.11.
[Figure: energy demand and solar energy; energy from grid and solar to grid; storage level and delta R]
Figure 4.6.: Profiles of power flow among PV system, load and grid for threshold algorithm with battery for the 29th of March
4.4.2. Scenario B
Figure 4.12 presents the profiles of power flow among the PV system with battery, load and grid for the LP algorithm for the 29th of March with KNUBIX data under scenario B. Figure 4.14 and Figure 4.13 illustrate the SOC schedule of the battery against electricity price and solar generation under scenario B. It can be seen that the battery charges a lot at the beginning of the day when the electricity price is low, although the electricity demand during that time is not very high. Since the objective function of the linear programming method is the sum of the cost function from time t = 1 to time t = T, the highest electricity price period (from about 6:00 p.m. to 10:00 p.m.) is known to the agent in advance. Thus, the storage device preferentially discharges during the high-price period to satisfy the demand, consuming the energy bought from the grid during the low-price period in order to avoid purchasing expensive electricity. Compared to the results under scenario A, the energy arbitrage revenue (from storing energy purchased at off-peak times and selling it at peak times) plays a more important role when applying the spot market price tariff, since the price signal under scenario B changes more often. The actions of the battery in this case are more complicated and depend more on the spot market electricity price. With a variable electricity price tariff like this, the responses of the system to the price signal can be observed more clearly.
[Figure: solar energy and SOC over the day]
Figure 4.7.: SOC schedule of batteries for threshold algorithm against solar generation
One of the main drawbacks of the linear programming method is that its effectiveness relies on an accurate estimation and prediction of the energy price, the demand and the amount of solar generation. Linear programming is appropriate for deterministic problems and not for stochastic problems, in which there is an unpredictable disturbance in the system and the problem needs to be solved over all possible exogenous information. Considering the volatility of the electricity price in the real energy market, dynamic programming and approximate dynamic programming are taken into consideration.
The results of linear programming, however, can be regarded as the true optimal values and used as a benchmark to test the optimality of our ADP algorithm.
4.5. Dynamic Programming Formulation
The DP algorithm, whose flowchart is shown in Figure 4.15, has been developed with MathWorks Matlab 2014b software.
The most important characteristic of the DP formulation is the development of a recursive optimization procedure. In the DP algorithm, a multi-stage decision problem is divided into several one-stage decision problems. The recursive procedure builds up the overall optimal solution of the complex multi-stage problem by first handling a simple one-stage problem and then sequentially moving one stage at a time, solving the following one-stage problem, until the overall optimal solution is obtained. The basic principle of the recursive procedure is the so-called “principle of optimality” stated by Bellman.
[Figure: SOC and electricity price over the day]
Figure 4.8.: SOC schedule of batteries for threshold algorithm against variable electricity tariffs
Principle of optimality: Any optimal policy has the property that, whatever the current state and decision, the remaining decisions must constitute an optimal policy with regard to the state resulting from the current decision (30).
Now we further illustrate how to apply the DP algorithm to the battery storage optimization problem.
The storage device allows power exchange with the grid, depending on the market spot price Pt at time t and the forecasting information, which includes both demand and solar production. Here we take the state of charge (SOC) Rt of the battery as the state of the system at time t; the SOC of the storage device is a continuous quantity. In order to apply a dynamic programming algorithm, ∆R is introduced to discretize the capacity Rc into N states. All possible states must be elements of the state set S = {0, ∆R, 2∆R, ..., Rc}, where ∆R is the smallest SOC increment.
The constraint for the battery capacity, which corresponds to the constraints on the energy storage level in Section 3.1.5, must be satisfied:

R_t = i∆R,  R_{t+1} = j∆R,  ∆R_t = R_{t+1} − R_t = (j − i)∆R,  with i, j ∈ {1, 2, 3, ..., N},

subject to

R_t, R_{t+1} ∈ [0, R_max]  ∀t = 1, 2, 3, ..., T,
∆R_t ∈ [−γ_d, γ_c]  ∀t = 1, 2, 3, ..., T.
[Figure: energy demand and solar energy; energy from grid and solar to grid; storage level (scaled) and delta R]
Figure 4.9.: Profiles of power flow among PV system with battery, load and grid for LP algorithm for the 29th of March under scenario A
Before solving the problem, we need to define the reward function. The reward function for the storage optimization problem corresponds to the cost function in Chapter 3. We initialize the reward function as an (N + 1)^2 × T matrix with all zero elements, where N is the ratio of the battery capacity Rc to the increment ∆R and T is the length of the simulation time with ∆t = 1. For each state transition at a determined time point t, we calculate the immediate return with the following function:

C_t(R_t, x_t) = P_t η_d x_t^RG − P_t (x_t^GR + x_t^GD) − c_h R_{t+1}.

If the transition from R_t to R_{t+1} does not fulfill the constraints, which means x_t(R_t, R_{t+1}) is not an admissible decision, we set C_t(R_t, R_{t+1}) = −∞.
For each single transition from state R_t to R_{t+1}, there can be several possible decision vectors, which correspond to different returns. The linear programming methodology introduced in the previous section is applied to find the optimal decision for each feasible state transition. Compared to the aforementioned LP problem, one more equality constraint is added in order to reach the determined state R_{t+1} after executing the decision:

R_{t+1} = R_t + η_c (x_t^SR + x_t^GR) − η_d (x_t^RD + x_t^RG).
After calculating the costs for all possible state transitions, a lookup table for state transition
costs is built. The problem now is similar to the classic shortest path problem and can be
solved with a dynamic programming algorithm.
[Figure: solar energy and SOC over the day]
Figure 4.10.: SOC schedule of batteries for LP algorithm against solar generation under scenario A
Let P(x) ∈ R^{(N+1)×(N+1)} denote the matrix of state transition probabilities, where P_ij(x) indicates the probability of jumping from state R_i to state R_j given that action x is taken. For deterministic problems, the possible values of P_ij can only be 1 or 0. The rows and columns of the state transition probability matrix are indexed by the states 0, ∆R, 2∆R, ..., Rc, with entry p_ij in the row of state i∆R and the column of state j∆R.
Given the reward and the probability for a chosen state transition at time t, we build a value function to take the future effect of the current decision into consideration.
[Figure: SOC and electricity price over the day]
Figure 4.11.: SOC schedule of batteries for LP algorithm against SWM electricity tariff
Based on the Bellman equation (for further reading and information see (30)), the optimal value function for the problem can be defined in recursive form, with the assumption V_{T+1} ≡ 0:

V_t(R_t) = max_{x_t} [C_t(R_t, x_t) + γ Σ_{R_{t+1}∈S} p(R_t, R_{t+1}) V_{t+1}(R_{t+1})],  ∀t = 1, 2, 3, ..., T,

where γ is a discount factor that accounts for the time value. A deterministic problem can be regarded as a special case of a stochastic problem in which the state transition probabilities are all zeros or ones:

V_t(R_t) = max_{x_t} [C_t(R_t, x_t) + γ V_{t+1}(R_{t+1})],  ∀t = 1, 2, 3, ..., T.
The value function recursion above can be realized as backward or forward induction. In the backward induction process, the final stage of the problem is solved first and the process moves backward one stage at a time until all stages are covered. Conversely, in the forward induction process, the initial stage of the problem is solved first and the process moves forward one stage at a time until all stages are covered. The backward induction algorithm that we developed and applied is presented in Algorithm 3. It has to be mentioned that the forward recursion procedure discovers the optimal path to all states from a determined initial state, while the backward recursion implicitly develops the optimal solution to a chosen final state. Note that for stochastic problems only backward recursion can be applied (31). Figure 4.16 illustrates the forward induction process of the storage optimization problem.
Algorithm 3 Backward Dynamic Programming Algorithm
Initialization: set the initial state to R0 and the value function VT+1 to 0
for time t = T to t = 1 do
  for state Ri = 0 to Ri = Rc at time t do
    for state Rj = 0 to Rj = Rc at time t + 1 do
      check if the jump from state Ri to state Rj is valid
      if it is feasible then
        calculate the transition cost Ct(Ri, Rj)
      else
        set the cost of the action at this state to minus infinity
    use the maximum action return at this state to calculate the value of the state:
      Vt(Ri) = max_{xt} (Ct(Ri, xt) + Vt+1(Rj))
for state Ri = R0 at time t = 1 to t = T do
  pick the optimal next-stage state Rj with max V(Rj) and save the corresponding transition route
Algorithm 4 Forward Dynamic Programming Algorithm
Initialization: set the initial state to R0 and the initial state value function to 0
for state Ri = 0 to Ri = Rc at time t do
  for state Rj = 0 to Rj = Rc at time t + 1 do
    check if the jump from state Ri to state Rj is valid
    if it is feasible then
      calculate the state transition cost from state Ri to state Rj
    else
      set the transition cost to minus infinity
for state Ri = R0 at time t = 1 do
  calculate the feasible next-state value functions based on the Bellman equation;
  pick the optimal transition with max V and save the corresponding transition route
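To make the backward recursion of Algorithm 3 concrete, the following MATLAB sketch runs backward induction over a precomputed transition-cost lookup table; the random cost table is only a placeholder for the real one built with the LP subproblems:

% Backward induction over a transition-cost table C(i,j,t); infeasible
% transitions would carry cost -inf in the real table.
N = 111; T = 288;               % states (0.1 kWh grid) and time stages
C = -rand(N, N, T);             % placeholder for the real cost table
V = zeros(N, T+1);              % boundary condition V_{T+1} = 0
policy = zeros(N, T);
for t = T:-1:1                  % step backwards through time
    for i = 1:N
        [V(i,t), policy(i,t)] = max(C(i,:,t)' + V(:,t+1));
    end
end
r0 = 1;                         % index of the initial state R0
path = zeros(1, T+1); path(1) = r0;
for t = 1:T                     % recover the optimal state trajectory
    path(t+1) = policy(path(t), t);
end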
[Figure: energy demand and solar energy; energy from grid and solar to grid; storage level (scaled) and delta R]
Figure 4.12.: Profiles of power flow among PV system, load and grid for LP algorithm with battery for the 29th of March under scenario B
4.5.1. DP Formulation for Deterministic Problems
For deterministic problems, we assume that the solar energy, electricity spot market price and demand evolve deterministically over time. In a deterministic DP process, given the current state and the selected decision, both the state at the next stage and the immediate reward of the action are determined with complete certainty. In other words, Rt+1 can be determined from Rt and xt. With known future disturbances the optimal strategy is not causal, and it can be used as a benchmark for stochastic problems.
Similar to the linear programming formulation, the simulations were conducted for two
electricity price scenarios. Scenario A uses the day-and-night tariff model provided by
Stadtwerke Muenchen (SWM), while in scenario B the electricity trade between the sys-
tem and power grid was at a spot market price on the European Power Exchange (EPEX)
on the sample day (29.03.2015).
We computed the dynamic programming solution for the electric system with different values of the smallest SOC increment ∆R. The smaller the value of ∆R, the higher the computational cost. In this thesis we only applied the values of 0.1 kWh, 0.05 kWh and 0.02 kWh as ∆R to the simulation process, due to limited computational resources.
The DP formulation has the following characteristics:
Dynamic Program Parameters
[Figure: solar energy and SOC over the day]
Figure 4.13.: SOC schedule of batteries for LP algorithm against solar generation under scenario B
T: stages in 5-minute increments (24 hours, 288 levels);
P: continuous 5-minute electricity market prices;
R: discretized storage levels:
  Case 1: discretized in 0.1 kWh increments (R ⊆ [0, 11] → 111 levels),
  Case 2: discretized in 0.05 kWh increments (R ⊆ [0, 11] → 221 levels),
  Case 3: discretized in 0.02 kWh increments (R ⊆ [0, 11] → 551 levels);
X: finite action space of the power flows.
4.5.2. Scenario A
Figure 4.17 and Figure 4.18 show the optimal deterministic backward DP storage algorithm against the electricity market price and the PV generation for the system in the different cases under scenario A, respectively. From Figure 4.17 we can see that the abrupt decrease of the storage level coincides with the abrupt change point of the price curve. During on-peak times (when the electricity price is high), the storage device contributes more to the demand and discharges continuously until the end of the high-price period. Comparing the curves for the three different SOC increment values, we can conclude that a finer increment results in higher flexibility and a finer level of control over the battery.
[Figure: SOC and electricity price over the day]
Figure 4.14.: SOC schedule of batteries for LP algorithm against EPEX market price
Figure 4.19 shows the optimal deterministic backward DP storage algorithm power pro-
file for the system in different cases under scenario A. From the figure we know that during
day-time, the energy sold to the grid goes up and down depending on the amount of the
solar generation, while the household demand is satisfied by the energy stored in the bat-
tery.
Table 4.1 provides the economic analysis of the DP backward algorithm for the three cases under scenario A, where a negative cost refers to a profit. According to the table, the finer the SOC increment we choose, the lower the cost we need to pay for the day. The main reason for the decrease in the electricity cost is the reduction in the electricity bought from the power grid.
Table 4.1.: DP backward algorithm results analysis for different SOC increments under scenario A

SOC increment | Electricity from grid | Feed-in electricity | Electricity cost for the day
0.1 kWh       | 32.72 kWh             | 12.69 kWh           | -4.85 Euro
0.05 kWh      | 29.74 kWh             | 12.74 kWh           | -4.92 Euro
0.02 kWh      | 25.83 kWh             | 9.03 kWh            | -5.02 Euro
[Flowchart: Start → initialization (value function VT+1(St), starting state S0) → set t = 0 → calculate the feasible path set L for St → calculate the cost function for each element in set L (given St and St+1, calculate the minimal cost with LP and save the optimal decision x_t^π) → step backwards to calculate the value function for each state → if t < T, set t = t + 1 and repeat; otherwise choose the optimal strategy → End]
Figure 4.15.: Forward DP algorithm flowchart
4.5.3. Scenario B
Figure 4.20 and Figure 4.21 show the optimal deterministic backward DP storage algorithm against the electricity market price and the PV generation for the system in the different cases under scenario B, respectively.
Figure 4.22 shows the optimal deterministic backward DP storage algorithm power profile for the system in the different cases under scenario B. Although only small differences were observed among the power profiles of the cases, the electricity cost under a smaller SOC increment is normally lower, except for case 3, for which the SOC increment is 0.02 kWh. The reason for this could be the accuracy of the calculation. The analysis of the DP backward algorithm for the three cases under scenario B is listed in Table 4.2, where a negative electricity cost means a profit for the householder from the electricity trade with the market.
[Diagram: path search from the initial stage R0 through the states R_{t,1}, ..., R_{t,n} at stages t = 1, 2, 3, ..., T]
Figure 4.16.: Path search in forward DP algorithm
Table 4.2.: DP backward algorithm results analysis for different SOC increments under scenario B

SOC increment | Electricity from grid | Feed-in electricity | Electricity cost for the day
0.1 kWh       | 38.03 kWh             | 18.99 kWh           | -0.114 Euro
0.05 kWh      | 34.60 kWh             | 17.50 kWh           | -0.150 Euro
0.02 kWh      | 44.31 kWh             | 26.96 kWh           | -0.068 Euro
Although the results here are not as smooth as the ones given by the linear programming optimization, the deterministic DP algorithm does provide better performance with a finer discretization. Comparing Figure 4.14 to Figure 4.20, we find that the shape and the change points of the SOC curves are similar. Although we use a discretized state variable for the DP formulation and a continuous state variable for the LP formulation, the performance of the DP backward algorithm is good even at a large increment level.
4.5.4. DP Formulation for Stochastic Problems
For stochastic problems, both the state at the next stage and the current return are uncertain even when the current state and decision are known, so the optimal undiscounted value function is rewritten in expectation form with the assumption V_{T+1} ≡ 0, that is, there is (by assumption) no electricity cost after the end of the simulation time:

V_t(R_t) = max_{x_t} [C_t(R_t, x_t, W_t) + E(V_{t+1}(R_t, x_t, W_t))],  ∀t = 1, 2, 3, ..., T.
In stochastic DP problems, expected values are used to solve the problem, which makes the computation of the next-stage information under an uncertain evolution process difficult.
[Figure, three panels: (a) Case 1, scenario A; (b) Case 2, scenario A; (c) Case 3, scenario A; each showing SOC against electricity price over the day]
Figure 4.17.: Optimal deterministic forward DP storage algorithm against electricity price for the system in different cases under scenario A
[Figure, three panels: (a) Case 1, scenario A; (b) Case 2, scenario A; (c) Case 3, scenario A; each showing solar energy and SOC over the day]
Figure 4.18.: Optimal deterministic forward DP storage algorithm against PV generation for the system in different cases under scenario A
[Figure, three panels: (a) Case 1, scenario A; (b) Case 2, scenario A; (c) Case 3, scenario A; each showing energy demand and solar energy, energy from grid and solar to grid, and storage level (scaled) with delta R]
Figure 4.19.: Optimal deterministic forward DP storage algorithm power profile for the system in different cases under scenario A
[Figure, three panels: (a) Case 1, scenario B; (b) Case 2, scenario B; (c) Case 3, scenario B; each showing SOC against electricity price over the day]
Figure 4.20.: Optimal deterministic forward DP storage algorithm against electricity price for the system in different cases under scenario B
[Figure, three panels: (a) Case 1, scenario B; (b) Case 2, scenario B; (c) Case 3, scenario B; each showing solar energy and SOC over the day]
Figure 4.21.: Optimal deterministic forward DP storage algorithm against PV generation for the system in different cases under scenario B
[Figure, three panels: (a) Case 1, scenario B; (b) Case 2, scenario B; (c) Case 3, scenario B; each showing energy demand and solar energy, energy from grid and solar to grid, and storage level (scaled) with delta R]
Figure 4.22.: Optimal deterministic forward DP storage algorithm power profile for the system in different cases under scenario B
As a solution, states with zero stages to go are evaluated first, and then states with one stage to go are evaluated by computing their values over all possible decisions; this procedure is known as backward induction. The optimal value for each state at a certain stage is stored.
One of the biggest challenges for dynamic programming is the "curse of dimensionality". As the number of state variables increases, not only the computation time but also the required computer memory grows exponentially. For a system with N possible states, there are N^2 state transitions at each period, and over T time periods the total number of combinations is T · N^2. If the decision taken at each time t is a vector, the situation becomes even worse. Even though not all of the states are valid due to the constraints in the system, this can still result in extraordinarily high computational and time costs. Taking the 24-hour KNUBIX data simulation as an example, where the direction of each transition also has to be considered, it took much longer than linear programming to compute the optimal decisions even with a large SOC increment of ∆R = 0.1 kWh. The situation is expected to be worse with a finer SOC increment. To achieve a trade-off between optimization performance and the cost of the algorithm, an approximate dynamic programming (ADP) algorithm has been developed. Detailed information on ADP is given in the following section.
4.6. Approximate Dynamic Programming Formulation
When applying the traditional dynamic programming method presented in the previous sections, we have to loop over all possible states and enumerate all feasible state transitions. We have represented the value function by a lookup table, with an entry V(St) for each state at time t, resulting in incredibly large computational cost and high memory requirements because too many states and actions have to be stored in memory. If we try to improve the performance of the DP algorithm further with a very small SOC increment, this strategy is no longer tractable.
The foundation of approximate dynamic programming is forward dynamic programming (13). When we step forward to calculate the value function using

V_t(S_t) = max_{x_t} [C_t(S_t, x_t) + E(V_{t+1}(S_t, x_t))]    (4.1)

for each state S_t, we have not yet computed the value of V_{t+1}, not to mention the expectation of the possible values over the exogenous information, so we have to work with an approximation of the value function to make a decision.
For each state St at time t, the approximation of the value function Vt(St) can be linear, nonlinear separable or nonlinear non-separable. We solve this storage optimization problem simply by approximating this cost-to-go function and selecting the optimal decision from the feasible decision set to maximize the sum of the current cost and the estimated value of the next state. The accuracy of the estimation, and thus the choice of the approximation model, directly influences the quality of the optimization results. A lot of research has been
done to approximate the value function: (32) and (33) propose least-squares policy itera-
tion algorithms to approximate the value function in a large Markov decision process; (34)
blends reinforcement learning and math programming to make a nonparametric approxi-
mation of shape-restricted value functions; (35) studies both linear and nonlinear approxi-
mation strategies for stochastic product dispatch problem; (36) proposes a piecewise linear
programming algorithm to assist clinical decision making (optimal dosing) in the controlled
ovarian hyperstimulation (COH) treatment.
The basic idea of ADP is to follow a sample path, which refers to a sequence of exogenous information such as the disturbances in demand, electricity price or solar energy generation. The sequence of realizations can be generated randomly or obtained from a lookup table, a common distribution or real-world data. Each sample path corresponds to one value function iteration. We update the approximate value function iteratively based on the previous estimate, each time following a fresh sample path. This means that, while following the sample path ω^n (where ω^n represents the specific value of the exogenous information ω at iteration n), we make sample-path-based decisions using the approximate value function V̄^{n−1}(S_{t+1}) from the previous iteration. For each state S_t, with the value of the next stage V_{t+1} known, we can use equation 4.1 to make a decision. At the end of each iteration, we combine the information from the previous iteration with the current information to update the values of the related states.
To summarize, when using approximate dynamic programming to solve the Bellman
equation, decisions will be made under the assumption that value functions for all states at
any time t are known. An initialization for the value function is also important. Approximate
dynamic programming proceeds by improving the optimal decision iteratively and updating
the approximate value function iteratively.
In this thesis, we go forward to approximate the value function parametrically or non-parametrically, and then solve the problem via a backward recursion algorithm with the approximate values, as we did in the dynamic programming method. For simplicity, we drop discounting and set the discount factor γ to 1.
This section consists of two parts: in the first part we focus on the application of a linear architecture to make the approximation, while in the second part we propose a piecewise linear concave approximation based on the Concave Adaptive Value Estimation (CAVE) algorithm developed by Godfrey and Powell (37).
Before developing an approximation algorithm, K samples of the exogenous information Ω = {ω^1_1, ω^1_2, ..., ω^1_T, ..., ω^K_1, ..., ω^K_T} are drawn to develop the approximation. Here we use the Princeton energy storage benchmark dataset S1 to generate a series of state path samples; the number of samples K is 256 and each simulation period is set to have 101 time periods. The characteristics of the wind process, price process and demand process in this dataset are as follows:
The wind process Et is modeled using a first-order Markov chain;
The price process Pt is assumed to be sinusoidal with respect to time t;
The demand process Dt is modeled as deterministic, following the function Dt = max(0, 3 − 4 sin(2πt/T)), as sketched below.
A storage device with a capacity of 30 and a maximum charging or discharging rate of 5 is used in the simulation. The initial state of charge of the battery is assumed to be 25. For further information see (16).
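The deterministic demand process, for instance, can be reproduced directly from the formula above (a minimal MATLAB sketch):

% Deterministic demand process of test problem S1 as stated above.
T = 101;
t = 1:T;
D = max(0, 3 - 4*sin(2*pi*t/T));   % D_t = max(0, 3 - 4 sin(2*pi*t/T))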
Our goal is to find an appropriate approximation to solve the optimization problem

V_t(S_t) = max_{x_t} {C_t(S_t, x_t) + γ E[V_{t+1}(S_{t+1}) | S_t]},    (4.2)
for t = 0, 1, ..., T − 1. The expectation in equation 4.2 is over the sampled exogenous information and is normally intractable. To avoid computing an expectation within the maximization, we use post-decision state variables to modify equation 4.2 (38), (39). The post-decision state variable is the state of the system after we have made a decision but before any new exogenous information has arrived (13). Pre-decision state variables can be expressed through post-decision state variables as S_t = S^x_{t−1} + W_t. Thus, equation 4.2 can be rewritten as

V^x_{t−1}(S^x_{t−1}) = E[max_{x_t} {C_t(S_t, x_t) + γ V^x_t(S^x_t)} | S^x_{t−1}],    (4.3)

with V^x_t(S^x_t) := E[V_{t+1}(S_{t+1})].
For a determined sample realization, we propose the original approximation as follows:

v̂_t(S_t) = max_{x_t} {C_t(S_t, x_t) + γ V̄_{t+1}(S_{t+1})},    (4.4)

where V̄ and v̂ denote two forms of the value function; they are used to update the approximation and will be explained in the following parts. Applying the post-decision state variable, we modify equation 4.4 into

v̂^x_{t−1}(S^x_{t−1}) = E[max_{x_t} {C_t(S_t, x_t) + γ V̄^x_t(S^x_t)} | S^x_{t−1}].    (4.5)
At time t in iteration n, we update v̂ iteratively, using the V̄ from the previous iteration:

v̂^n_{t−1}(S^x_{t−1}) = E[max_{x_t} {C_t(S_t, x_t) + γ V̄^{n−1}_t(S^x_t)} | S^x_{t−1}].    (4.6)
For a given state S^x_{t−1} in iteration n, we loop over all feasible actions, and for each action a state S^x_t is built based on the state transition function. With a series of (S^x_{t−1}, S^x_t) pairs, we simply use equation 4.6 to search for the optimal decision x^π_t. Afterwards, we move forward until the end of the simulation period at t = T, then increment the iteration number and repeat over the whole simulation cycle using the previous estimation information. The detailed process is described in Algorithm 5.
Algorithm 5 Approximate Dynamic Programming Algorithm
Initialization:
  set the initial state to R0
  set the value function VT+1 to 0
  initialize the value function V̄^0(S) for all possible states
for sample path n = 1, 2, ..., N do
  simulate the sample path ω^n
  for time t = 1, 2, ..., T do
    compute
      v̂^n_t = max_{x_t} (C_t(S^n_t, x_t) + γ Σ_{ω∈ω^n} V̄^{n−1}_{t+1}(S^n_t, x_t, ω))
    update the value function for state S^n_t using
      V̄^n_t(S^n_t) = (1 − α_{n−1}) V̄^{n−1}_t(S^n_t) + α_{n−1} v̂^n_t
    compute the next-stage state S^n_{t+1} = S^M(S^n_t, x^n_t, ω^n) (assuming that x^n_t is the optimal decision for iteration n)
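The smoothing step of Algorithm 5 is a simple convex combination of the old estimate and the new observation; a minimal MATLAB sketch with invented numbers:

% Value-function smoothing: blend the previous estimate with the new
% sample observation vhat using the stepsize alpha.
n     = 5;  alpha = 1/n;    % stepsize for iteration n (one possible rule)
Vold  = 12.0;               % previous estimate V^{n-1}(S_t)
vhat  = 14.5;               % new sample observation of the value
Vnew  = (1 - alpha)*Vold + alpha*vhat;   % updated estimate V^n(S_t)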
4.6.1. A Linear Lookup Table Approximation
The simplest way to approximate a value function is to use a linear regression model. In approximate dynamic programming, basis functions are used to translate the various state variable information into a series of features, from which linear combinations are created.
We denote the basis functions by φf(S), where f is a feature and S is the state variable, and θf is a parameter indicating the weight of feature f. A general form of the value function approximation can be written as

V̄(S|θ) = Σ_{f∈F} θf φf(S).

The process of defining the features and designing the basis functions is always complicated, and a poor choice of basis functions may lead to a terrible approximation of the value function even if a perfect estimate of the weight parameter vector θ is provided. (40) presents a way to automatically construct basis functions for the linear approximate value function of a Markov Decision Process (MDP). Basis functions and the parameter vector θ can also be defined in terms of time.
Let V^n be the nth observation of the true value function; our goal is then to find an appropriate parameter vector θ that minimizes the mean squared error:

min_θ Σ_{n=1}^{N} (V^n − Σ_{f∈F} θf φf(S^n))^2.
Considering St = {Rt, Pt, Et, Dt} as our state variable, we define our basis functions as linear or nonlinear combinations of the state variables. We set φ0 = 1, which corresponds to the constant term of the linear regression model. The n observations of the basis functions are collected in the matrix

φ^n = [φ^i_k],  i = 1, ..., n,  k = 0, ..., K,

whose ith row contains the feature values φ^i_0, φ^i_1, ..., φ^i_K of observation i,
where K + 1 is the number of features and n is the number of observations. The n observations of the true value function are given by the vector

V = (V^1, V^2, ..., V^n)^T.
The optimal parameter vector of regression coefficients θ can be estimated using the normal equation

θ = [(φ^n)^T φ^n]^{−1} (φ^n)^T V.    (4.7)
When programming, a non-invertibility problem may be encountered, for which the pseudo-inverse provides a solution. However, in approximate dynamic programming, optimizing the coefficient vector θ with equation 4.7 at every step would be expensive; the method of recursive least squares provides a possible way around this (13). At each iteration, new observations are used to update the parameter estimate.
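As a minimal sketch of the batch fit in equation 4.7 (with toy data; pinv guards against a singular matrix):

% Fit the regression weights theta by the normal equation (4.7).
Phi = [ones(4,1), (1:4)', ((1:4).^2)'];  % n = 4 observations, K+1 = 3 features
V   = [2.1; 3.9; 8.2; 15.8];             % observed values of the value function
theta = pinv(Phi' * Phi) * (Phi' * V);   % theta = (Phi'Phi)^(-1) Phi'V
Vhat  = Phi * theta;                     % fitted approximation V(S|theta)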
To apply recursive estimation, stochastic gradients are introduced in an updating function that involves the sample observation v̂^n_t of being in state St and the previous iteration's estimate of the value function V̄^{n−1}_t(St). We are searching for an updating algorithm that solves

min_{V̄_t} E F(V̄_t, v̂_t),

where F(V̄_t, v̂_t) = ½ (V̄_t − v̂_t)^2. With a stepsize parameter αn, the updating equation reads

V̄^n_t = V̄^{n−1}_t − α_{n−1} ∇F(V̄^{n−1}_t, v̂_t),    (4.8)

where ∇F(V̄^{n−1}_t, v̂_t) = V̄^{n−1}_t − v̂_t.
After applying the linear regression model, the equation can be rewritten as

θ^n = θ^{n−1} − α_{n−1} (V̄(S|θ^{n−1}) − v̂^n) ∇_θ V̄(S|θ^n),    (4.9)

where ∇_θ V̄(S|θ^n) = φ(S^n).
Instead of using a stepsize parameter, Powell introduced a matrix H^n that serves as a scaling matrix. The updating equation for the coefficients θ is then

θ^n = θ^{n−1} − H^n φ^n ε̂^n,    (4.10)

where ε̂^n = V̄(S|θ^{n−1}) − v̂^n (for detailed information see Chapter 9 in (13)).
A simple double-pass policy iteration algorithm, which is an adaptation of the ADP algorithm in (13) and uses a basis function model with a lookup table, is described in Algorithm 6. We introduce a linear regression model in terms of time, which takes the effect of time into consideration. To improve the quality of the linear regression model and better handle the "curse of dimensionality", we use an aggregation method on the demand and price dimensions. To simplify the process, we simply round the demand and price state variables to integers. We then construct the linear regression model and estimate the regression parameters around the aggregated states.
This double-pass algorithm is developed to solve the finite-horizon problem using both forward and backward induction. In the forward pass, we determine the decision variable with the current policy and build a trajectory of state variables through time. Afterwards, we step backwards through time to update the value function for the states in this trajectory, using the next-stage value function. When using the lookup table method, a discretization with increasing resolution becomes computationally intractable; state aggregation is a powerful way to handle this dimensionality problem.
We only test deterministic problems in this thesis, using test problem S1 from the Prince-
ton dataset, where the electricity price, wind energy and energy demand are assumed
to evolve deterministically with different dynamics over time. Test problem S1 consists of
T = 101 time periods with ∆t = 1.
In Figure 4.23a we show the storage level obtained by the linear ADP along with the wind energy and demand profiles corresponding to test problem S1. Figure 4.23b shows the spot electricity price process against the storage level of the battery. Since our chosen basis functions involve the electricity price at the next time period, it can be observed from the plot that the battery charges or discharges some time before the electricity price changes. Figure 4.24 compares the storage level obtained by the linear ADP algorithm to the optimal SOC process provided by dataset S1. It has to be mentioned that even though the approximate storage policy is not exactly the same as the optimal one, they follow the same overall pattern. To evaluate the performance of the linear ADP algorithm quantitatively, we compare the objective value given by the linear ADP to the objective value given by test problem S1. Taking sample path 2 in test problem S1 as an example, the optimal objective value known from test problem S1 is 1.8914 × 10^4, while the value calculated by the linear ADP is 1.8744 × 10^4, which is 99.10% of the optimal value.
Algorithm 6 A Simple Double-pass Policy Iteration Algorithm Using Basis Functions
Initialization:
  design basis functions φf(S)
  initialize regression coefficients θ^0_{tf} for t = 0, 1, ..., T
  initialize the starting state S^n_0 for n = 0, 1, ..., N
  initialize v̂^{n,m}_{T+1} = 0
for policy iteration number n = 1, 2, 3, ..., N do
  for sample path m = 1, 2, ..., M do
    simulate the sample path ω^m
    for time t = 1, 2, ..., T do
      compute
        x^{n,m}_t = arg max_{x_t} (C_t(S^{n,m}_t, x_t) + γ Σ_f θ^{n−1}_{tf} φf(S^{n,m}_t, x_t))
      compute
        S^{n,m}_{t+1} = S^M(S^{n,m}_t, x^{n,m}_t, W_t(ω^m))
    for time t = T, T − 1, ..., 1 do
      compute
        v̂^{n,m}_t = C_t(S^{n,m}_t, x_t) + γ v̂^{n,m}_{t+1}
      update θ^{n,m−1}_f to θ^{n,m}_f using equation 4.10
[Figure, two panels: (a) energy storage profile along with the wind energy and demand profiles; (b) electricity price process against storage level of the battery]
Figure 4.23.: Results of linear ADP algorithm and sample path from test problem S1
[Figure: calculated SOC vs. optimal SOC over the time periods]
Figure 4.24.: Approximate path obtained by linear ADP vs. optimal path
Linear approximations are easy to develop, but their performance is not always good.
To provide an acceptable result, we need to choose basis functions carefully, considering
the features of the problem. Generally, linear approximations are appropriate for problems with a large number of resource types and a small number of possible resource state values (13).
4.6.2. SPAR Algorithm
While the linear lookup table ADP is independent of the problem structure, it can be unstable and has the disadvantage of high memory consumption when large-scale problems are simulated. Nonlinear value function approximations can improve the quality of the optimization results.
According to the well-known result from Vanderbei (8) that any maximizing linear programming problem is concave in its right-hand-side constraints, the value function in our storage optimization problem is concave, which means the slopes of the value function are monotonically decreasing. The concavity of the objective function is powerful, for it guarantees a unique optimum (a local optimum is also a global optimum) and allows us to focus on exploitation without considering exploration.
Since solving a concave nonlinear optimization problem is intractable, we use a piecewise linear optimization method to estimate the value function. (41) shows an adaptive dynamic programming algorithm that applies nonlinear functional approximations to a dynamic resource allocation problem.
Assuming that our previous-iteration approximation V̄^{n−1}_t(S) is concave, we use the aforementioned equation 4.8 to compute the value function V̄^n_t(S) at iteration n. It is possible that V̄^n_t(S) violates concavity. To maintain the concavity of the value function after updates, a family of algorithms has been developed that enforces the concavity of the piecewise linear approximation while updating the slopes in every iteration. Three popular algorithms are reviewed here:
The Leveling Algorithm. The leveling algorithm simply replaces the values of the points that violate the monotonicity property with a smaller or a bigger value.
The SPAR Algorithm. The SPAR (Separable, Projective Approximation Routine) algorithm averages the values of the points that violate the monotonicity (see Section 11.3 of (13)).
The CAVE Algorithm. The CAVE algorithm maintains the monotonicity by expanding the update range of the value function (42).
Since the SPAR algorithm is the one that works well in practice and is easy to implement, we mainly focus on the methodology of the SPAR algorithm in this section. We first introduce the SPAR algorithm briefly and then illustrate the ADP algorithm using SPAR.
We introduce a piecewise linear approximation of the value function with respect to the storage level R and denote by v^n_{tk} the slope of the kth line segment at time t in iteration n. The slopes v^n_{tk} satisfy the monotonicity property. We denote by δ the smallest segment length between two breakpoints; all of the following illustrations assume δ = 1. The value function around the post-decision state variable can be written as

V̄^n_t(R^x_t) = Σ_{k=1}^{K_t} v^n_{tk} r_{tk},

where Σ_{k=1}^{K_t} r_{tk} = R_t + Φ^T x_t with 0 ≤ r_{tk} ≤ δ. Making an appropriate estimation of the value function is thus equivalent to finding a proper prediction of the slope variables.
Compared to other updating algorithms, SPAR performs an update using an average value over a determined range. If v^n_t(R^n) ≥ v^n_t(R^n + 1) for all R^n, then v^n_t satisfies monotonicity. If there exists an R^n with v^n_t(R^n) < v^n_t(R^n + 1), we need to find the largest R' such that

v^n_t(R') ≥ (1 / (R^n − (R' − 1))) Σ_{r=R'}^{R^n} v^n_{tr}.

Then we replace the slopes of the value function for R = [R', R^n] with the new slope value (1 / (R^n − (R' − 1))) Σ_{r=R'}^{R^n} v^n_{tr} to restore the concavity of the value function, as sketched below.
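One possible MATLAB realization of this projection is sketched below; the bidirectional window expansion is our own simplification of the SPAR update, not the exact routine used in this thesis:

function v = sparProject(v, k)
% Restore non-increasing slopes after the entry at index k was updated:
% grow the smallest window around k whose average removes all violations,
% then replace the window by its average (a sketch of the SPAR idea).
lo = k; hi = k;
done = false;
while ~done
    done = true;
    if lo > 1 && v(lo-1) < mean(v(lo:hi))        % violated on the left
        lo = lo - 1; done = false;
    end
    if hi < numel(v) && v(hi+1) > mean(v(lo:hi)) % violated on the right
        hi = hi + 1; done = false;
    end
end
v(lo:hi) = mean(v(lo:hi));   % project the window onto its average
end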
Using the piecewise linear value function approximation, we reconstruct the optimization problem as a deterministic linear programming problem:

F = max_{x_t, K_t} [C(R_t, x_t, W_t) + Σ_{k=1}^{K_t} v^{n−1}_{tk} r_{tk}],    (4.11)

subject to

A_t x_t = b_t,    (4.12)
A_{eq,t} x_t ≤ b_{eq,t},    (4.13)
x_t ≥ 0,    (4.14)
R_{t+1} = Σ_{k=1}^{K_t} v^{n−1}_{tk} r_{tk}.    (4.15)
To obtain the slopes of the value function, we first compute the marginal value of the value function. We denote by S^+_t = (R_t + δ, E_t, D_t, P_t) and S^−_t = (R_t − δ, E_t, D_t, P_t) the states of the system after a positive and a negative perturbation of the storage level, where δ is the smallest increment of the SOC level with respect to the slope segments. The corresponding optimal objective values are defined as follows:

F^+ = max_{x^+_t, K^+_t} [C(S^+_t, x^+_t) + Σ_{k=1}^{K^+_t} v^{n−1}_{tk} r_{tk}],
F^− = max_{x^−_t, K^−_t} [C(S^−_t, x^−_t) + Σ_{k=1}^{K^−_t} v^{n−1}_{tk} r_{tk}].    (4.16)
We obtain the marginal value v̂_t(R_t) of the value function by solving

v̂_t(R_t) = ∂/∂R_t max_{x_t∈χ_t} (C(S_t, x_t) + V̄^{n−1}_t(S_{t+1})) = (1/δ)(F − F^−).    (4.17)
At the boundaries of the SOC domain, in other words, when R = 0 or R = Rc, we use right and left numerical derivatives with respect to the perturbation of the storage level to estimate the slopes:

v̂_t(R = 0) ≈ v̂^+_t(R = 0) = (1/δ)(F^+ − F),
v̂_t(R = Rc) ≈ v̂^−_t(R = Rc) = (1/δ)(F − F^−).    (4.18)
After computing the new slope information at iteration n, we update the value function with the information from the previous iteration n − 1:

v^n_t = (1 − α_{n−1}) v^{n−1}_t + α_{n−1} v̂^n_t,  if R = R^n,
v^n_t = v^{n−1}_t,  otherwise,    (4.19)

where α_{n−1} is a stepsize parameter chosen according to a determined rule. We use a deterministic stepsize rule for the simulation, given by

α_{n−1} = a / (a + n − 1),    (4.20)
where a is a constant.
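For the three tested parameter values, the rule can be evaluated directly (a minimal MATLAB sketch):

% Harmonic stepsize rule of Equation (4.20) for the three tested values of a.
n = (1:256)';
alpha = @(a) a ./ (a + n - 1);
stepsizes = [alpha(1), alpha(10), alpha(100)];  % columns for a = 1, 10, 100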
The piecewise linear approximation algorithm is outlined in Algorithm 7. The algorithm requires a concave piecewise linear value function, so we first set all initial slopes to zero to satisfy the monotonicity property. At the beginning of each iteration n, a sample realization generated from the Princeton dataset is observed. We obtain the optimal decision for the current state S^n_t and for the states after positive and negative perturbation, S^{n+}_t and S^{n−}_t. Then we calculate the corresponding objective values. Until the end of the planning horizon t = T is reached, we compute the next-stage state S^n_{t+1} according to the calculated optimal decision. Then we update the slopes of the value function using the slope information from the previous iteration and the sample observation v̂^n_t. After updating the value function we apply the SPAR algorithm to maintain the concavity of the value function. In the end, we increment the iteration number and use a sequence of fresh sample information to repeat the process and improve the accuracy of the slopes v.
Algorithm 7 A Piecewise Linear Approximation Algorithm

Initialization: initialize v_{tk}^0 for all t = 1, 2, ..., T and SOC levels k = 1, 2, ..., R^c/\delta
1:  for n = 1, ..., N do
2:      simulate the sample path \omega^n \in \Omega
3:      for t = 0, ..., T do
4:          obtain x_t^n = \arg\max_{x_t \in \chi_t} (C(S_t^n, x_t) + \bar{V}_t^{n-1}(S_{t+1})) by solving the LP problem of Equations 4.11-4.15
5:          calculate F^- and F^+ as in Equation 4.16
6:          if t < T then
7:              compute S_{t+1}^n = S^M(S_t^n, x_t^n, \omega_t^n)
8:              calculate \hat{v}_t^n as in Equations 4.17 and 4.18
9:              update v_t^n(S_t) ← SPAR(v_t^{n-1}, \hat{v}_t^n)
10:         end if
11:     end for
12: end for
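Putting the pieces together, a compact Python skeleton of Algorithm 7 might look as follows. It leans on the spar_project, estimate_slope and smooth_slope sketches above; the state interface, the solve_stage helper and the sample-path structure are hypothetical stand-ins for the model of Chapter 3:

import numpy as np

def train_piecewise_adp(sample_paths, solve_stage, transition, T, R_cap,
                        a=10.0, delta=1.0):
    # Skeleton of Algorithm 7. solve_stage(S, v_t) is assumed to return
    # (F, x) for state S under the slope vector v_t, e.g. via the stage
    # LP sketched earlier; transition(S, x, w) returns the next state.
    # The state interface (S.R, S.with_R) is a hypothetical stand-in.
    K = int(R_cap / delta)
    v = np.zeros((T + 1, K))                 # zero slopes: concave start
    for n, omega in enumerate(sample_paths, start=1):
        S = omega["initial_state"]
        for t in range(T):
            F, x = solve_stage(S, v[t])
            v_hat = estimate_slope(
                lambda R: solve_stage(S.with_R(R), v[t])[0],
                S.R, R_cap, delta)
            k = min(int(S.R / delta), K - 1)  # visited slope segment
            v[t, k] = smooth_slope(v[t, k], v_hat, n, a)
            v[t] = spar_project(v[t])         # restore concavity (SPAR)
            S = transition(S, x, omega["info"][t])
    return v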
Since the slopes of the value function are discrete, we still have to handle the classical "curse of dimensionality" problem. To do so, we use a simple aggregation method that aggregates the storage level into several segments. In this simulation we set \delta = 1, which means that at any time t in iteration n we approximate the value function using R^c/\delta = R^c slope segments. It has to be mentioned that aggregation is only used in approximating the value function, not in computing the optimal decision or calculating the objective value. We use the same dataset (dataset S1 from the Princeton datasets) as in the previous section. The algorithm was tested with three different stepsizes according to Equation 4.20, namely a = 1, a = 10 and a = 100.
Figure 4.25a depicts the storage level obtained by piecewise linear ADP along with the wind energy and demand profiles for test problem S1 under stepsize parameter a = 1. Figure 4.25b shows the spot electricity price process against the storage level of the battery under the same stepsize parameter.
Figure 4.26 compares the storage level obtained by the piecewise linear ADP algorithm under stepsize parameter a = 1 to the optimal SOC process supplied by dataset S1. It can be observed that they follow the same pattern, even though the amount of charging or discharging energy differs.
Similar results under stepsize parameters a = 10 and a = 100 are given in Figures 4.27, 4.28 and Figures 4.29, 4.30, respectively.
Figure 4.31 presents the objective values over 256 iterations under the harmonic stepsize rule of Equation 4.20 with different parameters a (a = 1, 10, 100). The smaller a is, the more quickly the stepsize decreases to zero. A larger a prevents the stepsize from decreasing too fast but leads to larger variance due to sensitivity to new observations. A proper design of the stepsize rule is therefore important for the performance of the algorithm.
A quantitative evaluation of this algorithm is given in Chapter 5.
Figure 4.25.: Results of piecewise linear ADP algorithm (a=1) and sample path from test problem S1. (a) Energy storage profiles along with the wind energy and demand profiles; (b) electricity price process against storage level of the battery.
Figure 4.26.: Approximate path obtained by piecewise linear ADP (a=1) vs. optimal path.
Figure 4.27.: Results of piecewise linear ADP algorithm (a=10) and sample path from test problem S1. (a) Energy storage profiles along with the wind energy and demand profiles; (b) electricity price process against storage level of the battery.
Figure 4.28.: Approximate path obtained by piecewise linear ADP (a=10) vs. optimal path.
Figure 4.29.: Results of piecewise linear ADP algorithm (a=100) and sample path from test problem S1. (a) Energy storage profiles along with the wind energy and demand profiles; (b) electricity price process against storage level of the battery.
Figure 4.30.: Approximate path obtained by piecewise linear ADP (a=100) vs. optimal path.
Figure 4.31.: Objective values for different stepsize rule parameters (a = 1, 10, 100).
5. Evaluation of the Approach
In this chapter we first summarize the characteristics of the different algorithms; then a quantitative analysis is given.
Linear programming provides the optimal result for deterministic problems and is used as a benchmark to evaluate the performance of the other algorithms. It is applicable to continuous settings without much computational effort, and a discretization of the state variable is not required. However, its effectiveness decreases as the dimension of the problem grows, and it cannot be applied to stochastic problems.
Dynamic programming provides a more flexible method to solve the storage optimization problem and offers excellent algorithmic performance. However, in terms of time and computational cost it might not be a wise choice, which can be traced to its lookup table policy. When applied to large-scale problems or problems with finely resolved states, the DP algorithm is no longer tractable.
Compared to the dynamic programming algorithm, ADP with a linear regression model reduces the memory requirements and also achieves a dramatic reduction in time cost. Nevertheless, the performance of the algorithm relies largely on the design of the basis functions, and the quality of the result is unstable across different test sample paths.
The concave piecewise linear algorithm avoids exploration and focuses on pure exploitation. With this approximation, the optimization problem is turned into a deterministic LP problem. The algorithm is especially useful when we use an aggregation method or are only interested in integer solutions, but its performance is not as good as expected due to the limited number of samples in the Princeton datasets.
To compare the aforementioned algorithms quantitatively, a cost-benefit analysis of the one-day simulation with the different optimization algorithms is provided in Table 5.1. The simulation was carried out over 24 hours for the 29th of March as an exemplary day. The table presents the final value of the objective function (cash flow) under the day-and-night price tariff model for threshold control, LP optimization and DP optimization (case 3 with SOC increment 0.02), where negative costs indicate profits.
It can be seen from Table 5.1 that it makes little difference in electricity cost whether the PV system operates without a storage device or with a storage device controlled by the threshold algorithm. The DP algorithm improves the performance considerably simply by reducing the feed-in electricity, but the LP algorithm undoubtedly gives the best result.
Table 5.1.: Cash flow analysis under day-and-night price tariff for different algorithms

                            Electricity cost for        Electricity from   Feed-in
                            day-and-night tariff (EUR)  grid (kWh)         electricity (kWh)
PV system without battery   6.41                        26.61              4.35
Threshold control           5.58                        24.39              4.47
LP algorithm                -5.30                       24.00              41.20
DP algorithm (case 3)       -5.02                       25.84              9.03

Since the approximate dynamic programming algorithms were simulated using the Princeton datasets, we cannot compare their electricity costs to the other algorithms directly. Instead, we define an optimization factor β to help evaluate the performance of the different algorithms:
\beta = \frac{F_{calc}}{F_{opt}},

where F_{calc} and F_{opt} denote the calculated objective value and the optimal objective value, respectively.
Using the factor β, we can compare the performance of the DP and ADP algorithms. The dynamic programming algorithm provides a near-optimal result, reaching 94.72% of the optimal value provided by the linear programming algorithm (5.02/5.30 = 94.72%). According to the results in Section 4.6, the objective value calculated by linear ADP is 99.70% of the optimal objective value known from the dataset. The piecewise linear ADP algorithm performs better than DP for a = 1 and a = 10, but worse than the linear ADP algorithm. We calculated the factor for the different DP and ADP algorithms; the results are shown in Table 5.2. We can conclude that the ADP algorithms excel not only in computational cost but also in solution accuracy.
Table 5.2.: Benchmarking results for DP and ADP algorithms

Algorithm                        β (%)
DP algorithm (case 3)            94.72
Linear regression ADP            99.70
Piecewise linear ADP (a=1)       94.84
Piecewise linear ADP (a=10)      95.44
Piecewise linear ADP (a=100)     93.87
6. Conclusions and Future Work
This thesis contributes to the development of electricity management optimization methods for residential PV systems with a storage device. We have constructed different algorithms to solve the electricity management problem and analysed their advantages and disadvantages, so that users can choose an algorithm according to their requirements. The day-ahead model that we have developed is practical in real life, since day-ahead weather forecasts and spot market electricity prices are easy to access. Besides the intermittency of the solar generation process, the stochastic characteristics of the electricity price signal are also taken seriously in the approach. Two variable electricity price scenarios, namely the day-and-night tariff and the spot market price tariff, were analyzed, and the results obtained with the different algorithms are presented and discussed. A variable price tariff changes with electricity consumption and production; it provides consumers with flexibility and also with the potential for economic benefit.
However, the algorithms discussed in this work have certain limitations. On the one hand, the algorithms are offline, which means that the policies or the parameters of the approximate value function are trained in advance. When the algorithms are applied to other circumstances, the performance may not be as good as expected; if the parameters of the environment fluctuate, we have to adjust the algorithm manually, which results in extra work. Besides, due to the limited size of the sample set, we may encounter overfitting when applying the linear regression method in ADP. This problem could be addressed in later work by separating the data into training and testing sets in order to identify the general features and to increase the accuracy of the model.
On the other hand, only deterministic problems are tested in this thesis; future work is to test the algorithms on stochastic problems. In real life, we may have access to the weather information or the information from the electricity market a few hours in advance. Thus, a rolling horizon strategy, which uses information from a period of time in the future to make the current decision, could also be included in future work. A deterministic problem can be considered a rolling horizon problem with lookahead horizon H = T; we may test shorter lookahead horizons and evaluate the results.
Apart from this, the electric system model considered in this thesis is designed for an average family of four, while the KNUBIX data used to test the algorithms originated from a specific family on a specific day in Germany. We could improve the accuracy and flexibility of the algorithms by incorporating weather forecasts and adding different weather scenarios to our model, taking factors like temperature, solar irradiation and location into consideration. We could also introduce a signal to distinguish weekdays from other days, since electricity consumption is higher at weekends and on public holidays.
The computational and time costs of the algorithms must be taken seriously if we want to extend the simulation range from a day to a year or even longer, although the linear regression and piecewise linear ADP algorithms have already shown a significant reduction in solution time. One possible way to improve computational efficiency is to call compiled code, for example C++, directly from Matlab.
A. Appendix
In this appendix the data of the PV system are presented in Table A.1.

Table A.1.: Data of the PV system

Location                  Munich, Germany
Latitude (deg N)          48.13
Longitude (deg E)         11.7
Elevation (m)             529
DC System Size (kW)       5.5
Array Type                fixed (open rack)
Array Tilt (deg)          20
Array Azimuth (deg)       180
System Losses (%)         14
Inverter Efficiency (%)   96
DC to AC Size Ratio       1.1
Bibliography
[1] H. Wirth and K. Schneider, “Recent facts about photovoltaics in Germany,” Fraunhofer ISE, Freiburg, Germany, Tech. Rep., Sep. 2013.
[2] M. Black and G. Strbac, “Value of storage in providing balancing services for electricity
generation systems with high wind penetration,” Journal of power sources, vol. 162,
no. 2, pp. 949–953, 2006.
[3] M. Beccali, M. Cellura, V. L. Brano, and A. Marvuglia, “Short-term prediction of household electricity consumption: Assessing weather sensitivity in a Mediterranean area,” Renewable and Sustainable Energy Reviews, vol. 12, no. 8, pp. 2040–2065, 2008.
[4] A. Marvuglia and A. Messineo, “Using recurrent artificial neural networks to forecast
household electricity consumption,” Energy Procedia, vol. 14, pp. 45–55, 2012.
[5] G. K. Tso and K. K. Yau, “Predicting electricity energy consumption: A comparison
of regression analysis, decision tree and neural networks,” Energy, vol. 32, no. 9, pp.
1761–1768, 2007.
[6] M. L. Puterman, Markov decision processes: discrete stochastic dynamic program-
ming. New York, NY: John Wiley & Sons, 2014.
[7] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. Cambridge,
MA: MIT press, 1998, vol. 1.
[8] R. J. Vanderbei, Linear programming: Foundations and Extensions. New York, NY:
Springer, 2008, vol. 114.
[9] G. B. Dantzig and G. Infanger, “Multi-stage stochastic linear programs for portfolio
optimization,” Annals of Operations Research, vol. 45, no. 1, pp. 59–76, 1993.
[10] G. B. Dantzig, “Linear programming under uncertainty,” Management science, vol. 1,
no. 3-4, pp. 197–206, 1955.
[11] R. E. Bellman and E. Lee, “History and development of dynamic programming,” Con-
trol Systems Magazine, IEEE, vol. 4, no. 4, pp. 24–28, 1984.
[12] L. Busoniu, R. Babuska, B. De Schutter, and D. Ernst, Reinforcement learning and
dynamic programming using function approximators. Boca Raton, FL: CRC press,
2010, vol. 39.
[13] W. B. Powell, Approximate Dynamic Programming: Solving the curses of dimension-
ality. New York, NY: John Wiley & Sons, 2007, vol. 703.
[14] D. Bertsekas, Dynamic programming and optimal control. Belmont, MA: Athena
Scientific, 2012, vol. 2.
[15] W. B. Powell, “What you should know about approximate dynamic programming,”
Naval Research Logistics, vol. 56, no. 3, pp. 239–249, 2009.
[16] D. F. Salas and W. B. Powell, “Benchmarking a scalable approximate dynamic pro-
gramming algorithm for stochastic control of multidimensional energy storage prob-
lems,” Department of Operations Research and Financial Engineering, Princeton, NJ,
Tech. Rep., 2013.
[17] M. Sorrentino, G. Rizzo, and I. Arsie, “Analysis of a rule-based control strategy for on-
board energy management of series hybrid vehicles,” Control Engineering Practice,
vol. 19, no. 12, pp. 1433–1441, 2011.
[18] S. Teleke, M. E. Baran, S. Bhattacharya, and A. Q. Huang, “Rule-based control of
battery energy storage for dispatching intermittent renewable sources,” Sustainable
Energy, IEEE Transactions on, vol. 1, no. 3, pp. 117–124, 2010.
[19] M. Dicorato, G. Forte, M. Pisani, and M. Trovato, “Planning and operating combined
wind-storage system in electricity market,” Sustainable Energy, IEEE Transactions on,
vol. 3, no. 2, pp. 209–217, 2012.
[20] E. D. Castronuovo and J. Lopes, “On the optimization of the daily operation of a
wind-hydro power plant,” Power Systems, IEEE Transactions on, vol. 19, no. 3, pp.
1599–1606, 2004.
[21] H. Zhang, V. Vittal, G. T. Heydt, and J. Quintero, “A mixed-integer linear program-
ming approach for multi-stage security-constrained transmission expansion planning,”
Power Systems, IEEE Transactions on, vol. 27, no. 2, pp. 1125–1133, 2012.
[22] T. Nguyen, M. L. Crow et al., “Optimization in energy and power management for
renewable-diesel microgrids using dynamic programming algorithm,” in Cyber Tech-
nology in Automation, Control, and Intelligent Systems (CYBER), 2012 IEEE Interna-
tional Conference on. Bangkok, Thailand: IEEE, May 2012, pp. 11–16.
[23] P. Mokrian and M. Stephen, “A stochastic programming framework for the valuation
of electricity storage,” in 26th USAEE/IAEE North American Conference, Cleveland,
OH, Sep. 2006, pp. 24–27.
[24] N. Löhndorf and S. Minner, “Optimal day-ahead trading and storage of renewable
energies—an approximate dynamic programming approach,” Energy Systems, vol. 1,
no. 1, pp. 61–77, 2010.
[25] O. Sundström and L. Guzzella, “A generic dynamic programming matlab function,” in
Control Applications & Intelligent Control, 2009 IEEE. St. Petersburg, Russia: IEEE,
Jul. 2009, pp. 1625–1630.
[26] J. M. Nascimento and W. B. Powell, “An optimal approximate dynamic programming
algorithm for the energy dispatch problem with grid-level storage,” Mathematics of
Operations Research, vol. 34, no. 1, pp. 210–237, 2009.
[27] Y. Riffonneau, S. Bacha, F. Barruel, and S. Ploix, “Optimal power flow management for grid-connected PV systems with batteries,” Sustainable Energy, IEEE Transactions on, vol. 2, no. 3, pp. 309–320, 2011.
[28] I. Koutsopoulos, V. Hatzi, and L. Tassiulas, “Optimal energy storage control policies
for the smart power grid,” in Smart Grid Communications, 2011 IEEE International
Conference on. Brussels, Belgium: IEEE, Oct. 2011, pp. 475–480.
[29] K. Mets, M. Strobbe, T. Verschueren, T. Roelens, F. De Turck, and C. Develder, “Dis-
tributed multi-agent algorithm for residential energy management in smart grids,” in
Network Operations and Management Symposium (NOMS), 2012 IEEE. Maui, HI:
IEEE, Apr. 2012, pp. 435–443.
[30] R. Bellman, Dynamic programming. Princeton, NJ: Princeton University Press, 1957.
[31] B. A. McCarl and T. H. Spreen, “Applied mathematical programming using algebraic
systems,” Cambridge, MA, 1997.
[32] J. A. Boyan, “Technical update: Least-squares temporal difference learning,” Machine
Learning, vol. 49, no. 2-3, pp. 233–246, 2002.
[33] M. G. Lagoudakis and R. Parr, “Least-squares policy iteration,” The Journal of Ma-
chine Learning Research, vol. 4, pp. 1107–1149, 2003.
[34] L. Hannah and D. B. Dunson, “Approximate dynamic programming for storage prob-
lems,” in Proceedings of the 28th International Conference on Machine Learning.
Bellevue, WA: ACM, Jun./Jul. 2011, pp. 337–344.
[35] K. P. Papadaki and W. B. Powell, “An adaptive dynamic programming algorithm for
a stochastic multiproduct batch dispatch problem,” Naval Research Logistics (NRL),
vol. 50, no. 7, pp. 742–769, 2003.
[36] M. He, L. Zhao, and W. B. Powell, “Approximate dynamic programming algorithms for
optimal dosage decisions in controlled ovarian hyperstimulation,” European Journal
of Operational Research, vol. 222, no. 2, pp. 328–340, 2012.
[37] G. A. Godfrey and W. B. Powell, “An adaptive, distribution-free algorithm for the
newsvendor problem with censored demands, with applications to inventory and dis-
tribution,” Management Science, vol. 47, no. 8, pp. 1101–1112, 2001.
[38] D. P. Bertsekas, Dynamic programming and optimal control. Belmont, MA: Athena Scientific, 2012, vol. 2.
[39] W. R. Scott, W. B. Powell, and S. Moazehi, “Least squares policy iteration with instru-
mental variables vs. direct policy search: Comparison against optimal benchmarks
using energy storage,” arXiv preprint arXiv:1401.0843, 2014.
[40] P. W. Keller, S. Mannor, and D. Precup, “Automatic basis function construction for
approximate dynamic programming and reinforcement learning,” in Proceedings of
the 23rd international conference on Machine learning. Pittsburgh, PA: ACM, Jun.
2006, pp. 449–456.
[41] G. A. Godfrey and W. B. Powell, “An adaptive dynamic programming algorithm for
dynamic fleet management, i: Single period travel times,” Transportation Science,
vol. 36, no. 1, pp. 21–39, 2002.
[42] W. Powell and G. Godfrey, “An adaptive, distribution-free approximation for the
newsvendor problem with censored demands, with applications to inventory and dis-
tribution problems,” Management Science, vol. 47, no. 8, pp. 1101–1112, 2001.
92

More Related Content

PDF
M sc thesis of nicolo' savioli
PDF
DigSILENT PF - 02 basic pf structure
PDF
Calmet users guide
PDF
Textbook retscreen pv
PDF
useful data for HEVs
PDF
Dmitriy Rivkin Thesis
PDF
Group research project
PDF
Power System Stabilizer (PSS) for generator
M sc thesis of nicolo' savioli
DigSILENT PF - 02 basic pf structure
Calmet users guide
Textbook retscreen pv
useful data for HEVs
Dmitriy Rivkin Thesis
Group research project
Power System Stabilizer (PSS) for generator

What's hot (6)

PDF
Thesis_Eddie_Zisser_final_submission
PDF
Ampacity according to iec 60287
PDF
Ali-Dissertation-5June2015
PDF
mchr dissertation2
PDF
(2013)_Rigaud_-_PhD_Thesis_Models_of_Music_Signal_Informed_by_Physics
DOCX
ME3320_FinalReport
Thesis_Eddie_Zisser_final_submission
Ampacity according to iec 60287
Ali-Dissertation-5June2015
mchr dissertation2
(2013)_Rigaud_-_PhD_Thesis_Models_of_Music_Signal_Informed_by_Physics
ME3320_FinalReport
Ad

Viewers also liked (13)

PDF
Blue Planet
DOCX
Anoop Venugopal (1)
PPTX
Creating Customer Loyalty Programs that Stick
PPTX
Inbetweeners as postmodern
DOC
Shrikant Resume
PPTX
CEREBRO TRIUNO
PPTX
Daily routines
PDF
MBSG USC Athletics Pitch Competition
DOCX
Rupesh Kumar SAP Testing APO and SD
PPT
Confiture et concentré de tomates
PPTX
Concept Map for NHK
PPTX
Survey Analysis
Blue Planet
Anoop Venugopal (1)
Creating Customer Loyalty Programs that Stick
Inbetweeners as postmodern
Shrikant Resume
CEREBRO TRIUNO
Daily routines
MBSG USC Athletics Pitch Competition
Rupesh Kumar SAP Testing APO and SD
Confiture et concentré de tomates
Concept Map for NHK
Survey Analysis
Ad

Similar to Master's_Thesis_XuejiaoHAN (20)

PDF
Semester Project 3: Security of Power Supply
PDF
Thesis-MitchellColgan_LongTerm_PowerSystem_Planning
PDF
andershuss2015
PDF
PDF
Multidimensional optimal droop control for wind resources in dc m 2
PDF
bachelors-thesis
PDF
PDF
Efficiency Optimization of Realtime GPU Raytracing in Modeling of Car2Car Com...
PDF
Integrating IoT Sensory Inputs For Cloud Manufacturing Based Paradigm
PDF
Agathos-PHD-uoi-2016
PDF
Agathos-PHD-uoi-2016
PDF
FULLTEXT01 (1).pdf
PDF
Ee380 labmanual
PDF
Project report on Eye tracking interpretation system
PDF
mechatronics lecture notes.pdf
PDF
mechatronics lecture notes.pdf
PDF
PDF
PDF
BE Project Final Report on IVRS
PDF
Semester Project 3: Security of Power Supply
Thesis-MitchellColgan_LongTerm_PowerSystem_Planning
andershuss2015
Multidimensional optimal droop control for wind resources in dc m 2
bachelors-thesis
Efficiency Optimization of Realtime GPU Raytracing in Modeling of Car2Car Com...
Integrating IoT Sensory Inputs For Cloud Manufacturing Based Paradigm
Agathos-PHD-uoi-2016
Agathos-PHD-uoi-2016
FULLTEXT01 (1).pdf
Ee380 labmanual
Project report on Eye tracking interpretation system
mechatronics lecture notes.pdf
mechatronics lecture notes.pdf
BE Project Final Report on IVRS

Master's_Thesis_XuejiaoHAN

  • 1. Dynamic Programming Control for Smart Home Xuejiao HAN
  • 3. Institute for Data Processing Technische Universität München Master’s thesis Dynamic Programming Control for Smart Home Xuejiao HAN September 23, 2015
  • 4. Xuejiao HAN. Dynamic Programming Control for Smart Home. Master’s thesis, Technis- che Universität München, Munich, Germany, 2015. Supervised by Prof. Dr.-Ing. K. Diepold and Johannes Feldmaier / Dominik Meyer; submit- ted on September 23, 2015 to the Department of Electrical Engineering and Information Technology of the Technische Universität München. c 2015 Xuejiao HAN Institute for Data Processing, Technische Universität München, 80290 München, Germany, http://www.ldv.ei.tum.de. This work is licenced under the Creative Commons Attribution 3.0 Germany License. To view a copy of this licence, visit http://creativecommons.org/licenses/by/3.0/de/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California 94105, USA.
  • 5. Preface It brings me great pleasure to thank the people who helped to make this thesis possible. I wish to express my sincere thanks to department of electrical engineering and infor- mation at TU Munich for hosting the master thesis. I am very grateful to Prof. Klaus Diepold, who gives me this opportunity to step into the field of data processing. I also want to thank my supervisors, Dominik Meyer and Johannes Feldmaier, for their supervision, support and valuable advices in many regards in elaborating this interesting thesis. I would like to thank my parents, my friends for their love and support. A special thanks also goes to my friend Ke Wang, who spent a lot of time discussing the thesis with me and helped me a lot within this project. I cannot finish this thesis without their sacrifices and contributions. Xuejiao HAN September 23, 2015 3
  • 7. Contents 1. Introduction 11 1.1. Photovoltaic Generation in Germany . . . . . . . . . . . . . . . . . . . . . 11 1.2. Economics of Residential PV System . . . . . . . . . . . . . . . . . . . . 12 1.2.1. Feed-in Tariff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 1.2.2. Cost Calculation of a Sample Residential PV System . . . . . . . . 14 1.3. Energy Demand and Supply . . . . . . . . . . . . . . . . . . . . . . . . . 16 1.4. Energy Management . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 1.5. Structure of the Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2. Theories 21 2.1. Markov Decision Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.2. Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.3. Dynamic Programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 2.4. Approximate Dynamic Programming . . . . . . . . . . . . . . . . . . . . . 25 2.4.1. Policies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 2.4.2. Method for Approximating Functions . . . . . . . . . . . . . . . . . 26 3. Problem Statement 29 3.1. Electric System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.1.1. State of the System . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.1.2. Decisions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.1.3. Transition Functions . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.1.4. Objective Functions . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.1.5. Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.2. Boundary Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.2.1. Princeton Energy Storage Benchmark Datasets . . . . . . . . . . . 32 3.2.2. KNUBIX Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 3.2.3. Additional Boundary Conditions . . . . . . . . . . . . . . . . . . . 33 4. Learning Methodology 35 4.1. Overall Implemented Structure . . . . . . . . . . . . . . . . . . . . . . . . 35 4.2. Rule-based Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 4.2.1. Without Battery . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 4.2.2. With Battery . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.3. Simple Threshold Control . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 5
  • 8. Contents 4.4. Linear Programming Formulation . . . . . . . . . . . . . . . . . . . . . . . 40 4.4.1. Scenario A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4.4.2. Scenario B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 4.5. Dynamic Programming Formulation . . . . . . . . . . . . . . . . . . . . . 44 4.5.1. DP Formulation for Deterministic Problems . . . . . . . . . . . . . 50 4.5.2. Scenario A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.5.3. Scenario B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 4.5.4. DP Formulation for Stochastic Problems . . . . . . . . . . . . . . . 54 4.6. Approximate Dynamic Programming Formulation . . . . . . . . . . . . . . 61 4.6.1. A Linear Lookup Table Approximation . . . . . . . . . . . . . . . . 64 4.6.2. SPAR Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 5. Evaluation of the Approach 81 6. Conclusions and Future Work 85 A. Appendix 87 6
  • 9. List of Figures 1.1. Global annual solar irradiance on a horizontal surface in Germany between 1981 and 2010 [DWD] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 1.2. Munich Monthly PV Production for System Size 5.5 kW DC . . . . . . . . . 13 1.3. Munich Monthly Solar Radiation for System Size 5.5 kW DC . . . . . . . . . 14 1.4. Development of feed-in tariff for small rooftop PV systems under 10 kWp [IEA-PVPS] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 1.5. Previous development of the feed-in tariff and retail consumer tariff (Sources: IEA-PVPS, BDEW; retail electricity price: average residential tariffs for 3-person Household consuming 3,500 kWh of electricity per year) . 16 1.6. Average hourly load profile from Germany for different quarters in 2014 [ENTSO-E] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 1.7. Structure of the average hourly load curve on Weekdays . . . . . . . . . . . 18 1.8. Structure of the average hourly load curve on Weekends . . . . . . . . . . . 19 2.1. The flow of interaction in MDP . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.1. Energy flow of the electric system . . . . . . . . . . . . . . . . . . . . . . . 30 3.2. Electricity Price Tariff [SWM, 2015] . . . . . . . . . . . . . . . . . . . . . . 34 3.3. Spot market price on the European Power Exchange (EPEX) on 29.03.2015 34 4.1. Flow chart of the learning methodology . . . . . . . . . . . . . . . . . . . . 36 4.2. Profiles of power flow among PV system without battery, load and grid for rule-based algorithm for the 29th of March . . . . . . . . . . . . . . . . . . 37 4.3. Profiles of power flow among PV system with battery, load and grid for rule- based algorithm for the 29th of March . . . . . . . . . . . . . . . . . . . . . 39 4.4. SOC schedule of batteries with rule-based algorithm against PV generation 40 4.5. SOC schedule of batteries with rule-based algorithm against variable elec- tricity tariffs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.6. Profiles of power flow among PV system, load and grid for threshold algo- rithm with battery for the 29th of March . . . . . . . . . . . . . . . . . . . . 43 4.7. SOC schedule of batteries for threshold algorithm against solar generation . 44 4.8. SOC schedule of batteries for threshold algorithm against variable electricity tariffs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.9. Profiles of power flow among PV system with battery, load and grid for LP algorithm for the 29th of March under scenario A . . . . . . . . . . . . . . . 46 7
  • 10. List of Figures 4.10.SOC schedule of batteries for LP algorithm against solar generation under scenario A . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.11.SOC schedule of batteries for LP algorithm against SWM electricity tariff . . 48 4.12.Profiles of power flow among PV system, load and grid for LP algorithm with battery for the 29th of March under scenario B . . . . . . . . . . . . . . . . 50 4.13.SOC schedule of batteries for LP algorithm against solar generation under scenario B . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 4.14.SOC schedule of batteries for LP algorithm against EPEX market price . . . 52 4.15.Forward DP algorithm flowchart . . . . . . . . . . . . . . . . . . . . . . . . 53 4.16.Path search in forward DP algorithm . . . . . . . . . . . . . . . . . . . . . 54 4.17.Optimal deterministic forward DP storage algorithm against electricity price for the system in different cases under scenario A . . . . . . . . . . . . . . 55 4.18.Optimal deterministic forward DP storage algorithm against PV generation for the system in different cases under scenario A . . . . . . . . . . . . . . 56 4.19.Optimal deterministic forward DP storage algorithm power profile for the sys- tem in different cases under scenario A . . . . . . . . . . . . . . . . . . . . 57 4.20.Optimal deterministic forward DP storage algorithm against electricity price for the system in different cases under scenario B . . . . . . . . . . . . . . 58 4.21.Optimal deterministic forward DP storage algorithm against PV generation for the system in different cases under scenario B . . . . . . . . . . . . . . 59 4.22.Optimal deterministic forward DP storage algorithm power profile for the sys- tem in different cases under scenario B . . . . . . . . . . . . . . . . . . . . 60 4.23.Results of linear ADP algorithm and sample path from test problem S1 . . . 68 4.24.Approximate path obtained by linear ADP vs. optimal path . . . . . . . . . . 69 4.25.Results of piecewise linear ADP algorithm (a=1) and sample path from test problem S1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 4.26.Approximate path obtained by piecewise linear ADP (a=1) vs. optimal path . 75 4.27.Results of piecewise linear ADP algorithm (a=10) and sample path from test problem S1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 4.28.Approximate path obtained by piecewise linear ADP (a=10) vs. optimal path 77 4.29.Results of piecewise linear ADP algorithm (a=100) and sample path from test problem S1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 4.30.Approximate path obtained by piecewise linear ADP (a=10-) vs. optimal path 79 4.31.Objective values for different stepsize rule parameters . . . . . . . . . . . . 79 8
  • 11. Abstract The primary purpose of this thesis is to devise near-optimal control policies for a grid connected residential photovoltaic (PV) system with storage device. It is formulated as a dynamic multi-period energy storage optimization problem and solved using different algorithms. We begin with some simple algorithms like rule-based control, simple threshold control and linear programming, which do not consider the stochastic characteristics of solar en- ergy, energy demand and electricity price. However, linear programming (LP) does provide an optimal result for deterministic problems and serves as a benchmark to compare other algorithms. Then we implement a dynamic programming (DP) algorithm, which is formed as a recursive process and proceeds one step at a time. To solve the “curse of dimen- sionality” problem occurred in dynamic programming, we construct two approximate dy- namic programming (ADP) algorithms: one using a linear regression model and the other using a piecewise linear approximation model. Since the accuracy of approximation in ap- proximate dynamic programming is sample-based, we used the Princeton energy storage benchmark datasets to improve the policy and compared it to the optimal policy obtained from the benchmark dataset. The particularity of this thesis remains in the consideration of residential photovoltaic system in real life. Simulations were carried out over one exem- plary day, based on the data from a real residential PV system with battery. Optimization policies are developed according to an analysis of Germany’s current energy policy and have high application potentials. Computational results show that compared to the DP algorithm, ADP algorithms can achieve near-optimal performance with reasonable computational time. Comparative re- sults of all methods are provided and analyzed. 9
  • 13. 1. Introduction According to the study of Fraunhofer ISE, renewable energy as a whole (RE) has reached approximately 31% of the Germany’s gross power consumption in 2014. Long-term min- imum targets of the German government are 35 % by 2020, 50% by 2030 and 80% by 2050, and finally increase the share of renewable energy to the country’s overall electricity consumption. Among all renewable energies photovoltaic (PV) is regarded as a major part. In 2014, PV generated power totaled 35.2 TWh and covered around 6.9% of Germany’s net elec- tricity consumption while roughly 6.1% of Germany’s gross electricity consumption [BDEW, 2015]. On sunny weekdays, PV power can at times cover 35% of the momentary electricity demand, and on weekends and holidays up to 50%. A study conducted by Royal Dutch Shell, which is entitled “New Lens Scenarios”, presents that PV will grow into the most important primary energy source by 2060. PV energy is attractive regarding to economical and environmental aspects on one hand and on the other hand it reduces grid operating and transmission costs. A further advan- tage of feeding in PV is that in addition to feeding in real power, PV plants may contribute towards improving grid stability and quality. In this introduction the current situation and state policies for residential rooftop pho- tovoltaic systems are analyzed. Additionally, photovoltaic generation and consumption in Germany are discussed. 1.1. Photovoltaic Generation in Germany Figure 1.1 shows levels of irradiance across Germany. The average total horizontal irra- diance in Germany between 1981 and 2010 stands at 1,055 kWh/m2 per year and fluc- tuates according to location between approximately 951 kWh/m2 and 1,257 kWh/m2 per year [DWD]. The average daily solar insolation value1 on a flat plate PV system determined using PV Watts for Munich is about 3.42 kWh/(m2 d). Figure 1.2 shows the monthly photovoltaic production and solar radiation for Munich with a system size of 5.5 kW DC (see Appendix A for details of the PV system). Solar radiation in Munich ranges from 1 kWh/m2 per day to 6 kWh/m2 per day for dif- ferent seasons, while PV production varies between 129 kWh to 768 kWh per month. 1 It refers to the solar insolation which a particular location would receive if the sun were shining at its maxi- mum value for a certain number of hours. Since the peak solar radiation is 1 kW/m2, the number of peak sun hours is numerically identical to the average daily solar insolation. 11
  • 14. 1. Introduction Figure 1.1.: Global annual solar irradiance on a horizontal surface in Germany between 1981 and 2010 [DWD] Calculations show that the yearly PV production for the sample size PV system in Munich is 5,512 kWh, while an average 4-person household consumes about 5,009 kWh electricity per year. It means that the electricity production of the sample size PV system would be sufficient to supply the equivalence of a 4-person-family’s annual electricity needs. 1.2. Economics of Residential PV System In recent years, the decrease in investment and electricity generation costs makes PV systems continually attractive. An analysis published by BSW-Solar, the German Solar Industry Association, demonstrates that system prices have reduced by more than 50% in the last few years and the average price for PV rooftop systems of less than 10 kW arrived 12
  • 15. 1.2. Economics of Residential PV System 0 100 200 300 400 500 600 700 800 900 1 2 3 4 5 6 7 8 9 10 11 12 PVProductioninkWh Month Figure 1.2.: Munich Monthly PV Production for System Size 5.5 kW DC at around 1,640 EUR/kWh in 2014. Furthermore, the Levelized Cost of Energy (LCOE)2 for a small rooftop PV system in Germany is around 0.16 EUR/kWh whereas the electricity price for a private household is around 0.25 EUR/kWh. Moreover, PV energy should be regarded as an economical choice due to their negligible marginal costs. 1.2.1. Feed-in Tariff To encourage the development of renewable energy techniques, The German Renewable Energy Sources Act (German: Erneuerbare-Energien-Gesetz, EEG) has been introduced and came into force in the year of 2000. The EEG accelerated the German energy tran- sition from fossil and atomic energy to green energy significantly. Figure 1.4 shows the development of the feed-in tariff (FiT) for small rooftop systems (< 10 kW) since 2001. All rates are guaranteed for an operation period of 20 years, independent of the start-up date. Fifteen years have already passed since Germany introduced the feed-in tariff (FiT) system in 2000. For a long time, feed-in tariffs were significantly higher than the average residential electricity tariff (see Figure 1.5), which resulted in a period of energy supply from 2001 to 2011. During this period of time, private rooftop PV system owners preferred selling electricity to the power grid to consuming the electricity themselves, for they could buy electricity from the grid at a lower price and benefit from feed-in tariff policy. In recent years, the EEG feed-in tariff for PV energy has reduced dramatically, while electricity price has taken a strongly opposing trend. Since the beginning of 2012, newly installed, small rooftop installations (<10 kW) have achieved grid parity 3 . 2 represents the per-kilowatthour cost of building and operating a generating plant over an assumed financial life and duty cycle and is often cited as a convenient summary measure of the overall competitiveness of different generating technologies. 3 Grid parity occurs when an alternative energy source can generate electricity at a levelized cost of energy that is less than or equal to the price of buying electricity from the power grid. 13
  • 16. 1. Introduction 0 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8 9 10 11 12 SolarRadiationin(kWh/m^2/day) Month Figure 1.3.: Munich Monthly Solar Radiation for System Size 5.5 kW DC From this intersection point (between 2001 and 2002) onwards, self-consumption has become the most attractive and profitable business model for every new PV system owner. In 2015, feed-in tariff for one kilowatt-hour (kWh) electricity from PV has decreased to 12.56 Eurocents, while one kilowatt-hour from grid costs 28.81 Eurocents, more than twice of the feed-in price. The increasing gap between retail electricity price and feed-in tariff encourages the PV owners to maximize their self-consumption rate. In order to increase self-supply ratio, a smaller-sized PV system is expected to be an economical option for the future according to the inverse relationship between PV system size and its self-consumption rate. Besides reducing the system size, residential battery system (RBS), which allows a load shift between electricity peak and off-peak hours, could be regarded as a solution to realize a higher self-consumption rate. This kind of energy management is also able to cope with hourly, daily and seasonal fluctuations in PV power generation. The residential battery system will be further discussed in Section 1.4. 1.2.2. Cost Calculation of a Sample Residential PV System Depending on irradiance and performance ratio (PR), specific yields of around 900-950 kWh/kWp are typically generated in Germany and in sunnier regions up to 1,000 kWh/kWp. To satisfy the electricity consumption of a 4-person household, about 20 typical “150 watt” PV modules are required, which corresponds to 20 square meters of panels. According to the aforementioned information, the sample size PV system for a 4-person household in Munich is chosen to be 5.5 kW DC. A simplified cost calculation for this sample residential PV system is as follows. The investment time is assumed to be 20 years. To simplify the calculation process, 14
  • 17. 1.2. Economics of Residential PV System 0 10 20 30 40 50 60 70 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 Eurocents/kWh Year Figure 1.4.: Development of feed-in tariff for small rooftop PV systems under 10 kWp [IEA-PVPS] electricity price and feed-in tariff are assumed to be invariable, and the energy demand is covered by PV generation. • Installation cost: EUR 18,467 PV system (Invert efficiency 96%): 5.73 kWp x EUR 1,600 per kWp= EUR 9,167 Battery system with 5.5 kWh (KNUT Basix): EUR 9,300 • Average energy consumption: 5,009 kWh per year • Average energy generation (PWatt): 5,512 kWh per year • Self-consumption (without consideration of the 60% feed-in constraint): 5,009 kWh x 0.35 EUR/kWh x 20 Jahre = EUR 35,063 • Feed-in (5,512 kWh-5,009 kWh = 503 kWh): 503 kWh x 0.12 EUR/kWh x 20 Jahre = EUR 1,207 • Profit: EUR 35,063 + EUR 1,207-EUR 18,467 = EUR 17,803 However, in real situations the hourly, daily, weekly and seasonal fluctuations in PV power generation cannot be ignored. It means that the 100% utilization rate of PV genera- tion is only an ideal level, which cannot be achieved in real situations. To ensure a higher utilization rate of PV energy and a higher self-consumption rate, we need to manage the demand via signals from PV system, battery system and power grid, as well as electricity price and feed-in tariff signals from the energy market. 15
  • 18. 1. Introduction 0 10 20 30 40 50 60 70 2005 2007 2009 2011 2013 2015 Euroct/kWh Year retail electricity price feed-in tariff Grid-parity Period Self-consumption Period Feed-in Period Figure 1.5.: Previous development of the feed-in tariff and retail consumer tariff (Sources: IEA- PVPS, BDEW; retail electricity price: average residential tariffs for 3-person Household consuming 3,500 kWh of electricity per year) 1.3. Energy Demand and Supply Knowledge of household electricity consumptions is essential for the development of smart grid integration strategies. Most of the available data focus on aggregated results like total electricity demand or yearly residential electricity consumption. However, when managing a smart home with a photovoltaic (PV) system and a storage device, it is important to obtain detailed information. Residential electricity consumer data are well protected in Europe due to privacy con- cerns. Only a few companies monitor electricity consumptions at residential level and they are not keen on sharing these load profiles. Figure 1.6 describes the average hourly load profile from Germany for each quarter in 2014. Figure 1.7 and Figure 1.8 present the results of the average load curve per household on weekdays and weekends respectively. The data originate from a household electricity usage report from Intertek, based on a survey of 251 households in England that was undertaken to monitor the electrical power demand and energy consumption during the period from May 2010 to July 2011. Compared to Figure 1.6, it can be seen that the overall pattern of hourly residential load curve and hourly country load curve are alike. These figures present that the electricity consumption on weekends is obviously higher than that on weekdays. The use of washing machine is supposed to be mainly responsible for this difference. Load peak for weekdays and weekends both occur on late afternoon around 6 pm. 16
  • 19. 1.4. Energy Management 0 10000 20000 30000 40000 50000 60000 70000 80000 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 Power(MW) Hours 1st Quarter 2nd Quarter 3rd Quarter 4th Quarter Figure 1.6.: Average hourly load profile from Germany for different quarters in 2014 [ENTSO-E] 1.4. Energy Management One of the most important fields of research and development (R&D) in German PV in- dustry is economical operation of grid-connected and off-grid PV system solutions includ- ing energy management and a storage system. Storage devices at home allow the shift of the consumption of PV power, which reduces peak demand and also increases self- consumption at the same time. A study from Fraunhofer ISE indicates that a grid-optimized PV/battery operation reduces the feed-in peak of all systems by about 40% (1). Potential values of a storage device in managing intermittency of renewable-source electricity is discussed in (2). Therefore, researches and efforts today are concentrated on efficient energy storage systems (ESS). Much of the recent research seems to be focused on weather prediction and household electricity consumption estimation besides energy flow optimization (3), (4). Different methods to predict electricity consumption are summarized and compared in (5). In this thesis only optimized control of energy storage and flow will be discussed. 1.5. Structure of the Work The structure of the thesis is the following. Chapter 2 presents a literature overview on var- ious optimization algorithms, focusing on those who are appropriate for energy manage- ment problems. As one representative the approximate dynamic programming algorithm is introduced, along with its characteristics and realization methods. Afterwards Chapter 3 specifies the analysis and mathematical modelling of a storage problem and provides a brief description of the boundary conditions for the proposed system. In Chapter 4 poten- tial solutions to the problem are discussed. An overview of the implementations are given, using the datasets provided by Princeton and the data supplied by the manufacturer “KNU- 17
  • 20. 1. Introduction Figure 1.7.: Structure of the average hourly load curve on Weekdays BIX”. Thereafter, validation and qualities of different algorithms are evaluated in Chapter 5. Chapter 6 summarizes the main conclusions of the thesis and proposes topics for future work. The Appendix contains additional model information. 18
  • 21. 1.5. Structure of the Work Figure 1.8.: Structure of the average hourly load curve on Weekends 19
  • 23. 2. Theories In this chapter the theories used in this work are discussed. The first section comprises the features of a Markov decision process. The second part introduces the Linear Pro- gramming method, which is a simple static solution based on simplification of the problem. The last part begins with a brief introduction to Dynamic Programming and then details the key components and basic concepts of Approximate Dynamic Programming. 2.1. Markov Decision Process Markov property, which is named after the Russian mathematician Andrey Markov, de- scribes a memoryless property of a stochastic process: given the present state s and the action a, the value of the next state s is independent of all previous states and ac- tions. Markov Decision Process (MDP) is a discrete time stochastic control process, which satisfies the Markov property and provides an algorithm for making optimal decisions. A MDP model contains: • a finite set of states S, • a finite set of actions A, • a reward function R(s, a), • a state transition probability matrix Pa(s, s ), which describes the probability from state s at time t under action a to next stage state s . For a discounted Markov decision process another key ingredient is the discount factor γ ∈ [0, 1], which represents the effect of the future rewards on the present decision. In a MDP, the probabilistic sequential model can be described as follows (6). At each discrete time step, the decision maker measures the current state of the environment and executes an action according to the current policy. As a result, the process transfers into a next state s with a certain probability and an immediate reward is received by the de- cision maker after the transition from the current state s to the next state s . The rewards and transition probabilities are functions of the current state s and the action a. In real situations, the reward can be profits in asset acquisition problems, cost of time or length of path in transportation problems. It can also be a function, taking several factors with different weights into consideration. A sequence of rewards will be received at the end of the simulation time. The goal of the algorithm is to maximize the cumulative reward at the end of the whole simulation period. This process is represented in Figure 2.1. 21
  • 24. 2. Theories state   reward   action   Agent Environment Figure 2.1.: The flow of interaction in MDP According to the different types of transition process, MDP problems can be divided into two classes: deterministic problem and stochastic problem. In deterministic problems, the next state s is determined when given the present state s and the action a, while the next state in stochastic MDP is unstable even though the current state and the action are known. The one-step transition probability Pa ss , which can be described as Pa ss = Pr{st+1 = s |st = s, at = a}, (2.1) and the expect value of the next reward Ra ss , which is Ra ss = E{rt+1|st = s, at = a, st+1 = s }, (2.2) completely specify the dynamics of a finite MDP (7). MDP is very useful in dynamic pro- gramming, since the accumulated rewards are only assumed to be the function of the current state. MDPs can be solved via linear programming or dynamic programming. 2.2. Linear Programming Linear programming (LP) is an approach used to optimize a linear objective function, sub- ject to constraints and bounds. The objective function, which is to be maximized or mini- mized, is formed as a linear combination of a series of decision variables x = {x1, x2...xn}: f = c1x1 + c2x2 + ... + cnxn. Constraints can be divided into linear equality and linear inequality constraints. The sim- plest and most popular constraint is the requirement that all decision variables be non- negative. Thus, to determine the optimal decision vector x, LP problems can be expressed 22
  • 25. 2.2. Linear Programming in a standard form: maximize c1x1 + c2x2 + ... + cnxn subject to a11x1 + a12x2 + ... + a1nxn ≤ b1. a21x1 + a22x2 + ... + a2nxn ≤ b2, ... am1x1 + am2x2 + ... + amnxn ≤ bm, and x1, x2, ..., xn ≥ 0, where a11, ..., amn, b1, ..., bm, c1, ...cn are constant variables. A proposal of specific values for the decision variables is called a solution (8). The solu- tion for a LP problem is feasible if it satisfies all constraints. Among all feasible solutions, the one which obtains the maximum or minimum objective is called the optimal solution. On the contrary, a solution is infeasible if it contradicts any of the constraints. There is another situation called unbounded, in which case the optimal objective value is infinitely large. The LP optimization algorithm is applied essentially for one-stage problem, while an- other important field of application of LP is multi-stage optimization. A multi-stage linear programming may be referred to as a dynamic model and can be formulated as linear problems with dynamic matrix. For deterministic problems, sum of the sub-problems can be regarded as a new large scale linear programming problem, and the effectiveness of this method is based on an accurate estimation and prediction of the energy price, de- mand and the amount of the exogenous energy information in the future (9). The principle of this method is simple and it is easy to develop the algorithm for a multi-stage opti- mization problem. However, the drawback of this algorithm is obvious: when computing large-scale problems with many more periods, there would have numerous equivalent and in-equivalent constraints with numerous parameters. In this case, extraordinarily high com- putational cost and time cost will be produced, which could make the problem intractable. For stochastic problems, in which case there is an unpredictable disturbance in the system, the problem needs to be solved over all possibilities and this results in high computational cost. The elements in constraint matrices will be a function of several parameters and vary stochastically from time to time. Stochastic linear programs were first introduced by (10). The problem we have considered here is a multi-period planning problem, which means the current decision cannot be decoupled from the decisions in future periods. Take the electricity market as an example, if we produce more electricity than needed, the extra production might be stored and be used in the next period, which resulted in holding cost of the storage device and energy savings in the future. There are different methods to deal with the linear optimization problem. The most com- mon algorithms used in linear programming are simplex method and interior point method. Linear programming problems consist of continuous problems and discrete problems. To solve the discrete problem, mixed-integer linear programming (MILP) has to be used. 23
2.3. Dynamic Programming

The term Dynamic Programming (DP) refers to a collection of algorithms that can be used to find an optimal policy maximizing the cumulative return when a model is given. DP was first developed by Richard Bellman for solving multi-stage stochastic decision processes. Most early DP problems were formulated as calculus-of-variations problems and used a backward induction process to search for the optimal decisions; the application of DP to the control of deterministic processes was not anticipated at the time (11).

There is a close relationship between dynamic programming and reinforcement learning. DP algorithms are used to solve the optimization problem when a system model is available, while Reinforcement Learning (RL) algorithms are model-free and mainly focus on learning from the interactions between agents and environments. However, both DP problems and RL problems can be formulated as a Markov decision process (MDP) (12).

The principle of DP is to break a complex problem into a collection of simpler subproblems and to store the results of the subproblems to avoid computing the same subproblem again. The overall optimal solution is the combination of the solutions of these subproblems. For a complex problem it is important to define the subproblems well; sometimes these smaller subproblems are not obvious. After the division into a sequence of subproblems, the key of DP is the use of a value function to search for a good policy, which will be described in the following sections.

All dynamic programming problems can be written recursively, relating the value of the current state at a particular point in time to the value of the state we transfer to at the next time point. To do so, we define a value function V_t(S_t), which represents the value of being in state S_t at time t. Compared to the reward C_t(S_t, x_t), which evaluates the result of an action in an immediate sense, values can be considered as cumulative rewards in the long run. The basic idea of the recursive form is to take the effect of future states into consideration. This equation is known as the dynamic programming equation or Bellman equation and can be written as

V_t(S_t) = \max_{x_t} \Big( C_t(S_t, x_t) + \sum_{S_{t+1} \in \Omega} P(S_{t+1} \mid S_t) \, V_{t+1}(S_{t+1}) \Big),   (2.3)

where P(S_{t+1} | S_t) describes the probability of transitioning from state S_t to the next state S_{t+1} at time t and reflects the uncertainty in stochastic problems, and Ω is the set of possible next states. For deterministic problems, P(S_{t+1} | S_t) is either 1 or 0.

Normally, we need to discretize the state variable of the optimization problem and derive the optimal decision (policy) at each time point t using backward recursion. For deterministic problems it is not difficult to use backward recursion to solve the equation. If both the current cost C_t and the value V_{t+1}(S_{t+1}) of the next state at the next time point are known and can be written as functions of the current state S_t, we can solve the problem by differentiating the Bellman equation with respect to the state variable and setting the derivative to zero (assuming that we are maximizing a continuously differentiable, concave function).
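To make the recursion concrete, the following is a compact sketch of equation (2.3) for a small artificial finite-horizon MDP; all sizes, costs and transition probabilities are random placeholders, not part of the model used in this thesis.

import numpy as np

rng = np.random.default_rng(0)
T, S, A = 4, 3, 2                            # horizon, number of states, actions
C = rng.normal(size=(T, S, A))               # immediate contribution C_t(s, a)
P = rng.dirichlet(np.ones(S), size=(A, S))   # P[a, s, :] = transition distribution

V = np.zeros((T + 1, S))                     # terminal condition V_{T+1} = 0
for t in range(T - 1, -1, -1):               # backward recursion over time
    Q = C[t] + (P @ V[t + 1]).T              # Q[s, a] = C + sum_s' P[a,s,s'] V[t+1,s']
    V[t] = Q.max(axis=1)                     # maximize over the decisions
print(V[0])                                  # value of each state at t = 0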
Given the initial state of the problem and the calculated optimal decisions, we can easily proceed state by state until the end of the simulation time.

The weakness of DP is its high memory requirement, especially for long periods or a high time resolution. Moreover, for almost all DP problems the state is not one-dimensional but a vector. For example, if we have n possible stocks to deal with and each share has m possible prices, we already have m^n different states. In some cases we even have multi-dimensional decisions to make. This problem limits the application of DP algorithms and is known as the "curse of dimensionality", which describes the explosion of the state-space size with the growing number of dimensions (see (13), Chapter 5). However, despite the high physical storage cost, the DP algorithm has a low computational cost, since it stores previous values and avoids repeated recomputation. The principle of breaking the complex problem down into a sequence of much simpler subproblems provides a deeper insight into the nature of the problem and makes it simple to build the algorithm.

2.4. Approximate Dynamic Programming

Because of the requirement to compute and store the value of every discrete state, large-scale dynamic programming problems often become intractable. Potential solutions are provided by approximate dynamic programming (14), (13), which substitutes a statistical approximation for the exact value function. In exact dynamic programming we generally step backward in time, compute the exact value function, use these values to produce the optimal decisions, and then move one stage further back until the starting point is reached. When we instead step forward in time, we need to make "approximate" decisions based on an approximation of the value function. An appropriate approximation of the value function is regarded as the key to solving an ADP problem. The essence of ADP is to replace the true value function with a statistical approximation, which is much easier to calculate and can be updated through iterations (15).

2.4.1. Policies

A policy can be regarded as a rule that determines a decision given the state of the system. There is a range of policies in different forms that deal with dynamic programming. In (13) the policies are grouped into four broad categories:

Myopic policies This can be seen as the most elementary form of policy, taking no account of the effect of current decisions on the future. The value function in the Bellman equation is assumed to be zero. The principle of the most basic form of myopic policies is simply to choose the decision that maximizes the contribution in an immediate sense, which is given by:
A(S_t) = \arg\max_{x} C(S_t, x).

Policy function approximations With policy function approximations, a policy or a decision is derived from the state without using the forecasts directly. We might, for example, introduce a threshold price into our energy system; the approximation could then be a simple rule such as storing energy in the battery when prices are lowest during the day and releasing energy when prices are highest.

Value function approximations Compared to policy function approximations, value function approximations return an estimated value of being in a given state rather than of a state-action pair (the latter is the fundamental element of Q-learning). Since the complicated value function can be replaced with an approximation of some form, this policy class is considered the most effective approach for solving dynamic programming problems. A description of the strategies used to approximate value functions is given in the next section.

Lookahead policies This is a method for optimizing the decision now based on future information over some horizon. The time horizon depends on the algorithm, and exogenous information is also taken into account to make a better approximation. The rolling horizon approximation is one of the most popular lookahead policies.

2.4.2. Methods for Approximating Functions

The main idea of approximate dynamic programming is to approximate a value function for making decisions. This section addresses the three most popular ways to approximate a value function.

Lookup tables and aggregation Lookup tables can only be applied to discrete state variables. A lookup table returns an approximation of the value function for a given state s. It is simple and effective, but sometimes it is not easy to initialize and improve the values in the table. The most serious disadvantage of this method is its high memory requirement.

In order to mitigate the "curse of dimensionality" in the application of DP, we might aggregate the original state to lower the resolution of the state variable and decrease the dimension of the state space. This approach results in a simpler lookup table and a significant reduction in computational cost. The aggregation can be done by simply ignoring a dimension, discretizing it, aggregating a classification, or by any other way that reduces the complexity of the problem. For example, in an electricity management problem we might aggregate the state space by discretizing the time from minutes to hours.
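As a one-line illustration of the time aggregation just mentioned, a day of 5-minute samples can be collapsed into hourly averages; the signal here is a placeholder.

import numpy as np

day_5min = np.sin(np.linspace(0.0, 2.0 * np.pi, 288))  # placeholder signal, 5-min steps
hourly = day_5min.reshape(24, 12).mean(axis=1)         # 12 samples aggregated per hour
print(hourly.shape)                                    # (24,)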
Aggregation is only used for the approximation of the value function; in the transition function we still use the original, disaggregated state variable.

Parametric models The most essential part of this method is to find a sequence of suitable basis functions and to optimize the parameter vector. This can be seen as a process of determining the most important features of the problem, and the corresponding weight of each feature, from sample realizations. Normally we find the parameter vector with a regression model by minimizing the mean squared error between sample observations and our predictions. The quality of the results depends primarily on the design of the basis functions.

Nonparametric models The effectiveness of the parametric approach depends on an appropriate mathematical model. However, some problems might not correspond to any specific parametric model; the parametric model then acts as a restriction and can cause large errors between the approximations and the real observations. The fundamental purpose of a nonparametric model is to obtain a well-built local approximation without being limited to a specific functional model.
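To make the parametric approach above concrete, the following hedged sketch fits a quadratic basis-function model of a value function by least squares; the basis choice and the synthetic samples are assumptions made purely for illustration.

import numpy as np

rng = np.random.default_rng(1)
R = rng.uniform(0.0, 11.0, size=200)                  # sampled storage levels (kWh)
v_obs = 5.0 + 2.0 * R - 0.15 * R**2 + rng.normal(0.0, 0.5, size=200)  # noisy samples

Phi = np.column_stack([np.ones_like(R), R, R**2])     # basis functions (1, R, R^2)
theta, *_ = np.linalg.lstsq(Phi, v_obs, rcond=None)   # minimize mean squared error

def V_approx(r):
    return theta @ np.array([1.0, r, r**2])           # approximate value at level r

print(theta, V_approx(5.5))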
3. Problem Statement

The goal of the electricity management optimization is to maximize the storage operation revenue and to minimize the input power from the grid under a given energy price tariff. The fundamental element we need to achieve this target is a well-built mathematical model. In this chapter we first provide a mathematical description of the electric system; then the boundary conditions used in the simulations are presented.

3.1. Electric System Model

We consider the problem of managing the power flow among a solar element, a storage device, the grid and the consumer, while minimizing the energy expenses. The problem is described in more detail in (16). Our system has three parts concerning electricity:

• Local generation. The house is equipped with a solar system. On one hand, electricity may flow directly from the solar panel to the storage device, or it may be used to satisfy the demand. On the other hand, excess energy may also be sold to the power grid at the spot price, which is likewise realized in our model.

• Consumption. The residential demand for electricity in our model can be satisfied by power from the grid, from the local generator or from the storage device.

• Storage device. To deal with the intermittency of renewable sources and the fluctuations of their output, a storage device is considered an appropriate solution. When the electricity price in the energy market is low or the generation of the solar system exceeds the demand, the surplus can be stored in the storage device for later consumption. During periods when the price in the energy market is high or the load is greater than the generation, we can use the energy in the storage device to decrease the energy costs.

Figure 3.1 shows the electric system consisting of energy storage, local generation (here, a solar system), electric load and power grid. Arrows indicate the power flows among them. The green arrow marks the flow from which the householder can make profits; the remaining power flows are represented with red arrows.
Figure 3.1.: Energy flow of the electric system (power grid, photovoltaic system, storage device R_t and demand, connected by the flows x_t^{GD}, x_t^{SD}, x_t^{RD}, x_t^{SR}, x_t^{RG}, x_t^{GR})

3.1.1. State of the System

According to Warren B. Powell, a state variable is the minimally dimensioned function of history that is necessary and sufficient to compute the decision function, the transition function, and the contribution function. In this storage optimization problem, the state of the system corresponds to the storage level of the battery R_t, which indicates the amount of energy in the storage device at time t. The current level of solar energy generation E_t, the electricity demand D_t and the current price of electricity P_t are regarded as known information.

3.1.2. Decisions

For each state R_t at time t, we must decide how much electricity to consume and how much to store in the battery in order to obtain the optimal result over the whole simulation time. The decision can be written as follows:

x_t^T = (x_t^{SD}, x_t^{GD}, x_t^{RD}, x_t^{SR}, x_t^{GR}, x_t^{RG}),

where x_t^{IJ} is the amount of energy transferred from I to J at time t, with solar, demand, storage and grid denoted by S, D, R and G, respectively. Based on the energy estimation and the information from the energy market, the main decision we have to make to optimize the energy consumption is to determine the charge or discharge operation of the storage device at time t.
3.1.3. Transition Functions

Since the problem discussed in this thesis is a Markov decision process, the next state of the process depends entirely on the current state and the current decision taken. We can define a transition function such that, given the current state S_t, the subsequent state S_{t+1} of the process is given by:

S_{t+1} = S^M(S_t, x_t, W_{t+1}),

where x_t is the decision taken and W_{t+1} is the new exogenous information that arrives between time t and t + 1, such as a change of the electricity price or an abrupt battery leakage. The n-th sample realization of W_t is denoted W_t^n = ω^n with sample path ω^n ∈ Ω. In approximate dynamic programming, K different sample paths {ω^1, ..., ω^K} are simulated for each test problem to iteratively improve the statistical estimation of the value function.

The transition function for the energy in storage is:

R_{t+1} = R_t + Φ^T x_t,

where Φ^T = (0, 0, -η_d, η_c, η_c, -η_d) is a vector that models the flow of energy into and out of the storage device, with η_c and η_d denoting the charging and discharging efficiency of the storage device, respectively. The energy change in the storage device is described by

Φ^T x_t = η_c (x_t^{SR} + x_t^{GR}) - η_d (x_t^{RD} + x_t^{RG});

when Φ^T x_t is positive, energy flows into the storage device at time t; when it is negative, the battery discharges at time t.

3.1.4. Objective Functions

The cost function is composed of the benefits from selling the excess energy, the cost of the electricity purchased from the grid, and the holding cost of the storage device. The cost function is expressed as follows:

C(S_t, x_t) = P_t η_d x_t^{RG} - P_t (x_t^{GR} + x_t^{GD}) - c_h (R_t + Φ^T x_t),

where P_t η_d x_t^{RG} indicates the profit from selling energy to the grid, P_t (x_t^{GR} + x_t^{GD}) is the cost of buying energy from the grid, c_h (R_t + Φ^T x_t) describes the holding cost of the storage device, and c_h is a constant. The objective is to maximize the accumulated cost function over the entire simulation time:

F = \max \sum_{t=1}^{T} C(S_t, x_t).   (3.1)
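The transition and cost functions above translate directly into code. In the following sketch the efficiencies and the holding cost are assumed values rather than the thesis parameters, and the decision vector is ordered (SD, GD, RD, SR, GR, RG) as defined in Section 3.1.2.

import numpy as np

eta_c, eta_d, c_h = 0.95, 0.95, 0.001          # assumed efficiencies and holding cost
Phi = np.array([0.0, 0.0, -eta_d, eta_c, eta_c, -eta_d])

def step(R_t, x_t, P_t):
    """Return (next storage level, one-step contribution) for decision x_t."""
    x_SD, x_GD, x_RD, x_SR, x_GR, x_RG = x_t
    R_next = R_t + Phi @ x_t                   # transition R_{t+1} = R_t + Phi^T x_t
    C = P_t * eta_d * x_RG - P_t * (x_GR + x_GD) - c_h * R_next
    return R_next, C

x = np.array([0.5, 0.0, 0.0, 1.2, 0.3, 0.0])   # charge 1.5 kWh from solar and grid
print(step(R_t=4.0, x_t=x, P_t=0.28))          # (5.425, ...)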
3.1.5. Constraints

The following constraints apply to our model. They can be divided into three parts; all variables are required to be non-negative for all t.

Energy storage level Denote by R^c the storage capacity, by γ_c the maximum charging rate and by γ_d the maximum discharging rate. The energy supplied by the storage device is limited by the amount of energy currently available in the storage device and by the maximum discharging rate:

x_t^{RD} + x_t^{RG} \le R_t,   (3.2)
x_t^{RD} + x_t^{RG} \le γ_d.   (3.3)

The energy charged into the storage device is limited by the storage capacity and by the maximum charging rate:

x_t^{SR} + x_t^{GR} \le R^c - R_t,   (3.4)
x_t^{SR} + x_t^{GR} \le γ_c.   (3.5)

Demand level Denote by η_c the efficiency of the charging process, with 0 < η_c < 1, and by η_d the efficiency of the discharging process, with 0 < η_d < 1. The demand at time t must be satisfied and cannot be shifted to a later time point:

x_t^{SD} + η_d x_t^{RD} + x_t^{GD} = D_t.   (3.6)

Local generation level The energy supplied by the local generation cannot exceed the total generation of the solar system at time t:

x_t^{SR} + x_t^{SD} \le E_t.   (3.7)

3.2. Boundary Conditions

For the simulation, data of a sample day from a real residential photovoltaic system with battery, provided by KNUBIX GmbH, and the Princeton energy storage benchmark datasets are used.

3.2.1. Princeton Energy Storage Benchmark Datasets

The Princeton energy storage benchmark datasets are a series of finite-horizon problems that consist of four components: renewable energy generator, load, storage device and power grid. All variables are presented as unitless and can be set by the users based on an appropriate understanding of the electric system.
Wind data The wind is modeled using a first-order Markov chain.

Demand data Demand is assumed to be deterministic and given by

D_t = \max\big(0,\; 3 - 4 \sin(2\pi t / T)\big).

Price process Two different stochastic price processes are tested in the Princeton datasets: a sinusoidal process and a first-order Markov chain.

3.2.2. KNUBIX Dataset

Data & demand data To simulate a realistic operation, the real recorded data of a five-person household with a 9.36 kWp PV system and an 11 kWh battery system on 29.03.2015 were used. The household and electricity system profile originated from a KNUT 3.3 Intelligence system user in Waldburg. The original one-day data of this five-person family was simulated at the original resolution of 5 minutes.

Battery The focus of this work is on energy management; the storage system is therefore considered as a black box and its chemical background is neglected. Only its electrical characteristics, like capacity or maximum charging and discharging rate, are taken into account. A lithium iron phosphate battery with a capacity of 11 kWh is used by the KNUT system.

Tariffs A variable tariff is a popular method to encourage customers to shift their power consumption. This kind of electricity tariff policy has come into force in many countries, such as China and Canada. Two kinds of electricity price models are used in this research. Figure 3.2 presents the first variable tariff model, which applies 0.2942 EUR/kWh during the daytime from 6 a.m. to 10 p.m. and 0.2412 EUR/kWh during the night [SWM Profi, 2015]. The second tariff model is shown in Figure 3.3 and tracks the changes of the spot market price on the European Power Exchange (EPEX) on the sample day (29.03.2015).

3.2.3. Additional Boundary Conditions

Simulation The simulations were carried out in MathWorks MATLAB R2014b for a time span of one day with a resolution of 5 minutes. The battery was modeled as a black box with maximum capacity and efficiencies taken from the literature.
Figure 3.2.: Electricity price tariff [SWM, 2015] (price in Eurocents/kWh over the hours of the day)

Figure 3.3.: Spot market price on the European Power Exchange (EPEX) on 29.03.2015 (price in Eurocents/kWh over the hours of the day)
4. Learning Methodology

The methodology applied to achieve the optimal results is chosen according to the nature of the problem. The problem discussed in this thesis is a finite-time optimization problem with constraints. The primary objective of the control algorithm is to satisfy the household's electricity demand at minimum cost. Environmental factors and effects on power grid stability are not considered here. A rule-based algorithm is a fundamental static control algorithm, which is based on human intelligence and can easily be applied for online control (17), (18), (19). Besides the rule-based control strategy, two other popular global optimization algorithms have been developed: linear programming (20), (21) and dynamic programming (22), (23). Linear programming, which provides a globally optimal result, will be used as a reference against which the different optimization algorithms are compared. Although the flexibility of dynamic programming allows it to be applied to many kinds of optimization problems, it suffers from the "curse of dimensionality" and is of limited applicability to large, complicated problems. Thus, approximate dynamic programming is introduced to make optimal decisions by approximating the value function, in order to reduce the computational and time costs (24), (25), (26). The methodologies based on the above-mentioned algorithms are designed in this chapter. Their results are compared and analysed, using the results of linear programming as a benchmark.

4.1. Overall Implemented Structure

Figure 4.1 shows the structure of the implemented methodology. The prediction of the renewable energy generation (here we take PV energy as an example) is not realized in this thesis. The agent collects and aggregates the data on demand, PV generation and electricity price. An optimization algorithm is chosen among the rule-based control algorithm, the simple threshold control algorithm, linear programming, dynamic programming and approximate dynamic programming, based on the complexity and characteristics of the system. According to the chosen optimizer, the optimal decision, which determines the optimal power flow among the different parts of the electrical system, is found and applied to maximize the agent's expected discounted reward, taking the effects of future information and profits into account. The basic principle of this smart agent is to minimize the energy costs of the household by shifting the load and optimizing the power flow. Different optimization algorithms are developed in order to maintain a better balance among the PV generation curve, the demand curve and the curve of the spot market price.
Figure 4.1.: Flow chart of the learning methodology (input: demand, PV generation, electricity price and PV system data; data processing: aggregation; optimization: rule-based, LP, DP or ADP; output: power flow and storage level)

An important part here is the storage device, which serves as a load shifter. The battery can buy and sell energy, depending on the evolution of the spot market price and the solar energy generation. The details of the electric system have already been given in Chapter 3. The simulation has been carried out over 24 hours, with the 29th of March as an exemplary day, to obtain the optimization results and to compare the different optimization algorithms.

4.2. Rule-based Control

A rule-based management control is simple and built from experience and heuristic knowledge. We design the optimization policy according to the nature of the problem and our objective. Though the optimality of this algorithm is not guaranteed, it can serve as a baseline against which the other algorithms are compared. (27) presents a simple rule-based power management mechanism for grid-connected PV systems with storage and compares its results with an optimization using dynamic programming.

4.2.1. Without Battery

According to the analysis of the feed-in tariffs and the electricity retail price evolution presented in Chapter 1, a rule-based control algorithm has been developed.
The goal of the algorithm is to increase the self-consumption rate, i.e., the share of the PV energy that is not fed into the grid but consumed locally (the difference between PV generation and grid feed-in, relative to the PV generation).

The photovoltaic power is first used to cover the electricity demand. When the PV generation is higher than the load, the excess energy is fed into the power grid. If the generated solar energy is lower than the energy demand, electricity is bought from the grid to cover the remaining consumption.

Figure 4.2.: Profiles of power flow among PV system without battery, load and grid for the rule-based algorithm for the 29th of March

Figure 4.2 depicts the profiles of the power flow among the PV system without battery, the load and the grid for the rule-based algorithm on the 29th of March. As there is no battery integrated in the system, the feed-in power equals the difference between PV generation and electricity demand. It is obvious that the self-consumption rate in this case is low.

4.2.2. With Battery

The difference between the rule-based algorithm with battery and the aforementioned algorithm without battery is the integration of a storage device into the system.

The photovoltaic power is first used to cover the electricity demand. When the PV generation is higher than the load, the battery is charged until its maximum capacity is reached. If the battery is full, the excess energy is fed into the power grid. This control policy is not influenced by the electricity price signal, because the feed-in tariff for one kWh from PV (12.56 Eurocents in 2015) is much lower than the retail electricity price (28.81 Eurocents in 2015).
If the generated solar energy is lower than the energy demand, especially during the night hours, the energy in the storage device is used to fulfill the household's electricity consumption. The rule-based algorithm is described in Algorithm 1 and sketched in code below. Here we introduce a term "flag" as an indication of the difference between demand and PV generation over the whole day (demand and solar generation are assumed to be known).

Algorithm 1 Rule-based Algorithm
1: E_t ← solar energy generation at time t
2: D_t ← demand at time t
3: P_t ← electricity price at time t
4: P_0 ← threshold price
5: if E_t > D_t then
6:     solar energy alone covers all demand
7:     if P_t > P_0 then
8:         the rest of the solar energy is sold to the grid
9:     else
10:        the rest of the solar energy is stored in the battery
11: if E_t < D_t then
12:    the demand is covered first by solar production, then by the battery, then by the grid
13:    if t belongs to off-peak hours and flag > 0 then
14:        battery charges

The profiles of the power flow among the PV system with battery, the load and the grid for the rule-based algorithm on the 29th of March are presented in Figure 4.3.

Figure 4.3.: Profiles of power flow among PV system with battery, load and grid for the rule-based algorithm for the 29th of March

As expected, compared to the case without storage device there is almost no feed-in power in this case; the energy exchange between PV system and demand is maximized by the use of the battery. Figure 4.4 and Figure 4.5 show the SOC schedule of the battery under the rule-based algorithm against PV generation and against the variable electricity tariff, respectively. During off-peak hours, the battery charges to supply electricity during peak hours. In the middle of the day, the battery still discharges even though the solar power reaches its peak, because the solar production is not sufficient to satisfy the electricity demand.
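The code sketch announced above: one step of Algorithm 1 in Python. The off-peak/flag handling of lines 13-14 is simplified away, and all limits are illustrative, so this is a hedged approximation of the rule set rather than the thesis implementation.

def rule_based_step(E_t, D_t, R_t, P_t, P0, R_c=11.0):
    """Return (next storage level, energy bought, energy sold) for one step."""
    buy = sell = 0.0
    if E_t >= D_t:
        surplus = E_t - D_t                  # solar alone covers all demand
        if P_t > P0:
            sell = surplus                   # sell the rest to the grid
        else:
            stored = min(surplus, R_c - R_t)
            R_t += stored                    # store the rest in the battery
            sell = surplus - stored          # overflow still goes to the grid
    else:
        deficit = D_t - E_t                  # solar first, then battery, then grid
        from_battery = min(deficit, R_t)
        R_t -= from_battery
        buy = deficit - from_battery
    return R_t, buy, sell

print(rule_based_step(E_t=3.0, D_t=1.0, R_t=5.0, P_t=0.24, P0=0.26))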
4.3. Simple Threshold Control

In this section we discuss a simple threshold control policy, which attempts to achieve a balance between power grid, storage device, demand and the photovoltaic system. The energy management is performed at the customer level. The goal of the control is to minimize the cost of the energy consumption; peak shaving is not considered here. There are also studies that focus on energy management at the utility operator level and aim to minimize the grid operational cost (28).

It is assumed that the process of the electricity price is known in advance. The principle of the policy is therefore simple: store energy in the battery when the price of electricity is low, and then use that energy to satisfy the demand when the price is high. When electricity is expensive, the battery can be discharged in order to minimize the amount of energy purchased from the grid. The agent learns from the historical price information to determine a threshold price, which helps to optimize the charge and discharge operation of the battery. Although the threshold control policy is significantly simpler than optimization algorithms like linear programming or dynamic programming, well-chosen threshold parameters can still yield an effective algorithm that is close to the optimal policy.

We determine the threshold price in two ways, based on the strategies in (29):

• The maximum and minimum prices are determined for a certain period of the past; then
  threshold price = minimum price + 0.3 × (maximum price − minimum price).

• The average price calculated from the historical data is taken as the threshold price.

The process of the simple threshold control is described in Algorithm 2.
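Before turning to the results, here is a small sketch of the two threshold rules above, together with one decision step of the policy; the prices and limits are illustrative, and the full rule set is given in Algorithm 2.

history = [0.24, 0.26, 0.31, 0.22, 0.28, 0.25]   # illustrative past prices

# Rule 1: minimum price plus 30% of the historical price range.
p0_range = min(history) + 0.3 * (max(history) - min(history))
# Rule 2: the historical average price.
p0_avg = sum(history) / len(history)

def threshold_step(E_t, D_t, P_t, p0, soc, dod_floor=0.2):
    """Charge when cheap, discharge when expensive, respecting a DOD floor."""
    if P_t <= p0 and soc < 1.0:
        return "charge"
    if P_t > p0 and E_t < D_t and soc > dod_floor:
        return "discharge"
    return "hold"

print(round(p0_range, 4), round(p0_avg, 4),
      threshold_step(E_t=0.0, D_t=1.5, P_t=0.30, p0=p0_avg, soc=0.6))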
Figure 4.4.: SOC schedule of batteries with rule-based algorithm against PV generation

In Figure 4.6 we show a plot of the storage level obtained by the threshold algorithm, along with the solar energy generation and demand profiles, for the KNUBIX test problem. Figures 4.7 and 4.8 present the SOC process against PV generation and against the electricity price, respectively. It is obvious that the threshold algorithm was not able to learn the behavior of the signals. Whether to charge or to discharge the battery depends not only on the energy spot price and the difference between demand and solar energy, but also on the storage level (SOC) of the battery. During high-price periods the battery prefers discharging to charging, and during low-price periods charging to discharging. In our simulation, a minimum limit of 20% is set for the depth of discharge (DOD) to increase the battery lifetime. Special attention was paid to the German EEG; therefore the 60% feed-in power limit was taken into account. This means that at any time no more than 60% of the maximum solar power may be sold to the grid, and the storage device is responsible for the excess energy.

4.4. Linear Programming Formulation

In this section we formulate a multi-stage electricity portfolio optimization problem and show how it can be solved by adapting a multi-period linear programming method. With the aforementioned description of the problem, it is evident that in the presented formulation equations (3.1)-(3.7) represent a linear optimization problem at time t.
Figure 4.5.: SOC schedule of batteries with rule-based algorithm against variable electricity tariffs

The optimal solution can be obtained by solving, for each t, the linear programming problem given by equation (3.1) subject to equations (3.2)-(3.7). In this thesis, we only use linear programming to solve deterministic problems. It is assumed that the forecasting information is already available one day before the simulation day, which is realistic; the optimization results can then be used to optimize the strategy for the next day. The effectiveness of this methodology depends not only on the accuracy of prediction and forecasting, but also on the level of time resolution. For a linear programming problem with KNUBIX data, which aims at a 24-hour optimization result with a resolution of 5 minutes, 288 time periods are simulated. For each time point t, one equality and five inequality constraints are applied. This means that for the one-day optimization problem, 5 × 288 inequality constraints and 1 × 288 equality constraints are computed. The decision vector has dimension 6 × 1, so in total 288 × 6 decisions have been made by the end of the simulation time.

Here we use the linprog function of the MATLAB Optimization Toolbox to solve the LP problem, with the interior point method chosen as the solution algorithm. The MATLAB Optimization Toolbox includes solvers for linear programming, nonlinear optimization, quadratic programming and mixed-integer linear programming, which can be used for different continuous or discrete problems. To solve the problem, we first need to set up the Optimization Toolbox by choosing a solver and an algorithm. As inputs, the objective function, the initial state, the equality constraints, the inequality constraints and the bounds have to be provided. The settings for the iterations and tolerances are optional. The functions can also be called directly from the editor to access the same outputs.
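The thesis solves this problem with MATLAB's linprog; the following one-period sketch shows an equivalent setup in SciPy. The decision vector is ordered (SD, GD, RD, SR, GR, RG), and all parameter values are assumptions for illustration.

import numpy as np
from scipy.optimize import linprog

P_t, c_h = 0.28, 0.001                        # price and holding cost (assumed)
eta_c, eta_d = 0.95, 0.95                     # efficiencies (assumed)
R_t, R_c, gam_c, gam_d = 4.0, 11.0, 3.0, 3.0  # storage state and limits (assumed)
D_t, E_t = 1.5, 2.0                           # demand and solar generation

# Maximize C = P_t*eta_d*x_RG - P_t*(x_GR + x_GD) - c_h*(R_t + Phi^T x).
Phi = np.array([0.0, 0.0, -eta_d, eta_c, eta_c, -eta_d])
c = -(np.array([0.0, -P_t, 0.0, 0.0, -P_t, P_t * eta_d]) - c_h * Phi)

A_ub = np.array([
    [0, 0, 1, 0, 0, 1],                       # (3.2) x_RD + x_RG <= R_t
    [0, 0, 1, 0, 0, 1],                       # (3.3) x_RD + x_RG <= gamma_d
    [0, 0, 0, 1, 1, 0],                       # (3.4) x_SR + x_GR <= R_c - R_t
    [0, 0, 0, 1, 1, 0],                       # (3.5) x_SR + x_GR <= gamma_c
    [1, 0, 0, 1, 0, 0],                       # (3.7) x_SD + x_SR <= E_t
], dtype=float)
b_ub = np.array([R_t, gam_d, R_c - R_t, gam_c, E_t])
A_eq = np.array([[1.0, 1.0, eta_d, 0.0, 0.0, 0.0]])  # (3.6) x_SD + eta_d*x_RD + x_GD = D_t
b_eq = np.array([D_t])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 6)
print(res.x)                                  # optimal (SD, GD, RD, SR, GR, RG)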
Algorithm 2 Threshold Algorithm
1: E_t ← solar energy generation at time t
2: D_t ← demand at time t
3: P_t ← electricity price at time t
4: P_0 ← threshold price
5: if E_t > D_t then
6:     solar energy alone covers all demand
7:     if P_t > P_0 then
8:         the rest of the solar energy is sold to the grid
9:     else
10:        the rest of the solar energy is stored in the battery
11: if E_t < D_t then
12:    all solar energy is used to cover the demand
13:    if P_t > P_0 then
14:        the rest of the demand is covered first by the battery
15:    else
16:        the rest of the demand is covered by the grid
17:    if storage of the battery < demand − solar energy generation then
18:        the rest of the demand is covered by the grid

We use the linear programming method for two electricity price scenarios. Scenario A uses the day-and-night tariff model provided by Stadtwerke Muenchen (SWM), while scenario B uses the spot market price on the European Power Exchange (EPEX) on the sample day (29.03.2015). Detailed information on these two price models has already been given in Section 3.2.

4.4.1. Scenario A

Figure 4.9 illustrates the profiles of the power flow among the PV system with battery, the load and the grid for the LP algorithm on the 29th of March with KNUBIX data under scenario A, while the SOC schedule of the battery against the electricity price and the solar generation is shown in Figure 4.11 and Figure 4.10. As can be seen from Figure 4.11, the battery charges during off-peak hours at the lower electricity price and experiences a period of frequent electricity exchange during peak hours. The power exchanged between the electricity consumer and the power grid is controlled within a certain range, which is favorable for power grid stability. At the end of the simulation time, the storage device discharges completely to reach the maximum return and the minimum cost, which can be observed as an arch in Figure 4.11.
Figure 4.6.: Profiles of power flow among PV system, load and grid for the threshold algorithm with battery for the 29th of March

4.4.2. Scenario B

Figure 4.12 presents the profiles of the power flow among the PV system with battery, the load and the grid for the LP algorithm on the 29th of March with KNUBIX data under scenario B. Figure 4.14 and Figure 4.13 illustrate the SOC schedule of the battery against the electricity price and the solar generation under scenario B. It can be seen that the battery charges heavily at the beginning of the day when the electricity price is low, although the electricity demand during that time is not very high. Since the objective function of the linear programming method is the sum of the cost function from time t = 1 to time t = T, the highest electricity price period (from about 6:00 p.m. to 10:00 p.m.) is known to the agent in advance. Thus, the storage device discharges as much as possible during the high-price period to satisfy the demand, consuming the energy bought from the grid during the low-price period so as to avoid purchasing expensive electricity.

Compared to the results under scenario A, the energy arbitrage revenue (from storing energy purchased at off-peak times and selling it at peak times) plays a more important role when the spot market price tariff is applied, since the price signal under scenario B changes more often. The actions of the battery in this case are more complicated and depend more on the spot market electricity price. With a variable electricity price tariff like this, the responses of the system to the price signal can be observed more clearly.

One of the main drawbacks of the linear programming method is that its effectiveness is based on an accurate estimation and prediction of the energy price, the demand and the
Figure 4.7.: SOC schedule of batteries for threshold algorithm against solar generation

amount of solar generation. Linear programming is appropriate for deterministic problems but not for stochastic problems, in which case there is an unpredictable disturbance in the system and the problem needs to be solved over all possible exogenous information. Considering the volatility of the electricity price in the real energy market, dynamic programming and approximate dynamic programming are taken into consideration. The results of linear programming, however, can be regarded as the true optimal values and used as a benchmark to test the optimality of our ADP algorithm.

4.5. Dynamic Programming Formulation

The DP algorithm, whose flowchart is shown in Figure 4.15, has been developed with MathWorks MATLAB R2014b. The most important characteristic of the DP formulation is the development of a recursive optimization procedure. In a DP algorithm, a multi-stage decision problem is divided into several one-stage decision problems. The recursive procedure builds up the overall optimal solution of the complex multi-stage problem by first handling a simple one-stage problem and then sequentially moving one stage at a time, solving the following one-stage problem, until the overall optimal solution is obtained. The basic principle of the recursive procedure is the so-called "principle of optimality" stated by Bellman:

Principle of optimality Any optimal policy has the property that, whatever the current state and decision, the remaining decisions must constitute an optimal policy with regard to the state resulting from the current decision (30).
Figure 4.8.: SOC schedule of batteries for threshold algorithm against variable electricity tariffs

Now we further illustrate how to apply the DP algorithm to the battery storage optimization problem. The storage device allows power exchange with the grid, depending on the market spot price P_t at time t and on the forecasting information, which includes both demand and solar production. Here we take the state of charge (SOC) R_t of the battery as the state of the system at time t; the SOC of the storage device is a continuous parameter. In order to apply a dynamic programming algorithm, ∆R is introduced to discretize R^c into N states. Every possible state must be an element of the state set S = {0, ∆R, 2∆R, ..., R^c}, where ∆R is the smallest SOC increment. The constraints on the battery capacity, which correspond to the energy storage level constraints in Section 3.1.5, must be satisfied:

R_t = i ∆R,   R_{t+1} = j ∆R,   ∆R_t = R_{t+1} - R_t = (j - i) ∆R,   with i, j ∈ {1, 2, 3, ..., N},

subject to:

R_t, R_{t+1} ∈ [0, R^{max}]   ∀t = 1, 2, 3, ..., T,
∆R_t ∈ [-γ_d, γ_c]   ∀t = 1, 2, 3, ..., T.
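A short code sketch of this discretization and of the admissible one-step transitions follows; the capacity matches Case 1 below (11 kWh in 0.1 kWh steps), while the rate limits are assumed values.

import numpy as np

R_c, dR = 11.0, 0.1
gamma_c = gamma_d = 3.0                        # assumed max charge/discharge per step
S = np.arange(0.0, R_c + dR / 2, dR)           # state set {0, dR, 2*dR, ..., R_c}

def reachable(i):
    """Indices j with (j - i)*dR inside [-gamma_d, +gamma_c] and j*dR in [0, R_c]."""
    lo = max(0, i - int(round(gamma_d / dR)))
    hi = min(len(S) - 1, i + int(round(gamma_c / dR)))
    return range(lo, hi + 1)

print(len(S), list(reachable(0))[:5])          # 111 levels; first reachable indices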
Figure 4.9.: Profiles of power flow among PV system with battery, load and grid for the LP algorithm for the 29th of March under scenario A

Before solving the problem, we need to define the reward function. The reward function for the storage optimization problem corresponds to the cost function in Chapter 3. We initialize the reward function as a (N+1)^2 × T array with all elements zero, where N is the ratio of the battery capacity R^c to the increment ∆R, and T is the length of the simulation time with ∆t = 1. For each state transition at a given time point t, we calculate the immediate return with the following function:

C_t(R_t, x_t) = P_t η_d x_t^{RG} - P_t (x_t^{GR} + x_t^{GD}) - c_h R_{t+1}.

If the transition from R_t to R_{t+1} does not fulfill the constraints, which means that x_t(R_t, R_{t+1}) is not an admissible decision, we set C_t(R_t, R_{t+1}) = -∞. For each single transition from state R_t to R_{t+1}, there may be several possible decision vectors, which correspond to different returns. The linear programming methodology introduced in the previous section is applied to find the optimal decision for each feasible state transition. Compared to the aforementioned LP problem, one more equality constraint is added in order to reach the given state R_{t+1} after executing the decision:

R_{t+1} = R_t + η_c (x_t^{SR} + x_t^{GR}) - η_d (x_t^{RD} + x_t^{RG}).

After calculating the costs of all possible state transitions, a lookup table of state transition costs is built. The problem is now similar to the classic shortest path problem and can be solved with a dynamic programming algorithm.
Figure 4.10.: SOC schedule of batteries for LP algorithm against solar generation under scenario A

Let P(x) ∈ R^{(N+1)×(N+1)} denote the (N+1)-by-(N+1) matrix of state transition probabilities, where P_{ij}(x) indicates the probability of jumping from state R_i to R_j, given that action x is taken. For deterministic problems, the possible values of P_{ij} are only 1 or 0. The state transition probability matrix has rows indexed by the current storage level i ∆R ∈ {0, ∆R, 2∆R, ..., R^c} and columns indexed by the next storage level j ∆R ∈ {0, ∆R, 2∆R, ..., R^c}, with entry p_{ij} in row i and column j.

Given the reward and the probability of a chosen state transition at time t, we build a value function to take the future effect of the current decision into consideration. Based on the Bellman equation (for further reading and information see (30)), the optimal value function
for the problem can be defined in recursive form, with the assumption V_{T+1} ≡ 0:

V_t(R_t) = \max_{x_t} \Big[ C_t(R_t, x_t) + γ \sum_{R_{t+1} \in S} p(R_t, R_{t+1}) \, V_{t+1}(R_{t+1}) \Big],   ∀t = 1, 2, 3, ..., T,

where γ is a discount factor that reflects the time value.

Figure 4.11.: SOC schedule of batteries for LP algorithm against SWM electricity tariff

A deterministic problem can be regarded as a special case of a stochastic problem in which the state transition probabilities are all zeros or ones:

V_t(R_t) = \max_{x_t} \big[ C_t(R_t, x_t) + γ V_{t+1}(R_{t+1}) \big],   ∀t = 1, 2, 3, ..., T.

The value function recursions above can be realized as backward or forward induction. In the backward induction process, the final stage of the problem is solved first and the process moves backward one stage at a time until all stages are covered. Conversely, in the forward induction process, the initial stage of the problem is solved first and the process moves forward one stage at a time until all stages are covered. The backward induction algorithm that we developed and applied is presented in Algorithm 3. It has to be mentioned that the forward recursion discovers the optimal path to all states from a given initial state, while the backward recursion implicitly develops the optimal solution to a chosen final state. Note that for stochastic problems only backward recursion can be applied (31). Figure 4.16 illustrates the forward induction process for the storage optimization problem.
Algorithm 3 Backward Dynamic Programming Algorithm
1: Initialization: set the initial state to R_0 and the value function V_{T+1} to 0
2: for time t = T down to t = 1 do
3:     for state R_i = 0 to R_i = R^c at time t do
4:         for state R_j = 0 to R_j = R^c at time t + 1 do
5:             check if the jump from state R_i to state R_j is valid
6:             if it is feasible then
7:                 calculate the transition cost C_t(R_i, R_j)
8:             else
9:                 set the cost of the action at this state to minus infinity
10:        use the maximum action return at this state to calculate the value of the state:
           V_t(R_i) = max_{x_t} (C_t(R_i, x_t) + V_{t+1}(R_j))
11: for state R_i = R_0, from time t = 1 to t = T do
12:     pick the optimal next-stage state R_j with max V(R_j) and save the corresponding transition route

Algorithm 4 Forward Dynamic Programming Algorithm
1: Initialization: set the initial state to R_0 and the initial state value function to 0
2: for state R_i = 0 to R_i = R^c at time t do
3:     for state R_j = 0 to R_j = R^c at time t + 1 do
4:         check if the jump from state R_i to state R_j is valid
5:         if it is feasible then
6:             calculate the state transition cost from state R_i to state R_j
7:         else
8:             set the transition cost to minus infinity
9: for state R_i = R_0 at time t = 1 do
10:     calculate the value function of each feasible next state based on the Bellman equation
11:     pick the optimal transition with max V and save the corresponding transition route
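A runnable sketch of Algorithm 3 over a precomputed transition-cost table follows. The costs here are random placeholders for the LP-computed values, and minus infinity marks infeasible jumps.

import numpy as np

rng = np.random.default_rng(0)
T, N = 24, 12                                  # stages and storage levels (toy sizes)
C = rng.normal(size=(T, N, N))                 # placeholder for C_t(R_i, R_j)
C[rng.random((T, N, N)) < 0.2] = -np.inf       # mark some transitions infeasible

V = np.zeros((T + 1, N))                       # terminal condition V_{T+1} = 0
best_next = np.zeros((T, N), dtype=int)
for t in range(T - 1, -1, -1):                 # backward induction
    total = C[t] + V[t + 1][None, :]           # C_t(R_i, R_j) + V_{t+1}(R_j)
    best_next[t] = np.argmax(total, axis=1)
    V[t] = total[np.arange(N), best_next[t]]

i, path = 0, [0]                               # recover the optimal trajectory
for t in range(T):
    i = int(best_next[t, i])
    path.append(i)
print(round(V[0, 0], 3), path[:6])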
Figure 4.12.: Profiles of power flow among PV system, load and grid for the LP algorithm with battery for the 29th of March under scenario B

4.5.1. DP Formulation for Deterministic Problems

For deterministic problems, we assume that the solar energy, the electricity spot market price and the demand evolve deterministically over time. In a deterministic DP process, given the current state and the selected decision, both the state at the next stage and the immediate reward of the action are determined with complete certainty. In other words, R_{t+1} is determined by R_t and x_t. The optimal strategy is non-causal, since the future disturbances are known, and can be used as a benchmark for stochastic problems.

Similar to the linear programming formulation, the simulations were conducted for two electricity price scenarios. Scenario A uses the day-and-night tariff model provided by Stadtwerke Muenchen (SWM), while in scenario B the electricity trade between the system and the power grid is settled at the spot market price of the European Power Exchange (EPEX) on the sample day (29.03.2015).

We computed the dynamic programming solution for the electric system with different values of the smallest SOC increment ∆R. The smaller the value of ∆R, the higher the computational cost. In this thesis we only applied the values 0.1 kWh, 0.05 kWh and 0.02 kWh per ∆t as ∆R in the simulation process, due to limited computational resources. The DP formulation has the following characteristics:

Dynamic program parameters
Figure 4.13.: SOC schedule of batteries for LP algorithm against solar generation under scenario B

• T: stages in 5-minute increments (24 hours, 288 levels);
• P: continuous 5-minute electricity market prices;
• R: discretized storage levels:
  Case 1: discretized in 0.1 kWh increments (R ⊆ [0, 11] → 111 levels),
  Case 2: discretized in 0.05 kWh increments (R ⊆ [0, 11] → 221 levels),
  Case 3: discretized in 0.02 kWh increments (R ⊆ [0, 11] → 551 levels);
• X: finite action space of the power flows.

4.5.2. Scenario A

Figures 4.17 and 4.18 show the optimal deterministic backward DP storage schedule against the electricity market price and against the PV generation for the system in the different cases under scenario A. From Figure 4.17 we can see that the abrupt decrease of the storage level coincides with the abrupt change point of the price curve. During on-peak times (when the electricity price is high), the storage device contributes more to the demand and discharges continuously until the end of the high-price period. Comparing the curves for the three different SOC increment values, we can conclude that a finer increment results in higher flexibility and a higher level of control over the battery.
Figure 4.14.: SOC schedule of batteries for LP algorithm against EPEX market price

Figure 4.19 shows the power profile of the optimal deterministic backward DP storage algorithm for the system in the different cases under scenario A. From the figure we see that during the daytime, the energy sold to the grid rises and falls with the amount of solar generation, while the household demand is satisfied by the energy stored in the battery. Table 4.1 provides the economic analysis of the DP backward algorithm for the three cases under scenario A, where a negative cost refers to profits. According to the table, the finer the SOC increment we choose, the lower the electricity cost for the day. The main reason for the decrease in the electricity cost is the reduction in the electricity bought from the power grid.

Table 4.1.: DP backward algorithm results analysis for different SOC increments under scenario A

SOC increment | Electricity from grid | Feed-in electricity | Electricity cost for the day
0.1 kWh       | 32.72 kWh             | 12.69 kWh           | -4.85 Euro
0.05 kWh      | 29.74 kWh             | 12.74 kWh           | -4.92 Euro
0.02 kWh      | 25.83 kWh             |  9.03 kWh           | -5.02 Euro
Figure 4.15.: Forward DP algorithm flowchart (initialization of V_{T+1} and the starting state S_0; for each t, computation of the feasible path set and of the cost of each transition via LP; backward computation of the value function for each state; finally, selection of the optimal strategy)

4.5.3. Scenario B

Figures 4.20 and 4.21 show the optimal deterministic backward DP storage schedule against the electricity market price and against the PV generation for the system in the different cases under scenario B. Figure 4.22 shows the corresponding power profiles. Although only small differences are observed among the power profiles of the cases, the electricity cost under a smaller SOC increment is normally lower, except for case 3, for which the SOC increment is 0.02 kWh; the reason for this could be the numerical accuracy of the calculation. The analysis of the DP backward algorithm for the three cases under scenario B is listed in Table 4.2, where a negative electricity cost means the householder profits from the electricity trade with the market.
Figure 4.16.: Path search in the forward DP algorithm (states R_{t,1}, ..., R_{t,n} expanded stage by stage from the initial state R_0 at t = 1 up to t = T)

Table 4.2.: DP backward algorithm results analysis for different SOC increments under scenario B

SOC increment | Electricity from grid | Feed-in electricity | Electricity cost for the day
0.1 kWh       | 38.03 kWh             | 18.99 kWh           | -0.114 Euro
0.05 kWh      | 34.60 kWh             | 17.50 kWh           | -0.150 Euro
0.02 kWh      | 44.31 kWh             | 26.96 kWh           | -0.068 Euro

Although the results here are not as smooth as the ones given by the linear programming optimization, the deterministic DP algorithm does provide a better performance with a finer discretization. Comparing Figure 4.14 to Figure 4.20, we find that the shape and the change points of the SOC curves are similar. Although we use a discretized state variable for the DP formulation and a continuous state variable for the LP formulation, the performance of the DP backward algorithm is good even at a coarse increment level.

4.5.4. DP Formulation for Stochastic Problems

For stochastic problems, since both the state at the next stage and the current return are uncertain even when the current state and decision are known, the optimal undiscounted value function has to be rewritten in expectation form, with the assumption V_{T+1} ≡ 0, that is, there is (by assumption) no electricity cost after the end of the simulation time:

V_t(R_t) = \max_{x_t} \big[ C_t(R_t, x_t, W_t) + \mathbb{E}\{ V_{t+1}(R_{t+1}) \mid R_t, x_t \} \big],   ∀t = 1, 2, 3, ..., T.

In stochastic DP problems, expected values are used to solve the problem, and the computation of the next-stage information under an uncertain evolution process is difficult.
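To illustrate how the expectation in this stochastic recursion can be handled, the following sketch estimates E[V_{t+1}] by averaging over sampled realizations of the exogenous information W; the noise model and the value table are invented for illustration only.

import numpy as np

rng = np.random.default_rng(0)
levels = np.arange(0.0, 11.05, 0.1)            # discretized storage levels
V_next = -((levels - 6.0) ** 2)                # placeholder for V_{t+1}(R)

def q_value(R_t, delta_R, n_samples=1000):
    """C_t(R_t, x_t, W_t) + E[V_{t+1}], with the expectation sampled over W."""
    C_t = -0.28 * max(delta_R, 0.0)            # pay for charged energy (assumed price)
    W = rng.normal(0.0, 0.05, size=n_samples)  # sampled disturbance, e.g. leakage
    R_next = np.clip(R_t + delta_R + W, 0.0, 11.0)
    idx = np.clip(np.rint(R_next / 0.1).astype(int), 0, len(levels) - 1)
    return C_t + V_next[idx].mean()

print(round(q_value(4.0, +1.0), 3), round(q_value(4.0, -1.0), 3))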
  • 57. 4.5. Dynamic Programming Formulation Hour [h] 0 4 8 12 16 20 24 SOC 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Price[Euro/kWh] 0.24 0.26 0.28 0.3 SOC Electricity price (a) Case 1, scenario A Hour [h] 0 4 8 12 16 20 24 SOC 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Price[Euro/kWh] 0.24 0.26 0.28 0.3 SOC Electricity price (b) Case 2, scenario A Hour [h] 0 4 8 12 16 20 24 SOC 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Price[Euro/kWh] 0.2 0.25 0.3 SOC Electricity price (c) Case 3, scenario A Figure 4.17.: Optimal deterministic forward DP storage algorithm against electricity price for the system in different cases under scenario A 55
  • 58. 4. Learning Methodology Hour [h] 0 4 8 12 16 20 24 Power[W] 0 2000 4000 6000 SOC 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Solar energy SOC (a) Case 1, scenario A Hour [h] 0 4 8 12 16 20 24 Power[W] 0 2000 4000 6000 SOC 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Solar energy SOC (b) Case 2, scenario A Hour [h] 0 4 8 12 16 20 24 Power[W] 0 5000 SOC 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Solar energy SOC (c) Case 3, scenario A Figure 4.18.: Optimal deterministic forward DP storage algorithm against PV generation for the system in different cases under scenario A 56
  • 59. 4.5. Dynamic Programming Formulation Hour [h] 0 4 8 12 16 20 24 Power[W] 0 2000 4000 6000 8000 Energy demand Solar energy Hour [h] 0 4 8 12 16 20 24 Power[W] 0 2000 4000 6000 Energy from Grid Solar to Grid Hour [h] 0 4 8 12 16 20 24 Power[W] -4000 -2000 0 2000 4000 Storage level (scaled) delta R (a) Case 1, scenario A Hour [h] 0 4 8 12 16 20 24 Power[W] 0 2000 4000 6000 8000 Energy demand Solar energy Hour [h] 0 4 8 12 16 20 24 Power[W] 0 2000 4000 6000 Energy from Grid Solar to Grid Hour [h] 0 4 8 12 16 20 24 Power[W] -2000 -1000 0 1000 2000 Storage level (scaled) delta R (b) Case 2, scenario A Hour [h] 0 4 8 12 16 20 24 Power[W] 0 2000 4000 6000 8000 Energy demand Solar energy Hour [h] 0 4 8 12 16 20 24 Power[W] 0 2000 4000 6000 Energy from Grid Solar to Grid Hour [h] 0 4 8 12 16 20 24 Power[W] -1000 -500 0 500 1000 Storage level (scaled) delta R (c) Case 3, scenario A Figure 4.19.: Optimal deterministic forward DP storage algorithm power profile for the system in different cases under scenario A 57
  • 60. 4. Learning Methodology Hour [h] 0 4 8 12 16 20 24 SOC 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Price[Euro/kWh] -0.02 0 0.02 0.04 SOC Electricity price (a) Case 1, scenario B Hour [h] 0 4 8 12 16 20 24 SOC 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Price[Euro/kWh] -0.05 0 0.05 SOC Electricity price (b) Case 2, scenario B Hour [h] 0 4 8 12 16 20 24 SOC 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Price[Euro/kWh] -0.05 0 0.05 SOC Electricity price (c) Case 3, scenario B Figure 4.20.: Optimal deterministic forward DP storage algorithm against electricity price for the system in different cases under scenario B 58
  • 61. 4.5. Dynamic Programming Formulation Hour [h] 0 4 8 12 16 20 24 Power[W] 0 2000 4000 6000 SOC 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Solar energy SOC (a) Case 1, scenario B Hour [h] 0 4 8 12 16 20 24 Power[W] 0 1000 2000 3000 4000 5000 SOC 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Solar energy SOC (b) Case 2, scenario B Hour [h] 0 4 8 12 16 20 24 Power[W] 0 1000 2000 3000 4000 5000 SOC 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Solar energy SOC (c) Case 3, scenario B Figure 4.21.: Optimal deterministic forward DP storage algorithm against PV generation for the system in different cases under scenario B 59
[Figure 4.22.: Optimal deterministic forward DP storage algorithm power profiles for the system in different cases under scenario B. Panels (a)-(c): Cases 1-3; each panel shows energy demand and solar energy, energy from grid and solar to grid, and the scaled storage level with delta R, over 24 hours.]
As a solution, states with zero stages to go are evaluated first, and then states with one stage to go are evaluated by considering all possible decisions; this procedure is known as backward induction. The optimal value for each state at each stage is stored.

One of the biggest challenges for dynamic programming is the "curse of dimensionality". As the number of state variables increases, not only the computation time but also the required memory grows exponentially. For a system with N possible states, there are N^2 state combinations at each period, and over T time periods the total number of combinations is T \cdot N^2. If the decision taken at each time t is a vector, the situation is even worse. Even though not all states are valid due to the constraints of the system, the computational and time costs can still be extraordinarily high. Take the 24-hour KNUBIX data simulation as an example: since the direction of each transition must also be considered, computing the optimal decisions took much longer than linear programming even with a large SOC increment of ∆R = 0.1 kWh, and the situation gets worse with a finer increment. To achieve a trade-off between optimization performance and the cost of the algorithm, an approximate dynamic programming (ADP) algorithm has been developed. Detailed information on ADP is given in the following section.

4.6. Approximate Dynamic Programming Formulation

When applying the traditional dynamic programming method presented in the previous sections, we loop over all possible states and enumerate all feasible state transitions. We represented the value function by a lookup table with one entry V(S_t) for each state at time t, which results in very large computational cost and memory requirements because too many states and actions must be stored. If we try to improve the performance of the DP algorithm further with an even smaller SOC increment, this strategy is no longer tractable.

The foundation of approximate dynamic programming is forward dynamic programming (13). When we step forward to calculate the value function using

V_t(S_t) = \max_{x_t} \big[ C_t(S_t, x_t) + \mathbb{E}\, V_{t+1}(S_t, x_t) \big]   (4.1)

for each state S_t, we have not yet computed the value of V_{t+1}, let alone the expectation of its possible values over the exogenous information, so we have to work with an approximation of the value function to make a decision.

For each state S_t at time t, the approximation of the value function \bar V_t(S_t) can be linear, nonlinear separable, or nonlinear non-separable. We solve this storage optimization problem by approximating the cost-to-go function and selecting, from the feasible decision set, the decision that maximizes the sum of the current cost and the estimated value of the next state. The accuracy of the estimation, i.e., the choice of the approximation model, directly influences the quality of the optimization results.
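As a small illustration of how a decision is extracted from equation 4.1 once an approximation of V_{t+1} is available, the sketch below scores every feasible action against a lookup table. It is deterministic for readability, and all names (`actions`, `stage_cost`, `transition`) are assumptions of the sketch rather than the thesis model:

```python
def greedy_decision(S_t, actions, V_next, stage_cost, transition):
    """Choose x_t maximizing C_t(S_t, x_t) + V_{t+1}(S_{t+1}); V_next is a dict
    (lookup table) mapping discretized next states to approximate values."""
    best_x, best_val = None, float("-inf")
    for x in actions:
        S_next = transition(S_t, x)                # deterministic sketch
        val = stage_cost(S_t, x) + V_next[S_next]
        if val > best_val:
            best_x, best_val = x, val
    return best_x, best_val
```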
A lot of research has been done on approximating the value function: (32) and (33) propose least-squares policy iteration algorithms to approximate the value function in large Markov decision processes; (34) blends reinforcement learning and mathematical programming to build a nonparametric approximation of shape-restricted value functions; (35) studies both linear and nonlinear approximation strategies for a stochastic product dispatch problem; (36) proposes a piecewise linear programming algorithm to assist clinical decision making (optimal dosing) in controlled ovarian hyperstimulation (COH) treatment.

The basic idea of ADP is to follow a sample path, i.e., a sequence of exogenous information such as the disturbances in demand, electricity price, or solar energy generation. The sequence of realizations can be generated randomly or obtained from a lookup table, a known distribution, or real-world data. Each sample path corresponds to one value function iteration. We update the approximate value function iteratively based on the previous estimate, each time following a fresh sample path. That is, when we are following the sample path ω^n (ω^n represents the specific value of the exogenous information ω at iteration n), we make sample-path-based decisions using the approximate value function \bar V^{n-1}(S_{t+1}) from the previous iteration. For each state S_t, with a known value \bar V_{t+1} for the next stage, we can use equation 4.1 to make a decision. At the end of each iteration, we combine the information from the previous iteration with the current information to update the values of the visited states.

To summarize, when using approximate dynamic programming to solve the Bellman equation, decisions are made under the assumption that the value functions of all states at any time t are known, so the initialization of the value function is also important. Approximate dynamic programming proceeds by iteratively improving the decisions and updating the approximate value function.

In this thesis, we approximate the value function parametrically or non-parametrically in a forward pass and then solve the problem via a backward recursion with the approximate values, as in the dynamic programming method. For simplicity, we drop discounting and set the discount factor γ to 1. This section consists of two parts: in the first part we apply a linear architecture to make the approximation, while in the second part we propose a piecewise linear concave approximation based on the Concave Adaptive Value Estimation (CAVE) algorithm developed by Godfrey and Powell (37).

Before developing an approximation algorithm, K samples of exogenous information Ω = {ω^1_1, ω^1_2, ..., ω^1_T, ..., ω^K_1, ..., ω^K_T} are drawn. Here we use the Princeton energy storage benchmark dataset S1 to generate a series of sample paths; the number of samples K is 256 and each simulation consists of 101 time periods. The characteristics of the wind process, price process, and demand process in this dataset are as follows:

- The wind process E_t is modeled using a first-order Markov chain;
- The price process P_t is assumed to be sinusoidal with respect to time t;
- The demand process D_t is modeled as deterministic, following D_t = \max\big(0,\; 3 - 4\sin(2\pi t / T)\big).

A storage device with a capacity of 30 and a maximum charging or discharging rate of 5 is used in the simulation; the initial state of charge of the battery is assumed to be 25. For further information see (16).

Our goal is to find an appropriate approximation to solve the optimization problem

V_t(S_t) = \max_{x_t} \big\{ C_t(S_t, x_t) + \gamma\, \mathbb{E}\big[ V_{t+1}(S_{t+1}) \,\big|\, S_t \big] \big\},   (4.2)

for t = 0, 1, ..., T - 1. The expectation in equation 4.2 is over the sampled exogenous information and is normally intractable. To avoid computing an expectation inside the maximization, we use post-decision state variables to modify equation 4.2 (38), (39). The post-decision state variable is the state of the system after we have made a decision but before any new exogenous information has arrived (13). Pre-decision states can be expressed through post-decision states via S_t = S^x_{t-1} + W_t. Thus, equation 4.2 can be rewritten as

V^x_{t-1}(S^x_{t-1}) = \mathbb{E}\big[ \max_{x_t} \{ C_t(S_t, x_t) + \gamma V^x_t(S^x_t) \} \,\big|\, S^x_{t-1} \big],   (4.3)

with V^x_t(S^x_t) := \mathbb{E}[V_{t+1}(S_{t+1})]. For a given sample realization, we propose the original approximation

\hat v_t(S_t) = \max_{x_t} \{ C_t(S_t, x_t) + \gamma \bar V_{t+1}(S_{t+1}) \},   (4.4)

where \bar V and \hat v represent two forms of the value function; they are used to update the approximation and will be explained in the following parts. Applying the post-decision state variable, we modify equation 4.4 into

\hat v^x_{t-1}(S^x_{t-1}) = \mathbb{E}\big[ \max_{x_t} \{ C_t(S_t, x_t) + \gamma \bar V^x_t(S^x_t) \} \,\big|\, S^x_{t-1} \big].   (4.5)

For time t and iteration n, we update \hat v iteratively, using the \bar V from the previous iteration:

\hat v^n_{t-1}(S^x_{t-1}) = \mathbb{E}\big[ \max_{x_t} \{ C_t(S_t, x_t) + \gamma \bar V^{n-1}_t(S^x_t) \} \,\big|\, S^x_{t-1} \big].   (4.6)

For a given state S^x_{t-1} in iteration n, we loop over all feasible actions, and for each action a state S^x_t is built from the state transition function. With this series of (S^x_{t-1}, S^x_t) pairs, we use equation 4.6 to search for the optimal decision x^π_t. Afterwards, we move forward until the end of the simulation period at t = T, then increase the iteration number by one and repeat over the whole simulation cycle, using the estimates from the previous iteration. The detailed process is described in Algorithm 5.
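Before turning to Algorithm 5, a minimal sketch of how such a sample path could be generated: the demand function and the horizon are taken from the dataset description above, while the price sinusoid's amplitude and offset and the wind chain's transition rule are assumptions, since the dataset description only states their general form:

```python
import numpy as np

T = 101                                # time periods, as in test problem S1
rng = np.random.default_rng(0)

def sample_path():
    t = np.arange(T)
    demand = np.maximum(0.0, 3.0 - 4.0 * np.sin(2.0 * np.pi * t / T))  # D_t from S1
    price = 30.0 + 20.0 * np.sin(2.0 * np.pi * t / T)  # sinusoidal; parameters assumed
    wind = np.empty(T)                                 # first-order Markov chain
    wind[0] = 5.0
    for k in range(1, T):                # next value depends only on the current one
        wind[k] = np.clip(wind[k - 1] + rng.choice([-1.0, 0.0, 1.0]), 0.0, 10.0)
    return wind, price, demand
```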
Algorithm 5 Approximate Dynamic Programming Algorithm

Initialization:
  set the initial state to R_0
  set the value function \bar V_{T+1} to 0
  initialize the value function \bar V^0(S) for all possible states
for sample path n = 1, 2, ..., N do
  simulate the sample path ω^n
  for time t = 1, 2, ..., T do
    compute \hat v^n_t = \max_{x_t} \big( C_t(S^n_t, x_t) + \gamma \sum_{\omega \in \omega^n} \bar V^{n-1}_{t+1}(S^n_t, x_t, \omega) \big) and let x^n_t be the maximizing decision
    update the value function for state S^n_t using \bar V^n_t(S^n_t) = (1 - \alpha_{n-1}) \bar V^{n-1}_t(S^n_t) + \alpha_{n-1} \hat v^n_t
    compute the next-stage state S^n_{t+1} = S^M(S^n_t, x^n_t, \omega^n), assuming that x^n_t is the optimal decision for iteration n

4.6.1. A Linear Lookup Table Approximation

The simplest way to approximate a value function is to use a linear regression model. In approximate dynamic programming, basis functions translate the state variable information into a set of features whose linear combination approximates the value. We write the basis functions as φ_f(S), where f is a feature and S is the state variable, and θ_f is a parameter indicating the weight of feature f. A general form of the value function approximation is

\bar V(S \mid \theta) = \sum_{f \in F} \theta_f \phi_f(S).

Defining the features and designing the basis functions is always delicate, and a poor choice of basis functions may lead to a terrible approximation of the value function even if the weight vector θ is computed perfectly. (40) presents a way to construct basis functions for the linear approximate value function of a Markov decision process (MDP) automatically. The basis functions and the parameter vector θ can also be defined in terms of time.
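A sketch of what such a basis function model can look like for the state S_t = {R_t, P_t, E_t, D_t}; the particular features beyond the constant term φ_0 = 1 are an assumption for illustration, not the features used in the experiments:

```python
import numpy as np

def basis(S):
    """Basis functions phi_f(S) for S = (R, P, E, D); phi_0 = 1 is the constant
    term of the regression model. The remaining features are illustrative."""
    R, P, E, D = S
    return np.array([1.0, R, P, E, D, R * P, R ** 2])

def value(S, theta):
    """Linear value function approximation V(S | theta) = sum_f theta_f phi_f(S)."""
    return basis(S) @ theta
```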
Let V^n be the nth observation of the true value function. Our goal is then to find a parameter vector θ that minimizes the mean squared error

\min_{\theta} \sum_{n=1}^{N} \Big( V^n - \sum_{f \in F} \theta_f \phi_f(S^n) \Big)^2.

Considering S_t = {R_t, P_t, E_t, D_t} as our state variable, we define our basis functions as linear or nonlinear combinations of the state variables. We set φ_0 = 1, which corresponds to the constant term of the linear regression model. The n observations of the basis functions are collected in the matrix

\phi^n = \begin{pmatrix} \phi^1_0 & \phi^1_1 & \cdots & \phi^1_K \\ \phi^2_0 & \phi^2_1 & \cdots & \phi^2_K \\ \vdots & \vdots & \ddots & \vdots \\ \phi^n_0 & \phi^n_1 & \cdots & \phi^n_K \end{pmatrix},

where K + 1 is the number of features and n the number of observations. The n observations of the true value function are given by V = (V^1, V^2, \dots, V^n)^T. The optimal vector of regression coefficients θ can be estimated with the normal equation

\theta = \big[ (\phi^n)^T \phi^n \big]^{-1} (\phi^n)^T V^n.   (4.7)

In an implementation, (\phi^n)^T \phi^n may not be invertible, in which case the pseudo-inverse provides a solution. However, in approximate dynamic programming, optimizing the coefficient vector θ with equation 4.7 in every iteration would be expensive; the method of recursive least squares provides a cheaper alternative (13). At each iteration, the new observations are used to update the parameter estimate. To apply recursive estimation, stochastic gradients are introduced in an updating function that involves the sample observation \hat v^n_t of being in state S_t and the previous iteration's estimate of the value function \bar V^{n-1}_t(S_t). We are looking for an updating algorithm that solves

\min_{\bar V_t} \mathbb{E}\, F(\bar V_t, \hat v_t),  where  F(\bar V_t, \hat v_t) = \tfrac{1}{2} (\bar V_t - \hat v_t)^2.

With a stepsize parameter α_n, the updating equation reads

\bar V^n_t = \bar V^{n-1}_t - \alpha_{n-1} \nabla F(\bar V^{n-1}_t, \hat v_t),   (4.8)

where \nabla F(\bar V^{n-1}_t, \hat v_t) = \bar V^{n-1}_t - \hat v_t.
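Substituting the gradient into equation 4.8 shows that this update is exactly the exponential smoothing used in Algorithm 5, a one-line check using only the definitions above:

\bar V^n_t = \bar V^{n-1}_t - \alpha_{n-1} (\bar V^{n-1}_t - \hat v_t) = (1 - \alpha_{n-1})\, \bar V^{n-1}_t + \alpha_{n-1}\, \hat v_t.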
After applying the linear regression model, the update can be reformulated as

\theta^n = \theta^{n-1} - \alpha_{n-1} \big( \bar V(S \mid \theta^{n-1}) - \hat v^n \big) \nabla_{\theta} \bar V(S \mid \theta^n),   (4.9)

where \nabla_{\theta} \bar V(S \mid \theta^n) = \phi(S^n). Instead of a scalar stepsize, Powell introduces a matrix H^n that serves as a scaling matrix; the updating equation for the coefficients θ then becomes

\theta^n = \theta^{n-1} - H^n \phi^n \hat\varepsilon^n,   (4.10)

where \hat\varepsilon^n = \bar V(S \mid \theta^{n-1}) - \hat v^n (for details see Chapter 9 in (13)).

A simple double-pass policy iteration algorithm, which is an adaptation of the ADP algorithm in (13) and uses a basis function model with a lookup table, is described in Algorithm 6. We use a linear regression model defined in terms of time, which takes the effect of time into account. To improve the quality of the linear regression model and better handle the "curse of dimensionality", we aggregate the demand and price dimensions; to keep the process simple, we round the demand and price state variables to integers. We then construct the linear regression model and estimate the regression parameters around the aggregated states.

This double-pass algorithm solves the finite horizon problem using both forward and backward induction. In the forward pass, we determine the decision variables with the current policy and build a trajectory of states through time. Afterwards, we step backwards through time and update the value function for the states on this trajectory, using the value function of the following stage. With a lookup table, a discretization of increasing resolution is computationally intractable; state aggregation is a powerful remedy for this dimensionality problem.

We only test the deterministic problem in this thesis, using test problem S1 from the Princeton dataset, where the electricity price, wind energy, and energy demand evolve deterministically with different dynamics over time. Test problem S1 consists of T = 101 time periods with ∆t = 1. Figure 4.23a shows the storage level obtained by the linear ADP along with the wind energy and demand profiles of test problem S1; Figure 4.23b shows the spot electricity price process against the storage level of the battery. Since our chosen basis functions involve the electricity price of the next time period, it can be observed that the battery charges or discharges some time before the electricity price changes.

Figure 4.24 compares the storage level obtained by the linear ADP algorithm to the optimal SOC process provided by dataset S1. Even though the approximate storage policy is not exactly the same as the optimal one, it follows the same overall pattern. To evaluate the performance of the linear ADP algorithm quantitatively, we compare its objective value to the one given by test problem S1. Taking sample path 2 of test problem S1 as an example, the optimal objective value known from the dataset is 1.8914 × 10^4, while the value calculated by the linear ADP is 1.8744 × 10^4, i.e., 99.10% of the optimal value.
Algorithm 6 A Simple Double-pass Policy Iteration Algorithm Using Basis Functions

Initialization:
  design basis functions φ_f(S)
  initialize regression coefficients θ^0_{tf} for t = 0, 1, ..., T
  initialize starting states S^n_0 for n = 0, 1, ..., N
  initialize \hat v^{n,m}_{T+1} = 0
for policy iteration number n = 1, 2, 3, ..., N do
  for sample path m = 1, 2, ..., M do
    simulate the sample path ω^m
    for time t = 1, 2, ..., T do
      compute x^{n,m}_t = \arg\max_{x_t} \big( C_t(S^{n,m}_t, x_t) + \gamma \sum_f \theta^{n-1}_{tf} \phi_f(S^{n,m}_t, x_t) \big)
      compute S^{n,m}_{t+1} = S^M(S^{n,m}_t, x^{n,m}_t, W_t(\omega^m))
    for time t = T, T - 1, ..., 1 do
      compute \hat v^{n,m}_t = C_t(S^{n,m}_t, x^{n,m}_t) + \gamma \hat v^{n,m}_{t+1}
      update \theta^{n,m-1}_f to \theta^{n,m}_f using equation 4.10
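The following is a compact Python sketch of the double-pass loop of Algorithm 6. For readability it updates θ with the scalar stepsize form of equation 4.9 instead of the scaling matrix H^n of equation 4.10, and the model functions (`stage_cost`, `transition`, `basis`) are placeholders:

```python
import numpy as np

def double_pass(theta, sample_paths, actions, stage_cost, transition, basis,
                S0, gamma=1.0, alpha=0.1):
    """One sweep of the double-pass policy iteration (sketch of Algorithm 6)."""
    for omega in sample_paths:
        T = len(omega)
        states, decisions = [S0], []
        for t in range(T):                       # forward pass: follow current policy
            S = states[-1]
            x = max(actions, key=lambda a: stage_cost(S, a)
                    + gamma * basis(transition(S, a, omega[t])) @ theta)
            decisions.append(x)
            states.append(transition(S, x, omega[t]))
        v_hat = 0.0
        for t in reversed(range(T)):             # backward pass: accumulate returns
            v_hat = stage_cost(states[t], decisions[t]) + gamma * v_hat
            phi = basis(states[t])
            theta = theta - alpha * (phi @ theta - v_hat) * phi   # equation 4.9
    return theta
```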
[Figure 4.23.: Results of the linear ADP algorithm and sample path from test problem S1. (a) Energy storage profile along with the wind energy and demand profiles; (b) electricity price process against the storage level of the battery, over 101 time periods.]
[Figure 4.24.: Approximate path obtained by linear ADP vs. optimal path (calculated SOC and optimal SOC over 101 time periods).]

Linear approximations are easy to develop, but their performance is not always good. To obtain acceptable results, the basis functions must be chosen carefully with the features of the problem in mind. Generally, linear approximations are appropriate for problems with a large number of resource types and a small number of possible resource state values (13).

4.6.2. SPAR Algorithm

While the linear lookup table ADP is independent of the problem structure, it can be unstable and consumes a lot of memory when large-scale problems are simulated. Nonlinear value function approximations can improve the quality of the optimization results.

According to the well-known result from Vanderbei (8) that the optimal value of a maximizing linear program is concave in the right-hand side of its constraints, the value function in our storage optimization problem is concave, which means the slopes of the value function are monotonically decreasing. The concavity of the objective function is a powerful property: it guarantees a unique optimum (every local optimum is also a global optimum) and allows us to focus on exploitation without considering exploration.

Since solving a general concave nonlinear optimization problem directly is impractical here, we use a piecewise linear optimization method to estimate the value function. (41) shows an adaptive dynamic programming algorithm that applies nonlinear functional approximations to a dynamic resource allocation problem.
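To see why decreasing slopes encode concavity, here is a minimal sketch of evaluating a piecewise linear value function from its segment slopes; the slope values are made up for the example:

```python
def pwl_value(R, slopes, delta=1.0):
    """Evaluate V(R) = sum_k v_k * r_k for segments of width delta, filling the
    segments from the left; decreasing slopes make V concave in R."""
    value, remaining = 0.0, R
    for v in slopes:
        r = min(delta, max(0.0, remaining))   # portion of this segment used
        value += v * r
        remaining -= r
    return value

slopes = [5.0, 4.0, 2.5, 1.0, 0.2]            # monotonically decreasing
print(pwl_value(2.3, slopes))                 # 5.0 + 4.0 + 0.3 * 2.5 = 9.75
```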
Assuming that the previous iteration's approximation \bar V^{n-1}_t(S) is concave, we use the aforementioned equation 4.8 to compute the value function \bar V^n_t(S) at iteration n. It is possible that \bar V^n_t(S) violates concavity. To maintain the concavity of the value function after updates, a family of algorithms has been developed that enforces the concavity of the piecewise linear approximation while updating the slopes. Three popular algorithms are reviewed here:

- The leveling algorithm simply replaces the values of the points that violate the monotonicity property with a smaller or a larger value.
- The SPAR algorithm (Separable, Projective Approximation Routine) averages the values of the points that violate monotonicity (see Section 11.3 of (13)).
- The CAVE algorithm maintains monotonicity by expanding the update range of the value function (42).

Since the SPAR algorithm works well in practice and is easy to implement, we focus on its methodology in this section. We first introduce the SPAR algorithm briefly and then describe the ADP algorithm that uses it.

We introduce a piecewise linear approximation of the value function with respect to the storage level R and denote by v^n_{tk} the slope of the kth segment at time t in iteration n; v^n_{tk} satisfies the monotonicity property. We denote by δ the width of a segment between two breakpoints, and the following illustration assumes δ = 1. The value function around the post-decision state variable can be written as

\bar V^n_t(R^x_t) = \sum_{k=1}^{K_t} v^n_{tk}\, r_{tk},  where  \sum_{k=1}^{K_t} r_{tk} = R_t + \Phi^T x_t  and  0 \le r_{tk} \le \delta.

Making a good estimate of the value function is thus equivalent to finding a good prediction of the slope variables. Compared to other updating algorithms, SPAR performs an update using the average value over a determined range. If v^n_t(R^n) \ge v^n_t(R^n + 1) for all R^n, then v^n_t satisfies monotonicity. If there exists an R^n with v^n_t(R^n) < v^n_t(R^n + 1), we look for the largest R' such that

v^n_t(R') \ge \frac{1}{R^n - (R' - 1)} \sum_{r=R'}^{R^n} v^n_{tr},

and then replace the slopes for R = [R', R^n] with the new slope value \frac{1}{R^n - (R' - 1)} \sum_{r=R'}^{R^n} v^n_{tr} to restore the concavity of the value function.

Using the piecewise linear value function approximation, we reconstruct the optimization problem as a deterministic linear programming problem:

F = \max_{x_t, K_t} \Big[ C(R_t, x_t, W_t) + \sum_{k=1}^{K_t} v^{n-1}_{tk} r_{tk} \Big],   (4.11)
subject to

A_t x_t \le b_t,   (4.12)
A_{eq,t} x_t = b_{eq,t},   (4.13)
x_t \ge 0,   (4.14)
R_{t+1} = \sum_{k=1}^{K_t} r_{tk}.   (4.15)

To obtain the slope of the value function, we first compute its marginal value. We denote by S^+_t = (R_t + δ, E_t, D_t, P_t) and S^-_t = (R_t - δ, E_t, D_t, P_t) the states of the system after a positive and a negative perturbation of the storage level, where δ is the smallest SOC increment with respect to the slope segments. The corresponding optimal objective values are

F^+ = \max_{x^+_t, K^+_t} \Big[ C(S^+_t, x^+_t) + \sum_{k=1}^{K^+_t} v^{n-1}_{tk} r_{tk} \Big],
F^- = \max_{x^-_t, K^-_t} \Big[ C(S^-_t, x^-_t) + \sum_{k=1}^{K^-_t} v^{n-1}_{tk} r_{tk} \Big].   (4.16)

We obtain the marginal value \hat v_t(R_t) by solving

\hat v_t(R_t) = \frac{\partial}{\partial R_t} \max_{x_t \in \chi_t} \big( C(S_t, x_t) + \bar V^{n-1}_t(S_{t+1}) \big) = \frac{1}{\delta} (F - F^-).   (4.17)

At the boundaries of the SOC domain, i.e., when R = 0 or R = R^c, we use the right and left numerical derivatives of the perturbed problems to estimate the slopes:

\hat v_t(R = 0) \approx \hat v^+_t(R = 0) = \frac{1}{\delta} (F^+ - F),
\hat v_t(R = R^c) \approx \hat v^-_t(R = R^c) = \frac{1}{\delta} (F - F^-).   (4.18)

After computing the new slope information at iteration n, we update the value function with the information from iteration n - 1:

v^n_t(R) = (1 - \alpha_{n-1})\, v^{n-1}_t(R) + \alpha_{n-1}\, \hat v^n_t  if R = R^n,  and  v^n_t(R) = v^{n-1}_t(R)  otherwise,   (4.19)

where α_{n-1} is a stepsize parameter following a predefined rule. We use a deterministic stepsize rule for the simulation, given by

\alpha_{n-1} = \frac{a}{a + n - 1},   (4.20)

where a is a constant.
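A sketch of the SPAR-style projection described above: after the slope at the observed level has been re-estimated, the violating neighbours are pooled into one averaged block until the slope vector is decreasing again. This is a simplified rendering of the routine in (13), not a verbatim implementation:

```python
def spar_project(v, k):
    """Restore monotonically decreasing slopes after v[k] was updated, by
    averaging v over the smallest block around k that removes all violations."""
    v = list(v)
    lo = hi = k
    while True:
        mean = sum(v[lo:hi + 1]) / (hi - lo + 1)
        if lo > 0 and v[lo - 1] < mean:             # left neighbour too small: pool it
            lo -= 1
        elif hi < len(v) - 1 and v[hi + 1] > mean:  # right neighbour too large: pool it
            hi += 1
        else:
            break
    mean = sum(v[lo:hi + 1]) / (hi - lo + 1)
    v[lo:hi + 1] = [mean] * (hi - lo + 1)
    return v

print(spar_project([5.0, 4.0, 2.0, 3.0, 1.0], 3))   # [5.0, 4.0, 2.5, 2.5, 1.0]
```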
The piecewise linear approximation algorithm is outlined in Algorithm 7. The algorithm requires a concave piecewise linear value function, so we first set all initial slopes to zero to satisfy the monotonicity property. At the beginning of each iteration n, a sample realization generated from the Princeton datasets is observed. We obtain the optimal decision for the current state S^n_t and for the states after a positive and a negative perturbation, S^{n+}_t and S^{n-}_t, and calculate the corresponding objective values. Until the end of the planning horizon t = T is reached, we compute the next-stage state S^n_{t+1} according to the calculated optimal decision. We then update the slopes of the value function using the slope information from the previous iteration and the sample observation \hat v^n_t, and apply the SPAR algorithm to restore the concavity of the value function. Finally, we increase the iteration number by one and repeat the process with a fresh sequence of sample information to improve the accuracy of the slopes v.

Algorithm 7 A Piecewise Linear Approximation Algorithm

Initialization:
  initialize v^0_{tk} for all t = 1, 2, ..., T and SOC levels k = 1, 2, ..., R^c/\delta_R
for n = 1, ..., N do
  simulate the sample path ω^n \in Ω
  for t = 0, ..., T do
    obtain x^n_t = \arg\max_{x_t \in \chi_t} \big( C(S^n_t, x_t) + \bar V^{n-1}_t(S_{t+1}) \big) by solving the LP problem of equations 4.11-4.15
    calculate F^- and F^+ as in equation 4.16
    if t < T then compute S^n_{t+1} = S^M(S^n_t, x^n_t, \omega^n_t)
    calculate \hat v^n_t as in equations 4.17 and 4.18
    update v^n_t(S_t) \leftarrow SPAR(v^{n-1}_t, \hat v^n_t)

Since the slopes of the value function are discrete, we still have to handle the classical "curse of dimensionality". To mitigate it, we use a simple aggregation method to aggregate the storage level into segments. In this simulation we set δ = 1, which means that at any time t in iteration n we approximate the value function with R^c/δ = R^c slope segments. Note that the aggregation is only used for approximating the value function, not for computing the optimal decision or the objective value. We use the same dataset (dataset S1 from the Princeton datasets) as in the previous section. The algorithm was tested with three different stepsizes according to equation 4.20, namely a = 1, a = 10, and a = 100.
Figure 4.25a depicts the storage level obtained by the piecewise linear ADP along with the wind energy and demand profiles of test problem S1 with stepsize parameter a = 1. Figure 4.25b shows the spot electricity price process against the storage level of the battery for a = 1. Figure 4.26 compares the storage level obtained by the piecewise linear ADP with a = 1 to the optimal SOC process supplied by dataset S1; they follow the same pattern, even though the amounts of charged or discharged energy differ. The corresponding results for a = 10 and a = 100 are given in Figures 4.27, 4.28 and Figures 4.29, 4.30 respectively.

Figure 4.31 presents the objective values over 256 iterations under the harmonic stepsize rule of equation 4.20 with different parameters a (a = 1, 10, 100). The smaller a is, the faster the stepsize decreases to zero. A larger a prevents the stepsize from decreasing too quickly, but leads to larger variance because the estimates remain sensitive to new observations. A proper design of the stepsize rule is therefore important for the performance of the algorithm. A quantitative evaluation of this algorithm is given in Chapter 5.
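To make the stepsize behaviour concrete, a short sketch of the rule in equation 4.20 for the three tested parameters:

```python
def stepsize(n, a):
    """Harmonic stepsize rule of equation 4.20: alpha_{n-1} = a / (a + n - 1)."""
    return a / (a + n - 1)

for a in (1, 10, 100):
    print(a, [round(stepsize(n, a), 3) for n in (1, 10, 100)])
# a=1   -> [1.0, 0.1, 0.01]       decays fastest
# a=10  -> [1.0, 0.526, 0.092]
# a=100 -> [1.0, 0.917, 0.503]    stays large, hence the larger variance
```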
[Figure 4.25.: Results of the piecewise linear ADP algorithm (a = 1) and sample path from test problem S1. (a) Energy storage profile along with the wind energy and demand profiles; (b) electricity price process against the storage level of the battery, over 101 time periods.]
[Figure 4.26.: Approximate path obtained by piecewise linear ADP (a = 1) vs. optimal path (calculated SOC and optimal SOC over 101 time periods).]
[Figure 4.27.: Results of the piecewise linear ADP algorithm (a = 10) and sample path from test problem S1. (a) Energy storage profile along with the wind energy and demand profiles; (b) electricity price process against the storage level of the battery.]
[Figure 4.28.: Approximate path obtained by piecewise linear ADP (a = 10) vs. optimal path (calculated SOC and optimal SOC over 101 time periods).]
[Figure 4.29.: Results of the piecewise linear ADP algorithm (a = 100) and sample path from test problem S1. (a) Energy storage profile along with the wind energy and demand profiles; (b) electricity price process against the storage level of the battery.]
[Figure 4.30.: Approximate path obtained by piecewise linear ADP (a = 100) vs. optimal path (calculated SOC and optimal SOC over 101 time periods).]

[Figure 4.31.: Objective values over 256 iterations for the different stepsize rule parameters (a = 1, 10, 100).]
5. Evaluation of the Approach

In this chapter we first summarize the characteristics of the different algorithms and then give a quantitative analysis.

Linear programming provides the optimal results for deterministic problems and is used as the benchmark to evaluate the performance of the other algorithms. It is applicable to continuous formulations without much computational effort, and a discretization of the state variable is not required. However, its effectiveness decreases with increasing problem dimension, and it cannot be applied to stochastic problems.

Dynamic programming provides a more flexible method for the storage optimization problem and offers excellent optimization performance. In terms of time and computational cost, however, it might not be a wise choice; the reason lies in the lookup table policy. For large-scale problems or finely resolved state spaces, the DP algorithm is no longer tractable.

Compared to the dynamic programming algorithm, the ADP with a linear regression model saves memory and achieves a dramatic reduction in computation time. Nevertheless, the performance of the algorithm relies largely on the design of the basis functions, and the quality of the result varies across different test sample paths.

The concave piecewise linear algorithm avoids exploration and focuses on pure exploitation; with this approximation, the optimization problem turns into a deterministic LP problem. The algorithm is especially useful when an aggregation method is used or when only integer solutions are of interest, but its performance here is not as good as expected, owing to the limited number of samples provided by the Princeton dataset.

To compare the aforementioned algorithms quantitatively, a cost-benefit analysis of the one-day simulation with the different optimization algorithms is provided in Table 5.1. The simulation covers the 24 hours of March 29th as an exemplary day. The final values of the objective function (cash flow) under the day-and-night price tariff model are presented for the threshold control, the LP optimization, and the DP optimization (case 3 with SOC increment 0.02); negative costs indicate profits. Table 5.1 shows that it makes little difference in electricity cost whether the PV system operates without a storage device or with a storage device under the threshold control algorithm. The DP algorithm improves the performance considerably while keeping the feed-in electricity low, but there is no doubt that the LP algorithm gives the best result.
Table 5.1.: Cash flow analysis under the day-and-night price tariff for different algorithms

Algorithm | Electricity cost (Euro) | Electricity from grid (kWh) | Feed-in electricity (kWh)
PV system without battery | 6.41 | 26.61 | 4.35
Threshold control | 5.58 | 24.39 | 4.47
LP algorithm | -5.30 | 24.00 | 41.20
DP algorithm (case 3) | -5.02 | 25.84 | 9.03

Since the approximate dynamic programming algorithms were simulated using the Princeton datasets, we cannot compare their electricity costs to the other algorithms directly. We therefore define an optimization factor β to help evaluate the performance of the different algorithms:

\beta = \frac{F_{calc}}{F_{opt}},

where F_{calc} and F_{opt} denote the calculated and the optimal objective value respectively. Using the factor β, we can compare the performance of the DP and ADP algorithms. The dynamic programming algorithm provides a near-optimal result, namely 94.72% of the optimal value provided by the linear programming algorithm (5.02/5.30 = 94.72%). According to the results in Section 4.6, the objective value calculated by the linear ADP is 99.10% of the optimal objective value known from the dataset. The piecewise linear ADP algorithm performs better than DP but worse than the linear ADP algorithm. The factors calculated for the different DP and ADP algorithms are shown in Table 5.2. We conclude that the ADP algorithms outperform DP not only in computational cost but also in solution accuracy.
Table 5.2.: Benchmarking results for the DP and ADP algorithms

Algorithm | β (%)
DP algorithm (case 3) | 94.72
Linear regression ADP | 99.70
Piecewise linear ADP (a = 1) | 94.84
Piecewise linear ADP (a = 10) | 95.44
Piecewise linear ADP (a = 100) | 93.87
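As a quick check of the numbers above, the factor for the DP algorithm follows directly from the cash flows of Table 5.1:

```python
def beta(F_calc, F_opt):
    """Optimization factor: calculated objective value relative to the optimum."""
    return F_calc / F_opt

# DP algorithm (case 3) against the LP benchmark, costs from Table 5.1 (in Euro):
print(round(100 * beta(5.02, 5.30), 2))   # 94.72 (%)
```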
6. Conclusions and Future Work

This thesis contributes to the development of electricity management optimization methods for residential PV systems with a storage device. We have constructed different algorithms to solve the electricity management problem and analysed their advantages and disadvantages, so that users can choose an algorithm according to their requirements. The day-ahead model we developed is practical in real life, since day-ahead weather forecasts and spot market electricity prices are easy to obtain. Besides the intermittency of the solar generation process, the stochastic characteristics of the electricity price signal are also taken seriously in the approach. Two variable electricity price scenarios were analyzed, and the results obtained with the different algorithms were presented and discussed: a day-and-night tariff and a spot market price tariff. A variable price tariff changes with electricity consumption and production; it provides consumers with flexibility and with a potential for economic benefit.

However, the algorithms discussed in this work have certain limitations. On the one hand, the algorithms are offline, which means the policies or the parameters of the approximate value function are trained in advance. When applying the algorithms under other circumstances, the performance may not be as good as expected, and if the parameters of the environment fluctuate, we have to adjust the algorithm manually, which causes extra work. Besides, due to the limited size of the sample set, we may encounter overfitting when applying the linear regression method in ADP. This problem could be addressed in later work by separating the data into training and testing sets, in order to identify the general features and increase the accuracy of the model. On the other hand, only deterministic problems were tested in this thesis; future work should test the algorithms on stochastic problems. In real life, we may have access to the weather information or the electricity market information a few hours in advance. A rolling horizon strategy, which uses information from a period of time in the future to make the current decision, could therefore also be included in future work. Deterministic problems can be considered rolling horizon problems with a lookahead horizon H = T; shorter lookahead horizons could be tested and evaluated.

Apart from this, the electric system model considered in this thesis is designed for an average family of four, while the KNUBIX data used to test the algorithms originates from a specific family on a specific day in Germany. We could improve the accuracy and flexibility of the algorithm by combining weather forecasts and adding different weather scenarios to the model, taking factors like temperature, solar irradiation, and location into consideration. We could also introduce a signal distinguishing weekdays from other days, since electricity consumption is higher at weekends and on public holidays.
The computational and time costs of the algorithms must be taken seriously if we want to extend the simulation range from a day to a year or even longer, even though the linear regression algorithm and the piecewise linear ADP algorithm have shown a significant reduction in solution time. One possible way to improve the computational efficiency is to call compiled code written in a language like C++ directly from Matlab.
A. Appendix

In Appendix A, the data of the PV system is presented in Table A.1.
Table A.1.: Data of the PV system

Location | Munich, Germany
Latitude (deg N) | 48.13
Longitude (deg E) | 11.7
Elevation (m) | 529
DC System Size (kW) | 5.5
Array Type | fixed (open rack)
Array Tilt (deg) | 20
Array Azimuth (deg) | 180
System Losses (%) | 14
Inverter Efficiency (%) | 96
DC to AC Size Ratio | 1.1
Bibliography

[1] H. Wirth and K. Schneider, "Recent facts about photovoltaics in Germany," Fraunhofer ISE, Freiburg, Germany, Tech. Rep., Sep. 2013.
[2] M. Black and G. Strbac, "Value of storage in providing balancing services for electricity generation systems with high wind penetration," Journal of Power Sources, vol. 162, no. 2, pp. 949-953, 2006.
[3] M. Beccali, M. Cellura, V. L. Brano, and A. Marvuglia, "Short-term prediction of household electricity consumption: Assessing weather sensitivity in a Mediterranean area," Renewable and Sustainable Energy Reviews, vol. 12, no. 8, pp. 2040-2065, 2008.
[4] A. Marvuglia and A. Messineo, "Using recurrent artificial neural networks to forecast household electricity consumption," Energy Procedia, vol. 14, pp. 45-55, 2012.
[5] G. K. Tso and K. K. Yau, "Predicting electricity energy consumption: A comparison of regression analysis, decision tree and neural networks," Energy, vol. 32, no. 9, pp. 1761-1768, 2007.
[6] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York, NY: John Wiley & Sons, 2014.
[7] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998, vol. 1.
[8] R. J. Vanderbei, Linear Programming: Foundations and Extensions. New York, NY: Springer, 2008, vol. 114.
[9] G. B. Dantzig and G. Infanger, "Multi-stage stochastic linear programs for portfolio optimization," Annals of Operations Research, vol. 45, no. 1, pp. 59-76, 1993.
[10] G. B. Dantzig, "Linear programming under uncertainty," Management Science, vol. 1, no. 3-4, pp. 197-206, 1955.
[11] R. E. Bellman and E. Lee, "History and development of dynamic programming," Control Systems Magazine, IEEE, vol. 4, no. 4, pp. 24-28, 1984.
[12] L. Busoniu, R. Babuska, B. De Schutter, and D. Ernst, Reinforcement Learning and Dynamic Programming Using Function Approximators. Boca Raton, FL: CRC Press, 2010, vol. 39.
[13] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality. New York, NY: John Wiley & Sons, 2007, vol. 703.
[14] D. Bertsekas, Dynamic Programming and Optimal Control. Belmont, MA: Athena Scientific, 2012, vol. 2.
[15] W. B. Powell, "What you should know about approximate dynamic programming," Naval Research Logistics, vol. 56, no. 3, pp. 239-249, 2009.
[16] D. F. Salas and W. B. Powell, "Benchmarking a scalable approximate dynamic programming algorithm for stochastic control of multidimensional energy storage problems," Department of Operations Research and Financial Engineering, Princeton, NJ, Tech. Rep., 2013.
[17] M. Sorrentino, G. Rizzo, and I. Arsie, "Analysis of a rule-based control strategy for on-board energy management of series hybrid vehicles," Control Engineering Practice, vol. 19, no. 12, pp. 1433-1441, 2011.
[18] S. Teleke, M. E. Baran, S. Bhattacharya, and A. Q. Huang, "Rule-based control of battery energy storage for dispatching intermittent renewable sources," Sustainable Energy, IEEE Transactions on, vol. 1, no. 3, pp. 117-124, 2010.
[19] M. Dicorato, G. Forte, M. Pisani, and M. Trovato, "Planning and operating combined wind-storage system in electricity market," Sustainable Energy, IEEE Transactions on, vol. 3, no. 2, pp. 209-217, 2012.
[20] E. D. Castronuovo and J. Lopes, "On the optimization of the daily operation of a wind-hydro power plant," Power Systems, IEEE Transactions on, vol. 19, no. 3, pp. 1599-1606, 2004.
[21] H. Zhang, V. Vittal, G. T. Heydt, and J. Quintero, "A mixed-integer linear programming approach for multi-stage security-constrained transmission expansion planning," Power Systems, IEEE Transactions on, vol. 27, no. 2, pp. 1125-1133, 2012.
[22] T. Nguyen, M. L. Crow et al., "Optimization in energy and power management for renewable-diesel microgrids using dynamic programming algorithm," in Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), 2012 IEEE International Conference on. Bangkok, Thailand: IEEE, May 2012, pp. 11-16.
[23] P. Mokrian and M. Stephen, "A stochastic programming framework for the valuation of electricity storage," in 26th USAEE/IAEE North American Conference, Cleveland, OH, Sep. 2006, pp. 24-27.
[24] N. Löhndorf and S. Minner, "Optimal day-ahead trading and storage of renewable energies: an approximate dynamic programming approach," Energy Systems, vol. 1, no. 1, pp. 61-77, 2010.
[25] O. Sundström and L. Guzzella, "A generic dynamic programming Matlab function," in Control Applications & Intelligent Control, 2009 IEEE. St. Petersburg, Russia: IEEE, Jul. 2009, pp. 1625-1630.
[26] J. M. Nascimento and W. B. Powell, "An optimal approximate dynamic programming algorithm for the energy dispatch problem with grid-level storage," Mathematics of Operations Research, vol. 34, no. 1, pp. 210-237, 2009.
[27] Y. Riffonneau, S. Bacha, F. Barruel, and S. Ploix, "Optimal power flow management for grid connected PV systems with batteries," Sustainable Energy, IEEE Transactions on, vol. 2, no. 3, pp. 309-320, 2011.
[28] I. Koutsopoulos, V. Hatzi, and L. Tassiulas, "Optimal energy storage control policies for the smart power grid," in Smart Grid Communications, 2011 IEEE International Conference on. Brussels, Belgium: IEEE, Oct. 2011, pp. 475-480.
[29] K. Mets, M. Strobbe, T. Verschueren, T. Roelens, F. De Turck, and C. Develder, "Distributed multi-agent algorithm for residential energy management in smart grids," in Network Operations and Management Symposium (NOMS), 2012 IEEE. Maui, HI: IEEE, Apr. 2012, pp. 435-443.
[30] R. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.
[31] B. A. McCarl and T. H. Spreen, "Applied mathematical programming using algebraic systems," Cambridge, MA, 1997.
[32] J. A. Boyan, "Technical update: Least-squares temporal difference learning," Machine Learning, vol. 49, no. 2-3, pp. 233-246, 2002.
[33] M. G. Lagoudakis and R. Parr, "Least-squares policy iteration," The Journal of Machine Learning Research, vol. 4, pp. 1107-1149, 2003.
[34] L. Hannah and D. B. Dunson, "Approximate dynamic programming for storage problems," in Proceedings of the 28th International Conference on Machine Learning. Bellevue, WA: ACM, Jun./Jul. 2011, pp. 337-344.
[35] K. P. Papadaki and W. B. Powell, "An adaptive dynamic programming algorithm for a stochastic multiproduct batch dispatch problem," Naval Research Logistics (NRL), vol. 50, no. 7, pp. 742-769, 2003.
[36] M. He, L. Zhao, and W. B. Powell, "Approximate dynamic programming algorithms for optimal dosage decisions in controlled ovarian hyperstimulation," European Journal of Operational Research, vol. 222, no. 2, pp. 328-340, 2012.
[37] G. A. Godfrey and W. B. Powell, "An adaptive, distribution-free algorithm for the newsvendor problem with censored demands, with applications to inventory and distribution," Management Science, vol. 47, no. 8, pp. 1101-1112, 2001.
[38] D. P. Bertsekas, Dynamic Programming and Optimal Control. Boca Raton, FL: CRC Press, 2011, vol. 2.
[39] W. R. Scott, W. B. Powell, and S. Moazehi, "Least squares policy iteration with instrumental variables vs. direct policy search: Comparison against optimal benchmarks using energy storage," arXiv preprint arXiv:1401.0843, 2014.
[40] P. W. Keller, S. Mannor, and D. Precup, "Automatic basis function construction for approximate dynamic programming and reinforcement learning," in Proceedings of the 23rd International Conference on Machine Learning. Pittsburgh, PA: ACM, Jun. 2006, pp. 449-456.
[41] G. A. Godfrey and W. B. Powell, "An adaptive dynamic programming algorithm for dynamic fleet management, I: Single period travel times," Transportation Science, vol. 36, no. 1, pp. 21-39, 2002.
[42] W. Powell and G. Godfrey, "An adaptive, distribution-free approximation for the newsvendor problem with censored demands, with applications to inventory and distribution problems," Management Science, vol. 47, no. 8, pp. 1101-1112, 2001.