Approximate Dynamic Programming using
Fluid and Diffusion Approximations
with Applications to Power Management


Speaker: Dayu Huang

Wei Chen, Dayu Huang, Ankur A. Kulkarni,¹ Jayakrishnan Unnikrishnan, Quanyan Zhu,
Prashant Mehta, Sean Meyn, and Adam Wierman²

  Coordinated Science Laboratory, UIUC
  ¹ Dept. of IESE, UIUC
  ² Dept. of CS, California Inst. of Tech.
[Title-slide figure: two plots, one against the iteration count n and one of
the value function J against the state x.]

Supported by the National Science Foundation (ECS-0523620 and CCF-0830511),
ITMANET DARPA RK 2006-07284, and Microsoft Research.
Introduction

MDP model: a control input, an i.i.d. disturbance, and a cost function;
the objective is to minimize the average cost.

Average Cost Optimality Equation (ACOE), written in terms of the
generator and the relative value function.

Goal: solve the ACOE and find the relative value function and the
optimal policy.
TD Learning

 The “curse of dimensionality”:
    The complexity of solving the ACOE grows exponentially with
    the dimension of the state space.

 Approximate the relative value function within a finite-dimensional
 function class.

 Criterion: minimize the mean-square error,
                                solved by stochastic approximation algorithms.

Problem: How to select the basis functions?

                               This choice is key to the success of TD learning.
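The approximation step above can be sketched in code. Below is a minimal TD(0) recursion with a linear function class on a toy queueing chain; a discounted-cost surrogate is used in place of the average-cost criterion for simplicity, and every model detail (features, arrival law, fixed policy) is an illustrative assumption, not the model from the talk.

```python
import random

def features(x):
    """Basis functions psi(x); here a linear and a quadratic term (assumed)."""
    return [x, x * x]

def td_learning(steps=50000, gamma=0.95, seed=1):
    """TD(0) with linear function approximation on a toy queue."""
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    x = 0
    for n in range(1, steps + 1):
        # fixed policy: serve one job whenever the queue is nonempty
        u = 1 if x > 0 else 0
        c = x + u * u                       # one-step cost (delay + energy)
        x_next = max(x - u, 0) + rng.choice([0, 1])
        # temporal-difference error for the current parameter vector
        v = sum(t * f for t, f in zip(theta, features(x)))
        v_next = sum(t * f for t, f in zip(theta, features(x_next)))
        d = c + gamma * v_next - v
        a = 1.0 / n                         # diminishing step size
        theta = [t + a * d * f for t, f in zip(theta, features(x))]
        x = x_next
    return theta

print(td_learning())
```

The stochastic-approximation step size 1/n and the linear parameterization are the standard choices; only the coefficients theta are learned, so the cost of each update is independent of the size of the state space.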
Total cost for an associated deterministic model

 The fluid value function is a tight approximation to the relative value
 function:

[Figure: the fluid value function and the relative value function, plotted
over the state, nearly coincide.]

It can be used as a part of the basis.
Related Work
Multiclass queueing networks

                                   Meyn 1997, Meyn 1997b

                 optimal control   Chen and Meyn 1999

                   simulation      Henderson et al. 2003

             network scheduling    Veatch 2004
                    and routing    Moallemi, Kumar and Van Roy 2006

                                   Meyn 2007, Control Techniques for
                                    Complex Networks


               other approaches     Tsitsiklis and Van Roy 1997
                                    Mannor, Menache and Shimkin 2005

  Taylor series approximation      this work
Power Management via Speed Scaling
                                                           Bansal, Kimbrel and Pruhs 2007
                                                            Wierman, Andrew and Tang 2009

Single processor with job arrivals; the processing rate is determined by
the current power.

Control the processing speed to balance delay and energy costs.

                                                         Kaxiras and Martonosi 2008
Processor design: polynomial cost                        Wierman, Andrew and Tang 2009
                                          (this talk)

We also consider related cost models for wireless communication applications.
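The delay/energy trade-off above can be made concrete with a small simulation sketch. The queue dynamics, arrival distribution, quadratic energy cost, and square-root policy below are a hypothetical instance assumed for illustration, not the exact model of the talk.

```python
import random

def cost(x, u, beta=1.0):
    """One-step cost: linear delay cost plus polynomial (quadratic) energy cost."""
    return x + beta * u ** 2

def step(x, u, rng):
    """Queue evolution X(t+1) = max(X(t) - U(t), 0) + A(t+1), i.i.d. arrivals."""
    arrival = rng.choice([0, 1, 2])   # assumed arrival law, mean 1
    return max(x - u, 0) + arrival

def simulate(policy, x0=5, horizon=10000, seed=0):
    """Long-run average cost of a given speed-scaling policy."""
    rng = random.Random(seed)
    x, total = x0, 0.0
    for _ in range(horizon):
        u = policy(x)
        total += cost(x, u)
        x = step(x, u, rng)
    return total / horizon

# A square-root policy: run faster when the backlog is larger, the rough
# shape suggested by fluid analysis for quadratic energy cost (illustrative).
sqrt_policy = lambda x: min(x, round(x ** 0.5))
print(simulate(sqrt_policy))
```

Raising the processing rate drains the queue (lower delay cost) at a super-linear energy price, which is exactly the tension the controller must balance.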
Fluid Model                       MDP


Fluid model:




Total Cost



Total Cost Optimality Equation (TCOE) for the fluid model:
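One plausible rendering of the equations this slide sketches, for the scalar queueing model X(t+1) = X(t) − U(t) + A(t+1) with i.i.d. arrivals of mean α (the model and symbols are assumptions consistent with the rest of the deck, not the slide's exact formulas):

```latex
% Fluid model: replace the random arrivals by their mean.
\frac{d}{dt}\,x(t) = -\,u(t) + \alpha, \qquad \alpha = \mathsf{E}[A(t)].

% Total cost of the fluid model from initial state x:
J^*(x) = \min_{\mathbf{u}} \int_0^\infty c\bigl(x(t), u(t)\bigr)\,dt,
\qquad x(0) = x.

% Total Cost Optimality Equation (TCOE):
\min_{u}\Bigl[\, c(x,u) + (\alpha - u)\,\nabla J^*(x) \Bigr] = 0.
```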
Why Fluid Model?                   MDP




 First-order Taylor series approximation




  TCOE



  ACOE



         The fluid value function almost solves the ACOE: a simple but
         powerful idea!
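The Taylor-series argument the slide alludes to can be written out as follows, under the same assumed scalar model (a sketch, not the slide's exact derivation). Write the ACOE as min over u of the cost plus the expected next-step relative value, equal to h(x) + η*. A first-order expansion of the fluid value function J* gives

```latex
\mathsf{E}\bigl[J^*(X(t+1)) \mid X(t)=x,\ U(t)=u\bigr]
   \approx J^*(x) + (\alpha - u)\,\nabla J^*(x),

% so that
\min_u \bigl[\, c(x,u) + \mathsf{E}[J^*(X(t+1)) \mid x, u] \,\bigr]
   \approx J^*(x) + \min_u \bigl[\, c(x,u) + (\alpha - u)\,\nabla J^*(x) \,\bigr]
   = J^*(x),

% using the TCOE for the last equality: J^* satisfies the ACOE up to the
% second-order Taylor error, i.e. it solves the ACOE exactly for a slightly
% perturbed cost function.
```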
Policy

[Figure: the stochastic optimal policy, the myopic policy, and their
difference, plotted against the state x.]
Value Iteration

[Figure: convergence of value iteration over iterations n for two
initializations, V0 = 0 and V0 set to the fluid value function.]

                                                    (See also [Chen and Meyn 1999])
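A minimal value-iteration sketch in the spirit of this slide, on a truncated toy queue; the truncation level, arrival law, and quadratic "fluid-style" initialization are illustrative assumptions, not the talk's exact experiment.

```python
# Relative value iteration for a truncated queueing MDP, comparing the
# standard initialization V0 = 0 with a fluid-style quadratic guess.

N = 50            # truncate the state space to {0, ..., N} (assumed)
ALPHA = 0.5       # arrival probability (assumed)
ACTIONS = [0, 1]  # processing rate

def cost(x, u):
    return x + u * u

def bellman(V):
    """One step of relative value iteration for the average-cost criterion."""
    W = []
    for x in range(N + 1):
        best = float("inf")
        for u in ACTIONS:
            nxt = 0.0
            for a, p in ((0, 1 - ALPHA), (1, ALPHA)):
                y = min(max(x - u, 0) + a, N)
                nxt += p * V[y]
            best = min(best, cost(x, u) + nxt)
        W.append(best)
    return [w - W[0] for w in W]   # normalize so that W(0) = 0

def run(V0, iters=20):
    V = V0[:]
    for _ in range(iters):
        V = bellman(V)
    return V

flat = run([0.0] * (N + 1))
fluid = run([0.5 * x * x for x in range(N + 1)])  # fluid-style quadratic guess
# both initializations converge to the same relative value function
print(max(abs(a - b) for a, b in zip(flat, fluid)))
```

The point of the comparison is initialization: starting value iteration from a fluid-style value function can converge in far fewer iterations than starting from zero, which is the phenomenon studied in [Chen and Meyn 1999].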
Approximation of the Cost Function

 Error Analysis: is the error term constant?


 Surrogate cost: the fluid value function solves the ACOE exactly for a
 surrogate cost that approximates the original cost.

     Bounds on the error term?
Structural Results on the Fluid Solution
Lower Bound




              Convexity of
Upper Bound
Approach Based on Fluid and Diffusion Models
                                                                             (this talk: fluid model)

Value function of the fluid model: the total cost for an associated
deterministic model.

      The fluid value function is a tight approximation to the relative
      value function:

[Figure: the fluid value function and the relative value function, plotted
over the state, nearly coincide.]

  It can be used as a part of the basis.
TD Learning Experiment

         Basis functions: include the fluid value function.

[Figure: estimates of the coefficients against the iteration count (left);
the approximate relative value function, the fluid value function, and the
relative value function (right), for the case of quadratic cost.]
TD Learning with Policy Improvement

[Figure: average cost at each stage of policy improvement, over roughly 25
stages.]

  Nearly optimal after just a few iterations.
                                                    Need the value of the optimal policy.
Conclusions

   The fluid value function can be used as a part of the basis for TD learning.


   Motivated by analysis using a Taylor series expansion:
     the fluid value function almost solves the ACOE. In particular,
     it solves the ACOE for a slightly different cost function, and
     the error term can be estimated.


   TD learning with policy improvement gives a near-optimal policy
   in a few iterations, as shown by experiments.


   Application: power management for processors.
References

  [1] W. Chen, D. Huang, A. Kulkarni, J. Unnikrishnan, Q. Zhu, P. Mehta, S. Meyn, and A. Wierman.
      Approximate dynamic programming using fluid and diffusion approximations with applications to power
      management. Accepted for inclusion in the 48th IEEE Conference on Decision and Control, December
      16-18, 2009.

  [2] P. Mehta and S. Meyn. Q-learning and Pontryagin’s Minimum Principle. To appear in Proceedings of
      the 48th IEEE Conference on Decision and Control, December 16-18, 2009.

  [3] R.-R. Chen and S. P. Meyn. Value iteration and optimization of multiclass queueing networks. Queueing
      Syst. Theory Appl., 32(1-3):65–97, 1999.

  [4] S. G. Henderson, S. P. Meyn, and V. B. Tadić. Performance evaluation and policy selection in multiclass
      networks. Discrete Event Dynamic Systems: Theory and Applications, 13(1-2):149–189, 2003. Special
      issue on learning, optimization and decision making (invited).

  [5] S. P. Meyn. The policy iteration algorithm for average reward Markov decision processes with general
      state space. IEEE Trans. Automat. Control, 42(12):1663–1680, 1997.

  [6] S. P. Meyn. Control Techniques for Complex Networks. Cambridge University Press, Cambridge, 2007.

  [7] C. Moallemi, S. Kumar, and B. Van Roy. Approximate and data-driven dynamic programming for
      queueing networks. Preprint available at http://moallemi.com/ciamac/research-interests.php, 2008.
