- Ahmadi, M.; Rosolia, U.; Ingham, M.; Murray, R.; Ames, A. Constrained Risk-Averse Markov Decision Processes. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, Virtual Event, 2–9 February 2021.
- Altman, E. Constrained Markov Decision Processes; Routledge: London, UK, 1999.
- Bakker, H.; Dunke, F.; Nickel, S. A structuring review on multi-stage optimization under uncertainty: Aligning concepts from theory and practice. Omega 2020, 96, 102080. [CrossRef]
- Basso, R.; Kulcsár, B.; Sanchez-Diaz, I.; Qu, X. Dynamic stochastic electric vehicle routing with safe reinforcement learning. Transp. Res. Part E Logist. Transp. Rev. 2022, 157, 102496. [CrossRef]
- Boda, K.; Filar, J.A. Time Consistent Dynamic Risk Measures. Math. Methods Oper. Res. 2006, 63, 169–186. [CrossRef]
- Boland, N.; Christiansen, J.; Dandurand, B.; Eberhard, A.; Oliveira, F. A parallelizable augmented Lagrangian method applied to large-scale non-convex-constrained optimization problems. Math. Program. 2019, 175, 503–536. [CrossRef]
- Borkar, V.S. A convex analytic approach to Markov decision processes. Probab. Theory Relat. Fields 1988, 78, 583–602. [CrossRef]
- Borkar, V.S. An actor-critic algorithm for constrained Markov decision processes. Syst. Control Lett. 2005, 54, 207–213. [CrossRef]
- Chen, X.; Karimi, B.; Zhao, W.; Li, P. On the Convergence of Decentralized Adaptive Gradient Methods. arXiv 2021, arXiv:2109.03194. Available online: https://ui.adsabs.harvard.edu/abs/2021arXiv210903194C (accessed on 26 May 2024).
- Chow, Y.; Ghavamzadeh, M.; Janson, L.; Pavone, M. Risk-constrained reinforcement learning with percentile risk criteria. J. Mach. Learn. Res. 2017, 18, 6070–6120.
- Chow, Y.; Nachum, O.; Duenez-Guzman, E.; Ghavamzadeh, M. A Lyapunov-based approach to safe reinforcement learning. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, Montréal, QC, Canada, 2–8 December 2018.
- Coache, A.; Jaimungal, S.; Cartea, Á. Conditionally Elicitable Dynamic Risk Measures for Deep Reinforcement Learning. SIAM J. Financ. Math. 2023, 14, 1249–1289. [CrossRef]
- Collins, A.G.E. Reinforcement learning: Bringing together computation and cognition. Curr. Opin. Behav. Sci. 2019, 29, 63–68. [CrossRef]
- Dalal, G.; Dvijotham, K.; Večerík, M.; Hester, T.; Paduraru, C.; Tassa, Y. Safe Exploration in Continuous Action Spaces. arXiv 2018, arXiv:1801.08757.
- Demizu, T.; Fukazawa, Y.; Morita, H. Inventory management of new products in retailers using model-based deep reinforcement learning. Expert Syst. Appl. 2023, 229, 120256. [CrossRef]
- Ding, S.; Wang, J.; Du, Y.; Shi, Y. Reduced Policy Optimization for Continuous Control with Hard Constraints. arXiv 2023, arXiv:2310.09574.
- Hoang, D.T.; Huynh, N.V.; Nguyen, D.N.; Hossain, E.; Niyato, D. Markov Decision Process and Reinforcement Learning. In Deep Reinforcement Learning for Wireless Communications and Networking: Theory, Applications and Implementation; Wiley-IEEE Press: Hoboken, NJ, USA, 2023; pp. 25–36.
- Dowd, K.; Cotter, J. Spectral Risk Measures and the Choice of Risk Aversion Function. arXiv 2011, arXiv:1103.5668.
- Dowson, O.; Kapelevich, L. SDDP.jl: A Julia Package for Stochastic Dual Dynamic Programming. INFORMS J. Comput. 2021, 33, 27–33. [CrossRef]
- Escudero, L.F.; Garín, M.A.; Monge, J.F.; Unzueta, A. On preparedness resource allocation planning for natural disaster relief under endogenous uncertainty with time-consistent risk-averse management. Comput. Oper. Res. 2018, 98, 84–102. [CrossRef]
- Gillies, A.W. Some Aspects of Analysis and Probability. Phys. Bull. 1959, 10, 65. [CrossRef]
- Gu, S.; Yang, L.; Du, Y.; Chen, G.; Walter, F.; Wang, J.; Yang, Y.; Knoll, A. A Review of Safe Reinforcement Learning: Methods, Theory and Applications. arXiv 2022, arXiv:2205.10330.
- Habib, M.S. Robust Optimization for Post-Disaster Debris Management in Humanitarian Supply Chain: A Sustainable Recovery Approach. Ph.D. Thesis, Hanyang University, Seoul, Republic of Korea, 2018.
- Habib, M.S.; Maqsood, M.H.; Ahmed, N.; Tayyab, M.; Omair, M. A multi-objective robust possibilistic programming approach for sustainable disaster waste management under disruptions and uncertainties. Int. J. Disaster Risk Reduct. 2022, 75, 102967. [CrossRef]
- Habib, M.S.; Sarkar, B. A multi-objective approach to sustainable disaster waste management. In Proceedings of the International Conference on Industrial Engineering and Operations Management, Paris, France, 26–27 July 2018; pp. 1072–1083.
- Hildebrandt, F.D.; Thomas, B.W.; Ulmer, M.W. Opportunities for reinforcement learning in stochastic dynamic vehicle routing. Comput. Oper. Res. 2023, 150, 106071. [CrossRef]
- Hussain, A.; Masood, T.; Munir, H.; Habib, M.S.; Farooq, M.U. Developing resilience in disaster relief operations management through lean transformation. Prod. Plan. Control 2023, 34, 1475–1496. [CrossRef]
- Kamyabniya, A.; Sauré, A.; Salman, F.S.; Bénichou, N.; Patrick, J. Optimization models for disaster response operations: A literature review. OR Spectr. 2024, 46, 1–47. [CrossRef]
- Lee, J.; Lee, K.; Moon, I. A reinforcement learning approach for multi-fleet aircraft recovery under airline disruption. Appl. Soft Comput. 2022, 129, 109556. [CrossRef]
- Li, J.; Fridovich-Keil, D.; Sojoudi, S.; Tomlin, C.J. Augmented Lagrangian Method for Instantaneously Constrained Reinforcement Learning Problems. In Proceedings of the 2021 60th IEEE Conference on Decision and Control (CDC), Austin, TX, USA, 14–17 December 2021; pp. 2982–2989.
- Liu, K.; Yang, L.; Zhao, Y.; Zhang, Z.-H. Multi-period stochastic programming for relief delivery considering evolving transportation network and temporary facility relocation/closure. Transp. Res. Part E Logist. Transp. Rev. 2023, 180, 103357. [CrossRef]
- Liu, P.; Zhang, Y.; Bao, F.; Yao, X.; Zhang, C. Multi-type data fusion framework based on deep reinforcement learning for algorithmic trading. Appl. Intell. 2023, 53, 1683–1706. [CrossRef]
- Lockwood, P.L.; Klein-Flügge, M.C. Computational modelling of social cognition and behaviour–A reinforcement learning primer. Soc. Cogn. Affect. Neurosci. 2020, 16, 761–771. [CrossRef] [PubMed]
- Morillo, J.L.; Zéphyr, L.; Pérez, J.F.; Lindsay Anderson, C.; Cadena, Á. Risk-averse stochastic dual dynamic programming approach for the operation of a hydro-dominated power system in the presence of wind uncertainty. Int. J. Electr. Power Energy Syst. 2020, 115, 105469. [CrossRef]
- Nguyen, N.D.; Nguyen, T.T.; Vamplew, P.; Dazeley, R.; Nahavandi, S. A prioritized objective actor-critic method for deep reinforcement learning. Neural Comput. Appl. 2021, 33, 10335–10349. [CrossRef]
- Paternain, S.; Chamon, L.F.O.; Calvo-Fullana, M.; Ribeiro, A. Constrained reinforcement learning has zero duality gap. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Curran Associates Inc.: New York, NY, USA, 2019; p. 679.
- Peng, X.B.; Abbeel, P.; Levine, S.; Panne, M.V.D. DeepMimic: Example-guided deep reinforcement learning of physics-based character skills. ACM Trans. Graph. 2018, 37, 143. [CrossRef]
- Rao, J.J.; Ravulapati, K.K.; Das, T.K. A simulation-based approach to study stochastic inventory-planning games. Int. J. Syst. Sci. 2003, 34, 717–730. [CrossRef]
- Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 1997. (In English)
- Rodríguez-Espíndola, O. Two-stage stochastic formulation for relief operations with multiple agencies in simultaneous disasters. OR Spectr. 2023, 45, 477–523. [CrossRef]
- Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; Moritz, P. Trust Region Policy Optimization. In Proceedings of the 32nd International Conference on Machine Learning, Proceedings of Machine Learning Research, Lille, France, 6–11 July 2015. Available online: https://proceedings.mlr.press/v37/schulman15.html (accessed on 26 May 2024).
- Shapiro, A.; Tekaya, W.; da Costa, J.P.; Soares, M.P. Risk neutral and risk averse Stochastic Dual Dynamic Programming method. Eur. J. Oper. Res. 2013, 224, 375–391. [CrossRef]
- Shavandi, A.; Khedmati, M. A multi-agent deep reinforcement learning framework for algorithmic trading in financial markets. Expert Syst. Appl. 2022, 208, 118124. [CrossRef]
- Shi, T.; Xu, C.; Dong, W.; Zhou, H.; Bokhari, A.; Klemeš, J.J.; Han, N. Research on energy management of hydrogen electric coupling system based on deep reinforcement learning. Energy 2023, 282, 128174. [CrossRef]
- Tamar, A.; Castro, D.D.; Mannor, S. Policy gradients with variance related risk criteria. In Proceedings of the 29th International Conference on Machine Learning, Edinburgh, UK, 26 June–1 July 2012.
- Tamar, A.; Mannor, S. Variance Adjusted Actor Critic Algorithms. arXiv 2013, arXiv:1310.3697.
- Van Wassenhove, L.N. Humanitarian aid logistics: Supply chain management in high gear. J. Oper. Res. Soc. 2006, 57, 475–489. [CrossRef]
- Venkatasatish, R.; Dhanamjayulu, C. Reinforcement learning based energy management systems and hydrogen refuelling stations for fuel cell electric vehicles: An overview. Int. J. Hydrogen Energy 2022, 47, 27646–27670. [CrossRef]
- Wang, D.; Yang, K.; Yang, L. Risk-averse two-stage distributionally robust optimisation for logistics planning in disaster relief management. Int. J. Prod. Res. 2023, 61, 668–691. [CrossRef]
- Wang, K.; Long, C.; Ong, D.J.; Zhang, J.; Yuan, X.M. Single-Site Perishable Inventory Management Under Uncertainties: A Deep Reinforcement Learning Approach. IEEE Trans. Knowl. Data Eng. 2023, 35, 10807–10813. [CrossRef]
- Wang, Y.; Zhan, S.S.; Jiao, R.; Wang, Z.; Jin, W.; Yang, Z.; Wang, Z.; Huang, C.; Zhu, Q. Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments. In Proceedings of the 40th International Conference on Machine Learning, Proceedings of Machine Learning Research, Honolulu, HI, USA, 23–29 July 2023; pp. 36593–36604. Available online: https://proceedings.mlr.press/v202/wang23as.html (accessed on 26 May 2024).
- Wang, Z.; Shi, X.; Ma, C.; Wu, L.; Wu, J. CCPO: Conservatively Constrained Policy Optimization Using State Augmentation; IOS Press: Amsterdam, The Netherlands, 2023.
- Waubert de Puiseau, C.; Meyes, R.; Meisen, T. On reliability of reinforcement learning based production scheduling systems: A comparative survey. J. Intell. Manuf. 2022, 33, 911–927. [CrossRef]
- Yang, Q.; Simão, T.D.; Tindemans, S.H.; Spaan, M.T.J. Safety-constrained reinforcement learning with a distributional safety critic. Mach. Learn. 2023, 112, 859–887. [CrossRef]
- Yin, X.; Büyüktahtakın, İ.E. Risk-averse multi-stage stochastic programming to optimizing vaccine allocation and treatment logistics for effective epidemic response. IISE Trans. Healthc. Syst. Eng. 2022, 12, 52–74. [CrossRef]
- Yu, G.; Liu, A.; Sun, H. Risk-averse flexible policy on ambulance allocation in humanitarian operations under uncertainty. Int. J. Prod. Res. 2021, 59, 2588–2610. [CrossRef]
- Yu, L.; Yang, H.; Miao, L.; Zhang, C. Rollout algorithms for resource allocation in humanitarian logistics. IISE Trans. 2019, 51, 887–909. [CrossRef]
- Yu, L.; Zhang, C.; Jiang, J.; Yang, H.; Shang, H. Reinforcement learning approach for resource allocation in humanitarian logistics. Expert Syst. Appl. 2021, 173, 114663. [CrossRef]
- Zabihi, Z.; Moghadam, A.M.E.; Rezvani, M.H. Reinforcement Learning Methods for Computing Offloading: A Systematic Review. ACM Comput. Surv. 2023, 56, 17. [CrossRef]
- Zhang, L.; Shen, L.; Yang, L.; Chen, S.; Wang, X.; Yuan, B.; Tao, D. Penalized Proximal Policy Optimization for Safe Reinforcement Learning. arXiv 2022, arXiv:2205.11814, 3719–3725.
- Zhuang, X.; Zhang, Y.; Han, L.; Jiang, J.; Hu, L.; Wu, S. Two-stage stochastic programming with robust constraints for the logistics network post-disruption response strategy optimization. Front. Eng. Manag. 2023, 10, 67–81. [CrossRef]