Recent advances in reinforcement learning in finance. (2023). Xu, Renyuan ; Yang, Huining ; Hambly, Ben.
In: Mathematical Finance.
RePEc:bla:mathfi:v:33:y:2023:i:3:p:437-503.

Cited: 23 (citations received by this document)
Cites: 263 (references cited by this document)
Cocites: 29 (documents that cite the same bibliography)
Coauthors: 0 (authors who have written about the same topic)

Citations

Citations received by this document

  1. Explainable post hoc portfolio management financial policy of a Deep Reinforcement Learning agent. (2025). Garrido-Merchán, Eduardo C ; De-La, Alejandra ; Coronado-Vaca, María.
    In: PLOS ONE.
    RePEc:plo:pone00:0315528.

  2. Integration of investor behavioral perspective and climate change in reinforcement learning for portfolio optimization. (2025). Jebabli, Ikram ; Bouyaddou, Youssef.
    In: Research in International Business and Finance.
    RePEc:eee:riibaf:v:73:y:2025:i:pb:s027553192400432x.

  3. High-dimensional multi-period portfolio allocation using deep reinforcement learning. (2025). Olmo, Jose ; Atwi, Majed ; Jiang, Yifu.
    In: International Review of Economics & Finance.
    RePEc:eee:reveco:v:98:y:2025:i:c:s1059056025001595.

  4. Is the difference between deep hedging and delta hedging a statistical arbitrage? (2025). Pérez, Carlos Octavio ; Godin, Frédéric ; Gauthier, Geneviève ; François, Pascal.
    In: Finance Research Letters.
    RePEc:eee:finlet:v:73:y:2025:i:c:s1544612324016192.

  5. Risk-sensitive Reinforcement Learning Based on Convex Scoring Functions. (2025). Liu, Yang ; Yu, Xiang ; Han, Shanyu.
    In: Papers.
    RePEc:arx:papers:2505.04553.

  6. Financial Data Analysis with Robust Federated Logistic Regression. (2025). Kulkarni, Sanjeev R ; Yang, Kun ; Krishnan, Nikhil.
    In: Papers.
    RePEc:arx:papers:2504.20250.

  7. Diffusion Factor Models: Generating High-Dimensional Returns with Factor Structure. (2025). Zhang, Ruixun ; Xu, Yumin ; Chen, Minshuo.
    In: Papers.
    RePEc:arx:papers:2504.06566.

  8. Predicting Liquidity-Aware Bond Yields using Causal GANs and Deep Reinforcement Learning with LLM Evaluation. (2025). Sinha, Aarush ; Srinivasan, Srinitish ; Unnikrishnan, Srihari ; Walia, Jaskaran Singh.
    In: Papers.
    RePEc:arx:papers:2502.17011.

  9. FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading. (2025). Lin, Mingquan ; Peng, Xueqing ; Yu, Yangyang ; Cao, Yupeng ; Wang, Keyi ; Smith, Kaleb E ; Deng, Zhiyang ; Xie, Qianqian ; Xiong, Guojun ; Ananiadou, Sophia ; Huang, Jimin ; Liu, Xiao-Yang.
    In: Papers.
    RePEc:arx:papers:2502.11433.

  10. Regret-Optimized Portfolio Enhancement through Deep Reinforcement Learning and Future Looking Rewards. (2025). Garz, Rubén ; Gulcehre, Caglar ; Terekhov, Mikhail ; Karzanov, Daniil ; Detyniecki, Marcin ; Raffinot, Thomas.
    In: Papers.
    RePEc:arx:papers:2502.02619.

  11. Markov decision processes with risk-sensitive criteria: an overview. (2024). Jaśkiewicz, Anna ; Bäuerle, Nicole.
    In: Mathematical Methods of Operations Research.
    RePEc:spr:mathme:v:99:y:2024:i:1:d:10.1007_s00186-024-00857-0.

  12. Integrating Deep Learning and Reinforcement Learning for Enhanced Financial Risk Forecasting in Supply Chain Management. (2024). Yao, Fengtong ; Cui, Yuanfei.
    In: Journal of the Knowledge Economy.
    RePEc:spr:jknowl:v:15:y:2024:i:4:d:10.1007_s13132-024-01946-5.

  13. The impact of guarantee network on the risk of corporate stock price crash: Discussing the moderating effect of internal control quality. (2024). Weng, Yudong ; Wang, Ziqi ; Yu, Hongxiang.
    In: International Review of Economics & Finance.
    RePEc:eee:reveco:v:96:y:2024:i:pc:s1059056024007202.

  14. Relationship between deep hedging and delta hedging: Leveraging a statistical arbitrage strategy. (2024). Nakagawa, Kei ; Horikawa, Hiroaki.
    In: Finance Research Letters.
    RePEc:eee:finlet:v:62:y:2024:i:pa:s1544612324001314.

  15. Reinforcement learning for continuous-time mean-variance portfolio selection in a regime-switching market. (2024). Li, Lingfei ; Wu, Bo.
    In: Journal of Economic Dynamics and Control.
    RePEc:eee:dyncon:v:158:y:2024:i:c:s0165188923001938.

  16. Fast Deep Hedging with Second-Order Optimization. (2024). Wood, Ben ; Gonon, Lukas ; Akkari, Amira ; Mueller, Konrad.
    In: Papers.
    RePEc:arx:papers:2410.22568.

  17. Reinforcement Learning in High-frequency Market Making. (2024). Ding, Zihan ; Zheng, Yuheng.
    In: Papers.
    RePEc:arx:papers:2407.21025.

  18. Is the difference between deep hedging and delta hedging a statistical arbitrage? (2024). François, Pascal ; Gauthier, Geneviève ; Godin, Frédéric ; Pérez, Carlos Octavio.
    In: Papers.
    RePEc:arx:papers:2407.14736.

  19. Financial Assets Dependency Prediction Utilizing Spatiotemporal Patterns. (2024). Hung, Wilfred Siu ; Zhao, Pengfei ; Zhu, Haoren ; Lee, Dik Lun.
    In: Papers.
    RePEc:arx:papers:2406.11886.

  20. Curriculum Learning and Imitation Learning for Model-free Control on Financial Time-series. (2024). Jang, Yuntae ; Koh, Woosung ; Kim, Woo Chang ; Choi, Insu ; Kang, Gimin.
    In: Papers.
    RePEc:arx:papers:2311.13326.

  21. Reinforcement Learning for Financial Index Tracking. (2024). He, Xuedong ; Peng, Xianhua ; Gong, Chenyin.
    In: Papers.
    RePEc:arx:papers:2308.02820.

  22. Deep Reinforcement Learning for Dynamic Stock Option Hedging: A Review. (2023). Lawryshyn, Yuri ; Pickard, Reilly.
    In: Mathematics.
    RePEc:gam:jmathe:v:11:y:2023:i:24:p:4943-:d:1299173.

  23. Discrete-Time Mean-Variance Strategy Based on Reinforcement Learning. (2023). Shi, Yun ; Li, Xun ; Cui, Xiangyu ; Zhao, Si.
    In: Papers.
    RePEc:arx:papers:2312.15385.

References

References cited by this document

  1. Abbasi‐Yadkori, Y., Bartlett, P., Bhatia, K., Lazic, N., Szepesvari, C., & Weisz, G. (2019). Politex: Regret bounds for policy iteration using expert prediction. In International conference on machine learning (pp. 3692–3702). PMLR.
  2. Abernethy, J. D., & Kale, S. (2013). Adaptive market making via online learning. In NIPS (pp. 2058–2066). Citeseer.
  3. Aboussalah, A. M. (2020). What is the value of the cross‐sectional approach to deep reinforcement learning? Available at SSRN, 22(6), 1091–1111.
  4. Achiam, J., Held, D., Tamar, A., & Abbeel, P. (2017). Constrained policy optimization. In International conference on machine learning (pp. 22–31). PMLR.
  5. Agarwal, A., Bartlett, P., & Dama, M. (2010). Optimal allocation strategies for the dark pool problem. In Proceedings of the thirteenth international conference on artificial intelligence and statistics (pp. 9–16). JMLR Workshop and Conference Proceedings.
  6. Agarwal, A., Kakade, S. M., Lee, J. D., & Mahajan, G. (2021). On the theory of policy gradient methods: Optimality, approximation, and distribution shift. Journal of Machine Learning Research, 22(98), 1–76.
  7. Agarwal, A., Kakade, S., & Yang, L. F. (2020). Model‐based reinforcement learning with a generative model is minimax optimal. In Conference on learning theory (pp. 67–83). PMLR.
  8. Almgren, R., & Chriss, N. (2001). Optimal execution of portfolio transactions. Journal of Risk, 3, 5–40.
  9. Alsabah, H., Capponi, A., Ruiz Lacedelli, O., & Stern, M. (2021). Robo‐advising: Learning investors' risk preferences via portfolio choices. Journal of Financial Econometrics, 19(2), 369–392.
  10. Asadi, K., & Littman, M. L. (2017). An alternative softmax operator for reinforcement learning. In International conference on machine learning (pp. 243–252). PMLR.
  11. Avellaneda, M., & Stoikov, S. (2008). High‐frequency trading in a limit order book. Quantitative Finance, 8(3), 217–224.
  12. Azar, M. G., Munos, R., & Kappen, B. (2012). On the sample complexity of reinforcement learning with a generative model. arXiv preprint arXiv:1206.6461, 1707–1714.
  13. Azar, M. G., Munos, R., & Kappen, H. J. (2013). Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model. Machine Learning, 91(3), 325–349.
  14. Azar, M. G., Osband, I., & Munos, R. (2017). Minimax regret bounds for reinforcement learning. In International conference on machine learning (pp. 263–272). PMLR.
  15. Baldacci, B., & Manziuk, I. (2020). Adaptive trading strategies across liquidity pools. arXiv preprint arXiv:2008.07807.
  16. Baldacci, B., Manziuk, I., Mastrolia, T., & Rosenbaum, M. (2019). Market making and incentives design in the presence of a dark pool: A deep reinforcement learning approach. arXiv preprint arXiv:1912.01129.
  17. Bao, W., & Liu, X.‐y. (2019). Multi‐agent deep reinforcement learning for liquidation strategy analysis. arXiv preprint arXiv:1906.11046.
  18. Basak, S., & Chabakauri, G. (2010). Dynamic mean‐variance asset allocation. The Review of Financial Studies, 23(8), 2970–3016.
  19. Basei, M., Guo, X., Hu, A., & Zhang, Y. (2021). Logarithmic regret for episodic continuous‐time linear‐quadratic reinforcement learning over a finite‐time horizon. Available at SSRN 3848428, 23(178), 1–34.
  20. Beck, C. L., & Srikant, R. (2012). Error bounds for constant step‐size Q‐learning. Systems & Control Letters, 61(12), 1203–1208.
  21. Bellemare, M. G., Naddaf, Y., Veness, J., & Bowling, M. (2013). The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47, 253–279.
  22. Berry, D. A., & Fristedt, B. (1985). Bandit problems: Sequential allocation of experiments (Monographs on statistics and applied probability) (Vol. 5, pp. 7). Chapman and Hall.
  23. Bhandari, J., & Russo, D. (2019). Global optimality guarantees for policy gradient methods. arXiv preprint arXiv:1906.01786.
  24. Bhandari, J., Russo, D., & Singal, R. (2018). A finite time analysis of temporal difference learning with linear function approximation. In Conference on learning theory (pp. 1691–1692). PMLR.
  25. Bhatnagar, S. (2010). An actor–critic algorithm with function approximation for discounted cost constrained Markov decision processes. Systems & Control Letters, 59(12), 760–766.
  26. Björk, T., & Murgoci, A. (2010). A general theory of Markovian time inconsistent stochastic control problems. Available at SSRN 1694759.
  27. Black, F., & Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81(3), 637–654.
  28. Bradtke, S. J., & Barto, A. G. (1996). Linear least‐squares algorithms for temporal difference learning. Machine Learning, 22(1), 33–57.
  29. Brafman, R. I., & Tennenholtz, M. (2002). R-max - A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213–231.
  30. Broadie, M., & Detemple, J. B. (2004). Anniversary article: Option pricing: Valuation models and applications. Management Science, 50(9), 1145–1177.
  31. Buehler, H., Gonon, L., Teichmann, J., & Wood, B. (2019). Deep hedging. Quantitative Finance, 19(8), 1271–1291.
  32. Cai, Q., Yang, Z., Lee, J., & Wang, Z. (2019). Neural temporal‐difference learning converges to global optima. In Advances in neural information processing systems.
  33. Campbell, J. Y., Lo, A. W., & MacKinlay, A. C. (1997). The econometrics of financial markets. Princeton University Press.
  34. Cannelli, L., Nuti, G., Sala, M., & Szehr, O. (2020). Hedging using reinforcement learning: Contextual k‐armed bandit versus Q‐learning. arXiv preprint arXiv:2007.01623.
  35. Cao, J., Chen, J., Hull, J., & Poulos, Z. (2021). Deep hedging of derivatives using reinforcement learning. The Journal of Financial Data Science, 3(1), 10–27.
  36. Capponi, A., Olafsson, S., & Zariphopoulou, T. (2021). Personalized robo‐advising: Enhancing investment through client interaction. Management Science, 68(4), 2485–2512.
  37. Carbonneau, A., & Godin, F. (2021). Equal risk pricing of derivatives with deep hedging. Quantitative Finance, 21(4), 593–608.
  38. Cartea, Á., Jaimungal, S., & Penalva, J. (2015). Algorithmic and high‐frequency trading. Cambridge University Press.
  39. Cartea, Á., Jaimungal, S., & Sánchez‐Betancourt, L. (2021). Deep reinforcement learning for algorithmic trading. Available at SSRN.
  40. Cayci, S., Satpathi, S., He, N., & Srikant, R. (2021). Sample complexity and overparameterization bounds for projection‐free neural TD learning. arXiv preprint arXiv:2103.01391.
  41. Cen, S., Cheng, C., Chen, Y., Wei, Y., & Chi, Y. (2020). Fast global convergence of natural policy gradient methods with entropy regularization. arXiv preprint arXiv:2007.06558.
  42. Chakraborti, A., Toke, I. M., Patriarca, M., & Abergel, F. (2011). Econophysics review: I. empirical facts. Quantitative Finance, 11(7), 991–1012.
  43. Chan, N. T., & Shelton, C. (2001). An electronic market‐maker (Technical Report). MIT.
  44. Charpentier, A., Elie, R., & Remlinger, C. (2021). Reinforcement learning in economics and finance. Computational Economics, 1–38.
  45. Chen, J., & Jiang, N. (2019). Information‐theoretic considerations in batch reinforcement learning. In International conference on machine learning (pp. 1042–1051). PMLR.
  46. Cheung, W. C., Simchi‐Levi, D., & Zhu, R. (2019). Learning to optimize under non‐stationarity. In Proceedings of the 22nd international conference on artificial intelligence and statistics (pp. 1079–1087). PMLR.
  47. Cheung, W. C., Simchi‐Levi, D., & Zhu, R. (2020). Reinforcement learning for non‐stationary Markov decision processes: The blessing of (more) optimism. In International conference on machine learning (pp. 1843–1854). PMLR.
  48. Chow, Y., Ghavamzadeh, M., Janson, L., & Pavone, M. (2017). Risk‐constrained reinforcement learning with percentile risk criteria. The Journal of Machine Learning Research, 18(1), 6070–6120.
  49. Chow, Y., Tamar, A., Mannor, S., & Pavone, M. (2015). Risk‐sensitive and robust decision‐making: A CVaR optimization approach. In NIPS'15 (pp. 1522–1530). MIT Press.
  50. Coache, A., & Jaimungal, S. (2021). Reinforcement learning with dynamic convex risk measures. arXiv preprint arXiv:2112.13414.
  51. Cong, L. W., Tang, K., Wang, J., & Zhang, Y. (2021). Alphaportfolio: Direct construction through deep reinforcement learning and interpretable ai. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.
  52. Cont, R. (2001). Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance, 1(2), 223–236.
  53. Cont, R., & Kukanov, A. (2017). Optimal order placement in limit order markets. Quantitative Finance, 17(1), 21–39.
  54. Cox, J. C., Ross, S. A., & Rubinstein, M. (1979). Option pricing: A simplified approach. Journal of Financial Economics, 7(3), 229–263.
  55. Dabérius, K., Granat, E., & Karlsson, P. (2019). Deep execution‐value and policy based reinforcement learning for trading and beating market benchmarks. Available at SSRN 3374766.
  56. Dabney, W., Ostrovski, G., & Barreto, A. (2020). Temporally‐extended ε‐greedy exploration. arXiv preprint arXiv:2006.01782.
  57. Dai, B., Shaw, A., Li, L., Xiao, L., He, N., Liu, Z., Chen, J., & Song, L. (2018). SBEED: Convergent reinforcement learning with nonlinear function approximation. In International conference on machine learning (pp. 1125–1134). PMLR.
  58. Dalal, G., Szörényi, B., Thoppe, G., & Mannor, S. (2018). Finite sample analyses for TD(0) with function approximation. In 32nd AAAI conference on artificial intelligence.
  59. Dann, C., & Brunskill, E. (2015). Sample complexity of episodic fixed‐horizon reinforcement learning. In NIPS'15 (pp. 2818–2826). MIT Press.
  60. Dann, C., Lattimore, T., & Brunskill, E. (2017). Unifying PAC and regret: Uniform PAC bounds for episodic reinforcement learning. In Proceedings of the 31st international conference on neural information processing systems, NIPS'17 (pp. 5717–5727).
  61. Dann, C., Mansour, Y., Mohri, M., Sekhari, A., & Sridharan, K. (2022). Guarantees for epsilon‐Greedy reinforcement learning with function approximation. In International conference on machine learning (pp. 4666–4689). PMLR.
  62. Deng, Y., Bao, F., Kong, Y., Ren, Z., & Dai, Q. (2017). Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems, 28(3), 653–664.
  63. Ding, D., Wei, X., Yang, Z., Wang, Z., & Jovanovic, M. (2021). Provably efficient safe exploration via primal‐dual policy optimization. In International conference on artificial intelligence and statistics (pp. 3304–3312). PMLR.
  64. Dixon, M. F., Halperin, I., & Bilokon, P. (2020). Machine learning in finance. Springer.
  65. Dixon, M., & Halperin, I. (2020). G‐learner and girl: Goal based wealth management with reinforcement learning. arXiv preprint arXiv:2002.10990.
  66. Du, J., Jin, M., Kolm, P. N., Ritter, G., Wang, Y., & Zhang, B. (2020). Deep reinforcement learning for option replication and hedging. The Journal of Financial Data Science, 2(4), 44–57.
  67. Du, X., Zhai, J., & Lv, K. (2016). Algorithm trading using Q‐learning and recurrent reinforcement learning. Positions, 1, 1.
  68. Dubrov, B. (2015). Monte Carlo simulation with machine learning for pricing American options and convertible bonds. Available at SSRN 2684523.
  69. Eriksson, H., & Dimitrakakis, C. (2019). Epistemic risk‐sensitive reinforcement learning. arXiv preprint arXiv:1906.06273.
  70. Even‐Dar, E., Mansour, Y., & Bartlett, P. (2003). Learning rates for Q‐learning. Journal of Machine Learning Research, 5(1), 1–25.
  71. Fan, J., Ma, C., & Zhong, Y. (2021). A selective overview of deep learning. Statistical Science, 36, 264–290.
  72. Fan, J., Wang, Z., Xie, Y., & Yang, Z. (2020). A theoretical analysis of deep Q‐learning. In Learning for dynamics and control (pp. 486–489). PMLR.
  73. Farahmand, A. M., Ghavamzadeh, M., Szepesvári, C., & Mannor, S. (2008). Regularized policy iteration. In Advances in neural information processing systems 21 - Proceedings of the 2008 conference (pp. 441–448).
  74. Fazel, M., Ge, R., Kakade, S. M., & Mesbahi, M. (2018). Global convergence of policy gradient methods for the linear quadratic regulator. In International conference on machine learning (pp. 1467–1476). PMLR.
  75. Fei, Y., Yang, Z., Chen, Y., Wang, Z., & Xie, Q. (2020). Risk‐sensitive reinforcement learning: Near‐optimal risk‐sample tradeoff in regret. In NeurIPS.
  76. Fermanian, J.‐D., Guéant, O., & Rachez, A. (2015). Agents' behavior on multi‐dealer‐to‐client bond trading platforms. CREST, Center for Research in Economics and Statistics.
  77. Figlewski, S. (1989). Options arbitrage in imperfect markets. The Journal of Finance, 44(5), 1289–1311.
  78. Fischer, T. G. (2018). Reinforcement learning in financial markets—a survey (Technical Report). FAU Discussion Papers in Economics.
  79. François‐Lavet, V., Henderson, P., Islam, R., Bellemare, M. G., & Pineau, J. (2018). An introduction to deep reinforcement learning. Foundations and Trends in Machine Learning, 11(3‐4), 219–354.
  80. François‐Lavet, V., Rabusseau, G., Pineau, J., Ernst, D., & Fonteneau, R. (2019). On overfitting and asymptotic bias in batch reinforcement learning with partial observability. Journal of Artificial Intelligence Research, 65, 1–30.
  81. Fu, Z., Yang, Z., & Wang, Z. (2021). Single‐timescale actor‐critic provably finds globally optimal policy. In International conference on learning representations.
  82. Gajane, P., Ortner, R., & Auer, P. (2018). A sliding‐window algorithm for Markov decision processes with arbitrarily changing rewards and transitions. arXiv preprint arXiv:1805.10066.
  83. Ganchev, K., Nevmyvaka, Y., Kearns, M., & Vaughan, J. W. (2010). Censored exploration and the dark pool problem. Communications of the ACM, 53(5), 99–107.
  84. Ganesh, S., Vadori, N., Xu, M., Zheng, H., Reddy, P., & Veloso, M. (2019). Reinforcement learning for market making in a multi‐agent dealer market. arXiv preprint arXiv:1911.05892.
  85. Gao, X., Xu, Z. Q., & Zhou, X. Y. (2020). State-dependent temperature control for Langevin diffusions. arXiv preprint arXiv:2011.07456.
  86. Gao, Z., Han, Y., Ren, Z., & Zhou, Z. (2019). Batched multi‐armed bandits problem. In Advances in Neural Information Processing Systems (Vol. 32).
  87. Garcelon, E., Ghavamzadeh, M., Lazaric, A., & Pirotta, M. (2020). Conservative exploration in reinforcement learning. In International conference on artificial intelligence and statistics (pp. 1431–1441). PMLR.
  88. Gašperov, B., & Kostanjčar, Z. (2021). Market making with signals through deep reinforcement learning. IEEE Access, 9, 61611–61622.
  89. Geist, M., Scherrer, B., & Pietquin, O. (2019). A theory of regularized Markov decision processes. In International conference on machine learning (pp. 2160–2169). PMLR.
  90. Giurca, A., & Borovkova, S. (2021). Delta hedging of derivatives using deep reinforcement learning. Available at SSRN 3847272.
  91. Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT press.
  92. Goodfellow, I., Pouget‐Abadie, J., Mirza, M., Xu, B., Warde‐Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems (Vol. 27).
  93. Goodfellow, I., Warde‐Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout networks. In International conference on machine learning (pp. 1319–1327). PMLR.
  94. Gopalan, A., & Mannor, S. (2015). Thompson sampling for learning parameterized Markov decision processes. In Conference on learning theory (pp. 861–898). PMLR.
  95. Gordon, G. J. (1996). Stable fitted reinforcement learning. In Advances in neural information processing systems (pp. 1052–1058).
  96. Grau‐Moya, J., Leibfried, F., & Vrancx, P. (2018). Soft Q‐learning with mutual‐information regularization. In International conference on learning representations.
  97. Grinold, R. C., & Kahn, R. N. (2000). Active portfolio management. McGraw‐Hill.
  98. Gu, S., Lillicrap, T., Sutskever, I., & Levine, S. (2016). Continuous deep Q‐learning with model‐based acceleration. In International conference on machine learning (pp. 2829–2838). PMLR.
  99. Guéant, O., & Manziuk, I. (2019). Deep reinforcement learning for market making in corporate bonds: beating the curse of dimensionality. Applied Mathematical Finance, 26(5), 387–452.
  100. Guéant, O., Lehalle, C.‐A., & Fernandez‐Tapia, J. (2012). Optimal portfolio liquidation with limit orders. SIAM Journal on Financial Mathematics, 3(1), 740–764.
  101. Guéant, O., Lehalle, C.‐A., & Fernandez‐Tapia, J. (2013). Dealing with the inventory risk: A solution to the market making problem. Mathematics and Financial Economics, 7(4), 477–507.
  102. Guilbaud, F., & Pham, H. (2013). Optimal high‐frequency trading with limit and market orders. Quantitative Finance, 13(1), 79–94.
  103. Guo, X., Hu, A., & Zhang, Y. (2021). Reinforcement learning for linear‐convex models with jumps via stability analysis of feedback controls. arXiv preprint arXiv:2104.09311.
  104. Haarnoja, T., Tang, H., Abbeel, P., & Levine, S. (2017). Reinforcement learning with deep energy‐based policies. In International conference on machine learning (pp. 1352–1361). PMLR.
  105. Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft actor‐critic: Off‐policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning (pp. 1861–1870). PMLR.
  106. Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., & Levine, S. (2018). Soft actor‐critic algorithms and applications. arXiv preprint arXiv:1812.05905.
  107. Hakansson, N. H. (1971). Multi‐period mean‐variance analysis: Toward a general theory of portfolio choice. The Journal of Finance, 26(4), 857–884.
  108. Halperin, I. (2019). The QLBS Q‐learner goes NuQlear: Fitted Q iteration, inverse RL, and option portfolios. Quantitative Finance, 19(9), 1543–1553.
  109. Halperin, I. (2020). QLBS: Q‐learner in the Black‐Scholes (‐Merton) worlds. The Journal of Derivatives, 28(1), 99–122.
  110. Hambly, B., Xu, R., & Yang, H. (2021). Policy gradient methods for the noisy linear quadratic regulator over a finite horizon. SIAM Journal on Control and Optimization, 59(5), 3359–3391.
  111. Hasselt, H. (2010). Double Q-learning. In Advances in Neural Information Processing Systems (Vol. 23, pp. 2613–2621).
  112. Hendricks, D., & Wilcox, D. (2014). A reinforcement learning extension to the Almgren‐Chriss framework for optimal trade execution. In 2014 IEEE Conference on computational intelligence for financial engineering & economics (CIFEr) (pp. 457–464). IEEE.
  113. Henrotte, P. (1993). Transaction costs and duplication strategies. Graduate School of Business, Stanford University.
  114. Heston, S. (1993). A closed‐form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies, 6, 327–343.
  115. Huang, N. E., Wu, M.‐L., Qu, W., Long, S. R., & Shen, S. S. (2003). Applications of Hilbert–Huang transform to non‐stationary financial time series analysis. Applied Stochastic Models in Business and Industry, 19(3), 245–268.
  116. Osband, I., Van Roy, B., & Russo, D. (2013). (More) Efficient reinforcement learning via posterior sampling. In Proceedings of the 26th international conference on neural information processing systems, NIPS'13 (Vol. 2, pp. 3003–3011).
  117. Jaimungal, S., Pesenti, S. M., Wang, Y. S., & Tatsat, H. (2021). Robust risk‐aware reinforcement learning. Available at SSRN 3910498, 13(1), 213–226.
  118. Jeong, G., & Kim, H. Y. (2019). Improving financial trading decisions using deep Q‐learning: Predicting the number of shares, action strategies, and transfer learning. Expert Systems with Applications, 117, 125–138.
  119. Jia, Y., & Zhou, X. Y. (2021). Policy gradient and actor‐critic learning in continuous time and space: Theory and algorithms. arXiv preprint arXiv:2111.11232.
  120. Jia, Y., & Zhou, X. Y. (2022). Policy evaluation and temporal‐difference learning in continuous time and space: A martingale approach. Journal of Machine Learning Research, 23(154), 1–55.
  121. Jiang, J., Kelly, B. T., & Xiu, D. (2020). (Re-)Imag(in)ing price trends [Research paper]. Chicago Booth.
  122. Jiang, Z., Xu, D., & Liang, J. (2017). A deep reinforcement learning framework for the financial portfolio management problem. arXiv preprint arXiv:1706.10059.
  123. Jin, C., Allen‐Zhu, Z., Bubeck, S., & Jordan, M. I. (2018). Is Q‐learning provably efficient? In Advances in neural information processing systems (Vol. 31).
  124. Jin, C., Yang, Z., Wang, Z., & Jordan, M. I. (2020). Provably efficient reinforcement learning with linear function approximation. In Conference on learning theory (pp. 2137–2143). PMLR.
  125. Kakade, S. M. (2001). A natural policy gradient. In Advances in neural information processing systems (Vol. 14).
  126. Karpe, M., Fang, J., Ma, Z., & Wang, C. (2020). Multi‐agent reinforcement learning in a realistic limit order book market simulation. In Proceedings of the first ACM international conference on AI in finance, ICAIF'20.
  127. Ke, T. T., Shen, Z.‐J. M., & Villas‐Boas, J. M. (2016). Search for information on multiple products. Management Science, 62(12), 3576–3603.
  128. Kearns, M., & Singh, S. (2002). Near‐optimal reinforcement learning in polynomial time. Machine Learning, 49(2), 209–232.
  129. Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Proceedings of the 3rd international conference on learning representations (ICLR).
  130. Klöppel, S., & Schweizer, M. (2007). Dynamic indifference valuation via convex risk measures. Mathematical Finance, 17(4), 599–627.
  131. Koenig, S., & Simmons, R. G. (1993). Complexity analysis of real‐time reinforcement learning. In AAAI (pp. 99–107).
  132. Kolm, P. N., & Ritter, G. (2019). Dynamic replication and hedging: A reinforcement learning approach. The Journal of Financial Data Science, 1(1), 159–171.
  133. Kolm, P. N., & Ritter, G. (2020). Modern perspectives on reinforcement learning in finance. The Journal of Machine Learning in Finance, 1, 28.
  134. Konda, V. (2002). Actor‐critic algorithms (PhD thesis). MIT.
  135. Konda, V. R., & Tsitsiklis, J. N. (2000). Actor‐critic algorithms. In Advances in neural information processing systems (pp. 1008–1014).
  136. Kühn, C., & Stroh, M. (2010). Optimal portfolios of a small investor in a limit order market: A shadow price approach. Mathematics and Financial Economics, 3(2), 45–72.
  137. Kumar, H., Koppel, A., & Ribeiro, A. (2019). On the sample complexity of actor‐critic method for reinforcement learning with function approximation. arXiv preprint arXiv:1910.08412.
  138. Lagoudakis, M. G., & Parr, R. (2003). Least‐squares policy iteration. The Journal of Machine Learning Research, 4, 1107–1149.
  139. Lakshminarayanan, C., & Szepesvari, C. (2018). Linear stochastic approximation: How far does constant step‐size and iterate averaging go? In International conference on artificial intelligence and statistics (pp. 1347–1355). PMLR.
  140. Lattimore, T., & Hutter, M. (2012). PAC bounds for discounted MDPs. In International conference on algorithmic learning theory (pp. 320–334). Springer.
  141. Lattimore, T., Szepesvari, C., & Weisz, G. (2020). Learning with good feature representations in bandits and in RL with a generative model. In International conference on machine learning (pp. 5662–5670). PMLR.
  142. LeCun, Y., & Bengio, Y. (1995). Convolutional networks for images, speech, and time series. In The handbook of brain theory and neural networks (Vol. 3361). MIT Press.
  143. Leland, H. E. (1985). Option pricing and replication with transactions costs. The Journal of Finance, 40(5), 1283–1301.
  144. Li, D., & Ng, W.‐L. (2000). Optimal dynamic portfolio selection: Multiperiod mean‐variance formulation. Mathematical Finance, 10(3), 387–406.
  145. Li, L. (2009). A unifying framework for computational reinforcement learning theory. Rutgers—The State University of New Jersey—New Brunswick.
  146. Li, Y., Szepesvari, C., & Schuurmans, D. (2009). Learning exercise policies for American options. In Artificial intelligence and statistics (pp. 352–359). PMLR.
  147. Liang, Z., Chen, H., Zhu, J., Jiang, K., & Li, Y. (2018). Adversarial deep reinforcement learning in portfolio management. arXiv preprint arXiv:1808.09940.
  148. Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2016). Continuous control with deep reinforcement learning. In 4th international conference on learning representations (ICLR).
  149. Lin, L.‐J. (1992). Self‐improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3‐4), 293–321.
  150. Lin, S., & Beling, P. A. (2020). An end‐to‐end optimal trade execution framework based on proximal policy optimization. In IJCAI (pp. 4548–4554).
  151. Liu, B., Cai, Q., Yang, Z., & Wang, Z. (2019). Neural trust region/proximal policy optimization attains globally optimal policy. In Advances in neural information processing systems (Vol. 32).
  152. Liu, X.‐Y., Xia, Z., Rui, J., Gao, J., Yang, H., Zhu, M., Wang, C. D., Wang, Z., & Guo, J. (2022). FinRL‐Meta: Market environments and benchmarks for data‐driven financial reinforcement learning. arXiv preprint arXiv:2211.03107.
  153. Liu, X.‐Y., Yang, H., Gao, J., & Wang, C. D. (2021). FinRL: Deep reinforcement learning framework to automate trading in quantitative finance. In Proceedings of the second ACM international conference on AI in finance (pp. 1–9).
  154. Liu, Y., Zhang, K., Basar, T., & Yin, W. (2020). An improved analysis of (variance‐reduced) policy gradient and natural policy gradient methods. In NeurIPS.
  155. Longstaff, F. A., & Schwartz, E. S. (2001). Valuing American options by simulation: A simple least‐squares approach. The Review of Financial Studies, 14(1), 113–147.
  156. Mao, W., Zhang, K., Zhu, R., Simchi‐Levi, D., & Başar, T. (2020). Model‐free non‐stationary RL: Near‐optimal regret and applications in multi‐agent RL and inventory control. arXiv preprint arXiv:2010.03161.
  157. Markowitz, H. M. (1952). Portfolio selection. Journal of Finance, 7(1), 77–91.
  158. Farahmand, A. M., Ghavamzadeh, M., Szepesvári, C., & Mannor, S. (2009). Regularized fitted Q-iteration for planning in continuous-space Markovian decision problems. In 2009 American control conference (pp. 725–730). IEEE.
  159. Mei, J., Xiao, C., Szepesvari, C., & Schuurmans, D. (2020). On the global convergence rates of softmax policy gradient methods. In International conference on machine learning (pp. 6820–6829). PMLR.
  160. Melo, F. S., & Ribeiro, M. I. (2007). Q‐learning with linear function approximation. In International conference on computational learning theory (pp. 308–322). Springer.
  161. Meng, T. L., & Khushi, M. (2019). Reinforcement learning in financial markets. Data, 4(3), 110.
  162. Merton, R. C. (1973). Theory of rational option pricing. The Bell Journal of Economics and Management Science, 4, 141–183.
  163. Merton, R. C., & Samuelson, P. A. (1974). Fallacy of the log‐normal approximation to optimal portfolio decision‐making over many periods. Journal of Financial Economics, 1(1), 67–94.
  164. Mihatsch, O., & Neuneier, R. (2002). Risk‐sensitive reinforcement learning. Machine Learning, 49(2), 267–290.
  165. Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International conference on machine learning (pp. 1928–1937). PMLR.
  166. Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
  167. Moody, J., Wu, L., Liao, Y., & Saffell, M. (1998). Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting, 17(5‐6), 441–470.
  168. Mosavi, A., Faghan, Y., Ghamisi, P., Duan, P., Ardabili, S. F., Salwana, E., & Band, S. S. (2020). Comprehensive review of deep reinforcement learning methods and applications in economics. Mathematics, 8(10), 1640.
  169. Mossin, J. (1968). Optimal multiperiod portfolio policies. The Journal of Business, 41(2), 215–229.
  170. Nesterov, Y. E. (1983). A method for solving the convex programming problem with convergence rate O(1/k^2). Dokl. Akad. Nauk SSSR, 269, 543–547.
  171. Nevmyvaka, Y., Feng, Y., & Kearns, M. (2006). Reinforcement learning for optimized trade execution. In Proceedings of the 23rd international conference on machine learning (pp. 673–680).
  172. Ning, B., Ling, F. H. T., & Jaimungal, S. (2018). Double deep Q‐learning for optimal execution. arXiv preprint arXiv:1812.06600.
  173. Obizhaeva, A. A., & Wang, J. (2013). Optimal trading strategy and supply/demand dynamics. Journal of Financial Markets, 16(1), 1–32.
  174. Ouyang, Y., Gagrani, M., Nayyar, A., & Jain, R. (2017). Learning unknown Markov decision processes: A Thompson sampling approach. In Advances in neural information processing systems (Vol. 30).
  175. Pan, S. J., & Yang, Q. (2009). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.
  176. Papini, M., Binaghi, D., Canonaco, G., Pirotta, M., & Restelli, M. (2018). Stochastic variance‐reduced policy gradient. In International conference on machine learning (pp. 4026–4035). PMLR.
  177. Park, H., Sim, M. K., & Choi, D. G. (2020). An intelligent financial portfolio trading strategy using deep Q‐learning. Expert Systems with Applications, 158, 113573.
  178. Patel, Y. (2018). Optimizing market making using multi‐agent reinforcement learning. arXiv preprint arXiv:1812.10252.
  179. Pedersen, J. L., & Peskir, G. (2017). Optimal mean‐variance portfolio selection. Mathematics and Financial Economics, 11(2), 137–160.
  180. Pendharkar, P. C., & Cusatis, P. (2018). Trading financial indices with reinforcement learning agents. Expert Systems with Applications, 103, 1–13.
  181. Perold, A. F. (1988). The implementation shortfall: Paper versus reality. Journal of Portfolio Management, 14(3), 4–9.
  182. Pomatto, L., Strack, P., & Tamuz, O. (2018). The cost of information. arXiv preprint arXiv:1812.04211.
  183. Powell, W. B. (2021). Reinforcement learning and stochastic optimization. John Wiley & Sons.
  184. Preis, T. (2011). Price‐time priority and pro rata matching in an order book model of financial markets. In Econophysics of order‐driven markets (pp. 65–72). Springer.
  185. Puterman, M. L. (2014). Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons.
  186. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9.
  187. Ren, Z., & Zhou, Z. (2020). Dynamic batch learning in high‐dimensional sparse linear contextual bandits. arXiv preprint arXiv:2008.11918.
  188. Riedmiller, M. (2005). Neural fitted Q iteration—first experiences with a data efficient neural reinforcement learning method. In European conference on machine learning (pp. 317–328). Springer.
  189. Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5), 527–535.
  190. Samuelson, P. A. (1975). Lifetime portfolio selection by dynamic stochastic programming. In Stochastic optimization models in finance (pp. 517–524). Academic Press.
  191. Sato, Y. (2019). Model‐free reinforcement learning for financial portfolios: A brief survey. arXiv preprint arXiv:1904.04973.
  192. Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. (2015). Trust region policy optimization. In International conference on machine learning (pp. 1889–1897). PMLR.
  193. Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
  194. Sewak, M. (2019). Policy‐based reinforcement learning approaches. In Deep reinforcement learning (pp. 127–140). Springer.
  195. Shani, L., Efroni, Y., & Mannor, S. (2020). Adaptive trust region policy optimization: Global convergence and faster rates for regularized MDPs. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, pp. 5668–5675).
  196. Sharpe, W. F. (1966). Mutual fund performance. The Journal of Business, 39(1), 119–138.
  197. Shen, Y., Huang, R., Yan, C., & Obermayer, K. (2014). Risk‐averse reinforcement learning for algorithmic trading. In 2014 IEEE conference on computational intelligence for financial engineering & economics (CIFEr) (pp. 391–398). IEEE.
  198. Shen, Y., Tobia, M. J., Sommer, T., & Obermayer, K. (2014). Risk‐sensitive reinforcement learning. Neural Computation, 26(7), 1298–1328.
  199. Shen, Z., Ribeiro, A., Hassani, H., Qian, H., & Mi, C. (2019). Hessian aided policy gradient. In International conference on machine learning (pp. 5729–5738). PMLR.
  200. Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014). Deterministic policy gradient algorithms. In International conference on machine learning (pp. 387–395). PMLR.
  201. Simchi‐Levi, D., & Xu, Y. (2020). Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability. Available at SSRN 3562765, 47(3), 1904–1931.
  202. Sortino, F. A., & Price, L. N. (1994). Performance measurement in a downside risk framework. The Journal of Investing, 3(3), 59–64.
  203. Spooner, T., & Savani, R. (2020). Robust Market Making via Adversarial Reinforcement Learning. In Proceedings of the 29th international joint conference on artificial intelligence, IJCAI‐20 (pp. 4590–4596).
  204. Spooner, T., Fearnley, J., Savani, R., & Koukorinis, A. (2018). Market making via reinforcement learning. In International foundation for autonomous agents and multiagent systems, AAMAS'18 (pp. 434–442).
  205. Steinbach, M. (2001). Markowitz revisited: Mean‐variance models in financial portfolio analysis. SIAM Review, 43, 31–85.
  206. Strehl, A. L., & Littman, M. L. (2005). A theoretical analysis of model‐based interval estimation. In Proceedings of the 22nd international conference on machine learning ICML'05 (pp. 856–863). Association for Computing Machinery.
  207. Strehl, A. L., Li, L., & Littman, M. L. (2009). Reinforcement learning in finite MDPs: PAC analysis. Journal of Machine Learning Research, 10(11), 2413–2444.
  208. Strotz, R. H. (1955). Myopia and inconsistency in dynamic utility maximization. The Review of Economic Studies, 23(3), 165–180.
  209. Sutskever, I., Martens, J., & Hinton, G. E. (2011). Generating text with recurrent neural networks. In ICML.
  210. Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
  211. Sutton, R. S., McAllester, D. A., Singh, S. P., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems (pp. 1057–1063).
  212. Szita, I., & Szepesvári, C. (2010). Model‐based reinforcement learning with nearly tight exploration complexity bounds. In ICML (pp. 1031–1038).
  213. Tamar, A., Chow, Y., Ghavamzadeh, M., & Mannor, S. (2015). Policy gradient for coherent risk measures. In Advances in neural information processing systems (Vol. 28).
  214. Tang, W., Zhang, P. Y., & Zhou, X. Y. (2021). Exploratory HJB equations and their convergence. arXiv preprint arXiv:2109.10269.
  215. Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4), 285–294.
  216. Tieleman, T., & Hinton, G. (2012). Lecture 6.5‐rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4(2), 26–31.
  217. Torrey, L., & Shavlik, J. (2010). Transfer learning. In Handbook of research on machine learning applications and trends: Algorithms, methods, and techniques (pp. 242–264). IGI Global.
  218. Touati, A., & Vincent, P. (2020). Efficient learning in non‐stationary linear Markov decision processes. arXiv preprint arXiv:2010.12870.
  219. Vadori, N., Ganesh, S., Reddy, P., & Veloso, M. (2020). Risk‐sensitive reinforcement learning: A martingale approach to reward uncertainty. arXiv preprint arXiv:2006.12686.
  220. Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q‐learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 30).
  221. Vigna, E. (2016). On time consistency for mean‐variance portfolio selection. Collegio Carlo Alberto Notebook, 476.
  222. Von Luxburg, U., & Schölkopf, B. (2011). Statistical learning theory: Models, concepts, and results. In Handbook of the history of logic, (Vol. 10, pp. 651–706). Elsevier.
  223. Wang, H. (2019). Large scale continuous‐time mean‐variance portfolio allocation via reinforcement learning. Available at SSRN 3428125.
  224. Wang, H., & Yu, S. (2021). Robo‐advising: Enhancing investment with inverse optimization and deep reinforcement learning. arXiv preprint arXiv:2105.09264.
  225. Wang, H., & Zhou, X. Y. (2020). Continuous‐time mean–variance portfolio selection: A reinforcement learning framework. Mathematical Finance, 30(4), 1273–1308.
  226. Wang, H., Zariphopoulou, T., & Zhou, X. (2020). Exploration versus exploitation in reinforcement learning: A stochastic control approach. Journal of Machine Learning Research, 21, 1–34.
  227. Wang, L., Cai, Q., Yang, Z., & Wang, Z. (2020). Neural policy gradient methods: Global optimality and rates of convergence. In International conference on learning representations.
  228. Wang, Y., Dong, K., Chen, X., & Wang, L. (2020). Q‐learning with UCB exploration is sample efficient for infinite‐horizon MDP. In International conference on learning representations.
  229. Wang, Z., Bapst, V., Heess, N., Mnih, V., Munos, R., Kavukcuoglu, K., & Freitas, N. (2017). Sample efficient actor‐critic with experience replay. In International conference on learning representations (ICLR) (pp. 1–13).
  230. Wei, C.‐Y., & Luo, H. (2021). Non‐stationary reinforcement learning without prior knowledge: An optimal black‐box approach. In Conference on learning theory (pp. 4300–4354). PMLR.
  231. Wei, C.‐Y., Jahromi, M. J., Luo, H., Sharma, H., & Jain, R. (2020). Model‐free reinforcement learning in infinite‐horizon average‐reward Markov decision processes. In International conference on machine learning (pp. 10170–10180). PMLR.
  232. Wei, H., Wang, Y., Mangu, L., & Decker, K. (2019). Model‐based reinforcement learning for predictions and control for limit order books. arXiv preprint arXiv:1910.03743.
  233. Wiese, M., Knobloch, R., Korn, R., & Kretschmer, P. (2020). Quant GANs: Deep generation of financial time series. Quantitative Finance, 20(9), 1419–1440.
  234. Williams, R. J. (1992). Simple statistical gradient‐following algorithms for connectionist reinforcement learning. Machine Learning, 8(3), 229–256.
  235. Wu, Y., Shariff, R., Lattimore, T., & Szepesvári, C. (2016). Conservative bandits. In International conference on machine learning (pp. 1254–1262). PMLR.
  236. Xiao, H., Zhou, Z., Ren, T., Bai, Y., & Liu, W. (2020). Time‐consistent strategies for multi‐period mean‐variance portfolio optimization with the serially correlated returns. Communications in Statistics‐Theory and Methods, 49(12), 2831–2868.
  237. Xiong, H., Xu, T., Liang, Y., & Zhang, W. (2021). Non‐asymptotic convergence of Adam‐type reinforcement learning algorithms under Markovian sampling. Proceedings of the AAAI Conference on Artificial Intelligence, 35(12), 10460–10468.
  238. Xiong, H., Zhao, L., Liang, Y., & Zhang, W. (2020). Finite‐time analysis for double Q‐learning. In Advances in neural information processing systems (Vol. 33).
  239. Xiong, Z., Liu, X.‐Y., Zhong, S., Yang, H., & Walid, A. (2018). Practical deep reinforcement learning approach for stock trading. arXiv preprint arXiv:1811.07522.
  240. Xu, P., & Gu, Q. (2020). A finite‐time analysis of Q‐learning with neural network function approximation. In International conference on machine learning (pp. 10555–10565). PMLR.
  241. Xu, P., Gao, F., & Gu, Q. (2020a). An improved convergence analysis of stochastic variance‐reduced policy gradient. In Uncertainty in artificial intelligence (pp. 541–551). PMLR.
  242. Xu, P., Gao, F., & Gu, Q. (2020b). Sample efficient policy gradient methods with recursive variance reduction. In International conference on learning representations.
  243. Xu, T., Wang, Z., & Liang, Y. (2020a). Improving sample complexity bounds for (natural) actor‐critic algorithms. In Advances in neural information processing systems (Vol. 33, pp. 4358–4369).
  244. Xu, T., Wang, Z., & Liang, Y. (2020b). Non‐asymptotic convergence analysis of two time‐scale (natural) actor‐critic algorithms. arXiv preprint arXiv:2005.03557.
  245. Xu, T., Yang, Z., Wang, Z., & Liang, Y. (2021). Doubly robust off‐policy actor‐critic: Convergence and optimality. arXiv preprint arXiv:2102.11866.
  246. Yang, H., Liu, X.‐Y., & Wu, Q. (2018). A practical machine learning approach for dynamic stock recommendation. In 2018 17th IEEE international conference on trust, security and privacy in computing and communications/12th IEEE international conference on big data science and engineering (TrustCom/BigDataSE) (pp. 1693–1697). IEEE.
  247. Yang, L., & Wang, M. (2019). Sample‐optimal parametric Q‐learning using linearly additive features. In International conference on machine learning (pp. 6995–7004). PMLR.
  248. Yang, L., & Wang, M. (2020). Reinforcement learning in feature space: Matrix bandit, kernels, and regret bound. In International conference on machine learning (pp. 10746–10756). PMLR.
  249. Yang, R., Sun, X., & Narasimhan, K. (2019). A generalized algorithm for multi‐objective reinforcement learning and policy adaptation. In Advances in neural information processing systems (Vol. 32).
  250. Ye, Z., Deng, W., Zhou, S., Xu, Y., & Guan, J. (2020). Optimal trade execution based on deep deterministic policy gradient. In Database systems for advanced applications (pp. 638–654). Springer International Publishing.
  251. Yu, M., & Sun, S. (2020). Policy‐based reinforcement learning for time series anomaly detection. Engineering Applications of Artificial Intelligence, 95, 103919.
  252. Yu, P., Lee, J. S., Kulyatin, I., Shi, Z., & Dasgupta, S. (2019). Model‐based deep reinforcement learning for dynamic portfolio optimization. arXiv preprint arXiv:1901.08740.
  253. Yu, S., Wang, H., & Dong, C. (2020). Learning risk preferences from investment portfolios using inverse optimization. arXiv preprint arXiv:2010.01687.
  254. Zhang, G., & Chen, Y. (2020). Reinforcement learning for optimal market making with the presence of rebate. Available at SSRN 3646753.
  255. Zhang, J., Kim, J., O'Donoghue, B., & Boyd, S. (2021). Sample efficient reinforcement learning with REINFORCE. In Proceedings of the AAAI conference on artificial intelligence (Vol. 35, pp. 10887–10895).
  256. Zhang, K., Koppel, A., Zhu, H., & Basar, T. (2020). Global convergence of policy gradient methods to (almost) locally optimal policies. SIAM Journal on Control and Optimization, 58(6), 3586–3612.
  257. Zhang, Z., Zohren, S., & Roberts, S. (2020). Deep reinforcement learning for trading. The Journal of Financial Data Science, 2(2), 25–40.
  258. Zhao, M., & Linetsky, V. (2021). High frequency automated market making algorithms with adverse selection risk control via reinforcement learning. In Proceedings of the second ACM international conference on AI in finance (pp. 1–9).
  259. Zheng, L., & Ratliff, L. (2020). Constrained upper confidence reinforcement learning. In Learning for dynamics and control (pp. 620–629). PMLR.
  260. Zhou, D., Chen, J., & Gu, Q. (2020). Provable multi‐objective reinforcement learning with generative models. arXiv preprint arXiv:2011.10134.
  261. Zhou, X. Y., & Li, D. (2000). Continuous‐time mean‐variance portfolio selection: A stochastic LQ framework. Applied Mathematics and Optimization, 42(1), 19–33.
  262. Zivot, E. (2017). Introduction to computational finance and financial econometrics. Chapman & Hall/CRC.
  263. Zou, S., Xu, T., & Liang, Y. (2019). Finite‐sample analysis for SARSA with linear function approximation. In Advances in neural information processing systems (Vol. 32, pp. 8668–8678).

Cocites

Documents in RePEc which have cited the same bibliography

  1. Optimal risk-aware interest rates for decentralized lending protocols. (2025). Toke, Ioane Muni ; Challet, Damien ; Baude, Bastien.
    In: Papers.
    RePEc:arx:papers:2502.19862.

    Full description at Econpapers || Download paper

  2. Delegated portfolio management with random default. (2024). Mastrolia, Thibaut ; Gennaro, Alberto.
    In: Papers.
    RePEc:arx:papers:2410.13103.

    Full description at Econpapers || Download paper

  3. Optimal Limit Order Book Trading Strategies with Stochastic Volatility in the Underlying Asset. (2023). Aksoy, Ümit ; Uğur, Ömür ; Aydoğan, Burcu.
    In: Computational Economics.
    RePEc:kap:compec:v:62:y:2023:i:1:d:10.1007_s10614-022-10272-4.

    Full description at Econpapers || Download paper

  4. Reinforcement Learning in Economics and Finance. (2023). Elie, Romuald ; Remlinger, Carl ; Charpentier, Arthur.
    In: Computational Economics.
    RePEc:kap:compec:v:62:y:2023:i:1:d:10.1007_s10614-021-10119-4.

    Full description at Econpapers || Download paper

  5. Recent advances in reinforcement learning in finance. (2023). Xu, Renyuan ; Yang, Huining ; Hambly, Ben.
    In: Mathematical Finance.
    RePEc:bla:mathfi:v:33:y:2023:i:3:p:437-503.

    Full description at Econpapers || Download paper

  6. Optimal Execution Using Reinforcement Learning. (2023). He, Jiafa ; Zheng, Cong ; Yang, Can.
    In: Papers.
    RePEc:arx:papers:2306.17178.

    Full description at Econpapers || Download paper

  7. Stock Market Prediction via Deep Learning Techniques: A Survey. (2023). Jiao, Yang ; Zhao, Qingying ; Yan, Qingsen ; Liu, Lingqiao ; Zou, Jinan ; Shi, Javen Qinfeng ; Cao, Haiyao ; Abbasnejad, Ehsan.
    In: Papers.
    RePEc:arx:papers:2212.12717.

    Full description at Econpapers || Download paper

  8. Solvability of Differential Riccati Equations and Applications to Algorithmic Trading with Signals. (2023). Drissi, Fayçal.
    In: Papers.
    RePEc:arx:papers:2202.07478.

    Full description at Econpapers || Download paper

  9. Recent Advances in Reinforcement Learning in Finance. (2023). Xu, Renyuan ; Yang, Huining ; Hambly, Ben.
    In: Papers.
    RePEc:arx:papers:2112.04553.

    Full description at Econpapers || Download paper

  10. Deep reinforcement learning for the optimal placement of cryptocurrency limit orders. (2022). Schnaubelt, Matthias.
    In: European Journal of Operational Research.
    RePEc:eee:ejores:v:296:y:2022:i:3:p:993-1006.

    Full description at Econpapers || Download paper

  11. FinRL-Meta: Market Environments and Benchmarks for Data-Driven Financial Reinforcement Learning. (2022). Wang, Zhaoran ; Liu, Xiao-Yang ; Zhu, Ming ; Rui, Jingyang ; Gao, Jiechao ; Guo, Jian ; Yang, Hongyang ; Xia, Ziyi.
    In: Papers.
    RePEc:arx:papers:2211.03107.

    Full description at Econpapers || Download paper

  12. Deep Reinforcement Learning for Market Making Under a Hawkes Process-Based Limit Order Book Model. (2022). Kostanjčar, Zvonko ; Gašperov, Bruno.
    In: Papers.
    RePEc:arx:papers:2207.09951.

    Full description at Econpapers || Download paper

  13. A mean-field game of market-making against strategic traders. (2022). Bergault, Philippe ; Possamai, Dylan ; Baldacci, Bastien.
    In: Papers.
    RePEc:arx:papers:2203.13053.

    Full description at Econpapers || Download paper

  14. FinRL-Meta: A Universe of Near-Real Market Environments for Data-Driven Deep Reinforcement Learning in Quantitative Finance. (2022). Wang, Zhaoran ; Liu, Xiao-Yang ; Yang, Liuqing ; Rui, Jingyang ; Gao, Jiechao ; Guo, Jian.
    In: Papers.
    RePEc:arx:papers:2112.06753.

    Full description at Econpapers || Download paper

  15. Optimal incentives in a limit order book: a SPDE control approach. (2022). Bergault, Philippe ; Baldacci, Bastien.
    In: Papers.
    RePEc:arx:papers:2112.00375.

    Full description at Econpapers || Download paper

  16. Multi-asset optimal execution and statistical arbitrage strategies under Ornstein-Uhlenbeck dynamics. (2022). Bergault, Philippe ; Guéant, Olivier ; Drissi, Fayçal.
    In: Papers.
    RePEc:arx:papers:2103.13773.

    Full description at Econpapers || Download paper

  17. FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance. (2022). Liu, Xiao-Yang ; Zhang, Runjia ; Chen, Qian ; Yang, Liuqing ; Wang, Christina Dan ; Xiao, Bowen.
    In: Papers.
    RePEc:arx:papers:2011.09607.

    Full description at Econpapers || Download paper

  18. Reinforcement Learning Approaches to Optimal Market Making. (2021). Posedel Šimović, Petra ; Kostanjčar, Zvonko ; Begušić, Stjepan ; Gašperov, Bruno.
    In: Mathematics.
    RePEc:gam:jmathe:v:9:y:2021:i:21:p:2689-:d:662748.

    Full description at Econpapers || Download paper

  19. FinRL: Deep Reinforcement Learning Framework to Automate Trading in Quantitative Finance. (2021). Liu, Xiao-Yang ; Gao, Jiechao ; Yang, Hongyang ; Wang, Christina Dan.
    In: Papers.
    RePEc:arx:papers:2111.09395.

    Full description at Econpapers || Download paper

  20. Deep equal risk pricing of financial derivatives with non-translation invariant risk measures. (2021). Godin, Frédéric ; Carbonneau, Alexandre.
    In: Papers.
    RePEc:arx:papers:2107.11340.

    Full description at Econpapers || Download paper

  21. Policy Gradient Methods for the Noisy Linear Quadratic Regulator over a Finite Horizon. (2021). Xu, Renyuan ; Yang, Huining ; Hambly, Ben.
    In: Papers.
    RePEc:arx:papers:2011.10300.

    Full description at Econpapers || Download paper

  22. Deep reinforcement learning for the optimal placement of cryptocurrency limit orders. (2020). Schnaubelt, Matthias.
    In: FAU Discussion Papers in Economics.
    RePEc:zbw:iwqwdp:052020.

    Full description at Econpapers || Download paper

  23. Optimal trading without optimal control. (2020). Benveniste, Jerome ; Ritter, Gordon ; Baldacci, Bastien.
    In: Papers.
    RePEc:arx:papers:2012.12945.

    Full description at Econpapers || Download paper

  24. Bridging the gap between Markowitz planning and deep reinforcement learning. (2020). Benhamou, Eric ; Ungari, Sandrine ; Saltiel, David ; Mukhopadhyay, Abhishek.
    In: Papers.
    RePEc:arx:papers:2010.09108.

    Full description at Econpapers || Download paper

  25. AAMDRL: Augmented Asset Management with Deep Reinforcement Learning. (2020). Benhamou, Eric ; Ungari, Sandrine ; Atif, Jamal ; Saltiel, David ; Mukhopadhyay, Abhishek.
    In: Papers.
    RePEc:arx:papers:2010.08497.

    Full description at Econpapers || Download paper

  26. Adaptive trading strategies across liquidity pools. (2020). Manziuk, Iuliia ; Baldacci, Bastien.
    In: Papers.
    RePEc:arx:papers:2008.07807.

    Full description at Econpapers || Download paper

  27. Multi-Agent Reinforcement Learning in a Realistic Limit Order Book Market Simulation. (2020). Ma, Zhongyao ; Karpe, Michael ; Fang, Jin ; Wang, Chen.
    In: Papers.
    RePEc:arx:papers:2006.05574.

    Full description at Econpapers || Download paper

  28. Reinforcement Learning in Economics and Finance. (2020). Elie, Romuald ; Remlinger, Carl ; Charpentier, Arthur.
    In: Papers.
    RePEc:arx:papers:2003.10014.

    Full description at Econpapers || Download paper

  29. Optimistic Bull or Pessimistic Bear: Adaptive Deep Reinforcement Learning for Stock Portfolio Allocation. (2019). Zhan, Yuancheng ; Li, Yinchuan ; Liu, Xiao-Yang.
    In: Papers.
    RePEc:arx:papers:1907.01503.

    Full description at Econpapers || Download paper

Coauthors

Authors registered in RePEc who have written about the same topic

Report date: 2025-10-02 04:10:37

CitEc is a RePEc service, providing citation data for Economics since 2001. Last updated August 3, 2024. Contact: Jose Manuel Barrueco.