- Abbasi‐Yadkori, Y., Bartlett, P., Bhatia, K., Lazic, N., Szepesvári, C., & Weisz, G. (2019). Politex: Regret bounds for policy iteration using expert prediction. In International conference on machine learning (pp. 3692–3702). PMLR.
- Abernethy, J. D., & Kale, S. (2013). Adaptive market making via online learning. In NIPS (pp. 2058–2066). Citeseer.
- Aboussalah, A. M. (2020). What is the value of the cross‐sectional approach to deep reinforcement learning? Available at SSRN, 22(6), 1091–1111.
- Achiam, J., Held, D., Tamar, A., & Abbeel, P. (2017). Constrained policy optimization. In International conference on machine learning (pp. 22–31). PMLR.
- Agarwal, A., Bartlett, P., & Dama, M. (2010). Optimal allocation strategies for the dark pool problem. In Proceedings of the thirteenth international conference on artificial intelligence and statistics (pp. 9–16). JMLR Workshop and Conference Proceedings.
- Agarwal, A., Kakade, S. M., Lee, J. D., & Mahajan, G. (2021). On the theory of policy gradient methods: Optimality, approximation, and distribution shift. Journal of Machine Learning Research, 22(98), 1–76.
- Agarwal, A., Kakade, S., & Yang, L. F. (2020). Model‐based reinforcement learning with a generative model is minimax optimal. In Conference on learning theory (pp. 67–83). PMLR.
- Almgren, R., & Chriss, N. (2001). Optimal execution of portfolio transactions. Journal of Risk, 3, 5–40.
- Alsabah, H., Capponi, A., Ruiz Lacedelli, O., & Stern, M. (2021). Robo‐advising: Learning investors' risk preferences via portfolio choices. Journal of Financial Econometrics, 19(2), 369–392.
- Asadi, K., & Littman, M. L. (2017). An alternative softmax operator for reinforcement learning. In International conference on machine learning (pp. 243–252). PMLR.
- Avellaneda, M., & Stoikov, S. (2008). High‐frequency trading in a limit order book. Quantitative Finance, 8(3), 217–224.
- Azar, M. G., Munos, R., & Kappen, B. (2012). On the sample complexity of reinforcement learning with a generative model. arXiv preprint arXiv:1206.6461.
- Azar, M. G., Munos, R., & Kappen, H. J. (2013). Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model. Machine Learning, 91(3), 325–349.
- Azar, M. G., Osband, I., & Munos, R. (2017). Minimax regret bounds for reinforcement learning. In International conference on machine learning (pp. 263–272). PMLR.
- Baldacci, B., & Manziuk, I. (2020). Adaptive trading strategies across liquidity pools. arXiv preprint arXiv:2008.07807.
- Baldacci, B., Manziuk, I., Mastrolia, T., & Rosenbaum, M. (2019). Market making and incentives design in the presence of a dark pool: A deep reinforcement learning approach. arXiv preprint arXiv:1912.01129.
- Bao, W., & Liu, X.‐Y. (2019). Multi‐agent deep reinforcement learning for liquidation strategy analysis. arXiv preprint arXiv:1906.11046.
- Basak, S., & Chabakauri, G. (2010). Dynamic mean‐variance asset allocation. The Review of Financial Studies, 23(8), 2970–3016.
- Basei, M., Guo, X., Hu, A., & Zhang, Y. (2021). Logarithmic regret for episodic continuous‐time linear‐quadratic reinforcement learning over a finite‐time horizon. Available at SSRN 3848428, 23(178), 1–34.
- Beck, C. L., & Srikant, R. (2012). Error bounds for constant step‐size Q‐learning. Systems & Control Letters, 61(12), 1203–1208.
- Bellemare, M. G., Naddaf, Y., Veness, J., & Bowling, M. (2013). The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47, 253–279.
- Berry, D. A., & Fristedt, B. (1985). Bandit problems: Sequential allocation of experiments (Monographs on statistics and applied probability) (Vol. 5, pp. 7). Chapman and Hall.
- Bhandari, J., & Russo, D. (2019). Global optimality guarantees for policy gradient methods. arXiv preprint arXiv:1906.01786.
- Bhandari, J., Russo, D., & Singal, R. (2018). A finite time analysis of temporal difference learning with linear function approximation. In Conference on learning theory (pp. 1691–1692). PMLR.
- Bhatnagar, S. (2010). An actor–critic algorithm with function approximation for discounted cost constrained Markov decision processes. Systems & Control Letters, 59(12), 760–766.
- Björk, T., & Murgoci, A. (2010). A general theory of Markovian time inconsistent stochastic control problems. Available at SSRN 1694759.
- Black, F., & Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81(3), 637–654.
- Bradtke, S. J., & Barto, A. G. (1996). Linear least‐squares algorithms for temporal difference learning. Machine Learning, 22(1), 33–57.
- Brafman, R. I., & Tennenholtz, M. (2002). R‐max: A general polynomial time algorithm for near‐optimal reinforcement learning. Journal of Machine Learning Research, 3, 213–231.
- Broadie, M., & Detemple, J. B. (2004). Anniversary article: Option pricing: Valuation models and applications. Management Science, 50(9), 1145–1177.
- Buehler, H., Gonon, L., Teichmann, J., & Wood, B. (2019). Deep hedging. Quantitative Finance, 19(8), 1271–1291.
- Cai, Q., Yang, Z., Lee, J., & Wang, Z. (2019). Neural temporal‐difference learning converges to global optima. In Advances in neural information processing systems.
- Campbell, J. Y., Lo, A. W., & MacKinlay, A. C. (1997). The econometrics of financial markets. Princeton University Press.
- Cannelli, L., Nuti, G., Sala, M., & Szehr, O. (2020). Hedging using reinforcement learning: Contextual k‐armed bandit versus Q‐learning. arXiv preprint arXiv:2007.01623.
- Cao, J., Chen, J., Hull, J., & Poulos, Z. (2021). Deep hedging of derivatives using reinforcement learning. The Journal of Financial Data Science, 3(1), 10–27.
- Capponi, A., Olafsson, S., & Zariphopoulou, T. (2021). Personalized robo‐advising: Enhancing investment through client interaction. Management Science, 68(4), 2485–2512.
- Carbonneau, A., & Godin, F. (2021). Equal risk pricing of derivatives with deep hedging. Quantitative Finance, 21(4), 593–608.
- Cartea, Á., Jaimungal, S., & Penalva, J. (2015). Algorithmic and high‐frequency trading. Cambridge University Press.
- Cartea, Á., Jaimungal, S., & Sánchez‐Betancourt, L. (2021). Deep reinforcement learning for algorithmic trading. Available at SSRN.
- Cayci, S., Satpathi, S., He, N., & Srikant, R. (2021). Sample complexity and overparameterization bounds for projection‐free neural TD learning. arXiv preprint arXiv:2103.01391.
- Cen, S., Cheng, C., Chen, Y., Wei, Y., & Chi, Y. (2020). Fast global convergence of natural policy gradient methods with entropy regularization. arXiv preprint arXiv:2007.06558.
- Chakraborti, A., Toke, I. M., Patriarca, M., & Abergel, F. (2011). Econophysics review: I. Empirical facts. Quantitative Finance, 11(7), 991–1012.
- Chan, N. T., & Shelton, C. (2001). An electronic market‐maker (Technical Report). MIT.
- Charpentier, A., Elie, R., & Remlinger, C. (2021). Reinforcement learning in economics and finance. Computational Economics, 1–38.
- Chen, J., & Jiang, N. (2019). Information‐theoretic considerations in batch reinforcement learning. In International conference on machine learning (pp. 1042–1051). PMLR.
- Cheung, W. C., Simchi‐Levi, D., & Zhu, R. (2019). Learning to optimize under non‐stationarity. In Proceedings of the 22nd international conference on artificial intelligence and statistics (pp. 1079–1087). PMLR.
- Cheung, W. C., Simchi‐Levi, D., & Zhu, R. (2020). Reinforcement learning for non‐stationary Markov decision processes: The blessing of (more) optimism. In International conference on machine learning (pp. 1843–1854). PMLR.
- Chow, Y., Ghavamzadeh, M., Janson, L., & Pavone, M. (2017). Risk‐constrained reinforcement learning with percentile risk criteria. The Journal of Machine Learning Research, 18(1), 6070–6120.
- Chow, Y., Tamar, A., Mannor, S., & Pavone, M. (2015). Risk‐sensitive and robust decision‐making: A CVaR optimization approach. In NIPS'15 (pp. 1522–1530). MIT Press.
- Coache, A., & Jaimungal, S. (2021). Reinforcement learning with dynamic convex risk measures. arXiv preprint arXiv:2112.13414.
- Cong, L. W., Tang, K., Wang, J., & Zhang, Y. (2021). AlphaPortfolio: Direct construction through deep reinforcement learning and interpretable AI. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.
- Cont, R. (2001). Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance, 1(2), 223–236.
- Cont, R., & Kukanov, A. (2017). Optimal order placement in limit order markets. Quantitative Finance, 17(1), 21–39.
- Cox, J. C., Ross, S. A., & Rubinstein, M. (1979). Option pricing: A simplified approach. Journal of Financial Economics, 7(3), 229–263.
- Dabérius, K., Granat, E., & Karlsson, P. (2019). Deep execution‐value and policy based reinforcement learning for trading and beating market benchmarks. Available at SSRN 3374766.
- Dabney, W., Ostrovski, G., & Barreto, A. (2020). Temporally‐extended ε‐greedy exploration. arXiv preprint arXiv:2006.01782.
- Dai, B., Shaw, A., Li, L., Xiao, L., He, N., Liu, Z., Chen, J., & Song, L. (2018). SBEED: Convergent reinforcement learning with nonlinear function approximation. In International conference on machine learning (pp. 1125–1134). PMLR.
- Dalal, G., Szörényi, B., Thoppe, G., & Mannor, S. (2018). Finite sample analyses for TD(0) with function approximation. In 32nd AAAI conference on artificial intelligence.
- Dann, C., & Brunskill, E. (2015). Sample complexity of episodic fixed‐horizon reinforcement learning. In NIPS'15 (pp. 2818–2826). MIT Press.
- Dann, C., Lattimore, T., & Brunskill, E. (2017). Unifying PAC and regret: Uniform PAC bounds for episodic reinforcement learning. In Proceedings of the 31st international conference on neural information processing systems, NIPS'17 (pp. 5717–5727).
- Dann, C., Mansour, Y., Mohri, M., Sekhari, A., & Sridharan, K. (2022). Guarantees for epsilon‐greedy reinforcement learning with function approximation. In International conference on machine learning (pp. 4666–4689). PMLR.
- Deng, Y., Bao, F., Kong, Y., Ren, Z., & Dai, Q. (2017). Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems, 28(3), 653–664.
- Ding, D., Wei, X., Yang, Z., Wang, Z., & Jovanovic, M. (2021). Provably efficient safe exploration via primal‐dual policy optimization. In International conference on artificial intelligence and statistics (pp. 3304–3312). PMLR.
- Dixon, M. F., Halperin, I., & Bilokon, P. (2020). Machine learning in finance. Springer.
- Dixon, M., & Halperin, I. (2020). G‐learner and GIRL: Goal‐based wealth management with reinforcement learning. arXiv preprint arXiv:2002.10990.
- Du, J., Jin, M., Kolm, P. N., Ritter, G., Wang, Y., & Zhang, B. (2020). Deep reinforcement learning for option replication and hedging. The Journal of Financial Data Science, 2(4), 44–57.
- Du, X., Zhai, J., & Lv, K. (2016). Algorithm trading using Q‐learning and recurrent reinforcement learning. Positions, 1, 1.
- Dubrov, B. (2015). Monte Carlo simulation with machine learning for pricing American options and convertible bonds. Available at SSRN 2684523.
- Eriksson, H., & Dimitrakakis, C. (2019). Epistemic risk‐sensitive reinforcement learning. arXiv preprint arXiv:1906.06273.
- Even‐Dar, E., Mansour, Y., & Bartlett, P. (2003). Learning rates for Q‐learning. Journal of Machine Learning Research, 5(1), 1–25.
- Fan, J., Ma, C., & Zhong, Y. (2021). A selective overview of deep learning. Statistical Science, 36, 264–290.
- Fan, J., Wang, Z., Xie, Y., & Yang, Z. (2020). A theoretical analysis of deep Q‐learning. In Learning for dynamics and control (pp. 486–489). PMLR.
- Farahmand, A. M., Ghavamzadeh, M., Szepesvári, C., & Mannor, S. (2008). Regularized policy iteration. In Advances in neural information processing systems 21 ‐ Proceedings of the 2008 conference (pp. 441–448).
- Farahmand, A. M., Ghavamzadeh, M., Szepesvári, C., & Mannor, S. (2009). Regularized fitted Q‐iteration for planning in continuous‐space Markovian decision problems. In 2009 American control conference (pp. 725–730). IEEE.
- Fazel, M., Ge, R., Kakade, S. M., & Mesbahi, M. (2018). Global convergence of policy gradient methods for the linear quadratic regulator. In International conference on machine learning (pp. 1467–1476). PMLR.
- Fei, Y., Yang, Z., Chen, Y., Wang, Z., & Xie, Q. (2020). Risk‐sensitive reinforcement learning: Near‐optimal risk‐sample tradeoff in regret. In NeurIPS.
- Fermanian, J.‐D., Guéant, O., & Rachez, A. (2015). Agents' behavior on multi‐dealer‐to‐client bond trading platforms. CREST, Center for Research in Economics and Statistics.
- Figlewski, S. (1989). Options arbitrage in imperfect markets. The Journal of Finance, 44(5), 1289–1311.
- Fischer, T. G. (2018). Reinforcement learning in financial markets—a survey (Technical Report). FAU Discussion Papers in Economics.
- François‐Lavet, V., Henderson, P., Islam, R., Bellemare, M. G., & Pineau, J. (2018). An introduction to deep reinforcement learning. Foundations and Trends in Machine Learning, 11(3‐4), 219–354.
- François‐Lavet, V., Rabusseau, G., Pineau, J., Ernst, D., & Fonteneau, R. (2019). On overfitting and asymptotic bias in batch reinforcement learning with partial observability. Journal of Artificial Intelligence Research, 65, 1–30.
- Fu, Z., Yang, Z., & Wang, Z. (2021). Single‐timescale actor‐critic provably finds globally optimal policy. In International conference on learning representations.
- Gajane, P., Ortner, R., & Auer, P. (2018). A sliding‐window algorithm for Markov decision processes with arbitrarily changing rewards and transitions. arXiv preprint arXiv:1805.10066.
- Ganchev, K., Nevmyvaka, Y., Kearns, M., & Vaughan, J. W. (2010). Censored exploration and the dark pool problem. Communications of the ACM, 53(5), 99–107.
- Ganesh, S., Vadori, N., Xu, M., Zheng, H., Reddy, P., & Veloso, M. (2019). Reinforcement learning for market making in a multi‐agent dealer market. arXiv preprint arXiv:1911.05892.
- Gao, X., Xu, Z. Q., & Zhou, X. Y. (2020). State‐dependent temperature control for Langevin diffusions. arXiv preprint arXiv:2011.07456.
- Gao, Z., Han, Y., Ren, Z., & Zhou, Z. (2019). Batched multi‐armed bandits problem. In Advances in neural information processing systems (Vol. 32).
- Garcelon, E., Ghavamzadeh, M., Lazaric, A., & Pirotta, M. (2020). Conservative exploration in reinforcement learning. In International conference on artificial intelligence and statistics (pp. 1431–1441). PMLR.
- Gašperov, B., & Kostanjčar, Z. (2021). Market making with signals through deep reinforcement learning. IEEE Access, 9, 61611–61622.
- Geist, M., Scherrer, B., & Pietquin, O. (2019). A theory of regularized Markov decision processes. In International conference on machine learning (pp. 2160–2169). PMLR.
- Giurca, A., & Borovkova, S. (2021). Delta hedging of derivatives using deep reinforcement learning. Available at SSRN 3847272.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
- Goodfellow, I., Pouget‐Abadie, J., Mirza, M., Xu, B., Warde‐Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (Vol. 27).
- Goodfellow, I., Warde‐Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout networks. In International conference on machine learning (pp. 1319–1327). PMLR.
- Gopalan, A., & Mannor, S. (2015). Thompson sampling for learning parameterized Markov decision processes. In Conference on learning theory (pp. 861–898). PMLR.
- Gordon, G. J. (1996). Stable fitted reinforcement learning. In Advances in neural information processing systems (pp. 1052–1058).
- Grau‐Moya, J., Leibfried, F., & Vrancx, P. (2018). Soft Q‐learning with mutual‐information regularization. In International conference on learning representations.
- Grinold, R. C., & Kahn, R. N. (2000). Active portfolio management. McGraw‐Hill.
- Gu, S., Lillicrap, T., Sutskever, I., & Levine, S. (2016). Continuous deep Q‐learning with model‐based acceleration. In International conference on machine learning (pp. 2829–2838). PMLR.
- Guéant, O., & Manziuk, I. (2019). Deep reinforcement learning for market making in corporate bonds: beating the curse of dimensionality. Applied Mathematical Finance, 26(5), 387–452.
- Guéant, O., Lehalle, C.‐A., & Fernandez‐Tapia, J. (2012). Optimal portfolio liquidation with limit orders. SIAM Journal on Financial Mathematics, 3(1), 740–764.
- Guéant, O., Lehalle, C.‐A., & Fernandez‐Tapia, J. (2013). Dealing with the inventory risk: A solution to the market making problem. Mathematics and Financial Economics, 7(4), 477–507.
- Guilbaud, F., & Pham, H. (2013). Optimal high‐frequency trading with limit and market orders. Quantitative Finance, 13(1), 79–94.
- Guo, X., Hu, A., & Zhang, Y. (2021). Reinforcement learning for linear‐convex models with jumps via stability analysis of feedback controls. arXiv preprint arXiv:2104.09311.
- Haarnoja, T., Tang, H., Abbeel, P., & Levine, S. (2017). Reinforcement learning with deep energy‐based policies. In International conference on machine learning (pp. 1352–1361). PMLR.
- Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft actor‐critic: Off‐policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning (pp. 1861–1870). PMLR.
- Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., & Levine, S. (2018). Soft actor‐critic algorithms and applications. arXiv preprint arXiv:1812.05905.
- Hakansson, N. H. (1971). Multi‐period mean‐variance analysis: Toward a general theory of portfolio choice. The Journal of Finance, 26(4), 857–884.
- Halperin, I. (2019). The QLBS Q‐learner goes NuQlear: Fitted Q iteration, inverse RL, and option portfolios. Quantitative Finance, 19(9), 1543–1553.
- Halperin, I. (2020). QLBS: Q‐learner in the Black‐Scholes(‐Merton) worlds. The Journal of Derivatives, 28(1), 99–122.
- Hambly, B., Xu, R., & Yang, H. (2021). Policy gradient methods for the noisy linear quadratic regulator over a finite horizon. SIAM Journal on Control and Optimization, 59(5), 3359–3391.
- Hasselt, H. (2010). Double Q‐learning. In Advances in neural information processing systems (Vol. 23, pp. 2613–2621).
- Hendricks, D., & Wilcox, D. (2014). A reinforcement learning extension to the Almgren‐Chriss framework for optimal trade execution. In 2014 IEEE conference on computational intelligence for financial engineering & economics (CIFEr) (pp. 457–464). IEEE.
- Henrotte, P. (1993). Transaction costs and duplication strategies. Graduate School of Business, Stanford University.
- Heston, S. (1993). A closed‐form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies, 6, 327–343.
- Huang, N. E., Wu, M.‐L., Qu, W., Long, S. R., & Shen, S. S. (2003). Applications of Hilbert–Huang transform to non‐stationary financial time series analysis. Applied Stochastic Models in Business and Industry, 19(3), 245–268.
- Jaimungal, S., Pesenti, S. M., Wang, Y. S., & Tatsat, H. (2021). Robust risk‐aware reinforcement learning. Available at SSRN 3910498, 13(1), 213–226.
- Jeong, G., & Kim, H. Y. (2019). Improving financial trading decisions using deep Q‐learning: Predicting the number of shares, action strategies, and transfer learning. Expert Systems with Applications, 117, 125–138.
- Jia, Y., & Zhou, X. Y. (2021). Policy gradient and actor‐critic learning in continuous time and space: Theory and algorithms. arXiv preprint arXiv:2111.11232.
- Jia, Y., & Zhou, X. Y. (2022). Policy evaluation and temporal‐difference learning in continuous time and space: A martingale approach. Journal of Machine Learning Research, 23(154), 1–55.
- Jiang, J., Kelly, B. T., & Xiu, D. (2020). (Re‐)Imag(in)ing price trends [Research paper]. Chicago Booth.
- Jiang, Z., Xu, D., & Liang, J. (2017). A deep reinforcement learning framework for the financial portfolio management problem. arXiv preprint arXiv:1706.10059.
- Jin, C., Allen‐Zhu, Z., Bubeck, S., & Jordan, M. I. (2018). Is Q‐learning provably efficient? In Advances in neural information processing systems (Vol. 31).
- Jin, C., Yang, Z., Wang, Z., & Jordan, M. I. (2020). Provably efficient reinforcement learning with linear function approximation. In Conference on learning theory (pp. 2137–2143). PMLR.
- Kakade, S. M. (2001). A natural policy gradient. In Advances in neural information processing systems (Vol. 14).
- Karpe, M., Fang, J., Ma, Z., & Wang, C. (2020). Multi‐agent reinforcement learning in a realistic limit order book market simulation. In Proceedings of the first ACM international conference on AI in finance, ICAIF'20.
- Ke, T. T., Shen, Z.‐J. M., & Villas‐Boas, J. M. (2016). Search for information on multiple products. Management Science, 62(12), 3576–3603.
- Kearns, M., & Singh, S. (2002). Near‐optimal reinforcement learning in polynomial time. Machine Learning, 49(2), 209–232.
- Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Proceedings of the 3rd international conference on learning representations (ICLR).
- Klöppel, S., & Schweizer, M. (2007). Dynamic indifference valuation via convex risk measures. Mathematical Finance, 17(4), 599–627.
- Koenig, S., & Simmons, R. G. (1993). Complexity analysis of real‐time reinforcement learning. In AAAI (pp. 99–107).
- Kolm, P. N., & Ritter, G. (2019). Dynamic replication and hedging: A reinforcement learning approach. The Journal of Financial Data Science, 1(1), 159–171.
- Kolm, P. N., & Ritter, G. (2020). Modern perspectives on reinforcement learning in finance. The Journal of Machine Learning in Finance, 1, 28.
- Konda, V. (2002). Actor‐critic algorithms (PhD thesis). MIT.
- Konda, V. R., & Tsitsiklis, J. N. (2000). Actor‐critic algorithms. In Advances in neural information processing systems (pp. 1008–1014).
- Kühn, C., & Stroh, M. (2010). Optimal portfolios of a small investor in a limit order market: A shadow price approach. Mathematics and Financial Economics, 3(2), 45–72.
- Kumar, H., Koppel, A., & Ribeiro, A. (2019). On the sample complexity of actor‐critic method for reinforcement learning with function approximation. arXiv preprint arXiv:1910.08412.
- Lagoudakis, M. G., & Parr, R. (2003). Least‐squares policy iteration. The Journal of Machine Learning Research, 4, 1107–1149.
- Lakshminarayanan, C., & Szepesvári, C. (2018). Linear stochastic approximation: How far does constant step‐size and iterate averaging go? In International conference on artificial intelligence and statistics (pp. 1347–1355). PMLR.
- Lattimore, T., & Hutter, M. (2012). PAC bounds for discounted MDPs. In International conference on algorithmic learning theory (pp. 320–334). Springer.
- Lattimore, T., Szepesvári, C., & Weisz, G. (2020). Learning with good feature representations in bandits and in RL with a generative model. In International conference on machine learning (pp. 5662–5670). PMLR.
- LeCun, Y., & Bengio, Y. (1995). Convolutional networks for images, speech, and time series. In The handbook of brain theory and neural networks (Vol. 3361). MIT Press.
- Leland, H. E. (1985). Option pricing and replication with transactions costs. The Journal of Finance, 40(5), 1283–1301.
- Li, D., & Ng, W.‐L. (2000). Optimal dynamic portfolio selection: Multiperiod mean‐variance formulation. Mathematical Finance, 10(3), 387–406.
- Li, L. (2009). A unifying framework for computational reinforcement learning theory. Rutgers—The State University of New Jersey—New Brunswick.
- Li, Y., Szepesvári, C., & Schuurmans, D. (2009). Learning exercise policies for American options. In Artificial intelligence and statistics (pp. 352–359). PMLR.
- Liang, Z., Chen, H., Zhu, J., Jiang, K., & Li, Y. (2018). Adversarial deep reinforcement learning in portfolio management. arXiv preprint arXiv:1808.09940.
- Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2016). Continuous control with deep reinforcement learning. In 4th international conference on learning representations (ICLR).
- Lin, L.‐J. (1992). Self‐improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3‐4), 293–321.
- Lin, S., & Beling, P. A. (2020). An end‐to‐end optimal trade execution framework based on proximal policy optimization. In IJCAI (pp. 4548–4554).
- Liu, B., Cai, Q., Yang, Z., & Wang, Z. (2019). Neural trust region/proximal policy optimization attains globally optimal policy. In Advances in neural information processing systems (Vol. 32).
- Liu, X.‐Y., Xia, Z., Rui, J., Gao, J., Yang, H., Zhu, M., Wang, C. D., Wang, Z., & Guo, J. (2022). FinRL‐Meta: Market environments and benchmarks for data‐driven financial reinforcement learning. arXiv preprint arXiv:2211.03107.
- Liu, X.‐Y., Yang, H., Gao, J., & Wang, C. D. (2021). FinRL: Deep reinforcement learning framework to automate trading in quantitative finance. In Proceedings of the second ACM international conference on AI in finance (pp. 1–9).
- Liu, Y., Zhang, K., Basar, T., & Yin, W. (2020). An improved analysis of (variance‐reduced) policy gradient and natural policy gradient methods. In NeurIPS.
- Longstaff, F. A., & Schwartz, E. S. (2001). Valuing American options by simulation: A simple least‐squares approach. The Review of Financial Studies, 14(1), 113–147.
- Mao, W., Zhang, K., Zhu, R., Simchi‐Levi, D., & Başar, T. (2020). Model‐free non‐stationary RL: Near‐optimal regret and applications in multi‐agent RL and inventory control. arXiv preprint arXiv:2010.03161.
- Markowitz, H. M. (1952). Portfolio selection. Journal of Finance, 7(1), 77–91.
- Mei, J., Xiao, C., Szepesvári, C., & Schuurmans, D. (2020). On the global convergence rates of softmax policy gradient methods. In International conference on machine learning (pp. 6820–6829). PMLR.
- Melo, F. S., & Ribeiro, M. I. (2007). Q‐learning with linear function approximation. In International conference on computational learning theory (pp. 308–322). Springer.
- Meng, T. L., & Khushi, M. (2019). Reinforcement learning in financial markets. Data, 4(3), 110.
- Merton, R. C. (1973). Theory of rational option pricing. The Bell Journal of Economics and Management Science, 4, 141–183.
- Merton, R. C., & Samuelson, P. A. (1974). Fallacy of the log‐normal approximation to optimal portfolio decision‐making over many periods. Journal of Financial Economics, 1(1), 67–94.
- Mihatsch, O., & Neuneier, R. (2002). Risk‐sensitive reinforcement learning. Machine Learning, 49(2), 267–290.
- Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International conference on machine learning (pp. 1928–1937). PMLR.
- Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
- Moody, J., Wu, L., Liao, Y., & Saffell, M. (1998). Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting, 17(5‐6), 441–470.
- Mosavi, A., Faghan, Y., Ghamisi, P., Duan, P., Ardabili, S. F., Salwana, E., & Band, S. S. (2020). Comprehensive review of deep reinforcement learning methods and applications in economics. Mathematics, 8(10), 1640.
- Mossin, J. (1968). Optimal multiperiod portfolio policies. The Journal of Business, 41(2), 215–229.
- Nesterov, Y. E. (1983). A method for solving the convex programming problem with convergence rate O(1/k²). Dokl. Akad. Nauk SSSR, 269, 543–547.
- Nevmyvaka, Y., Feng, Y., & Kearns, M. (2006). Reinforcement learning for optimized trade execution. In Proceedings of the 23rd international conference on machine learning (pp. 673–680).
- Ning, B., Ling, F. H. T., & Jaimungal, S. (2018). Double deep Q‐learning for optimal execution. arXiv preprint arXiv:1812.06600.
- Obizhaeva, A. A., & Wang, J. (2013). Optimal trading strategy and supply/demand dynamics. Journal of Financial Markets, 16(1), 1–32.
- Osband, I., Van Roy, B., & Russo, D. (2013). (More) Efficient reinforcement learning via posterior sampling. In Proceedings of the 26th international conference on neural information processing systems, NIPS'13 (Vol. 2, pp. 3003–3011).
- Ouyang, Y., Gagrani, M., Nayyar, A., & Jain, R. (2017). Learning unknown Markov decision processes: A Thompson sampling approach. In Advances in neural information processing systems (Vol. 30).
- Pan, S. J., & Yang, Q. (2009). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.
- Papini, M., Binaghi, D., Canonaco, G., Pirotta, M., & Restelli, M. (2018). Stochastic variance‐reduced policy gradient. In International conference on machine learning (pp. 4026–4035). PMLR.
- Park, H., Sim, M. K., & Choi, D. G. (2020). An intelligent financial portfolio trading strategy using deep Q‐learning. Expert Systems with Applications, 158, 113573.
- Patel, Y. (2018). Optimizing market making using multi‐agent reinforcement learning. arXiv preprint arXiv:1812.10252.
- Pedersen, J. L., & Peskir, G. (2017). Optimal mean‐variance portfolio selection. Mathematics and Financial Economics, 11(2), 137–160.
- Pendharkar, P. C., & Cusatis, P. (2018). Trading financial indices with reinforcement learning agents. Expert Systems with Applications, 103, 1–13.
- Perold, A. F. (1988). The implementation shortfall: Paper versus reality. Journal of Portfolio Management, 14(3), 4–9.
- Pomatto, L., Strack, P., & Tamuz, O. (2018). The cost of information. arXiv preprint arXiv:1812.04211.
- Powell, W. B. (2021). Reinforcement learning and stochastic optimization. John Wiley & Sons.
- Preis, T. (2011). Price‐time priority and pro rata matching in an order book model of financial markets. In Econophysics of order‐driven markets (pp. 65–72). Springer.
- Puterman, M. L. (2014). Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons.
- Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9.
- Ren, Z., & Zhou, Z. (2020). Dynamic batch learning in high‐dimensional sparse linear contextual bandits. arXiv preprint arXiv:2008.11918.
- Riedmiller, M. (2005). Neural fitted Q iteration—first experiences with a data efficient neural reinforcement learning method. In European conference on machine learning (pp. 317–328). Springer.
- Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5), 527–535.
- Samuelson, P. A. (1975). Lifetime portfolio selection by dynamic stochastic programming. In Stochastic optimization models in finance (pp. 517–524). Academic Press.
- Sato, Y. (2019). Model‐free reinforcement learning for financial portfolios: A brief survey. arXiv preprint arXiv:1904.04973.
- Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. (2015). Trust region policy optimization. In International conference on machine learning (pp. 1889–1897). PMLR.
- Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- Sewak, M. (2019). Policy‐based reinforcement learning approaches. In Deep reinforcement learning (pp. 127–140). Springer.
- Shani, L., Efroni, Y., & Mannor, S. (2020). Adaptive trust region policy optimization: Global convergence and faster rates for regularized MDPs. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, pp. 5668–5675).
- Sharpe, W. F. (1966). Mutual fund performance. The Journal of Business, 39(1), 119–138.
- Shen, Y., Huang, R., Yan, C., & Obermayer, K. (2014). Risk‐averse reinforcement learning for algorithmic trading. In 2014 IEEE conference on computational intelligence for financial engineering & economics (CIFEr) (pp. 391–398). IEEE.
- Shen, Y., Tobia, M. J., Sommer, T., & Obermayer, K. (2014). Risk‐sensitive reinforcement learning. Neural Computation, 26(7), 1298–1328.
- Shen, Z., Ribeiro, A., Hassani, H., Qian, H., & Mi, C. (2019). Hessian aided policy gradient. In International conference on machine learning (pp. 5729–5738). PMLR.
- Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014). Deterministic policy gradient algorithms. In International conference on machine learning (pp. 387–395). PMLR.
- Simchi‐Levi, D., & Xu, Y. (2020). Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability. Available at SSRN 3562765, 47(3), 1904–1931.
- Sortino, F. A., & Price, L. N. (1994). Performance measurement in a downside risk framework. The Journal of Investing, 3(3), 59–64.
- Spooner, T., & Savani, R. (2020). Robust market making via adversarial reinforcement learning. In Proceedings of the 29th international joint conference on artificial intelligence, IJCAI‐20 (pp. 4590–4596).
- Spooner, T., Fearnley, J., Savani, R., & Koukorinis, A. (2018). Market making via reinforcement learning. In AAMAS'18 (pp. 434–442). International Foundation for Autonomous Agents and Multiagent Systems.
- Steinbach, M. (2001). Markowitz revisited: Mean‐variance models in financial portfolio analysis. SIAM Review, 43, 31–85.
- Strehl, A. L., & Littman, M. L. (2005). A theoretical analysis of model‐based interval estimation. In Proceedings of the 22nd international conference on machine learning ICML'05 (pp. 856–863). Association for Computing Machinery.
- Strehl, A. L., Li, L., & Littman, M. L. (2009). Reinforcement learning in finite MDPs: PAC analysis. Journal of Machine Learning Research, 10(11), 2413–2444.
- Strotz, R. H. (1955). Myopia and inconsistency in dynamic utility maximization. The Review of Economic Studies, 23(3), 165–180.
- Sutskever, I., Martens, J., & Hinton, G. E. (2011). Generating text with recurrent neural networks. In ICML.
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
- Sutton, R. S., McAllester, D. A., Singh, S. P., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems (pp. 1057–1063).
- Szita, I., & Szepesvári, C. (2010). Model‐based reinforcement learning with nearly tight exploration complexity bounds. In ICML (pp. 1031–1038).
- Tamar, A., Chow, Y., Ghavamzadeh, M., & Mannor, S. (2015). Policy gradient for coherent risk measures. In Advances in neural information processing systems (Vol. 28).
- Tang, W., Zhang, P. Y., & Zhou, X. Y. (2021). Exploratory HJB equations and their convergence. arXiv preprint arXiv:2109.10269.
- Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4), 285–294.
- Tieleman, T., & Hinton, G. (2012). Lecture 6.5‐rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4(2), 26–31.
- Torrey, L., & Shavlik, J. (2010). Transfer learning. In Handbook of research on machine learning applications and trends: Algorithms, methods, and techniques (pp. 242–264). IGI Global.
- Touati, A., & Vincent, P. (2020). Efficient learning in non‐stationary linear Markov decision processes. arXiv preprint arXiv:2010.12870.
- Vadori, N., Ganesh, S., Reddy, P., & Veloso, M. (2020). Risk‐sensitive reinforcement learning: A martingale approach to reward uncertainty. arXiv preprint arXiv:2006.12686.
- Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q‐learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 30).
- Vigna, E. (2016). On time consistency for mean‐variance portfolio selection. Collegio Carlo Alberto Notebook, 476.
- Von Luxburg, U., & Schölkopf, B. (2011). Statistical learning theory: Models, concepts, and results. In Handbook of the history of logic (Vol. 10, pp. 651–706). Elsevier.
- Wang, H. (2019). Large scale continuous‐time mean‐variance portfolio allocation via reinforcement learning. Available at SSRN 3428125.
- Wang, H., & Yu, S. (2021). Robo‐advising: Enhancing investment with inverse optimization and deep reinforcement learning. arXiv preprint arXiv:2105.09264.
- Wang, H., & Zhou, X. Y. (2020). Continuous‐time mean–variance portfolio selection: A reinforcement learning framework. Mathematical Finance, 30(4), 1273–1308.
- Wang, H., Zariphopoulou, T., & Zhou, X. (2020). Exploration versus exploitation in reinforcement learning: A stochastic control approach. Journal of Machine Learning Research, 21, 1–34.
- Wang, L., Cai, Q., Yang, Z., & Wang, Z. (2020). Neural policy gradient methods: Global optimality and rates of convergence. In International conference on learning representations.
- Wang, Y., Dong, K., Chen, X., & Wang, L. (2020). Q‐learning with UCB exploration is sample efficient for infinite‐horizon MDP. In International conference on learning representations.
- Wang, Z., Bapst, V., Heess, N., Mnih, V., Munos, R., Kavukcuoglu, K., & de Freitas, N. (2017). Sample efficient actor‐critic with experience replay. In International conference on learning representations (ICLR) (pp. 1–13).
- Wei, C.‐Y., & Luo, H. (2021). Non‐stationary reinforcement learning without prior knowledge: An optimal black‐box approach. In Conference on learning theory (pp. 4300–4354). PMLR.
- Wei, C.‐Y., Jahromi, M. J., Luo, H., Sharma, H., & Jain, R. (2020). Model‐free reinforcement learning in infinite‐horizon average‐reward Markov decision processes. In International conference on machine learning (pp. 10170–10180). PMLR.
- Wei, H., Wang, Y., Mangu, L., & Decker, K. (2019). Model‐based reinforcement learning for predictions and control for limit order books. arXiv preprint arXiv:1910.03743.
- Wiese, M., Knobloch, R., Korn, R., & Kretschmer, P. (2020). Quant GANs: Deep generation of financial time series. Quantitative Finance, 20(9), 1419–1440.
- Williams, R. J. (1992). Simple statistical gradient‐following algorithms for connectionist reinforcement learning. Machine Learning, 8(3), 229–256.
- Wu, Y., Shariff, R., Lattimore, T., & Szepesvári, C. (2016). Conservative bandits. In International conference on machine learning (pp. 1254–1262). PMLR.
- Xiao, H., Zhou, Z., Ren, T., Bai, Y., & Liu, W. (2020). Time‐consistent strategies for multi‐period mean‐variance portfolio optimization with the serially correlated returns. Communications in Statistics‐Theory and Methods, 49(12), 2831–2868.
- Xiong, H., Xu, T., Liang, Y., & Zhang, W. (2021). Non‐asymptotic convergence of Adam‐type reinforcement learning algorithms under Markovian sampling. Proceedings of the AAAI Conference on Artificial Intelligence, 35(12), 10460–10468.
- Xiong, H., Zhao, L., Liang, Y., & Zhang, W. (2020). Finite‐time analysis for double Q‐learning. In Advances in neural information processing systems (Vol. 33).
- Xiong, Z., Liu, X.‐Y., Zhong, S., Yang, H., & Walid, A. (2018). Practical deep reinforcement learning approach for stock trading. arXiv preprint arXiv:1811.07522.
- Xu, P., & Gu, Q. (2020). A finite‐time analysis of Q‐learning with neural network function approximation. In International conference on machine learning (pp. 10555–10565). PMLR.
- Xu, P., Gao, F., & Gu, Q. (2020a). An improved convergence analysis of stochastic variance‐reduced policy gradient. In Uncertainty in artificial intelligence (pp. 541–551). PMLR.
- Xu, P., Gao, F., & Gu, Q. (2020b). Sample efficient policy gradient methods with recursive variance reduction. In International conference on learning representations.
- Xu, T., Wang, Z., & Liang, Y. (2020a). Improving sample complexity bounds for (natural) actor‐critic algorithms. In Advances in neural information processing systems (Vol. 33, pp. 4358–4369).
- Xu, T., Wang, Z., & Liang, Y. (2020b). Non‐asymptotic convergence analysis of two time‐scale (natural) actor‐critic algorithms. arXiv preprint arXiv:2005.03557.
- Xu, T., Yang, Z., Wang, Z., & Liang, Y. (2021). Doubly robust off‐policy actor‐critic: Convergence and optimality. arXiv preprint arXiv:2102.11866.
- Yang, H., Liu, X.‐Y., & Wu, Q. (2018). A practical machine learning approach for dynamic stock recommendation. In 2018 17th IEEE international conference on trust, security and privacy in computing and communications/12th IEEE international conference on big data science and engineering (TrustCom/BigDataSE) (pp. 1693–1697). IEEE.
- Yang, L., & Wang, M. (2019). Sample‐optimal parametric Q‐learning using linearly additive features. In International conference on machine learning (pp. 6995–7004). PMLR.
- Yang, L., & Wang, M. (2020). Reinforcement learning in feature space: Matrix bandit, kernels, and regret bound. In International conference on machine learning (pp. 10746–10756). PMLR.
- Yang, R., Sun, X., & Narasimhan, K. (2019). A generalized algorithm for multi‐objective reinforcement learning and policy adaptation. In Advances in neural information processing systems (Vol. 32).
- Ye, Z., Deng, W., Zhou, S., Xu, Y., & Guan, J. (2020). Optimal trade execution based on deep deterministic policy gradient. In Database systems for advanced applications (pp. 638–654). Springer International Publishing.
- Yu, M., & Sun, S. (2020). Policy‐based reinforcement learning for time series anomaly detection. Engineering Applications of Artificial Intelligence, 95, 103919.
- Yu, P., Lee, J. S., Kulyatin, I., Shi, Z., & Dasgupta, S. (2019). Model‐based deep reinforcement learning for dynamic portfolio optimization. arXiv preprint arXiv:1901.08740.
- Yu, S., Wang, H., & Dong, C. (2020). Learning risk preferences from investment portfolios using inverse optimization. arXiv preprint arXiv:2010.01687.
- Zhang, G., & Chen, Y. (2020). Reinforcement learning for optimal market making with the presence of rebate. Available at SSRN 3646753.
- Zhang, J., Kim, J., O'Donoghue, B., & Boyd, S. (2021). Sample efficient reinforcement learning with REINFORCE. In Proceedings of the AAAI conference on artificial intelligence (Vol. 35, pp. 10887–10895).
- Zhang, K., Koppel, A., Zhu, H., & Basar, T. (2020). Global convergence of policy gradient methods to (almost) locally optimal policies. SIAM Journal on Control and Optimization, 58(6), 3586–3612.
- Zhang, Z., Zohren, S., & Roberts, S. (2020). Deep reinforcement learning for trading. The Journal of Financial Data Science, 2(2), 25–40.
- Zhao, M., & Linetsky, V. (2021). High frequency automated market making algorithms with adverse selection risk control via reinforcement learning. In Proceedings of the second ACM international conference on AI in finance (pp. 1–9).
- Zheng, L., & Ratliff, L. (2020). Constrained upper confidence reinforcement learning. In Learning for dynamics and control (pp. 620–629). PMLR.
- Zhou, D., Chen, J., & Gu, Q. (2020). Provable multi‐objective reinforcement learning with generative models. arXiv preprint arXiv:2011.10134.
- Zhou, X. Y., & Li, D. (2000). Continuous‐time mean‐variance portfolio selection: A stochastic LQ framework. Applied Mathematics and Optimization, 42(1), 19–33.
- Zivot, E. (2017). Introduction to computational finance and financial econometrics. Chapman & Hall CRC.
- Zou, S., Xu, T., & Liang, Y. (2019). Finite‐sample analysis for SARSA with linear function approximation. In Advances in neural information processing systems (Vol. 32, pp. 8668–8678).