- Abbasi‐Yadkori, Y., Bartlett, P., Bhatia, K., Lazic, N., Szepesvári, C., & Weisz, G. (2019). Politex: Regret bounds for policy iteration using expert prediction. In International conference on machine learning (pp. 3692–3702). PMLR.
- Abernethy, J. D., & Kale, S. (2013). Adaptive market making via online learning. In NIPS (pp. 2058–2066). Citeseer.
- Aboussalah, A. M. (2020). What is the value of the cross‐sectional approach to deep reinforcement learning? Available at SSRN, 22(6), 1091–1111.
- Achiam, J., Held, D., Tamar, A., & Abbeel, P. (2017). Constrained policy optimization. In International conference on machine learning (pp. 22–31). PMLR.
- Agarwal, A., Bartlett, P., & Dama, M. (2010). Optimal allocation strategies for the dark pool problem. In Proceedings of the thirteenth international conference on artificial intelligence and statistics (pp. 9–16). JMLR Workshop and Conference Proceedings.
- Agarwal, A., Kakade, S. M., Lee, J. D., & Mahajan, G. (2021). On the theory of policy gradient methods: Optimality, approximation, and distribution shift. Journal of Machine Learning Research, 22(98), 1–76.
- Agarwal, A., Kakade, S., & Yang, L. F. (2020). Model‐based reinforcement learning with a generative model is minimax optimal. In Conference on learning theory (pp. 67–83). PMLR.
- Almgren, R., & Chriss, N. (2001). Optimal execution of portfolio transactions. Journal of Risk, 3, 5–40.
- Alsabah, H., Capponi, A., Ruiz Lacedelli, O., & Stern, M. (2021). Robo‐advising: Learning investors' risk preferences via portfolio choices. Journal of Financial Econometrics, 19(2), 369–392.
- Asadi, K., & Littman, M. L. (2017). An alternative softmax operator for reinforcement learning. In International conference on machine learning (pp. 243–252). PMLR.
- Avellaneda, M., & Stoikov, S. (2008). High‐frequency trading in a limit order book. Quantitative Finance, 8(3), 217–224.
- Azar, M. G., Munos, R., & Kappen, B. (2012). On the sample complexity of reinforcement learning with a generative model. arXiv preprint arXiv:1206.6461.
- Azar, M. G., Munos, R., & Kappen, H. J. (2013). Minimax PAC bounds on the sample complexity of reinforcement learning with a generative model. Machine Learning, 91(3), 325–349.
- Azar, M. G., Osband, I., & Munos, R. (2017). Minimax regret bounds for reinforcement learning. In International conference on machine learning (pp. 263–272). PMLR.
- Baldacci, B., & Manziuk, I. (2020). Adaptive trading strategies across liquidity pools. arXiv preprint arXiv:2008.07807.
- Baldacci, B., Manziuk, I., Mastrolia, T., & Rosenbaum, M. (2019). Market making and incentives design in the presence of a dark pool: A deep reinforcement learning approach. arXiv preprint arXiv:1912.01129.
- Bao, W., & Liu, X.‐Y. (2019). Multi‐agent deep reinforcement learning for liquidation strategy analysis. arXiv preprint arXiv:1906.11046.
- Basak, S., & Chabakauri, G. (2010). Dynamic mean‐variance asset allocation. The Review of Financial Studies, 23(8), 2970–3016.
- Basei, M., Guo, X., Hu, A., & Zhang, Y. (2021). Logarithmic regret for episodic continuous‐time linear‐quadratic reinforcement learning over a finite‐time horizon. Available at SSRN 3848428, 23(178), 1–34.
- Beck, C. L., & Srikant, R. (2012). Error bounds for constant step‐size Q‐learning. Systems & Control Letters, 61(12), 1203–1208.
- Bellemare, M. G., Naddaf, Y., Veness, J., & Bowling, M. (2013). The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47, 253–279.
- Berry, D. A., & Fristedt, B. (1985). Bandit problems: Sequential allocation of experiments (Monographs on statistics and applied probability) (Vol. 5, pp. 7). Chapman and Hall.
- Bhandari, J., & Russo, D. (2019). Global optimality guarantees for policy gradient methods. arXiv preprint arXiv:1906.01786.
- Bhandari, J., Russo, D., & Singal, R. (2018). A finite time analysis of temporal difference learning with linear function approximation. In Conference on learning theory (pp. 1691–1692). PMLR.
- Bhatnagar, S. (2010). An actor–critic algorithm with function approximation for discounted cost constrained Markov decision processes. Systems & Control Letters, 59(12), 760–766.
- Björk, T., & Murgoci, A. (2010). A general theory of Markovian time inconsistent stochastic control problems. Available at SSRN 1694759.
- Black, F., & Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81(3), 637–654.
- Bradtke, S. J., & Barto, A. G. (1996). Linear least‐squares algorithms for temporal difference learning. Machine Learning, 22(1), 33–57.
- Brafman, R. I., & Tennenholtz, M. (2002). R‐max: A general polynomial time algorithm for near‐optimal reinforcement learning. Journal of Machine Learning Research, 3, 213–231.
- Broadie, M., & Detemple, J. B. (2004). Anniversary article: Option pricing: Valuation models and applications. Management Science, 50(9), 1145–1177.
- Buehler, H., Gonon, L., Teichmann, J., & Wood, B. (2019). Deep hedging. Quantitative Finance, 19(8), 1271–1291.
- Cai, Q., Yang, Z., Lee, J., & Wang, Z. (2019). Neural temporal‐difference learning converges to global optima. In Advances in neural information processing systems.
- Campbell, J. Y., Lo, A. W., & MacKinlay, A. C. (1997). The econometrics of financial markets. Princeton University Press.
- Cannelli, L., Nuti, G., Sala, M., & Szehr, O. (2020). Hedging using reinforcement learning: Contextual k‐armed bandit versus Q‐learning. arXiv preprint arXiv:2007.01623.
- Cao, J., Chen, J., Hull, J., & Poulos, Z. (2021). Deep hedging of derivatives using reinforcement learning. The Journal of Financial Data Science, 3(1), 10–27.
- Capponi, A., Olafsson, S., & Zariphopoulou, T. (2021). Personalized robo‐advising: Enhancing investment through client interaction. Management Science, 68(4), 2485–2512.
- Carbonneau, A., & Godin, F. (2021). Equal risk pricing of derivatives with deep hedging. Quantitative Finance, 21(4), 593–608.
- Cartea, Á., Jaimungal, S., & Penalva, J. (2015). Algorithmic and high‐frequency trading. Cambridge University Press.
- Cartea, Á., Jaimungal, S., & Sánchez‐Betancourt, L. (2021). Deep reinforcement learning for algorithmic trading. Available at SSRN.
- Cayci, S., Satpathi, S., He, N., & Srikant, R. (2021). Sample complexity and overparameterization bounds for projection‐free neural TD learning. arXiv preprint arXiv:2103.01391.
- Cen, S., Cheng, C., Chen, Y., Wei, Y., & Chi, Y. (2020). Fast global convergence of natural policy gradient methods with entropy regularization. arXiv preprint arXiv:2007.06558.
- Chakraborti, A., Toke, I. M., Patriarca, M., & Abergel, F. (2011). Econophysics review: I. Empirical facts. Quantitative Finance, 11(7), 991–1012.
- Chan, N. T., & Shelton, C. (2001). An electronic market‐maker (Technical Report). MIT.
- Charpentier, A., Elie, R., & Remlinger, C. (2021). Reinforcement learning in economics and finance. Computational Economics, 1–38.
- Chen, J., & Jiang, N. (2019). Information‐theoretic considerations in batch reinforcement learning. In International conference on machine learning (pp. 1042–1051). PMLR.
- Cheung, W. C., Simchi‐Levi, D., & Zhu, R. (2019). Learning to optimize under non‐stationarity. In Proceedings of the 22nd international conference on artificial intelligence and statistics (pp. 1079–1087). PMLR.
- Cheung, W. C., Simchi‐Levi, D., & Zhu, R. (2020). Reinforcement learning for non‐stationary Markov decision processes: The blessing of (more) optimism. In International conference on machine learning (pp. 1843–1854). PMLR.
- Chow, Y., Ghavamzadeh, M., Janson, L., & Pavone, M. (2017). Risk‐constrained reinforcement learning with percentile risk criteria. The Journal of Machine Learning Research, 18(1), 6070–6120.
- Chow, Y., Tamar, A., Mannor, S., & Pavone, M. (2015). Risk‐sensitive and robust decision‐making: A CVaR optimization approach. In NIPS'15 (pp. 1522–1530). MIT Press.
- Coache, A., & Jaimungal, S. (2021). Reinforcement learning with dynamic convex risk measures. arXiv preprint arXiv:2112.13414.
- Cong, L. W., Tang, K., Wang, J., & Zhang, Y. (2021). AlphaPortfolio: Direct construction through deep reinforcement learning and interpretable AI. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.
- Cont, R. (2001). Empirical properties of asset returns: stylized facts and statistical issues. Quantitative Finance, 1(2), 223–236.
- Cont, R., & Kukanov, A. (2017). Optimal order placement in limit order markets. Quantitative Finance, 17(1), 21–39.
- Cox, J. C., Ross, S. A., & Rubinstein, M. (1979). Option pricing: A simplified approach. Journal of Financial Economics, 7(3), 229–263.
- Dabérius, K., Granat, E., & Karlsson, P. (2019). Deep execution‐value and policy based reinforcement learning for trading and beating market benchmarks. Available at SSRN 3374766.
- Dabney, W., Ostrovski, G., & Barreto, A. (2020). Temporally‐extended ε‐greedy exploration. arXiv preprint arXiv:2006.01782.
- Dai, B., Shaw, A., Li, L., Xiao, L., He, N., Liu, Z., Chen, J., & Song, L. (2018). SBEED: Convergent reinforcement learning with nonlinear function approximation. In International conference on machine learning (pp. 1125–1134). PMLR.
- Dalal, G., Szörényi, B., Thoppe, G., & Mannor, S. (2018). Finite sample analyses for TD(0) with function approximation. In 32nd AAAI conference on artificial intelligence.
- Dann, C., & Brunskill, E. (2015). Sample complexity of episodic fixed‐horizon reinforcement learning. In NIPS'15 (pp. 2818–2826). MIT Press.
- Dann, C., Lattimore, T., & Brunskill, E. (2017). Unifying PAC and regret: Uniform PAC bounds for episodic reinforcement learning. In Proceedings of the 31st international conference on neural information processing systems, NIPS'17 (pp. 5717–5727).
- Dann, C., Mansour, Y., Mohri, M., Sekhari, A., & Sridharan, K. (2022). Guarantees for epsilon‐greedy reinforcement learning with function approximation. In International conference on machine learning (pp. 4666–4689). PMLR.
- Deng, Y., Bao, F., Kong, Y., Ren, Z., & Dai, Q. (2017). Deep direct reinforcement learning for financial signal representation and trading. IEEE Transactions on Neural Networks and Learning Systems, 28(3), 653–664.
- Ding, D., Wei, X., Yang, Z., Wang, Z., & Jovanovic, M. (2021). Provably efficient safe exploration via primal‐dual policy optimization. In International conference on artificial intelligence and statistics (pp. 3304–3312). PMLR.
- Dixon, M. F., Halperin, I., & Bilokon, P. (2020). Machine learning in finance. Springer.
- Dixon, M., & Halperin, I. (2020). G‐learner and GIRL: Goal‐based wealth management with reinforcement learning. arXiv preprint arXiv:2002.10990.
- Du, J., Jin, M., Kolm, P. N., Ritter, G., Wang, Y., & Zhang, B. (2020). Deep reinforcement learning for option replication and hedging. The Journal of Financial Data Science, 2(4), 44–57.
- Du, X., Zhai, J., & Lv, K. (2016). Algorithm trading using Q‐learning and recurrent reinforcement learning. Positions, 1, 1.
- Dubrov, B. (2015). Monte Carlo simulation with machine learning for pricing American options and convertible bonds. Available at SSRN 2684523.
- Eriksson, H., & Dimitrakakis, C. (2019). Epistemic risk‐sensitive reinforcement learning. arXiv preprint arXiv:1906.06273.
- Even‐Dar, E., Mansour, Y., & Bartlett, P. (2003). Learning rates for Q‐learning. Journal of Machine Learning Research, 5(1), 1–25.
- Fan, J., Ma, C., & Zhong, Y. (2021). A selective overview of deep learning. Statistical Science, 36, 264–290.
- Fan, J., Wang, Z., Xie, Y., & Yang, Z. (2020). A theoretical analysis of deep Q‐learning. In Learning for dynamics and control (pp. 486–489). PMLR.
- Farahmand, A. M., Ghavamzadeh, M., Szepesvári, C., & Mannor, S. (2008). Regularized policy iteration. In Advances in neural information processing systems 21 ‐ Proceedings of the 2008 conference (pp. 441–448).
- Farahmand, A. M., Ghavamzadeh, M., Szepesvári, C., & Mannor, S. (2009). Regularized fitted Q‐iteration for planning in continuous‐space Markovian decision problems. In 2009 American control conference (pp. 725–730). IEEE.
- Fazel, M., Ge, R., Kakade, S. M., & Mesbahi, M. (2018). Global convergence of policy gradient methods for the linear quadratic regulator. In International conference on machine learning (pp. 1467–1476). PMLR.
- Fei, Y., Yang, Z., Chen, Y., Wang, Z., & Xie, Q. (2020). Risk‐sensitive reinforcement learning: Near‐optimal risk‐sample tradeoff in regret. In NeurIPS.
- Fermanian, J.‐D., Guéant, O., & Rachez, A. (2015). Agents' behavior on multi‐dealer‐to‐client bond trading platforms. CREST, Center for Research in Economics and Statistics.
- Figlewski, S. (1989). Options arbitrage in imperfect markets. The Journal of Finance, 44(5), 1289–1311.
- Fischer, T. G. (2018). Reinforcement learning in financial markets—a survey (Technical Report). FAU Discussion Papers in Economics.
- François‐Lavet, V., Henderson, P., Islam, R., Bellemare, M. G., & Pineau, J. (2018). An introduction to deep reinforcement learning. Foundations and Trends in Machine Learning, 11(3‐4), 219–354.
- François‐Lavet, V., Rabusseau, G., Pineau, J., Ernst, D., & Fonteneau, R. (2019). On overfitting and asymptotic bias in batch reinforcement learning with partial observability. Journal of Artificial Intelligence Research, 65, 1–30.
- Fu, Z., Yang, Z., & Wang, Z. (2021). Single‐timescale actor‐critic provably finds globally optimal policy. In International conference on learning representations.
- Gajane, P., Ortner, R., & Auer, P. (2018). A sliding‐window algorithm for Markov decision processes with arbitrarily changing rewards and transitions. arXiv preprint arXiv:1805.10066.
- Ganchev, K., Nevmyvaka, Y., Kearns, M., & Vaughan, J. W. (2010). Censored exploration and the dark pool problem. Communications of the ACM, 53(5), 99–107.
- Ganesh, S., Vadori, N., Xu, M., Zheng, H., Reddy, P., & Veloso, M. (2019). Reinforcement learning for market making in a multi‐agent dealer market. arXiv preprint arXiv:1911.05892.
- Gao, X., Xu, Z. Q., & Zhou, X. Y. (2020). State‐dependent temperature control for Langevin diffusions. arXiv preprint arXiv:2011.07456.
- Gao, Z., Han, Y., Ren, Z., & Zhou, Z. (2019). Batched multi‐armed bandits problem. In Advances in neural information processing systems (Vol. 32).
- Garcelon, E., Ghavamzadeh, M., Lazaric, A., & Pirotta, M. (2020). Conservative exploration in reinforcement learning. In International conference on artificial intelligence and statistics (pp. 1431–1441). PMLR.
- Gašperov, B., & Kostanjčar, Z. (2021). Market making with signals through deep reinforcement learning. IEEE Access, 9, 61611–61622.
- Geist, M., Scherrer, B., & Pietquin, O. (2019). A theory of regularized Markov decision processes. In International conference on machine learning (pp. 2160–2169). PMLR.
- Giurca, A., & Borovkova, S. (2021). Delta hedging of derivatives using deep reinforcement learning. Available at SSRN 3847272.
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
- Goodfellow, I., Pouget‐Abadie, J., Mirza, M., Xu, B., Warde‐Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (Vol. 27).
- Goodfellow, I., Warde‐Farley, D., Mirza, M., Courville, A., & Bengio, Y. (2013). Maxout networks. In International conference on machine learning (pp. 1319–1327). PMLR.
- Gopalan, A., & Mannor, S. (2015). Thompson sampling for learning parameterized Markov decision processes. In Conference on learning theory (pp. 861–898). PMLR.
- Gordon, G. J. (1996). Stable fitted reinforcement learning. In Advances in neural information processing systems (pp. 1052–1058).
- Grau‐Moya, J., Leibfried, F., & Vrancx, P. (2018). Soft Q‐learning with mutual‐information regularization. In International conference on learning representations.
- Grinold, R. C., & Kahn, R. N. (2000). Active portfolio management. McGraw‐Hill.
- Gu, S., Lillicrap, T., Sutskever, I., & Levine, S. (2016). Continuous deep Q‐learning with model‐based acceleration. In International conference on machine learning (pp. 2829–2838). PMLR.
- Guéant, O., & Manziuk, I. (2019). Deep reinforcement learning for market making in corporate bonds: beating the curse of dimensionality. Applied Mathematical Finance, 26(5), 387–452.
- Guéant, O., Lehalle, C.‐A., & Fernandez‐Tapia, J. (2012). Optimal portfolio liquidation with limit orders. SIAM Journal on Financial Mathematics, 3(1), 740–764.
- Guéant, O., Lehalle, C.‐A., & Fernandez‐Tapia, J. (2013). Dealing with the inventory risk: A solution to the market making problem. Mathematics and Financial Economics, 7(4), 477–507.
- Guilbaud, F., & Pham, H. (2013). Optimal high‐frequency trading with limit and market orders. Quantitative Finance, 13(1), 79–94.
- Guo, X., Hu, A., & Zhang, Y. (2021). Reinforcement learning for linear‐convex models with jumps via stability analysis of feedback controls. arXiv preprint arXiv:2104.09311.
- Haarnoja, T., Tang, H., Abbeel, P., & Levine, S. (2017). Reinforcement learning with deep energy‐based policies. In International conference on machine learning (pp. 1352–1361). PMLR.
- Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018). Soft actor‐critic: Off‐policy maximum entropy deep reinforcement learning with a stochastic actor. In International conference on machine learning (pp. 1861–1870). PMLR.
- Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., & Levine, S. (2018). Soft actor‐critic algorithms and applications. arXiv preprint arXiv:1812.05905.
- Hakansson, N. H. (1971). Multi‐period mean‐variance analysis: Toward a general theory of portfolio choice. The Journal of Finance, 26(4), 857–884.
- Halperin, I. (2019). The QLBS Q‐learner goes NuQlear: Fitted Q iteration, inverse RL, and option portfolios. Quantitative Finance, 19(9), 1543–1553.
- Halperin, I. (2020). QLBS: Q‐learner in the Black‐Scholes(‐Merton) worlds. The Journal of Derivatives, 28(1), 99–122.
- Hambly, B., Xu, R., & Yang, H. (2021). Policy gradient methods for the noisy linear quadratic regulator over a finite horizon. SIAM Journal on Control and Optimization, 59(5), 3359–3391.
- Hasselt, H. (2010). Double Q‐learning. In Advances in neural information processing systems (Vol. 23, pp. 2613–2621).
- Hendricks, D., & Wilcox, D. (2014). A reinforcement learning extension to the Almgren‐Chriss framework for optimal trade execution. In 2014 IEEE conference on computational intelligence for financial engineering & economics (CIFEr) (pp. 457–464). IEEE.
- Henrotte, P. (1993). Transaction costs and duplication strategies. Graduate School of Business, Stanford University.
- Heston, S. (1993). A closed‐form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies, 6, 327–343.
- Huang, N. E., Wu, M.‐L., Qu, W., Long, S. R., & Shen, S. S. (2003). Applications of Hilbert–Huang transform to non‐stationary financial time series analysis. Applied Stochastic Models in Business and Industry, 19(3), 245–268.
- Jaimungal, S., Pesenti, S. M., Wang, Y. S., & Tatsat, H. (2021). Robust risk‐aware reinforcement learning. Available at SSRN 3910498, 13(1), 213–226.
- Jeong, G., & Kim, H. Y. (2019). Improving financial trading decisions using deep Q‐learning: Predicting the number of shares, action strategies, and transfer learning. Expert Systems with Applications, 117, 125–138.
- Jia, Y., & Zhou, X. Y. (2021). Policy gradient and actor‐critic learning in continuous time and space: Theory and algorithms. arXiv preprint arXiv:2111.11232.
- Jia, Y., & Zhou, X. Y. (2022). Policy evaluation and temporal‐difference learning in continuous time and space: A martingale approach. Journal of Machine Learning Research, 23(154), 1–55.
- Jiang, J., Kelly, B. T., & Xiu, D. (2020). (Re‐)Imag(in)ing price trends [Research paper]. Chicago Booth.
- Jiang, Z., Xu, D., & Liang, J. (2017). A deep reinforcement learning framework for the financial portfolio management problem. arXiv preprint arXiv:1706.10059.
- Jin, C., Allen‐Zhu, Z., Bubeck, S., & Jordan, M. I. (2018). Is Q‐learning provably efficient? In Advances in neural information processing systems (Vol. 31).
- Jin, C., Yang, Z., Wang, Z., & Jordan, M. I. (2020). Provably efficient reinforcement learning with linear function approximation. In Conference on learning theory (pp. 2137–2143). PMLR.
- Kakade, S. M. (2001). A natural policy gradient. In Advances in neural information processing systems (Vol. 14).
- Karpe, M., Fang, J., Ma, Z., & Wang, C. (2020). Multi‐agent reinforcement learning in a realistic limit order book market simulation. In Proceedings of the first ACM international conference on AI in finance, ICAIF'20.
- Ke, T. T., Shen, Z.‐J. M., & Villas‐Boas, J. M. (2016). Search for information on multiple products. Management Science, 62(12), 3576–3603.
- Kearns, M., & Singh, S. (2002). Near‐optimal reinforcement learning in polynomial time. Machine Learning, 49(2), 209–232.
- Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In Proceedings of the 3rd international conference on learning representations (ICLR).
- Klöppel, S., & Schweizer, M. (2007). Dynamic indifference valuation via convex risk measures. Mathematical Finance, 17(4), 599–627.
- Koenig, S., & Simmons, R. G. (1993). Complexity analysis of real‐time reinforcement learning. In AAAI (pp. 99–107).
- Kolm, P. N., & Ritter, G. (2019). Dynamic replication and hedging: A reinforcement learning approach. The Journal of Financial Data Science, 1(1), 159–171.
- Kolm, P. N., & Ritter, G. (2020). Modern perspectives on reinforcement learning in finance. The Journal of Machine Learning in Finance, 1, 28.
- Konda, V. (2002). Actor‐critic algorithms (PhD thesis). MIT.
- Konda, V. R., & Tsitsiklis, J. N. (2000). Actor‐critic algorithms. In Advances in neural information processing systems (pp. 1008–1014).
- Kühn, C., & Stroh, M. (2010). Optimal portfolios of a small investor in a limit order market: A shadow price approach. Mathematics and Financial Economics, 3(2), 45–72.
- Kumar, H., Koppel, A., & Ribeiro, A. (2019). On the sample complexity of actor‐critic method for reinforcement learning with function approximation. arXiv preprint arXiv:1910.08412.
- Lagoudakis, M. G., & Parr, R. (2003). Least‐squares policy iteration. The Journal of Machine Learning Research, 4, 1107–1149.
- Lakshminarayanan, C., & Szepesvári, C. (2018). Linear stochastic approximation: How far does constant step‐size and iterate averaging go? In International conference on artificial intelligence and statistics (pp. 1347–1355). PMLR.
- Lattimore, T., & Hutter, M. (2012). PAC bounds for discounted MDPs. In International conference on algorithmic learning theory (pp. 320–334). Springer.
- Lattimore, T., Szepesvári, C., & Weisz, G. (2020). Learning with good feature representations in bandits and in RL with a generative model. In International conference on machine learning (pp. 5662–5670). PMLR.
- LeCun, Y., & Bengio, Y. (1995). Convolutional networks for images, speech, and time series. In The handbook of brain theory and neural networks (Vol. 3361). MIT Press.
- Leland, H. E. (1985). Option pricing and replication with transactions costs. The Journal of Finance, 40(5), 1283–1301.
- Li, D., & Ng, W.‐L. (2000). Optimal dynamic portfolio selection: Multiperiod mean‐variance formulation. Mathematical Finance, 10(3), 387–406.
- Li, L. (2009). A unifying framework for computational reinforcement learning theory. Rutgers—The State University of New Jersey—New Brunswick.
- Li, Y., Szepesvári, C., & Schuurmans, D. (2009). Learning exercise policies for American options. In Artificial intelligence and statistics (pp. 352–359). PMLR.
- Liang, Z., Chen, H., Zhu, J., Jiang, K., & Li, Y. (2018). Adversarial deep reinforcement learning in portfolio management. arXiv preprint arXiv:1808.09940.
- Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2016). Continuous control with deep reinforcement learning. In 4th international conference on learning representations (ICLR).
- Lin, L.‐J. (1992). Self‐improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3‐4), 293–321.
- Lin, S., & Beling, P. A. (2020). An end‐to‐end optimal trade execution framework based on proximal policy optimization. In IJCAI (pp. 4548–4554).
- Liu, B., Cai, Q., Yang, Z., & Wang, Z. (2019). Neural trust region/proximal policy optimization attains globally optimal policy. In Advances in neural information processing systems (Vol. 32).
- Liu, X.‐Y., Xia, Z., Rui, J., Gao, J., Yang, H., Zhu, M., Wang, C. D., Wang, Z., & Guo, J. (2022). FinRL‐Meta: Market environments and benchmarks for data‐driven financial reinforcement learning. arXiv preprint arXiv:2211.03107.
- Liu, X.‐Y., Yang, H., Gao, J., & Wang, C. D. (2021). FinRL: Deep reinforcement learning framework to automate trading in quantitative finance. In Proceedings of the second ACM international conference on AI in finance (pp. 1–9).
- Liu, Y., Zhang, K., Basar, T., & Yin, W. (2020). An improved analysis of (variance‐reduced) policy gradient and natural policy gradient methods. In NeurIPS.
- Longstaff, F. A., & Schwartz, E. S. (2001). Valuing American options by simulation: A simple least‐squares approach. The Review of Financial Studies, 14(1), 113–147.
- Mao, W., Zhang, K., Zhu, R., Simchi‐Levi, D., & Başar, T. (2020). Model‐free non‐stationary RL: Near‐optimal regret and applications in multi‐agent RL and inventory control. arXiv preprint arXiv:2010.03161.
- Markowitz, H. M. (1952). Portfolio selection. Journal of Finance, 7(1), 77–91.
- Mei, J., Xiao, C., Szepesvári, C., & Schuurmans, D. (2020). On the global convergence rates of softmax policy gradient methods. In International conference on machine learning (pp. 6820–6829). PMLR.
- Melo, F. S., & Ribeiro, M. I. (2007). Q‐learning with linear function approximation. In International conference on computational learning theory (pp. 308–322). Springer.
- Meng, T. L., & Khushi, M. (2019). Reinforcement learning in financial markets. Data, 4(3), 110.
- Merton, R. C. (1973). Theory of rational option pricing. The Bell Journal of Economics and Management Science, 4, 141–183.
- Merton, R. C., & Samuelson, P. A. (1974). Fallacy of the log‐normal approximation to optimal portfolio decision‐making over many periods. Journal of Financial Economics, 1(1), 67–94.
- Mihatsch, O., & Neuneier, R. (2002). Risk‐sensitive reinforcement learning. Machine Learning, 49(2), 267–290.
- Mnih, V., Badia, A. P., Mirza, M., Graves, A., Lillicrap, T., Harley, T., Silver, D., & Kavukcuoglu, K. (2016). Asynchronous methods for deep reinforcement learning. In International conference on machine learning (pp. 1928–1937). PMLR.
- Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
- Moody, J., Wu, L., Liao, Y., & Saffell, M. (1998). Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting, 17(5‐6), 441–470.
- Mosavi, A., Faghan, Y., Ghamisi, P., Duan, P., Ardabili, S. F., Salwana, E., & Band, S. S. (2020). Comprehensive review of deep reinforcement learning methods and applications in economics. Mathematics, 8(10), 1640.
- Mossin, J. (1968). Optimal multiperiod portfolio policies. The Journal of Business, 41(2), 215–229.
- Nesterov, Y. E. (1983). A method for solving the convex programming problem with convergence rate O(1/k²). Dokl. Akad. Nauk SSSR, 269, 543–547.
- Nevmyvaka, Y., Feng, Y., & Kearns, M. (2006). Reinforcement learning for optimized trade execution. In Proceedings of the 23rd international conference on machine learning (pp. 673–680).
- Ning, B., Ling, F. H. T., & Jaimungal, S. (2018). Double deep Q‐learning for optimal execution. arXiv preprint arXiv:1812.06600.
- Obizhaeva, A. A., & Wang, J. (2013). Optimal trading strategy and supply/demand dynamics. Journal of Financial Markets, 16(1), 1–32.
- Osband, I., Van Roy, B., & Russo, D. (2013). (More) Efficient reinforcement learning via posterior sampling. In Proceedings of the 26th international conference on neural information processing systems, NIPS'13 (Vol. 2, pp. 3003–3011).
- Ouyang, Y., Gagrani, M., Nayyar, A., & Jain, R. (2017). Learning unknown Markov decision processes: A Thompson sampling approach. In Advances in neural information processing systems (Vol. 30).
- Pan, S. J., & Yang, Q. (2009). A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10), 1345–1359.
- Papini, M., Binaghi, D., Canonaco, G., Pirotta, M., & Restelli, M. (2018). Stochastic variance‐reduced policy gradient. In International conference on machine learning (pp. 4026–4035). PMLR.
- Park, H., Sim, M. K., & Choi, D. G. (2020). An intelligent financial portfolio trading strategy using deep Q‐learning. Expert Systems with Applications, 158, 113573.
- Patel, Y. (2018). Optimizing market making using multi‐agent reinforcement learning. arXiv preprint arXiv:1812.10252.
- Pedersen, J. L., & Peskir, G. (2017). Optimal mean‐variance portfolio selection. Mathematics and Financial Economics, 11(2), 137–160.
- Pendharkar, P. C., & Cusatis, P. (2018). Trading financial indices with reinforcement learning agents. Expert Systems with Applications, 103, 1–13.
- Perold, A. F. (1988). The implementation shortfall: Paper versus reality. Journal of Portfolio Management, 14(3), 4–9.
- Pomatto, L., Strack, P., & Tamuz, O. (2018). The cost of information. arXiv preprint arXiv:1812.04211.
- Powell, W. B. (2021). Reinforcement learning and stochastic optimization. John Wiley & Sons.
- Preis, T. (2011). Price‐time priority and pro rata matching in an order book model of financial markets. In Econophysics of order‐driven markets (pp. 65–72). Springer.
- Puterman, M. L. (2014). Markov decision processes: Discrete stochastic dynamic programming. John Wiley & Sons.
- Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9.
- Ren, Z., & Zhou, Z. (2020). Dynamic batch learning in high‐dimensional sparse linear contextual bandits. arXiv preprint arXiv:2008.11918.
- Riedmiller, M. (2005). Neural fitted Q iteration—first experiences with a data efficient neural reinforcement learning method. In European conference on machine learning (pp. 317–328). Springer.
- Robbins, H. (1952). Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5), 527–535.
- Samuelson, P. A. (1975). Lifetime portfolio selection by dynamic stochastic programming. In Stochastic optimization models in finance (pp. 517–524). Academic Press.
- Sato, Y. (2019). Model‐free reinforcement learning for financial portfolios: A brief survey. arXiv preprint arXiv:1904.04973.
- Schulman, J., Levine, S., Abbeel, P., Jordan, M., & Moritz, P. (2015). Trust region policy optimization. In International conference on machine learning (pp. 1889–1897). PMLR.
- Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
- Sewak, M. (2019). Policy‐based reinforcement learning approaches. In Deep reinforcement learning (pp. 127–140). Springer.
- Shani, L., Efroni, Y., & Mannor, S. (2020). Adaptive trust region policy optimization: Global convergence and faster rates for regularized MDPs. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, pp. 5668–5675).
- Sharpe, W. F. (1966). Mutual fund performance. The Journal of Business, 39(1), 119–138.
- Shen, Y., Huang, R., Yan, C., & Obermayer, K. (2014). Risk‐averse reinforcement learning for algorithmic trading. In 2014 IEEE conference on computational intelligence for financial engineering & economics (CIFEr) (pp. 391–398). IEEE.
- Shen, Y., Tobia, M. J., Sommer, T., & Obermayer, K. (2014). Risk‐sensitive reinforcement learning. Neural Computation, 26(7), 1298–1328.
- Shen, Z., Ribeiro, A., Hassani, H., Qian, H., & Mi, C. (2019). Hessian aided policy gradient. In International conference on machine learning (pp. 5729–5738). PMLR.
- Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014). Deterministic policy gradient algorithms. In International conference on machine learning (pp. 387–395). PMLR.
- Simchi‐Levi, D., & Xu, Y. (2020). Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability. Available at SSRN 3562765, 47(3), 1904–1931.
- Sortino, F. A., & Price, L. N. (1994). Performance measurement in a downside risk framework. The Journal of Investing, 3(3), 59–64.
- Spooner, T., & Savani, R. (2020). Robust market making via adversarial reinforcement learning. In Proceedings of the 29th international joint conference on artificial intelligence, IJCAI‐20 (pp. 4590–4596).
- Spooner, T., Fearnley, J., Savani, R., & Koukorinis, A. (2018). Market making via reinforcement learning. In AAMAS'18 (pp. 434–442). International Foundation for Autonomous Agents and Multiagent Systems.
- Steinbach, M. (2001). Markowitz revisited: Mean‐variance models in financial portfolio analysis. SIAM Review, 43, 31–85.
- Strehl, A. L., & Littman, M. L. (2005). A theoretical analysis of model‐based interval estimation. In Proceedings of the 22nd international conference on machine learning ICML'05 (pp. 856–863). Association for Computing Machinery.
- Strehl, A. L., Li, L., & Littman, M. L. (2009). Reinforcement learning in finite MDPs: PAC analysis. Journal of Machine Learning Research, 10(11), 2413–2444.
- Strotz, R. H. (1955). Myopia and inconsistency in dynamic utility maximization. The Review of Economic Studies, 23(3), 165–180.
- Sutskever, I., Martens, J., & Hinton, G. E. (2011). Generating text with recurrent neural networks. In ICML.
- Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
- Sutton, R. S., McAllester, D. A., Singh, S. P., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems (pp. 1057–1063).
- Szita, I., & Szepesvári, C. (2010). Model‐based reinforcement learning with nearly tight exploration complexity bounds. In ICML (pp. 1031–1038).
- Tamar, A., Chow, Y., Ghavamzadeh, M., & Mannor, S. (2015). Policy gradient for coherent risk measures. In Advances in neural information processing systems (Vol. 28).
- Tang, W., Zhang, P. Y., & Zhou, X. Y. (2021). Exploratory HJB equations and their convergence. arXiv preprint arXiv:2109.10269.
- Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4), 285–294.
- Tieleman, T., & Hinton, G. (2012). Lecture 6.5‐rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4(2), 26–31.
- Torrey, L., & Shavlik, J. (2010). Transfer learning. In Handbook of research on machine learning applications and trends: Algorithms, methods, and techniques (pp. 242–264). IGI Global.
- Touati, A., & Vincent, P. (2020). Efficient learning in non‐stationary linear Markov decision processes. arXiv preprint arXiv:2010.12870.
- Vadori, N., Ganesh, S., Reddy, P., & Veloso, M. (2020). Risk‐sensitive reinforcement learning: A martingale approach to reward uncertainty. arXiv preprint arXiv:2006.12686.
- Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q‐learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 30).
- Vigna, E. (2016). On time consistency for mean‐variance portfolio selection. Collegio Carlo Alberto Notebook, 476.
- Von Luxburg, U., & Schölkopf, B. (2011). Statistical learning theory: Models, concepts, and results. In Handbook of the history of logic (Vol. 10, pp. 651–706). Elsevier.
- Wang, H. (2019). Large scale continuous‐time mean‐variance portfolio allocation via reinforcement learning. Available at SSRN 3428125.
- Wang, H., & Yu, S. (2021). Robo‐advising: Enhancing investment with inverse optimization and deep reinforcement learning. arXiv preprint arXiv:2105.09264.
- Wang, H., & Zhou, X. Y. (2020). Continuous‐time mean–variance portfolio selection: A reinforcement learning framework. Mathematical Finance, 30(4), 1273–1308.
- Wang, H., Zariphopoulou, T., & Zhou, X. (2020). Exploration versus exploitation in reinforcement learning: A stochastic control approach. Journal of Machine Learning Research, 21, 1–34.
- Wang, L., Cai, Q., Yang, Z., & Wang, Z. (2020). Neural policy gradient methods: Global optimality and rates of convergence. In International conference on learning representations.
- Wang, Y., Dong, K., Chen, X., & Wang, L. (2020). Q‐learning with UCB exploration is sample efficient for infinite‐horizon MDP. In International conference on learning representations.
- Wang, Z., Bapst, V., Heess, N., Mnih, V., Munos, R., Kavukcuoglu, K., & de Freitas, N. (2017). Sample efficient actor‐critic with experience replay. In International conference on learning representations (ICLR) (pp. 1–13).
- Wei, C.‐Y., & Luo, H. (2021). Non‐stationary reinforcement learning without prior knowledge: An optimal black‐box approach. In Conference on learning theory (pp. 4300–4354). PMLR.
- Wei, C.‐Y., Jahromi, M. J., Luo, H., Sharma, H., & Jain, R. (2020). Model‐free reinforcement learning in infinite‐horizon average‐reward Markov decision processes. In International conference on machine learning (pp. 10170–10180). PMLR.
- Wei, H., Wang, Y., Mangu, L., & Decker, K. (2019). Model‐based reinforcement learning for predictions and control for limit order books. arXiv preprint arXiv:1910.03743.
- Wiese, M., Knobloch, R., Korn, R., & Kretschmer, P. (2020). Quant GANs: Deep generation of financial time series. Quantitative Finance, 20(9), 1419–1440.
- Williams, R. J. (1992). Simple statistical gradient‐following algorithms for connectionist reinforcement learning. Machine Learning, 8(3), 229–256.
- Wu, Y., Shariff, R., Lattimore, T., & Szepesvári, C. (2016). Conservative bandits. In International conference on machine learning (pp. 1254–1262). PMLR.
- Xiao, H., Zhou, Z., Ren, T., Bai, Y., & Liu, W. (2020). Time‐consistent strategies for multi‐period mean‐variance portfolio optimization with the serially correlated returns. Communications in Statistics‐Theory and Methods, 49(12), 2831–2868.
- Xiong, H., Xu, T., Liang, Y., & Zhang, W. (2021). Non‐asymptotic convergence of Adam‐type reinforcement learning algorithms under Markovian sampling. Proceedings of the AAAI Conference on Artificial Intelligence, 35(12), 10460–10468.
- Xiong, H., Zhao, L., Liang, Y., & Zhang, W. (2020). Finite‐time analysis for double Q‐learning. In Advances in neural information processing systems (Vol. 33).
- Xiong, Z., Liu, X.‐Y., Zhong, S., Yang, H., & Walid, A. (2018). Practical deep reinforcement learning approach for stock trading. arXiv preprint arXiv:1811.07522.
- Xu, P., & Gu, Q. (2020). A finite‐time analysis of Q‐learning with neural network function approximation. In International conference on machine learning (pp. 10555–10565). PMLR.
- Xu, P., Gao, F., & Gu, Q. (2020a). An improved convergence analysis of stochastic variance‐reduced policy gradient. In Uncertainty in artificial intelligence (pp. 541–551). PMLR.
- Xu, P., Gao, F., & Gu, Q. (2020b). Sample efficient policy gradient methods with recursive variance reduction. In International conference on learning representations.
- Xu, T., Wang, Z., & Liang, Y. (2020a). Improving sample complexity bounds for (natural) actor‐critic algorithms. In Advances in neural information processing systems (Vol. 33, pp. 4358–4369).
- Xu, T., Wang, Z., & Liang, Y. (2020b). Non‐asymptotic convergence analysis of two time‐scale (natural) actor‐critic algorithms. arXiv preprint arXiv:2005.03557.
- Xu, T., Yang, Z., Wang, Z., & Liang, Y. (2021). Doubly robust off‐policy actor‐critic: Convergence and optimality. arXiv preprint arXiv:2102.11866.
- Yang, H., Liu, X.‐Y., & Wu, Q. (2018). A practical machine learning approach for dynamic stock recommendation. In 2018 17th IEEE international conference on trust, security and privacy in computing and communications/12th IEEE international conference on big data science and engineering (TrustCom/BigDataSE) (pp. 1693–1697). IEEE.
- Yang, L., & Wang, M. (2019). Sample‐optimal parametric Q‐learning using linearly additive features. In International conference on machine learning (pp. 6995–7004). PMLR.
- Yang, L., & Wang, M. (2020). Reinforcement learning in feature space: Matrix bandit, kernels, and regret bound. In International conference on machine learning (pp. 10746–10756). PMLR.
- Yang, R., Sun, X., & Narasimhan, K. (2019). A generalized algorithm for multi‐objective reinforcement learning and policy adaptation. In Advances in neural information processing systems (Vol. 32).
- Ye, Z., Deng, W., Zhou, S., Xu, Y., & Guan, J. (2020). Optimal trade execution based on deep deterministic policy gradient. In Database systems for advanced applications (pp. 638–654). Springer International Publishing.
- Yu, M., & Sun, S. (2020). Policy‐based reinforcement learning for time series anomaly detection. Engineering Applications of Artificial Intelligence, 95, 103919.
- Yu, P., Lee, J. S., Kulyatin, I., Shi, Z., & Dasgupta, S. (2019). Model‐based deep reinforcement learning for dynamic portfolio optimization. arXiv preprint arXiv:1901.08740.
- Yu, S., Wang, H., & Dong, C. (2020). Learning risk preferences from investment portfolios using inverse optimization. arXiv preprint arXiv:2010.01687.
- Zhang, G., & Chen, Y. (2020). Reinforcement learning for optimal market making with the presence of rebate. Available at SSRN 3646753.
- Zhang, J., Kim, J., O'Donoghue, B., & Boyd, S. (2021). Sample efficient reinforcement learning with REINFORCE. In Proceedings of the AAAI conference on artificial intelligence (Vol. 35, pp. 10887–10895).
- Zhang, K., Koppel, A., Zhu, H., & Basar, T. (2020). Global convergence of policy gradient methods to (almost) locally optimal policies. SIAM Journal on Control and Optimization, 58(6), 3586–3612.
- Zhang, Z., Zohren, S., & Roberts, S. (2020). Deep reinforcement learning for trading. The Journal of Financial Data Science, 2(2), 25–40.
- Zhao, M., & Linetsky, V. (2021). High frequency automated market making algorithms with adverse selection risk control via reinforcement learning. In Proceedings of the second ACM international conference on AI in finance (pp. 1–9).
- Zheng, L., & Ratliff, L. (2020). Constrained upper confidence reinforcement learning. In Learning for dynamics and control (pp. 620–629). PMLR.
- Zhou, D., Chen, J., & Gu, Q. (2020). Provable multi‐objective reinforcement learning with generative models. arXiv preprint arXiv:2011.10134.
- Zhou, X. Y., & Li, D. (2000). Continuous‐time mean‐variance portfolio selection: A stochastic LQ framework. Applied Mathematics and Optimization, 42(1), 19–33.
- Zivot, E. (2017). Introduction to computational finance and financial econometrics. Chapman & Hall CRC.
- Zou, S., Xu, T., & Liang, Y. (2019). Finite‐sample analysis for SARSA with linear function approximation. In Advances in neural information processing systems (Vol. 32, pp. 8668–8678).