Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs
arXiv:2605.08053v1 Announce Type: new
Abstract: Reinforcement learning (RL) for exponential-utility optimization in discounted Markov decision processes (MDPs) lacks principled value-based algorithms. We address this gap in the fixed risk-aversion set…
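The sketch below illustrates the objective the abstract names. It is a minimal sketch that assumes one common convention, which may differ from the paper's. The exponential-utility certainty equivalent of the discounted return G = Σ_t γ^t r_t is (1/β) log E[exp(β G)], with a fixed risk-aversion parameter β. Here β < 0 is risk-averse for rewards, and β → 0 recovers the risk-neutral expected return. The helpers `discounted_return` and `certainty_equivalent` are hypothetical names. The code only evaluates the objective by Monte Carlo from sampled returns. It is not the value-based algorithm the paper proposes.

```python
import math
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * r_t for one (finite) trajectory."""
    g = 0.0
    # Accumulate backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def certainty_equivalent(returns: Sequence[float], beta: float) -> float:
    """Monte Carlo estimate of (1/beta) * log E[exp(beta * G)].

    Assumed sign convention (hypothetical, not taken from the paper):
    beta < 0 is risk-averse for rewards, beta > 0 is risk-seeking,
    and beta == 0 falls back to the sample mean (risk-neutral limit).
    """
    if not returns:
        raise ValueError("need at least one sampled return")
    n = len(returns)
    if beta == 0.0:
        return sum(returns) / n
    # Numerically stable log-mean-exp: shift by the max before exponentiating.
    scaled = [beta * g for g in returns]
    m = max(scaled)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / n)
    return log_mean_exp / beta
```

For two equally likely returns 0 and 10 (mean 5), `certainty_equivalent([0.0, 10.0], -1.0)` is about 0.69. This shows how a risk-averse β discounts the variable outcome well below its mean. The max-shift in the log-mean-exp keeps `math.exp` from overflowing when |β G| is large.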