Reinforcement Learning for Exponential Utility: Algorithms and Convergence in Discounted MDPs
Researchers have closed a theoretical gap in reinforcement learning by developing principled value-based algorithms for exponential-utility optimization in discounted MDPs, a setting relevant to risk-sensitive decision-making in finance and safety-critical systems. The work establishes contraction properties for two Q-learning extensions, proves their convergence, and characterizes optimal stationary policies. This extends the mathematical foundations of RL beyond standard expected-reward maximization, letting practitioners encode risk preferences directly in the learning objective rather than applying post-hoc adjustments.
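The summary does not describe the two Q-learning extensions themselves. The sketch below therefore illustrates only the exponential-utility criterion that such methods optimize: the certainty equivalent (1/β) log E[exp(β G)] of a discounted return G. The helper names `discounted_return` and `certainty_equivalent` are hypothetical. The sign convention is also an assumption, since conventions vary across papers: here β < 0 is risk-averse, β > 0 is risk-seeking, and β = 0 recovers the ordinary expectation.

```python
import math

def discounted_return(rewards, gamma):
    """Hypothetical helper: G = sum_t gamma**t * r_t for one trajectory."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

def certainty_equivalent(returns, probs, beta):
    """Hypothetical helper: exponential-utility certainty equivalent
    (1/beta) * log E[exp(beta * G)] over a discrete return distribution.

    Assumed convention: beta < 0 risk-averse, beta > 0 risk-seeking,
    beta == 0 falls back to the plain expected return.
    """
    if beta == 0:
        return sum(p * g for g, p in zip(returns, probs))
    # Log-sum-exp shift keeps exp() from overflowing or underflowing.
    m = max(beta * g for g in returns)
    s = sum(p * math.exp(beta * g - m) for g, p in zip(returns, probs))
    return (m + math.log(s)) / beta

# Two return distributions with the same mean (1.0) but different spread.
safe = ([1.0], [1.0])
risky = ([0.0, 2.0], [0.5, 0.5])
for beta in (-1.0, 1.0):
    # beta = -1: safe scores 1.0, risky about 0.57 (risk-averse prefers safe)
    # beta = +1: safe scores 1.0, risky about 1.43 (risk-seeking prefers risky)
    print(beta, certainty_equivalent(*safe, beta), certainty_equivalent(*risky, beta))
```

Both distributions have the same mean, so a risk-neutral agent cannot distinguish them. The exponential criterion can, and this is the sense in which risk preference enters the objective itself. For small |β| the certainty equivalent is approximately E[G] + (β/2) Var[G], so β acts as a variance penalty or bonus.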






















