cs.AI, cs.LG

A Harmonic Mean Formulation of Average Reward Reinforcement Learning in SMDPs

arXiv:2605.04880v1 Announce Type: new
Abstract: Recent research has revived and amplified interest in algorithms for undiscounted average reward reinforcement learning in infinite-horizon, non-episodic (continuing) tasks. Semi-Markov decision processe…