Tighter Regret Bounds for Contextual Action-Set Reinforcement Learning
arXiv:2605.15692v1 Announce Type: new
Abstract: We study episodic reinforcement learning with fixed reward and transition functions, but with episode-dependent admissible action sets that are observed at the start of each episode. Performance is measu…