Bayesian policy gradient and actor-critic algorithms
arXiv:2604.27563v1 Announce Type: new
Abstract: Policy gradient methods are reinforcement learning algorithms that adapt a parameterized policy by following a performance gradient estimate. Conventional policy gradient methods use Monte-Carlo techniqu…