cs.LG

MDP Planning as Policy Inference

arXiv:2602.17375v2 Announce Type: replace
Abstract: We cast episodic Markov decision process (MDP) planning as Bayesian inference over policies. A policy is treated as the latent variable and is assigned an unnormalized probability of optimality that …