Beyond the Bellman Fixed Point: Geometry and Fast Policy Identification in Value Iteration
arXiv:2604.17457v2 Announce Type: replace-cross
Abstract: Dynamic programming is one of the most fundamental methodologies for solving Markov decision problems. Among its many variants, Q-value iteration (Q-VI) is particularly important due to its con…
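The abstract names Q-value iteration (Q-VI) but the excerpt does not state its update. The sketch below runs the standard textbook Q-VI update, not the paper's proposed method. It repeatedly applies the Bellman optimality operator until the iterates reach its fixed point. It also records the first iteration after which the greedy policy never changes, which loosely illustrates the "policy identification" theme in the title.

Everything here is a hypothetical illustration: the two-state MDP (`P`, `R`), the helper names `greedy` and `q_value_iteration`, and the stopping tolerance.

```python
from typing import List, Tuple

# Hypothetical toy MDP (not taken from the paper):
# P[s][a] lists (probability, next_state) pairs; R[s][a] is the expected reward.
P: List[List[List[Tuple[float, int]]]] = [
    [[(1.0, 0)], [(1.0, 1)]],  # state 0: action 0 stays, action 1 moves to state 1
    [[(1.0, 1)], [(1.0, 0)]],  # state 1: action 0 stays, action 1 moves to state 0
]
R: List[List[float]] = [
    [0.0, 0.0],  # no reward in state 0
    [1.0, 0.0],  # reward 1 for staying in state 1
]


def greedy(Q: List[List[float]]) -> Tuple[int, ...]:
    """Greedy policy w.r.t. Q; ties go to the lowest action index."""
    return tuple(max(range(len(row)), key=row.__getitem__) for row in Q)


def q_value_iteration(P, R, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular Q-VI: repeatedly apply the Bellman optimality operator
    (T Q)(s, a) = R[s][a] + gamma * sum_{s'} P(s' | s, a) * max_{a'} Q(s', a').

    Returns (Q, greedy policy, iterations run, first iteration from which
    the greedy policy never changed again).
    """
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    history = []  # greedy policy after each sweep
    iters = 0
    for iters in range(1, max_iters + 1):
        V = [max(row) for row in Q]  # V_k(s) = max_a Q_k(s, a)
        Q_new = [
            [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
             for a in range(n_actions)]
            for s in range(n_states)
        ]
        # Sup-norm change between consecutive iterates.
        diff = max(abs(Q_new[s][a] - Q[s][a])
                   for s in range(n_states) for a in range(n_actions))
        Q = Q_new
        history.append(greedy(Q))
        if diff < tol:
            break
    # Walk back to find where the final greedy policy was reached for good.
    final = history[-1]
    stable_from = len(history)
    while stable_from > 1 and history[stable_from - 2] == final:
        stable_from -= 1
    return Q, final, iters, stable_from


Q, policy, iters, stable_from = q_value_iteration(P, R, gamma=0.9)
print(policy, stable_from, iters)
```

With `gamma=0.9`, the fixed point of this toy MDP has Q(1, stay) = 10 and Q(0, move) = 9. The greedy policy `(1, 0)` (move from state 0, stay in state 1) is already fixed from iteration 2 onward. The Q-values, by contrast, need far more sweeps to meet the tolerance. That gap between identifying the policy and converging to the fixed point is the kind of behavior the title alludes to, though this toy run makes no claim about the paper's analysis.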