A Forensic Analysis of Synthetic Data in RL: Diagnosing and Solving Algorithmic Failures in Model-Based Policy Optimization
arXiv:2510.01457v4 Announce Type: replace
Abstract: Synthetic data is central to data-efficient Dyna-style model-based reinforcement learning, but it can also degrade performance. We study this failure in Model-Based Policy Optimization (MBPO), which …