Driving Intents Amplify Planning-Oriented Reinforcement Learning
arXiv:2605.12625v2 Announce Type: replace
Abstract: Continuous-action policies trained on a single demonstrated trajectory per scene suffer from mode collapse: samples cluster around the demonstrated maneuver and the policy cannot represent semantical…