cs.RO

PrefMoE: Robust Preference Modeling with Mixture-of-Experts Reward Learning

arXiv:2605.00384v1
Abstract: Preference-based reinforcement learning offers a scalable alternative to manual reward engineering by learning reward structures from comparative feedback. However, large-scale preference datasets, wheth…