Optimizing Neurorobot Policy under Limited Demonstration Data through Preference Regret
arXiv:2604.03523v1 Announce Type: cross
Abstract: Robot reinforcement learning from demonstrations (RLfD) assumes that expert data is abundant; this is usually unrealistic in the real world, where demonstrations are scarce and costly to collect. Furtherm…