cs.AI, cs.CV, cs.LG, cs.RO

Optimizing Neurorobot Policy under Limited Demonstration Data through Preference Regret

arXiv:2604.03523v1 Announce Type: cross
Abstract: Robot reinforcement learning from demonstrations (RLfD) assumes that expert data is abundant; in the real world this is usually unrealistic, since data is scarce and costly to collect. Furtherm…