cs.AI, cs.RO

Masked IRL: LLM-Guided Reward Disambiguation from Demonstrations and Language

arXiv:2511.14565v2 Announce Type: replace-cross
Abstract: Robots can adapt to user preferences by learning reward functions from demonstrations, but with limited data, reward models often overfit to spurious correlations and fail to generalize. This h…