cs.CV, cs.RO

EARL: Towards a Unified Analysis-Guided Reinforcement Learning Framework for Egocentric Interaction Reasoning and Pixel Grounding

arXiv:2605.14742v1 Announce Type: cross
Abstract: Understanding human–environment interactions from egocentric vision is essential for assistive robotics and embodied intelligent agents, yet existing multimodal large language models (MLLMs) still str…