Object Affordance Recognition and Grounding via Multi-scale Cross-modal Representation Learning
arXiv:2508.01184v2 Announce Type: replace
Abstract: A core problem of Embodied AI is to learn object manipulation from observation, as humans do. To achieve this, it is important to localize 3D object affordance areas through observation such as image…