What-Meets-Where: Unified Learning of Action and Contact Localization in Images
arXiv:2508.09428v2 Announce Type: replace
Abstract: People control their bodies to establish contact with the environment. To comprehensively understand actions across diverse visual contexts, it is essential to simultaneously consider what a…