cs.CL, cs.CV

E2E-GMNER: End-to-End Generative Grounded Multimodal Named Entity Recognition

arXiv:2604.17319v1 Announce Type: new
Abstract: Grounded Multimodal Named Entity Recognition (GMNER) aims to jointly identify named entity mentions in text, predict their semantic types, and ground each entity to a corresponding visual region in an as…
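The abstract describes each GMNER prediction as three parts: an entity mention, its semantic type, and a visual region. The sketch below shows one possible way to represent that output and score a prediction against a gold annotation. It is not the paper's method, and several details are assumptions rather than facts from the truncated abstract: the `GroundedEntity` and `matches` names, the `(x_min, y_min, x_max, y_max)` box format, the use of `None` for an entity with no grounded region, and the IoU >= 0.5 matching threshold.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class GroundedEntity:
    """Hypothetical container for one GMNER prediction (mention, type, region)."""
    mention: str            # entity span as it appears in the text
    entity_type: str        # semantic type label, e.g. "PER", "LOC", "ORG"
    region: Optional[Box]   # grounded region; None is assumed to mean "not grounded"


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def matches(pred: GroundedEntity, gold: GroundedEntity, threshold: float = 0.5) -> bool:
    """A prediction counts as correct only if mention, type and region all agree.

    The 0.5 IoU threshold is an assumed, illustrative choice.
    """
    if pred.mention != gold.mention or pred.entity_type != gold.entity_type:
        return False
    # If either side has no region, both must have none to count as a match.
    if pred.region is None or gold.region is None:
        return pred.region is None and gold.region is None
    return iou(pred.region, gold.region) >= threshold


gold = GroundedEntity("Lionel Messi", "PER", (10, 20, 110, 220))
pred = GroundedEntity("Lionel Messi", "PER", (15, 25, 110, 220))
print(matches(pred, gold))  # True: same mention and type, IoU is about 0.93
```

Requiring all three parts to agree reflects the joint nature of the task described in the abstract: a correct span with the wrong type, or the right type grounded to the wrong region, does not count as a correct prediction under this sketch.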