cs.CV

SceneGraphVLM: Dynamic Scene Graph Generation from Video with Vision-Language Models

arXiv:2605.13667v1 Announce Type: new
Abstract: Scene graph generation provides a compact structured representation for visual perception, but accurate and fast graph prediction from images and videos remains challenging. Recent VLM-based methods can …