SceneGraphVLM: Dynamic Scene Graph Generation from Video with Vision-Language Models
arXiv:2605.13667v1 Announce Type: new
Abstract: Scene graph generation provides a compact structured representation for visual perception, but accurate and fast graph prediction from images and videos remains challenging. Recent VLM-based methods can …