SGTA: Scene-Graph Based Multi-Modal Traffic Agent for Video Understanding
arXiv:2604.03697v1 Announce Type: new
Abstract: We present the Scene-Graph Based Multi-Modal Traffic Agent (SGTA), a modular framework for traffic video understanding that combines structured scene graphs with multi-modal reasoning. It constructs a traffi…