DriveXQA: Cross-modal Visual Question Answering for Adverse Driving Scene Understanding
arXiv:2603.11380v2
Abstract: Fusing sensors with complementary modalities is crucial for maintaining a stable and comprehensive understanding of abnormal driving scenes. However, Multimodal Large Language Models (MLLMs) are unde…