Lost in Volume: The CT-SpatialVQA Benchmark for Evaluating Semantic-Spatial Understanding of 3D Medical Vision-Language Models
arXiv:2605.08787v1 Announce Type: new
Abstract: Recent advances in 3D medical vision-language models have enabled joint reasoning over volumetric images and text, showing strong performance in medical visual question-answering (VQA) and report generat…