Decoding Scientific Experimental Images: The SPUR Benchmark for Perception, Understanding, and Reasoning
arXiv:2604.27604v1 Announce Type: new
Abstract: We introduce SPUR, a comprehensive benchmark for scientific experimental image perception, understanding, and reasoning, comprising 4,264 question-answering (QA) pairs derived from 1,084 expert-curated i…