JAMMEval: A Refined Collection of Japanese Benchmarks for Reliable VLM Evaluation
arXiv:2604.00909v2
Abstract: Reliable evaluation is essential for the development of vision-language models (VLMs). However, Japanese VQA benchmarks have undergone far less iterative refinement than their English counterparts. A…