cs.CL, cs.SD

HumMusQA: A Human-written Music Understanding QA Benchmark Dataset

arXiv:2603.27877v1 Announce Type: new
Abstract: The evaluation of music understanding in Large Audio-Language Models (LALMs) requires a rigorously defined benchmark that truly tests whether models can perceive and interpret music, a standard that curr…