Test of Time: Rethinking Temporal Signal of Benchmark Contamination
arXiv:2509.00072v4 Announce Type: replace
Abstract: The post-cutoff performance decay of large language models (LLMs) has been widely interpreted as a temporal signal of benchmark contamination, in which public information released before the training cutoff may have been included…