cs.AI, cs.CL, cs.LG

Semantic Needles in Document Haystacks: Sensitivity Testing of LLM-as-a-Judge Similarity Scoring

arXiv:2604.18835v1 Announce Type: cross
Abstract: We propose a scalable, multifactorial experimental framework that systematically probes LLM sensitivity to subtle semantic changes in pairwise document comparison. We analogize this as a needle-in-a-ha…