Rohith Reddy Bellibatlu

JudgeSense: A Benchmark for Prompt Sensitivity in LLM-as-a-Judge Systems

Rohith Reddy Bellibatlu / April 28, 2026

arXiv:2604.23478v1
Abstract: Large language models are increasingly deployed as automated judges for evaluating other models, yet the stability of their verdicts under semantically equivalent prompt paraphrases remains unmeasured. W…
