SafeReview: Defending LLM-based Review Systems Against Adversarial Hidden Prompts
arXiv:2604.26506v1 Announce Type: new
Abstract: As Large Language Models (LLMs) are increasingly integrated into academic peer review, their vulnerability to hidden prompts, i.e., adversarial instructions embedded in submissions to manipulate outcome…