cs.AI, cs.CL

Useless but Safe? Benchmarking Utility Recovery with User Intent Clarification in Multi-Turn Conversations

arXiv:2604.27093v1 Announce Type: cross
Abstract: Current LLM safety alignment techniques improve model robustness against adversarial attacks but overlook whether and how LLMs can recover helpfulness when benign users clarify their intent. We introd…