Purva Chiniya, Kevin Scaria, Sagar Chaturvedi

Gradient-Controlled Decoding: A Safety Guardrail for LLMs with Dual-Anchor Steering

Purva Chiniya, Kevin Scaria, Sagar Chaturvedi / April 8, 2026

arXiv:2604.05179v1 Announce Type: new
Abstract: Large language models (LLMs) remain susceptible to jailbreak and direct prompt-injection attacks, yet the strongest defensive filters frequently over-refuse benign queries and degrade user experience. Pr…
