Local Linearity of LLMs Enables Activation Steering via Model-Based Linear Optimal Control
arXiv:2604.19018v1 Announce Type: new
Abstract: Inference-time LLM alignment methods, particularly activation steering, offer an alternative to fine-tuning by directly modifying activations during generation. Existing methods, however, often rely on n…