cs.CL, cs.LG

VSPO: Vector-Steered Policy Optimization for Behavioral Control

arXiv:2605.15604v1 Announce Type: cross
Abstract: Modern language models often need to optimize a primary accuracy objective while also accommodating secondary behavioral preferences, such as verbosity, agreeableness, or the level of technical experti…