VSPO: Vector-Steered Policy Optimization for Behavioral Control
arXiv:2605.15604v1 Announce Type: cross
Abstract: Modern language models often need to optimize a primary accuracy objective while also accommodating secondary behavioral preferences, such as verbosity, agreeableness, or the level of technical experti…