cs.LG

Steerable but Not Decodable: Function Vectors Operate Beyond the Logit Lens

arXiv:2604.02608v1 Announce Type: new
Abstract: Function vectors (FVs) — mean-difference directions extracted from in-context learning demonstrations — can steer large language model behavior when added to the residual stream. We hypothesized that F…
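The abstract describes function vectors as mean-difference directions that are added to the residual stream, and the title contrasts this with the logit lens. The sketch below illustrates both operations on toy NumPy arrays. It is not the authors' code, and several details are assumptions. The baseline (contrast) prompt set, the layer and token position at which activations are read, and the scale `alpha` are hypothetical choices. `W_U` stands in for a model's unembedding matrix, and the final layer norm that a real logit lens applies before unembedding is omitted for brevity.

```python
import numpy as np


def mean_difference_vector(icl_acts: np.ndarray, baseline_acts: np.ndarray) -> np.ndarray:
    """Mean activation on ICL prompts minus mean activation on baseline prompts.

    Both inputs have shape (num_prompts, d_model) and hold residual-stream
    activations at one layer/position (hypothetical choice, not from the paper).
    """
    return icl_acts.mean(axis=0) - baseline_acts.mean(axis=0)


def steer(hidden: np.ndarray, fv: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Add the scaled direction to residual-stream states (broadcasts over rows)."""
    return hidden + alpha * fv


def logit_lens_top_tokens(vec: np.ndarray, W_U: np.ndarray, k: int = 5) -> np.ndarray:
    """Project a vector through an unembedding matrix and return the top-k token ids.

    Simplification: a real logit lens usually applies the model's final layer
    norm before multiplying by W_U; that step is skipped here.
    """
    logits = vec @ W_U  # (d_model,) @ (d_model, vocab) -> (vocab,)
    return np.argsort(logits)[::-1][:k]


# Toy usage with random data standing in for real model activations.
rng = np.random.default_rng(0)
d_model, vocab = 16, 50
icl = rng.normal(size=(8, d_model)) + 1.0   # hypothetical ICL-prompt activations
base = rng.normal(size=(8, d_model))        # hypothetical baseline activations
fv = mean_difference_vector(icl, base)

h = rng.normal(size=(3, d_model))           # three residual-stream positions
h_steered = steer(h, fv, alpha=2.0)

W_U = rng.normal(size=(d_model, vocab))     # stand-in unembedding matrix
top = logit_lens_top_tokens(fv, W_U, k=5)
```

Steering and decoding are separate operations here: `steer` changes the model's state, while `logit_lens_top_tokens` only reads out which tokens the direction points toward in vocabulary space. The title's claim is that the first can succeed even when the second is uninformative.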