Relational Preference Encoding in Looped Transformer Internal States
arXiv:2604.09870v1 Announce Type: new
Abstract: We investigate how looped transformers encode human preference in their internal iteration states. Using Ouro-2.6B-Thinking, a 2.6B-parameter looped transformer with iterative refinement, we extract hidd…