On the Rejection Criterion for Proxy-based Test-time Alignment
arXiv:2604.16146v2 Announce Type: replace
Abstract: Recent works have proposed test-time alignment methods that rely on a small aligned model as a proxy to guide the generation of a larger base (unaligned) model. The implicit reward approach skews the l…