cs.AI, cs.LG

Reward Weighted Classifier-Free Guidance as Policy Improvement in Autoregressive Models

arXiv:2604.15577v1 Announce Type: cross
Abstract: Consider an autoregressive model that produces outputs x (e.g., answers to questions, molecules), each of which can be summarized by an attribute vector y (e.g., helpfulness vs. harmlessness, or bio-av…