cs.AI

Humanline: Online Alignment as Perceptual Loss

arXiv:2509.24207v2
Abstract: Online alignment (e.g., GRPO) generally outperforms offline alignment (e.g., DPO). But why? Drawing on prospect theory from behavioral economics, we propose a human-centric explanation…