Sijia Liu, Niklas Muennighoff, Kawin Ethayarajh

Humanline: Online Alignment as Perceptual Loss

Sijia Liu, Niklas Muennighoff, Kawin Ethayarajh / March 30, 2026

arXiv:2509.24207v2
Abstract: Online alignment (e.g., GRPO) is generally more performant than offline alignment (e.g., DPO) — but why? Drawing on prospect theory from behavioral economics, we propose a human-centric explanation….
