Three Models of RLHF Annotation: Extension, Evidence, and Authority
arXiv:2604.25895v1 Announce Type: cross
Abstract: Preference-based alignment methods, most prominently Reinforcement Learning from Human Feedback (RLHF), use the judgments of human annotators to shape large language model behaviour. However, the norma…
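
As a rough illustration of the mechanism the abstract refers to (not taken from the paper), here is a minimal sketch of how annotator preference judgments are commonly turned into a reward-model training signal, assuming a Bradley-Terry style pairwise loss; the function name, tensor names, and toy values are hypothetical.

```python
import torch
import torch.nn.functional as F

def pairwise_preference_loss(reward_chosen: torch.Tensor,
                             reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style objective: push the reward model to score the
    annotator-preferred response above the rejected one."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy example: scalar rewards a reward model assigned to two candidate
# responses per prompt, where annotators preferred the "chosen" one.
chosen = torch.tensor([1.2, 0.4])
rejected = torch.tensor([0.3, 0.9])
print(pairwise_preference_loss(chosen, rejected))
```

The resulting reward model is then typically used to fine-tune the language model with a policy-gradient method, which is how annotators' judgments end up shaping model behaviour.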