Authors: Yunsheng Lu, Zijiang Yang, Licheng Pan, Zhixuan Chu

Robust Reward Modeling for Large Language Models via Causal Decomposition

Yunsheng Lu, Zijiang Yang, Licheng Pan, Zhixuan Chu / April 17, 2026

arXiv:2604.13833v2 Announce Type: replace
Abstract: Reward models are central to aligning large language models, yet they often overfit to spurious cues such as response length and overly agreeable tone. Most prior work weakens these cues directly by …

cs.CL

Robust Reward Modeling for Large Language Models via Causal Decomposition

Yunsheng Lu, Zijiang Yang, Licheng Pan, Zhixuan Chu / April 16, 2026

arXiv:2604.13833v1 Announce Type: new
Abstract: Reward models are central to aligning large language models, yet they often overfit to spurious cues such as response length and overly agreeable tone. Most prior work weakens these cues directly by pena…