cs.CL, cs.LG

Uncovering Cross-Objective Interference in Multi-Objective Alignment

arXiv:2602.06869v2 Announce Type: replace
Abstract: We study a persistent failure mode in multi-objective alignment for large language models (LLMs): training improves performance on only a subset of objectives while causing others to degrade. We form…