Theoretical Limits of Language Model Alignment
arXiv:2605.07105v1 Announce Type: cross
Abstract: Language model (LM) alignment improves model outputs to reflect human preferences while preserving the capabilities of the base model. The most common alignment approaches are (i) reinforcement learnin…